Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/86229
Title: Dissecting self-describing data formats to enable advanced querying of file metadata
Author(s): Duwe, Kira
Kuhn, Michael
Issue Date: 2021
Type: Conference object
Language: English
URN: urn:nbn:de:gbv:ma9:1-1981185920-881815
Subjects: Information systems
Hierarchical storage management
Computer systems organization
Client-server architectures
Distributed storage
Abstract: In times of continuously growing data sizes, performing insightful analysis is increasingly difficult. I/O libraries such as NetCDF and ADIOS2 offer options to manage additional metadata to make the data retrieval more efficient. However, queries on this metadata are difficult as it is currently stored inside the corresponding self-describing data formats. By replacing the file system underneath with the storage framework JULEA, we can use dedicated backends for key-value and object stores, as well as databases. Splitting the BP file content into file metadata and file data enables novel and highly efficient data management techniques without creating redundancy. We have kept our approach transparent to the application layer by implementing a custom ADIOS2 engine. Moreover, our data analysis interface allows speeding up metadata queries by a factor of up to 60,000 in comparison to the ADIOS2 API and data formats.
URI: https://opendata.uni-halle.de//handle/1981185920/88181
http://dx.doi.org/10.25673/86229
Open Access: Open access publication
License: (CC BY 4.0) Creative Commons Attribution 4.0
Sponsor/Funder: Transformative agreement (Transformationsvertrag)
Publisher: Association for Computing Machinery
Publisher Place: New York
Original Publication: 10.1145/3456727.3463778
Appears in Collections: Fakultät für Informatik (OA)

Files in This Item:
File | Description | Size | Format
Duwe et al._Dissecting self-describing_2021.pdf | Secondary publication | 1.01 MB | Adobe PDF