Exploring Metadata Search Essentials for Scientific Data Management

Cited by: 7
Authors
Zhang, Wei [1]
Byna, Suren [2]
Niu, Chenxu [1]
Chen, Yong [1]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA USA
Funding
National Science Foundation (USA)
Keywords
Metadata Indexing; Metadata Search; Data Management; HDF5
DOI
10.1109/HiPC.2019.00021
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets and locate the data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed to locate target data efficiently. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, owing to the lack of investigation into metadata, metadata queries, and the corresponding data structures. In this work, we perform a systematic study of metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of its metadata. Based on these characteristics, we also study possible metadata queries and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude with a summary of our findings, which provide a guideline and offer insights for developing metadata indexing methodologies for scientific applications.
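The trade-off the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the class and function names, the sample attribute values, and the choice to serve suffix queries from a second trie built over reversed strings are all assumptions made for illustration. It shows a hash table answering exact-match queries and a trie answering prefix and suffix queries over string-valued metadata attributes.

```python
from collections import defaultdict


class TrieNode:
    def __init__(self):
        self.children = {}       # next character -> TrieNode
        self.object_ids = set()  # objects whose attribute value ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, object_id):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.object_ids.add(object_id)

    def prefix_search(self, prefix):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        # Collect every object stored in the subtree below that node.
        found = set()
        stack = [node]
        while stack:
            current = stack.pop()
            found |= current.object_ids
            stack.extend(current.children.values())
        return found


# Hypothetical metadata: object id -> value of one string attribute.
records = {
    "obj1": "galaxy_ngc1300",
    "obj2": "galaxy_m31",
    "obj3": "star_vega",
}

hash_index = defaultdict(set)  # exact-match index
prefix_trie = Trie()           # prefix index
suffix_trie = Trie()           # suffix index: a trie over reversed values (assumption)

for object_id, value in records.items():
    hash_index[value].add(object_id)
    prefix_trie.insert(value, object_id)
    suffix_trie.insert(value[::-1], object_id)


def exact_search(value):
    # One hash lookup, but it can only match the whole value.
    return set(hash_index.get(value, set()))


def suffix_search(suffix):
    # A suffix of the value is a prefix of the reversed value.
    return suffix_trie.prefix_search(suffix[::-1])
```

With this sketch, `exact_search("star_vega")` is a single dictionary lookup, while `prefix_trie.prefix_search("galaxy_")` and `suffix_search("vega")` walk the trie. A plain hash table could answer the latter two only by scanning every key.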
Pages: 83-92
Number of pages: 10