Exploring Metadata Search Essentials for Scientific Data Management

Cited by: 7
Authors
Zhang, Wei [1]
Byna, Suren [2]
Niu, Chenxu [1]
Chen, Yong [1]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA USA
Funding
U.S. National Science Foundation;
Keywords
Metadata Indexing; Metadata Search; Data Management; HDF5;
DOI
10.1109/HiPC.2019.00021
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed to locate target data efficiently. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, because metadata, metadata queries, and the corresponding data structures have not been investigated sufficiently. In this paper, we present a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. Based on these characteristics, we also study possible metadata queries and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude with a summary of our findings. These findings provide guidelines and offer insights for developing metadata indexing methodologies for scientific applications.
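The trade-off stated in the abstract (a hash table for exact-match lookups, a trie when prefix or suffix queries are needed) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class name `TrieIndex`, the helper `build_indexes`, and the sample attribute values are all hypothetical, and answering suffix queries with a second trie over reversed strings is an assumption about one common technique, not something the abstract specifies.

```python
from collections import defaultdict


class TrieIndex:
    """Hypothetical trie mapping string attribute values to object IDs."""

    def __init__(self):
        self.root = {"children": {}, "ids": set()}

    def insert(self, value, obj_id):
        # Walk or create one node per character, then record the ID at the end.
        node = self.root
        for ch in value:
            node = node["children"].setdefault(ch, {"children": {}, "ids": set()})
        node["ids"].add(obj_id)

    def prefix_search(self, prefix):
        # Descend along the prefix; an absent branch means no match.
        node = self.root
        for ch in prefix:
            node = node["children"].get(ch)
            if node is None:
                return set()
        # Collect every ID stored in the subtree below the prefix node.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current["ids"]
            stack.extend(current["children"].values())
        return found


def build_indexes(records):
    """Build a hash index, a prefix trie, and a reversed-string trie."""
    hash_index = defaultdict(set)   # exact match on the full value
    prefix_trie = TrieIndex()       # prefix queries
    suffix_trie = TrieIndex()       # suffix queries via reversed values
    for obj_id, value in records.items():
        hash_index[value].add(obj_id)
        prefix_trie.insert(value, obj_id)
        suffix_trie.insert(value[::-1], obj_id)
    return hash_index, prefix_trie, suffix_trie


# Hypothetical metadata attribute values keyed by object ID.
records = {
    "obj1": "galaxy_001",
    "obj2": "galaxy_002",
    "obj3": "quasar_001",
}
hash_index, prefix_trie, suffix_trie = build_indexes(records)

exact = hash_index.get("quasar_001", set())          # exact-match query
by_prefix = prefix_trie.prefix_search("galaxy")      # prefix query
by_suffix = suffix_trie.prefix_search("_001"[::-1])  # suffix query
```

In this sketch the hash index answers an exact-match query with a single lookup but cannot answer a prefix query without scanning every key, whereas the trie answers prefix queries by descending one node per character at the cost of extra memory per node.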
Pages: 83-92
Page count: 10