Parallel membership queries on very large scientific data sets using bitmap indexes

Cited by: 10
Authors
Yildiz, Beytullah [1]
Wu, Kesheng [1]
Byna, Suren [1]
Shoshani, Arie [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, Computat Res Div, Mail Stop 50B-3238,1 Cyclotron Rd, Berkeley, CA 94720 USA
Source
Keywords
big data; bitmap index; data management; membership query; parallel query; scientific data;
DOI
10.1002/cpe.5157
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing such massive quantities of scientific data is challenging because the data are often stored in files with specialized formats, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called a membership query, which identifies whether the queried elements of a set are members of an attribute. Attributes that naturally take classification values appear frequently both in scientific domains, such as category and object type, and in daily life, such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Thanks to compression algorithms such as Word-Aligned Hybrid, bitmap indexing provides high performance not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set. We conducted experiments in a highly parallelized environment on data obtained from a particle accelerator model and on a synthetic data set.
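
To make the idea in the abstract concrete, the following is a minimal sketch of a membership query answered with a bitmap index and evaluated over partitions in parallel. It is not the authors' implementation: the bitmaps are uncompressed Python integers rather than Word-Aligned Hybrid compressed bitmaps, the parallelism uses a local thread pool rather than a large parallel machine, and every name (`build_bitmap_index`, `membership_query`, `parallel_membership_query`, `partition_size`, the sample particle types) is hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def build_bitmap_index(values):
    """Map each distinct value to an int used as a bitset.

    Bit i of index[v] is set when values[i] == v (uncompressed, for illustration).
    """
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def membership_query(index, wanted):
    """Return a bitset of rows whose value is in `wanted` (bitwise OR of bitmaps)."""
    hits = 0
    for value in wanted:
        hits |= index.get(value, 0)
    return hits


def rows_from_bitset(bits, offset=0):
    """Decode a bitset into ascending row numbers, shifted by the partition offset."""
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(offset + row)
        bits >>= 1
        row += 1
    return rows


def parallel_membership_query(values, wanted, partition_size=4, workers=2):
    """Split the column into partitions, index and query each one, merge in order."""
    partitions = [
        (start, values[start:start + partition_size])
        for start in range(0, len(values), partition_size)
    ]

    def query_partition(part):
        offset, chunk = part
        index = build_bitmap_index(chunk)
        return rows_from_bitset(membership_query(index, wanted), offset)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in partition order, so rows stay sorted.
        results = pool.map(query_partition, partitions)
        return [row for part in results for row in part]


# Hypothetical classification attribute: one particle type per record.
particle_types = ["electron", "muon", "pion", "electron", "kaon",
                  "muon", "pion", "proton", "electron"]
matches = parallel_membership_query(particle_types, {"muon", "kaon"})
print(matches)  # [1, 4, 5]
```

In this sketch each partition builds its own index at query time to keep the example short; a real system would presumably build and store the indexes once and reuse them across queries.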
Pages: 15
Related papers
50 in total
  • [1] Using Data Clustering to Optimize Scatter Bitmap Index for Membership Queries
    Weahama, Weahason
    Vanichayobon, Sirirut
    Manfuekphan, Jarin
    [J]. 2009 INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING, PROCEEDINGS, 2009, : 174 - 178
  • [2] Managing PMU Data Sets with Bitmap Indexes
    McCamish, Ben
    Chiu, David
    Histand, Miles
    Landford, Jordan
    Bass, Robert B.
    Meier, Rich
    Cotilla-Sanchez, Eduardo
    [J]. 2014 IEEE CONFERENCE ON TECHNOLOGIES FOR SUSTAINABILITY (SUSTECH), 2014,
  • [3] High Performance Queries Using Compressed Bitmap Indexes
    Yildiz, Beytullah
    [J]. EURO-PAR 2019: PARALLEL PROCESSING WORKSHOPS, 2020, 11997 : 493 - 505
  • [4] Multi-resolution bitmap indexes for scientific data
    Sinha, Rishi Rakesh
    Winslett, Marianne
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2007, 32 (03):
  • [5] Collection and Exploration of Large Data Monitoring Sets Using Bitmap Databases
    Deri, Luca
    Lorenzetti, Valeria
    Mortimer, Steve
    [J]. TRAFFIC MONITORING AND ANALYSIS, PROCEEDINGS, 2010, 6003 : 73 - 86
  • [6] Efficient of bitmap join indexes for optimising star join queries in relational data warehouses
    Yahyaoui, Mohammed
    Amjad, Souad
    Benameur, Lamia
    Jellouli, Ismail
    [J]. International Journal of Computational Intelligence Studies, 2020, 9 (03) : 220 - 233
  • [7] Parallel acceleration of CPU and GPU range queries over large data sets
    Nelson, Mitchell
    Sorenson, Zachary
    Myre, Joseph M.
    Sawin, Jason
    Chiu, David
    [J]. JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2020, 9 (01):
  • [8] Parallel Processing of Very Large Databases Using Distributed Column Indexes
    Ivanova, E. V.
    Sokolinsky, L. B.
    [J]. PROGRAMMING AND COMPUTER SOFTWARE, 2017, 43 (03) : 131 - 144