In-memory Query System for Scientific Datasets

Cited by: 4
Authors
Chiu, Hsuan-Te [1]
Chou, Jerry [1]
Vishwanath, Venkat [2]
Wu, Kesheng [3]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu 30013, Taiwan
[2] Argonne Natl Lab, Argonne, IL 60439 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Keywords
In-situ computing; query-driven analysis; indexing; scientific data; distributed shared memory;
DOI
10.1109/ICPADS.2015.53
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
The growing gap between compute performance and I/O bandwidth, coupled with increasing data volumes, has made traditional post-simulation data processing a bottleneck. In-situ computing and query-driven data analysis are therefore important techniques for minimizing data movement. Taking advantage of the growing memory capacity of supercomputers, we developed an in-memory query system for scientific data analysis. Our approach combines bitmap indexing, spatial data layout reorganization, distributed shared memory, and location-aware parallel execution. Our evaluations on real scientific datasets show that we can aggregate the memory capacity of thousands of compute nodes to analyze a 750 GB simulation dataset without transferring data to remote nodes or storage systems. Compared to traditional solutions based on out-of-core parallel file systems, we achieve significantly higher query performance.
Pages: 362-371
Page count: 10