DeStager: feature guided in-situ data management in distributed deep memory hierarchies

Cited by: 2
Authors
Zhang, Xuechen [1]
Zheng, Fang [2]
Nguyen, Bao [1]
Affiliations
[1] Washington State Univ, Sch Engn & Comp Sci, Vancouver, WA 98686 USA
[2] IBM TJ Watson Res Ctr, New York, NY USA
Keywords
Indexing; R-tree; Octree; In-situ Analytics; SSDs; Simulation; Combustion
DOI
10.1007/s10619-018-7235-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In-situ analytics have been increasingly adopted by leadership scientific applications to gain fast insights into the massive output data of simulations. In current practice, systems buffer the output data in DRAM for analytics processing, which limits the analytics to the DRAM capacity left unused by the simulation. The rapid growth of data size requires alternative approaches to accommodating data-rich analytics, such as using solid-state disks to increase effective memory capacity. For this purpose, this paper explores software solutions for exploiting the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions of interest) hidden in the data being processed, the approach incorporates knowledge of the data features into in-situ data management. It uses adaptive index creation and refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and improve load balance by controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.
Pages: 209-231
Number of pages: 23
Related papers
50 in total
  • [1] DeStager: feature guided in-situ data management in distributed deep memory hierarchies
    Xuechen Zhang
    Fang Zheng
    Bao Nguyen
    Distributed and Parallel Databases, 2019, 37 : 209 - 231
  • [2] Feature Guided In-Situ Indices Generation and Data Placement on Distributed Deep Memory Hierarchies
    Zheng, Fang
    Nguyen, Bao
    Zhang, Xuechen
    2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2017, : 258 - 265
  • [3] Data page layouts for relational databases on deep memory hierarchies
    Ailamaki, A
    DeWitt, DJ
    Hill, MD
    VLDB JOURNAL, 2002, 11 (03): : 198 - 215
  • [4] Data page layouts for relational databases on deep memory hierarchies
    Anastassia Ailamaki
    David J. DeWitt
    Mark D. Hill
    The VLDB Journal, 2002, 11 : 198 - 215
  • [5] Management of deep memory hierarchies - Recursive blocked algorithms and hybrid data structures for dense matrix computations
    Bo Kagstrom
    APPLIED PARALLEL COMPUTING: STATE OF THE ART IN SCIENTIFIC COMPUTING, 2006, 3732 : 21 - 32
  • [6] VISUAL DATA MINING FOR FEATURE SPACE EXPLORATION USING IN-SITU DATA
    Espinoza-Molina, Daniela
    Alonso, Kevin
    Datcu, Mihai
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 5905 - 5908
  • [7] Distributed data integration prototype system for satellite, in-situ and model data
    Miura, Satoko Horiyama
    Aizawa, Kengo
    IGARSS: 2007 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-12: SENSING AND UNDERSTANDING OUR PLANET, 2007, : 1366 - 1369
  • [8] Data Management for Extreme Scale In-situ Workflows
    Subedi, Pradeep
    Simonet, Anthony
    Davis, Philip E.
    Duan, Shaohua
    Wang, Zhe
    Parashar, Manish
    FUTURE TRENDS OF HPC IN A DISRUPTIVE SCENARIO, 2019, 34 : 82 - 97
  • [9] In-situ observations: Operational systems and data management
    Pouliquen, Sylvie
    OCEAN WEATHER FORECASTING: AN INTEGRATED VIEW OF OCEANOGRAPHY, 2006, : 207 - 227
  • [10] Exploring Data Staging Across Deep Memory Hierarchies for Coupled Data Intensive Simulation Workflows
    Jin, Tong
    Zhang, Fan
    Sun, Qian
    Bui, Hoang
    Romanus, Melissa
    Podhorszki, Norbert
    Klasky, Scott
    Kolla, Hemanth
    Chen, Jacqueline
    Hager, Robert
    Chang, Choong-Seock
    Parashar, Manish
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 1033 - 1042