DeStager: feature guided in-situ data management in distributed deep memory hierarchies

Cited by: 2
Authors
Zhang, Xuechen [1]
Zheng, Fang [2]
Bao Nguyen [1]
Affiliations
[1] Washington State Univ, Sch Engn & Comp Sci, Vancouver, WA 98686 USA
[2] IBM TJ Watson Res Ctr, New York, NY USA
Keywords
Indexing; R-tree; Octree; In-situ analytics; SSDs; Simulation; Combustion
DOI
10.1007/s10619-018-7235-3
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In-situ analytics are increasingly adopted by leadership scientific applications to gain fast insight into the massive output data of simulations. In current practice, systems buffer the output data in DRAM for analytics processing, which limits the analytics to the DRAM capacity left unused by the simulation. The rapid growth of data size calls for alternative approaches to accommodating data-rich analytics, such as using solid-state disks to increase effective memory capacity. To this end, this paper explores software solutions for exploiting the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions of interest) hidden in the data being processed, the approach incorporates knowledge of those features into in-situ data management. It uses adaptive index creation and refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and to improve load balance by controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.
Pages: 209-231
Page count: 23
Related papers
50 entries in total
  • [21] Exploring Energy and Performance Behaviors of Data-Intensive Scientific Workflows on Systems with Deep Memory Hierarchies
    Gamell, Marc
    Rodero, Ivan
    Parashar, Manish
    Poole, Stephen
    2013 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2013, : 226 - 235
  • [22] SciDFS: An In-situ Processing System for Scientific Array Data based on Distributed File System
    Han, Donghyoung
    Nam, Yoon-Min
    Kim, Min-Soo
    Park, Kyongseok
    Han, Sunggeun
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018, : 375 - 382
  • [23] Evolutionary assimilation of streamflow in distributed hydrologic modeling using in-situ soil moisture data
    Dumedah, Gift
    Coulibaly, Paulin
    ADVANCES IN WATER RESOURCES, 2013, 53 : 231 - 241
  • [24] Proactive Buffer Management of Shared-Memory Switches for Distributed Deep Learning
    Ye, Jin
    Peng, Yajun
    Li, Yijun
    Huang, Jiawei
    PROCEEDINGS OF THE 8TH ASIA-PACIFIC WORKSHOP ON NETWORKING, APNET 2024, 2024, : 183 - 184
  • [25] A Geohash-based Index for Spatial Data Management in Distributed Memory
    Liu, Jiajun
    Li, Haoran
    Gao, Yong
    Yu, Hao
    Jiang, Dan
    2014 22ND INTERNATIONAL CONFERENCE ON GEOINFORMATICS (GEOINFORMATICS 2014), 2014,
  • [26] Lifetime-Based Memory Management for Distributed Data Processing Systems
    Lu, Lu
    Shi, Xuanhua
    Zhou, Yongluan
    Zhang, Xiong
    Jin, Hai
    Pei, Cheng
    He, Ligang
    Geng, Yuanzhen
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (12): : 936 - 947
  • [27] In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows
    Lasluisa, Solomon
    Zhang, Fan
    Jin, Tong
    Rodero, Ivan
    Hoang Bui
    Parashar, Manish
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01): : 29 - 40
  • [28] In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows
    Solomon Lasluisa
    Fan Zhang
    Tong Jin
    Ivan Rodero
    Hoang Bui
    Manish Parashar
    Cluster Computing, 2015, 18 : 29 - 40
  • [29] Memory-based Data Management for Large-scale Distributed Rendering
    Zheng, Ran
    Jia, Jinli
    Jin, Hai
    Lv, Xinqiao
    Yang, Shuai
    2016 IEEE 13TH INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING (ICEBE), 2016, : 123 - 128
  • [30] DATA MANAGEMENT FOR A CLASS OF ITERATIVE COMPUTATIONS ON DISTRIBUTED-MEMORY MIMD SYSTEMS
    CORNEAHASEGAN, MC
    MARINESCU, DC
    ZHANG, ZY
    CONCURRENCY-PRACTICE AND EXPERIENCE, 1994, 6 (03): : 205 - 229