Data Jockey: Automatic Data Management for HPC Multi-Tiered Storage Systems

Cited by: 12
Authors
Shin, Woong [1]
Brumgard, Christopher D. [1]
Xie, Bing [1]
Vazhkudai, Sudharshan S. [1]
Ghoshal, Devarshi [2]
Oral, Sarp [1]
Ramakrishnan, Lavanya [2]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
DOI
10.1109/IPDPS.2019.00061
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present the design and implementation of Data Jockey, a data management system for HPC multi-tiered storage systems. As a centralized data management control plane, Data Jockey automates bulk data movement and placement for scientific workflows and integrates into existing HPC storage infrastructures. Data Jockey simplifies data management by eliminating the human effort of programming complex data movements and of laying out datasets across multiple storage tiers to support complex workflows, which in turn increases the usability of the multi-tiered storage systems emerging in modern HPC data centers. Specifically, Data Jockey presents a new data management scheme called "goal-driven data management" that automatically infers low-level bulk data movement plans from declarative, high-level goal statements issued over the lifetime of iterative runs of scientific workflows. While doing so, Data Jockey aims to minimize data wait times by taking responsibility for datasets that are unused or about to be used, and by aggressively utilizing the capacity of the upper, higher-performing storage tiers. We evaluated a prototype implementation of Data Jockey under a synthetic workload based on a year's worth of operational logs from the Oak Ridge Leadership Computing Facility (OLCF). Our evaluations suggest that Data Jockey leads to higher utilization of the upper storage tiers while minimizing the programming effort of data movement compared to human-involved, per-domain, ad hoc data management scripts.
Pages: 511-522
Number of pages: 12
Related papers
  • [1] Predicting file lifetimes for data placement in multi-tiered storage systems for HPC
    Thomas, Luis
    Gougeaud, Sebastien
    Rubini, Stephane
    Deniel, Philippe
    Boukhobza, Jalil
    [J]. OPERATING SYSTEMS REVIEW, 2021, 55 (01) : 99 - 107
  • [2] Live Data Migration For Reducing SLA Violations In Multi-tiered Storage Systems
    Tai, Jianzhe
    Sheng, Bo
    Yao, Yi
    Mi, Ningfang
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E), 2014, : 361 - 366
  • [3] A Load-Balancing Data Caching Scheme in Multi-tiered Storage Systems
    Chang, Hsung-Pin
    Luo, Jhih-Cheng
    Chang, Da-Wei
    [J]. PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 124 - +
  • [4] HCompress: Hierarchical Data Compression for Multi-Tiered Storage Environments
    Devarajan, Hariharan
    Kougkas, Anthony
    Logan, Luke
    Sun, Xian-He
    [J]. 2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 557 - 566
  • [5] Scheduling Parallel Data Transfers in Multi-tiered Persistent Storage
    Nan Noon Noon
    Getta, Janusz R.
    Xia, Tianbing
    [J]. RECENT CHALLENGES IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, 2022, 1716 : 437 - 449
  • [6] Automated Lookahead Data Migration in SSD-enabled Multi-tiered Storage Systems
    Zhang, Gong
    Chiu, Lawrence
    Dickey, Clem
    Liu, Ling
    Muench, Paul
    Seshadri, Sangeetha
    [J]. 2010 IEEE 26TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2010,
  • [7] Tiered data management system: Accelerating data processing on HPC systems
    Cheng, Peng
    Lu, Yutong
    Du, Yunfei
    Chen, Zhiguang
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 101 : 894 - 908
  • [8] Load balancing and data placement for multi-tiered database systems
    Li, Wen-Syan
    Zilio, Daniel C.
    Batra, Vishal S.
    Zuzarte, Calisto
    Narang, Inderpal
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 62 (03) : 523 - 546
  • [9] A Prefetching Scheme for Multi-tiered Storage Systems
    Chang, Hsung-Pin
    Chen, Chia-Yu
    Liu, Chien-Yi
    [J]. 2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 1582 - 1586
  • [10] Adaptive Data Placement in Multi-Tiered Data Staging Runtime
    Jin, Tong
    Sun, Qian
    Romanus, Melissa
    Parashar, Manish
    [J]. NEW FRONTIERS IN HIGH PERFORMANCE COMPUTING AND BIG DATA, 2017, 30 : 175 - 196