Predicting file lifetimes for data placement in multi-tiered storage systems for HPC

Authors
Thomas, Luis [1]
Gougeaud, Sebastien [2]
Rubini, Stephane [3]
Deniel, Philippe [2]
Boukhobza, Jalil [1]
Affiliations
[1] ENSTA Bretagne, Lab STICC, CNRS, UMR 6285, Brest, France
[2] CEA, Bruyeres Le Chatel, France
[3] Univ Brest, Lab STICC, CNRS, UMR 6285, Brest, France
Keywords
Data placement; Multi-Tier Storage; File lifetime; Convolutional Neural Network; Machine Learning; High Performance Computing; Heterogeneous Storage; Storage Hierarchy;
DOI
10.1145/3439839.3458733
CLC number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
The emergence of Exascale machines in HPC is expected to put more pressure on existing storage systems, not only in terms of capacity but also in terms of bandwidth and latency. With a limited budget, using only storage class memory is not realistic, which leads to heterogeneous tiered storage hierarchies. To make the most efficient use of the high-performance tier in such a hierarchy, user data must be placed on the right tier at the right time. In this paper, we assume a 2-tier storage hierarchy with a high-performance tier and a high-capacity archival tier. Files are placed on the high-performance tier at creation time and moved to the capacity tier once their lifetime expires, that is, once they are no longer accessed. The main contribution of this paper is the design of a model that predicts a file's lifetime based solely on its path, using a Convolutional Neural Network. Results show that our solution strikes a good trade-off between accuracy and underestimation. Compared to previous work, our model reaches a similar accuracy (98.60% versus 98.84%) while reducing underestimations by almost 10x, from 21.86% to 2.21%. Reducing underestimations is crucial because it avoids placing files in the capacity tier while they are still in use.
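To make the placement rule and the underestimation criterion concrete, here is a minimal Python sketch. It assumes each file is described by a creation time, an observed lifetime (time until its last access) and a predicted lifetime; `FileRecord`, `migration_time`, `tier_at`, `is_underestimated` and `underestimation_rate` are hypothetical names, not taken from the paper. The CNN predictor itself is not modeled, so predicted lifetimes are supplied by hand, and since the abstract does not say how accuracy is computed, only the underestimation rate is shown.

```python
from dataclasses import dataclass


# Hypothetical record for one file observed in an access trace (times in seconds).
@dataclass
class FileRecord:
    path: str
    created: float             # creation time
    actual_lifetime: float     # time between creation and last access
    predicted_lifetime: float  # output of some lifetime predictor (assumed given)


def migration_time(rec: FileRecord) -> float:
    """Time at which the file leaves the performance tier:
    creation time plus predicted lifetime."""
    return rec.created + rec.predicted_lifetime


def tier_at(rec: FileRecord, t: float) -> str:
    """Tier holding the file at time t (assumes t >= rec.created)."""
    return "performance" if t < migration_time(rec) else "capacity"


def is_underestimated(rec: FileRecord) -> bool:
    """An underestimate moves the file to the capacity tier while it is
    still being accessed: predicted lifetime < actual lifetime."""
    return rec.predicted_lifetime < rec.actual_lifetime


def underestimation_rate(records: list[FileRecord]) -> float:
    """Fraction of files whose lifetime is underestimated."""
    if not records:
        return 0.0
    return sum(is_underestimated(r) for r in records) / len(records)


# Illustrative, made-up records (not data from the paper).
records = [
    FileRecord("/scratch/run1/out.h5", 0.0, 3600.0, 7200.0),  # overestimate
    FileRecord("/scratch/run1/log.txt", 0.0, 600.0, 300.0),   # underestimate
    FileRecord("/home/user/input.dat", 100.0, 50.0, 50.0),    # exact
    FileRecord("/scratch/run2/tmp.bin", 10.0, 20.0, 30.0),    # overestimate
]

print(underestimation_rate(records))  # 0.25
print(tier_at(records[1], 400.0))     # capacity, although still in use until t=600
```

An overestimate only keeps a file on the performance tier longer than needed, whereas an underestimate (the second record) sends a file to the capacity tier while it is still being read, which is why the abstract emphasizes reducing underestimations.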
Pages: 99-107
Page count: 9
Published in: Operating Systems Review (ACM), 2021, 55 (01): 99-107