HMM Optimized Modeling of SSD Storage for I/O MapReduce Workloads

Cited by: 0
Authors
Alsayoud, Fatimah [1]
Miri, Ali [1]
Affiliations
[1] Ryerson Univ, Dept Comp Sci, Toronto, ON, Canada
Keywords
Flash resource management; R/W ratio; IO patterns; Hidden Markov Model; Storage policies; MapReduce Workloads;
DOI
10.1109/iemcon.2019.8936243
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Flash-based SSDs have drawn considerable interest in big data platforms because of their performance and reliability. However, their use remains limited by high cost and limited capacity. Controlling SSD provisioning on big data platforms reduces storage cost and guarantees performance. The workload is an essential input to SSD provisioning, so analyzing workload characteristics helps optimize SSD management design. There is a significant correlation between a workload's IO patterns and SSD cost and performance. Big data platforms with multi-stage architectures make modeling IO patterns challenging, because each stage has its own unique IO patterns. In addition, big data platforms run in distributed environments where workloads interact with both local and remote storage during execution. The designed HMM-based IO pattern model considers the IO patterns of MapReduce workloads at different stages and different SSD locations. In this paper, we propose a platform-level SSD cost-efficiency controller. The controller is responsible for maximizing SSD lifespan on the Hadoop platform through two phases: first, modeling the IO patterns of MapReduce workloads with a Hidden Markov Model (HMM); then, defining platform-level SSD allocation policies. The designed allocation policies reduce SSD utilization and improve SSD lifespan on Hadoop by up to 40% compared to static allocation policies.
Pages: 177-183
Page count: 7
Related papers
50 in total
  • [41] Evaluation of Linux I/O Schedulers for Big Data Workloads
    Rezgui, Abdelmounaam
    White, Matthew
    Rezgui, Sami
    Malik, Zaki
    2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 227 - 234
  • [42] Modeling the aging process of flash storage by leveraging semantic I/O
    Deng, Yuhui
    Lu, Lijuan
    Zou, Qiang
    Huang, Shuqiang
    Zhou, Jipeng
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 32 : 338 - 344
  • [43] Synthesizing representative I/O workloads using iterative distillation
    Kurmas, Z
    Keeton, K
    Mackenzie, K
    PROCEEDINGS OF THE 11TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER TELECOMMUNICATIONS SYSTEMS, 2003, : 6 - 15
  • [44] A study of hotspot data prediction model in I/O workloads
    Yang, Yin
    Tan, Zhihu
    Xie, Changsheng
    Liang, Wei
    Yu, Jie
    He, Jian
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2014, 91 (03) : 403 - 433
  • [45] Characteristics of I/O traffic in personal computer and server workloads
    Hsu, WW
    Smith, AJ
    IBM SYSTEMS JOURNAL, 2003, 42 (02) : 347 - 372
  • [46] The impact of asynchronous I/O in checkpoint-restart workloads
    Devarajan, Hariharan
    Moody, Adam
    Dai, Donglai
    Stanavige, Cameron
    Gonsiorowski, Elsa
    McFadden, Marty
    Faaland, Olaf
    Kosinovsky, Greg
    Mohror, Kathryn
    2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024, : 397 - 405
  • [47] hfplayer: Scalable Replay for Intensive Block I/O Workloads
    Haghdoost, Alireza
    He, Weiping
    Fredin, Jerry
    Du, David H. C.
    ACM TRANSACTIONS ON STORAGE, 2017, 13 (04)
  • [48] Modeling and performance evaluation of hybrid storage I/O in Data Grid
    Liu, Zhaobin
    Li, Haitao
    2007 IFIP INTERNATIONAL CONFERENCE ON NETWORK AND PARALLEL COMPUTING WORKSHOPS, PROCEEDINGS, 2007, : 624 - +
  • [49] Characterizing Deep-Learning I/O Workloads in TensorFlow
    Chien, Steven W. D.
    Markidis, Stefano
    Sishtla, Chaitanya Prasad
    Santos, Luis
    Herman, Pawel
    Narasimhamurthy, Sai
    Laure, Erwin
    PROCEEDINGS OF 2018 IEEE/ACM 3RD JOINT INTERNATIONAL WORKSHOP ON PARALLEL DATA STORAGE & DATA INTENSIVE SCALABLE COMPUTING SYSTEMS (PDSW-DISCS), 2018, : 54 - 63
  • [50] Detecting I/O Access Patterns of HPC Workloads at Runtime
    Bez, Jean Luca
    Boito, Francieli Zanon
    Nou, Ramon
    Miranda, Alberto
    Cortes, Toni
    Navaux, Philippe O. A.
    2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019, : 80 - 87