Replication Management Framework for HDFS based on Prediction Technique

Cited by: 5
Authors
Bui, Dinh-Mao [1]
Thien Huynh-The [1]
Lee, Sungyoung [1]
Li, Bin [2]
Wang, Jin [2]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Suwon, South Korea
[2] Yangzhou Univ, Coll Informat Engn, Yangzhou 225009, Jiangsu, Peoples R China
Keywords
Replication; HDFS; proactive prediction; Bayesian Learning; Gaussian Process
DOI
10.1109/CBD.2015.19
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The number of applications based on Apache Hadoop is increasing dramatically owing to the robustness and dynamic features of this system. At the heart of Apache Hadoop, the Hadoop Distributed File System (HDFS) provides reliability, scalability, and high availability for computation by applying a static replication strategy. However, because of the parallel operations performed at the application layer, access frequency varies widely from one data file in HDFS to another. Consequently, applying the same replication mechanism to every data file can degrade performance. Taking these drawbacks of the HDFS architecture carefully into account, this paper proposes an approach that dynamically replicates data files based on predictive analysis. Using probability theory, the utilization of each data file is predicted so that an individual replication strategy can be created for it. Each data file is then replicated according to its own access potential. As a result, the approach improves data locality while keeping storage redundancy comparable to that of the default replication scheme.
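The abstract does not give the details of the prediction model, so the following is only a rough illustration under stated assumptions. It swaps in a simpler Bayesian model than the Gaussian process named in the keywords: each file's access count per time window is treated as Poisson with a Gamma prior, and the posterior mean rate decides how many replicas the file receives. The total replica budget is held at the HDFS default of 3 per file on average, so overall redundancy stays comparable to the static scheme. `FileStats`, `plan_replication`, `MIN_REPLICATION`, `MAX_REPLICATION`, the prior parameters, and the greedy allocation rule are all hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass

# HDFS ships with dfs.replication = 3; the replica budget keeps this average.
DEFAULT_REPLICATION = 3
# Hypothetical bounds chosen for illustration only (not from the paper).
MIN_REPLICATION = 2
MAX_REPLICATION = 5


@dataclass
class FileStats:
    path: str
    accesses: list[int]  # observed access counts, one per time window


def posterior_rate(accesses: list[int], alpha0: float = 1.0, beta0: float = 1.0) -> float:
    """Posterior mean of a Poisson access rate under a Gamma(alpha0, beta0) prior.

    Gamma (shape/rate form) is conjugate to Poisson, so the posterior is
    Gamma(alpha0 + sum(accesses), beta0 + len(accesses)), whose mean is shape / rate.
    """
    return (alpha0 + sum(accesses)) / (beta0 + len(accesses))


def plan_replication(files: list[FileStats], alpha0: float = 1.0,
                     beta0: float = 1.0) -> dict[str, int]:
    """Split a fixed replica budget across files by predicted access rate."""
    rates = {f.path: posterior_rate(f.accesses, alpha0, beta0) for f in files}
    budget = DEFAULT_REPLICATION * len(files)
    plan = {path: MIN_REPLICATION for path in rates}  # every file keeps the floor
    remaining = budget - MIN_REPLICATION * len(files)
    while remaining > 0:
        candidates = [p for p in plan if plan[p] < MAX_REPLICATION]
        if not candidates:  # every file is already at the cap
            break
        # Give the next replica to the file with the most predicted accesses per copy.
        best = max(candidates, key=lambda p: rates[p] / plan[p])
        plan[best] += 1
        remaining -= 1
    return plan


if __name__ == "__main__":
    history = [
        FileStats("/logs/hot", [40, 55, 60]),
        FileStats("/logs/warm", [5, 8, 6]),
        FileStats("/logs/cold", [0, 0, 1]),
        FileStats("/logs/idle", [0, 0, 0]),
    ]
    # Prints /logs/hot 5, /logs/warm 3, /logs/cold 2, /logs/idle 2 (one per line).
    for path, replicas in plan_replication(history).items():
        print(path, replicas)
```

In this run the hot file reaches the hypothetical cap of 5, the spare replica goes to the warm file, and the average stays at 3 replicas per file. A per-file factor computed this way could then be applied with `hdfs dfs -setrep <numReplicas> <path>`. The greedy rule (next replica to the highest predicted rate per existing copy) concentrates copies on hot files without letting any file fall below the fault-tolerance floor.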
Pages: 58-63
Page count: 6
Related papers
50 in total
  • [21] PACM: A Prediction-based Auto-adaptive Compression Model for HDFS
    Wang, Ruijian
    Wang, Chao
    Zha, Li
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016: 1617-1626
  • [22] Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage
    Shwe, Thanda
    Aritsugi, Masayoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (12): 2958-2967
  • [23] Research of Metadata Management Method of Hierarchical Storage System Based on HDFS
    Liu, Xiaoyu
    Xia, Libin
    Jiang, Xiaowei
    Sun, Gongxing
    Computer Engineering and Applications, 2023, 59 (17): 257-265
  • [24] HDFS distributed metadata management research
    Xiong, An-ping
    Ma, Jin-yong
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND ENGINEERING INNOVATION, 2015, 12: 956-961
  • [25] Empirical validation for effectiveness of fault prediction technique based on cost analysis framework
    Kumar L.
    Rath S.K.
    Kumar, Lov (lovkumar505@gmail.com), 1600, Springer (08): 1055-1068
  • [26] A Novel Prediction-Based Location Management Technique for Mobile Networks
    Biswash, Sanjay Kumar
    Kumar, Chiranjeev
    INTERNATIONAL JOURNAL OF MOBILE COMPUTING AND MULTIMEDIA COMMUNICATIONS, 2013, 5 (04): 15-34
  • [27] Effective Management of ReRAM-based Hybrid SSD for Multiple Node HDFS
    Park, Nayoung
    Lee, Byungjun
    Kim, Kyung Tae
    Youn, Hee Yong
    INTERNATIONAL JOURNAL OF NETWORKED AND DISTRIBUTED COMPUTING, 2015, 3 (03): 167-176
  • [28] Towards Adaptive Replication for Hot/Cold Blocks in HDFS using MemCached
    Liu, Pinchao
    Maruf, Adnan
    Yusuf, Farzana Beente
    Jahan, Labiba
    Xu, Hailu
    Guan, Boyuan
    Hu, Liting
    Iyengar, Sitharama S.
    2019 2ND INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2019), 2019: 188-194
  • [29] In-Memory I/O and Replication for HDFS with Memcached: Early Experiences
    Islam, Nusrat Sharmin
    Lu, Xiaoyi
    Wasi-Ur-Rahman, Md
    Rajachandrasekar, Raghunath
    Panda, Dhabaleswar K.
    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014: 213-218
  • [30] Performance Evaluation of HDFS in Big Data Management
    Dev, Dipayan
    Patgiri, Ripon
    2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND APPLICATIONS (ICHPCA), 2014