Placement Scheduling for Replication in HDFS Based on Probabilistic Approach

Cited by: 0
Authors
Bui, Dinh-Mao [1]
Lee, Sungyoung [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Suwon, South Korea
Source
Keywords
Placement scheduling; HDFS; Execution time; Throughput; Probabilistic approach
DOI
10.1007/978-3-319-39601-9_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid evolution of Big Data analysis, Apache Hadoop plays an important role in delivering high availability on top of computing clusters. To maintain high-throughput access for computation, Apache Hadoop is equipped with the Hadoop Distributed File System (HDFS) for managing file operations. HDFS ensures reliability and high availability by using a specific replication mechanism. However, because the workload varies from one computing node to another, keeping the same replication strategy might result in imbalance. To address this drawback of the HDFS architecture, we propose an approach that adaptively chooses the placement of replicas. To do so, the network status and system utilization are used to create an individual replica placement strategy for each file. As a result, the proposed approach can provide suitable destinations for replicas to improve performance, enhancing the availability of the system while preserving the reliability of data storage.
Pages: 314-320
Number of pages: 7
Related papers
  • [1] Multicast-based Replication for Hadoop HDFS
    Wu, Jiadong
    Hong, Bo
    [J]. 2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 143 - 148
  • [2] Supervised Learning based HDFS Replication Management System
    Ilakiyaa, R.
    Nalini, N. J.
    [J]. 2017 INTERNATIONAL CONFERENCE ON TECHNICAL ADVANCEMENTS IN COMPUTERS AND COMMUNICATIONS (ICTACC), 2017, : 116 - 120
  • [3] Replication Management Framework for HDFS based on Prediction Technique
    Bui, Dinh-Mao
    Thien Huynh-The
    Lee, Sungyoung
    Li, Bin
    Wang, Jin
    [J]. 2015 THIRD INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, 2015, : 58 - 63
  • [4] Adaptive Replication Management in HDFS Based on Supervised Learning
    Bui, Dinh-Mao
    Hussain, Shujaat
    Huh, Eui-Nam
    Lee, Sungyoung
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (06) : 1369 - 1382
  • [5] Dynamic Replication Policy on HDFS Based on Machine Learning Clustering
    Ahmed, Motaz A.
    Khafagy, Mohamed H.
    Shaheen, Masoud E.
    Kaseb, Mostafa R.
    [J]. IEEE ACCESS, 2023, 11 : 18551 - 18559
  • [6] Probabilistic State Estimation based Scheduling Approach for Cloud Computing
    Zheng, Guide
    Chen, Ming
    [J]. RECENT TRENDS IN MATERIALS AND MECHANICAL ENGINEERING MATERIALS, MECHATRONICS AND AUTOMATION, PTS 1-3, 2011, 55-57 : 1053 - 1057
  • [7] A Machine Learning-Based Probabilistic Approach for Irrigation Scheduling
    Srivastava, Shivendra
    Kumar, Nishant
    Malakar, Arindam
    Choudhury, Sruti Das
    Ray, Chittaranjan
    Roy, Tirthankar
    [J]. WATER RESOURCES MANAGEMENT, 2024, 38 (05) : 1639 - 1653
  • [8] Fault Tolerant Erasure Coded Replication for HDFS Based Cloud Storage
    Ko, Aye Chan
    Zaw, Wint Thida
    [J]. 2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 104 - 109
  • [9] The Dynamic Replication Mechanism of HDFS Hot File based on Cloud Storage
    Li, Mingyong
    Ma, Yan
    Chen, Meilian
    [J]. INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2015, 9 (08) : 439 - 448
  • [10] Probabilistic Network-Aware Task Placement for MapReduce Scheduling
    Shen, Haiying
    Sarker, Ankur
    Yu, Lei
    Deng, Feng
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 241 - 250