An enhancement of data locality in Hadoop distributed file system

Cited by: 0
Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry Univ, Dept Comp Sci, Pondicherry, India
[2] Saradha Gangadaran Coll, Dept Comp Sci, Pondicherry, India
[3] KL Univ, Dept CSE, Guntur, Andhra Pradesh, India
Source
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST
DOI
Not available
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
MapReduce has become widely used because of advantages such as programming simplicity, fault tolerance, and data distribution. The number of applications built on Hadoop is growing because of its robustness and features. Data locality is a critical issue in data-parallel applications, where task processing spends varying amounts of time and resources at particular locations. Several methodologies have been proposed to enhance data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then, the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes) and dynamically dispatches the data blocks based on their locality. This reduces the collision of vulnerability and network traffic, and increases the throughput of the Hadoop system. An analytical model identifies the execution time of every task, which detects jobs with a data locality problem. Then, a hash table is built that maps data blocks to nodes. Under data locality, a program is transferred to the node where the original data is placed. Experiments on two real-world data sets with different data placement approaches show that the proposed methodology reduces execution time and improves performance by 42.5%, outperforming the existing methods.
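The abstract does not give the details of the FDP algorithm, so the following is only a minimal sketch of the general idea it describes: splitting a dataset into blocks, keeping a hash table from block to node, and sending each task to the node that already holds its block. The function names, the round-robin placement, and the fallback rule are all assumptions made for illustration, and the erasure-coding step is omitted.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide a dataset into fixed-size data blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_block_table(num_blocks: int, nodes: list[str]) -> dict[int, str]:
    """Hash table mapping block id -> node that stores the block.

    Round-robin placement is an assumption for this sketch, not the
    placement policy of the paper.
    """
    return {block_id: nodes[block_id % len(nodes)] for block_id in range(num_blocks)}


def schedule_tasks(block_table: dict[int, str],
                   free_slots: dict[str, int]) -> dict[int, tuple[str, bool]]:
    """Assign one task per block, preferring the node that holds the block.

    Returns block id -> (chosen node, True if the task runs data-local).
    If the home node has no free slot, the task falls back to the node
    with the most free slots (a hypothetical fallback rule).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment: dict[int, tuple[str, bool]] = {}
    for block_id, home in sorted(block_table.items()):
        target = home if slots.get(home, 0) > 0 else max(slots, key=slots.get)
        if slots[target] <= 0:
            raise ValueError("no free task slots left in the cluster")
        slots[target] -= 1
        assignment[block_id] = (target, target == home)
    return assignment


def locality_ratio(assignment: dict[int, tuple[str, bool]]) -> float:
    """Fraction of tasks that run on the node holding their block."""
    return sum(is_local for _, is_local in assignment.values()) / len(assignment)


blocks = split_into_blocks(bytes(range(10)), block_size=4)
table = build_block_table(len(blocks), ["node-a", "node-b"])
plan = schedule_tasks(table, {"node-a": 1, "node-b": 2})
```

In this toy run, two of the three tasks stay data-local and the third is moved to another node because its home node has no free slot. A non-local task is the case that costs network traffic, which is what a locality-aware placement tries to avoid.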
Pages: 123-133
Page count: 11