A Novel Approach for Efficient Handling of Small Files in HDFS

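Authors
Patel, Ankita [1]
Mehta, Mayuri A. [1]
Affiliation
[1] Sarvajanik College of Engineering & Technology, Department of Computer Engineering, Surat, India
Keywords
Hadoop; HDFS; small files; file correlation; prefetching
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract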
The Hadoop Distributed File System (HDFS) is a representative cloud storage platform that offers scalable, reliable, and low-cost storage. Because it is designed to handle large files, it suffers a performance penalty when handling a large number of small files. Further, it does not take the correlation between files into account, so it cannot offer a prefetching mechanism, which would help improve access efficiency. In this paper, we propose a novel approach for handling small files in HDFS. The proposed approach combines correlated files into a single file to reduce the metadata stored on the NameNode. We integrate prefetching and caching mechanisms into the proposed approach to improve the access efficiency of small files. Moreover, we analyze the performance of the proposed approach for file sizes in the range 32 KB to 4096 KB. The results show that the proposed approach reduces metadata storage compared with HDFS.
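The combine-then-prefetch idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation. The abstract does not say how files are grouped by correlation, where the index of a combined file is kept, or which cache policy is used. The following are therefore hypothetical choices made for illustration: the class `MergedStore`, the per-file `(blob_id, offset, length)` index, and the LRU cache. Each "blob" stands in for one combined HDFS file. On a cache miss, the combined file is read once and its correlated siblings are prefetched into the cache. The `namenode_objects` helper follows the common rule of thumb that the NameNode keeps one in-memory object per file and one per block. It ignores directories and whatever storage the merged-file index itself needs.

```python
from collections import OrderedDict


class MergedStore:
    """Hypothetical in-memory model of merging correlated small files.

    Each entry in `blobs` stands in for one combined HDFS file. The grouping,
    index layout, and LRU cache policy are illustrative assumptions only.
    """

    def __init__(self, cache_capacity=4):
        self.blobs = []               # combined files, as raw bytes
        self.members = []             # names packed into each combined file
        self.index = {}               # name -> (blob_id, offset, length)
        self.cache = OrderedDict()    # LRU cache: name -> bytes
        self.capacity = cache_capacity
        self.storage_reads = 0        # simulated reads of a combined file

    def merge(self, group):
        """Pack a dict of correlated files (name -> bytes) into one blob."""
        blob_id = len(self.blobs)
        parts, offset = [], 0
        for name, data in group.items():
            self.index[name] = (blob_id, offset, len(data))
            parts.append(data)
            offset += len(data)
        self.blobs.append(b"".join(parts))
        self.members.append(list(group))
        return blob_id

    def _cache_put(self, name, data):
        """Insert as most recently used; evict least recently used if full."""
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def read(self, name):
        """Serve from cache if possible; otherwise read the combined file once
        and prefetch its correlated siblings into the cache."""
        if name in self.cache:
            self.cache.move_to_end(name)
            return self.cache[name]
        blob_id, _, _ = self.index[name]
        blob = self.blobs[blob_id]
        self.storage_reads += 1
        for member in self.members[blob_id]:
            if member != name:
                _, off, length = self.index[member]
                self._cache_put(member, blob[off:off + length])
        _, off, length = self.index[name]
        data = blob[off:off + length]
        self._cache_put(name, data)  # requested file becomes most recent
        return data


def namenode_objects(num_files, blocks_per_file=1):
    """Rule-of-thumb NameNode object count: one per file plus one per block
    (directories ignored)."""
    return num_files * (1 + blocks_per_file)


if __name__ == "__main__":
    store = MergedStore(cache_capacity=4)
    store.merge({"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"})
    store.merge({"x.log": b"xx", "y.log": b"yyy"})
    print(store.read("a.txt"))   # b'alpha' (miss: combined file 0 read once)
    print(store.read("b.txt"))   # b'beta'  (hit: prefetched with a.txt)
    print(store.storage_reads)   # 1
    print(namenode_objects(1000), namenode_objects(10))  # 2000 20
```

As illustrative arithmetic only, not a result from the paper, consider 1,000 single-block small files. Under this rule of thumb they cost `namenode_objects(1000)`, or 2,000 objects. Packing them into 10 combined files of 100 files each costs `namenode_objects(10)`, or 20 objects, provided each combined file still fits in one block (for example, 100 files of 32 KB each).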
Pages: 1258-1262
Number of pages: 5