A Novel Approach for Efficient Handling of Small Files in HDFS

Cited by: 0
Authors
Patel, Ankita [1]
Mehta, Mayuri A. [1]
Affiliations
[1] Sarvajanik College of Engineering and Technology, Department of Computer Engineering, Surat, India
Keywords
Hadoop; HDFS; small files; file correlation; prefetching
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The Hadoop Distributed File System (HDFS) is a representative cloud storage platform that offers scalable, reliable, and low-cost storage. It is designed to handle large files, so it suffers a performance penalty when handling a huge number of small files. Further, it does not consider the correlation between files, and therefore provides no prefetching mechanism that could improve access efficiency. In this paper, we propose a novel approach to handling small files in HDFS. The proposed approach combines correlated files into a single file to reduce the metadata stored on the NameNode. We integrate prefetching and caching mechanisms into the proposed approach to improve the access efficiency of small files. Moreover, we analyze the performance of the proposed approach for file sizes in the range 32 KB to 4096 KB. The results show that the proposed approach reduces metadata storage compared to HDFS.
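The merging and prefetching idea described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the abstract does not say how correlation is computed or how the combined file is laid out, so the example assumes the correlation groups are supplied as input (the hypothetical `group_of` mapping) and uses a simple offset/length index. All class and function names here are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class MergedFile:
    """One combined file plus a local index of (offset, length) per small file."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)

    def append(self, name: str, content: bytes) -> None:
        # Record where this small file starts and how long it is.
        self.index[name] = (len(self.data), len(content))
        self.data.extend(content)

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


def merge_by_group(files: dict, group_of: dict) -> dict:
    """Combine files that share a correlation group into one MergedFile each.

    `group_of` is an assumed input; how correlation is derived is out of scope.
    """
    merged = {}
    for name, content in files.items():
        merged.setdefault(group_of[name], MergedFile()).append(name, content)
    return merged


class PrefetchingReader:
    """On a cache miss, load every file from the same merged file into the cache."""

    def __init__(self, merged: dict, group_of: dict):
        self.merged = merged
        self.group_of = group_of
        self.cache = {}
        self.misses = 0

    def read(self, name: str) -> bytes:
        if name not in self.cache:
            self.misses += 1
            merged_file = self.merged[self.group_of[name]]
            # Prefetch the correlated siblings along with the requested file.
            for sibling in merged_file.index:
                self.cache[sibling] = merged_file.read(sibling)
        return self.cache[name]


files = {"a.log": b"aaa", "b.log": b"bb", "c.txt": b"c"}
group_of = {"a.log": "logs", "b.log": "logs", "c.txt": "docs"}
merged = merge_by_group(files, group_of)
reader = PrefetchingReader(merged, group_of)
```

In this toy setup three small files become two combined files, so a metadata service would track two entries instead of three, and reading `a.log` also brings `b.log` into the cache, so a later read of `b.log` does not count as a miss.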
Pages: 1258-1262
Page count: 5
Related papers
50 in total
  • [1] A Novel Approach for Efficient Accessing of Small Files in HDFS: TLB-MapFile
    Meng Bing
    Guo Wei-bin
    Fan Gui-sheng
    Qian Neng-wu
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 681 - 686
  • [2] An efficient distributed caching for accessing small files in HDFS
    Kyoungsoo Bok
    Hyunkyo Oh
    Jongtae Lim
    Yosop Pae
    Hyoungrak Choi
    Byoungyup Lee
    Jaesoo Yoo
    Cluster Computing, 2017, 20 : 3579 - 3592
  • [3] An efficient distributed caching for accessing small files in HDFS
    Bok, Kyoungsoo
    Oh, Hyunkyo
    Lim, Jongtae
    Pae, Yosop
    Choi, Hyoungrak
    Lee, Byoungyup
    Yoo, Jaesoo
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (04): : 3579 - 3592
  • [4] A Novel Approach in Improving I/O Performance of Small Meteorological Files on HDFS
    Xue, Sheng-jun
    Pan, Wu-bin
    Fang, Wei
    MATERIALS AND COMPUTATIONAL MECHANICS, PTS 1-3, 2012, 117-119 : 1759 - +
  • [5] Hmfs: Efficient Support of Small Files Processing over HDFS
    Yan, Cairong
    Li, Tie
    Huang, Yongfeng
    Gan, Yanglan
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II, 2014, 8631 : 54 - 67
  • [6] A Novel Approach to Improve the Performance of Hadoop in Handling of Small Files
    Gohil, Parth
    Panchal, Bakul
    Dhobi, J. S.
    2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015
  • [7] An Algorithm of Merging Small Files in HDFS
    Ren, Xianzhen
    Geng, Xiuhua
    Zhu, Yi
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 24 - 27
  • [8] An optimized approach for storing small files on HDFS based on dynamic queue
    Jing, Weipeng
    Tong, Danyu
    2016 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI), 2016, : 173 - 178
  • [9] Storage and Accessing Small Files Based on HDFS
    Mao, Yingchi
    Min, Wei
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSAIT 2013), 2014, 255 : 565 - 573
  • [10] Improving Metadata Management for Small Files in HDFS
    Mackey, Grant
    Sehrish, Saba
    Wang, Jun
    2009 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING AND WORKSHOPS, 2009, : 621 - 624