Research on Small File Processing Technology Based on HDFS

Times cited: 0
Author
Gu, Rui
Keywords
HDFS; cloud storage; small files; file merge; insert
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data has expanded sharply. Cloud computing offers a good solution to big-data computing and storage problems, and massive data storage and analysis has become a very active research field. HDFS (Hadoop Distributed File System) uses a single NameNode to manage the metadata of the entire system and keeps this metadata in memory to improve access efficiency. When the system stores a large number of small files, however, it generates a large amount of metadata that occupies a substantial share of NameNode memory. In addition, accessing many small files requires frequent requests to the NameNode, which overloads it. To address this problem, this paper analyzes previous research and improvement schemes and builds a corresponding improvement on that basis: an independent small file processing module is added on top of the original distributed file system. The module merges small files, creates an index of the files, and passes the file cache to HDFS for data processing.
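The merge-and-index idea described in the abstract can be illustrated with a small local sketch. It is not the author's module: it does not talk to HDFS, the names `MergedContainer` and `estimate_namenode_bytes` are hypothetical, and the figure of roughly 150 bytes of NameNode memory per file or block object is a commonly cited rule of thumb, not a number from this paper. Small files are appended to one byte buffer (standing in for a single large HDFS file), and an index records each file's offset and length so that a file can be read back without needing its own metadata entry.

```python
"""Hypothetical sketch of merging small files into one container with an index.

Not the paper's implementation and not an HDFS client: everything stays in memory.
"""
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumption: commonly cited rule of thumb of ~150 bytes of NameNode heap
# per namespace object (file, directory, or block). Approximate only.
BYTES_PER_OBJECT = 150


def estimate_namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Rough NameNode memory estimate: one object per file plus one per block."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


@dataclass
class MergedContainer:
    """Packs many small files into one byte buffer plus a name -> (offset, length) index.

    In a real system the buffer would be stored as a single large file, so the
    NameNode would track one file instead of many small ones.
    """
    data: bytearray = field(default_factory=bytearray)
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add(self, name: str, content: bytes) -> None:
        """Append a small file and record where it starts and how long it is."""
        if name in self.index:
            raise ValueError(f"duplicate file name: {name}")
        self.index[name] = (len(self.data), len(content))
        self.data.extend(content)

    def read(self, name: str) -> bytes:
        """Return one small file by looking up its offset and length in the index."""
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


if __name__ == "__main__":
    # 1000 hypothetical small files of 100 bytes each.
    small_files = {f"img_{i:04d}.jpg": bytes([i % 256]) * 100 for i in range(1000)}
    container = MergedContainer()
    for name, content in small_files.items():
        container.add(name, content)

    assert container.read("img_0042.jpg") == small_files["img_0042.jpg"]
    print(len(container.index), len(container.data))  # 1000 100000
    print(estimate_namenode_bytes(1000))  # 1000 separate files: 300000
    print(estimate_namenode_bytes(1))     # 1 merged single-block file: 300
```

Under the same rule of thumb, `estimate_namenode_bytes(10_000_000)` gives 3,000,000,000 bytes (about 3 GB) for ten million single-block files, whereas a merged container that fits in one block needs only two objects. A real design would also need to persist the index and handle updates and deletions, which this sketch omits.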
Pages: 286-289
Page count: 4