HDFSX: Big Data Distributed File System with Small Files Support

Cited by: 0
Authors
ElKafrawy, Passent M. [1]
Sauber, Amr M. [1]
Hafez, Mohamed M. [1]
Affiliations
[1] Menoufia Univ, Fac Sci, Menoufia, Egypt
Keywords
Big Data; Hadoop; HDFS; Small Files; HDFSX
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hadoop Distributed File System (HDFS) is a file system designed to handle large files, gigabytes or terabytes in size, with streaming data access patterns, running on clusters of commodity hardware. However, big data may also exist as a huge number of small files, for example in biology, astronomy, or applications that generate 30 million files with an average size of 190 KB. HDFS handles such fragmented big data poorly, because the single Namenode becomes a bottleneck when it has to manage a large number of small files. In this paper, we present a new structure for HDFS (HDFSX) that avoids the high memory usage, network flooding, request overhead, and single point of failure (SPOF) of the single Namenode.
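To make the Namenode memory pressure mentioned in the abstract concrete, the following is a rough back-of-envelope sketch, not the paper's method. It assumes the commonly quoted rule of thumb that each file and each block costs roughly 150 bytes of Namenode heap, and a 128 MiB block size; both figures, along with the function names, are illustrative assumptions rather than values taken from the paper.

```python
import math

# Assumption: widely quoted rule of thumb, not a figure from the paper.
BYTES_PER_OBJECT = 150
# Assumption: a 128 MiB HDFS block size.
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_heap_bytes(num_files, avg_file_size,
                        block_size=BLOCK_SIZE,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate Namenode heap used by file and block metadata objects."""
    # Every file needs at least one block, however small it is.
    blocks_per_file = max(1, math.ceil(avg_file_size / block_size))
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * bytes_per_object


def packed_heap_bytes(num_files, avg_file_size, block_size=BLOCK_SIZE):
    """Same data volume, hypothetically packed into block-sized files."""
    total_bytes = num_files * avg_file_size
    packed_files = math.ceil(total_bytes / block_size)
    return namenode_heap_bytes(packed_files, block_size, block_size)


# The workload from the abstract: 30 million files averaging 190 KB each.
small = namenode_heap_bytes(30_000_000, 190 * 1024)
packed = packed_heap_bytes(30_000_000, 190 * 1024)

print(f"small files: {small / 1e9:.1f} GB of metadata")
print(f"packed:      {packed / 1e6:.1f} MB of metadata")
```

Under these assumptions the same data volume costs the Namenode several hundred times more heap when stored as small files, which is the kind of overhead the abstract attributes to the single Namenode.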
Pages: 131-135
Page count: 5