Distributed file system for rewriting Big Data files using a local-write protocol

Times cited: 1
Authors
da Silva, Erico Correia [1]
Sato, Liria Matsumoto [1]
Midorikawa, Edson Toshimi [1]
Affiliation
[1] Univ Sao Paulo, Escola Politecn, Sao Paulo, Brazil
Keywords
Distributed file systems; Hadoop; Big Data; Distributed lock management
DOI
10.1109/BigData52589.2021.9671741
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the exponential growth in the volume of data available for scientific and commercial use, Big Data technologies are gaining attention and importance. The efficiency of these technologies depends directly on the distributed file system used for data persistence, which is generally deployed on low-cost computer clusters. However, the file systems used in today's Big Data environments are restricted to the WORM (write once, read many) pattern and lack POSIX compatibility. This work uses distributed lock management techniques to create a file system that supports random writes from both HPC and Big Data tools. A local-write protocol is implemented to take advantage of local copies of the data during the write process. Experiments were carried out to evaluate the performance of the proposed write protocol and the scalability of the developed file system. The experimental results indicate that the performance and scalability improvements were obtained by eliminating limitations imposed by HDFS and by leveraging local writes.
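The abstract does not describe the protocol's internals. As a rough illustration only, the following Python sketch combines the two ideas it names: a lock manager that serializes random writes per block, and a local-write rule that applies a write to the writer's own replica when it holds one. Every name and detail here (`BlockLockManager`, `Node`, `write_at`, the 4-byte block size, synchronous copying to the other replicas) is a hypothetical assumption, not the authors' design.

```python
import threading
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical, tiny block size to keep the example readable


class BlockLockManager:
    """Hypothetical centralized lock manager: one exclusive write lock per (path, block)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}  # (path, block) -> name of the node holding the lock

    def acquire(self, path, block, node_name):
        with self._mutex:
            owner = self._owners.get((path, block))
            if owner is None or owner == node_name:
                self._owners[(path, block)] = node_name
                return True
            return False  # another node is writing this block

    def release(self, path, block, node_name):
        with self._mutex:
            if self._owners.get((path, block)) == node_name:
                del self._owners[(path, block)]


@dataclass
class Node:
    name: str
    blocks: dict = field(default_factory=dict)  # (path, block) -> bytearray replica


def write_at(path, offset, data, writer, replicas, locks, block_size=BLOCK_SIZE):
    """Random write within one block; returns the name of the node written first."""
    block, start = divmod(offset, block_size)
    if start + len(data) > block_size:
        raise ValueError("sketch only handles writes inside a single block")
    if not locks.acquire(path, block, writer.name):
        raise RuntimeError(f"block {block} of {path} is locked by another node")
    try:
        key = (path, block)
        holders = [n for n in replicas if key in n.blocks]
        if not holders:
            raise KeyError(f"no replica holds block {block} of {path}")
        # Assumed local-write rule: prefer the writer's own copy, else a remote replica.
        primary = writer if any(n is writer for n in holders) else holders[0]
        buf = primary.blocks[key]
        buf[start:start + len(data)] = data
        # Bring the other replicas up to date before releasing the lock.
        for n in holders:
            if n is not primary:
                n.blocks[key] = bytearray(buf)
        return primary.name
    finally:
        locks.release(path, block, writer.name)
```

For example, if nodes `a` and `b` both hold block 0 of `/f`, then `write_at("/f", 1, b"ab", a, [a, b], locks)` updates `a`'s local copy first, copies the result to `b`, and returns `"a"`. A node without a replica falls back to writing a remote copy, and a write to a block locked by another node is rejected rather than queued. This is a simplification chosen for brevity.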
Pages: 3646-3655
Page count: 10