Building a Version Control System in the Hadoop HDFS

Authors
Yeh, Tsozen [1]
Chien, Tingyu [1]
Affiliations
[1] Fu Jen Catholic Univ, Dept CSIE, New Taipei, Taiwan
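Keywords
cloud computing; Hadoop; HDFS; big data
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract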
Cloud computing has been widely used in recent years. It facilitates many cutting-edge areas of study, including big data and the Internet of Things. Cloud computing cannot succeed without a reliable cloud infrastructure to store and handle the enormous volume of data it holds. In the cloud environment, the contents of an individual data file commonly consist of data inserted at different times. In other words, data files often have chronological versions of their contents since their creation. Unfortunately, file contents can be contaminated by bad data or viruses, resulting in errors during data processing. Users can identify the cause of an error more easily and quickly if they can examine and process prior versions of the data files in question. Consequently, by keeping versions of data files, cloud systems can help users solve problems more rapidly when errors occur. Hadoop is one of the most popular platforms in the cloud computing community. We designed and implemented an efficient scheme in HDFS, the default file system in Hadoop, that automatically maintains versions of individual data files when changes are made to them. As a result, our system can retrieve prior versions of data files and display the differences between versions, improving data management in cloud centers.
Pages: 5