A Novel and Efficient De-duplication System For HDFS

Cited by: 5
Authors
Ranjitha, S. [1]
Sudhakar, P. [1]
Seetharaman, K. S. [2]
Affiliations
[1] Annauniv Chennai, Kamaraj Coll Engn & Technology, Virudunagar, India
[2] Annauniv Chennai, Velammal Coll Engn & Technology, Madurai, Tamil Nadu, India
Keywords
De-duplication; Hadoop; Bigdata; HDFS (Hadoop Distributed File System); MD5; SHA-1
DOI
10.1016/j.procs.2016.07.374
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big Data refers to the large volumes of data that users across the globe generate and update around the clock. Handling such volumes of data in a real-time environment is a challenging task. A distributed file system is one strategy for handling large volumes of data in real time. A distributed file system is a collection of independent computers that appears to its users as a single coherent system. In a distributed file system, common files can be shared between the nodes; the drawbacks concern scalability, replication, and availability, and server hardware is very expensive to buy. The Hadoop Distributed File System (HDFS) came into existence to overcome these issues. HDFS is designed to run on clusters of commodity hardware such as personal computers and laptops, and it provides scalable, fault-tolerant, cost-efficient storage for Big Data. HDFS supports data duplication to achieve high data reliability. However, this duplication strategy requires additional storage space. HDFS storage space can be managed efficiently by implementing de-duplication techniques. The objective of this research is to eliminate file duplication by implementing a de-duplication strategy. A novel and efficient de-duplication system for HDFS is introduced in this work. To implement the de-duplication strategy, hash values are computed for files using the MD5 and SHA-1 algorithms. The hash value generated for a file is checked against those of the existing files to identify duplication. If a duplicate exists, the system does not allow the user to upload the duplicate copy to HDFS. Hence memory utilization is handled efficiently in HDFS. © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license.
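The hash-and-check step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `DedupIndex` class, the `try_upload` method, and the in-memory dictionary are hypothetical names introduced here, and no real HDFS client is called. The sketch assumes that a file counts as a duplicate when both its MD5 and SHA-1 digests match those of a previously accepted file.

```python
import hashlib
import io


def file_fingerprint(stream, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a binary stream, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    # Read fixed-size chunks so large files never have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


class DedupIndex:
    """Hypothetical in-memory index of fingerprints of accepted files."""

    def __init__(self):
        self._seen = {}  # (md5_hex, sha1_hex) -> path of first accepted copy

    def try_upload(self, path, stream):
        """Return True if the file is new and recorded, False if a duplicate."""
        key = file_fingerprint(stream)
        if key in self._seen:
            return False  # duplicate content: reject the upload
        self._seen[key] = path
        # Assumption: a real system would write the file to HDFS at this point.
        return True


index = DedupIndex()
first = index.try_upload("/data/a.txt", io.BytesIO(b"hello hdfs"))
second = index.try_upload("/data/b.txt", io.BytesIO(b"hello hdfs"))
third = index.try_upload("/data/c.txt", io.BytesIO(b"different"))
print(first, second, third)
```

In this sketch the second upload is rejected because its content matches the first, even though the path differs. Keying on both digests together is one possible design choice; the abstract does not state how the two hash values are combined.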
Pages: 498-505
Number of pages: 8