Bucket Based Data Deduplication Technique for Big Data Storage System

Cited by: 0
Authors
Kumar, Naresh [1]
Rawat, Rahul [1]
Jain, S. C. [2]
Affiliations
[1] Kurukshetra Univ, UIET, Dept Comp Sci & Engn, Kurukshetra 136119, Haryana, India
[2] Rajasthan Tech Univ, Dept Comp Sci & Engn, Kota 302017, Rajasthan, India
Keywords
Big Data; Hadoop; CDC Chunking; Bucket; Deduplication; Chunk
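DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract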
This paper presents a bucket-based data deduplication technique. In the proposed technique, the big data stream is passed to a fixed-size chunking algorithm, which splits it into fixed-size chunks. The chunks are then passed to an MD5 module, which generates a hash value for each chunk. A MapReduce model is then applied to determine whether each hash value is a duplicate: it compares the new hash values with those already stored in bucket storage. A hash value that is already present in bucket storage is identified as a duplicate. If the hash value is a duplicate, the data is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. The proposed technique is analyzed on a real data set using Hadoop.
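The pipeline described in the abstract (fixed-size chunking, MD5 hashing, lookup in bucket storage, conditional store) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the MapReduce job is replaced by a plain sequential loop, HDFS is stood in for by an in-memory dictionary, and the names `CHUNK_SIZE`, `BucketStore`, `bucket_id`, and `deduplicate`, as well as the rule that picks a bucket from the first two hex digits of the hash, are assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow (assumed value)


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split the input into consecutive chunks of `size` bytes (the last may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def bucket_id(digest: str, num_buckets: int) -> int:
    """Hypothetical bucket rule: derive the bucket from the first two hex digits."""
    return int(digest[:2], 16) % num_buckets


class BucketStore:
    """In-memory stand-in for the bucket storage plus the HDFS chunk store."""

    def __init__(self, num_buckets: int = 16):
        self.num_buckets = num_buckets
        self.buckets = defaultdict(set)  # bucket id -> set of MD5 hex digests
        self.stored = {}                 # digest -> chunk bytes (plays the role of HDFS)

    def add(self, chunk: bytes) -> bool:
        """Store the chunk only if its hash is new; return True when it was stored."""
        digest = hashlib.md5(chunk).hexdigest()
        bucket = self.buckets[bucket_id(digest, self.num_buckets)]
        if digest in bucket:
            return False                 # duplicate hash: skip the write
        bucket.add(digest)
        self.stored[digest] = chunk
        return True


def deduplicate(data: bytes, store: BucketStore):
    """Run the whole pipeline and return (unique_chunks, duplicate_chunks)."""
    unique = duplicate = 0
    for chunk in fixed_size_chunks(data):
        if store.add(chunk):
            unique += 1
        else:
            duplicate += 1
    return unique, duplicate


if __name__ == "__main__":
    store = BucketStore()
    print(deduplicate(b"AAAABBBBAAAACCCC", store))
```

Partitioning the hash index into buckets means each lookup only touches one small set of hashes, which is also what would let a distributed job assign buckets to different workers; the sketch keeps everything in one process for brevity.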
Pages: 267-271
Number of pages: 5