Bucket Based Data Deduplication Technique for Big Data Storage System

Cited by: 0

Authors:

Kumar, Naresh [1]

Rawat, Rahul [1]

Jain, S. C. [2]

Affiliations:

[1] Kurukshetra Univ, UIET, Dept Comp Sci & Engn, Kurukshetra 136119, Haryana, India

[2] Rajasthan Tech Univ, Dept Comp Sci & Engn, KOTA 302017, Rajasthan, India

Source:

2016 5TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (TRENDS AND FUTURE DIRECTIONS) (ICRITO) | 2016

Keywords:

Big Data; Hadoop; CDC Chunking; Bucket; Deduplication; Chunk

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

This paper presents a bucket-based data deduplication technique. In the proposed technique, a big data stream is passed to a fixed-size chunking algorithm to create fixed-size chunks. The chunks are then passed to an MD5 module, which generates a hash value for each chunk. A MapReduce model then determines whether each hash value is a duplicate by comparing it with the hash values already stored in bucket storage. A hash value that is already present in bucket storage is identified as a duplicate. Data whose hash value is a duplicate is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. The proposed technique is analyzed on a real data set using Hadoop.
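The pipeline described in the abstract (fixed-size chunking, MD5 hashing, lookup in bucket storage, store only if new) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the MapReduce stage is replaced by a plain loop, HDFS is replaced by an in-memory dictionary, and the bucket layout (partitioning hashes by the first two hex characters of the digest) is an assumption made only for illustration. The names `BucketIndex`, `fixed_size_chunks`, `deduplicate`, and `CHUNK_SIZE` are hypothetical.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # bytes; hypothetical and deliberately tiny for the demo


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split the input stream into fixed-size chunks (last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class BucketIndex:
    """In-memory stand-in for bucket storage of chunk hashes.

    Assumption: a hash is assigned to a bucket by its first two hex
    characters. The abstract does not specify the bucket layout.
    """

    def __init__(self):
        self.buckets = defaultdict(set)

    def add_if_new(self, digest: str) -> bool:
        """Return True and record the digest if unseen; False if duplicate."""
        bucket = self.buckets[digest[:2]]
        if digest in bucket:
            return False
        bucket.add(digest)
        return True


def deduplicate(data: bytes, index: BucketIndex, store: dict):
    """Chunk, hash, and store only chunks whose hash is not yet indexed.

    `store` is a dict standing in for HDFS. Returns (stored, skipped).
    """
    stored = skipped = 0
    for chunk in fixed_size_chunks(data):
        digest = hashlib.md5(chunk).hexdigest()
        if index.add_if_new(digest):
            store[digest] = chunk  # new chunk: write it
            stored += 1
        else:
            skipped += 1  # duplicate hash: do not store again
    return stored, skipped


if __name__ == "__main__":
    index, store = BucketIndex(), {}
    print(deduplicate(b"abcdabcdefgh", index, store))
    print(deduplicate(b"efghijkl", index, store))
```

Because the index persists between calls, a chunk seen in an earlier stream is also skipped in later streams, which mirrors the comparison against already stored hash values described above.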


Pages: 267-271

Page count: 5
