Towards Efficient Big Data Storage With MapReduce Deduplication System

Times cited: 0

Authors:

Joe, Vijesh [1]

Raj, Jennifer S. [2]

Smys, S. [3]

Affiliations:

[1] VV Coll Engn, Dept Comp Sci & Engn, Tirunelveli, India

[2] Gnanamani Coll Technol, Dept Elect & Commun Engn, Namakkal 637018, India

[3] RVS Tech Campus, Coimbatore, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING | 2021 / Vol. 16 / No. 2

Keywords:

Content-Defined Chunking; Data Redundancy; De-Duplication; Fractal Index Tree; Hashing; MapReduce; SHA-3; Two Threshold Two Divisor with Switch (TTTD-S)

DOI:

10.4018/IJITWE.2021040103

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

In the big data era, there is a high demand for data storage and processing. Conventional approaches face great challenges, and de-duplication is an effective way to reduce storage space and computation time. Many existing approaches take a long time to identify similar data. A MapReduce de-duplication system is proposed to attain a high de-duplication ratio. MapReduce is a parallel processing approach that helps process a large number of files in less time. The proposed system uses the Two Threshold Two Divisor with Switch (TTTD-S) algorithm for chunking. The switch is the average parameter used by TTTD-S to minimize chunk size variance. SHA-3 is used for hashing, and a fractal tree is used for indexing. In the fractal index tree, reads and writes take place at the same time. Data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time are used as evaluation metrics. The performance of the system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce duplication and processing time.

Pages: 45-57

Number of pages: 13