Towards Efficient Big Data Storage With MapReduce Deduplication System

Authors
Joe, Vijesh [1]
Raj, Jennifer S. [2]
Smys, S. [3]
Affiliations
[1] VV Coll Engn, Dept Comp Sci & Engn, Tirunelveli, India
[2] Gnanamani Coll Technol, Dept Elect & Commun Engn, Namakkal 637018, India
[3] RVS Tech Campus, Coimbatore, Tamil Nadu, India
Keywords
Content-Defined Chunking; Data Redundancy; De-Duplication; Fractal Index Tree; Hashing; MapReduce; SHA-3; Two Threshold Two Divisor with Switch (TTTD-S)
DOI
10.4018/IJITWE.2021040103
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the big data era, demands on data storage and processing are high, and conventional approaches struggle to meet them. De-duplication is an effective way to reduce both storage space and computation time, but many existing approaches take a long time to identify similar data. This work proposes a MapReduce de-duplication system that achieves a high de-duplication ratio. MapReduce is a parallel processing model that can process a large number of files in less time. The proposed system uses the Two Threshold Two Divisor with Switch (TTTD-S) algorithm for chunking, where the switch is the average-size parameter that TTTD-S uses to minimize chunk-size variance. Chunks are hashed with SHA-3 and indexed with a fractal index tree, in which reads and writes take place at the same time. The system is evaluated on data size after de-duplication, de-duplication ratio, throughput, hash time, chunking time, and de-duplication time, using the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce both duplication and processing time.
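The pipeline the abstract describes (content-defined chunking with TTTD-S, SHA-3 fingerprinting, MapReduce-style grouping, and an index of unique chunks) can be sketched in Python as below. This is a minimal, single-process sketch under stated assumptions, not the authors' implementation: the Rabin-Karp rolling hash, the 48-byte window, all threshold, divisor, and switch values, and the rule that TTTD-S halves both divisors once a chunk reaches the switch size are assumptions. The hypothetical helpers `map_phase` and `reduce_phase` only mimic the MapReduce data flow in memory, and a plain Python dict stands in for the fractal index tree. The de-duplication ratio is assumed to be bytes before divided by bytes stored, which may differ from the paper's definition.

```python
import hashlib
import random
from collections import defaultdict

# Assumed rolling-hash settings (Rabin-Karp style); not taken from the paper.
WINDOW = 48                          # sliding-window width in bytes
BASE = 257
MOD = 1 << 32
POW_W = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def tttd_s_boundaries(data, t_min=460, t_max=2800, switch=1600,
                      d_main=540, d_backup=270):
    """Return exclusive end offsets of TTTD-S-style content-defined chunks.

    Parameter values are illustrative assumptions, not the paper's settings.
    """
    boundaries, start, n = [], 0, len(data)
    while start < n:
        h, backup, end = 0, -1, None
        main_d, back_d = d_main, d_backup
        for i in range(start, n):
            length = i - start + 1               # chunk length if we cut after byte i
            if i - start >= WINDOW:              # drop the byte leaving the window
                h = (h - data[i - WINDOW] * POW_W) % MOD
            h = (h * BASE + data[i]) % MOD       # add the byte entering the window
            if length == switch:                 # assumed TTTD-S rule: halve both divisors
                main_d, back_d = max(1, main_d // 2), max(1, back_d // 2)
            if length >= t_min:
                if h % back_d == back_d - 1:     # remember a fallback breakpoint
                    backup = i + 1
                if h % main_d == main_d - 1:     # main divisor hit: cut here
                    end = i + 1
                    break
            if length >= t_max:                  # hard limit: prefer the backup breakpoint
                end = backup if backup != -1 else i + 1
                break
        if end is None:                          # input ran out: the tail is the last chunk
            end = n
        boundaries.append(end)
        start = end
    return boundaries


def map_phase(name, data):
    """Map: emit (SHA-3-256 fingerprint, (file name, chunk bytes)) per chunk."""
    start = 0
    for end in tttd_s_boundaries(data):
        chunk = data[start:end]
        yield hashlib.sha3_256(chunk).hexdigest(), (name, chunk)
        start = end


def reduce_phase(pairs):
    """Shuffle + reduce: group by fingerprint and keep one copy of each chunk."""
    groups = defaultdict(list)
    for fp, value in pairs:
        groups[fp].append(value)
    # A plain dict stands in for the fractal index tree used in the paper.
    return {fp: values[0][1] for fp, values in groups.items()}


def dedup(files):
    """De-duplicate a {name: bytes} mapping; return index, sizes, and ratio."""
    pairs = [kv for name, data in files.items() for kv in map_phase(name, data)]
    index = reduce_phase(pairs)
    original = sum(len(d) for d in files.values())
    stored = sum(len(c) for c in index.values())
    ratio = original / stored if stored else 1.0
    return index, original, stored, ratio


if __name__ == "__main__":
    random.seed(0)
    block = bytes(random.getrandbits(8) for _ in range(20000))
    files = {"a.csv": block, "b.csv": block + b"new rows", "c.csv": block}
    index, before, after, ratio = dedup(files)
    print(f"{before} bytes -> {after} bytes, ratio {ratio:.2f}")
```

Because a chunk boundary depends only on the bytes since the previous boundary, appending rows to a file (as with `b.csv` above) leaves its earlier boundaries unchanged, so only the chunks near the end of the file are new and everything else is stored once.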
Pages: 45-57
Page count: 13