Hadoop Based Scalable Cluster Deduplication for Big Data

Times cited: 4

Authors:

Liu, Qing [1]

Fu, Yinjin [1]

Ni, Guiqiang [1]

Hou, Rui [2]

Affiliations:

[1] PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China

[2] Inst Elect Syst Engn, Beijing, Peoples R China

Source:

2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW 2016), 2016

Keywords:

data deduplication; big data; Hadoop; HBase; index management

DOI:

10.1109/ICDCSW.2016.17

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The exponential growth of data poses a tremendous challenge to storage systems in data centers. Data deduplication, which detects and eliminates redundant data in a dataset, can greatly reduce the volume of stored data and improve storage space utilization. This paper presents Halodedu, a scalable and reliable cluster deduplication system built on a Hadoop-based cloud computing platform. Halodedu uses MapReduce to perform deduplication in parallel and HDFS to manage data storage. A local database within each node is used to build a fast, distributed chunk fingerprint index. To maintain the availability and reliability of metadata, HBase stores the metadata of backup files. Halodedu is evaluated using virtual machine images as the input dataset, and comparative experiments demonstrate that it improves deduplication speed and system scalability.
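The abstract combines three ideas: chunk-level fingerprinting, parallel grouping of chunks by fingerprint, and a fingerprint index spread across nodes. The following is a minimal, single-process sketch of that general technique, not Halodedu's implementation. Fixed-size chunking, SHA-1 fingerprints, modulo routing of fingerprints to per-node index shards, and every name here (`map_chunks`, `owner_node`, `deduplicate`, `restore`, `CHUNK_SIZE`, `NUM_NODES`) are hypothetical choices for illustration. The map, shuffle, and reduce phases are simulated with plain Python loops rather than Hadoop MapReduce, and in-memory dicts stand in for the node-local databases and for the HBase file metadata.

```python
import hashlib
from collections import defaultdict

# Hypothetical parameters for illustration; the abstract does not specify them.
CHUNK_SIZE = 4096  # fixed chunk size in bytes (assumed)
NUM_NODES = 4      # number of nodes holding fingerprint-index shards (assumed)


def map_chunks(data, chunk_size=CHUNK_SIZE):
    """Map step: split data into fixed-size chunks and emit (fingerprint, chunk) pairs."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha1(chunk).hexdigest(), chunk


def owner_node(fingerprint, num_nodes=NUM_NODES):
    """Pick the node whose local index shard owns this fingerprint (hash partitioning)."""
    return int(fingerprint, 16) % num_nodes


def deduplicate(files):
    """Group chunks by fingerprint and keep one copy of each unique chunk.

    Returns (shards, recipes):
      shards[n]        -> dict fingerprint -> chunk, the index/store on node n
      recipes[file_id] -> ordered fingerprints needed to rebuild the file (per-file metadata)
    """
    groups = defaultdict(list)  # shuffle: fingerprint -> all chunks with that fingerprint
    recipes = {}
    for file_id, data in files.items():
        recipes[file_id] = []
        for fp, chunk in map_chunks(data):
            groups[fp].append(chunk)
            recipes[file_id].append(fp)
    shards = [{} for _ in range(NUM_NODES)]
    for fp, chunks in groups.items():
        # Reduce step: store only the first copy; the others are duplicates.
        shards[owner_node(fp)][fp] = chunks[0]
    return shards, recipes


def restore(file_id, shards, recipes):
    """Rebuild a file by looking up each fingerprint on its owning node."""
    return b"".join(shards[owner_node(fp)][fp] for fp in recipes[file_id])


if __name__ == "__main__":
    files = {
        "vm1.img": b"A" * 8192 + b"B" * 4096,  # chunks: A, A, B
        "vm2.img": b"A" * 4096 + b"C" * 4096,  # chunks: A, C
    }
    shards, recipes = deduplicate(files)
    stored = sum(len(s) for s in shards)
    total = sum(len(r) for r in recipes.values())
    print(f"{total} chunks in, {stored} unique chunks stored")  # prints: 5 chunks in, 3 unique chunks stored
    assert restore("vm1.img", shards, recipes) == files["vm1.img"]
```

Routing each fingerprint to exactly one owning node means a duplicate lookup touches a single shard, which is one common way to keep a distributed index scalable. Real systems often use content-defined rather than fixed-size chunking, which this sketch does not attempt.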

Pages: 98-105

Number of pages: 8

Records 21-30 of 50:

[21] Yan, Zheng; Ding, Wenxiu; Yu, Xixun; Zhu, Haiqi; Deng, Robert H. Deduplication on Encrypted Big Data in Cloud. IEEE Transactions on Big Data, 2016, 2(2): 138-150.
[22] Zhuang, Miao. The Research on Big Data Security Architecture Based on Hadoop. Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015), 2016, 47: 241-244.
[23] Chen, Jilin; Liu, Nana; Chen, Yong; Qiu, Weijiang. Power Big Data Platform Based on Hadoop Technology. Proceedings of the 2016 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC), 2016, 88: 571-576.
[24] Bukhari, Syeda Sana; Park, JinHyuck; Shin, Dong Ryeol. Hadoop based Demography Big Data Management System. 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2018: 93-98.
[25] Asbern, A.; Asha, P. Performance Evaluation of Association Mining in Hadoop Single Node Cluster with Big Data. 2015 International Conference on Circuits, Power and Computing Technologies (ICCPCT-2015), 2015.
[26] Wang, Yufeng; Tang, Shaojie; Tan, Chiu C. Elastic Data Routing in Cluster-based Deduplication Systems. 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2014: 117-118.
[27] Kumar, Naresh; Antwal, Shobha; Jain, S. C. Differential Evolution based bucket indexed data deduplication for big data storage. Journal of Intelligent & Fuzzy Systems, 2018, 34(1): 491-505.
[28] Lin, Chuan; Cao, Qiang; Zhang, Hongliang; Huang, Guoqiang; Xie, Changsheng. SDVC: A Scalable Deduplication Cluster for Virtual Machine Images in Cloud. 2014 9th IEEE International Conference on Networking, Architecture, and Storage (NAS), 2014: 88-92.
[29] Song, Junjie; Liu, Jin; Zheng, Yuhui. Hadoop Based Parallel Deduplication Method for Web Documents. Advances in Computer Science and Ubiquitous Computing, 2018, 474: 499-504.
[30] Kaiser, Juergen; Meister, Dirk; Brinkmann, Andre; Effert, Sascha. Design of an Exact Data Deduplication Cluster. 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
