A cluster-based data deduplication technology

Cited by: 1
Authors
Tseng, Chuan-Mu [1]
Ciou, Jheng-Rong [2]
Liu, Tzong-Jye [2]
Affiliations
[1] Jeh Teh Jr Coll Med Nursing & Management, Dept Appl Digital Media, Miaoli, Taiwan
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung, Taiwan
Keywords
Bloom filter; cluster; data deduplication
DOI
10.1109/CANDAR.2014.22
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data deduplication technology usually identifies redundant data quickly and correctly by using a Bloom filter. A Bloom filter can indicate whether a chunk may already be stored, but it is subject to false positives. To rule out a false positive, a new chunk must be compared with the chunks that have already been stored. To reduce the time spent excluding Bloom filter false positives, current research uses many small index tables to store chunk IDs. However, the target chunk ID is stored in only one index table, so searching the other index tables wastes a great deal of time. In this paper, we cluster the stored chunks to reduce the time needed to exclude the false positives induced by the Bloom filter.
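The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how chunks are clustered, so the routing rule used here (first byte of the chunk ID modulo the number of clusters) is an assumption, as are the names `BloomFilter`, `ClusteredChunkStore`, `_cluster_of`, and all parameter values. The sketch only shows why confirming a Bloom filter hit against a single index table is cheaper than scanning every table.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class ClusteredChunkStore:
    """Hypothetical store: one small index table of chunk IDs per cluster."""

    def __init__(self, num_clusters=16):
        self.bloom = BloomFilter()
        self.num_clusters = num_clusters
        self.tables = [set() for _ in range(num_clusters)]
        self.tables_searched = 0  # counts index-table lookups

    def _cluster_of(self, chunk_id: bytes) -> int:
        # Assumption: route by the first byte of the chunk ID.
        return chunk_id[0] % self.num_clusters

    def store(self, chunk: bytes) -> bool:
        """Return True if the chunk was new, False if it was a duplicate."""
        chunk_id = hashlib.sha256(chunk).digest()
        table = self.tables[self._cluster_of(chunk_id)]
        if self.bloom.might_contain(chunk_id):
            # Possible duplicate: confirm against exactly one index table.
            self.tables_searched += 1
            if chunk_id in table:
                return False
            # Not in the table, so the Bloom filter hit was a false positive.
        self.bloom.add(chunk_id)
        table.add(chunk_id)
        return True
```

With this layout, each Bloom filter hit costs one table lookup regardless of how many index tables exist, whereas an unclustered design might have to probe every table before concluding that the hit was a false positive.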
Pages: 226-230
Number of pages: 5