Object-based data de-duplication method for OpenXML compound files

Affiliations
[1] School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100086, China
[2] Unknown, 101149, China
Source
Jisuanji Yanjiu yu Fazhan (Journal of Computer Research and Development), 2015, no. 7, pp. 1546-1557
Keywords
Object detection
DOI
10.7544/issn1000-1239.2015.20140093
Abstract
Content-defined chunking (CDC) is a prevalent data de-duplication algorithm for removing redundant data segments in storage systems. Current research on CDC does not consider the distinctive content characteristics of different file types: it determines chunk boundaries in an essentially random way and applies a single strategy to all file types. Such methods have been shown to work well for text and other simple content, but they do not achieve optimal performance for compound files. Compound files consist of unstructured data, usually occupy a large amount of storage space, and contain multimedia data. Object-based data de-duplication is currently the most advanced approach and an effective solution for detecting duplicate data in such files. We analyze the content characteristics of OpenXML files and develop an object extraction method; in the process, we propose an algorithm that determines the de-duplication granularity from the object structure and distribution. The goal is to detect identical objects within a file or across different files, and to de-duplicate compound files effectively even when their physical layout changes. Simulation experiments on a typical unstructured data collection show that, for unstructured data in general, the method improves efficiency by 10% compared with CDC. © 2015, Science Press. All rights reserved.
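The abstract does not spell out the paper's object extraction or granularity algorithm, so the following Python sketch is only an illustration under stated assumptions. It relies on one certain fact: OpenXML files (.docx, .xlsx, .pptx) are ZIP packages. It then assumes that each ZIP part is one "object" and fingerprints objects with SHA-256, and contrasts this with a Gear-hash CDC baseline whose parameters (`MIN_CHUNK`, `MAX_CHUNK`, `BOUNDARY_MASK`) are hypothetical. All helper names (`cdc_chunks`, `openxml_objects`, `DedupStore`, `make_package`) are illustrative and do not come from the paper.

```python
import hashlib
import io
import zipfile
from collections.abc import Iterable

# --- Baseline: content-defined chunking (CDC) with a Gear-style rolling hash ---
# Hypothetical parameters, not values from the paper.
MIN_CHUNK = 2 * 1024             # never cut before 2 KiB
MAX_CHUNK = 64 * 1024            # always cut at 64 KiB
BOUNDARY_MASK = (1 << 13) - 1    # cut when the low 13 bits are zero (~8 KiB expected gap)
MASK64 = (1 << 64) - 1
# 256 pseudo-random 64-bit values, derived deterministically from SHA-256.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big") for i in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut data wherever the rolling hash matches the boundary mask."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])   # trailing remainder may be shorter than MIN_CHUNK
    return chunks


# --- Object-based de-duplication for OpenXML packages ---
# Assumption: one ZIP part == one object; the paper's granularity algorithm is not reproduced.
def openxml_objects(package: bytes) -> dict[str, bytes]:
    """Return the decompressed content of every part in the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist() if not info.is_dir()}


class DedupStore:
    """Content-addressed store: each unique piece is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.pieces: dict[str, bytes] = {}

    def put(self, pieces: Iterable[bytes]) -> int:
        """Store pieces and return how many new bytes actually had to be written."""
        written = 0
        for piece in pieces:
            fp = hashlib.sha256(piece).hexdigest()
            if fp not in self.pieces:
                self.pieces[fp] = piece
                written += len(piece)
        return written


def make_package(parts: list[tuple[str, bytes]]) -> bytes:
    """Build a toy OpenXML-like ZIP package from (part name, content) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in parts:
            zf.writestr(name, content)
    return buf.getvalue()


if __name__ == "__main__":
    image = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
    doc_a = b"<w:document>first draft</w:document>"
    doc_b = b"<w:document>second draft</w:document>"
    # Same image, different text, and a different part order (physical layout).
    pkg_a = make_package([("word/document.xml", doc_a), ("word/media/image1.png", image)])
    pkg_b = make_package([("word/media/image1.png", image), ("word/document.xml", doc_b)])

    objects = DedupStore()
    objects.put(openxml_objects(pkg_a).values())
    print("object-level new bytes for B:", objects.put(openxml_objects(pkg_b).values()))

    chunks = DedupStore()
    chunks.put(cdc_chunks(pkg_a))
    print("CDC new bytes for B:", chunks.put(cdc_chunks(pkg_b)))
```

Because objects are fingerprinted after decompression, the shared image part is stored only once even though the two packages order their parts differently. CDC over the raw ZIP bytes only sees compressed, container-specific byte streams, so what it can match depends on how each part happened to be compressed and laid out.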