Index of meta-data set of the similar files for inline de-duplication in distributed storage systems

Authors
Sun, Jing [1, 2, 3]
Yu, Hongliang [1, 2, 3]
Zheng, Weimin [1, 2, 3]
Affiliations
[1] Research Institute of Tsinghua University in Shenzhen, Shenzhen, Guangdong 518057, China
[2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
[3] National Engineering Laboratory for Disaster Backup and Recovery, Beijing 100876, China
Keywords
Cost effectiveness; Metadata; Scalability; Cloud storage
DOI
Not available
Abstract
Distributed storage systems have been widely adopted in cloud storage and enterprise storage infrastructure because of their high scalability and cost effectiveness. In these systems, data de-duplication can save most of the storage space on the devices and improve the efficiency of data transmission. The key to de-duplication in distributed storage systems is implementing a high-performance, scalable meta-data index that does not hurt write throughput. This paper proposes an index of meta-data sets of similar files. The index uses a locality-sensitive hashing function to organize the meta-data sets, and it accesses the disk only once to look up all the chunks of a file. Consequently, the index improves indexing performance while keeping high scalability and a small memory footprint, which makes it suitable for cloud and enterprise storage.
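The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which locality-sensitive hashing function the paper uses, so this example assumes a simple min-hash over chunk fingerprints as a stand-in. The names `chunk_fingerprints`, `similarity_key`, and `SimilarFileIndex`, the fixed-size chunking, and the in-memory dictionary that stands in for on-disk meta-data sets are all hypothetical and are not taken from the paper.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each with SHA-1.

    Fixed-size chunking and the tiny chunk size are assumptions for the demo.
    """
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def similarity_key(fingerprints: list[str]) -> str:
    """Min-hash stand-in for the LSH function (assumption, not the paper's).

    Files that share most of their chunks are likely to share the same
    minimum fingerprint, so they are likely to map to the same meta-data set.
    """
    return min(fingerprints)


class SimilarFileIndex:
    """Hypothetical index: one meta-data set per group of similar files."""

    def __init__(self) -> None:
        # Stands in for meta-data sets stored on disk, keyed by similarity key.
        self.bins: dict[str, set[str]] = {}
        self.disk_reads = 0  # counts simulated disk accesses

    def _load_bin(self, key: str) -> set[str]:
        # One simulated disk access fetches the whole meta-data set.
        self.disk_reads += 1
        return self.bins.setdefault(key, set())

    def write_file(self, data: bytes) -> int:
        """De-duplicate a file inline; return the number of new chunks."""
        fingerprints = chunk_fingerprints(data)
        if not fingerprints:
            return 0
        # A single lookup serves every chunk of the file.
        known = self._load_bin(similarity_key(fingerprints))
        new_chunks = set(fingerprints) - known
        known |= new_chunks
        return len(new_chunks)


if __name__ == "__main__":
    index = SimilarFileIndex()
    print(index.write_file(b"aaaabbbbccccdddd"))  # first write: all chunks new
    print(index.write_file(b"aaaabbbbccccdddd"))  # identical file: no new chunks
    print(index.disk_reads)                       # one simulated access per file
```

In this sketch, a file that lands in a different meta-data set than its near-duplicates simply stores some chunks again. That trades a small loss in de-duplication ratio for the single disk access per file that the abstract emphasizes.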
Pages: 197-205