Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

Authors
Pugazhendi, E. [1]
Sumalatha, M. R. [1]
Harika, Lakshmi P. [1]
Affiliations
[1] Anna Univ, Dept Informat Technol, MIT Campus, Chennai 600044, Tamil Nadu, India
Source
Keywords
Cloud Computing; Document Frequency; Document Retrieval; Document Weight; Dropbox Cloud
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Optimizing data replication in public cloud storage across multiple instances is a challenging problem when processing text data. The amount of digital data is increasing exponentially, so data must be stored efficiently to reduce the storage space required. In a cloud storage environment, data replication provides high availability and fault tolerance. A weight-based deduplication system operating at the target level is proposed to reduce unnecessary storage use in the cloud. Storage space can be used efficiently by removing unpopular files from the secondary servers. Target-level deduplication consumes less processing power than source-level deduplication. Multiple input text documents are stored in the Dropbox cloud. The top text features are detected using Term Frequency (TF) and Named Entity Recognition (NER) and are stored in a text database. After the top features are stored, fresh text documents are collected to identify popular and unpopular files and thereby optimize the existing text corpus in cloud storage. The top text features of the freshly collected documents are detected using TF and NER; after duplicate features are removed, the remaining unique features are compared with the existing features stored in the database. Based on this comparison, the relevant text documents are listed, and the document frequency, document weight, and threshold factor are then determined. Popular and unpopular files are identified according to the average threshold value. Popular files are retained on all storage nodes to ensure full data availability, while unpopular files are removed from all secondary servers and kept only on the primary server. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from the secondary storage nodes and the storage space in the Dropbox cloud is reduced to 4.82 MB. Finally, data replication is minimized and 45.60% of the cloud storage space is saved by applying the weight-based deduplication system.
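The abstract does not give the formulas for document frequency, document weight, or the threshold, so the following Python sketch only illustrates the general shape of such a pipeline under stated assumptions. Here the document frequency of a stored file is assumed to be the number of fresh documents sharing at least one top feature with it, the weight is that count divided by the number of fresh documents, and the threshold is the mean weight. NER is omitted and only term frequency is used. All function names (`top_features`, `classify`, `prune_secondaries`) and the node name `primary` are hypothetical.

```python
from collections import Counter
import re


def top_features(text, k=5):
    """Return the k most frequent terms (TF only; NER is left out of this sketch)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {term for term, _ in Counter(tokens).most_common(k)}


def classify(stored, fresh, k=5):
    """Label each stored document as popular or unpopular.

    Assumed definitions (not taken from the paper):
      document frequency = fresh documents sharing at least one top feature
      document weight    = document frequency / number of fresh documents
      threshold          = mean weight over all stored documents
    """
    fresh_features = [top_features(text, k) for text in fresh]
    weights = {}
    for name, text in stored.items():
        features = top_features(text, k)
        doc_freq = sum(1 for f in fresh_features if features & f)
        weights[name] = doc_freq / len(fresh_features) if fresh_features else 0.0
    threshold = sum(weights.values()) / len(weights) if weights else 0.0
    labels = {
        name: "popular" if weight >= threshold else "unpopular"
        for name, weight in weights.items()
    }
    return labels, weights, threshold


def prune_secondaries(replicas, labels, primary="primary"):
    """Keep every file on the primary node; drop unpopular files elsewhere."""
    return {
        node: set(files) if node == primary
        else {f for f in files if labels.get(f) != "unpopular"}
        for node, files in replicas.items()
    }


stored = {
    "a.txt": "cloud storage cloud deduplication",
    "b.txt": "football match football goal",
}
fresh = ["cloud deduplication system", "cloud replication storage"]
labels, weights, threshold = classify(stored, fresh)
replicas = {
    "primary": {"a.txt", "b.txt"},
    "secondary1": {"a.txt", "b.txt"},
}
pruned = prune_secondaries(replicas, labels)
```

In this toy run, `a.txt` shares features with both fresh documents and stays on every node, while `b.txt` shares none and is kept only on the primary node. A real system would also need stop-word filtering and an NER step before comparing features.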
Pages: 260-269
Number of pages: 10