Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

Cited by: 0

Authors:

Pugazhendi, E. [1]

Sumalatha, M. R. [1]

Harika, Lakshmi P. [1]

Affiliation:

[1] Anna Univ, Dept Informat Technol, MIT Campus, Chennai 600044, Tamil Nadu, India

Source:

JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH | 2021 / Vol. 80 / Issue 03

Keywords:

Cloud Computing; Document Frequency; Document Retrieval; Document Weight; Dropbox Cloud;

DOI:

Not available

Chinese Library Classification:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

Optimizing data replication in public cloud storage when multiple instances are targeted is a challenging problem in processing text data. The amount of digital data is increasing exponentially, so storage space must be reduced by storing data efficiently. In a cloud storage environment, data replication provides high availability and fault tolerance. An effective deduplication approach using a weight-based method is proposed at the target level to reduce unwanted storage space in the cloud. Storage space can be used efficiently by removing unpopular files from the secondary servers. Target-level deduplication consumes less processing power than source-level deduplication. Multiple input text documents are stored in the Dropbox cloud. The top text features are detected using Term Frequency (TF) and Named Entity Recognition (NER) and are stored in a text database. After the top features are stored, fresh text documents are collected to identify popular and unpopular files in order to optimize the existing text corpus in cloud storage. The top text features of the freshly collected documents are detected using TF and NER; after duplicate features are removed, the remaining unique features are compared with the existing features stored in the database. Based on this comparison, the relevant text documents are listed. For the listed documents, the document frequency, document weight and threshold factor are computed. Depending on the average threshold value, files are classified as popular or unpopular. Popular files are retained on all storage nodes to achieve full data availability, and unpopular files are removed from all secondary servers and kept only on the primary server. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from the secondary storage nodes and the storage space in the Dropbox cloud is optimized to 4.82 MB. Finally, data replication is minimized and 45.60% of the cloud storage space is saved by applying the weight-based deduplication system.
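The classification step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for document weight or the threshold, so everything here is an assumption made for illustration: features are extracted with term frequency only (NER is omitted), document frequency is taken as the number of fresh documents sharing at least one top feature with a stored file, document weight is that count divided by the number of fresh documents, and the threshold is the mean weight. The names `top_terms`, `classify` and `plan_replicas` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def top_terms(text, k=5):
    """Return the k most frequent word tokens (TF-only stand-in for TF + NER)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {term for term, _ in Counter(tokens).most_common(k)}


def classify(stored_features, fresh_docs, k=5):
    """Split stored files into popular and unpopular ones.

    stored_features: dict mapping file name -> set of top features
    fresh_docs: non-empty list of freshly collected document texts
    """
    fresh_features = [top_terms(text, k) for text in fresh_docs]
    weights = {}
    for name, features in stored_features.items():
        # Assumed document frequency: fresh documents sharing a feature.
        doc_freq = sum(1 for fresh in fresh_features if features & fresh)
        # Assumed document weight: document frequency normalized to [0, 1].
        weights[name] = doc_freq / len(fresh_docs)
    # Assumed threshold: the average weight over all stored files.
    threshold = sum(weights.values()) / len(weights)
    popular = sorted(n for n, w in weights.items() if w >= threshold)
    unpopular = sorted(n for n, w in weights.items() if w < threshold)
    return weights, threshold, popular, unpopular


def plan_replicas(popular, unpopular, nodes):
    """Keep popular files on every node and unpopular files on the primary only."""
    primary = nodes[0]  # assumption: the first node is the primary server
    plan = {name: list(nodes) for name in popular}
    plan.update({name: [primary] for name in unpopular})
    return plan


stored = {"a.txt": {"cloud", "storage"}, "b.txt": {"football"}}
fresh = ["cloud storage cloud", "cloud backup"]
weights, threshold, popular, unpopular = classify(stored, fresh)
plan = plan_replicas(popular, unpopular, ["primary", "secondary1", "secondary2"])
```

In this toy run, `a.txt` shares features with both fresh documents and stays on all three nodes, while `b.txt` matches none and is kept on the primary only. A closer reproduction would add an NER step and use the paper's own weight and threshold definitions.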

Citation

Pages: 260-269

Page count: 10

