Semantic Analysis of Big Data by Applying De-duplication techniques

Cited by: 0
Authors
Garg, Sanjeev [1]
Bala, Anju [1]
Affiliation
[1] Thapar Univ Patiala, Comp Sci Dept, Patiala, Punjab, India
Keywords
Data de-duplication; Karma-Data integration tool; Integration; Record Linkage; Chunk; Byte level
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Data available on the web comes in heterogeneous formats such as text, video, and audio. Hence, there is a need to integrate data from different sources and analyze it so that it can be used for efficient query execution. After integration, the data becomes large, and different techniques are needed to analyze it. Thus, a de-duplication technique is introduced that integrates heterogeneous data into the same format. Further, to analyze large amounts of data, record-level de-duplication is applied to the integrated data, which also removes records of a similar type. Finally, the experimental results validate the efficiency of the approach in terms of execution time, storage space, and success rate.
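The abstract describes two steps: bringing records from heterogeneous sources into one format, then removing duplicates at the record level. The following is a minimal Python sketch of that general pipeline, not the authors' implementation or the Karma tool's API. It assumes dictionary-shaped records; the source names, the `FIELD_MAP` schema mapping, the helper names, and the use of a SHA-256 fingerprint over canonical JSON as the de-duplication key are all hypothetical choices made for illustration.

```python
import hashlib
import json

# Hypothetical per-source mapping from native field names to one common schema.
FIELD_MAP = {
    "csv": {"Name": "name", "E-mail": "email", "City": "city"},
    "json": {"fullName": "name", "mail": "email", "location": "city"},
}


def normalize(record: dict, source: str) -> dict:
    """Integration step: rename fields to the common schema and canonicalize values."""
    mapping = FIELD_MAP[source]
    out = {}
    for src_key, value in record.items():
        if src_key in mapping:  # fields with no mapping are dropped
            # Collapse whitespace and lower-case so trivially different
            # spellings of the same value compare equal.
            out[mapping[src_key]] = " ".join(str(value).split()).lower()
    return out


def fingerprint(record: dict) -> str:
    """SHA-256 of the record's canonical JSON (keys sorted), used as a dedup key."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(sources: list[tuple[str, list[dict]]]) -> list[dict]:
    """Record-level de-duplication: keep the first occurrence of each fingerprint."""
    seen: set[str] = set()
    unique: list[dict] = []
    for source, records in sources:
        for raw in records:
            rec = normalize(raw, source)
            key = fingerprint(rec)
            if key not in seen:
                seen.add(key)
                unique.append(rec)
    return unique


if __name__ == "__main__":
    csv_rows = [{"Name": "Alice Smith", "E-mail": "alice@example.org", "City": "Patiala"}]
    json_docs = [
        {"fullName": "alice  smith", "mail": "ALICE@example.org", "location": "Patiala"},
        {"fullName": "Bob Lee", "mail": "bob@example.org", "location": "Patiala"},
    ]
    merged = deduplicate([("csv", csv_rows), ("json", json_docs)])
    print(len(merged))  # 2: the second "Alice Smith" is a duplicate after normalization
```

Exact matching after normalization only catches records that become identical once mapped and canonicalized. Near-duplicates (typos, abbreviations, reordered names) would need a similarity measure such as edit distance, which this sketch does not attempt.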
Pages: 660-665
Number of pages: 6
Related papers
(50 in total)
  • [1] Semantic Data De-duplication for Archival Storage Systems
    Liu, Chuanyi
    Ju, Dapeng
    Gu, Yu
    Zhang, Youhui
    Wang, Dongsheng
    Du, David H. C.
    [J]. 2008 13TH ASIA-PACIFIC COMPUTER SYSTEMS ARCHITECTURE CONFERENCE, 2008, : 154 - +
  • [2] Towards De-duplication Framework in Big Data Analysis. A Case Study
    Maslankowski, Jacek
    [J]. INFORMATION SYSTEMS: DEVELOPMENT, RESEARCH, APPLICATIONS, EDUCATION, 2016, 264 : 104 - 113
  • [3] Secure Static Data De-duplication
    Pawar, Rohit
    Zanwar, Payal
    Bora, Shruti
    Kullkarni, Shweta
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2016, 16 (03): : 69 - 73
  • [4] GDup: De-duplication of Scholarly Communication Big Graphs
    Atzori, Claudio
    Manghi, Paolo
    Bardi, Alessia
    [J]. 2018 IEEE/ACM 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING APPLICATIONS AND TECHNOLOGIES (BDCAT), 2018, : 142 - 151
  • [5] Data De-duplication on Similar File Detection
    Zhu, Yueguang
    Zhang, Xingjun
    Zhao, Runting
    Dong, Xiaoshe
    [J]. 2014 EIGHTH INTERNATIONAL CONFERENCE ON INNOVATIVE MOBILE AND INTERNET SERVICES IN UBIQUITOUS COMPUTING (IMIS), 2014, : 66 - 73
  • [6] Research on Chunking Algorithms of Data De-duplication
    Bo, Cai
    Li, Zhang Feng
    Can, Wang
    [J]. PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON COMMUNICATION, ELECTRONICS AND AUTOMATION ENGINEERING, 2013, 181 : 1019 - 1025
  • [7] An incremental clustering scheme for data de-duplication
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 20 (01) : 152 - 187
  • [8] Sequence of hashes compression in data de-duplication
    Balachandran, Subashini
    Constantinescu, Cornel
    [J]. DCC: 2008 DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2008, : 505 - 505
  • [9] A Distributed and Scalable Solution for Applying Semantic Techniques to Big Data
    Amato, Alba
    Venticinque, Salvatore
    Di Martino, Beniamino
    [J]. INTERNATIONAL JOURNAL OF MOBILE COMPUTING AND MULTIMEDIA COMMUNICATIONS, 2014, 6 (02) : 50 - 67