Towards De-duplication Framework in Big Data Analysis. A Case Study

Cited by: 3
Author
Maslankowski, Jacek [1]
Affiliation
[1] Univ Gdansk, Dept Business Informat, Gdansk, Poland
Keywords
Business informatics; Big Data; Unstructured data; Data analysis; Data quality; ISSUES;
DOI
10.1007/978-3-319-46642-2_7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Big Data analysis gives access to wider perspectives on information. In particular, it allows unstructured and structured data to be processed together. However, having many data sources does not guarantee that data quality is sufficient to provide reliable results. There are several different quality indicators related to Big Data analysis. In this paper we focus on the two that are most critical in the first phase of data processing: ambiguity and duplicates. The goal of this paper is to propose a framework for eliminating duplicates in large datasets acquired through Big Data analysis.
Pages: 104 - 113
Page count: 10
Related papers
50 in total
  • [1] Semantic Analysis of Big Data by Applying De-duplication techniques
    Garg, Sanjeev
    Bala, Anju
    [J]. 2016 INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT), VOL 3, 2015, : 660 - 665
  • [2] Decentralized and Privacy Sensitive Data De-Duplication Framework for Convenient Big Data Management in Cloud Backup Systems
    Jeslin, J. Gnana
    Kumar, P. Mohan
    [J]. SYMMETRY-BASEL, 2022, 14 (07):
  • [3] A data de-duplication access framework for solid state drives
    Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan
    [J]. J. Inf. Sci. Eng., 2012, 5 (941-954):
  • [4] A Data De-duplication Access Framework for Solid State Drives
    Wu, Chin-Hsien
    Wu, Hau-Shan
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2012, 28 (05) : 941 - 954
  • [5] A study on data de-duplication schemes in cloud storage
    Kumar, Priyan Malarvizhi
    Devi, G. Usha
    Basheer, Shakila
    Parthasarathy, P.
    [J]. INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2020, 11 (04) : 509 - 516
  • [6] A proficient cost reduction framework for de-duplication of records in data integration
    Asif Sohail
    Muhammad Murtaza Yousaf
    [J]. BMC Medical Informatics and Decision Making, 16
  • [7] A proficient cost reduction framework for de-duplication of records in data integration
    Sohail, Asif
    Yousaf, Muhammad Murtaza
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2016, 16
  • [8] Secure Static Data De-duplication
    Pawar, Rohit
    Zanwar, Payal
    Bora, Shruti
    Kullkarni, Shweta
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2016, 16 (03) : 69 - 73
  • [9] GDup: De-duplication of Scholarly Communication Big Graphs
    Atzori, Claudio
    Manghi, Paolo
    Bardi, Alessia
    [J]. 2018 IEEE/ACM 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING APPLICATIONS AND TECHNOLOGIES (BDCAT), 2018, : 142 - 151
  • [10] De-duplication Framework to Reduce the Record Linkage Problem
    Dagade, Akshata
    Mali, Manisha
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2017,