Quality-Based Online Data Reconciliation

Cited by: 2
Authors
Abboura, Asma [1]
Sahri, Soror [2]
Baba-Hamed, Latifa [1]
Ouziri, Mourad [2]
Benbernou, Salima [2]
Affiliations
[1] Univ Oran 1, Oran, Algeria
[2] Univ Paris 05, Sorbonne Paris Cite, LIPADE Lab, Paris, France
Keywords
Duplicates; data reconciliation; data quality rules; source quality
DOI
10.1145/2806888
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the main challenges in data matching and data cleaning in highly integrated systems is duplicate detection. While the literature abounds with approaches for detecting duplicates that correspond to the same real-world entity, most of these approaches tend to eliminate duplicates (wrong information) from the sources, leading to what is called data repair. In this article, we propose a framework that automatically detects duplicates at query time and effectively identifies the consistent version of the data, while keeping inconsistent data in the sources. Our framework uses matching dependencies (MDs), through the concept of data reconciliation rules (DRRs), to detect duplicates, and conditional functional dependencies (CFDs) to assess the quality of different attribute values. We also build a duplicate reconciliation index (DRI), based on clusters of duplicates detected by a set of DRRs, to speed up the online data reconciliation process. Our experiments on a real-world data collection show the efficiency and effectiveness of our framework.
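The query-time idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the records, the single MD-like rule (equal e-mail addresses identify one entity), the single CFD-like rule (country FR implies a phone number starting with +33), and all function names are assumptions made up for illustration. The index built here only plays the role that the abstract assigns to the DRI, mapping each record to its duplicate cluster.

```python
from itertools import combinations

# Made-up records; the sources are never modified by the code below.
RECORDS = [
    {"id": 1, "name": "Alice Martin", "email": "alice@example.org",
     "country": "FR", "phone": "+44 20 0000 0001"},
    {"id": 2, "name": "A. Martin", "email": "Alice@Example.org",
     "country": "FR", "phone": "+33 1 00 00 00 01"},
    {"id": 3, "name": "Bob Durand", "email": "bob@example.org",
     "country": "FR", "phone": "+33 1 00 00 00 02"},
]

def md_same_entity(r, s):
    """MD-like rule (assumed): equal e-mails, ignoring case, identify one entity."""
    return r["email"].lower() == s["email"].lower()

def cfd_violations(r):
    """CFD-like rule (assumed): country == 'FR' implies the phone starts with '+33'."""
    if r["country"] == "FR" and not r["phone"].startswith("+33"):
        return {"phone"}
    return set()

def build_index(records):
    """Cluster duplicates with union-find; return {record id: cluster id}."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r, s in combinations(records, 2):
        if md_same_entity(r, s):
            a, b = find(r["id"]), find(s["id"])
            parent[max(a, b)] = min(a, b)  # smallest id becomes the cluster id
    return {rid: find(rid) for rid in parent}

def reconcile(cluster):
    """Pick, per attribute, the first value not implicated in a rule violation."""
    result = {}
    for attr in ("name", "email", "country", "phone"):
        clean = [r[attr] for r in cluster if attr not in cfd_violations(r)]
        result[attr] = clean[0] if clean else cluster[0][attr]
    return result

def query(records, index, name):
    """Answer a lookup at query time, reconciling each matching cluster."""
    hits = {index[r["id"]] for r in records if r["name"] == name}
    return [reconcile([r for r in records if index[r["id"]] == c])
            for c in sorted(hits)]

INDEX = build_index(RECORDS)
ANSWER = query(RECORDS, INDEX, "Alice Martin")
print(ANSWER)
```

In this toy run, records 1 and 2 fall into one cluster, and the reconciled answer takes the phone number from record 2 because record 1 violates the assumed rule, while record 1 itself stays unchanged in the source list.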
Pages: 21
Related papers
50 in total
  • [1] Quality-Based Learning for Web Data Classification
    Wu, Ou
    Hu, Ruiguang
    Mao, Xue
    Hu, Weiming
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 194 - 200
  • [2] Supplier Evaluation with Quality-Based Fuzzy Data
    Chiang, Ching-Yi
    Shu, Ming-Hung
    2009 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT, VOLS 1-4, 2009, : 472 - 476
  • [3] Crowdsourcing Quality Control of Online Information: A Quality-Based Cascade Model
    Fu, Wai-Tat
    Liao, Vera
    SOCIAL COMPUTING, BEHAVIORAL-CULTURAL MODELING AND PREDICTION, 2011, 6589 : 147 - 154
  • [4] Quality-Based Framework for Requirement Analysis in Data Warehouse
    Munawar
    Salim, Naomie
    Ibrahim, Roliana
    2014 INTERNATIONAL CONFERENCE OF ADVANCED INFORMATICS: CONCEPT, THEORY AND APPLICATION (ICAICTA), 2014, : 152 - 158
  • [5] Quality-based medicine
    Toole, JF
    ARCHIVES OF NEUROLOGY, 1997, 54 (01) : 23 - 24
  • [6] For a quality-based politics
    Chabot, JM
    ANNALES DE PATHOLOGIE, 2005, 25 (05) : 415 - 416
  • [7] Data quality-based view selection in big data integration system
    Anter S.
    International Journal of Business Intelligence and Data Mining, 2023, 23 (03) : 264 - 276
  • [8] Quality-based learning
    Schnattinger, K
    Hahn, U
    ECAI 1998: 13TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 1998, : 160 - 164
  • [9] Quality-Based Combination of Multi-Source Precipitation Data
    Jurczyk, Anna
    Szturc, Jan
    Otop, Irena
    Osrodka, Katarzyna
    Struzik, Piotr
    REMOTE SENSING, 2020, 12 (11)
  • [10] Quality-based supplier selection and evaluation using fuzzy data
    Shu, Ming-Hung
    Wu, Hsien-Chung
    COMPUTERS & INDUSTRIAL ENGINEERING, 2009, 57 (03) : 1072 - 1079