Quality-Based Online Data Reconciliation

Cited by: 2
Authors
Abboura, Asma [1]
Sahri, Soror [2]
Baba-Hamed, Latifa [1]
Ouziri, Mourad [2]
Benbernou, Salima [2]
Affiliations
[1] Univ Oran 1, Oran, Algeria
[2] Univ Paris 05, Sorbonne Paris Cite, LIPADE Lab, Paris, France
Keywords
Duplicates; data reconciliation; data quality rules; source quality
DOI
10.1145/2806888
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the main challenges in data matching and data cleaning in highly integrated systems is duplicate detection. While the literature abounds with approaches for detecting duplicates that correspond to the same real-world entity, most of these approaches tend to eliminate the duplicates (wrong information) from the sources, a process known as data repair. In this article, we propose a framework that automatically detects duplicates at query time and effectively identifies the consistent version of the data, while keeping inconsistent data in the sources. Our framework uses matching dependencies (MDs) to detect duplicates through the concept of data reconciliation rules (DRRs), and conditional functional dependencies (CFDs) to assess the quality of different attribute values. We also build a duplicate reconciliation index (DRI), based on clusters of duplicates detected by a set of DRRs, to speed up the online data reconciliation process. Our experiments on a real-world data collection show the efficiency and effectiveness of our framework.
Pages: 21