Efficient data reconciliation

Cited by: 47
Authors
Cochinwala, M
Kurien, V
Lalk, G
Shasha, D
Affiliations
[1] Telcordia Technol Inc, Morristown, NJ 07960 USA
[2] Niksun Inc, N Brunswick, NJ 08902 USA
[3] NYU, New York, NY 10012 USA
DOI
10.1016/S0020-0255(00)00070-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data reconciliation is the process of matching records across different databases. It requires "joining" on fields that have traditionally been non-key fields. Generally, the operational databases are of sufficient quality for the purposes for which they were initially designed. However, because the data in the different databases do not share a canonical structure and may contain errors, approximate matching algorithms are required. Approximate matching algorithms can have many different parameter settings. The number of parameters affects the complexity of the algorithm, because it determines the number of comparisons needed to identify matching records across different datasets. For the large datasets prevalent in data warehouses, the increased complexity may result in impractical solutions. In this paper, we describe an efficient method for data reconciliation. Our main contribution is the incorporation of machine learning and statistical techniques to reduce the complexity of the matching algorithms by identifying and eliminating redundant or useless parameters. We have conducted experiments on actual data that demonstrate the validity of our techniques. In our experiments, the techniques reduced complexity by 50% while significantly increasing matching accuracy. © 2001 Telcordia Technologies Inc. Published by Elsevier Science Inc. All rights reserved.
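The idea in the abstract, approximate matching on non-key fields followed by removal of parameters that do not help, can be illustrated with a small sketch. This is not the authors' method: the record fields, the sample pairs, the 0.8 threshold, the averaging rule, and the greedy pruning loop are all assumptions chosen for illustration, and the string similarity comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive approximate string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a: dict, rec_b: dict, fields: list, threshold: float = 0.8) -> bool:
    """Declare a match when the mean per-field similarity reaches the threshold."""
    scores = [similarity(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold

def accuracy(pairs: list, fields: list, threshold: float = 0.8) -> float:
    """Fraction of labelled pairs that the rule classifies correctly."""
    correct = sum(is_match(a, b, fields, threshold) == label for a, b, label in pairs)
    return correct / len(pairs)

def prune_fields(pairs: list, fields: list, threshold: float = 0.8) -> list:
    """Greedily drop any field whose removal does not lower accuracy (assumed heuristic)."""
    kept = list(fields)
    for f in fields:
        trial = [k for k in kept if k != f]
        # Never drop the last remaining field; keep a drop only if accuracy holds.
        if trial and accuracy(pairs, trial, threshold) >= accuracy(pairs, kept, threshold):
            kept = trial
    return kept

# Hypothetical labelled sample: (record from database A, record from database B, same entity?)
FIELDS = ["name", "address", "note"]
PAIRS = [
    ({"name": "John Smith", "address": "12 Main St", "note": "vip"},
     {"name": "Jon Smith", "address": "12 Main Street", "note": "call back"}, True),
    ({"name": "Mary Jones", "address": "4 Oak Ave", "note": "n/a"},
     {"name": "Mark Johnson", "address": "98 Pine Rd", "note": "n/a"}, False),
]

kept = prune_fields(PAIRS, FIELDS)
print("fields kept:", kept)
print("accuracy with all fields:", accuracy(PAIRS, FIELDS))
print("accuracy with kept fields:", accuracy(PAIRS, kept))
```

Each field that is dropped removes one string comparison per candidate record pair, which is the kind of cost reduction the abstract refers to. A real system would evaluate candidate fields on a much larger labelled sample than the two pairs used here.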
Pages: 1-15
Page count: 15
    [J]. HYDROCARBON PROCESSING, 1994, 73 (03): : 65 - 66