Record matching in data warehouses: A decision model for data consolidation

Cited by: 9
Authors
Dey, D [1]
Affiliations
[1] Univ Washington, Sch Business, Seattle, WA 98195 USA
DOI
10.1287/opre.51.2.240.12779
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
The notion of a data warehouse for integrating operational data into a single repository is rapidly becoming popular in modern organizations. An important issue in the integration process is how to deal with the identifier mismatch problem when combining similar data from disparate sources. A real-world entity may be represented using different identifiers in different operational data sources, and matching them may often be difficult using simple database operations expressed, say, as an SQL query. A record-by-record manual matching is also not practical because the databases may be large. A decision model is presented that combines probability-based automated matching with manual matching in a cost minimization formulation. A heuristic approach is proposed for solving the decision model. Both the model and the heuristic solution approach have been tested on real data. The results from the testing indicate that the model can be effectively used in real-world situations.
Pages: 240-254
Page count: 15