Record matching in data warehouses: A decision model for data consolidation

Times cited: 9
Author
Dey, D. [1]
Affiliation
[1] Univ Washington, Sch Business, Seattle, WA 98195 USA
DOI
10.1287/opre.51.2.240.12779
Chinese Library Classification (CLC) number
C93 [Management science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
The notion of a data warehouse, which integrates operational data into a single repository, is rapidly gaining popularity in modern organizations. An important issue in the integration process is the identifier mismatch problem that arises when similar data from disparate sources are combined: a real-world entity may be represented by different identifiers in different operational data sources, and matching these identifiers is often difficult with simple database operations, such as those expressed in an SQL query. Manual record-by-record matching is also impractical because the databases may be large. A decision model is presented that combines probability-based automated matching with manual matching in a cost-minimization formulation, and a heuristic approach is proposed for solving the model. Both the model and the heuristic solution approach have been tested on real data, and the test results indicate that the model can be used effectively in real-world situations.
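To make the idea of trading off automated and manual matching concrete, here is a minimal sketch of a per-pair expected-cost rule. It is not the paper's formulation: the abstract does not give the model's objective, constraints, or heuristic, and this sketch ignores any interactions among pairs (for example, a requirement that each record match at most one other). It assumes each candidate record pair already has a match probability from some automated matcher, that three hypothetical unit costs are known (a false match, a missed match, and one manual review), and that a manual review always resolves a pair correctly. All names (`Costs`, `decide`, `consolidate`) and record identifiers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    MATCH = "match"              # link the two records automatically
    NON_MATCH = "non-match"      # keep the two records apart automatically
    MANUAL = "manual review"     # send the pair to a person


@dataclass(frozen=True)
class Costs:
    # Hypothetical unit costs; the paper's actual cost structure may differ.
    false_match: float    # cost of linking records of two different entities
    missed_match: float   # cost of keeping apart two records of the same entity
    manual_review: float  # cost of having a person inspect one pair


def decide(p_match: float, costs: Costs) -> tuple[Decision, float]:
    """Return the lowest expected-cost decision for one pair and that cost."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability in [0, 1]")
    expected = {
        # Declaring a match is wrong with probability 1 - p.
        Decision.MATCH: (1.0 - p_match) * costs.false_match,
        # Declaring a non-match is wrong with probability p.
        Decision.NON_MATCH: p_match * costs.missed_match,
        # Assumption: manual review always resolves the pair correctly.
        Decision.MANUAL: costs.manual_review,
    }
    best = min(expected, key=expected.get)  # ties go to the first key listed
    return best, expected[best]


def consolidate(
    pairs: dict[tuple[str, str], float], costs: Costs
) -> tuple[dict[tuple[str, str], Decision], float]:
    """Decide every candidate pair independently; return the plan and total expected cost."""
    plan: dict[tuple[str, str], Decision] = {}
    total = 0.0
    for pair, p in pairs.items():
        decision, cost = decide(p, costs)
        plan[pair] = decision
        total += cost
    return plan, total


if __name__ == "__main__":
    costs = Costs(false_match=10.0, missed_match=5.0, manual_review=1.0)
    # Hypothetical identifier pairs from two source systems with match probabilities.
    candidates = {("A1", "B7"): 0.99, ("A2", "B3"): 0.50, ("A3", "B9"): 0.05}
    plan, total = consolidate(candidates, costs)
    for pair, decision in plan.items():
        print(pair, decision.value)
    print(f"total expected cost: {total:.2f}")
```

With these illustrative costs, the rule auto-matches pairs with a probability of roughly 0.9 or more (where the expected false-match cost drops to the review cost), auto-rejects pairs at roughly 0.2 or less, and routes the uncertain middle band to manual review; raising the review cost narrows that band.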
Pages: 240-254
Page count: 15