Record matching in data warehouses: A decision model for data consolidation

Cited by: 9
Authors
Dey, D [1]
Affiliations
[1] Univ Washington, Sch Business, Seattle, WA 98195 USA
DOI
10.1287/opre.51.2.240.12779
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
The notion of a data warehouse for integrating operational data into a single repository is rapidly becoming popular in modern organizations. An important issue in the integration process is how to deal with the identifier mismatch problem when combining similar data from disparate sources. A real-world entity may be represented using different identifiers in different operational data sources, and matching them may often be difficult using simple database operations expressed, say, as an SQL query. A record-by-record manual matching is also not practical because the databases may be large. A decision model is presented that combines probability-based automated matching with manual matching in a cost minimization formulation. A heuristic approach is proposed for solving the decision model. Both the model and the heuristic solution approach have been tested on real data. The results from the testing indicate that the model can be effectively used in real-world situations.
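The abstract describes combining probability-based automated matching with manual matching under a cost-minimization objective. The sketch below is only an illustration of that general idea, not the paper's formulation or heuristic: it applies a simple per-pair expected-cost rule. The names `Costs` and `decide`, the three cost parameters, and the assumption that a manual review always resolves a pair correctly are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical cost parameters (assumptions, not taken from the paper)."""
    false_match: float      # cost of merging two records that are different entities
    false_nonmatch: float   # cost of keeping apart two records of the same entity
    manual_review: float    # cost of a human checking one candidate pair


def decide(p_match: float, costs: Costs) -> str:
    """Pick the action with the lowest expected cost for one candidate pair.

    p_match is an estimated probability that the two records refer to the
    same real-world entity. Manual review is assumed to be always correct,
    so its expected cost is just the review cost.
    """
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must lie in [0, 1]")
    expected = {
        "accept": (1.0 - p_match) * costs.false_match,   # wrong only if not a match
        "reject": p_match * costs.false_nonmatch,        # wrong only if a match
        "review": costs.manual_review,
    }
    return min(expected, key=expected.get)


if __name__ == "__main__":
    costs = Costs(false_match=10.0, false_nonmatch=10.0, manual_review=1.0)
    for p in (0.99, 0.50, 0.01):
        print(p, decide(p, costs))
```

Under these assumed costs, pairs with a very high or very low match probability are handled automatically, and only the ambiguous middle range is routed to manual review. A full model of the kind the abstract describes would also have to handle constraints across pairs (for example, one record matching at most one counterpart), which this per-pair rule ignores.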
Pages: 240-254
Page count: 15