The Interaction Between Schema Matching and Record Matching in Data Integration

Cited by: 14
Authors
Gu, Binbin [1 ]
Li, Zhixu [1 ]
Zhang, Xiangliang [2 ]
Liu, An [1 ]
Liu, Guanfeng [1 ]
Zheng, Kai [1 ]
Zhao, Lei [1 ]
Zhou, Xiaofang [1 ,3 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] King Abdullah Univ Sci & Technol, Jeddah 239556900, Thuwal, Saudi Arabia
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
Funding
Australian Research Council;
Keywords
Data integration; schema matching; record matching; LINKAGE;
DOI
10.1109/TKDE.2016.2611577
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables with different schemas: SM unifies the schemas, and RM detects records that refer to the same real-world entity. The two processes have been thoroughly studied separately, but little attention has been paid to the interaction between SM and RM. In this work, we find that, even when they are alternated in a simple manner, SM and RM can benefit from each other and reach better integration performance (i.e., in terms of precision and recall). Combining SM and RM is therefore a promising way to improve data integration. To this end, we define novel matching rules for SM and RM, respectively: every SM decision is made based on intermediate RM results, and vice versa, so that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and by the control of semantic drift, which together prevent mismatch magnification. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that together reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that combining and interleaving SM and RM significantly outperforms previous works that conduct SM and RM separately.
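The abstract mentions an index structure based on q-grams but does not describe it. As a rough illustration of the general idea only, the sketch below builds a generic q-gram inverted index and uses it to shortlist value pairs that share enough q-grams, so that pairwise comparison is not needed for every pair. This is not the authors' index; the function names (`qgrams`, `build_index`, `candidate_pairs`), the `#` padding character, and the `min_shared` threshold are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased, '#'-padded string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(values, q=2):
    """Map each q-gram to the ids (list positions) of values containing it."""
    index = defaultdict(set)
    for vid, value in enumerate(values):
        for gram in qgrams(value, q):
            index[gram].add(vid)
    return index


def candidate_pairs(values, q=2, min_shared=3):
    """Return id pairs whose values share at least min_shared q-grams.

    min_shared is a hypothetical threshold chosen for illustration.
    """
    index = build_index(values, q)
    shared = defaultdict(int)
    for ids in index.values():
        # Every pair of values listed under the same q-gram shares that gram.
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    names = ["Soochow University", "Soochow Univ", "Queensland"]
    print(candidate_pairs(names))
```

Only pairs that co-occur under at least one q-gram are ever counted, which is where the saving over an all-pairs comparison comes from; how the paper combines its index with the greedy search is not stated in the abstract.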
Pages: 186-199
Page count: 14
Related papers
50 in total
  • [31] Research of matching technology in data integration
    College of Computer and Information Engineering, Hohai University, Nanjing 210098, China
    Jisuanji Gongcheng, 2006, 6 (40-41):
  • [32] Ontology-based GML schema matching for spatial information integration
    Guan, JH
    Zhou, SG
    Chen, JP
    Chen, XL
    An, Y
    Yu, W
    Wang, R
    Liu, XJ
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2240 - 2245
  • [33] Schema label normalization for improving schema matching
    Sorrentino, Serena
    Bergamaschi, Sonia
    Gawinecki, Maciej
    Po, Laura
    DATA & KNOWLEDGE ENGINEERING, 2010, 69 (12) : 1254 - 1273
  • [34] Schema homomorphism - An algebraic framework for schema matching
    Zhang, Z
    Che, HY
    Shi, PF
    Sun, Y
    Gu, J
    ADVANCES IN COMPUTER SCIENCE - ASIAN 2005, PROCEEDINGS: DATA MANAGEMENT ON THE WEB, 2005, 3818 : 255 - 256
  • [35] Unsupervised record matching with noisy and incomplete data
    van Gennip Y.
    Hunter B.
    Ma A.
    Moyer D.
    de Vera R.
    Bertozzi A.L.
    van Gennip, Yves (y.vangennip@nottingham.ac.uk), 2018, Springer Science and Business Media Deutschland GmbH (06) : 109 - 129
  • [36] An Ensemble Approach for Record Matching in Data Linkage
    Poon, Simon K.
    Poon, Josiah
    Lam, Mary K.
    Yin, Qinglan
    Sze, Daniel M-Y.
    Wu, Justin C. Y.
    Mok, Vincent C. T.
    Ching, Jessica Y. L.
    Chan, Kam-Leung
    Cheung, William H. N.
    Lau, Alexander Y.
    DIGITAL HEALTH INNOVATION FOR CONSUMERS, CLINICIANS, CONNECTIVITY AND COMMUNITY, 2016, 227 : 113 - 119
  • [37] An algebraic framework for schema matching
    Zhang, Zhi
    Shi, Pengfei
    Che, Haoyang
    Gu, Jun
    INFORMATICA, 2008, 19 (03) : 421 - 446
  • [38] An algebraic framework for schema matching
    Zhang, Z
    Che, HY
    Shi, PF
    Sun, Y
    Gu, J
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2005, 3739 : 694 - 699
  • [39] Database conceptual schema matching
    Casanova, Marco A.
    Breitman, Karin K.
    Brauner, Daniela F.
    Marins, Andre L. A.
    COMPUTER, 2007, 40 (10) : 102 - 104
  • [40] Soundness of schema matching methods
    Benerecetti, M
    Bouquet, P
    Zanobini, S
    SEMANTIC WEB: RESEARCH AND APPLICATIONS, PROCEEDINGS, 2005, 3532 : 211 - 225