The Interaction Between Schema Matching and Record Matching in Data Integration

Cited by: 14
Authors
Gu, Binbin [1]
Li, Zhixu [1]
Zhang, Xiangliang [2]
Liu, An [1]
Liu, Guanfeng [1]
Zheng, Kai [1]
Zhao, Lei [1]
Zhou, Xiaofang [1, 3]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] King Abdullah Univ Sci & Technol, Jeddah 239556900, Thuwal, Saudi Arabia
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
Funding
Australian Research Council
Keywords
Data integration; schema matching; record matching; LINKAGE;
DOI
10.1109/TKDE.2016.2611577
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables with different schemas: SM unifies the schemas, and RM detects records that refer to the same real-world entity. The two processes have been thoroughly studied separately, but little attention has been paid to the interaction between them. In this work, we find that, even when they are simply alternated, SM and RM can benefit from each other and reach better integration performance in terms of precision and recall. Combining SM and RM is therefore a promising way to improve data integration. To this end, we define novel matching rules for SM and RM, respectively: every SM decision is based on intermediate RM results, and vice versa, so that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and by controlling semantic drift, which together prevent mismatches from being magnified. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that combining SM and RM and letting them interact significantly outperforms previous work that conducts SM and RM separately.
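The abstract mentions an index structure based on q-grams but does not describe it. As a rough illustration of the general idea only, the sketch below builds a generic q-gram inverted index and uses it to shortlist candidate record matches by Jaccard similarity. It is not the authors' index or their greedy search; the function names, the `#` padding, the choice of q = 2, and the 0.5 threshold are all assumptions made for this example.

```python
from collections import defaultdict


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased string padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(values, q=2):
    """Map each q-gram to the ids of the records whose value contains it."""
    index = defaultdict(set)
    for rid, value in values.items():
        for gram in qgrams(value, q):
            index[gram].add(rid)
    return index


def candidates(query, index, values, q=2, min_jaccard=0.5):
    """Return (record id, Jaccard score) pairs that pass the threshold.

    Only records sharing at least one q-gram with the query are scored,
    which is where the index avoids comparing against every record.
    """
    query_grams = qgrams(query, q)
    seen = set()
    for gram in query_grams:
        seen |= index.get(gram, set())
    result = []
    for rid in seen:
        grams = qgrams(values[rid], q)
        score = len(query_grams & grams) / len(query_grams | grams)
        if score >= min_jaccard:
            result.append((rid, score))
    # Highest score first; ties broken by record id for determinism.
    return sorted(result, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical attribute values from one table, keyed by record id.
table_b = {1: "Jon Smith", 2: "Mary Jones", 3: "John Smyth"}
index = build_index(table_b)
matches = candidates("John Smith", index, table_b)
```

Here `matches` keeps records 1 and 3 and drops record 2, which shares only a single q-gram with the query. In an alternating SM/RM setting, such shortlists would limit how many record pairs have to be re-examined after each schema-matching decision, though how the paper achieves its reported savings is not stated in this record.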
Pages: 186-199
Page count: 14