Entity Matching across Heterogeneous Sources

Cited by: 21
Authors
Yang, Yang [1]
Sun, Yizhou [3]
Tang, Jie [1, 2]
Ma, Bo [4]
Li, Juanzi [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol TNList, Beijing, Peoples R China
[3] Northeastern Univ, Dept Comp Sci, Boston, MA 02115 USA
[4] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation;
Keywords
Heterogeneous sources; Cross-lingual matching; Topic model;
DOI
10.1145/2783258.2783353
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given an entity in a source domain, finding its matched entities in another (target) domain is an important task in many applications. Traditionally, the problem has been addressed by first extracting the major keywords of the source entity and then querying the target domain for relevant entities using those keywords. However, this method inevitably fails if the two domains have little or no overlap in content. An extreme case is when the source domain is in English and the target domain is in Chinese. In this paper, we formalize the problem as entity matching across heterogeneous sources and propose a probabilistic topic model to solve it. The model integrates topic extraction and entity matching, the two core subtasks of the problem, into a unified model. Specifically, to handle the problem of disjoint text, we use a cross-sampling process in our model to extract topics with terms coming from all the sources, and we leverage existing matching relations through latent topic layers instead of at the text layer. With the proposed model, we can not only find the matched documents for a query entity, but also explain why these documents are related by showing the common topics they share. Our experiments in two real-world applications show that the proposed model substantially improves matching performance (+19.8% and +7.1% in the two applications, respectively) compared with several alternative methods.
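The contrast drawn in the abstract between keyword-level and topic-level matching can be illustrated with a small sketch. This is not the authors' model: the cross-sampling inference is not reproduced here, and the topic mixtures below are hypothetical values assumed to be already available. All names and data (`source_terms`, `target_topics`, `rank_by_topics`, the toy documents) are invented for illustration only.

```python
import math

def jaccard(a, b):
    """Keyword overlap between two term lists (0.0 when nothing is shared)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cosine(p, q):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Hypothetical toy data: an English source entity and two Chinese target documents.
source_terms = ["battery", "charging", "circuit"]
target_terms = {
    "doc_a": ["电池", "充电", "电路"],
    "doc_b": ["相机", "镜头", "图像"],
}

# Hypothetical topic mixtures over three shared latent topics.
# Assumption: some topic model has already produced these; they are not learned here.
source_topics = [0.8, 0.1, 0.1]
target_topics = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.1, 0.1, 0.8],
}

def rank_by_topics(query, candidates):
    """Rank candidate documents by topic-space similarity to the query."""
    return sorted(candidates, key=lambda d: cosine(query, candidates[d]), reverse=True)

# Text-level matching: vocabularies are disjoint, so every score is zero.
keyword_scores = {d: jaccard(source_terms, t) for d, t in target_terms.items()}

# Topic-level matching: documents can still be ranked.
ranking = rank_by_topics(source_topics, target_topics)

print(keyword_scores)
print(ranking)
```

In this toy setting the keyword scores carry no signal because the two vocabularies never intersect, while the comparison in the shared topic space still separates the two candidates. The shared topics with high weight in both mixtures are also what would be shown to explain a match.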
Pages: 1395 - 1404
Number of pages: 10
Related papers
50 in total
  • [11] Holistic Entity Matching Across Knowledge Graphs
    Pershina, Maria
    Yakout, Mohamed
    Chakrabarti, Kaushik
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 1585 - 1590
  • [12] Project entity matching across FLOSS repositories
    Conklin, Megan
    OPEN SOURCE DEVELOPMENT, ADOPTION AND INNOVATION, 2007, 234 : 45 - 57
  • [13] Account Matching Across Heterogeneous Networks
    Liu, Qiang
    Li, Jingyuan
    Wang, Yuanzhuo
    Xing, Guoliang
    Ren, Yan
    2014 5th International Conference on Game Theory for Networks (GAMENETS), 2014,
  • [14] Probabilistic decision model for entity matching in heterogeneous databases
    Univ of Washington, Seattle, United States
    Manage Sci, 10 (1379-1395):
  • [15] A probabilistic decision model for entity matching in heterogeneous databases
    Dey, D
    Sarkar, S
    De, P
    MANAGEMENT SCIENCE, 1998, 44 (10) : 1379 - 1395
  • [16] Entity matching in heterogeneous databases: A logistic regression approach
    Dey, Debabrata
    DECISION SUPPORT SYSTEMS, 2008, 44 (03) : 740 - 747
  • [17] A Semantic Matching Approach for Mediating Heterogeneous Sources
    Schneider, Michel
    Bejaoui, Lotfi
    Bertin, Guillaume
    METADATA AND SEMANTICS, 2009, : 537 - +
  • [18] Deep Indexed Active Learning for Matching Heterogeneous Entity Representations
    Jain, Arjit
    Sarawagi, Sunita
    Sen, Prithviraj
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 15 (01): : 31 - 45
  • [19] Word Embedding based Heterogeneous Entity Matching on Web of Things
    Xue, Xingsi
    Guo, Jianhua
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 941 - 947
  • [20] Cross-Lingual Entity Matching for Heterogeneous Online Wikis
    Lu, Weiming
    Wang, Peng
    Wang, Huan
    Liu, Jiahui
    Dai, Hao
    Wei, Baogang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 887 - 899