EnAli: entity alignment across multiple heterogeneous data sources

Cited by: 0
Authors
Chao Kong
Ming Gao
Chen Xu
Yunbin Fu
Weining Qian
Aoying Zhou
Affiliations
[1] East China Normal University, School of Data Science and Engineering
[2] Technische Universität Berlin
Source
Keywords
entity alignment; exponential family; locality sensitive hashing; EM-algorithm
DOI
Not available
Abstract
Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entity as entities in the other sources. Identifying entities across heterogeneous data sources is central to many research fields, such as data cleaning, data integration, information retrieval, and machine learning. The alignment process is not only prohibitively expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model that incorporates the heterogeneous entity attributes via the exponential family, handles missing values, and uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the alignment process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
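To make the candidate-reduction idea in the abstract concrete, the following is a minimal sketch of locality-sensitive hashing used as a blocking step. It is not taken from the paper: the abstract does not say which hash family EnAli uses, so MinHash over character 3-grams with banding is assumed here purely for illustration, and the names `shingles`, `minhash_signature`, and `candidate_pairs` are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a whitespace/case-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value of the token set per hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(records, num_hashes=32, bands=16):
    """Return id pairs whose signatures agree on at least one band.

    records: dict mapping a record id to a string attribute (an assumption;
    real tuples would carry several heterogeneous attributes).
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            # Records that share a whole band land in the same bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then be scored by a matching model, instead of all pairs of tuples across the sources. The band and row counts trade recall against the number of candidates and are illustrative defaults, not values from the paper.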
Pages: 157-169
Page count: 12
Published in: Frontiers of Computer Science, 2019, 13 (01): 157-169