OAG: Toward Linking Large-scale Heterogeneous Entity Graphs

Cited by: 59
Authors
Zhang, Fanjin [1]
Liu, Xiao [1]
Tang, Jie [1]
Dong, Yuxiao [2]
Yao, Peiran [1]
Zhang, Jie [1]
Gu, Xiaotao [1,3,4]
Wang, Yan [1]
Shao, Bin [2]
Li, Rui [2,3,4]
Wang, Kuansan [2]
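Affiliations
[1] Tsinghua University, Beijing National Research Center for Information Science and Technology, Beijing, People's Republic of China
[2] Microsoft Research, Redmond, WA, USA
[3] University of Illinois at Urbana-Champaign (UIUC), Champaign, IL, USA
[4] Google, Menlo Park, CA, USA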
Keywords
Entity Linking; Name Ambiguity; Heterogeneous Networks; OAG
DOI
10.1145/3292500.3330785
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Linking entities from different sources is a fundamental task in building open knowledge graphs. Despite much research in related fields, the challenges of linking large-scale heterogeneous entity graphs are far from resolved. Using two billion-scale academic entity graphs (Microsoft Academic Graph and AMiner) as the sources for our study, we propose a unified framework, LinKG, to address the problem of building a large-scale linked entity graph. LinKG couples three linking modules, each of which addresses one category of entities. To link word-sequence-based entities (e.g., venues), we present a method based on long short-term memory (LSTM) networks that captures the dependencies within word sequences. To link large-scale entities (e.g., papers), we leverage locality-sensitive hashing and convolutional neural networks for scalable and precise linking. To link entities with ambiguity (e.g., authors), we propose heterogeneous graph attention networks to model different types of entities. Our extensive experiments and systematic analysis demonstrate that LinKG achieves a linking F1-score of 0.9510, significantly outperforming state-of-the-art methods. LinKG has been deployed to Microsoft Academic Search and AMiner to integrate the two large graphs. We have published the linked results, the Open Academic Graph (OAG), making it the largest publicly available heterogeneous academic graph to date.
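The abstract states that paper linking relies on locality-sensitive hashing (LSH) to make the problem scalable, followed by a CNN for precise matching. It does not say which hash family or input features are used. The sketch below therefore assumes MinHash signatures over lower-cased title words, combined with the standard banding trick. It covers only the candidate-generation step, not the CNN. All names and parameter values here (`NUM_HASHES`, `BANDS`, `candidate_pairs`, and so on) are hypothetical and are not taken from LinKG.

```python
import hashlib
from collections import defaultdict
from itertools import product

NUM_HASHES = 32           # signature length (assumed, not from the paper)
BANDS = 8                 # 8 bands of 4 rows each (assumed)
ROWS = NUM_HASHES // BANDS


def tokens(title: str) -> set[str]:
    """Lower-cased word set of a title (hypothetical feature choice)."""
    return set(title.lower().split())


def seeded_hash(seed: int, word: str) -> int:
    """One member of a family of 64-bit hash functions, indexed by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{word}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(words: set[str]) -> list[int]:
    """For each hash function, keep the minimum hash value over the word set."""
    return [min(seeded_hash(seed, w) for w in words) for seed in range(NUM_HASHES)]


def candidate_pairs(left: dict[str, str], right: dict[str, str]) -> set[tuple[str, str]]:
    """Return (left_id, right_id) pairs whose signatures agree on at least one band."""
    # Each bucket holds ([ids from left source], [ids from right source]).
    buckets: dict = defaultdict(lambda: ([], []))
    for side, papers in enumerate((left, right)):
        for pid, title in papers.items():
            words = tokens(title)
            if not words:  # MinHash is undefined for an empty set, so skip
                continue
            sig = minhash_signature(words)
            for b in range(BANDS):
                key = (b, tuple(sig[b * ROWS:(b + 1) * ROWS]))
                buckets[key][side].append(pid)
    pairs = set()
    for lefts, rights in buckets.values():
        # Only cross-source pairs are candidates; same-source collisions are ignored.
        pairs.update(product(lefts, rights))
    return pairs


if __name__ == "__main__":
    mag = {"m1": "Heterogeneous Graph Attention Network",
           "m2": "Deep Residual Learning for Image Recognition"}
    aminer = {"a1": "heterogeneous graph attention network",
              "a2": "Attention Is All You Need"}
    # ("m1", "a1") is always included: identical word sets give identical signatures.
    print(candidate_pairs(mag, aminer))
```

Two titles with Jaccard similarity s agree on a given 4-row band with probability s^4. With 8 bands, highly similar titles almost always collide, while dissimilar ones rarely do. In a pipeline like the one the abstract describes, only the returned pairs would then go to the more expensive matching model.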
Pages: 2585-2595
Number of pages: 11
Related papers
  • [1] OAG: Linking Entities Across Large-Scale Heterogeneous Knowledge Graphs
    Zhang, Fanjin
    Liu, Xiao
    Tang, Jie
    Dong, Yuxiao
    Yao, Peiran
    Zhang, Jie
    Gu, Xiaotao
    Wang, Yan
    Kharlamov, Evgeny
    Shao, Bin
    Li, Rui
    Wang, Kuansan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (09) : 9225 - 9239
  • [2] Generating Large-Scale Heterogeneous Graphs for Benchmarking
    Gupta, Amarnath
    [J]. SPECIFYING BIG DATA BENCHMARKS, 2014, 8163 : 113 - 128
  • [3] Large-scale neural biomedical entity linking with layer overwriting
    Tsujimura, Tomoki
    Miwa, Makoto
    Sasaki, Yutaka
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 143
  • [4] SEED: A System for Entity Exploration and Debugging in Large-Scale Knowledge Graphs
    Chen, Jun
    Chen, Yueguo
    Du, Xiaoyong
    Zhang, Xiangling
    Zhou, Xuan
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016 : 1350 - 1353
  • [5] Constructing the Three Graphs for the Large-Scale Heterogeneous Information System
    Li, Bing
    [J]. 2016 IEEE 2ND INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (IEEE CIC), 2016 : 292 - 303
  • [6] Routing the Social Graphs for the Large-Scale Heterogeneous Information Accessing
    Li, Bing
    [J]. 2016 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2016 : 122 - 131
  • [7] A collective entity linking algorithm with parallel computing on large-scale knowledge base
    Xia, Yingchun
    Wang, Xingyue
    Gu, Lichuan
    Gao, Qijuan
    Jiao, Jun
    Wang, Chao
    [J]. JOURNAL OF SUPERCOMPUTING, 2020, 76 (02) : 948 - 963
  • [8] Toward Tweet Entity Linking With Heterogeneous Information Networks
    Shen, Wei
    Yin, Yuwei
    Yang, Yang
    Han, Jiawei
    Wang, Jianyong
    Yuan, Xiaojie
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (12) : 6003 - 6017
  • [9] Integrating Manifold Knowledge for Global Entity Linking with Heterogeneous Graphs
    Chen, Zhibin
    Wu, Yuting
    Feng, Yansong
    Zhao, Dongyan
    [J]. KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: KNOWLEDGE GRAPH EMPOWERS NEW INFRASTRUCTURE CONSTRUCTION, 2021, 1466 : 91 - 103