Cross-Lingual Entity Matching for Heterogeneous Online Wikis

Cited by: 0
Authors
Lu, Weiming [1]
Wang, Peng [1]
Wang, Huan [1]
Liu, Jiahui [1]
Dai, Hao [1]
Wei, Baogang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Keywords
DOI
10.1007/978-3-319-73618-1_78
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge bases play an increasingly important role in many applications. However, many knowledge bases focus mainly on English knowledge and contain little knowledge for low-resource languages (LLs). If the entities in LLs can be mapped to those in high-resource languages (HLs), much knowledge, such as relations between entities, can be transferred from HLs to LLs. In this paper, we propose an efficient and effective Cross-Lingual Entity Matching approach (CL-EM) that enriches the existing cross-lingual links through a learning-to-rank framework with learned language-independent features, including cross-lingual topic features and document embedding features. In the experiments, we verified our approach on the existing cross-lingual links between Chinese Wikipedia and English Wikipedia by comparing it with other state-of-the-art approaches. In addition, we discovered 141,754 new cross-lingual links between Baidu Baike and English Wikipedia, which almost doubles the number of existing cross-lingual links.
Pages: 887-899
Number of pages: 13
Related papers
50 in total
  • [21] Cross-lingual Named Entity List Search via Transliteration
    Khakhmovich, Aleksandr
    Pavlova, Svetlana
    Kirillova, Kira
    Arefyev, Nikolay
    Savilova, Ekaterina
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4247 - 4255
  • [22] A Study of Neural Matching Models for Cross-lingual IR
    Yu, Puxuan
    Allan, James
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1637 - 1640
  • [23] Neural Cross-Lingual Named Entity Recognition with Minimal Resources
    Xie, Jiateng
    Yang, Zhilin
    Neubig, Graham
    Smith, Noah A.
    Carbonell, Jaime
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 369 - 379
  • [24] Medical Crossing: a Cross-lingual Evaluation of Clinical Entity Linking
    Alekseev, Anton
    Miftahutdinov, Zulfat
    Tutubalina, Elena
    Shelmanov, Artem
    Ivanov, Vladimir
    Kokh, Vladimir
    Nesterov, Alexander
    Avetisian, Manvel
    Chertok, Andrey
    Nikolenko, Sergey
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4212 - 4220
  • [25] Zero-Resource Cross-Lingual Named Entity Recognition
    Bari, M. Saiful
    Joty, Shafiq
    Jwalapuram, Prathyusha
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7415 - 7423
  • [26] Towards an entity relation extraction framework in the cross-lingual context
    Yu, Chuanming
    Xue, Haodong
    Wang, Manyi
    An, Lu
    [J]. ELECTRONIC LIBRARY, 2021, 39 (03): : 411 - 434
  • [27] Cross-lingual Transfer Learning for Japanese Named Entity Recognition
    Johnson, Andrew
    Karanasou, Penny
    Gaspers, Judith
    Klakow, Dietrich
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 2 (INDUSTRY PAPERS), 2019, : 182 - 189
  • [28] Cross-Lingual Transfer Learning for Medical Named Entity Recognition
    Ding, Pengjie
    Wang, Lei
    Liang, Yaobo
    Lu, Wei
    Li, Linfeng
    Wang, Chun
    Tang, Buzhou
    Yan, Jun
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 403 - 418
  • [29] Improving Cross-lingual Entity Alignment via Optimal Transport
    Pei, Shichao
    Yu, Lu
    Zhang, Xiangliang
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3231 - 3237
  • [30] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    [J]. 36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688