Large-scale neural biomedical entity linking with layer overwriting

Cited by: 2
Authors
Tsujimura, Tomoki [1]
Miwa, Makoto [1]
Sasaki, Yutaka [1]
Affiliation
[1] Toyota Technol Inst, Computat Intelligence Lab, 2-12-1 Hisakata,Tempaku Ku, Nagoya, Aichi 4688511, Japan
Keywords
Natural language processing; Entity linking; Cosine similarity; Data augmentation; Layer overwriting; BLAST
DOI
10.1016/j.jbi.2023.104433
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Motivation: Entity linking is the task of linking entity mentions to their corresponding database entries. It allows superficially different but semantically identical mentions to be treated as the same entity. Because biomedical databases list millions of concepts, selecting the correct database entry for each target entity is challenging. Simple string matching between a mention and each synonym in a biomedical database cannot handle the wide variety of variants of biomedical entities that appear in the biomedical literature. Recent neural approaches are promising for entity linking, but existing neural methods require sufficient training data, which is difficult to prepare for biomedical entity linking over millions of concepts. We therefore need a new neural method that can train entity-linking models on sparse training data covering only a very limited part of the biomedical concepts.

Results: We have devised a pure neural model that classifies biomedical entity mentions into millions of biomedical concepts. The classifier employs (1) layer overwriting, which breaks through the performance ceiling during training; (2) training data augmentation using database entries, which compensates for insufficient training data; and (3) a cosine similarity-based loss function, which helps distinguish among millions of biomedical concepts. Our system using the proposed classifier was ranked first in the official run of the National NLP Clinical Challenges (n2c2) 2019 Track 3, which targeted linking medical/clinical entity mentions to 434,056 Concept Unique Identifier (CUI) entries. We also applied our system to the MedMentions dataset, which has 3.2M candidate concepts, and the experimental results confirmed the same advantages of the proposed method. We further evaluated our system on the NLM-CHEM corpus, which has 350K candidate concepts, where it achieved new state-of-the-art performance.
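The abstract names the augmentation and the loss but does not give their exact form. The sketch below illustrates both under stated assumptions: augmentation is taken to mean adding every database synonym as an extra training example labeled with its concept ID, and the cosine similarity-based loss is assumed to be a scaled cosine softmax (cross-entropy over scale * cos(h, w_c) for a mention vector h and one weight vector w_c per concept), a common formulation that the abstract does not confirm. All function names, the `scale` parameter, and the toy concept IDs are hypothetical; layer overwriting is not sketched.

```python
import math
from typing import Dict, List, Sequence, Tuple


def augment_with_synonyms(
    mentions: List[Tuple[str, str]],
    dictionary: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: return the annotated (mention, concept_id) pairs
    plus one (synonym, concept_id) pair per dictionary synonym, so that
    concepts absent from the annotated corpus still get training examples."""
    augmented = list(mentions)
    for concept_id, synonyms in dictionary.items():
        for synonym in synonyms:
            augmented.append((synonym, concept_id))
    return augmented


def _unit(v: Sequence[float]) -> List[float]:
    # L2-normalize a vector; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_logits(
    h: Sequence[float], weights: Sequence[Sequence[float]], scale: float
) -> List[float]:
    # One logit per concept: scale * cos(h, w_c). Assumed form, not the paper's.
    h_unit = _unit(h)
    return [scale * sum(a * b for a, b in zip(h_unit, _unit(w))) for w in weights]


def cosine_softmax_loss(
    h: Sequence[float],
    weights: Sequence[Sequence[float]],
    target: int,
    scale: float = 16.0,
) -> float:
    # Cross-entropy over the scaled cosine logits, using log-sum-exp for stability.
    logits = cosine_logits(h, weights, scale)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]
```

Because both the mention vector and each concept vector are normalized, every logit lies in [-scale, scale] and depends only on direction, not on vector norm; this is one plausible reason a cosine-based loss helps when the label space has millions of concepts, but the choice of `scale` here is an assumption.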
Pages: 11