Large-scale neural biomedical entity linking with layer overwriting

Cited by: 2
Authors
Tsujimura, Tomoki [1 ]
Miwa, Makoto [1 ]
Sasaki, Yutaka [1 ]
Affiliations
[1] Toyota Technol Inst, Computat Intelligence Lab, 2-12-1 Hisakata,Tempaku Ku, Nagoya, Aichi 4688511, Japan
Keywords
Natural language processing; Entity linking; Cosine similarity; Data augmentation; Layer overwriting; BLAST;
DOI
10.1016/j.jbi.2023.104433
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Motivation: Entity linking is the task of linking entity mentions to their corresponding database entries. It allows superficially different but semantically identical mentions to be treated as the same entity. Since biomedical databases list millions of concepts, selecting the correct database entry for each targeted entity is challenging. Simple string matching between a mention and each synonym in biomedical databases is insufficient to handle the wide variety of variants of biomedical entities that appear in the biomedical literature. Recent progress in neural approaches is promising for entity linking. Still, existing neural methods require sufficient training data, which is difficult to prepare for biomedical entity linking because it deals with millions of biomedical concepts. Therefore, a new neural method is needed to train entity-linking models on sparse training data that covers only a very limited part of the biomedical concepts.

Results: We have devised a pure neural model that classifies biomedical entity mentions into millions of biomedical concepts. The classifier employs (1) layer overwriting, which breaks through the performance ceiling during training, (2) training data augmentation using database entries, which compensates for insufficient training data, and (3) a cosine similarity-based loss function, which helps distinguish the millions of biomedical concepts. Our system using the proposed classifier was ranked first in the official run of the National NLP Clinical Challenges (n2c2) 2019 Track 3, which targeted linking medical/clinical entity mentions to 434,056 Concept Unique Identifier (CUI) entries. We also applied our system to the MedMentions dataset, which has 3.2M candidate concepts. Experimental results confirmed the same advantages of our proposed method. We further evaluated our system on the NLM-CHEM corpus with 350K candidate concepts, and our system achieved a new state-of-the-art performance on the corpus.
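The abstract does not state the exact form of the cosine similarity-based loss, so the following is only a generic sketch of the idea and not the authors' implementation. It assumes each concept has an embedding vector, scores a mention vector against every concept by cosine similarity, and applies softmax cross-entropy to the scaled similarities. The function names, the `scale` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_softmax_loss(mention_vec, concept_vecs, gold_index, scale=10.0):
    """Cross-entropy over scaled cosine similarities (illustrative only).

    `scale` is an assumed temperature-like factor; the abstract does not
    specify one.
    """
    logits = [scale * cosine(mention_vec, c) for c in concept_vecs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold_index]


def predict(mention_vec, concept_vecs):
    """Return the index of the concept with the highest cosine similarity."""
    sims = [cosine(mention_vec, c) for c in concept_vecs]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy example with three hypothetical concept embeddings.
concepts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mention = [2.0, 0.1]

best = predict(mention, concepts)
loss_correct = cosine_softmax_loss(mention, concepts, gold_index=0)
loss_wrong = cosine_softmax_loss(mention, concepts, gold_index=2)
```

Because cosine similarity ignores vector length, the loss depends only on the direction of the mention and concept vectors. In this toy setup the loss is lower when the gold label is the concept nearest in angle to the mention.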
Pages: 11