NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding

Cited by: 0
Authors
Kanix Wang
Robert Stevens
Halima Alachram
Yu Li
Larisa Soldatova
Ross King
Sophia Ananiadou
Annika M. Schoene
Maolin Li
Fenia Christopoulou
José Luis Ambite
Joel Matthew
Sahil Garg
Ulf Hermjakob
Daniel Marcu
Emily Sheng
Tim Beißbarth
Edgar Wingender
Aram Galstyan
Xin Gao
Brendan Chambers
Weidi Pan
Bohdan B. Khomtchouk
James A. Evans
Andrey Rzhetsky
Affiliations
[1] University of Chicago,The Committee on Genetics, Genomics, and Systems Biology
[2] University of Chicago,The Institute of Genomics and Systems Biology
[3] University of Manchester,Department of Computer Science
[4] University of Göttingen,Institute of Medical Bioinformatics
[5] Computer,Computational Bioscience Research Center
[6] Electrical and Mathematical Sciences and Engineering Division King Abdullah University of Science and Technology (KAUST) Thuwal,Department of Chemical Engineering and Biotechnology
[7] Goldsmiths,Department of Biology and Biological Engineering
[8] University of London,National Centre for Text Mining
[9] University of Cambridge,The Information Sciences Institute
[10] Alan Turing Institute,Knowledge Lab, Department of Sociology
[11] Chalmers University of Technology,Master of Science in Statistics Program
[12] University of Manchester,Department of Medicine
[13] University of Southern California,Department of Human Genetics
[14] geneXplain GmbH
[15] University of Chicago
[16] University of Chicago
[17] University of Chicago
[18] University of Chicago
DOI
Not available
Abstract
Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades (refs. 1, 2), the most dramatic advances in MR have followed in the wake of critical corpus development (ref. 3). Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet (ref. 4) was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.
Related papers
4 in total
  • [1] NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding
    Wang, Kanix
    Stevens, Robert
    Alachram, Halima
    Li, Yu
    Soldatova, Larisa
    King, Ross
    Ananiadou, Sophia
    Schoene, Annika M.
    Li, Maolin
    Christopoulou, Fenia
    Ambite, Jose Luis
    Matthew, Joel
    Garg, Sahil
    Hermjakob, Ulf
    Marcu, Daniel
    Sheng, Emily
    Beissbarth, Tim
    Wingender, Edgar
    Galstyan, Aram
    Gao, Xin
    Chambers, Brendan
    Pan, Weidi
    Khomtchouk, Bohdan B.
    Evans, James A.
    Rzhetsky, Andrey
    NPJ SYSTEMS BIOLOGY AND APPLICATIONS, 2021, 7 (01)
  • [2] BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali)
    Sazzed, Salim
    PROCEEDINGS OF THE 21ST WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2022), 2022, : 323 - 329
  • [3] Increasing Teachers' Trust in Automatic Text Assessment Through Named-Entity Recognition
    Walter, Candy
    ARTIFICIAL INTELLIGENCE IN EDUCATION: POSTERS AND LATE BREAKING RESULTS, WORKSHOPS AND TUTORIALS, INDUSTRY AND INNOVATION TRACKS, PRACTITIONERS AND DOCTORAL CONSORTIUM, PT II, 2022, 13356 : 191 - 194
  • [4] Biomedical Named-Entity Recognition by Hierarchically Fusing BioBERT Representations and Deep Contextual-Level Word-Embedding
    Naseem, Usman
    Musial, Katarzyna
    Eklund, Peter
    Prasad, Mukesh
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,