A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain

Cited by: 10
Authors
Hassanpour, Saeed [1]
O'Connor, Martin J. [1]
Das, Amar K. [2]
Affiliations
[1] Stanford Ctr Biomed Informat Res, Stanford, CA 94305 USA
[2] Geisel Sch Med Dartmouth, Lebanon, NH 03766 USA
Source
Funding
US National Institutes of Health (NIH);
Keywords
Knowledge acquisition; Ontologies; Rules; Biomedical definitions; Autism phenotypes; RULE IDENTIFICATION; ONTOLOGY; KNOWLEDGE;
DOI
10.1186/2041-1480-4-14
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Background: A variety of informatics approaches use information retrieval, NLP, and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.
Results: Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correctly identified rule definition or corresponding rule template under both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach achieved a better (numerically lower) average rank than the term-based approach in each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the 0.05 level using the Wilcoxon signed-rank test.
Conclusions: Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
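The evaluation described in the abstract compares, snippet by snippet, the rank at which each approach places the correct definition, and then applies a Wilcoxon signed-rank test to the paired ranks. The following is a minimal sketch of that kind of comparison, not the authors' implementation. The function names and the rank values are hypothetical, and the p-value uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity (small samples are usually handled with an exact test).

```python
import math
from statistics import mean


def average_rank(ranks):
    """Mean rank position of the correct definition (1 = top of the list)."""
    return mean(ranks)


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive the average of their ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of the statistic.
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return w_plus, w_minus, p_value


if __name__ == "__main__":
    # Hypothetical per-snippet ranks of the correct definition (not study data).
    semantic = [1, 2, 3, 2, 4, 1, 5, 3, 2, 6]
    term = [2, 4, 3, 5, 6, 3, 4, 7, 5, 8]
    print("average rank (semantic):", average_rank(semantic))
    print("average rank (term):    ", average_rank(term))
    print("Wilcoxon (W+, W-, p):   ", wilcoxon_signed_rank(semantic, term))
```

A paired test fits this design because both approaches are scored on the same snippets, so the per-snippet rank differences, not the two pooled rank distributions, carry the comparison.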
Pages: 10
Related papers
22 in total
  • [1] A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain
    Saeed Hassanpour
    Martin J O’Connor
    Amar K Das
    Journal of Biomedical Semantics, 4
  • [2] Semantic Annotation of Scientific Publications Based on Integration of Concept Knowledge
    Phyo, Shwe Sin
    Myo, Nyein Nyein
    EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 98 - 109
  • [3] A Semantic-based Clustering Method to Build Domain Ontology from Multiple Heterogeneous Knowledge Sources
    凌玲
    胡于进
    王学林
    李成刚
    Journal of Donghua University(English Edition), 2006, (02) : 1 - 7
  • [4] Semantic-based clustering method to build domain ontology from multiple heterogeneous knowledge sources
    Ling, Ling
    Hu, Yu-Jin
    Wang, Xue-Lin
    Li, Cheng-Gang
    Journal of Donghua University (English Edition), 2006, 23 (02) : 1 - 7
  • [5] Concept Extraction Based on Semantic Models Using Big Amount of Patents and Scientific Publications Data
    Kaliteevskii, Vasilii
    Deder, Arthur
    Peric, Nemanja
    Chechurin, Leonid
    CREATIVE SOLUTIONS FOR A SUSTAINABLE DEVELOPMENT (TFC 2021), 2021, 635 : 141 - 149
  • [6] Performance Evaluation for Semantic-based Risk Factors Extraction from Clinical Narratives
    Sabra, Susan
    Alobaidi, Mazen
    Malik, Khalid Mahmood
    Sabeeh, Vian
    2018 IEEE 8TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2018, : 695 - 701
  • [7] A deep learning based method for extracting semantic information from patent documents
    Liang Chen
    Shuo Xu
    Lijun Zhu
    Jing Zhang
    Xiaoping Lei
    Guancan Yang
    Scientometrics, 2020, 125 : 289 - 312
  • [8] A deep learning based method for extracting semantic information from patent documents
    Chen, Liang
    Xu, Shuo
    Zhu, Lijun
    Zhang, Jing
    Lei, Xiaoping
    Yang, Guancan
    SCIENTOMETRICS, 2020, 125 (01) : 289 - 312
  • [9] Ontology-based semantic integration method for domain-specific scientific data
    Hu Changjun
    Zhang Xiaoming
    Zhao Qian
    Zhao Chongchong
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 772 - +
  • [10] Quantitative Method for Semantic Evaluation Metrics of Scientific Papers Based on Prompt Tuning
    Li, Xiyu
    Qian, Li
    Zhang, Zhixiong
    Data Analysis and Knowledge Discovery, 2024, 8 (8-9) : 200 - 212