Concept selection for phenotypes and diseases using learn to rank

Cited by: 11
Authors
Collier, Nigel [1, 2]
Oellrich, Anika [3]
Groza, Tudor [4]
Affiliations
[1] University of Cambridge, Cambridge, England
[2] European Bioinformatics Institute (EMBL-EBI), Cambridge, England
[3] Wellcome Trust Sanger Institute, Cambridge, England
[4] Garvan Institute of Medical Research, Sydney, NSW, Australia
Source
Funding
Australian Research Council
Keywords
BIOMEDICAL CONCEPT RECOGNITION; ONTOLOGY; EXTRACTION; TOOL; INFORMATION; SYSTEM; TEXT
DOI
10.1186/s13326-015-0019-z
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Phenotypes form the basis for determining whether a disease is present given the available evidence. Much of this evidence, though, remains locked away in text such as scientific articles, clinical trial reports and electronic patient records (EPR), where authors use the full expressivity of human language to report their observations.
Results: In this paper we combine off-the-shelf tools to extract a machine-understandable representation of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. The tools are tested against a gold-standard EPR collection annotated with Unified Medical Language System (UMLS) concept identifiers: the ShARE/CLEF 2013 corpus for disorder detection. We evaluate four pipelines as stand-alone systems and then attempt to optimise semantic-type-based performance using several learn-to-rank (LTR) approaches: three pairwise and one listwise. Whilst Apache cTAKES tended to outperform the other stand-alone systems overall thanks to strong recall (R = 0.57), its precision was low (P = 0.09), leading to a low-to-moderate F1 measure (F1 = 0.16). Moreover, system performance varies substantially across semantic types for disorders; for example, the concept Findings (T033) seemed to be very challenging for all systems. Combining systems within LTR improved F1 substantially (F1 = 0.24), particularly for Disease or syndrome (T047) and Anatomical abnormality (T190). Whilst recall is improved markedly, precision remains a challenge (P = 0.15, R = 0.59).
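The F1 values quoted in the abstract are consistent with the standard harmonic-mean definition, F1 = 2PR / (P + R). The following is a minimal sketch that checks this arithmetic; it assumes the abstract uses the usual balanced F-measure, and the function name `f1_score` is illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values quoted in the abstract.
ctakes_f1 = f1_score(0.09, 0.57)  # stand-alone Apache cTAKES
ltr_f1 = f1_score(0.15, 0.59)     # systems combined with learn-to-rank

print(round(ctakes_f1, 2), round(ltr_f1, 2))  # prints: 0.16 0.24
```

Because F1 is a harmonic mean, it stays close to the smaller of the two inputs, which is why low precision keeps F1 low in both cases despite recall above 0.5.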
Pages: 12