Concept selection for phenotypes and diseases using learn to rank

Cited by: 11
Authors
Collier, Nigel [1 ,2 ]
Oellrich, Anika [3 ]
Groza, Tudor [4 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] European Bioinformat Inst EMBL EBI, Cambridge, England
[3] Wellcome Trust Sanger Inst, Cambridge, England
[4] Garvan Inst Med Res, Sydney, NSW, Australia
Source
Journal of Biomedical Semantics
Funding
Australian Research Council
Keywords
BIOMEDICAL CONCEPT RECOGNITION; ONTOLOGY; EXTRACTION; TOOL; INFORMATION; SYSTEM; TEXT;
DOI
10.1186/s13326-015-0019-z
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Background: Phenotypes form the basis for determining the existence of a disease against the given evidence. Much of this evidence, however, remains locked away in text - scientific articles, clinical trial reports and electronic patient records (EPRs) - where authors use the full expressivity of human language to report their observations.
Results: In this paper we exploit a combination of off-the-shelf tools for extracting a machine-understandable representation of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. These are tested against a gold-standard EPR collection annotated with Unified Medical Language System (UMLS) concept identifiers: the ShARe/CLEF 2013 corpus for disorder detection. We evaluate four pipelines as stand-alone systems and then attempt to optimise semantic-type-based performance using several learn-to-rank (LTR) approaches - three pairwise and one listwise. We observed that whilst Apache cTAKES tended to outperform the other stand-alone systems with strong recall (R = 0.57), its precision was low (P = 0.09), leading to a low-to-moderate F1 measure (F1 = 0.16). Moreover, there is substantial variation in system performance across the semantic types for disorders. For example, the semantic type Findings (T033) proved very challenging for all systems. Combining systems within LTR improved F1 substantially (F1 = 0.24), particularly for Disease or syndrome (T047) and Anatomical abnormality (T190). Whilst recall improved markedly, precision remains a challenge (P = 0.15, R = 0.59).
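
To make the combination step concrete, the sketch below shows one way a pairwise learn-to-rank re-ranker can select a single UMLS concept per mention from the candidates proposed by several off-the-shelf pipelines. This is an illustration only, not the authors' implementation: the feature layout (which pipelines proposed the candidate, a confidence score, semantic-type indicators) and the toy data are assumptions, and scikit-learn's LogisticRegression stands in for the pairwise rankers evaluated in the paper.

# Illustrative pairwise learn-to-rank sketch (not the paper's implementation).
# Hypothetical features per candidate concept:
# [proposed by cTAKES?, proposed by a second pipeline?, confidence, is T047, is T033]
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: for each mention, (feature vector, 1 if gold concept else 0).
mentions = [
    [([1, 0, 0.9, 1, 0], 1), ([0, 1, 0.6, 0, 1], 0), ([1, 1, 0.4, 0, 0], 0)],
    [([0, 1, 0.8, 0, 1], 0), ([1, 1, 0.7, 1, 0], 1)],
    [([1, 0, 0.5, 0, 0], 0), ([0, 1, 0.9, 1, 0], 1), ([1, 0, 0.3, 0, 1], 0)],
]

def pairwise_transform(data):
    # Build difference vectors (gold - non-gold) so ranking becomes a binary
    # classification problem over candidate pairs (RankSVM-style transform).
    X, y = [], []
    for cands in data:
        gold = [np.array(f, dtype=float) for f, lab in cands if lab == 1]
        rest = [np.array(f, dtype=float) for f, lab in cands if lab == 0]
        for g in gold:
            for r in rest:
                X.append(g - r); y.append(1)   # gold should outrank non-gold
                X.append(r - g); y.append(0)   # reversed pair is a negative example
    return np.array(X), np.array(y)

X, y = pairwise_transform(mentions)
ranker = LogisticRegression().fit(X, y)        # learned weights act as a scoring function
w = ranker.coef_.ravel()

def select_concept(candidates):
    # Score every candidate with w.x and return the index of the top-ranked one.
    scores = [float(np.dot(w, f)) for f, _ in candidates]
    return int(np.argmax(scores))

for i, cands in enumerate(mentions):
    print(f"mention {i}: selected candidate {select_concept(cands)}")

As a sanity check on the reported scores, the standard relation F1 = 2PR/(P + R) reproduces the figures in the abstract: 2 x 0.09 x 0.57 / (0.09 + 0.57) ≈ 0.16 for stand-alone cTAKES and 2 x 0.15 x 0.59 / (0.15 + 0.59) ≈ 0.24 for the LTR combination.
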
Pages: 12
Related articles
50 records in total
  • [1] Concept selection for phenotypes and diseases using learn to rank
    Nigel Collier
    Anika Oellrich
    Tudor Groza
    Journal of Biomedical Semantics, 6
  • [2] A LEARN-TO-RANK APPROACH TO MEDICINE SELECTION FOR PATIENT TREATMENTS
    Farouqa, Maher
    Azzeh, Mohammad
    Interdisciplinary Journal of Information, Knowledge, and Management, 2024, 19
  • [3] Functional analysis of mutations in RANK that result in diseases with opposite phenotypes
    Mellis, D. J.
    Guerrini, M. M.
    Greenhorn, J.
    Vezzoni, P.
    Villa, A.
    Rogers, M. J.
    Helfrich, M. H.
    Crockett, J. C.
    BONE, 2009, 44 (02) : S325 - S325
  • [4] Machines learn phenotypes
    Natalie de Souza
    Nature Methods, 2013, 10 (1) : 38 - 38
  • [5] Threshold selection using the rank statistics
    Mironov, A. A.
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2006, : 110 - 113
  • [6] DISCOVERING PATIENT PHENOTYPES USING GENERALIZED LOW RANK MODELS
    Schuler, Alejandro
    Liu, Vincent
    Wan, Joe
    Callahan, Alison
    Udell, Madeleine
    Stark, David E.
    Shah, Nigam H.
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016, 2016, : 144 - 155
  • [7] USING ASSOCIATION RULES TO LEARN CONCEPT RELATIONSHIPS IN ONTOLOGIES
    Gulla, Jon Atle
    Brasethvik, Terje
    Kvarv, Goran Sveia
    ICEIS 2008: PROCEEDINGS OF THE TENTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL ISAS-1: INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION, VOL 1, 2008, : 58 - 65
  • [8] PIGEONS LEARN CONCEPT OF AN A
    MORGAN, MJ
    FITCH, MD
    HOLMAN, JG
    LEA, SEG
    PERCEPTION, 1976, 5 (01) : 57 - 66
  • [9] Sustainable concept selection using ELECTRE
    S. Vinodh
    R. Jeya Girubha
    Clean Technologies and Environmental Policy, 2012, 14 : 651 - 656
  • [10] Sustainable concept selection using ELECTRE
    Vinodh, S.
    Girubha, R. Jeya
    CLEAN TECHNOLOGIES AND ENVIRONMENTAL POLICY, 2012, 14 (04) : 651 - 656