Concept selection for phenotypes and diseases using learning to rank

Cited by: 11
Authors
Collier, Nigel [1, 2]
Oellrich, Anika [3]
Groza, Tudor [4]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] European Bioinformat Inst EMBL EBI, Cambridge, England
[3] Wellcome Trust Sanger Inst, Cambridge, England
[4] Garvan Inst Med Res, Sydney, NSW, Australia
Funding
Australian Research Council
Keywords
BIOMEDICAL CONCEPT RECOGNITION; ONTOLOGY; EXTRACTION; TOOL; INFORMATION; SYSTEM; TEXT;
DOI
10.1186/s13326-015-0019-z
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Phenotypes form the basis for determining the existence of a disease against the given evidence. Much of this evidence though remains locked away in text - scientific articles, clinical trial reports and electronic patient records (EPR) - where authors use the full expressivity of human language to report their observations.

Results: In this paper we exploit a combination of off-the-shelf tools for extracting a machine understandable representation of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. These are tested against a gold standard EPR collection that has been annotated with Unified Medical Language System (UMLS) concept identifiers: the ShARE/CLEF 2013 corpus for disorder detection. We evaluate four pipelines as stand-alone systems and then attempt to optimise semantic-type based performance using several learn-to-rank (LTR) approaches - three pairwise and one listwise. We observed that whilst overall Apache cTAKES tended to outperform other stand-alone systems on a strong recall (R = 0.57), precision was low (P = 0.09) leading to low-to-moderate F1 measure (F1 = 0.16). Moreover, there is substantial variation in system performance across semantic types for disorders. For example, the concept Findings (T033) seemed to be very challenging for all systems. Combining systems within LTR improved F1 substantially (F1 = 0.24) particularly for Disease or syndrome (T047) and Anatomical abnormality (T190). Whilst recall is improved markedly, precision remains a challenge (P = 0.15, R = 0.59).
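
The F1 figures quoted in the abstract are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The following is only a small arithmetic check, not the authors' evaluation code; the function name `f1_score` is our own, and the inputs are the already-rounded P and R values from the abstract, so agreement is to two decimal places only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (assumed inputs, not raw counts).
ctakes_f1 = f1_score(0.09, 0.57)  # stand-alone Apache cTAKES
ltr_f1 = f1_score(0.15, 0.59)     # systems combined with learn-to-rank

print(round(ctakes_f1, 2), round(ltr_f1, 2))  # prints: 0.16 0.24
```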
Pages: 12