Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets

Cited by: 9
Authors
Ruau D. [1]
Mbagwu M. [2]
Dudley J.T. [1, 3]
Krishnan V. [4]
Butte A.J. [1]
Affiliations
[1] Division of Systems Medicine, Department of Pediatrics, Stanford, CA
[2] School of Allied Medical Professions, The Ohio State University College of Medicine, Columbus
[3] Program in Biomedical Informatics, Stanford University School of Medicine, Stanford
[4] Department of Computer Science, Stanford University School of Medicine, Stanford
Keywords
Annotations; Concept Identification; MEDLINE; Natural Language Processing; Ontologies; Proteomics
DOI
10.1016/j.jbi.2011.03.007
Abstract
Publicly available molecular datasets can be used for independent verification or investigative repurposing, but such reuse depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of annotations supplied by these methods have been performed using annotations from standing public data repositories. Here, we compared manually assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as the "gold standard". Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuous need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility. © 2011 Elsevier Inc.
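The two measurements described in the abstract, recall against a gold-standard MeSH set and grouping experiments by shared MeSH terms, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `recall` and `shared_term_clusters`, the example terms and experiment identifiers, and the choice to define a cluster as a connected component of experiments sharing at least one term. The abstract does not specify how the authors built their network or identified clusters, so this is not their method.

```python
from itertools import combinations


def recall(predicted, gold):
    """Fraction of gold-standard terms that also appear in the predicted set."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(predicted) & gold) / len(gold)


def shared_term_clusters(annotations):
    """Group experiments into connected components.

    Two experiments are linked when they share at least one term
    (an assumed linking rule, not taken from the paper).
    `annotations` maps an experiment id to an iterable of terms.
    """
    parent = {exp: exp for exp in annotations}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(annotations, 2):
        if set(annotations[a]) & set(annotations[b]):
            parent[find(a)] = find(b)

    clusters = {}
    for exp in annotations:
        clusters.setdefault(find(exp), set()).add(exp)
    return list(clusters.values())


# Hypothetical data: terms and experiment ids are invented for illustration.
gold = {"Proteomics", "Liver", "Humans"}
manual = {"Proteomics"}
automated = {"Proteomics", "Liver", "Mass Spectrometry"}

manual_recall = recall(manual, gold)
automated_recall = recall(automated, gold)

experiments = {
    "exp1": {"Liver", "Proteomics"},
    "exp2": {"Liver"},
    "exp3": {"Brain"},
}
clusters = shared_term_clusters(experiments)
```

In this toy input the automated set recovers two of the three gold-standard terms while the manual set recovers one, and the three experiments fall into two groups. The pairwise comparison is quadratic in the number of experiments, which is acceptable for a sketch but would likely need an inverted index from term to experiments on a repository-sized collection.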
Pages: S39-S43
Page count: 4