High-throughput multimodal automated phenotyping (MAP) with application to PheWAS

Cited by: 56
Authors
Liao, Katherine R. [1 ,2 ,3 ]
Sun, Jiehuan [3 ,4 ]
Cai, Tianrun A. [1 ,2 ,3 ]
Link, Nicholas [3 ]
Hong, Chuan [2 ,3 ,4 ]
Huang, Jie [2 ]
Huffman, Jennifer E. [3 ]
Gronsbell, Jessica [5 ]
Zhang, Yichi [4 ,6 ]
Ho, Yuk-Lam [3 ]
Castro, Victor [7 ]
Gainer, Vivian [7 ]
Murphy, Shawn N. [2 ,7 ,8 ]
O'Donnell, Christopher J. [1 ,3 ]
Gaziano, J. Michael [1 ,2 ,3 ]
Cho, Kelly [1 ,2 ,3 ]
Szolovits, Peter [9 ]
Kohane, Isaac S. [2 ]
Yu, Sheng [10 ,11 ,12 ]
Cai, Tianxi [2 ,3 ,4 ]
Affiliations
[1] Brigham & Women's Hosp, Div Rheumatol Immunol & Allergy, 75 Francis St, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] VA Boston Healthcare Syst, Div Data Sci, Boston, MA USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[5] Verily Life Sci, Cambridge, MA USA
[6] Univ Rhode Isl, Kingston, RI 02881 USA
[7] Partners Healthcare Syst, Somerville, MA USA
[8] Massachusetts Gen Hosp, Boston, MA 02114 USA
[9] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[10] Tsinghua Univ, Ctr Stat Sci, Beijing, Peoples R China
[11] Tsinghua Univ, Dept Ind Engn, Beijing, Peoples R China
[12] Tsinghua Univ, Inst Data Sci, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; U.S. National Institutes of Health
Keywords
High-throughput; phenotyping; PheWAS; ELECTRONIC MEDICAL-RECORDS; ICD-9-CM CODES; HEALTH; ASSOCIATION; ALGORITHMS; CLASSIFICATION; IDENTIFICATION; VALIDATION; DISEASE;
DOI
10.1093/jamia/ocz066
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).
Materials and Methods: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying each participant as having the phenotype or not. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in phenome-wide association studies (PheWAS) in an independent cohort for 2 single nucleotide polymorphisms with known associations.
Results: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC(MAP) 0.943, AUC(manual) 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes.
Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
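As a rough illustration of the idea in Materials and Methods (unlabeled count data, scaled by health care utilization, fitted with latent mixture models whose outputs are combined), the sketch below fits a two-component Poisson mixture to ICD counts and to NLP counts and averages the two posterior probabilities. It is a simplified sketch under stated assumptions, not the authors' implementation: the abstract does not name the mixture families, so the Poisson choice, the EM initialization, the 0.5 cutoff, the toy data, and all function and variable names are hypothetical.

```python
import math


def poisson_logpmf(k, mean):
    """Log of the Poisson probability of observing count k given its mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def fit_poisson_mixture(counts, util, n_iter=200):
    """Fit a two-component Poisson mixture by EM, with each patient's
    expected count proportional to utilization (assumed > 0). Returns the
    posterior probability of the higher-rate component per patient."""
    n = len(counts)
    # Start from a crude split on count per unit of utilization.
    rates = [c / u for c, u in zip(counts, util)]
    cut = sorted(rates)[n // 2]
    resp = [0.9 if r > cut else 0.1 for r in rates]
    for _ in range(n_iter):
        # M-step: update the mixing weight and the two per-utilization rates.
        pi = min(max(sum(resp) / n, 1e-6), 1 - 1e-6)
        hi = sum(r * c for r, c in zip(resp, counts)) / max(
            sum(r * u for r, u in zip(resp, util)), 1e-12)
        lo = sum((1 - r) * c for r, c in zip(resp, counts)) / max(
            sum((1 - r) * u for r, u in zip(resp, util)), 1e-12)
        hi, lo = max(hi, 1e-6), max(lo, 1e-6)
        # E-step: posterior probability of the "hi" component, computed
        # on the log scale to avoid underflow.
        resp = []
        for c, u in zip(counts, util):
            a = math.log(pi) + poisson_logpmf(c, hi * u)
            b = math.log(1 - pi) + poisson_logpmf(c, lo * u)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
    # Make sure the returned probability refers to the higher-rate component.
    return resp if hi >= lo else [1 - r for r in resp]


def ensemble_probability(icd_counts, nlp_counts, util):
    """Average the posteriors from the ICD-based and NLP-based mixtures."""
    p_icd = fit_poisson_mixture(icd_counts, util)
    p_nlp = fit_poisson_mixture(nlp_counts, util)
    return [(a + b) / 2 for a, b in zip(p_icd, p_nlp)]


# Toy data (made up): per-patient ICD code counts, NLP mention counts,
# and a utilization measure such as the number of notes.
icd = [0, 0, 1, 0, 1, 12, 15, 9, 0, 11]
nlp = [0, 1, 0, 0, 2, 20, 18, 14, 1, 16]
notes = [5, 8, 10, 4, 12, 10, 12, 8, 6, 9]

prob = ensemble_probability(icd, nlp, notes)
labels = [int(p >= 0.5) for p in prob]  # 0.5 is a placeholder threshold
print([round(p, 3) for p in prob])
print(labels)
```

No chart-review labels are needed to fit the mixtures, which is what lets an approach of this kind scale to many phenotypes; labeled data are used only to validate the resulting probabilities.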
Pages: 1255-1262
Number of pages: 8