From Micro to Macro: Data Driven Phenotyping by Densification of Longitudinal Electronic Medical Records

Cited by: 76
Authors
Zhou, Jiayu [1, 2]
Wang, Fei [3]
Hu, Jianying [3]
Ye, Jieping [1, 2]
Affiliations
[1] ASU, Biodesign Inst, Ctr Evolutionary Med & Informat, Tempe, AZ 85281 USA
[2] ASU, Dept Comp Sci & Engn, Tempe, AZ USA
[3] IBM TJ Watson Res Ctr, Healthcare Analyt, Yorktown Hts, NY USA
Keywords
Medical informatics; phenotyping; sparse learning; matrix completion; densification; HEART-FAILURE; RISK; PREDICTION; MODEL; FACTORIZATION; IMPUTATION
DOI
10.1145/2623330.2623711
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data for this type of research is patient Electronic Medical Records (EMR). However, patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data-driven phenotyping framework called PACIFIER (PAtient reCord densIFIER), in which we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in the original patient EMR, whose values evolve smoothly over time. We propose two formulations to achieve this goal. One is the Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is the Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that can solve both problems. Finally, we validate PACIFIER on two real-world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD, at diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.
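The abstract's modelling assumptions (a sparse feature-by-time matrix per patient, phenotypes built from a subset of features, values that evolve smoothly over time) can be illustrated with a small sketch. This is not the PACIFIER solver. It is a generic masked low-rank factorization fitted by alternating proximal gradient steps, and the function name `densify_record`, the parameters `lam_smooth` and `lam_sparse`, and the specific penalties (a nonnegative L1 penalty on the feature loadings, a squared first-difference penalty on the time courses) are all assumptions made for illustration.

```python
import numpy as np


def densify_record(X, mask, rank=2, lam_smooth=1.0, lam_sparse=0.01,
                   n_iter=200, seed=0):
    """Fill in one patient's sparse feature-by-time matrix (illustrative only).

    X    : (n_features, n_times) array, arbitrary values where unobserved
    mask : same shape, 1.0 where X is observed and 0.0 elsewhere
    Returns U (feature loadings), V (time courses), U @ V, and the
    objective value recorded before the loop and after every iteration.
    """
    rng = np.random.default_rng(seed)
    n_features, n_times = X.shape
    U = rng.random((n_features, rank))   # nonnegative feature loadings
    V = rng.random((rank, n_times))      # one time course per latent phenotype
    D = np.diff(np.eye(n_times), axis=1)  # (V @ D)[:, j] = V[:, j+1] - V[:, j]
    L = D @ D.T

    def objective(U, V):
        fit = 0.5 * np.sum((mask * (U @ V - X)) ** 2)      # observed entries only
        smooth = 0.5 * lam_smooth * np.sum((V @ D) ** 2)   # temporal smoothness
        sparse = lam_sparse * np.sum(U)                    # L1 norm, since U >= 0
        return fit + smooth + sparse

    history = [objective(U, V)]
    lip_smooth = lam_smooth * np.linalg.norm(L, 2)
    for _ in range(n_iter):
        # Proximal gradient step on U with step 1 / (Lipschitz bound);
        # clipping at zero handles the L1 penalty and nonnegativity together.
        R = mask * (U @ V - X)
        step_u = 1.0 / (np.linalg.norm(V @ V.T, 2) + 1e-12)
        U = np.maximum(U - step_u * (R @ V.T + lam_sparse), 0.0)
        # Plain gradient step on V, including the smoothness term.
        R = mask * (U @ V - X)
        step_v = 1.0 / (np.linalg.norm(U.T @ U, 2) + lip_smooth + 1e-12)
        V = V - step_v * (U.T @ R + lam_smooth * (V @ L))
        history.append(objective(U, V))
    return U, V, U @ V, history
```

Calling `densify_record(X, mask)` on a single patient's matrix loosely mirrors the individual-basis setting, where each patient gets their own `U`. A shared-basis variant would instead tie `U` across patients, which this sketch does not attempt.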
Pages: 135-144
Number of pages: 10