Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm

Times cited: 1
Authors
Joo, Yoonjung Yoonie [1]
Pacheco, Jennifer A. [2]
Thompson, William K. [3]
Rasmussen-Torvik, Laura J. [4]
Rasmussen, Luke V. [4]
Lin, Frederick T. J. [1]
Andrade, Mariza de [5]
Borthwick, Kenneth M. [6]
Bottinger, Erwin [7]
Cagan, Andrew [8]
Carrell, David S. [9]
Denny, Joshua C. [10]
Ellis, Stephen B. [11]
Gottesman, Omri [11]
Linneman, James G. [12]
Pathak, Jyotishman [13]
Peissig, Peggy L. [14]
Shang, Ning [15]
Tromp, Gerard [16]
Veerappan, Annapoorani [17]
Smith, Maureen E. [2]
Chisholm, Rex L. [2]
Gawron, Andrew J. [18]
Hayes, M. Geoffrey [1, 2, 19]
Kho, Abel N. [3, 20]
Affiliations
[1] Northwestern Univ, Dept Med, Feinberg Sch Med, Chicago, IL 60611 USA
[2] Northwestern Univ, Ctr Genet Med, Feinberg Sch Med, Chicago, IL 60611 USA
[3] Northwestern Univ, Ctr Hlth Informat Partnerships, Feinberg Sch Med, Chicago, IL 60611 USA
[4] Northwestern Univ, Dept Prevent Med, Feinberg Sch Med, Chicago, IL USA
[5] Mayo Clin, Coll Med, Rochester, MN USA
[6] Geisinger, Danville, PA USA
[7] Icahn Sch Med Mt Sinai, New York, NY USA
[8] Partners Healthcare, Charlestown, MA USA
[9] Kaiser Permanente Washington Hlth Res Inst, Seattle, WA USA
[10] Vanderbilt Univ, Dept Biomed Informat & Med, Nashville, TN USA
[11] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
[12] Marshfield Clin Fdn Med Res & Educ, Off Res Comp & Analyt, Res Inst, Marshfield, WI USA
[13] Weill Cornell Med Coll, Dept Healthcare Policy & Res, New York, NY USA
[14] Marshfield Clin Res Inst, Ctr Precis Med Res, Marshfield, WI USA
[15] Columbia Univ, Dept Biomed Informat, New York, NY USA
[16] Stellenbosch Univ, Fac Med & Hlth Sci, Dept Biomed Sci, Div Mol Biol & Human Genet, Stellenbosch, South Africa
[17] Duke Univ, Dept Med, Gastroenterol, Durham, NC USA
[18] Univ Utah, Div Gastroenterol Hepatol & Nutr, Salt Lake City, UT USA
[19] Northwestern Univ, Dept Anthropol, Evanston, IL 60611 USA
[20] Northwestern Univ, Dept Med, Div Gen Internal Med & Geriatr, Feinberg Sch Med, Chicago, IL 60611 USA
Source
PLOS ONE | 2023 / Vol. 18 / No. 5
Keywords
FIBER DIET; LESSONS; BURDEN; RISK;
DOI
10.1371/journal.pone.0283553
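Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code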
07 ; 0710 ; 09 ;
Abstract
Objective
Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting approximately 50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD by applying a Natural Language Processing (NLP) technique to multiple electronic health record (EHR) data sources covering 91,166 multi-ancestry participants.

Materials and methods
We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African, and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid or pleiotropic effects on clinical phenotypes.

Results
Our algorithm significantly improved patient classification performance for DD analysis (algorithm PPVs ≥ 0.94) and identified up to 3.5-fold more patients than the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis in the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, with GWAS signals that were generally stronger in diverticulitis patients than in diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes.

Discussion
In this first multi-ancestry GWAS-PheWAS study, we showed that heterogeneous EHR data can be mapped through an integrative analytical pipeline to reveal significant, clinically interpretable genotype-phenotype associations.
Conclusion
A systematic framework for processing unstructured EHR data with NLP could advance deep, scalable phenotyping for better patient identification and facilitate etiological investigation of a disease using multilayered data.
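The abstract does not spell out the phenotyping rules. As a minimal sketch of the general idea behind NLP enrichment (flagging affirmed, non-negated mentions of diverticulosis or diverticulitis in free-text colonoscopy or imaging reports) and of the positive predictive value (PPV) used to validate a case definition, the Python below makes several loud assumptions: the regular-expression term patterns, the short negation-cue list, the rule that diverticulitis outranks diverticulosis, and the helper names `affirmed_terms`, `classify_patient`, and `ppv` are all hypothetical and are not the authors' algorithm. PPV is the standard TP / (TP + FP) computed against manual chart review.

```python
import re

# HYPOTHETICAL term patterns for illustration only; not the study's lexicon.
TERM_PATTERNS = {
    "diverticulitis": re.compile(r"\bdiverticulitis\b", re.IGNORECASE),
    "diverticulosis": re.compile(r"\bdiverticul(?:osis|a|um)\b", re.IGNORECASE),
}

# HYPOTHETICAL, deliberately tiny negation-cue list; a real pipeline would use
# a dedicated negation/context detector instead.
NEGATION = re.compile(r"\b(?:no|not|without|negative for|free of)\b", re.IGNORECASE)


def affirmed_terms(report: str) -> set[str]:
    """Return DD terms mentioned with no negation cue earlier in the same sentence."""
    found: set[str] = set()
    for sentence in re.split(r"[.;\n]", report):
        for label, pattern in TERM_PATTERNS.items():
            match = pattern.search(sentence)
            # Only text before the mention is checked for a negation cue.
            if match and not NEGATION.search(sentence[: match.start()]):
                found.add(label)
    return found


def classify_patient(reports: list[str]) -> str:
    """Label a patient from all reports; assumed rule: diverticulitis wins."""
    terms: set[str] = set()
    for report in reports:
        terms |= affirmed_terms(report)
    if "diverticulitis" in terms:
        return "diverticulitis"
    if "diverticulosis" in terms:
        return "diverticulosis"
    return "no_nlp_evidence"


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value of a case definition against chart review."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    reports = [
        "Colonoscopy: scattered diverticulosis in the sigmoid colon.",
        "CT abdomen: no evidence of diverticulitis.",
    ]
    print(classify_patient(reports))  # diverticulosis
    # Illustrative validation counts, NOT figures from the study.
    print(ppv(47, 3))  # 0.94
```

A production phenotype would typically combine such NLP flags with structured EHR data (for example, diagnosis and procedure codes) and a validated context detector; the counts passed to `ppv` here are illustrative only.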
Pages: 17