Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries

Authors
Ulzee An
Ali Pazokitoroudi
Marcus Alvarez
Lianyun Huang
Silviu Bacanu
Andrew J. Schork
Kenneth Kendler
Päivi Pajukanta
Jonathan Flint
Noah Zaitlen
Na Cai
Andy Dahl
Sriram Sankararaman
Affiliations
[1] UCLA,Computer Science Department
[2] David Geffen School of Medicine at UCLA,Department of Human Genetics
[3] Helmholtz Zentrum München,Helmholtz Pioneer Campus
[4] Helmholtz Zentrum München,Computational Health Centre
[5] Technical University of Munich,School of Medicine
[6] Virginia Commonwealth University,Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry
[7] Copenhagen University Hospital,Institute of Biological Psychiatry, Mental Health Center Sct. Hans
[8] The Translational Genomics Research Institute (TGEN)
[9] Copenhagen University,Neurogenomics Division
[10] David Geffen School of Medicine at UCLA,Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Sciences
[11] UCLA,Institute for Precision Health
[12] University of Chicago,Neurology Department
[13] UCLA,Section of Genetic Medicine
Source
Nature Genetics | 2023 / Vol. 55
Abstract
Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
Pages: 2269–2276
Page count: 7