High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Cited by: 63
Authors
Dilthey, Alexander T. [1, 2]
Gourraud, Pierre-Antoine [3, 4]
Mentzer, Alexander J. [1]
Cereb, Nezih [5]
Iqbal, Zamin [1]
McVean, Gil [1, 6]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] NHGRI, NIH, Bethesda, MD 20892 USA
[3] UCSF, Dept Neurol, San Francisco, CA USA
[4] Univ Nantes, Nantes Univ Hosp, INSERM, Unit ATIP 1064,Avenir Team 6, Nantes, France
[5] Histogenetics, Ossining, NY USA
[6] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England
Funding
European Research Council; Wellcome Trust (UK)
Keywords
HIGH-RESOLUTION HLA; CLASS-I; SUSCEPTIBILITY;
DOI
10.1371/journal.pcbi.1005151
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently approximately 30-250 CPU hours per sample) remain a significant challenge to practical application.
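The abstract says only that the third step uses "a simple likelihood framework" to pick the most likely pair of alleles at each locus. The sketch below is one way such a step could look, not the authors' implementation: it assumes each read has already been assigned a probability under every candidate allele, and that each read is equally likely to come from either of the two alleles in a pair. The function name `best_allele_pair`, the input layout, and the toy probabilities are all hypothetical.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods):
    """Return the allele pair with the highest log-likelihood.

    read_likelihoods maps an allele name to a list of P(read_i | allele),
    one entry per read, in the same read order for every allele.
    All probabilities are assumed to be strictly positive.
    """
    alleles = sorted(read_likelihoods)
    best_pair, best_ll = None, -math.inf
    # Consider every unordered pair, including homozygous pairs (a, a).
    for a, b in combinations_with_replacement(alleles, 2):
        # Assumed model: each read comes from allele a or b with probability 1/2.
        ll = sum(
            math.log(0.5 * pa + 0.5 * pb)
            for pa, pb in zip(read_likelihoods[a], read_likelihoods[b])
        )
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


# Toy input (made-up numbers): two reads fit the first allele, two fit the second.
toy = {
    "A*01:01:01G": [0.9, 0.9, 0.01, 0.01],
    "A*02:01:01G": [0.01, 0.01, 0.9, 0.9],
    "A*03:01:01G": [0.01, 0.01, 0.01, 0.01],
}
pair, log_likelihood = best_allele_pair(toy)
print(pair)
```

With this toy input the heterozygous pair of the first two alleles wins, because every read is explained well by one of them. The exhaustive search over pairs is quadratic in the number of candidate alleles, which is acceptable for a sketch but would need pruning for a database of thousands of alleles.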
Pages: 16