High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Cited by: 63
Authors
Dilthey, Alexander T. [1, 2]
Gourraud, Pierre-Antoine [3, 4]
Mentzer, Alexander J. [1 ]
Cereb, Nezih [5 ]
Iqbal, Zamin [1 ]
McVean, Gil [1, 6]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[2] NHGRI, NIH, Bethesda, MD 20892 USA
[3] UCSF, Dept Neurol, San Francisco, CA USA
[4] Univ Nantes, Nantes Univ Hosp, INSERM, Unit ATIP 1064, Avenir Team 6, Nantes, France
[5] Histogenetics, Ossining, NY USA
[6] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England
Funding
European Research Council; Wellcome Trust (UK)
Keywords
HIGH-RESOLUTION HLA; CLASS-I; SUSCEPTIBILITY;
DOI
10.1371/journal.pcbi.1005151
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently approximately 30-250 CPU hours per sample) remain a significant challenge to practical application.
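The third step of the abstract, choosing the most likely pair of alleles at a locus under a simple likelihood framework, can be illustrated with a toy sketch. This is not the HLA*PRG implementation: the function names, the read representation as `(start, bases)` tuples, the uniform per-base error rate, and the allele names and sequences are all assumptions made for illustration, and the real method works on read pairs mapped to the graph rather than on pre-positioned strings.

```python
import math
from itertools import combinations_with_replacement


def read_likelihood(read, allele_seq, error_rate):
    """P(read | allele) under an assumed independent per-base error model."""
    start, bases = read
    p = 1.0
    for offset, base in enumerate(bases):
        if allele_seq[start + offset] == base:
            p *= 1.0 - error_rate
        else:
            p *= error_rate / 3.0  # error to one of the three other bases
    return p


def infer_allele_pair(reads, alleles, error_rate=0.01):
    """Return the unordered allele pair with the highest log-likelihood.

    Each read is assumed to come from either allele of the pair with
    probability 1/2, so homozygous and heterozygous pairs are comparable.
    """
    best_pair, best_ll = None, -math.inf
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        ll = 0.0
        for read in reads:
            p_a = read_likelihood(read, alleles[a], error_rate)
            p_b = read_likelihood(read, alleles[b], error_rate)
            ll += math.log(0.5 * p_a + 0.5 * p_b)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


# Hypothetical toy alleles (not real IMGT/HLA sequences).
toy_alleles = {
    "A*01": "ACGTACGT",
    "A*02": "ACGTTCGT",
    "A*03": "ACCTACGT",
}
# Reads as (start position, bases), simulated from A*01 and A*02.
toy_reads = [(2, "GTAC"), (2, "GTTC"), (0, "ACGT"), (3, "TACG"), (3, "TTCG")]

best_pair, best_ll = infer_allele_pair(toy_reads, toy_alleles)
print(best_pair, round(best_ll, 3))
```

Summing log-likelihoods over reads and enumerating all unordered pairs keeps the sketch exhaustive and easy to check. With thousands of candidate alleles per locus, a real implementation would need to restrict or prune the candidate pairs.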
Pages: 16