High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Cited by: 63

Authors:

Dilthey, Alexander T. [1,2]

Gourraud, Pierre-Antoine [3,4]

Mentzer, Alexander J. [1]

Cereb, Nezih [5]

Iqbal, Zamin [1]

McVean, Gil [1,6]

Affiliations:

[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England

[2] NHGRI, NIH, Bethesda, MD 20892 USA

[3] UCSF, Dept Neurol, San Francisco, CA USA

[4] Univ Nantes, Nantes Univ Hosp, INSERM, Unit ATIP 1064,Avenir Team 6, Nantes, France

[5] Histogenetics, Ossining, NY USA

[6] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England

Source:

PLOS COMPUTATIONAL BIOLOGY | 2016 / Volume 12 / Issue 10

Funding:

European Research Council; Wellcome Trust (UK)

Keywords:

HIGH-RESOLUTION HLA; CLASS-I; SUSCEPTIBILITY;

DOI:

10.1371/journal.pcbi.1005151

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently approximately 30-250 CPU hours per sample) remain a significant challenge to practical application.
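
The third step described in the abstract, choosing the most likely pair of alleles at a locus under a simple likelihood, can be illustrated with a toy sketch. The Python below is not the HLA*PRG implementation: the function name `best_allele_pair`, the equal-weight mixture model over the two alleles, and the per-read probabilities are all assumptions made for illustration, and the read-to-PRG mapping step that would produce such probabilities is not modelled at all.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods):
    """Return the (allele_a, allele_b) pair with the highest log-likelihood.

    read_likelihoods: list of dicts, one per read, mapping an allele name
    to an assumed P(read | allele). All probabilities must be > 0.
    Hypothetical model: each read comes from either allele of the pair
    with probability 0.5, and reads are independent.
    """
    alleles = sorted(read_likelihoods[0])
    best_pair, best_ll = None, -math.inf
    # Enumerate unordered pairs, including homozygous pairs (a, a).
    for a, b in combinations_with_replacement(alleles, 2):
        ll = sum(math.log(0.5 * r[a] + 0.5 * r[b]) for r in read_likelihoods)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


# Toy input (invented values): two reads fit the first allele well,
# two reads fit the second allele well, none fit the third.
toy_reads = [
    {"A*01:01:01G": 0.9, "A*02:01:01G": 0.01, "A*03:01:01G": 0.01},
    {"A*01:01:01G": 0.9, "A*02:01:01G": 0.01, "A*03:01:01G": 0.01},
    {"A*01:01:01G": 0.01, "A*02:01:01G": 0.9, "A*03:01:01G": 0.01},
    {"A*01:01:01G": 0.01, "A*02:01:01G": 0.9, "A*03:01:01G": 0.01},
]

pair, log_likelihood = best_allele_pair(toy_reads)
print(pair, round(log_likelihood, 2))
```

In this toy input the heterozygous pair of the first two alleles scores highest, because every read is explained well by one of the two. Exhaustive enumeration is quadratic in the number of candidate alleles per locus, so a sketch like this only conveys the idea and says nothing about how the published method handles thousands of known alleles.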

Pages: 16
