High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Times cited: 63

Authors:

Dilthey, Alexander T. [1,2]

Gourraud, Pierre-Antoine [3,4]

Mentzer, Alexander J. [1]

Cereb, Nezih [5]

Iqbal, Zamin [1]

McVean, Gil [1,6]

Affiliations:

[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England

[2] NHGRI, NIH, Bethesda, MD 20892 USA

[3] UCSF, Dept Neurol, San Francisco, CA USA

[4] Univ Nantes, Nantes Univ Hosp, INSERM, Unit ATIP 1064, Avenir Team 6, Nantes, France

[5] Histogenetics, Ossining, NY USA

[6] Univ Oxford, Li Ka Shing Ctr Hlth Informat & Discovery, Oxford, England

Source:

PLOS COMPUTATIONAL BIOLOGY | 2016 / Vol. 12 / Issue 10

Funding:

European Research Council; Wellcome Trust (UK);

Keywords:

HIGH-RESOLUTION HLA; CLASS-I; SUSCEPTIBILITY;

DOI:

10.1371/journal.pcbi.1005151

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and would enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G-group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 × 100 bp, 11 samples with 2 × 250 bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data.
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently approximately 30-250 CPU hours per sample) remain a significant challenge to practical application.
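
The third step of the abstract, choosing the most likely pair of alleles at each locus with "a simple likelihood framework", can be illustrated with a minimal sketch. It is not the HLA*PRG implementation. It assumes that per-read likelihoods P(read | allele) have already been computed (for example from the sequence-to-PRG alignments), that each read comes from either of the two chromosomes with probability 1/2, and that reads are independent. The function name `best_allele_pair`, the `floor` parameter, and the allele labels `a1`-`a3` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods, alleles, floor=1e-12):
    """Return the unordered allele pair (a, b) maximizing the diploid log-likelihood.

    read_likelihoods: list of dicts, one per read, mapping allele -> P(read | allele).
    Assumption: each read comes from either chromosome with probability 1/2,
    and reads are independent. `floor` is a hypothetical lower bound that
    keeps log() finite when a read is unexplained by both alleles.
    """
    best_pair, best_ll = None, -math.inf
    # Enumerate all unordered pairs, including homozygous ones such as (a, a).
    for a, b in combinations_with_replacement(alleles, 2):
        ll = 0.0
        for probs in read_likelihoods:
            # Two-haplotype mixture: P(r | a, b) = 0.5 * P(r | a) + 0.5 * P(r | b).
            p = 0.5 * probs.get(a, 0.0) + 0.5 * probs.get(b, 0.0)
            ll += math.log(max(p, floor))
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


# Usage with three hypothetical alleles; reads are split between a1 and a2.
reads = [
    {"a1": 0.9, "a2": 0.05, "a3": 0.05},
    {"a1": 0.9, "a2": 0.05, "a3": 0.05},
    {"a1": 0.05, "a2": 0.9, "a3": 0.05},
    {"a1": 0.05, "a2": 0.9, "a3": 0.05},
]
pair, log_lik = best_allele_pair(reads, ["a1", "a2", "a3"])
print(pair)  # ('a1', 'a2')
```

Exhaustive enumeration is quadratic in the number of candidate alleles per locus. The sketch also ignores mapping uncertainty between loci, which a real pipeline working on hyperpolymorphic, highly similar HLA genes would need to model.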

Pages: 16