HLA Haplotyping from RNA-seq Data Using Hierarchical Read Weighting

Cited by: 40
Authors
Kim, Hyunsung John [1]
Pourmand, Nader [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Baskin Sch Engn, Santa Cruz, CA 95064 USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 06
Keywords
STEM-CELL TRANSPLANTATION; HIGH-RESOLUTION HLA; HIGH-THROUGHPUT; GENE FUSIONS; GENERATION; CANCER; MHC; NOMENCLATURE; POPULATION; ALLELES
DOI
10.1371/journal.pone.0067885
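Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes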
07 ; 0710 ; 09 ;
Abstract
Correctly matching the HLA haplotypes of donor and recipient is essential to the success of allogeneic hematopoietic stem cell transplantation. Current HLA typing methods rely on targeted testing of recognized antigens or sequences. Despite advances in next-generation sequencing, general high-throughput transcriptome sequencing is currently underutilized for HLA haplotyping because of the central difficulty of aligning sequences within this highly variable region. Here we present HLAforest, a method that accurately predicts HLA haplotypes by hierarchically weighting reads and applying an iterative, greedy, top-down pruning technique. HLAforest correctly predicts >99% of allele-group-level (2-digit) haplotypes and 93% of peptide-level (4-digit) haplotypes of the most diverse HLA genes in simulations with read lengths and error rates that model currently available sequencing technology. The method is robust to sequencing error and can predict 99% of allele-group-level haplotypes at substitution rates as high as 8.8%. When applied to data generated from a trio of cell lines, HLAforest corroborated PCR-based HLA haplotyping methods and accurately predicted 16/18 (89%) major class I genes for a daughter-father-mother trio at the peptide level. Major class II genes were predicted with 100% concordance across the daughter-father-mother trio. In fifty HapMap samples with paired-end reads just 37 nucleotides long, HLAforest correctly predicted 96.5% of allele-group-level and 83% of peptide-level HLA haplotypes. In sixteen RNA-seq samples with limited coverage across HLA genes, HLAforest correctly predicted 97.7% of allele-group-level and 85% of peptide-level haplotypes.
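The abstract names two ideas, hierarchical read weighting and greedy top-down pruning, without giving details. The toy Python sketch below illustrates the general shape of such an approach and is not the HLAforest implementation. Everything in it is an assumption made for illustration: the function names (`prefixes`, `call_one`, `call_haplotypes`), the rule that each read splits a weight of 1 equally among the alleles it aligns to, and the pruning rule that discards reads supporting the first call before making the second.

```python
from collections import defaultdict


def prefixes(allele, depth=2):
    """Return the path from gene to the depth-th name field.

    Example: 'A*02:01:01' -> ['A', 'A*02', 'A*02:01'] for depth=2.
    """
    gene, fields = allele.split("*")
    parts = fields.split(":")[:depth]
    return [gene] + [gene + "*" + ":".join(parts[: i + 1]) for i in range(len(parts))]


def call_one(read_hits, gene, depth=2):
    """Greedy top-down walk of the allele-name tree for one gene.

    read_hits is a list of reads; each read is the list of allele names
    it aligned to. ASSUMPTION: a read's weight of 1 is split equally
    among its candidate alleles and summed at every ancestor node.
    """
    weights = defaultdict(float)
    children = defaultdict(set)
    for alleles in read_hits:
        alleles = [a for a in alleles if a.split("*")[0] == gene]
        if not alleles:
            continue
        share = 1.0 / len(alleles)
        for allele in alleles:
            path = prefixes(allele, depth)
            for parent, child in zip(path, path[1:]):
                children[parent].add(child)
                weights[child] += share
    node = gene
    # Descend into the heaviest child until a leaf is reached.
    while children[node]:
        node = max(sorted(children[node]), key=lambda c: weights[c])
    return node if node != gene else None


def call_haplotypes(read_hits, gene, depth=2):
    """Call two alleles: pick one, prune its reads, then pick again.

    ASSUMPTION: if no reads remain after pruning, the sample is
    treated as homozygous for the first call.
    """
    first = call_one(read_hits, gene, depth)
    if first is None:
        return []
    remaining = [
        hits for hits in read_hits
        if not any(prefixes(a, depth)[-1] == first for a in hits)
    ]
    second = call_one(remaining, gene, depth)
    return [first, second if second is not None else first]


# Hypothetical alignments: each inner list is one read's candidate alleles.
reads = [
    ["A*02:01:01", "A*02:06:01"],
    ["A*02:01:01"],
    ["A*02:01:02"],
    ["A*24:02:01"],
    ["A*24:02:01", "A*24:03:01"],
    ["B*07:02:01"],
]
print(call_haplotypes(reads, "A"))
print(call_haplotypes(reads, "B"))
```

In this sketch the tree is implied by the allele names themselves, so the allele-group (2-digit) call is simply the parent of the peptide-level (4-digit) call. A real caller would also need alignment scores, base qualities, and a principled stopping rule, none of which are modeled here.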
Pages: 10