HLA Haplotyping from RNA-seq Data Using Hierarchical Read Weighting

Cited by: 40
Authors
Kim, Hyunsung John [1]
Pourmand, Nader [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Baskin Sch Engn, Santa Cruz, CA 95064 USA
Source
PLOS ONE | 2013 / Volume 8 / Issue 06
Keywords
STEM-CELL TRANSPLANTATION; HIGH-RESOLUTION HLA; HIGH-THROUGHPUT; GENE FUSIONS; GENERATION; CANCER; MHC; NOMENCLATURE; POPULATION; ALLELES;
DOI
10.1371/journal.pone.0067885
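Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code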
07 ; 0710 ; 09 ;
Abstract
Correctly matching the HLA haplotypes of donor and recipient is essential to the success of allogeneic hematopoietic stem cell transplantation. Current HLA typing methods rely on targeted testing of recognized antigens or sequences. Despite advances in next-generation sequencing, general high-throughput transcriptome sequencing is currently underutilized for HLA haplotyping because of the central difficulty of aligning sequences within this highly variable region. Here we present a method, HLAforest, that can accurately predict HLA haplotypes by hierarchically weighting reads and using an iterative, greedy, top-down pruning technique. HLAforest correctly predicts >99% of allele-group-level (2-digit) haplotypes and 93% of peptide-level (4-digit) haplotypes of the most diverse HLA genes in simulations with read lengths and error rates modeling currently available sequencing technology. The method is very robust to sequencing error and can predict 99% of allele-group-level haplotypes with substitution rates as high as 8.8%. When applied to data generated from a trio of cell lines, HLAforest corroborated PCR-based HLA haplotyping methods and accurately predicted 16/18 (89%) major class I genes for a daughter-father-mother trio at the peptide level. Major class II genes were predicted with 100% concordance between the daughter-father-mother trio. In fifty HapMap samples with paired-end reads just 37 nucleotides long, HLAforest predicted 96.5% of allele-group-level HLA haplotypes and 83% of peptide-level haplotypes correctly. In sixteen RNA-seq samples with limited coverage across HLA genes, HLAforest predicted 97.7% of allele-group-level haplotypes and 85% of peptide-level haplotypes correctly.
Pages: 10