HLA Haplotyping from RNA-seq Data Using Hierarchical Read Weighting

Cited by: 40
Authors
Kim, Hyunsung John [1]
Pourmand, Nader [1]
Affiliation
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Baskin Sch Engn, Santa Cruz, CA 95064 USA
Source
PLOS ONE | 2013 / Vol. 8 / No. 6
Keywords
STEM-CELL TRANSPLANTATION; HIGH-RESOLUTION HLA; HIGH-THROUGHPUT; GENE FUSIONS; GENERATION; CANCER; MHC; NOMENCLATURE; POPULATION; ALLELES
DOI
10.1371/journal.pone.0067885
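Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]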
Discipline Classification Codes
07; 0710; 09
Abstract
Correctly matching the HLA haplotypes of donor and recipient is essential to the success of allogeneic hematopoietic stem cell transplantation. Current HLA typing methods rely on targeted testing of recognized antigens or sequences. Despite advances in next-generation sequencing, general high-throughput transcriptome sequencing is currently underutilized for HLA haplotyping, chiefly because sequences are difficult to align within this highly variable region. Here we present HLAforest, a method that accurately predicts HLA haplotypes by hierarchically weighting reads and applying an iterative, greedy, top-down pruning technique. In simulations with read lengths and error rates modeled on currently available sequencing technology, HLAforest correctly predicts >99% of allele-group-level (2-digit) haplotypes and 93% of peptide-level (4-digit) haplotypes of the most diverse HLA genes. The method is highly robust to sequencing error and predicts 99% of allele-group-level haplotypes at substitution rates as high as 8.8%. When applied to data generated from a trio of cell lines, HLAforest corroborated PCR-based HLA haplotyping methods and accurately predicted 16/18 (89%) major class I genes for a daughter-father-mother trio at the peptide level. Major class II genes were predicted with 100% concordance across the daughter-father-mother trio. In fifty HapMap samples with paired-end reads only 37 nucleotides long, HLAforest correctly predicted 96.5% of allele-group-level HLA haplotypes and 83% of peptide-level haplotypes. In sixteen RNA-seq samples with limited coverage across HLA genes, HLAforest correctly predicted 97.7% of allele-group-level haplotypes and 85% of peptide-level haplotypes.
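The abstract describes HLAforest only at a high level. As a rough illustration of what "hierarchically weighting reads" and "iterative, greedy, top-down pruning" could look like, here is a minimal Python sketch. It is not HLAforest's code, and several details are assumptions: the input is a hypothetical dict mapping each read ID to the allele names it aligns to (for a single gene); each read's unit weight is split evenly across the tree nodes it hits; and the second allele is called after removing reads explained by the first. Allele names are assumed to follow the colon-delimited WHO nomenclature (e.g. `A*02:01:01`), where the first field is the allele group ("2-digit") and the first two fields give the protein level ("4-digit"). The function names `allele_prefix`, `level_weights`, `greedy_call`, and `call_genotype` are hypothetical.

```python
from collections import defaultdict


def allele_prefix(name, level):
    """Truncate a colon-delimited allele name to its first `level` fields.

    allele_prefix("A*02:01:01", 1) -> "A*02"     (allele group, "2-digit")
    allele_prefix("A*02:01:01", 2) -> "A*02:01"  (protein level, "4-digit")
    """
    gene, fields = name.split("*", 1)
    return gene + "*" + ":".join(fields.split(":")[:level])


def level_weights(read_hits, level):
    """Give each read a total weight of 1, split evenly over the distinct
    tree nodes it aligns to at this level (an assumed weighting rule)."""
    weights = defaultdict(float)
    for hits in read_hits.values():
        nodes = {allele_prefix(a, level) for a in hits}
        for node in nodes:
            weights[node] += 1.0 / len(nodes)
    return weights


def greedy_call(read_hits, depth=2):
    """Descend the allele tree, keeping only the heaviest node per level.

    After each choice, hits outside the chosen node are pruned and the
    surviving reads are re-weighted at the next level down.
    """
    call = None
    for level in range(1, depth + 1):
        weights = level_weights(read_hits, level)
        if not weights:
            break
        call = max(weights, key=weights.get)  # ties are broken arbitrarily
        read_hits = {
            r: [a for a in hits if allele_prefix(a, level) == call]
            for r, hits in read_hits.items()
        }
        read_hits = {r: hits for r, hits in read_hits.items() if hits}
    return call


def call_genotype(read_hits, depth=2):
    """Call one allele, drop the reads it explains, then call a second.

    Assumption: if no reads are left over, the sample is reported as
    homozygous for the first call.
    """
    first = greedy_call(read_hits, depth)
    if first is None:
        return None
    leftover = {
        r: hits for r, hits in read_hits.items()
        if not any(allele_prefix(a, depth) == first for a in hits)
    }
    second = greedy_call(leftover, depth) or first
    return first, second


# Toy example: made-up alignments for one gene, not data from the paper.
read_hits = {
    "r1": ["A*02:01:01", "A*02:06"],
    "r2": ["A*02:01:01"],
    "r3": ["A*02:01:01", "A*24:02"],
    "r4": ["A*24:02"],
    "r5": ["A*24:02", "A*24:03"],
    "r6": ["A*02:01:01"],
}
print(call_genotype(read_hits))  # ('A*02:01', 'A*24:02')
```

Because pruning happens before re-weighting, a read such as `r3` that splits its weight between two allele groups at the top level contributes its full weight within the chosen group at the next level. The published method may weight, prune, and handle homozygosity differently.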
Pages: 10