Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction

Cited by: 60
Authors
Frankish, Adam [1]
Uszczynska, Barbara [2]
Ritchie, Graham R. S. [1, 3]
Gonzalez, Jose M. [1]
Pervouchine, Dmitri [2, 4, 5]
Petryszak, Robert [3]
Mudge, Jonathan M. [1]
Fonseca, Nuno [3]
Brazma, Alvis [3]
Guigo, Roderic [2]
Harrow, Jennifer [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Ctr Genom Regulat, Barcelona, Catalonia, Spain
[3] European Bioinformat Inst, European Mol Biol Lab, Cambridge CB10 1SD, England
[4] Fac Bioengn & Bioinformat, Moscow 119992, Russia
[5] Moscow MV Lomonosov State Univ, Moscow, Russia
Source
BMC GENOMICS | 2015 / Vol. 16
Funding
Wellcome Trust (UK); National Institutes of Health (USA);
Keywords
MESSENGER-RNA STABILITY; GENOME ANNOTATION; INTRON RETENTION; TRANSCRIPTS; BROWSER;
DOI
10.1186/1471-2164-16-S8-S2
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation, and a wide range of tools is available to this end. McCarthy et al. recently demonstrated large differences in the prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.
Results: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs and novel exons, and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data, we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation when GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically relevant filter for further reducing the complexity of the transcriptome.
Conclusions: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.
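The two kinds of comparison summarised in the abstract (features shared by or unique to each geneset, and the fraction of variants annotated discordantly) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the exon tuple layout, and all coordinates, variant IDs and consequence calls below are hypothetical toy values chosen only to show the bookkeeping.

```python
from typing import Dict, Set, Tuple

# Hypothetical exon representation: (chromosome, start, end, strand).
Exon = Tuple[str, int, int, str]


def compare_exon_sets(a: Set[Exon], b: Set[Exon]) -> Dict[str, int]:
    """Count exons shared by both annotation sets and unique to each."""
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


def discordance_rate(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> float:
    """Fraction of variants whose predicted consequence differs between sets.

    A variant annotated in only one set counts as discordant
    (an assumption made for this sketch).
    """
    variants = set(calls_a) | set(calls_b)
    if not variants:
        return 0.0
    discordant = sum(1 for v in variants if calls_a.get(v) != calls_b.get(v))
    return discordant / len(variants)


# Toy annotation sets (invented coordinates, not real GENCODE or RefSeq data).
set_a = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")}
set_b = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")}
exon_summary = compare_exon_sets(set_a, set_b)

# Toy consequence calls per variant under each reference set.
calls_a = {
    "var1": "stop_gained",
    "var2": "missense_variant",
    "var3": "splice_donor_variant",
    "var4": "3_prime_UTR_variant",
}
calls_b = {
    "var1": "stop_gained",
    "var2": "missense_variant",
    "var3": "intron_variant",
}
rate = discordance_rate(calls_a, calls_b)

print(exon_summary)
print(f"discordant fraction: {rate:.2f}")
```

In this toy case, the larger set contributes one exon the smaller set lacks, and two of the four variants receive different calls, which mirrors on a small scale how a richer reference set can change variant annotation.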
Pages: 11
Related papers
20 items in total
  • [1] Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction
    Adam Frankish
    Barbara Uszczynska
    Graham RS Ritchie
    Jose M Gonzalez
    Dmitri Pervouchine
    Robert Petryszak
    Jonathan M Mudge
    Nuno Fonseca
    Alvis Brazma
    Roderic Guigo
    Jennifer Harrow
    [J]. BMC Genomics, 16
  • [2] Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow
    Wright, James C.
    Mudge, Jonathan
    Weisser, Hendrik
    Barzine, Mitra P.
    Gonzalez, Jose M.
    Brazma, Alvis
    Choudhary, Jyoti S.
    Harrow, Jennifer
    [J]. NATURE COMMUNICATIONS, 2016, 7
  • [3] Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow
    James C. Wright
    Jonathan Mudge
    Hendrik Weisser
    Mitra P. Barzine
    Jose M. Gonzalez
    Alvis Brazma
    Jyoti S. Choudhary
    Jennifer Harrow
    [J]. Nature Communications, 7
  • [4] Enhanced molecular consequence prediction and variant annotation with the Ensembl Variant Effect Predictor
    Lemos, Diana
    Saraiva-Agostinho, Nuno
    Orimoloye, Ola Austine
    Azov, Andrey
    Marques-Coelho, Diego
    Hossain, S. Nakib
    Schuilenburg, Helen
    Allen, Jamie
    Trevanion, Stephen
    Hunt, Sarah
    Flicek, Paul
    Cunningham, Fiona
    [J]. EUROPEAN JOURNAL OF HUMAN GENETICS, 2023, 31 : 607 - 608
  • [5] 'Deep dive' disease gene re-annotation in GENCODE: identifying and reporting new variant interpretations of likely clinical relevance.
    Mudge, J. M.
    Hunt, T.
    Gonzalez, J. M.
    Frankish, A.
    [J]. EUROPEAN JOURNAL OF HUMAN GENETICS, 2020, 28 (SUPPL 1) : 668 - 669
  • [6] Large-scale prokaryotic gene prediction and comparison to genome annotation
    Nielsen, P
    Krogh, A
    [J]. BIOINFORMATICS, 2005, 21 (24) : 4322 - 4329
  • [7] SUsPECT: a pipeline for variant effect prediction based on custom long-read transcriptomes for improved clinical variant annotation
    Salz, Renee
    Saraiva-Agostinho, Nuno
    Vorsteveld, Emil
    van der Made, Caspar I.
    Kersten, Simone
    Stemerdink, Merel
    Allen, Jamie
    Volders, Pieter-Jan
    Hunt, Sarah E.
    Hoischen, Alexander
    't Hoen, Peter A. C.
    [J]. BMC GENOMICS, 2023, 24 (01)
  • [8] SUsPECT: a pipeline for variant effect prediction based on custom long-read transcriptomes for improved clinical variant annotation
    Renee Salz
    Nuno Saraiva-Agostinho
    Emil Vorsteveld
    Caspar I. van der Made
    Simone Kersten
    Merel Stemerdink
    Jamie Allen
    Pieter-Jan Volders
    Sarah E. Hunt
    Alexander Hoischen
    Peter A.C. ’t Hoen
    [J]. BMC Genomics, 24
  • [9] Genome Assembly Has a Major Impact on Gene Content: A Comparison of Annotation in Two Bos Taurus Assemblies
    Florea, Liliana
    Souvorov, Alexander
    Kalbfleisch, Theodore S.
    Salzberg, Steven L.
    [J]. PLOS ONE, 2011, 6 (06):
  • [10] Genetic Analysis Workshop 18 single-nucleotide variant prioritization based on protein impact, sequence conservation, and gene annotation
    Thomas Nalpathamkalam
    Andriy Derkach
    Andrew D Paterson
    Daniele Merico
    [J]. BMC Proceedings, 8 (Suppl 1)