GENCODE: The reference human genome annotation for The ENCODE Project

Cited by: 3287
Authors
Harrow, Jennifer [1]
Frankish, Adam [1]
Gonzalez, Jose M. [1]
Tapanari, Electra [1]
Diekhans, Mark [2]
Kokocinski, Felix [1]
Aken, Bronwen L. [1]
Barrell, Daniel [1]
Zadissa, Amonida [1]
Searle, Stephen [1]
Barnes, If [1]
Bignell, Alexandra [1]
Boychenko, Veronika [1]
Hunt, Toby [1]
Kay, Mike [1]
Mukherjee, Gaurab [1]
Rajan, Jeena [1]
Despacio-Reyes, Gloria [1]
Saunders, Gary [1]
Steward, Charles [1]
Harte, Rachel [2]
Lin, Michael [3]
Howald, Cedric [4]
Tanzer, Andrea [5, 6]
Derrien, Thomas [4]
Chrast, Jacqueline [4]
Walters, Nathalie [4]
Balasubramanian, Suganthi [7]
Pei, Baikang [7]
Tress, Michael [8]
Manuel Rodriguez, Jose [8]
Ezkurdia, Iakes [8]
van Baren, Jeltje [9]
Brent, Michael [9]
Haussler, David [2]
Kellis, Manolis [3]
Valencia, Alfonso [8]
Reymond, Alexandre [4]
Gerstein, Mark [7]
Guigo, Roderic [5, 6]
Hubbard, Tim J. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Lausanne, Ctr Integrat Genom, CH-1015 Lausanne, Switzerland
[5] Ctr Genom Regulat CRG, Barcelona 08003, Catalonia, Spain
[6] UPF, Barcelona 08003, Catalonia, Spain
[7] Yale Univ, New Haven, CT 06520 USA
[8] Spanish Natl Canc Res Ctr CNIO, E-28029 Madrid, Spain
[9] Ctr Genome Sci & Syst Biol, St Louis, MO 63130 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA); Wellcome Trust (UK);
Keywords
GENE-EXPRESSION; NONCODING RNAS; IDENTIFICATION; SEQUENCES; REVEALS; PSEUDOGENE; PREDICTION; TOPOLOGY; TRANSCRIPTION; COMPLEXITY;
DOI
10.1101/gr.135350.111
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Pages: 1760-1774
Page count: 15