GENCODE: The reference human genome annotation for The ENCODE Project

Cited by: 3287
Authors
Harrow, Jennifer [1]
Frankish, Adam [1]
Gonzalez, Jose M. [1]
Tapanari, Electra [1]
Diekhans, Mark [2]
Kokocinski, Felix [1]
Aken, Bronwen L. [1]
Barrell, Daniel [1]
Zadissa, Amonida [1]
Searle, Stephen [1]
Barnes, If [1]
Bignell, Alexandra [1]
Boychenko, Veronika [1]
Hunt, Toby [1]
Kay, Mike [1]
Mukherjee, Gaurab [1]
Rajan, Jeena [1]
Despacio-Reyes, Gloria [1]
Saunders, Gary [1]
Steward, Charles [1]
Harte, Rachel [2]
Lin, Michael [3]
Howald, Cedric [4]
Tanzer, Andrea [5, 6]
Derrien, Thomas [4]
Chrast, Jacqueline [4]
Walters, Nathalie [4]
Balasubramanian, Suganthi [7]
Pei, Baikang [7]
Tress, Michael [8]
Manuel Rodriguez, Jose [8]
Ezkurdia, Iakes [8]
van Baren, Jeltje [9]
Brent, Michael [9]
Haussler, David [2]
Kellis, Manolis [3]
Valencia, Alfonso [8]
Reymond, Alexandre [4]
Gerstein, Mark [7]
Guigo, Roderic [5, 6]
Hubbard, Tim J. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Lausanne, Ctr Integrat Genom, CH-1015 Lausanne, Switzerland
[5] Ctr Genom Regulat CRG, Barcelona 08003, Catalonia, Spain
[6] UPF, Barcelona 08003, Catalonia, Spain
[7] Yale Univ, New Haven, CT 06520 USA
[8] Spanish Natl Canc Res Ctr CNIO, E-28029 Madrid, Spain
[9] Ctr Genome Sci & Syst Biol, St Louis, MO 63130 USA
Funding
US National Institutes of Health; US National Science Foundation; Wellcome Trust (UK)
Keywords
GENE-EXPRESSION; NONCODING RNAS; IDENTIFICATION; SEQUENCES; REVEALS; PSEUDOGENE; PREDICTION; TOPOLOGY; TRANSCRIPTION; COMPLEXITY;
DOI
10.1101/gr.135350.111
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Pages: 1760-1774
Page count: 15