GENCODE: The reference human genome annotation for The ENCODE Project

Cited by: 3287
Authors
Harrow, Jennifer [1 ]
Frankish, Adam [1 ]
Gonzalez, Jose M. [1 ]
Tapanari, Electra [1 ]
Diekhans, Mark [2 ]
Kokocinski, Felix [1 ]
Aken, Bronwen L. [1 ]
Barrell, Daniel [1 ]
Zadissa, Amonida [1 ]
Searle, Stephen [1 ]
Barnes, If [1 ]
Bignell, Alexandra [1 ]
Boychenko, Veronika [1 ]
Hunt, Toby [1 ]
Kay, Mike [1 ]
Mukherjee, Gaurab [1 ]
Rajan, Jeena [1 ]
Despacio-Reyes, Gloria [1 ]
Saunders, Gary [1 ]
Steward, Charles [1 ]
Harte, Rachel [2 ]
Lin, Michael [3 ]
Howald, Cedric [4 ]
Tanzer, Andrea [5 ,6 ]
Derrien, Thomas [4 ]
Chrast, Jacqueline [4 ]
Walters, Nathalie [4 ]
Balasubramanian, Suganthi [7 ]
Pei, Baikang [7 ]
Tress, Michael [8 ]
Manuel Rodriguez, Jose [8 ]
Ezkurdia, Iakes [8 ]
van Baren, Jeltje [9 ]
Brent, Michael [9 ]
Haussler, David [2 ]
Kellis, Manolis [3 ]
Valencia, Alfonso [8 ]
Reymond, Alexandre [4 ]
Gerstein, Mark [7 ]
Guigo, Roderic [5 ,6 ]
Hubbard, Tim J. [1 ]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Lausanne, Ctr Integrat Genom, CH-1015 Lausanne, Switzerland
[5] Ctr Genom Regulat CRG, Barcelona 08003, Catalonia, Spain
[6] UPF, Barcelona 08003, Catalonia, Spain
[7] Yale Univ, New Haven, CT 06520 USA
[8] Spanish Natl Canc Res Ctr CNIO, E-28029 Madrid, Spain
[9] Ctr Genome Sci & Syst Biol, St Louis, MO 63130 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA); Wellcome Trust (UK)
Keywords
GENE-EXPRESSION; NONCODING RNAS; IDENTIFICATION; SEQUENCES; REVEALS; PSEUDOGENE; PREDICTION; TOPOLOGY; TRANSCRIPTION; COMPLEXITY;
DOI
10.1101/gr.135350.111
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC Genes and RefSeq. It also has the most comprehensive publicly available annotation of long noncoding RNA (lncRNA) loci, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, suggesting that they may be unannotated long noncoding RNA loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Pages: 1760-1774
Number of pages: 15