EGASP: the human ENCODE genome annotation assessment project

Cited by: 82
Authors
Guigo, Roderic [1]
Flicek, Paul
Abril, Josep F.
Reymond, Alexandre
Lagarde, Julien
Denoeud, France
Antonarakis, Stylianos
Ashburner, Michael
Bajic, Vladimir B.
Birney, Ewan
Castelo, Robert
Eyras, Eduardo
Ucla, Catherine
Gingeras, Thomas R.
Harrow, Jennifer
Hubbard, Tim
Lewis, Suzanna E.
Reese, Martin G.
Affiliations
[1] Univ Pompeu Fabra, Ctr Reg Genom, Inst Municipal Invest Med, E-08003 Barcelona, Spain
[2] European Bioinformat Inst, Cambridge CB10 1SD, England
[3] Univ Lausanne, Ctr Integrat Genom, Lausanne, Switzerland
[4] Univ Geneva, Sch Med, Univ Hosp Geneva, CH-1211 Geneva, Switzerland
[5] Univ Cambridge, Dept Genet, Cambridge CB2 3EH, England
[6] Univ Western Cape, S African Natl Bioinformat Inst, ZA-7535 Bellville, South Africa
[7] Affymetrix Inc, Santa Clara, CA 95051 USA
[8] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[9] Univ Calif Berkeley, Dept Mol & Cellular Biol, Berkeley, CA 94792 USA
[10] Omicia Inc, Emeryville, CA 94608 USA
DOI
10.1186/gb-2006-7-s1-s2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: to assess the accuracy of computational methods for predicting protein-coding genes, and to assess the overall completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups before the submission deadline, so their predictions were blind and an external advisory committee could perform a fair assessment.
Results: The best methods predicted at least one gene transcript correctly for close to 70% of the annotated genes. Nevertheless, multiple-transcript accuracy, which takes alternative splicing into account, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the 221 selected computationally predicted exons outside of the existing annotation could be verified.
Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when they are scaled up to the entire human genome sequence.
Pages: 31