nGASP - the nematode genome annotation assessment project

Cited by: 31
Authors
Coghlan, Avril [2]
Fiedler, Tristan J. [3]
McKay, Sheldon J. [1]
Flicek, Paul [4]
Harris, Todd W. [1]
Blasiar, Darin [5]
Stein, Lincoln D. [1]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[3] Florida Inst Technol, Dept Biol Sci, Melbourne, FL 32901 USA
[4] European Bioinformat Inst, Cambridge CB10 1SD, England
[5] Washington Univ, Sch Med, St Louis, MO 63108 USA
Funding
Wellcome Trust (UK); National Institutes of Health (USA)
DOI
10.1186/1471-2105-9-549
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.
Results: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript and protein alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene-level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same as the accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.
Conclusion: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
Pages: 13
Related papers
  • [1] nGASP – the nematode genome annotation assessment project
    Avril Coghlan
    Tristan J Fiedler
    Sheldon J McKay
    Paul Flicek
    Todd W Harris
    Darin Blasiar
    Lincoln D Stein
    [J]. BMC Bioinformatics, 9
  • [2] EGASP: the human ENCODE genome annotation assessment project
    Guigo, Roderic
    Flicek, Paul
    Abril, Josep F.
    Reymond, Alexandre
    Lagarde, Julien
    Denoeud, France
    Antonarakis, Stylianos
    Ashburner, Michael
    Bajic, Vladimir B.
    Birney, Ewan
    Castelo, Robert
    Eyras, Eduardo
    Ucla, Catherine
    Gingeras, Thomas R.
    Harrow, Jennifer
    Hubbard, Tim
    Lewis, Suzanna E.
    Reese, Martin G.
    [J]. GENOME BIOLOGY, 2006, 7 (Suppl 1)
  • [3] EGASP: the human ENCODE Genome Annotation Assessment Project
    Roderic Guigó
    Paul Flicek
    Josep F Abril
    Alexandre Reymond
    Julien Lagarde
    France Denoeud
    Stylianos Antonarakis
    Michael Ashburner
    Vladimir B Bajic
    Ewan Birney
    Robert Castelo
    Eduardo Eyras
    Catherine Ucla
    Thomas R Gingeras
    Jennifer Harrow
    Tim Hubbard
    Suzanna E Lewis
    Martin G Reese
    [J]. Genome Biology, 7
  • [4] A biologist's view of the Drosophila genome annotation assessment project
    Ashburner, M
    [J]. GENOME RESEARCH, 2000, 10 (04) : 391 - 393
  • [5] Mouse genome annotation by the RefSeq project
    McGarvey, Kelly M.
    Goldfarb, Tamara
    Cox, Eric
    Farrell, Catherine M.
    Gupta, Tripti
    Joardar, Vinita S.
    Kodali, Vamsi K.
    Murphy, Michael R.
    O'Leary, Nuala A.
    Pujar, Shashikant
    Rajput, Bhanu
    Rangwala, Sanjida H.
    Riddick, Lillian D.
    Webb, David
    Wright, Mathew W.
    Murphy, Terence D.
    Pruitt, Kim D.
    [J]. MAMMALIAN GENOME, 2015, 26 (9-10) : 379 - 390
  • [6] Mouse genome annotation by the RefSeq project
    Kelly M. McGarvey
    Tamara Goldfarb
    Eric Cox
    Catherine M. Farrell
    Tripti Gupta
    Vinita S. Joardar
    Vamsi K. Kodali
    Michael R. Murphy
    Nuala A. O’Leary
    Shashikant Pujar
    Bhanu Rajput
    Sanjida H. Rangwala
    Lillian D. Riddick
    David Webb
    Mathew W. Wright
    Terence D. Murphy
    Kim D. Pruitt
    [J]. Mammalian Genome, 2015, 26 : 379 - 390
  • [7] Genome assembly and annotation of the mermithid nematode Mermis nigrescens
    Bhattarai, Upendra R.
    Poulin, Robert
    Gemmell, Neil J.
    Dowle, Eddy
    [J]. G3-GENES GENOMES GENETICS, 2024, 14 (04)
  • [8] An Educational Bioinformatics Project to Improve Genome Annotation
    Amatore, Zoie
    Gunn, Susan
    Harris, Laura K.
    [J]. FRONTIERS IN MICROBIOLOGY, 2020, 11
  • [9] THE OYSTER GENOME PROJECT: AN UPDATE ON ASSEMBLY AND ANNOTATION
    Zhang, Guofan
    Guo, Ximing
    Li, Li
    Xu, Fei
    Wang, Xiaotong
    Qi, Haigang
    Zhang, Linlin
    Que, Huayong
    Wu, Hougang
    Wang, Shihuan
    Hedgecock, Dennis
    Gaffney, Patrick M.
    Luo, Ruibang
    Fang, Xiaodong
    Wang, Jun
    [J]. JOURNAL OF SHELLFISH RESEARCH, 2011, 30 (02) : 567 - 567
  • [10] The functional annotation of the sheep genome project.
    Murdoch, Brenda M.
    [J]. JOURNAL OF ANIMAL SCIENCE, 2019, 97 : 16 - 16