nGASP - the nematode genome annotation assessment project

Cited by: 31
Authors
Coghlan, Avril [2]
Fiedler, Tristan J. [3]
McKay, Sheldon J. [1]
Flicek, Paul [4]
Harris, Todd W. [1]
Blasiar, Darin [5]
Stein, Lincoln D. [1]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[3] Florida Inst Technol, Dept Biol Sci, Melbourne, FL 32901 USA
[4] European Bioinformat Inst, Cambridge CB10 1SD, England
[5] Washington Univ, Sch Med, St Louis, MO 63108 USA
Funding
Wellcome Trust (UK); National Institutes of Health (USA)
DOI
10.1186/1471-2105-9-549
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.
Results: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene-level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.
Conclusion: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
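The gene-level sensitivity and specificity quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of reference genes predicted exactly and "specificity" is the fraction of predicted genes that exactly match a reference gene (that is, precision). The exact matching rules used by nGASP are not given in the abstract, so the exon-set comparison, the function name `gene_level_accuracy`, and all coordinates below are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# An exon is (sequence name, start, end, strand); a gene model is the
# set of its exons. This representation is an assumption for illustration.
Exon = Tuple[str, int, int, str]
Gene = FrozenSet[Exon]


def gene(*exons: Exon) -> Gene:
    """Build a hashable gene model from its exons."""
    return frozenset(exons)


def gene_level_accuracy(reference: Iterable[Gene],
                        predicted: Iterable[Gene]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level.

    A prediction counts as correct only if its exon set is identical
    to that of a reference gene (assumed matching rule).
    """
    ref = set(reference)
    pred = set(predicted)
    true_pos = len(ref & pred)
    sensitivity = true_pos / len(ref) if ref else 0.0
    # "Specificity" in the gene-prediction sense: TP / (TP + FP).
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical toy data: two reference genes, four predictions, one exact match.
reference = [
    gene(("I", 100, 200, "+"), ("I", 300, 400, "+")),
    gene(("I", 1000, 1200, "-")),
]
predicted = [
    gene(("I", 100, 200, "+"), ("I", 300, 400, "+")),  # exact match
    gene(("I", 1000, 1150, "-")),                      # wrong exon boundary
    gene(("I", 2000, 2100, "+")),                      # no reference gene here
    gene(("I", 3000, 3300, "-")),                      # no reference gene here
]

sn, sp = gene_level_accuracy(reference, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under this definition, a gene-finder that over-predicts can reach high sensitivity while its specificity stays low, which is the pattern the reported 78% and 42% figures describe.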
Pages: 13