Automated genome sequence analysis and annotation

Cited by: 149
Authors
Andrade, MA
Brown, NP
Leroy, C
Hoersch, S
de Daruvar, A
Reich, C
Franchini, A
Tamames, J
Valencia, A
Ouzounis, C
Sander, C
Affiliations
[1] European Bioinformat Inst, Cambridge CB10 1SD, England
[2] CSIC, CNB, Prot Design Grp, M-28049 Madrid, Spain
DOI
10.1093/bioinformatics/15.5.391
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.
Results: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of overinterpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.
Pages: 391-412
Page count: 22