RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes

Cited by: 12
Authors
Haft, Daniel H. [1]
Badretdin, Azat [1]
Coulouris, George [1]
Dicuccio, Michael [1]
Durkin, A. Scott [1]
Jovenitti, Eric [1]
Li, Wenjun [1]
Mersha, Megdelawit [1]
O'Neill, Kathleen R. [1]
Virothaisakun, Joel [1]
Thibaud-Nissen, Francoise [1]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
DOI
10.1093/nar/gkad988
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past three years, we have expanded the diversity of the RefSeq collection by including the best-quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, spread across 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.
Pages: D762-D769
Page count: 8