An Integrative Method for Identifying the Over-Annotated Protein-Coding Genes in Microbial Genomes

Cited by: 16
Authors
Yu, Jia-Feng [1 ,2 ]
Xiao, Ke [1 ]
Jiang, Dong-Ke [1 ]
Guo, Jing [1 ]
Wang, Ji-Hua [2 ]
Sun, Xiao [1 ]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing 210096, Jiangsu, Peoples R China
[2] Dezhou Univ, Dept Phys, Shandong Prov Key Lab Biophys Funct Macromol, Dezhou 253023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
protein-coding gene; microbial genome; re-annotation; horizontal gene transfer; HORIZONTALLY TRANSFERRED GENES; RE-ANNOTATION; GRAPHICAL REPRESENTATION; ESCHERICHIA-COLI; DNA-SEQUENCE; CODON USAGE; BACTERIAL; ORFS; IDENTIFICATION; REANNOTATION;
DOI
10.1093/dnares/dsr030
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Falsely annotated protein-coding genes are considered one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable because they increase the number of false negatives. Furthermore, there is no webserver or software specifically devised for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. This extremely high accuracy indicates that the presented algorithm is effective at differentiating protein-coding genes from non-coding open reading frames. Extensive analyses show that the predictions are reliable and that the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in the detection of horizontal gene transfer. The webserver for the proposed algorithm is freely accessible at www.cbi.seu.edu.cn/RPGM.
Pages: 435-449
Page count: 15
Related papers
50 in total
  • [1] A Review of the Computational Methods for Identifying the Over-Annotated Genes and Missing Genes in Microbial Genomes
    Yu, Jia-Feng
    Guo, Zhen-Zhen
    Sun, Xiao
    Wang, Ji-Hua
    CURRENT BIOINFORMATICS, 2014, 9 (02) : 147 - 154
  • [2] Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58
    于家峰
    隋天翔
    王红梅
    王春玲
    荆莉
    王吉华
    Chinese Physics B, 2015, (12) : 102 - 108
  • [3] Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58
    Yu Jia-Feng
    Sui Tian-Xiang
    Wang Hong-Mei
    Wang Chun-Ling
    Jing Li
    Wang Ji-Hua
    CHINESE PHYSICS B, 2015, 24 (12)
  • [4] PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes
    Paul, Sandip
    Bhardwaj, Archana
    Bag, Sumit K.
    Sokurenko, Evgeni V.
    Chattopadhyay, Sujay
    GENOMICS, 2015, 106 (06) : 367 - 372
  • [5] ANNOTATION OF PROTEIN-CODING GENES IN FUNGAL GENOMES
    Martinez, Diego
    Grigoriev, Igor
    Salamov, Asaf
    APPLIED AND COMPUTATIONAL MATHEMATICS, 2010, 9 : 56 - 65
  • [6] Identifying protein-coding genes in genomic sequences
    Harrow, Jennifer
    Nagy, Alinda
    Reymond, Alexandre
    Alioto, Tyler
    Patthy, Laszlo
    Antonarakis, Stylianos E.
    Guigo, Roderic
    GENOME BIOLOGY, 2009, 10 (01) : 201
  • [7] Identifying protein-coding genes in genomic sequences
    Jennifer Harrow
    Alinda Nagy
    Alexandre Reymond
    Tyler Alioto
    Laszlo Patthy
    Stylianos E Antonarakis
    Roderic Guigó
    Genome Biology, 10
  • [8] Accurate annotation of protein-coding genes in mitochondrial genomes
    Al Arab, Marwa
    zu Siederdissen, Christian Hoener
    Tout, Kifah
    Sahyoun, Abdullah H.
    Stadler, Peter F.
    Bernt, Matthias
    MOLECULAR PHYLOGENETICS AND EVOLUTION, 2017, 106 : 209 - 216
  • [9] On the convergence of a clustering algorithm for protein-coding regions in microbial genomes
    Baldi, P
    BIOINFORMATICS, 2000, 16 (04) : 367 - 371
  • [10] Self-identification of protein-coding regions in microbial genomes
    Audic, S
    Claverie, JM
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (17) : 10026 - 10031