A Markovian analysis of bacterial genome sequence constraints

Cited by: 7
Authors
Skewes, Aaron D. [1 ,2 ]
Welch, Roy D. [1 ]
Affiliations
[1] Syracuse Univ, Dept Biol, Syracuse, NY 13244 USA
[2] Syracuse Univ, Dept Math, Syracuse, NY 13244 USA
Source
PEERJ | 2013 / Volume 1
Funding
National Science Foundation (USA);
Keywords
Sequencing; Markov model; rRNA; Bacteria; Topology;
DOI
10.7717/peerj.127
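Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes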
07 ; 0710 ; 09 ;
Abstract
The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon position within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate position must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model, and that an increase in the order of the model (e.g., third-, fourth-, ... order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes, based on their third-order transition probability matrices, shares approximately 25% similarity with a tree based on sequence homologies of 16S rRNA sequences. This congruence with the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes.
Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate the convergent evolution of some bacteria.
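To make the notion of a k-th order transition probability matrix concrete, the following is a minimal Python sketch, not the authors' method. It estimates P(next nucleotide | preceding k nucleotides) by simple counting and reports the empirical conditional entropy at a given order. The function names (`context_counts`, `transition_probabilities`, `conditional_entropy`) are hypothetical, and the sketch assumes a plain uppercase string over A, C, G, T with no ambiguity codes.

```python
from collections import Counter, defaultdict
from math import log2

ALPHABET = "ACGT"  # assumption: sequence contains only these four symbols


def context_counts(seq, order):
    """Count how often each nucleotide follows each length-`order` context."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def transition_probabilities(seq, order):
    """Estimate P(next | context) for every context observed in `seq`."""
    probs = {}
    for context, nxt in context_counts(seq, order).items():
        total = sum(nxt.values())
        probs[context] = {base: nxt[base] / total for base in ALPHABET}
    return probs


def conditional_entropy(seq, order):
    """Empirical entropy (bits) of the next nucleotide given its context."""
    counts = context_counts(seq, order)
    n = len(seq) - order  # number of (context, next) observations
    entropy = 0.0
    for nxt in counts.values():
        total = sum(nxt.values())
        for c in nxt.values():
            entropy -= (c / n) * log2(c / total)
    return entropy
```

Comparing `conditional_entropy(seq, 2)` with `conditional_entropy(seq, 3)` on a real chromosome gives a rough picture of whether a third-order model captures structure beyond second order. Empirical entropy never increases with order, so a drop on its own proves nothing; a statistical test that accounts for the extra parameters, such as the one the abstract refers to, is still required.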
Pages: 20