Predicting gene sequences with AI to study codon usage patterns

Times cited: 0
Authors
Sidi, Tomer [1]
Bahiri-Elitzur, Shir [2]
Tuller, Tamir [2, 3]
Kolodny, Rachel [1]
Affiliations
[1] Univ Haifa, Dept Comp Sci, IL-3303221 Haifa, Israel
[2] Tel Aviv Univ, Dept Biomed Engn, IL-6139001 Tel Aviv, Israel
[3] Tel Aviv Univ, Sagol Sch Neurosci, IL-6139001 Tel Aviv, Israel
Keywords
codons prediction; codon AI model; mimicking codons; PROTEIN-STRUCTURE; TRANSLATION; EXPRESSION; BIAS; CONSERVATION; ELONGATION; OPTIMALITY; EFFICIENCY; EVOLUTION; SELECTION
DOI
10.1073/pnas.2410003121
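Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]
Discipline classification codes
07; 0710; 09
Abstract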
Selective pressure acts on the codon use, optimizing multiple, overlapping signals that are only partially understood. We trained AI models to predict codons given their amino acid sequence in the eukaryotes Saccharomyces cerevisiae and Schizosaccharomyces pombe and the bacteria Escherichia coli and Bacillus subtilis to study the extent to which we can learn patterns in naturally occurring codons to improve predictions. We trained our models on a subset of the proteins and evaluated their predictions on large, separate sets of proteins of varying lengths and expression levels. Our models significantly outperformed naïve frequency-based approaches, demonstrating that there are learnable dependencies in evolutionary-selected codon usage. The prediction accuracy advantage of our models is greater for highly expressed genes and is greater in bacteria than eukaryotes, supporting the hypothesis that there is a monotonic relationship between selective pressure for complex codon patterns and effective population size. In S. cerevisiae and bacteria, our models were more accurate for longer proteins, suggesting that the learned patterns may be related to cotranslational folding. Gene functionality and conservation were also important determinants that affect the performance of our models. Finally, we showed that using information encoded in homologous proteins has only a minor effect on prediction accuracy, perhaps due to complex codon-usage codes in genes undergoing rapid evolution. Our study employing contemporary AI methods offers a unique perspective and a deep-learning-based prediction tool for evolutionary-selected codons. We hope that these can be useful to optimize codon usage in endogenous and heterologous proteins.
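To make the "naïve frequency-based approaches" mentioned in the abstract concrete, here is a minimal sketch of one such baseline: for each amino acid, always predict the codon seen most often in the training genes. The abstract does not say which baselines the authors used, so this is an assumption, not their method. The function names and the toy sequences are hypothetical and are not data from the paper.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def fit_most_frequent_codon(training_cds):
    """Map each amino acid to its most frequent codon in the training sequences."""
    counts = defaultdict(Counter)
    for cds in training_cds:
        for codon in split_codons(cds):
            counts[CODON_TABLE[codon]][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def codon_accuracy(baseline, cds):
    """Fraction of codons in `cds` that the baseline predicts exactly."""
    codons = split_codons(cds)
    hits = sum(baseline.get(CODON_TABLE[codon]) == codon for codon in codons)
    return hits / len(codons)


# Toy, made-up sequences purely for illustration (hypothetical data).
train = ["ATGGCTGCTGCCAAA"]   # M A A A K, with GCT the most common Ala codon
held_out = "ATGGCCGCTAAG"     # M A A K
baseline = fit_most_frequent_codon(train)
print(codon_accuracy(baseline, held_out))
```

A learned model is only informative about codon context if it beats this kind of per-amino-acid lookup on held-out genes, which is the comparison the abstract reports.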
Pages: 12