Predicting gene sequences with AI to study codon usage patterns

Authors
Sidi, Tomer [1]
Bahiri-Elitzur, Shir [2]
Tuller, Tamir [2, 3]
Kolodny, Rachel [1]
Affiliations
[1] Univ Haifa, Dept Comp Sci, IL-3303221 Haifa, Israel
[2] Tel Aviv Univ, Dept Biomed Engn, IL-6139001 Tel Aviv, Israel
[3] Tel Aviv Univ, Sagol Sch Neurosci, IL-6139001 Tel Aviv, Israel
Keywords
codons prediction; codon AI model; mimicking codons; PROTEIN-STRUCTURE; TRANSLATION; EXPRESSION; BIAS; CONSERVATION; ELONGATION; OPTIMALITY; EFFICIENCY; EVOLUTION; SELECTION
DOI
10.1073/pnas.2410003121
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract
Selective pressure acts on the codon use, optimizing multiple, overlapping signals that are only partially understood. We trained AI models to predict codons given their amino acid sequence in the eukaryotes Saccharomyces cerevisiae and Schizosaccharomyces pombe and the bacteria Escherichia coli and Bacillus subtilis to study the extent to which we can learn patterns in naturally occurring codons to improve predictions. We trained our models on a subset of the proteins and evaluated their predictions on large, separate sets of proteins of varying lengths and expression levels. Our models significantly outperformed naïve frequency-based approaches, demonstrating that there are learnable dependencies in evolutionary-selected codon usage. The prediction accuracy advantage of our models is greater for highly expressed genes and is greater in bacteria than eukaryotes, supporting the hypothesis that there is a monotonic relationship between selective pressure for complex codon patterns and effective population size. In S. cerevisiae and bacteria, our models were more accurate for longer proteins, suggesting that the learned patterns may be related to cotranslational folding. Gene functionality and conservation were also important determinants that affect the performance of our models. Finally, we showed that using information encoded in homologous proteins has only a minor effect on prediction accuracy, perhaps due to complex codon-usage codes in genes undergoing rapid evolution. Our study employing contemporary AI methods offers a unique perspective and a deep-learning-based prediction tool for evolutionary-selected codons. We hope that these can be useful to optimize codon usage in endogenous and heterologous proteins.
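The kind of naïve frequency-based baseline the abstract compares against can be illustrated with a minimal sketch: for each amino acid, always predict the codon seen most often in the training genes. This is not the authors' implementation. The function names, the abbreviated codon table, and the toy sequences below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Abbreviated excerpt of the standard genetic code (codon -> amino acid).
# A real baseline would use the full 61 sense codons; this subset is illustrative.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "ATG": "M",
}


def split_codons(coding_sequence):
    """Split a coding sequence into consecutive triplets, dropping any remainder."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return [coding_sequence[i:i + 3] for i in range(0, usable, 3)]


def train_frequency_baseline(coding_sequences):
    """Return, for each amino acid, its most frequent codon in the training set."""
    counts = defaultdict(Counter)
    for seq in coding_sequences:
        for codon in split_codons(seq):
            aa = CODON_TO_AA.get(codon)
            if aa is not None:  # skip codons outside the abbreviated table
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict_codons(protein, best_codon):
    """Predict one codon per residue using the per-amino-acid lookup."""
    return [best_codon[aa] for aa in protein]


def codon_accuracy(predicted, actual):
    """Fraction of positions where the predicted codon matches the natural one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Toy (hypothetical) training and held-out genes.
train_genes = ["ATGGCTAAAGCT", "ATGGCCAAAGCT"]
test_gene = "ATGGCCAAGGCT"

best = train_frequency_baseline(train_genes)
actual = split_codons(test_gene)
protein = "".join(CODON_TO_AA[c] for c in actual)
predicted = predict_codons(protein, best)
score = codon_accuracy(predicted, actual)
```

Because this baseline ignores sequence context entirely, it predicts the same codon for every occurrence of an amino acid. A model that beats it on held-out genes must be using dependencies beyond per-amino-acid frequencies, which is the comparison the abstract draws.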
Pages: 12