Analyzing the correlation between protein expression and sequence-related features of mRNA and protein in Escherichia coli K-12 MG1655 model

Times cited: 0
Authors
Truong, Nhat H. M. [1, 2]
Vo, Nam T. [1, 2, 3]
Nguyen, Binh T. [2, 4]
Huynh, Son T. [2, 4]
Nguyen, Hoang D. [1, 2]
Affiliations
[1] Univ Sci, Ctr Biosci & Biotechnol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
[3] Univ Sci, Lab Mol Biotechnol, Ho Chi Minh City, Vietnam
[4] Univ Sci, Dept Comp Sci, Ho Chi Minh City, Vietnam
Source
PLOS ONE | 2024 / Vol. 19 / Issue 02
Keywords
TRANSLATION INITIATION; GENE-EXPRESSION; DATABASE;
DOI
10.1371/journal.pone.0288526
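Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract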
Producing recombinant proteins efficiently requires a tool that can predict protein abundance and optimize gene sequences. The Transim model published by Tuller et al. in 2018 calculates the translation rate in E. coli from features of the mRNA sequence; tested on the first genes of operons in the E. coli K-12 MG1655 genome, it achieved a Spearman correlation of 0.36 with the amount of protein per mRNA. This correlation is modest, and the model does not fully consider the features of mRNA and protein sequences. To enhance prediction capability, our study first expanded the testing dataset by adding genes inside operons and using a microarray mRNA expression dataset, which raised the correlation between translation rate and amount of protein to more than 0.42. Next, we examined the applicability of six traditional machine learning models for calculating a "new translation rate" from initiation rate and elongation rate as inputs. The SVR algorithm produced the most strongly correlated new translation rates, with the Spearman correlation improving to R = 0.6699 against protein level and to R = 0.6536 against protein level per mRNA. Finally, the study investigated the degree of improvement when more features were combined with the new translation rates. With six features, the model's correlation with protein per mRNA reached R = 0.6660, while the correlation of its final translation rate with protein level reached R = 0.6729. These results demonstrate the model's capability to predict the protein expression of a gene, rather than only the expression from an mRNA, and show its potential for development into gene expression prediction tools.
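The abstract reports every result as a Spearman correlation between a predicted translation rate and a measured protein quantity. As a minimal sketch of how such a rank correlation is computed, the Python below ranks both series (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the five sample values are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five genes (illustration only, not study data).
predicted_rate = [0.8, 1.5, 0.3, 2.2, 1.1]
protein_per_mrna = [10.0, 35.0, 12.0, 90.0, 20.0]
rho = spearman(predicted_rate, protein_per_mrna)
print(round(rho, 4))
```

Because only the ordering of values matters, the measure is insensitive to the scale of either quantity, which suits a comparison between a model-derived rate and an experimentally measured protein level.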
Pages: 19