Analyzing the correlation between protein expression and sequence-related features of mRNA and protein in Escherichia coli K-12 MG1655 model
Cited by: 0
Authors:
Truong, Nhat H. M. [1,2]
Vo, Nam T. [1,2,3]
Nguyen, Binh T. [2,4]
Huynh, Son T. [2,4]
Nguyen, Hoang D. [1,2]
Affiliations:
[1] Univ Sci, Ctr Biosci & Biotechnol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
[3] Univ Sci, Lab Mol Biotechnol, Ho Chi Minh City, Vietnam
[4] Univ Sci, Dept Comp Sci, Ho Chi Minh City, Vietnam
A tool that can predict protein abundance and optimize gene sequences is needed to produce recombinant proteins efficiently. The Transim model, published by Tuller et al. in 2018, calculates the translation rate in E. coli from features of the mRNA sequence and achieved a Spearman correlation of 0.36 with protein abundance per mRNA when tested on a dataset of the first genes of operons in the E. coli K-12 MG1655 genome. However, this correlation is modest, and the model does not fully account for features of the mRNA and protein sequences. To improve predictive capability, our study first expanded the test dataset by adding genes located inside operons and by using a microarray mRNA expression dataset, which raised the correlation between translation rate and protein abundance to above 0.42. Next, we examined whether six traditional machine learning models could calculate a "new translation rate" using the initiation rate and elongation rate as inputs. The support vector regression (SVR) algorithm produced the most strongly correlated new translation rates, improving the Spearman correlation to R = 0.6699 with protein level and to R = 0.6536 with protein level per mRNA. Finally, we investigated how much performance improved when additional features were combined with the new translation rates. Using six features, the model reached R = 0.6660 for predicting protein level per mRNA, and the correlation of its final translation rate with protein level rose to R = 0.6729. These results demonstrate that the model can predict the protein expression of a gene rather than only the expression per mRNA, and show its potential for development into a gene expression prediction tool.
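The correlations reported above are Spearman rank correlations between predicted translation rates and measured protein levels. A minimal sketch of that metric follows, assuming the standard definition: rank both variables, give tied values the average of their ranks, then take the Pearson correlation of the ranks. It is not the authors' code; the helper names `average_ranks`, `pearson`, and `spearman` and the toy values are hypothetical and only illustrate the computation.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to the one at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) hold ranks i+1..j+1; assign their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-gene values for illustration only (not data from the study).
    predicted_rate = [0.12, 0.40, 0.33, 0.90, 0.75]
    protein_level = [10.0, 55.0, 30.0, 80.0, 120.0]
    print(f"Spearman R = {spearman(predicted_rate, protein_level):.4f}")
```

On the toy values the two rankings differ only in the order of the last two genes, so the script prints `Spearman R = 0.9000`. For real datasets, `scipy.stats.spearmanr` computes the same statistic and also averages tied ranks. This sketch does not guard against a constant input, for which `pearson` would divide by zero.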