Propensity Scores for Prediction and Characterization of Bioluminescent Proteins from Sequences

Times cited: 22
Author
Huang, Hui-Ling [1 ,2 ]
Affiliations
[1] Natl Chiao Tung Univ, Inst Bioinformat & Syst Biol, Hsinchu, Taiwan
[2] Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu, Taiwan
Source
PLOS ONE | 2014 / Vol. 9 / No. 5
Keywords
PHOTOPROTEIN AEQUORIN; CRYSTAL-STRUCTURE; FLUORESCENT; DATABASE; MACHINE; CELLS
DOI
10.1371/journal.pone.0097158
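Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09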
Abstract
Bioluminescent proteins (BLPs) are a class of proteins from luminous organisms that emit light through various mechanisms, such as bioluminescence and fluorescence. Although BLPs are valuable for commercial and medical applications, identifying them, including luciferases and fluorescent proteins (FPs), is challenging because their sequences are highly diverse. Moreover, characterizing BLPs can guide mutagenesis analyses aimed at enhancing bioluminescence and fluorescence. This study therefore proposes a novel approach, based on a scoring card method (SCM), that estimates propensity scores for 400 dipeptides and 20 amino acids in order to design two prediction methods and to characterize BLPs. The SCMBLP method for predicting BLPs achieves a 10-fold cross-validation accuracy of 90.83%, higher than those of existing support vector machine (SVM)-based methods, and a test accuracy of 82.85%. A dataset of 269 luciferases and 216 FPs is also established to design the SCMLFP method for classifying BLPs as luciferases or FPs, which achieves training and test accuracies of 97.10% and 96.28%, respectively. Additionally, the estimated propensity scores are used to identify four informative physicochemical properties of the 20 amino acids that characterize BLPs: 1) high transfer free energy from the protein interior to the surface, 2) high occurrence frequency of residues in the transmembrane regions of proteins, 3) a large hydrophobicity scale derived from native protein structures, and 4) a high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analysis shows that this correlation is higher for luciferases (R = 0.937) than for FPs (R = 0.635), suggesting that luciferases are more likely than FPs to be located near the cell membrane, where they can readily receive extracellular ions.
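The abstract does not spell out how SCM turns dipeptide propensity scores into a prediction. The Python sketch below shows one plausible reading under stated assumptions: each sequence is represented by its 400-dimensional dipeptide composition, initial scores are the min-max normalized difference in mean composition between BLPs and non-BLPs, a sequence's score is the composition-weighted sum of dipeptide scores, and the cut-off is the one that maximizes training accuracy. All function names, the 0 to 1000 score range, and the toy sequences are hypothetical; the sketch does not attempt any further score optimization the published method may apply, so it will not reproduce the reported accuracies.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among overlapping residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in counts]  # skip pairs with non-standard residues
    for p in valid:
        counts[p] += 1
    total = len(valid) or 1
    return {d: c / total for d, c in counts.items()}


def initial_scores(positives, negatives, scale=1000.0):
    """Hypothetical initial propensity scores: difference in mean dipeptide
    composition (positives minus negatives), min-max normalized to [0, scale]."""
    def mean_comp(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}

    pos, neg = mean_comp(positives), mean_comp(negatives)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0
    return {d: scale * (v - lo) / span for d, v in diff.items()}


def sequence_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)


def best_threshold(positives, negatives, scores):
    """Assumed tuning rule: pick the training-set score that, used as a
    cut-off, maximizes training accuracy."""
    labeled = [(sequence_score(s, scores), True) for s in positives]
    labeled += [(sequence_score(s, scores), False) for s in negatives]

    def accuracy(t):
        return sum((score > t) == label for score, label in labeled) / len(labeled)

    return max(sorted(score for score, _ in labeled), key=accuracy)


def predict(seq, scores, threshold):
    """Classify a sequence as positive if its score exceeds the cut-off."""
    return sequence_score(seq, scores) > threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    blps = ["ACACACAC", "ACACAG"]
    non_blps = ["WYWYWY", "WYWYWK"]
    scores = initial_scores(blps, non_blps)
    t = best_threshold(blps, non_blps, scores)
    print(predict("ACACA", scores, t), predict("WYWYW", scores, t))  # True False
```

Because a sequence's score is a weighted sum over a fixed 400-entry table, each prediction is directly interpretable: the dipeptides contributing most to a high score can be read off the table.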
Importantly, the propensity scores of dipeptides and amino acids and the identified properties facilitate efforts to predict, characterize, and apply BLPs, including luciferases, photoproteins, and FPs. The web server is available at http://iclab.life.nctu.edu.tw/SCMBLP/index.html.
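The abstract states that the 20 amino acid propensity scores are used to identify informative physicochemical properties, and it reports correlation coefficients (R), assumed here to be Pearson's r. The Python sketch below illustrates that kind of analysis under hypothetical assumptions: each residue's score is the mean score of the 39 distinct dipeptides containing it, and candidate property scales are supplied as dictionaries mapping residues to values. The helper names and toy inputs are illustrative only, and no values from the study are reproduced.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_scores(dipeptide_scores):
    """Assumed derivation: average the scores of every dipeptide containing
    the residue (20 + 20 - 1 = 39 distinct dipeptides per residue)."""
    result = {}
    for aa in AMINO_ACIDS:
        vals = [s for d, s in dipeptide_scores.items() if aa in d]
        result[aa] = sum(vals) / len(vals)
    return result


def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_properties(aa_scores, properties):
    """Rank property scales by |r| against the residue propensity scores."""
    ranked = []
    for name, scale in properties.items():
        r = pearson_r([aa_scores[a] for a in AMINO_ACIDS],
                      [scale[a] for a in AMINO_ACIDS])
        ranked.append((name, r))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy dipeptide scores and property scales for illustration only.
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    toy_dipeptides = {a + b: idx[a] + idx[b] for a in AMINO_ACIDS for b in AMINO_ACIDS}
    aa = amino_acid_scores(toy_dipeptides)
    props = {"index": idx, "parity": {a: i % 2 for a, i in idx.items()}}
    print([name for name, _ in rank_properties(aa, props)])  # ['index', 'parity']
```

The same `pearson_r` helper could equally compare two 20-dimensional amino acid composition vectors, which is the kind of comparison behind the BLP versus integral membrane protein R values quoted above.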
Pages: 15