pLoc_bal-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by General PseAAC and Quasi-balancing Training Dataset
Cited by: 38
Authors:
Chou, Kuo-Chen [1,3]
Cheng, Xiang [1,2]
Xiao, Xuan [1,2]
Affiliations:
[1] Gordon Life Sci Inst, Boston, MA 02478 USA
[2] Jingdezhen Ceram Inst, Comp Sci, Jingdezhen, Peoples R China
[3] Univ Elect Sci & Technol China, Ctr Informat Biol, Chengdu, Sichuan, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Multi-label system;
eukaryotic proteins;
quasi-balance treatment;
5-step rules;
PseAAC;
ML-GKR;
Chou's intuitive metrics;
AMINO-ACID-COMPOSITION;
INCORPORATING EVOLUTIONARY INFORMATION;
IDENTIFY RECOMBINATION SPOTS;
SEQUENCE-BASED PREDICTOR;
PSEUDO NUCLEOTIDE COMPOSITION;
LYSINE SUCCINYLATION SITES;
AVERAGE CHEMICAL-SHIFT;
ALIGNMENT-FREE METHOD;
3 DIFFERENT MODES;
ENSEMBLE CLASSIFIER;
DOI:
10.2174/1573406415666181218102517
Chinese Library Classification number:
R914 [Medicinal Chemistry];
Discipline classification code:
100701;
Abstract:
Background/Objective: Information on protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, there is strong demand for powerful bioinformatics tools that can identify subcellular localization promptly and effectively from sequence information alone. Recently, a predictor called "pLoc-mEuk" was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is a very powerful predictor, further effort is needed to improve it, because pLoc-mEuk was trained on an extremely skewed dataset in which some subset was about 200 times the size of the other subsets. Accordingly, it cannot avoid the bias caused by such an uneven training dataset.
Methods: To alleviate this bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mEuk, the existing state-of-the-art predictor for identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems.
Results: To maximize convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/
Conclusion: It is anticipated that the pLoc_bal-mEuk predictor holds very high potential to become a useful high-throughput tool for identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs, which is currently a very hot trend in drug development.
Pages: 472-485
Number of pages: 14