An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences

Cited by: 30
Authors
Wang, Lei [1, 5]
You, Zhu-Hong [2]
Chen, Xing [3]
Li, Jian-Qiang [4]
Yan, Xin [6]
Zhang, Wei [5]
Huang, Yu-An [4]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
[3] China Univ Min & Technol, Sch Informat & Elect Engn, Xuzhou 221116, Peoples R China
[4] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China
[5] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang 277100, Shandong, Peoples R China
[6] Zaozhuang Univ, Sch Foreign Languages, Zaozhuang 277100, Shandong, Peoples R China
Keywords
disease; position-specific scoring matrix; multiple sequences alignments; cancer; INTERACTION PREDICTION; NETWORKS; DATABASE; HYPERPLANES; GENERATION; TOOL
DOI
10.18632/oncotarget.14103
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Protein-protein interactions (PPIs) are not only a critical component of various biological processes in cells but also key to understanding the mechanisms that lead to healthy and diseased states in organisms. However, identifying interactions among proteins through biological experiments is time-consuming and costly. Hence, developing more efficient computational methods has rapidly become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences alone. Specifically, each protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignment; the Pseudo PSSM is then used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent data set) for predicting PPIs, the proposed method achieves average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate prediction performance, we also compare the proposed method with other methods on the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Our method is therefore effective and robust and can serve as a useful tool for exploring and discovering new relationships between proteins. A web server is publicly available at http://202.119.201.126:8888/PsePSSM/ for academic use.
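The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes one commonly used Pseudo PSSM formulation (row-standardized scores, per-column means, and squared-difference lag terms); the abstract does not state the exact variant or parameters the authors used. The function names, the `max_lag` parameter, and the concatenation of the two proteins' vectors are hypothetical choices for illustration, and the Rotation Forest classifier is omitted.

```python
from statistics import mean, pstdev


def normalize_rows(pssm):
    """Standardize each row (one residue position) to zero mean and unit variance."""
    out = []
    for row in pssm:
        mu, sd = mean(row), pstdev(row)
        # A constant row carries no preference information; map it to zeros.
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


def pse_pssm(pssm, max_lag=2):
    """Turn an L x C score matrix into a fixed-length vector of size C * (1 + max_lag).

    Assumed formulation: C column means, followed by, for each lag 1..max_lag,
    the mean squared difference between rows that are `lag` positions apart.
    """
    rows = normalize_rows(pssm)
    length, n_cols = len(rows), len(rows[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    # Composition part: average score of each column over the whole sequence.
    features = [mean(r[j] for r in rows) for j in range(n_cols)]
    # Sequence-order part: lagged squared differences per column.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            diffs = [(rows[i][j] - rows[i + lag][j]) ** 2 for i in range(length - lag)]
            features.append(mean(diffs))
    return features


def pair_features(pssm_a, pssm_b, max_lag=2):
    """Hypothetical pair encoding: concatenate the two proteins' vectors."""
    return pse_pssm(pssm_a, max_lag) + pse_pssm(pssm_b, max_lag)
```

Because the output length depends only on the number of columns and on `max_lag`, proteins of different lengths map to vectors of the same size, which is what a fixed-input classifier needs. A real PSSM has 20 columns, one per amino acid.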
Pages: 5149-5159
Number of pages: 11