Plus ça change - evolutionary sequence divergence predicts protein subcellular localization signals

Times cited: 11
Authors
Fukasawa, Yoshinori [1, 2]
Leung, Ross K. K. [3, 4]
Tsui, Stephen K. W. [3, 4]
Horton, Paul [1, 5]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Kashiwa, Chiba, Japan
[2] Japan Soc Promot Sci, Tokyo Chiyoda, Japan
[3] Chinese Univ Hong Kong, Hong Kong Bioinformat Ctr, Shatin, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Sch Biomed Sci, Shatin, Hong Kong, Peoples R China
[5] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Tokyo, Japan
Source
BMC GENOMICS | 2014 / Volume 15
Keywords
MITOCHONDRIAL PRESEQUENCES; AMINO-ACIDS; LOCATIONS; CONSERVATION; MULTICLASS; RESIDUES; PATTERNS; PEPTIDE; TOM20
DOI
10.1186/1471-2164-15-46
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites.
Results: Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover, sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand-curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However, a post-hoc analysis of the proteins in which sequence divergence changes the prediction yielded some proteins with atypical (i.e. not MPP-cleaved) matrix targeting signals as well as a few misannotations.
Conclusion: We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless, we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail.
Pages: 15
Related papers
50 in total
  • [41] Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks
    Wang, Ge
    Xue, Min-Qi
    Shen, Hong-Bin
    Xu, Ying-Ying
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (02)
  • [42] EvoStruct-Sub: An accurate Gram-positive protein subcellular localization predictor using evolutionary and structural features
    Uddin, Md. Raihan
    Sharma, Alok
    Farid, Dewan Md
    Rahman, Md. Mahmudur
    Dehzangi, Abdollah
    Shatabda, Swakkhar
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 2018, 443 : 138 - 146
  • [43] Site-Specific Structural Constraints on Protein Sequence Evolutionary Divergence: Local Packing Density versus Solvent Exposure
    Yeh, So-Wei
    Liu, Jen-Wei
    Yu, Sung-Huan
    Shih, Chien-Hua
    Hwang, Jenn-Kang
    Echave, Julian
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2014, 31 (01) : 135 - 139
  • [44] The subcellular distribution of Cucumber mosaic virus LS2b protein, correlation between its nuclear localization and predicted nuclear localization signals
    Wang, Ruilin
    Du, Zhiyou
    Liang, Zongsuo
    [J]. PHYSIOLOGICAL AND MOLECULAR PLANT PATHOLOGY, 2017, 100 : 1 - 12
  • [45] ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization
    Wen-Lin Huang
    Chun-Wei Tung
    Shih-Wen Ho
    Shiow-Fen Hwang
    Shinn-Ying Ho
    [J]. BMC Bioinformatics, 9
  • [46] ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization
    Huang, Wen-Lin
    Tung, Chun-Wei
    Ho, Shih-Wen
    Hwang, Shiow-Fen
    Ho, Shinn-Ying
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [47] Human protein-protein interaction prediction by a novel sequence-based co-evolution method: co-evolutionary divergence
    Liu, Chia Hsin
    Li, Ker-Chau
    Yuan, Shinsheng
    [J]. BIOINFORMATICS, 2013, 29 (01) : 92 - 98
  • [48] Identification of the isoforms of Ca2+/calmodulin-dependent protein kinase II in rat astrocytes and their subcellular localization
    Takeuchi, Y
    Yamamoto, H
    Fukunaga, K
    Miyakawa, T
    Miyamoto, E
    [J]. JOURNAL OF NEUROCHEMISTRY, 2000, 74 (06) : 2557 - 2567
  • [49] Cloning, tissue distribution, subcellular localization and overexpression of murine histidine-rich Ca2+ binding protein
    Ridgeway, AG
    Petropoulos, H
    Siu, A
    Ball, JK
    Skerjanc, IS
    [J]. FEBS LETTERS, 1999, 456 (03) : 399 - 402
  • [50] Fibroblast growth factor 3, a protein with dual subcellular localization, is targeted to the nucleus and nucleolus by the concerted action of two nuclear localization signals and a nucleolar retention signal
    Antoine, M
    Reimers, K
    Dickson, C
    Kiefer, P
    [J]. JOURNAL OF BIOLOGICAL CHEMISTRY, 1997, 272 (47) : 29475 - 29481