A novel algorithm for detecting multiple covariance and clustering of biological sequences

Cited by: 14
Authors
Shen, Wei [1 ,2 ]
Li, Yan [1 ,2 ]
Affiliations
[1] Third Mil Med Univ, Southwest Hosp, Med Res Ctr, Chongqing 400038, Peoples R China
[2] Third Mil Med Univ, Dept Microbiol, Coll Basic Med Sci, Chongqing 400038, Peoples R China
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Funding
National Natural Science Foundation of China;
Keywords
RESIDUE CONTACTS; PROTEIN; GAPDH; IDENTIFICATION; COEVOLUTION; INFORMATION; LIKELIHOOD; ALIGNMENT; FAMILIES;
DOI
10.1038/srep30425
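Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code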
07 ; 0710 ; 09 ;
Abstract
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc.
Pages: 8
Related papers
50 in total
  • [1] A novel algorithm for detecting multiple covariance and clustering of biological sequences
    Wei Shen
    Yan Li
    Scientific Reports, 6
  • [2] A novel hierarchical clustering algorithm for gene sequences
    Wei, Dan
    Jiang, Qingshan
    Wei, Yanjie
    Wang, Shengrui
    BMC BIOINFORMATICS, 2012, 13
  • [3] A novel hierarchical clustering algorithm for gene sequences
    Dan Wei
    Qingshan Jiang
    Yanjie Wei
    Shengrui Wang
    BMC Bioinformatics, 13
  • [4] A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance
    Dong, Rui
    He, Lily
    He, Rong Lucy
    Yau, Stephen S-T
    FRONTIERS IN GENETICS, 2019, 10
  • [5] Massively Parallel Algorithm for Multiple Biological Sequences Alignment
    Borovska, Plamenka
    Gancheva, Veska
    Landzhev, Nikolay
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 638 - 642
  • [6] Active Clustering of Biological Sequences
    Voevodski, Konstantin
    Balcan, Maria-Florina
    Roeglin, Heiko
    Teng, Shang-Hua
    Xia, Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 203 - 225
  • [7] An MCMC algorithm for detecting short adjacent repeats shared by multiple sequences
    Li, Qiwei
    Fan, Xiaodan
    Liang, Tong
    Li, Shuo-Yen R.
    BIOINFORMATICS, 2011, 27 (13) : 1772 - 1779
  • [8] An Algorithm to Find All Identical Motifs in Multiple Biological Sequences
    Bindal, Ashish Kishor
    Sabarinathan, R.
    Sridhar, J.
    Sherlin, D.
    Sekar, K.
    PATTERN RECOGNITION IN BIOINFORMATICS, 2010, 6282 : 137 - +
  • [9] GenericBioMatch: A novel generic pattern match algorithm for biological sequences
    Pan, YL
    Famili, AF
    PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, : 562 - 563
  • [10] Detecting periodic patterns in biological sequences
    Coward, E
    Drablos, F
    BIOINFORMATICS, 1998, 14 (06) : 498 - 507