A novel algorithm for detecting multiple covariance and clustering of biological sequences

Cited by: 14
Authors
Shen, Wei [1 ,2 ]
Li, Yan [1 ,2 ]
Affiliations
[1] Third Mil Med Univ, Southwest Hosp, Med Res Ctr, Chongqing 400038, Peoples R China
[2] Third Mil Med Univ, Dept Microbiol, Coll Basic Med Sci, Chongqing 400038, Peoples R China
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Funding
National Natural Science Foundation of China;
Keywords
RESIDUE CONTACTS; PROTEIN; GAPDH; IDENTIFICATION; COEVOLUTION; INFORMATION; LIKELIHOOD; ALIGNMENT; FAMILIES;
DOI
10.1038/srep30425
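Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code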
07 ; 0710 ; 09 ;
Abstract
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc.
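To make the idea of a "covariant pair" concrete, the following is a minimal sketch that scores pairs of alignment columns by mutual information and keeps those above a cutoff. It is a generic, widely used covariation measure swapped in for illustration; it is not Fastcov's independent pair model or its tandem model of site-residue elements, which the abstract does not specify. The function names, the `threshold` value, and the toy alignment are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def covariant_pairs(alignment, threshold=0.5):
    """Return (i, j, MI) for column pairs whose MI meets the threshold.

    `alignment` is a list of equal-length aligned sequences.
    `threshold` is an arbitrary illustrative cutoff, not a published value.
    """
    columns = list(zip(*alignment))  # transpose rows into columns
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        mi = column_mutual_information(columns[i], columns[j])
        if mi >= threshold:
            pairs.append((i, j, mi))
    # Strongest covariation first
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy alignment: columns 0 and 1 change together (A/R vs G/S),
# column 2 is fully conserved, column 3 varies independently of the others.
toy_alignment = ["ARKD", "ARKE", "GSKD", "GSKE"]
print(covariant_pairs(toy_alignment))
```

On the toy alignment only the pair of columns 0 and 1 is reported, because a conserved column and an independently varying column both carry zero mutual information with the rest. A pairwise score like this cannot by itself capture the multi-site patterns the abstract describes, which is the gap the paper targets.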
Pages: 8
Related papers
50 in total
  • [21] A Novel Metric for Detecting Anomalous Ship Behavior Using a Variation of the DBSCAN Clustering Algorithm
    Botts C.H.
    SN Computer Science, 2021, 2 (5)
  • [22] Algorithm of detecting structural variations in DNA sequences
    Nalecz-Charkielwicz, Katarzyna
    Nowak, Robert
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2014, 2014, 9290
  • [23] Hierarchic Clustering Algorithm used for Anomaly Detecting
    Chen, Zhenguo
    Zhu, Dongmei
    CEIS 2011, 2011, 15
  • [24] Clustering Boundary Detecting Algorithm for Each Cluster
    Wang, Kun
    Qiu, Baozhi
    Shen, Xiangdong
    PROCEEDINGS OF THE 2016 3RD INTERNATIONAL CONFERENCE ON MATERIALS ENGINEERING, MANUFACTURING TECHNOLOGY AND CONTROL, 2016, 67 : 394 - 398
  • [25] Optimization of covariance distance measurement algorithm for multidimensional clustering analysis
    Liu, Yun
    Zhang, Yi
    Zheng, Wenfeng
    He Jishu/Nuclear Techniques, 2023, 46 (05): : 102 - 110
  • [26] Spectral clustering algorithm combining local covariance matrix with normalization
    Tingting Du
    Guoqiu Wen
    Zhiguo Cai
    Wei Zheng
    Malong Tan
    Yangding Li
    Neural Computing and Applications, 2020, 32 : 6611 - 6618
  • [27] Spectral clustering algorithm combining local covariance matrix with normalization
    Du, Tingting
    Wen, Guoqiu
    Cai, Zhiguo
    Zheng, Wei
    Tan, Malong
    Li, Yangding
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11): : 6611 - 6618
  • [28] Detecting and clustering multiple takes of one scene
    Bailer, Werner
    Lee, Felix
    Thallinger, Georg
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2008, 4903 : 80 - 89
  • [29] AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES
    GOTOH, O
    JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) : 705 - 708
  • [30] Multiple structural alignment and clustering of RNA sequences
    Torarinsson, Elfar
    Havgaard, Jakob H.
    Gorodkin, Jan
    BIOINFORMATICS, 2007, 23 (08) : 926 - 932