Unsupervised protein sequences clustering algorithm using functional domain information

Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliations
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. The kernel is an unsupervised two-phase clustering algorithm: a hierarchical clustering in the first phase pre-clusters the protein sequences, and a partitioning clustering in the second phase refines the results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain initial clusters for the subsequent partitioning clustering, and then build a profile hidden Markov model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering refines the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
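
The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on two loudly simplified stand-ins: a k-mer Jaccard distance replaces the ProDom domain-aware similarity measure, and a smoothed k-mer frequency profile replaces the profile HMM used as a cluster centroid. All function names and parameters (`hierarchical_precluster`, `refine`, `k`, `max_iter`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmers(seq, k=2):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def jaccard_distance(a, b, k=2):
    """ASSUMPTION: stand-in for the domain-aware similarity of the abstract."""
    sa, sb = set(kmers(a, k)), set(kmers(b, k))
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def hierarchical_precluster(seqs, n_clusters, k=2):
    """Phase 1: average-linkage agglomerative clustering down to n_clusters."""
    dist = {(i, j): jaccard_distance(seqs[i], seqs[j], k)
            for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (a < b, so deleting b is safe).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(seqs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def build_profile(members, k=2):
    """ASSUMPTION: k-mer counts stand in for a per-cluster profile HMM."""
    counts = Counter()
    for seq in members:
        counts.update(kmers(seq, k))
    return counts, sum(counts.values())


def score(seq, profile, k=2):
    """Mean add-one-smoothed log-probability of the sequence's k-mers."""
    counts, total = profile
    ks = kmers(seq, k)
    if not ks:
        return float("-inf")
    vocab = 20 ** k  # number of possible k-mers over 20 amino acids
    return sum(math.log((counts[x] + 1) / (total + vocab)) for x in ks) / len(ks)


def refine(seqs, labels, k=2, max_iter=20):
    """Phase 2: k-means-style loop; reassign each sequence to its best profile."""
    labels = list(labels)
    for _ in range(max_iter):
        ids = sorted(set(labels))
        profiles = {c: build_profile([s for s, l in zip(seqs, labels) if l == c], k)
                    for c in ids}
        new_labels = [max(ids, key=lambda c: score(s, profiles[c], k)) for s in seqs]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels
```

A call such as `refine(seqs, hierarchical_precluster(seqs, n_clusters=2))` runs both phases in order. In a setting closer to the abstract, the distance function would compare sequences domain by domain using predicted boundaries, and the profile and scoring steps would be delegated to a profile HMM toolkit.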
Pages: 76-81
Page count: 6