Unsupervised protein sequences clustering algorithm using functional domain information

Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliations
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. The kernel is an unsupervised two-phase clustering algorithm: hierarchical clustering in the first phase pre-clusters the protein sequences, and partitioning clustering in the second phase refines the clustering results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain the initial clusters for the subsequent partitioning clustering, and then build a profile hidden Markov model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering is performed to refine the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
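The two-phase scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: each protein is reduced to a set of hypothetical domain identifiers (the `PD…` labels are invented, not real ProDom entries), similarity is a plain Jaccard index over those sets, phase 1 uses average linkage, and the profile HMM "centroid" is replaced by a much simpler stand-in, the mean distance from a protein to the current members of a cluster.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two proteins given as sets of domain IDs (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, items):
    """Mean pairwise distance between the members of two non-empty clusters."""
    total = sum(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(items, k):
    """Phase 1: repeatedly merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], items),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def refine(items, clusters, max_iter=20):
    """Phase 2: k-means-style reassignment until membership stops changing.

    Stand-in for the profile HMM: a cluster is represented by its current
    members, and a protein is scored by its mean distance to them.
    """
    clusters = [sorted(c) for c in clusters]
    for _ in range(max_iter):
        reassigned = [[] for _ in clusters]
        for i in range(len(items)):
            def score(c):
                # Empty clusters cannot attract members.
                if not clusters[c]:
                    return float("inf")
                return average_linkage([i], clusters[c], items)
            best = min(range(len(clusters)), key=score)
            reassigned[best].append(i)
        if reassigned == clusters:
            break
        clusters = reassigned
    return clusters


def cluster_proteins(proteins, k):
    """Run both phases and return clusters as lists of protein names."""
    names = sorted(proteins)
    items = [proteins[n] for n in names]
    clusters = refine(items, agglomerative(items, k))
    return [[names[i] for i in c] for c in clusters if c]


# Toy input: protein name -> set of hypothetical domain IDs.
proteins = {
    "P1": {"PD0001", "PD0002"},
    "P2": {"PD0001", "PD0002", "PD0003"},
    "P3": {"PD0001", "PD0003"},
    "P4": {"PD0100", "PD0101"},
    "P5": {"PD0100", "PD0101", "PD0102"},
}

if __name__ == "__main__":
    print(cluster_proteins(proteins, 2))
```

In a closer approximation of the described method, the scoring function in `refine` would be replaced by the likelihood of a sequence under a profile HMM trained on each cluster, and the distance would be computed from domain-aware sequence alignments rather than from set overlap.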
Pages: 76-81
Page count: 6