Unsupervised protein sequences clustering algorithm using functional domain information

Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliations
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. In addition, we use an unsupervised clustering algorithm as the kernel, which consists of hierarchical clustering in the first phase to pre-cluster the protein sequences and partitioning clustering in the second phase to refine the clustering results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain initial clusters for the subsequent partitioning clustering, and then build a profile hidden Markov model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering is performed to refine the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
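The two-phase structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard distance over k-mer sets is only a stand-in for the ProDom domain-aware similarity, and a k-mer frequency table is only a stand-in for a profile HMM. All function names (`hierarchical`, `profile`, `refine`, `two_phase_cluster`) and the parameter `k` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k=2):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def distance(a, b, k=2):
    """Jaccard distance between k-mer sets (assumed stand-in for a domain-aware measure)."""
    sa, sb = set(kmers(a, k)), set(kmers(b, k))
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def hierarchical(seqs, n_clusters, k=2):
    """Phase 1: average-linkage agglomerative clustering down to n_clusters."""
    clusters = [[i] for i in range(len(seqs))]

    def avg_link(pair):
        x, y = pair
        total = sum(distance(seqs[i], seqs[j], k)
                    for i in clusters[x] for j in clusters[y])
        return total / (len(clusters[x]) * len(clusters[y]))

    while len(clusters) > n_clusters:
        # combinations yields x < y, so deleting y leaves index x valid
        x, y = min(combinations(range(len(clusters)), 2), key=avg_link)
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


def profile(members, seqs, k=2):
    """Relative k-mer frequencies of a cluster (assumed stand-in for a profile HMM)."""
    counts = Counter(km for i in members for km in kmers(seqs[i], k))
    total = sum(counts.values())
    return {km: c / total for km, c in counts.items()}


def score(seq, prof, k=2):
    """Mean profile frequency of the sequence's k-mers; higher means a better fit."""
    kms = kmers(seq, k)
    return sum(prof.get(km, 0.0) for km in kms) / len(kms) if kms else 0.0


def refine(seqs, clusters, k=2, max_iter=20):
    """Phase 2: k-means-style reassignment against the cluster profiles."""
    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    for _ in range(max_iter):
        profs = [profile([i for i, lab in enumerate(labels) if lab == c], seqs, k)
                 for c in range(len(clusters))]
        new = [max(range(len(profs)), key=lambda c: score(s, profs[c], k))
               for s in seqs]
        if new == labels:  # converged: no sequence changed cluster
            break
        labels = new
    return labels


def two_phase_cluster(seqs, n_clusters, k=2):
    """Pre-cluster hierarchically, then refine by profile-based reassignment."""
    return refine(seqs, hierarchical(seqs, n_clusters, k), k)
```

Calling `two_phase_cluster(sequences, n_clusters)` returns one cluster label per input sequence. In a setting closer to the one the abstract describes, the profile and scoring functions would be replaced by a profile HMM built per cluster and the log-likelihood of each sequence under it.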
Pages: 76-81
Number of pages: 6