Unsupervised protein sequences clustering algorithm using functional domain information

Cited by: 0
Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliation
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Source
PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2008
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. The kernel is an unsupervised two-phase clustering algorithm: a hierarchical clustering in the first phase pre-clusters the protein sequences, and a partitioning clustering in the second phase refines the results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain initial clusters for the subsequent partitioning step, and then build a profile hidden Markov model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering refines the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
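The two-phase scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. Two simplifying assumptions are made: a Jaccard distance over 2-mers stands in for the ProDom domain-aware similarity, and a smoothed 2-mer frequency model stands in for the profile HMM that represents each cluster centroid. Only the overall structure (agglomerative pre-clustering, then k-means-style reassignment to the best-scoring cluster model) follows the abstract.

```python
import math
from collections import Counter

K = 2            # k-mer length (illustrative choice)
VOCAB = 20 ** K  # number of possible k-mers over 20 amino acids

def kmers(seq):
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def distance(a, b):
    """Jaccard distance on k-mer sets; a stand-in for domain-aware similarity."""
    sa, sb = set(kmers(a)), set(kmers(b))
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)

def agglomerative(seqs, n_clusters):
    """Phase 1: average-linkage agglomerative clustering over sequence indices."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(distance(seqs[i], seqs[j])
                            for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters

def fit_model(members):
    """Cluster 'centroid': k-mer counts (NOT a profile HMM, only a stand-in)."""
    counts = Counter(g for s in members for g in kmers(s))
    return counts, sum(counts.values())

def score(seq, model, alpha=1.0):
    """Length-normalised log-likelihood of seq under a smoothed k-mer model."""
    counts, total = model
    grams = kmers(seq)
    if not grams:
        return float("-inf")
    logp = sum(math.log((counts[g] + alpha) / (total + alpha * VOCAB))
               for g in grams)
    return logp / len(grams)

def refine(seqs, clusters, max_iter=10):
    """Phase 2: k-means-style loop; refit models, reassign to best model."""
    n = len(clusters)
    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    for _ in range(max_iter):
        models = [fit_model([s for s, l in zip(seqs, labels) if l == c])
                  for c in range(n)]
        new_labels = [max(range(n), key=lambda c: score(s, models[c]))
                      for s in seqs]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels

# Toy data: two made-up "families" of three sequences each.
sequences = ["MKVLAAGIVG", "MKVLAAGLVG", "MKVLSAGIVG",
             "PQRSTWYHDE", "PQRSTWYHDD", "PQKSTWYHDE"]
initial = agglomerative(sequences, n_clusters=2)
labels = refine(sequences, initial)
print(labels)
```

In a realistic setting the stand-ins would be replaced by a domain-aware similarity and by profile HMMs built from a multiple alignment of each cluster; the loop structure would stay the same.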
Pages: 76-81
Page count: 6