Unsupervised protein sequences clustering algorithm using functional domain information

Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliation
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Source
PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2008
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. In addition, we use a two-phase unsupervised clustering algorithm as the kernel: hierarchical clustering in the first phase pre-clusters the protein sequences, and partitioning clustering in the second phase refines the clustering results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain initial clusters for the subsequent partitioning clustering, and then build a profile hidden Markov model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering is performed to refine the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
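The two-phase structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy k-mer Jaccard similarity in place of the ProDom domain-aware measure, and a simple residue-frequency profile in place of a profile HMM. All function names (`similarity`, `agglomerate`, `build_profile`, `refine`, `two_phase_cluster`) are hypothetical, and sequences are assumed to contain only the 20 standard amino-acid letters.

```python
import math
from collections import Counter
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed input alphabet)


def similarity(a, b, k=2):
    """Toy k-mer Jaccard similarity; a stand-in for a domain-aware measure."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def agglomerate(seqs, n_clusters):
    """Phase 1: average-linkage agglomerative clustering over sequence indices."""
    clusters = [[i] for i in range(len(seqs))]

    def linkage(pair):
        x, y = pair
        total = sum(similarity(seqs[i], seqs[j])
                    for i in clusters[x] for j in clusters[y])
        return total / (len(clusters[x]) * len(clusters[y]))

    while len(clusters) > n_clusters:
        # Merge the two most similar clusters (x < y, so deleting y is safe).
        x, y = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


def build_profile(members):
    """Residue-frequency profile with add-one smoothing (NOT a profile HMM)."""
    counts = Counter("".join(members))
    total = sum(counts[a] for a in ALPHABET) + len(ALPHABET)
    return {a: (counts[a] + 1) / total for a in ALPHABET}


def score(seq, profile):
    """Average per-residue log-likelihood of seq under the cluster profile."""
    return sum(math.log(profile[c]) for c in seq) / len(seq)


def refine(seqs, clusters, max_iter=20):
    """Phase 2: k-means-style reassignment using cluster profiles as centroids."""
    k = len(clusters)
    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    for _ in range(max_iter):
        profiles = [
            build_profile([s for s, lab in zip(seqs, labels) if lab == c])
            for c in range(k)
        ]
        new_labels = [max(range(k), key=lambda c: score(s, profiles[c]))
                      for s in seqs]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels


def two_phase_cluster(seqs, n_clusters):
    """Pre-cluster hierarchically, then refine by profile-based reassignment."""
    return refine(seqs, agglomerate(seqs, n_clusters))


if __name__ == "__main__":
    demo = ["AAAAGAAA", "AAAGAAAA", "AAGAAAAA",
            "WWWWYWWW", "WWYWWWWW", "WYWWWWWW"]
    print(two_phase_cluster(demo, n_clusters=2))
```

In a realistic setting the profile step would more plausibly be handled by a dedicated profile-HMM tool, and the similarity function would take domain boundaries into account; both are simplified here only to keep the sketch self-contained.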
Pages: 76-81
Page count: 6