Unsupervised protein sequences clustering algorithm using functional domain information

Authors
Chen, Wei-Bang [1]
Zhang, Chengcui [1]
Zhong, Hua [1]
Affiliation
[1] Univ Alabama, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
Keywords
protein sequences clustering; data mining and knowledge discovery; profile hidden Markov model (HMM); ProDom database
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. In addition, we use an unsupervised clustering algorithm as the kernel, which comprises a hierarchical clustering in the first phase to pre-cluster the protein sequences and a partitioning clustering in the second phase to refine the clustering results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain the initial clusters for the subsequent partitioning clustering, and a profile hidden Markov model (HMM) is then built for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering is performed to refine the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
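The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-mer Jaccard similarity stands in for the ProDom domain-aware similarity, and a per-cluster k-mer frequency profile stands in for the profile HMM centroid. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_set(seq, k=2):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=2):
    """Jaccard similarity of k-mer sets (assumed stand-in for the
    domain-aware similarity; ProDom boundaries are not modelled here)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def hierarchical_precluster(seqs, n_clusters):
    """Phase 1: average-linkage agglomerative clustering down to n_clusters."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            score = sum(similarity(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)
            if best is None or score > best[0]:
                best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the most similar pair
        del clusters[y]
    return clusters


def build_profile(members, seqs, k=2):
    """Fraction of cluster members containing each k-mer
    (assumed stand-in for a profile HMM acting as the cluster centroid)."""
    counts = Counter()
    for i in members:
        counts.update(kmer_set(seqs[i], k))
    return {kmer: c / len(members) for kmer, c in counts.items()}


def profile_score(seq, profile, k=2):
    """Mean profile weight over the k-mers of a sequence."""
    kmers = kmer_set(seq, k)
    if not kmers:
        return 0.0
    return sum(profile.get(km, 0.0) for km in kmers) / len(kmers)


def refine(seqs, clusters, max_iter=20):
    """Phase 2: k-means-style loop that rebuilds the cluster profiles and
    reassigns each sequence to its best-scoring profile until stable."""
    assign = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for i in members:
            assign[i] = c
    n = len(clusters)
    for _ in range(max_iter):
        profiles = [build_profile(m, seqs) for m in clusters]
        new_assign = []
        for s in seqs:
            scores = [profile_score(s, p) for p in profiles]
            new_assign.append(scores.index(max(scores)))
        if new_assign == assign:
            break
        assign = new_assign
        clusters = [[i for i, a in enumerate(assign) if a == c] for c in range(n)]
    return assign


def cluster_sequences(seqs, n_clusters):
    """Run both phases and return one cluster label per sequence."""
    return refine(seqs, hierarchical_precluster(seqs, n_clusters))


if __name__ == "__main__":
    toy = ["ACDEFGHIK", "ACDEFGHIR", "ACDEFGHLK",
           "WYWYWYWY", "WYWYWYWT", "WYWYWYW"]
    print(cluster_sequences(toy, 2))
```

In a realistic setting, the profile and scoring functions would be replaced by building a profile HMM per cluster and scoring each sequence against it; the overall loop structure would stay the same.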
Pages: 76-81
Page count: 6