SEQOPTICS: a protein sequence clustering system

Cited by: 14
Authors
Chen, Yonghui [1]
Reilly, Kevin D.
Sprague, Alan P.
Guan, Zhijie
Affiliations
[1] Univ Alabama Birmingham, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
[2] Univ Calif San Diego, San Diego Supercomp Ctr, La Jolla, CA 92093 USA
DOI
10.1186/1471-2105-7-S4-S10
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Protein sequence clustering is widely used in the analysis of protein structure and function. In most cases, single-linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach because of its emphasis on visualizing results and its support for interactive work, e.g., in choosing parameters. However, as far as we know, OPTICS has not been used for protein sequence clustering.
Results: This paper presents SEQOPTICS (SEQuence clustering with OPTICS), a system for clustering proteins. The system uses Smith-Waterman as the protein distance measure and OPTICS at its core to perform the clustering. SEQOPTICS is tested on four data sets from different data sources, and visualization of the sequence clustering structure is demonstrated as well.
Conclusion: The system was evaluated by comparison with other existing methods. Analysis of the results shows that SEQOPTICS performs better on several evaluation criteria, including the Jaccard coefficient, precision, and recall. It is a promising protein sequence clustering method, with possible future improvements in parallel computing and in other protein distance measures.
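The abstract names the Jaccard coefficient, precision, and recall as evaluation criteria but does not say how they are computed. A common convention for comparing a clustering against reference families is pair counting, and the sketch below assumes that convention; it is not taken from the paper. The function name `pair_counting_scores` and the toy protein labels are hypothetical.

```python
from itertools import combinations


def pair_counting_scores(predicted, reference):
    """Score a clustering against a reference by counting item pairs.

    Both arguments map item -> cluster label and must cover the same items.
    Assumption: pair-counting definitions of Jaccard, precision and recall.
    """
    if set(predicted) != set(reference):
        raise ValueError("predicted and reference must cover the same items")

    tp = fp = fn = 0
    for a, b in combinations(sorted(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1  # pair grouped together in both clusterings
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_ref:
            fn += 1  # grouped together only in the reference

    def ratio(num, den):
        # Return 0.0 when no pairs qualify, to avoid division by zero.
        return num / den if den else 0.0

    return {
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy example: five hypothetical proteins, two reference families.
reference = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
predicted = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 1}
scores = pair_counting_scores(predicted, reference)
print(scores)
```

In this toy case two pairs are grouped together in both clusterings, two only in the prediction, and two only in the reference, so precision and recall are both 0.5 and the Jaccard coefficient is 2/6.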
Pages: 9
Related papers
50 in total
  • [1] SEQOPTICS: a protein sequence clustering system
    Yonghui Chen
    Kevin D Reilly
    Alan P Sprague
    Zhijie Guan
    BMC Bioinformatics, 7
  • [2] Seqoptics: A protein sequence clustering method
    Chen, Yonghui
    Reilly, Kevin D.
    Sprague, Alan P.
    Guan, Zhijie
    FIRST INTERNATIONAL MULTI-SYMPOSIUMS ON COMPUTER AND COMPUTATIONAL SCIENCES (IMSCCS 2006), PROCEEDINGS, VOL 1, 2006, : 69 - +
  • [3] A Modified Markov Clustering Approach for Protein Sequence Clustering
    Medves, Lehel
    Szilagyi, Laszlo
    Szilagyi, Sandor M.
    PATTERN RECOGNITION IN BIOINFORMATICS, PROCEEDINGS, 2008, 5265 : 110 - 120
  • [4] A Review on Protein Sequence Clustering Research
    Rahman, S. A.
    Bakar, A. A.
    Hussein, Z. A. M.
    4TH KUALA LUMPUR INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING 2008, VOLS 1 AND 2, 2008, 21 (1-2): : 275 - +
  • [5] A clustering system for data sequence partitioning
    Wang, Yu-Jie
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (01) : 659 - 666
  • [6] Performance evaluation of protein sequence clustering tools
    Liu, HF
    Teow, LN
    COMPUTATIONAL SCIENCE - ICCS 2005, PT 2, 2005, 3515 : 877 - 885
  • [7] An efficient incremental protein sequence clustering algorithm
    Vijaya, PA
    Murty, MN
    Subramanian, DK
    IEEE TENCON 2003: CONFERENCE ON CONVERGENT TECHNOLOGIES FOR THE ASIA-PACIFIC REGION, VOLS 1-4, 2003, : 409 - 413
  • [8] An efficient technique for protein sequence clustering and classification
    Vijaya, PA
    Murty, MN
    Subramanian, DK
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, 2004, : 447 - 450
  • [9] A Comparative Study of Protein Sequence Clustering Algorithms
    Eldin, A. Sharaf
    AbdelGaber, S.
    Soliman, T.
    Kassim, S.
    Abdo, A.
    INNOVATIONS IN COMPUTING SCIENCES AND SOFTWARE ENGINEERING, 2010, : 373 - 378
  • [10] Parallel sequence alignment algorithm for clustering system
    Chen, Yang
    Yu, Songnian
    Leng, Ming
    KNOWLEDGE ENTERPRISE: INTELLIGENT STRATEGIES IN PRODUCT DESIGN, MANUFACTURING, AND MANAGEMENT, 2006, 207 : 311 - +