An Online Cluster Analysis Method for Large-scale Protein Sequences

Authors
Tang, DongMing [1]
Zhu, QingXin [1]
Zhang, YueFei [2]
Zhang, Jiang [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Elect Engn, Chengdu 610054, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Chengdu 610054, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Online clustering; Pattern recognition; Protein sequences; Sequences analysis; Clustering; Classification; Algorithm
DOI
10.1109/FBIE.2009.5405808
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
As modern high-throughput sequencing technologies continue to improve, biomedical databases hold an overwhelming number of unannotated protein sequences. Clustering protein sequences into homologous groups can help annotate uncharacterized sequences. In this paper, we introduce OnlineCAPS, an online cluster analysis method for large-scale protein sequences that combines online clustering algorithms with an alignment-free similarity measure for protein sequences. OnlineCAPS has several advantages: its memory requirements and computation cost are very low; it is fast and can extract clusters from a large-scale set of protein sequences; and it can be deployed on a web server, where it can perform clustering while a sequence dataset is being uploaded. The experimental results illustrate the efficiency of the proposed method.
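The abstract does not specify which online clustering algorithm or which alignment-free measure OnlineCAPS uses. As a rough illustration of the general idea only, the sketch below assumes a single-pass "leader"-style scheme with cosine similarity over k-mer counts. The function names, the k-mer length `k`, and the `threshold` value are hypothetical choices for this example and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count overlapping k-mers: a simple alignment-free representation."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(count * b.get(kmer, 0) for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(sequences, threshold=0.5, k=2):
    """Assign each sequence to a cluster in one pass over the input.

    Assumption: a sequence joins the most similar existing cluster if the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    Only one summed profile per cluster is kept in memory.
    """
    centroids = []  # one summed k-mer profile per cluster
    labels = []
    for seq in sequences:
        profile = kmer_profile(seq, k)
        best, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(profile, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best < 0:
            centroids.append(Counter(profile))  # open a new cluster
            best = len(centroids) - 1
        else:
            centroids[best].update(profile)  # fold the sequence into the cluster
        labels.append(best)
    return labels


labels = online_cluster(["AAAAAA", "AAAAAAA", "CCCCCC", "CCCCC"])
```

Because each sequence is processed once and only one profile per cluster is stored, a scheme of this kind could in principle run while sequences are still arriving, which matches the low-memory, cluster-during-upload behaviour the abstract describes.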
Pages: 478 / +
Page count: 2