A modified two-stage Markov clustering algorithm for large and sparse networks

Cited by: 4
Authors
Szilagyi, Laszlo [1 ,2 ]
Szilagyi, Sandor M. [2 ,3 ]
Affiliations
[1] Sapientia Univ Transylvania, Fac Tech & Human Sci, Soseaua Sighisoarei 1-C, Targu Mures 540485, Romania
[2] Budapest Univ Technol & Econ, Dept Control Engn & Informat Technol, Magyar Tudosok Krt 2, H-1117 Budapest, Hungary
[3] Petru Maior Univ, Dept Informat, Str N Iorga 1, Targu Mures 540088, Romania
Keywords
Hierarchical clustering; Markov clustering; Efficient computing; Sparse matrix; Protein sequence networks; PROTEIN; CLASSIFICATION; DATABASE;
DOI
10.1016/j.cmpb.2016.07.007
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background: Graph-based hierarchical clustering algorithms become prohibitively costly in both execution time and storage space as the number of nodes approaches the order of millions. Objective: A fast and highly memory-efficient Markov clustering algorithm is proposed to perform the classification of huge sparse networks using an ordinary personal computer. Methods: Improvements over previous versions are achieved through appropriately chosen data structures that facilitate the efficient handling of symmetric sparse matrices. Clustering is performed in two stages: the initial connected network is processed in a sparse matrix until it breaks into isolated, small, and relatively dense subgraphs, which are then processed separately until convergence is obtained. An intelligent stopping criterion is also proposed to stop further processing of a subgraph that tends toward completeness with equal edge weights. The main advantage of this algorithm is that the necessary number of iterations is decided separately for each graph node. Results: The proposed algorithm was tested using the SCOP95 and large synthetic protein sequence data sets. The validation process revealed that the proposed method can reduce the processing time of huge sequence networks by a factor of 3 to 6 compared to previous Markov clustering solutions, without any loss of partition quality. Conclusions: A one-million-node and one-billion-edge protein sequence network defined by a BLAST similarity matrix can be processed with a high-end personal computer in 100 minutes. Further improvement in speed is possible via parallel data processing, while the extension toward several million nodes needs intermediary data storage, for example on solid state drives. (C) 2016 Elsevier Ireland Ltd. All rights reserved.
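For readers unfamiliar with Markov clustering, the following is a minimal sketch of the generic expansion/inflation iteration on which the method described in the abstract builds. It is not the authors' two-stage sparse implementation: it uses a small dense matrix in pure Python, and the function names, the inflation exponent `r`, the pruning threshold `eps`, and the self-loop convention are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column in place so that it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def expand(m):
    """Expansion step: square the matrix (a two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r, eps):
    """Inflation step: raise entries to the power r, prune tiny values,
    then renormalize the columns."""
    out = []
    for row in m:
        powered = [x ** r for x in row]
        out.append([p if p > eps else 0.0 for p in powered])
    return normalize_columns(out)


def markov_cluster(adj, r=2.0, eps=1e-6, max_iter=100, tol=1e-9):
    """Cluster a small undirected graph given as a dense adjacency matrix.

    r, eps, max_iter and tol are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    n = len(adj)
    # Adding self-loops is a common MCL convention, assumed here.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), r, eps)
        diff = max(abs(nxt[i][j] - m[i][j])
                   for i in range(n) for j in range(n))
        m = nxt
        if diff < tol:
            break
    # Nodes linked by a surviving nonzero entry belong to the same cluster;
    # a small union-find groups them.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > 0.0:
                parent[find(i)] = find(j)
    groups = {}
    for j in range(n):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
print(markov_cluster(adj))
```

On this toy graph the weight of the bridging edge decays under repeated inflation, so the two triangles should come out as separate clusters. The dense representation costs O(n^3) time per expansion and is only meant to show the iteration; the abstract's point is precisely that sparse, symmetric storage and per-subgraph processing are needed at the scale of millions of nodes.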
Pages: 15-26
Number of pages: 12