Biclustering and Boolean Matrix Factorization in Data Streams

Cited by: 3
Authors
Neumann, Stefan [1]
Miettinen, Pauli [2]
Affiliations
[1] Univ Vienna, Fac Comp Sci, Vienna, Austria
[2] Univ Eastern Finland, Sch Comp, Kuopio, Finland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / Issue 10
Funding
European Research Council; Austrian Science Fund;
Keywords
ALGORITHMS;
DOI
10.14778/3401960.3401968
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We study clustering of bipartite graphs and Boolean matrix factorization in data streams. We consider a streaming setting in which the vertices from the left side of the graph arrive one by one together with all of their incident edges. We provide an algorithm which, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space; to the best of our knowledge, this is the first algorithm with this property. We also show that after a second pass over the stream the left clusters of the bipartite graph can be recovered, and we show how to extend our algorithm to solve the Boolean matrix factorization problem (by exploiting the correspondence between Boolean matrices and bipartite graphs). We evaluate an implementation of the algorithm on synthetic data and on real-world data. On real-world datasets the algorithm is orders of magnitude faster than a static baseline algorithm while providing quality results within a factor of 2 of the baseline algorithm. Our algorithm scales linearly in the number of edges in the graph. Finally, we analyze the algorithm theoretically and provide sufficient conditions under which the algorithm recovers a set of planted clusters under a standard random graph model.
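The correspondence between Boolean matrices and bipartite graphs mentioned in the abstract, and the arrival model of the stream, can be illustrated with a minimal sketch. This is not the authors' algorithm: it only builds a Boolean matrix from two small factors and replays it as a stream of left vertices with their incident edges. The function names, the toy factors, and the convention that rows are left vertices and columns are right vertices are all assumptions made for illustration.

```python
from typing import Iterator, List, Set, Tuple

Matrix = List[List[int]]


def boolean_product(left: Matrix, right: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is 1 iff some index c has
    left[i][c] == 1 and right[c][j] == 1 (OR of ANDs instead of sum of products)."""
    rank = len(right)
    cols = len(right[0])
    return [
        [int(any(row[c] and right[c][j] for c in range(rank))) for j in range(cols)]
        for row in left
    ]


def stream_left_vertices(matrix: Matrix) -> Iterator[Tuple[int, Set[int]]]:
    """Replay the matrix as a stream: each left vertex (row, by assumption)
    arrives together with the set of right vertices (columns) it is adjacent to."""
    for i, row in enumerate(matrix):
        yield i, {j for j, bit in enumerate(row) if bit}


# Hypothetical toy factors: 4 left vertices, 2 clusters, 5 right vertices.
LEFT = [[1, 0], [1, 0], [0, 1], [1, 1]]      # left vertex -> cluster membership
RIGHT = [[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]   # cluster -> right vertex membership

MATRIX = boolean_product(LEFT, RIGHT)        # biadjacency matrix of the graph
STREAM = list(stream_left_vertices(MATRIX))

for vertex, neighbours in STREAM:
    print(vertex, sorted(neighbours))
```

In this toy instance the rows of `RIGHT` play the role of the right-side clusters and the columns of `LEFT` the left-side clusters; a streaming method would see only the output of `stream_left_vertices`, one vertex at a time.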
Pages: 1709-1722
Page count: 14