Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Cited by: 5
Authors
Zhang, Chihao [1 ,2 ,3 ]
Yang, Yang [4 ]
Zhou, Wei [4 ]
Zhang, Shihua [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, RCSDS, NCMIS,CEMS, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China
[4] Yunnan Univ, Sch Software, Kunming 650504, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Matrix decomposition; Bayes methods; Big Data; Principal component analysis; Distributed databases; Data mining; Clustering algorithms; Distributed algorithm; bayesian matrix decomposition; clustering; big data; data mining; FACTORIZATION; MODEL;
DOI
10.1109/TKDE.2020.3029582
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Matrix decomposition is one of the fundamental tools for discovering knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data with such a method on a single machine. Moreover, big data are often collected and stored in a distributed fashion on different machines, so such data generally bear strong heterogeneous noise. It is therefore essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing: 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieve superior or competitive performance compared to two typical distributed methods, Scalable-NMF and scalable k-means++.
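The abstract does not spell out the form of the plug-in weighted average, so the following is only a generic illustration of the underlying idea, not the DBMD estimator itself. It assumes each machine produces an independent, unbiased estimate of the same quantity together with an estimate of its own noise variance, and combines them with weights proportional to the inverse variance. The function name `inverse_variance_average` and the example numbers are hypothetical.

```python
import numpy as np


def inverse_variance_average(estimates, variances):
    """Combine per-machine estimates with weights proportional to 1/variance.

    estimates: array-like with the machine index on axis 0
               (scalars, vectors, or matrices per machine).
    variances: one (estimated) noise variance per machine; using estimated
               rather than true variances is what makes this a plug-in rule.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    weights /= weights.sum()  # normalize so the weights sum to one
    # Contract the machine axis (axis 0) against the weight vector.
    combined = np.tensordot(weights, estimates, axes=1)
    return combined, weights


# Hypothetical example: three machines with increasingly noisy estimates.
local_estimates = [1.0, 2.0, 4.0]
local_variances = [1.0, 2.0, 4.0]
combined, weights = inverse_variance_average(local_estimates, local_variances)

# Variance of the weighted combination versus the plain mean,
# assuming independent estimates with the stated variances.
var_weighted = 1.0 / np.sum(1.0 / np.asarray(local_variances))
var_plain = np.sum(local_variances) / len(local_variances) ** 2
```

Under these assumptions the noisiest machine receives the smallest weight, and `var_weighted` is smaller than `var_plain`, which is the sense in which such a weighting reduces the variance of the combined estimate.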
Pages: 3701-3713
Number of pages: 13