Big Data Clustering via Random Sketching and Validation

Cited by: 0
Authors
Traganitis, Panagiotis A. [1]
Slavakis, Konstantinos
Giannakis, Georgios B.
Affiliations
[1] Univ Minnesota, Dept ECE, Minneapolis, MN 55455 USA
Keywords
Clustering; high-dimensional data; feature selection; big data; random sketching and validation; random sampling and consensus; K-means;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As the volume and dimensionality of data increase, efficient new processing tools have become a necessity. This paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter incurring a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
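The abstract names the sketch-and-validate idea without listing its steps, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes each random draw selects a small subset of feature dimensions, runs plain K-means on that subset, and scores the resulting labels on a disjoint random subset of dimensions, keeping the lowest-cost draw. All function names and parameters (`kmeans`, `within_cluster_cost`, `sketch_and_validate`, `d_sketch`, `d_val`, `n_draws`) are hypothetical, and the within-cluster cost used for validation is an assumed stand-in for whatever criterion the paper uses.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's K-means; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def within_cluster_cost(X, labels, k):
    """Sum of squared distances of points to their cluster mean."""
    cost = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members):
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return float(cost)


def sketch_and_validate(X, k, d_sketch, d_val, n_draws, seed=0):
    """Assumed procedure: cluster on random feature sketches, keep the draw
    whose labels give the lowest cost on held-out validation features."""
    n_features = X.shape[1]
    if d_sketch + d_val > n_features:
        raise ValueError("d_sketch + d_val must not exceed the feature count")
    rng = np.random.default_rng(seed)
    best_cost, best_labels = np.inf, None
    for _ in range(n_draws):
        perm = rng.permutation(n_features)
        sketch, val = perm[:d_sketch], perm[d_sketch:d_sketch + d_val]
        labels = kmeans(X[:, sketch], k, rng)                # sketching phase
        cost = within_cluster_cost(X[:, val], labels, k)     # validation phase
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost


# Toy data: two Gaussian blobs in 50 dimensions (illustrative only).
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(30, 50)),
               data_rng.normal(10.0, 1.0, size=(30, 50))])
labels, cost = sketch_and_validate(X, k=2, d_sketch=5, d_val=5, n_draws=4)
```

In this reading, each draw clusters only `d_sketch` columns instead of all features, which is where any computational saving would come from; the validation step plays the role of the consensus check in RANSAC.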
Pages: 1046 - 1050
Page count: 5
Related papers
50 in total
  • [1] Spectral Clustering of Large-scale Communities via Random Sketching and Validation
    Traganitis, Panagiotis A.
    Slavakis, Konstantinos
    Giannakis, Georgios B.
    [J]. 2015 49TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2015,
  • [2] Large-Scale Subspace Clustering Using Random Sketching and Validation
    Traganitis, Panagiotis A.
    Slavakis, Konstantinos
    Giannakis, Georgios B.
    [J]. 2015 49TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2015, : 107 - 111
  • [3] External clustering validation in Big Data context
    Zerabi, Soumeya
    Meshoul, Souham
    [J]. PROCEEDINGS OF 2017 3RD INTERNATIONAL CONFERENCE OF CLOUD COMPUTING TECHNOLOGIES AND APPLICATIONS (CLOUDTECH), 2017, : 264 - 269
  • [4] Big Data Sketching with Model Mismatch
    Chepuri, Sundeep Prabhakar
    Zhang, Yu
    Leus, Geert
    Giannakis, G. B.
    [J]. 2015 49TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2015, : 97 - 101
  • [5] Sketching for Big Data Recommender Systems Using Fast Pseudo-random Fingerprints
    Bachrach, Yoram
    Porat, Ely
    [J]. AUTOMATA, LANGUAGES, AND PROGRAMMING, PT II, 2013, 7966 : 459 - 471
  • [6] Limited random walk algorithm for big graph data clustering
    Zhang H.
    Raitoharju J.
    Kiranyaz S.
    Gabbouj M.
    [J]. Journal of Big Data, 3 (1)
  • [7] ONLINE SKETCHING FOR BIG DATA SUBSPACE LEARNING
    Mardani, Morteza
    Giannakis, Georgios B.
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2511 - 2515
  • [8] Multi-view Iterative Random Projections on Big Data Clustering
    Bettoumi, Safa
    Jlassi, Chiraz
    Arous, Najet
    [J]. IMAGE AND SIGNAL PROCESSING (ICISP 2018), 2018, 10884 : 215 - 224
  • [9] Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
    Wang, Jialei
    Lee, Jason D.
    Mahdavi, Mehrdad
    Kolar, Mladen
    Srebro, Nathan
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 1150 - 1158
  • [10] Sketching meets random projection in the dual: A provable recovery algorithm for big and high-dimensional data
    Wang, Jialei
    Lee, Jason D.
    Mahdavi, Mehrdad
    Kolar, Mladen
    Srebro, Nathan
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (02) : 4896 - 4944