Robust and Scalable Column/Row Sampling from Corrupted Big Data

Cited by: 3
Authors
Rahmani, Mostafa [1]
Atia, George [1]
Affiliation
[1] Univ Cent Florida, Orlando, FL 32816 USA
Keywords
MATRIX; FACTORIZATION; ALGORITHMS
DOI
10.1109/ICCVW.2017.215
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Conventional sampling techniques fall short of drawing descriptive sketches of the data when the data are grossly corrupted, because such corruptions break the low-rank structure these techniques require to perform satisfactorily. In this paper, we present new sampling algorithms that can locate the informative columns in the presence of severe data corruption. In addition, we develop new scalable randomized designs of the proposed algorithms. The proposed approach is simultaneously robust to sparse corruption and outliers, and it substantially outperforms state-of-the-art robust sampling algorithms, as demonstrated by experiments on both real and synthetic data.
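The abstract's premise, that gross corruption breaks the low-rank structure conventional samplers rely on, can be checked on a toy example. The Python sketch below is only an illustration under assumed data and does not reproduce the authors' sampling algorithms. It builds a hypothetical rank-1 matrix L, adds a sparse corruption S to 4 of its 24 entries, and computes exact ranks by Gauss-Jordan elimination over the rationals. The helper names matrix_rank, outer, and corrupt, the toy values, and the additive model D = L + S are all illustrative assumptions.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


def outer(u, v):
    """Rank-1 matrix u v^T, stored as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def corrupt(A, entries):
    """Return A + S without mutating A, where S is zero except at the
    given {(i, j): value} positions (sparse additive corruption)."""
    B = [row[:] for row in A]
    for (i, j), value in entries.items():
        B[i][j] += value
    return B


# Hypothetical toy data (assumption, not from the paper):
# a clean 4 x 6 matrix of rank 1, since every column equals u.
u = [1, 2, 3, 4]
v = [1, 1, 1, 1, 1, 1]
L = outer(u, v)

# Corrupt 4 of the 24 entries, one in each of columns 0..3.
S = {(0, 0): 10, (1, 1): 10, (2, 2): 10, (3, 3): 10}
D = corrupt(L, S)

print(matrix_rank(L))  # 1
print(matrix_rank(D))  # 4
```

Corrupting about 17% of the entries lifts the rank from 1 to 4, the maximum possible for four rows, so the corrupted columns no longer lie in a low-dimensional subspace that a conventional column sampler could summarize with a few representatives.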
Pages: 1818-1826
Number of pages: 9