Streaming and Distributed Algorithms for Robust Column Subset Selection

Cited by: 0
Authors
Jiang, Shuli [1 ]
Li, Dongyu [1 ]
Li, Irene Mengze [1 ]
Mahankali, Arvind V. [1 ]
Woodruff, David P. [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Keywords
MATRIX; APPROXIMATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We give the first single-pass streaming algorithm for Column Subset Selection with respect to the entrywise $\ell_p$-norm with $1 \le p < 2$. We study the $\ell_p$-norm loss since it is often considered more robust to noise than the standard Frobenius norm. Given an input matrix $A \in \mathbb{R}^{d \times n}$ ($n \gg d$), our algorithm achieves a multiplicative $k^{1/p - 1/2} \cdot \mathrm{poly}(\log nd)$-approximation to the error with respect to the best possible column subset of size $k$. Furthermore, the space complexity of the streaming algorithm is optimal up to a logarithmic factor. Our streaming algorithm also extends naturally to a 1-round distributed protocol with nearly optimal communication cost. A key ingredient in our algorithms is a reduction to column subset selection in the $\ell_{p,2}$-norm, which corresponds to the $p$-norm of the vector of Euclidean norms of each of the columns of $A$. This enables us to leverage strong coreset constructions for the Euclidean norm, which previously had not been applied in this context. We also give the first provable guarantees for greedy column subset selection in the $\ell_{1,2}$-norm, which can be used as an alternative, practical subroutine in our algorithms. Finally, we show that our algorithms give significant practical advantages on real-world data analysis tasks.
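To make the two norms in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the entrywise p-norm and the (p,2)-norm of a matrix, and runs a naive greedy column selection that scores candidates by the (p,2)-norm of the residual. The function names (`entrywise_lp_norm`, `lp2_norm`, `residual`, `greedy_css`) are hypothetical, and the greedy loop is an unoptimized, offline illustration under the assumption that each column is fit by ordinary least squares; it is neither the streaming algorithm nor the analyzed greedy variant from the paper.

```python
import numpy as np


def entrywise_lp_norm(A, p):
    """Entrywise l_p norm: (sum_ij |A_ij|^p)^(1/p)."""
    return float((np.abs(A) ** p).sum() ** (1.0 / p))


def lp2_norm(A, p):
    """l_{p,2} norm: the p-norm of the vector of Euclidean column norms."""
    col_norms = np.linalg.norm(A, axis=0)
    return float((col_norms ** p).sum() ** (1.0 / p))


def residual(A, cols):
    """Residual of A after projecting every column onto span(A[:, cols]).

    Least squares minimizes the Euclidean norm of each residual column,
    so it also minimizes the l_{p,2} norm for a fixed column subset.
    """
    if not cols:
        return A
    S = A[:, cols]
    X, *_ = np.linalg.lstsq(S, A, rcond=None)
    return A - S @ X


def greedy_css(A, k, p=1.0):
    """Naive greedy selection (illustrative assumption, O(k * n) solves)."""
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(A.shape[1]):
            if j in chosen:
                continue
            err = lp2_norm(residual(A, chosen + [j]), p)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 40))  # d = 5 rows, n = 40 columns
    cols = greedy_css(A, k=3, p=1.0)
    print("selected columns:", cols)
    print("l_{1,2} residual:", lp2_norm(residual(A, cols), 1.0))
```

For a matrix whose only nonzero column is (3, 4), the entrywise 1-norm is 7 while the (1,2)-norm is 5, which shows how the (p,2)-norm measures each column by its Euclidean length before aggregating across columns.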
Pages: 11