Block coordinate descent algorithms for large-scale sparse multiclass classification

Cited by: 51
Authors
Blondel, Mathieu [1 ]
Seki, Kazuhiro [1 ]
Uehara, Kuniaki [1 ]
Affiliation
[1] Kobe Univ, Grad Sch Syst Informat, Nada Ku, Kobe, Hyogo 6578501, Japan
Keywords
Multiclass classification; Group sparsity; Block coordinate descent; MODEL SELECTION; REGULARIZATION; OPTIMIZATION;
DOI
10.1007/s10994-013-5367-2
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Over the past decade, ℓ1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., ℓ1/ℓ2) regularization has been utilized as a way to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs ℓ1/ℓ2 regularization so as to force weights corresponding to the same features to be zero across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally-convergent variants of block coordinate descent, one with line search (Tseng and Yun in Math. Program. 117: 387-423, 2009) and the other without (Richtárik and Takáč in Math. Program. 1-38, 2012a; Tech. Rep. arXiv:1212.0873, 2012b). We present the two variants in a unified manner and develop the core components needed to efficiently solve our formulation. The end result is a couple of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent performs favorably compared to other solvers such as FOBOS, FISTA and SpaRSA. Furthermore, we show that our formulation obtains very compact multiclass models and outperforms ℓ1/ℓ2-regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
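The group sparsity mechanism described in the abstract can be illustrated by the ℓ1/ℓ2 proximal operator (group soft-thresholding), which block coordinate descent methods of this kind typically apply to each feature's weight row across all classes. This is a minimal, generic sketch of that operator, not the paper's actual implementation; the function and variable names are illustrative.

```python
import numpy as np

def group_soft_threshold(w_j, tau):
    """Proximal operator of tau * ||.||_2 applied to w_j, the weight row
    of one feature across all classes.  If the row's Euclidean norm is
    at most tau, the entire row is zeroed, removing that feature from
    every class at once -- the effect of l1/l2 regularization.  Otherwise
    the row is uniformly shrunk toward zero."""
    norm = np.linalg.norm(w_j)
    if norm <= tau:
        return np.zeros_like(w_j)
    return (1.0 - tau / norm) * w_j

# A row with small norm is eliminated entirely; a large one is only shrunk.
small = group_soft_threshold(np.array([0.1, -0.1, 0.05]), tau=0.5)   # -> all zeros
large = group_soft_threshold(np.array([3.0, -4.0, 0.0]), tau=0.5)    # norm 5, scaled by 0.9
```

In a block coordinate descent sweep, each feature's row would be updated by a (partial) gradient step on the loss followed by this thresholding, which is what yields the compact models the abstract reports.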
Pages: 31-52
Page count: 22
Related papers
50 results
  • [1] Block coordinate descent algorithms for large-scale sparse multiclass classification
    Mathieu Blondel
    Kazuhiro Seki
    Kuniaki Uehara
    [J]. Machine Learning, 2013, 93 : 31 - 52
  • [2] Coordinate descent algorithms for large-scale SVDD
    Tao, Q.
    [J]. Science Press (25)
  • [3] Indexed Block Coordinate Descent for Large-Scale Linear Classification with Limited Memory
    Yen, Ian E. H.
    Chang, Chun-Fu
    Lin, Ting-Wei
    Lin, Shan-Wei
    Lin, Shou-De
    [J]. 19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 248 - 256
  • [4] A Block-Coordinate Descent Approach for Large-scale Sparse Inverse Covariance Estimation
    Treister, Eran
    Turek, Javier
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [5] On the flexibility of block coordinate descent for large-scale optimization
    Wang, Xiangfeng
    Zhang, Wenjie
    Yan, Junchi
    Yuan, Xiaoming
    Zha, Hongyuan
    [J]. NEUROCOMPUTING, 2018, 272 : 471 - 480
  • [6] Approximate Block Coordinate Descent for Large Scale Hierarchical Classification
    Charuvaka, Anveshi
    Rangwala, Huzefa
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 837 - 844
  • [7] A random coordinate descent algorithm for large-scale sparse nonconvex optimization
    Patrascu, Andrei
    Necoara, Ion
    [J]. 2013 EUROPEAN CONTROL CONFERENCE (ECC), 2013, : 2789 - 2794
  • [8] Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization
    Patrascu, Andrei
    Necoara, Ion
    [J]. JOURNAL OF GLOBAL OPTIMIZATION, 2015, 61 (01) : 19 - 46
  • [9] Stochastic Parallel Block Coordinate Descent for Large-Scale Saddle Point Problems
    Zhu, Zhanxing
    Storkey, Amos J.
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2429 - 2435