BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

Cited by: 0
Authors
Aghazadeh, Amirali [1]
Gupta, Vipul [1]
DeWeese, Alex [1]
Koyluoglu, O. Ozan [1]
Ramchandran, Kannan [1]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Keywords
Feature selection; sketching; second-order optimization; sublinear memory
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider feature selection for applications in machine learning where the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine. Unfortunately, current large-scale sketching algorithms show a poor memory-accuracy trade-off in selecting features in high dimensions due to the irreversible collision and accumulation of the stochastic gradient noise in the sketched domain. Here, we develop a second-order feature selection algorithm, called BEAR, which avoids the extra collisions by efficiently storing the second-order stochastic gradients of the celebrated Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm in Count Sketch, using a memory cost that grows sublinearly with the size of the feature vector. BEAR reveals an unexplored advantage of second-order optimization for memory-constrained high-dimensional gradient sketching. Our extensive experiments on several real-world data sets from genomics to language processing demonstrate that BEAR requires up to three orders of magnitude less memory space to achieve the same classification accuracy as first-order sketching algorithms with a comparable run time. Our theoretical analysis further proves the global convergence of BEAR with an O(1/t) rate in t iterations of the sketched algorithm.
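To make the Count Sketch idea in the abstract concrete, here is a minimal Python sketch of the data structure alone. It is not the BEAR algorithm and not the authors' code: the class name `CountSketch`, its parameters (`depth`, `width`, `seed`), and the salted use of Python's built-in `hash` are all assumptions chosen for illustration. It shows how signed updates to coordinates of a very high-dimensional vector can be accumulated in a fixed `depth x width` table and read back as a median estimate.

```python
import random
from statistics import median


class CountSketch:
    """Illustrative Count Sketch (hypothetical helper, not the BEAR implementation)."""

    def __init__(self, depth=5, width=1024, seed=0):
        rng = random.Random(seed)
        self.depth = depth
        self.width = width
        # Memory is depth * width counters, independent of the feature dimension.
        self.table = [[0.0] * width for _ in range(depth)]
        # One pair of random salts per row: one for the bucket hash, one for the sign hash.
        self.salts = [(rng.getrandbits(61), rng.getrandbits(61)) for _ in range(depth)]

    def _slot(self, row, index):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, index)) % self.width
        sign = 1.0 if (hash((sign_salt, index)) >> 16) & 1 else -1.0
        return bucket, sign

    def add(self, index, value):
        # Add `value` to coordinate `index` in every row, with that row's random sign.
        for row in range(self.depth):
            bucket, sign = self._slot(row, index)
            self.table[row][bucket] += sign * value

    def query(self, index):
        # Undo the sign in each row and take the median to dampen collision noise.
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._slot(row, index)
            estimates.append(sign * self.table[row][bucket])
        return median(estimates)


sketch = CountSketch(depth=5, width=1024)
feature = 987_654_321_000  # index in a very high-dimensional feature space
sketch.add(feature, 3.0)   # e.g. one update component
sketch.add(feature, -1.0)  # a later update to the same coordinate
print(sketch.query(feature))
```

When many coordinates share the table, different indices collide in the same buckets and the estimates become noisy; this is the collision effect the abstract refers to.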
Pages: 75-92
Page count: 18