BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

Cited by: 0
Authors:
Aghazadeh, Amirali [1 ]
Gupta, Vipul [1 ]
DeWeese, Alex [1 ]
Koyluoglu, O. Ozan [1 ]
Ramchandran, Kannan [1 ]
Affiliations:
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Keywords:
Feature selection; sketching; second-order optimization; sublinear memory;
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We consider feature selection for applications in machine learning where the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine. Unfortunately, current large-scale sketching algorithms show a poor memory-accuracy trade-off when selecting features in high dimensions, due to irreversible collisions and the accumulation of stochastic gradient noise in the sketched domain. Here, we develop a second-order feature selection algorithm, called BEAR, which avoids the extra collisions by efficiently storing the second-order stochastic gradients of the celebrated Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm in a Count Sketch, using a memory cost that grows sublinearly with the size of the feature vector. BEAR reveals an unexplored advantage of second-order optimization for memory-constrained high-dimensional gradient sketching. Our extensive experiments on several real-world data sets, from genomics to language processing, demonstrate that BEAR requires up to three orders of magnitude less memory to achieve the same classification accuracy as first-order sketching algorithms, with a comparable run time. Our theoretical analysis further proves the global convergence of BEAR at an O(1/t) rate over t iterations of the sketched algorithm.
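The abstract's key mechanism is storing gradient information in a Count Sketch so that memory grows with the sketch size rather than the feature dimension. Below is a minimal, illustrative Count Sketch in Python; it is not the authors' implementation, and the class and parameter names are ours. For clarity it tabulates the bucket and sign hashes as arrays (which itself costs O(d) memory); a real sublinear-memory implementation would evaluate pairwise-independent hash functions on the fly instead.

```python
import numpy as np

class CountSketch:
    """Illustrative Count Sketch: summarizes a d-dimensional vector in
    rows x width counters, supporting approximate coordinate queries."""

    def __init__(self, rows, width, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((rows, width))
        # Tabulated hashes for clarity only; real implementations use
        # hash functions so memory stays O(rows * width), not O(dim).
        self.bucket = rng.integers(0, width, size=(rows, dim))
        self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))

    def update(self, indices, values):
        # Accumulate a sparse (stochastic-gradient-like) update into every row.
        # np.add.at handles repeated buckets (hash collisions) correctly.
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.bucket[r, indices],
                      self.sign[r, indices] * values)

    def query(self, indices):
        # Taking the median across rows de-biases hash-collision noise.
        est = np.stack([self.sign[r, indices] *
                        self.table[r, self.bucket[r, indices]]
                        for r in range(self.table.shape[0])])
        return np.median(est, axis=0)
```

With a sparse update (few nonzeros relative to the width), queried coordinates are recovered almost exactly; the "irreversible collision" problem the abstract mentions arises as many noisy updates accumulate and buckets fill up.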
Pages: 75-92 (18 pages)