Towards Ultrahigh Dimensional Feature Selection for Big Data

Authors
Tan, Mingkui [1]
Tsang, Ivor W. [2 ]
Wang, Li [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Broadway, NSW 2007, Australia
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Funding
Australian Research Council
Keywords
big data; ultrahigh dimensionality; feature selection; nonlinear feature selection; multiple kernel learning; feature generation; MULTIPLE; CLASSIFICATION; OPTIMIZATION; CONVERGENCE; ONLINE; CANCER; LASSO;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Building on this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of millions of data points with O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
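The "activate a group of features, then solve a subproblem on the active set" loop described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a plain least-squares loss, replaces the MKL subproblem and the accelerated proximal gradient solver with ridge regression fitted by ordinary gradient descent, and all names (`feature_generation`, `make_toy_data`, `batch`, `tol`) are hypothetical.

```python
import random


def make_toy_data(seed=0, n=200, m=20):
    """Toy regression data (an assumption for illustration): only features 3 and 7 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    y = [2.0 * row[3] - 3.0 * row[7] for row in X]
    return X, y


def feature_generation(X, y, batch=2, max_rounds=5, lam=1e-3, gd_steps=200, tol=1e-2):
    """Iteratively activate `batch` features, then refit on the active set only."""
    n, m = len(X), len(X[0])
    active = []          # activated feature indices, in order of activation
    w = {}               # weights, stored for active features only
    residual = list(y)   # residual y - Xw with w = 0
    for _ in range(max_rounds):
        # Step 1: score each inactive feature by |x_j^T r| / n (gradient magnitude).
        scores = []
        for j in range(m):
            if j in w:
                continue
            s = abs(sum(X[i][j] * residual[i] for i in range(n))) / n
            scores.append((s, j))
        scores.sort(reverse=True)
        if not scores or scores[0][0] < tol:
            break  # no inactive feature correlates with the residual any more
        for _, j in scores[:batch]:
            active.append(j)
            w[j] = 0.0
        # Step 2: minimise (1/2n)||y - Xw||^2 + (lam/2)||w||^2 over active features.
        # The trace of X_A^T X_A / n upper-bounds its largest eigenvalue,
        # so 1 / (trace + lam) is a safe step size.
        lipschitz = sum(X[i][j] ** 2 for i in range(n) for j in active) / n + lam
        step = 1.0 / lipschitz
        for _ in range(gd_steps):
            residual = [y[i] - sum(w[j] * X[i][j] for j in active) for i in range(n)]
            grads = {
                j: -sum(X[i][j] * residual[i] for i in range(n)) / n + lam * w[j]
                for j in active
            }
            for j in active:
                w[j] -= step * grads[j]
        residual = [y[i] - sum(w[j] * X[i][j] for j in active) for i in range(n)]
    return active, w


if __name__ == "__main__":
    X, y = make_toy_data()
    active, w = feature_generation(X, y)
    print("activated features:", active)
    print("weights:", {j: round(w[j], 3) for j in active})
```

The point of the sketch is the cost structure: scoring touches every feature once per round, while the inner optimization only ever works on the small active set.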
Pages: 1371-1429
Number of pages: 59