Towards Ultrahigh Dimensional Feature Selection for Big Data

Cited by: 0
Authors
Tan, Mingkui [1]
Tsang, Ivor W. [2]
Wang, Li [3]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Broadway, NSW 2007, Australia
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Funding
Australian Research Council;
Keywords
big data; ultrahigh dimensionality; feature selection; nonlinear feature selection; multiple kernel learning; feature generation; MULTIPLE; CLASSIFICATION; OPTIMIZATION; CONVERGENCE; ONLINE; CANCER; LASSO;
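DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract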
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that optimize over all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Owing to this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets with tens of millions of data points and O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
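The iterative "activate a group of features, then solve a restricted subproblem" loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: ordinary least squares stands in for the MKL subproblem and its accelerated proximal gradient solver, and the absolute correlation with the residual stands in for the step that decides which features to activate. The function name `feature_generating_sketch` and the parameters `group_size`, `max_iters`, and `tol` are hypothetical.

```python
import numpy as np


def feature_generating_sketch(X, y, group_size=2, max_iters=5, tol=1e-8):
    """Simplified active-set loop (illustrative assumption, not the paper's method).

    Each round activates `group_size` new features and refits on the
    active features only, so the full feature set is never optimized at once.
    """
    n_features = X.shape[1]
    active = []                      # indices of features activated so far
    w = np.zeros(n_features)
    for _ in range(max_iters):
        residual = y - X @ w
        if residual @ residual <= tol:
            break                    # current active set already fits the data
        # Stand-in activation rule: score features by |correlation with residual|.
        scores = np.abs(X.T @ residual)
        if active:
            scores[active] = -np.inf  # never re-activate a feature
        new = np.argsort(scores)[-group_size:]
        active.extend(int(j) for j in new)
        # Stand-in subproblem: least squares restricted to the active features.
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        w = np.zeros(n_features)
        w[active] = coef
    return w, sorted(active)


# Toy usage: only features 1 and 4 carry signal in this orthogonal design.
X_demo = np.eye(6)
y_demo = np.array([0.0, 3.0, 0.0, 0.0, -2.0, 0.0])
w_demo, active_demo = feature_generating_sketch(X_demo, y_demo)
print(active_demo, w_demo)
```

The point of the sketch is the control flow: the cost of each refit depends on the number of active features rather than on the total dimensionality, which is the property the abstract contrasts with methods that optimize over all input features.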
Pages: 1371 - 1429
Page count: 59
Related papers
50 in total
  • [41] Feature selection for high-dimensional data
    Bolón-Canedo V.
    Sánchez-Maroño N.
    Alonso-Betanzos A.
    Progress in Artificial Intelligence, 2016, 5 (2) : 65 - 75
  • [42] Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review
    HONG Hyokyoung Grace
    LI Yi
    Applied Mathematics: A Journal of Chinese Universities, 2017, 32 (04) : 379 - 396
  • [43] Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review
    Hong, Hyokyoung Grace
    Li, Yi
    APPLIED MATHEMATICS-A JOURNAL OF CHINESE UNIVERSITIES SERIES B, 2017, 32 (04) : 379 - 396
  • [44] Feature selection techniques in the context of big data: taxonomy and analysis
    Abdulwahab, Hudhaifa Mohammed
    Ajitha, S.
    Saif, Mufeed Ahmed Naji
    APPLIED INTELLIGENCE, 2022, 52 (12) : 13568 - 13613
  • [45] Feature selection methods and genomic big data: a systematic review
    Tadist, Khawla
    Najah, Said
    Nikolov, Nikola S.
    Mrabti, Fatiha
    Zahi, Azeddine
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [46] Estimation of Error Variance in Genomic Selection for Ultrahigh Dimensional Data
    Majumdar, Sayanti Guha
    Rai, Anil
    Mishra, Dwijesh Chandra
    AGRICULTURE-BASEL, 2023, 13 (04):
  • [47] Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review
    Hyokyoung Grace Hong
    Yi Li
    Applied Mathematics-A Journal of Chinese Universities, 2017, 32 : 379 - 396
  • [48] A Feature Selection Method for Comparision of Each Concept in Big Data
    Nakanishi, Takafumi
    2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2015, : 229 - 234
  • [49] A greedy feature selection algorithm for Big Data of high dimensionality
    Tsamardinos, Ioannis
    Borboudakis, Giorgos
    Katsogridakis, Pavlos
    Pratikakis, Polyvios
    Christophides, Vassilis
    MACHINE LEARNING, 2019, 108 (02) : 149 - 202
  • [50] Feature Selection in Big Data using Filter Based Techniques
    Srinivas, Sumitra K.
    Kancharla, Gangadhara Rao
    2019 4TH MEC INTERNATIONAL CONFERENCE ON BIG DATA AND SMART CITY (ICBDSC), 2019, : 139 - 145