Group Learning for High-Dimensional Sparse Data

Authors
Cherkassky, Vladimir [1, 2]
Chen, Hsiang-Han [2]
Shiao, Han-Tai [1]
Affiliations
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Bioinformat & Computat Biol, Minneapolis, MN 55455 USA
Keywords
binary classification; digit recognition; feature selection; histogram of projections; Group Learning; iEEG; seizure prediction; SVM; unbalanced data
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe a new methodology for supervised learning with sparse data, i.e., when the number of input features (d) is much larger than the number of training samples (n). Under the proposed approach, all d available input features are split into t subsets, effectively yielding a larger number (t*n) of labeled training samples in a lower-dimensional input space (of dimensionality d/t). This modified training data is then used to estimate a classifier that makes predictions in the lower-dimensional space. In this paper, a standard SVM is used to train the classifier. During testing (prediction), the group of t predictions made by the SVM classifier must be combined via intelligent post-processing rules in order to make a prediction for a test input in the original d-dimensional space. The novelty of our approach lies in the design and empirical validation of these post-processing rules under the Group Learning setting. We demonstrate that such post-processing rules effectively reflect general (common-sense) a priori knowledge about the application data. Specifically, we propose two different post-processing schemes and demonstrate their effectiveness in two real-life application domains: handwritten digit recognition and seizure prediction from iEEG signals. These empirical results show superior performance of the Group Learning approach for sparse data under both balanced and unbalanced classification settings.
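The data transformation described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: features are split into contiguous, equal-size chunks; d is divisible by t; and the t chunk-level predictions are combined by a simple majority vote. The majority vote is only a placeholder for the paper's two post-processing schemes, which are not specified here. The names `split_features`, `expand_training_set`, and `predict_group` are hypothetical, and `classify` stands for any classifier trained in the lower-dimensional space (an SVM in the paper).

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = List[float]


def split_features(x: Sequence[float], t: int) -> List[Vector]:
    """Split one d-dimensional input into t chunks of dimension d/t.

    Assumption: contiguous, equal-size chunks, so d must be a multiple of t.
    """
    d = len(x)
    if t <= 0 or d == 0 or d % t != 0:
        raise ValueError("d must be a positive multiple of t")
    size = d // t
    return [list(x[i * size:(i + 1) * size]) for i in range(t)]


def expand_training_set(
    X: Sequence[Sequence[float]], y: Sequence[Hashable], t: int
) -> Tuple[List[Vector], List[Hashable]]:
    """Turn n samples of dimension d into t*n samples of dimension d/t.

    Every chunk inherits the label of the sample it was cut from.
    """
    X_small: List[Vector] = []
    y_small: List[Hashable] = []
    for x, label in zip(X, y):
        for chunk in split_features(x, t):
            X_small.append(chunk)
            y_small.append(label)
    return X_small, y_small


def predict_group(
    classify: Callable[[Vector], Hashable], x: Sequence[float], t: int
) -> Hashable:
    """Predict for a d-dimensional test input from its t chunk predictions.

    Placeholder rule (an assumption, not the paper's scheme): majority vote,
    with ties going to the label that was predicted first.
    """
    votes = [classify(chunk) for chunk in split_features(x, t)]
    return Counter(votes).most_common(1)[0][0]
```

In use, a classifier would be fitted on the output of `expand_training_set` and its single-chunk prediction function passed to `predict_group` as `classify`. Swapping the majority vote for a domain-specific rule is where the a priori knowledge mentioned in the abstract would enter.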
Pages: 10