SpIS: A stochastic approximation approach to minimal subset instance selection

Authors
Yeo, Guo Feng Anders [1]
Hudson, Irene [1]
Akman, David [1]
Chan, Jeffrey [2]
Affiliations
[1] RMIT Univ, Sch Sci, Math Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
[2] RMIT Univ, Sch Sci, Comp Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
Keywords
Instance selection; Dimensionality reduction; Stochastic approximation; Gradient descent optimisation; Training set selection; Support vector machines; Numerical solution; Reduction; Algorithms; Regression; Barzilai
DOI
10.1016/j.ins.2024.121738
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Instance selection provides a means to enhance the efficacy and efficiency of machine learning tools when utilised for data mining. This study proposes Stochastic Perturbation Instance Selection (SpIS), a wrapper instance selection algorithm which uses two candidate solutions to traverse the reduction and performance criteria, seeking a minimal subset specific to any given machine learning model in conjunction with any corresponding performance metric. The subset selected by SpIS provides a high-quality, representative subset of the full data, enabling better identification of insights whilst providing comparable predictive performance with respect to the full dataset. Across 43 diverse classification datasets, SpIS was evaluated on 6 different wrappers using 5-fold cross-validation with respect to reduction rate and prediction accuracy. The mean results across all datasets and wrappers show that SpIS selects 3.10% of the dataset on average, with statistically equivalent performance at a 5% level of significance compared to the full training set.
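
The two criteria named in the abstract, the share of the training set that is kept and the predictive performance of a wrapper trained on that subset, can be illustrated with a small sketch. This is not the SpIS algorithm: the subset below is picked at random purely as a placeholder, and the 1-nearest-neighbour wrapper, the synthetic two-class data, and every function name are assumptions made for illustration only.

```python
import random


def nearest_neighbour_predict(train, query):
    """Hypothetical 1-NN wrapper: label of the closest training instance."""
    closest = min(
        train,
        key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], query)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test instances the 1-NN wrapper labels correctly."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)


def retained_fraction(subset, full):
    """Share of the full training set kept in the subset."""
    return len(subset) / len(full)


rng = random.Random(0)


def blob(cx, cy, label, n):
    """Synthetic 2-D Gaussian cluster (illustrative data, not from the paper)."""
    return [((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label) for _ in range(n)]


train = blob(0.0, 0.0, 0, 100) + blob(5.0, 5.0, 1, 100)
test = blob(0.0, 0.0, 0, 50) + blob(5.0, 5.0, 1, 50)

# Placeholder selection: 3 random instances per class. A real instance
# selection method would search for this subset instead of sampling it.
subset = rng.sample(train[:100], 3) + rng.sample(train[100:], 3)

full_acc = accuracy(train, test)
subset_acc = accuracy(subset, test)
kept = retained_fraction(subset, train)

print(f"kept {kept:.2%} of the training set")
print(f"accuracy: full = {full_acc:.3f}, subset = {subset_acc:.3f}")
```

On data this cleanly separated, even a random subset is likely to match the full training set; the point of the sketch is only to show how the two quantities are measured side by side, not to reproduce the reported results.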
Pages: 21