SpIS: A stochastic approximation approach to minimal subset instance selection

Times cited: 0
Authors
Yeo, Guo Feng Anders [1]
Hudson, Irene [1]
Akman, David [1]
Chan, Jeffrey [2]
Affiliations
[1] RMIT Univ, Sch Sci, Math Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
[2] RMIT Univ, Sch Sci, Comp Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
Keywords
Instance selection; Dimensionality reduction; Stochastic approximation; Gradient descent optimisation; Training set selection; TRAINING SET SELECTION; SUPPORT VECTOR MACHINES; NUMERICAL-SOLUTION; REDUCTION; ALGORITHMS; REGRESSION; BARZILAI;
DOI
10.1016/j.ins.2024.121738
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Instance selection provides a means to enhance the efficacy and efficiency of machine learning tools used for data mining. This study proposes Stochastic Perturbation Instance Selection (SpIS), a wrapper instance selection algorithm that uses two candidate solutions to traverse the reduction and performance criteria, seeking a minimal subset specific to any given machine learning model in conjunction with any corresponding performance metric. The subset selected by SpIS is a high-quality, representative subset of the full data, enabling better identification of insights while providing predictive performance comparable to that obtained with the full dataset. SpIS was evaluated across 43 diverse classification datasets on 6 different wrappers using 5-fold cross-validation, with respect to reduction rate and prediction accuracy. The mean results across all datasets and wrappers show that SpIS selects 3.10% of the dataset on average, with performance statistically equivalent, at a 5% level of significance, to that of the full training set.
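To make the idea of wrapper instance selection by stochastic perturbation concrete, here is a minimal Python sketch. It is not the SpIS algorithm from the paper: the abstract does not give its update rules, so this sketch assumes a generic SPSA-style (simultaneous perturbation) search over per-instance inclusion weights, a 1-nearest-neighbour classifier as the wrapper, and a single penalised loss in place of the paper's two candidate solutions. All function names, parameters and the toy data are hypothetical.

```python
import random

def make_blobs(n_per_class, rng):
    """Toy data (assumption): two 2-D Gaussian classes centred at (0, 0) and (3, 3)."""
    data = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    return data

def one_nn_accuracy(train, valid):
    """Accuracy of a 1-nearest-neighbour wrapper fit on `train`, scored on `valid`."""
    if not train:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
        correct += int(nearest[1] == y)
    return correct / len(valid)

def loss(weights, train, valid, penalty):
    """Validation error plus a penalty on the fraction of instances kept (weight >= 0.5)."""
    subset = [p for p, w in zip(train, weights) if w >= 0.5]
    return (1.0 - one_nn_accuracy(subset, valid)) + penalty * len(subset) / len(train)

def clip(value):
    """Keep an inclusion weight inside [0, 1]."""
    return min(1.0, max(0.0, value))

def spsa_select(train, valid, iterations=200, penalty=0.1, step=0.3, perturb=0.3, seed=0):
    """SPSA-style search: perturb all weights at once, estimate a gradient from two loss calls."""
    rng = random.Random(seed)
    weights = [0.5] * len(train)  # start with every instance (just) included
    best_weights, best_loss = list(weights), loss(weights, train, valid, penalty)
    for _ in range(iterations):
        delta = [rng.choice((-1, 1)) for _ in weights]  # random +/-1 direction per instance
        plus = [clip(w + perturb * d) for w, d in zip(weights, delta)]
        minus = [clip(w - perturb * d) for w, d in zip(weights, delta)]
        diff = loss(plus, train, valid, penalty) - loss(minus, train, valid, penalty)
        # Gradient estimate for coordinate i is diff / (2 * perturb * delta_i).
        weights = [clip(w - step * diff / (2.0 * perturb * d)) for w, d in zip(weights, delta)]
        current = loss(weights, train, valid, penalty)
        if current < best_loss:  # remember the best subset seen so far
            best_weights, best_loss = list(weights), current
    selected = [i for i, w in enumerate(best_weights) if w >= 0.5]
    return selected, best_loss

data_rng = random.Random(42)
train = make_blobs(20, data_rng)
valid = make_blobs(10, data_rng)
full_loss = loss([1.0] * len(train), train, valid, penalty=0.1)
selected, best_loss = spsa_select(train, valid)
print(f"kept {len(selected)} of {len(train)} instances; "
      f"loss {best_loss:.3f} vs {full_loss:.3f} for the full training set")
```

In this sketch the `penalty` parameter sets the trade-off between reduction and accuracy: larger values favour smaller subsets at the cost of validation error. A faithful evaluation would repeat the search inside each cross-validation fold, as the abstract describes.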
Pages: 21
Related papers
50 records in total
  • [31] Conformance Checking Approximation Using Subset Selection and Edit Distance
    Sani, Mohammadreza Fani
    van Zelst, Sebastiaan J.
    van der Aalst, Wil M. P.
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2020, 2020, 12127 : 234 - 251
  • [32] Prototype Selection based on Minimal Consistent Subset and Genetic Algorithms
    Kruatrachue, Boontee
    Hongsamart, Marut
    2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 647 - 651
  • [33] Minimal consistent subset selection as integer nonlinear programming problem
    Kangkan, Kamonnat
    Kruatrachue, Boontee
    2006 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES,VOLS 1-3, 2006, : 922 - +
  • [34] SIMULTANEOUS SELECTION OF EXTREME POPULATIONS - A SUBSET-SELECTION APPROACH
    MISHRA, SN
    DUDEWICZ, EJ
    BIOMETRICAL JOURNAL, 1987, 29 (04) : 471 - 483
  • [35] SIMULTANEOUS SELECTION OF EXTREME POPULATIONS - A SUBSET-SELECTION APPROACH
    MISHRA, SN
    DUDEWICZ, EJ
    BIOMETRICS, 1983, 39 (03) : 807 - 807
  • [36] A Novel Genetic Algorithm Approach to Simultaneous Feature Selection and Instance Selection
    Albuquerque, Inti Mateus Resende
    Bach Hoai Nguyen
    Xue, Bing
    Zhang, Mengjie
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 616 - 623
  • [37] A Bayesian Approach for Subset Selection in Contextual Bandits
    Li, Jialian
    Du, Chao
    Zhu, Jun
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8384 - 8391
  • [38] A fast metric approach to feature subset selection
    Chan, TYT
    24TH EUROMICRO CONFERENCE - PROCEEDING, VOLS 1 AND 2, 1998, : 733 - 736
  • [39] Towards a Better Feature Subset Selection Approach
    Shiba, Omar A. A.
    PROCEEDINGS OF KNOWLEDGE MANAGEMENT 5TH INTERNATIONAL CONFERENCE 2010, 2010, : 675 - 678
  • [40] A SUBSET SUM APPROACH TO COIL SELECTION FOR SLITTING
    Han, Yune T.
    Chang, Soo Y.
    INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE, 2015, 22 (03): : 343 - 353