SpIS: A stochastic approximation approach to minimal subset instance selection

Times cited: 0
Authors
Yeo, Guo Feng Anders [1 ]
Hudson, Irene [1 ]
Akman, David [1 ]
Chan, Jeffrey [2 ]
Affiliations
[1] RMIT Univ, Sch Sci, Math Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
[2] RMIT Univ, Sch Sci, Comp Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
Keywords
Instance selection; Dimensionality reduction; Stochastic approximation; Gradient descent optimisation; Training set selection; TRAINING SET SELECTION; SUPPORT VECTOR MACHINES; NUMERICAL-SOLUTION; REDUCTION; ALGORITHMS; REGRESSION; BARZILAI;
DOI
10.1016/j.ins.2024.121738
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Instance selection provides a means to enhance the efficacy and efficiency of machine learning tools when utilised for data mining. This study proposes Stochastic Perturbation Instance Selection (SpIS), a wrapper instance selection algorithm that uses two candidate solutions to traverse the reduction and performance criteria, seeking a minimal subset specific to any given machine learning model in conjunction with any corresponding performance metric. The subset selected by SpIS provides a high-quality, representative subset of the full data, enabling better identification of insights whilst providing comparable predictive performance with respect to the full dataset. Across 43 diverse classification datasets, SpIS was evaluated on 6 different wrappers using 5-fold cross-validation with respect to reduction rate and prediction accuracy. The mean results across all datasets and wrappers show that SpIS selects 3.10% of the dataset on average, with statistically equivalent performance at a 5% level of significance compared to the full training set.
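To make the two criteria in the abstract concrete (reduction rate and wrapper performance), the following is a minimal sketch of a generic wrapper-style instance selection loop. It is not the SpIS algorithm: the abstract does not give the update rule, so this sketch swaps in a plain random accept/reject perturbation with a single candidate solution, a 1-nearest-neighbour wrapper, and a held-out validation set instead of 5-fold cross-validation. All function names, parameters, and the toy data are hypothetical.

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbour prediction; `train` is a list of (features, label)."""
    nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return nearest[1]

def accuracy(subset, validation):
    """Wrapper score: accuracy of 1-NN fitted on `subset`, measured on `validation`."""
    if not subset:
        return 0.0
    hits = sum(nn_predict(subset, x) == y for x, y in validation)
    return hits / len(validation)

def reduction_rate(mask):
    """Fraction of training instances discarded by a 0/1 selection mask."""
    return 1.0 - sum(mask) / len(mask)

def random_perturbation_search(train, validation, iters=300, tol=0.0, seed=0):
    """Assumed stand-in for the search: drop one random kept instance per step
    and accept the drop only if accuracy stays within `tol` of the full-set score."""
    rng = random.Random(seed)
    mask = [1] * len(train)
    baseline = accuracy(train, validation)
    for _ in range(iters):
        kept = [i for i, m in enumerate(mask) if m]
        if len(kept) <= 1:
            break
        candidate = mask[:]
        candidate[rng.choice(kept)] = 0  # perturb: remove one kept instance
        subset = [train[i] for i, m in enumerate(candidate) if m]
        if accuracy(subset, validation) >= baseline - tol:
            mask = candidate  # accept: smaller subset, performance preserved
    return mask, baseline

# Hypothetical toy data: two well-separated 2-D clusters.
_rng = random.Random(1)

def make_cluster(n, centre, label):
    return [((centre + _rng.gauss(0, 0.3), centre + _rng.gauss(0, 0.3)), label)
            for _ in range(n)]

train = make_cluster(20, 0.0, 0) + make_cluster(20, 5.0, 1)
validation = make_cluster(10, 0.0, 0) + make_cluster(10, 5.0, 1)

mask, baseline = random_perturbation_search(train, validation)
selected = [train[i] for i, m in enumerate(mask) if m]
print("instances kept:", sum(mask), "of", len(train))
print("reduction rate:", reduction_rate(mask))
print("accuracy (subset vs full):", accuracy(selected, validation), baseline)
```

The acceptance rule guarantees that the selected subset never scores below the full-set baseline minus `tol` on the validation data; whether that carries over to unseen data is exactly what a cross-validated evaluation such as the one described in the abstract would check.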
Pages: 21
Related papers
50 in total
  • [41] A Splicing Approach to Best Subset of Groups Selection
    Zhang, Yanhang
    Zhu, Junxian
    Zhu, Jin
    Wang, Xueqin
    INFORMS JOURNAL ON COMPUTING, 2023, 35 (01) : 104 - 119
  • [42] Approximation of SDEs: a stochastic sewing approach
    Butkovsky, Oleg
    Dareiotis, Konstantinos
    Gerencser, Mate
    PROBABILITY THEORY AND RELATED FIELDS, 2021, 181 (04) : 975 - 1034
  • [43] Approximation of SDEs: a stochastic sewing approach
    Oleg Butkovsky
    Konstantinos Dareiotis
    Máté Gerencsér
    Probability Theory and Related Fields, 2021, 181 : 975 - 1034
  • [44] STOCHASTIC APPROXIMATION APPROACH TO A DISCRIMINATION PROBLEM
    HAMILTON, MA
    ANNALS OF MATHEMATICAL STATISTICS, 1972, 43 (04): : 1096 - &
  • [45] A scalable approach to simultaneous evolutionary instance and feature selection
    Garcia-Pedrajas, Nicolas
    de Haro-Garcia, Aida
    Perez-Rodriguez, Javier
    INFORMATION SCIENCES, 2013, 228 : 150 - 174
  • [46] A Global Density-based Approach for Instance Selection
    Carbonera, Joel Luis
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 402 - 409
  • [47] Towards high dimensional instance selection: An evolutionary approach
    Tsai, Chih-Fong
    Chen, Zong-Yao
    DECISION SUPPORT SYSTEMS, 2014, 61 : 79 - 92
  • [48] A novel density-based approach for instance selection
    Carbonera, Joel Luis
    Abel, Mara
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 549 - 556
  • [49] A new approach for instance selection: Algorithms, evaluation, and comparisons
    Malhat, Mohamed
    El Menshawy, Mohamed
    Mousa, Hamdy
    El Sisi, Ashraf
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 149
  • [50] New Subset Selection Algorithms for Low Rank Approximation: Offline and Online
    Woodruff, David R.
    Yasuda, Taisuke
    PROCEEDINGS OF THE 55TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, STOC 2023, 2023, : 1802 - 1813