SpIS: A stochastic approximation approach to minimal subset instance selection

Cited by: 0
Authors
Yeo, Guo Feng Anders [1]
Hudson, Irene [1]
Akman, David [1]
Chan, Jeffrey [2]
Affiliations
[1] RMIT Univ, Sch Sci, Math Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
[2] RMIT Univ, Sch Sci, Comp Sci, 124 La Trobe St, Melbourne, Vic 3000, Australia
Keywords
Instance selection; Dimensionality reduction; Stochastic approximation; Gradient descent optimisation; Training set selection; TRAINING SET SELECTION; SUPPORT VECTOR MACHINES; NUMERICAL-SOLUTION; REDUCTION; ALGORITHMS; REGRESSION; BARZILAI;
DOI
10.1016/j.ins.2024.121738
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Instance selection provides a means to enhance the efficacy and efficiency of machine learning tools when utilised for data mining. This study proposes Stochastic Perturbation Instance Selection (SpIS), a wrapper instance selection algorithm that uses two candidate solutions to traverse the reduction and performance criteria, seeking a minimal subset specific to any given machine learning model in conjunction with any corresponding performance metric. The subset selected by SpIS provides a high-quality, representative subset of the full data, enabling better identification of insights whilst providing predictive performance comparable to that of the full dataset. Across 43 diverse classification datasets, SpIS was evaluated on 6 different wrappers using 5-fold cross-validation with respect to reduction rate and prediction accuracy. The mean results across all datasets and wrappers show that SpIS selects 3.10% of the dataset on average, with statistically equivalent performance at a 5% level of significance compared to the full training set.
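The two criteria named in the abstract, reduction rate and prediction accuracy, can be illustrated for a single candidate subset. The following is a minimal sketch only, not the SpIS algorithm: the function names, the toy data, the selection mask, and the 1-nearest-neighbour wrapper are all assumptions made for illustration, and no search over subsets is performed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reduction_rate(mask: Sequence[bool]) -> float:
    """Fraction of training instances discarded by the selection mask."""
    return 1.0 - sum(mask) / len(mask)


def predict_1nn(train_x: List[Point], train_y: List[int], query: Point) -> int:
    """Label of the nearest training point (squared Euclidean distance)."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[nearest]


def subset_accuracy(
    mask: Sequence[bool],
    train_x: List[Point],
    train_y: List[int],
    test_x: List[Point],
    test_y: List[int],
) -> float:
    """Accuracy on held-out data when the wrapper sees only the selected instances."""
    sel_x = [x for x, keep in zip(train_x, mask) if keep]
    sel_y = [y for y, keep in zip(train_y, mask) if keep]
    hits = sum(predict_1nn(sel_x, sel_y, q) == t for q, t in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical toy data: two well-separated classes in the plane.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.05, 0.1), (0.95, 1.0)]
test_y = [0, 1]

# Hypothetical candidate subset: keep one instance per class.
mask = [True, False, False, True, False, False]
full = [True] * len(train_x)

rate = reduction_rate(mask)
acc_subset = subset_accuracy(mask, train_x, train_y, test_x, test_y)
acc_full = subset_accuracy(full, train_x, train_y, test_x, test_y)
print(f"reduction rate: {rate:.3f}")
print(f"accuracy (subset): {acc_subset:.3f}, accuracy (full): {acc_full:.3f}")
```

A wrapper method in the sense of the abstract would score many such masks against both criteria; in the study this evaluation is repeated under 5-fold cross-validation rather than on a single held-out split as here.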
Pages: 21
Related papers
50 in total
  • [21] A hybrid feature selection method based on instance learning and cooperative subset search
    Ben Brahim, Afef
    Limam, Mohamed
    PATTERN RECOGNITION LETTERS, 2016, 69 : 28 - 34
  • [22] A new approach to feature subset selection
    Liu, DZ
    Feng, ZJ
    Wang, XZ
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1822 - 1825
  • [23] RESTRICTED SUBSET SELECTION APPROACH TO RANKING AND SELECTION PROBLEMS
    SANTNER, TJ
    ANNALS OF STATISTICS, 1975, 3 (02): : 334 - 349
  • [24] Evaluation of Instance-Based Feature Subset Selection Algorithm for Maintainability Prediction
    Gupta, Kanika
    Chug, Anuradha
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 1482 - 1487
  • [25] ROBUST STOCHASTIC APPROXIMATION APPROACH TO STOCHASTIC PROGRAMMING
    Nemirovski, A.
    Juditsky, A.
    Lan, G.
    Shapiro, A.
    SIAM JOURNAL ON OPTIMIZATION, 2009, 19 (04) : 1574 - 1609
  • [26] Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
    Mou, Wenlong
    Pananjady, Ashwin
    Wainwright, Martin J.
    Bartlett, Peter L.
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [27] Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection
    Das, Abhimanyu
    Kempe, David
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [28] A density-based approach for instance selection
    Carbonera, Joel Luis
    Abel, Mara
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 768 - 774
  • [29] An attraction-based approach for instance selection
    Carbonera, Joel Luis
    Abel, Mara
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 1053 - 1058
  • [30] A novel approach for integrating feature and instance selection
    De Souza, Jerffeson Teixeira
    Ferreira Do Carmo, Rafael Augusto
    Lima De Campos, Gustavo Augusto
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 374 - 379