Enhanced Cross-Validation Methods Leveraging Clustering Techniques

Cited by: 0

Authors:

Yucelbas, Cuneyt [1]

Yucelbas, Sule [2]

Affiliations:

[1] Tarsus Univ, Dept Elect & Automat, TR-33400 Mersin, Turkiye

[2] Tarsus Univ, Comp Engn Dept, TR-33400 Mersin, Turkiye

Source:

TRAITEMENT DU SIGNAL | 2023 / Vol. 40 / No. 06

Keywords:

large-scale classification; cross-validation methodology; k-means; k-medoids; clustering techniques; CLASSIFIERS; SELECTION

DOI:

10.18280/ts.400626

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The efficacy of emerging and established learning algorithms warrants scrutiny. This examination is intrinsically linked to the results of classification performance. The primary determinant influencing these results is the distribution of the training and test data presented to the algorithms. Existing literature frequently employs standard and stratified (S-CV and St-CV) k-fold cross-validation methods for the creation of training and test data for classification tasks. In the S-CV method, training and test groups are formed via random data distribution, potentially undermining the reliability of performance results calculated post-classification. This study introduces innovative cross-validation strategies based on k-means and k-medoids clustering to address this challenge. These strategies are designed to tackle issues emerging from random data distribution. The proposed methods autonomously determine the number of clusters and folds. Initially, the number of clusters is established via Silhouette analysis, followed by identifying the number of folds according to the data volume within these clusters. An additional aim of this study is to minimize the standard deviation (Std) values between the folds. Particularly in classifying large datasets, the minimized Std negates the need to present each fold to the system, thereby reducing time expenditure and system congestion/fatigue. Analyses were carried out on several large-scale datasets to demonstrate the superiority of these new CV methods over the S-CV and St-CV techniques. The findings revealed superior performance results for the novel strategies. For instance, while the minimum Std value between folds was 0.022, the maximum accuracy rate achieved was approximately 100%. Owing to the proposed methods, the discrepancy between the performance outputs of each fold and the overall average is statistically minimized. The randomness in creating the training/test groups, which has been previously identified as a negative contributing factor to this discrepancy, has been significantly reduced. Hence, this study is anticipated to fill a critical and substantial gap in the existing literature concerning the formation of training/test groups in various classification problems and the statistical accuracy of performance results.
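
The two steps named in the abstract (Silhouette analysis to pick the number of clusters, then a fold count derived from cluster sizes) can be illustrated with a rough, standard-library-only sketch. This is not the authors' implementation. The abstract does not state the exact fold-count rule, so the rule used here (cap the fold count at the size of the smallest cluster) is an assumption, as are the function names `kmeans`, `silhouette`, `best_clustering`, and `cluster_folds`, the candidate range of 2 to 6 clusters, and the deterministic farthest-point initialization. Only k-means is sketched; k-medoids is omitted.

```python
import math
import random
from collections import defaultdict

def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialization (a simplification)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = defaultdict(list)
    for p, l in zip(points, labels):
        clusters[l].append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / (max(a, b) or 1.0)
    return total / len(points)

def best_clustering(points, k_range=range(2, 7)):
    """Return the labels of the candidate k with the highest silhouette."""
    candidates = [kmeans(points, k) for k in k_range]
    return max(candidates, key=lambda lab: silhouette(points, lab))

def cluster_folds(points, max_folds=10, seed=0):
    """Spread each cluster's samples evenly over the folds."""
    labels = best_clustering(points)
    clusters = defaultdict(list)
    for i, l in enumerate(labels):
        clusters[l].append(i)
    # Assumed rule: no more folds than the smallest cluster can fill.
    n_folds = max(2, min(max_folds, min(len(m) for m in clusters.values())))
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for members in clusters.values():
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % n_folds].append(idx)
        offset += len(members)  # continue the round-robin across clusters
    return labels, folds

# Toy data: three Gaussian blobs of 20 points each.
rng = random.Random(1)
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
labels, folds = cluster_folds(points, max_folds=5)
print(len(set(labels)), [len(f) for f in folds])
```

Each fold in `folds` is a list of sample indices that can serve as one test set, with the remaining folds used for training. Because every cluster is dealt across the folds in turn, each fold sees a similar mix of clusters, which is the property the abstract credits with reducing the spread of results between folds.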


Pages: 2649-2660

Page count: 12

