A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning

Cited by: 34
Authors: Szeghalmy, Szilvia [1]; Fazekas, Attila [1]
Affiliations: [1] Univ Debrecen, Fac Informat, H-4028 Debrecen, Hungary
Keywords: imbalanced learning; cross validation; SCV; DOB-SCV; SMOTE; CLASSIFICATION; RECOGNITION
DOI: 10.3390/s23042333
Chinese Library Classification: O65 [Analytical Chemistry]
Subject Classification Codes: 070302; 081704
Abstract
Nowadays, the solution to many practical problems relies on machine learning tools. However, compiling an appropriate training data set for real-world classification problems is challenging because collecting the right amount of data for each class is often difficult or even impossible. In such cases, we can easily face the problem of imbalanced learning. The literature offers many methods for imbalanced learning, so comparing their performance has become an important question. Inadequate validation techniques can yield misleading results (e.g., due to data shift), which has led to the development of validation schemes designed for imbalanced data sets, such as stratified cross-validation (SCV) and distribution optimally balanced SCV (DOB-SCV). Previous studies have shown that higher classification performance scores (AUC) can be achieved on imbalanced data sets using DOB-SCV instead of SCV. We investigated the effect of oversamplers on this difference. The study was conducted on 420 data sets, involving several sampling methods and the DTree, kNN, SVM, and MLP classifiers. We show that DOB-SCV often yields slightly higher F1 and AUC values when classification is combined with sampling. However, the results also show that the choice of the sampler-classifier pair matters more for classification performance than the choice between the DOB-SCV and SCV techniques.
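To illustrate the two validation schemes the abstract contrasts, here is a minimal pure-NumPy sketch (not the authors' code): plain SCV deals each class's shuffled samples round-robin into k folds, while DOB-SCV additionally spreads each sample's nearest same-class neighbours across different folds to reduce distribution shift between folds. The function names and the exact neighbour-grouping details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def scv_folds(y, k=5, seed=0):
    """Stratified CV (SCV): shuffle each class, deal samples
    round-robin into k folds so class proportions stay balanced."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        fold[idx] = np.arange(len(idx)) % k
    return fold

def dob_scv_folds(X, y, k=5, seed=0):
    """DOB-SCV sketch: within each class, repeatedly pick an unassigned
    sample, take it together with its nearest unassigned class-mates
    (a group of up to k points), and assign the group members to
    different folds, so neighbouring points never share a fold."""
    rng = np.random.default_rng(seed)
    fold = np.full(len(y), -1, dtype=int)
    for c in np.unique(y):
        free = list(np.flatnonzero(y == c))  # still-unassigned indices
        while free:
            i = free[rng.integers(len(free))]
            d = np.linalg.norm(X[free] - X[i], axis=1)
            group = [free[j] for j in np.argsort(d)[:k]]  # i + nearest mates
            for f, j in enumerate(group):
                fold[j] = f
                free.remove(j)
    return fold
```

Note that when an oversampler such as SMOTE is combined with either scheme, the sampling must be applied only to the training folds inside the CV loop; oversampling before splitting leaks synthetic copies of test points into training.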
Pages: 27