Threshold optimization and random undersampling for imbalanced credit card data

Cited by: 6

Authors:

Leevy, Joffrey L. [1]

Johnson, Justin M. [1]

Hancock, John [1]

Khoshgoftaar, Taghi M. [1]

Affiliation:

[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA

Source:

JOURNAL OF BIG DATA | 2023 / Volume 10 / Issue 01

Keywords:

Output thresholding; Credit Card Fraud Detection Dataset; Random undersampling; Machine learning; PERFORMANCE; ALGORITHMS

DOI:

10.1186/s40537-023-00738-z

Chinese Library Classification number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Output thresholding is well-suited for addressing class imbalance, since the technique does not increase dataset size, run the risk of discarding important instances, or modify an existing learner. Through the use of the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that factors in the constraint True Positive Rate (TPR) >= True Negative Rate (TNR). Our findings indicate that an increase of the Area Under the Precision-Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while an increase of positive class prior probability causes optimal thresholds to increase. In addition, we discovered that best overall results for the selection of an optimal threshold are obtained without the use of Random Undersampling (RUS). Furthermore, with the exception of AUPRC, we established that the default threshold yields good performance scores at a balanced class ratio. Our evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics defines the uniqueness of this research.


Pages: 22

50 items in total

[41] A Membership Probability-Based Undersampling Algorithm for Imbalanced Data
Ahn, Gilseung
Park, You-Jin
Hur, Sun
JOURNAL OF CLASSIFICATION, 2021, 38 (01) : 2 - 15
[42] Real Time Credit Card Fraud Detection on Huge Imbalanced Data using Meta-Classifiers
Kavitha, M.
Suriakala, M.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS (ICICI 2017), 2017, : 881 - 887
[43] Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems
Wang, Zhe
Cao, Chenjie
Zhu, Yujin
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5178 - 5191
[44] Random Forest for Credit Card Fraud Detection
Xuan, Shiyang
Liu, Guanjun
Li, Zhenchuan
Zheng, Lutao
Wang, Shuo
Jiang, Changjun
2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018,
[45] Quantum Autoencoder for Enhanced Fraud Detection in Imbalanced Credit Card Dataset
Huot, Chansreynich
Heng, Sovanmonynuth
Kim, Tae-Kyung
Han, Youngsun
IEEE ACCESS, 2024, 12 : 169671 - 169682
[46] Semi-supervised Learning for Imbalanced Classification of Credit Card Transaction
Salazar, Addisson
Safont, Gonzalo
Vergara, Luis
2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
[47] An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection
Makki, Sara
Assaghir, Zainab
Taher, Yehia
Haque, Rafiqul
Hacid, Mohand-Said
Zeineddin, Hassan
IEEE ACCESS, 2019, 7 : 93010 - 93022
[48] Using Area Under the Precision Recall Curve to Assess the Effect of Random Undersampling in the Classification of Imbalanced Medicare Big Data
Hancock III, John T.
Khoshgoftaar, Taghi M.
Johnson, Justin M.
INTERNATIONAL JOURNAL OF RELIABILITY QUALITY AND SAFETY ENGINEERING, 2024, 31 (01)
[49] Undersampling with Support Vectors for Multi-Class Imbalanced Data Classification
Krawczyk, Bartosz
Bellinger, Colin
Corizzo, Roberto
Japkowicz, Nathalie
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
[50] PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data
Jeon, Yong-Seok
Lim, Dong-Joon
IEEE ACCESS, 2020, 8 : 131920 - 131927
