Threshold optimization and random undersampling for imbalanced credit card data

Times cited: 6
Authors
Leevy, Joffrey L. [1]
Johnson, Justin M. [1]
Hancock, John [1]
Khoshgoftaar, Taghi M. [1]
Affiliation
[1] Florida Atlantic University, 777 Glades Rd, Boca Raton, FL 33431, USA
Keywords
Output thresholding; Credit Card Fraud Detection Dataset; Random undersampling; Machine learning; Performance; Algorithms
DOI
10.1186/s40537-023-00738-z
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Output thresholding is well suited to addressing class imbalance because it does not increase dataset size, does not risk discarding important instances, and does not require modifying an existing learner. Using the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that incorporates the constraint True Positive Rate (TPR) >= True Negative Rate (TNR). Our findings indicate that an increase in the Area Under the Precision-Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while an increase in the positive class prior probability causes optimal thresholds to increase. In addition, we found that the best overall results for selecting an optimal threshold are obtained without Random Undersampling (RUS). Furthermore, we established that, with the exception of AUPRC, the default threshold yields good performance scores at a balanced class ratio. What distinguishes this research is its evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics.
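The abstract names the TPR >= TNR constraint but does not spell out how each of the four threshold optimization techniques searches for a threshold. The sketch below assumes one plausible variant. It sweeps candidate thresholds over a model's predicted scores, discards any threshold where TPR < TNR, and keeps the threshold that maximizes an objective. Here the objective is the geometric mean of TPR and TNR. The function names (`confusion_rates`, `geometric_mean`, `optimize_threshold`), the choice of objective, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt
from typing import Callable, Iterable, Optional, Sequence, Tuple


def confusion_rates(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, TNR), labeling an instance positive when score >= threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr


def geometric_mean(tpr: float, tnr: float) -> float:
    """Example objective (an assumption, not taken from the paper): sqrt(TPR * TNR)."""
    return sqrt(tpr * tnr)


def optimize_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    objective: Callable[[float, float], float] = geometric_mean,
    candidates: Optional[Iterable[float]] = None,
) -> Optional[float]:
    """Return the threshold that maximizes `objective` subject to TPR >= TNR.

    Candidates default to the distinct predicted scores in ascending order.
    Among ties, the first candidate seen wins (the lowest one by default).
    Returns None if no candidate satisfies the constraint.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_threshold: Optional[float] = None
    best_value = float("-inf")
    for threshold in candidates:
        tpr, tnr = confusion_rates(y_true, scores, threshold)
        if tpr < tnr:  # constraint TPR >= TNR violated; skip this threshold
            continue
        value = objective(tpr, tnr)
        if value > best_value:
            best_threshold, best_value = threshold, value
    return best_threshold


if __name__ == "__main__":
    # Hypothetical validation labels (1 = fraud) and model scores; not data from the paper.
    y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
    s = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]
    print("default 0.5 ->", confusion_rates(y, s, 0.5))
    best = optimize_threshold(y, s)
    print("optimized   ->", best, confusion_rates(y, s, best))
```

On this toy data, the default threshold of 0.5 gives TPR = 2/3 and TNR = 1, which violates the constraint. The sweep instead selects 0.40, which gives TPR = 1 and TNR = 6/7. The objective is passed in as a function, so any other threshold-dependent metric could be substituted without changing the search.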
Pages: 22