Threshold optimization and random undersampling for imbalanced credit card data

Cited by: 6
Authors
Leevy, Joffrey L. [1]
Johnson, Justin M. [1]
Hancock, John [1]
Khoshgoftaar, Taghi M. [1]
Affiliations
[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA
Keywords
Output thresholding; Credit Card Fraud Detection Dataset; Random undersampling; Machine learning; PERFORMANCE; ALGORITHMS
DOI
10.1186/s40537-023-00738-z
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Output thresholding is well-suited for addressing class imbalance, since the technique does not increase dataset size, run the risk of discarding important instances, or modify an existing learner. Through the use of the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that factors in the constraint True Positive Rate (TPR) >= True Negative Rate (TNR). Our findings indicate that an increase of the Area Under the Precision-Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while an increase of positive class prior probability causes optimal thresholds to increase. In addition, we discovered that best overall results for the selection of an optimal threshold are obtained without the use of Random Undersampling (RUS). Furthermore, with the exception of AUPRC, we established that the default threshold yields good performance scores at a balanced class ratio. Our evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics defines the uniqueness of this research.
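The constrained threshold selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rates`, `best_threshold`), the toy labels and scores, and the choice of the geometric mean of TPR and TNR as the objective are all assumptions made for illustration. Only the constraint TPR >= TNR is taken from the abstract.

```python
from math import sqrt


def rates(y_true, scores, threshold):
    """Return (TPR, TNR) when scores >= threshold are labeled positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


def best_threshold(y_true, scores):
    """Scan candidate thresholds and keep the best one with TPR >= TNR.

    The objective (geometric mean of TPR and TNR) is an assumed stand-in
    for whichever metric is being optimized.
    Returns (threshold, objective, tpr, tnr), or None if nothing qualifies.
    """
    best = None
    for t in sorted(set(scores)):  # each distinct score is a candidate
        tpr, tnr = rates(y_true, scores, t)
        if tpr < tnr:  # constraint from the abstract: TPR >= TNR
            continue
        g_mean = sqrt(tpr * tnr)
        if best is None or g_mean > best[1]:
            best = (t, g_mean, tpr, tnr)
    return best


# Hypothetical imbalanced toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.80]

result = best_threshold(y_true, scores)
default_rates = rates(y_true, scores, 0.5)
print("selected:", result)
print("default threshold 0.5 (TPR, TNR):", default_rates)
```

On this toy data the default threshold of 0.5 gives a TPR below the TNR, so it violates the constraint, while the scan selects a lower threshold that satisfies it. Using each distinct score as a candidate threshold keeps the search exhaustive for small examples; a fixed grid would be a reasonable alternative for large datasets.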
Pages: 22