Customer churn prediction in imbalanced datasets with resampling methods: A comparative study

Cited by: 3
Authors
Haddadi, Seyed Jamal [1, 2, 3]
Farshidvard, Aida [4]
Silva, Fillipe dos Santos [1, 2, 3]
dos Reis, Julio Cesar [1, 3]
Reis, Marcelo da Silva [1, 2, 3]
Affiliations
[1] Hb Inteligencia Artificial & Arquiteturas Cognit H, Campinas, Brazil
[2] Lab Inteligencia Artificial & Inferencia Dados Com, UNICAMP, Campinas, Brazil
[3] Inst Computacao, UNICAMP, Campinas, Brazil
[4] Amirkabir Univ Technol, Dept Math & Comp Sci, Tehran, Iran
Keywords
Customer churn prediction; Two-phase resampling; SMOTE; Template; ADASYN; LSTM networks
DOI
10.1016/j.eswa.2023.123086
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Customer churn presents a significant challenge for businesses in the era of subscription-based services because retaining customers plays a key role in sustained growth. Existing techniques for automatic churn prediction face a primary challenge inherent in the datasets: a significant disproportion between majority and minority classes, which may bias models toward the dominant class. This study presents a comprehensive analysis of Customer Churn Prediction (CCP) with a focus on three public, highly imbalanced datasets. The explored datasets span diverse business sectors, including telecommunications, online retail, and banking. We conduct a comparative analysis of fourteen distinct classification methods combined with popular resampling strategies, namely the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN). In particular, we investigate a specific configuration that combines a novel two-phase resampling method, predicated on both clustering and ensemble techniques, with Long Short-Term Memory (LSTM) networks. Our findings demonstrate the competitive effectiveness of this configuration, underscoring its potential for effective imbalance correction while further enhancing prediction accuracy. The results suggest that in almost all instances, the integrated approach outperforms the standalone methods across different scenarios in the three datasets, particularly in terms of the Area Under the Curve (AUC). This research represents a significant contribution to the field of churn prediction by addressing class imbalance.
Pages: 22
Related papers
50 in total
  • [1] Enhancing Customer Churn Prediction With Resampling: A Comparative Study
    Ong, Jia-Xuan
    Tong, Gee-Kok
    Khor, Kok-Chin
    Haw, Su-Cheng
    [J]. TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2024, 13 (03): 1927-1936
  • [2] Applying Resampling Methods for Imbalanced Datasets to Not So Imbalanced Datasets
    Arbelaitz, Olatz
    Gurrutxaga, Ibai
    Muguerza, Javier
    Maria Perez, Jesus
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2013, 2013, 8109: 111-120
  • [3] Comparative Methods for Personalized Customer Churn Prediction with Sequential Data
    Bayrak, Ahmet Tugrul
    Yuceturk, Guven
    Bahadir, Musa Berat
    Yalcinkaya, Sare Melek
    Demirdag, Melike
    Sayan, Ismail Utku
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022), 2022: 222-225
  • [4] Customer churn prediction using a novel meta-classifier: an investigation on transaction, Telecommunication and customer churn datasets
    Ehsani, Fatemeh
    Hosseini, Monireh
    [J]. JOURNAL OF COMBINATORIAL OPTIMIZATION, 2024, 48 (01)
  • [5] Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art
    Bogaert, Matthias
    Delaere, Lex
    [J]. MATHEMATICS, 2023, 11 (05)
  • [6] Efficient Resampling Methods for Training Support Vector Machines with Imbalanced Datasets
    Batuwita, Rukshan
    Palade, Vasile
    [J]. 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
  • [7] Study on Customer Churn Prediction Methods based on Multiple Classifiers Combination
    Xiao, Yao
    He, Changzheng
    Xiao, Jin
    [J]. 2009 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL 1, PROCEEDINGS, 2009: 597-601
  • [8] Study of machine learning methods for customer churn prediction in telecommunication company
    Sniegula, Anna
    Poniszewska-Maranda, Aneta
    Popovic, Milan
    [J]. IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019: 640-644
  • [9] Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data
    Lee, Sujee
    Koo, Bonhyo
    Jung, Kyu-Hwan
    [J]. INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2014, 13 (04): 454-462
  • [10] Churn prediction methods based on mutual customer interdependence
    Ljubicic, Karmela
    Mercep, Andro
    Kostanjcar, Zvonko
    [J]. JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 67