Efficient big data model selection with applications to fraud detection

Cited by: 13
Author
Vaughan, Gregory [1]
Affiliation
[1] Bentley Univ, Dept Math Sci, Waltham, MA 02452 USA
Keywords
Big data; Stagewise estimation; Sub-sampling; Fraud detection; Clustered data; ESTIMATING EQUATIONS; OPTIMIZATION;
DOI
10.1016/j.ijforecast.2018.03.002
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
As the volume and complexity of data continue to grow, more attention is being focused on solving so-called big data problems. One field where this focus is pertinent is credit card fraud detection. Model selection approaches can identify key predictors for preventing fraud. Stagewise selection is a classic model selection technique that has experienced revitalized interest due to its computational simplicity and flexibility. Over a sequence of simple learning steps, stagewise techniques build a sequence of candidate models that is less greedy than the stepwise approach. This paper introduces a new stochastic stagewise technique that integrates a subsampling approach into the stagewise framework, yielding a simple tool for model selection when working with big data. Simulation studies demonstrate that the proposed technique offers a reasonable trade-off between computational cost and predictive performance. We apply the proposed approach to synthetic credit card fraud data to demonstrate the technique's application. © 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
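The general idea of combining stagewise updates with subsampling can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a plain squared-error loss and independent observations (the paper's keywords point to estimating equations and clustered data), and the function name `stochastic_stagewise` and the parameters `n_steps`, `step_size`, and `subsample` are hypothetical names chosen for illustration. At each step a fresh random subsample is drawn, the predictor most correlated with the current residuals on that subsample is found, and only its coefficient is nudged by a small fixed amount.

```python
import numpy as np


def stochastic_stagewise(X, y, n_steps=500, step_size=0.01, subsample=0.1, seed=0):
    """Forward stagewise least squares using a new random subsample per step.

    Illustrative sketch only (assumed squared-error loss, independent rows).
    Returns the coefficient path: row t holds the coefficients after step t,
    so each row is one candidate model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = max(1, int(subsample * n))  # subsample size used at every step
    beta = np.zeros(p)
    path = np.zeros((n_steps, p))
    for t in range(n_steps):
        # Draw a subsample without replacement instead of using all n rows.
        idx = rng.choice(n, size=m, replace=False)
        resid = y[idx] - X[idx] @ beta
        # Negative gradient of half the mean squared error on the subsample.
        score = X[idx].T @ resid / m
        # Update only the single predictor with the largest absolute score.
        j = int(np.argmax(np.abs(score)))
        beta[j] += step_size * np.sign(score[j])
        path[t] = beta
    return path


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X_demo = demo_rng.standard_normal((5000, 10))
    y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + demo_rng.standard_normal(5000)
    final = stochastic_stagewise(X_demo, y_demo, n_steps=600)[-1]
    print(np.round(final, 2))
```

In this sketch each step costs work proportional to the subsample size rather than the full data size, which is the kind of computational saving the abstract refers to. A model would then be chosen from the rows of the returned path by some criterion, such as validation error, which is left out here.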
Pages: 1116-1127
Number of pages: 12
Related papers
50 in total
  • [2] Telecom fraud detection with big data analytics
    Terzi, Duygu Sinanç
    Sağıroğlu, Şeref
    Kılınç, Hakan
    International Journal of Data Science, 2021, 6 (03) : 191 - 204
  • [3] Online Payment Fraud Detection for Big Data
    Tawde, Samiksha Dattaprasad
    Arora, Sandhya
    Thakur, Yashasvee Shitalkumar
    DISTRIBUTED COMPUTING AND INTELLIGENT TECHNOLOGY, ICDCIT 2024, 2024, 14501 : 324 - 337
  • [4] Hyperparameter Tuning for Medicare Fraud Detection in Big Data
    Hancock J.T.
    Khoshgoftaar T.M.
    SN Computer Science, 3 (6)
  • [5] Big Data fraud detection using multiple medicare data sources
    Herland, Matthew
    Khoshgoftaar, Taghi M.
    Bauder, Richard A.
    JOURNAL OF BIG DATA, 2018, 5 (01)
  • [6] A Workload Aware Model of Computational Resource Selection for Big Data Applications
    Gupta, Amit
    Xu, Weijia
    Ruiz-Juri, Natalia
    Perrine, Kenneth
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2243 - 2250
  • [7] The Effects of Random Undersampling for Big Data Medicare Fraud Detection
    Hancock, John
    Khoshgoftaar, Taghi M.
    Johnson, Justin M.
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SERVICE-ORIENTED SYSTEM ENGINEERING (SOSE 2022), 2022, : 141 - 146
  • [8] SmartFD: A Real Big Data Application for Electrical Fraud Detection
    Gutierrez-Aviles, D.
    Fabregas, J. A.
    Tejedor, J.
    Martinez-Alvarez, F.
    Troncoso, A.
    Arcos, A.
    Riquelme, J. C.
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018), 2018, 10870 : 120 - 130
  • [9] Optimizing Ensemble Trees for Big Data Healthcare Fraud Detection
    Hancock, John
    Khoshgoftaar, Taghi M.
    2022 IEEE 23RD INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2022), 2022, : 243 - 249
  • [10] Maxout Neural Network for Big Data Medical Fraud Detection
    Castaneda, Gabriel
    Morris, Paul
    Khoshgoftaar, Taghi M.
    2019 IEEE FIFTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2019), 2019, : 357 - 362