Efficient big data model selection with applications to fraud detection

Cited by: 13
Author
Vaughan, Gregory [1]
Affiliation
[1] Bentley Univ, Dept Math Sci, Waltham, MA 02452 USA
Keywords
Big data; Stagewise estimation; Sub-sampling; Fraud detection; Clustered data; Estimating equations; Optimization
DOI
10.1016/j.ijforecast.2018.03.002
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
As the volume and complexity of data continue to grow, more attention is being focused on solving so-called big data problems. One field where this focus is pertinent is credit card fraud detection. Model selection approaches can identify key predictors for preventing fraud. Stagewise selection is a classic model selection technique that has experienced revitalized interest due to its computational simplicity and flexibility. Over a sequence of simple learning steps, stagewise techniques build a sequence of candidate models that is less greedy than the stepwise approach. This paper introduces a new stochastic stagewise technique that integrates a subsampling approach into the stagewise framework, yielding a simple tool for model selection when working with big data. Simulation studies demonstrate that the proposed technique offers a reasonable trade-off between computational cost and predictive performance. We apply the proposed approach to synthetic credit card fraud data to illustrate its use. © 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Pages: 1116-1127
Number of pages: 12
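The abstract describes a stagewise procedure that takes small learning steps and draws a subsample at each step. The sketch below illustrates that general idea for ordinary least squares only. It is not the paper's algorithm: the function name stochastic_stagewise, the parameters step_size and subsample_size, the squared-error score, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def stochastic_stagewise(X, y, n_steps=150, step_size=0.05,
                         subsample_size=200, seed=0):
    """Illustrative stochastic forward-stagewise fit for least squares.

    Hypothetical sketch: at each step a random subsample is drawn, the
    predictor most correlated with the subsample residual is found, and
    only that coefficient moves by a small fixed amount.
    Returns the whole coefficient path, one row per candidate model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = min(subsample_size, n)
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        # Draw a subsample without replacement (assumed sampling scheme).
        idx = rng.choice(n, size=m, replace=False)
        resid = y[idx] - X[idx] @ beta
        # Score: average correlation of each predictor with the residual.
        score = X[idx].T @ resid / m
        j = int(np.argmax(np.abs(score)))
        # Small step on the single selected coefficient.
        beta[j] += step_size * np.sign(score[j])
        path.append(beta.copy())
    return np.array(path)


# Simulated data (an assumption): only the first two predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(5000)

path = stochastic_stagewise(X, y)
print(np.round(path[-1], 2))
```

Each row of path is one candidate model. In practice a row would be chosen with a held-out criterion such as validation error, which this sketch leaves out.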