An Ensemble Random Forest Algorithm for Insurance Big Data Analysis

Cited by: 15
Authors
Wu, Ziming [1]
Lin, Weiwei [1]
Zhang, Zilong [1]
Wen, Angzhan [1]
Lin, Longxin [2]
Affiliations
[1] SCUT, Sch Comp Engn & Sci, Guangzhou, Guangdong, Peoples R China
[2] Jinan Univ, JNU, Coll Informat Sci & Technol, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalance Classification; Ensemble Learning; Random Forest; Big Data; Spark; SMOTE;
DOI
10.1109/CSE-EUC.2017.99
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Owing to the imbalanced distribution of business data, missing user features, and many other factors, applying big data techniques directly to real-world business data tends to deviate from business goals. Insurance business data are difficult to model with classification algorithms such as Logistic Regression and SVM. This paper applies a heuristic bootstrap sampling approach combined with ensemble learning to large-scale insurance business data mining, and proposes an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism optimized by Spark. We collected insurance business data from China Life Insurance Company and used the proposed algorithm to analyze potential customers. Experimental results show that the ensemble random forest algorithm outperforms SVM and other classification algorithms in both performance and accuracy on the imbalanced data.
Pages: 531-536
Page count: 6
Related papers
50 in total
  • [1] An Ensemble Random Forest Algorithm for Insurance Big Data Analysis
    Lin, Weiwei
    Wu, Ziming
    Lin, Longxin
    Wen, Angzhan
    Li, Jin
    [J]. IEEE ACCESS, 2017, 5 : 16568 - 16575
  • [2] Random forest algorithm in big data environment
    Liu, Yingchun
    [J]. Computer Modelling and New Technologies, 2014, 18 (12) : 147 - 151
  • [3] A Distributed Ensemble of Deep Convolutional Neural Networks with Random Forest for Big Data Sentiment Analysis
    Hammou, Badr Ait
    Lahcen, Ayoub Ait
    Mouline, Salma
    [J]. MOBILE, SECURE, AND PROGRAMMABLE NETWORKING, 2019, 11557 : 153 - 162
  • [4] Cascade Parallel Random Forest Algorithm for Predicting Rice Diseases in Big Data Analysis
    Zhang, Lei
    Xie, Lun
    Wang, Zhiliang
    Huang, Chen
    [J]. ELECTRONICS, 2022, 11 (07)
  • [5] Analysis and Evaluation of Sports Effect Based on Random Forest Algorithm under Big Data
    Liang, Kai
    Zang, Dongdong
    [J]. MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [6] Principal Components Analysis Random Discretization Ensemble for Big Data
    Garcia-Gil, Diego
    Ramirez-Gallego, Sergio
    Garcia, Salvador
    Herrera, Francisco
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 150 : 166 - 174
  • [7] ENSEMBLE CLASSIFIER WITH RANDOM FOREST ALGORITHM TO DEAL WITH IMBALANCED HEALTHCARE DATA
    Anbarasi, M. S.
    Janani, V.
    [J]. 2017 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2017
  • [8] An Advanced Random Forest Algorithm Targeting the Big Data with Redundant Features
    Zhang, Ying
    Song, Bin
    Zhang, Yue
    Chen, Sijia
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2017, 2017, 10393 : 642 - 651
  • [9] Accelerating Big Data Analysis through LASSO-Random Forest Algorithm in QSAR Studies
    Motamedi, Fahimeh
    Perez-Sanchez, Horacio
    Mehridehnavi, Alireza
    Fassihi, Afshin
    Ghasemi, Fahimeh
    [J]. BIOINFORMATICS, 2022, 38 (02) : 469 - 475
  • [10] UNBALANCED BIG DATA CLASSIFICATION BASED ON IMPROVED RANDOM FOREST ALGORITHM
    Zheng, Xin
    Huang, Li
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2024, 20 (02) : 575 - 590