An Ensemble Random Forest Algorithm for Insurance Big Data Analysis

Cited by: 15
Authors
Wu, Ziming [1]
Lin, Weiwei [1]
Zhang, Zilong [1]
Wen, Angzhan [1]
Lin, Longxin [2]
Affiliations
[1] SCUT, Sch Comp Engn & Sci, Guangzhou, Guangdong, Peoples R China
[2] Jinan Univ, JNU, Coll Informat Sci & Technol, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalance Classification; Ensemble Learning; Random Forest; Big Data; Spark; SMOTE;
DOI
10.1109/CSE-EUC.2017.99
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Because business data are imbalanced, user features are often missing, and for many other reasons, applying big data techniques directly to real-world business data tends to produce results that deviate from the business goals. Insurance business data are difficult to model with classification algorithms such as Logistic Regression and SVM. This paper applies a heuristic bootstrap sampling approach combined with ensemble learning to large-scale insurance business data mining, and proposes an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism of Spark. We collected insurance business data from China Life Insurance Company and used the proposed algorithm to analyze potential customers. Experimental results show that the ensemble random forest algorithm outperformed SVM and other classification algorithms in both performance and accuracy on the imbalanced data.
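The abstract does not spell out the sampling heuristic or the Spark implementation, so the following is only a minimal single-machine sketch of the general idea it names: draw class-balanced bootstrap samples from imbalanced data, fit one weak learner per sample, and combine them by majority vote. Everything here is an assumption for illustration: the function names (`balanced_bootstrap`, `fit_stump`, `fit_ensemble`, `predict`) are hypothetical, a one-split threshold rule stands in for a full decision tree, the toy data are made up, and no Spark parallelism or caching is modeled.

```python
import random
from collections import Counter


def balanced_bootstrap(rows, labels, rng):
    """Sample with replacement so every class contributes the same number of rows."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Assumption: use the size of the rarest class as the per-class sample size.
    n_per_class = min(len(members) for members in by_class.values())
    sample_rows, sample_labels = [], []
    for label, members in by_class.items():
        for _ in range(n_per_class):
            sample_rows.append(rng.choice(members))
            sample_labels.append(label)
    return sample_rows, sample_labels


def fit_stump(rows, labels, rng):
    """Fit a one-split rule on one randomly chosen feature (stand-in for a tree)."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left)
        errors += sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # The chosen feature is constant in this sample: always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)


def fit_ensemble(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on its own balanced bootstrap sample."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        sample_rows, sample_labels = balanced_bootstrap(rows, labels, rng)
        ensemble.append(fit_stump(sample_rows, sample_labels, rng))
    return ensemble


def predict(ensemble, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in ensemble
    )
    return votes.most_common(1)[0][0]


# Toy imbalanced data (made up): 95 rows of class 0, 5 rows of class 1.
majority = [(i / 200, (94 - i) / 200) for i in range(95)]
minority = [(1 + j / 10, 1 + j / 10) for j in range(5)]
rows = majority + minority
labels = [0] * 95 + [1] * 5

model = fit_ensemble(rows, labels)
print(predict(model, (0.0, 0.0)), predict(model, (2.0, 2.0)))
```

Balancing each bootstrap sample keeps the rare class from being outvoted during training, which is one common way to handle imbalance; the paper's own heuristic may differ.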
Pages: 531-536
Page count: 6
Related papers
50 in total
  • [21] Application of Big Data Unbalanced Classification Algorithm in Credit Risk Analysis of Insurance Companies
    Wu, Xian
    Liu, Huan
    [J]. JOURNAL OF MATHEMATICS, 2022, 2022
  • [22] Random forest algorithm for classification of multiwavelength data
    Gao, Dan
    Zhang, Yan-Xia
    Zhao, Yong-Heng
    [J]. RESEARCH IN ASTRONOMY AND ASTROPHYSICS, 2009, 9 (02) : 220 - 226
  • [23] Random forest algorithm for classification of multiwavelength data
    Dan Gao
    Graduate University of Chinese Academy of Sciences
    [J]. Research in Astronomy and Astrophysics, 2009, 9 (02) : 220 - 226
  • [24] A random forest algorithm under the ensemble approach for feature selection and classification
    Kharwar, Ankit
    Thakor, Devendra
    [J]. INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2023, 29 (04) : 426 - 447
  • [25] Random forest method and application in stream big data systems
    Liu, Yingchun
    Chen, Meiling
    [J]. Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2015, 33 (06) : 1055 - 1061
  • [26] Random Bits Forest: a Strong Classifier/Regressor for Big Data
    Yi Wang
    Yi Li
    Weilin Pu
    Kathryn Wen
    Yin Yao Shugart
    Momiao Xiong
    Li Jin
    [J]. Scientific Reports, 6
  • [27] Random Bits Forest: a Strong Classifier/Regressor for Big Data
    Wang, Yi
    Li, Yi
    Pu, Weilin
    Wen, Kathryn
    Shugart, Yin Yao
    Xiong, Momiao
    Jin, Li
    [J]. SCIENTIFIC REPORTS, 2016, 6
  • [28] Hyperparameters Optimization in Scalable Random Forest For Big Data Analytics
    Oo, Myal Cho Mon
    Thein, Thandar
    [J]. 2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 125 - 129
  • [29] On the use of MapReduce for imbalanced big data using Random Forest
    del Rio, Sara
    Lopez, Victoria
    Manuel Benitez, Jose
    Herrera, Francisco
    [J]. INFORMATION SCIENCES, 2014, 285 : 112 - 137
  • [30] Performance Analysis of Random Forest Algorithm in Automatic Building Segmentation with Limited Data
    Widyastuti, Ratri
    Suwardhi, Deni
    Meilano, Irwan
    Hernandi, Andri
    Putri, Nabila S. E.
    Saptari, Asep Yusup
    Sudarman
    [J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2024, 13 (07)