An Ensemble Random Forest Algorithm for Insurance Big Data Analysis

Cited by: 15
Authors
Wu, Ziming [1]
Lin, Weiwei [1]
Zhang, Zilong [1]
Wen, Angzhan [1]
Lin, Longxin [2]
Affiliations
[1] SCUT, Sch Comp Engn & Sci, Guangzhou, Guangdong, Peoples R China
[2] Jinan Univ, JNU, Coll Informat Sci & Technol, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalance Classification; Ensemble Learning; Random Forest; Big Data; Spark; SMOTE;
DOI
10.1109/CSE-EUC.2017.99
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Because of imbalanced class distributions, missing user features, and many other factors, applying big data techniques directly to real-world business data tends to produce results that deviate from business goals. Insurance business data is difficult to model with classification algorithms such as Logistic Regression and SVM. This paper applies a heuristic bootstrap sampling approach combined with ensemble learning to large-scale insurance business data mining, and proposes an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism optimized by Spark. We collected insurance business data from China Life Insurance Company and used the proposed algorithm to analyze potential customers. Experimental results show that the ensemble random forest algorithm outperformed SVM and other classification algorithms in both performance and accuracy on the imbalanced data.
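The abstract does not spell out the heuristic used for bootstrap sampling, so the following is only a minimal sketch of the general idea, under stated assumptions: each base model is trained on a class-balanced bootstrap sample, and the ensemble combines base-model predictions by majority vote. The function names `balanced_bootstrap` and `majority_vote` are hypothetical and are not taken from the paper, and the sketch uses plain Python rather than Spark.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices for one class-balanced bootstrap sample.

    Assumption (not from the paper): every class is resampled with
    replacement to the size of the rarest class.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_per_class = min(len(idxs) for idxs in by_class.values())
    sample = []
    for idxs in by_class.values():
        # Draw with replacement, as in ordinary bootstrap sampling.
        sample.extend(rng.choices(idxs, k=n_per_class))
    rng.shuffle(sample)
    return sample


def majority_vote(votes):
    """Combine the base models' predictions for one record."""
    return Counter(votes).most_common(1)[0][0]


# Toy imbalanced label vector: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
rng = random.Random(0)

# One balanced sample per base model; each would train one forest.
samples = [balanced_bootstrap(labels, rng) for _ in range(10)]
```

Each entry of `samples` is a list of row indices on which one base classifier (for example, a random forest) could be trained; at prediction time, `majority_vote` would merge the per-model outputs for a record.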
Pages: 531 - 536
Page count: 6
Related papers
50 in total
  • [41] A Multiple Fuzzy C-Means Ensemble Cluster Forest for Big Data
    Lahmar, Ines
    Zaier, Aida
    Yahia, Mohamed
    Bouallegue, Ridha
    [J]. HYBRID INTELLIGENT SYSTEMS, HIS 2021, 2022, 420 : 442 - 451
  • [42] Research of Medical High-dimensional Imbalanced Data Classification-Ensemble Feature Selection Algorithm with Random Forest
    Zhu, Min
    Su, Bo
    Ning, Gangmin
    [J]. 2017 INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA), 2017, : 273 - 277
  • [43] MapReduce Distributed Highly Random Fuzzy Forest for Noisy Big Data
    Mustafic, Faruk
    Xiong, Ning
    Herrera, Francisco
    Ramírez-Gallego, Sergio
    [J]. 2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 560 - 567
  • [44] Limited random walk algorithm for big graph data clustering
    Zhang H.
    Raitoharju J.
    Kiranyaz S.
    Gabbouj M.
    [J]. Journal of Big Data, 3 (1)
  • [45] Green mining algorithm for big data based on random matrix
    [J]. Canwei, Wang (wangcanwei@sina.com), 1600, Science and Engineering Research Support Society (09):
  • [46] Lost in a random forest: Using Big Data to study rare events
    Bail, Christopher A.
    [J]. BIG DATA & SOCIETY, 2015, 2 (02):
  • [47] A Review on Random Forest: An Ensemble Classifier
    Parmar, Aakash
    Katariya, Rakesh
    Patel, Vatsal
    [J]. INTERNATIONAL CONFERENCE ON INTELLIGENT DATA COMMUNICATION TECHNOLOGIES AND INTERNET OF THINGS, ICICI 2018, 2019, 26 : 758 - 763
  • [48] Analysis of an Ensemble Algorithm for Clustering Cancer Data
    Wu, Dengyuan
    Sheng, Li
    Xu, Eric
    Xing, Kai
    Chen, Dechang
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2012,
  • [49] Towards the Learning from Low Quality Data in a Fuzzy Random Forest ensemble
    Cadenas, Jose M.
    Carmen Garrido, M.
    Martinez, Raquel
    Bonissone, Piero P.
    [J]. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 2897 - 2904
  • [50] BIG DATA AND INSURANCE SYMPOSIUM
    George Jepsen, Attorney General
    [J]. CONNECTICUT INSURANCE LAW JOURNAL, 2014, 21 (01): : 255 - 259