Parallel Random Forest Algorithm Optimization Based on Maximal Information Coefficient

Cited by: 0
Authors
Liu, Song [1 ]
Hu, TianYu [1 ]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Dept Comp Sci & Technol, Jinan, Shandong, Peoples R China
Keywords
Random Forest; Feature Selection; Maximal Information Coefficient; Spark; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The traditional random forest algorithm runs too long, or cannot be executed at all, when facing massive data. In addition, because it chooses features at random, some redundant features enter the training process while some strongly expressive features are not selected. To address both problems, a random forest algorithm based on the maximal information coefficient (MIC) is proposed, and the algorithm is parallelized on the Spark platform. First, MIC is used to rank the features, which are then divided into three intervals: a high correlation interval, a middle correlation interval, and a low correlation interval. When a single decision tree is constructed, the features in the low correlation interval are deleted. Then, all features in the high correlation interval, together with randomly selected features from the middle correlation interval, form a new feature subset that is used to build the decision tree. Finally, the parallelization of the algorithm is implemented on Spark. The experimental results show that the proposed algorithm improves accuracy and stability to some extent compared with the traditional random forest algorithm.
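The feature-selection step described in the abstract can be illustrated with a minimal sketch. It assumes the MIC scores have already been computed and are passed in as plain numbers. The cut-off values `low_cut` and `high_cut`, the function names, the sample scores, and the number of middle-interval features drawn per tree are all hypothetical; the abstract does not specify them. The Spark parallelization is not shown.

```python
import random


def partition_by_score(scores, low_cut, high_cut):
    """Split feature indices into high / middle / low correlation intervals.

    The two cut-offs are illustrative assumptions, not values from the paper.
    """
    high = [i for i, s in enumerate(scores) if s >= high_cut]
    middle = [i for i, s in enumerate(scores) if low_cut <= s < high_cut]
    low = [i for i, s in enumerate(scores) if s < low_cut]
    return high, middle, low


def tree_feature_subset(high, middle, n_middle, rng):
    """Keep every high-interval feature and add a random sample of
    middle-interval features; low-interval features are never passed in."""
    k = min(n_middle, len(middle))
    return sorted(high + rng.sample(middle, k))


# Hypothetical MIC scores of seven features against the class label.
mic_scores = [0.82, 0.05, 0.40, 0.91, 0.33, 0.10, 0.55]
high, middle, low = partition_by_score(mic_scores, low_cut=0.2, high_cut=0.7)

# One feature subset per decision tree (three trees here).
rng = random.Random(0)
subsets = [tree_feature_subset(high, middle, 2, rng) for _ in range(3)]
```

Each subset would then be handed to an ordinary decision-tree learner, so that every tree sees all high-interval features but a different random part of the middle interval.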
Pages: 1083-1087
Page count: 5
Related papers
50 in total
  • [21] Optimization of Entrepreneurship Education for College Students Based on Improved Random Forest Algorithm
    Jia, Dongfeng
    Zhao, Hui
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [22] Research on power line communication optimization algorithm based on improved random forest
    Xie W.
    Sun Y.
    Huang Y.
    Dianli Xitong Baohu yu Kongzhi/Power System Protection and Control, 2019, 47 (11): : 22 - 29
  • [23] Wind Power Forecasting Using Parallel Random Forest Algorithm
    Natarajan, V. Anantha
    Kumari, N. Sandhya
    SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2018, VOL 1, 2020, 1048 : 209 - 224
  • [24] Improved heuristic equivalent search algorithm based on Maximal Information Coefficient for Bayesian Network Structure Learning
    Zhang, Yinghua
    Zhang, Wensheng
    Xie, Yuan
    NEUROCOMPUTING, 2013, 117 : 186 - 195
  • [25] Equitability, mutual information, and the maximal information coefficient
    Kinney, Justin B.
    Atwal, Gurinder S.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2014, 111 (09) : 3354 - 3359
  • [26] Parallel deep forest algorithm based on Spark and three-way interactive information
    Mao, Yimin
    Zhou, Zhan
    Chen, Zhigang
    Tongxin Xuebao/Journal on Communications, 2023, 44 (08): : 228 - 240
  • [27] ON THE EXPECTED PERFORMANCE OF A PARALLEL ALGORITHM FOR FINDING MAXIMAL INDEPENDENT SUBSETS OF A RANDOM GRAPH
    CALKIN, NJ
    FRIEZE, AM
    KUCERA, L
    RANDOM STRUCTURES & ALGORITHMS, 1992, 3 (02) : 215 - 221
  • [28] An improved sine–cosine algorithm based on orthogonal parallel information for global optimization
    Rizk M. Rizk-Allah
    Soft Computing, 2019, 23 : 7135 - 7161
  • [29] Railway Accidents Analysis and Prevention Based on the Maximal Information Coefficient
    Shao Fubo
    Li Kepig
    STATISTIC APPLICATION IN MODERN SOCIETY, 2015, : 213 - 218
  • [30] MODEL SELECTION METHOD BASED ON MAXIMAL INFORMATION COEFFICIENT OF RESIDUALS
    谭秋衡
    蒋杭进
    丁义明
    Acta Mathematica Scientia, 2014, 34 (02) : 579 - 592