Parallel Random Forest Algorithm Optimization Based on Maximal Information Coefficient

Cited by: 0
Authors
Liu, Song [1]
Hu, TianYu [1]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Dept Comp Sci & Technol, Jinan, Shandong, Peoples R China
Keywords
Random Forest; Feature Selection; Maximal Information Coefficient; Spark; Classification
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
The traditional random forest algorithm runs too long, or cannot run at all, on massive data. In addition, because it chooses features at random, some redundant features enter the training process while some strongly expressive features are not selected. To address both problems, a random forest algorithm based on the maximal information coefficient (MIC) is proposed and parallelized on the Spark platform. First, MIC is used to rank the features, which are divided into three intervals: a high-correlation interval, a middle-correlation interval, and a low-correlation interval. When a single decision tree is constructed, the features in the low-correlation interval are deleted. Then all features in the high-correlation interval, together with randomly selected features from the middle-correlation interval, form a new feature subset that is used to build the decision tree. Finally, the algorithm is parallelized on Spark. The experimental results show that the proposed algorithm achieves some improvement in accuracy and stability over the traditional random forest algorithm.
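The per-tree feature selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the three intervals are delimited or how many middle-interval features are drawn, so the cut-offs `low_cut` and `high_cut`, the parameter `n_middle`, the function names, and the score values are all assumptions. The scores stand in for precomputed MIC values, and the Spark parallelization step is not shown.

```python
import random


def split_by_score(scores, low_cut, high_cut):
    """Partition feature indices into high / middle / low intervals by score.

    low_cut and high_cut are assumed thresholds; the abstract does not
    state how the intervals are actually defined.
    """
    high = [i for i, s in enumerate(scores) if s >= high_cut]
    middle = [i for i, s in enumerate(scores) if low_cut <= s < high_cut]
    low = [i for i, s in enumerate(scores) if s < low_cut]
    return high, middle, low


def tree_feature_subset(high, middle, n_middle, rng):
    """Return all high-interval features plus a random sample of
    middle-interval features; low-interval features are never included."""
    k = min(n_middle, len(middle))
    return sorted(high + rng.sample(middle, k))


# Hypothetical MIC scores for six features (illustrative values only).
mic_scores = [0.82, 0.10, 0.45, 0.91, 0.38, 0.05]
high, middle, low = split_by_score(mic_scores, low_cut=0.2, high_cut=0.7)

# One feature subset per decision tree; a seeded generator keeps runs repeatable.
rng = random.Random(0)
subsets = [tree_feature_subset(high, middle, n_middle=1, rng=rng) for _ in range(3)]
```

Each subset keeps every high-interval feature, so strongly expressive features are always available to a tree, while the random draw from the middle interval preserves some of the diversity between trees that random forests rely on.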
Pages: 1083-1087
Page count: 5