Parallel Random Forest Algorithm Optimization Based on Maximal Information Coefficient

Citations: 0
Authors
Liu, Song [1 ]
Hu, TianYu [1 ]
Affiliation
[1] Qilu Univ Technol, Shandong Acad Sci, Dept Comp Sci & Technol, Jinan, Shandong, Peoples R China
Keywords
Random Forest; Feature Selection; Maximal Information Coefficient; Spark; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
The traditional random forest algorithm runs too long, or cannot be executed at all, when faced with massive data. In addition, because it chooses features at random, some redundant features enter the training process while some strongly expressive features are not selected. To address both problems, a random forest algorithm based on the maximal information coefficient (MIC) is proposed, and the algorithm is parallelized on the Spark platform. First, MIC is used to rank the features, which are divided into three intervals: a high-correlation interval, a middle-correlation interval, and a low-correlation interval. When a single decision tree is constructed, the features in the low-correlation interval are deleted. Then all the features in the high-correlation interval and randomly selected features from the middle-correlation interval are combined into a new feature subset that is used to build the decision tree. Finally, the algorithm is parallelized on Spark. The experimental results show that the proposed algorithm achieves some improvement in accuracy and stability compared with the traditional random forest algorithm.
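The per-tree feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three intervals are bounded or how many middle-interval features are drawn, so the cut-off values `low_cut` and `high_cut`, the parameter `n_middle`, the function names, and the sample MIC scores below are all assumptions made for illustration. The MIC scores are taken as precomputed input, and the Spark parallelization is not shown.

```python
import random

def partition_by_score(scores, low_cut, high_cut):
    """Split feature names into high / middle / low groups by relevance score.

    `low_cut` and `high_cut` are hypothetical thresholds; the abstract does
    not state how the interval boundaries are chosen.
    """
    high = [f for f, s in scores.items() if s >= high_cut]
    middle = [f for f, s in scores.items() if low_cut <= s < high_cut]
    low = [f for f, s in scores.items() if s < low_cut]
    return high, middle, low

def tree_feature_subset(high, middle, n_middle, rng):
    """Keep every high-interval feature and add a random sample of
    middle-interval features; low-interval features are never passed in."""
    k = min(n_middle, len(middle))
    return high + rng.sample(middle, k)

# Hypothetical precomputed MIC scores between each feature and the class label.
mic_scores = {"f1": 0.82, "f2": 0.10, "f3": 0.45, "f4": 0.55, "f5": 0.05, "f6": 0.91}

high, middle, low = partition_by_score(mic_scores, low_cut=0.2, high_cut=0.7)
rng = random.Random(0)  # seeded so that repeated runs draw the same sample
subset = tree_feature_subset(high, middle, n_middle=1, rng=rng)
print("high:", high, "middle:", middle, "low:", low)
print("feature subset for one tree:", subset)
```

In a forest, `tree_feature_subset` would be called once per tree with a different random draw, so every tree sees all high-interval features but a different part of the middle interval.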
Pages: 1083 - 1087
Number of pages: 5
Related papers
50 in total
  • [1] Optimization of parallel random forest algorithm based on distance weight
    Wang, Qinge
    Chen, Huihua
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 1951 - 1963
  • [2] A Fast Parallel Random Forest Algorithm Based on Spark
    Yin, Linzi
    Chen, Ken
    Jiang, Zhaohui
    Xu, Xuemei
    APPLIED SCIENCES-BASEL, 2023, 13 (10):
  • [3] A New Algorithm to Optimize Maximal Information Coefficient
    Chen, Yuan
    Zeng, Ying
    Luo, Feng
    Yuan, Zheming
    PLOS ONE, 2016, 11 (06):
  • [4] Improved Approximation Algorithm for Maximal Information Coefficient
    Wang, Shuliang
    Zhao, Yiping
    Shu, Yue
    Shi, Wenzhong
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2017, 13 (01) : 76 - 93
  • [5] Optimization of the Random Forest Algorithm
    Mohapatra, Niva
    Shreya, K.
    Chinmay, Ayes
    ADVANCES IN DATA SCIENCE AND MANAGEMENT, 2020, 37 : 201 - 208
  • [6] Railway accidents analysis based on the improved algorithm of the maximal information coefficient
    Shao, Fubo
    Li, Keping
    Xu, Xiaoming
    INTELLIGENT DATA ANALYSIS, 2016, 20 (03) : 597 - 613
  • [7] DBRF: Random Forest Optimization Algorithm Based on DBSCAN
    Zhuo, Wang
    Ahmad, Azlin
    International Journal of Advanced Computer Science and Applications, 2024, 15 (09) : 354 - 362
  • [8] Research on Optimization of Random Forest Algorithm Based on Spark
    Wang, Suzhen
    Zhang, Zhanfeng
    Geng, Shanshan
    Pang, Chaoyi
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (02) : 3721 - 3731
  • [9] RANDOM FOREST ALGORITHM OPTIMIZATION OF ENTERPRISE FINANCIAL INFORMATION MANAGEMENT SYSTEM
    Liu, X. H.
    Wang, E. X.
    Zheng, Y. Q.
    LATIN AMERICAN APPLIED RESEARCH, 2018, 48 (04) : 255 - 260
  • [10] A Novel Bayesian Network Structure Learning Algorithm based on Maximal Information Coefficient
    Zhang, Yinghua
    Hu, Qiping
    Zhang, Wensheng
    Liu, Jin
    2012 IEEE FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2012, : 862 - 867