An improved random forest based on the classification accuracy and correlation measurement of decision trees

Cited: 105
Authors
Sun, Zhigang [1 ,2 ,4 ,5 ]
Wang, Guotao [1 ,2 ,4 ,5 ]
Li, Pengfei [2 ,4 ,5 ]
Wang, Hui [3 ]
Zhang, Min [1 ]
Liang, Xiaowen [1 ]
Affiliations
[1] Heilongjiang Univ, Sch Elect & Elect Engn, Harbin 150080, Peoples R China
[2] Harbin Inst Technol, Sch Elect Engn & Automat, Harbin 150001, Peoples R China
[3] Yangzhou Univ, Sch Hydraul Sci & Engn, Yangzhou 225009, Peoples R China
[4] Key Lab Elect & Elect Reliabil Technol Heilongjian, Harbin 150001, Peoples R China
[5] MOE Key Lab Reliabil & Qual Consistency Elect Comp, Harbin 150001, Peoples R China
Keywords
Classification accuracy; Correlation measurement; Dot product; Random forest; CART; DISTANCE
DOI
10.1016/j.eswa.2023.121549
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Random forest is one of the most widely used machine learning algorithms. The decision trees used to construct a random forest may have low classification accuracies or high correlations, which degrades the overall performance of the random forest. Aiming at these problems, the authors propose an improved random forest based on the classification accuracy and correlation measurement of decision trees. Its main idea comprises two parts: retaining the classification and regression trees (CARTs) with better classification performance, and reducing the correlations between the CARTs. Specifically, in the classification effect evaluation part, each CART was applied to make predictions on three reserved data sets, and its average classification accuracy was computed. All CARTs were then sorted in descending order of average classification accuracy. In the correlation measurement part, an improved dot product method was proposed to calculate the cosine similarity, i.e., the correlation, between CARTs in the feature space. Using the average classification accuracy as a reference, the grid search method was used to find an inner product threshold. On this basis, within each CART pair whose inner product value exceeded the threshold, the CART with the lower average classification accuracy was marked as deletable. The average classification accuracies and correlations of the CARTs were then considered jointly: trees with high correlation and weak classification performance were deleted, and those of better quality were retained to construct the random forest. Multiple experiments show that the proposed improved random forest achieved higher average classification accuracy than the five random forests used for comparison, and its lead was stable.
The G-means and out-of-bag data (OBD) scores obtained by the proposed improved random forest were also higher than those of the five random forests, and the lead was more pronounced. In addition, the results of three non-parametric tests show significant differences between the proposed improved random forest and the other five random forests. This effectively demonstrates the superiority and practicability of the proposed improved random forest.
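The pruning idea described in the abstract can be sketched as follows. This is not the authors' code: the use of prediction vectors on a single validation set as the "feature space", the mapping of class labels to ±1 before taking the cosine similarity, and the fixed threshold of 0.98 (which the paper instead finds by grid search) are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Grow CARTs on bootstrap samples and score each on a reserved set
#    (the paper averages over three reserved sets; one is used here).
trees, scores = [], []
for i in range(30):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    t = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    t.fit(X_tr[idx], y_tr[idx])
    trees.append(t)
    scores.append(t.score(X_val, y_val))
scores = np.array(scores)

# 2) Correlation between CARTs: cosine similarity of their prediction
#    vectors on the reserved set, with labels mapped {0,1} -> {-1,+1}.
P = np.array([t.predict(X_val) * 2.0 - 1.0 for t in trees])
P_unit = P / np.linalg.norm(P, axis=1, keepdims=True)
S = P_unit @ P_unit.T  # pairwise cosine similarities

# 3) For each pair above the threshold, mark the lower-accuracy tree
#    as deletable; keep the rest to build the final forest.
THRESHOLD = 0.98  # illustrative; chosen by grid search in the paper
deletable = set()
n = len(trees)
for i in range(n):
    for j in range(i + 1, n):
        if S[i, j] > THRESHOLD:
            deletable.add(i if scores[i] < scores[j] else j)

kept = [t for k, t in enumerate(trees) if k not in deletable]

def forest_predict(forest, X):
    """Majority vote over the retained trees."""
    votes = np.array([t.predict(X) for t in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)

acc = (forest_predict(kept, X_val) == y_val).mean()
```

The intent of the design is that near-duplicate trees add variance-reduction value only once, so dropping the weaker member of each highly similar pair trims the ensemble without sacrificing the majority vote.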
Pages: 19
Related Papers
50 records in total
  • [1] Skin lesion classification using decision trees and random forest algorithms
    Dhivyaa, C. R.
    Sangeetha, K.
    Balamurugan, M.
    Amaran, Sibi
    Vetriselvi, T.
    Johnpaul, P.
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 15 (Suppl 1) : 157 - 157
  • [2] An Improved Random Decision Trees Algorithm with Application to Land Cover Classification
    Xu, Haiwei
    Yang, Minhua
    Liang, Liang
    2010 18TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS, 2010,
  • [3] Improved Random Forest for Classification
    Paul, Angshuman
    Mukherjee, Dipti Prasad
    Das, Prasun
    Gangopadhyay, Abhinandan
    Chintha, Appa Rao
    Kundu, Saurabh
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (08) : 4012 - 4024
  • [4] Image Classification Based on Improved Random Forest Algorithm
    Man, Weishi
    Ji, Yuanyuan
    Zhang, Zhiyu
    2018 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA), 2018, : 346 - 350
  • [5] An efficient Pearson correlation based improved random forest classification for protein structure prediction techniques
    Kalaiselvi, B.
    Thangamani, M.
    MEASUREMENT, 2020, 162
  • [6] Pattern classification with random decision forest
    Wang, Honghai
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012, : 128 - 130
  • [7] A kernel-based quantum random forest for improved classification
    Srikumar, Maiyuren
    Hill, Charles D.
    Hollenberg, Lloyd C. L.
    QUANTUM MACHINE INTELLIGENCE, 2024, 6 (01)
  • [8] Not seeing the wood for the trees: Influences on random forest accuracy
    Hand, Chris
    Fitkov-Norris, Elena
    INTERNATIONAL JOURNAL OF MARKET RESEARCH, 2024, 66 (05) : 559 - 566
  • [9] A novel improved random forest for text classification using feature ranking and optimal number of trees
    Jalal, Nasir
    Mehmood, Arif
    Choi, Gyu Sang
    Ashraf, Imran
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (06) : 2733 - 2742
  • [10] Ensemble of optimal trees, random forest and random projection ensemble classification
    Khan, Zardad
    Gul, Asma
    Perperoglou, Aris
    Miftahuddin, Miftahuddin
    Mahmoud, Osama
    Adler, Werner
    Lausen, Berthold
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2020, 14 : 97 - 116