A feature extraction method for small sample data based on optimal ensemble random forest

Cited by: 0
Authors
Zhang W. [1 ]
Zhang H. [1 ]
Institution
[1] School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an
Keywords
data expansion; ensemble of optimal trees; feature extraction; high-dimensional small-sample data; random forest
DOI
10.1051/jnwpu/20224061261
Abstract
High-dimensional small-sample data is a difficult case for data mining. When the traditional random forest algorithm is used for feature selection, overfitting of the classification results leads to poor stability and low accuracy of the feature-importance ranking. To address the difficulties random forest faces in reducing the dimensionality of small-sample data, a feature extraction algorithm, OTE-GWRFFS, is proposed. First, the algorithm expands the samples with a generative adversarial network (GAN) to avoid the overfitting that the traditional random forest exhibits when classifying small samples. Then, on the basis of the expanded data, a weight-based optimal tree ensemble algorithm is adopted to reduce the impact of data-distribution error on feature-extraction accuracy and to improve the overall stability of the decision-tree ensemble. Finally, the feature-importance ranking is obtained as the weighted average of each decision tree's weight and its feature-importance measure, which addresses the low accuracy and poor stability of feature selection on small-sample data. Experiments on UCI data sets compare the present algorithm with the traditional random forest algorithm and the weight-based random forest algorithm; the OTE-GWRFFS algorithm achieves higher stability and accuracy when processing high-dimensional small-sample data. ©2022 Journal of Northwestern Polytechnical University.
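The final aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and toy inputs are assumptions, and each tree's weight here stands in for whatever per-tree quality score (e.g. validation accuracy) the weighting scheme assigns.

```python
import numpy as np

def weighted_feature_importance(tree_importances, tree_weights):
    """Aggregate per-tree feature importances into one ranking.

    Each tree t contributes its importance vector I_t scaled by its
    weight w_t; the aggregate is sum_t(w_t * I_t) / sum_t(w_t), i.e.
    a weighted average as in weight-based random forest variants.
    """
    I = np.asarray(tree_importances, dtype=float)  # shape (n_trees, n_features)
    w = np.asarray(tree_weights, dtype=float)      # shape (n_trees,)
    agg = (w[:, None] * I).sum(axis=0) / w.sum()   # weighted average per feature
    ranking = np.argsort(agg)[::-1]                # feature indices, best first
    return agg, ranking
```

With three trees over four features, a tree whose importance vector disagrees with the ensemble but carries a low weight contributes little to the final ranking, which is the stabilizing effect the abstract attributes to the weighting scheme.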
Pages: 1261-1268
Page count: 7
References
12 references in total
  • [1] HASSAN H, BADR A, ABDELHALIM M B., Prediction of O-glycosylation sites using random forest and GA-tuned PSO technique, Bioinformatics & Biology Insights, 9, 9, pp. 103-109, (2015)
  • [2] GENUER R, POGGI J M, TULEAU-MALOT C., Variable selection using random forests, Pattern Recognition Letters, 31, pp. 2225-2236, (2010)
  • [3] YAO Dengju, YANG Jing, ZHAN Xiaojuan, Feature selection algorithm based on random forest, Journal of Jilin University, 44, 1, pp. 137-141, (2014)
  • [4] WANG Xiang, HU Xuegang, A review of feature selection in high-dimensional small sample classification, Computer Application, 37, 9, pp. 2433-2438, (2017)
  • [5] XU Shaocheng, LI Dongxi, Weighted feature selection algorithm based on random forest, Statistics and Decision Making, 34, 18, pp. 25-28, (2018)
  • [6] LI H B, WANG W, DING H W, et al., Trees weighting random forest method for classifying high-dimensional noisy data, IEEE 7th International Conference on E-Business Engineering, (2010)
  • [7] KHAN Z, GUL A, PERPEROGLOU A, et al., Ensemble of optimal trees, random forest and random projection ensemble classification, Advances in Data Analysis and Classification, 14, pp. 97-116, (2020)
  • [8] KHAN Z, GUL A, MAHMOUD O, et al., An ensemble of optimal trees for class membership probability estimation, Analysis of Large and Complex Data, pp. 395-409, (2016)
  • [9] WEN B, OLIVEROS COLON L, SUBBALAKSHMI K P, CHANDRAMOULI R., Causal-TGAN: generating tabular data using causal generative adversarial networks, (2021)
  • [10] ZHAO Qingping, CHEN Debao, JIANG Enhua, et al., An improved weighted nonlocal mean image denoising algorithm, Journal of Electronic Measurement and Instrument, 28, 3, pp. 334-339, (2014)