Feature selection algorithm based on XGBoost

Cited: 0
|
Authors
Li Z. [1 ,2 ,3 ]
Liu Z. [2 ,3 ]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun
[2] College of Software, Jilin University, Changchun
[3] Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun
Source
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Sequential floating forward selection; XGBoost;
DOI
10.11959/j.issn.1000-436x.2019154
CLC number
Subject classification code
Abstract
Feature selection for classification has always been an important but difficult problem: a feature selection algorithm must not only help the classifier improve its accuracy, but also remove as many redundant features as possible. To better solve feature selection in classification problems, a new wrapper feature selection algorithm, XGBSFS, was proposed. Drawing on the tree-building process of XGBoost, it measures feature importance with three importance metrics, avoiding the limitation of relying on any single metric. An improved sequential floating forward selection (ISFFS) strategy is then applied to search for a high-quality feature subset. Experimental results on eight UCI datasets show that the proposed algorithm performs well. © 2019, Editorial Board of Journal on Communications. All rights reserved.
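The abstract names sequential floating forward selection as the search component but does not spell out the procedure. The sketch below is a generic, illustrative SFFS loop with a toy scoring function; the paper's ISFFS variant and its XGBoost-importance-based evaluation (in the XGBoost library, importance is available as weight, gain, and cover) are not reproduced here, and the function names and weights are hypothetical.

```python
# Minimal sketch of sequential floating forward selection (SFFS), the
# search strategy that XGBSFS builds on. The paper's ISFFS details are
# not reproduced; the evaluator here is a hypothetical stand-in for
# classifier accuracy.

def sffs(features, evaluate, k):
    """Greedy floating forward search for a feature subset of size k.

    features : list of candidate feature names
    evaluate : callable(subset) -> score, higher is better
    k        : target subset size
    """
    selected = []
    best_by_size = {}  # size -> best score seen for a subset of that size
    while len(selected) < k:
        # Forward step: add the feature that yields the best score.
        f = max((x for x in features if x not in selected),
                key=lambda x: evaluate(selected + [x]))
        selected.append(f)
        best_by_size[len(selected)] = max(
            best_by_size.get(len(selected), float("-inf")),
            evaluate(selected))
        # Floating (backward) step: drop the least useful feature while
        # the reduced subset beats the best previously seen at that size.
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda x: evaluate([y for y in selected if y != x]))
            reduced = [y for y in selected if y != worst]
            score = evaluate(reduced)
            if score > best_by_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_by_size[len(reduced)] = score
            else:
                break
    return selected


# Toy evaluator (purely illustrative): additive feature weights with a
# redundancy penalty when the correlated pair "b" and "c" co-occur.
WEIGHTS = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 0.5}

def toy_score(subset):
    penalty = 3.0 if {"b", "c"} <= set(subset) else 0.0
    return sum(WEIGHTS[f] for f in subset) - penalty

print(sffs(["a", "b", "c", "d"], toy_score, 3))  # -> ['a', 'b', 'd']
```

The floating backward step is what distinguishes SFFS from plain forward selection: here it keeps the search from committing to the redundant "b"+"c" pair once "d" becomes the better complement.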
Pages: 101-108
Number of pages: 7