Feature selection for global tropospheric ozone prediction based on the BO-XGBoost-RFE algorithm

Cited by: 22
Authors
Zhang, Biao [1]
Zhang, Ying [2]
Jiang, Xuchu [2]
Affiliations
[1] Liaocheng Univ, Sch Comp Sci, Liaocheng 252000, Shandong, Peoples R China
[2] Zhongnan Univ Econ & Law, Sch Stat & Math, Wuhan 430073, Peoples R China
Keywords
AIR-QUALITY; CHEMISTRY;
DOI
10.1038/s41598-022-13498-2
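Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract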
Ozone is one of the most important air pollutants, with significant impacts on human health, regional air quality and ecosystems. In this study, we use geographic and environmental information from monitoring sites in 5577 regions worldwide, covering 2010 to 2014, as input features to predict the long-term average ozone concentration at each site. We propose BO-XGBoost-RFE, an XGBoost-RFE feature selection model tuned by Bayesian optimization, and use a variety of machine learning algorithms to predict ozone concentration from the optimal feature subset. Recursive feature selection depends on the hyperparameters of the underlying model: different hyperparameter combinations lead to different selected feature subsets, so the subset obtained may not be optimal. We therefore use Bayesian optimization to tune the hyperparameters of XGBoost-based recursive feature elimination, obtaining the optimal parameter combination and the optimal feature subset under that combination. Experiments on long-term, global-scale ozone concentration prediction show that models trained after BO-XGBoost-RFE feature selection achieve higher prediction accuracy than models trained on all features or on features selected by Pearson correlation. Among the four prediction models, random forest achieved the highest prediction accuracy, and the XGBoost prediction model showed the greatest improvement in accuracy.
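The nested structure described in the abstract (an outer hyperparameter search wrapped around recursive feature elimination) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: a gradient-descent ridge regression stands in for XGBoost, plain random search stands in for Bayesian optimization, and the data are synthetic. All function names (`fit_ridge`, `rfe`, `search`) and the single hyperparameter `l2` are hypothetical choices made for illustration.

```python
import math
import random


def fit_ridge(X, y, cols, l2, lr=0.2, iters=100):
    """Ridge regression by gradient descent, restricted to columns `cols`.
    Assumption: this simple model stands in for XGBoost."""
    w = [0.0] * len(cols)
    n = len(X)
    for _ in range(iters):
        grad = [l2 * wj for wj in w]  # gradient of the L2 penalty
        for row, target in zip(X, y):
            err = sum(wj * row[c] for wj, c in zip(w, cols)) - target
            for j, c in enumerate(cols):
                grad[j] += err * row[c] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mse(X, y, cols, w):
    """Mean squared error of the linear model on the given columns."""
    return sum((sum(wj * row[c] for wj, c in zip(w, cols)) - t) ** 2
               for row, t in zip(X, y)) / len(X)


def rfe(X_tr, y_tr, X_va, y_va, l2):
    """Recursive feature elimination: refit, score on validation data,
    drop the feature with the smallest |weight|, and repeat.
    Returns (best validation MSE, feature subset that achieved it)."""
    cols = list(range(len(X_tr[0])))
    best = (math.inf, list(cols))
    while cols:
        w = fit_ridge(X_tr, y_tr, cols, l2)
        score = mse(X_va, y_va, cols, w)
        if score < best[0]:
            best = (score, list(cols))
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols.pop(weakest)
    return best


def search(X_tr, y_tr, X_va, y_va, n_trials=6, seed=0):
    """Outer loop: sample a hyperparameter, run RFE under it, keep the best.
    Assumption: random search stands in for Bayesian optimization."""
    rng = random.Random(seed)
    best = (math.inf, None, None)
    for _ in range(n_trials):
        l2 = 10 ** rng.uniform(-3, 0)  # log-uniform in [1e-3, 1]
        score, subset = rfe(X_tr, y_tr, X_va, y_va, l2)
        if score < best[0]:
            best = (score, l2, subset)
    return best


def make_data(n, rng, n_features=5):
    """Synthetic data: only features 0 and 1 carry signal."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in X]
    return X, y


data_rng = random.Random(42)
X_tr, y_tr = make_data(120, data_rng)
X_va, y_va = make_data(60, data_rng)
score, l2, subset = search(X_tr, y_tr, X_va, y_va)
print("validation MSE:", score, "l2:", l2, "selected features:", subset)
```

Because each hyperparameter draw triggers its own elimination run, the selected subset is scored jointly with the hyperparameters rather than under a single fixed setting, which is the dependency the abstract points out. A Bayesian optimizer would replace the random draw in `search` with proposals informed by earlier trials.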
Pages: 10