Feature selection for global tropospheric ozone prediction based on the BO-XGBoost-RFE algorithm

Cited by: 22
Authors
Zhang, Biao [1]
Zhang, Ying [2]
Jiang, Xuchu [2]
Affiliations
[1] Liaocheng Univ, Sch Comp Sci, Liaocheng 252000, Shandong, Peoples R China
[2] Zhongnan Univ Econ & Law, Sch Stat & Math, Wuhan 430073, Peoples R China
Keywords
AIR-QUALITY; CHEMISTRY;
DOI
10.1038/s41598-022-13498-2
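Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code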
07 ; 0710 ; 09 ;
Abstract
Ozone is one of the most important air pollutants, with significant impacts on human health, regional air quality and ecosystems. In this study, we use geographic and environmental information from monitoring sites in 5577 regions worldwide, collected from 2010 to 2014, as input features to predict the long-term average ozone concentration at each site. We propose BO-XGBoost-RFE, an XGBoost-based recursive feature elimination (RFE) model tuned by Bayesian optimization, and use a variety of machine learning algorithms to predict ozone concentration from the optimal feature subset. Because recursive feature selection depends on the hyperparameters of the underlying model, different hyperparameter combinations lead to different selected feature subsets, so the subset obtained under any one setting may not be optimal. We therefore use Bayesian optimization to tune the parameters of XGBoost-based recursive feature elimination, obtaining the optimal parameter combination and the optimal feature subset under that combination. Experiments on global-scale long-term ozone concentration prediction show that models using features selected by Bayesian-optimized XGBoost-RFE are more accurate than models using all features or features selected by Pearson correlation. Among the four prediction models, random forest achieved the highest prediction accuracy, and the XGBoost prediction model showed the greatest improvement in accuracy.
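The coupling described in the abstract, where an outer hyperparameter search wraps an inner recursive feature elimination, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: scikit-learn's GradientBoostingRegressor stands in for XGBoost, a small random search (ParameterSampler) stands in for Bayesian optimization, and the data, search space, and variable names are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import ParameterSampler, cross_val_score

# Synthetic stand-in data (assumption): 10 features, 4 of them informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Hypothetical search space for the underlying tree-boosting model.
search_space = {"learning_rate": [0.05, 0.1, 0.2], "max_depth": [2, 3]}

best = {"score": -np.inf, "params": None, "mask": None}

# Outer loop: try hyperparameter combinations.
# Random search is used here as a stand-in for Bayesian optimization.
for params in ParameterSampler(search_space, n_iter=3, random_state=0):
    model = GradientBoostingRegressor(n_estimators=30, random_state=0, **params)

    # Inner loop: recursive feature elimination with cross-validation,
    # ranking features by the model's feature importances.
    selector = RFECV(model, step=1, cv=3, scoring="r2").fit(X, y)

    # Score the selected subset under the same hyperparameters.
    # Note: reusing the same data for selection and scoring is optimistic.
    score = cross_val_score(model, X[:, selector.support_], y,
                            cv=3, scoring="r2").mean()

    # Keep the combination whose selected subset scores best.
    if score > best["score"]:
        best = {"score": score, "params": params, "mask": selector.support_}

print("best params:", best["params"])
print("selected feature indices:", np.flatnonzero(best["mask"]))
```

Each hyperparameter combination can yield a different feature subset, which is why the search keeps the subset together with the parameters that produced it. A Bayesian optimizer would replace the random sampler by proposing each new combination from the scores observed so far.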
Pages: 10
Related papers
50 in total
  • [41] Binary biogeography-based optimization based SVM-RFE for feature selection
    Albashish, Dheeb
    Hammouri, Abdelaziz, I
    Braik, Malik
    Atwan, Jaffar
    Sahran, Shahnorbanun
    APPLIED SOFT COMPUTING, 2021, 101
  • [42] An XGboost Algorithm Based Model for Financial Risk Prediction
    Xu, Yunsong
    Li, Jiaqi
    Wu, Anqi
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2024, 31 (06) : 1898 - 1907
  • [43] Duration Prediction for Truck Crashes Based on the XGBoost Algorithm
    Gu, Tingting
    Yang, Shunxin
    CICTP 2019: TRANSPORTATION IN CHINA-CONNECTING THE WORLD, 2019, : 5021 - 5031
  • [44] Feature-Grouping-Based Two Steps Feature Selection Algorithm in Software Defect Prediction
    Du, Yuntao
    Zhang, Lu
    Shi, Jiahao
    Tang, Jingjuan
    Yin, Ying
    ICAIP 2018: 2018 THE 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN IMAGE PROCESSING, 2018, : 173 - 178
  • [45] XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites
    Abbas, Zeeshan
    Rehman, Mobeen ur
    Tayara, Hilal
    Zou, Quan
    Chong, Kil To
    MOLECULAR THERAPY, 2023, 31 (08) : 2543 - 2551
  • [46] Specific Emitter Identification Based on ACO-XGBoost Feature Selection
    Cao, Jianjun
    Gu, Chumei
    Wang, Baowei
    Xu, Yuxin
    Wang, Mengda
    WEB AND BIG DATA, PT I, APWEB-WAIM 2022, 2023, 13421 : 76 - 90
  • [47] Triangular-based sine cosine algorithm for global search and feature selection
    Jiacong Liu
    Chunguang Bi
    Huiling Chen
    Ali Asghar Heidari
    He Chen
    Scientific Reports, 15 (1)
  • [48] Feature Selection Framework for XGBoost Based on Electrodermal Activity in Stress Detection
    Hsieh, Cheng-Ping
    Chen, Yi-Ta
    Beh, Win-Ken
    Wu, An-Yeu Andy
    PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 330 - 335
  • [49] Feature Selection for Hypertension Risk Prediction Using XGBoost on Single Nucleotide Polymorphism Data
    Muflikhah, Lailil
    Fatyanosa, Tirana Noor
    Widodo, Nashi
    Perdana, Rizal Setya
    Solimun
    Ratnawati, Hana
    HEALTHCARE INFORMATICS RESEARCH, 2025, 31 (01) : 16 - 22
  • [50] Financial distress prediction based on ensemble feature selection and improved stacking algorithm
    Wu, Chong
    Chen, Xiaofang
    Jiang, Yongjie
    KYBERNETES, 2024,