A parallel feature selection method based on NMI-XGBoost and distance correlation for typhoon trajectory prediction

Cited by: 3
Authors
Qiao, Baiyou [1]
Wu, Jiaqi [1]
Wang, Rui [2]
Hao, Yuanqing [1]
Wang, Peirui [1]
Han, Donghong [1]
Wu, Gang [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110169, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; NMI; XGBoost; Distance correlation; Spark; ASSOCIATION; DEPENDENCE; MODEL;
DOI
10.1007/s11227-023-05863-3
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Typhoon trajectory data involve many factors, including atmospheric, oceanic, and physical factors. The data are high-dimensional and exhibit strong spatio-temporal and nonlinear correlations, which makes typhoon trajectory prediction more difficult. Using feature selection to choose appropriate predictors is therefore an important way to reduce the dimensionality of the data and to improve the performance and accuracy of typhoon trajectory prediction methods. However, existing feature selection methods based on linear correlation analysis cannot adequately capture nonlinear correlations between features, which lowers the accuracy of feature selection, while methods based on nonlinear correlation analysis are computationally expensive, which hurts the timeliness of feature selection. To address these problems, we propose NX-Spark-DC, a parallel feature selection method for typhoon trajectory data built on the Spark platform. The method first filters out redundant features with normalized mutual information (NMI), then eliminates useless features with an XGBoost model, thereby reducing the dimensionality of the data. On this basis, an improved Spark-based parallel distance correlation algorithm (Spark-DC) selects the feature combinations with strong correlation. Experimental results show that NX-Spark-DC achieves high execution efficiency and accuracy and is significantly better than existing methods.
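
To illustrate why the abstract contrasts linear correlation with distance correlation, the following is a minimal single-machine sketch of the standard sample distance correlation in NumPy. It is not the paper's Spark-DC algorithm: the function names `_centered_distances` and `distance_correlation` are hypothetical, the computation is the plain O(n^2) statistic with no parallelism, and the toy data are an assumption chosen only to show a nonlinear dependence.

```python
import numpy as np


def _centered_distances(x):
    """Return the double-centered pairwise Euclidean distance matrix of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one feature
    # Pairwise Euclidean distances between all samples, shape (n, n).
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row means and column means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y (same number of rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0            # a constant input carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    # Toy data (an assumption): y depends on x, but not linearly.
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2
    print("Pearson correlation: ", np.corrcoef(x, y)[0, 1])
    print("Distance correlation:", distance_correlation(x, y))
```

On this symmetric toy input the Pearson coefficient is close to zero even though y is fully determined by x, while the distance correlation is clearly positive. The quadratic memory cost of the pairwise distance matrices is one reason a distributed formulation is attractive for large feature sets.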
Pages: 11293-11321
Page count: 29
Related papers
50 in total
  • [41] AN OPTIMAL FEATURE SUBSET SELECTION METHOD BASED ON DISTANCE DISCRIMINANT AND DISTRIBUTION OVERLAPPING
    Liang, Jianning
    Yang, Su
    Wang, Yuanyuan
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2009, 23 (08) : 1577 - 1597
  • [42] GalNAc-transferase specificity prediction based on feature selection method
    Lu, Lin
    Niu, Bing
    Zhao, Jun
    Liu, Liang
    Lu, Wen-Cong
    Liu, Xiao-Jun
    Li, Yi-Xue
    Cai, Yu-Dong
    PEPTIDES, 2009, 30 (02) : 359 - 364
  • [43] A cluster-based hybrid feature selection method for defect prediction
    Wang, Fei
    Ai, Jun
    Zou, Zhuoliang
    2019 IEEE 19TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2019), 2019, : 1 - 9
  • [44] A Hybrid Multi-feature Road Network Selection Method Based on Trajectory Data
    Ma J.
    Sun Q.
    Wen B.
    Zhou Z.
    Lu C.
    Lü Z.
    Sun S.
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2022, 47 (07) : 1009 - 1016
  • [45] A Feature Selection Method of Parallel Grey Wolf Optimization Algorithm Based on Spark
    Chen, Hongwei
    Han, Lin
    Hu, Zhou
    Hou, Qiao
    Ye, Zhiwei
    Zeng, Jun
    Yuan, Jiansen
    PROCEEDINGS OF THE 2019 10TH IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS - TECHNOLOGY AND APPLICATIONS (IDAACS), VOL. 1, 2019, : 81 - 85
  • [46] A parallel rough set based dependency calculation method for efficient feature selection
    Raza, Muhammad Summair
    Qamar, Usman
    APPLIED SOFT COMPUTING, 2018, 71 : 1020 - 1034
  • [47] A Machine Learning Method with Threshold Based Parallel Feature Fusion and Feature Selection for Automated Gait Recognition
    Sharif, Muhammad
    Attique, Muhammad
    Tahir, Muhammad Zeeshan
    Yasmin, Mussarat
    Saba, Tanzila
    Tanik, Urcun John
    JOURNAL OF ORGANIZATIONAL AND END USER COMPUTING, 2020, 32 (02) : 67 - 92
  • [48] Novel Feature-Based Difficulty Prediction Method for Mathematics Items Using XGBoost-Based SHAP Model
    Yi, Xifan
    Sun, Jianing
    Wu, Xiaopeng
    MATHEMATICS, 2024, 12 (10)
  • [49] A Risk Prediction Model for Type 2 Diabetes Based on Weighted Feature Selection of Random Forest and XGBoost Ensemble Classifier
    Xu, Zhongxian
    Wang, Zhiliang
    2019 ELEVENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI 2019), 2019, : 278 - 283
  • [50] PM2.5 Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM
    Dai, Hongbin
    Huang, Guangqiu
    Zeng, Huibin
    Yang, Fan
    SUSTAINABILITY, 2021, 13 (21)