MIC-SHAP: An ensemble feature selection method for materials machine learning

Cited by: 5
Authors
Wang, Junya [1]
Xu, Pengcheng [2]
Ji, Xiaobo [3]
Li, Minjie [3]
Lu, Wencong [3, 4, 5]
Affiliations
[1] Shanghai Univ, Coll Sci, Dept Math, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Coll Sci, Dept Chem, Shanghai 200444, Peoples R China
[4] Zhejiang Lab, Hangzhou 311100, Peoples R China
[5] Shanghai Univ, Key Lab Silicate Cultural Rel Conservat, Minist Educ, Shanghai 200444, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Ensemble feature selection; Materials machine learning; Interpretability; AIDED DESIGN; REGRESSION;
DOI
10.1016/j.mtcomm.2023.106910
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Feature selection plays a significant role in the materials machine learning workflow, but most current work in the field relies on single or stepwise feature selection methods. This work proposes a new ensemble feature selection method, MIC-SHAP, which combines the SHapley Additive exPlanations (SHAP) method with the maximal information coefficient (MIC) method. The effectiveness of the method was evaluated on three different materials datasets collected from publications. The results demonstrate that MIC-SHAP outperforms commonly used feature selection methods, preserving prediction accuracy while greatly reducing model complexity. The highest feature reduction rate is 91.67%, while the R² of 10-fold cross-validation reaches 0.98. MIC-SHAP can quickly select the optimal feature subset, avoiding repeated trials of different feature selection methods. Moreover, it can increase the stability and interpretability of feature selection, which supports the subsequent process of materials design and discovery.
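The abstract does not say how the MIC and SHAP scores are combined, so the following is only a minimal sketch of one generic way to build an ensemble ranking from two per-feature scores, not the authors' algorithm. The function names, the descriptor names, the score values, and the average-rank rule are all assumptions made for illustration. The reduction rate is assumed to mean the percentage of features removed; under that assumption, keeping 1 of 12 features gives 91.67%, although the abstract does not state the actual feature counts.

```python
# Illustrative sketch only: NOT the MIC-SHAP algorithm from the paper.
# The two score dictionaries are hypothetical placeholders for values that
# would normally come from an MIC estimator and from mean |SHAP| values.

def rank_features(scores):
    """Map each feature to its rank (1 = highest score)."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def ensemble_select(mic_scores, shap_scores, k):
    """Keep the k features with the best (lowest) average rank."""
    mic_rank = rank_features(mic_scores)
    shap_rank = rank_features(shap_scores)
    mean_rank = {f: (mic_rank[f] + shap_rank[f]) / 2 for f in mic_scores}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:k]

def reduction_rate(n_original, n_selected):
    """Percentage of features removed (assumed definition)."""
    return 100.0 * (n_original - n_selected) / n_original

# Hypothetical scores for four made-up descriptors.
mic_scores = {"radius": 0.82, "electronegativity": 0.64, "mass": 0.31, "valence": 0.12}
shap_scores = {"radius": 0.40, "electronegativity": 0.55, "mass": 0.05, "valence": 0.09}

selected = ensemble_select(mic_scores, shap_scores, k=2)
print(selected)
print(round(reduction_rate(12, 1), 2))
```

Average-rank aggregation is used here because it needs no score normalization; a weighted sum of normalized scores or an intersection of two top-k lists would be equally plausible alternatives.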
Pages: 9