An interpretable automated feature engineering framework for improving logistic regression

Cited by: 3
Authors
Liu, Mucan [1, 2]
Guo, Chonghui [1 ]
Xu, Liangchen [1 ]
Affiliations
[1] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Hong Kong, Peoples R China
Keywords
Interpretable machine learning; Feature engineering; Automated machine learning; Knowledge distillation
DOI
10.1016/j.asoc.2024.111269
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although black-box models such as ensemble learning models often provide better predictive performance than intrinsically interpretable models such as logistic regression, they are still not applicable where interpretability is required. Recently, there has been an explosion of work on explainable machine learning techniques, which use external algorithms or models to explain the behavior of black-box models. However, explaining black-box behavior in this way is problematic because the explanation provided might not reveal the real mechanism or decision process of the model. In this study, instead of using explainable machine learning techniques, an automated feature engineering task was formulated to help logistic regression achieve predictive performance comparable to or even better than that of black-box models while maintaining interpretability. To this end, an INterpretable Automated Feature ENgineering (INAFEN) framework was designed for logistic regression. The framework automatically transforms nonlinear relationships between numerical features and labels into linear relationships, performs feature crossing through association rule mining, and distills knowledge from black-box models. A case study on gastric survival prediction was performed to demonstrate the rationality of the feature transformations produced by INAFEN, and benchmark experiments were conducted to show its validity. Experimental results on 10 classification tasks demonstrated that INAFEN achieved an average ranking of 2.60 in area under the ROC curve (AUROC), 3.35 in area under the PR curve (AUPRC), 3.70 in F1 score, and 3.00 in Brier score (among 13 models), outperforming other interpretable baselines and even black-box models. In addition, the interpretability measurement of INAFEN is significantly better than that of black-box models.
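The abstract does not say how INAFEN turns a nonlinear feature-label relationship into a linear one. As a rough illustration of the general idea only, the sketch below uses weight-of-evidence (WoE) binning, a common scorecard technique that is swapped in here and is not claimed to be the authors' method. The function names `woe_fit` and `woe_transform`, the bin count, the smoothing constant, and the synthetic U-shaped data are all assumptions made for this example.

```python
import math
import random


def woe_fit(values, labels, n_bins=5, smoothing=0.5):
    """Fit equal-frequency bins and one weight-of-evidence value per bin.

    Hypothetical helper for illustration; not taken from the INAFEN paper.
    """
    ordered = sorted(values)
    # Interior bin edges at equal-frequency cut points.
    edges = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

    pos = [0] * n_bins
    neg = [0] * n_bins
    for x, y in zip(values, labels):
        idx = sum(x >= e for e in edges)  # bin index in 0..n_bins-1
        if y == 1:
            pos[idx] += 1
        else:
            neg[idx] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    woe = []
    for p, n in zip(pos, neg):
        # Additive smoothing keeps the logarithm finite for empty cells.
        share_pos = (p + smoothing) / (total_pos + smoothing * n_bins)
        share_neg = (n + smoothing) / (total_neg + smoothing * n_bins)
        woe.append(math.log(share_pos / share_neg))
    return edges, woe


def woe_transform(x, edges, woe):
    """Replace a raw value with the WoE of the bin it falls into."""
    return woe[sum(x >= e for e in edges)]


# Synthetic U-shaped relationship: the positive class is likely at both extremes.
random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(2000)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(x * x - 3.0))) else 0
      for x in xs]

edges, woe = woe_fit(xs, ys)
encoded = [woe_transform(x, edges, woe) for x in xs]
```

A logistic regression on the raw `xs` cannot capture the U shape with a single coefficient. On `encoded`, each bin's value is its estimated log-odds shifted by a constant, so a single coefficient is enough, and the per-bin WoE table remains readable.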
Pages: 23