An interpretable automated feature engineering framework for improving logistic regression

Cited by: 3
Authors
Liu, Mucan [1, 2]
Guo, Chonghui [1]
Xu, Liangchen [1]
Affiliations
[1] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Hong Kong, Peoples R China
Keywords
Interpretable machine learning; Feature engineering; Automated machine learning; Knowledge distillation;
DOI
10.1016/j.asoc.2024.111269
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although black-box models such as ensemble learning models often provide better predictive performance than intrinsically interpretable models such as logistic regression, black-box models are often still not applicable because they lack interpretability. Recently, there has been an explosion of work on explainable machine learning techniques, which use external algorithms or models to explain the behavior of black-box models. However, explaining black-box model behavior in this way is problematic, because the explanation provided might not reveal the real mechanism or decision process of the black-box model. In this study, instead of using explainable machine learning techniques, an automated feature engineering task was formulated to help logistic regression achieve predictive performance comparable to or even better than that of black-box models while maintaining interpretability. An INterpretable Automated Feature ENgineering (INAFEN) framework was designed for logistic regression. This framework automatically transforms the nonlinear relationships between numerical features and labels into linear relationships, conducts feature crossing through association rule mining, and distills knowledge from black-box models. A case study on gastric survival prediction was performed to demonstrate the rationality of the feature transformations produced by INAFEN, and benchmark experiments were conducted to show its validity. Experimental results on 10 classification tasks demonstrated that INAFEN achieved an average ranking of 2.60 in area under the ROC curve (AUROC), 3.35 in area under the PR curve (AUPRC), 3.70 in F1 score and 3.00 in Brier score (among 13 models), outperforming other interpretable baselines and even black-box models. In addition, the interpretability measurement of INAFEN is significantly better than that of black-box models.
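The abstract does not say how INAFEN turns a nonlinear relationship between a numerical feature and the label into a linear one. As a rough illustration of the general idea only, the sketch below uses equal-frequency binning followed by weight-of-evidence (WoE) encoding, a common technique that is swapped in here as an assumption and is not claimed to be the authors' method. All function names (`quantile_edges`, `assign_bin`, `woe_table`, `woe_encode`) and the synthetic U-shaped data are hypothetical.

```python
import math
from typing import List, Sequence


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins - 1 interior cut points at roughly equal-frequency positions."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def assign_bin(value: float, edges: Sequence[float]) -> int:
    """Bin 0 holds value < edges[0]; bin i holds edges[i-1] <= value < edges[i];
    the last bin holds value >= edges[-1]."""
    return sum(1 for edge in edges if value >= edge)


def woe_table(values: Sequence[float], labels: Sequence[int],
              edges: Sequence[float], smoothing: float = 0.5) -> List[float]:
    """Weight of evidence per bin: log of (share of positives) / (share of negatives).
    The smoothing constant (an assumption) avoids log(0) for pure bins."""
    n_bins = len(edges) + 1
    pos = [0.0] * n_bins
    neg = [0.0] * n_bins
    for v, y in zip(values, labels):
        b = assign_bin(v, edges)
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    total_pos = sum(pos) + smoothing * n_bins
    total_neg = sum(neg) + smoothing * n_bins
    return [
        math.log(((pos[b] + smoothing) / total_pos) / ((neg[b] + smoothing) / total_neg))
        for b in range(n_bins)
    ]


def woe_encode(value: float, edges: Sequence[float], table: Sequence[float]) -> float:
    """Replace a raw value with the WoE of its bin."""
    return table[assign_bin(value, edges)]


# Synthetic U-shaped example: the label is 1 only when |x| is large,
# so a single linear coefficient on raw x cannot separate the classes.
values = [i / 10 for i in range(-100, 101)]
labels = [1 if abs(x) > 5 else 0 for x in values]

edges = quantile_edges(values, n_bins=4)
table = woe_table(values, labels, edges)
encoded = [woe_encode(x, edges, table) for x in values]
```

After encoding, both tails of the feature map to positive values and the middle maps to negative values, so a logistic regression fitted on `encoded` needs only one coefficient, and each bin's contribution to the log-odds can be read directly from `table`.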
Pages: 23