An interpretable automated feature engineering framework for improving logistic regression

Cited by: 3
Authors
Liu, Mucan [1 ,2 ]
Guo, Chonghui [1 ]
Xu, Liangchen [1 ]
Affiliations
[1] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Hong Kong, Peoples R China
Keywords
Interpretable machine learning; Feature engineering; Automated machine learning; Knowledge distillation;
DOI
10.1016/j.asoc.2024.111269
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Although black-box models such as ensemble learning models often provide better predictive performance than intrinsically interpretable models such as logistic regression, black-box models are often not applicable due to their lack of interpretability. Recently, there has been an explosion of work on explainable machine learning techniques, which utilize external algorithms or models to explain the behavior of black-box models. However, such explanations are problematic because they might not reveal the real mechanism or decision process of the black-box model. In this study, instead of using explainable machine learning techniques, an automated feature engineering task was formulated to help logistic regression achieve predictive performance comparable to or even better than black-box models while maintaining interpretability. In this paper, an INterpretable Automated Feature ENgineering (INAFEN) framework was designed for logistic regression. The framework automatically transforms the nonlinear relationships between numerical features and labels into linear relationships, conducts feature crossing through association rule mining, and distills knowledge from black-box models. A case study on gastric cancer survival prediction was performed to present the rationality of the feature transformations produced by INAFEN, and benchmark experiments were conducted to show its validity. Experimental results on 10 classification tasks demonstrated that INAFEN achieved an average ranking of 2.60 in area under the ROC curve (AUROC), 3.35 in area under the PR curve (AUPRC), 3.70 in F1 score, and 3.00 in Brier score (among 13 models), outperforming other interpretable baselines and even black-box models. In addition, the interpretability measurement of INAFEN is significantly better than that of black-box models.
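As a rough illustration of the kind of pipeline the abstract describes (nonlinear-to-linear feature transformation, feature crossing, and knowledge distillation into logistic regression), the following minimal Python sketch may help. It is not the authors' INAFEN implementation: the quantile binning, the pairwise interaction crosses, and the soft-label distillation trick are assumptions chosen for illustration (the paper mines feature crosses with association rules), and all variable names are hypothetical.

# Minimal sketch (not the authors' INAFEN code): strengthen logistic regression by
# (1) transforming nonlinear numeric effects via binning, (2) adding simple feature
# crosses, and (3) distilling knowledge from a black-box teacher's soft predictions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# (1) Nonlinear-to-linear: discretize numeric features into one-hot bins so the
#     linear model can capture piecewise (hence nonlinear) effects.
binner = KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile")
X_tr_bin = binner.fit_transform(X_tr)
X_te_bin = binner.transform(X_te)

# (2) Feature crossing: approximated here with pairwise interaction terms on the
#     raw features (a stand-in for the association-rule-mined crosses).
crosser = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_tr_cross = crosser.fit_transform(X_tr)
X_te_cross = crosser.transform(X_te)

X_tr_aug = np.hstack([X_tr_bin, X_tr_cross])
X_te_aug = np.hstack([X_te_bin, X_te_cross])

# (3) Knowledge distillation: fit a black-box teacher, then let the student
#     logistic regression match its soft labels via a sample-weight trick.
teacher = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
soft = teacher.predict_proba(X_tr)[:, 1]

# Duplicate each sample with labels 1 and 0, weighted by the teacher's
# predicted probabilities, so the student fits the soft targets.
X_soft = np.vstack([X_tr_aug, X_tr_aug])
y_soft = np.concatenate([np.ones_like(y_tr), np.zeros_like(y_tr)])
w_soft = np.concatenate([soft, 1.0 - soft])

student = LogisticRegression(max_iter=5000)
student.fit(X_soft, y_soft, sample_weight=w_soft)

print("teacher AUROC:", roc_auc_score(y_te, teacher.predict_proba(X_te)[:, 1]))
print("student AUROC:", roc_auc_score(y_te, student.predict_proba(X_te_aug)[:, 1]))

The point of the sketch is only the overall pattern: the final model remains a plain logistic regression over engineered features, so its coefficients stay directly inspectable, while the black-box model contributes only through the training signal.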
Pages: 23
Related Papers
50 records in total
  • [1] Interpretable Functional Logistic Regression
    Lv, Cui
    Chen, Di-Rong
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2018), 2018,
  • [2] LbR: A New Regression Architecture for Automated Feature Engineering
    Wang, Meng
    Ding, Zhijun
    Pan, Meiqin
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2020), 2020, : 432 - 439
  • [3] Improving firefly algorithm-based logistic regression for feature selection
    Kahya, Mohammed Abdulrazaq
    Altamir, Suhaib Abduljabbar
    Algamal, Zakariya Yahya
    JOURNAL OF INTERDISCIPLINARY MATHEMATICS, 2019, 22 (08) : 1577 - 1581
  • [4] Ensemble Logistic Regression for Feature Selection
    Zakharov, Roman
    Dupont, Pierre
    PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 133 - 144
  • [5] Interpretable Diagnosis of ADHD Based on Wavelet Features and Logistic Regression
    Pastrana-Cortes, Julian D.
    Camila Maya-Piedrahita, Maria
    Marcela Herrera-Gomez, Paula
    Cardenas-Pena, David
    Orozco-Gutierrez, Alvaro A.
    PROGRESS IN ARTIFICIAL INTELLIGENCE AND PATTERN RECOGNITION, 2021, 13055 : 424 - 433
  • [6] Interpretable Feature Construction for Time Series Extrinsic Regression
    Gay, Dominique
    Bondu, Alexis
    Lemaire, Vincent
    Boulle, Marc
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT I, 2021, 12712 : 804 - 816
  • [7] Step down logistic regression for feature selection
    Baykal, N
    APPLIED STATISTICAL SCIENCE IV, 1999, 4 : 121 - 131
  • [8] Stable Feature Ranking with Logistic Regression Ensembles
    Nowling, Ronald J.
    Emrich, Scott J.
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 585 - 589
  • [9] An Experiment on Feature Selection Using Logistic Regression
    Islam, Raisa
    Mazumdar, Subhasish
    Islam, Rakibul
    2024 5TH INFORMATION COMMUNICATION TECHNOLOGIES CONFERENCE, ICTC 2024, 2024, : 319 - 324
  • [10] An automated exact solution framework towards solving the logistic regression best subset selection problem
    van Niekerk, Thomas K.
    Venter, Jacques V.
    Terblanche, Stephanus E.
    SOUTH AFRICAN STATISTICAL JOURNAL, 2023, 57 (02) : 89 - 129