An interpretable automated feature engineering framework for improving logistic regression

Cited by: 3
Authors
Liu, Mucan [1, 2]
Guo, Chonghui [1]
Xu, Liangchen [1]
Affiliations
[1] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Hong Kong, Peoples R China
Keywords
Interpretable machine learning; Feature engineering; Automated machine learning; Knowledge distillation
DOI
10.1016/j.asoc.2024.111269
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although black-box models such as ensemble learning models often provide better predictive performance than intrinsically interpretable models such as logistic regression, black-box models are still not applicable in some settings because they lack interpretability. Recently, there has been an explosion of work on explainable machine learning techniques, which use external algorithms or models to explain the behavior of black-box models. However, explaining black-box model behavior in this way is problematic because the explanation provided might not reveal the real mechanism or decision process of the black-box model. In this study, instead of using explainable machine learning techniques, an automated feature engineering task was formulated to help logistic regression achieve predictive performance comparable to or even better than black-box models while maintaining interpretability. To this end, an INterpretable Automated Feature ENgineering (INAFEN) framework was designed for logistic regression. This framework automatically transforms the nonlinear relationships between numerical features and labels into linear relationships, constructs feature crosses through association rule mining, and distills knowledge from black-box models. A case study was performed on gastric survival prediction to present the rationality of the feature transformations through INAFEN, and benchmark experiments were conducted to show the validity of INAFEN. Experimental results on 10 classification tasks demonstrated that INAFEN achieved an average ranking of 2.60 in area under the ROC curve (AUROC), 3.35 in area under the PR curve (AUPRC), 3.70 in F1 score and 3.00 in Brier score (among 13 models), outperforming other interpretable baselines and even black-box models. In addition, the interpretability measurement of INAFEN is significantly better than that of black-box models.
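The abstract does not say how INAFEN turns a nonlinear feature-label relationship into a linear one, so the sketch below is not the authors' method. It illustrates the general idea with a swapped-in, well-known technique: equal-frequency binning followed by weight-of-evidence (WoE) encoding. All names (`quantile_edges`, `bin_index`, `woe_table`), the synthetic U-shaped data, the bin count, and the smoothing constant are assumptions made for illustration.

```python
import math
import random


def quantile_edges(values, n_bins):
    """Return n_bins - 1 interior cut points at equal-frequency quantiles."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def bin_index(x, edges):
    """Index of the bin that x falls into (0 .. len(edges))."""
    return sum(x >= e for e in edges)


def woe_table(xs, ys, edges, smoothing=0.5):
    """Weight of evidence per bin: log(share of positives / share of negatives).

    The smoothing constant (an assumption) avoids log(0) for empty cells.
    """
    n_bins = len(edges) + 1
    pos = [smoothing] * n_bins
    neg = [smoothing] * n_bins
    for x, y in zip(xs, ys):
        b = bin_index(x, edges)
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    total_pos, total_neg = sum(pos), sum(neg)
    return [math.log((p / total_pos) / (n / total_neg)) for p, n in zip(pos, neg)]


# Synthetic data (hypothetical): the risk of y = 1 is U-shaped in x, so a
# logistic regression on raw x alone would find almost no linear signal.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(5000)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(x * x - 2))) else 0 for x in xs]

edges = quantile_edges(xs, 5)
woe = woe_table(xs, ys, edges)

# Replace each raw value by the WoE of its bin; this is the engineered feature.
x_transformed = [woe[bin_index(x, edges)] for x in xs]

for b, w in enumerate(woe):
    print(f"bin {b}: WoE = {w:+.3f}")
```

Within each bin, the empirical log-odds of the label equal the bin's WoE plus a constant (the overall log-odds), so the engineered feature is linear in log-odds by construction. Each bin's contribution to a logistic regression can then be read directly from the WoE table, which is the kind of interpretability the abstract aims to preserve.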
Pages: 23
Related papers
50 in total
  • [41] PRISMA: Improving risk estimation with parallel logistic regression trees
    Arnrich, B
    Albert, A
    Walter, J
    FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 87 - +
  • [42] Automated Feature Engineering using Kernel Functions
    Mahajan, Puneet
    2020 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2020,
  • [43] Automated feature engineering for HTTP tunnel detection
    Davis, Jonathan J.
    Foo, Ernest
    COMPUTERS & SECURITY, 2016, 59 : 166 - 185
  • [44] Cognito: Automated Feature Engineering for Supervised Learning
    Khurana, Udayan
    Turaga, Deepak
    Samulowitz, Horst
    Parthasrathy, Srinivasan
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 1304 - 1307
  • [45] A general, flexible, and harmonious framework to construct interpretable functions in regression analysis
    Zhan, Tianyu
    Kang, Jian
    BIOMETRICS, 2025, 81 (01)
  • [46] Feature and Language Selection in Temporal Symbolic Regression for Interpretable Air Quality Modelling
    Lucena-Sanchez, Estrella
    Sciavicco, Guido
    Stan, Ionel Eduard
    ALGORITHMS, 2021, 14 (03)
  • [47] Engineering Asset Life Span Evaluation Using Logistic Regression
    Trappey, Amy J. C.
    Trappey, Charles V.
    Tsao, Wan-Ting
    PROCEEDINGS OF THE 7TH WORLD CONGRESS ON ENGINEERING ASSET MANAGEMENT (WCEAM 2012), 2015, : 573 - 582
  • [48] An interpretable deep feature aggregation framework for machinery incremental fault diagnosis
    Hu, Kui
    Chen, Qian
    Yao, Jintao
    He, Qingbo
    Peng, Zhike
    ADVANCED ENGINEERING INFORMATICS, 2025, 65
  • [49] A sparse logistic regression framework by difference of convex functions programming
    Liming Yang
    Yannan Qian
    Applied Intelligence, 2016, 45 : 241 - 254
  • [50] A logistic regression framework for information technology outsourcing lifecycle management
    Mojsilovic, Aleksandra
    Ray, Bonnie
    Lawrence, Richard
    Takriti, Samer
    COMPUTERS & OPERATIONS RESEARCH, 2007, 34 (12) : 3609 - 3627