Symbolic regression as a feature engineering method for machine and deep learning regression tasks

Cited by: 1
Authors
Shmuel, Assaf [1]
Glickman, Oren [1]
Lazebnik, Teddy [2, 3]
Affiliations
[1] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[2] Ariel Univ, Dept Math, Ariel, Israel
[3] UCL, Canc Inst, Dept Canc Biol, London, England
Source
Keywords
symbolic regression; neural network; data-driven physics; feature engineering; data science; FEATURE-SELECTION; BIG DATA; MODEL;
DOI
10.1088/2632-2153/ad513a
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning (ML) and deep learning (DL) regression tasks, effective feature engineering (FE) is pivotal to model performance. Traditional FE approaches often rely on domain expertise to manually design features for ML models. In DL models, FE is embedded in the neural network's architecture, which makes it hard to interpret. In this study, we propose integrating symbolic regression (SR) as an FE step before an ML model to improve its performance. Through extensive experiments on synthetic datasets and 21 real-world datasets, we show that incorporating SR-derived features significantly enhances the predictive capabilities of both ML and DL regression models, with a 34%-86% root mean square error (RMSE) improvement on synthetic datasets and a 4%-11.5% improvement on real-world datasets. In an additional realistic use case, we show that the proposed method improves ML performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models, improving both their performance and their interpretability.
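The idea in the abstract, deriving a symbolic feature first and then passing it to a downstream regressor, can be illustrated with a small sketch. This is not the authors' pipeline: the brute-force search over pairwise products and ratios is only a stand-in for a real SR engine, the downstream model is plain least squares, and the synthetic target `y = x0 / x1 + noise` and all function names (`candidate_features`, `best_symbolic_feature`, `run_demo`) are assumptions made for illustration.

```python
import itertools
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def fit_linear(X, y):
    """Least squares with intercept: solve the normal equations by Gauss-Jordan."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict_linear(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def abs_corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return abs(cov) / (su * sv) if su > 0 and sv > 0 else 0.0


def candidate_features(n_features):
    """Hypothetical toy grammar: products and ratios of feature pairs."""
    ops = {"*": lambda a, b: a * b, "/": lambda a, b: a / b}
    for i, j in itertools.permutations(range(n_features), 2):
        for name, op in ops.items():
            yield f"x{i} {name} x{j}", (lambda r, i=i, j=j, op=op: op(r[i], r[j]))


def best_symbolic_feature(X, y):
    """Stand-in for SR: keep the expression most correlated with the target."""
    scored = ((abs_corr([f(r) for r in X], y), name, f)
              for name, f in candidate_features(len(X[0])))
    _, name, f = max(scored, key=lambda t: t[0])
    return name, f


def run_demo(seed=0, n=400):
    rng = random.Random(seed)
    # Features in [1, 3] so that ratios are always well defined (assumption).
    X = [[rng.uniform(1.0, 3.0) for _ in range(3)] for _ in range(n)]
    y = [r[0] / r[1] + rng.gauss(0.0, 0.05) for r in X]
    half = n // 2
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]

    # Baseline: the downstream model sees only the raw features.
    rmse_base = rmse(y_te, predict_linear(fit_linear(X_tr, y_tr), X_te))

    # SR-as-FE: find an expression on the training split, append it as a column.
    name, f = best_symbolic_feature(X_tr, y_tr)
    augment = lambda rows: [r + [f(r)] for r in rows]
    w_sr = fit_linear(augment(X_tr), y_tr)
    rmse_sr = rmse(y_te, predict_linear(w_sr, augment(X_te)))
    return name, rmse_base, rmse_sr


name, rmse_base, rmse_sr = run_demo()
print(f"selected feature: {name}")
print(f"RMSE raw features: {rmse_base:.4f}, with symbolic feature: {rmse_sr:.4f}")
```

The symbolic expression is chosen on the training split only and then applied unchanged to the test split, so the comparison of the two RMSE values is not contaminated by test data. In practice the toy search would be replaced by a dedicated SR tool and the linear model by whichever ML or DL regressor is being studied.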
Pages: 14