Choosing function sets with better generalisation performance for symbolic regression models

Cited by: 15
Authors
Nicolau, Miguel [1]
Agapitos, Alexandros [2]
Affiliations
[1] Univ Coll Dublin, Coll Business, Dublin, Ireland
[2] Huawei Technol Ltd, Ireland Res Ctr, Dublin, Ireland
Keywords
Symbolic regression; Genetic Programming; Machine learning; Generalisation; Overfitting; Data-driven modelling; REGULARIZATION APPROACH; BLOAT CONTROL; PREDICTION; ENSEMBLE; RISK; GP
DOI
10.1007/s10710-020-09391-4
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Supervised learning by means of Genetic Programming (GP) aims at the evolutionary synthesis of a model that achieves a balance between approximating the target function on the training data and generalising on new data. The model space searched by the Evolutionary Algorithm is populated by compositions of primitive functions defined in a function set. Since the target function is unknown, the choice of the function set's constituent elements is primarily guided by the makeup of function sets traditionally used in the GP literature. Our work builds upon previous research on the effects of protected arithmetic operators (i.e. division, logarithm, power) on the output value of an evolved model for input data points not encountered during training. The scope is to benchmark the approximation/generalisation performance of models evolved using different function set choices across a range of 43 symbolic regression problems. The salient outcomes are as follows. Firstly, Koza's protected operators of division and exponentiation have a detrimental effect on generalisation, and should therefore be avoided. This result is invariant to the use of moderately sized validation sets for model selection. Secondly, the performance of the recently introduced analytic quotient operator is comparable to that of the sinusoidal operator on average, with their combination being advantageous to both approximation and generalisation. These findings are consistent across two different system implementations: standard expression-tree GP and linear Grammatical Evolution. We highlight that this study employed very large test sets, which give confidence when benchmarking the effect of different combinations of primitive functions on model generalisation. Our aim is to encourage GP researchers and practitioners to use similarly stringent means of assessing the generalisation of evolved models where possible, and also to avoid certain primitive functions that are known to be inappropriate.
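For readers unfamiliar with the two division primitives contrasted in the abstract, the following is a minimal illustrative sketch (not taken from the paper): Koza's protected division, which patches the singularity at a zero denominator with an arbitrary constant, versus the analytic quotient, which is smooth everywhere and is the operator the study finds to generalise better.

```python
import math

def protected_div(a, b):
    # Koza-style protected division: returns the arbitrary constant 1.0
    # whenever the denominator is 0. The discontinuity this introduces is
    # linked in the paper to poor generalisation on unseen inputs.
    return a / b if b != 0 else 1.0

def analytic_quotient(a, b):
    # Analytic quotient: a / sqrt(1 + b^2). Defined and smooth for all b,
    # so no special case is needed at b = 0; it behaves like division
    # for |b| >> 1 while avoiding the singularity near b = 0.
    return a / math.sqrt(1.0 + b * b)
```

Both operators take the place of raw division in a GP function set; the difference only becomes visible on inputs whose denominators approach zero, which is exactly where protected division's constant fallback distorts model outputs.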
Pages: 73-100 (28 pages)