On the Importance of Data Balancing for Symbolic Regression

Times cited: 28
Authors
Vladislavleva, Ekaterina [1]
Smits, Guido [2]
den Hertog, Dick [3]
Affiliations
[1] Univ Antwerp, Dept Math & Comp Sci, B-2000 Antwerp, Belgium
[2] Dow Benelux BV, Core Res & Dev Dept, NL-4530 Terneuzen, Netherlands
[3] Tilburg Univ, Dept Econometr & Operat Res, Fac Econ & Business Adm, NL-5000 LE Tilburg, Netherlands
Keywords
Compression; data balancing; data scoring; data weighting; fitting; genetic programming; information content; modeling; subset selection; symbolic regression; outliers
DOI
10.1109/TEVC.2009.2029697
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Symbolic regression of input-output data conventionally treats all data records equally. We suggest a framework for automatically assigning weights to data samples that takes into account each sample's relative importance. In this paper, we study how symbolic regression on real-life data can be improved by incorporating these weights into the fitness function. We introduce four weighting schemes that define the importance of a point based on proximity, surrounding, remoteness, and nonlinear deviation with respect to its k nearest neighbors in the input space. For enhanced analysis and modeling of large imbalanced data sets, we introduce a simple multidimensional iterative technique for subsampling. This technique allows a sensible partitioning (and compression) of the data into nested subsets of arbitrary size, such that the subsets are balanced with respect to any of the presented weighting schemes. For cases where a given input-output data set contains some redundancy, we suggest an approach that considerably improves the effectiveness of regression by applying more modeling effort to a smaller subset of the data that has a similar information content. The improvement comes from better exploration of the search space of potential solutions at the same number of function evaluations. We compare different approaches to regression on five benchmark problems with a fixed budget allocation. We demonstrate that a significant improvement in the quality of the regression models can be obtained with weighted regression, with exploratory regression on a compressed subset with a similar information content, or with exploratory weighted regression on a compressed subset weighted with one of the proposed weighting schemes.
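The abstract names the weighting schemes and the subsampling technique without defining them. The Python sketch below illustrates one possible reading under loudly stated assumptions, and it is not the authors' exact formulation. The names `proximity_weights`, `weighted_mse`, and `balanced_ranking` are hypothetical. The proximity weight is assumed to be the mean Euclidean distance from a point to its k nearest input-space neighbors. The subsampling loop is assumed to repeatedly drop the most crowded point and recompute weights, so that prefixes of the resulting ranking form nested, density-balanced subsets.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def proximity_weights(X: Sequence[Point], k: int, active: Sequence[int]) -> Dict[int, float]:
    """Hypothetical proximity weight (an assumption, not the paper's definition):
    mean Euclidean distance from each active point to its k nearest active
    neighbours in the input space. Isolated points get large weights; points
    in crowded regions get small weights."""
    if len(active) < 2:
        raise ValueError("need at least two active points")
    weights: Dict[int, float] = {}
    for i in active:
        # Distances to every other active point, nearest first.
        dists = sorted(math.dist(X[i], X[j]) for j in active if j != i)
        nearest = dists[:k]
        weights[i] = sum(nearest) / len(nearest)
    return weights


def weighted_mse(y: Sequence[float], y_hat: Sequence[float], w: Sequence[float]) -> float:
    """Weighted mean squared error, usable as a weighted fitness function."""
    total = sum(w)
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, y, y_hat)) / total


def balanced_ranking(X: Sequence[Point], k: int) -> List[int]:
    """Assumed iterative subsampling: repeatedly drop the active point with the
    smallest proximity weight (the most crowded one), recomputing weights after
    each removal, until only k + 1 points remain."""
    active = list(range(len(X)))
    removed: List[int] = []
    while len(active) > k + 1:
        w = proximity_weights(X, k, active)
        victim = min(active, key=lambda i: w[i])
        active.remove(victim)
        removed.append(victim)
    # Survivors first, then removed points in reverse removal order, so for any
    # m >= k + 1 the first m indices are exactly the points that were still
    # active when m points remained (nested subsets).
    return active + removed[::-1]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]
    order = balanced_ranking(X, k=1)
    print("ranking:", order)
    print("2-point subset:", [X[i] for i in order[:2]])
```

With `X = [[0.0], [1.0], [2.0], [10.0]]` and `k = 1`, the isolated point at 10.0 survives into the two-point subset, while points from the crowded cluster are dropped first. Weights from `proximity_weights` could then be passed to `weighted_mse` to score candidate models on the compressed subset.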
Pages: 252-277
Number of pages: 26