Abstract
Transformation-Interaction-Rational is a representation for symbolic regression that limits the search space to functions expressed as the ratio of two nonlinear functions, each defined as a linear regression of transformed variables. The main objective of this representation is to bias the search towards simpler expressions while keeping the approximation power of standard approaches. Genetic Programming with this representation performed substantially better than with its predecessor (Interaction-Transformation) and ranked close to the state of the art on a contemporary Symbolic Regression benchmark. On a closer look at these results, we observed that the performance could be further improved with an additional selective pressure for smaller expressions when the dataset contains just a few data points. Introducing a penalization term applied to the fitness measure improved the results on these smaller datasets. One problem with this approach is that it introduces two additional hyperparameters: (i) a criterion for when the penalization should be activated, and (ii) the amount of penalization applied to the fitness function. One possible way to alleviate the burden of correctly setting these hyperparameters is to pose the search as a multi-objective optimization problem that minimizes both the approximation error and the expression size. The main idea is that the selective pressure towards non-dominated solutions will return the simplest model for each particular approximation error in the Pareto front. In this paper, we extend Transformation-Interaction-Rational to support multi-objective optimization, specifically the NSGA-II algorithm, and apply it to the same benchmark. A detailed analysis of the results shows that multi-objective optimization benefits the overall performance on a subset of the benchmarks while keeping the results similar to the single-objective approach on the remainder of the datasets. For the small datasets specifically, we observe a small (and statistically insignificant) improvement in the results, suggesting that further strategies must be explored.
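To make the notion of a Pareto front over approximation error and expression size concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes each candidate model is summarized by a hypothetical (error, size) pair with both objectives minimized, and it extracts only the non-dominated set; NSGA-II additionally ranks the remaining fronts and uses a crowding-distance measure, which are omitted here. The names `dominates`, `pareto_front`, and the candidate values are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical summary of a model: (approximation error, expression size).
# Both objectives are minimized.
Model = Tuple[float, int]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, sorted by increasing expression size."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(set(front), key=lambda m: m[1])


# Illustrative (made-up) candidates: error and number of nodes.
candidates = [(0.10, 12), (0.10, 7), (0.25, 3), (0.30, 5), (0.05, 20)]
front = pareto_front(candidates)
print(front)
```

In this toy set, the model with error 0.10 and size 12 is discarded because another model reaches the same error with size 7, which is the "simplest model for each approximation error" behavior described above.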
Acknowledgements
This project is funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Grant Number 2021/12706-1 and CNPq through the Grant 301596/2022-0.
Author information
Contributions
FOF wrote the main manuscript, implemented the algorithm, executed the experiments, and prepared all figures and tables.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
de França, F.O. Alleviating overfitting in transformation-interaction-rational symbolic regression with multi-objective optimization. Genet Program Evolvable Mach 24, 13 (2023). https://doi.org/10.1007/s10710-023-09461-3
DOI: https://doi.org/10.1007/s10710-023-09461-3