
A geometric semantic macro-crossover operator for evolutionary feature construction in regression

Genetic Programming and Evolvable Machines

Abstract

Evolutionary feature construction has been successfully applied in a wide range of scenarios. In particular, feature construction methods based on multi-tree genetic programming have shown promising results. However, existing crossover operators in multi-tree genetic programming mainly focus on exchanging genetic material between two trees, neglecting the interactions among the multiple trees within an individual. To improve search effectiveness, we take inspiration from the geometric semantic crossover operator used in single-tree genetic programming and propose a geometric semantic macro-crossover operator for multi-tree genetic programming. The operator is designed for feature construction, with the goal of generating offspring that contain informative and complementary features. Experiments on 98 regression datasets show that the proposed geometric semantic macro-crossover operator significantly improves the predictive performance of the constructed features. Moreover, experiments on a state-of-the-art regression benchmark demonstrate that multi-tree genetic programming with the geometric semantic macro-crossover operator can significantly outperform all 22 machine learning algorithms in the benchmark.
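For readers unfamiliar with the single-tree operator that inspired this work, here is a minimal Python sketch of geometric semantic crossover (Moraglio et al.) on real-valued semantics. It is not the authors' macro-crossover. Assumptions: the semantics of a tree are a NumPy vector of its outputs on the training cases, and the mixing weight is a single random scalar r in [0, 1] rather than the random function with outputs in [0, 1] used in the original formulation. The function names are hypothetical.

```python
import numpy as np


def geometric_semantic_crossover(s1: np.ndarray, s2: np.ndarray,
                                 rng: np.random.Generator) -> np.ndarray:
    """Return offspring semantics on the segment between two parents.

    s1, s2: output vectors of the two parent trees on the training cases.
    Assumption: one scalar weight r ~ U[0, 1] is shared by all cases,
    a simplification of the random function with outputs in [0, 1].
    """
    r = rng.uniform(0.0, 1.0)
    return r * s1 + (1.0 - r) * s2


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-squared error, a scaled Euclidean distance in semantic space."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([1.0, 2.0, 3.0])
    parent1 = np.array([0.0, 2.5, 2.0])
    parent2 = np.array([1.5, 1.0, 4.0])
    child = geometric_semantic_crossover(parent1, parent2, rng)
    # The child's error is bounded by the worse parent's error.
    print(rmse(parent1, target), rmse(parent2, target), rmse(child, target))
```

Because the child lies on the line segment between its parents in semantic space and RMSE is a convex function of the prediction vector, the child's RMSE cannot exceed that of the worse parent. This geometric property is what the macro-crossover aims to carry over to individuals made of multiple trees.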


Notes

  1. Source code: https://tinyurl.com/MAPMX-GPFC


Acknowledgements

The authors would like to acknowledge the assistance of the volunteer evaluators and the helpful comments of the reviewers, which have significantly improved the paper.

Funding

This work was supported in part by the Marsden Fund of the New Zealand Government under Contracts VUW1913, VUW1914, and VUW2016; the MBIE Data Science SSIF Fund under contract RTVU1914; Huayin Medical under grant E3791/4165; and the MBIE Endeavor Research Programme under contracts C11X2001 and UOCX2104.

Author information

Contributions

Hengzhe Zhang, Qi Chen, and Mengjie Zhang designed the algorithm and experimental protocol. Hengzhe Zhang implemented the code and conducted the experiments. All authors analyzed the results. Hengzhe Zhang drafted the paper, and all authors edited the manuscript.

Corresponding author

Correspondence to Qi Chen.

Ethics declarations

Conflict of interest

The authors are not aware of any competing interests.

Supplementary Information

Supplementary file 1 (pdf 129 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Chen, Q., Xue, B. et al. A geometric semantic macro-crossover operator for evolutionary feature construction in regression. Genet Program Evolvable Mach 25, 2 (2024). https://doi.org/10.1007/s10710-023-09465-z

  • DOI: https://doi.org/10.1007/s10710-023-09465-z
