
A gradient-based bilevel optimization approach for tuning regularization hyperparameters

  • Original paper
  • Published:
Optimization Letters

Abstract

Hyperparameter tuning in machine learning is often performed with naive techniques such as random search and grid search. However, these methods seldom find an optimal set of hyperparameters and can become very expensive. The hyperparameter optimization problem is inherently a bilevel optimization task, and several studies have applied bilevel solution methodologies to it. These techniques typically assume a unique set of weights that minimizes the loss on the training set, an assumption that is violated by deep learning architectures. We propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from this drawback. The proposed method is general and can be applied to any class of machine learning algorithms with continuous hyperparameters. The idea is based on approximating the lower-level optimal value function mapping, which reduces the bilevel problem to a single-level constrained optimization task; the single-level problem is then solved using the augmented Lagrangian method. We perform an extensive computational study on three datasets that confirms the efficiency of the proposed method. A comparative study against grid search, random search, the Tree-structured Parzen Estimator, and the Quasi Monte Carlo Sampler shows that the proposed algorithm is several times faster and leads to models that generalize better on the testing set.
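In standard bilevel notation (ours, not taken verbatim from the paper), the problem the abstract describes is: for hyperparameters \lambda, model weights w, validation loss F, and training loss f,

    \min_{\lambda}\; F(\lambda, w^{*}(\lambda))
    \quad \text{s.t.} \quad
    w^{*}(\lambda) \in \operatorname*{arg\,min}_{w} f(\lambda, w).

Defining the lower-level optimal value function \varphi(\lambda) = \min_{w} f(\lambda, w) and replacing it with an approximation \hat{\varphi} gives the single-level reduction the abstract refers to,

    \min_{\lambda,\, w}\; F(\lambda, w)
    \quad \text{s.t.} \quad
    f(\lambda, w) \le \hat{\varphi}(\lambda),

which can then be handled with an augmented Lagrangian.

Below is a minimal sketch of that pipeline on ridge regression, where the lower-level problem has a closed-form solution. It is an illustration under our own assumptions, not the authors' implementation: the cubic surrogate for \varphi, the step sizes, the penalty schedule, and all variable names are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data with a train/validation split.
    n, d = 80, 10
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.5 * rng.normal(size=n)
    Xt, yt, Xv, yv = X[:60], y[:60], X[60:], y[60:]

    def train_loss(w, lam):   # lower-level objective f(lam, w)
        r = Xt @ w - yt
        return r @ r / len(yt) + lam * (w @ w)

    def val_loss(w):          # upper-level objective F(w)
        r = Xv @ w - yv
        return r @ r / len(yv)

    def ridge_argmin(lam):    # lower-level argmin is closed form for ridge
        A = Xt.T @ Xt / len(yt) + lam * np.eye(d)
        return np.linalg.solve(A, Xt.T @ yt / len(yt))

    # Step 1: sample the optimal value function phi(lam) = min_w f(lam, w).
    lams = np.linspace(1e-4, 2.0, 15)
    phis = np.array([train_loss(ridge_argmin(l), l) for l in lams])

    # Step 2: fit a smooth surrogate phi_hat (a cubic here, purely for
    # illustration; the paper builds a more careful approximation).
    phi_hat = np.poly1d(np.polyfit(lams, phis, deg=3))
    dphi_hat = phi_hat.deriv()

    # Step 3: augmented Lagrangian for
    #   min_{w, lam} F(w)  s.t.  f(lam, w) - phi_hat(lam) <= 0,
    # minimized in (w, lam) by gradient steps, with multiplier u and
    # penalty rho updated in the outer loop.
    w = np.zeros(d)
    lam, u, rho, lr = 1.0, 0.0, 10.0, 1e-3

    for outer in range(20):
        for _ in range(300):
            g = train_loss(w, lam) - phi_hat(lam)   # constraint value
            active = max(0.0, g + u / rho)
            grad_f_w = 2 * Xt.T @ (Xt @ w - yt) / len(yt) + 2 * lam * w
            grad_w = 2 * Xv.T @ (Xv @ w - yv) / len(yv) + rho * active * grad_f_w
            grad_lam = rho * active * (w @ w - dphi_hat(lam))
            w -= lr * grad_w
            lam = float(np.clip(lam - lr * grad_lam, 1e-4, 2.0))  # stay in fit range
        u = max(0.0, u + rho * (train_loss(w, lam) - phi_hat(lam)))
        rho *= 1.2

    print(f"tuned lam = {lam:.4f}, validation MSE = {val_loss(w):.4f}")

A run prints the tuned regularization weight and its validation error. The sampling range for lam, the polynomial degree, and the augmented Lagrangian constants would all need adjustment for a real model; the closed-form lower-level solve would be replaced by training runs.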



Author information


Corresponding author

Correspondence to Ankur Sinha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sinha, A., Khandait, T. & Mohanty, R. A gradient-based bilevel optimization approach for tuning regularization hyperparameters. Optim Lett (2023). https://doi.org/10.1007/s11590-023-02057-x


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11590-023-02057-x
