Abstract
Hyperparameter tuning in machine learning is often performed using naive techniques, such as random search and grid search. However, these methods seldom lead to an optimal set of hyperparameters and are often computationally expensive. The hyperparameter optimization problem is inherently a bilevel optimization task, and several studies have applied bilevel solution methodologies to it. These techniques typically assume a unique set of weights that minimizes the loss on the training set, an assumption that is violated by deep learning architectures. We propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from this drawback. The proposed method is general and can be readily applied to any class of machine learning algorithms with continuous hyperparameters. The idea is to approximate the lower-level optimal value function mapping, which reduces the bilevel problem to a single-level constrained optimization task. The single-level constrained optimization problem is then solved using the augmented Lagrangian method. An extensive computational study on three datasets confirms the efficiency of the proposed method. A comparative study against grid search, random search, the Tree-structured Parzen Estimator, and the Quasi-Monte Carlo sampler shows that the proposed algorithm is multiple times faster and leads to models that generalize better on the testing set.
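The reduction described in the abstract can be sketched for a single ridge-regularization hyperparameter. The sketch below is illustrative only, not the authors' implementation: it stands in for the paper's approximated lower-level optimal value function φ(λ) with ridge regression's closed form, and the names (`L_tr`, `L_va`, `phi`) and the penalty schedule are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic regression data, split into training and validation sets.
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=120)
Xtr, ytr, Xva, yva = X[:80], y[:80], X[80:], y[80:]

def L_tr(w, lam):  # lower-level objective: regularized training loss
    return np.mean((Xtr @ w - ytr) ** 2) + lam * w @ w

def L_va(w):       # upper-level objective: validation loss
    return np.mean((Xva @ w - yva) ** 2)

def phi(lam):      # lower-level optimal value function (closed form for ridge)
    n = len(ytr)
    w_star = np.linalg.solve(Xtr.T @ Xtr / n + lam * np.eye(5), Xtr.T @ ytr / n)
    return L_tr(w_star, lam)

# Single-level reduction:  min_{w, lam} L_va(w)  s.t.  L_tr(w, lam) - phi(lam) = 0
# (the constraint is >= 0 by definition of phi, so it behaves as an equality).
def aug_lagrangian(z, mu, rho):
    w, lam = z[:5], max(z[5], 1e-8)
    c = L_tr(w, lam) - phi(lam)
    return L_va(w) + mu * c + 0.5 * rho * c ** 2

mu, rho = 0.0, 10.0
z = np.r_[np.zeros(5), 0.1]            # initial weights and hyperparameter
for _ in range(15):                    # outer augmented-Lagrangian loop
    z = minimize(aug_lagrangian, z, args=(mu, rho), method="BFGS").x
    lam = max(z[5], 1e-8)
    mu += rho * (L_tr(z[:5], lam) - phi(lam))  # multiplier update

print(f"lambda = {lam:.4f}, validation loss = {L_va(z[:5]):.4f}")
```

In the paper's setting φ(λ) has no closed form and must be approximated from sampled lower-level solutions; the surrounding scheme is the standard augmented Lagrangian loop.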
Cite this article
Sinha, A., Khandait, T. & Mohanty, R. A gradient-based bilevel optimization approach for tuning regularization hyperparameters. Optim Lett (2023). https://doi.org/10.1007/s11590-023-02057-x