Skip to main content

Advertisement

Log in

Solving trust region subproblems using Riemannian optimization

  • Published:
Numerische Mathematik Aims and scope Submit manuscript

Abstract

The Trust Region Subproblem is a fundamental optimization problem that takes a pivotal role in Trust Region Methods. However, the problem, and variants of it, also arise in quite a few other applications. In this article, we present a family of iterative Riemannian optimization algorithms for a variant of the Trust Region Subproblem that replaces the inequality constraint with an equality constraint, and converge to a global optimum. Our approach uses either a trivial or a non-trivial Riemannian geometry of the search-space, and requires only minimal spectral information about the quadratic component of the objective function. We further show how the theory of Riemannian optimization promotes a deeper understanding of the Trust Region Subproblem and its difficulties, e.g., a deep connection between the Trust Region Subproblem and the problem of finding affine eigenvectors, and a new examination of the so-called hard case in light of the condition number of the Riemannian Hessian operator at a global optimum. Finally, we propose to incorporate preconditioning via a careful selection of a variable Riemannian metric, and establish bounds on the asymptotic convergence rate in terms of how well the preconditioner approximates the input matrix.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. Standard in the sense that it uses the most natural choice of retraction and Riemannian metric.

  2. We caution that [22] does not use the term “affine eigenvalues”.

  3. This statement is simple corollary of Lemma 2.2 in [20].

  4. In spite of the method’s simplicity, we are not aware of any descriptions of this method earlier then Phan et al.’s (relatively) recent work.

References

  1. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)

    Book  MATH  Google Scholar 

  2. Adachi, S., Iwata, S., Nakatsukasa, Y., Takeda, A.: Solving the trust-region subproblem by a generalized eigenvalue problem. SIAM J. Optim. 27(1), 269–291 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bakshi, A., Chepurko, N., Jayaram, R.: Testing positive semi-definiteness via random submatrices. In: 61st Annual IEEE Symposium on Foundations of Computer Science (2020)

  4. Beck, A., Vaisbourd, Y.: Globally solving the trust region subproblem using simple first-order methods. SIAM J. Optim. 28(3), 1951–1967 (2018)

    Article  MathSciNet  MATH  Google Scholar 

  5. Boumal, N.: An Introduction to Optimization on Smooth Manifolds. To appear with Cambridge University Press (2022)

  6. Boumal, N., Voroninski, V., Bandeira, A.S.: Deterministic guarantees for Burer–Monteiro factorizations of smooth semidefinite programs. Commun. Pure Appl. Math. 73(3), 581–608 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  7. Carmon, Y., Duchi, J.C.: Analysis of Krylov subspace solutions of regularized non-convex quadratic problems. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 10705–10715. Curran Associates Inc., Red Hook (2018)

    Google Scholar 

  8. Chambers, L.G., Fletcher, R.: Practical methods of optimization. Math. Gaz. 85(504), 562 (2001)

    Article  Google Scholar 

  9. Chapelle, O., Sindhwani, V., Keerthi, S.: Branch and bound for semi-supervised support vector machines. In: Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, pp. 217–224. MIT Press, Cambridge (2006)

  10. Cucuringu, M., Tyagi, H.: Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping. arXiv e-prints arXiv:1803.03669 (2018)

  11. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  12. Gander, W., Golub, G.H., von Matt, U.: A constrained eigenvalue problem. Linear Algebra Appl. 114–115, 815–839 (1989). (Special Issue Dedicated to Alan J. Hoffman)

    Article  MathSciNet  MATH  Google Scholar 

  13. Golub, G.H., von Matt, U.: Quadratically constrained least squares and quadratic problems. Numer. Math. 59(1), 561–580 (1991)

    Article  MathSciNet  MATH  Google Scholar 

  14. Gould, N., Lucidi, S., Roma, M., Toint, P.: Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9(2), 504–525 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  15. Hager, W.W.: Minimizing a quadratic over a sphere. SIAM J. Optim. 12(1), 188–208 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  16. Han, I., Malioutov, D., Avron, H., Shin, J.: Approximating spectral sums of large-scale matrices using stochastic Chebyshev approximations. SIAM J. Sci. Comput. 39(4), A1558–A1585 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  17. Joachims, T.: Transductive learning via spectral graph partitioning. In: Proceedings of the Twentieth International Conference on International Conference on Machine Learning, ICML’03, pp. 290–297. AAAI Press (2003)

  18. Lucidi, S., Palagi, L., Roma, M.: On some properties of quadratic programs with a convex quadratic constraint. SIAM J. Optim. 8(1), 105–122 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  19. Luenberger, D.G.: Linear and nonlinear programming. Math. Comput. Simul. 28(1), 78 (1986)

    Article  Google Scholar 

  20. Martínez, J.M.: Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM J. Optim. 4(1), 159–176 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  21. Mishra, B., Sepulchre, R.: Riemannian preconditioning. SIAM J. Optim. 26(1), 635–660 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  22. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)

    Article  MathSciNet  MATH  Google Scholar 

  23. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. (2006)

  24. Phan, A.-H., Yamagishi, M., Mandic, D., Cichocki, A.: Quadratic programming over ellipsoids with applications to constrained linear regression and tensor decomposition. Neural Comput. Appl. 32(11), 7097–7120 (2020)

    Article  Google Scholar 

  25. Shustin, B., Avron, H.: Randomized Riemannian preconditioning for orthogonality constrained problems (2020)

  26. Tropp, J.A., Yurtsever, A., Udell, M., Cevher, V.: Practical sketching algorithms for low-rank matrix approximation. SIAM J. Matrix Anal. Appl. 38(4), 1454–1485 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  27. Vandereycken, B., Vandewalle, S.: A Riemannian optimization approach for computing low-rank solutions of Lyapunov equations. SIAM J. Matrix Anal. Appl. 31(5), 2553–2579 (2010)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haim Avron.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by Israel Science Foundation Grant 1272/17.

Appendix A: Constructing \(\phi \)

Appendix A: Constructing \(\phi \)

We now show a simple way to construct a smooth \(\phi (\cdot )\) fulfilling both requirements stated in Items 1 and 2 in Sect. 6.

First, we choose some time parameter \(\epsilon > 0\) and set \( d:= \lambda _{\min }(\textbf{M}) - \epsilon \). We construct \(\phi (\alpha )\) to be a smoothed out over-estimation of \( \max (\alpha , -d) \). We first construct an under estimation. Let \(\gamma >1\) be another parameter, and define

$$\begin{aligned} \varphi (\alpha )=\frac{\alpha +d}{2}\left( 1-\tanh \left( -\gamma \left( \alpha +d\right) \right) \right) -d. \end{aligned}$$

Then \( \varphi (\alpha ) \) is a smooth function that approximates \(\max (\alpha , -d) \), but it is an under estimation: \(\varphi (\alpha ) \le \max (\alpha , -d) \).

While the difference between \( \max (\alpha , -d) \) and \(\varphi (\alpha )\) reduces significantly when \( \gamma \) grows, we want to make sure that our approximation is greater than or equal to the \( \max (\alpha , -d) \). To that end, let

$$\begin{aligned} \alpha _0:= - \frac{\textrm{W}\left[ 0,e^{-1}\right] +1}{2\gamma } - d, \end{aligned}$$

where \(\textrm{W}\left[ 0,\cdot \right] \) is the zero branch of the Lambert-\( \textrm{W}\) function. Now, set:

$$\begin{aligned} \phi (\alpha ):=\varphi (\alpha )-\varphi (\alpha _{0})-d. \end{aligned}$$
(A.1)

See Fig. 5 for a graphical illustration of \(\phi \). It is possible to show that

$$\begin{aligned} 0 \le \phi (\alpha ) - \max (\alpha , -\lambda _{\min }(\textbf{M})) \le \frac{\textrm{W}\big [0,e^{-1}\big ]+1}{2\gamma } \big (1 - \tanh \big (\big (\textrm{W}\big [0,e^{-1}\big ]+1\big )/2\big )\big )+ \epsilon , \end{aligned}$$
(A.2)

so Item 1 holds, and the approximation error (Item 2) is small if \(\epsilon \) is sufficiently small and \(\gamma \) is sufficiently large. The proof of Eq. (A.2) is rather technical and does not convey any additional insight on the btrs and its solution, so we omit it.

Fig. 5
figure 5

Illustration of the approximation’s behavior. Values of the variable \(\alpha \) are depicted in the x axis. Note that \( \phi (-\alpha ) > - \lambda _{\min }(\textbf{M}) \) for all \( \alpha \), while in addition \(\phi (\alpha ) \) well approximates \( \alpha \) for values of \(\alpha \ge -\lambda _{\min }(\textbf{M})\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mor, U., Shustin, B. & Avron, H. Solving trust region subproblems using Riemannian optimization. Numer. Math. 154, 1–33 (2023). https://doi.org/10.1007/s00211-023-01360-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00211-023-01360-0

Mathematics Subject Classification

Navigation