
Global stability of first-order methods for coercive tame functions

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

We consider first-order methods with constant step size for minimizing locally Lipschitz coercive functions that are tame in an o-minimal structure on the real field. We prove that if the method is approximated by subgradient trajectories, then the iterates eventually remain in a neighborhood of a connected component of the set of critical points. Under suitable method-dependent regularity assumptions, this result applies to the subgradient method with momentum, the stochastic subgradient method with random reshuffling and momentum, and the random-permutations cyclic coordinate descent method.
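For intuition, the first setting named in the abstract, the subgradient method with momentum at a constant step size, corresponds to the heavy-ball iteration x_{k+1} = x_k + β(x_k − x_{k−1}) − α g_k with g_k ∈ ∂f(x_k). The sketch below is an illustrative implementation under assumed values of the step size α, the momentum β, and a toy objective; it is not the paper's Algorithm 1.

```python
import numpy as np

def heavy_ball_subgradient(subgrad, x0, alpha=1e-3, beta=0.9, iters=10_000):
    """Constant-step-size subgradient method with heavy-ball momentum.

    subgrad(x) must return some Clarke subgradient of the objective at x;
    the step size alpha and momentum beta are held constant throughout.
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        g = subgrad(x)
        # momentum term plus a constant-step subgradient step
        x_next = x + beta * (x - x_prev) - alpha * g
        x_prev, x = x, x_next
    return x

# Toy coercive, locally Lipschitz, tame objective: f(x) = ||x||_1 + ||x||^2 / 2.
subgrad_f = lambda x: np.sign(x) + x      # one valid Clarke subgradient everywhere
x_final = heavy_ball_subgradient(subgrad_f, np.array([3.0, -2.0]))
```

With a small constant step size the iterates need not converge to a single point; consistent with the abstract, they eventually remain in a neighborhood of a connected component of the set of critical points (here, near the origin).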



Notes

  1. https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD.

  2. https://pytorch.org/docs/stable/generated/torch.optim.SGD.html.

  3. https://scikit-learn.org/stable/modules/sgd.html.
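The footnotes above point to library implementations of SGD with a constant learning rate and momentum. As a minimal usage sketch of the PyTorch variant (the learning rate, momentum value, and toy objective below are assumptions for illustration, not settings from the paper):

```python
import torch

# Toy parameters and a nonsmooth coercive objective; lr and momentum are illustrative.
w = torch.zeros(10, requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-2, momentum=0.9)  # constant step size with momentum

for _ in range(200):
    opt.zero_grad()
    loss = (w - 1.0).abs().sum() + 0.5 * (w ** 2).sum()
    loss.backward()   # autograd returns a (sub)gradient of the nonsmooth loss
    opt.step()
```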


Acknowledgements

We thank the reviewers and the co-editor for their valuable feedback. We acknowledge support from NSF EPCN Grant 2023032 and ONR Grant N00014-21-1-2282.

Author information

Corresponding author

Correspondence to Cédric Josz.



About this article


Cite this article

Josz, C., Lai, L. Global stability of first-order methods for coercive tame functions. Math. Program. (2023). https://doi.org/10.1007/s10107-023-02020-9

