Abstract
This work presents a universal accelerated primal–dual method for affinely constrained convex optimization problems. It handles both Lipschitz and Hölder continuous gradients without requiring knowledge of the smoothness level of the objective function. In the line search part, it uses dynamically decreasing parameters and produces an approximate Lipschitz constant of moderate magnitude. In addition, based on a suitable discrete Lyapunov function and tight decay estimates for certain differential/difference inequalities, a universal optimal mixed-type convergence rate is established. Numerical tests are provided to confirm the efficiency of the proposed method.
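To make the line-search mechanism described above concrete, the following is a minimal Python sketch of a generic backtracking scheme of the kind analyzed in the appendix: within iteration \(k\) the trial constant is multiplied by an increase factor \(\rho_u\) until an acceptance test holds, and the accepted value is divided by a decrease factor \(\rho_d\) before the next iteration, so that \(M_{k+1} = \rho_u^{i_k} M_k/\rho_d\). This is not the paper's Algorithm 1 or 2; the callable `accept`, the cap `max_trials`, and the toy threshold in the usage line are hypothetical placeholders, and the actual acceptance test is condition (15) of the paper, which is not reproduced here.

```python
def backtracking_constant(M_k, accept, rho_u=2.0, rho_d=2.0, max_trials=50):
    """Backtrack on a trial constant until the acceptance test holds.

    M_k    : estimate carried over from the previous iteration
    accept : callable M -> bool, placeholder for condition (15)
    rho_u  : increase factor used inside the inner loop
    rho_d  : decrease factor applied to the accepted value
    Returns (M_accepted, i_k, M_next) with M_next = rho_u**i_k * M_k / rho_d.
    """
    M_trial = M_k
    for i in range(max_trials):
        if accept(M_trial):          # i_k is the smallest i passing the test
            return M_trial, i, M_trial / rho_d
        M_trial *= rho_u
    raise RuntimeError("line search did not terminate")


# Toy usage: accept once the trial constant exceeds some hidden threshold.
M_acc, i_k, M_next = backtracking_constant(1.0, accept=lambda M: M >= 5.0)
```

Dividing the accepted constant by \(\rho_d\) at every iteration is what keeps the produced estimate of moderate magnitude even when an early iteration overshoots.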
Acknowledgements
The author would like to thank the Editor and two anonymous referees for their careful reading and valuable comments, which significantly improved the early version of the paper.
Additional information
Communicated by Olivier Fercoq.
This work was supported by the Foundation of Chongqing Normal University (Grant No. 202210000161) and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K202300505).
Appendices
Proof of Lemma 3.1
Let us first prove (18). Recall that \(i_k\) is the smallest nonnegative integer such that (cf. (15))
If \(i_k = 0\), then \(M_{k+1} = M_{k,0} /\rho _{_{\!d}}= M_{k}/\rho _{_{\!d}}\). Otherwise (i.e., \(i_k\ge 1\)), we claim that
If this is violated, then \(M_{k,i_{k}-1} = M_{k,i_k}/\rho _{\!u}>M(\nu ,\delta _{k,i_{k}-1})\). According to Proposition 2.1, this implies immediately that
which yields a contradiction and thus verifies (57). Additionally, by (16), we have \( \alpha _{k,i_{k}-1}\le \sqrt{\rho _{\! u}}\alpha _{k,i_{k}}\). Therefore, collecting (13), (17), and (57) leads to
This means that for all \(k\in {\mathbb {N}}\), we have
Since \(\delta _{k+1} = \beta _{k+1}/{(k+1)}\) with \(\{\beta _k\}_{k\in {\mathbb {N}}}\) being decreasing (cf. (17)), it follows that \(\delta _{k+1}\le \delta _{\ell +1}\) and \(M(\nu ,\delta _{\ell +1})\le M(\nu ,\delta _{k+1})\) for all \(0\le \ell \le k\), which together with (59) yields the estimate
This proves the desired result (18).
Next, let us verify (19). Observing that \(M_{k+1} = M_{k,i_k} /\rho _{_{\!d}}= \rho _{\! u}^{i_k}M_{k}/\rho _{_{\!d}}\), we have
Invoking the estimate of \(M_{k+1}\) gives
Since \(M(\nu ,\delta _{k+1}) = \delta _{k+1}^{\frac{\nu -1}{\nu +1}}[M_\nu (h)]^{\frac{2}{\nu +1}}\), we obtain (19) and complete the proof of Lemma 3.1.
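As a sanity check on the formula just used, consider the Lipschitz case \(\nu =1\); the display below only evaluates the exponents of \(M(\nu ,\delta _{k+1})\) at \(\nu =1\) and introduces no new assumptions:
\[
M(1,\delta _{k+1}) \;=\; \delta _{k+1}^{\,0}\,\bigl[M_1(h)\bigr]^{1} \;=\; M_1(h),
\]
so when the gradient of \(h\) is Lipschitz continuous the bound is independent of \(\delta _{k+1}\), whereas for \(\nu <1\) the factor \(\delta _{k+1}^{\frac{\nu -1}{\nu +1}}\) grows as \(\delta _{k+1}\) decreases.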
Derivation of the Reformulation (20)
Observing line 10 of Algorithm 1, we obtain (20d). Given \((x_k,v_k,\lambda _k)\) and \((\gamma _k,\beta _k,M_{k})\), the triple \((y_k,x_{k+1},v_{k+1})\) is nothing but the output of Algorithm 2 with the input \((k,S_k,M_{k,i_k})\), where \(S_k=\{x_k,v_k,\lambda _k,\beta _k,\gamma _k\}\). Hence, from lines 3 and 4, we obtain
which gives (20a) and (20c). In addition, we have
with \(\widetilde{\lambda }_k= \lambda _k+(\alpha _k/\beta _k)(Av_k-b)\). The optimality condition reads as
After rearranging, we get (20b).
Proof of Lemma 5.1
1.1 The Case \(\eta =\theta -1\)
The estimate (47) becomes
Since \(y(0) = 1\) and \(y'(t)\le 0\), it holds that \(0<y(t)\le 1\) for all \(t\ge 0\). As \(\varphi (t)\) is positive and nondecreasing, we obtain
Combining this with (60) gives
and integrating over \((0,t)\) leads to
Define
Then one finds that
This also implies
where \(Y(t):= Y_1(t)+Y_2(t)\). For fixed \(t>0\), the function
is monotonically decreasing with respect to \(v\in (0,\infty )\). Collecting (61) and (63) yields that
This completes the proof of Lemma 5.1 with \(\eta =\theta -1\).
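To illustrate the kind of decay behind this case without reproducing (47) or the weight \(\varphi (t)\), here is a simplified model problem: take \(y'(t) = -c\,y^{\theta }(t)\) with \(c>0\), \(\theta >1\), and \(y(0)=1\). Separating variables and integrating over \((0,t)\) gives
\[
y(t) \;=\; \bigl(1+c(\theta -1)t\bigr)^{-\frac{1}{\theta -1}},
\]
i.e., a polynomial decay of order \(O\bigl(t^{-1/(\theta -1)}\bigr)\); the inequality treated above additionally involves the positive nondecreasing weight \(\varphi (t)\).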
1.2 The Case \(\eta <\theta -1\)
The proof follows the same lines as the previous case. After some elementary calculations, the estimate (61) now becomes
where \(G:(0,\infty )\times (0,\infty )\rightarrow \,{{\mathbb {R}}}\) is defined by
for all \(w,\,v>0\). In addition to \(Y_2(t)\) defined in (62), we need
Since \(G(w,\cdot )\) is monotonically decreasing and
we obtain
This concludes the proof of Lemma 5.1 with \(\eta <\theta -1\).
Keywords
- Convex optimization
- Primal–dual method
- Mixed-type estimate
- Optimal complexity
- Bregman divergence
- Lyapunov function