Abstract
This paper extends the cyclic block proximal gradient method for block separable composite minimization by allowing for inexactly computed gradients and pre-conditioned proximal maps. The resulting algorithm, the inexact cyclic block proximal gradient (I-CBPG) method, shares the same convergence rate as its exactly computed analogue provided the allowable errors decrease sufficiently quickly or are pre-selected to be sufficiently small. We provide numerical experiments that showcase the practical computational advantage of I-CBPG for certain fixed tolerances of approximation error and for a dynamically decreasing error tolerance regime in particular. Our experimental results indicate that cyclic methods with dynamically decreasing error tolerance regimes can actually outpace their randomized siblings with fixed error tolerance regimes. We establish a tight relationship between inexact pre-conditioned proximal map evaluations and \(\delta \)-subgradients in our \((\delta ,B)\)-Second Prox theorem. This theorem forms the foundation of our convergence analysis and enables us to show that inexact gradient computations can be subsumed within a single unifying framework.
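To make the abstract's setting concrete, the following is a minimal sketch, in our own notation, of a cyclic block proximal gradient sweep with inexactly computed block gradients. It is not the paper's exact I-CBPG algorithm (the pre-conditioned proximal maps and error model are specified in the body); here the composite problem is a lasso-type example, the proximal map is exact soft-thresholding, and inexactness is modeled by an additive gradient perturbation of norm at most `eps`. All function names and parameter values are illustrative choices of ours.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def inexact_cbpg(A, b, lam, n_blocks, eps, iters, rng):
    """Sketch of a cyclic block proximal gradient method for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1, where each block gradient
    is perturbed by an error of norm at most eps before the prox step."""
    m, n = A.shape
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    L = np.linalg.norm(A, 2) ** 2  # conservative global Lipschitz constant
    for _ in range(iters):
        for idx in blocks:  # one cyclic sweep over the blocks
            grad = A[:, idx].T @ (A @ x - b)
            noise = rng.standard_normal(idx.size)
            noise *= eps / max(np.linalg.norm(noise), 1e-12)
            x[idx] = soft_threshold(x[idx] - (grad + noise) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
x = inexact_cbpg(A, b, lam=0.1, n_blocks=4, eps=1e-3, iters=200, rng=rng)
obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.abs(x).sum()
```

With a small fixed `eps`, the iterates settle into a neighborhood of the minimizer, mirroring the fixed-tolerance regime the abstract describes; shrinking `eps` over sweeps corresponds to the dynamically decreasing tolerance regime.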
Notes
Data and related code for these experiments will be made available upon reasonable request.
Additional information
Communicated by Edouard Pauwels.
Proof of Lemma 3.3
Proof
Fix \(k\ge 2\). We begin by dividing both sides of (27) by \(A_\ell A_{\ell +1}\), then rearranging and using the monotonicity of \(\{A_\ell \}_{\ell \ge 0}\) to simplify this to
$$\frac{1}{A_{\ell+1}} \ge \frac{1}{A_\ell} + \frac{1}{\gamma}\,\frac{A_{\ell+1}}{A_\ell} - \frac{\Delta_{\ell+1}}{A_\ell A_{\ell+1}}.$$
This rearrangement foreshadows the important roles of \(A_{\ell +1}/A_\ell \) and \(\Delta _{\ell +1}/(A_\ell A_{\ell +1})\). We consider two cases, divided according to the typical size of the ratio \(A_{\ell +1}/A_\ell \) for \(\ell + 1 \le k\). In the second case, when the values of \(A_\ell \) fall at a relatively slow rate over this range, we consider three subcases based on the behavior of \(\{\Delta _\ell \}_{\ell \ge 1}\) and the typical values of \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\).
(i) For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), we have \(A_{\ell +1}/A_\ell \le 1/2\).
(ii) For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), we have \(1/2 < A_{\ell +1}/A_\ell \le 1\). In this case, we consider three subcases based on the values of \(\Delta _{\ell +1}/(A_\ell A_{\ell +1})\) and the sequence \(\{\Delta _\ell \}_{\ell \ge 1}\).
Case 1: For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), \(\frac{A_{\ell +1}}{A_\ell }\le \frac{1}{2}\).
This is the easy case. First, assume that k is even. Then, we have that \(A_{\ell +1}\le \frac{1}{2}A_\ell \) for at least k/2 values of \(0\le \ell \le k-1\), so
$$A_k \le \left(\frac{1}{2}\right)^{k/2} A_0,$$
since the \(A_\ell \) terms are decreasing. If \(k>2\) is odd, then \(k-1\) is even, so by the same logic
$$A_k \le \left(\frac{1}{2}\right)^{(k-1)/2} A_0.$$
Case 2: For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\).
We examine the following three subcases in turn:
(i) \(\Delta _{\ell } = \Delta \ge 0 \) for all \(\ell \).
(ii) The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\), and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2}< \frac{A_{\ell +1}}{A_\ell }\le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}} < \frac{1}{4\gamma }\).
(iii) The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\), and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2}<\frac{A_{\ell +1}}{A_\ell }\le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}} \ge \frac{1}{4\gamma }\).
Case 2, Subcase i: \(\Delta _\ell = \Delta \ge 0\) for all \(\ell \).
Assume for now that k is even. Define \(u=\sqrt{\Delta \gamma }\), and let \(\tilde{A}_{\ell }=A_{\ell }-u\). Then, the recurrence (27) implies that \( \frac{1}{\gamma }A_{\ell +1}^2\le A_\ell -A_{\ell +1}+\Delta \), which we may express as
$$\frac{1}{\gamma}\big(\tilde{A}_{\ell+1}+u\big)^2 \le \tilde{A}_\ell - \tilde{A}_{\ell+1} + \Delta.$$
Expanding the square on the left, using the definition of u (so that \(u^2/\gamma = \Delta \) cancels on both sides), and rearranging, we see
$$\frac{1}{\gamma}\tilde{A}_{\ell+1}^2 + \frac{2u}{\gamma}\tilde{A}_{\ell+1} \le \tilde{A}_\ell - \tilde{A}_{\ell+1}.$$
If \(\tilde{A}_{k} \le 0\), the result is immediate, so suppose that \(\tilde{A}_{k} > 0\), from which it follows that the earlier \(\tilde{A}_{\ell }\) terms are also positive. Then, for any \(\ell \) with \(0 \le \ell \le k-1\), we may divide the inequality above by the product \(\tilde{A}_{\ell +1}\tilde{A}_\ell \) to obtain
$$\frac{1}{\gamma}\,\frac{\tilde{A}_{\ell+1}}{\tilde{A}_\ell} + \frac{2u}{\gamma \tilde{A}_\ell} \le \frac{1}{\tilde{A}_{\ell+1}} - \frac{1}{\tilde{A}_\ell}.$$
Now, by hypothesis, for at least k/2 indices in the range \(0 \le \ell \le k -1\),
$$\frac{1}{\tilde{A}_{\ell+1}} - \frac{1}{\tilde{A}_\ell} \ge \frac{1}{2\gamma} + \frac{2u}{\gamma \tilde{A}_0}.$$
Iterating backward, one obtains
$$\frac{1}{\tilde{A}_k} \ge \frac{k}{2}\left(\frac{1}{2\gamma} + \frac{2u}{\gamma \tilde{A}_0}\right) = \frac{k(\tilde{A}_0 + 4u)}{4\gamma \tilde{A}_0},$$
which gives \(\tilde{A}_k \le 4\gamma \tilde{A}_0/[k (\tilde{A}_0+4u)]\). The result follows from noting that \(k-1\) is even if k is odd, so we may replace k with \(k-1\) above to obtain a generic bound.
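As a quick numerical sanity check on this subcase, the sketch below iterates the recurrence under the assumption that (27) holds with equality and constant error \(\Delta \), then compares \(\tilde{A}_k = A_k - u\) against the bound just derived. The parameter values are illustrative choices of ours, not from the paper.

```python
import math

def iterate_recurrence(a0, gamma, delta, k):
    """Iterate the equality version of recurrence (27):
       (1/gamma) * A_{l+1}^2 = A_l - A_{l+1} + delta,
       solving the quadratic for A_{l+1} at each step."""
    a = a0
    for _ in range(k):
        # Positive root of (1/gamma) x^2 + x - (a + delta) = 0.
        a = (-gamma + math.sqrt(gamma**2 + 4 * gamma * (a + delta))) / 2
    return a

gamma, a0, delta, k = 1.0, 1.0, 0.01, 50
u = math.sqrt(delta * gamma)   # noise floor sqrt(Delta * gamma)
a_tilde0 = a0 - u
ak = iterate_recurrence(a0, gamma, delta, k)

# Subcase i bound: A_k - u <= 4*gamma*(A_0 - u) / (k * ((A_0 - u) + 4u)).
bound = 4 * gamma * a_tilde0 / (k * (a_tilde0 + 4 * u))
print(ak - u, bound)
```

With constant \(\Delta \), the iterates decrease monotonically toward the floor \(u=\sqrt{\Delta \gamma }\) rather than to zero, which is exactly why the bound is stated for \(\tilde{A}_k\) rather than \(A_k\).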
Case 2, Subcase ii: The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}< \frac{1}{4\gamma }\).
Our reasoning follows the same idea as when \(\Delta _\ell =\Delta \ge 0\) for all \(\ell \ge 1\) (Case 2, Subcase i). First, assume that k is divisible by 4. We have for k/4 values of \(0\le \ell \le k-1\) that
$$\frac{1}{A_{\ell+1}} \ge \frac{1}{A_\ell} + \frac{1}{\gamma}\,\frac{A_{\ell+1}}{A_\ell} - \frac{\Delta_{\ell+1}}{A_\ell A_{\ell+1}} > \frac{1}{A_\ell} + \frac{1}{2\gamma} - \frac{1}{4\gamma} = \frac{1}{A_\ell} + \frac{1}{4\gamma}.$$
This inequality, iterated backward, plus monotonicity and non-negativity of the sequence \(\{A_\ell \}_{\ell \ge 0}\), implies that
$$\frac{1}{A_k} \ge \frac{k}{4}\cdot \frac{1}{4\gamma} = \frac{k}{16\gamma}.$$
Rearranging, we have that \(A_k \le 16\gamma / k\). If \(k > 4\) is not divisible by 4, then \(k-1\), \(k-2\), or \(k-3\) must be, so in the worst case \(A_k \le 16\gamma /(k-3)\).
Case 2, Subcase iii: The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\ge \frac{1}{4\gamma }\).
First, suppose k is divisible by 4. Let \(\ell ^*\) denote the largest \(\ell \in \{0,\ldots ,k-1\}\) for which \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\ge \frac{1}{4\gamma }\) holds. By hypothesis, \(\ell ^*\) must be at least as big as \(\frac{k}{4} - 1\), and \(\Delta _\ell \le D/\ell ^2\), so
$$\frac{A_k^2}{4\gamma} \le \frac{A_{\ell^*}A_{\ell^*+1}}{4\gamma} \le \Delta_{\ell^*+1} \le \frac{D}{(\ell^*+1)^2} \le \frac{16D}{k^2}.$$
Dividing by \(1/4\gamma \) and taking square roots, we have \(A_{k}\le \frac{8\sqrt{\gamma D}}{k}\). If \(k>4\) is not divisible by 4, then one of \(k-1\), \(k-2\), or \(k-3\) is, so at worst \(A_k\le \frac{8\sqrt{\gamma D}}{k-3}\).
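The decreasing-error regime of Subcases ii and iii can be sanity-checked numerically in the same spirit. The sketch below (again assuming equality in (27), with illustrative parameters of our choosing) iterates with \(\Delta _\ell = D/\ell ^2\) and compares \(A_k\) against the looser generic bound \(\max (16\gamma ,\, 8\sqrt{\gamma D})/(k-3)\), which collects the worst case over the two subcases.

```python
import math

def iterate_decreasing(a0, gamma, D, k):
    """Iterate (1/gamma) A_{l+1}^2 = A_l - A_{l+1} + Delta_{l+1}
       with sublinearly decreasing errors Delta_l = D / l**2."""
    a = a0
    for l in range(k):
        delta = D / (l + 1) ** 2
        a = (-gamma + math.sqrt(gamma**2 + 4 * gamma * (a + delta))) / 2
    return a

gamma, a0, D, k = 1.0, 1.0, 0.01, 20
ak = iterate_decreasing(a0, gamma, D, k)

# Worst case over Subcases ii and iii: A_k <= max(16*gamma, 8*sqrt(gamma*D)) / (k - 3).
bound = max(16 * gamma, 8 * math.sqrt(gamma * D)) / (k - 3)
print(ak, bound)
```

Unlike the constant-error case, here the iterates have no positive noise floor: because \(\Delta _\ell \to 0\), \(A_k\) itself decays at the \({\mathcal {O}}(1/k)\) rate rather than merely approaching \(\sqrt{\Delta \gamma }\).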
Having completed our analysis, we may now combine the results from Case 1 with the appropriate Subcase(s) of Case 2 to establish the results in Lemma 3.3.
\(\square \)
Farias Maia, L., Gutman, D.H. & Hughes, R.C. The Inexact Cyclic Block Proximal Gradient Method and Properties of Inexact Proximal Maps. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02404-7