
The Inexact Cyclic Block Proximal Gradient Method and Properties of Inexact Proximal Maps

Published in: Journal of Optimization Theory and Applications

Abstract

This paper expands the cyclic block proximal gradient method for block separable composite minimization by allowing for inexactly computed gradients and pre-conditioned proximal maps. The resultant algorithm, the inexact cyclic block proximal gradient (I-CBPG) method, shares the same convergence rate as its exactly computed analogue provided the allowable errors decrease sufficiently quickly or are pre-selected to be sufficiently small. We provide numerical experiments that showcase the practical computational advantage of I-CBPG for certain fixed tolerances of approximation error and for a dynamically decreasing error tolerance regime in particular. Our experimental results indicate that cyclic methods with dynamically decreasing error tolerance regimes can actually outpace their randomized siblings with fixed error tolerance regimes. We establish a tight relationship between inexact pre-conditioned proximal map evaluations and \(\delta \)-subgradients in our \((\delta ,B)\)-Second Prox theorem. This theorem forms the foundation of our convergence analysis and enables us to show that inexact gradient computations can be subsumed within a single unifying framework.
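The I-CBPG update is straightforward to sketch in code. The following is an illustrative sketch only, not the authors' implementation: it takes the smooth part to be \(f(x)=\frac{1}{2}\Vert Ax-b\Vert ^2\), takes each \(\Psi _i\) to be an \(\ell _1\) penalty whose (unconditioned, \(B_i=I\)) proximal map is exact soft-thresholding, and simulates inexact gradient computation by perturbing each block gradient within a tolerance \(\delta \).

```python
import numpy as np

def soft_threshold(v, t):
    # Exact proximal map of t * ||.||_1; stands in for the prox oracle here.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def icbpg_sweep(x, A, b, lam, L, delta, blocks, rng):
    # One cyclic pass: each coordinate block is updated with a delta-inexact
    # gradient of the smooth part f(x) = 0.5 * ||Ax - b||^2.
    x = x.copy()
    for blk in blocks:
        g = A.T @ (A @ x - b)            # exact gradient; only block blk is used
        e = rng.standard_normal(len(blk))
        g_blk = g[blk] + delta * e / max(np.linalg.norm(e), 1e-12)  # ||error|| <= delta
        x[blk] = soft_threshold(x[blk] - g_blk / L, lam / L)
    return x

def objective(x, A, b, lam):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
```

A dynamically decreasing tolerance regime, as studied in the paper, corresponds to shrinking \(\delta \) across sweeps, e.g. \(\delta _k={\mathcal {O}}(1/k^2)\).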


Notes

  1. In general, not every selection function for \({{\,\textrm{Prox}\,}}^{B_i}_{\Psi _i/L_i}\) satisfies the monotone decrease condition (14). However, computable selection functions satisfying (14) are available for all of our applications in Sect. 4.

  2. Data and related code for these experiments will be made available upon reasonable request.


Author information


Correspondence to David Huckleberry Gutman.

Additional information

Communicated by Edouard Pauwels.


Proof of Lemma 3.3


Proof

Fix \(k\ge 2\). We begin by dividing both sides of (27) by \(A_\ell A_{\ell +1}\),

$$\begin{aligned} \frac{1}{\gamma }\frac{A_{\ell +1}}{A_\ell } \le \frac{1}{A_{\ell +1}}-\frac{1}{A_\ell }+\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}, \end{aligned}$$

then rearrange and use monotonicity of \(\{A_\ell \}_{\ell \ge 0}\) to simplify this to

$$\begin{aligned} \frac{1}{A_{\ell +1}}-\frac{1}{A_\ell } \ge \frac{1}{\gamma }\frac{A_{\ell +1}}{A_\ell } -\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}. \end{aligned}$$

This rearrangement foreshadows the central roles played by the ratio \(A_{\ell +1}/A_\ell \) and the error term \(\Delta _{\ell +1}/(A_\ell A_{\ell +1})\). We consider two cases, distinguished by how often the ratio \(A_{\ell +1}/A_\ell \) is small for \(\ell + 1 \le k\). In the second case, in which the values of \(A_\ell \) decrease relatively slowly over this range, we further distinguish three subcases based on the behavior of \(\{\Delta _\ell \}_{\ell \ge 1}\) and the typical size of \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\).

  1. (i)

    For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), we have \(A_{\ell +1}/A_\ell \le 1/2\).

  2. (ii)

    For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), we have \(1/2 < A_{\ell +1}/A_\ell \le 1\). In this case, we consider three subcases based on the values of \(\Delta _{\ell +1}/(A_\ell A_{\ell +1})\) and the sequence \(\{\Delta _\ell \}_{\ell \ge 1}\).

Case 1: For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), \(\frac{A_{\ell +1}}{A_\ell }\le \frac{1}{2}\).

This is the easy case. First, assume that k is even. Then, we have that \(A_{\ell +1}\le \frac{1}{2}A_\ell \) for at least k/2 values of \(0\le \ell \le k-1\) so

$$\begin{aligned} A_k\le \left( \frac{1}{2}\right) ^{k/2}A_0, \end{aligned}$$

since the \(A_\ell \) terms are decreasing. If \(k>2\) is odd, then \(k-1\) is even, so by the same logic

$$\begin{aligned} A_k\le \left( \frac{1}{2}\right) ^{(k-1)/2}A_0. \end{aligned}$$
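As a quick numerical sanity check (our illustration, not part of the proof), the halving bound can be verified on any non-increasing sequence whose ratio drops to \(1/2\) at least half the time; here the ratio alternates between \(1/2\) and \(0.9\).

```python
def case1_bound(A0, ratios):
    # Given the per-step ratios A_{l+1}/A_l, return A_k together with the
    # Case 1 bound (1/2)^{floor(k/2)} * A_0, after checking the hypothesis.
    k = len(ratios)
    halvings = sum(1 for r in ratios if r <= 0.5)
    assert halvings >= k // 2   # Case 1 hypothesis: at least floor(k/2) halvings
    Ak = A0
    for r in ratios:
        Ak *= r
    return Ak, 0.5 ** (k // 2) * A0
```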

Case 2: For at least \(\lfloor k/2\rfloor \) values of \(0\le \ell \le k-1\), \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\).

We examine the following three subcases in turn:

  1. (i)

    \(\Delta _{\ell } = \Delta \ge 0 \) for all \(\ell \).

  2. (ii)

    The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2}< \frac{A_{\ell +1}}{A_\ell }\le 1\) it also holds that \(\frac{1}{4\gamma }>\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\).

  3. (iii)

    The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2}<\frac{A_{\ell +1}}{A_\ell }\le 1\) it also holds that \(\frac{1}{4\gamma } \le \frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\).

Case 2, Subcase i: \(\Delta _\ell = \Delta \ge 0\) for all \(\ell \).

Assume for now that k is even. Define \(u=\sqrt{\Delta \gamma }\), and let \(\tilde{A}_{\ell }=A_{\ell }-u\). Then, the recurrence (27) implies that \( \frac{1}{\gamma }A_{\ell +1}^2\le A_\ell -A_{\ell +1}+\Delta _{\ell +1} \), which we may express as

$$\begin{aligned} \frac{1}{\gamma }(\tilde{A}_{\ell +1}+u)^2=\frac{1}{\gamma }A_{\ell +1}^2\le A_\ell -A_{\ell +1}+\Delta =\tilde{A}_\ell -\tilde{A}_{\ell +1}+\Delta . \end{aligned}$$

Expanding the square on the left, using the definition of u, and rearranging we see

$$\begin{aligned} \frac{1}{\gamma }\tilde{A}_{\ell +1}^2\le \tilde{A}_\ell -\left( 1+\frac{2u}{\gamma }\right) \tilde{A}_{\ell +1}. \end{aligned}$$

If \(\tilde{A}_{k} \le 0\), the result is immediate, so suppose that \(\tilde{A}_{k} > 0\), from which it follows that the earlier \(\tilde{A}_{\ell }\) terms are also positive. Then, for any \(\ell \) with \(0 \le \ell \le k-1\), we may divide the recurrence inequality by the product \(\tilde{A}_{\ell +1}\tilde{A}_\ell \) to obtain

$$\begin{aligned} \frac{1}{\tilde{A}_{\ell +1}} - \left( 1 + \frac{2u}{\gamma }\right) \frac{1}{\tilde{A}_\ell } \ge \frac{1}{\gamma }\frac{\tilde{A}_{\ell +1}}{\tilde{A}_\ell }. \end{aligned}$$

Now, by hypothesis, for at least k/2 indices in the range \(0 \le \ell \le k -1\)

$$\begin{aligned} \frac{1}{\tilde{A}_{\ell +1}} - \frac{1}{\tilde{A}_\ell } \ge \frac{1}{\gamma }\frac{\tilde{A}_{\ell +1}}{\tilde{A}_\ell } + \frac{2u}{\gamma } \frac{1}{\tilde{A}_\ell } \ge \frac{1}{\gamma }\frac{1 }{2} + \frac{2u}{\gamma } \frac{1}{\tilde{A}_0}. \end{aligned}$$

Iterating backward, one obtains

$$\begin{aligned} \frac{1}{\tilde{A}_k} \ge \frac{1}{\tilde{A}_{k}} - \frac{1}{\tilde{A}_0} \ge \frac{k}{2} \left( \frac{1}{2\gamma } + \frac{2u}{\gamma } \frac{1}{\tilde{A}_0} \right) , \end{aligned}$$

which gives \(\tilde{A}_k \le 4\gamma \tilde{A}_0/[k (\tilde{A}_0+4u)]\). The result follows from noting that \(k-1\) is even if k is odd, so we may replace k with \(k-1\) above to obtain a generic bound.
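The constant-\(\Delta \) bound is easy to probe numerically. The sketch below is our illustration (the values of \(\gamma \), \(\Delta \), and \(A_0\) are arbitrary choices): it generates the worst-case sequence satisfying the recurrence \(\frac{1}{\gamma }A_{\ell +1}^2 = A_\ell -A_{\ell +1}+\Delta \) with equality and compares \(\tilde{A}_k\) with the derived bound.

```python
import math

def extremal_sequence(A0, gamma, deltas):
    # A_{l+1} solves A^2/gamma + A - (A_l + delta_{l+1}) = 0, i.e. the
    # recurrence holds with equality (the worst admissible case).
    A = [A0]
    for d in deltas:
        c = A[-1] + d
        A.append(gamma * (-1.0 + math.sqrt(1.0 + 4.0 * c / gamma)) / 2.0)
    return A
```

Under constant \(\Delta \), this sequence decreases toward the floor \(u=\sqrt{\Delta \gamma }\), matching the additive error level in the bound.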

Case 2, Subcase ii: The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}} < \frac{1}{4\gamma }\).

Our reasoning follows the same idea as in Case 2, Subcase i, where \(\Delta _\ell =\Delta \ge 0\) for all \(\ell \ge 1\). First, assume that k is divisible by 4. Then, for at least k/4 values of \(0\le \ell \le k-1\), we have

$$\begin{aligned} \frac{1}{A_{\ell +1}}-\frac{1}{A_\ell }\ge \frac{1}{2\gamma }-\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\ge \frac{1}{2\gamma }-\frac{1}{4\gamma }=\frac{1}{4\gamma }. \end{aligned}$$

This inequality iterated backward, plus monotonicity and non-negativity of the sequence \(\{A_\ell \}_{\ell \ge 0}\), implies that

$$\begin{aligned} \frac{1}{A_k}\ge \frac{1}{A_k}-\frac{1}{A_0}\ge \frac{k}{4}\left[ \frac{1}{4\gamma }\right] =\frac{k}{16\gamma }. \end{aligned}$$

Rearranging, we have that \(A_k \le 16\gamma / k\). If \(k > 4\) is not divisible by 4, then \(k-1\), \(k-2\), or \(k-3\) must be, so in the worst case \(A_k \le 16\gamma /(k-3)\).
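This bound, too, can be checked numerically (our illustration, with arbitrary choices of \(\gamma \), D, and \(A_0\)): generating the sequence that satisfies the recurrence with equality for \(\Delta _\ell = D/\ell ^2\) with D small keeps the error term below \(1/(4\gamma )\), and the \(16\gamma /k\) bound holds comfortably.

```python
import math

def extremal_sequence(A0, gamma, deltas):
    # Worst-case sequence: the recurrence A^2/gamma <= A_l - A + delta_{l+1}
    # holds with equality at every step.
    A = [A0]
    for d in deltas:
        c = A[-1] + d
        A.append(gamma * (-1.0 + math.sqrt(1.0 + 4.0 * c / gamma)) / 2.0)
    return A
```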

Case 2, Subcase iii: The sequence \(\{\Delta _\ell \}_{\ell \ge 1}\) shrinks at the sublinear rate \({\mathcal {O}}(1/\ell ^2)\) and for at least \(\lfloor k/4\rfloor \) of the values for which \(\frac{1}{2} < \frac{A_{\ell +1}}{A_\ell } \le 1\), it also holds that \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\ge \frac{1}{4\gamma }\).

First, suppose k is divisible by 4. Let \(\ell ^*\) denote the largest \(\ell \in \{0,\ldots ,k-1\}\) for which \(\frac{\Delta _{\ell +1}}{A_\ell A_{\ell +1}}\ge \frac{1}{4\gamma }\) holds. By hypothesis, \(\ell ^*\ge \frac{k}{4}-1\), and \(\Delta _\ell \le D/\ell ^2\) for some constant \(D>0\), so

$$\begin{aligned} \frac{1}{4\gamma }\cdot A_k^2\le \frac{1}{4\gamma }\cdot A_k A_{k-1}\le \frac{1}{4\gamma }\cdot A_{\ell ^*+1} A_{\ell ^*}\le \Delta _{\ell ^*+1}\le \Delta _{k/4}\le \frac{D}{(k/4)^2}. \end{aligned}$$

Multiplying by \(4\gamma \) and taking square roots, we have \(A_{k}\le \frac{8\sqrt{\gamma D}}{k}\). If \(k>4\) is not divisible by 4, then one of \(k-1\), \(k-2\), or \(k-3\) is, so at worst \(A_k\le \frac{8\sqrt{\gamma D}}{k-3}\).
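Choosing D large relative to \(\gamma \) drives the equality-case sequence into this subcase, since \(A_\ell \) then tracks \(\sqrt{\gamma \Delta _\ell }\approx \sqrt{\gamma D}/\ell \) and the error term stays above \(1/(4\gamma )\). The following sketch (our illustration, with arbitrary parameter choices) confirms the hypothesis count and the \(8\sqrt{\gamma D}/k\) bound.

```python
import math

def extremal_sequence(A0, gamma, deltas):
    # Worst-case sequence: the recurrence holds with equality at every step.
    A = [A0]
    for d in deltas:
        c = A[-1] + d
        A.append(gamma * (-1.0 + math.sqrt(1.0 + 4.0 * c / gamma)) / 2.0)
    return A
```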

Having completed our analysis, we may now combine the results from Case 1 with the appropriate Subcase(s) of Case 2 to establish the results in Lemma 3.3.

\(\square \)


About this article


Cite this article

Farias Maia, L., Gutman, D.H. & Hughes, R.C. The Inexact Cyclic Block Proximal Gradient Method and Properties of Inexact Proximal Maps. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02404-7
