Random-reshuffled SARAH does not need full gradient computations

Abstract

The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent algorithm that requires a full gradient of the objective function from time to time. In this paper, we remove the need for full gradient computations. This is achieved by using a randomized reshuffling strategy and by aggregating the stochastic gradients obtained in each epoch. The aggregated stochastic gradients serve as an estimate of the full gradient in the SARAH algorithm. We provide a theoretical analysis of the proposed approach and conclude the paper with numerical experiments that demonstrate its efficiency.

References

  1. Ahn, K., Yun, C., Sra, S.: SGD with shuffling: optimal rates without component convexity and large epoch requirements. Adv. Neural Inf. Process. Syst. 33, 17526–17535 (2020)

  2. Allen-Zhu, Z.: Katyusha: the first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017)

  3. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089. PMLR (2016)

  4. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, 2nd ed., pp. 437–478. Springer (2012)

  5. Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633. Citeseer (2009)

  6. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  7. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. TIST 2(3), 1–27 (2011)

  8. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: International Conference on Machine Learning, pp. 1019–1028. PMLR (2018)

  9. Cutkosky, A., Orabona, F.: Momentum-based variance reduction in non-convex sgd. arXiv preprint arXiv:1905.10018 (2019)

  10. Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  11. Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: near-optimal non-convex optimization via stochastic path integrated differential estimator. arXiv preprint arXiv:1807.01695 (2018)

  12. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1–2), 267–305 (2016)

  13. Gurbuzbalaban, M., Ozdaglar, A., Parrilo, P.A.: On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2), 1035–1048 (2017)

  14. Hendrikx, H., Xiao, L., Bubeck, S., Bach, F., Massoulie, L.: Statistically preconditioned accelerated gradient method for distributed optimization. In: International Conference on Machine Learning, pp. 4203–4227. PMLR (2020)

  15. Hu, W., Li, C.J., Lian, X., Liu, J., Yuan, H.: Efficient smooth non-convex stochastic compositional optimization via stochastic recursive gradient descent (2019)

  16. Huang, X., Yuan, K., Mao, X., Yin, W.: An improved analysis and rates for variance reduction under without-replacement sampling orders. In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems, vol. 34, pp. 3232–3243. Curran Associates Inc, Red Hook (2021)

  17. Jain, P., Nagaraj, D., Netrapalli, P.: SGD without replacement: sharper rates for general smooth convex functions. arXiv preprint arXiv:1903.01463 (2019)

  18. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26, 315–323 (2013)

  19. Khaled, A., Richtárik, P.: Better theory for SGD in the nonconvex world. arXiv preprint arXiv:2002.03329 (2020)

  20. Koloskova, A., Doikov, N., Stich, S.U., Jaggi, M.: Shuffle SGD is always better than SGD: improved analysis of SGD with arbitrary data orders. arXiv preprint arXiv:2305.19259 (2023)

  21. Li, B., Ma, M., Giannakis, G.B.: On the convergence of Sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233. PMLR (2020)

  22. Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: a simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295. PMLR (2021)

  23. Li, Z., Richtárik, P.: Zerosarah: efficient nonconvex finite-sum optimization with zero full gradient computation. arXiv preprint arXiv:2103.01447 (2021)

  24. Liu, D., Nguyen, L.M., Tran-Dinh, Q.: An optimal hybrid variance-reduced algorithm for stochastic composite nonconvex optimization. arXiv preprint arXiv:2008.09055 (2020)

  25. Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)

  26. Malinovsky, G., Sailanbayev, A., Richtárik, P.: Random reshuffling with variance reduction: new analysis and better rates. arXiv preprint arXiv:2104.09342 (2021)

  27. Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: simple analysis with vast improvements. Adv. Neural Inf. Process. Syst. 33 (2020)

  28. Mokhtari, A., Gurbuzbalaban, M., Ribeiro, A.: Surpassing gradient descent provably: a cyclic incremental method with linear convergence rate. SIAM J. Optim. 28(2), 1420–1447 (2018)

  29. Moulines, E., Bach, F.: Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 24. Curran Associates Inc, Red Hook (2011)

  30. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, New York (2003)

  31. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621. PMLR (2017)

  32. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)

  33. Nguyen, L.M., Nguyen, P.H., Richtárik, P., Scheinberg, K., Takác, M., van Dijk, M.: New convergence aspects of stochastic gradient algorithms. J. Mach. Learn. Res. 20, 176–1 (2019)

  34. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optim. Methods Softw. 36(1), 237–258 (2021)

  35. Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1), 9397–9440 (2021)

  36. Park, Y., Ryu, E.K.: Linear convergence of cyclic saga. Optim. Lett. 14(6), 1583–1598 (2020)

  37. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)

  38. Qian, X., Qu, Z., Richtárik, P.: Saga with arbitrary sampling. In: International Conference on Machine Learning, pp. 5190–5199. PMLR (2019)

  39. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Program. Comput. 5(2), 201–226 (2013)

  40. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  41. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)

  42. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)

  43. Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)

  44. Stich, S.U.: Unified optimal analysis of the (stochastic) gradient method. arXiv preprint arXiv:1907.04232 (2019)

  45. Sun, R.Y.: Optimization for deep learning: an overview. J. Oper. Res. Soc. China 8(2), 249–294 (2020)

  46. Sun, T., Sun, Y., Li, D., Liao, Q.: General proximal incremental aggregated gradient algorithms: Better and novel results under general scheme. Adv. Neural Inf. Process. Syst. 32 (2019)

  47. Takác, M., Bijral, A., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. In: International Conference on Machine Learning, pp. 1022–1030. PMLR (2013)

  48. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12, 389–434 (2012)

  49. Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: A stronger convergence result on the proximal incremental aggregated gradient method. arXiv preprint arXiv:1611.08022 (2016)

  50. Vaswani, S., Bach, F., Schmidt, M.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1195–1204. PMLR (2019)

  51. Yang, Z., Chen, Z., Wang, C.: Accelerating mini-batch SARAH by step size rules. Inf. Sci. 558, 157–173 (2021)

  52. Ying, B., Yuan, K., Sayed, A.H.: Variance-reduced stochastic learning under random reshuffling. IEEE Trans. Signal Process. 68, 1390–1408 (2020)

  53. Ying, B., Yuan, K., Vlaski, S., Sayed, A.H.: Stochastic learning under random reshuffling with constant step-sizes. IEEE Trans. Signal Process. 67(2), 474–489 (2018)

Acknowledgements

The work of A. Beznosikov was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021, No. 70-2021-00138. Part of this work was carried out while A. Beznosikov was a visiting research assistant at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

Author information

Corresponding author

Correspondence to Aleksandr Beznosikov.

Appendices

Appendix 1: Additional experimental results

See Figs. 4 and 5.

Fig. 4 Convergence of SARAH-type methods on various LIBSVM datasets. Convergence in terms of the distance to the solution

Fig. 5 Convergence of SARAH-type methods on various LIBSVM datasets. Convergence in terms of the norm of the gradient

Appendix 2: RR-SARAH

This algorithm is a modification of the original SARAH that uses random reshuffling. Unlike Algorithm 1, it relies on the full gradient \(\nabla P\).
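As an illustration only (this is a sketch, not the authors' implementation), one epoch of RR-SARAH can be written in Python as follows; the list `grads` of per-component gradient oracles and the function name are hypothetical.

```python
import numpy as np

def rr_sarah_epoch(w0, grads, eta, rng):
    """One epoch of RR-SARAH (Algorithm 2); illustrative sketch only.

    w0    : epoch starting point w_s
    grads : list of callables, grads[i](w) returning the gradient of f_i at w
    eta   : step size chosen according to (6)
    rng   : numpy.random.Generator used for the reshuffling
    """
    n = len(grads)
    # Anchor direction: the full gradient of P at w_s.  This is exactly the
    # computation that Algorithm 1 (Shuffled-SARAH) avoids.
    v = sum(g(w0) for g in grads) / n
    w_prev, w = w0, w0 - eta * v
    for i in rng.permutation(n):                # fresh permutation each epoch
        v = v + grads[i](w) - grads[i](w_prev)  # SARAH recursive estimator
        w_prev, w = w, w - eta * v
    return w                                    # becomes w_{s+1}
```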

Algorithm 2: RR-SARAH

Theorem 2

Suppose that Assumption 1 holds. Consider RR-SARAH (Algorithm 2) with the choice of \(\eta\) such that

$$\begin{aligned} \eta \le \min \left[ \frac{1}{8n L}; \frac{1}{8n^{2} \delta } \right] . \end{aligned}$$
(6)

Then, we have

$$\begin{aligned} P(w_{s+1}) - P^*&\le \left( 1 - \frac{\eta \mu (n+1)}{2}\right) \left( P(w_s) - P^*\right) . \end{aligned}$$

Corollary 2

Fix \(\varepsilon\), and let us run RR-SARAH with \(\eta\) from (6). Then we can obtain an \(\varepsilon\)-approximate solution (in terms of \(P(w) - P^* \le \varepsilon\)) after

$$\begin{aligned} S = \mathcal {O}\left( \left[ n \cdot \frac{L}{ \mu } + n^2 \cdot \frac{\delta }{\mu } \right] \log \frac{1}{\varepsilon }\right) \quad \text {calls of terms }f_i. \end{aligned}$$
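For concreteness, the step-size bound (6) and the order of the oracle complexity in Corollary 2 can be evaluated as in the snippet below. This is an illustration only: the constants hidden in the \(\mathcal {O}\)-notation are omitted, and the example values of \(n, L, \mu , \delta , \varepsilon\) are arbitrary.

```python
import math

def rr_sarah_stepsize(n, L, delta):
    """Largest step size allowed by (6): eta <= min{1/(8 n L), 1/(8 n^2 delta)}."""
    return min(1.0 / (8 * n * L), 1.0 / (8 * n**2 * delta))

def rr_sarah_gradient_calls(n, L, mu, delta, eps):
    """Order of the number of calls to the terms f_i from Corollary 2
    (absolute constants omitted)."""
    return math.ceil((n * L / mu + n**2 * delta / mu) * math.log(1.0 / eps))

# Example with arbitrary values: n = 1000, L = 10, mu = 0.1, delta = 0.01, eps = 1e-6.
print(rr_sarah_stepsize(1000, 10.0, 0.01))                   # 1.25e-05
print(rr_sarah_gradient_calls(1000, 10.0, 0.1, 0.01, 1e-6))  # roughly 2.8 million calls
```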

Appendix 3: Missing proofs for Sect. 3 and “Appendix 2”

Before proceeding to the proofs, let us note that the \(\delta\)-similarity from Assumption 1 gives \((\delta /2)\)-smoothness of the function \((f_i - P)\) for any \(i \in [n]\). This implies \(\delta\)-smoothness of the function \((f_i - f_j)\) for any \(i,j \in [n]\):

$$\begin{aligned}&\Vert \nabla f_i(w_1) - \nabla f_j(w_1) - (\nabla f_i(w_2) - \nabla f_j(w_2))\Vert \\&\quad \le \Vert \nabla f_i(w_1) - \nabla P(w_1) - (\nabla f_i(w_2) - \nabla P(w_2))\Vert \\&\qquad + \Vert \nabla P(w_1) - \nabla f_j(w_1) - (\nabla P(w_2) - \nabla f_j(w_2))\Vert \\&\quad \le 2\cdot (\delta /2)\Vert w_1 - w_2\Vert = \delta \Vert w_1 - w_2\Vert . \end{aligned}$$
(7)

Next, we introduce additional notation for simplicity. If we consider Algorithm 1 at iteration \(s \ne 0\), one can note that the update rule is nothing more than

$$\begin{aligned} w_s&= w^0_s = w^{n+1}_{s-1}, \\ v_s&= v^0_s = \frac{1}{n} \sum \limits _{i=1}^{n} \nabla f_{\pi ^{i}_{s-1}} (w^{i}_{s-1}), \\ w^1_s&= w^0_s - \eta v^0_s, \end{aligned}$$
(8)
$$\begin{aligned} v^{i}_s&= v^{i-1}_s + \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}), \end{aligned}$$
(9)
$$\begin{aligned} w^{i+1}_s&= w^{i}_s - \eta v^{i}_s . \end{aligned}$$
(10)

This notation will be used throughout the proofs. For Algorithm 2, exactly the same notation applies, with \(v_s = v^0_s = \nabla P(w_s)\).
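To make this notation concrete, a minimal Python sketch of one epoch \(s \ne 0\) of Algorithm 1 in terms of (8)-(10) is given below (an illustration, not the authors' code); the gradient oracle interface `grads[i]` is hypothetical, and the initialization of the very first epoch is omitted.

```python
import numpy as np

def shuffled_sarah_epoch(w0, v0, grads, eta, rng):
    """One epoch s != 0 of Algorithm 1 in the notation (8)-(10); sketch only.

    w0 : w_s = w_{s-1}^{n+1}
    v0 : v_s = v_s^0, the aggregate (1/n) sum_i grad f_{pi_{s-1}^i}(w_{s-1}^i)
         accumulated during the previous epoch (no full gradient is computed).
    Returns w_{s+1} and the aggregate v_{s+1}^0 for the next epoch.
    """
    n = len(grads)
    w_prev, w = w0, w0 - eta * v0                 # w_s^1 = w_s^0 - eta * v_s^0
    v, agg = v0, np.zeros_like(w0, dtype=float)
    for i in rng.permutation(n):                  # permutation pi_s of this epoch
        g_new = grads[i](w)                       # grad f_{pi_s^i}(w_s^i)
        v = v + g_new - grads[i](w_prev)          # recursion (9)
        agg += g_new / n                          # next anchor, as in (8)
        w_prev, w = w, w - eta * v                # update (10)
    return w, agg                                 # (w_{s+1}, v_{s+1}^0)
```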

Lemma 1

Under Assumption 1, for Algorithms 1 and 2 with \(\eta\) from (5) the following holds

$$\begin{aligned} P(w_{s+1})&\le P(w_s) - \frac{\eta (n + 1)}{2} \Vert \nabla P(w_s)\Vert ^2 + \frac{\eta (n + 1)}{2} \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2. \end{aligned}$$

Proof

Using L-smoothness of function P (Assumption 1 (i)), we have

$$\begin{aligned} P(w_{s+1})&\le P(w_s) + \langle \nabla P(w_s), w_{s+1} - w_s \rangle + \frac{L}{2} \Vert w_{s+1} - w_s \Vert ^2 \\&= P(w_s) - \eta (n+1) \left\langle \nabla P(w_s), \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s \right\rangle + \frac{\eta ^2 (n+1)^2 L}{2} \left\| \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2 \\&= P(w_s) - \frac{\eta (n+1)}{2} \left( \Vert \nabla P(w_s)\Vert ^2 + \left\| \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2 - \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2\right) \\&\quad + \frac{\eta ^2 (n+1)^2 L}{2} \left\| \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2 \\&= P(w_s) - \frac{\eta (n+1)}{2} \Vert \nabla P(w_s)\Vert ^2 - \frac{\eta (n+1)}{2} (1 - \eta (n+1) L) \left\| \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2\\&\quad + \frac{\eta (n+1)}{2} \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2. \end{aligned}$$

With \(\eta \le \frac{1}{8nL} \le \frac{1}{(n+1)L}\), we get

$$\begin{aligned} P(w_{s+1})&\le P(w_s) - \frac{\eta (n+1)}{2} \Vert \nabla P(w_s)\Vert ^2 + \frac{\eta (n+1)}{2} \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2. \end{aligned}$$

This completes the proof. \(\square\)

Lemma 2

Under Assumption 1, for Algorithms 1 and 2 the following holds

$$\begin{aligned} \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2 \le 2\Vert \nabla P(w_s) - v_s \Vert ^2 + \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2. \end{aligned}$$

Proof

To begin with, we prove that for any \(k = n, \ldots , 0\), the following holds:

$$\begin{aligned} \sum \limits _{i=k}^{n} v^i_s&= \sum \limits _{i=k+1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) + (n - k +1)v^k_s. \end{aligned}$$
(11)

One can prove it by mathematical induction. For \(k = n\), we have \(\sum _{i=n}^{n} v^i_s = v^n_s\). Suppose (11) holds for \(k\); let us prove it for \(k-1\):

$$\begin{aligned} \sum \limits _{i=k-1}^{n} v^i_s&= v^{k-1}_s + \sum \limits _{i=k}^{n} v^i_s \\&= v^{k-1}_s + \sum \limits _{i=k+1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) + (n - k +1)v^k_s \\&= v^{k-1}_s + \sum \limits _{i=k+1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \\&\quad + (n - k +1) \left[ v^{k-1}_s + \nabla f_{\pi ^{k}_{s}} (w^{k}_{s}) - \nabla f_{\pi ^{k}_{s}} (w^{k-1}_{s})\right] \\&= \sum \limits _{i= k}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) + (n - k + 2) v^{k-1}_s . \end{aligned}$$

Here we additionally used (9). This completes the proof of (11). In particular, (11) with \(k = 0\) gives

$$\begin{aligned}&\Bigg \Vert \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\Bigg \Vert ^2 \\&\quad = \frac{1}{(n+1)^2}\left\| (n+1)\nabla P(w_s) - \sum \limits _{i=0}^{n} v^i_s\right\| ^2 \\&\quad = \frac{1}{(n+1)^2}\Bigg \Vert (n+1)\nabla P(w_s) \\&\qquad - \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) - (n +1)v^0_s\Bigg \Vert ^2. \end{aligned}$$

Using \(\Vert a + b\Vert ^2 \le 2\Vert a\Vert ^2 + 2\Vert b\Vert ^2\), we get

$$\begin{aligned}&\Bigg \Vert \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\Bigg \Vert ^2 \\&\quad \le 2 \Vert \nabla P(w_s) - v^0_s\Vert ^2 \\&\qquad + \frac{2}{(n+1)^2}\left\| \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \right\| ^2. \end{aligned}$$
(12)

Again using mathematical induction, we prove for \(k = n, \ldots , 1\) the following estimate:

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n + 1 - i)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\quad +\frac{n+1}{k+1} \Bigg \Vert (n-k)\nabla f_{\pi ^{k}_s} (w_s) + \nabla f_{\pi ^{k}_s} (w^{k}_s) - (n-k+1)\nabla f_{\pi ^{k}_s} (w^{k-1}_s) \\&\quad + \sum \limits _{i=1}^{k-1} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2. \end{aligned}$$
(13)

For \(k = n\), the statement holds automatically. Suppose (13) holds for \(k\); let us prove it for \(k-1\):

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +\frac{n+1}{k+1} \Bigg \Vert (n-k)\nabla f_{\pi ^{k}_s} (w_s) + \nabla f_{\pi ^{k}_s} (w^{k}_s) - (n-k+1)\nabla f_{\pi ^{k}_s} (w^{k-1}_s)\\&\qquad +\sum \limits _{i=1}^{k-1} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad =2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +\frac{n+1}{k+1} \Bigg \Vert \nabla f_{\pi ^{k}_s} (w^{k}_s) - f_{\pi ^{k}_s} (w_s) + (n-k+1)\nabla f_{\pi ^{k}_s} (w_s) - (n-k+1)\nabla f_{\pi ^{k}_s} (w^{k-1}_s) \\&\qquad + (n-k+2)\cdot \left[ \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) -\nabla f_{\pi ^{k-1}_s} (w^{k-2}_s) \right] \\&\qquad + \sum \limits _{i=1}^{k-2} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad =2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +\frac{n+1}{k+1} \Bigg \Vert \nabla f_{\pi ^{k}_s} (w^{k}_s) - f_{\pi ^{k}_s} (w_s) \\&\qquad + (n-k+1) \cdot \left[ \nabla f_{\pi ^{k}_s} (w_s) - \nabla f_{\pi ^{k-1}_s} (w_s) - \nabla f_{\pi ^{k}_s} (w^{k-1}_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) \right] \\&\qquad + (n-k+1)\nabla f_{\pi ^{k-1}_s} (w_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) -(n-k+2) \nabla f_{\pi ^{k-1}_s} (w^{k-2}_s) \\&\qquad + \sum \limits _{i=1}^{k-2} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2. \end{aligned}$$

Using \(\Vert a + b\Vert ^2 \le (1 + c)\Vert a\Vert ^2 + (1+ 1/c)\Vert b\Vert ^2\) with \(c = k\), we have

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +(n+1) \Big \Vert \nabla f_{\pi ^{k}_s} (w^{k}_s) - f_{\pi ^{k}_s} (w_s) \\&\qquad + (n-k+1) \cdot \left[ \nabla f_{\pi ^{k}_s} (w_s) - \nabla f_{\pi ^{k-1}_s} (w_s) - \nabla f_{\pi ^{k}_s} (w^{k-1}_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) \right] \Big \Vert ^2 \\&\qquad + \frac{n+1}{k} \Bigg \Vert (n-k+1)\nabla f_{\pi ^{k-1}_s} (w_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) -(n-k+2) \nabla f_{\pi ^{k-1}_s} (w^{k-2}_s) \\&\qquad + \sum \limits _{i=1}^{k-2} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2. \end{aligned}$$

With \(\Vert a + b\Vert ^2 \le 2\Vert a\Vert ^2 + 2\Vert b\Vert ^2\) and Assumption 1 (\(\delta\)-similarity (7) and L-smoothness), one can obtain

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=k+1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k+1}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +2(n+1) \Vert \nabla f_{\pi ^{k}_s} (w^{k}_s) - f_{\pi ^{k}_s} (w_s) \Vert ^2 \\&\qquad + 2 (n+1) (n-k+1)^2 \Vert \nabla f_{\pi ^{k}_s} (w_s) - \nabla f_{\pi ^{k-1}_s} (w_s) - \nabla f_{\pi ^{k}_s} (w^{k-1}_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) \Vert ^2 \\&\qquad + \frac{n+1}{k} \Bigg \Vert (n-k+1)\nabla f_{\pi ^{k-1}_s} (w_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) -(n-k+2) \nabla f_{\pi ^{k-1}_s} (w^{k-2}_s) \\&\qquad + \sum \limits _{i=1}^{k-2} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=k}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=k}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad + \frac{n+1}{k} \Bigg \Vert (n-k+1)\nabla f_{\pi ^{k-1}_s} (w_s) + \nabla f_{\pi ^{k-1}_s} (w^{k-1}_s) -(n-k+2) \nabla f_{\pi ^{k-1}_s} (w^{k-2}_s) \\&\qquad + \sum \limits _{i=1}^{k-2} \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2. \end{aligned}$$

This completes the proof of (13). In particular, (13) with \(k = 1\) gives

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=2}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=2}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +\frac{n+1}{2} \Vert (n-1)\nabla f_{\pi ^{1}_s} (w_s) + \nabla f_{\pi ^{1}_s} (w^{1}_s) - n\nabla f_{\pi ^{1}_s} (w^{0}_s)\Vert ^2. \end{aligned}$$

With (8) and L-smoothness of function \(f_{\pi ^{1}_s}\) (Assumption 1 (i)), we have

$$\begin{aligned}&\Bigg \Vert \sum \limits _{i=1}^n \left( (n+1-i)\cdot \left[ \nabla f_{\pi ^{i}_s} (w^{i}_s) -\nabla f_{\pi ^{i}_s} (w^{i-1}_s) \right] \right) \Bigg \Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=2}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=2}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\qquad +\frac{n+1}{2} \Vert \nabla f_{\pi ^{1}_s} (w^{1}_s) - \nabla f_{\pi ^{1}_s} (w_s)\Vert ^2 \\&\quad \le 2 L^2 (n+1) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\big \Vert ^2 + 2 \delta ^2 (n+1) \sum \limits _{i=2}^n (n - i + 1)^2 \Vert w_s - w^{i-1}_s \Vert ^2 \\&\quad \le \left( 2 L^2 (n+1) + 2 \delta ^2 (n+1) n^2 \right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\big \Vert ^2. \end{aligned}$$
(14)

Substituting (14) into (12) completes the proof. \(\square\)

Lemma 3

Under Assumption 1, for Algorithms 1 and 2 with \(\eta\) from (5) the following holds for \(i \in [n]\)

$$\begin{aligned} \Vert v^i_s\Vert ^2 \le \Vert v^{i-1}_s\Vert ^2. \end{aligned}$$

Proof

With (9) and (10), we have

$$\begin{aligned} \Vert v^{i}_s \Vert ^2&= \Vert v^{i-1}_s + \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 \\&= \Vert v^{i-1}_s \Vert ^2 + \Vert \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 + 2\langle v^{i-1}_s, \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s})\rangle \\&= \Vert v^{i-1}_s \Vert ^2 + \Vert \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 + \frac{2}{\eta }\langle w^{i-1}_s - w^i_s, \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s})\rangle . \end{aligned}$$

Assumption 1 (i) on convexity and L-smoothness of \(f_{\pi ^{i}_{s}}\) gives (see also Theorem 2.1.5 from [30])

$$\begin{aligned} \Vert v^{i}_s \Vert ^2&= \Vert v^{i-1}_s + \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 \\&= \Vert v^{i-1}_s \Vert ^2 + \Vert \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 + 2\langle v^{i-1}_s, \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s})\rangle \\&= \Vert v^{i-1}_s \Vert ^2 + \Vert \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2 + \frac{2}{\eta }\langle w^{i-1}_s - w^i_s, \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s})\rangle \\&\le \Vert v^{i-1}_s \Vert ^2 + \left( 1 - \frac{2}{L \eta } \right) \Vert \nabla f_{\pi ^{i}_{s}} (w^{i}_{s}) - \nabla f_{\pi ^{i}_{s}} (w^{i-1}_{s}) \Vert ^2. \end{aligned}$$

Taking into account that \(\eta \le \frac{1}{8nL} \le \frac{1}{2L}\), we finish the proof. \(\square\)
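As an illustration only (this is not part of the paper), the monotonicity \(\Vert v^i_s\Vert \le \Vert v^{i-1}_s\Vert\) can be checked numerically on a synthetic least-squares problem with convex, smooth components; the instance, the random seed, and the oracle interface below are arbitrary choices.

```python
import numpy as np

# Synthetic instance: f_i(w) = 0.5 * (a_i^T w - b_i)^2.  Each f_i is convex and
# L_i-smooth with L_i = ||a_i||^2, so Assumption 1 (i) holds with L = max_i L_i.
rng = np.random.default_rng(0)
n, d = 20, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
grads = [lambda w, a=A[i], bi=b[i]: a * (a @ w - bi) for i in range(n)]
L = float(np.max(np.sum(A**2, axis=1)))
eta = 1.0 / (8 * n * L)                      # eta <= 1/(8nL) <= 1/(2L), as used above

w_prev = rng.standard_normal(d)
v = sum(g(w_prev) for g in grads) / n        # anchor v_s^0 (here: a full gradient)
w = w_prev - eta * v
norms = [np.linalg.norm(v)]
for i in rng.permutation(n):
    v = v + grads[i](w) - grads[i](w_prev)   # recursion (9)
    w_prev, w = w, w - eta * v               # update (10)
    norms.append(np.linalg.norm(v))

# Lemma 3 predicts that the norms are non-increasing along the epoch.
assert all(norms[k + 1] <= norms[k] + 1e-12 for k in range(n))
print(norms[0], norms[-1])
```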

Proof of Theorem 2

For RR-SARAH, \(v_s = \nabla P(w_s)\); then, by Lemma 2, we get

$$\begin{aligned} \left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2&\le \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2. \end{aligned}$$

Combining with Lemma 1, one can obtain

$$\begin{aligned} P(w_{s+1})&\le P(w_s) - \frac{\eta (n+1)}{2} \Vert \nabla P(w_s)\Vert ^2 + \frac{\eta (n+1)}{2} \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2. \end{aligned}$$

Next, we work with \(\sum \nolimits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2\). By Lemma 3 and the update for \(w^i_s\) ((8) and (10)), we get

$$\begin{aligned} \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2&= \eta ^2 \sum \limits _{i=1}^n \left\| \sum \limits _{k=0}^{i-1} v^k_s \right\| ^2 \le \eta ^2 \sum \limits _{i=1}^n i \sum \limits _{k=0}^{i-1} \left\| v^k_s \right\| ^2 \le \eta ^2 \sum \limits _{i=1}^n i \sum \limits _{k=0}^{i-1} \left\| v_s \right\| ^2 \\&\le \eta ^2 \left\| v_s \right\| ^2 \sum \limits _{i=1}^n i \sum \limits _{k=0}^{i-1} 1 \le \eta ^2 n^3 \left\| v_s \right\| ^2 = \eta ^2 n^3 \left\| \nabla P(w_s) \right\| ^2. \end{aligned}$$
(15)

Hence,

$$\begin{aligned} P(w_{s+1})&\le P(w_s) - \frac{\eta (n+1)}{2} \Vert \nabla P(w_s)\Vert ^2 + \frac{\eta (n+1)}{2} \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \cdot \eta ^2 n^3 \left\| \nabla P(w_s)\right\| ^2 \\&\le P(w_s) - \frac{\eta (n+1)}{2} \left( 1 - \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \cdot \eta ^2 n^3\right) \Vert \nabla P(w_s)\Vert ^2. \end{aligned}$$

With \(\eta \le \min \left\{ \frac{1}{8n L}; \frac{1}{8n^{2} \delta } \right\}\), we get

$$\begin{aligned} P(w_{s+1}) - P^*&\le P(w_s) - P^* - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2. \end{aligned}$$

Strong convexity of P ends the proof:

$$\begin{aligned} P(w_{s+1}) - P^*&\le \left( 1 - \frac{\eta (n+1) \mu }{2}\right) \left( P(w_s) - P^*\right) . \end{aligned}$$

\(\square\)

Proof of Theorem 1

For Shuffled-SARAH, \(v_s = \frac{1}{n} \sum _{i=1}^{n} \nabla f_{\pi ^{i}_{s-1}} (w^{i}_{s-1})\); then

$$\begin{aligned}&\left\| \nabla P(w_s) - \frac{1}{n+1} \sum \limits _{i=0}^{n} v^i_s\right\| ^2 \\&\quad \le \left( \frac{4L^2}{n+1} + 4 \delta ^2 n\right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2 + 2 \left\| \frac{1}{n} \sum \limits _{i=1}^{n} \left[ \nabla f_{\pi ^{i}_{s-1}}(w_s) - \nabla f_{\pi ^{i}_{s-1}} (w^{i}_{s-1}) \right] \right\| ^2 \\&\quad \le \left( \frac{4L^2}{n+1} + 4\delta ^2 n \right) \sum \limits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2 + \frac{2L^2}{n}\sum \limits _{i=1}^n \left\| w^{i}_{s-1} - w_s\right\| ^2. \end{aligned}$$
(16)

The term \(\sum \nolimits _{i=1}^n \Vert w^{i}_s - w_s\Vert ^2\) can be handled in the same way as in the proof of Theorem 2. It remains to deal with \(\sum _{i=1}^n \left\| w^{i}_{s-1} - w_s\right\| ^2\). Using Lemma 3 and the update for \(w^i_s\) ((8) and (10)), we get

$$\begin{aligned} \sum \limits _{i=1}^n \Vert w^{i}_{s-1} - w_s\Vert ^2&= \eta ^2 \sum \limits _{i=1}^{n} \left\| \sum \limits _{k=1}^{n+ 1 -i} v^{n + 1-k}_{s-1} \right\| ^2 \le \eta ^2 \sum \limits _{i=1}^n (n+1 -i) \sum \limits _{k=1}^{n+ 1 -i} \left\| v^{n+1-k}_{s-1} \right\| ^2 \\&\le \eta ^2 \sum \limits _{i=1}^n (n+1 -i)\sum \limits _{k=1}^{n+ 1 -i} \left\| v_{s-1} \right\| ^2 \\&\le \eta ^2 \left\| v_{s-1} \right\| ^2 \sum \limits _{i=1}^n (n+1 -i) \sum \limits _{k=1}^{n+1 -i} 1 \\&\le \eta ^2 n^3 \left\| v_{s-1} \right\| ^2. \end{aligned}$$
(17)

Combining the results of Lemma 1 with (16), (15) and (17), one can obtain

$$\begin{aligned} P(w_{s+1})&\le P(w_s) - \frac{\eta (n+1)}{2} \Vert \nabla P(w_s)\Vert ^2 \\&\quad + \frac{\eta (n+1)}{2} \left[ \left( \frac{4L^2}{n+1} + 4\delta ^2 n \right) \cdot \eta ^2 n^3 \left\| v_s\right\| ^2 + \frac{2L^2}{n}\cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2\right] \\&= P(w_s) - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 \\&\quad + \frac{\eta (n+1)}{2} \left[ \left( \frac{4L^2}{n+1} + 4\delta ^2 n \right) \cdot \eta ^2 n^3 \left\| v_s\right\| ^2 + \frac{2L^2}{n}\cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2\right] \\&\quad - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 \\&\le P(w_s) - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 \\&\quad + \frac{\eta (n+1)}{2} \left[ \left( \frac{4L^2}{n+1} + 4\delta ^2 n \right) \cdot \eta ^2 n^3 \left\| v_s\right\| ^2 + \frac{2L^2}{n}\cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2\right] \\&\quad - \frac{\eta (n+1)}{8} \Vert v_s \Vert ^2 + \frac{\eta (n+1)}{4} \Vert v_s - \nabla P(w_s)\Vert ^2 \\&\le P(w_s) - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 \\&\quad + \frac{\eta (n+1)}{2} \left[ \left( \frac{4L^2}{n+1} + 4\delta ^2 n \right) \cdot \eta ^2 n^3 \left\| v_s\right\| ^2 + \frac{2L^2}{n}\cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2\right] \\&\quad - \frac{\eta (n+1)}{8} \Vert v_s \Vert ^2 + \frac{\eta (n+1)}{4} \cdot \frac{2L^2}{n} \cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2. \end{aligned}$$

The last step is deduced in the same way as (17). A small rearrangement gives

$$\begin{aligned} P(w_{s+1}) - P^*&\le P(w_s) - P^* - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 \\&\quad - \frac{\eta (n+1)}{8} \left( 1 - \left( \frac{16L^2}{n+1} + 16\delta ^2 n \right) \cdot \eta ^2 n^3 \right) \Vert v_s \Vert ^2 \\&\quad + \eta (n+1)\cdot \frac{2L^2}{n} \cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2. \end{aligned}$$

With the choice of \(\eta \le \min \left\{ \frac{1}{8n L}; \frac{1}{8n^{2} \delta } \right\}\), we have

$$\begin{aligned}&P(w_{s+1}) - P^* + \frac{\eta (n+1)}{16}\Vert v_s \Vert ^2 \\&\quad \le P(w_s) - P^* - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 + \frac{\eta (n+1)}{16}\cdot \frac{32L^2}{n} \cdot \eta ^2 n^3 \left\| v_{s-1} \right\| ^2. \end{aligned}$$

Again using that \(\eta \le \frac{1}{8nL}\), we obtain \(32\,L^2 \eta ^2 n^2 \le \left( 1 - \frac{\eta (n+1) \mu }{2}\right)\) and

$$\begin{aligned}&P(w_{s+1}) - P^* + \frac{\eta (n+1)}{16}\Vert v_s \Vert ^2 \\&\quad \le P(w_s) - P^* - \frac{\eta (n+1)}{4} \Vert \nabla P(w_s)\Vert ^2 + \left( 1 - \frac{\eta (n+1) \mu }{2}\right) \cdot \frac{\eta (n+1)}{16}\left\| v_{s-1} \right\| ^2. \end{aligned}$$

Strong convexity of P ends the proof. \(\square\)

About this article

Cite this article

Beznosikov, A., Takáč, M. Random-reshuffled SARAH does not need full gradient computations. Optim Lett 18, 727–749 (2024). https://doi.org/10.1007/s11590-023-02081-x
