1 Introduction

We consider the optimization problem

$$\begin{aligned}&\min _{(x, z)\in {\mathbb {R}}^n\times {\mathbb {R}}^m} f(x)+g(z),\nonumber \\&\quad \textrm{s}.\,\textrm{t}.\ Ax+Bz=b, \end{aligned}$$
(1)

where \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\cup \{\infty \}\) and \(g: {\mathbb {R}}^m\rightarrow {\mathbb {R}}\cup \{\infty \}\) are closed proper convex functions, \(0\ne A\in {\mathbb {R}}^{r\times n}\), \(0\ne B\in {\mathbb {R}}^{r\times m}\) and \(b\in {\mathbb {R}}^{r}\). We assume that \((x^\star , z^\star )\) is an optimal solution of problem (1) and that \(\lambda ^\star \) is a corresponding vector of Lagrange multipliers. Moreover, we denote the values of f and g at \(x^\star \) and \(z^\star \) by \(f^\star \) and \(g^\star \), respectively.

Problem (1) appears naturally (or after variable splitting) in many applications in statistics, machine learning and image processing, to name but a few [9, 23, 29, 42]. The most common method for solving problem (1) is the alternating direction method of multipliers (ADMM). ADMM is a dual-based approach that exploits the separable structure of the problem; it may be described as follows.

Algorithm 1

ADMM

Given a parameter \(t>0\) and starting points \(z^0\in {\mathbb {R}}^m\) and \(\lambda ^0\in {\mathbb {R}}^r\), perform the following steps for \(k=1, 2, \ldots \):

  1. \(x^k\in {{\,\textrm{argmin}\,}}_{x\in {\mathbb {R}}^n} \ f(x)+\left\langle \lambda ^{k-1}, Ax\right\rangle +\tfrac{t}{2}\left\| Ax+Bz^{k-1}-b\right\| ^2\);

  2. \(z^k\in {{\,\textrm{argmin}\,}}_{z\in {\mathbb {R}}^m} \ g(z)+\left\langle \lambda ^{k-1}, Bz\right\rangle +\tfrac{t}{2}\left\| Ax^k+Bz-b\right\| ^2\);

  3. \(\lambda ^k=\lambda ^{k-1}+t\left( Ax^k+Bz^k-b\right) \).
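To make the iteration concrete, the following small sketch (not taken from the paper) runs Algorithm 1 on a randomly generated instance of problem (1) with quadratic f and g, so that both subproblems reduce to linear systems; the data P, p, A, B, b and the value of t are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 4, 3
P = rng.standard_normal((6, n)); p = rng.standard_normal(6)   # f(x) = 0.5*||P x - p||^2
A = rng.standard_normal((r, n)); B = rng.standard_normal((r, m))
b = rng.standard_normal(r)
t = 1.0                                                       # fixed positive parameter of Algorithm 1

z, lam = np.zeros(m), np.zeros(r)                             # z^0, lambda^0
for k in range(200):
    # step 1: x^k = argmin_x f(x) + <lambda^{k-1}, Ax> + (t/2)||Ax + B z^{k-1} - b||^2
    x = np.linalg.solve(P.T @ P + t * A.T @ A,
                        P.T @ p - A.T @ lam - t * A.T @ (B @ z - b))
    # step 2: z^k = argmin_z g(z) + <lambda^{k-1}, Bz> + (t/2)||A x^k + Bz - b||^2, with g(z) = 0.5*||z||^2
    z_prev = z
    z = np.linalg.solve(np.eye(m) + t * B.T @ B,
                        -B.T @ lam - t * B.T @ (A @ x - b))
    # step 3: lambda^k = lambda^{k-1} + t (A x^k + B z^k - b)
    lam = lam + t * (A @ x + B @ z - b)

print("primal residual ||Ax + Bz - b||        :", np.linalg.norm(A @ x + B @ z - b))
print("dual residual ||A^T B (z^{k-1} - z^k)||:", np.linalg.norm(A.T @ B @ (z_prev - z)))
```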

ADMM was first proposed in [14, 16] for solving nonlinear variational problems. We refer the interested reader to [17] for a historical review of ADMM. The popularity of ADMM is due to the fact that it can be implemented in parallel and hence can handle large-scale problems [9, 22, 34, 45]. For example, it is used for solving inverse problems governed by partial differential equation forward models [32] and for distributed energy resource coordination [30], to mention but a few.

The convergence of ADMM has been investigated extensively in the literature and there exist many convergence results. However, different performance measures have been used to compute convergence rates; see [13, 18, 19, 24, 28, 29, 35, 44]. In this paper, we consider the dual objective value as a performance measure.

Throughout the paper, we assume that each subproblem in steps 1 and 2 of Algorithm 1 attains its minimum. The Lagrangian function of problem (1) may be written as

$$\begin{aligned} L(x, z, \lambda )= f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle , \end{aligned}$$
(2)

and the dual objective of problem (1) is also defined as

$$\begin{aligned} D(\lambda )=\min _{(x, z)\in {\mathbb {R}}^n \times {\mathbb {R}}^m} f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle . \end{aligned}$$

We assume throughout the paper that strong duality holds for problem (1), that is

$$\begin{aligned} \max _{\lambda \in {\mathbb {R}}^r} D(\lambda )=\min _{Ax+Bz=b} f(x)+g(z). \end{aligned}$$

Note that we have strong duality when both functions f and g are real-valued. For extended convex functions, strong duality holds under some mild conditions; see e.g. [4, Chapter 15].

Some common performance measures for the analysis of ADMM are as follows,

  • Objective value: \(\left| f(x^N)+g(z^N)-f^\star -g^\star \right| \);

  • Primal and dual feasibility: \(\left\| Ax^N+Bz^N-b\right\| \) and \(\left\| A^TB(z^N-z^{N-1})\right\| \);

  • Dual objective value: \(D(\lambda ^\star )-D(\lambda ^N)\);

  • Distance between \((x^N, z^N, \lambda ^N)\) and a saddle point of problem (2).

Note that the mathematical expressions are written in a non-ergodic sense for convenience. Each measure is useful in monitoring the progress and convergence of ADMM. The objective value is the most commonly used performance measure for the analysis of algorithms in convex optimization [4, 5, 37]. As mentioned earlier, ADMM is a dual-based method and it may be interpreted as a proximal method applied to the dual problem; see [5, 29] for further discussion and insights. Thus, a natural performance measure for ADMM would be the dual objective value. In this study, we investigate the convergence rate of ADMM in terms of the dual objective value and feasibility. It is worth noting that most performance measures may be analyzed through the framework developed in Sect. 2.

Regarding dual objective value, the following convergence rate is known in the literature. This theorem holds for strongly convex functions f and g; recall that f is called strongly convex with modulus \(\mu \ge 0\) if the function \(f-\tfrac{\mu }{2}\Vert \cdot \Vert ^2\) is convex.

Theorem 1

[19, Theorem 1] Let f and g be strongly convex with moduli \(\mu _1>0\) and \(\mu _2>0\), respectively. If \(t\le \root 3 \of {\frac{\mu _1\mu _2^2}{\lambda _{\max } (A^TA)\lambda _{\max }^2 (B^TB)}}\), then

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)\le \frac{\Vert \lambda ^1 -\lambda ^\star \Vert ^2}{2t(N-1)}. \end{aligned}$$
(3)

In this study we establish that Algorithm 1 has a convergence rate of \(O(\tfrac{1}{N})\) in terms of the dual objective value without assuming strong convexity of g. Under this setting, we also prove that Algorithm 1 has a convergence rate of \(O(\tfrac{1}{N})\) in terms of the primal and dual residuals. Moreover, we show that the given bounds are exact. Furthermore, we study linear and R-linear convergence.

1.1 Outline of our paper

Our paper is structured as follows. We present the semidefinite programming (SDP) performance estimation method in Sect. 2, and we develop performance estimation to handle dual-based methods including ADMM. In Sect. 3, we derive some new non-asymptotic convergence rates for ADMM in terms of the dual function and the primal and dual residuals by using performance estimation. Furthermore, we show that the given bounds are tight by providing some examples. In Sect. 4 we proceed with the study of the linear convergence of ADMM. We establish that, when the objective function is strongly convex, ADMM enjoys linear convergence if and only if the dual function satisfies the PŁ inequality. Furthermore, we investigate the relation between the PŁ inequality and common conditions used in the literature to prove linear convergence. Section 5 is devoted to R-linear convergence. We prove that ADMM is R-linearly convergent under two new scenarios which are weaker than the existing ones in the literature.

1.2 Terminology and notation

In this subsection we review some definitions and concepts from convex analysis. The interested reader is referred to the classical text by Rockafellar [41] for more information. The n-dimensional Euclidean space is denoted by \({\mathbb {R}}^n\). We use \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert \) to denote the Euclidean inner product and norm, respectively. The column vector \(e_i\) represents the i-th standard unit vector and I stands for the identity matrix. For a matrix A, \(A_{i, j}\) denotes its (i, j)-th entry, and \(A^T\) represents the transpose of A. The notation \(A\succeq 0\) means the matrix A is symmetric positive semidefinite. We use \(\lambda _{\max } (A)\) and \(\lambda _{\min } (A)\) to denote the largest and the smallest eigenvalue of a symmetric matrix A, respectively. Moreover, the seminorm \(\Vert \cdot \Vert _A\) is defined as \(\Vert x\Vert _A=\Vert Ax\Vert \) for any \(A\in {\mathbb {R}}^{m\times n}\); see [26, Section 5.2] for more discussion.

Suppose that \(f:{\mathbb {R}}^n\rightarrow (-\infty , \infty ]\) is an extended convex function. The function f is called closed if its epigraph \(\{(x, r): f(x)\le r\}\) is a closed subset of \({\mathbb {R}}^{n+1}\). The function f is said to be proper if there exists \(x\in {\mathbb {R}}^n\) with \(f(x)<\infty \). We denote the set of proper closed convex functions on \({\mathbb {R}}^n\) by \({\mathcal {F}}_{0}({\mathbb {R}}^n)\). The subdifferential of f at x, that is, the set of subgradients of f at x, is denoted and defined as

$$\begin{aligned} \partial f(x)=\{\xi : f(y)\ge f(x)+\langle \xi , y-x\rangle , \forall y\in {\mathbb {R}}^n\}. \end{aligned}$$

We call a differentiable function f L-smooth if

$$\begin{aligned} \left\| \nabla f(x_1)-\nabla f(x_2) \right\| \le L\Vert x_1-x_2\Vert \ \ \forall x_1, x_2\in {\mathbb {R}}^n. \end{aligned}$$

Definition 1

Let \(f:{\mathbb {R}}^n\rightarrow (-\infty , \infty ]\) be a closed proper function and let \(A\in {\mathbb {R}}^{m\times n}\). We say f is c-strongly convex relative to \(\Vert .\Vert _A\) if the function \(f-\tfrac{c}{2} \Vert . \Vert _A^2\) is convex.

In the rest of this section, we assume that \(A\in {\mathbb {R}}^{m\times n}\). It is seen that any \(\mu \)-strongly convex function is \(\tfrac{\mu }{\lambda _{\max }(A^TA)}\)-strongly convex relative to \(\Vert .\Vert _A\). However, the converse does not necessarily hold unless A has full column rank. Hence, the assumption of strong convexity relative to \(\Vert .\Vert _A\) for a given matrix A is weaker than the assumption of strong convexity. For further details on strong convexity relative to a given function, we refer the reader to [3, 33]. We denote the set of c-strongly convex functions relative to \(\Vert .\Vert _A\) on \({\mathbb {R}}^n\) by \({\mathcal {F}}_{c}^A({\mathbb {R}}^n)\). We denote the distance function to a set X by \(d_X(x):=\inf _{y\in X}\Vert y-x\Vert \).
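For example, the function \(f(x)=\tfrac{1}{2}\Vert Ax\Vert ^2\) is 1-strongly convex relative to \(\Vert .\Vert _A\), since \(f-\tfrac{1}{2}\Vert \cdot \Vert _A^2\equiv 0\) is convex, but it is not strongly convex in the usual sense whenever A has a nontrivial null space.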

In the following sections we derive some new convergence rates for ADMM by using performance estimation. The main idea of performance estimation is based on interpolability. Let \({\mathcal {I}}\) be an index set and let \(\{(x^i; \xi ^i; f^i)\}_{i\in {\mathcal {I}}}\subseteq {\mathbb {R}}^n\times {\mathbb {R}}^n\times {\mathbb {R}}\). The set \(\{(x^i; \xi ^i; f^i)\}_{i\in {\mathcal {I}}}\) is called \({\mathcal {F}}^A_{c}\)-interpolable if there exists \(f\in {\mathcal {F}}^A_{c}({\mathbb {R}}^n)\) with

$$\begin{aligned} f(x^i)=f^i, \ \xi ^i\in \partial f(x^i) \ \ i\in {\mathcal {I}}. \end{aligned}$$

The next theorem gives necessary and sufficient conditions for \({\mathcal {F}}_{c}^A\)-interpolability.

Theorem 2

Let \(c\in [0, \infty )\) and let \({\mathcal {I}}\) be an index set. The set \(\{(x^i; \xi ^i; f^i)\}_{i\in {\mathcal {I}}}\subseteq {\mathbb {R}}^n\times {\mathbb {R}}^n \times {\mathbb {R}}\) is \({\mathcal {F}}^A_{c}\)-interpolable if and only if for any \(i, j\in {\mathcal {I}}\), we have

$$\begin{aligned} \tfrac{c}{2}\left\| x^i-x^j\right\| _A^2\le f^i-f^j -\left\langle \xi ^j, x^i-x^j\right\rangle . \end{aligned}$$
(4)

Moreover, \(\{(x^i; \xi ^i; f^i)\}_{i\in {\mathcal {I}}}\) is \({\mathcal {F}}_{0}\)-interpolable and L-smooth if and only if for any \(i, j\in {\mathcal {I}}\), we have

$$\begin{aligned} \tfrac{1}{2L}\left\| \xi ^i-\xi ^j\right\| ^2\le f^i-f^j -\left\langle \xi ^j, x^i-x^j\right\rangle . \end{aligned}$$
(5)

Proof

The argument is analogous to that of [46, Theorem 4]. The set \(\{(x^i; \xi ^i; f^i)\}_{i\in {\mathcal {I}}}\) is \({\mathcal {F}}^A_{c}\)-interpolable if and only if the set \(\{(x^i; \xi ^i-cA^TAx^i; f^i-\tfrac{c}{2}\Vert x^i\Vert _A^2)\}_{i\in {\mathcal {I}}}\) is \({\mathcal {F}}_{0}\)-interpolable. By [46, Theorem 1], \(\{(x^i; \xi ^i-cA^TAx^i; f^i-\tfrac{c}{2}\Vert x^i\Vert _A^2)\}_{i\in {\mathcal {I}}}\) is \({\mathcal {F}}_{0}\)-interpolable if and only if

$$\begin{aligned} f^i-\tfrac{c}{2}\left\| x^i \right\| _A^2\ge f^j-\tfrac{c}{2}\left\| x^j \right\| _A^2 +\left\langle \xi ^j-cA^TAx^j, x^i-x^j\right\rangle , \end{aligned}$$

which is equivalent to inequality (4). The second part follows directly from [46, Theorem 4]. \(\square \)

Note that any convex function is 0-strongly convex relative to \(\Vert .\Vert _A\). Let \(f\in {\mathcal {F}}_{0}({\mathbb {R}}^n)\). The conjugate function \(f^*:{\mathbb {R}}^n\rightarrow (-\infty , \infty ]\) is defined as \(f^*(y)=\sup _{x\in {\mathbb {R}}^n} \langle y, x\rangle -f(x)\). We have the following equivalence

$$\begin{aligned} \xi \in \partial f(x) \ \ \Leftrightarrow \ \ x\in \partial f^*(\xi ). \end{aligned}$$
(6)

Let \(f\in {\mathcal {F}}_{0}({\mathbb {R}}^n)\). The function f is \(\mu \)-strongly convex if and only if \(f^*\) is \(\tfrac{1}{\mu }\)-smooth. Moreover, \((f^*)^*=f\).
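For instance, for \(f(x)=\tfrac{\mu }{2}\Vert x\Vert ^2\) with \(\mu >0\) we have \(f^*(y)=\tfrac{1}{2\mu }\Vert y\Vert ^2\), which is \(\tfrac{1}{\mu }\)-smooth.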

By using conjugate functions, the dual of problem (1) may be written as

$$\begin{aligned} D(\lambda )&=\min _{(x, z)\in {\mathbb {R}}^n \times {\mathbb {R}}^m} f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle \nonumber \\&=-\langle \lambda , b\rangle -f^*(-A^T\lambda )-g^*(-B^T\lambda ). \end{aligned}$$
(7)
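Here the second equality follows from \(\min _{x} f(x)+\langle \lambda , Ax\rangle =-\sup _{x}\left\{ \langle -A^T\lambda , x\rangle -f(x)\right\} =-f^*(-A^T\lambda )\), together with the analogous identity for g.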

By the optimality conditions for the dual problem, we get

$$\begin{aligned} b-Ax^\star -Bz^\star =0, \end{aligned}$$
(8)

for some \( x^\star \in \partial f^*(-A^T\lambda ^\star )\) and \(z^\star \in \partial g^*(-B^T\lambda ^\star )\). Equation (8) with (6) imply that \(( x^\star , z^\star )\) is an optimal solution to problem (1).

The optimality conditions for the subproblems of Algorithm 1 may be written as

$$\begin{aligned}&0\in \partial f(x^k)+A^T\lambda ^{k-1} +tA^T\left( Ax^k+Bz^{k-1}-b\right) ,\nonumber \\&0\in \partial g(z^k)+B^T\lambda ^{k-1}+tB^T\left( Ax^k+Bz^{k}-b\right) . \end{aligned}$$
(9)

As \(\lambda ^k=\lambda ^{k-1}+t(Ax^k+Bz^k-b)\), we get

$$\begin{aligned} 0\in \partial f(x^k)+A^T\lambda ^k+tA^TB\left( z^{k-1}-z^k\right) , \ \ 0\in \partial g(z^k)+B^T\lambda ^k. \end{aligned}$$
(10)

So, \((x^k, z^k)\) is optimal for the dual objective at \(\lambda ^k\) if and only if \(A^TB\left( z^{k-1}-z^k\right) =0\). We call \(A^TB\left( z^{k-1}-z^k\right) \) the dual residual.

2 Performance estimation

In this section, we develop performance estimation for ADMM. The performance estimation method, introduced by Drori and Teboulle [12], is an SDP-based method for the analysis of first-order methods. Since then, many scholars have employed this powerful tool to derive the worst-case convergence rate of different iterative methods; see [2, 27, 43, 46] and the references therein. Moreover, Gu and Yang [20] employed performance estimation to study the extension of the dual step length for ADMM. Note that, while there are some similarities between our work and [20] in the use of performance estimation, the formulations and results are different.

The worst-case convergence rate of Algorithm 1 with respect to dual objective value may be cast as the following abstract optimization problem,

$$\begin{aligned}&\max D(\lambda ^\star )-D(\lambda ^N)\nonumber \\&\ \textrm{s}.\,\textrm{t}.\{x^k, z^k, \lambda ^k\}_1^N \text {is generated by Algorithm}~1 \text {w.r.t.} \ f, g, A, B, b, \lambda ^0, z^0, t \nonumber \\&\qquad (x^\star , z^\star ) \text {is an optimal solution with Lagrangian multipliers } \lambda ^\star \nonumber \\&\qquad \Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2\left\| z^0 -z^\star \right\| _B^2=\varDelta \nonumber \\&\qquad f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n), g \in {\mathcal {F}}^B_{c_2}({\mathbb {R}}^m)\nonumber \\&\qquad \lambda ^0\in {\mathbb {R}}^r, z^0\in {\mathbb {R}}^m, A\in {\mathbb {R}}^{r\times n}, B\in {\mathbb {R}}^{r\times m},b \in {\mathbb {R}}^{r}, \end{aligned}$$
(11)

where \(f, g, A, B, b, z^0, \lambda ^0, x^\star , z^\star , \lambda ^\star \) are decision variables and \(N, t, c_1, c_2, \varDelta \) are the given parameters. Note that problem (11) will be unbounded unless we impose some initial condition. We regard boundedness of \(\Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2\left\| z^0-z^\star \right\| _B^2\) as an initial condition. The boundedness of \(t^{-1}\Vert \lambda ^0-\lambda ^\star \Vert ^2+t\left\| z^0-z^\star \right\| _B^2\) is commonly used for the convergence analysis of ADMM; see e.g. [9, 29]. We opt to use a positive multiple of this criterion for notational convenience, as t is a fixed positive constant in Algorithm 1. Moreover, we use this measure to establish R-linear convergence in terms of the dual objective; see Sect. 5 for more discussion.

Note that \(D(\lambda ^\star )=f^\star +g^\star \) and \((\tilde{x}, \tilde{z})\in {{\,\textrm{argmin}\,}}f(x)+g(z)+\langle \lambda ^N, Ax+Bz-b\rangle \) if and only if

$$\begin{aligned} \tilde{\xi }+A^T\lambda ^N=0, \ \ \ \tilde{\eta }+B^T\lambda ^N=0, \end{aligned}$$
(12)

for some \(\tilde{\xi }\in \partial f(\tilde{x})\) and \(\tilde{\eta }\in \partial g(\tilde{z})\). It is worth noting that a point \(\tilde{x}\) satisfying these conditions exists, since the function f is strongly convex relative to \(\Vert .\Vert _A\). In addition, one may consider \(\tilde{z}=z^N\) by virtue of (10). For the sake of notational convenience, we introduce \(x^{N+1}=\tilde{x}\) and \(\xi ^{N+1}=\tilde{\xi }\). The reader should bear in mind that \(x^{N+1}\) is not generated by Algorithm 1. Therefore,

$$\begin{aligned} D(\lambda ^N)=f(x^{N+1})+g(z^N)+\left\langle \lambda ^N, Ax^{N+1} +Bz^N-b\right\rangle \end{aligned}$$

for some \(x^{N+1}\) with \(-A^T\lambda ^N\in \partial f(x^{N+1})\).

By using Theorem 2 to replace the conditions \(f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n)\), and \(g\in {\mathcal {F}}^B_{c_2}({\mathbb {R}}^m)\) by finite interpolation conditions, and by using the optimality conditions (9), problem (11) may be reformulated as a finite dimensional optimization problem, through the performance estimation technique:

$$\begin{aligned}&\max \ f^\star +g^\star -\left( f^{N+1}+g^N +\left\langle \lambda ^N, Ax^{N+1}+Bz^N-b\right\rangle \right) \nonumber \\&\ \textrm{s}.\,\textrm{t}.\ \{(x^k; \xi ^k; f^k)\}_1^{N+1}\cup \{(x^\star ; \xi ^\star ; f^\star )\} \ \text {satisfy interpolation constraints} \ (4) \nonumber \\&\qquad \{(z^k; \eta ^k; g^k)\}_0^N\cup \{(z^\star ; \eta ^\star ; g^\star )\} \ \text {satisfy interpolation constraints} \ (4)\nonumber \\&\qquad (x^\star , z^\star ) \ \text {is an optimal solution with Lagrangian multipliers } \ \lambda ^\star \nonumber \\&\qquad \Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2 \left\| z^0-z^\star \right\| _B^2= \varDelta \nonumber \\&\qquad \xi ^k=tA^Tb-tA^TAx^k-tA^TBz^{k-1}-A^T \lambda ^{k-1}, \ \ k\in \{1, \ldots , N\}\nonumber \\&\qquad \eta ^k=tB^Tb-tB^TAx^k-tB^TBz^{k} -B^T\lambda ^{k-1}, \ \ k\in \{1, \ldots , N\}\nonumber \\&\qquad \lambda ^k=\lambda ^{k-1}+t(Ax^k+Bz^{k}-b), \ \ k\in \{1, \ldots , N\}\nonumber \\&\qquad \xi ^{N+1}+A^T\lambda ^N=0\nonumber \\&\qquad \lambda ^0\in {\mathbb {R}}^r, z^0 \in {\mathbb {R}}^m, A\in {\mathbb {R}}^{r\times n}, B \in {\mathbb {R}}^{r\times m},b\in {\mathbb {R}}^{r}. \end{aligned}$$
(13)

In problem (13), \(A, B, \{x^k; \xi ^k; f^k\}_1^{N+1}, \{(x^\star ; \xi ^\star ; f^\star )\}, \{\lambda ^k\}_0^{N}, \{z^k; \eta ^k; g^k\}_0^{N},\) \(\{(z^\star ; \eta ^\star ; g^\star )\}, \lambda ^\star , b\) are decision variables. To handle problem (13), we assume without loss of generality that the matrix \(\begin{pmatrix} A&B \end{pmatrix}\) has full row rank. Note that this assumption does not appear in our arguments in the following sections. In addition, we introduce some new variables. As problem (1) is invariant under translation of \((x, z)\), we may assume without loss of generality that \(b=0\) and \((x^\star , z^\star )=(0, 0)\). In addition, due to the full row rank of the matrix \(\begin{pmatrix} A&B \end{pmatrix}\), we may assume that \(\lambda ^0=\begin{pmatrix} A&B \end{pmatrix} \begin{pmatrix} x^{\dagger } \\ z^{\dagger } \end{pmatrix}\) and \(\lambda ^\star =\begin{pmatrix} A&B \end{pmatrix} \begin{pmatrix} \bar{x} \\ \bar{z} \end{pmatrix}\) for some \(\bar{x}, x^{\dagger }, \bar{z}, z^{\dagger }\). So,

$$\begin{aligned} \xi ^\star =-A^TA\bar{x}-A^TB\bar{z}\in \partial f(0), \ \ \eta ^\star =-B^TA\bar{x}-B^TB\bar{z}\in \partial g(0), \end{aligned}$$

and \(D(\lambda ^\star )=f^\star +g^\star \).

By using equality constraints of problem (13) and the newly introduced variables, we have for \(k\in \{1,\ldots , N\}\)

$$\begin{aligned} \lambda ^k&=(Ax^{\dagger }+Bz^{\dagger }) +\sum _{i=1}^{k} t(Ax^i+Bz^i),\nonumber \\&\quad -(A^TAx^{\dagger }+A^TBz^{\dagger })\nonumber \\&\quad -\sum _{i=1}^{k-1} t(A^TAx^i+A^TBz^i)-tA^TAx^k-tA^TBz^{k-1} \in \partial f(x^k),\nonumber \\&\quad -(B^TAx^{\dagger }+B^TBz^{\dagger }) -\sum _{i=1}^{k} t(B^TAx^i+B^TBz^i)\in \partial g(z^k). \end{aligned}$$
(14)

Hence, problem (13) may be written as

$$\begin{aligned}&\max \ f^\star +g^\star -f^{N+1}-g^{N}-\left\langle Ax^{\dagger } +Bz^{\dagger }+\sum _{i=1}^{N} t(Ax^i+Bz^i), Ax^{N+1} +Bz^{N}\right\rangle \nonumber \\&\ \textrm{s}.\,\textrm{t}.\ \ \tfrac{c_1}{2}\left\| x^k-x^j\right\| _A^2 \le \left\langle Ax^{\dagger }+Bz^{\dagger } +\sum _{i=1}^{k-1} t(Ax^i+Bz^i)+tAx^k +tBz^{k-1}, A(x^j-x^k) \right\rangle + \nonumber \\&\qquad f^j-f^k, \ \ k\in \{1, \dots , N\}, \ \ \ j\in \{1, \dots , N+1\},\nonumber \\&\qquad \tfrac{c_1}{2}\left\| x^{N+1}-x^j\right\| _A^2 \le \left\langle Ax^{\dagger }+Bz^{\dagger } +\sum _{i=1}^{N} t(Ax^i+Bz^i), A \left( x^j-x^{N+1}\right) \right\rangle + \nonumber \\&\qquad f^j-f^{N+1}, \ \ \ j\in \{1, \dots , N\},\nonumber \\&\qquad \tfrac{c_2}{2}\left\| z^k-z^j\right\| _B^2 \le \left\langle Ax^{\dagger }+Bz^{\dagger } +\sum _{i=1}^{k} t(Ax^i+Bz^i), B \left( z^j-z^k\right) \right\rangle + \nonumber \\&\qquad \qquad g^j-g^k, \ \ \ j, k\in \{1, \dots , N\},\nonumber \\&\qquad \tfrac{c_1}{2}\left\| x^k\right\| _A^2\le f^k-f^\star +\left\langle A\bar{x}+B\bar{z}, Ax^k \right\rangle , \ \ \ k\in \{1, \dots , N+1\},\nonumber \\&\qquad \tfrac{c_1}{2}\left\| x^k\right\| _A^2\le -\left\langle Ax^{\dagger }+Bz^{\dagger } +\sum _{i=1}^{k-1} t(Ax^i+Bz^i)+tAx^k +tBz^{k-1}, Ax^k \right\rangle + \nonumber \\&\qquad \qquad f^\star -f^k, \ \ \ k\in \{1, \dots , N\},\nonumber \\&\qquad \tfrac{c_1}{2}\left\| x^{N+1}\right\| _A^2\le f^\star -f^{N+1} -\left\langle Ax^{\dagger }+Bz^{\dagger }+\sum _{i=1}^{N} t(Ax^i+Bz^i), Ax^{N+1} \right\rangle ,\nonumber \\&\qquad \tfrac{c_2}{2}\left\| z^k\right\| _B^2\le g^k-g^\star +\left\langle A\bar{x}+B\bar{z}, Bz^k \right\rangle , \ \ \ k\in \{1, \dots , N\},\nonumber \\&\qquad \tfrac{c_2}{2}\left\| z^k\right\| _B^2 \le g^\star -g^k-\left\langle Ax^{\dagger } +Bz^{\dagger }+\sum _{i=1}^{k} t(Ax^i+Bz^i), Bz^k \right\rangle , \ k\in \{1, \dots , N\},\nonumber \\&\qquad \left\| Ax^{\dagger }+Bz^{\dagger }-(A\bar{x}+B\bar{z})\right\| ^2 +t^2\left\| z^0\right\| _B^2= \varDelta ,\nonumber \\&\qquad x^{\dagger }\in {\mathbb {R}}^n, z^0, z^{\dagger }\in {\mathbb {R}}^m, A\in {\mathbb {R}}^{r\times n}, B\in {\mathbb {R}}^{r\times m}. \end{aligned}$$
(15)

In problem (15), \(A, B, \{x^k; f^k\}_1^{N+1}, \{z^k; g^k\}_1^{N}, x^{\dagger }, z^{\dagger }, \bar{x}, f^\star , \bar{z}, g^\star , z^0\) are decision variables. By using the Gram matrix method, problem (15) may be relaxed as a semidefinite program as follows. Let

$$\begin{aligned}&U=\begin{pmatrix} x^{\dagger }&x^1&\dots&x^{N+1}&\bar{x} \end{pmatrix}, \qquad V=\begin{pmatrix} z^{\dagger }&z^0&\dots&z^{N}&\bar{z} \end{pmatrix}. \end{aligned}$$

By introducing matrix variable

$$\begin{aligned} Y=\begin{pmatrix} AU&BV \end{pmatrix}^T \begin{pmatrix} AU&BV \end{pmatrix}, \end{aligned}$$

problem (15) may be relaxed as the following SDP,

$$\begin{aligned}&\max \ f^\star +g^\star -f^{N+1}-g^{N}-{{\,\textrm{tr}\,}}(L_oY) \nonumber \\&\ \textrm{s}.\,\textrm{t}.\ {{\,\textrm{tr}\,}}(L_{i, j}^fY)\le f^i-f^j, \ \ i,j\in \{1, \ldots , N+1, \star \} \nonumber \\&\qquad {{\,\textrm{tr}\,}}(L_{i, j}^gY)\le g^i-g^j , \ \ i,j\in \{1, \ldots , N, \star \} \nonumber \\&\qquad {{\,\textrm{tr}\,}}(L_0 Y)= \varDelta \nonumber \\&\qquad Y\succeq 0, \end{aligned}$$
(16)

where the constant matrices \(L_{i, j}^f, L_{i, j}^g, L_o, L_0\) are determined according to the constraints of problem (15). In the following sections, we present some new convergence results that are derived by solving this kind of formulation.
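As an illustration of how such a formulation can be solved in practice, the following schematic sketch (not part of the paper) passes a relaxation of the form (16) to an SDP solver via CVXPY. It assumes that the constant matrices \(L_o, L_0, L_{i, j}^f, L_{i, j}^g\) have already been assembled as numpy arrays from the constraints of problem (15), and that the index pairs already refer to positions in the vectors of function values below; the assembly and the exact bookkeeping are omitted.

```python
import cvxpy as cp

def pep_sdp(L_o, L_0, L_f, L_g, nf, ng, Delta):
    """Schematic form of the SDP relaxation (16).

    L_o, L_0 : d x d numpy arrays (objective and initial-condition matrices)
    L_f, L_g : dicts mapping index pairs (i, j) to d x d numpy arrays
    nf, ng   : number of f-values (f^1, ..., f^{N+1}, f^star) and
               g-values (g^0, ..., g^N, g^star), the star value being last
    """
    d = L_o.shape[0]
    Y = cp.Variable((d, d), PSD=True)       # Gram matrix of (AU  BV)
    fv = cp.Variable(nf)
    gv = cp.Variable(ng)

    cons = [cp.trace(L_0 @ Y) == Delta]
    cons += [cp.trace(L @ Y) <= fv[i] - fv[j] for (i, j), L in L_f.items()]
    cons += [cp.trace(L @ Y) <= gv[i] - gv[j] for (i, j), L in L_g.items()]

    # objective of (16): f^star + g^star - f^{N+1} - g^N - tr(L_o Y)
    obj = cp.Maximize(fv[nf - 1] + gv[ng - 1] - fv[nf - 2] - gv[ng - 2]
                      - cp.trace(L_o @ Y))
    prob = cp.Problem(obj, cons)
    prob.solve()
    return prob.value
```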

3 Worst-case convergence rate

In this section, we provide new convergence rates for ADMM with respect to some performance measures. Before we get to the theorems, we need to present some lemmas.

Lemma 1

Let \(N\ge 4\) and \(t, c \in {\mathbb {R}}\). Let E(t, c) be the \((N+1)\times (N+1)\) symmetric matrix given by

$$\begin{aligned} E(t, c )=\left( \begin{array}{cccccccccc} 2c &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t-c \\ 0 &{} \alpha _2 &{} \beta _2 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} -t\\ 0 &{} \beta _2 &{} \alpha _3 &{} \beta _3 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t\\ \vdots &{}\vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} \alpha _k &{} \beta _k &{} \dots &{} 0 &{} t\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} \alpha _{N} &{} \beta _{N}\\ t-c &{} -t &{} t &{} t &{} \dots &{} t &{} t &{} \dots &{} \beta _{N} &{} \alpha _{N+1}\\ \end{array}\right) , \end{aligned}$$

where

$$\begin{aligned}&\alpha _k= {\left\{ \begin{array}{ll} 6c -5t, &{} {k=2}\\ 2\left( 2k^2-3k+1\right) c -\left( 4k-1\right) t, &{} {3\le k\le N-1}\\ 2N(N-1)c -(2N+1)t, &{} {k=N} \\ 2Nc -(N+1)t, &{} {k=N+1}, \end{array}\right. } \\&\beta _k= {\left\{ \begin{array}{ll} 2kt-(2k^2-k-1)c , &{} {2\le k\le N-1 } \\ 3t-2(N-1)c , &{} {k=N}, \end{array}\right. } \end{aligned}$$

and k denotes the row number. If \(c >0\) is given, then

$$\begin{aligned} {[}0, c ]\subseteq \{t: E(t, c )\succeq 0\}. \end{aligned}$$

Proof

As \(\{t: E(t, c )\succeq 0\}\) is a convex set, it suffices to prove the positive semidefiniteness of E(0, c) and E(c, c). Since E(0, c) is diagonally dominant, it is positive semidefinite. Now, we establish that the matrix \(K=E(1,1)\) is positive definite. To this end, we show that all leading principal minors of K are positive. To compute the leading principal minors, we perform the following elementary row operations on K:

  (i) Add the second row to the third row;

  (ii) Add the second row to the last row;

  (iii) Add the third row to the fourth row;

  (iv) For \(i=4, \ldots , N-1\):

    • Add the \(i\)-th row to the \((i+1)\)-th row;

    • Add \(\tfrac{3-i}{2i^2-3i-1}\) times the \(i\)-th row to the last row;

  (v) Add \(\frac{N-1}{3N-5}\) times the \(N\)-th row to the \((N+1)\)-th row.

It is seen that \(K_{k-1, k}+K_{k, k}=-K_{k+1, k}\) for \(2\le k\le N-1\). Hence, by performing these operations, we get an upper triangular matrix J with diagonal

$$\begin{aligned} J_{k, k}={\left\{ \begin{array}{ll} 2, &{} {k=1}\\ 2k^2-3k-1, &{} {2\le k\le N-1}\\ 3N-5, &{} {k=N} \\ N-2-\frac{(N-1)^2}{ 3N-5}-\sum _{i=4}^{N-1} \tfrac{(i-3)^2}{2i^2-3i-1}, &{} {k=N+1}. \end{array}\right. } \end{aligned}$$

It is seen that the first N diagonal elements of J are positive. We show that \(J_{N+1, N+1}\) is also positive. For \(i\ge 4\) we have

$$\begin{aligned} \tfrac{(i-3)^2}{2i^2-3i-1}\le \tfrac{(i-1)^2+4}{2(i-1)^2} \le \tfrac{1}{2}+\tfrac{2}{(i-1)(i-2)}. \end{aligned}$$
(17)

So,

$$\begin{aligned} \tfrac{2N^2-9N+9}{3N-5}-\sum _{i=4}^{N-1}\tfrac{(i-3)^2}{2i^2-3i-1} \ge \tfrac{(N-2)(N^2-5N+10)}{2N(3N-5)}> 0, \end{aligned}$$

which implies \(J_{N+1, N+1}>0\). Since we only add a multiple of the \(i\)-th row to the \(j\)-th row with \(i<j\), all leading principal minors of the matrices K and J are the same. Hence K is positive definite. As \( E(c, c )=c K\), one can infer the positive definiteness of E(c, c) and the proof is complete. \(\square \)
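As a quick numerical sanity check (not part of the paper), one can assemble \(E(t, c)\) directly from the formulas above and confirm that its smallest eigenvalue is non-negative for sample values of \(t\in [0, c]\).

```python
import numpy as np

def E(t, c, N):
    """Assemble the (N+1) x (N+1) matrix E(t, c) of Lemma 1 (0-based indices)."""
    def alpha(k):
        if k == 2:
            return 6 * c - 5 * t
        if 3 <= k <= N - 1:
            return 2 * (2 * k**2 - 3 * k + 1) * c - (4 * k - 1) * t
        if k == N:
            return 2 * N * (N - 1) * c - (2 * N + 1) * t
        return 2 * N * c - (N + 1) * t                     # k = N + 1

    def beta(k):
        if 2 <= k <= N - 1:
            return 2 * k * t - (2 * k**2 - k - 1) * c
        return 3 * t - 2 * (N - 1) * c                     # k = N

    M = np.zeros((N + 1, N + 1))
    M[0, 0] = 2 * c
    M[0, N] = t - c                        # first row, last column
    for k in range(2, N + 2):              # diagonal: alpha_2, ..., alpha_{N+1}
        M[k - 1, k - 1] = alpha(k)
    for k in range(2, N + 1):              # super-diagonal: beta_2, ..., beta_N
        M[k - 1, k] = beta(k)
    M[1, N] = -t                           # second row, last column
    for k in range(3, N):                  # rows 3, ..., N-1, last column
        M[k - 1, N] = t
    return M + np.triu(M, 1).T             # symmetrize

c, N = 1.0, 8
for t in np.linspace(0.0, c, 11):
    print(t, np.linalg.eigvalsh(E(t, c, N)).min() >= -1e-9)   # expected: all True
```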

In the upcoming lemma, we establish a valid inequality for ADMM that will be utilized in all the subsequent results presented in this section.

Lemma 2

Let \(f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n)\), \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) and \(x^\star =0\), \(z^\star =0\). Suppose that ADMM with the starting points \(\lambda ^0\) and \(z^0\) generates \(\{(x^k; z^k; \lambda ^k)\}\). If \(N\ge 4\) and \(v\in {\mathbb {R}}^r\), then

$$\begin{aligned}&N\langle \lambda ^N, Ax^N+Bz^N\rangle -\langle \lambda ^N +tAx^N+tBz^{N-1}, Ax^N-v\rangle +\langle \lambda ^{0} +tAx^1+tBz^0, Ax^1-v\rangle \nonumber \\&\quad +\tfrac{1}{2t}\left\| \lambda ^{0}-\lambda ^\star \right\| ^2 -\tfrac{1}{2t}\left\| \lambda ^{N}-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| ^2_B -t \left\langle Ax^1-Ax^2+(N+1)Ax^N+Bz^N, v \right\rangle \nonumber \\&\quad -t\sum _{k=3}^{N} \langle Ax^k, v\rangle +\tfrac{t(N-1)}{2}\left\| v\right\| ^2 -\tfrac{c_1}{2}\left\| x^1\right\| _A^2 +\sum _{k=2}^{N}\tfrac{\alpha _k}{2} \left\| x^k\right\| ^2_A+\sum _{k=2}^{N-1} \beta _k\langle Ax^k,Ax^{k+1}\rangle \nonumber \\&\quad +tN\langle Bz^{N-1}, Ax^N-v\rangle +t\langle Ax^N, Bz^{N} \rangle -\tfrac{t(N-1)^2}{2} \left\| z^N-z^{N-1}\right\| _B^2 -\tfrac{tN^2}{2}\left\| Ax^{N}+Bz^{N}\right\| ^2\nonumber \\&\quad -t\left\| x^2\right\| _A^2+f(x^1)-f(x^N) +N\left( f(x^N)-f^\star +g(z^N)-g^\star \right) \ge 0, \end{aligned}$$
(18)

where

$$\begin{aligned}&\alpha _k= {\left\{ \begin{array}{ll} \left( 4k-1\right) t-2\left( 2k^2-3k+1\right) c_1, &{} {2\le k\le N-1},\\ \left( 4N+1\right) t-\left( 2N^2-5N+3\right) c_1, &{} {k=N}, \end{array}\right. }\\&\beta _k=\left( 2k^2-k-1\right) c_1-2kt. \end{aligned}$$

Proof

We establish the desired inequality by summing a series of valid inequalities. To simplify the notation, let \(f^k=f(x^k)\) and \(g^k=g(z^k)\) for \(k\in \{1, \dots , N\}\). Note that \(b=0\) because \(x^\star =0\) and \(z^\star =0\). By (4) and (9), we get the following inequality

$$\begin{aligned}&\sum _{k=1}^{N-1}(k^2-1)\left( f^{k+1}-f^k +\left\langle \lambda ^{k-1}+tAx^k+tBz^{k-1}, A(x^{k+1}-x^k) \right\rangle -\tfrac{c_1}{2} \left\| x^{k+1}-x^k\right\| _A^2\right) \\&\quad +\sum _{k=1}^{N-1}(k^2-k)\left( f^k-f^{k+1} +\left\langle \lambda ^{k}+tAx^{k+1}+tBz^{k}, A(x^k-x^{k+1}) \right\rangle -\tfrac{c_1}{2} \left\| x^{k+1}-x^k\right\| _A^2\right) \\&\quad +\sum _{k=1}^{N}\left( f^k-f^\star +\left\langle \lambda ^\star , Ax^k \right\rangle -\tfrac{c_1}{2}\left\| x^k\right\| _A^2\right) +\sum _{k=1}^{N-1}k^2\left( g^k-g^{k+1} +\left\langle \lambda ^{k+1}, B(z^{k}-z^{k+1}) \right\rangle \right) \\&\quad +\sum _{k=1}^{N-1}(k^2+k)\left( g^{k+1}-g^k +\left\langle \lambda ^k, B(z^{k+1}-z^k) \right\rangle \right) +\sum _{k=1}^{N}\left( g^k-g^\star +\left\langle \lambda ^\star , Bz^k \right\rangle \right) \\&\quad +\tfrac{t}{2}\left\| Ax^1+Bz^0-v\right\| ^2\ge 0. \end{aligned}$$

As \(\lambda ^k = \lambda ^{k-1} + tAx^k + tBz^k\), the inequality can be expressed as

$$\begin{aligned}&\sum _{k=1}^{N-1}(k^2-1)\left( \left\langle tAx^k+tBz^{k-1}, A(x^{k+1}-x^k) \right\rangle -\tfrac{c_1}{2} \left\| x^{k+1}-x^k\right\| _A^2\right) \\&\quad +\sum _{k=1}^{N-1}(k^2-1) \left( \left\langle \lambda ^{k}, Ax^{k+1} \right\rangle -\left\langle \lambda ^{k-1}, Ax^{k} \right\rangle -\left\langle tAx^k+tBz^k, Ax^{k+1} \right\rangle \right) \\&\quad +\sum _{k=1}^{N-1}(k^2-k) \left( \left\langle tAx^{k+1} +tBz^{k}, A(x^k-x^{k+1}) \right\rangle -\tfrac{c_1}{2} \left\| x^{k+1}-x^k\right\| _A^2\right) \\&\quad +\sum _{k=1}^{N-1}(k^2-k)\left( \left\langle \lambda ^{k-1}, Ax^{k} \right\rangle -\left\langle \lambda ^{k}, Ax^{k+1} \right\rangle +\left\langle tAx^k+tBz^k, Ax^{k} \right\rangle \right) \\&\quad +\sum _{k=1}^{N-1}(k^2+k) \left( \left\langle \lambda ^k, Bz^{k+1} \right\rangle -\left\langle \lambda ^{k-1}, Bz^k \right\rangle -\left\langle tAx^k+tBz^k, Bz^k \right\rangle \right) \\&\quad +\sum _{k=1}^{N-1}k^2\Bigg (\left\langle \lambda ^{k-1}, Bz^{k} \right\rangle -\left\langle \lambda ^{k}, Bz^{k+1} \right\rangle +\left\langle tAx^{k}+tBz^{k}+tAx^{k+1}+tBz^{k+1}, Bz^{k} \right\rangle \\&\quad -\left\langle tAx^{k+1}+tBz^{k+1}, Bz^{k+1} \right\rangle \Bigg ) +\sum _{k=1}^{N}\left( \left\langle \lambda ^\star , Ax^k+Bz^k \right\rangle -\tfrac{c_1}{2}\left\| x^k\right\| _A^2\right) +\tfrac{t}{2}\left\| Bz^0\right\| ^2\\&\quad +\tfrac{t}{2}\left\| Ax^1-v\right\| ^2 +t\left\langle Ax^1-v, Bz^0\right\rangle +f^1-f^N +N(f^N-f^\star +g^N-g^\star )\ge 0. \end{aligned}$$

After performing some algebraic manipulations, we obtain

$$\begin{aligned}&N\langle \lambda ^{N-1}, Ax^N+Bz^N\rangle - \langle \lambda ^{N-1}, Ax^N\rangle +\langle \lambda ^{0}, Ax^1\rangle - \sum _{k=0}^{N-1} \langle \lambda ^k-\lambda ^\star , Ax^{k+1}+Bz^{k+1}\rangle \\&\quad +\tfrac{t}{2}\left\| Ax^1-v\right\| ^2+\tfrac{t}{2}\left\| Bz^0\right\| ^2+t\left\langle Ax^1-v, Bz^0\right\rangle - t(N^2-3N+1) \langle Ax^N, Bz^{N-1} \rangle \\&\quad -t\sum _{k=1}^{N-1}\left( (k-1)^2 \Vert Ax^k\Vert ^2-(k^2-k)\langle Ax^k, Ax^{k+1}\rangle -(k^2-1)\langle Ax^{k+1}, Bz^{k-1}\rangle \right) \\&\quad -t\sum _{k=1}^{N-1} \left( (k^2-k+1) \Vert Bz^k\Vert ^2+(-k^2+k+1)\langle Ax^k, Bz^k\rangle -k^2\langle Bz^k, Bz^{k+1}\rangle \right) \\&\quad -t\sum _{k=2}^{N-1} \left( (2k^2-3k) \langle Ax^{k}, Bz^{k-1} \rangle \right) -t(N-1)^2\Vert Bz^{N}\Vert ^2-t(N^2-3N+2)\Vert Ax^N\Vert ^2\\&\quad -t(N-1)^2\langle Ax^N, Bz^{N} \rangle -\sum _{k=1}^{N-1}\left( (2k^2-k-1)\tfrac{c_1}{2} \left\| x^{k+1}-x^k\right\| _A^2+\tfrac{c_1}{2} \left\| x^{k+1}\right\| _A^2\right) \\&\quad -\tfrac{c_1}{2}\left\| x^{1}\right\| _A^2+f^1-f^N +N(f^N-f^\star +g^N-g^\star )\ge 0. \end{aligned}$$

By using \(\lambda ^{N-1}=\lambda ^N-tAx^N-tBz^N\) and

$$\begin{aligned}{} & {} 2\langle \lambda ^k-\lambda ^\star , Ax^{k+1}+Bz^{k+1}\rangle =\tfrac{1}{t}\Vert \lambda ^{k+1}-\lambda ^\star \Vert ^2 \\{} & {} \quad -\tfrac{1}{t}\Vert \lambda ^{k}-\lambda ^\star \Vert ^2 -t\Vert Ax^{k+1}+Bz^{k+1}\Vert ^2, \end{aligned}$$

we get

$$\begin{aligned}&N\langle \lambda ^N, Ax^N+Bz^N\rangle - \langle \lambda ^N+tAx^N+tBz^{N-1}, Ax^N-v\rangle +\langle \lambda ^{0}+tAx^1+tBz^0, Ax^1-v\rangle \\&\quad +\tfrac{1}{2t}\left\| \lambda ^{0}-\lambda ^\star \right\| ^2 -\tfrac{1}{2t}\left\| \lambda ^{N}-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| _B^2-t \left\langle Ax^1-Ax^2 +(N+1)Ax^N+Bz^N, v \right\rangle \\&\quad -t\sum _{k=3}^{N} \left\langle Ax^k, v\right\rangle -\tfrac{t}{2} \sum _{k=2}^{N-1}\left\| (k-1)Bz^{k-1}-(k-1) Bz^{k}+kAx^{k}-(k+1)Ax^{k+1}+v \right\| ^2\\&\quad +\tfrac{t(N-1)}{2}\left\| v\right\| ^2 -\frac{c_1}{2}\left\| x^1\right\| _A^2 -2t\left\| x^2\right\| _A^2 +\tfrac{1}{2}\sum _{k=2}^{N-1}\left( \left( 4k-1\right) t -2\left( 2k^2-3k+1\right) c_1\right) \left\| x^k\right\| ^2_A\\&\quad +\sum _{k=2}^{N-1}\left( \left( 2k^2-k-1\right) c_1-2kt\right) \langle Ax^k,Ax^{k+1}\rangle +\left( \left( 2N+\tfrac{1}{2}\right) t -\left( N^2-\tfrac{5}{2}N+\tfrac{3}{2}\right) c_1\right) \left\| x^N\right\| ^2_A\\&\quad +tN\left\langle Bz^{N-1}, Ax^N-v\right\rangle +t\left\langle Ax^N, Bz^{N}\right\rangle -\tfrac{t\left( N-1\right) ^2}{2}\left\| z^N-z^{N-1}\right\| _B^2\\&\quad -\tfrac{tN^2}{2}\left\| Ax^{N}+Bz^{N}\right\| ^2+f^1-f^N+N \left( f^N-f^\star +g^N-g^\star \right) \ge 0, \end{aligned}$$

which implies the desired inequality. \(\square \)

We may now prove the main result of this section.

Theorem 3

Let \(f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n)\) and \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) with \(c_1>0\). If \(t\le c_1\) and \(N\ge 4\), then

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)\le \frac{\Vert \lambda ^0-\lambda ^\star \Vert ^2 +t^2\left\| z^0-z^\star \right\| _B^2}{4Nt}. \end{aligned}$$
(19)

Proof

As discussed in Sect. 2, we may assume that \(x^\star =0\) and \(z^\star =0\). By (12), we have \(D(\lambda ^N)=f(\hat{x}^{N})+g(z^N)+\left\langle \lambda ^N, A\hat{x}^{N}+Bz^N\right\rangle \) for some \(\hat{x}^{N}\) with \(-A^T\lambda ^N\in \partial f(\hat{x}^{N})\). By employing (4) and (9), we obtain

$$\begin{aligned}&N\left( g(z^N)-g^\star +\langle \lambda ^\star , Bz^N\rangle \right) +(N-1)\left( f(x^N)-f^\star +\langle \lambda ^\star , Ax^N\rangle -\tfrac{c_1}{2}\left\| x^N\right\| _A^2\right) \nonumber \\&\quad +\left( f(\hat{x}^{N})-f(x^1)+\left\langle \lambda ^0 +tAx^{1}+tBz^0, A\hat{x}^{N}-Ax^1\right\rangle -\tfrac{c_1}{2} \left\| \hat{x}^{N}-x^1\right\| _A^2\right) \nonumber \\&\quad +(2N-2)\Bigg ( f(\hat{x}^{N})-f(x^N) +\left\langle \lambda ^N-tBz^{N}+tBz^{N-1}, A\hat{x}^{N}-Ax^N\right\rangle \nonumber \\&\quad -\tfrac{c_1}{2}\left\| \hat{x}^{N}-x^N\right\| _A^2 \Bigg ) +\left( f(\hat{x}^{N})-f^\star +\langle \lambda ^\star , A\hat{x}^{N}\rangle -\tfrac{c_1}{2} \left\| \hat{x}^{N}\right\| _A^2 \right) \ge 0. \end{aligned}$$
(20)

By substituting \(A\hat{x}^{N}\) for v in inequality (18) and summing it with (20), we get the following inequality after performing some algebraic manipulations

$$\begin{aligned}&2N\left( f(\hat{x}^{N})+g(z^N)+\left\langle \lambda ^N, A\hat{x}^{N}+Bz^N\right\rangle -f^\star -g^\star \right) +\tfrac{1}{2t}\left\| \lambda ^{0}-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| _B^2\nonumber \\&\quad -\tfrac{1}{2t}\left\| \lambda ^N-\lambda ^\star +t(N-1)Ax^N+tA\hat{x}^{N}+tNBz^N \right\| ^2\nonumber \\&\quad -\tfrac{t}{2}\left\| (N-1)(Bz^{N-1}-Bz^N)+tAx^N -tA\hat{x}^{N} \right\| ^2\nonumber \\&\quad -\tfrac{1}{2}{{\,\textrm{tr}\,}}\left( {E(t, c_1)} \begin{pmatrix} Ax^1&\dots&A\hat{x}^{N} \end{pmatrix}^T \begin{pmatrix} Ax^1&\dots&A\hat{x}^{N} \end{pmatrix}\right) \ge 0, \end{aligned}$$
(21)

where the positive semidefinite matrix \(E(t, c_1)\) is given in Lemma 1. As the trace inner product of two positive semidefinite matrices is non-negative, inequality (21) implies that

$$\begin{aligned} 2N\left( D(\lambda ^\star )-D(\lambda ^N)\right) \le \tfrac{1}{2t} \left\| \lambda ^{0}-\lambda ^\star \right\| ^2+ \tfrac{t}{2} \left\| z^0\right\| _B^2, \end{aligned}$$

and the proof is complete. \(\square \)

In comparison with Theorem 1, we obtain a convergence rate when only f is strongly convex (relative to \(\Vert .\Vert _A\)), that is, g does not need to be strongly convex. Also, the constant does not depend on \(\lambda ^1\). One important question concerning bound (19) is its tightness, that is, whether there is an optimization problem which attains the given convergence rate. It turns out that bound (19) is exact. The following example demonstrates this point.

Example 1

Suppose that \(c_1>0\), \(N\ge 4\) and \(t\in (0, c_1]\). Let \(f, g: {\mathbb {R}}\rightarrow {\mathbb {R}}\) be given as follows,

$$\begin{aligned} f(x)=\tfrac{1}{2}|x|+\tfrac{c_1}{2}x^2, \ \ g(z)=\tfrac{1}{2}\max \left\{ \tfrac{N-1}{N} \left( z-\tfrac{1}{2Nt}\right) -\tfrac{1}{2Nt}, -z\right\} . \end{aligned}$$

Consider the optimization problem

$$\begin{aligned}&\min _{(x, z)\in {\mathbb {R}}\times {\mathbb {R}}} f(x)+g(z),\\&\quad \textrm{s}.\,\textrm{t}.\ x+z=0, \end{aligned}$$

It is seen that \(A=B=I\) in this problem. Note that \((x^\star , z^\star )=(0, 0)\) with Lagrangian multiplier \(\lambda ^\star =\tfrac{1}{2}\) is an optimal solution and the optimal value is zero. One can check that Algorithm 1 with initial point \(\lambda ^0=\tfrac{-1}{2}\) and \( z^0=0\) generates the following points,

$$\begin{aligned}&x^k=0 \quad k\in \{1, \dots , N\} \\&z^k=\tfrac{1}{2Nt} \quad k\in \{1, \dots , N\} \\&\lambda ^k=\tfrac{-1}{2}+\tfrac{k}{2N} \quad k\in \{1, \dots , N\}. \end{aligned}$$

At \(\lambda ^{N}\), we have \(D(\lambda ^N)=\tfrac{-1}{4Nt} =-\tfrac{\Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2 \left\| z^0-z^\star \right\| _B^2}{4Nt}\), which shows the tightness of bound (19).
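This construction can also be checked numerically. The short script below (a sketch, not part of the paper) runs Algorithm 1 on Example 1 for one admissible choice of the parameters, solving the one-dimensional subproblems with a generic scalar minimizer, and compares \(D(\lambda ^\star )-D(\lambda ^N)\) with the right-hand side of (19).

```python
import numpy as np
from scipy.optimize import minimize_scalar

c1, t, N = 2.0, 1.0, 6            # any c1 > 0, 0 < t <= c1, N >= 4 behaves the same

def f(x):
    return 0.5 * abs(x) + 0.5 * c1 * x**2

def g(z):
    return 0.5 * max((N - 1) / N * (z - 1 / (2 * N * t)) - 1 / (2 * N * t), -z)

def argmin_1d(h):                 # crude scalar minimizer, accurate enough here
    return minimize_scalar(h, bounds=(-5.0, 5.0), method="bounded",
                           options={"xatol": 1e-12}).x

# Algorithm 1 with A = B = 1, b = 0, lambda^0 = -1/2, z^0 = 0
lam, z = -0.5, 0.0
for _ in range(N):
    x = argmin_1d(lambda x_: f(x_) + lam * x_ + 0.5 * t * (x_ + z) ** 2)
    z = argmin_1d(lambda z_: g(z_) + lam * z_ + 0.5 * t * (x + z_) ** 2)
    lam += t * (x + z)

def D(lam_):                      # dual objective D(lambda) for A = B = 1, b = 0
    xh = argmin_1d(lambda x_: f(x_) + lam_ * x_)
    zh = argmin_1d(lambda z_: g(z_) + lam_ * z_)
    return f(xh) + g(zh) + lam_ * (xh + zh)

lam0, lam_star = -0.5, 0.5
gap = D(lam_star) - D(lam)
bound = ((lam0 - lam_star) ** 2 + t**2 * 0.0**2) / (4 * N * t)
print(gap, bound)                 # both approximately equal 1/(4*N*t)
```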

An important factor in the efficiency of dual-based methods is the convergence rate of the primal and dual feasibility (residuals). In what follows, we study this subject under the setting of Theorem 3. The next theorem gives a convergence rate in terms of the primal residual.

Theorem 4

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) and \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) with \(c_1>0\). If \(t\le c_1\) and \(N\ge 4\), then

$$\begin{aligned} \left\| Ax^N+Bz^N-b\right\| \le \frac{\sqrt{\Vert \lambda ^0 -\lambda ^\star \Vert ^2+t^2\left\| z^0-z^\star \right\| _B^2}}{tN}. \end{aligned}$$
(22)

Proof

The argument is similar to that used in the proof of Theorem 3. By setting \(v=Ax^{N}\) in (18), one can infer the following inequality

$$\begin{aligned}&N\left\langle \lambda ^N, Ax^N+Bz^N\right\rangle +\left\langle \lambda ^{0}+tAx^1+tBz^0, Ax^1-Ax^N\right\rangle +\tfrac{1}{2t}\left\| \lambda ^{0}-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| _B^2\nonumber \\&\quad - t \left\langle Ax^1-Ax^2, Ax^N \right\rangle +\tfrac{t(N-1)}{2}\left\| Ax^N\right\| ^2 -t\sum _{k=3}^{N} \left\langle Ax^k, Ax^N\right\rangle -\frac{c_1}{2}\left\| x^1\right\| _A^2-t\left\| x^2\right\| _A^2\nonumber \\&\quad +\sum _{k=2}^{N-1}\left( \left( 2k-\tfrac{1}{2}\right) t -\left( 2k^2-3k+1\right) c_1\right) \left\| x^k\right\| ^2_A +\left( \left( \tfrac{3}{2}N-\tfrac{3}{2}\right) t -\left( N^2-\tfrac{5}{2}N+\tfrac{3}{2}\right) c_1\right) \left\| x^N\right\| ^2_A\nonumber \\&\quad +\sum _{k=2}^{N-1}\left( \left( 2k^2-k-1\right) c_1-2kt\right) \langle Ax^k,Ax^{k+1}\rangle -\tfrac{t\left( N-1\right) ^2}{2} \left\| z^N-z^{N-1}\right\| _B^2\nonumber \\&\quad -\tfrac{tN^2}{2}\left\| Ax^{N}+Bz^{N}\right\| ^2 +f(x^1)-f(x^N)+N\left( f(x^N)-f^\star +g(z^N)-g^\star \right) \ge 0. \end{aligned}$$
(23)

By employing (4) and (9), we have

$$\begin{aligned}&N\left( f^\star -f(x^N)-\langle \lambda ^N+tBz^{N-1} -tBz^N, Ax^N\rangle -\tfrac{c_1}{2}\left\| x^N\right\| _A^2\right) \nonumber \\&\quad +\left( f(x^N)-f^1+\left\langle \lambda ^0+tAx^{1}+tBz^0, Ax^N-Ax^1\right\rangle -\tfrac{c_1}{2} \left\| x^N-x^1\right\| _A^2\right) \nonumber \\&\quad + N\left( g^\star -g(z^N) -\langle \lambda ^N, Bz^N\rangle \right) \ge 0. \end{aligned}$$
(24)

By summing (23) and (24), we obtain

$$\begin{aligned}&\tfrac{1}{2t}\left\| \lambda ^{0}-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| _B^2-\tfrac{t\left( N-1\right) ^2}{2} \left\| z^{N-1}-z^{N}+\tfrac{N}{(N-1)^2} x^N\right\| _B^2\nonumber \\&\quad -\tfrac{tN^2}{2}\left\| Ax^{N}+Bz^{N}\right\| ^2-\tfrac{1}{2} {{\,\textrm{tr}\,}}\left( {D(t, c_1)} \begin{pmatrix} Ax^1&\dots&Ax^{N} \end{pmatrix}^{T} \begin{pmatrix} Ax^1&\dots&Ax^{N} \end{pmatrix}\right) \ge 0, \end{aligned}$$
(25)

where the matrix \(D(t, c_1)\) is as follows,

$$\begin{aligned} D(t, c_1)=\left( \begin{array}{cccccccccc} 2c_1 &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t-c_1\\ 0 &{} \alpha _2 &{} \beta _2 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} -t\\ 0 &{} \beta _2 &{} \alpha _3 &{} \beta _3 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t\\ \vdots &{}\vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} \alpha _k &{} \beta _k &{} \dots &{} 0 &{} t\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} \alpha _{N-1} &{} \beta _{N-1}\\ t-c_1 &{} -t &{} t &{} t &{} \dots &{} t &{} t &{} \dots &{} \beta _{N-1} &{} \alpha _{N}\\ \end{array}\right) , \end{aligned}$$

and

$$\begin{aligned}&\alpha _k= {\left\{ \begin{array}{ll} 6c_1-5t, &{} {k=2}\\ 2\left( 2k^2-3k+1\right) c_1-\left( 4k-1\right) t, &{} {3\le k\le N-1},\\ \left( 2N^2-4N+4\right) c_1-\left( 3N-5+\frac{N^2}{\left( N-1\right) ^2}\right) t, &{} {k=N},\\ \end{array}\right. } \\&\beta _k=2kt-\left( 2k^2-k-1\right) c_1, \ \ \ {2\le k\le N-1} \end{aligned}$$

As the matrix \(D(t, c_1)\) is positive semidefinite, see Appendix A, inequality (25) implies that

$$\begin{aligned} \tfrac{tN^2}{2}\left\| Ax^{N}+Bz^{N}\right\| ^2\le \tfrac{1}{2t} \left\| \lambda ^{0}-\lambda ^\star \right\| ^2+ \tfrac{t}{2} \left\| z^0\right\| _B^2, \end{aligned}$$

and the proof is complete. \(\square \)

The following example shows the exactness of bound (22).

Example 2

Let \(c_1>0\), \(N\ge 4\) and \(t\in (0, c_1]\). Consider the functions \(f, g: {\mathbb {R}}\rightarrow {\mathbb {R}}\) given by the following formulae,

$$\begin{aligned}&f(x)=\tfrac{1}{2}|x|+\tfrac{ c_1}{2}x^2,\\&g(z)=\max \left\{ \left( \tfrac{1}{2}-\tfrac{1}{N}\right) \left( z-\tfrac{1}{Nt} \right) , \tfrac{1}{2}\left( \tfrac{1}{Nt}-z \right) \right\} . \end{aligned}$$

We formulate the following optimization problem,

$$\begin{aligned}&\min _{(x, z)\in {\mathbb {R}}\times {\mathbb {R}}} f(x)+g(z),\\&\quad \textrm{s}.\,\textrm{t}.\ Ax+Bz=0, \end{aligned}$$

where \(A=B=I\). One can verify that \((x^\star , z^\star )=(0, 0)\) with Lagrangian multiplier \(\lambda ^\star =\tfrac{1}{2}\) is an optimal solution. Algorithm 1 with initial point \(\lambda ^0=\tfrac{-1}{2}\) and \( z^0=0\) generates the following points,

$$\begin{aligned}&x^k=0 \quad k\in \{1, \dots , N\} \\&z^k=\tfrac{1}{Nt} \quad k\in \{1, \dots , N\} \\&\lambda ^k=\tfrac{2k-N}{2N} \quad k\in \{1, \dots , N\}. \end{aligned}$$

At iteration N, we have \(\Vert Ax^N+Bz^N\Vert =\tfrac{1}{tN} =\tfrac{\sqrt{\Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2\left\| z^0-z^\star \right\| _B^2}}{tN}\), which shows the tightness of bound (22).

In what follows, we study the convergence rate of ADMM in terms of the dual residual. To this end, we investigate the convergence rate of \(\{B\left( z^{k-1}-z^k\right) \}\), since \(\left\| A^TB\left( z^{k-1}-z^k\right) \right\| \le \Vert A\Vert \left\| z^{k-1}-z^k\right\| _B\). The next theorem provides a convergence rate for the aforementioned sequence.

Theorem 5

Let \(f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n)\) and \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) with \(c_1>0\). If \(t\le c_1\) and \(N\ge 4\), then

$$\begin{aligned} \left\| z^N-z^{N-1}\right\| _B\le \frac{\sqrt{\Vert \lambda ^0 -\lambda ^\star \Vert ^2+t^2\left\| z^0-z^\star \right\| _B^2}}{(N-1)t}. \end{aligned}$$
(26)

Proof

Similar to the proof of Theorem 3, by writing inequality (18) for \(N-1\) iterations and setting \(v=Ax^{N}\), one can infer the following inequality

$$\begin{aligned}&(N-1)\langle \lambda ^{N-1}, Ax^{N-1}+Bz^{N-1}\rangle +\tfrac{1}{2t}\Vert \lambda ^{0}-\lambda ^\star \Vert ^2 -\tfrac{1}{2t}\Vert \lambda ^{N-1}-\lambda ^\star \Vert ^2\nonumber \\&\quad +\tfrac{t}{2}\left\| z^0\right\| ^2_B -\langle \lambda ^{N-1}+tAx^{N-1}+tBz^{N-2}, Ax^{N-1}-Ax^N\rangle +\tfrac{t(N-2)}{2}\Vert x^N\Vert _A^2 \nonumber \\&\quad +\langle \lambda ^{0}+tAx^1+tBz^0, Ax^1 -Ax^N\rangle -t \left\langle Ax^1-Ax^2+NAx^{N-1} +Bz^{N-1}, Ax^N \right\rangle \nonumber \\&\quad +\frac{1}{2}\sum _{k=2}^{N-2}\left( \left( 4k-1\right) t -2\left( 2k^2-3k+1\right) c_1\right) \left\| x^k\right\| ^2_A +t\langle Ax^{N-1}, Bz^{N-1} \rangle \nonumber \\&\quad +\sum _{k=2}^{N-2}\left( \left( 2k^2-k-1\right) c_1-2kt\right) \langle Ax^k,Ax^{k+1}\rangle +t(N-1)\langle Bz^{N-2}, Ax^{N-1}-Ax^N\rangle \nonumber \\&\quad +\frac{1}{2}\left( \left( 4N-3\right) t -\left( 2N^2-9N+10\right) c_1\right) \left\| x^{N-1}\right\| ^2_A -t\left\| x^2\right\| _A^2-\tfrac{c_1}{2}\left\| x^1\right\| _A^2\nonumber \\&\quad -\tfrac{t(N-2)^2}{2}\left\| z^{N-1}-z^{N-2}\right\| _B^2 -\tfrac{t(N-1)^2}{2}\Vert Ax^{N-1}+Bz^{N-1}\Vert ^2-t\sum _{k=3}^{N-1} \langle Ax^k, Ax^N\rangle \nonumber \\&\quad +f(x^1)-f(x^{N-1})+(N-1)(f(x^{N-1}) -f^\star +g(z^{N-1})-g^\star )\ge 0. \end{aligned}$$
(27)

By using (4) and (9), we have

$$\begin{aligned}&(N^2-3N+2)\bigg (f(x^{N-1})-f(x^{N})+\left\langle \lambda ^{N-1} +tAx^{N}+tBz^{N-1}, A\left( x^{N-1}-x^{N}\right) \right\rangle \nonumber \\&\quad -\tfrac{c_1}{2}\left\| x^{N}-x^{N-1}\right\| _A^2\bigg ) +\bigg (f(x^{N})-f(x^1)+\bigg \langle \lambda ^0+tAx^1+tBz^0, A \left( x^{N}-x^1\right) \bigg \rangle \nonumber \\&\quad -\tfrac{c_1}{2}\Vert x^{N}-x^1\Vert _A^2\bigg )+N(N-1) \left( g(z^{N})-g(z^{N-1})+\left\langle \lambda ^{N-1}, B\left( z^{N}-z^{N-1}\right) \right\rangle \right) \nonumber \\&\quad +(N^2-3N+1)\bigg (f(x^{N})-f(x^{N-1}) +\bigg \langle \lambda ^{N-1}-tBz^{N-1}+tBz^{N-2}, A\left( x^{N}-x^{N-1}\right) \bigg \rangle \nonumber \\&\quad -\tfrac{c_1}{2}\Vert x^{N}-x^{N-1}\Vert _A^2\bigg )+(N-1) \left( g^\star -g(z^{N})-\left\langle \lambda ^{N-1} +tAx^N+tBz^N, Bz^N \right\rangle \right) \nonumber \\&\quad +(N-1)\bigg (f^\star -f(x^{N-1}) -\bigg \langle \lambda ^{N-1}-tBz^{N-1}+tBz^{N-2}, Ax^{N-1} \bigg \rangle -\tfrac{c_1}{2}\Vert x^{N-1}\Vert _A^2\bigg )\nonumber \\&\quad +(N-1)^2\left( g(z^{N-1})-g(z^{N}) +\left\langle \lambda ^{N-1}+tAx^N+Bz^N, B\left( z^{N-1}-z^{N}\right) \right\rangle \right) \ge 0. \end{aligned}$$
(28)

By summing (27) and (28), we obtain

$$\begin{aligned}&\tfrac{1}{2t}\left\| \lambda ^0-\lambda ^\star \right\| ^2 +\tfrac{t}{2}\left\| z^0\right\| _B^2-\tfrac{(N^2-1)t}{2} \left\| \tfrac{N}{N+1}Ax^N+Bz^N\right\| ^2\\&\quad -\frac{t(N-1)^2}{2} \left\| z^N-z^{N-1}\right\| _B^2\\&\quad -\tfrac{(N-2)^2t}{2}\bigg \Vert Bz^{N-2}-Bz^{N-1} +\tfrac{N-1}{N-2}Ax^{N-1}-\left( 1-\tfrac{1}{(N-2)^2}\right) Ax^{N} \bigg \Vert ^2\\&\quad -\tfrac{1}{2}{{\,\textrm{tr}\,}}\left( {F(t, c_1)}\begin{pmatrix} Ax^1&\dots&Ax^{N} \end{pmatrix}^T \begin{pmatrix} Ax^1&\dots&Ax^{N} \end{pmatrix}\right) \ge 0, \end{aligned}$$

where the matrix \(F(t, c_1)\) is as follows,

$$\begin{aligned} F(t, c_1)=\left( \begin{array}{cccccccccc} 2c_1 &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t-c_1\\ 0 &{} \alpha _2 &{} \beta _2 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} -t\\ 0 &{} \beta _2 &{} \alpha _3 &{} \beta _3 &{} \dots &{} 0 &{} 0 &{} \dots &{} 0 &{} t\\ \vdots &{}\vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} \alpha _k &{} \beta _k &{} \dots &{} 0 &{} t\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} \dots &{} \alpha _{N-1} &{} \beta _{N-1}\\ t-c_1 &{} -t &{} t &{} t &{} \dots &{} t &{} t &{} \dots &{} \beta _{N-1} &{} \alpha _{N}\\ \end{array}\right) , \end{aligned}$$

and

$$\begin{aligned}&\alpha _k={\left\{ \begin{array}{ll} 6c_1-5t, &{} {k=2}\\ 2\left( 2k^2-3k+1\right) c_1-\left( 4k-1\right) t, &{} {3\le k\le N-1},\\ \left( 2N^2-6N+4\right) c_1-2\left( N+\frac{1}{(N-2)^2}-\frac{2}{N+1}-3\right) t, &{} {k=N},\\ \end{array}\right. } \\&\beta _k={\left\{ \begin{array}{ll} 2kt-\left( 2k^2-k-1\right) c_1, \ \ \ {2\le k\le N-2},\\ (N+\frac{1}{2-N}-1)t-(2N^2-6N+3)c_1, &{} {k=N-1},\\ \end{array}\right. } \end{aligned}$$

The rest of the proof proceeds analogously to the proof of Theorem 4. \(\square \)

The following example shows the tightness of this bound.

Example 3

Assume that \(c_1>0\), \(N\ge 4\) and \(t\in (0, c_1]\) are given, and \(f, g: {\mathbb {R}}\rightarrow {\mathbb {R}}\) are defined by,

$$\begin{aligned}&f(x)=\tfrac{1}{2}\max \left\{ -\tfrac{N+1}{N-1}x,x\right\} +\tfrac{c_1}{2}x^2, \\&g(z)=\tfrac{1}{2}\max \left\{ \tfrac{1}{t(N-1)}-z, \tfrac{N-3}{N-1}\left( z-\tfrac{1}{t(N-1)}\right) \right\} . \end{aligned}$$

Consider the optimization problem

$$\begin{aligned}&\min _{(x, z)\in {\mathbb {R}}\times {\mathbb {R}}} f(x)+g(z),\\&\quad \textrm{s}.\,\textrm{t}.\ Ax+Bz=0, \end{aligned}$$

where \(A=B=I\). The point \((x^\star , z^\star )=(0, 0)\) with Lagrangian multiplier \(\lambda ^\star =\tfrac{1}{2}\) is an optimal solution. After performing N iterations of Algorithm 1 with setting \(\lambda ^0=\tfrac{-1}{2}\) and \( z^0=0\), we have

$$\begin{aligned}&x^k=0, \ \ \ \ \ \ \ \ \ \ k\in \{1, \dots , N\}, \\&z^k={\left\{ \begin{array}{ll} \tfrac{1}{t(N-1)}, &{} k\in \{1, \dots , N-1\}, \\ 0, &{} k=N, \end{array}\right. }\\&\lambda ^k={\left\{ \begin{array}{ll} \tfrac{2k+1-N}{2(N-1)}, &{} k\in \{1, \dots , N-1\}, \\ \tfrac{1}{2}, &{} k=N. \end{array}\right. } \end{aligned}$$

It can be seen that \(\left\| A^TB\left( z^N-z^{N-1}\right) \right\| =\tfrac{1}{(N-1)t}=\tfrac{\sqrt{\Vert \lambda ^0-\lambda ^\star \Vert ^2+t^2 \left\| z^0-z^\star \right\| _B^2}}{(N-1)t}\), which shows that the bound is tight.

Theorems 3 and 4 address the case where f is strongly convex relative to \(\Vert .\Vert _A\) and g is convex. Based on numerical results obtained by solving performance estimation problems such as (15), we conjecture that, under the assumptions of Theorem 3, if g is in addition \(c_2\)-strongly convex relative to \(\Vert .\Vert _B\), then Algorithm 1 enjoys the following convergence rates

$$\begin{aligned}&D(\lambda ^\star )-D(\lambda ^N)\le \frac{\Vert \lambda ^0 -\lambda ^\star \Vert ^2+t^2\Vert z^0-z^\star \Vert _B^2}{4Nt +\tfrac{2c_1c_2}{c_1+c_2}}, \\&\left\| Ax^N+Bz^N-b\right\| \le \frac{\sqrt{\Vert \lambda ^0 -\lambda ^\star \Vert ^2+t^2\Vert z^0-z^\star \Vert _B^2}}{Nt+\tfrac{c_1c_2}{c_1+c_2}}. \end{aligned}$$

We have verified these conjectures numerically for many specific values of the parameters. Nevertheless, we could not manage to guess a closed-form formula for the dual residual in this case.

4 Linear convergence of ADMM

In this section we study the linear convergence of ADMM. The linear convergence of ADMM has been addressed by several authors and some conditions for linear convergence have been proposed; see [11, 21, 22, 25, 31, 38, 47]. Two common types of assumptions employed for proving the linear convergence of ADMM are the error bound property and L-smoothness. To the best of our knowledge, most scholars have investigated the linear convergence of the sequence \(\{(x^k, z^k, \lambda ^k)\}\) to a saddle point, and there is no result in terms of the dual objective value for ADMM. In line with the previous section, we study linear convergence in terms of the dual objective value and we derive some formulas for the linear convergence rate by using performance estimation. It is worth mentioning that the term "Q-linear convergence" is also employed in the literature to describe this notion of linear convergence.

As mentioned earlier, the error bound property has been used for establishing linear convergence; see e.g. [21, 25, 31, 40, 47]. Let

$$\begin{aligned} D^a(\lambda ):=\min f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle +\tfrac{a}{2}\Vert Ax+Bz-b\Vert ^2, \end{aligned}$$
(29)

denote the augmented dual objective for a given \(a>0\), and let \(\varLambda ^\star \) denote the optimal solution set of the dual problem. Note that the function \(D^a\) is \(\tfrac{1}{a}\)-smooth on its domain without any strong convexity assumption; see [25, Lemma 2.2].

Definition 2

The function \(D^a\) is said to satisfy the error bound property if we have

$$\begin{aligned} d_{\varLambda ^\star }(\lambda )\le \tau \Vert \nabla D^a(\lambda )\Vert , \ \ \lambda \in {\mathbb {R}}^r, \end{aligned}$$
(30)

for some \(\tau >0\).

Hong et al. [25] established the linear convergence by employing the error bound property (30).

Recently, some scholars have established the linear convergence of gradient methods for L-smooth convex functions by replacing strong convexity with some milder conditions; see [1, 7, 36] and references therein. Inspired by these results, we prove the linear convergence of ADMM by using the so-called PŁ inequality. It is worth noting that we employ the nonsmooth version of the PŁ inequality introduced in [6]. Concerning the differentiability of the dual objective, by (7), we have

$$\begin{aligned} b-A\partial f^*(-A^T\lambda )-B\partial g^*(-B^T\lambda ) \subseteq \partial \left( -D(\lambda )\right) . \end{aligned}$$
(31)

Note that inclusion (31) holds as an equality under some mild conditions, see e.g. [4, Chapter 3].

Definition 3

The function D is said to satisfy the PŁ inequality if there exists an \(L_p>0\) such that for any \(\lambda \in {\mathbb {R}}^r\) we have

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ) \le \tfrac{1}{2L_p} \Vert \xi \Vert ^2, \ \ \ \xi \in \partial \left( -D(\lambda )\right) . \end{aligned}$$
(32)

Note that if f and g are strongly convex with moduli \(\mu _1>0\) and \(\mu _2>0\), then \(-D\) is an L-smooth convex function with \(L\le \tfrac{\lambda _{\max } (A^TA)}{\mu _1}+\tfrac{\lambda _{\max }(B^TB)}{\mu _2}\). In this setting, we have \(L_p\le \tfrac{\lambda _{\max }(A^TA)}{\mu _1} +\tfrac{\lambda _{\max }(B^TB)}{\mu _2}\). This follows from the duality between smoothness and strong convexity together with

$$\begin{aligned} \left\| \nabla D(\lambda )-\nabla D(\nu )\right\|&\le \left\| \nabla f^*(-A^T\lambda )-\nabla f^*(-A^T\nu )\right\| _A\\&\quad +\left\| \nabla g^*(-B^T\lambda )-\nabla g^*(-B^T\nu )\right\| _B\\&\le \tfrac{1}{\mu _1}\left\| A^T\lambda - A^T\nu \right\| _A +\tfrac{1}{\mu _2}\left\| B^T\lambda -B^T\nu \right\| _B\\&\le \left( \tfrac{\lambda _{\max }(A^TA)}{\mu _1} +\tfrac{\lambda _{\max }(B^TB)}{\mu _2}\right) \left\| \lambda -\nu \right\| . \end{aligned}$$

In the next proposition, we show that conditions (30) and (32) are equivalent.

Proposition 1

Let \(L_a=\tfrac{1}{a}\) denote the Lipschitz constant of \(\nabla D^a\), where \(D^a\) is given in (29). Suppose that (31) holds as equality.

  (i) If \(D^a\) satisfies the error bound (30), then D satisfies the PŁ inequality (32) with \(L_p=\tfrac{1}{L_a\tau ^2}\).

  (ii) If D satisfies the PŁ inequality (32), then \(D^a\) satisfies the error bound (30) with \(\tau =\tfrac{L_p}{1+aL_p}\).

Proof

First we prove i). Suppose \(\lambda \in {\mathbb {R}}^r\) and \(\xi \in b-A\partial f^*(-A^T\lambda )-B\partial g^*(-B^T\lambda )\). By identity (6), we have \(\xi =b-A\bar{x}-B\bar{z}\) for some \((\bar{x}, \bar{z})\in {{\,\textrm{argmin}\,}}f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle \). Due to the smoothness of \(D^a\) and (30), we get

$$\begin{aligned} D^a(\lambda ^\star )-D^a(\nu ) \le \tfrac{L_a\tau ^2}{2} \Vert \nabla D^a(\nu )\Vert ^2, \ \ \nu \in {\mathbb {R}}^r, \end{aligned}$$
(33)

where \(\lambda ^\star \in \varLambda ^\star \) with \(d_{\varLambda ^\star }(\nu )=\Vert \nu -\lambda ^\star \Vert \). Let \(\bar{\nu }=\lambda -a(A\bar{x}+B\bar{z}-b)\). As we assume strong duality, we have \(D^a(\lambda ^\star )=D(\lambda ^\star )\). By the definitions of \(\bar{x}\) and \(\bar{z}\), we get

$$\begin{aligned} (\bar{x}, \bar{z})\in {{\,\textrm{argmin}\,}}f(x)+g(z) +\langle \bar{\nu }, Ax+Bz-b\rangle +\tfrac{a}{2}\Vert Ax+Bz-b\Vert ^2. \end{aligned}$$

By [25, Lemma 2.1], we have \(\nabla D^a(\bar{\nu })=A\bar{x}+B\bar{z}-b\). This equality together with (33) implies

$$\begin{aligned} D(\lambda ^\star )-D(\lambda )\le D^a(\lambda ^\star )-D^a(\bar{\nu }) \le \tfrac{L_a\tau ^2}{2} \Vert A\bar{x}+B\bar{z}-b\Vert ^2, \end{aligned}$$

and the proof of (i) is complete.

Now we establish (ii). Let \(\lambda \) be in the domain of \(\nabla D^a\). By [25, Lemma 2.1], we have \(\nabla D^a(\lambda )=A\bar{x}+B\bar{z}-b\) for some \((\bar{x}, \bar{z})\in {{\,\textrm{argmin}\,}}f(x)+g(z)+\langle \lambda , Ax+Bz-b\rangle +\tfrac{a}{2}\Vert Ax+Bz-b\Vert ^2\), which implies that

$$\begin{aligned} 0\in \partial f(\bar{x})+A^T\left( \lambda +a(A\bar{x}+B\bar{z}-b)\right) , 0\in \partial g(\bar{z})+B^T\left( \lambda +a(A\bar{x}+B\bar{z}-b)\right) . \end{aligned}$$
(34)

Set \(\nu =\lambda +a(A\bar{x}+B\bar{z}-b)\). By (34), one can infer that \(D(\nu )=f(\bar{x})+g(\bar{z})+\langle \nu , A\bar{x}+B\bar{z}-b\rangle \). In addition, (6) implies that \(b-A\bar{x}-B\bar{z}\in b-A\partial f^*(-A^T\nu )-B\partial g^*(-B^T\nu )\). By the PŁ inequality, we have

$$\begin{aligned} \tfrac{1}{2L_p}\left\| A\bar{x}+B\bar{z}-b\right\| ^2 \ge D(\lambda ^\star )-D(\nu )=D^a(\lambda ^\star )-D^a(\lambda ) -\tfrac{a}{2}\left\| A\bar{x}+B\bar{z}-b\right\| ^2, \end{aligned}$$

where the equality follows from \(D(\nu )=D^a(\lambda )+\tfrac{a}{2}\left\| A\bar{x}+B\bar{z}-b\right\| ^2\) and \(D^a(\lambda ^\star )=D(\lambda ^\star )\). Hence,

$$\begin{aligned} D^a(\lambda ^\star )-D^a(\lambda ) \le \left( \tfrac{1}{2L_p} +\tfrac{a}{2} \right) \Vert \nabla D^a(\lambda )\Vert ^2. \end{aligned}$$

This inequality shows that \(D^a\) satisfies the PŁ inequality with constant \(\tfrac{L_p}{1+aL_p}\). On the other hand, the PŁ inequality implies the error bound with the same constant, see [7], and the proof is complete. \(\square \)

In what follows, we employ performance estimation to derive a linear convergence rate for ADMM in terms of the dual objective when the PŁ inequality holds. To this end, we compare the dual objective value at two consecutive iterations through the ratio \(\tfrac{D(\lambda ^\star ) -D(\lambda ^2)}{D(\lambda ^\star )-D(\lambda ^1)}\). The following optimization problem gives the worst-case convergence rate,

(35)

Analogous to our discussion in Sect. 2, we may assume without loss of generality \(b=0\), \(\lambda ^1=\begin{pmatrix} A&B \end{pmatrix} \begin{pmatrix} x^{\dagger } \\ z^{\dagger } \end{pmatrix}\) and \(\lambda ^\star =\begin{pmatrix} A&B \end{pmatrix} \begin{pmatrix} \bar{x} \\ \bar{z} \end{pmatrix}\) for some \(\bar{x}, x^{\dagger }, \bar{z}, z^{\dagger }\). In addition, we assume that \(\hat{x}^1\in {{\,\textrm{argmin}\,}}f(x)+\langle \lambda ^1, Ax\rangle \) and \(\hat{x}^2\in {{\,\textrm{argmin}\,}}f(x)+\langle \lambda ^2, Ax\rangle \). Hence,

$$\begin{aligned}{} & {} D(\lambda ^1)=f(\hat{x}^1)+g(z^1)+\langle \lambda ^1, A\hat{x}^1 +Bz^1\rangle , \\{} & {} D(\lambda ^2)=f(\hat{x}^2)+g(z^2) +\langle \lambda ^2, A\hat{x}^2+Bz^2\rangle , \end{aligned}$$

and

$$\begin{aligned}&-A^T\lambda ^1\in \partial f(\hat{x}^1), \quad -B^T\lambda ^1\in \partial g(z^1),\nonumber \\&-A^T\lambda ^2\in \partial f(\hat{x}^2), \quad -B^T\lambda ^2\in \partial g(z^2). \end{aligned}$$
(36)

Moreover, by (36) and (31), we get

$$\begin{aligned} -A\hat{x}^1-Bz^1\in \partial \left( -D(\lambda ^1)\right) , \ \ \ \ -A\hat{x}^2-Bz^2\in \partial \left( -D(\lambda ^2)\right) . \end{aligned}$$

On the other hand, \(\lambda ^2=\lambda ^1+tAx^2+tBz^2\). Therefore, by using Theorem 2, problem (35) may be relaxed as follows,

$$\begin{aligned}&\max \ \frac{f^\star +g^\star -\hat{f}^2-g^2 -\langle Ax^{\dagger }+Bz^{\dagger }+tAx^2+tBz^2, A\hat{x}^2+Bz^2\rangle }{f^\star +g^\star -\hat{f}^1-g^1 -\langle Ax^{\dagger }+Bz^{\dagger }, A\hat{x}^1+Bz^1\rangle }\nonumber \\&\ \textrm{s}.\,\textrm{t}.\ \Big \{\left( \hat{x}^1, -A^TAx^{\dagger }-A^TBz^{\dagger }, \hat{f}^1\right) , \left( x^2, -A^TAx^{\dagger }-A^TBz^{\dagger } -tA^TAx^2-tA^TBz^1, f^2\right) , \nonumber \\&\qquad \left( \hat{x}^2, -A^TAx^{\dagger } -A^TBz^{\dagger }-tA^TAx^2-tA^TBz^2, \hat{f}^2\right) , \left( 0, -A^TA\bar{x}-A^TB\bar{z}, f^\star \right) \Big \} \nonumber \\&\qquad \text {satisfy interpolation constraints} \ (4) \nonumber \\&\qquad \Big \{\left( z^1, -B^TAx^{\dagger }-B^TBz^{\dagger }, g^1\right) , \left( z^2, -B^TAx^{\dagger }-B^TBz^{\dagger } -tB^TAx^2-tB^TBz^2, g^2\right) , \nonumber \\&\qquad \left( 0, -B^TA\bar{x}-B^TB\bar{z}, g^\star \right) \Big \} \ \text {satisfy interpolation constraints} \ (4) \nonumber \\&\qquad f^\star +g^\star - \hat{f}^1- g^1-\left\langle Ax^{\dagger } +Bz^{\dagger }, A\hat{x}^1+Bz^1 \right\rangle \le \tfrac{1}{2L_p}\left\| A\hat{x}^1+Bz^1\right\| ^2\nonumber \\&\qquad f^\star +g^\star - \hat{f}^2- g^2-\left\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^2, A\hat{x}^2+Bz^2 \right\rangle \le \tfrac{1}{2L_p}\left\| A\hat{x}^2+Bz^2\right\| ^2\nonumber \\&\qquad A\in {\mathbb {R}}^{r\times n}, B\in {\mathbb {R}}^{r\times m}. \end{aligned}$$
(37)

By deriving an upper bound for the optimal value of problem (37) in the next theorem, we establish the linear convergence of ADMM in the presence of the PŁ inequality.

Theorem 6

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) and \(g\in {\mathcal {F}}^B_{c_2}({\mathbb {R}}^m)\) with \(c_1, c_2>0\), and let D satisfy the PŁ inequality with constant \(L_p\). Suppose that \(t\le \sqrt{c_1c_2}\).

  (i) If \(c_1\ge c_2\), then

    $$\begin{aligned} \frac{D(\lambda ^\star )-D(\lambda ^2)}{D(\lambda ^\star ) -D(\lambda ^1)}\le \frac{2c_1c_2-t^2}{2c_1c_2-t^2+L_pt \left( 4c_1c_2-c_2t-2t^2\right) }, \end{aligned}$$
    (38)

    in particular, if \(t=\sqrt{c_1c_2}\),

    $$\begin{aligned} \frac{D(\lambda ^\star )-D(\lambda ^2)}{D(\lambda ^\star ) -D(\lambda ^1)}\le \frac{1}{1+L_p\left( 2\sqrt{c_1c_2}-c_2\right) }. \end{aligned}$$
  (ii) If \(c_1< c_2\), then

    $$\begin{aligned}&\frac{D(\lambda ^\star )-D(\lambda ^2)}{D(\lambda ^\star ) -D(\lambda ^1)}\nonumber \\&\quad \le \frac{4 c_2^2-2c_2 \sqrt{c_1c_2}-t^2}{4 c_2^2-2c_2 \sqrt{c_1c_2}-t^2+L_pt\left( 8c_2^2+5c_2t-2\sqrt{c_1c_2} \left( 1+\tfrac{t}{c_1}\right) \left( 2c_2+t\right) \right) }. \end{aligned}$$
    (39)

Proof

The argument is based on weak duality. Indeed, by introducing suitable Lagrange multipliers, we establish that the given convergence rates are upper bounds for problem (37). First, we prove (i). Let \(\alpha \) denote the right-hand side of inequality (38). As \(2c_1c_2-t^2>0\) and \(4c_1c_2-c_2t-2t^2>0\), we have \(0<\alpha <1\). With some algebra, one can show that

$$\begin{aligned}&f^\star +g^\star -\hat{f}^2-g^2-\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^2, A\hat{x}^2+Bz^2\rangle \\&\quad -\alpha \left( f^\star +g^\star -\hat{f}^1-g^1 -\langle Ax^{\dagger }+Bz^{\dagger }, A\hat{x}^1 +Bz^1\rangle \right) \\&\quad +\alpha \left( \hat{f}^2-\hat{f}^1 +\langle Ax^{\dagger }+Bz^{\dagger }, A\hat{x}^2 -A\hat{x}^1\rangle -\tfrac{c_1}{2}\left\| \hat{x}^2 -\hat{x}^1\right\| ^2_A \right) \\&\quad +\alpha \left( f^2-\hat{f}^2+\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^2, Ax^2-A\hat{x}^2\rangle -\tfrac{c_1}{2}\left\| x^2-\hat{x}^2\right\| ^2_A \right) \\&\quad +\alpha \left( \hat{f}^2-f^2+\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^1, A\hat{x}^2- Ax^2\rangle -\tfrac{c_1}{2}\left\| \hat{x}^2- x^2\right\| _A^2 \right) \\&\quad +\alpha \left( g^2-g^1+\langle Ax^{\dagger } +Bz^{\dagger }, Bz^2-Bz^1\rangle -\tfrac{c_2}{2} \left\| z^2-z^1\right\| _B^2 \right) \\&\quad +(1-\alpha )\Big (-f^\star -g^\star +\left\langle Ax^{\dagger }+Bz^{\dagger } +tAx^2+tBz^2, A\hat{x}^2+Bz^2 \right\rangle + \hat{f}^2+ g^2\\&\quad + \tfrac{1}{2L_p}\left\| A\hat{x}^2+Bz^2\right\| ^2 \Big ) = \tfrac{-c_1\alpha }{2}\left\| \hat{x}^1-\hat{x}^2\right\| ^2_A -\tfrac{c_2\alpha }{2}\left\| Bz^1-Bz^2+\tfrac{t}{c_2}Ax^2 -\tfrac{t}{c_2}A\hat{x}^2\right\| ^2\\&\quad -\alpha (c_1-\tfrac{t^2}{2c_2})\left\| Ax^2 +\tfrac{tc_2}{2c_1c_2-t^2}Bz^2-\tfrac{tc_2-2c_1c_2+t^2}{t^2-2c_1c_2} A\hat{x}^2\right\| ^2. \end{aligned}$$

Hence, we get

$$\begin{aligned}&f^\star +g^\star -\hat{f}^2-g^2-\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^2, A\hat{x}^2+Bz^2\rangle \\&\quad \le \alpha \left( f^\star +g^\star -\hat{f}^1 -g^1-\langle Ax^{\dagger }+Bz^{\dagger }, A\hat{x}^1+Bz^1\rangle \right) \end{aligned}$$

for any feasible point of problem (35) and the proof of the first part is complete. For (ii), we proceed analogously to the proof of (i), but with different Lagrange multipliers. Let \(\beta \) denote the right hand side of inequality (39), i.e.

$$\begin{aligned} \beta =\frac{4 c_2^2-2c_2 \sqrt{c_1c_2}-t^2}{4 c_2^2-2c_2 \sqrt{c_1c_2}-t^2+L_pt\left( 8c_2^2+5c_2t-2\sqrt{c_1c_2} \left( 1+\tfrac{t}{c_1}\right) \left( 2c_2+t\right) \right) }. \end{aligned}$$

It is seen that \(0<\beta <1\). After some algebra, we have

$$\begin{aligned}&f^\star +g^\star -\hat{f}^2-g^2-\langle Ax^{\dagger }+Bz^{\dagger } +tAx^2+tBz^2, A\hat{x}^2+Bz^2\rangle \\&\qquad -\beta \left( f^\star +g^\star -\hat{f}^1-g^1 -\langle Ax^{\dagger }+Bz^{\dagger }, A\hat{x}^1+Bz^1\rangle \right) \\&\qquad +\beta \left( \hat{f}^2-\hat{f}^1+\langle Ax^{\dagger } +Bz^{\dagger }, A\hat{x}^2-A\hat{x}^1\rangle -\tfrac{c_1}{2} \left\| \hat{x}^2-\hat{x}^1\right\| ^2_A \right) \\&\qquad +\sqrt{\tfrac{c_2}{c_1}}\beta \left( f^2-\hat{f}^2+\langle Ax^{\dagger }+Bz^{\dagger } +tAx^2+tBz^2, Ax^2-A\hat{x}^2\rangle -\tfrac{c_1}{2} \left\| x^2-\hat{x}^2\right\| _A^2 \right) \\&\qquad +\sqrt{\tfrac{c_2}{c_1}}\beta \left( \hat{f}^2-f^2+\langle Ax^{\dagger }+Bz^{\dagger } +tAx^2+tBz^1, A\hat{x}^2- Ax^2\rangle -\tfrac{c_1}{2} \left\| \hat{x}^2- x^2\right\| _A^2 \right) \\&\qquad + \sqrt{\tfrac{c_2}{c_1}}\beta \left( g^2-g^1+\langle Ax^{\dagger }+Bz^{\dagger }, Bz^2-Bz^1\rangle -\tfrac{c_2}{2}\left\| z^2-z^1\right\| _B^2 \right) \\&\qquad +\left( \sqrt{\tfrac{c_2}{c_1}}-1\right) \beta \bigg ( g^1-g^2+\langle Ax^{\dagger }+Bz^{\dagger } +tAx^2+tBz^2, Bz^1-Bz^2\rangle \\&\qquad - \tfrac{c_2}{2}\left\| z^1-z^2\right\| _B^2 \bigg ) +(1-\beta )\big (-f^\star -g^\star +\left\langle Ax^{\dagger } +Bz^{\dagger }+tAx^2+tBz^2, A\hat{x}^2+Bz^2 \right\rangle \\&\qquad +\hat{f}^2+ g^2+\tfrac{1}{2L_p}\left\| A\hat{x}^2 +Bz^2\right\| ^2 \big )\\&\quad = -\tfrac{c_1\beta }{2}\left\| \hat{x}^1-\hat{x}^2\right\| ^2_A -\left( \sqrt{c_1c_2}\beta \right) \left\| A x^2 -\left( 1-\frac{t}{2 \sqrt{c_1c_2}}\right) A\hat{x}^2 +\frac{t}{2 \sqrt{c_1c_2}}Bz^1\right\| ^2\\&\qquad -\left( \frac{\beta -1}{2L_p}+\beta t \left( 1-\frac{t}{4\sqrt{c_1c_2}}\right) \right) \bigg \Vert A\hat{x}^2-\left( \frac{\beta L_p \left( -2c_2\sqrt{c_1c_2}+4c_2^2-t^2\right) }{-\beta L_p t^2 +2 \sqrt{c_1c_2} (2 \beta L_p t+\beta -1)}\right) ^{\frac{1}{2}}Bz^1\\&\qquad +\left( \frac{2\left( 2\beta c_2L_p\left( t+c_2\right) +\sqrt{c_1c_2} \left( \beta -\beta L_p c_2-1\right) \right) }{-\beta L_p t^2+2 \sqrt{c_1c_2} (2 \beta L_p t+\beta -1)}\right) ^{\frac{1}{2}}Bz^2\bigg \Vert ^2. \end{aligned}$$

The rest of the proof is similar to that of the former case. \(\square \)

We computed the bounds in Theorem 6 by selecting suitable Lagrange multipliers and solving the semidefinite formulation of problem (37) by hand. The semidefinite formulation is formed analogously to problem (16). Note that the optimal value of problem (37) may be smaller than the bounds given in Theorem 6. Indeed, our aim was to provide a concrete mathematical proof of the linear convergence rate, and the resulting convergence factor is not necessarily tight. Needless to say, the optimal value of problem (37) does not necessarily give the tight convergence factor either, as it is just a relaxation of problem (35).
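For quick numerical use of Theorem 6, the bounds (38) and (39) can be evaluated directly. The following minimal Python sketch (the function name and the sample values are ours, not part of the analysis) returns the factor appearing on the right-hand side of (38) or (39) for given \(c_1, c_2, t\) and \(L_p\):

```python
import math

def theorem6_rate(c1, c2, t, Lp):
    """Evaluate the upper bound of Theorem 6 on
    (D(lam_star) - D(lam^2)) / (D(lam_star) - D(lam^1))."""
    if not (0 < t <= math.sqrt(c1 * c2)):
        raise ValueError("Theorem 6 assumes 0 < t <= sqrt(c1 * c2)")
    if c1 >= c2:  # case (i), bound (38)
        num = 2 * c1 * c2 - t ** 2
        return num / (num + Lp * t * (4 * c1 * c2 - c2 * t - 2 * t ** 2))
    # case (ii), bound (39)
    num = 4 * c2 ** 2 - 2 * c2 * math.sqrt(c1 * c2) - t ** 2
    den = num + Lp * t * (8 * c2 ** 2 + 5 * c2 * t
                          - 2 * math.sqrt(c1 * c2) * (1 + t / c1) * (2 * c2 + t))
    return num / den

# For c1 = c2 = 1, t = sqrt(c1 * c2) = 1 and Lp = 1/2, bound (38) reduces to
# 1 / (1 + Lp * (2 * sqrt(c1 * c2) - c2)) = 2/3.
print(theorem6_rate(1.0, 1.0, 1.0, 0.5))  # 0.666...
```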

Recently, it was shown that the PŁ inequality is a necessary and sufficient condition for the linear convergence of the gradient method with constant step length for L-smooth functions; see [1, Theorem 5]. In what follows, we establish that the PŁ inequality is a necessary condition for the linear convergence of ADMM. First, we present a lemma that is used in the proof.

Lemma 3

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) and \(g\in {\mathcal {F}}^B_{c_2}({\mathbb {R}}^m)\). Consider Algorithm 1. If \((\hat{x}^1, z^1)\in {{\,\textrm{argmin}\,}}f(x)+g(z)+\langle \lambda ^1, Ax+Bz-b\rangle \), then

$$\begin{aligned} \langle A\hat{x}^1+Bz^1-b, Ax^2+Bz^2-b\rangle \le \left\| A\hat{x}^1+Bz^1-b\right\| ^2. \end{aligned}$$
(40)

Proof

Without loss of generality we assume that \(c_1=c_2=0\). By optimality conditions, we have

$$\begin{aligned}&f(\hat{x}^1)-\langle \lambda ^1, Ax^2-A\hat{x}^1\rangle \le f(x^2), \ \ \ g(z^1)-\langle \lambda ^1, Bz^2-Bz^1\rangle \le g(z^2),\\&f(x^2)-\langle \lambda ^1+t(Ax^2+Bz^1-b), A\hat{x}^1-Ax^2\rangle \le f(\hat{x}^1),\\&g(z^2)-\langle \lambda ^1+t(Ax^2+Bz^2-b), Bz^1-Bz^2\rangle \le g(z^1). \end{aligned}$$

By using these inequalities, we get

$$\begin{aligned} 0&\le \tfrac{1}{t}\left( f(x^2)-f(\hat{x}^1) +\left\langle \lambda ^1, Ax^2-A\hat{x}^1\right\rangle \right) \\&\quad +\tfrac{1}{t}\left( g(z^2)-g(z^1)+\left\langle \lambda ^1, Bz^2-Bz^1\right\rangle \right) \\&\quad +\tfrac{1}{t}\left( f(\hat{x}^1)-f(x^2) +\left\langle \lambda ^1+t(Ax^2+Bz^1-b), A\hat{x}^1- Ax^2\right\rangle \right) \\&\quad +\tfrac{1}{t}\left( g(z^1)-g(z^2) +\left\langle \lambda ^1+t(Ax^2+Bz^2-b), Bz^1-Bz^2\right\rangle \right) \\&=\left\| A\hat{x}^1+Bz^1-b\right\| ^2-\left\langle A\hat{x}^1+Bz^1-b, Ax^2+Bz^2-b\right\rangle \\&\quad -\tfrac{3}{4} \left\| B\left( z^1-z^2\right) \right\| ^2\\&\quad -\left\| A\left( \hat{x}^1-x^2\right) +\tfrac{1}{2}B\left( z^1-z^2\right) \right\| ^2. \end{aligned}$$

Hence, we have

$$\begin{aligned} \langle A\hat{x}^1+Bz^1-b, Ax^2+Bz^2-b\rangle \le \left\| A\hat{x}^1+Bz^1-b\right\| ^2, \end{aligned}$$

which completes the proof. \(\square \)

The next theorem establishes that the PŁ inequality is a necessary condition for the linear convergence of ADMM.

Theorem 7

Let \(f\in {\mathcal {F}}^A_{c_1}({\mathbb {R}}^n)\), \(g\in {\mathcal {F}}^B_{c_2}({\mathbb {R}}^m)\) and let (31) hold as equality. If Algorithm 1 is linearly convergent with respect to the dual objective value, then D satisfies the PŁ inequality.

Proof

Consider \(\lambda ^1\in {\mathbb {R}}^r\) and \(\xi \in b-A\partial f^*(-A^T\lambda ^1)-B\partial g^*(-B^T\lambda ^1)\). Hence, \(\xi =b-A\hat{x}^1-Bz^1\) for some \((\hat{x}^1, z^1)\in {{\,\textrm{argmin}\,}}f(x)+g(z)+\langle \lambda ^1, Ax+Bz-b\rangle \). If one sets \(z^0=z^1\) and \(\lambda ^0=\lambda ^1-t(A\hat{x}^1+Bz^1-b)\) in Algorithm 1, the algorithm may generate \(\lambda ^1\). As Algorithm 1 is linearly convergent, there exists \(\gamma \in [0, 1)\) with

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^2)\le \gamma \left( D(\lambda ^\star ) -D(\lambda ^1) \right) . \end{aligned}$$

So, we have

$$\begin{aligned} (1-\gamma )\left( D(\lambda ^\star )-D(\lambda ^1)\right) \le D(\lambda ^2)-D(\lambda ^1)\le \left\langle A \hat{x}^1+Bz^1-b, \lambda ^2-\lambda ^1\right\rangle , \end{aligned}$$

where the last inequality follows from the concavity of the function D. Since \(\lambda ^2-\lambda ^1=t(Ax^2+Bz^2-b)\), Lemma 3 implies that

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^1)\le \tfrac{t}{1-\gamma } \Vert \xi \Vert ^2, \end{aligned}$$

so D satisfies the PŁ inequality with \(L_p=\tfrac{1-\gamma }{2t}\). \(\square \)

Another assumption used in the literature for establishing linear convergence is L-smoothness; see for example [10, 11, 15, 38]. Deng et al. [11] show that the sequence \(\{(x^k, z^k, \lambda ^k)\}\) converges linearly to a saddle point under Scenarios 1 and 2 given in Table 1.

Table 1 Scenarios leading to linear convergence rates

It is worth mentioning that Scenario 1 or Scenario 2 implies strong concavity of the dual objective function (equivalently, strong convexity of \(-D\)), and therefore the PŁ inequality, see [1]. Hence, Theorem 6 implies the linear convergence in terms of the dual objective value under Scenario 1 or Scenario 2. Deng et al. [11] studied the linear convergence under Scenario 3, but they only proved the linear convergence of the sequence \(\{(x^k, Bz^k, \lambda ^k)\}\). In the next section, we investigate R-linear convergence without assuming L-smoothness of f. Indeed, we establish R-linear convergence when f is strongly convex, g is L-smooth and B has full row rank.

Note that the PŁ inequality does not necessarily imply Scenario 1 or Scenario 2. Indeed, consider the following optimization problem,

$$\begin{aligned}&\min \ f(x)+g(z),\\&\textrm{s}.\,\textrm{t}.\ x+z=0, \\&\qquad x,z\in {\mathbb {R}}^n, \end{aligned}$$

where \(f(x)=\tfrac{1}{2}\Vert x\Vert ^2+\Vert x\Vert _1\) and \(g(z)=\tfrac{1}{2}\Vert z\Vert ^2+\Vert z\Vert _1\). With some algebra, one may show that \(D(\lambda )=\sum _{i=1}^{n} h(\lambda _i)\) with

$$\begin{aligned} h(s)= {\left\{ \begin{array}{ll} -(s-1)^2, &{} s>1\\ 0, &{} |s|\le 1\\ -(s+1)^2, &{} s<-1. \end{array}\right. } \end{aligned}$$

Hence, the PŁ inequality holds for \(L_p=\tfrac{1}{2}\) while neither f nor g is L-smooth.
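To make this example concrete, the following minimal Python sketch (our illustration; it assumes \(A=B=I\), \(b=0\), \(c_1=c_2=1\) and step length \(t=1=\sqrt{c_1c_2}\)) runs Algorithm 1 on this problem, using the closed-form soft-thresholding solutions of the two subproblems, and monitors the dual gap \(D(\lambda ^\star )-D(\lambda ^k)=-D(\lambda ^k)\). The printed ratios of successive gaps can be compared with the factor \(\tfrac{1}{1+L_p(2\sqrt{c_1c_2}-c_2)}=\tfrac{2}{3}\) from Theorem 6.

```python
import numpy as np

def soft(v, kappa):
    """Soft-thresholding operator S(v, kappa) = sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def dual_gap(lam):
    """D(lam_star) - D(lam) = sum_i max(|lam_i| - 1, 0)^2 here, since D(lam_star) = 0."""
    return np.sum(np.maximum(np.abs(lam) - 1.0, 0.0) ** 2)

n, t = 5, 1.0
rng = np.random.default_rng(0)
z, lam = rng.normal(size=n), 3.0 * rng.normal(size=n)

gap_prev = dual_gap(lam)
for k in range(1, 21):
    x = soft(-(lam + t * z), 1.0) / (1.0 + t)   # x-update: coordinatewise argmin of 0.5x^2+|x|+lam*x+(t/2)(x+z)^2
    z = soft(-(lam + t * x), 1.0) / (1.0 + t)   # z-update: coordinatewise argmin of 0.5z^2+|z|+lam*z+(t/2)(x+z)^2
    lam = lam + t * (x + z)                     # multiplier update
    gap = dual_gap(lam)
    if k >= 2 and gap_prev > 1e-14:
        print(k, gap, gap / gap_prev)           # ratio of successive dual gaps
    gap_prev = gap
```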

As mentioned earlier, the performance estimation problem that imposes the PŁ inequality only at a finite set of points is a relaxation of the problem of computing the worst-case convergence rate. In contrast to Theorem 6, we were not able to prove the linear convergence of the primal and dual residuals under the assumptions of Theorem 6 by employing performance estimation.

5 R-linear convergence of ADMM

This section examines the convergence rate of ADMM in a sense weaker than the Q-linear convergence studied in Sect. 4, namely R-linear convergence, where R stands for root [39]. Recall that ADMM enjoys R-linear convergence in terms of the dual objective value if there exists a sequence \(\{s_N\}\subseteq {\mathbb {R}}_+\) such that

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)\le s_N, \end{aligned}$$

and \(\{s_N\}\) tends Q-linearly to zero. It is easily seen that Q-linear convergence implies R-linear convergence. For an extensive discussion of convergence rates, see [39, Section A.2] or [8, Section 1.5].

We investigate the R-linear convergence under the following scenarios:

  • (S1): \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) is L-smooth with \(c_1>0\) and A has full row rank;

  • (S2): \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) with \(c_1>0\), g is L-smooth and B has full row rank.

Under these scenarios, we were not able to find a value of q in the range [0, 1) satisfying the inequality:

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^{N+1})\le q \left( D(\lambda ^\star )-D(\lambda ^N)\right) . \end{aligned}$$

As a result, we turn our attention towards studying the R-linear convergence.

Our technique for proving the R-linear convergence is based on establishing the linear convergence of the sequence \(\{V^k\}\) given by

$$\begin{aligned} V^k=\Vert \lambda ^k-\lambda ^\star \Vert ^2+t^2\left\| z^k-z^\star \right\| _B^2. \end{aligned}$$
(41)

Note that \(V^k\) is called a Lyapunov function for ADMM and it decreases in each iteration; see [9]. It is worth noting that Q-linear and R-linear convergence of ADMM have been studied under similar scenarios for some performance measures, see e.g. [10, 15, 38]. However, to the best of our knowledge, no existing results in the literature address the dual objective and \(V^k\) under Scenarios (S1) and (S2).
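A direct way to monitor (41) in practice is sketched below (our code; here \(\Vert v\Vert _B\) is read as \(\Vert Bv\Vert \), in line with the earlier usage of \(\Vert \cdot \Vert _A\) and \(\Vert \cdot \Vert _B\)). For the toy instance of the previous section one may take \(\lambda ^\star =0\), \(z^\star =0\) and \(B=I\) and track \(V^k\) along the iterations of the earlier sketch.

```python
import numpy as np

def lyapunov_V(lam_k, z_k, lam_star, z_star, B, t):
    """V^k = ||lam^k - lam_star||^2 + t^2 * ||B (z^k - z_star)||^2, cf. (41)."""
    return (np.linalg.norm(lam_k - lam_star) ** 2
            + t ** 2 * np.linalg.norm(B @ (z_k - z_star)) ** 2)
```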

First, we consider the case in which f is L-smooth and \(c_1\)-strongly convex relative to A. The following proposition establishes the linear convergence of \(\{V^k\}\).

Proposition 2

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) be L-smooth with \(c_1>0\), \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) and let A have full row rank. If \(t< \sqrt{\tfrac{c_1 L}{\lambda _{\min }(AA^T)}}\), then

$$\begin{aligned} V^{k+1}\le \left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2} \right) V^k, \end{aligned}$$
(42)

where \(d=\tfrac{L}{\lambda _{\min }(AA^T)}\).

Proof

We may assume without loss of generality that \(x^\star , z^\star \) and b are zero; see our discussion in Sect. 2. By optimality conditions, we have

$$\begin{aligned}&\nabla f(x^{k+1})=-A^T\left( \lambda ^{k}+tAx^{k+1}+tBz^k\right) , \quad \eta ^k=-B^T\lambda ^{k+1},\\&\nabla f(x^\star )=-A^T\lambda ^\star , \quad \eta ^\star =-B^T\lambda ^\star , \end{aligned}$$

for some \(\eta ^k\in \partial g(z^{k+1})\) and \(\eta ^\star \in \partial g(z^\star )\). Let \(\alpha =\tfrac{2t}{c_1^2d^2+2c_1dt^2-4c_1^2t^2 +t^4}\). By Theorem 2, we get

$$\begin{aligned}&{\alpha \left( t^2 + c_1d\right) ^2} \left( f(x^{k+1}) -f^\star +\left\langle \lambda ^\star , Ax^{k+1}\right\rangle -\tfrac{1}{2L}\left\| A^T\left( \lambda ^{k}+tAx^{k+1} +tBz^k-\lambda ^\star \right) \right\| ^2\right) \\&\quad +2\alpha t^2{\left( c_1d+t^2\right) } \left( f^\star -f(x^{k+1})- \tfrac{c_1}{2} \left\| x^{k+1} \right\| ^2_A-\left\langle \lambda ^{k} +tAx^{k+1}+tBz^k, Ax^{k+1}\right\rangle \right) \\&\quad +2t\left( g(z^{k+1})-g^\star +\left\langle \lambda ^\star , Bz^{k+1}\right\rangle \right) +2t\left( g^\star -g(z^{k+1})-\left\langle \lambda ^{k+1}, Bz^{k+1}\right\rangle \right) \\&\quad +{\alpha \left( c_1^2d^2-t^4\right) } \Bigg ( f^\star -f(x^{k+1})-\left\langle \lambda ^{k} +tAx^{k+1}+tBz^k, Ax^{k+1}\right\rangle \\&\quad -\tfrac{1}{2L}\left\| A^T\left( \lambda ^{k} +tAx^{k+1}+tBz^k-\lambda ^\star \right) \right\| ^2\Bigg ) \ge 0. \end{aligned}$$

As \(\Vert A^T\lambda \Vert ^2\ge \tfrac{L}{d}\Vert \lambda \Vert ^2\) and \(\lambda ^{k+1}=\lambda ^k+tAx^{k+1}+tBz^{k+1}\), we obtain the following inequality after performing some algebraic manipulations

$$\begin{aligned}&\left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2} \right) \left( \left\| \lambda ^k-\lambda ^\star \right\| ^2 +t^2\left\| Bz^k\right\| ^2\right) -\left( \left\| \lambda ^{k+1}-\lambda ^\star \right\| ^2 +t^2\left\| Bz^{k+1}\right\| ^2\right) \\&\quad - {2\alpha c_1^2t}\left\| \lambda ^k-\lambda ^\star +\tfrac{t^2+2c_1t+c_1d}{2c_1}Ax^{k+1} +\tfrac{t^2+c_1d}{2c_1}Bz^{k} \right\| ^2\ge 0. \end{aligned}$$

The above inequality implies that

$$\begin{aligned} V^{k+1}\le \left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2} \right) V^k, \end{aligned}$$

and the proof is complete. \(\square \)

Note that one can improve bound (42) under the assumptions of Proposition 2 and the \(\mu \)-strong convexity of f by employing the following known inequality

$$\begin{aligned}&\tfrac{1}{2\left( 1-\tfrac{\mu }{L}\right) }\left( \tfrac{1}{L} \left\| \nabla f(x)-\nabla f(y)\right\| ^2 +\mu \left\| x-y\right\| ^2-\tfrac{2\mu }{L} \left\langle \nabla f(x)-\nabla f(y),x-y\right\rangle \right) \\&\qquad \le f(y)-f(x)-\left\langle \nabla f(x), y-x\right\rangle . \end{aligned}$$

Indeed, we employed this inequality but could not obtain a closed-form formula for the convergence rate. The next theorem establishes the R-linear convergence of ADMM in terms of the dual objective value under the assumptions of Proposition 2.

Theorem 8

Let \(N\ge 4\) and let A have full row rank. Suppose that \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) is L-smooth with \(c_1>0\) and \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\). If \(t<\min \{c_1, \sqrt{\tfrac{c_1 L}{\lambda _{\min }(AA^T)}}\}\), then

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)\le \rho \left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2} \right) ^{N}, \end{aligned}$$

where \(d=\tfrac{L}{\lambda _{\min }(AA^T)}\) and \(\rho =\tfrac{V^0}{16t}\left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2} \right) ^{-4}\).

Proof

By Theorem 3 and Proposition 2, one can infer the following inequalities,

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)&\le \tfrac{V^{N-4}}{16t}\\&\le \tfrac{V^0}{16t} \left( 1-\tfrac{2c_1 t}{c_1d+2c_1 t+ t^2}\right) ^{N-4}, \end{aligned}$$

which shows the desired inequality. \(\square \)
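As with Theorem 6, the bound of Theorem 8 is easy to evaluate numerically. The following sketch (the function and argument names are ours) computes the right-hand side for given problem data and may help when experimenting with the step length t:

```python
def theorem8_bound(V0, c1, L, lam_min_AAt, t, N):
    """R-linear bound of Theorem 8 on D(lam_star) - D(lam^N)."""
    if not (N >= 4 and 0 < t < min(c1, (c1 * L / lam_min_AAt) ** 0.5)):
        raise ValueError("Theorem 8 assumes N >= 4 and "
                         "t < min(c1, sqrt(c1 * L / lam_min(A A^T)))")
    d = L / lam_min_AAt
    q = 1.0 - 2.0 * c1 * t / (c1 * d + 2.0 * c1 * t + t ** 2)
    rho = V0 / (16.0 * t) * q ** (-4)
    return rho * q ** N
```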

In the sequel, we investigate the R-linear convergence under the hypotheses of Scenario (S2). The next proposition shows the linear convergence of \(\{V^k\}\).

Proposition 3

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) with \(c_1>0\) and let \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) be L-smooth. Suppose that B has full row rank and \(k\ge 1\). If \(t\le \min \{\tfrac{c_1}{2}, \tfrac{L}{2\lambda _{\min }(BB^T)}\}\), then

$$\begin{aligned} V^{k+1}\le \left( \tfrac{L}{L+t\lambda _{\min }(BB^T)} \right) ^2 V^k. \end{aligned}$$
(43)

Proof

Analogous to the proof of Proposition 2, we assume that \(x^\star =0\), \(z^\star =0\) and \(b=0\). Due to the optimality conditions, we have

$$\begin{aligned}&\xi ^{k+1}=-A^T\left( \lambda ^{k}+tAx^{k+1}+tBz^k\right) , \quad \xi ^\star =-A^T\lambda ^\star ,\\&\nabla g(z^k)=-B^T\lambda ^k, \quad \nabla g(z^{k+1})=-B^T\lambda ^{k+1}, \quad \nabla g(z^\star )=-B^T\lambda ^\star , \end{aligned}$$

for some \(\xi ^{k+1}\in \partial f(x^{k+1})\) and \(\xi ^\star \in \partial f(x^\star )\). Suppose that \(d=\tfrac{L}{\lambda _{\min }(BB^T)}\) and \(\alpha =\tfrac{2dt}{d+t}\). By Theorem 2, we obtain

$$\begin{aligned}&\frac{\alpha \left( d^2+t^2\right) }{d^2-t^2} \left( f^\star -f(x^{k+1})-\left\langle \lambda ^{k} +tAx^{k+1}+tBz^k, Ax^{k+1}\right\rangle -\tfrac{c_1}{2}\left\| x^{k+1} \right\| _A^2\right) \\&\quad +\frac{\alpha \left( d^2+t^2\right) }{d^2-t^2} \left( f(x^{k+1})-f(x^\star )+\left\langle \lambda ^\star , Ax^{k+1}\right\rangle -\tfrac{c_1}{2}\left\| x^{k+1} \right\| _A^2\right) \\&\quad + \alpha \left( g(z^{k+1})-g^\star +\left\langle \lambda ^\star , Bz^{k+1}\right\rangle -\tfrac{1}{2L}\left\| B^T \left( \lambda ^\star -\lambda ^{k+1} \right) \right\| ^2\right) \\&\quad +\alpha \left( g^\star -g(z^{k+1}) -\left\langle \lambda ^{k+1}, Bz^{k+1}\right\rangle -\tfrac{1}{2L}\left\| B^T\left( \lambda ^\star -\lambda ^{k+1} \right) \right\| ^2\right) \\&\quad + \alpha \left( g(z^{k})-g(z^{k+1}) +\left\langle \lambda ^{k+1}, Bz^{k}-Bz^{k+1}\right\rangle -\tfrac{1}{2L}\left\| B^T\left( \lambda ^{k+1}-\lambda ^{k} \right) \right\| ^2\right) \\&\quad + \alpha \left( g(z^{k+1})-g(z^{k})+\left\langle \lambda ^{k}, Bz^{k+1}-Bz^{k}\right\rangle -\tfrac{1}{2L}\left\| B^T \left( \lambda ^{k+1}-\lambda ^{k} \right) \right\| ^2\right) \ge 0. \end{aligned}$$

By employing \(\Vert B^T\lambda \Vert ^2\ge \tfrac{L}{d}\Vert \lambda \Vert ^2\) and \(\lambda ^{k+1}=\lambda ^k+tAx^{k+1}+tBz^{k+1}\), the aforementioned inequality can be expressed as follows after some algebraic manipulation,

$$\begin{aligned}&\tfrac{-\alpha ^2}{4}\left\| \left( \frac{2t^2}{d^2-dt}\right) Ax^{k+1}+Bz^k-\left( 1+\tfrac{t}{d}\right) Bz^{k+1} \right\| ^2-\frac{2 t \left( d^2+t^2\right) \left( c_1d^2-d t (c_1+t)-t^3\right) }{\left( d^2-t^2\right) ^2}\\&\quad \times \left\| Ax^{k+1}\right\| ^2-\tfrac{\alpha ^2}{4d^2} \left\| \lambda ^k-\lambda ^\star +\left( \frac{2 d^2 -(d-t)^2}{d-t}\right) Ax^{k+1}+\left( d+t\right) Bz^{k+1} \right\| ^2\\&\quad +\left( \tfrac{d}{d+t} \right) ^2\left( \left\| \lambda ^k -\lambda ^\star \right\| ^2+t^2\left\| Bz^k\right\| ^2\right) -\left( \left\| \lambda ^{k+1}-\lambda ^\star \right\| ^2+t^2 \left\| Bz^{k+1}\right\| ^2\right) \ge 0. \end{aligned}$$

Hence, we have

$$\begin{aligned} V^{k+1}\le \left( \tfrac{d}{d+t} \right) ^2 V^k, \end{aligned}$$

and the proof is complete. \(\square \)

As the sequence \(\{V^k\}\) is nonincreasing [9, Convergence Proof], we have \(V^1\le V^0\). Thus, by using Theorem 3 and Proposition 3, one can infer the following theorem.

Theorem 9

Let \(f\in {\mathcal {F}}_{c_1}^A({\mathbb {R}}^n)\) with \(c_1>0\) and let \(g\in {\mathcal {F}}_{0}({\mathbb {R}}^m)\) be L-smooth. Assume that \(N\ge 5\) and B has full row rank. If \(t<\min \{\tfrac{c_1}{2}, \tfrac{L}{2\lambda _{\min }(BB^T)} \}\), then

$$\begin{aligned} D(\lambda ^\star )-D(\lambda ^N)\le \rho \left( \tfrac{L}{L+t\lambda _{\min }(BB^T)} \right) ^{2N}, \end{aligned}$$
(44)

where \(\rho =\tfrac{V^0}{16t}\left( \tfrac{L}{L+t\lambda _{\min }(BB^T)} \right) ^{-10}\).

Along the same lines, one can infer the R-linear convergence in terms of the primal and dual residuals under the assumptions of Theorems 8 and 9. In this section, we proved the linear convergence of \(\{V^k\}\) under the two scenarios (S1) and (S2). By (7), it is readily seen that the function \(-D\) is strongly convex under the hypotheses of both scenarios. Therefore, both scenarios imply the PŁ inequality. One may wonder whether the PŁ inequality and the strong convexity of f imply the linear convergence of \(\{V^k\}\). By using performance estimation, we could not establish such an implication.

As mentioned above, the function \(-D\) is \(\mu \)-strongly convex under both scenarios. Hence, the dual problem has a unique optimal solution, and one can infer the R-linear convergence of \(\{\lambda ^N\}\) by using Theorem 8 (respectively, Theorem 9) and the known inequality

$$\begin{aligned} \tfrac{\mu }{2}\left\| \lambda ^N-\lambda ^\star \right\| ^2 \le D(\lambda ^\star )-D(\lambda ^N). \end{aligned}$$

6 Concluding remarks

In this paper, we developed a performance estimation framework for handling dual-based methods. Thanks to this framework, we obtained some tight convergence rates for ADMM. The framework may be exploited for the analysis of other variants of ADMM in both the ergodic and non-ergodic sense. Moreover, similarly to [27], one can apply it to introduce and analyze new accelerated ADMM variants. Finally, most of our results hold for an arbitrary positive step length t, although we only managed to obtain closed-form formulas for certain ranges of step lengths.