1 Introduction

To numerically simulate unsteady flow phenomena, a commonly used approach is to first apply some time integration technique and then solve space-only problems at discrete time instances, which are coupled forward in time matching the nature of the physical process. This inherently sequential workflow naively prevents the full potential of today’s supercomputers, which consist of an enormous amount of computing units tuned to execute many computational tasks in parallel. To circumvent this barrier and increase the scaling capabilities especially when spatial parallelization techniques saturate, so called parallel-in-time solution strategies come into play, which parallelize the solution process in time so that the overall time-to-solution can be further reduced.

One famous representative of this class is Parareal first introduced by Lions, Maday, and Turinici [17], where the sequential process of time integration is relaxed in an iterative manner by applying the solution strategy on subintervals in parallel and then synchronizing the auxiliary solutions in an inexpensive corrector step. This algorithm was first applied to the solution of the incompressible Navier–Stokes equations in [27, 28] using a first order finite volume scheme for space discretization, whereas finite element and spectral approximations were considered by Fischer, Hecht, and Maday [11]. An extension of the approach to non-isothermal flows was investigated in [6, 20], while Croce, Ruprecht, and Krause [7] and Steiner et al. [26] focused on the performance of practical implementations and discussed the influence of the Reynolds number on the convergence behavior. But also other parallel-in-time methods were successfully applied to the Navier–Stokes equations. For example, the Paraexp algorithm was used in [15] for rather low viscosity parameters, while Falgout et al. [10] and most recently Christopher et al. [4, 5] studied the influence of MGRIT for the Reynolds-averaged Navier–Stokes equations and in case of an adaptive space-time mesh refinement strategy, respectively.

The algorithm discussed in this work for solution of the Oseen equations is based on [8, 19] and accelerates the convergence behavior of the all-at-once pressure Schur complement (PSC) solution strategy presented therein using the augmented Lagrangian (AL) methodology. The latter approach was first introduced in [14, 21] and modifies the momentum equation by means of the residual of the continuity equation. In this way, the solution remains the same, but very accurate and efficiently applicable PSC preconditioners can be constructed. These preconditioners are most accurate, if already existing global-in-time PSC preconditioners for the original system are incorporated in the solution process. In this work, global-in-time extensions of commonly used preconditioners like the pressure convection diffusion (PCD) [8, 19] and the least squares commutator (LSC) preconditioner [19] are considered for these purposes. The resulting solution strategy then converges rapidly to the solution of the Oseen equations even without the need of a coarse grid correction as proposed by the authors in [19].

Unfortunately, the AL methodology also comes with a disadvantage: The velocity problem involved in each Schur complement iteration becomes ill conditioned as more stabilization is introduced due to the fact that a singular matrix is added. Therefore, commonly used solution strategies fail for this problem and highly specialized versions are mandatory for an efficient solution algorithm. For example, Benzi and Olshanskii [2] and Schöberl [24] introduced a geometric multigrid approach with an adapted relaxation and intergrid transfer for triangular meshes. In this work, an extension to quadrilateral meshes [25] based on the \(Q_2\)-\(P_1\) finite element (FE) discretization is combined with the multigrid waveform relaxation method developed by [18] for efficiently solving the aforementioned global-in-time velocity problems.

The contents of the following sections are as follows: The unsteady incompressible Navier–Stokes equations are introduced in Sect. 2. The problem is then discretized in space and time to construct a global-in-time problem for the solution at all time instances (cf. Sect. 3). Section 4 summarizes the pressure Schur complement solver as presented in [19]. This approach is extended by the augmented Lagrangian methodology in Sect. 5 and numerically investigated in several linear and nonlinear test cases in Sect. 7. The work concludes in Sect. 8 with a discussion of future challenges, especially for convection-dominated fluid flows.

2 Continuous Setting

The flow of a viscous and incompressible fluid through a domain \(\Omega \subset \mathbb {R}^d\), \(d = 2,3\), can be described by the incompressible Navier–Stokes equations

$$\begin{aligned} \frac{\partial {\textbf{v}}}{\partial t} + ({\textbf{v}}\cdot \nabla ) {\textbf{v}}- \nabla \cdot \bigl (2 \mu (\dot{\gamma }({\textbf{v}})) \textrm{D}({\textbf{v}}) \bigr ) + \textrm{grad}(p)&= \textbf{g}, \end{aligned}$$
(1a)
$$\begin{aligned} \textrm{div}({\textbf{v}})&= 0, \end{aligned}$$
(1b)

where \({\textbf{v}}\in \mathbb {R}^d\) and \(p \in \mathbb {R}\) denote the unknown velocity field and pressure variable, respectively. Here, the body force density acting on the fluid is given by \(\textbf{g} \in \mathbb {R}^d\), while the dynamic viscosity \(\mu > 0\) may depend nonlinearly on the effective shear rate \(\dot{\gamma } = \sqrt{2} \bigl \Vert \textrm{D} ({\textbf{v}}) \bigr \Vert _\textrm{F}\) of the strain rate tensor \(\textrm{D}({\textbf{v}}) = \tfrac{1}{2} (\nabla {\textbf{v}}+ \nabla {\textbf{v}}^{\top })\). In the special case of a Newtonian fluid, the viscosity parameter \(\mu \) is constant so that the viscous part of the momentum equation (1a) simplifies to \(- \mu \Delta {\textbf{v}}\) due to the validity of the continuity equation (1b).

In the literature, there exist several different numerical solution techniques for problem (1). One common approach is to first apply a linearization technique, such as the Picard iteration or Newton’s method, and then solve the resulting Oseen equations in each nonlinear iteration. For the former approach, the linearized system of partial differential equations reads

$$\begin{aligned} \frac{\partial {\textbf{v}}}{\partial t} + (\bar{{\textbf{v}}} \cdot \nabla ) {\textbf{v}}- \nabla \cdot \bigl ( 2 \mu (\dot{\gamma }(\bar{{\textbf{v}}})) \textrm{D}({\textbf{v}}) \bigr ) + \textrm{grad}(p)&= \textbf{g}, \end{aligned}$$
(2a)
$$\begin{aligned} \textrm{div}({\textbf{v}})&= f \end{aligned}$$
(2b)

for some known initial guess \(\bar{{\textbf{v}}}\) and a vanishing right hand side \(f = 0\) of the continuity equation. In case of Newton’s method, the solution update solves (2) for the right hand sides \(\textbf{g}\) and f corresponding to the residuals of the momentum and continuity equations, respectively, while the additional term

$$\begin{aligned} {\mathcal {R}}(\bar{{\textbf{v}}}; {\textbf{v}}) = ({\textbf{v}}\cdot \nabla ) \bar{{\textbf{v}}} - \nabla \cdot \Bigl ( 4 \mu ' (\dot{\gamma }(\bar{{\textbf{v}}})) \dot{\gamma }(\bar{{\textbf{v}}})^{-1} \bigl ( \textrm{D}(\bar{{\textbf{v}}}) : \textrm{D}({\textbf{v}}) \bigr ) \textrm{D}(\bar{{\textbf{v}}}) \Bigr ) \end{aligned}$$

is involved in the definition of the momentum equation (2a). This linearization technique can significantly improve the nonlinear convergence behavior compared to the Picard iteration if the initial guess is sufficiently good. In the numerical examples, we therefore focus on test problems for the linear Oseen equations as occurring in the Picard iteration as well as on the nonlinear convergence behavior of Newton’s method.

3 Global-in-Time Discrete Oseen Problem

In this section, we discretize the Oseen equations in space and time and construct a global linear system of equations to compute the numerical approximation in all time instances simultaneously. For this purpose, the problem is discretized in space using the isoparametric \(Q_2\)-\(P_1\) FE pair, where each component of the velocity field is approximated by a continuous and piecewise biquadratic function while the pressure space is given by all piecewise linear functions on a quadrilateral mesh [1]. As we will see, this discretization technique allows the construction of efficient pressure Schur complement solvers using the augmented Lagrangian methodology and provides several advantages compared to an Taylor-Hood FE pair as considered in [19].

After discretization in space, the linear system of ordinary differential equations reads

$$\begin{aligned} \textrm{M}_u \dot{\textrm{u}}(t) + \textrm{A}_u(t) \textrm{u}(t) + \textrm{Bp}(t) = \textrm{g}(t), \qquad \textrm{B}^{\top } \textrm{u}(t) = \textrm{f}(t) \end{aligned}$$
(3)

where \(\textrm{u}(t) \in \mathbb {R}^{N_u}\), \(N_u \in \mathbb {N}\), and \(\textrm{p}(t) \in \mathbb {R}^{N_p}\), \(N_p \in \mathbb {N}\), are the time-dependent vectors of degrees of freedom associated with the velocity field and pressure variable, while \(\textrm{g}(t) \in \mathbb {R}^{N_u}\) and \(\textrm{f}(t) \in \mathbb {R}^{N_p}\) correspond to the right hand sides of the momentum and continuity equation. The velocity mass matrix \(\textrm{M}_u \in \mathbb {R}^{N_u \times N_u}\) is computed using the Simpsons rule and is therefore diagonal. Furthermore, the matrices \(\textrm{B} \in \mathbb {R}^{N_u \times N_p}\) and \(\textrm{B}^{\top } \in \mathbb {R}^{N_p \times N_u}\) are the discrete counterparts of the gradient and divergence operator, respectively, while all other contributions to the momentum equation are summarized in \(\textrm{A}_u(t) \in \mathbb {R}^{N_u \times N_u}\).

To numerically approximate the solution to (3), we eventually apply the Crank-Nicolson scheme for time integration, where the time step size \({\delta t}> 0\) is assumed to be constant for the sake of simplicity. Then the approximations \(\textrm{u}^{(n+1)} \approx \textrm{u} \bigl ( (n+1) {\delta t}\bigr )\) and \(\textrm{p}^{(n+1)} \approx p \bigl ( (n + \tfrac{1}{2}) {\delta t}\bigr )\) solve the linear system of equations (c.f. [29])

$$\begin{aligned} \textrm{A}_i^{(n+1)} \textrm{u}^{(n+1)} + {\delta t}\textrm{B} \textrm{p}^{(n+1)}&= \tfrac{{\delta t}}{2} ( \textrm{g}^{(n+1)} + \textrm{g}^{(n)} ) -\textrm{A}_e^{(n)} \textrm{u}^{(n)}, \end{aligned}$$
(4a)
$$\begin{aligned} \textrm{B}^{\top } \textrm{u}^{(n+1)}&= \textrm{f}^{(n+1)} \end{aligned}$$
(4b)

at each time step \(n = 0, \ldots , K\), where \(K \in \mathbb {N}\) denotes the total number of time steps and the system matrices \(\textrm{A}_i^{(n+1)}\) and \(\textrm{A}_e^{(n)}\) are defined by

$$\begin{aligned} \textrm{A}_i^{(n+1)} = \textrm{M}_u + \tfrac{{\delta t}}{2} \textrm{A}_u^{(n+1)}, \quad \textrm{A}_e^{(n)} = - \textrm{M}_u + \tfrac{{\delta t}}{2} \textrm{A}_u^{(n)}. \end{aligned}$$

Compactly written, the velocity field \(\textrm{u}^{(n+1)}\) and the scaled pressure unknown \(\tilde{\textrm{p}}^{(n+1)} = {\delta t}\textrm{p}^{(n+1)}\) solve the saddle point problem

$$\begin{aligned} \begin{pmatrix} \textrm{A}_i^{(n+1)} &{} \textrm{B} \\ \textrm{B}^{\top } \end{pmatrix} \begin{pmatrix} \textrm{u}^{(n+1)} \\ \tilde{\textrm{p}}^{(n+1)} \end{pmatrix} = \begin{pmatrix} \tilde{\textrm{g}}^{(n+1)} - \textrm{A}_e^{(n)} \textrm{u}^{(n)} \\ \textrm{f}^{(n+1)} \end{pmatrix}, \quad n = 0, \ldots , K \end{aligned}$$
(5)

using the right hand side \(\tilde{\textrm{g}}^{(n+1)} = \frac{{\delta t}}{2} (\textrm{g}^{(n+1)} + \textrm{g}^{(n)} )\). The solution of this system is usually advanced in time step-by-step due to the fact that the solution \((\textrm{u}^{(n+1)}, \tilde{\textrm{p}}^{(n+1)})\) depends on the previous velocity field \(\textrm{u}^{(n)}\). To circumvent this barrier and provide the possibility to parallelize the solution procedure in time, we treat all K time steps simultaneously and block the individual subproblems into a single all-at-once system of equations

$$\begin{aligned} \left( \begin{array}{cccc|cccc} \textrm{A}_i^{(1)} &{} &{} &{} &{} \textrm{B} \\ \textrm{A}_e^{(1)} &{} \textrm{A}_i^{(2)} &{} &{} &{} &{} \textrm{B} \\ &{} \ddots &{} \ddots &{} &{} &{} &{} \ddots \\ &{} &{} \textrm{A}_e^{(K-1)} &{} \textrm{A}_i^{(K)} &{} &{} &{} &{} \textrm{B} \\ \hline \textrm{B}^{\top } &{} &{} &{} \\ &{} \textrm{B}^{\top } &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} \textrm{B}^{\top } \end{array}\right) \begin{pmatrix} \textrm{u}^{(1)} \\ \textrm{u}^{(2)} \\ \vdots \\ \textrm{u}^{(K)} \\ \hline \tilde{\textrm{p}}^{(1)} \\ \tilde{\textrm{p}}^{(2)} \\ \vdots \\ \tilde{\textrm{p}}^{(K)} \end{pmatrix} =\begin{pmatrix} \tilde{\textrm{g}}^{(1)} - \textrm{A}_e^{(0)} \textrm{u}^{(0)} \\ \tilde{\textrm{g}}^{(2)} \\ \vdots \\ \tilde{\textrm{g}}^{(K)} \\ \hline \textrm{f}^{(1)} \\ \textrm{f}^{(2)} \\ \vdots \\ \textrm{f}^{(K)} \end{pmatrix}. \end{aligned}$$
(6)

Obviously, problem (6) is equivalent to (5) and, hence, leads to the same velocity and pressure solution at all time steps. However, efficient and very robust solution techniques have to be applied to reduce the overall computational time compared to a sequential solution procedure. In the following section, a PSC approach is described, whose convergence behavior can significantly be improved by exploiting the augmented Lagrangian methodology.

4 Global-in-Time Pressure Schur Complement Iteration

In this section, we briefly summarize the PSC solution technique for the global-in-time saddle point problem (6) as it is described in [8, 19]. For this purpose, we interpret the linear system of equations as a blocked saddle point problem

$$\begin{aligned} \begin{pmatrix} \textbf{A}_K &{} \textbf{B}_K \\ \textbf{B}_K^{\top } &{} \textbf{0} \end{pmatrix} \begin{pmatrix} \textbf{u} \\ \tilde{\textbf{p}} \end{pmatrix} = \begin{pmatrix} \tilde{\textbf{g}} \\ \textbf{f} \end{pmatrix}, \end{aligned}$$
(7)

where all involved quantities are defined as expected. Multiplying the first equation of (7) by \(\textbf{A}_K^{-1}\) and inserting the resulting expression for the velocity field \(\textbf{u} = \textbf{A}_K^{-1} (\tilde{\textbf{g}} - \textbf{B}_K \tilde{\textbf{p}})\) into the discrete counterpart of the continuity equation, we obtain the PSC equation

$$\begin{aligned} \textbf{B}_K^{\top } \textbf{A}_K^{-1} \textbf{B}_K \tilde{\textbf{p}} = \textbf{B}_K^{\top } \textbf{A}_K^{-1} \tilde{\textbf{g}} - \textbf{f} \end{aligned}$$
(8)

for the unknown global-in-time pressure variable \(\tilde{\textbf{p}}\) only. Unfortunately, the involved PSC matrix \(\textbf{P}_K = \textbf{B}_K^{\top } \textbf{A}_K^{-1} \textbf{B}_K\) is generally a dense matrix and, hence, practically impossible to compute. Therefore, iterative solution techniques are commonly exploited to solve this problem without explicitly knowing the entries of the system matrix \(\textbf{P}_K\). For example, the preconditioned Richardson iteration reads

$$\begin{aligned} \tilde{\textbf{p}} \quad \mapsto \quad \tilde{\textbf{p}} +\textbf{q}, \qquad \textbf{q} = \textbf{C}_K^{-1} \bigl (\textbf{B}_K^{\top } \textbf{A}_K^{-1} (\tilde{\textbf{g}} -\textbf{B}_K \tilde{\textbf{p}}) - \textbf{f} \bigr ) \end{aligned}$$
(9)

for some initial guess \(\tilde{\textbf{p}}\) while \(\textbf{C}_K^{-1}\) is a suitable approximation to the inverse of \(\textbf{P}_K\). In each iteration of the solution procedure, a global linear system of equations has to be solved for computation of the auxiliary velocity field \(\tilde{\textbf{u}} =\textbf{A}_K^{-1} (\tilde{\textbf{g}} - \textbf{B}_K \tilde{\textbf{p}})\). This quantity approximates the exact velocity solution of (7) if the norm of the global-in-time PSC residual \(\textbf{r} = \textbf{B}_K^{\top } \tilde{\textbf{u}} - \textbf{f}\) is sufficiently small. While the efficient solution of the velocity problem is (partially) discussed in Sect. 6, we now define three possible preconditioners \(\textbf{C}_K^{-1}\), where the first two have in common that they are exact for infinitely small time step sizes \({\delta t}\) while all three can be applied very efficiently to the space-time problem under investigation.

PCD preconditioner The first candidate of a Schur complement preconditioner is a global-in-time counterpart of the pressure convection diffusion (PCD) preconditioner [16] and was first introduced by Danieli, Southworth, and Wathen [8]. Compactly written, the preconditioner has the form [19]

$$\begin{aligned} \textbf{C}_K^{-1} =(\textrm{I}_K \otimes \textrm{M}_p^{-1}) \textbf{A}_{K,p} (\textrm{I}_K \otimes \hat{\textrm{D}}_p^{-1}), \end{aligned}$$
(10)

where \(\textbf{A}_{K,p}\) is a suitable approximation of \(\textbf{A}_K\) defined in the pressure FE space, while \(\textrm{M}_p \in \mathbb {R}^{N_p \times N_p}\) and \(\hat{\textrm{D}}_p \in \mathbb {R}^{N_p \times N_p}\) denote the pressure mass matrix and pressure Poisson matrix, respectively. According to the use of a discontinuous pressure approximation, the latter matrix is readily not defined. However, a mixed formulation of the Poisson problem can be used to justify the choice of \(\hat{\textrm{D}}_p = \textrm{B}^{\top } \textrm{M}_u^{-1} \textrm{B}\). Note that this matrix can be explicitly determined due to the fact that \(\textrm{M}_u\) is computed using the Simpsons rule and is therefore diagonal. For further details on the derivation and application of the PCD preconditioner, we refer to [8, 19].

LSC preconditioner An all-at-once extension of the least-squares commutator (LSC) preconditioner [22] is defined by [19]

$$\begin{aligned} \textbf{C}_K^{-1} = \bigl ( \textrm{I}_K \otimes ( \hat{\textrm{D}}_p^{-1} \textrm{B}^{\top } \textrm{M}_u^{-1} ) \bigr ) \textbf{A}_K \bigl ( \textrm{I}_K \otimes ( \textrm{M}_u^{-1} \textrm{B} \hat{\textrm{D}}_p^{-1} ) \bigr ). \end{aligned}$$
(11)

This preconditioner explicitly exploits the definition of the system matrix \(\textbf{A}_K\) and, hence, can be applied to more general Oseen equations where different components of the velocity field possibly interact with each other. This fact is especially of interest when the deformation tensor is used in the definition of the momentum equation or Newton’s method is applied to linearize the incompressible Navier–Stokes equations. In case of the PCD preconditioner, a practically well-established workaround is to simply neglect these terms.

Uzawa preconditioner For a constant viscosity parameter \(\mu \) and sufficiently large time increments, it is easy to show that the inverse of the PSC matrix \(\textbf{P}_K\) is spectrally equivalent to

$$\begin{aligned} \textbf{C}_K^{-1} = \tfrac{{\delta t}}{2} \mu \begin{pmatrix} \textrm{M}_p^{-1} \\ \textrm{M}_p^{-1} &{} \textrm{M}_p^{-1} \\ &{} \ddots &{} \ddots \\ &{} &{} \textrm{M}_p^{-1} &{} \textrm{M}_p^{-1} \end{pmatrix} = \tfrac{{\delta t}}{2} \mu \begin{pmatrix} 1 \\ 1 &{} 1 \\ &{} \ddots &{} \ddots \\ &{} &{} 1 &{} 1 \end{pmatrix} \otimes \textrm{M}_p^{-1}. \end{aligned}$$

However, for small up to moderate time step sizes, this approximation is quite bad and, hence, will be considered only briefly in the below numerical examples.

At this point, we emphasize that all preconditioners can be applied very efficiently on modern hardware architectures due to the fact that all involved global linear systems of equations can be decomposed into K independent subproblems. The spatial system matrices to be inverted therein are the same over the whole time interval, so each global problem is equivalent to a single space-only problem with K right hand sides. This allows the use of very efficient solution strategies based on vector operations with reduced data communication.

5 Augmented Lagrangian Methodology

In the previous section, we summarized the idea of a global-in-time pressure Schur complement iteration, which solves the discretized Oseen equations for all time steps simultaneously. The involved preconditioners for the pressure-only problem are designed as straightforward extensions of their sequential counterparts and provide a high degree of parallelism. However, we will observe in Sect. 7 that the convergence behavior of this basic Schur complement iteration is quite slow and even deteriorates as the spatial resolution increases or more and more time steps are blocked. To improve the solution methodology, the authors proposed in [19] the incorporation of a coarse grid correction in the framework of a hierarchical multigrid approach. While this technique relaxes the dependency of the rate of convergence on the above mentioned discretization parameters, the efficient solution of the resulting coarse grid problem is still an unsolved problem.

In this work, we extend the PSC iteration by the augmented Lagrangian methodology. The approach was originally developed independently by Hestenes [14] and Powell [21] and aims to improve the convergence behavior without the need of a coarse grid correction by modifying the discrete momentum equation in a strongly consistent manner. This guarantees that the solution of the discrete problem remains the same, while (theoretically) arbitrarily accurate PSC preconditioners can be constructed. The success of this technique is due to the Sherman-Morrison-Woodbury identity [13]

$$\begin{aligned} (\textrm{A} + \gamma \textrm{UCV})^{-1} = \textrm{A}^{-1} - \textrm{A}^{-1} \textrm{U} (\gamma ^{-1} \textrm{C}^{-1} + \textrm{VA}^{-1} \textrm{U})^{-1} \textrm{VA}^{-1} \end{aligned}$$
(12)

holding true for any invertible matrices A, C, and \(\textrm{C}^{-1} + \textrm{VA}^{-1} \textrm{U}\) while \(\gamma \) is some positive parameter. Roughly speaking, equation (12) describes the influence of a (possibly singular) perturbation on the inverse of a non-singular matrix in an additive fashion. If \(\textrm{A}\) is chosen as a spatial system matrix \(\textrm{A}_i\) (we drop the superscript \((n+1)\) for the time being) and the identity is multiplied from left and right by \(\textrm{B}^{\top }\) and \(\textrm{B}\), respectively, the first expression on the right hand side of (12) coincides with the PSC matrix of (5) while the left hand side is the PSC matrix using a modified system matrix \(\textrm{A}_{i,\gamma } = \textrm{A}_i + \gamma \textrm{UCV}\). Therefore, the PSC matrix \(\textrm{P}_{i,\gamma } = \textrm{B}^{\top } \textrm{A}_{i,\gamma }^{-1} \textrm{B}\) is the sum of \(\textrm{P}_i = \textrm{B}^{\top } \textrm{A}_i^{-1} \textrm{B}\) and some additive term. The crucial question is now how to define the matrices \(\textrm{U}\), \(\textrm{V}\), and \(\textrm{C}\) so that the inverse of \(\textrm{P}_{i,\gamma }\) can be approximated more accurately than the inverse of \(\textrm{P}_i\) while at the same time the action of the approximation can be applied very efficiently. One possibility is the choice of \(\textrm{U} = \textrm{B}\), \(\textrm{V} = \textrm{B}^{\top }\), and \(\textrm{C} = {\delta t}\textrm{M}_p^{-1}\), which does not even extend the sparsity pattern of the system matrix \(\textrm{A}_i\) due to the special choice of a discontinuous pressure FE space. For this definition, simple algebraic manipulations and another application of the Sherman-Morrison-Woodbury identity (12) lead to the ingenious identity (cf. [30, Lemma 5.2])

$$\begin{aligned} \textrm{P}_{i,\gamma }^{-1} = \textrm{P}_i^{-1} + \gamma {\delta t}\textrm{M}_p^{-1}. \end{aligned}$$

Thus, the inverse of \(\textrm{P}_{i,\gamma }\) converges to \(\gamma {\delta t}\textrm{M}_p^{-1}\) as \(\gamma \) increases. This additive term can be applied exactly and does not need to be approximated in any way due to the fact that \(\textrm{M}_p\) is block-diagonal. This guarantees that very accurate approximations \(\textrm{C}_{i,\gamma }^{-1} = \textrm{C}_i^{-1} + \gamma {\delta t}\textrm{M}_p^{-1}\) of \(\textrm{P}_{i,\gamma }^{-1}\) can be constructed by adapting the value of \(\gamma \) even if the approximation \(\textrm{C}_i^{-1}\) of \(\textrm{P}_i^{-1}\) is quite inexact. However, adding \(\gamma {\delta t}\textrm{B} \textrm{M}_p^{-1} \textrm{B}^{\top }\) to the system matrix \(\textrm{A}_i\) generally modifies the solution of the saddle point problem, which is especially not acceptable for large values of \(\gamma \). This downside can be avoided by simultaneously adapting the right hand side of the momentum equation in a strongly consistent way. More precisely, we replace (5) by

$$\begin{aligned} \begin{pmatrix} \textrm{A}_{i,\gamma }^{(n+1)} &{} \textrm{B} \\ \textrm{B}^{\top } \end{pmatrix} \begin{pmatrix} \textrm{u}^{(n + 1)} \\ \tilde{\textrm{p}}^{(n+1)} \end{pmatrix} = \begin{pmatrix} \tilde{\textrm{g}}_\gamma ^{(n+1)} - \textrm{A}_e^{(n)} \textrm{u}^{(n)} \\ \textrm{f}^{(n+1)} \end{pmatrix} \end{aligned}$$
(13)

using the velocity system matrix \(\textrm{A}_{i,\gamma }^{(n+1)} = \textrm{A}_i^{(n+1)} + \gamma {\delta t}\textrm{B} \textrm{M}_p^{-1} \textrm{B}^{\top }\) and the modified right hand side \(\tilde{\textrm{g}}_\gamma ^{(n)} = \tilde{\textrm{g}}^{(n)} + \gamma {\delta t}\textrm{B} \textrm{M}_p^{-1} \textrm{f}^{(n)}\). Then the first equation of (13) reads

$$\begin{aligned} \textrm{A}_i^{(n+1)} \textrm{u}^{(n+1)} + \textrm{B} \tilde{\textrm{p}}^{(n+1)} = \tilde{\textrm{g}}^{(n+1)} - \textrm{A}_e^{(n)} \textrm{u}^{(n)} + \gamma {\delta t}\textrm{B} \textrm{M}_p^{-1} \underbrace{(\textrm{f}^{(n+1)} - \textrm{B}^{\top } \textrm{u}^{(n+1)})}_{=0}, \end{aligned}$$

where the last expression vanishes due to the fact that \(\textrm{B}^{\top } \textrm{u}^{(n+1)} = \textrm{f}^{(n+1)}\) is satisfied by the second equation. Therefore, the saddle point problem (13) is equivalent to (5) while the inverse of its PSC matrix can be approximated very efficiently for large values of \(\gamma \).

The augmented Lagrangian methodology described above can readily be extended to the global-in-time saddle point problem (6). In this case, only the block-diagonal of \(\textbf{A}_K\) is modified and results in the all-at-once linear system of equations

$$\begin{aligned}&\begin{pmatrix} \textbf{A}_K + \gamma {\delta t}\textbf{B}_K \textbf{M}_{K,p}^{-1} \textbf{B}_K^{\top } &{} \textbf{B}_K \\ \textbf{B}_K^{\top } &{} \textbf{0} \end{pmatrix} \begin{pmatrix} \textbf{u} \\ \tilde{\textbf{p}} \end{pmatrix} = \begin{pmatrix} \tilde{\textbf{g}} + \gamma {\delta t}\textbf{B}_K \textbf{M}_{K,p}^{-1} \textbf{f} \\ \textbf{f} \end{pmatrix} \\&\quad : \iff \left( \begin{array}{cccc|cccc} \textrm{A}_{i, \gamma }^{(1)} &{} &{} &{} &{} \textrm{B} \\ \textrm{A}_e^{(1)} &{} \textrm{A}_{i,\gamma }^{(2)} &{} &{} &{} &{} \textrm{B} \\ &{} \ddots &{} \ddots &{} &{} &{} &{} \ddots \\ &{} &{} \textrm{A}_e^{(K-1)} &{} \textrm{A}_{i,\gamma }^{(K)} &{} &{} &{} &{} \textrm{B} \\ \hline \textrm{B}^{\top } &{} &{} &{} \\ &{} \textrm{B}^{\top } &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} \textrm{B}^{\top } \end{array}\right) \begin{pmatrix} \textrm{u}^{(1)} \\ \textrm{u}^{(2)} \\ \vdots \\ \textrm{u}^{(K)} \\ \hline \tilde{\textrm{p}}^{(1)} \\ \tilde{\textrm{p}}^{(2)} \\ \vdots \\ \tilde{\textrm{p}}^{(K)} \end{pmatrix} = \begin{pmatrix} \tilde{\textrm{g}}_\gamma ^{(1)} - \textrm{A}_e^{(0)} \textrm{u}^{(0)} \\ \tilde{\textrm{g}}_\gamma ^{(2)} \\ \vdots \\ \tilde{\textrm{g}}_\gamma ^{(K)} \\ \hline \textrm{f}^{(1)} \\ \textrm{f}^{(2)} \\ \vdots \\ \textrm{f}^{(K)} \end{pmatrix}, \end{aligned}$$

where \(\textbf{M}_{K,p} = \textrm{I}_K \otimes \textrm{M}_p\) denotes the global-in-time pressure mass matrix. Using the same argumentation as above, we conclude that the inverse of the all-at-once PSC matrix can be approximated by

$$\begin{aligned} \textbf{P}_{K,\gamma }^{-1} = \textbf{P}_K^{-1} + \gamma {\delta t}\textbf{M}_{K,p}^{-1} \approx \textbf{C}_K^{-1} + \gamma {\delta t}\textbf{M}_{K,p}^{-1} =: \textbf{C}_{K,\gamma }^{-1}, \end{aligned}$$

where \(\textbf{C}_K^{-1}\) is an approximation to \(\textbf{P}_K^{-1}\) and might be chosen as discussed in Sect. 4.

6 Global-in-Time Velocity Solver

While the augmented Lagrangian approach is a promising stabilization technique to improve the convergence behavior of the Schur complement iteration, the involved subproblems for computing auxiliary velocity fields become arbitrarily ill conditioned (cf. [25]). This is caused by a singular matrix added to the original velocity system matrix, which becomes more dominant as the AL parameter increases. Therefore, tailor-made solution techniques have to be constructed, which handle these instabilities in a robust and efficient manner. In this section, we present a candidate for solution of the all-at-once velocity problems, which might be stabilized using the augmented Lagrangian technique. The described approach combines the multigrid waveform relaxation method developed by [18] with a highly specialized multigrid solver proposed in [25], which by itself is inspired by [2, 24].

In each step of the PSC iteration (9), a global-in-time linear system of equations for the auxiliary velocity field \(\tilde{\textbf{u}}\) must be solved, which has the general form

$$\begin{aligned} \textbf{A}_{K,\gamma } \textbf{u} = \textbf{h} \quad : \iff \quad \begin{pmatrix} \textrm{A}_{i,\gamma }^{(1)} \\ \textrm{A}_e^{(1)} &{} \textrm{A}_{i,\gamma }^{(2)} \\ &{} \ddots &{} \ddots \\ &{} &{} \textrm{A}_e^{(K-1)} &{} \textrm{A}_{i,\gamma }^{(K)} \end{pmatrix} \begin{pmatrix} \textrm{u}^{(1)} \\ \textrm{u}^{(2)} \\ \vdots \\ \textrm{u}^{(K)} \end{pmatrix} = \begin{pmatrix} \textrm{h}^{(1)} \\ \textrm{h}^{(2)} \\ \vdots \\ \textrm{h}^{(K)} \\ \end{pmatrix}. \end{aligned}$$
(14)

Problem (14) can be interpreted as a space-only problem for vector-valued unknowns, where each spatial degree of freedom consists of the velocity solutions associated with the corresponding spatial basis function at all time steps. Using this notation, the multigrid waveform relaxation method is nothing else than a space-only multigrid technique applied to the space-only problem so that each time step is treated in the same manner. If the multigrid technique exploits a Jacobi smoother, the whole algorithm is also called time-simultaneous multigrid technique [9] and the linear system of equations involved in the relaxation step decomposes into independent subproblems for each spatial degree of freedom. Although this solution algorithm performs very well for diffusion-dominated test cases, we will see that the convergence behavior drastically deteriorates as the AL parameter \(\gamma \) increases. Therefore, we combine this global-in-time solution technique with the highly specialized multigrid algorithm proposed in [25], which is tailor-made for space-only problems stabilized by the augmented Lagrangian methodology. More precisely, we replace the intergrid transfer operators as well as the block Jacobi smoother by customized versions, which are based on solutions to many independent subproblems on local patches. For its precise definition, we first summarize the main components of the solution process as proposed in [25] for a space-only linear system of equations

$$\begin{aligned} \textrm{A}_{i,\gamma } \textrm{u} = (\textrm{A}_i + \gamma {\delta t}\textrm{B} \textrm{M}_p^{-1} \textrm{B}^{\top }) \textrm{u} = \textrm{h} \end{aligned}$$
(15)

using some (spatial) system matrix \(\textrm{A}_{i,\gamma } \in \mathbb {R}^{N_u \times N_u}\), the unknown solution \(\textrm{u} \in \mathbb {R}^{N_u}\), and the right hand side vector \(\textrm{h} \in \mathbb {R}^{N_u}\).

Space-only Smoother The relaxation method presented in [25] is based on an additive block preconditioner, where local subproblems for each spatial node of the triangulation have to be solved. More precisely, the subproblems refer to the parts of the linear system of equations \(\textrm{A}_{i,\gamma } \textrm{u} = \textrm{h}\) whose degrees of freedom are associated with points in the interior of nodal patches. For their definition, we introduce the subset \(\varpi _j \subseteq \Omega \) as the composition of all elements \(K^e\) containing mesh node \({\textbf{x}}_j\), i.e.,

$$\begin{aligned} \varpi _j = \bigcup _{e, \; {\textbf{x}}_j \in K^e} K^e, \quad j \in \{1, \ldots , N_n\}, \end{aligned}$$

where \(N_n \in \mathbb {N}\) is the total number of mesh nodes. Then the index set \({\mathcal {N}}_j\) is defined as

$$\begin{aligned} {\mathcal {N}}_j = \bigl \{ k \in \{1, \ldots , N_u\} : \tilde{{\textbf{x}}}_k \in \mathring{\varpi }_j \bigr \}, \quad j \in \{1, \ldots , N_n\}, \end{aligned}$$

where \(\tilde{{\textbf{x}}}_1, \ldots , \tilde{{\textbf{x}}}_{N_u}\) are the locations associated with the biquadratic basis functions \(\varphi _1, \ldots , \varphi _{N_u}\) of the velocity FE space, i.e., \(\varphi _j(\tilde{{\textbf{x}}}_k) = \delta _{jk}\), and \(\mathring{\varpi }_j\) is the interior of \(\varpi _j\), i.e., \(\mathring{\varpi }_j = \varpi _j {\setminus } \partial \varpi _j\). Now the additive block preconditioner \(\textrm{D}_i^{-1} \approx \textrm{A}_{i,\gamma }^{-1}\) solves local subproblems for the degrees of freedom in \({\mathcal {N}}_j\) and sums up the local solutions for all spatial nodes of the mesh, i.e.,

$$\begin{aligned} \textrm{D}_i^{-1} = \sum _{j = 1}^{N_n} {\mathcal {I}}_j ({\mathcal {I}}_j^{\top } \textrm{A}_{i,\gamma } {\mathcal {I}}_j)^{-1} {\mathcal {I}}_j^{\top }. \end{aligned}$$
(16)

Here, the injection operator \({\mathcal {I}}_j \in \mathbb {R}^{N_u \times |{\mathcal {N}}_j|}\) maps a vector of unknowns associated with the local patch \(\varpi _j\) to a vector of global degrees of freedoms and introduces zero entries in components that are not included in \({\mathcal {N}}_j\). Note that all degrees of freedoms associated with cells or edges of the mesh are treated several times and the updates are not averaged in the gathering progress.

Space-only intergrid transfer For definition of the specialized intergrid transfer operators, let \(\textrm{P}\) be the prolongation operator naturally induced by the \(Q_2\) FE space. This quantity should not be mixed up with the PSC matrices \(\textrm{P}_{i,\gamma }\) and \(\textrm{P}_i\) introduced in previous sections. Then the robust intergrid transfer operators are defined by

$$\begin{aligned} \tilde{\text {P}} = \Bigl [ \text {I} - \Bigl ( \sum _{j = 1}^{\bar{N}_e} {\mathcal {I}}_j ({\mathcal {I}}_j^{\top } \text {A}_{i,\gamma } {\mathcal {I}}_j)^{-1} {\mathcal {I}}_j^{\top } \Bigr ) (\gamma {\delta t}{\text {B}} {\text {M}}_p^{-1} {\text {B}}^{\top }) \Bigr ] {\text {P}}, \quad \tilde{\text {R}} = \tilde{\text {P}}^{\top } \end{aligned}$$
(17)

where \({{\textbf {x}}}_1, \ldots , {{\textbf {x}}}_{\bar{N}_e}\) are assumed to be the nodes of the fine mesh which are located in the interior of the elements of the coarse grid. This definition is consistent in the way that \(\textrm{P} = \tilde{\textrm{P}}\) holds true for \(\gamma = 0\). Furthermore, the local subproblems occurring in the definition of \(\tilde{\textrm{P}}\) are also involved in the definition of \(\textrm{D}_i^{-1}\) and, hence, may only need to be computed once.

We now extend the relaxation technique as well as the intergrid transfer operators to the global-in-time problem (14), so they can be exploited in the context of multigrid waveform relaxation.

Global-in-time smoother The additive block preconditioner (16) is extended to the all-at-once problem by solving subproblems on local patches for all time steps simultaneously without synchronizing the solutions in between. This guarantees that all subproblems can be solved independently of each other and preserves the methodology of the time-simultaneous (multigrid) approach providing a high degree of parallelization. In detail, the additive block preconditioner reads

$$\begin{aligned} \textbf{D}_K^{-1} = \sum _{j = 1}^{N_n} (\textrm{I}_K \otimes {\mathcal {I}}_j) \bigl ( (\textrm{I}_K \otimes {\mathcal {I}}_j) \textbf{A}_K (\textrm{I}_K \otimes {\mathcal {I}}_j^{\top }) \bigr )^{-1} (\textrm{I}_K \otimes {\mathcal {I}}_j^{\top }). \end{aligned}$$

In the numerical examples presented below, this preconditioner is embedded into the GMRES method without restart to increase the robustness of the solution strategy.

Global-in-time intergrid transfer The robust space-time prolongation and restriction operators are given by

$$\begin{aligned} \tilde{\textbf{P}}_K = \begin{pmatrix} \tilde{\textrm{P}}^{(1)} \\ &{} \tilde{\textrm{P}}^{(2)} \\ &{}&{} \ddots \\ &{}&{}&{} \tilde{\textrm{P}}^{(K)} \end{pmatrix}, \quad \tilde{\textbf{R}}_K = \tilde{\textbf{P}}_K^{\top } \end{aligned}$$

exploiting the sequential versions of the robust intergrid transfer operators as defined in (17). In contrast to the relaxation scheme, these operators only use the block diagonal entries of \(\textbf{A}_K\), but must be assembled for each time step individually. This makes the customized grid transfer more expensive than the one naturally induced by the FE space.

7 Numerical Examples

In this section, we solve several numerical test problems using the augmented Lagrangian technique to accelerate the convergence behavior of the pressure Schur complement iteration as described in Sect. 4. This iterative solver is embedded into a GMRES method, which is restarted after four inner iterations. To quantify the efficiency of the solver at hand, the total number of inner iterations is counted. This value (approximately) measures how often the implicitly defined system matrix and the involved Schur complement preconditioner are applied till a certain tolerance in accuracy is reached for a vanishing initial guess.

Fig. 1
figure 1

Computational domain including partition of boundary and triangulation on coarsest mesh level 0

The numerical examples are computed on the ‘flow around a cylinder’ domain as illustrated in Fig. 1, where all external forces are set to zero. Furthermore, homogeneous Neumann (‘do nothing’) boundary conditions are prescribed on \(\Gamma _\textrm{out}= \partial \Omega {\setminus } (\Gamma _\textrm{in}\cup \Gamma _\textrm{wall})\) while no slip boundary conditions are enforced on \(\Gamma _\textrm{wall}\). Therefore, the dynamics of the fluid are solely governed by the inflow boundary data

$$\begin{aligned} {\textbf{v}}_\textrm{in}= U(t) \frac{4 y (0.41 - y)}{0.41^2} \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad U(t) = U_0 \bigl | \sin \left( \tfrac{\pi }{8} t\right) \bigr |, \quad U_0 = 0.3 \quad \text {on}\ \Gamma _\textrm{in}\end{aligned}$$

and the initial data \({\textbf{v}}_0 = \textbf{0}\) of the velocity field \({\textbf{v}}\) (cf. [19]).

As mentioned above, the problem is discretized in space using the \(Q_2\)-\(P_1\) FE pair, while the time integrator is given by the Crank-Nicolson scheme using a fixed time step size of \({\delta t}= \frac{1}{25} \cdot 2^{-{\textrm{lvl}}}\), where \({\textrm{lvl}}\in \{0, \ldots , 5\}\) denotes the number of uniform refinements of the triangulation at hand (cf. Fig. 1). Note that the convergence behavior of the iterative solution strategy using the PCD or LSC preconditioner can be significantly improved by choosing smaller time increments because these preconditioners then become more accurate. However, this property of the solver will not be investigated in this work and we refer to [19] for numerical results in this regard.

7.1 Stokes Problem

The first test case is given by the Stokes problem, where no convective contribution is involved in the definition of the momentum equation and the viscosity parameter is fixed to \(\mu = 10^{-2}\). In this case, we first discuss the influence of the AL parameter on the norm of the PSC residual (for a fixed solution) to define a reasonable stopping criterion for the iterative solution procedure. Figure 2 illustrates the convergence behavior of the GMRES method for the PCD preconditioner and different values of \(\gamma \), where the preconditioned PSC residual is evaluated using another choice of the AL parameter, denoted by \(\tilde{\gamma }\). While a rapid convergence can be observed for all considered variants of the PSC residual if \(\gamma \) is chosen sufficiently large, the relative norm of the preconditioned PSC residual depends only marginally on \(\tilde{\gamma }\). Thus, this quantity seems to be an adequate measure for the quality of the solution and reducing it by a certain value is a reasonable stopping criterion. Fortunately, the preconditioned PSC residual is already determined in GMRES implementations frequently used in the literature so that no additional computations are required. In all linear test cases considered in this work, the GMRES method therefore terminates if

$$\begin{aligned} \Bigl \Vert \textbf{C}_K^{-1} \bigl ( \textbf{B}_K^{\top } \textbf{A}_K^{-1} (\tilde{\textbf{g}} - \textbf{B}_K \tilde{\textbf{p}}) - \textbf{f} \bigr ) \Bigr \Vert _2 < \textrm{tol}_{\textrm{rel}} \Bigl \Vert \textbf{C}_K^{-1} \bigl ( \textbf{B}_K^{\top } \textbf{A}_K^{-1} \tilde{\textbf{g}} - \textbf{f} \bigr ) \Bigr \Vert _2 \end{aligned}$$
(18)

is satisfied for a tolerance of \(\textrm{tol}_{\textrm{rel}} = 10^{-11}\).

Fig. 2
figure 2

Stokes problem: History of norm of preconditioned PSC residual evaluated using different AL parameters \(\tilde{\gamma }\) for PCD preconditioner on mesh level 2 for 800 blocked time steps

In Table 1, the total number of (inner) iterations required to reach this stopping criterion is summarized for different PSC preconditioners. As expected, this value decreases for all solution strategies under investigation as \(\gamma \) increases. However, the GMRES method exploiting the Uzawa preconditioner is not competitive with the other algorithms for moderate choices of the AL parameter \(\gamma \) because this preconditioner approximates the inverse of the PSC matrix \(\textbf{P}_K\) less accurately. Although only (block-diagonal) pressure mass matrices have to be inverted for its application, each GMRES iteration is still costly due to the fact that an auxiliary velocity field must be computed whenever the PSC residual is determined. This makes the Uzawa preconditioner less interesting in practical applications and explains why it is not considered further in the following investigations.

Table 1 Stokes problem: total number of iterations on mesh level 2 for 800 blocked time steps and different PSC preconditioners

Next, we vary the total number of blocked time steps K and discuss its influence on the convergence behavior for the Stokes problem (cf. Fig. 3). While the GMRES method using the PCD preconditioner requires more and more iterations as the length of the time interval increases, the rate of convergence for the LSC preconditioner seems to be bounded above independently of this value even for the unstabilized Schur complement equation. However, introduction of AL stabilization significantly reduces the total number of iterations and especially relaxes the dependency on K for the PCD preconditioner.

Fig. 3
figure 3

Stokes problem: total number of iterations on mesh level 0 for different numbers of blocked time steps

The same behavior can be observed for the solution technique using the LSC preconditioner as the spatial resolution increases, as shown in Table 2. While the convergence behavior deteriorates on finer meshes, the influence becomes less significant for larger values of \(\gamma \). In contrast to this, the algorithm exploiting the PCD preconditioner even improves as the spatial resolution increases. However, a rapid convergence can be observed for both preconditioners only for a great amount of AL stabilization.

Table 2 Stokes problem: total number of iterations on different mesh levels for 800 blocked time steps

7.2 Oseen Problem for Newtonian Fluid

After investigating the Stokes equations, we now add convectivity to the definition of the momentum equation and study the resulting Oseen problem for two different viscosity parameters, namely \(\mu = 10^{-2}\) and \(\mu = 10^{-3}\). In these cases, the velocity field involved in the convective contribution is set to the solution of the incompressible Navier–Stokes equations (1), so the linear problem at hand coincides with the final Picard iteration to solve the underlying nonlinear problem. Although no vortex shedding is to be expected behind the cylinder for both viscosity parameters as the Reynolds number does not exceed \(\textrm{Re}= 2\) or \(\textrm{Re}= 20\) (cf. [23]), the convective contribution is more dominant for \(\mu = 10^{-3}\) and significantly affects the convergence behavior of the GMRES method (cf. Table 3). Without any AL stabilization, the total number of inner iterations using the PCD preconditioner increases as more and more time steps are blocked. This dependency becomes more dominant as the viscosity parameter goes to zero and possibly even leads to a non-converging solution procedure. For the LSC preconditioner, this effect is reversed and the solver tends to improve for smaller viscosity parameters. However, more and more iterations are required as the spatial resolution increases, which perfectly fits the observations for the Stokes equations.

These dependencies become less significant as AL stabilization is introduced in the solution procedure and the total number of inner GMRES iterations goes to zero as \(\gamma \) increases for all configurations under investigation. In case of the largest value of the AL parameter \(\gamma = 10^3\), only \(3-6\) iterations are required to solve the linear system of equations. In this case, the use of the PCD preconditioner seems to be preferable due to the fact that its application is two times less expensive than the application of the LSC preconditioner (cf. [19]).

Table 3 Oseen problem for Newtonian fluid: total number of iterations

7.3 Oseen Problem for Carreau–Yasuda Viscosity Model

In the final linear test problem, the Newtonian fluid rheology exploiting a constant viscosity parameter \(\mu \) is replaced by a non-Newtonian one as described by the Carreau–Yasuda model [31]

$$\begin{aligned} \mu (\dot{\gamma }) = \mu _\infty + (\mu _0 - \mu _\infty ) \bigl (1 + (\lambda \dot{\gamma })^a \bigr )^{(n-1)/a}. \end{aligned}$$
(19)

This nonlinear definition of the viscosity parameter simplifies to the well-known Carreau model [3] for \(a = 2\) and simulates shear thinning effects for \(n < 1\). In the following numerical studies, the other material parameters are chosen as \(\mu _\infty = 0\), \(\lambda = 1\), and \(n = 0.31\), while the maximum viscosity parameter \(\mu _0\) is set to \(10^{-2}\) and \(10^2\) leading to an effective viscosity parameter \(\mu (\dot{\gamma }) \in [0.0004, 0.01]\) and \(\mu (\dot{\gamma }) \in [5.8126, 100]\), respectively, as it might occur in the flow behavior of thermoplastics. Therefore, the ratio between the maximum and minimum viscosity parameters is approximately the same and only the type of nonlinearity changes by modifying \(\mu _0\). Another consequence of the varying viscosity parameter is the fact that the viscous part of the momentum equation does not simplify to \(\mu \Delta {\textbf{v}}\). Therefore, different components of the velocity field are coupled with each other even in case of the Picard iteration and only the LSC preconditioner is readily applicable to this fluid rheology.

In Table 4, the total number of inner GMRES iterations is summarized for this preconditioner and different amounts of AL stabilization. As already observed in the previous test problems, the total number of iterations required to reach the stopping criterion grows on finer meshes, while the length of the time horizon has only a minor influence on the rate of convergence. As \(\mu _0\) increases, solution of the Oseen problem becomes more expensive on mesh level 4. For both values of \(\mu _0\), however, the total number of iterations can be drastically reduced by introducing AL stabilization and goes down to \(4-6\) for the considered values of \(\gamma \). Note that the range of AL parameters is shifted by four orders of magnitude for \(\mu _0 = 10^2\) compared to \(\mu _0 = 10^{-2}\) to achieve a similar effect in the convergence behavior. Therefore, the amount of required AL stabilization to significantly reduce the number of GMRES iterations is mainly coupled to the viscous term of the momentum equation for this test case.

Table 4 Oseen problem for Carreau–Yasuda viscosity model: total number of iterations for LSC preconditioner

7.4 Nonlinear Carreau–Yasuda Viscosity Model

So far, only linear test problems were studied as they occur in the final iteration of a Picard iteration for solution of the incompressible Navier–Stokes equations. However, this nonlinear solver converges only slowly to the solution, especially for the considered space-time problems (cf. [19]), so more efficient linearization techniques are required to outperform sequential solution strategies. In case of Newton’s method, additional terms must be considered in the definition of the momentum equation resulting in a stronger coupling between different components of the velocity field. However, the above mentioned linear Schur complement iteration as well as the augmented Lagrangian methodology are still applicable in this case and result in an efficient solution strategy as we will see in this section.

To analyse the nonlinear convergence for (an inexact) Newton’s method, we again solve the incompressible Navier–Stokes equations for the Carreau–Yasuda viscosity model and the maximum viscosity coefficient \(\mu _0 = 10^2\). Firstly, all involved linear subproblems are solved exactly to illustrate the performance of Newton’s method and measure the minimum number of nonlinear iterations required for this test problem. In Fig. 4, the nonlinear convergence behavior is illustrated for different values of the AL parameter by plotting the norm of the (unpreconditioned) PSC residual. Due to the fact that the AL stabilization does not modify the solution of the Oseen equations, the nonlinear convergence behavior is exactly the same for all AL parameters under investigation. However, the norm of the residual is shifted for different values of \(\gamma \) and may lead to an inaccurate solution caused by ill-conditioned problems if \(\gamma \) is chosen too large. Therefore, the AL parameter cannot practically be chosen too large without compromising the accuracy of the solution.

Fig. 4
figure 4

Carreau–Yasuda problem for \(\mu _0 = 10^2\): History of norm of PSC residual for Newton’s method on mesh level 2 using 800 blocked time steps

The exact Oseen solver is now replaced by the above mentioned GMRES method based on the Schur complement equation and the augmented Lagrangian approach, while the (inexact) Newton method is stopped if the relative norm of the PSC residual is smaller than \(10^{-10}\). In contrast to the above investigations for linear test problems, different relative tolerances \(\textrm{tol}_{\textrm{rel}}\) are used in the stopping criterion (18) to determine the most efficient overall solution strategy.

In Table 5, the number of nonlinear Newton steps as well as the total number of inner GMRES iterations are summarized for different values of \(\gamma \) and \(\textrm{tol}_{\textrm{rel}}\). For an overresolved computation of the solution to the Oseen problem, the number of nonlinear iterations coincides, as expected, with the total number of (exact) Newton steps for this problem. However, these configurations without AL stabilization require an enormous number of linear GMRES iterations and, hence, are practically very inefficient. By relaxing the tolerance for the linear subproblems, more nonlinear steps may be required, but the total number of GMRES iterations reduces. Therefore, the overall solution strategy becomes more efficient. This procedure can be further accelerated by introducing AL stabilization. In this case, all linear subproblems are solved with sufficient accuracy even with a single inner GMRES iteration for \(\gamma \geqslant 10^6\) and \(\textrm{tol}_{\textrm{rel}} \leqslant 10^{-2}\). Under the assumption that the cost for each GMRES iteration is independent of the choice of the AL parameter, this is the most efficient solver considered in this work.

Table 5 Carreau–Yasuda problem for \(\mu _0 = 10^2\): total number of nonlinear (left column) and linear (right column) iterations on mesh level 2 for 800 blocked time steps and different linear stopping criteria

7.5 Velocity Solver for Carreau–Yasuda Viscosity Model

As mentioned above, the linear solution strategy proposed in this work can only perform well if the auxiliary velocity field involved in the definition of the PSC residual can be computed efficiently. In Sect. 6, a highly specialized multigrid solver is presented, which solves the velocity problems using customized techniques for relaxation and intergrid transfer. This solver is now applied to the Carreau–Yasuda test problem as occurring in the final step of Newton’s method. This means that the exact solution vanishes due to homogeneous boundary data and a zero vector for the right hand side. For validation purposes, the initial guess is given by the exact velocity field of the incompressible Navier–Stokes equations and the solution is accepted if 11 digits in the Euclidean norm of the residual are gained within a maximum number of 500 iterations.

In Table 6, the number of multigrid iterations is summarized for different numbers of pre- and post-smoothing steps \(\nu _1\) and \(\nu _2\), respectively. Here, the solution is considered on mesh level 2 for \(K = 200\) blocked time steps. While the specialized solver diverges for a large amount of AL stabilization and few smoothing steps, the convergence behavior significantly improves for more relaxation steps. Especially for \(\nu _1, \nu _2 \geqslant 16\), the numerical effort for solving the space-time problem depends only marginally on the AL parameter \(\gamma \) and solution of the stabilized velocity problem is at most two times as expensive as in case of \(\gamma = 0\).

Table 6 Velocity problem for Carreau–Yasuda viscosity model and \(\mu _0 = 10^2\): total number of specialized multigrid iterations on mesh level 2 for 200 blocked time steps

This observation stays valid on finer meshes and for longer time intervals, as summarized in Table 7. While no more than 10 multigrid iterations are required to solve the problem at hand if the highly specialized relaxation and intergrid transfer are used, the rate of convergence for a commonly used multigrid waveform relaxation algorithm using the canonical prolongation and restriction operators as well as the block Jacobi relaxation (cf. [9]) drastically deteriorates for \(\gamma \geqslant 10^0\). Thus, the latter approach fails in the context of the Oseen equations with AL stabilization, where \(\gamma \geqslant 10^6\) seems to be a reasonable choice for efficiently reducing the overall numerical complexity (cf. Table 5).

Table 7 Velocity problem for Carreau–Yasuda viscosity model and \(\mu _0 = 10^2\): total number of iterations for different multigrid solvers using 16 pre- and post-smoothing steps

8 Outlook

The numerical investigations presented in Sect. 7 focused on test cases for small up to moderate Reynolds numbers. In these cases, the dynamics of the fluid are mainly prescribed by the viscous part of the momentum equation and the initial condition only slightly influences the final solution. For this kind of problems, many parallel-in-time algorithms perform well and significant speedups compared to sequential solution strategies can be observed. This behavior changes as the convective term becomes more dominant, for instance, when focusing on the third test case of the well-known flow around a cylinder benchmark for \(\mu = 10^{-3}\) and \(U_0 = 1.5\) [23]. In this case, the Reynolds number attains the maximum value of \(\textrm{Re}= 100\) at \(t = 4\) and a von Kármán vortex street occurs behind the cylinder.

Although the unstabilized Schur complement iteration converges hardly at all for this test problem, again the augmented Lagrangian stabilization significantly improves the convergence behavior and only \(3-4\) inner GMRES iterations are required to reach the stopping criterion for \(\gamma = 10^3\), no matter how many time steps are blocked or how fine the mesh is resolved (cf. Table 8).

Table 8 Oseen problem for Bench3 configuration: total number of iterations

However, the time-simultaneous multigrid algorithm for the velocity subproblems is not able to preserve the performance as observed for the Carreau–Yasuda viscosity model. The line plot in Fig. 5 shows the total number of multigrid iterations for the global-in-time velocity problem as occurring in different temporal subintervals. Especially in the middle of the time interval, where the magnitude of the velocity field attains its maximum and, hence, the convectivity is most dominant, the rate of convergence deteriorates significantly as more time steps are blocked. This behavior is even more pronounced if AL stabilization is included in the linear system of equations. Therefore, the so far considered time-simultaneous multigrid solver is not well suited for convection-dominated problems and calls for other global-in-time solution strategies. One candidate possibly performing better might be a space-time velocity solver as proposed by Gander and Neumüller [12], which exploits time coarsening for discontinuous Galerkin discretizations in time. Unfortunately, this approach has not yet been studied in combination with the augmented Lagrangian approach and, hence, is part of future investigations.

Fig. 5
figure 5

Velocity problem for Bench3 configuration: Total number of specialized multigrid iterations using 16 pre- and post-smoothing steps for solving velocity problem on mesh level 3 and time intervals \(\bigl [ K {\delta t}(i - 1), K {\delta t}i \bigr ]\), \(i = 1, \ldots , 1600 K^{-1}\)

9 Conclusions

The augmented Lagrangian methodology is a well known approach to improve the convergence behavior of Schur complement solvers for the Oseen equations. It has already proved itself in several numerical test cases for stationary or sequential solution strategies. To the best knowledge of the authors, the present work provides the first attempts to extend this approach to the global-in-time solution of the Oseen equations. In combination with already existing pressure Schur complement (PSC) preconditioners for the unstabilized system and embedded into a GMRES method, the resulting solver guarantees a rapid convergence no matter how fine the spatial domain is resolved or how many time steps are considered simultaneously. The numerical results presented in this work illustrate the performance in several numerical test cases and suggest that a speedup can indeed be achieved compared to sequential solution strategies.

Unfortunately, the AL methodology comes at the cost that the velocity problems involved in each computation of the PSC residual become arbitrarily ill conditioned and, hence, require customized solution techniques. For this purpose, the specialized multigrid algorithm proposed in [25] was successfully extended to solution of the global-in-time velocity problem. If enough smoothing steps are performed, the rate of convergence seems to be independent of the amount of AL stabilization and the number of blocked time steps, at least if the viscosity parameter is sufficiently large. However, for high Reynolds numbers, the time-simultaneous solution procedure deteriorates for longer time intervals. Therefore, future work will focus on more sophisticated solution strategies for the velocity problem in the regime of dominant convective contributions. Furthermore, nonlinear solvers for the incompressible Navier–Stokes equations have to be investigated. As considered above, one promising candidate is (a global-in-time) Newton’s method, which requires some globalization technique to guarantee convergence for arbitrary initial guesses and large time intervals.