1 Introduction

Let \({\mathcal {O}} \subset {\mathbb {R}}^n\) be a polygonal domain with \(n \ge 1\), let \(N \ge 1\), and let \(T > 0\) be finite. We are interested in the approximation of the solution process \(u :\Omega \times [0,T] \times {\mathcal {O}} \rightarrow {\mathbb {R}}^N\) to the stochastic p-Laplace system. Given an initial datum \(u_0\) and a stochastic forcing term \(G(u) \,\textrm{d}W\) (for the precise assumptions see Assumption 1), u is determined by the relations

$$\begin{aligned} \begin{aligned} \,\textrm{d}u - {\text {div}}S(\nabla u) \,\textrm{d}t&= G(u) \,\textrm{d}W \quad{} & {} \text { in } \Omega \times (0,T) \times {\mathcal {O}}, \\ u&= 0 \quad{} & {} \text { on } \Omega \times (0,T) \times \partial {\mathcal {O}}, \\ u(0)&= u_0{} & {} \text { on } \Omega \times {\mathcal {O}}, \end{aligned} \end{aligned}$$
(1.1)

where \(S(\xi ):= \left( \kappa + \left|\xi \right| \right) ^{p-2} \xi \in {\mathbb {R}}^{N \times n}\), \(p \in (1,\infty )\) and \(\kappa \ge 0\). Closely related to S is the nonlinear operator \(V(\xi ):= \left( \kappa + \left|\xi \right| \right) ^\frac{p-2}{2} \xi \in {\mathbb {R}}^{N \times n}\).
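Since S and V rescale \(\xi \) by powers of the same factor \(\kappa + \left|\xi \right|\), they satisfy the pointwise identity \(S(\xi ) : \xi = \left|V(\xi )\right|^2\). The following minimal sketch (our own illustration, not part of the paper) checks this numerically for a random matrix:

```python
import numpy as np

def S(xi, p, kappa):
    """S(xi) = (kappa + |xi|)^(p-2) xi, with |.| the Frobenius norm."""
    return (kappa + np.linalg.norm(xi)) ** (p - 2) * xi

def V(xi, p, kappa):
    """V(xi) = (kappa + |xi|)^((p-2)/2) xi."""
    return (kappa + np.linalg.norm(xi)) ** ((p - 2) / 2) * xi

# Both sides equal (kappa + |xi|)^(p-2) * |xi|^2, hence S(xi) : xi = |V(xi)|^2.
rng = np.random.default_rng(0)
xi = rng.standard_normal((3, 2))       # N = 3 components, n = 2 dimensions
p, kappa = 1.5, 0.1
lhs = np.sum(S(xi, p, kappa) * xi)     # Frobenius product S(xi) : xi
rhs = np.sum(V(xi, p, kappa) ** 2)     # |V(xi)|^2
assert abs(lhs - rhs) < 1e-12
```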

1.1 Model

The system (1.1) has many important applications. As a major example, it serves as a prototype for the modeling of non-Newtonian fluids; more specifically, it is closely related to power-law fluids [14, 69]. The case \(p=2\) corresponds to the famous stochastic heat equation and has been studied extensively, both analytically, e.g. [12, 47, 48], and numerically, e.g. [1, 24, 36, 42, 67].

The p-Laplace system arises as a stochastically perturbed gradient flow of the energy \({\mathcal {J}} : W^{1,p}_0({\mathcal {O}}) \rightarrow [0,\infty )\) defined by

$$\begin{aligned} {\mathcal {J}}(u) := \int _{\mathcal {O}} \varphi (\left|\nabla u\right|) \,\textrm{d}x, \end{aligned}$$
(1.2)

where \(\varphi (t):= \int _0^t (\kappa +s)^{p-2}s \,\textrm{d}s\). In the case \(\kappa = 0\) the energy (1.2) corresponds to the classical p-Dirichlet energy.
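For \(\kappa = 0\) the integrand reduces to \(s^{p-1}\), so \(\varphi (t) = t^p/p\), the classical p-Dirichlet density. A quick quadrature check of this closed form (our own sketch, with illustrative parameter values):

```python
import numpy as np

# phi(t) = integral_0^t (kappa + s)^(p-2) s ds; for kappa = 0 the integrand
# is s^(p-1), so phi(t) = t^p / p. Verify with a trapezoidal rule.
p, t = 1.5, 2.0
s = np.linspace(0.0, t, 200_001)
integrand = s ** (p - 1)               # s^(p-2) * s, finite at 0 since p > 1
phi_numeric = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(s))
assert abs(phi_numeric - t ** p / p) < 1e-5
```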

1.2 Existence and well-posedness

The existence of analytically weak solutions in the space \( L^2\left( \Omega ; C\left( [0,T]; L^2\left( {\mathcal {O}} \right) \right) \right) \cap L^p\big (\Omega ;L^p\big (0,T;W^{1,p}_0\left( {\mathcal {O}}\right) \big )\big )\) can be established by standard monotonicity arguments [61]. It requires a linear growth assumption on the noise coefficient G and \(L^2({\mathcal {O}})\)-integrable initial data.

First results on the existence of strong solutions to stochastically perturbed gradient flows were obtained by Gess [43]; they include the degenerate case, \(p \ge 2\), of the p-Laplace equation.

In the literature different generalizations of (1.1) have been considered. Well-posedness for merely \(L^1\)-initial data has been addressed in [68]. More general systems, where p is allowed to depend on \((\omega ,t,x)\) respectively on \((t,x)\), are considered in [73, 74] respectively in [9]. This was further extended by Breit and Gmeineder [17] to electro-rheological fluids, which are modeled by (1.1) complemented by an additional divergence-free constraint and a free pressure variable. The singular case \(p \in [1,2)\) has been analyzed in [60, 44] and [5].

1.3 Regularity

Regularity properties of a function u become particularly important when it comes to the approximability of u within a discrete function class. For instance, discrete tensor spaces generated by a time stepping scheme and a finite element discretization in space approximate smooth functions more easily than non-smooth ones.

Historically, many authors addressed Hölder and \(C^{1,\alpha }\)-regularity of solutions to the deterministic steady p-Laplace equation, e.g. [26, 34, 56, 64, 70,71,72]. In general \(\alpha \in (0,1)\) is an unknown quantity. While \(C^{1,\alpha }\)-regularity can be used to measure the approximation quality of finite elements, it usually fails to produce optimal results, since in general \(\alpha < 1\).

Sobolev regularity provides an alternative scale of smoothness. In the singular case, \(p \in (1,2)\), \(W^{2,2}\)-regularity has been proven by Liu and Barrett [59]. The degenerate setting is more delicate and one cannot expect \(\nabla ^2 u\) to be well-defined on \({\{{\nabla u = 0}\}}\). In fact, due to the nonlinear structure of the equation, solutions have limited regularity even for smooth data. A sharp result in the 2-dimensional setting about limited regularity on both the Hölder and the Sobolev scale has been obtained by Iwaniec and Manfredi [51].

The nonlinear character of the equation naturally introduces the additional quantities \(S(\nabla u)\) and \(V(\nabla u)\). In the scalar case both expressions are robust with respect to \(p \in (1,\infty )\). Here it is possible to prove \(V(\nabla u) \in W^{1,2}\) via a difference quotient technique [13, 58] and \(S(\nabla u) \in W^{1,2}\) by a functional inequality [20]. The result \(V(\nabla u) \in W^{1,2}\) generalizes to the vectorial case. However, the functional inequality fails in the vectorial case at least point-wise for \(p \le 2(2-\sqrt{2})\approx 1.1715\). Therefore, it is unclear whether a regularity result \(S(\nabla u) \in W^{1,2}\) is achievable for small p. Nevertheless, for \(p > 2(2-\sqrt{2})\) it is shown in [2] that \(S(\nabla u) \in W^{1,2}\). Regularity for \(S(\nabla u)\) on the Besov and Triebel-Lizorkin scale in the plane for \(p \ge 2\) has been obtained in [3]. Estimates of \(S(\nabla u)\) in terms of Riesz potentials were derived in [53, 54].

Regularity results for the parabolic p-Laplace system were derived in [41, Theorem 6.2.1] (cf. [57] for \(p \ge 2\)). It was shown that \(u \in C^1_{\textrm{weak}}\big (0,T;L^2({\mathcal {O}})\big ) \cap C_{\textrm{weak}}\big ( 0,T;W^{1,p}_0({\mathcal {O}})\big )\). This was extended in [30] to the nonlinear tensor \(V(\nabla u)\). The authors showed, by formally testing the equation with \(-\Delta u\) and \(-\partial _t^2 u\), respectively, that

$$\begin{aligned} V(\nabla u)&\in L^2\left( 0,T;W^{1,2}\left( {\mathcal {O}}\right) \right) \cap W^{1,2}\left( 0,T;L^2\left( {\mathcal {O}}\right) \right) , \end{aligned}$$
(1.3a)
$$\begin{aligned} u&\in L^\infty \left( 0,T;W^{1,2}\left( {\mathcal {O}}\right) \right) \cap W^{1,\infty }\left( 0,T;L^2\left( {\mathcal {O}}\right) \right) . \end{aligned}$$
(1.3b)

Additionally, \(S(\nabla u) \in L^2(0,T;W^{1,2}({\mathcal {O}}))\) was proven in [22] for either \(p\in (1,\infty )\) and \(N=1\) or \(p \ge 2\) and \(N > 1\).

Within the context of the stochastic p-Laplace system it is possible to prove similar spatial regularity as in (1.3). The formal testing needs to be replaced by a suitable application of Itô’s formula as done by Breit [15]. However, the time regularity is limited due to the presence of the stochastic forcing. In the super-quadratic regime partial time regularity can be recovered by exploiting the strong formulation of the p-Laplace system as done by Wichmann [76]. Overall, for appropriate data assumptions it is possible to verify (see Sect. 2.5)

$$\begin{aligned} V(\nabla u)&\in L^2\left( \Omega ;L^2\left( 0,T;W^{1,2}\left( {\mathcal {O}}\right) \right) \cap B^{1/2}_{2,\infty }\left( 0,T;L^2\left( {\mathcal {O}}\right) \right) \right) , \end{aligned}$$
(1.4a)
$$\begin{aligned} u&\in L^2\left( \Omega ; L^\infty \left( 0,T;W^{1,2}\left( {\mathcal {O}}\right) \right) \cap B^{1/2}_{\varPhi _2,\infty }\left( 0,T;L^2\left( {\mathcal {O}}\right) \right) \right) . \end{aligned}$$
(1.4b)

Here \(\varPhi _2(s) := e^{s^2}-1\) and B denotes a Besov-Orlicz space (see Sect. 2.1).

1.4 Approximation

In the past many authors have studied the numerical approximation of the deterministic counterpart of (1.1), e.g. [6,7,8, 10, 11, 16, 19, 25, 30, 33, 35, 37, 75]. The error between the discrete and the analytic solution has been measured in various different quantities. It turned out that the natural quantity to measure the error is given by

$$\begin{aligned} \max _{0\le m \le M} \left\Vert u(t_m) - u_{m,h}\right\Vert _{L^2({\mathcal {O}})}^2 + \tau \sum _{m=1}^M \left\Vert V(\nabla u(t_m)) - V(\nabla u_{m,h})\right\Vert _{L^2({\mathcal {O}})}^2, \end{aligned}$$
(1.5)

where \(V(\xi ) := \left( \kappa + \left|\xi \right| \right) ^\frac{p-2}{2}\xi \) and \(u_{m,h}\) is an approximation of \(u(t_m)\). In [33] it has been proved that the expression \(\left\Vert V(\nabla u) - V(\nabla u_h)\right\Vert _{L^2({\mathcal {O}})}^2\) is equivalent to the energy error \({\mathcal {J}}(u_h) - {\mathcal {J}}(u)\), where \(u_h\) is a minimizer of the energy on a subspace \(V_h \subset W^{1,p}_0\).

In the steady case, starting with the seminal work by Barrett and Liu [6] and further improvements in [37] and [29], it has been proved that

$$\begin{aligned} \left\Vert V(\nabla u) - V(\nabla u_h)\right\Vert _{L^2({\mathcal {O}})} \lesssim h \left\Vert \nabla V(\nabla u)\right\Vert _{L^2({\mathcal {O}})}. \end{aligned}$$

This settles the question of optimal convergence for piecewise linear continuous elements. Moreover, the paper [30] treats the parabolic system and achieves optimal convergence under the regularity assumption (1.3), i.e.,

$$\begin{aligned} \begin{aligned}&\max _{0\le m \le M} \left\Vert u(t_m) - u_{m,h}\right\Vert _{L^2({\mathcal {O}})}^2 + \tau \sum _{m=1}^M \left\Vert V(\nabla u(t_m)) - V(\nabla u_{m,h})\right\Vert _{L^2({\mathcal {O}})}^2 \\&\quad \lesssim h^2 + \tau ^2. \end{aligned} \end{aligned}$$
(1.6)

The error quantity (1.6) relies on the fact that the mapping \(t \mapsto \left\Vert V(\nabla u(t))\right\Vert _{L^2({\mathcal {O}})}\) is continuous. However, if the data is not sufficiently smooth, point values might not be well-defined. In general, dealing with irregular data, and therefore irregular solutions, is a delicate task. Different methods have been suggested to recover well-defined error quantities.

In [16] the first and the third author develop a numerical scheme based on time averages to circumvent the usage of point evaluations. A more probabilistic approach has been used in [38]. There the authors replace deterministic evaluation points by random ones.

First results for monotone stochastic equations were derived by Gyöngy and Millet [45, 46]. They developed an abstract discretization theory that also covers the system (1.1) in the superquadratic case. In [45] they gave conditions that guarantee plain convergence of their discretization even for the fully degenerate, \(\kappa = 0\), system. In contrast, convergence rates have been obtained under a non-degeneracy assumption, \(\kappa > 0\), in [46]. In both cases, restrictive assumptions on the regularity of the solution are required.

In the stochastic case, due to the limited time regularity (1.4), robust error quantities need to be used. Breit, Hofmanová and Loisel [18] use randomized time steps to construct an algorithm that achieves almost optimal convergence in time and optimal convergence in space, i.e., for all \(\alpha \in (0,1/2)\),

$$\begin{aligned}&\mathbb {E}_{{\textbf{t}}} \otimes \mathbb {E} \left[ \max _{0\le m \le M} \left\Vert u({\textbf{t}}_m) - u_{m,h}\right\Vert _{L^2(\mathcal {O})}^2 \right. \\&\qquad \left. + \sum _{m=1}^M \int _{{\textbf{t}}_{m-1}}^{{\textbf{t}}_{m}} \left\Vert V(\nabla u(s)) - V(\nabla u_{m,h})\right\Vert _{L^2(\mathcal {O})}^2 \,\textrm{d}s\right] \\&\quad \lesssim h^2 + \tau ^{2\alpha }. \end{aligned}$$

Here \(\mathbb {E}_{{\textbf{t}}}\) denotes the expectation with respect to the random time steps \({\textbf{t}}_m\) centered around deterministic grid-points \(t_m\). They use Hölder and Sobolev-Slobodeckij spaces to measure the time regularity of the solution. This excludes the limiting case \(\alpha = 1/2\).

The classical Euler–Maruyama scheme has been analyzed in [39]. The authors show, for general multi-valued monotone equations, convergence of the algorithm with convergence rate 1/4. In particular, the p-Laplace system can be treated. The authors infer convergence towards the spatially discrete solution \(v_h \in V_h\) of

$$\begin{aligned} \,\textrm{d}v_h - {\text {div}}S(\nabla v_h) \,\textrm{d}t= G(v_h) \,\textrm{d}W \end{aligned}$$

with rate 1/4 measured in the error

$$\begin{aligned} \max _{m\le M} \mathbb {E} \left[ \left\Vert v_h(t_m) - u_{m,h}\right\Vert _{L^2_x}^2 \right] \lesssim \tau ^{1/2}. \end{aligned}$$

The method has the advantage that no regularity on the abstract solution needs to be assumed. However, only a sub-optimal convergence rate can be obtained.

1.5 Main results

Based on deterministic time averages we propose two algorithms, (3.14) and (3.15), that are essentially driven by the update rule: \(\mathbb {P}\)-a.s., for all \(m \ge 2\) and \(\xi _h \in V_h\),

$$\begin{aligned} \left( v_m - v_{m-1}, \xi _h \right) + \tau \left( S(\nabla v_m), \nabla \xi _h \right) = \left( G(v_{m-2})\Delta _m \mathbb {W}, \xi _h \right) . \end{aligned}$$
(1.7)

The randomness enters through the averaged increments \(\Delta _m \mathbb {W}:= \langle {W}\rangle _m - \langle {W}\rangle _{m-1}\), where \(\langle {W}\rangle _m\) is the time averaged value of W on the interval \([t_{m-1},t_m]\).
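The averaged increments are again Gaussian but with reduced variance. A direct computation (ours, for illustration; the paper's precise distributional statement is Corollary 33) gives \(\operatorname {Var}(\Delta _m \mathbb {W}) = 2\tau /3\) for interior m, compared to \(\operatorname {Var}(\Delta _m W) = \tau \) for classical increments. A Monte Carlo sketch consistent with this value:

```python
import numpy as np

# Approximate <W>_1, <W>_2 (time averages of a scalar Brownian motion over
# [0, tau] and [tau, 2*tau]) on a fine grid and estimate the variance of the
# averaged increment <W>_2 - <W>_1. Expected value: 2*tau/3 < tau.
rng = np.random.default_rng(1)
tau, K, n_paths = 0.1, 100, 20_000     # K sub-steps per interval
dt = tau / K
dW = np.sqrt(dt) * rng.standard_normal((n_paths, 2 * K))
W = np.cumsum(dW, axis=1)              # W at the grid points dt, 2*dt, ...
avg1 = W[:, :K].mean(axis=1)           # approx. <W>_1
avg2 = W[:, K:].mean(axis=1)           # approx. <W>_2
var_est = np.var(avg2 - avg1)
assert abs(var_est - 2 * tau / 3) < 0.05 * tau
```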

Importantly, we manage to achieve optimal convergence in time with rate 1/2 without assuming any Hölder regularity in time of the solution process. Instead, we measure time regularity in terms of Nikolskii spaces. Our main results, Theorems 19 and 25, verify under the condition

$$\begin{aligned} u&\in L^2\left( \Omega ; B^{1/2}_{2,\infty }\left( 0,T; L^2\left( {\mathcal {O}} \right) \right) \cap L^\infty \left( 0,T; W^{1,2}\left( {\mathcal {O}}\right) \right) \right) , \end{aligned}$$
(1.8a)
$$\begin{aligned} V(\nabla u)&\in L^2\left( \Omega ; B^{1/2}_{2,\infty }\left( 0,T; L^2\left( {\mathcal {O}} \right) \right) \cap L^2 \left( 0,T; W^{1,2}\left( {\mathcal {O}} \right) \right) \right) , \end{aligned}$$
(1.8b)

the optimal convergence

$$\begin{aligned}&\mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert \langle {u}\rangle _m - v_m\right\Vert _{L^2_x}^2 + \sum _{m=1}^M\int _{t_{m-1}}^{t_m} \left\Vert V(\nabla u(s)) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \,\textrm{d}s \right] \\&\quad \lesssim h^2 + \tau . \end{aligned}$$

We want to stress that the regularity assumption (1.8) is even weaker than the provable regularity (1.4).

On the other hand, if the solution process u has a certain amount of time regularity, e.g. (1.4b), one can recover point evaluations, cf. Lemma 17,

$$\begin{aligned} \mathbb {E}\left[ \max _{m\in {\{{1,\ldots ,M}\}}} \left\Vert u(t_m) - \langle {u}\rangle _m\right\Vert _{L^2_x}^2 \right] \lesssim \tau \ln (1+\tau ^{-1}) \mathbb {E} \left[ {[{u}]}_{ B^{1/2}_{\varPhi _2,\infty } L^2_x}^2 \right] . \end{aligned}$$

For the implementation of (1.7) one needs to sample according to the distribution of the random variable \(\Delta _m \mathbb {W}\). We show in Corollary 33 that \(\Delta _m \mathbb {W}\) is a Gaussian random variable whose variance is slightly reduced compared to the classical increments \(\Delta _m W:= W(t_m) - W(t_{m-1})\). Ultimately, we provide a sampling algorithm (4.9) that samples not only the marginal distributions but the joint distribution of the random vector \((\Delta _m W, \Delta _m \mathbb {W})_{m=1}^M\).

1.6 Outline

In Sect. 2 we formulate the functional analytic setup, construct the multiplicative forcing term G and recall known regularity results. Section 3 introduces the discrete setup and contains the main results Theorem 19 and Theorem 25. Next, Sect. 4 clarifies the construction of the discrete random input vectors and provides a sampling algorithm. Lastly, Sect. 5 contains numerical experiments.

2 Mathematical setup

This section contains classical definitions and preliminary results. It is structured as follows: Sect. 2.1 introduces the functional analytic framework. Section 2.2 presents the construction of the stochastic forcing. Section 2.3 is about the nonlinear operators S and V. The solution concept is fixed in Sect. 2.4. Lastly, Sect. 2.5 collects regularity results.

Let \({\mathcal {O}} \subset {\mathbb {R}}^n\) for \(n \ge 1\) be a bounded Lipschitz domain (further assumptions on \({\mathcal {O}}\) will be needed for the regularity of solutions). For some given \(T>0\) we denote by \(I := [0,T]\) the time interval and write \({\mathcal {O}}_T := I \times {\mathcal {O}}\) for the time space cylinder. Moreover let \(\left( \Omega ,{\mathcal {F}}, ({\mathcal {F}}_t)_{t\in I}, \mathbb {P} \right) \) denote a stochastic basis, i.e. a probability space with a complete and right-continuous filtration \(({\mathcal {F}}_t)_{t\in I}\). We write \(f \lesssim g\) for two non-negative quantities f and g if f is bounded by g up to a multiplicative constant. Accordingly we define \(\gtrsim \) and \(\eqsim \). We denote by c a generic constant whose value can change from line to line.

2.1 Function spaces

As usual, \(L^q(\mathcal {O})\) denotes the Lebesgue space and \(W^{1,q}(\mathcal {O})\) the Sobolev space, where \(1\le q < \infty \). We denote by \(W^{1,q}_0(\mathcal {O})\) the Sobolev spaces with zero boundary values. It is the closure of \(C^\infty _0(\mathcal {O})\) (smooth functions with compact support) in the \(W^{1,q}(\mathcal {O})\)-norm. We denote by \(W^{-1,q'}(\mathcal {O})\) the dual of \(W^{1,q}_0(\mathcal {O})\). We do not distinguish in the notation between vector- and matrix-valued functions.

For a Banach space \(\left( X, \left\Vert \cdot \right\Vert _X \right) \) let \(L^q(I;X)\) be the Bochner space of Bochner-measurable functions \(u: I \rightarrow X\) satisfying \(t \mapsto \left\Vert u(t)\right\Vert _X \in L^q(I)\). Moreover, \(C(I;X)\) is the space of continuous functions with respect to the norm topology. We also use \(C^{\alpha }(I;X)\) for the space of Hölder continuous functions. Given an Orlicz-function \(\varPhi : [0,\infty ] \rightarrow [0,\infty ]\), i.e. a convex function satisfying \( \lim _{t \rightarrow 0} \varPhi (t)/t = 0\) and \(\lim _{t \rightarrow \infty } \varPhi (t)/t = \infty \), we define the Luxemburg-norm

$$\begin{aligned} \left\Vert u\right\Vert _{L^\varPhi (I;X)} := \inf \left\{ \lambda > 0 : \int _I \varPhi \left( \frac{\left\Vert u\right\Vert _X}{\lambda } \right) \,\textrm{d}s\le 1 \right\} . \end{aligned}$$

The Orlicz space \(L^\varPhi (I;X)\) is the space of all Bochner-measurable functions with finite Luxemburg-norm. For more details on Orlicz-spaces we refer to [32]. Given \(h \in I\) and \(u :I \rightarrow X\) we define the difference operator \(\tau _h: {\{{u: I \rightarrow X}\}} \rightarrow {\{{u: I\cap I - {\{{h}\}} \rightarrow X}\}} \) via \(\tau _h(u) (s) := u(s+h) - u(s)\). The Besov-Orlicz space \(B^\alpha _{\varPhi ,r}(I;X)\) with differentiability \(\alpha \in (0,1)\), integrability \(\varPhi \) and fine index \(r \in [1,\infty ]\) is defined as the space of Bochner-measurable functions with finite Besov-Orlicz norm \(\left\Vert \cdot \right\Vert _{B^\alpha _{\varPhi ,r}(I;X)}\), where

$$\begin{aligned} \left\Vert u\right\Vert _{B^\alpha _{\varPhi ,r}(I;X)}&:= \left\Vert u\right\Vert _{L^{\varPhi }(I ;X)} + {[{u}]}_{B^\alpha _{\varPhi ,r}(I;X)}, \\ {[{u}]}_{B^\alpha _{\varPhi ,r}(I;X)}&:= {\left\{ \begin{array}{ll} \left( \int _{I} h^{-r\alpha } \left\Vert \tau _h u\right\Vert _{L^\varPhi (I\cap I - {\{{h}\}};X)}^r \frac{\,\textrm{d}h}{h} \right) ^\frac{1}{r} &{} \text { if } r \in [1,\infty ), \\ \mathop {\textrm{ess}\,\textrm{sup}}_{h \in I} h^{-\alpha } \left\Vert \tau _h u\right\Vert _{L^\varPhi (I\cap I - {\{{h}\}};X)} &{} \text { if } r = \infty . \end{array}\right. } \end{aligned}$$

The case \(r = \infty \) is commonly called a Nikolskii-Orlicz space. When \(\varPhi (t) = t^p\) we write \(B^\alpha _{p,r}(I;X)\) and call the space a Besov space.
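To make the Nikolskii seminorm concrete, consider the toy example (ours, not from the paper) \(f(t) = t\) on \(I = [0,1]\) with \(\alpha = 1/2\) and \(\varPhi (t) = t^2\): then \(\left\Vert \tau _h f\right\Vert _{L^2(0,1-h)} = h\sqrt{1-h}\), so the seminorm equals \(\sup _h \sqrt{h(1-h)} = 1/2\). A discrete sketch, assuming the uniform-grid approximation below:

```python
import numpy as np

def nikolskii_seminorm(f, alpha, n=2000):
    """Discrete sup_h h^(-alpha) * ||f(.+h) - f||_{L^2} on I = [0, 1],
    with shifts h = k/n and a simple mean-value L^2 approximation."""
    t = np.linspace(0.0, 1.0, n + 1)
    vals = f(t)
    best = 0.0
    for k in range(1, n):
        h = k / n
        diff = vals[k:] - vals[:-k]                   # tau_h f on [0, 1-h]
        l2 = np.sqrt(np.mean(diff ** 2) * (1.0 - h))  # approx. L^2(0, 1-h) norm
        best = max(best, h ** (-alpha) * l2)
    return best

# For f(t) = t the exact B^{1/2}_{2,infty} seminorm is sup_h sqrt(h(1-h)) = 1/2.
val = nikolskii_seminorm(lambda t: t, alpha=0.5)
assert abs(val - 0.5) < 1e-3
```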

Similarly, given a Banach space \(\left( Y, \left\Vert \cdot \right\Vert _Y \right) \), we define \(L^q(\Omega ;Y)\) as the Bochner space of Bochner-measurable functions \(u: \Omega \rightarrow Y\) satisfying \(\omega \mapsto \left\Vert u(\omega )\right\Vert _Y \in L^q(\Omega )\). The space \(L^q_{\mathcal {F}}(\Omega \times I;X)\) denotes the subspace of X-valued \((\mathcal {F}_t)_{t\in I}\)-progressively measurable processes. Let \(\left( U,\left\Vert \cdot \right\Vert _U \right) \) be a separable Hilbert space. \(L_2(U;L^2(\mathcal {O}))\) denotes the space of Hilbert-Schmidt operators from U to \(L^2(\mathcal {O})\) with the norm \(\left\Vert z\right\Vert _{L_2(U;L^2_x)}^2:= \sum _{j\in \mathbb {N}} \left\Vert z(u_j)\right\Vert _{L^2(\mathcal {O})}^2 \) where \({\{{u_j}\}}_{j \in \mathbb {N}}\) is some orthonormal basis of U. We abbreviate the notation \(L^q_\omega L^q_t L^q_x := L^q(\Omega ;L^q(I;L^q(\mathcal {O}))) \) with obvious modifications for Sobolev, Besov and Hölder spaces. Additionally, we write \(L^{q-} := \bigcap _{r< q} L^r\).

2.2 Stochastic integrals

In order to construct the stochastic forcing term, we impose the following conditions:

Assumption 1

  1. (a)

Let \(\left( U, \left\Vert \cdot \right\Vert _U \right) \) be a separable Hilbert space. We assume that W is a U-valued cylindrical Wiener process with respect to \((\mathcal {F}_t)_{t\in I}\). Formally W can be represented as

    $$\begin{aligned} W = \sum _{j \in \mathbb {N}} u_j \beta ^j, \end{aligned}$$
    (2.1)

    where \({\{{u_j}\}}_{j \in \mathbb {N}}\) is an orthonormal basis of U and \({\{{\beta ^j}\}}_{j\in \mathbb {N}}\) are independent 1-dimensional standard Brownian motions.

  2. (b)

    Let \(v \in L^2_{\mathcal {F}}(\Omega \times I; L^2_x)\). We assume that \(G(v)(\cdot ): U \rightarrow L_{\mathcal {F}}^2(\Omega \times I; L^2_x)\) is given by

    $$\begin{aligned} u \mapsto G(v)(u) := \sum _{j \in \mathbb {N}} g_j(\cdot ,v)( u_j, u)_U , \end{aligned}$$

    where \(g_j \in C(\mathcal {O} \times {\mathbb {R}}^N; {\mathbb {R}}^N)\) for all \(j\in \mathbb {N}\) with

    1. (i)

      (linear growth) for all \(x \in \mathcal {O}\) and \(\xi \in {\mathbb {R}}^N\) it holds

      $$\begin{aligned} \sum _{j\in \mathbb {N}} \left|g_j(x,\xi )\right|^2 \le c_{\text {growth}}(1+\left|\xi \right|^2), \end{aligned}$$
      (2.2)
    2. (ii)

      (Lipschitz continuity) for all \(x \in \mathcal {O}\) and \(\xi _1, \xi _2 \in {\mathbb {R}}^N\) it holds

      $$\begin{aligned} \sum _{j\in \mathbb {N}} \left|g_j(x,\xi _1) -g_j(x,\xi _2) \right|^2 \le c_{\text {lip}} \left|\xi _1 - \xi _2\right|^2. \end{aligned}$$
      (2.3)
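As a toy family of admissible coefficients (our own example, not taken from the paper), take \(g_j(x,\xi ) = 2^{-j/2}\xi \), independent of x; then (2.2) and (2.3) hold with constants 1 since \(\sum _j 2^{-j} = 1\). A truncated numerical check:

```python
import numpy as np

# Toy coefficients g_j(x, xi) = 2^(-j/2) * xi. Then
#   sum_j |g_j(x, xi)|^2 = |xi|^2 * sum_j 2^(-j) = |xi|^2 <= 1 + |xi|^2,
#   sum_j |g_j(x, xi1) - g_j(x, xi2)|^2 = |xi1 - xi2|^2,
# so the growth and Lipschitz conditions hold with constants 1.
J = 60                                  # truncate the series at J terms
c = 2.0 ** (-0.5 * np.arange(1, J + 1)) # coefficients 2^(-j/2), j = 1..J
xi1, xi2 = np.array([1.0, -2.0]), np.array([0.5, 0.3])
growth = np.sum((c[:, None] * xi1) ** 2)
lip = np.sum((c[:, None] * (xi1 - xi2)) ** 2)
assert growth <= 1.0 + xi1 @ xi1
assert abs(lip - (xi1 - xi2) @ (xi1 - xi2)) < 1e-9
```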

Now it is a classical result, see for example the book of Prévôt and Röckner [66], that we can construct a corresponding stochastic integral.

Proposition 2

Let Assumption 1 be true. Then the operator \(\mathcal {I}\) defined through

$$\begin{aligned} \mathcal {I}(G(v)) := \int _0^{\bullet } G(v)(\,\textrm{d}W_s) := \sum _{j \in \mathbb {N}} \int _0^{\bullet } g_j(\cdot ,v) \,\textrm{d}\beta ^j_s \end{aligned}$$
(2.4)

defines a bounded linear operator from \(L^2_{\mathcal {F}}(\Omega \times I;L_2(U; L^2_x))\) to \( L^2_\omega C_t L^2_x \). Moreover,

  1. (a)

    \(\mathcal {I}(G(v))\) is an \(L^2_x\)-valued martingale with respect to \(\left( \mathcal {F}_t\right) _{t \in I}\),

  2. (b)

    (Itô isometry) for all \(t \in I\) it holds

    $$\begin{aligned} \mathbb {E} \left[ \left\Vert \mathcal {I}(G(v)) (t)\right\Vert _{L^2_x}^2 \right] =\mathbb {E} \left[ \int _0^t \left\Vert G(v)\right\Vert ^2_{L_2(U;L^2_x)} \,\textrm{d}s\right] . \end{aligned}$$
    (2.5)

2.3 Perturbed gradient flow

Let \(\kappa \ge 0\) and \(p \in (1,\infty )\). For \(\xi \in \mathbb {R}^{N \times n}\) we define

$$\begin{aligned} S(\xi ) := \varphi '(\left|\xi \right|) \frac{\xi }{\left|\xi \right|} = \left( \kappa + \left|\xi \right| \right) ^{p-2} \xi \end{aligned}$$
(2.6)

and

$$\begin{aligned} V(\xi ) := \sqrt{\varphi '(\left|\xi \right|)} \frac{\xi }{\left|\xi \right|} = \left( \kappa + \left|\xi \right| \right) ^\frac{p-2}{2} \xi , \end{aligned}$$
(2.7)

where \(\varphi (t):= \int _0^t \left( \kappa + s \right) ^{p-2} s \,\textrm{d}s\). The nonlinear functions S and V are closely related. In particular the following lemmata are of great importance. The proofs can be found in e.g. [31]. For more details we refer to [10, 27, 29].

Lemma 3

(V-coercivity) Let \(\xi _1, \xi _2 \in \mathbb {R}^{N \times n}\). Then it holds

$$\begin{aligned} \begin{aligned} (S(\xi _1) - S(\xi _2)) : (\xi _1 - \xi _2)&\eqsim \left|V(\xi _1) - V(\xi _2)\right|^2 \\&\eqsim \left( \kappa + \left|\xi _1\right| + \left|\xi _1 - \xi _2\right| \right) ^{p-2} \left|\xi _1 - \xi _2\right|^2. \end{aligned} \end{aligned}$$
(2.8)
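The equivalence (2.8) can be probed numerically. The sketch below (ours) checks strict monotonicity of S and that the ratio of the two sides stays within a moderate bracket for random arguments; the bracket [0.1, 10] is a heuristic choice for the tested parameters, not a proven constant:

```python
import numpy as np

def S(xi, p, kappa):
    return (kappa + np.linalg.norm(xi)) ** (p - 2) * xi

def V(xi, p, kappa):
    return (kappa + np.linalg.norm(xi)) ** ((p - 2) / 2) * xi

# Check (S(a) - S(b)) : (a - b) > 0 and compare with |V(a) - V(b)|^2
# for random 3x2 matrices and several (p, kappa).
rng = np.random.default_rng(2)
for p in (1.5, 2.0, 3.0):
    for kappa in (0.0, 0.5):
        for _ in range(1000):
            a, b = rng.standard_normal((2, 3, 2))
            lhs = np.sum((S(a, p, kappa) - S(b, p, kappa)) * (a - b))
            rhs = np.sum((V(a, p, kappa) - V(b, p, kappa)) ** 2)
            assert lhs > 0.0            # strict monotonicity of S
            assert 0.1 < lhs / rhs < 10.0
```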

Lemma 4

(generalized Young’s inequality) Let \(\xi _1, \xi _2, \xi _3 \in \mathbb {R}^{N \times n}\) and \(\delta >0\). Then there exists \(c_\delta \ge 1\) such that

$$\begin{aligned} \left( S(\xi _1) - S(\xi _2) \right) : \left( \xi _2 - \xi _3 \right) \le \delta \left|V(\xi _1) - V(\xi _2)\right|^2 + c_\delta \left|V(\xi _2) - V(\xi _3)\right|^2. \end{aligned}$$
(2.9)

Lemma 5

Let \(\xi _1, \xi _2, \xi _3 \in \mathbb {R}^{N \times n}\) and \(\delta >0\). Then there exists \(c_\delta \ge 1\) such that

$$\begin{aligned} \left( S(\xi _1) - S(\xi _2) \right) : \xi _3 \le \delta \left|V(\xi _1) - V(\xi _2)\right|^2 + c_\delta \left( \kappa + \left|\xi _1\right| + \left|\xi _1 - \xi _2\right|\right) ^{p-2}\left|\xi _3\right|^2. \end{aligned}$$
(2.10)

Remark 6

A continuous, convex and strictly increasing function \(\varphi :[0,\infty ) \rightarrow [0,\infty )\) satisfying

$$\begin{aligned} \lim _{t\rightarrow 0} \frac{\varphi (t)}{t} = \lim _{t \rightarrow \infty } \frac{t}{\varphi (t)} = 0 \end{aligned}$$

is called an N-function. It is called uniformly convex if \(\varphi \in C^1[0,\infty ) \cap C^2(0,\infty )\) and \(\varphi '(t) \eqsim \varphi ''(t) t\) uniformly in \(t > 0\). Lemmas 3 and 4 are still valid if one replaces \(\varphi \) in (2.6) and (2.7) by any uniformly convex N-function. For more details we refer to e.g. [27].

Given some initial condition \(u_0 : \Omega \times \mathcal {O} \rightarrow {\mathbb {R}}^N\) and a stochastic force \((G,W)\) in the sense of Assumption 1 we are interested in the system

$$\begin{aligned} \,\textrm{d}u - {\text {div}}S(\nabla u) \,\textrm{d}t= & {} G(u) \,\textrm{d}W ~~\text { in } \Omega \times \mathcal {O}_{T}, \end{aligned}$$
(2.11a)

with boundary and initial conditions given by

$$\begin{aligned} u= & {} 0 ~~\text { on } \Omega \times I \times \partial \mathcal {O}, \end{aligned}$$
(2.11b)
$$\begin{aligned} u(0)= & {} u_0 ~\text { on } \Omega \times \mathcal {O}. \end{aligned}$$
(2.11c)

The system (2.11a) is a perturbed version of the gradient flow of the energy \(\mathcal {J}: W^{1,p}_{0,x} \rightarrow [0,\infty )\) given by

$$\begin{aligned} \mathcal {J}(u):= \int _{\mathcal {O}} \varphi (\left|\nabla u\right|) \,\textrm{d}x. \end{aligned}$$
(2.12)

2.4 Weak and strong solutions

We fix the concept of solutions as follows.

Definition 7

Let \(u_0 \in L^2_\omega L^2_x\) be \(\mathcal {F}_0\)-measurable, \(p \in (1,\infty )\) and \((G,W)\) be given by Assumption 1. An \((\mathcal {F}_t)\)-adapted \(L^2_x\)-valued process u is called a weak solution to (2.11) if

  1. (a)

    \(u \in L^2_\omega C_t L^2_x \cap L^p_\omega L^p_t W^{1,p}_{0,x}\),

  2. (b)

    for all \(t \in I\), \(\xi \in C^\infty _{0,x}\) and \(\mathbb {P}\)-a.s. it holds

    $$\begin{aligned} \int _{\mathcal {O}} (u(t) - u_0) \cdot \xi \,\textrm{d}x+ \int _0^t \int _{\mathcal {O}} S(\nabla u) :\nabla \xi \,\textrm{d}x\,\textrm{d}s= \int _{\mathcal {O}} \int _0^t G(u)\,\textrm{d}W_s \cdot \xi \,\textrm{d}x. \end{aligned}$$
    (2.13)

The process u is called a strong solution if it is a weak solution and additionally satisfies

  1. (a)

    \({\text {div}}S(\nabla u) \in L^2_\omega L^2_t L^2_x\) ,

  2. (b)

    for all \(t \in I\) and \(\mathbb {P}\)-a.s. it holds

    $$\begin{aligned} u(t) - u_0 - \int _0^t {\text {div}}S(\nabla u) \,\textrm{d}s= \int _0^t G(u) \,\textrm{d}W_s \end{aligned}$$
    (2.14)

    as an equation in \(L^2_x\).

The next step is to address the existence of weak or even strong solutions. Weak solutions can be constructed through a variational approach that uses the monotonicity of the nonlinear diffusion operator S, as presented in [61, Example 4.1.9]. This approach relies on the Gelfand triple \(W^{1,p}_{0,x} \hookrightarrow L^2_x \hookrightarrow W^{-1,p'}_x\) for \(p \ge 2\). A quick observation shows that the Gelfand triple remains valid for \(p \ge 2n/(n+2)\). Below this threshold the energy space needs to be modified; a good choice is the triple \(W^{1,p}_{0,x} \cap L^2_x \hookrightarrow L^2_x \hookrightarrow \big ( W^{1,p}_{0,x} \cap L^2_x \big )'\). Now, arguments similar to those in [61, Section 4] lead to the existence of a unique weak solution. Proving existence of a strong solution is more delicate and usually requires assumptions not only on the growth of G but also on its gradient, e.g. for all \(x \in \mathcal {O}\), \(\xi \in {\mathbb {R}}^N\)

$$\begin{aligned} \sum _{j \in \mathbb {N}} \left|\nabla _{x} g_j(x,\xi )\right|^2&\le c(1+\left|\xi \right|)^2, \end{aligned}$$
(2.15a)
$$\begin{aligned} \sum _{j \in \mathbb {N}} \left|\nabla _{\xi } g_j(x,\xi )\right|^2&\le c. \end{aligned}$$
(2.15b)

In [43] a general approach for the construction of strong solutions to gradient-flow-like equations is presented. In particular, it includes the case of the p-Laplace equation.

Theorem 8

([43], Theorem 4.12) Assume \(p\ge 2\) and that \(\mathcal {O}\) is a bounded convex domain. Let \((G,W)\) be given by Assumption 1. Additionally, assume (2.15) and let \(u_0 \in L^{p+\varepsilon }_\omega W^{1,p}_x\cap L_\omega ^{2+\varepsilon } L^2_x\) be \(\mathcal {F}_0\)-measurable for some \(\varepsilon > 0\). Then there exists a unique strong solution u to (2.11). Moreover,

$$\begin{aligned} \mathbb {E}\left[ \sup _{t\in I} \left\Vert u\right\Vert _{W^{1,p}_x}^p + \int _0^T \left\Vert {\text {div}}S(\nabla u)\right\Vert _{L^2_x}^2 \,\textrm{d}t\right] \lesssim \mathbb {E} \left[ \left\Vert u_0\right\Vert _{W^{1,p}_x}^p \right] + 1. \end{aligned}$$
(2.16)

2.5 Regularity of strong solutions

A key ingredient in the error analysis of numerical algorithms is the improved regularity of strong solutions in comparison to that of weak solutions. Concerning time regularity, we prove in [76] that the strong solution enjoys 1/2-time differentiability in an exponential Besov space and that even the nonlinear gradient \(V(\nabla u)\) obeys 1/2-time differentiability in a Nikolskii space. The proof uses an assumption on the boundary values of the noise coefficient G, i.e. for all \(x \in \partial \mathcal {O}\),

$$\begin{aligned} \sum _{j\in \mathbb {N}} \left|g_j(x,0)\right|^2 = 0. \end{aligned}$$
(2.17)

Theorem 9

([76] Theorem 3.8 & Theorem 3.11) Let the assumptions of Theorem 8 be satisfied. Additionally, assume (2.17). Let u be the strong solution to (2.11). Then

$$\begin{aligned} u&\in L^2_\omega B^{1/2}_{\varPhi _2,\infty } L^2_x, \end{aligned}$$
(2.18a)
$$\begin{aligned} V(\nabla u)&\in L^2_\omega B^{1/2}_{2,\infty } L^2_x, \end{aligned}$$
(2.18b)

where \(\varPhi _2(t) = e^{t^2}-1\).

Spatial regularity is closely connected to the existence of strong solutions. Local regularity has been obtained in [15].

Theorem 10

([15], Theorem 4) Let \(u_0 \in L^2_\omega W^{1,2}_x\) be \(\mathcal {F}_0\)-measurable. Let Assumption 1 be satisfied. Additionally, assume (2.15) and denote by u the strong solution of (2.11). Then,

$$\begin{aligned} u&\in L^2_\omega L^\infty _t W^{1,2}_{x,\text {loc}}, \end{aligned}$$
(2.19a)
$$\begin{aligned} V(\nabla u)&\in L^2_\omega L^2_t W^{1,2}_{x,\text {loc}}. \end{aligned}$$
(2.19b)

On sufficiently regular domains, it is possible to relate the divergence of the nonlinear operator S to the full gradient as presented in [2, 21], i.e.

$$\begin{aligned} \left\Vert {\text {div}}S(\nabla u)\right\Vert _{L^2_\omega L^2_t L^2_x} \eqsim \left\Vert \nabla S(\nabla u)\right\Vert _{L^2_\omega L^2_t L^2_x}. \end{aligned}$$
(2.20)

More precisely, let \(\mathcal {O}\) be a bounded Lipschitz domain such that \(\partial \mathcal {O} \in W^{2,1}\), i.e. \(\mathcal {O}\) is locally the subgraph of a Lipschitz continuous function of \(n-1\) variables which is also twice weakly differentiable. Denote by \(\mathcal {B}\) the second fundamental form on \(\partial \mathcal {O}\), by \(\left|\mathcal {B}\right|\) its norm and define

$$\begin{aligned} \mathcal {K}_\mathcal {O}(r) := \sup _{E \subset \partial \mathcal {O} \cap B_r(x), x \in \partial \mathcal {O}} \frac{\int _E \left|\mathcal {B}\right|\,\textrm{d}\mathcal {H}^{n-1}}{\text {cap}_{B_1(x)}(E)}, \end{aligned}$$
(2.21)

where \(B_r(x)\) denotes the ball of radius r around x, \(\text {cap}_{B_1(x)}(E)\) is the capacity of the set E relative to the ball \(B_1(x)\) and \(\mathcal {H}^{n-1}\) is the \(n-1\) dimensional Hausdorff measure. The following result follows from [21, Theorem 2.1].

Lemma 11

Assume that \(\mathcal {O}\) is either

  1. (a)

    bounded and convex,

  2. (b)

    or bounded, Lipschitz and \(\partial \mathcal {O} \in W^{2,1}\) with \(\lim _{r\rightarrow 0} \mathcal {K}_\mathcal {O}(r) \le c\).

Let \(v \in L^p_\omega L^p_t W^{1,p}_{0,x}\) with \({\text {div}}S(\nabla v) \in L^2_\omega L^2_t L^2_x\). Then \(\nabla S(\nabla v) \in L^2_\omega L^2_t L^2_x\) and

$$\begin{aligned} \mathbb {E} \left[ \left\Vert \nabla S(\nabla v)\right\Vert _{L^2_t L^2_x}^2 \right] \eqsim \mathbb {E} \left[ \left\Vert {\text {div}}S(\nabla v)\right\Vert _{L^2_t L^2_x}^2 \right] . \end{aligned}$$

In the non-degenerate setting \(\kappa > 0\), this allows one to deduce global spatial regularity.

Corollary 12

([76], Corollary 2.14) Let the assumptions of Theorem 8 be satisfied and \(\kappa > 0\). Let u be the strong solution to (2.11). Then,

$$\begin{aligned} V(\nabla u)&\in L^2_\omega L^2_t W^{1,2}_{x}. \end{aligned}$$
(2.22)

3 Numerical scheme for the averaged system

In this section we first present the discrete structures. Afterwards we construct two algorithms that approximate the solution to (2.11). Finally, we prove convergence of the approximations to the analytic solution with optimal rates.

3.1 Space discretization

Let \(\mathcal {O} \subset {\mathbb {R}}^n\) be a bounded polyhedral domain. Denote by \(\mathcal {T}_h\) a regular partition (triangulation) of \(\mathcal {O}\) without hanging nodes, consisting of closed n-simplices called elements. For each element (n-simplex) \(T\in \mathcal {T}_h\) we denote by \(h_T\) the diameter of T and by \(\rho _T\) the supremum of the diameters of inscribed balls.

We assume that \(\mathcal {T}_h\) is shape regular, that is, there exists a constant \(\gamma \) (the shape-regularity constant) such that

$$\begin{aligned} \max _{T \in \mathcal {T}_h} \frac{h_T}{\rho _T} \le \gamma . \end{aligned}$$
(3.1)

We define the maximal mesh-size by

$$\begin{aligned} h&:= \max _{T \in \mathcal {T}_h} h_T. \end{aligned}$$

We assume further that our triangulation is quasi-uniform, i.e.

$$\begin{aligned} h_T \eqsim h \qquad \text {for all }T \in \mathcal {T}_h. \end{aligned}$$
(3.2)
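For concreteness, the quantities in (3.1) and (3.2) can be computed for an explicit triangulation. Below is a minimal Python sketch (the helper names tri_quality and mesh_constants are ours, not from the paper) that evaluates \(h_T\), \(\rho _T\), the shape-regularity constant \(\gamma \) and the mesh size h for a criss-cross mesh of the unit square:

```python
import math

def tri_quality(a, b, c):
    """Return (h_T, rho_T) for a triangle with vertices a, b, c:
    h_T = diameter (longest edge), rho_T = diameter of the inscribed ball."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    e1, e2, e3 = d(a, b), d(b, c), d(c, a)
    s = 0.5 * (e1 + e2 + e3)                            # semi-perimeter
    area = math.sqrt(max(s * (s - e1) * (s - e2) * (s - e3), 0.0))  # Heron
    h_T = max(e1, e2, e3)
    rho_T = 2.0 * area / s                              # incircle diameter
    return h_T, rho_T

def mesh_constants(tris):
    """Shape-regularity constant gamma = max h_T / rho_T and mesh size h."""
    hs, ratios = [], []
    for a, b, c in tris:
        h_T, rho_T = tri_quality(a, b, c)
        hs.append(h_T)
        ratios.append(h_T / rho_T)
    return max(ratios), max(hs)

# criss-cross mesh of the unit square with N x N cells, two triangles each
N = 4
tris = []
for i in range(N):
    for j in range(N):
        p = (i / N, j / N); q = ((i + 1) / N, j / N)
        r = ((i + 1) / N, (j + 1) / N); s = (i / N, (j + 1) / N)
        tris += [(p, q, r), (p, r, s)]
gamma, h = mesh_constants(tris)
```

Every element of this mesh is a congruent right isosceles triangle, so \(\gamma = 1 + \sqrt{2}\) and the mesh is trivially quasi-uniform.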

For \(\ell \in \mathbb {N}_0\) we denote by \({\mathscr {P}}_\ell (\mathcal {O})\) the polynomials on \(\mathcal {O}\) of degree less than or equal to \(\ell \).

For fixed \(r \in \mathbb {N}\) we define the vector valued finite element space \(V_h\) as

$$\begin{aligned} \begin{aligned} V_h&:= {\{{v \in W^{1,1}_{0,x} \,:\, v|_T \in {\mathscr {P}}_r(T) \,\,\forall T\in \mathcal {T}_h}\}}. \end{aligned} \end{aligned}$$
(3.3)

Moreover, let \(\Pi _2: L^2_x \rightarrow V_h\) be the \(L^2_x\)-orthogonal projection defined by

$$\begin{aligned} \forall \xi _h \in V_h: \quad \left( \Pi _2 v, \xi _h \right) = \left( v,\xi _h \right) \end{aligned}$$

or equivalently

$$\begin{aligned} \Pi _2 v = \mathop {\textrm{arg}\,\textrm{min}}_{v_h \in V_h} \left\Vert v-v_h\right\Vert _{L^2_x}. \end{aligned}$$

We will need some classical results on the stability properties of the \(L^2_x\)-projection for finite elements.

Lemma 13

Let \(r \ge 1\), \(V_h\) be given by (3.3) and \(\mathcal {T}_h\) be quasi-uniform. Then

$$\begin{aligned}{} & {} \forall v \in W^{1,2}_x: \quad \left\Vert v-\Pi _2v\right\Vert _{L^2_x} + h\left\Vert \nabla (v-\Pi _2 v)\right\Vert _{L^2_x} \lesssim h\left\Vert \nabla v\right\Vert _{L^2_x},\\{} & {} \forall v \in W^{2,2}_x: \quad \left\Vert v-\Pi _2v\right\Vert _{L^2_x} + h\left\Vert \nabla (v-\Pi _2 v)\right\Vert _{L^2_x} \lesssim h^2\left\Vert \nabla ^2 v\right\Vert _{L^2_x}. \end{aligned}$$
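The two approximation orders in Lemma 13 can be observed numerically. The following sketch (our own illustration, assuming a uniform 1D grid on (0,1) and \(r = 1\), i.e. piecewise linear hat functions with zero boundary values; the helpers trap, l2_projection_p1 and l2_error are hypothetical names) assembles the P1 mass matrix, computes \(\Pi _2 v\) for \(v(x) = \sin (\pi x)\) and estimates the \(L^2_x\)-convergence rate:

```python
import numpy as np

def trap(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def l2_projection_p1(v, M):
    """L^2-orthogonal projection of v onto P1 hats with zero boundary values
    on a uniform grid of M cells on (0,1), i.e. a 1D instance of (3.3), r = 1."""
    h = 1.0 / M
    nodes = np.linspace(0.0, 1.0, M + 1)
    n = M - 1                                           # interior nodes
    # tridiagonal P1 mass matrix: 2h/3 on the diagonal, h/6 off it
    A = (2 * h / 3) * np.eye(n) + (h / 6) * (np.eye(n, k=1) + np.eye(n, k=-1))
    xf = np.linspace(0.0, 1.0, 40 * M + 1)              # fine quadrature grid
    vf = v(xf)
    b = np.empty(n)
    for i in range(1, M):
        hat = np.clip(1.0 - np.abs(xf - nodes[i]) / h, 0.0, None)
        b[i - 1] = trap(vf * hat, xf)                   # right-hand side (v, phi_i)
    coef = np.concatenate(([0.0], np.linalg.solve(A, b), [0.0]))
    return nodes, coef

def l2_error(v, nodes, coef):
    xf = np.linspace(0.0, 1.0, 4001)
    return np.sqrt(trap((v(xf) - np.interp(xf, nodes, coef)) ** 2, xf))

v = lambda x: np.sin(np.pi * x)                         # smooth, zero boundary
e1 = l2_error(v, *l2_projection_p1(v, 8))
e2 = l2_error(v, *l2_projection_p1(v, 16))
rate = float(np.log2(e1 / e2))                          # Lemma 13 predicts ~2
```

Halving h reduces the error by roughly a factor of four, consistent with the second estimate of Lemma 13.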

Due to the nonlinear structure of the p-Laplace system we also need an adapted stability result.

Proposition 14

([16], Theorem 7) Let \(r \ge 1\), \(V_h\) be given by (3.3) and \(\mathcal {T}_h\) be quasi-uniform. Then

$$\begin{aligned} \left\Vert V(\nabla v) - V(\nabla \Pi _2 v)\right\Vert _{L^2_x} \lesssim h \left\Vert \nabla V(\nabla v)\right\Vert _{L^2_x}. \end{aligned}$$
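The analysis repeatedly uses the pointwise equivalence \(\left( S(a) - S(b)\right) : (a-b) \eqsim \left|V(a) - V(b)\right|^2\) behind the V-coercivity (2.8) and the generalized Young inequality (2.9) stated earlier in the paper. A quick numerical sanity check of this standard relation, with illustrative values \(p = 3\), \(\kappa = 0.1\) (not prescribed by the paper):

```python
import numpy as np

# S(xi) = (kappa + |xi|)^{p-2} xi and V(xi) = (kappa + |xi|)^{(p-2)/2} xi;
# we sample random pairs and record the ratio of the two quantities.
p, kappa = 3.0, 0.1

def S(xi):
    return (kappa + np.linalg.norm(xi)) ** (p - 2) * xi

def V(xi):
    return (kappa + np.linalg.norm(xi)) ** ((p - 2) / 2) * xi

rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    a, b = rng.normal(size=2), rng.normal(size=2)
    num = float(np.dot(S(a) - S(b), a - b))             # monotonicity term
    den = float(np.linalg.norm(V(a) - V(b)) ** 2)       # nonlinear distance
    if den > 1e-12:
        ratios.append(num / den)
ratios = np.array(ratios)
```

The ratio stays within fixed constants, which is exactly the meaning of \(\eqsim \).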

3.2 Time discretization

Let \(\{0=t_0<\cdots <t_M=T\}\) be a uniform partition of [0, T] with mesh size \(\tau =T/M\). For \(m \ge 1\) define \(I_m := [t_{m-1},t_m]\) and denote by \(\left|I_m\right|\) the Lebesgue measure of \(I_m\). We write \(\fint _{I_m} \cdot \,\textrm{d}s:= \left|I_m\right|^{-1} \int _{I_m} \cdot \,\textrm{d}s\) for the mean value integral over the set \(I_m\) and abbreviate \(\langle {u}\rangle _m := \fint _{I_m} u \,\textrm{d}s\) for the mean value.

Let us discuss the stability properties of piecewise constant approximations generated by the mean values in terms of Nikolskii spaces.

Lemma 15

Let \(\alpha \in (0,1)\) and \(r \in [1,\infty )\). Additionally, assume \(u \in B^{\alpha }_{r,\infty } L^2_x\). Then

$$\begin{aligned} \left( \sum _{m=1}^M \int _{I_m} \left\Vert u - \langle {u}\rangle _m\right\Vert _{L^2_x}^r \,\textrm{d}s\right) ^\frac{1}{r} \lesssim \tau ^{\alpha } {[{u}]}_{B^{\alpha }_{r,\infty } L^2_x}. \end{aligned}$$
(3.4)

Proof

Due to Jensen’s inequality

A change of variables \(t = s+h\) and Fubini’s theorem yield

The proof is finished. \(\square \)

Remark 16

Lemma 15 is also valid for \(r=\infty \), provided the left-hand side of (3.4) is replaced by

$$\begin{aligned} \max _{m \in {\{{1,\ldots ,M}\}}} \sup _{s \in I_m} \left\Vert u(s) - \langle {u}\rangle _m\right\Vert _{L^2_x}. \end{aligned}$$

Note that \(B^{\alpha }_{\infty ,\infty }(I) = C^{\alpha }(I)\) is the space of \(\alpha \)-Hölder continuous functions.
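The case \(r = \infty \) can be tested on the prototypical 1/2-Hölder function \(u(t) = \sqrt{t}\), whose seminorm satisfies \([u]_{C^{1/2}} \le 1\). A small sketch (our own illustration) computing the exact cell means and the maximal deviation:

```python
import numpy as np

def max_mean_value_error(M):
    """Max distance of u(t) = sqrt(t) from its cell means on M uniform cells."""
    tau = 1.0 / M
    t = np.linspace(0.0, 1.0, M + 1)
    # exact cell means: <u>_m = (2/3)(t_m^{3/2} - t_{m-1}^{3/2}) / tau
    means = (2.0 / 3.0) * (t[1:] ** 1.5 - t[:-1] ** 1.5) / tau
    err = 0.0
    for m in range(M):
        s = np.linspace(t[m], t[m + 1], 201)
        err = max(err, float(np.max(np.abs(np.sqrt(s) - means[m]))))
    return err

errs = {M: max_mean_value_error(M) for M in (4, 16, 64)}
```

The worst cell is the first one, where the error equals \((2/3)\sqrt{\tau }\), confirming the rate \(\tau ^{1/2}\) of Remark 16.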

In our application the process u only belongs to \(L^2_\omega B^{1/2}_{\varPhi _2,\infty } L^2_x \setminus L^2_\omega C^{1/2}_t L^2_x \). Therefore we need to adapt the stability result to exponentially integrable Nikolskii spaces.

Lemma 17

Let \(\alpha \in (0,1)\) and \(u \in B^{\alpha }_{\varPhi _2,\infty } L^2_x\) where \(\varPhi _2(t) = e^{t^2}-1\). Then

$$\begin{aligned} \max _{m\in {\{{1,\ldots ,M}\}}} \left\Vert u(t_m) - \langle {u}\rangle _m\right\Vert _{L^2_x} \lesssim \tau ^\alpha \sqrt{\ln (1+\tau ^{-1})} {[{u}]}_{ B^{\alpha }_{\varPhi _2,\infty } L^2_x}. \end{aligned}$$
(3.5)

Proof

Define \(I_m^k:= [t_{m} - 2^{-k}\tau , t_m]\). Due to the embedding \(B^{\alpha }_{\varPhi _2,\infty } \hookrightarrow C_t\) for all \(\alpha > 0\) we find u to be continuous as an \(L^2_x\)-valued process. Therefore, it holds

$$\begin{aligned} \lim _{k \rightarrow \infty } \langle {u}\rangle _{I_m^k} = u(t_m). \end{aligned}$$

Observe, \(\left|I_m^{k}\right| = 2^{-k} \tau \) and \(I_m^{k-1} = [t_m - 2^{-(k-1)}\tau ,t_m - 2^{-k}\tau ] \cup [t_m - 2^{-k}\tau ,t_m]\). A shift in the integral results in

Jensen’s inequality implies

Choose \(\lambda _k \) by

$$\begin{aligned} \lambda _k := \inf \left\{ \mu > 0: \int _{I \cap I - {\{{2^{-k}\tau }\}}} \varPhi _2 \left( \frac{\left\Vert u(s) - u(s - 2^{-k}\tau )\right\Vert _{L^2_x}}{\mu } \right) \,\textrm{d}s\le 1 \right\} . \end{aligned}$$

In particular, since \(I_m^k \subset I \cap I- {\{{2^{-k}\tau }\}}\),

$$\begin{aligned} \int _{I_m^k} \varPhi _2 \left( \frac{\left\Vert u(s) - u(s - 2^{-k} \tau ) \right\Vert _{L^2_x}}{\lambda _k} \right) \,\textrm{d}s\le 1. \end{aligned}$$

Thus,

Since \(u \in B^{\alpha }_{\varPhi _2,\infty } L^2_x \) it holds

$$\begin{aligned} \sup _{k\in \mathbb {N}} (2^{-k}\tau )^{-\alpha } \lambda _k \le {[{u}]}_{B^{\alpha }_{\varPhi _2,\infty } L^2_x}. \end{aligned}$$

It follows

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^\infty 2^{-1} \lambda _k \max _{m\in {\{{1,\ldots ,M}\}}} \varPhi _2^{-1} \left( \left|I_m^k\right|^{-1} \right) \\&\quad \le \sup _{k\in \mathbb {N}} \left( (2^{-k}\tau )^{-\alpha } \lambda _k \right) \, \sum _{k=1}^\infty 2^{-1} (2^{-k}\tau )^{\alpha } \max _{m\in {\{{1,\ldots ,M}\}}} \varPhi _2^{-1} \left( \left|I_m^k\right|^{-1} \right) \\&\quad \le {[{u}]}_{B^{\alpha }_{\varPhi _2,\infty } L^2_x} \, \sum _{k=1}^\infty 2^{-1} (2^{-k}\tau )^{\alpha } \sqrt{\ln (1+ (2^{-k}\tau )^{-1})} . \end{aligned} \end{aligned}$$
(3.6)

Note that

$$\begin{aligned} \ln \left( 1+ (2^{-k} \tau )^{-1} \right)&= \ln (\tau + 2^k) + \ln (\tau ^{-1}) \le \ln (1 + 2^k) + \ln (1+\tau ^{-1}). \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^\infty 2^{-1} (2^{-k}\tau )^{\alpha } \sqrt{\ln (1+ (2^{-k}\tau )^{-1})} \\&\quad \le \tau ^{\alpha } \sum _{k=1}^\infty 2^{-1} 2^{-\alpha k} \left( \sqrt{\ln (1+ 2^k)} + \sqrt{\ln (1+\tau ^{-1})} \right) \\&\quad \le \tau ^{\alpha }\left( 1 + \sqrt{\ln (1+\tau ^{-1})} \right) \sum _{k=1}^\infty 2^{-1} 2^{-\alpha k} \sqrt{\ln (1+ 2^k)} \\&\quad \lesssim \tau ^{\alpha } \sqrt{\ln (1+\tau ^{-1})} . \end{aligned} \end{aligned}$$
(3.7)

The assertion follows by using the estimate (3.7) in (3.6). \(\square \)
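The final estimate (3.7) can also be checked numerically: for \(\alpha = 1/2\) the series is bounded by a fixed multiple of \(\tau ^{1/2}\sqrt{\ln (1+\tau ^{-1})}\) uniformly over several decades of \(\tau \). A short sketch (the truncation at \(K = 200\) terms is an illustrative choice):

```python
import math

def series(tau, alpha=0.5, K=200):
    """Truncation of sum_k 2^{-1} (2^{-k} tau)^alpha sqrt(ln(1+(2^{-k} tau)^{-1}))."""
    return sum(
        0.5 * (2.0 ** -k * tau) ** alpha
        * math.sqrt(math.log1p((2.0 ** -k * tau) ** -1))
        for k in range(1, K + 1)
    )

def bound(tau, alpha=0.5):
    """Right-hand side of (3.7) without the implicit constant."""
    return tau ** alpha * math.sqrt(math.log1p(1.0 / tau))

# the ratio stays bounded above and below as tau shrinks
ratios = [series(t) / bound(t) for t in (0.25, 1e-2, 1e-4, 1e-6)]
```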

3.3 The averaged algorithm

Let u be the strong solution of (2.11), i.e. for all \(t \in I\) and \(\mathbb {P}\)-a.s.

$$\begin{aligned} u(t) - u_0 - \int _0^t {\text {div}}S(\nabla u_\nu ) \,\textrm{d}\nu - \int _0^t G(u_\nu ) \,\textrm{d}W_\nu =0. \end{aligned}$$
(3.8)

Take the average over \(I_1\) in (3.8) to get

(3.9)

To obtain the general evolution of the time-averaged values of the solution we subtract (3.8) evaluated at t and at \(t-\tau \) and take the average over \(I_m\)

(3.10)

We denote by \((\cdot ,\cdot )\) the \(L^2_x\)-inner product. The corresponding weak formulation reads for all \(\xi \in L^2_x \cap W^{1,p}_{0,x}\)

(3.11)

Due to the (stochastic) Fubini theorem we can equivalently write

$$\begin{aligned} \left( \langle {u}\rangle _1 - u_0, \xi \right) + \int _{\mathbb {R}}a_{0}(\nu ) \left( S(\nabla u_\nu ) , \nabla \xi \right) \,\textrm{d}\nu&= \left( \int _{\mathbb {R}}a_{0}(\nu ) G(u_\nu ) \,\textrm{d}W_\nu , \xi \right) , \end{aligned}$$
(3.12a)
$$\begin{aligned} \left( \langle {u}\rangle _m - \langle {u}\rangle _{m-1}, \xi \right) + \int _{\mathbb {R}}a_{m-1}(\nu ) \left( S(\nabla u_\nu ) , \nabla \xi \right) \,\textrm{d}\nu&= \left( \int _{\mathbb {R}}a_{m-1}(\nu ) G(u_\nu ) \,\textrm{d}W_\nu , \xi \right) , \end{aligned}$$
(3.12b)

where the weights are given by

$$\begin{aligned} a_0(\nu )&:= \frac{t_1 - \nu }{\tau } 1_{I_1}(\nu ), \end{aligned}$$
(3.13a)
$$\begin{aligned} a_{m-1}(\nu )&:= \frac{\nu - t_{m-2}}{\tau } 1_{I_{m-1}}(\nu ) + \frac{t_m - \nu }{\tau } 1_{I_{m}}(\nu ). \end{aligned}$$
(3.13b)
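The weights (3.13) are piecewise linear: each \(a_{m-1}\), \(m \ge 2\), is a hat function over \(I_{m-1} \cup I_m\) with \(\int a_{m-1} \,\textrm{d}\nu = \tau \) and \(\int a_{m-1}^2 \,\textrm{d}\nu = 2\tau /3\), while \(a_0\) is a ramp on \(I_1\) with mass \(\tau /2\); this explains the factor \(\tau /2\) in the first step (3.14b) below. A quadrature check (the sample values \(\tau = 0.1\), \(m = 3\) are illustrative, and either ramp orientation for \(a_0\) has mass \(\tau /2\)):

```python
import numpy as np

tau, m = 0.1, 3                       # illustrative sample values
t = lambda k: k * tau                 # uniform grid t_k = k * tau

# hat weight a_{m-1} over I_{m-1} u I_m, cf. (3.13b)
nu = np.linspace(t(m - 2), t(m), 100001)
a = np.where(nu <= t(m - 1), (nu - t(m - 2)) / tau, (t(m) - nu) / tau)
int_a = float(np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(nu)))           # ~ tau
int_a2 = float(np.sum(0.5 * (a[1:] ** 2 + a[:-1] ** 2) * np.diff(nu)))  # ~ 2*tau/3

# initial ramp weight a_0 supported on I_1
nu0 = np.linspace(0.0, tau, 100001)
a0 = (tau - nu0) / tau
int_a0 = float(np.sum(0.5 * (a0[1:] + a0[:-1]) * np.diff(nu0)))       # ~ tau/2
```

The second moment \(2\tau /3\) is also the variance of the averaged noise increment \(\langle {W}\rangle _m - \langle {W}\rangle _{m-1} = \int a_{m-1} \,\textrm{d}W\).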

The above considerations motivate the construction of the following numerical scheme:

We initialize the algorithm as

$$\begin{aligned} v_0 := \Pi _2 u_0. \end{aligned}$$
(3.14a)

In order to accurately reflect the special character of the first step (3.9), we define \(v_1\) via

$$\begin{aligned} \left( v_1 - v_0, \xi _h \right) + \frac{\tau }{2} \left( S(\nabla v_1), \nabla \xi _h \right)&= \left( G(v_0)\langle {W}\rangle _1 , \xi _h \right) . \end{aligned}$$
(3.14b)

Moreover the evolution for \(m \in {\{{2,\ldots ,M}\}}\) is determined via

$$\begin{aligned} \left( v_m - v_{m-1}, \xi _h \right) + \tau \left( S(\nabla v_m), \nabla \xi _h \right) = \left( G(v_{m-2}) \left( \langle {W}\rangle _m - \langle {W}\rangle _{m-1} \right) , \xi _h \right) \end{aligned}$$
(3.14c)

for all \(\xi _h \in V_h\) and \(\mathbb {P}\)-a.s.
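To make the scheme concrete, here is a minimal 1D sketch of (3.14). It deliberately simplifies: finite differences replace the finite element space \(V_h\), the noise is driven by a single scalar Brownian motion with \(G(v) = \sigma v\), and the implicit p-Laplace step is solved by a frozen-coefficient (Kačanov-type) iteration; all parameter values are illustrative, none are prescribed by the paper.

```python
import numpy as np

p, kappa, sigma = 3.0, 0.5, 0.1       # illustrative model parameters
Mx, Mt, T = 32, 16, 1.0               # illustrative discretization parameters
hx, tau = 1.0 / Mx, T / Mt
x = np.linspace(0.0, 1.0, Mx + 1)
rng = np.random.default_rng(0)

def implicit_step(v_old, noise, dt, iters=30):
    """Solve v - dt*(S(v_x))_x = v_old + noise with zero boundary values by
    freezing the coefficient w = (kappa + |v_x|)^{p-2} and solving linearly."""
    v = v_old.copy()
    n = Mx - 1
    for _ in range(iters):
        g = np.diff(v) / hx                       # gradients at cell midpoints
        w = (kappa + np.abs(g)) ** (p - 2)        # frozen nonlinear coefficient
        A = np.zeros((n, n))
        r = dt / hx ** 2
        for i in range(n):
            A[i, i] = 1.0 + r * (w[i] + w[i + 1])
            if i > 0:
                A[i, i - 1] = -r * w[i]
            if i < n - 1:
                A[i, i + 1] = -r * w[i + 1]
        b = v_old[1:-1] + noise[1:-1]
        v = np.concatenate(([0.0], np.linalg.solve(A, b), [0.0]))
    return v

# averaged Brownian increments <W>_m, sampled on a fine auxiliary grid
fine = 100
Wf = np.concatenate(([0.0],
                     np.cumsum(rng.normal(0.0, np.sqrt(tau / fine), Mt * fine))))
Wbar = Wf[1:].reshape(Mt, fine).mean(axis=1)      # <W>_1, ..., <W>_Mt

G = lambda v_: sigma * v_
v = [np.sin(np.pi * x)]                           # v_0, projected initial datum
# first step (3.14b): step size tau/2, noise G(v_0) <W>_1
v.append(implicit_step(v[0], G(v[0]) * Wbar[0], tau / 2))
# time stepping (3.14c): step size tau, delayed argument v_{m-2}
for m in range(2, Mt + 1):
    v.append(implicit_step(v[m - 1], G(v[m - 2]) * (Wbar[m - 1] - Wbar[m - 2]), tau))
v = np.array(v)
```

The first step uses step size \(\tau /2\) and the increment \(\langle {W}\rangle _1\); the later steps use \(\tau \) and \(\langle {W}\rangle _m - \langle {W}\rangle _{m-1}\) with the delayed argument \(v_{m-2}\), mirroring (3.14b) and (3.14c).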

The need for a special step size in the first step (3.14b) might be undesirable. It can be avoided by performing a full initial step. We propose the following second algorithm.

Initialize

$$\begin{aligned} w_0 := \Pi _2 u_0. \end{aligned}$$
(3.15a)

Full initial step

$$\begin{aligned} \left( w_1 - w_0, \xi _h \right) + \tau \left( S(\nabla w_1), \nabla \xi _h \right)&= \left( G(w_0)\langle {W}\rangle _1 , \xi _h \right) . \end{aligned}$$
(3.15b)

Time stepping, for \(m \ge 2\),

$$\begin{aligned} \left( w_m - w_{m-1}, \xi _h \right) + \tau \left( S(\nabla w_m), \nabla \xi _h \right)&= \left( G(w_{m-2})(\langle {W}\rangle _{m}-\langle {W}\rangle _{m-1}) , \xi _h \right) . \end{aligned}$$
(3.15c)

The additional difficulties in the error analysis only occur in the estimate of the initial error.

The next theorem ensures that the numerical schemes are well defined. The arguments are rather standard and we refer to [40, Section 3] for a detailed analysis of a more general system.

Theorem 18

Let \(u_0 \in L^2_\omega L^2_x\) be \(\mathcal {F}_0\)-measurable and Assumption 1 be satisfied. Then for all \(M \in \mathbb {N}\) there exists a unique \({\textbf{v}}= (v_0,\ldots ,v_M) \in (V_h)^{M+1}\) such that \(v_m\) is \(\mathcal {F}_{t_m}\)-measurable for all \(m \in {\{{0,\ldots ,M}\}}\) and \({\textbf{v}}\) solves (3.14) and (3.15), respectively.

3.4 Error analysis

We are ready to state and prove the main result.

Theorem 19

(Convergence of Algorithm (3.14)) Let \(u_0 \in L^2_\omega L^2_x\) be \(\mathcal {F}_0\)-measurable and Assumption 1 be satisfied. Additionally, assume \(\mathcal {T}_h\) to be quasi-uniform. Moreover, let u be the weak solution to (2.11) and \({\textbf{v}}\) be the numerical solution of (3.14).

  1. (a)

    Assume

    $$\begin{aligned} u&\in L^2_\omega B^{1/2}_{2,\infty } L^2_x \cap L^2_\omega L^\infty _t W^{1,2}_x, \end{aligned}$$
    (3.16a)
    $$\begin{aligned} V(\nabla u)&\in L^2_\omega B^{1/2}_{2,\infty } L^2_x \cap L^2_\omega L^2_t W^{1,2}_x. \end{aligned}$$
    (3.16b)

    Then it holds

    $$\begin{aligned} \begin{aligned}&\mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert \langle {u}\rangle _m - v_m\right\Vert _{L^2_x}^2 + \sum _{m=1}^M\int _{I_m} \left\Vert V(\nabla u_\nu ) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\quad \lesssim \tau + h^2. \end{aligned} \end{aligned}$$
    (3.17)
  2. (b)

    If additionally

    $$\begin{aligned} u \in L^2_\omega B^{1/2}_{\varPhi _2,\infty } L^2_x, \end{aligned}$$
    (3.18)

    then

    $$\begin{aligned} \begin{aligned} \mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert u(t_m)- v_m\right\Vert _{L^2_x}^2 \right] \lesssim \tau \ln (1+\tau ^{-1}) + h^2. \end{aligned} \end{aligned}$$
    (3.19)

Remark 20

The regularity assumption (3.18) is optimal in the sense that it is the limiting space for the time regularity of the Wiener process W. Hytönen and Veraar prove in [50] that a Banach space valued Brownian motion attains full 1/2-differentiability only on the Nikolskii scale. More precisely, they show for all \(q,r \in [1,\infty )\) and \(\mathbb {P}\)-a.s.

$$\begin{aligned} W \in B^{1/2}_{\varPhi _2, \infty } \quad \text { and } \quad W \not \in B^{1/2}_{q,r}. \end{aligned}$$

The logarithmic term in (3.19) has already appeared in the context of rough stochastic differential equations [23, 55]. It quantifies the distance between \(L^{\varPhi _2}\) and \(L^\infty \).
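The sharpness discussed above can be visualized by sampling: the largest mesh increment of a Brownian path scales like \(\sqrt{\tau \ln (1+\tau ^{-1})}\) rather than \(\sqrt{\tau }\). A Monte Carlo sketch (single path, fixed seed; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Mfine = 2 ** 16
# one Brownian path on [0,1], sampled on 2^16 fine steps
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, Mfine ** -0.5, Mfine))))

rows = []
for k in (4, 8, 12, 16):
    M = 2 ** k
    tau = 1.0 / M
    Wm = W[:: Mfine // M]                     # path restricted to the coarse mesh
    inc = float(np.max(np.abs(np.diff(Wm))))  # largest mesh increment
    rows.append(inc / np.sqrt(tau * np.log1p(1.0 / tau)))
```

The recorded ratios stay bounded as the mesh is refined, while the corresponding ratios against \(\sqrt{\tau }\) alone would grow like \(\sqrt{\ln (1+\tau ^{-1})}\).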

Theorem 19 can be generalized to cover fractional time and space regularity assumptions as done for the deterministic p-Laplace system in [16].

Proof

Part (a): Fix \(m \in {\{{2,\ldots ,M}\}}\), subtract (3.11) from (3.14c) and choose \(\xi _h = \Pi _2 e_m\), where \(e_m := \langle {u}\rangle _m - v_m\) and hence \(\Pi _2 e_m = \Pi _2 \langle {u}\rangle _m - v_m\), since \(v_m \in V_h\)

Now use the symmetry of the \(L^2_x\)-projection and the algebraic identity \(2a(a-b) = a^2 - b^2 + (a-b)^2\) to get

$$\begin{aligned} H_1&= \left( \Pi _2 (e_m - e_{m-1}), \Pi _2 e_m \right) \\&= \frac{1}{2} \left( \left\Vert \Pi _2 e_m \right\Vert _{L^2_x}^2 -\left\Vert \Pi _2 e_{m-1} \right\Vert _{L^2_x}^2 + \left\Vert \Pi _2[ e_m- e_{m-1}] \right\Vert _{L^2_x}^2 \right) . \end{aligned}$$

The second term, due to the V-coercivity (2.8),

Summation over \(m \in {\{{2,\ldots ,m^*}\}}\) with \(m^* \le M\) results in

Take the maximum over \(m^* \in {\{{2,\ldots ,M}\}}\) and expectation

(3.20)

Step 1: We start by estimating the initial error. As before, we subtract the weak formulation of (3.9) from (3.14b) and choose \(\xi _h = \Pi _2 e_1\)

where \(e_0 := u_0 - v_0\). By (3.14a) we have \(\Pi _2 e_0 = 0\). Therefore,

(3.21)

Due to Hölder’s and Young’s inequalities and Itô isometry

Absorb the second term into the left-hand side. For the first term we use \(a_0 \le 1\) and the Lipschitz assumption (2.3). Now, since the operator norm of the \(L^2_x\)-projection is bounded by one,

$$\begin{aligned}&\mathbb {E} \left[ \int _0^\tau a_0(\nu )^2 \left\Vert G(u_\nu ) - G(v_0)\right\Vert _{L_2(U;L^2_x)}^2 \,\textrm{d}\nu \right] \nonumber \\&\quad \lesssim \mathbb {E} \left[ \int _0^\tau \left\Vert u_\nu -v_0\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \lesssim \tau \left( \mathbb {E} \left[ \left\Vert u\right\Vert _{L^\infty _t L^2_x}^2 \right] + \mathbb {E} \left[ \left\Vert u_0\right\Vert _{L^2_x}^2 \right] \right) . \end{aligned}$$
(3.22)

Thanks to the generalized Young’s inequality, cf. Lemma 4,

Absorb the first term into the left-hand side of (3.21). The second is split into a space error and a time error. Using the nonlinear stability of the \(L^2_x\)-projection, cf. Proposition 14,

Overall, we arrive at the estimate from (3.21)

(3.23)

Step 2: The stochastic part in (3.20) needs some refined analysis

The first, due to Hölder’s and Young’s inequalities and an index shift,

The second term can be absorbed into the left-hand side of (3.20). The third term is nothing but the initial error \(J_1\). For the first term we invoke the Itô isometry, the Lipschitz condition (2.3) and \(a_{m-1} \le 1\)

Decomposing into a time and a space error, applying Lemmas 13 and 15 and the estimate (3.22), we obtain

$$\begin{aligned} \begin{aligned}&\sum _{m=2}^M \int _{t_{m-2}}^{t_m} \mathbb {E} \left[ \left\Vert u_\nu - v_{m-2}\right\Vert _{L^2_x}^2 \right] \,\textrm{d}\nu \\&\quad \lesssim \sum _{m=3}^M \int _{t_{m-2}}^{t_m} \mathbb {E} \left[ \left\Vert u_\nu - \langle {u}\rangle _{m-2}\right\Vert _{L^2_x}^2 \right] \,\textrm{d}\nu + \tau \sum _{m=3}^M \mathbb {E} \left[ \left\Vert \langle {u}\rangle _{m-2}- \Pi _2 \langle {u}\rangle _{m-2}\right\Vert _{L^2_x}^2 \right] \\&\qquad + \tau \sum _{m=3}^M \mathbb {E} \left[ \left\Vert \Pi _2 \langle {u}\rangle _{m-2} - v_{m-2}\right\Vert _{L^2_x}^2 \right] + \int _0^{2\tau } \mathbb {E} \left[ \left\Vert u_\nu - v_0\right\Vert _{L^2_x}^2 \right] \,\textrm{d}\nu \\&\quad \lesssim \tau \mathbb {E} \left[ {[{u}]}_{ B^{1/2}_{2,\infty } L^2_x}^2 \right] + h^2 \mathbb {E} \left[ \left\Vert \nabla u \right\Vert _{L^2_t L^2_x}^2 \right] + \tau \sum _{m=3}^M \mathbb {E} \left[ \left\Vert \Pi _2 e_{m-2} \right\Vert _{L^2_x}^2 \right] \\&\qquad + \tau \left( \mathbb {E} \left[ \left\Vert u\right\Vert _{L^\infty _t L^2_x}^2 \right] + \mathbb {E} \left[ \left\Vert u_0\right\Vert _{L^2_x}^2 \right] \right) . \end{aligned} \end{aligned}$$
(3.24)

Next, we analyze the second term \(J_{2,b}\) in the upper bound for \(J_2\). Define the discrete real-valued stochastic process

(3.25)

It is convenient to use stochastic Fubini’s theorem to rewrite

$$\begin{aligned} K_{m^*} = \sum _{m=2}^{m^*} \left( \int _{t_{m-2}}^{t_m} a_{m-1}(\nu ) [G(u_\nu ) - G(v_{m-2})] \,\textrm{d}W_\nu , \Pi _2 e_{m-2} \right) . \end{aligned}$$

In the following we abbreviate \(\mathcal {F}_m:= \mathcal {F}_{t_m}\). Note that (3.25) does not define a martingale with respect to \(\mathcal {F}_{m}\). In fact, the discrepancy from being a martingale can be quantified. The general strategy is to split the sum into a martingale and an error term, called the compensator. To determine the compensator, we first compute the conditional expectations of \(K_M\) with respect to \(\mathcal {F}_{m^*}\), i.e.,

$$\begin{aligned}&\mathbb {E} \left[ K_M \big | \mathcal {F}_{m^*} \right] = \mathbb {E} \left[ \sum _{m = m^* + 2}^M \left( \int _{t_{m-2}}^{t_m} a_{m-1}(\nu ) [G(u_\nu ) - G(v_{m-2})] \,\textrm{d}W_\nu , \Pi _2 e_{m-2} \right) \Big | \mathcal {F}_{m^*} \right] \\&\quad + \mathbb {E} \left[ \left( \int _{t_{m^*-1}}^{t_{m^* + 1}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) \Big | \mathcal {F}_{m^*} \right] + \mathbb {E} \left[ K_{m^*}\big | \mathcal {F}_{m^*} \right] \\&\quad =:\mathcal {M}_1 + \mathcal {M}_2 +\mathcal {M}_3. \end{aligned}$$

Due to the tower property of conditional expectation, the \(\mathcal {F}_m\)-measurability of \(e_m\) and the martingale property of the stochastic integral,

$$\begin{aligned} \mathcal {M}_1&= \sum _{m = m^* + 2}^M \mathbb {E} \left[ \mathbb {E} \left[ \left( \int _{t_{m-2}}^{t_m} a_{m-1}(\nu ) [G(u_\nu ) - G(v_{m-2})] \,\textrm{d}W_\nu , \Pi _2 e_{m-2} \right) \Big | \mathcal {F}_{m-2} \right] \Big | \mathcal {F}_{m^*} \right] \\&= \sum _{m = m^* + 2}^M \mathbb {E} \left[ \left( \mathbb {E} \left[ \int _{t_{m-2}}^{t_m} a_{m-1}(\nu ) [G(u_\nu ) - G(v_{m-2})] \,\textrm{d}W_\nu \Big | \mathcal {F}_{m-2} \right] , \Pi _2 e_{m-2} \right) \Big | \mathcal {F}_{m^*} \right] \\&=0. \end{aligned}$$

Again using the \(\mathcal {F}_m\)-measurability of \(e_m\), we conclude that \(K_{m^*}\) is \(\mathcal {F}_{m^*}\)-measurable. Thus,

$$\begin{aligned} \mathcal {M}_3&= K_{m^*}. \end{aligned}$$

It remains to compute the conditional expectation in \(\mathcal {M}_2\). Since \(t_{m^*}\) is an interior point of \(I_{m^*} \cup I_{m^*+1}\), we split the stochastic integral into a part that only sees values above the threshold \(t_{m^*}\) and a part that only sees values below it, i.e.,

$$\begin{aligned} \mathcal {M}_2&= \mathbb {E} \left[ \left( \int _{t_{m^*}}^{t_{m^* + 1}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) \big | \mathcal {F}_{m^*} \right] \\&+\mathbb {E} \left[ \left( \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) \big | \mathcal {F}_{m^*} \right] . \end{aligned}$$

The first term vanishes due to the martingale property of the stochastic integral, while the second is measurable with respect to \(\mathcal {F}_{m^*}\). Overall,

$$\begin{aligned} \mathcal {M}_2&= \left( \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) . \end{aligned}$$

The term \(\mathcal {M}_2\) is called the compensator; it quantifies the failure of K to be a martingale, i.e.,

$$\begin{aligned} \mathbb {E} \left[ K_M \big | \mathcal {F}_{m^*} \right] -K_{m^*} = \mathcal {M}_2. \end{aligned}$$
(3.26)

Furthermore, increments of the discrete stochastic process K satisfy

$$\begin{aligned} K_{m^*} - K_{{m^*}-1}&= \left( \int _{t_{m^*-2}}^{t_{m^*}} a_{m^*-1}(\nu ) [G(u_\nu ) - G(v_{{m^*}-2})] \,\textrm{d}W_\nu , \Pi _2 e_{{m^*}-2} \right) . \end{aligned}$$
(3.27)

The identity (3.26) together with (3.27) allows us to identify the increments of the conditional expectations,

$$\begin{aligned}&\mathbb {E} \left[ K_M \big | \mathcal {F}_{m^*} \right] - \mathbb {E} \left[ K_M \big | \mathcal {F}_{{m^*}-1} \right] \\&\quad =\left( \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) \\&\quad + \left( \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*-1}(\nu ) [G(u_\nu ) - G(v_{{m^*}-2})] \,\textrm{d}W_\nu , \Pi _2 e_{{m^*}-2} \right) . \end{aligned}$$

Observe that \(\mathbb {E}\left[ K_M \big | \mathcal {F}_1 \right] = 0\), since \(\Pi _2 e_0 = 0\). The Burkholder–Davis–Gundy inequality implies

$$\begin{aligned}&\mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left| \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right|\right] \lesssim \mathbb {E} \left[ \left( \sum _{m=2}^M \left( \mathbb {E}\left[ K_M \big | \mathcal {F}_m \right] - \mathbb {E}\left[ K_M \big | \mathcal {F}_{m-1} \right] \right) ^2 \right) ^\frac{1}{2} \right] \\&\quad \lesssim \mathbb {E} \left[ \left( \sum _{m=2}^M \left\Vert \int _{t_{m-1}}^{t_m} a_m(\nu ) [G(u_\nu ) - G(v_{m-1})] \,\textrm{d}W_\nu \right\Vert _{L^2_x}^2 \left\Vert \Pi _2 e_{m-1} \right\Vert _{L^2_x}^2\right) ^\frac{1}{2} \right] \\&\quad +\mathbb {E} \left[ \left( \sum _{m=2}^M \left\Vert \int _{t_{m-1}}^{t_m} a_{m-1}(\nu )[G(u_\nu ) - G(v_{m-2})] \,\textrm{d}W_\nu \right\Vert _{L^2_x}^2 \left\Vert \Pi _2 e_{m-2} \right\Vert _{L^2_x}^2 \right) ^\frac{1}{2} \right] . \end{aligned}$$

Now Young’s inequality, Itô isometry and the Lipschitz condition (2.3) imply

$$\begin{aligned}&\mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right] \\&\quad \lesssim \varepsilon \mathbb {E}\left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{m^*}\right\Vert _{L^2_x}^2 + \left\Vert \Pi _2 e_1\right\Vert _{L^2_x}^2 \right] \\&\qquad + c_\varepsilon \mathbb {E} \left[ \sum _{m=2}^M \int _{t_{m-1}}^{t_m} \left\Vert G(u_\nu ) - G(v_{m-2})\right\Vert _{L_2(U;L^2_x)}^2\,\textrm{d}\nu \right] \\&\qquad +c_\varepsilon \mathbb {E} \left[ \sum _{m=1}^M \int _{t_{m-1}}^{t_m} \left\Vert G(u_\nu ) - G(v_{m-1})\right\Vert _{L_2(U;L^2_x)}^2 \,\textrm{d}\nu \right] \\&\lesssim \varepsilon \mathbb {E}\left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{m^*}\right\Vert _{L^2_x}^2 + \left\Vert \Pi _2 e_1\right\Vert _{L^2_x}^2 \right] \\&\qquad + c_\varepsilon \mathbb {E} \left[ \sum _{m=2}^M \int _{t_{m-1}}^{t_m} \left\Vert u_\nu - v_{m-2}\right\Vert _{L^2_x}^2 \,\textrm{d}\nu + \sum _{m=1}^M \int _{t_{m-1}}^{t_m} \left\Vert u_\nu - v_{m-1}\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] . \end{aligned}$$

The second term is estimated as in (3.24)

$$\begin{aligned}&\mathbb {E} \left[ \sum _{m=2}^M \int _{t_{m-1}}^{t_m} \left\Vert u_\nu - v_{m-2}\right\Vert _{L^2_x}^2 \,\textrm{d}\nu + \sum _{m=1}^M \int _{t_{m-1}}^{t_m} \left\Vert u_\nu - v_{m-1}\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\quad \lesssim \tau \mathbb {E} \left[ {[{u}]}_{ B^{1/2}_{2,\infty } L^2_x}^2 \right] + h^2 \mathbb {E} \left[ \left\Vert \nabla u \right\Vert _{L^2_t L^2_x}^2 \right] + \tau \sum _{m=3}^M \mathbb {E} \left[ \left\Vert \Pi _2 e_{m-2} \right\Vert _{L^2_x}^2 \right] \\&\qquad + \tau \left( \mathbb {E} \left[ \left\Vert u\right\Vert _{L^\infty _t L^2_x}^2 \right] + \mathbb {E} \left[ \left\Vert u_0\right\Vert _{L^2_x}^2 \right] \right) . \end{aligned}$$

Similarly, due to (3.26), Hölder's inequality, Young's inequality and the embedding \(\ell ^1 \hookrightarrow \ell ^\infty \),

$$\begin{aligned}&\mathbb {E} \left[ \max _{ {m^*} \in {\{{2,\ldots ,M}\}}} \left( K_{m^*} - \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right) \right] \\&\quad = \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left( \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu , \Pi _2 e_{m^*-1} \right) \right] \\&\quad \le \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu \right\Vert _{L^2_x} \left\Vert \Pi _2 e_{{m^*}-1} \right\Vert _{L^2_x}\right] \\&\quad \le \varepsilon \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{{m^*}-1} \right\Vert _{L^2_x}^2\right] \\&\qquad + c_\varepsilon \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \int _{t_{m^*-1}}^{t_{m^*}} a_{m^*}(\nu ) [G(u_\nu ) - G(v_{m^*-1})] \,\textrm{d}W_\nu \right\Vert _{L^2_x}^2\right] \\&\quad \le \varepsilon \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{{m^*}} \right\Vert _{L^2_x}^2 +\left\Vert \Pi _2 e_{1} \right\Vert _{L^2_x}^2 \right] \\&\qquad + c_\varepsilon \mathbb {E} \left[ \sum _{m=2}^M \left\Vert \int _{t_{m-1}}^{t_{m}} a_{m}(\nu ) [G(u_\nu ) - G(v_{m-1})] \,\textrm{d}W_\nu \right\Vert _{L^2_x}^2\right] . \end{aligned}$$

Together we can estimate

$$\begin{aligned} J_{2,b}&= \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} K_{m^*} \right] \\&= \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left( K_{m^*} - \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] + \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right) \right] \\&\le \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left( K_{m^*} - \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right) \right] + \mathbb {E} \left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \mathbb {E}\left[ K_M \big | \mathcal {F}_{m^*} \right] \right] \\&\lesssim \varepsilon \mathbb {E}\left[ \max _{{m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{m^*}\right\Vert _{L^2_x}^2 + \left\Vert \Pi _2 e_1\right\Vert _{L^2_x}^2 \right] +c_\varepsilon \mathbb {E} \left[ \tau \sum _{m=2}^M \left\Vert \Pi _2 e_{m-1}\right\Vert _{L^2_x}^2 \right] \\&\quad + c_\varepsilon \left( \tau \mathbb {E} \left[ {[{u}]}_{ B^{1/2}_{2,\infty } L^2_x}^2 \right] + h^2 \mathbb {E} \left[ \left\Vert \nabla u \right\Vert _{L^2_t L^2_x}^2 \right] + \tau \mathbb {E} \left[ \left\Vert u\right\Vert _{L^\infty _t L^2_x}^2 \right] + \tau \mathbb {E} \left[ \left\Vert u_0\right\Vert _{L^2_x}^2 \right] \right) . \end{aligned}$$

This concludes the bound for \(J_2\) in (3.20)

$$\begin{aligned} \begin{aligned} J_2&\lesssim \varepsilon \mathbb {E}\left[ \max _{ {m^*} \in {\{{2,\ldots ,M}\}}} \left\Vert \Pi _2 e_{m^*}\right\Vert _{L^2_x}^2 \right] + \varepsilon \mathbb {E} \left[ \sum _{m=2}^M \left\Vert \Pi _2 (e_m- e_{m-1})\right\Vert _{L^2_x}^2 \right] + \varepsilon J_1 \\&\quad + c_\varepsilon \left( \tau \mathbb {E} \left[ {[{u}]}_{ B^{1/2}_{2,\infty } L^2_x}^2 \right] + h^2 \mathbb {E} \left[ \left\Vert \nabla u \right\Vert _{L^2_t L^2_x}^2 \right] + \tau \mathbb {E} \left[ \left\Vert u\right\Vert _{L^\infty _t L^2_x}^2 \right] + \tau \mathbb {E} \left[ \left\Vert u_0\right\Vert _{L^2_x}^2 \right] \right) \\&\quad + c_\varepsilon \mathbb {E} \left[ \tau \sum _{m=2}^M \left\Vert \Pi _2 e_m\right\Vert _{L^2_x}^2 \right] . \end{aligned} \end{aligned}$$
(3.28)

Step 3: In this step we estimate the nonlinear gradient error. Jensen's inequality, the generalized Young inequality (2.9) and the nonlinear stability result of Proposition 14 imply

Step 4: We aim to apply a Gronwall-type argument. Collecting all estimates we get

Choosing \(\varepsilon \) sufficiently small and applying Gronwall’s Lemma ensures

$$\begin{aligned} \mathbb {E} \left[ \max _{ {m^*} \in {\{{1,\ldots ,M}\}}} \left\Vert \Pi _2 e_{m^*}\right\Vert _{L^2_x}^2 \right] \lesssim e^{cT} \left( \tau + h^2 \right) . \end{aligned}$$

This implies

(3.29)

Step 5: We artificially introduce the desired error quantities and use the stability of the mean-value projection in time and of the \(L^2_x\)-projection in space. Let us denote by \(\langle {V(\nabla u)}\rangle _{a_m}\) the mean value of \(V(\nabla u)\) weighted by \(a_{m-1}\). A slight modification of Lemma 15 implies

$$\begin{aligned} \mathbb {E} \left[ \sum _{m=1}^M \int _{I_m} \left\Vert V(\nabla u_\nu ) - \langle {V(\nabla u)}\rangle _{a_m} \right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right]&\lesssim \tau \mathbb {E} \left[ {[{V(\nabla u)}]}_{B^{1/2}_{2,\infty } L^2_x}^2 \right] . \end{aligned}$$

Additionally, Jensen’s inequality ensures

Therefore, invoking (3.29),

$$\begin{aligned}&\mathbb {E} \left[ \sum _{m=1}^M \int _{I_m} \left\Vert V(\nabla u_\nu ) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\quad \lesssim \mathbb {E} \left[ \sum _{m=1}^M \int _{I_m} \left\Vert V(\nabla u_\nu ) - \langle {V(\nabla u)}\rangle _{a_m} \right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\qquad + \mathbb {E} \left[ \sum _{m=1}^M \tau \left\Vert \langle {V(\nabla u)}\rangle _{a_m} - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \tau \mathbb {E} \left[ {[{V(\nabla u)}]}_{B^{1/2}_{2,\infty } L^2_x}^2 \right] + \tau + h^2. \end{aligned}$$

The assertion (3.17) follows by an application of Lemma 13

$$\begin{aligned}&\mathbb {E} \left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert \langle {u}\rangle _m - v_m\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \mathbb {E} \left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert \Pi _2 e_m\right\Vert _{L^2_x}^2 \right] + \mathbb {E} \left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert \langle {u}\rangle _m - \Pi _2 \langle {u}\rangle _m \right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \tau + h^2 + h^2 \mathbb {E} \left[ \left\Vert \nabla u\right\Vert _{L^\infty _t L^2_x}^2 \right] . \end{aligned}$$

Part (b): Now, let us assume \(u \in L^2_\omega B^{1/2}_{\varPhi _2,\infty } L^2_x\). We apply Lemma 17 to bound

$$\begin{aligned}&\mathbb {E}\left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert u(t_m) - v_m\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \mathbb {E}\left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert \langle {u}\rangle _m - v_m\right\Vert _{L^2_x}^2 \right] + \mathbb {E}\left[ \max _{m \in {\{{1,\ldots ,M}\}}} \left\Vert u(t_m) - \langle {u}\rangle _m\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \tau + h^2 + \tau \ln (1+\tau ^{-1}). \end{aligned}$$

This verifies (3.19) and the proof is finished. \(\square \)

Alternatively, the approximation error can be measured in terms of time averages of the nonlinear gradient term.

Corollary 21

Let the assumptions of Theorem 19(a) be satisfied. Then

$$\begin{aligned} \mathbb {E}\left[ \sum _{m=1}^M \tau \left\Vert \langle {V(\nabla u)}\rangle _m - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right] \lesssim \tau + h^2 \end{aligned}$$
(3.30)

and

$$\begin{aligned} \mathbb {E}\left[ \sum _{m=1}^M \tau \left\Vert V(\nabla \langle {u}\rangle _m) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right] \lesssim \tau + h^2. \end{aligned}$$
(3.31)

Proof

The estimate (3.30) follows immediately from Jensen’s inequality and the bound (3.17),

$$\begin{aligned} \mathbb {E}\left[ \sum _{m=1}^M \tau \left\Vert \langle {V(\nabla u)}\rangle _m - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right]&\le \mathbb {E}\left[ \sum _{m=1}^M \int _{I_m} \left\Vert V(\nabla u_\nu ) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\lesssim \tau + h^2. \end{aligned}$$

In order to prove the second estimate (3.31), we use Lemmas 3 and 4:

$$\begin{aligned}&\mathbb {E}\left[ \sum _{m=1}^M \tau \left\Vert V(\nabla \langle {u}\rangle _m) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right] \\&\quad \eqsim \mathbb {E}\left[ \sum _{m=1}^M \int _{I_m} \int _\mathcal {O} (S(\nabla \langle {u}\rangle _m) - S(\nabla v_m) ): (\nabla u_\nu - \nabla v_m) \,\textrm{d}x\,\textrm{d}\nu \right] \\&\quad \le c \mathbb {E}\left[ \sum _{m=1}^M \int _{I_m} \left\Vert V(\nabla u_\nu ) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\qquad + \frac{1}{2} \mathbb {E}\left[ \sum _{m=1}^M \tau \left\Vert V(\nabla \langle {u}\rangle _m) - V(\nabla v_m)\right\Vert _{L^2_x}^2 \right] . \end{aligned}$$

Absorbing the second term into the left-hand side and applying the bound (3.17) verifies the assertion. \(\square \)

Remark 22

Although both (3.30) and (3.31) enjoy the same convergence rates, it is not clear whether one dominates the other. In the linear case, \(p=2\), the terms coincide.

The averaged error quantities (3.30) and (3.31) are, up to oscillation, equivalent to the error quantity (3.17).

Lemma 23

Let \(u \in L^p_t W^{1,p}_x\) and \(A \in L^p_x\). Then

(3.32)

and

(3.33)

Proof

Equation (3.32) follows by using

The estimate (3.33) is obtained trivially,

\(\square \)

Remark 24

In [28, Lemma 6.2] the authors prove the equivalence (stated purely in space, but extendable to time) of

If \(V(\nabla u) \in B^{1/2}_{2,\infty } L^2_x \), then Lemma 15 implies

Theorem 25

(Convergence of Algorithm 3.15) Let the assumptions of Theorem 19 be satisfied. Denote by \({\textbf{w}}\in (V_h)^{M+1}\) the solution to (3.15) and by u the weak solution to (2.11). Then

$$\begin{aligned} \begin{aligned}&\mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert \langle {u}\rangle _m - w_m\right\Vert _{L^2_x}^2 + \sum _{m=1}^M\int _{I_m} \left\Vert V(\nabla u_\nu ) - V(\nabla w_m)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\quad \lesssim \tau + h^2 \end{aligned} \end{aligned}$$
(3.34)

and

$$\begin{aligned} \mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert u(t_m) - w_m\right\Vert _{L^2_x}^2 \right] \lesssim \tau \ln (1+\tau ^{-1}) + h^2. \end{aligned}$$
(3.35)

Proof

The proof proceeds similarly to the proof of Theorem 19. We will only prove the bound for the initial error. Instead of comparing \(w_1\) to \(\langle {u}\rangle _1\), we rather compare it to \(\Pi _2 \langle {u}\rangle _{I_1 \cup I_2}\). The equation for the latter reads

(3.36)

Subtracting (3.15b) from (3.36) and choosing \(\xi _h = \Pi _2 \langle {u}\rangle _{I_1 \cup I_2} - w_1\) results in

Due to Hölder’s and Young’s inequalities, the Itô isometry and the growth assumption (2.2)

The boundedness of u as an \(L^2_x\)-valued process implies

$$\begin{aligned} \mathbb {E} \left[ A_3 \right] \le \frac{1}{4} \mathbb {E} \left[ A_1 \right] + c \frac{2}{3}\tau \left\Vert 1+u\right\Vert _{L^2_\omega L^\infty _t L^2_x}^2. \end{aligned}$$

The fourth term is estimated similarly,

$$\begin{aligned} \mathbb {E}\left[ A_4\right]&\le \frac{1}{4} \mathbb {E} \left[ A_1 \right] + \mathbb {E} \left[ \left\Vert G(w_0) \langle {W}\rangle _1\right\Vert _{L^2_x}^2 \right] \\&\le \frac{1}{4} \mathbb {E} \left[ A_1 \right] + c \frac{1}{3}\tau \mathbb {E} \left[ \left\Vert 1+\Pi _2 u_0\right\Vert _{L^2_x}^2 \right] . \end{aligned}$$

Using the \(L^2\)-stability of the \(L^2\)-projection we obtain

$$\begin{aligned} \mathbb {E} \left[ A_4 \right] \le \frac{1}{4} \mathbb {E} \left[ A_1 \right] + c \frac{1}{3}\tau \left\Vert 1+u_0\right\Vert _{L^2_\omega L^2_x}^2. \end{aligned}$$

It remains to check the second term. Here we use the same arguments as in step 3 of the proof of Theorem 19 to conclude

$$\begin{aligned} A_2&\ge \int _{I_1} \left\Vert V(\nabla u_\nu ) - V(\nabla w_1)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu - c\left( \tau {[{V(\nabla u)}]}_{B^{1/2}_{2,\infty } L^2_x}^2 + h^2 \left\Vert \nabla V(\nabla u)\right\Vert _{L^2_t L^2_x}^2 \right) . \end{aligned}$$

Overall, we have established

$$\begin{aligned}&\mathbb {E} \left[ \left\Vert \Pi _2 \langle {u}\rangle _{I_1 \cup I_2} - w_1\right\Vert _{L^2_x}^2\right] + \mathbb {E} \left[ \int _{I_1} \left\Vert V(\nabla u_\nu ) - V(\nabla w_1)\right\Vert _{L^2_x}^2 \,\textrm{d}\nu \right] \\&\lesssim \tau \mathbb {E} \left[ \left\Vert 1+u\right\Vert _{L^\infty _t L^2_x}^2 + \left\Vert 1+u_0\right\Vert _{L^2_x}^2 + {[{V(\nabla u)}]}_{B^{1/2}_{2,\infty }L^2_x}^2 \right] + h^2 \mathbb {E} \left[ \left\Vert \nabla V(\nabla u)\right\Vert _{L^2_t L^2_x}^2 \right] . \end{aligned}$$

Lastly, using Lemmas 13 and 15,

$$\begin{aligned}&\mathbb {E} \left[ \left\Vert \langle {u}\rangle _1 - w_1\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \mathbb {E} \left[ \left\Vert \langle {u}\rangle _1 - \langle {u}\rangle _{I_1 \cup I_2}\right\Vert _{L^2_x}^2 \right] +\mathbb {E} \left[ \left\Vert \langle {u}\rangle _{I_1 \cup I_2} - \Pi _2 \langle {u}\rangle _{I_1 \cup I_2}\right\Vert _{L^2_x}^2 \right] \\&\qquad \, +\mathbb {E} \left[ \left\Vert \Pi _2 \langle {u}\rangle _{I_1 \cup I_2} - w_1\right\Vert _{L^2_x}^2 \right] \\&\quad \lesssim \tau \mathbb {E} \left[ \left\Vert 1+u\right\Vert _{L^\infty _t L^2_x}^2 + \left\Vert 1+u_0\right\Vert _{L^2_x}^2 + {[{u}]}_{B^{1/2}_{2,\infty } L^2_x}^2 + {[{V(\nabla u)}]}_{B^{1/2}_{2,\infty }L^2_x}^2 \right] \\&\qquad \, + h^2 \mathbb {E}\left[ \left\Vert \nabla u\right\Vert _{L^\infty _t L^2_x}^2 +\left\Vert \nabla V(\nabla u)\right\Vert _{L^2_t L^2_x}^2\right] . \end{aligned}$$

This completes the bound for the initial error. \(\square \)

Remark 26

We remark that, although the first step (3.15b) is announced as a full step, it is not a genuine full first step. The deterministic drift is scaled by a full step \(\tau \), but the stochastic term is only scaled by \(\langle {W}\rangle _1\). In Corollary 33 we find that \(\langle {W}\rangle _1\) has only half the variance of a full stochastic step.

To overcome this problem we can introduce a stochastic dummy variable \(\mathcal {W}_0 \sim \mathcal {N}(0,\tau /3)\) that artificially makes up for the missing randomness. Then (3.15b) needs to be substituted by

$$\begin{aligned} \left( w_1 - w_0, \xi _h \right) + \tau \left( S(\nabla w_1), \nabla \xi _h \right)&= \left( G(w_0)\big (\langle {W}\rangle _1 + \mathcal {W}_0 \big ) , \xi _h \right) . \end{aligned}$$
(3.37)

The error analysis of Theorem 25 can be extended to this algorithm.

4 Discrete stochastic processes

In this section we investigate the law of averaged Wiener processes and propose an implementable sampling algorithm.

4.1 Wiener process

We introduce the concept of Hilbert space valued Gaussian processes.

Definition 27

A stochastic process Y is called an U-valued Gaussian process with mean operator \(m: I \times U \rightarrow {\mathbb {R}}\) and variance operator \(\Sigma : I \times U \times U \rightarrow {\mathbb {R}}\) if for all \(t \in I\) and \(u \in U\) it holds

$$\begin{aligned} \varphi _{Y_t}(u):= \mathbb {E}\left[ e^{-i \left( Y_t, u \right) _U} \right] = e^{-i m_t(u) - \frac{1}{2} \Sigma _t(u,u)}. \end{aligned}$$

In short, we write \(Y_t \sim \mathcal {N}\left( m_t, \Sigma _t \right) \).

It follows that the stochastic forcing W defined by (2.1) is an U-valued Gaussian process with mean operator \(m_t(u) = 0\) and variance operator \(\Sigma _t(u,v) := t (u,v)_U\) for all \(t\in I\) and \(u,v \in U\). Moreover, the series (2.1) converges in the weak topology of U: due to the hypercontractivity of normally distributed random variables, for any \(q >0\), \(u \in U\) and \(t,s \in I\) it holds

$$\begin{aligned} \left( \mathbb {E} \left[ \left|\left( W_t - W_s, u \right) _U\right|^q \right] \right) ^\frac{1}{q}&= \left( \mathbb {E} \left[ \left| \sum _{j\in \mathbb {N}} \left( u_j , u \right) _U \left( \beta _t^j - \beta _s^j\right) \right|^q \right] \right) ^\frac{1}{q}\\&\eqsim \sqrt{q} \left|t-s\right|^\frac{1}{2} \left\Vert u\right\Vert _U. \end{aligned}$$

In fact, as soon as the index set is infinite we lose the norm convergence, since

$$\begin{aligned} \left( \mathbb {E}\left[ \left\Vert W_t\right\Vert _U^q \right] \right) ^\frac{1}{q} \eqsim \sqrt{q} \left( t \sum _{j\in \mathbb {N}} \left\Vert u_j\right\Vert _U^2 \right) ^\frac{1}{2} = \infty . \end{aligned}$$

4.2 Averaged Wiener process

In this section we compute the distributions of the random variables \((\langle {W}\rangle _m)_{m=1}^M\) and the joint distribution of the averaged increments.

A key tool in the derivation of the distribution of the averaged Wiener process is the decomposition of the process W adjusted to the equidistant time partition \({\{{I_m}\}}_{m=1}^M\). We decompose \(W|_{I_m}\) into a Brownian bridge \(\mathcal {B}_m\) and its nodal values \(W(t_{m-1})\) and \(W(t_{m})\), i.e., for \(t \in I_m\)

$$\begin{aligned} W(t) = W(t_{m-1}) + \mathcal {B}_m(t) + \frac{t - t_{m-1}}{\tau }\Delta _m W, \end{aligned}$$
(4.1)

where

$$\begin{aligned} \mathcal {B}_m(t) := W(t) - W(t_{m-1}) - \frac{t - t_{m-1}}{\tau }\Delta _m W. \end{aligned}$$
(4.2)

Brownian bridges have nice independence properties: they look neither into the past nor into the future. The following result can be found in [63, Section 1.2].

Proposition 28

Let \((\mathcal {B}_m)_{m=1}^M\) be given by (4.2). Then for all \(m \in {\{{1,\ldots ,M}\}}\)

$$\begin{aligned} \sigma \left( \mathcal {B}_m(t) \big | t \in I_m \right) \perp \sigma \left( W(t) \big | t\in [0,\infty ) \backslash (t_{m-1},t_m) \right) , \end{aligned}$$
(4.3)

i.e., all finite-dimensional distributions of the generators of the two sigma algebras are independent of each other.

Corollary 29

Let \((\mathcal {B}_m)_{m=1}^M\) be given by (4.2). Then \(\mathcal {B}_1, \ldots , \mathcal {B}_M, \Delta _1 W, \ldots , \Delta _M W\) are independent.

Next, we take the time average over the interval \(I_m\) in (4.1) and obtain

$$\begin{aligned} \langle {W}\rangle _m = W(t_{m-1}) + \langle {\mathcal {B}_m}\rangle _m + \frac{\Delta _m W}{2}. \end{aligned}$$
(4.4)

Now, it is our choice whether we want to compute the distribution of \(\langle {W}\rangle _m\) or \(\langle {\mathcal {B}_m}\rangle _m\). The formula (4.4) provides an easy way to compute the remaining one. We choose to compute the distribution of \(\langle {W}\rangle _m\). An application of Itô’s formula for \(f(s,W_s) = \frac{s-t_m}{\tau } W_s\) implies \(\mathbb {P}\)-a.s.

$$\begin{aligned} \langle {W}\rangle _m = W_{t_{m-1}}+ \int _{t_{m-1}}^{t_m} \frac{t_m - s}{\tau } \,\textrm{d}W_s. \end{aligned}$$
(4.5)
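At the discrete level, the identity (4.5) even holds pathwise: a left-point Riemann discretization of the stochastic integral coincides exactly with the right-endpoint Riemann average of the path. The following minimal sketch checks this for a scalar Wiener process; all grid parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: scalar Wiener process on I_m = [t_{m-1}, t_m],
# discretized by n equidistant substeps.
t_prev, tau, n = 0.2, 0.1, 1000
dt = tau / n

W0 = rng.normal(0.0, np.sqrt(t_prev))        # W(t_{m-1})
inc = rng.normal(0.0, np.sqrt(dt), size=n)   # Wiener increments on I_m
W = W0 + np.cumsum(inc)                      # W at the right endpoints s_1, ..., s_n

lhs = W.mean()                               # Riemann approximation of <W>_m
weights = (n - np.arange(n)) / n             # (t_m - s_{k-1}) / tau, k = 1, ..., n
rhs = W0 + np.sum(weights * inc)             # left-point sum for the stochastic integral

print(abs(lhs - rhs))                        # zero up to round-off
```

A summation-by-parts argument shows the two discrete sums agree identically, so the difference is pure floating-point error.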

A stochastic integral driven by a Wiener process with a deterministic integrand remains Gaussian.

Lemma 30

([4] Prop. 7.1) Let \(f \in L^2(I)\). Then

$$\begin{aligned} t \mapsto \int _0^t f_s \,\textrm{d}W_s \end{aligned}$$

is an U-valued Gaussian process with zero mean and variance

$$\begin{aligned} \Sigma _t (u,v) = \int _0^t \left|f_s\right|^2 \,\textrm{d}s\left( u,v \right) _U. \end{aligned}$$

Corollary 31

\(\langle {W}\rangle _m\) is an U-valued Gaussian random variable with zero mean and variance \(\Sigma (u,v) = \frac{2t_{m-1} + t_m}{3}(u,v)_U \).

Proof

Let us define

$$\begin{aligned} \langle {W}\rangle _m = W_{t_{m-1}}+ \int _{t_{m-1}}^{t_m} \frac{t_m - s}{\tau } \,\textrm{d}W_s =: W_a + W_b. \end{aligned}$$

Note that \(W_a\) and \(W_b\) are independent. Moreover, \(W_a\) is Gaussian with variance \(\Sigma _a(u,v) = t_{m-1} \left( u,v \right) _U\) and, due to Lemma 30, \(W_b\) is also Gaussian. Therefore \(\langle {W}\rangle _m\) is Gaussian and it suffices to compute the mean and the variance operators.

Let \(u,v \in U\). Then \(\mathbb {E} \left[ \left( \langle {W}\rangle _m, u \right) _{U} \right] = 0\) and

$$\begin{aligned}&\mathbb {E} \left[ \left( \langle {W}\rangle _m, u \right) _{U} \left( \langle {W}\rangle _m, v \right) _{U} \right] \\&\quad = \mathbb {E} \left[ \left( W_a, u \right) _{U}\left( W_a, v \right) _{U} \right] + \mathbb {E} \left[ \left( W_b, u\right) _{U}\left( W_b, v\right) _{U} \right] \\&\quad = \left( u,v \right) _U \left( t_{m-1} + \int _{t_{m-1}}^{t_m} \left( \frac{t_m - s}{\tau } \right) ^2 \,\textrm{d}s\right) \\&\quad = \left( u,v \right) _U \frac{ 2t_{m-1} + t_m}{3}. \end{aligned}$$

The assertion is proved. \(\square \)

Corollary 32

\(\langle {\mathcal {B}_m}\rangle _m\) is an U-valued Gaussian random variable with zero mean and variance \(\Sigma (u,v) = \frac{\tau }{12}(u,v)_U\).

Proof

The distribution of a random variable is uniquely determined by its characteristic function. Corollary 31 implies

$$\begin{aligned} \varphi _{\langle {W}\rangle _m} (u)&= e^{- \frac{1}{2} \frac{2 t_{m-1} + t_m}{3} \left\Vert u\right\Vert ^2_U}. \end{aligned}$$

Classically, we find

$$\begin{aligned} \varphi _{W(t_{m-1})}(u)&= e^{- \frac{1}{2} t_{m-1} \left\Vert u\right\Vert ^2_U}, \\ \varphi _{\frac{\Delta _m W}{2}}(u)&= e^{- \frac{1}{2} \frac{\tau }{4} \left\Vert u\right\Vert ^2_U}. \end{aligned}$$

The characteristic function of \(\varphi _{\langle {W}\rangle _m}\) factors due to the independence of the decomposition (4.4), i.e.,

$$\begin{aligned} \varphi _{\langle {W}\rangle _m}(u) = \varphi _{W(t_{m-1}) + \langle {\mathcal {B}_m}\rangle _m + \frac{\Delta _m W}{2}}(u) = \varphi _{W(t_{m-1})}(u) \varphi _{\langle {\mathcal {B}_m}\rangle _m }(u) \varphi _{\frac{\Delta _m W}{2}}(u). \end{aligned}$$

Rearranging implies

$$\begin{aligned} \varphi _{\langle {\mathcal {B}_m}\rangle _m }(u)&= \frac{\varphi _{\langle {W}\rangle _m} (u)}{\varphi _{W(t_{m-1})}(u) \varphi _{\frac{\Delta _m W}{2}}(u)} \\&= e^{- \frac{1}{2} \left( \frac{2 t_{m-1} + t_m}{3} - t_{m-1} - \frac{\tau }{4} \right) \left\Vert u\right\Vert _U^2} = e^{- \frac{1}{2} \frac{\tau }{12}\left\Vert u\right\Vert _U^2}. \end{aligned}$$

Overall, \(\langle {\mathcal {B}_m}\rangle _m\) has the characteristic function of a Gaussian random variable with zero mean and variance \(\Sigma (u,v) = \frac{\tau }{12}(u,v)_U\). \(\square \)

At this point it is a simple task to find the distribution of the increments of the averaged Wiener process. Let us subtract (4.4) for \(m\) and \(m-1\):

$$\begin{aligned} \Delta _m \mathbb {W} := \langle {W}\rangle _m - \langle {W}\rangle _{m-1} = \frac{\Delta _m W + \Delta _{m-1} W}{2} + \langle {\mathcal {B}_m}\rangle _m - \langle {\mathcal {B}_{m-1}}\rangle _{m-1}, \end{aligned}$$
(4.6)

where we define \(\langle {W}\rangle _0 := \Delta _0 W:= \langle {\mathcal {B}_0}\rangle _0:= 0\).

Corollary 33

\(\Delta _m \mathbb {W}\) is an U-valued Gaussian random variable with zero mean and variance \(\Sigma (u,v) = \left( \frac{2\tau }{3} \chi _{{\{{m \ge 2}\}}} + \frac{\tau }{3} \chi _{{\{{m=1}\}}} \right) \left( u, v \right) _U\).

Proof

The right hand side of (4.6) is a sum of independent, centered Gaussian random variables. Thus the left hand side is centered Gaussian. Now, it suffices to compute the variance operator.

Note that in the case \(m=1\) we have \(\Delta _1 \mathbb {W} = \langle {W}\rangle _1\) and the result follows by Corollary 31.

Let \(m \ge 2\) and \(u,v \in U\). Due to the independence,

$$\begin{aligned}&\mathbb {E} \left[ \left( \Delta _m \mathbb {W}, u \right) _{U} \left( \Delta _m \mathbb {W}, v \right) _{U} \right] \\&\quad = \mathbb {E} \left[ \left( \frac{\Delta _m W}{2}, u \right) _{U} \left( \frac{\Delta _m W}{2}, v \right) _{U} \right] + \mathbb {E} \left[ \left( \frac{\Delta _{m-1} W}{2}, u \right) _{U} \left( \frac{\Delta _{m-1} W}{2}, v \right) _{U} \right] \\&\qquad \, + \mathbb {E} \left[ \left( \langle {\mathcal {B}_m}\rangle _m, u \right) _{U} \left( \langle {\mathcal {B}_m}\rangle _m, v \right) _{U} \right] + \mathbb {E} \left[ \left( \langle {\mathcal {B}_{m-1}}\rangle _{m-1}, u \right) _{U} \left( \langle {\mathcal {B}_{m-1}}\rangle _{m-1}, v \right) _{U} \right] \\&\quad = \frac{2\tau }{3}\left( u, v \right) _U. \end{aligned}$$

\(\square \)
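A quick Monte Carlo check of Corollaries 31 and 33 for a scalar Wiener process: the first averaged increment \(\Delta _1 \mathbb {W} = \langle {W}\rangle _1\) should have variance \(\tau /3\), the subsequent ones variance \(2\tau /3\). Sample sizes and the subgrid resolution below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar Wiener process on two intervals of length tau, averaged by a
# right-endpoint Riemann sum on a subgrid of n_sub points per interval.
tau, n_sub, n_samples = 0.2, 100, 20_000
dt = tau / n_sub

inc = rng.normal(0.0, np.sqrt(dt), size=(n_samples, 2 * n_sub))
W = np.cumsum(inc, axis=1)
avg1 = W[:, :n_sub].mean(axis=1)     # <W>_1, i.e. the first averaged increment
avg2 = W[:, n_sub:].mean(axis=1)     # <W>_2

print(avg1.var())                    # ~ tau / 3
print((avg2 - avg1).var())           # ~ 2 * tau / 3
```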

So far we have identified how each averaged increment \(\Delta _m \mathbb {W}\) is distributed. However, we also need to know what the joint distribution is, i.e., the distribution of a random vector.

Lemma 34

The random vector \((\Delta _m \mathbb {W})_{m=1}^M\) is an \(U^M\)-valued centered Gaussian random variable with variance operator \(\Sigma :U^M \times U^M \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} \Sigma ({\textbf{u}}, {\textbf{v}}) := \sum _{m,l=1}^M \sigma _{m,l} \left( u_m, v_l\right) _U, \end{aligned}$$

for \({\textbf{u}}, {\textbf{v}}\in U^M\) and

$$\begin{aligned} \sigma _{m,l}&= {\left\{ \begin{array}{ll} \frac{1}{3} \tau &{} \text { if }\quad l=m=1,\\ \frac{2}{3} \tau &{} \text { if }\quad l=m> 1, \\ \frac{1}{6} \tau &{} \text { if }\quad \left|l-m\right| = 1, \\ 0 &{} \text { if }\quad \left|l-m\right|> 1. \end{array}\right. } \end{aligned}$$
(4.7)

Proof

Equation (4.6) implies that the random vector \((\Delta _m \mathbb {W})_{m=1}^M\) can be constructed via a linear transformation of the independent Gaussian random vectors \((\Delta _m W)_{m=1}^M\) and \((\langle {\mathcal {B}_m}\rangle _m)_{m=1}^M\), i.e.,

$$\begin{aligned} \left( \Delta _m \mathbb {W} \right) _{m=1}^M = \mathbb {K}_1 (\Delta _m W)_{m=1}^M + \mathbb {K}_2 (\langle {\mathcal {B}_m}\rangle _m)_{m=1}^M, \end{aligned}$$

where \(\mathbb {K}_1, \mathbb {K}_2 \in {\mathbb {R}}^{M \times M}\) are given by

$$\begin{aligned} \mathbb {K}_1&= \frac{1}{2} \begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ 1 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \ldots &{}\quad 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \ldots &{}\quad 0 &{}\quad 1 &{}\quad 1 \end{pmatrix}, \quad \mathbb {K}_2 = \begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ -1 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ 0 &{}\quad -1 &{}\quad 1 &{}\quad 0 &{}\quad \dots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \ldots &{}\quad -1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \ldots &{}\quad 0 &{}\quad -1 &{}\quad 1 \end{pmatrix}. \end{aligned}$$

Therefore, \((\Delta _m \mathbb {W})_{m=1}^M\) is itself a centered Gaussian vector. It remains to compute the covariance matrix. Let \(u,v \in U\) and \(m, l \in {\{{1, \ldots , M}\}}\). If \(m = l\), Corollary 33 implies

$$\begin{aligned} \mathbb {E} \left[ \left( \Delta _m \mathbb {W}, u \right) \left( \Delta _l \mathbb {W}, v\right) \right] = \left( \frac{2\tau }{3} \chi _{{\{{m \ge 2}\}}} + \frac{\tau }{3} \chi _{{\{{m=1}\}}} \right) \left( u, v \right) _U. \end{aligned}$$

If \(\left|m-l\right|> 1\), then Eq. (4.6) and the independence imply

$$\begin{aligned} \mathbb {E} \left[ \left( \Delta _m \mathbb {W}, u \right) \left( \Delta _l \mathbb {W}, v\right) \right] = 0. \end{aligned}$$

It remains to consider the case \(\left|m-l\right|=1\). Without loss of generality \(l = m+1\). Now, using (4.6), the independence and Corollary 32

$$\begin{aligned}&\mathbb {E} \left[ \left( \Delta _m \mathbb {W}, u \right) \left( \Delta _{m+1} \mathbb {W}, v\right) \right] \\&\quad = \mathbb {E} \left[ \left( \frac{\Delta _m W}{2}, u \right) \left( \frac{\Delta _m W}{2}, v\right) \right] - \mathbb {E} \left[ \left( \langle {\mathcal {B}_m}\rangle _m, u \right) \left( \langle {\mathcal {B}_m}\rangle _m, v\right) \right] \\&\quad = \left( \frac{\tau }{4} - \frac{\tau }{12} \right) \left( u,v \right) _U = \frac{\tau }{6} \left( u,v \right) _U. \end{aligned}$$

The proof is finished. \(\square \)
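The covariance structure (4.7) can also be verified by assembling the transformation from the proof numerically: since \(\textrm{Cov}(\Delta W) = \tau I\) and \(\textrm{Cov}(\langle {\mathcal {B}}\rangle ) = \frac{\tau }{12} I\), the matrix \(\tau \mathbb {K}_1 \mathbb {K}_1^T + \frac{\tau }{12} \mathbb {K}_2 \mathbb {K}_2^T\) must equal \((\sigma _{m,l})\). A sketch (the dimension M is illustrative):

```python
import numpy as np

M, tau = 6, 0.1

# K1, K2 from the proof of Lemma 34.
K1 = 0.5 * (np.eye(M) + np.eye(M, k=-1))
K2 = np.eye(M) - np.eye(M, k=-1)

Sigma = tau * K1 @ K1.T + (tau / 12) * K2 @ K2.T

# Reference matrix from (4.7): tridiagonal with entries tau/3, 2tau/3, tau/6.
ref = np.diag([tau / 3] + [2 * tau / 3] * (M - 1)) \
    + (tau / 6) * (np.eye(M, k=1) + np.eye(M, k=-1))
print(np.allclose(Sigma, ref))   # True
```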

4.3 Sampling algorithm

On the computer we are forced to approximate the continuous measure induced by the Wiener process W by an empirical measure. Additionally, if we want to compare different numerical schemes we need to specify how to sample the random input needed for the involved algorithms jointly. More specifically, if we want to compare our algorithms (3.14) and (3.15) and the classical Euler–Maruyama discretization (5.1), we need to sample according to the law of the random vector

$$\begin{aligned} \left( \Delta _1 W, \ldots , \Delta _M W, \Delta _1 \mathbb {W}, \ldots , \Delta _M \mathbb {W} \right) . \end{aligned}$$
(4.8)

Based on the decomposition (4.6) we propose the following sampling algorithm. The time discretization is controlled by the parameter \(M \in \mathbb {N}\) and the truncation of the series (2.1) by the parameter \(J \in \mathbb {N}\).

Given \(M,J \in \mathbb {N}\).

  1. (a)

    (Sampling) Compute i.i.d. random variables \(\zeta _{m}^j, \eta _{m}^j \sim \mathcal {N}(0,1)\) for \(m \in {\{{1,\ldots ,M}\}}\) and \(j \in {\{{1,\ldots ,J}\}}\).

  2. (b)

    (Lift to Hilbert space U) For \(m \in {\{{1,\ldots ,M}\}}\) define the random variables

    $$\begin{aligned} Z_m := \sqrt{\tau } \sum _{j=1}^J u^j \zeta _{m}^j, \quad \tilde{\mathbb {Z}}_m := \sqrt{ \frac{\tau }{12} }\sum _{j=1}^J u^j \eta _{m}^j, \end{aligned}$$
    (4.9a)

    where \({\{{u^j}\}}_{j\in \mathbb {N}}\) is an orthonormal system of U.

  3. (c)

    (Adjusting correlation) For \(m \in {\{{1,\ldots ,M}\}}\) define the random variables

    $$\begin{aligned} \mathbb {Z}_m := \frac{Z_m + Z_{m-1}}{2} + \tilde{\mathbb {Z}}_m - \tilde{\mathbb {Z}}_{m-1}, \end{aligned}$$
    (4.9b)

    where \(Z_0 := \tilde{\mathbb {Z}}_0 := 0\).

Let \(\Pi _J\) be the U-orthogonal projection onto \(U_J := \text {span}(u^1,\ldots , u^J)\). The following proposition guarantees that the sampling algorithm (4.9) approximates the desired random variables.

Proposition 35

Let \({\textbf{Z}}:= \left( Z_1, \ldots , Z_M,\mathbb {Z}_1,\ldots ,\mathbb {Z}_M \right) \in U^{2M}\) be generated by (4.9a) and (4.9b). Then,

$$\begin{aligned} {\textbf{Z}}\sim \left( \Pi _J \Delta _1 W, \ldots , \Pi _J \Delta _M W, \Pi _J \Delta _1 \mathbb {W}, \ldots , \Pi _J \Delta _M \mathbb {W} \right) , \end{aligned}$$

where \(\Pi _J\) is the U-orthogonal projection onto \(\text {span}(u^1,\ldots , u^J)\).

Proof

First, observe that \(Z_m \sim \Pi _J \Delta _m W\) and \(\tilde{\mathbb {Z}}_m \sim \Pi _J \langle {\mathcal {B}_m }\rangle _m\). Then, the statement follows similarly to the proof of Lemma 34. \(\square \)

Remark 36

Proposition 35 ensures that we can compare our algorithm to the classical Euler–Maruyama discretization (5.1) on an equidistant time grid. It can be adjusted to also match non-equidistant grids.

5 Simulations

In this section we perform numerical simulations to test our algorithms (3.14) and (3.15). We denote the solution of (3.14) by \({\textbf{v}}^{\text {Half}}\) and the solution of (3.15) by \({\textbf{v}}^{\text {Full}}\). Additionally, we compare our algorithms to the classical Euler–Maruyama discretization of (2.11). It reads: find \({\textbf{v}}\in (V_h)^{M+1}\) such that for all \(\xi _h \in V_h\), \(m \ge 1\) and \(\mathbb {P}\)-a.s.

$$\begin{aligned} \left( v_m- v_{m-1}, \xi _h \right) + \tau \left( S(\nabla v_m), \nabla \xi _h \right) = \left( G(v_{m-1}) \Delta _m W, \xi _h \right) . \end{aligned}$$
(5.1)

The solution to (5.1) is called \({\textbf{v}}^{\text {EM}}\).

A plain convergence result for the Euler–Maruyama scheme, without any rate, has been obtained in [65]. A more sophisticated analysis was carried out in [39]; however, convergence is only obtained for the \(L^2_\omega L^\infty _t L^2_x\)-error, with rate 1/4.

Originally, the Euler–Maruyama scheme was introduced for stochastic ordinary differential equations. In this context much more is known; see e.g. the book of Kloeden and Platen [52, Section 9.5]. However, when the Lipschitz assumption on the coefficients is dropped, divergence with positive probability has been obtained in [49].

We are particularly interested in the experimental study of the following questions:

  1. (a)

    Do the algorithms (3.14) and (3.15) approximate mean-values or point-values?

  2. (b)

How does the Euler–Maruyama scheme (5.1) compare to (3.14) and (3.15) in terms of time and space convergence?

  3. (c)

    How do the different error quantities (3.17), (3.30) and (3.31) for the gradient relate to each other?

  4. (d)

    How sensitive are the algorithms with respect to the parameter p?

All simulations are done with the help of FEniCS [62], an open-source tool for solving partial differential equations.

5.1 An explicit solution

In the linear case, \(p=2\), with a linear right hand side

$$\begin{aligned} G(v) \Delta W = \lambda v \, \Delta \beta ^1, \end{aligned}$$
(5.2)

for some \(\lambda \in {\mathbb {R}}\), it is possible to find an explicit solution. If we start the evolution defined by (2.11) in an eigenfunction of the Laplace operator, the dynamics become simpler. Let \(\mathcal {O} = (0,1)^2\), \(T = 1\) and \(u_0(x) = \sin (\pi x_1) \sin ( \pi x_2)\). Note that \(u_0\) is an eigenfunction of the 2-Laplacian with homogeneous Dirichlet data and corresponding eigenvalue \(\mu = 2 \pi ^2\). The unique solution to (2.11) is given by

$$\begin{aligned} \begin{aligned} u(\omega ,t,x)&= \exp \left\{ - \left( \frac{\lambda ^2}{2}+ \mu \right) t + \lambda \beta ^1(\omega ,t)\right\} u_0(x). \end{aligned} \end{aligned}$$
(5.3)
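The Itô correction \(\lambda ^2/2\) in (5.3) can be sanity-checked by Monte Carlo: since \(\mathbb {E}[\exp (\lambda \beta ^1(t) - \lambda ^2 t/2)] = 1\), the scalar amplitude in (5.3) has expectation \(e^{-\mu t}\). A sketch with illustrative sample size and evaluation time:

```python
import numpy as np

rng = np.random.default_rng(7)

lam, mu, t, n = 1.0, 2 * np.pi**2, 0.1, 200_000

# Sample beta(t) ~ N(0, t) and evaluate the amplitude of the exact solution (5.3).
beta = rng.normal(0.0, np.sqrt(t), size=n)
a = np.exp(-(lam**2 / 2 + mu) * t + lam * beta)

print(a.mean(), np.exp(-mu * t))   # both ~ exp(-mu * t)
```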

Similarly, we can give a closed expression for the solution \(u_h\) to the space-discrete equation

$$\begin{aligned} \,\textrm{d}\left( u_h, \xi _h \right) + \left( \nabla u_h, \nabla \xi _h \right) \,\textrm{d}t= \lambda \left( u_h,\xi _h \right) \,\textrm{d}\beta ^1(t). \end{aligned}$$
(5.4)

This is equivalent to the system of linear stochastic differential equations

$$\begin{aligned} \,\textrm{d}M {\textbf{u}}+ S {\textbf{u}}\,\textrm{d}t= \lambda M {\textbf{u}}\,\textrm{d}\beta ^1(t), \end{aligned}$$
(5.5)

where \(M_{i,j} = \left( \xi _h^j, \xi _h^i \right) \) and \(S_{i,j} = \left( \nabla \xi _h^j, \nabla \xi _h^i \right) \) are the mass and stiffness matrices, respectively, and \({\{{\xi _h^i}\}}\) forms a basis of \(V_{h}\). The eigenpairs of the discretized Laplacian are related to the linear system

$$\begin{aligned} S {\textbf{u}}= \mu M {\textbf{u}}\Leftrightarrow M^{-1} S {\textbf{u}}= \mu {\textbf{u}}. \end{aligned}$$
(5.6)

Let \((\mu _h,{\textbf{u}}_h) \in (0,\infty ) \times {\mathbb {R}}^{\left|V_{h}\right|}\) be a solution to (5.6), then the solution to (5.4) started in \(u_h(0) = {\textbf{u}}_h \cdot {\varvec{\xi }}_h\) is given by

$$\begin{aligned} u_h(\omega ,t,x) = \exp \left\{ -\left( \frac{\lambda ^2}{2} + \mu _h \right) t + \lambda \beta ^1(\omega ,t) \right\} u_h(0). \end{aligned}$$
(5.7)

The main advantage of having an analytic solution of the space-discrete equation is that it rules out any space discretization error.

To accurately compare continuous processes and discrete vectors, one needs to either lift the vector to a process or project the process to a vector. We follow the latter approach and evaluate the continuous process \(u_h\) on the equidistant partition of I. This leads to the definition

$$\begin{aligned} {\textbf{u}}^{\text {Point}} := \left( u_h(t_m) \right) _{m=1}^M. \end{aligned}$$
(5.8)

Since the algorithms (3.14) and (3.15) approximate the mean value of the analytic solution, we define

$$\begin{aligned} {\textbf{u}}_{\text {exact}}^{\text {Aver}} := \left( \langle {u_h}\rangle _{I_m} \right) _{m=1}^M. \end{aligned}$$
(5.9)

Although we know the exact solution, the time averages of the exact solution cannot be evaluated exactly without knowledge of the full trajectory of the Brownian motion \(\beta ^1\). Numerically, we substitute \(\langle {u_h}\rangle _{I_m}\) by a Riemann sum approximation, i.e. we fix an equidistant partition \({\{{[t_{m,k-1},t_{m,k}]}\}}_{k=1}^r\) of \(I_m\) with resolution \(r \in \mathbb {N}\) and define

$$\begin{aligned} {\textbf{u}}^{\text {Aver}} := \left( \frac{1}{r} \sum _{k=1}^r u_h(t_{m,k}) \right) _{m=1}^M. \end{aligned}$$
(5.10)

The approximation quality is measured in the error quantities

$$\begin{aligned} \mathcal {E}( {\textbf{u}},{\textbf{v}})&:= \mathbb {E} \left[ \max _{m=1,\ldots ,M} \left\Vert u_m - v_m\right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.11a)
$$\begin{aligned} \mathcal {V}( {\textbf{u}}, {\textbf{v}})&:= \mathbb {E} \left[ \sum _{m=1}^M \tau \left\Vert \nabla ( u_m - v_m) \right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.11b)

where \({\textbf{u}}\in {\{{ {\textbf{u}}^{\text {Point}}, {\textbf{u}}^{\text {Aver}}}\}}\) and \({\textbf{v}}\in {\{{{\textbf{v}}^{\text {EM}},{\textbf{v}}^{\text {Full}},{\textbf{v}}^{\text {Half}} }\}} \).

In Fig. 1 we plot the time convergence of the error quantities (5.11a) and (5.11b). We approximate the expectation by the Monte Carlo method with 20 samples. Additionally, we let \(\lambda = 1\) and take \(V_h\) to be the space of piecewise linear, continuous elements on a uniform mesh with \(\left|V_h\right| = 121\). The average values (5.10) are approximated with resolution \(r = 10\).

The numerical results confirm that \({\textbf{v}}^{\text {EM}}\) approximates the solution at the grid points, while \({\textbf{v}}^{\text {Full}}\) and \({\textbf{v}}^{\text {Half}}\) approximate the average values of the solution. The gap is due to the difference

$$\begin{aligned} \mathbb {E} \left[ \left\Vert \langle {u}\rangle _{I_m} - u(t_m)\right\Vert _{L^2_x}^2 \right] \ge c \tau \left\Vert u_0\right\Vert _{L^2_x}^2. \end{aligned}$$

Initially, we observe a preasymptotic effect, which stabilizes at the time scale \(\tau \approx 10^{-3}\); afterwards the predicted convergence rate of order 1 is attained.

Fig. 1

Time convergence of \({\textbf{v}}^{\text {EM}}\) (red), \({\textbf{v}}^{\text {Half}}\) (blue) and \({\textbf{v}}^{\text {Full}}\) (green) towards \({\textbf{u}}^{\text {Point}}\) (dashed) respectively \({\textbf{u}}^{\text {Aver}}\) (solid) measured in the error terms \(\mathcal {E}\) (left) and \(\mathcal {V}\) (right) (color figure online)

5.2 Beyond known solutions

In general, a major obstacle is the absence of an analytic solution to Eq. (2.11). In particular, the distances between the numerical and the analytic solution, as they appear in (3.17), (3.19), (3.34) and (3.35), are not computable.

To overcome this difficulty we measure the error between a fine reference approximation \({\textbf{v}}_f\) and a coarse approximation \({\textbf{v}}_c\). Both \({\textbf{v}}_f\) and \({\textbf{v}}_c\) are generated via the same algorithm (either (3.14), (3.15) or (5.1)) on a fine and a coarse scale, respectively. Let \(h_c \ge h_f >0\) be the coarse and fine space mesh sizes. Similarly, let \(M_c, M_f \in \mathbb {N}\) be the coarse and fine time discretization parameters with corresponding time step sizes \(\tau _c\) and \(\tau _f\). For simplicity, we assume \(M_f/M_c = r \in \mathbb {N}\). Now, the coarse intervals are generated by the fine ones, i.e.

$$\begin{aligned} I_m^c&:= [(m-1)\tau _c,m\tau _c] = \bigcup _{k=1}^r I_{(m-1)r + k}^f. \end{aligned}$$

The coarse averaging operator can therefore be decomposed in terms of the fine averaging operators. As a computable substitute for the analytic time averaging operator, we define the discrete time averaging operator

$$\begin{aligned} \langle {{\textbf{v}}_f}\rangle _{m}^r:= \frac{1}{r} \sum _{k=1}^r v_{(m-1)r + k}^f. \end{aligned}$$
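In array form this operator is a single reshape-and-mean over each coarse interval. A minimal numpy sketch (the array layout is our assumption, not part of the paper):

```python
import numpy as np

def discrete_time_average(vf, r):
    """<v_f>_m^r = (1/r) * sum_{k=1}^r v^f_{(m-1)r + k}: the mean of the r fine
    time steps that make up the m-th coarse interval. vf holds the fine
    solution as an array of shape (M_f, dof) with M_f = r * M_c."""
    M_f, dof = vf.shape
    return vf.reshape(M_f // r, r, dof).mean(axis=1)

# four fine steps, r = 2: averages the pairs (1, 3) and (2, 4)
vals = np.array([[1.0], [3.0], [2.0], [4.0]])
coarse = discrete_time_average(vals, 2)
```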

Additionally, we define the error quantities

$$\begin{aligned} d_{L^\infty L^2}^{\,\text {aver}}( {\textbf{v}}_f, {\textbf{v}}_c)&:= \mathbb {E} \left[ \max _{m=1,\ldots , M_c} \left\Vert \langle {{\textbf{v}}_f}\rangle _{m}^r - v_m^c \right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.12a)
$$\begin{aligned} d_{L^\infty L^2}^{\,\text {point}}( {\textbf{v}}_f, {\textbf{v}}_c)&:= \mathbb {E} \left[ \max _{m=1,\ldots , M_c} \left\Vert v_{mr}^f - v_m^c \right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.12b)

and

$$\begin{aligned} d_{L^2 V}^{\,\text {classic}}( {\textbf{v}}_f, {\textbf{v}}_c)&:= \mathbb {E} \left[ \sum _{m=1}^{M_c} \frac{\tau _c}{r} \sum _{k=1}^r \left\Vert V(\nabla v^f_{(m-1)r + k}) - V(\nabla v^c_m )\right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.13a)
$$\begin{aligned} d_{L^2 V}^{\,\text {inner}}( {\textbf{v}}_f, {\textbf{v}}_c)&:= \mathbb {E} \left[ \sum _{m=1}^{M_c} \tau _c \left\Vert V(\nabla \langle {{\textbf{v}}_f}\rangle _{m}^r) - V(\nabla v^c_m )\right\Vert _{L^2_x}^2 \right] , \end{aligned}$$
(5.13b)
$$\begin{aligned} d_{L^2 V}^{\,\text {outer}}( {\textbf{v}}_f, {\textbf{v}}_c)&:= \mathbb {E} \left[ \sum _{m=1}^{M_c} \tau _c \left\Vert \langle { V(\nabla {\textbf{v}}_f)}\rangle _m^r - V(\nabla v^c_m )\right\Vert _{L^2_x}^2 \right] . \end{aligned}$$
(5.13c)
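The distances (5.12) are straightforward to estimate by Monte-Carlo once the sample paths are stored. A minimal sketch, assuming for simplicity that both solutions are stored as nodal coefficient vectors on the same spatial grid with a lumped (identity) mass matrix, so that the \(L^2_x\) norm reduces to the Euclidean norm of the coefficients:

```python
import numpy as np

def d_aver(vf_samples, vc_samples, r):
    """Monte-Carlo estimate of (5.12a): E[ max_m || <v_f>_m^r - v_m^c ||^2 ].
    vf_samples has shape (S, M_f, dof), vc_samples has shape (S, M_c, dof)
    with M_f = r * M_c; S is the number of Monte-Carlo samples."""
    S, M_f, dof = vf_samples.shape
    vf_avg = vf_samples.reshape(S, M_f // r, r, dof).mean(axis=2)  # <v_f>_m^r
    norms = ((vf_avg - vc_samples) ** 2).sum(axis=2)  # squared norm per (sample, m)
    return norms.max(axis=1).mean()  # max over m = 1, ..., M_c, then MC mean

def d_point(vf_samples, vc_samples, r):
    """Monte-Carlo estimate of (5.12b): compares v_{mr}^f with v_m^c."""
    norms = ((vf_samples[:, r - 1::r, :] - vc_samples) ** 2).sum(axis=2)
    return norms.max(axis=1).mean()
```

In the actual experiments the coarse solution lives on a coarser finite element space, so a prolongation to the fine space and the fine mass matrix would replace the plain Euclidean norm.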

5.3 Joint sampling on fine and coarse scales

It is crucial to use the same stochastic input when computing \({\textbf{v}}_f\) and \({\textbf{v}}_c\). This can be done in two ways: either one first samples the coarse stochastic data and a posteriori samples the fine data from its conditional distribution given the coarse data, or one samples the fine stochastic input and reconstructs the coarse stochastic data from it. The latter approach is more suitable for the averaged increments.

Lemma 37

Let \(M_f, M_c \in \mathbb {N}\). Assume \(M_f = r M_c\) for some \(r \in \mathbb {N}\). Then

$$\begin{aligned} \Delta _1^c \mathbb {W}&= \sum _{l=1}^r \left( 1- \frac{l-1}{r} \right) \Delta _l^f \mathbb {W}, \end{aligned}$$
(5.14a)
$$\begin{aligned} \Delta ^c_j \mathbb {W}&= \sum _{l=0}^{r-1} \frac{l+1}{r} \Delta _{rj - l}^f \mathbb {W} +\sum _{l=0}^{r-2} \left( 1- \frac{l+1}{r }\right) \Delta _{r(j-1) -l}^f \mathbb {W}, \end{aligned}$$
(5.14b)

for \(j \in {\{{2,\ldots , M_c}\}}\).

Proof

Since \(M_f = r M_c\), we have \(\tau _c = r \tau _f\). Thus,

$$\begin{aligned} \Delta _1^c \mathbb {W}&= \frac{1}{\tau _c} \int _{0}^{\tau _c} W_s \,\textrm{d}s= \frac{1}{r \tau _f} \int _{0}^{r \tau _f} W_s \,\textrm{d}s= \frac{1}{r \tau _f} \sum _{l=1}^r \int _{(l-1)\tau _f}^{l\tau _f} W_s \,\textrm{d}s\\&= \frac{1}{r \tau _f} \sum _{l=1}^r \int _{(l-1)\tau _f}^{l\tau _f} W_s - W_{s-(l-1)\tau _f} \,\textrm{d}s+ \frac{1}{r \tau _f} \sum _{l=1}^r \int _{(l-1)\tau _f}^{l\tau _f} W_{s-(l-1)\tau _f} \,\textrm{d}s\\&=: \textrm{I} + \textrm{II}. \end{aligned}$$

By the discrete Fubini theorem,

$$\begin{aligned} \textrm{I}&= \frac{1}{r \tau _f} \sum _{l=1}^r \int _{(l-1)\tau _f}^{l\tau _f} \sum _{l'=0}^{l-2} W_{s - l' \tau _f} - W_{s-(l'+1)\tau _f} \,\textrm{d}s\\&= \frac{1}{r \tau _f} \sum _{l=1}^r \sum _{l'=0}^{l-2} \int _{(l-l'-1)\tau _f}^{(l-l')\tau _f} W_{s } - W_{s-\tau _f} \,\textrm{d}s\\&= \frac{1}{r} \sum _{l=1}^r \sum _{l'=0}^{l-2} \Delta _{l-l'}^f \mathbb {W} = \sum _{l=2}^r \left( 1+ \frac{1-l}{r} \right) \Delta _l^f \mathbb {W}. \end{aligned}$$

The second term is easily computed

$$\begin{aligned} \textrm{II} = \frac{1}{r \tau _f} \sum _{l=1}^r \int _{0}^{\tau _f} W_{s} \,\textrm{d}s= \Delta _1^f \mathbb {W}. \end{aligned}$$

Overall,

$$\begin{aligned} \Delta _1^c \mathbb {W} = \sum _{l=1}^r \left( 1+ \frac{1-l}{r} \right) \Delta _l^f \mathbb {W}. \end{aligned}$$

Let \(j \in {\{{2,\ldots ,M_c}\}}\). Since \(t_{j}^c = t_{rj}^f\), it holds

$$\begin{aligned} \Delta ^c_j \mathbb {W}&= \frac{1}{\tau _c} \int _{t_{j-1}^c}^{t_j^c} W_s - W_{s-\tau _c} \,\textrm{d}s= \frac{1}{r \tau _f} \int _{t_{r(j-1)}^f}^{t_{rj}^f} W_s - W_{s-r\tau _f} \,\textrm{d}s\\&= \frac{1}{r \tau _f} \sum _{l=0}^{r-1} \int _{t_{rj - (l + 1)}^f}^{t_{r j - l}^f} \sum _{l' = 0 }^{r-1} W_{s-l' \tau _f} - W_{s-(l'+1) \tau _f} \,\textrm{d}s\\&= \frac{1}{r \tau _f} \sum _{l=0}^{r-1} \sum _{l' = 0 }^{r-1} \int _{t_{rj - (l + l' + 1)}^f}^{t_{r j - (l + l') }^f} W_{s} - W_{s- \tau _f} \,\textrm{d}s= \frac{1}{r} \sum _{l=0}^{r-1} \sum _{l' = 0 }^{r-1} \Delta _{rj - (l+l')}^f \mathbb {W}. \end{aligned}$$

The discrete Fubini theorem implies

$$\begin{aligned} \sum _{l=0}^{r-1} \sum _{l' = 0 }^{r-1} \Delta _{rj - (l+l')}^f \mathbb {W} = \sum _{l=0}^{r-1} (l+1) \Delta _{rj - l}^f \mathbb {W} +\sum _{l=0}^{r-2} (r-(l+1)) \Delta _{r(j-1) -l}^f \mathbb {W}. \end{aligned}$$

Therefore,

$$\begin{aligned} \Delta ^c_j \mathbb {W} = \sum _{l=0}^{r-1} \frac{l+1}{r} \Delta _{rj - l}^f \mathbb {W} +\sum _{l=0}^{r-2} \left( 1- \frac{l+1}{r }\right) \Delta _{r(j-1) -l}^f \mathbb {W}. \end{aligned}$$

The claim is proved. \(\square \)
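Since (5.14) is a pathwise identity, it can be checked numerically for any path sampled on a sufficiently fine quadrature grid. The following sketch (with hypothetical parameters \(r\), \(M_c\), \(K\) and a smooth stand-in path in place of a Brownian sample) approximates all averaged increments by the trapezoidal rule on one common grid and verifies the reconstruction:

```python
import numpy as np

def trap(Wg, a, b, delta):
    # trapezoidal rule for the integral over [a*delta, b*delta] on the grid
    seg = Wg[a:b + 1]
    return delta * (seg.sum() - 0.5 * (seg[0] + seg[-1]))

def avg_increment(Wg, j, tau_idx, delta):
    # Delta_j W = (1/tau) int_{(j-1)tau}^{j tau} (W_s - W_{s-tau}) ds, with the
    # convention W_s = 0 for s <= 0 (so Delta_1 = (1/tau) int_0^tau W_s ds)
    a, b = (j - 1) * tau_idx, j * tau_idx
    shifted = 0.0 if j == 1 else trap(Wg, a - tau_idx, b - tau_idx, delta)
    return (trap(Wg, a, b, delta) - shifted) / (tau_idx * delta)

# hypothetical setup: M_f = r * M_c fine steps, K quadrature nodes per step
r, M_c, K = 4, 5, 64
M_f = r * M_c
delta = (1.0 / M_f) / K
s = delta * np.arange(M_f * K + 1)
Wg = np.sin(7.0 * s) + s ** 2          # any path works: the identity is pathwise

Df = [avg_increment(Wg, l, K, delta) for l in range(1, M_f + 1)]      # fine
Dc = [avg_increment(Wg, j, r * K, delta) for j in range(1, M_c + 1)]  # coarse

# reconstruct the coarse averaged increments from the fine ones via (5.14)
rec = [sum((1.0 - (l - 1) / r) * Df[l - 1] for l in range(1, r + 1))]
for j in range(2, M_c + 1):
    rec.append(sum((l + 1) / r * Df[r * j - l - 1] for l in range(r))
               + sum((1.0 - (l + 1) / r) * Df[r * (j - 1) - l - 1]
                     for l in range(r - 1)))
```

Because all integration windows are aligned with the common grid, the trapezoidal approximations satisfy the identity exactly up to rounding, so `rec` and `Dc` agree to machine precision.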

Remark 38

The reconstruction formula (5.14) is the key ingredient that makes an efficient comparison of fine and coarse numerical solutions possible. Establishing a corresponding formula for randomized algorithms, as proposed in [18], is a more challenging task.

5.4 Simulation: unknown solution

Let \(\mathcal {O} = (0,1)^2\) and \(T = 1\). We choose \(p \in {\{{1.5,3}\}}\), \(u_0(x) = \sin (\pi x_1) \sin (\pi x_2)\) and

$$\begin{aligned} G(u)\Delta W&:= \underbrace{\sin (\pi x_1)x_2 u}_{=: g_1(x,u)} \Delta \beta ^1 + \underbrace{\sin (\pi x_2)x_1 u}_{=: g_2(x,u)} \Delta \beta ^2. \end{aligned}$$
(5.15)
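The noise (5.15) is multiplicative and acts nodewise on the solution. A sketch of its evaluation, assuming the solution is represented by its nodal values (the helper name and array layout are ours):

```python
import numpy as np

def noise_increment(x, u, db1, db2):
    """Evaluate G(u) DW from (5.15) at the nodes. x: node coordinates of
    shape (dof, 2); u: nodal values of shape (dof,); db1, db2: increments of
    the two driving Brownian motions beta^1, beta^2."""
    g1 = np.sin(np.pi * x[:, 0]) * x[:, 1] * u   # g_1(x, u)
    g2 = np.sin(np.pi * x[:, 1]) * x[:, 0] * u   # g_2(x, u)
    return g1 * db1 + g2 * db2
```

In the schemes (3.14), (3.15) and (5.1) the increments \(\Delta \beta ^i\) are replaced by the respective (averaged) increments of the discretization.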

\(V_h\) denotes the space of piecewise linear elements with zero boundary values on a triangulation \(\mathcal {T}_h\) of \(\mathcal {O}\). We initialize the coarse triangulation \(\mathcal {T}_{h_c}\) as a uniform triangulation of \(\mathcal {O}\) such that \(\left|V_{h_c}\right| = 121\) and \(h_c \approx 1.4\times 10^{-1}\). \(\mathcal {T}_{h_f}\) is generated by three uniform refinements of \(\mathcal {T}_{h_c}\). Then \(\left|V_{h_f}\right| = 6561\) and \(h_f \approx 1.7\times 10^{-2}\). For the time discretization we use \(M_f = 1280\) and \(M_c = 40\). Therefore \(\tau _f \approx 7.8\times 10^{-4}\) and \(\tau _c \approx 2.5\times 10^{-2}\). We measure the error between the fine numerical solution \({\textbf{v}}_f \in {\{{{\textbf{v}}_f^{\text {EM}},{\textbf{v}}_f^{\text {Half}},{\textbf{v}}_f^{\text {Full}}}\}}\) and the coarse numerical solution \({\textbf{v}}_c \in {\{{{\textbf{v}}_c^{\text {EM}},{\textbf{v}}_c^{\text {Half}},{\textbf{v}}_c^{\text {Full}}}\}} \) of the same algorithm in the error quantities (5.12) and (5.13). The expectation is approximated by the Monte-Carlo method with 20 samples.

In Figs. 2 and 3 we plot the evolution of the error for \(p=3\) and \(p = 1.5\), respectively. In both cases we observe linear convergence. This indicates that on the discretization scales used the space error dominates the time error. We do not see a substantial difference between the gradient error quantities. Additionally, \({\textbf{v}}^{\text {EM}}\) measured in the point distance \(d_{L^\infty L^2}^{\,\text {point}}\) and \({\textbf{v}}^{\text {Half}}\) measured in the averaged distance \(d_{L^\infty L^2}^{\,\text {aver}}\) perform equally well.

If \(p=3\), then \({\textbf{v}}^{\text {Full}}\) behaves similarly in both error terms and performs slightly worse than \({\textbf{v}}^{\text {EM}}\) and \({\textbf{v}}^{\text {Half}}\). In contrast, in the case \(p=1.5\), while still performing slightly worse than \({\textbf{v}}^{\text {EM}}\) and \({\textbf{v}}^{\text {Half}}\), \({\textbf{v}}^{\text {Full}}\) is approximated better in \(d_{L^\infty L^2}^{\,\text {aver}}\) than in \(d_{L^\infty L^2}^{\,\text {point}}\). This indicates that, at least in the singular setting, \({\textbf{v}}^{\text {Full}}\) also approximates average values.

Fig. 2

Convergence for \(p=3\) of \({\textbf{v}}_c^{\text {EM}}\) (red), \({\textbf{v}}_c^{\text {Half}}\) (blue) and \({\textbf{v}}_c^{\text {Full}}\) (green) towards \({\textbf{v}}_f\) measured in \(d_{L^\infty L^2}^{\,\text {aver}}\) (solid, left), \(d_{L^\infty L^2}^{\,\text {point}}\) (dashed, left), \(d_{L^2 V}^{\,\text {classic}}\) (solid, right), \(d_{L^2 V}^{\,\text {inner}}\) (dashed, right) and \(d_{L^2 V}^{\,\text {outer}}\) (dash dotted, right). In the top row we use \(\tau \approx h\) and in the bottom row \(\tau \approx h^2\) (color figure online)

Fig. 3

Convergence for \(p=1.5\) of \({\textbf{v}}_c^{\text {EM}}\) (red), \({\textbf{v}}_c^{\text {Half}}\) (blue) and \({\textbf{v}}_c^{\text {Full}}\) (green) towards \({\textbf{v}}_f\) measured in \(d_{L^\infty L^2}^{\,\text {aver}}\) (solid, left), \(d_{L^\infty L^2}^{\,\text {point}}\) (dashed, left), \(d_{L^2 V}^{\,\text {classic}}\) (solid, right), \(d_{L^2 V}^{\,\text {inner}}\) (dashed, right) and \(d_{L^2 V}^{\,\text {outer}}\) (dash dotted, right). In the top row we use \(\tau \approx h\) and in the bottom row \(\tau \approx h^2\) (color figure online)

5.5 Conclusion

Our algorithms achieve optimal linear convergence in space and optimal convergence of order 1/2 in time under minimal regularity assumptions. Experimentally, the Euler–Maruyama scheme (5.1) and our algorithms (3.14) and (3.15) share the same computational complexity and accuracy. However, the convergence of the Euler–Maruyama scheme has only been proven with rate 1/4, cf. [39]. The construction of the random inputs is fast and can be done with the simple sampling algorithm (4.9).