1 Introduction

1.1 Space–time variational formulation of the heat equation

This paper is about the numerical solution of second-order parabolic evolution equations in a simultaneous space–time variational formulation. For convenience only we restrict the discussion to the model problem of the heat equation with homogeneous Dirichlet (lateral) boundary conditions on the time–space cylinder \(Q:=I \times \Omega \), where \(I:=(0,T)\), and \(\Omega \subset \mathbb R^d\) is a bounded Lipschitz domain. The common space–time variational formulation of finding u that, for given data \(f,u_0\), satisfies

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \iint _Q \partial _t u v +\nabla _\textbf{x} u \cdot \nabla _\textbf{x} v\,dt\,d\textbf{x}= \displaystyle \iint _Q f v \,dt\,d\textbf{x}\\ \qquad \displaystyle \int _\Omega u(0,\cdot ) w \,d\textbf{x} = \displaystyle \displaystyle \int _\Omega u_0 w \,d\textbf{x} \end{array}\right. \end{aligned}$$
(1.1)

for all ‘test functions’ vw, induces a boundedly invertible operator from X to \(Y' \times L_2(\Omega )\), where

$$\begin{aligned} X:=L_2\big (I;H^1_0(\Omega )\big ) \cap H^1\big (I;H^{-1}(\Omega )\big ), \quad Y:=L_2\big (I;H^1_0(\Omega )\big ) \end{aligned}$$

(see, e.g., [7] or [24]). Already because \(X \ne Y \times L_2(\Omega )\), the corresponding bilinear form is non-coercive, so that one cannot resort to simple conforming Galerkin discretizations. See, however [17, 20], for methods that apply in the case of a homogeneous initial condition.

1.2 Minimal residual discretization

The approach introduced by Andreev in [1] is to consider minimal residual (or least squares) discretizations. They amount to minimizing the residual over a selected finite-dimensional trial space \(X^\delta \subset X\). Since the residual of the PDE lives in the dual space \(Y'\), however, its norm cannot be evaluated exactly and so has to be approximated by the dual norm over a suitable finite dimensional test subspace \(Y^\delta \subset Y\). The resulting minimal residual solution from \(X^\delta \) is a quasi-best approximation from \(X^\delta \) when the pair \((X^\delta ,Y^\delta )\) satisfies a (uniform) Ladyshenskaja–Babus̆ka–Brezzi (LBB) condition. Unfortunately, so far, the verification of this LBB condition has been restricted to pairs of finite element spaces w.r.t. partitions of Q that permit a decomposition into time-slabs [22, 23]. Consequently partitions that are locally refined simultaneously in space and time are not covered.

Remark 1.1

To circumvent the time-slab restriction, as an alternative approach in [21] uniform LBB-stability was demonstrated for trial and test spaces that are spanned by (adaptively generated) sets of tensor products of temporal wavelets and spatial finite element spaces.

1.3 FOSLS

By writing the forcing function \(f \in Y'\) (non-uniquely) as \(f_1 + {{\,\textrm{div}\,}}_{\textbf{x}} \textbf{f}_2\) for some \(f_1\in Y'\) and \(\textbf{f}_2\in L_2(Q)^d\), the heat equation for \(u=u_1\) can equivalently be written as the first-order system

$$\begin{aligned} G\textbf{u}:= \left( \begin{array}{@{}c@{}} \partial _t u_1+{{\,\textrm{div}\,}}_\textbf{x} \textbf{u}_2 \\ -\textbf{u}_2 - \nabla _\textbf{x} u_1 \\ u_1(0,\cdot )\end{array} \right) = \left( \begin{array}{@{}c@{}} f_1 \\ \textbf{f}_2\\ u_0 \end{array} \right) ,\quad u_1|_{I \times \partial \Omega } = 0. \end{aligned}$$
(1.2)

From the well-posedness of the space–time variational formulation of the second-order equation, and the boundedness of \({{\,\textrm{div}\,}}_\textbf{x}:L_2(Q)^d \rightarrow Y'\) and \(\nabla _\textbf{x}:X \rightarrow L_2(Q)^d\), one can infer that G is a boundedly invertible operator from

$$\begin{aligned} X \times L_2(Q)^d\quad \text {to} \quad Y' \times L_2(Q)^d \times L_2(\Omega ) \end{aligned}$$

(see [18, Lem. 2.3 & Rem. 2.4]). Because of the presence of the dual space \(Y'\), however, least-squares finite element discretizations of this first-order system suffer from the same restrictions imposed by an LBB condition.

As was first suggested in [5, §9.1.4], a solution for this problem is to restrict to functions \(\textbf{u}=(u_1,\textbf{u}_2)\) for which \({{\,\textrm{div}\,}}\textbf{u}:=\partial _t u_1+{{\,\textrm{div}\,}}_\textbf{x} \textbf{u}_2 \in L_2(Q)\). Doing so, in [11] it was proven that G is a boundedly invertible operator from

$$\begin{aligned} U:= \{\textbf{u}=(u_1,\textbf{u}_2) \in X \times L_2(Q)^d:{{\,\textrm{div}\,}}\textbf{u} \in L_2(Q)\}, \end{aligned}$$
(1.3)

equipped with the graph norm, to its range in

$$\begin{aligned} L:=L_2(Q)\times L_2(Q)^d \times L_2(\Omega ), \end{aligned}$$

i.e., \(\Vert G \textbf{u}\Vert _L \eqsim \Vert \textbf{u}\Vert _U\) (\(\textbf{u} \in L\)).Footnote 1 Consequently, for any \(\textbf{f}=(f_1,\textbf{f}_2,u_0) \in {{\,\textrm{ran}\,}}G\), and any closed subspace \(U^\delta \subset U\), finding \(\textbf{u}^\delta :={{\,\textrm{argmin}\,}}_{\textbf{v} \in U^\delta }\Vert \textbf{f}-G \textbf{v}\Vert _L\) is a well-posed first-order system least-squares (FOSLS) discretisation of \(G \textbf{u}=\textbf{f}\). Because the L-norm can be efficiently exactly evaluated, in [5] such a least squares discretisation is referred to as being ‘practical’. The resulting \(\textbf{u}^\delta \) is a quasi-best approximation from \(U^\delta \) to the solution \(\textbf{u}\) in the norm on U, i.e., \(\Vert \textbf{u}-\textbf{u}^\delta \Vert _U \lesssim \inf _{\textbf{v} \in U^\delta } \Vert \textbf{u}-\textbf{v}\Vert _U\).

Adding to the results from [11], in [14] the aforementioned range of G was shown to be the full space L. Furthermore, it was shown that

$$\begin{aligned} U:= \{\textbf{u}=(u_1,\textbf{u}_2) \in L_2(I;H^1_0(\Omega )) \times L_2(Q)^d:{{\,\textrm{div}\,}}\textbf{u} \in L_2(Q)\}, \end{aligned}$$

equipped with its graph norm, gives an equivalent definition of U from (1.3). Finally, assuming that \(f \in Y'\) is given as \(f=f_1 + {{\,\textrm{div}\,}}_{\textbf{x}} \textbf{f}_2\) for some \(f_1\in L_2(Q)\) and \(\textbf{f}_2\in L_2(Q)^d\), which always exist by Riesz’ representation theorem, it was shown that the first-order system formulation with spaces U and L applies whenever the space–time variational formulation of second order from Sect. 1.1 does. We also mention our recent generalization to the instationary Stokes problem with so-called slip boundary conditions [15].

Another advantage of the FOSLS approach is that it corresponds to a bilinear form that is symmetric and positive definite which is beneficial in several applications (cf. [12, 16]). Futhermore, being of least-squares type, the method comes with a built-in efficient and reliable a posteriori estimator for the error in any approximation, being the L-norm of its residual. The square of this norm naturally splits into contributions associated to the individual elements which can be used to drive an adaptive solution routine. Plain convergence of this routine has been demonstrated in [14], where in practice one observes that this adaptive routine selects a sequence of partitions that results in the best possible convergence rate.

1.4 Low rates for non-smooth solutions

For smooth solutions, this FOSLS works fully satisfactorily. When applying (vectorial) continuous piecewise polynomial approximation of degree k w.r.t. a quasi-uniform partition of Q into \((d+1)\)-simplices the error in U-norm is of the optimal order \( \text {DoF}^{-\frac{k}{d+1}}\).

Obviously for solutions that are insufficiently smooth, reduced rates are obtained. The experiments reported in [11], however, show that for several problems adaptivity hardly improves these rates, which remain disappointingly low. For example, for the heat equation on \(\Omega =(0,1)\), with forcing function \(f(=f_1)=2\) and initial condition \(u_0=1\), with a quasi-uniform partition of \(Q=I \times \Omega \) into triangles and \(k=1\), in [11] a rate equal to 0.08 was reported. A low rate was to be expected due to the singularities in the solution at the corners (0, 0) and (0, 1) of Q as a consequence of the discontinuity between initial and boundary conditions. But even with adaptivity this rate improved to only 0.17. The observed rate for \(\Omega =(0,1)^2\), \(f=0\), and \(u_0=1\) and a quasi-uniform partition of Q into tetrahedra was 0.07, which even did not improve at all by adaptive refinements.

The disappointingly low rates for non-smooth solutions with the FOSLS method whilst applying adaptive refinements can be seen as a prize to be paid for the inclusion of the condition \({{\,\textrm{div}\,}}\textbf{u} \in L_2(Q)\) in the definition of the space U. Indeed, with for example linear trial spaces for \(u_1\) and \(\textbf{u}_2\), the (local) approximation error in \(\textbf{u}\) measured in \(\Vert {{\,\textrm{div}\,}}\cdot \Vert _{L_2(Q)}\), being part of the graph norm on U, is of the order of the (local) mesh-size times the maximum of the (local) \(H^2\) semi-norms of \(u_1\) and \(\textbf{u}_2\). Realizing that \(u_1=u\), and, for \(\textbf{f}_2=0\), \(\textbf{u}_2=-\nabla _\textbf{x} u_1\), we conclude that this (local) approximation error is of the order of the (local) mesh-size times the maximum of the (local) \(L_2\)-norm of second- and third- order derivatives of u. Similar bounds on the (local) approximation errors in \(u_1\) and \(\textbf{u}_2\) in \(L_2(I;H^1_0(\Omega ))\)- or \(L_2(Q)^d\)-norms involve at most second-order derivatives of u. Obviously, similar considerations apply to finite element spaces of higher order.

1.5 Towards a remedy

To cure this problem, an idea is to apply finite element spaces that are specially designed for the space \(H({{\,\textrm{div}\,}};Q)\), as e.g. the Raviart–Thomas finite elements. Thanks to their ‘commuting diagram property’, for, say, \( RT _0\)-elements the (local) approximation error in \(\textbf{u}\) measured in \(\Vert {{\,\textrm{div}\,}}\cdot \Vert _{L_2(Q)}\) is of the order of the (local) mesh-size times the (local) \(H^1\) semi-norm of \({{\,\textrm{div}\,}}\textbf{u}\), the latter being equal to \(f_1\). In many cases, the condition that \(f_1\) has some smoothness is much milder than the corresponding condition on the smoothness of both terms \(\partial _t u_1\) and \({{\,\textrm{div}\,}}_\textbf{x} \textbf{u}_2\) separately. Unfortunately, because of the condition \(u_1 \in L_2(I;H^1_0(\Omega ))\) incorporated in the definition of U, \(H({{\,\textrm{div}\,}};Q)\)-conformity of the finite element space does not suffice, and therefore the Raviart–Thomas elements are not applicable.

In [6] the space of vectorial continuous piecewise linears was enriched with bubbles associated to the faces of the simplices such that the range of the divergence operator applied to this trial space equals the space of piecewise constants. This trial space, which is \(H^1(Q)\)- and thus U-conforming, comes with a quasi-interpolator (cf. also [3]) that satisfies a commuting diagram so that the (local) approximation error in \(\textbf{u}\) measured in \(\Vert {{\,\textrm{div}\,}}\cdot \Vert _{L_2(Q)}\) is of the order of the (local) mesh-size times the local \(H^1\) semi-norm of \({{\,\textrm{div}\,}}\textbf{u}\). Unfortunately, in our setting, where \(\textbf{u}=(u_1,\textbf{u}_2)\) and \(\textbf{u}_2=-(\textbf{f}_2+\nabla _\textbf{x} u_1)\), it turns out that with this space the problem of the appearance of higher order derivatives of u in the local error bounds manifests itself at another place. Due to the fact that not all faces of a \((d+1)\)-simplex inside Q are either parallel or perpendicular to the bottom \(\{0\}\times \Omega \) of Q, the local quasi-interpolation error in \(u_1\) in the \(L_2 \otimes H^1\)-norm depends on derivatives of \(\textbf{u}_2\), and thus on derivatives of u that are of one order higher than desirable. Numerical experiments with the FOSLS method for this trial space do not show better rates than without the inclusion of these bubbles.

1.6 The approach from this work

We consider finite element subspaces of U w.r.t. prismatic partitions of the time–space cylinder Q. The space of local shape functions on each prism will be a Cartesian product of tensor products of finite element spaces in time and space. In Sect. 2, we construct a local quasi-interpolant that satisfies the desired commuting diagram.

Defining the global quasi-interpolant piecewise as this local quasi-interpolant would correspond to a non-conforming finite element space, as the \(u_1\)-component would be discontinuous over element interfaces in the spatial direction and therefore not in \(L_2(I;H^1_0(\Omega ))\). By constructing a global quasi-interpolant by averaging the local quasi-interpolants over these interfaces, the resulting global finite element space is conforming. Although consequently a truly commuting diagram is lost, in Sect. 3.1, for locally conforming prismatic partitions, we show a local upper bound for the global interpolation error that is satisfactory in the following sense: For the same convergence rate, it requires less local smoothness of the solution than a corresponding interpolation error bound with simplicial finite element spaces.

Having the aim to allow for local (adaptive) refinements, appropriate for non-smooth solutions, nonconforming prismatic partitions have to be considered. In Sect. 3.2, we demonstrate that also on local patches which are nonconforming, the local upper bound for the global interpolation error has the right order, albeit under stronger smoothness requirements on the solution.

The numerical experiments for an adaptive solver presented in Sect. 4 demonstrate the advantage of the use of prismatic elements for various singular solutions compared to the use of standard simplicial elements.

We also mention that prismatic partitions provide the possibility to use different (local) mesh-sizes in time and space. For an ultra-weak DPG reformulation of the system (1.2) introduced in [8], a so-called parabolic scaling of the temporal and spatial mesh-sizes turns out to be beneficial for certain singular solutions, demonstrating the potential usefulness of anisotropic prisms. However, while some theoretical results in this work apply to anisotropic prismatic meshes as well, the main results and the numerical experiments are restricted to isotropic prismatic partitions.

2 The finite element

As element domain we consider a prism \(P=J \times K\), where J is a (closed) interval and K a (closed) d-simplex. We will use the notations \(h_J:={{\,\textrm{diam}\,}}J\) and \(h_K:={{\,\textrm{diam}\,}}K\).

For \(\ell , k \in \mathbb N_0\), we consider the space of shape functions given by

$$\begin{aligned} {\mathcal S}_{\ell ,k}(P):=\big ({\mathcal P}_{\ell +1}(J) \otimes {\mathcal P}_k(K)\big ) \times \big ({\mathcal P}_\ell (J) \otimes RT _k(K)\big ), \end{aligned}$$

with \( RT _k(K)\) being the Raviart–Thomas element of order k, see, e.g., [4, III.3]. For \(d=1\), this space should be read as \({\mathcal P}_{k+1}(K)\).

The definition of the finite element is completed by the specification of the degrees of freedom (DoFs) for both \({\mathcal P}_{\ell +1}(J) \otimes {\mathcal P}_k(K)\) and \({\mathcal P}_\ell (J) \otimes RT _k(K)\). As DoFs for \({\mathcal P}_{\ell +1}(J)\) we take

$$\begin{aligned} p(t) \quad (t \in \partial J)\quad \text {and, when } \ell \ge 1,\quad \int _J p q \,dt \quad (q \in {\mathcal P}_{\ell -1}(J)), \end{aligned}$$
(2.1)

and as DoFs for \({\mathcal P}_{k}(K)\),

$$\begin{aligned} \int _K p q \,d\textbf{x}\quad (q \in {\mathcal P}_k(K)). \end{aligned}$$

The DoFs for \({\mathcal P}_{\ell +1}(J) \otimes {\mathcal P}_k(K)\) are the tensor products of the DoFs for both factors.

As DoFs for \({\mathcal P}_{\ell }(J)\) we take

$$\begin{aligned} \int _J p q \,dt \quad (q \in {\mathcal P}_{\ell }(J)), \end{aligned}$$
(2.2)

and as DoFs for \( RT _k(K)\),

$$\begin{aligned}{} & {} \int _{\partial K} \textbf{p}\cdot \textbf{n} \,q\,d\textbf{s} \quad (q \in R_k(\partial K)) \quad \text {and, when } k \ge 1,\int _K \textbf{p}\cdot \textbf{q} \,d\textbf{x}\quad (\textbf{q} \in {\mathcal P}_{k-1}(K)^d), \end{aligned}$$
(2.3)

where \(R_k(\partial K)=\{q \in L_2(\partial K):q|_e \in P_k(e) \text { for all facets } e \text{ of } K\}\). Again, the DoFs for the tensor product \({\mathcal P}_\ell (J) \otimes RT _k(K)\) are the tensor products of the DoFs for both factors.

Remark 2.1

Notice that the DoFs for \({\mathcal P}_{\ell +1}(J)\) were chosen so that in a certain sense this space plays the role of a one-dimensional Raviart–Thomas space of order \(\ell \). A difference, however, is that, for \(d>1\), \( RT _k(K) \subsetneq {\mathcal P}_{k+1}(K)^d\).

2.1 The local interpolant and a commuting diagram

Above DoFs for \({\mathcal S}_{\ell ,k}(P)\) are well-defined on \(\big (C(J) \otimes L_1(K)\big ) \times \big (L_1(J) \otimes \big (L_s(K)^d \cap H({{\,\textrm{div}\,}};K)\big )\big )\) whenever \(s>2\), see [10, §17.1–2].Footnote 2 On such a space they define a quasi-interpolator \({\mathcal I}_{\ell ,k}^P={\mathcal I}_{\ell ,k}^{P,1}\times {\mathcal I}_{\ell ,k}^{P,2}:\textbf{v} \mapsto {\mathcal I}_{\ell ,k}^{P} \textbf{v} \in {\mathcal S}_{\ell ,k}(P)\) by imposing that \({\mathcal I}_{\ell ,k}^{P} \textbf{v}\) has the same DoFs as \(\textbf{v}\). By construction, this \({\mathcal I}_{\ell ,k}^P\) is thus a projector onto \({\mathcal S}_{\ell ,k}(P)\).

Proposition 2.2

(Commuting diagram property) With \(Q^P_{\ell ,k}\) denoting the \(L_2(P)\)-orthogonal projector onto \({\mathcal P}_\ell (J) \otimes {\mathcal P}_k(K)\), it holds that

$$\begin{aligned} {{\,\textrm{div}\,}}{\mathcal I}_{\ell ,k}^P=Q^P_{\ell ,k} {{\,\textrm{div}\,}}. \end{aligned}$$

Proof

Clearly \({{\,\textrm{ran}\,}}{{\,\textrm{div}\,}}{\mathcal I}_{\ell ,k}^P \subset {\mathcal P}_\ell (J) \otimes {\mathcal P}_k(K)\). For \(p \in {\mathcal P}_\ell (J)\), \(q \in {\mathcal P}_k(K)\), and \(\textbf{v}=(v_1,\textbf{v}_2)\), and with \(\textbf{n}=(n_t,\textbf{n}_\textbf{x})\) the outward pointing normal vector on \(\partial P\), integration by parts gives

$$\begin{aligned}&\int _J \int _K {{\,\textrm{div}\,}}(\textbf{v}- {\mathcal I}_{\ell ,k}^P\textbf{v})(t,\textbf{x}) p(t) q(\textbf{x})\,dt\,d\textbf{x}\\&\quad = -\int _J \int _K (v_1-{\mathcal I}^{P,1}_{\ell ,k} v_1) p'(t) q(\textbf{x})+ (\textbf{v}_2-{\mathcal I}^{P,2}_{\ell ,k} \textbf{v}_2)\cdot p(t)\nabla q(\textbf{x}) \,dt\,d\textbf{x}\\&\quad + \int _K \int _{\partial J} (v_1-{\mathcal I}^{P,1}_{\ell ,k} v_1) n_t p(s) q(\textbf{x})\,ds \,d\textbf{x}\\&\quad + \int _J \int _{\partial K} (\textbf{v}_2-{\mathcal I}^{P,2}_{\ell ,k} \textbf{v}_2) \cdot \textbf{n}_{\textbf{x}} p(t) q(\textbf{y})\,d\textbf{y} \,dt=0 \end{aligned}$$

by the definition of the DoFs. This completes the proof. \(\square \)

Corollary 2.3

For \(k' \in \{0,\ldots ,k\}\), \(k'' \in \{0,\ldots ,k+1\}\), \(\ell ' \in \{0,\ldots ,\ell +1\}\), it holds that

(i) \(\displaystyle \Vert {{\,\textrm{div}\,}}(\textbf{v} -{\mathcal I}_{\ell ,k}^P\textbf{v})\Vert _{L_2(P)} \lesssim h_J^{\ell '} |{{\,\textrm{div}\,}}\textbf{v}|_{H^{\ell '}(J)\otimes L_2(K)} + h_K^{k''} |{{\,\textrm{div}\,}}\textbf{v}|_{L_2(J)\otimes H^{k''}(K)}\),

(ii) \(\displaystyle \Vert \nabla _\textbf{x}(v_1 -{\mathcal I}^{P,1}_{\ell ,k}v_1)\Vert _{L_2(P)^d} \lesssim h_J^{\ell '+1} |v_1|_{H^{\ell '+1}(J)\otimes H^1(K)} + h_K^{{k'}} |v_1|_{L_2(J)\otimes H^{{k'}+1}(K)}\),

(iii) \(\displaystyle \Vert \textbf{v}_2 -{\mathcal I}^{P,2}_{\ell ,k}{} \textbf{v}_2\Vert _{L_2(P)^d} \lesssim h_J^{\ell '} |\textbf{v}_2|_{H^{\ell '}(J)\otimes L_2(K)^d} + h_K^{{k'}+1} |\textbf{v}_2|_{L_2(K)\otimes H^{{k'}+1}(K)^d}\),

with hidden constants depending only on the shape regularity of K, the space dimension d, and the polynomial degrees \(\ell \) and k.

Proof

Let \(\rho _\ell ^{J}\), \(\rho _{k}^{K}\) denote the projectors on \({\mathcal P}_{\ell +1}(J)\) and \( RT _k(K)\) defined by the DoFs from (2.1) and (2.3), respectively, and let \(Q_{\ell }^{J}\) and \(Q_{k}^{K}\) denote the \(L_2\)-orthogonal projectors onto \({\mathcal P}_{\ell }(J)\) and \({\mathcal P}_{k}(K)\), respectively, so that

$$\begin{aligned} {\mathcal I}^{P,1}_{\ell ,k}=\rho _\ell ^{J}\otimes Q_k^{K},\quad {\mathcal I}^{P,2}_{\ell ,k}=Q_{\ell }^{J}\otimes \rho _{k}^{K}. \end{aligned}$$

It is known, see, e.g., [10, Thm. 16.4], that for \(s \in [1,\infty ]\),Footnote 3

$$\begin{aligned} \big \Vert \big (\text {Id}-\rho _k^K\big ) \textbf{w}\big \Vert _{L_s(K)^d} \lesssim h_K^{k'+1} |\textbf{w}|_{W^{k'+1}_s(K)^d} \quad \big (\textbf{w} \in W^{k'+1}_s(K)^d\big ). \end{aligned}$$
(2.4)

Similarly, we have \(\Vert (\text {Id}-\rho _\ell ^{J}) w\Vert _{L_2(J)} \lesssim h_J^{\ell '+1} |w|_{H^{\ell '+1}(J)}\) \((w \in H^{\ell '+1}(J))\).

Thanks to Proposition 2.2, by writing

$$\begin{aligned} \text {Id}\otimes \text {Id}-Q_{\ell }^{J}\otimes Q_k^{K}&=\big (\text {Id}-Q_{\ell }^{J}\big )\otimes \text {Id}+Q_k^{K}\otimes \big (\text {Id}-Q_k^{K}\big ),\\ \text {Id}\otimes \text {Id}-\rho _\ell ^{J}\otimes Q_k^{K}&=\big (\text {Id}-\rho _\ell ^{J}\big )\otimes Q_k^{K}+\text {Id}\otimes \big (\text {Id}-Q_k^{K}\big ),\\ \text {Id}\otimes \text {Id}-Q_{\ell }^{J}\otimes \rho _{k}^{K}&=\big (\text {Id}-Q_{\ell }^{J}\big )\otimes \text {Id}+Q_{\ell }^{J} \otimes \big (\text {Id}-\rho _{k}^{K}\big ), \end{aligned}$$

standard estimates for the orthogonal projectors give the results. \(\square \)

Remark 2.4

For simplicity Corollary 2.3 is restricted to \(L_2\)-based Sobolev norms with integer smoothness indices. Error bounds in terms of general \(L_p\)-based Sobolev-norms, possibly with fractional smoothness indices can be derived as well.

In the isotropic case of \(h_J \eqsim h_K\), the best choice for the polynomial degrees \(\ell \) and k is \(\ell +1=k\). Indeed, increasing either \(\ell \) or k will not improve the estimate of minimal order among the upper bounds of Corollary 2.3. By taking \(\ell '=k=k''\), \(\ell '+1=k=k'\), and \(\ell '=k=k'+1\) in the bounds (i), (ii), and (iii), respectively, which minimizes the regularity conditions without affecting the minimal order, one arrives at the following upper bound for the interpolation error in the local U-norm:

Corollary 2.5

With

$$\begin{aligned} \Vert \textbf{v}\Vert _{U,P}:=\sqrt{\Vert \nabla _\textbf{x}v_1\Vert ^2_{L_2(P)^d}+\Vert \textbf{v}_2\Vert _{L_2(P)^d}^2+\Vert {{\,\textrm{div}\,}}\textbf{v}\Vert _{L_2(P)}^2}, \end{aligned}$$
(2.5)

for \(h_P:=h_J \eqsim h_K\) and \(k \in \mathbb N\), it holds that

$$\begin{aligned} \Vert \textbf{v} -{\mathcal I}_{k-1,k}^P \textbf{v}\Vert _{U,P} \lesssim h_P^k \big (|\nabla _\textbf{x} v_1|_{H^k(P)^d}+|\textbf{v}_2|_{H^k(P)^d}+ |{{\,\textrm{div}\,}}\textbf{v}|_{H^k(P)}\big ). \end{aligned}$$

In particular, for \(\textbf{u}=(u_1,\textbf{u}_2)\) being the solution of (1.2), i.e., \(u_1=u\) being the solution of the heat equation, it holds that

$$\begin{aligned} \Vert \textbf{u} -{\mathcal I}_{k-1,k}^P \textbf{u}\Vert _{U,P} \lesssim h_P^k \left( |\nabla _\textbf{x} u|_{H^k(P)^d}+|\textbf{f}_2|_{H^k(P)^d}+ |f_1|_{H^k(P)}\right) . \end{aligned}$$
(2.6)

Remark 2.6

For comparison, to obtain a local error bound of order k with a trial space on a \((d+1)\)-simplex T, one needs a trial space that contains \({\mathcal P}_k(T)^{d+1}\). Then, with \(h_T:=|T|^{\frac{1}{d+1}}\), the bound on the local interpolation error in the \(\Vert \cdot \Vert _{U,P}\)-norm reads as \(h_T^k |\textbf{v}|_{H^{k+1}(T)^{d+1}}\), and so for \(\textbf{v}=\textbf{u}\), as

$$\begin{aligned} h_T^k \left( |u|_{H^{k+1}(T)}+|\nabla _\textbf{x}u|_{H^{k+1}(T)^d}+|\textbf{f}_2|_{H^{k+1}(T)^d}\right) , \end{aligned}$$

which requires more smoothness of u than the bound (2.6). As was already mentioned in the introduction, for these simplicial elements the addition of bubbles in order to realize a commuting diagram for the divergence operator does not give rise to better results.

3 Global finite element space

3.1 Conforming partition into prisms

Let \({\mathcal P}\) be an (essentially) disjoint partition of the time–space cylinder \({\overline{Q}}\) into prisms \(P=J \times K\) for closed intervals \(J \subset I\) and uniformly shape-regular closed d-simplices \(K \subset \Omega \). In this subsection we assume that \({\mathcal P}\) is conforming in the sense that if the intersection of two different prisms from \({\mathcal P}\) has a positive d-dimensional measure, then these prisms share a common facet. This means that \({\mathcal P}\) has to be a Cartesian product of an essentially disjoint partition \({\mathcal J}\) of \({\overline{I}}\) into subintervals J and an essentially disjoint conforming partition \({\mathcal K}\) of \({\overline{\Omega }}\) into closed d-simplices K. For \(K \in {\mathcal K}\), we set \(\omega _K=\omega _K^{\mathcal K}:=\cup \{K'\!\! \in \!{\mathcal K}:K' \cap K \ne \emptyset \}\).

Given \(\ell \in \mathbb N_0\), \(k \in \mathbb N\), we define the global finite element space

$$\begin{aligned} {\mathcal S}_{\ell ,k}({\mathcal P})=\{\textbf{v}\in U:\textbf{v}|_P \in {\mathcal S}_{\ell ,k}(P) \,(P \in {\mathcal P})\}. \end{aligned}$$

Our aim is to construct a suitable interpolation operator into \({\mathcal S}_{\ell ,k}({\mathcal P})\). For some \(s>2\) (\(s \ge 2\) when \(d=1\)), let \(\textbf{v} \in C({\overline{I}};L_1(\Omega )) \times L_1\big (I;L_s(\Omega )^d \cap H({{\,\textrm{div}\,}};\Omega )\big )\). The piecewise interpolation operator

$$\begin{aligned} {\mathcal I}_{\ell ,k}^{{\mathcal P}}={\mathcal I}_{\ell ,k}^{{\mathcal P},1}\times {\mathcal I}_{\ell ,k}^{{\mathcal P},2}:= \textbf{v}\mapsto \left( {\mathcal I}_{\ell ,k}^{P,1} {v_1}|_{P},{\mathcal I}_{\ell ,k}^{P,2} {\textbf{v}_2}|_{P}\right) _{P \in {\mathcal P}} \end{aligned}$$

is, although it maps into \(H({{\,\textrm{div}\,}};Q)\), not appropriate since \({{\,\textrm{ran}\,}}{\mathcal I}_{\ell ,k}^{{\mathcal P},1} \not \subset L_2(I;H^1_0(\Omega ))\). Recalling that \({\mathcal I}_{\ell ,k}^{P,1}=\rho _\ell ^J \otimes Q_k^K\), we therefore replace \({\mathcal I}_{\ell ,k}^{{\mathcal P}}\) by

$$\begin{aligned} \breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}}= \breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P},1} \times {\mathcal I}_{\ell ,k}^{{\mathcal P},2} \end{aligned}$$
(3.1)

with \(\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P},1}\) being defined by the replacement of the piecewise orthogonal projector \(w \mapsto (Q_k^K w|_K)_K\) by the operator \(\breve{Q}_k\) defined as this piecewise orthogonal projector followed by averaging at the nodal pointsFootnote 4 shared by multiple d-simplices K in the partition of \(\Omega \), or setting the value to zero at nodal points on \(\partial \Omega \). Since a piecewise polynomial on \(\Omega \) is in \(H^1_0(\Omega )\) if (and only if) it is continuous and vanishes at \(\partial \Omega \), the mapping \(\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}}\) is a projector onto \({\mathcal S}_{\ell ,k}({\mathcal P})\).

It holds that for \(K \in {\mathcal K}\), \(\Vert \breve{Q}_k w\Vert _{L_2(K)} \lesssim \Vert w\Vert _{L_2(\omega _K)}\), and for \(n \in \{0,1\}\), \(m \in \{1,\ldots ,k+1\}\), and w that vanishes at \(\partial \Omega \),

$$\begin{aligned} |({\hbox {Id}}-\breve{Q}_k)w|_{H^n(K)} \lesssim h_K^{m-n} |w|_{H^{m}(\omega _K)}. \end{aligned}$$
(3.2)

The latter inequality follows by the usual homogeneity and polynomial reproduction arguments after realizing that, when applied to w with \(w|_{\partial \Omega }=0\), the definition of \(\breve{Q}_k\) does not change when the zero DOFs associated to nodal points on \(\partial \Omega \) are replaced by the usual Scott–Zhang functionals in terms of a weighted integrals of \(w|_{\partial \Omega }\) (see [13, Appendix]). A comprehensive analysis of quasi-interpolators constructed by averaging can be found in [9].

Proposition 3.1

For any \(k' \in \{0,\ldots ,k\}\), \(k'' \in \{0,\ldots ,k+1\}\), \(\ell ' \in \{0,\ldots ,\ell +1\}\), and \(P \in {\mathcal P}\) it holds that

$$\begin{aligned} \big \Vert {{\,\textrm{div}\,}}\big (\textbf{v} -\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}} \textbf{v}\big )\big \Vert _{L_2(P)}&\lesssim h_J^{\ell '} |{{\,\textrm{div}\,}}\textbf{v}|_{{H^{\ell '}(J)\otimes L_2(K})}\\&\quad + h_K^{k''} \big ( |{{\,\textrm{div}\,}}\textbf{v}|_{L_2(J)\otimes H^{k''}(K)}+ | v_1|_{H^1(J)\otimes H^{k''}(\omega _K)}\big ),\\ \big \Vert \nabla _\textbf{x} \big (v_1 -\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P},1} v_1\big )\big \Vert _{L_2(P)^d}&\lesssim h_J^{\ell '+1} |v_1|_{H^{\ell '+1}(J)\otimes H^1(\omega _K)} + h_K^{{k'}} | v_1|_{L_2(J)\otimes H^{{k'}+1}(\omega _K)},\\ \big \Vert \textbf{v}_2 -{\mathcal I}_{\ell ,k}^{{\mathcal P},2} \textbf{v}_2\big \Vert _{L_2(P)^d}&\lesssim h_J^{\ell '} |\textbf{v}_2|_{H^{\ell '}(J)\otimes L_2(K)^d} + h_K^{{k'}+1} |\textbf{v}_2|_{L_2(J)\otimes H^{{k'}+1}(K)^d}, \end{aligned}$$

for the first and second estimate assuming that \(v_1\) vanishes at \(I \times \partial \Omega \). The hidden constants depend only on the shape regularity of \({\mathcal K}\), the space dimension d, and the polynomial degrees \(\ell \) and k.

Proof

The third estimate follows directly from Corollary 2.3(iii).

Writing \(\text {Id}\otimes \text {Id}-\rho _\ell ^{J}\otimes \breve{Q}_k=(\text {Id}-\rho _\ell ^{J})\otimes \breve{Q}_k+\text {Id}\otimes (\text {Id}-\breve{Q}_k)\), the second estimate follows from the estimate for \(\text {Id}-\rho _\ell ^{J}\) that was already used in the proof of Corollary 2.3, and, for w that vanishes on \(\partial \Omega \), \(|\breve{Q}_k w|_{H^1(K)} \lesssim |w|_{H^1(\omega _K)}\), and \(|(\text {Id}-\breve{Q}_k)w|_{H^1(K)} \lesssim h_K^{{k'}} |w|_{H^{{k'}+1}(\omega _K)}\).

Writing \(({{\,\textrm{div}\,}}\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}}{} \textbf{v})|_P={{\,\textrm{div}\,}}{\mathcal I}_{\ell ,k}^{P}{} \textbf{v}|_P+(\partial _t \rho _\ell ^J \otimes (\breve{Q}_k-Q_k^K)v_1)|_P\), the first estimate follows from Corollary 2.3(i), \(\Vert (\rho _\ell ^J w)'\Vert _{L_2(J)}=\Vert Q_\ell ^J w'\Vert _{L_2(J)} \lesssim \Vert w'\Vert _{L_2(J)}\), and, for w that vanishes on \(\partial \Omega \), \(\Vert (\breve{Q}_k-Q_\ell ^K)w\Vert _{L_2(K)} \lesssim h_K^{k''} |w|_{H^{k''}(\omega _K)}\). \(\square \)

By taking \(\ell +1=k\), and \(\ell '=k=k''\), \(\ell '+1=k=k'\), and \(\ell '=k=k'+1\) in the first, second, and third bound of Proposition 3.1, respectively, we arrive at the following upper bound for the interpolation error in the local U-norm (2.5) in the isotropic case:

Corollary 3.2

For \(h_P:=h_J \eqsim h_K\) and \(k \in \mathbb N\), it holds that

$$\begin{aligned} \Vert \textbf{v} -\breve{{\mathcal I}}_{k-1,k}^{{\mathcal P}} \textbf{v}\Vert _{U,P} \lesssim h_P^k \big (|\nabla _\textbf{x} v_1|_{H^k(J \times \omega _K)^d}+|\textbf{v}_2|_{H^k(P)^d}+ |{{\,\textrm{div}\,}}\textbf{v}|_{H^k(P)}\big ) \end{aligned}$$
(3.3)

assuming that \(v_1\) vanishes at \(I \times \partial \Omega \).Footnote 5

In particular, for \(\textbf{u}\) being the solution of (1.2), it holds that

$$\begin{aligned} \Vert \textbf{u} -\breve{{\mathcal I}}_{k-1,k}^{{\mathcal P}} \textbf{u}\Vert _{U,P} \lesssim h_P^k \big (|\nabla _\textbf{x} u|_{H^k(J \times \omega _K)^d}+|\textbf{f}_2|_{H^k(P)^d}+ |f_1|_{H^k(P)}\big ). \end{aligned}$$

Remark 3.3

The upper bound on \(\Vert {{\,\textrm{div}\,}}(\textbf{v} -\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}} \textbf{v})\Vert _{L_2(P)}\) from Proposition 3.1 does not depend on a norm of \(\textbf{v}_2\), but it does contain a term \(h_K^{k''} |v_1|_{H^1(J)\otimes H^{k''}(\omega _K)}\) meaning that the global interpolant \(\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P}}\) does not give rise to a (truly) commuting diagram. Considering the case that \(\ell +1=k\), this upper bound for \(\ell '=k=k''\) of order \(h_J^k+h_K^k\) involves an \(L_2\)-norm of a partial derivative of \(v_1\) of total order \(k+1\), of which one order is in time and the other k orders are in space.

Still for \(\ell +1=k\), the upper bound on \(\Vert \nabla _\textbf{x} (v_1 -\breve{{\mathcal I}}_{k-1,k}^{{\mathcal P},1} v_1)\Vert _{L_2(P)^d}\) for \(\ell '+1=k=k'\) from Proposition 3.1 of order \(h_J^k+h_K^k\) involves two \(L_2\)-norms of partial derivatives of \(v_1\) of total order \(k+1\), one that has k orders in time and one in space, and the other that has all \(k+1\) orders in space. For \(v_1=u\) being a (non-smooth) solution of the heat equation with a (relatively) smooth forcing term f, temporal derivatives of u can be expected to grow faster towards a singularity than spatial derivatives of the same order do. In view of this, the additional term in the upper bound of the interpolation error as a consequence of the absence of a truly commuting diagram is harmless. Because of this, in Corollary 3.2, we have bounded the three (local) \(L_2\)-norms of partial derivatives of \(v_1\) from the upper bounds of Proposition 3.1 by the single term \(|\nabla _\textbf{x} v_1|_{H^k(J \times \omega _K)^d}\).

3.2 General partition into prisms

When having the aim to allow local (adaptive) refinements, it is needed to consider prismatic partitions that are nonconforming. In this section, we therefore consider (essentially) disjoint partitions \({\mathcal P}\) of the time–space cylinder \({\overline{Q}}\) into prisms \(P=J \times K\), for closed intervals \(J \subset I\) and uniformly shape-regular closed d-simplices \(K \subset \Omega \), such that for any two different \(P, P' \in {\mathcal P}\), where \(P'=J' \times K'\), if \({{\,\textrm{meas}\,}}_d(P \cap P')>0\), then \(|J| \eqsim |J'|\) and \(|K| \eqsim |K'|\), and either P and \(P'\) share a common facet, or \(P \cap P'\) is a complete facet F of one of the two prisms, say P, and a proper subset of a facet \(F'\) of \(P'\). In the latter case we refer to \(P'\) (or \(F'\)) as the master of the slave P (or F). We assume that \({\mathcal P}\) is 1-irregular in the sense that if \(F' \subset P'\) is a master facet, then it has non-empty intersection with possible slave facets of \(P'\). A suitable refinement strategy generating partitions that satisfy all given requirements will be presented in Sect. 4.1. Facets that are parallel to the bottom \(\{0\}\times \Omega \) of the time–space cylinder will be referred to as horizontal facets, and the other ones as lateral facets. As in the case of having a conforming prismatic partition, we set

$$\begin{aligned} {\mathcal S}_{\ell ,k}({\mathcal P}):=\{\textbf{v}\in U:\textbf{v}|_P \in {\mathcal S}_{\ell ,k}(P) \,(P \in {\mathcal P})\}. \end{aligned}$$

We have to adapt the definition of the quasi-interpolant \(\breve{{\mathcal I}}^{{\mathcal P}}_{\ell ,k}=\breve{{\mathcal I}}_{\ell ,k}^{{\mathcal P},1} \times {\mathcal I}_{\ell ,k}^{{\mathcal P},2}\) from (3.1) to generally nonconforming partitions. We will construct an adapted quasi-interpolant, denoted by , that is a projector onto \({\mathcal S}_{\ell ,k}({\mathcal P})\) such that, for \(h_J \eqsim h_K\) and \(\ell +1=k\) (\(\ell =k \ge 1\) when \(d=1\)), an estimate like (3.3) from Corollary 3.2 is valid, albeit with stronger norms on \(\textbf{v}=(v_1,\textbf{v}_2)\) in the upper bound.

The definition will show that for all \(P \in {\mathcal P}\) for which the local patch \(\omega _P=\omega _P^{\mathcal P}:=\cup \{P'\in {\mathcal P}:P' \cap P \ne \emptyset \}\) is conforming, it holds that so that the favourable estimate (3.3) from Corollary 3.2 remains valid.

Although for the other \(P\in {\mathcal P}\) we will assume locally sufficient additional regularity of \(\textbf{v}\), a proof of the interpolation error in local \(\Vert \cdot \Vert _{U,P}\)-norm being of order \(h_P^k\) is nevertheless not immediate. Indeed, because \({\mathcal P}_k(J \times K)^{d+1}\) is not included in our local finite element space \({\mathcal S}_{k-1,k}(P)\), it is not a consequence of the Bramble–Hilbert lemma.Footnote 6

3.2.1 Definition and analyis of

Given \(\ell \in \mathbb N_0\), \(k \in \mathbb N\), we define the set of nodal points in \(P \in {\mathcal P}\) as the Cartesian product of the principal lattice of order \(\ell +1\) on J and the principal lattice of order k on K. A nodal point \(\nu \) in P that is not on a slave facet of P is called a free node in P. Notice that a function in \({\mathcal S}_{\ell ,k}(P)\) is uniquely determined by its values in the nodal points in P, and that the restriction of such a function to a facet of P is determined by its values in the nodal points on that facet.

Let \(v_1 \in C({\overline{I}};L_1(\Omega ))\), and let \(\nu \) be a node of P. For \(\nu \in P\setminus \partial P\), we set , and for \(\nu \in {\overline{I}} \times \partial \Omega \), we set . For \(\nu \in \partial P {\setminus } \partial \Omega \) being a free node in P, we define as the average of \(({\mathcal I}_{\ell ,k}^{P',1} v_1)(\nu )\) over all \(P' \in {\mathcal P}\) in which \(\nu \) is a free node. In the remaining case of \(\nu \) being on a slave facet F of P with master facet \(F'\) of \(P'\), we set equal to the value at \(\nu \) of . Note that the condition of \({\mathcal P}\) being 1-irregular ensures that all nodes of \(P'\) that are on \(F'\) are free so that the latter term is well-defined.

By construction, is continuous and vanishes at \(I \times \partial \Omega \). The same arguments used to demonstrate (3.2) together with standard inverse inequalities show that for \(P=J \times K\), \(j \in \{0,\ldots ,\min (\ell +1,k)\}\), and \(v_1\) that vanishes at \(I \times \partial \Omega \),

(3.4)
(3.5)

Because we are going to estimate

(3.6)

here we notice that similarly to (3.4), for \(v_1\) that vanishes at \(I \times \partial \Omega \) it holds that

(3.7)

3.2.2 Definition and analysis of

Let \(d \ge 2\). For some \(s>2\), let \(\textbf{v}_2 \in L_1\big (I;L_s(\Omega )^d\) \(\cap H({{\,\textrm{div}\,}};\Omega )\big )\). Recalling that \({\mathcal I}_{\ell ,k}^{{\mathcal P},2} \textbf{v}_2=({\mathcal I}_{\ell ,k}^{P,2} \textbf{v}_2|_P)_{P \in {\mathcal P}}\), to ensure \(L_2(I;H({{\,\textrm{div}\,}};\Omega ))\)-conformity of , for F being a lateral slave facet of P with corresponding master \(P'\), in the DoFs of type \(\iint _{F} \textbf{v}_2 (t,\textbf{x})\cdot \textbf{n}_\textbf{x} \,q(\textbf{x}) p(t)\,d\textbf{x}\,dt\) that define \({\mathcal I}_{\ell ,k}^{P,2} \textbf{v}_2|_P\) we replace \(\textbf{v}_2|_F \cdot \textbf{n}_\textbf{x}\) by \(({\mathcal I}_{\ell ,k}^{P',2} \textbf{v}_2|_{P'})|_F \cdot \textbf{n}_\textbf{x}\). Below, we derive local bounds for the quasi-interpolation error.

The DoFs for \( RT _k(K)\) given in (2.3) define a local quasi-interpolator as the sum over these DoFs of the product of the DoF and the associated dual basis function of \( RT _k(K)\). If we restrict this sum to those DoFs that are associated to a facet e of K, then the resulting operator, which we denote by \({\mathcal I}_e\), satisfies

$$\begin{aligned} \Vert {\mathcal I}_e \textbf{w}\Vert _{L_2(K)^d} \lesssim h_K^{d(\frac{1}{2}-\frac{1}{s})} \Vert \textbf{w}\Vert _{L_s(K)^d}+h_K \Vert {{\,\textrm{div}\,}}\textbf{w}\Vert _{L_2(K)}, \end{aligned}$$

as shown in [10, Ch. 17, (17.20)]. Since \({\mathcal I}_e \textbf{w}\) depends on the normal trace of \(\textbf{w}\) on e only, equally well it holds that

$$\begin{aligned} \Vert {\mathcal I}_e \textbf{w}\Vert _{L_2(K)^d} \lesssim h_{K}^{d(\frac{1}{2}-\frac{1}{s})} \Vert \textbf{w}\Vert _{L_s(K'')^d}+h_K \Vert {{\,\textrm{div}\,}}\textbf{w}\Vert _{L_2(K'')}, \end{aligned}$$

where \(K''\) is another uniformly shape-regular d-simplex that has e as a face. By an application of an inverse inequality, it also shows that

$$\begin{aligned} \Vert {{\,\textrm{div}\,}}{\mathcal I}_e \textbf{w}\Vert _{L_2(K)} \lesssim h_{K}^{d(\frac{1}{2}-\frac{1}{s})-1} \Vert \textbf{w}\Vert _{L_s(K'')^d}+ \Vert {{\,\textrm{div}\,}}\textbf{w}\Vert _{L_2(K'')}. \end{aligned}$$
(3.8)

By extending this to the local quasi-interpolator on \(P=J \times K\), with \({\mathcal P}_\ell (J) \otimes RT _k(K)\) equipped with tensor DoFs built from (2.2) and (2.3), and lateral facet \(F=J \times e\), we obtain for the restriction of this quasi-interpolator to the tensor DoFs associated to F, which we denote by \({\mathcal I}_F\),

$$\begin{aligned} \Vert {\mathcal I}_F \textbf{w}\Vert _{L_2(P)^d} \lesssim h_{K}^{d(\frac{1}{2}-\frac{1}{s})} \Vert \textbf{w}\Vert _{L_2(J;L_s(K'')^d)}+h_K \Vert {{\,\textrm{div}\,}}_\textbf{x} \textbf{w}\Vert _{L_2(J \times K'')}, \end{aligned}$$
(3.9)

and

$$\begin{aligned} \Vert {{\,\textrm{div}\,}}_\textbf{x} {\mathcal I}_F \textbf{w}\Vert _{L_2(P)} \lesssim h_{K}^{d(\frac{1}{2}-\frac{1}{s})-1} \Vert \textbf{w}\Vert _{L_2(J;L_s(K'')^d)}+ \Vert {{\,\textrm{div}\,}}_\textbf{x} \textbf{w}\Vert _{L_2(J \times K'')}. \end{aligned}$$
(3.10)

To proceed, it suffices to consider the situation that \(P=J \times K\) has one lateral slave facet \(J \times e\), where \(P'=J' \times K'\) denotes the corresponding master. Then we can select a uniformly shape-regular d-simplex \(K'' \subset K'\) that has facet e. Recalling the definition of , the application of (3.9) with \(\textbf{w}=(\text {Id}- {\mathcal I}_{\ell ,k}^{P',2})\textbf{v}_2\) yields

(3.11)

We estimate both terms in the upper bound separately.

Recalling that \({\mathcal I}_{\ell ,k}^{P',2}=Q_\ell ^{J'} \otimes \rho _k^{K'}\), from \({{\,\textrm{div}\,}}\rho _k^{K'}=Q_k^{K'} {{\,\textrm{div}\,}}\) we have

$$\begin{aligned} {{\,\textrm{div}\,}}_\textbf{x} \left( \text {Id}- {\mathcal I}_{\ell ,k}^{P',2}\right) =\left( \left( \text {Id}-Q_\ell ^{J'}\right) \otimes \text {Id}+Q_\ell ^{J'} \otimes \left( \text {Id}-Q_k^{K'}\right) \right) {{\,\textrm{div}\,}}_\textbf{x}, \end{aligned}$$

which for \(\ell '' \in \{0,\ldots ,\ell +1\}\) and \(k'' \in \{0,\ldots ,k+1\}\) yields

$$\begin{aligned} \begin{aligned}&\left\| {{\,\textrm{div}\,}}_\textbf{x} \left( \text {Id}- {\mathcal I}_{\ell ,k}^{P',2}\right) \textbf{v}_2\right\| _{L_2(P')}\\&\qquad \lesssim h_{J'}^{\ell ''} |{{\,\textrm{div}\,}}_\textbf{x} \textbf{v}_2|_{H^{\ell ''}(J')\otimes L_2(K')} + h_K^{k''} |{{\,\textrm{div}\,}}_\textbf{x} \textbf{v}_2|_{L_2(J')\otimes H^{k''}(K')}. \end{aligned} \end{aligned}$$
(3.12)

With the aid of (2.4), valid for \(s \in [1,\infty ]\), one shows that for \(k' \in \{0,\ldots ,k\}\) and \(\ell ' \in \{0,\ldots ,\ell +1\}\),

$$\begin{aligned} \begin{aligned}&\left\| \left( \text {Id}- {\mathcal I}_{\ell ,k}^{P',2}\right) \textbf{v}_2\right\| _{L_2(J';L_s(K')^d)} \\&\qquad \lesssim h_J^{\ell '} |\textbf{v}_2|_{H^{\ell '}(J')\otimes L_s(K')^d} + h_K^{{k'}+1} |\textbf{v}_2|_{L_2(J')\otimes W_s^{{k'}+1}(K')^d}, \end{aligned} \end{aligned}$$
(3.13)

which generalizes this inequality for \(s=2\) from Corollary 2.3(iii).

By estimating

and using Corollary 2.3(iii) to bound the first term, in combination with (3.11), (3.12), and (3.13) to bound the second one, we conclude that for \(s > 2\), \(k''' \in \{0,\ldots ,k\}\), and \(\ell ''' \in \{0,\ldots ,\ell +1\}\),

(3.14)

The application of (3.10) with \(\textbf{w}=(\text {Id}- {\mathcal I}_{\ell ,k}^{P',2})\textbf{v}_2\) gives

(3.15)

Using the bounds (3.13) and (3.12) for the norms at the right hand side, we conclude that

(3.16)

3.2.3 Wrap-up for the isotropic case

By taking \(\ell +1=k\) in the isotropic case of \(h_P:=h_J \eqsim h_K\), from (3.5) we have for \(v_1\) that vanishes at \(I \times \partial \Omega \),

(3.17)

Recalling our assumption from Sect. 3.2.2 that \(d \ge 2\), by taking \(s=s(d)=(\frac{1}{2}-\frac{1}{d})^{-1} \in (2,\infty ]\) and with \({\tilde{\Omega }}_P={\tilde{\Omega }}_P^{\mathcal P}:=\{P' \in {\mathcal P}:P \text { has a lateral slave facet on } P'\}\)

(3.18)

Finally, the combination of (3.6), Corollary 2.3(i), (3.7), and (3.16) shows that for \(v_1\) that vanishes at \(I \times \partial \Omega \),

(3.19)

In conclusion: Also for \(P \in {\mathcal P}\) for which the local patch \(\omega _P^{\mathcal P}\) is nonconforming, the sum of the upper bounds in (3.17), (3.18), and (3.19) gives an upper bound for of order \(h_P^k\), albeit under stronger smoothness assumptions than needed for (3.3) from Corollary 3.2 applicable when \(\omega _P^{\mathcal P}\) is conforming.

3.2.4 The case \(d=1\)

So far we have excluded the case \(d=1\) from the definition and analysis of in Sect. 3.2.2, and so consequently from our conclusions presented in Sect. 3.2.3. Everything that was presented in Sect. 3.2.2, however, is also valid for \(d=1\), where the condition \(s>2\) can even be read as \(s\ge 2\). Different from the \(d \ge 2\) case, however, for \(d=1\) the value of \(s\in [2,\infty ]\) cannot be chosen such that the exponent \(d(\frac{1}{2}-\frac{1}{s})-1\) in (3.8) is equal to 0. Via (3.10), (3.14), and (3.15), it means that in (3.19) we do not obtain an upper bound of order \(h_P^k\) unless we can take \(\ell '=k+1\) (more precisely \(\ell '=k+\frac{1}{2}+\frac{1}{s}\)) in (3.13). The latter however requires \(\ell =k\) instead of \(\ell +1=k\).

Taking therefore \(\ell =k \ge 1\), i.e., \({\mathcal S}_{k,k}({\mathcal P})\) instead of \({\mathcal S}_{k-1,k}({\mathcal P})\) that was found most appropriate for the case of conforming partitions, and \(s=2\), similar to (3.17) and (3.18), we obtain

whereas instead of (3.19) we have

In our numerical experiments, however, we did not observe better results for \({\mathcal S}_{k,k}({\mathcal P})\) than for \({\mathcal S}_{k-1,k}({\mathcal P})\), and therefore we present only results for the latter option.

4 Numerical results

4.1 Adaptive algorithm

In the following numerical examples, we consider the least-squares formulation (1.2) of the heat equation (1.1) for \(d=1\) and \(d=2\). On prismatic partitions \(\mathcal P^\delta \) of the space–time cylinder \(Q=I\times \Omega \) consisting of isotropic elements of the form \(P=J \times K\) for closed intervals \(J \subset I\) and closed d-simplices \(K \subset \Omega \), we compute the corresponding least-squares approximation \(\textbf{u}^\delta :={{\,\textrm{argmin}\,}}_{\textbf{v} \in {\mathcal {S}}_{0,1}(\mathcal P^{\delta })}\Vert \textbf{f}-G \textbf{v}\Vert _L\) (i.e., \(\ell +1=k\) as suggested in Remark 3.3, and the lowest order case \(k=1\)). For convenience of the reader, we recall the definition of the discrete space

$$\begin{aligned} {\mathcal S}_{0,1}({\mathcal P}^\delta )=\{\textbf{v}\in U:\textbf{v}|_P \in \big ({\mathcal P}_{1}(J) \otimes {\mathcal P}_1(K)\big ) \times \big ({\mathcal P}_0(J) \otimes RT _1(K)\big ) \,(P \in {\mathcal P}^\delta )\}, \end{aligned}$$

where \( RT _1(K)\) should be read as \({\mathcal P}_2(K)\) in the case that \(d=1\). Note that \(\textbf{u}^\delta \) can simply be computed as the solution of the finite-dimensional linear system

$$\begin{aligned} \langle G\textbf{u}^\delta , G\textbf{v} \rangle _L = \langle \textbf{f}, G \textbf{v} \rangle _L \quad (\textbf{v}\in {\mathcal {S}}_{0,1}(\mathcal P^\delta )), \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle _L\) denotes the scalar product on the \(L_2\)-type space \(L=L_2(Q)\times L_2(Q)^d\times L_2(\Omega )\).

To measure the error, we use the reliable and efficient a posteriori error estimator

$$\begin{aligned} \eta (\textbf{f},\textbf{v}):= \Vert \textbf{f} - G \textbf{v} \Vert _{L} \eqsim \Vert \textbf{u} - \textbf{v}\Vert _U \quad (\textbf{v}\in U), \end{aligned}$$

with corresponding local error indicators

$$\begin{aligned} \eta (P; \textbf{f}, \textbf{v}):= \Vert \textbf{f} - G \textbf{v} \Vert _{L(P)} \quad (P\in \mathcal P^\delta ), \end{aligned}$$

where

$$\begin{aligned} L(P) := L_2(P)\times L_2(P)^d \times L_2(\partial P \cap (\{0\}\times \Omega )). \end{aligned}$$

While the a priori estimates of Sect. 3 only guarantee the convergence rate \({\mathcal {O}}(\hbox {dofs}^{-\frac{1}{d+1}})\) for a sequence of quasi-uniform meshes and sufficiently smooth solution \(\textbf{u}\), we will also investigate adaptive refinement of elements \({\mathcal {M}}^\delta \subseteq \mathcal P^\delta \) with “large” error indicator for singular solutions \(\textbf{u}\). We use Dörfler marking \( \theta \eta (\textbf{f},\textbf{u}^\delta )^2 \le \sum _{P\in {\mathcal {M}}^\delta }\eta (P;\textbf{f}, \textbf{u}^\delta )^2\) with some \(0<\theta \le 1\) to determine the marked elements.

Starting from a conforming initial partition of \({\overline{Q}}\) into (closed) prisms, we consider the following refinement procedure: In the spatial direction a d-simplex to be refined is decomposed into \(2^d\)-simplices by some rule such that a uniform refinement of a conforming partition of \({\overline{\Omega }}\) results in a conforming decomposition, and a repeated application of this rule produces uniformly shape-regular simplices, and we combine this rule with bisection in the temporal direction. Now the level of each prism that is produced is defined as the number of these product decompositions needed to create the prism from a prism in the initial partition, whose level is set to zero. All prisms that can be produced in this way are uniformly shape-regular, and so in particular (uniformly) isotropic.

In our experiments for spatial domains of dimension \(d=1\) or \(d=2\), as the rule to refine a d-simplex we applied bisection or ‘red’-refinement, respectively.

In the case of local refinements, apart from the prisms marked for refinement, a minimal number of other prisms are refined such that the difference in levels of any two prisms in the new partition that have non-empty intersection is less or equal to one. This condition guarantees that all partitions that can be created are 1-irregular.

Overall, we thus consider the following adaptive algorithm.

Algorithm 4.1

Input: Right-hand side \(\textbf{f}\in L\), initial conforming mesh \(\mathcal P^0=\mathcal P^{\delta _0}\), Dörfler parameter \(0<\theta \le 1\).

Loop: For each \(j=0,1,2,\dots \), iterate the following steps (i)–(iv):

  1. (i)

    Compute least-squares approximation \(\textbf{u}^j=\textbf{u}^{\delta _j} = {{\,\textrm{argmin}\,}}_{\textbf{v} \in {\mathcal {S}}_{0,1}(\mathcal P^{j})}\Vert \textbf{f}-G \textbf{v}\Vert _L\) of \(\textbf{u}=(u,-\nabla _\textbf{x} u-\textbf{f}_2)\).

  2. (ii)

    Compute error indicators \(\eta ({P;\textbf{f}, \textbf{u}^j})\) for all elements \({P}\in \mathcal P^j=\mathcal P^{\delta _j}\).

  3. (iii)

    Determine a minimal set of marked elements \({\mathcal {M}}^j\subseteq \mathcal P^j\) satisfying Dörfler marking

    $$\begin{aligned} \theta \,\eta (\textbf{f},\textbf{u}^j)^2 \le \sum _{P\in {\mathcal {M}}^j}\eta (P;\textbf{f}, \textbf{u}^j)^2. \end{aligned}$$
  4. (iv)

    Generate refined prismatic mesh \(\mathcal P^{j+1}\) by refining at least all marked elements \({\mathcal {M}}^j\) as explained above.

Output: Refined meshes \(\mathcal P^j\), corresponding exact discrete solutions \(\textbf{u}^j\), and error estimators \(\eta (\textbf{f}, \textbf{u}^j)\) for all \(j \in \mathbb N_0\).

For comparison, we also consider the adaptive algorithm of [11], i.e., Algorithm 4.1 for conforming simplicial meshes \(\mathcal T^\delta \) instead of prismatic meshes \(\mathcal P^\delta \), refined by newest vertex bisection [19], where, as in [11], we refine marked elements into \(2^{d+1}\) elements by repeated bisection, with associated finite element space

$$\begin{aligned} \{\textbf{v}\in C^0(Q)\times C^0(Q)^d:\textbf{v}|_T \in {\mathcal P}_1(T) \times {\mathcal P}_1(T)^d \,(T \in \mathcal T^\delta ), v_1|_{I\times \partial \Omega } = 0\}\subset U. \end{aligned}$$

Although, we present numerical comparisons only for some of the test examples of [11] (plus some new ones), we emphasize that we computed numerical results for all these examples and observed that in each case our prismatic finite element space \({\mathcal S}_{0,1}({\mathcal P}^\delta )\) gave either better or the same convergence rates of the estimator, for both uniform refinement, where in each step all elements are marked for refinement, and adaptive refinement.

Remark 4.2

In [14], we proved plain convergence

$$\begin{aligned} \lim _{j\rightarrow \infty }\eta (\textbf{f},\textbf{u}^j) = 0= \lim _{j\rightarrow \infty }\Vert {\textbf{u}-\textbf{u}}^j\Vert _U \end{aligned}$$

of the adaptive algorithm from [11], even for arbitrary polynomial degree of the employed discrete functions. Exploiting the local approximation properties derived in Sect. 3.2, the proof easily extends to Algorithm 4.1 (with prismatic meshes), even for arbitrary polynomial degrees \(\ell \in \mathbb N_0\), \(k\in \mathbb N\) in the discrete space \({\mathcal {S}}_{\ell ,k}(\mathcal P^j)\) (instead of \({\mathcal {S}}_{0,1}(\mathcal P^j)\)). Convergence is even satisfied if the marking step (iii) is replaced by determining a (not necessarily minimal) set of marked elements \({\mathcal {M}}^j\subseteq \mathcal P^j\) with the property

$$\begin{aligned} \max _{P\in \mathcal P^j\setminus {\mathcal {M}}^j} \eta (P;\textbf{f}, \textbf{u}^j) \le M(\max _{P\in {\mathcal {M}}^j} \eta (P;\textbf{f}, \textbf{u}^j)) \end{aligned}$$

for some fixed marking function \(M:[0,\infty )\rightarrow [0,\infty )\) that is continuous at 0 with \(M(0)=0\).

4.2 Experiments in 1 + 1D

For the following experiments, we fix the domain \(\Omega :=(0,1)\) and the end time point \(T:=1\). As initial prismatic mesh of the resulting space–time cylinder Q, we take \(\mathcal P^0=\{{{\overline{Q}}}\}\). For the initial triangular mesh \(\mathcal T^0\), we split \({{\overline{Q}}}\) into four triangles. We consider uniform refinement, where in each step all elements are marked for refinement, and adaptive refinement with Dörfler parameter \(\theta =0.5\).

4.2.1 Non-matching initial datum

We consider [11, Example 4], i.e.,

$$\begin{aligned} \textbf{f}=(f_1,\textbf{f}_2,u_0)=(2,0,1). \end{aligned}$$

Note that the initial datum does not match with the homogenous Dirichlet boundary conditions so that the solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\) are singular for \((t,x)\in \{(0,0),(0,1)\}\). Figure 1 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement vs. the degrees of freedom. The rates 0.08 and 0.17 for simplicial meshes have been already reported in [11]. For prismatic meshes, we observe significantly improved rates 0.13 and 0.43. As mentioned before, the optimal rate for smooth solutions would be 1/2.

Fig. 1
figure 1

Convergence plot of Sect. 4.2.1 for spatial domain \(\Omega =(0,1)\), right-hand side \((f_1,\textbf{f}_2,u_0) = (2,0,1)\), simplicial and prismatic meshes, and uniform and adaptive refinement

4.2.2 Initial datum with a “singularity” in the interior

We consider [11, Example 2], i.e.,

$$\begin{aligned} (f_1,\textbf{f}_2,u_0) = \big (1,0,x\mapsto 1-2\big |x-\tfrac{1}{2}\big |\big ). \end{aligned}$$

Clearly, the (at \(x=\tfrac{1}{2}\)) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\). Figure 2 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom. The rates 0.25 and 0.42 for simplicial meshes have been already observed in [11]. Only adaptive refinement for prismatic meshes leads to the optimal rate 1/2, while uniform refinement yields the rate 0.38.

Fig. 2
figure 2

Convergence plot of Sect. 4.2.2 for spatial domain \(\Omega =(0,1)\), right-hand side \((f_1,\textbf{f}_2,u_0) = \big (1,0,x\mapsto 1-2\big |x-\tfrac{1}{2}\big |\big )\), simplicial and prismatic meshes, and uniform and adaptive refinement

4.2.3 Initial datum with a “singularity” at the boundary

We consider

$$\begin{aligned} (f_1,\textbf{f}_2,u_0) = (0,0,x\mapsto x^{1/2}(1-x)) \end{aligned}$$

Clearly, the (at \(x=0\)) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\). Figure 3 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom. As in the previous example, only adaptive refinement for prismatic meshes leads to the optimal rate 1/2, while uniform refinement yields the rate 0.26. For simplicial meshes, we observe the rates 0.19 and 0.33.

Fig. 3
figure 3

Convergence plot of Sect. 4.2.3 for spatial domain \(\Omega =(0,1)\), right-hand side \((f_1,\textbf{f}_2,u_0) = (0,0,x\mapsto x^{1/2}(1-x))\), simplicial and prismatic meshes, and uniform and adaptive refinement

4.3 Experiments in 2 + 1D

For the following experiments, we fix the domain \(\Omega :=(0,1)^2\) and the end time point \(T:=1\). The initial prismatic mesh of the resulting space–time cylinder Q is obtained by splitting \({{\overline{Q}}}\) into two prisms with triangular base. For the initial tetrahedral mesh \(\mathcal T^0\), we split \({{\overline{Q}}}\) into twelve tetrahedra. We consider uniform refinement, where in each step all elements are marked for refinement, and adaptive refinement with Dörfler parameter \(\theta =0.5\).

4.3.1 Non-matching initial datum

We consider [11, Example 7], i.e.,

$$\begin{aligned} \textbf{f}=(f_1,\textbf{f}_2,u_0)=(0,0,1). \end{aligned}$$

Note that the initial datum does not match with the homogenous Dirichlet boundary conditions so that the solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\) are singular for \((t,x)\in \{0\}\times \partial \Omega \). Figure 4 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement vs. the degrees of freedom. The rates 0.07 and 0.08 for simplicial meshes have been already reported in [11]. For prismatic meshes, we observe somewhat improved rates 0.09 and 0.14. As mentioned before, the optimal rate for smooth solutions would be 1/3. The reason for the low rates even for adaptively refined prismatic meshes is likely due to the edge singularities of the solution requiring anisotropic refinement in both space and time.

Fig. 4
figure 4

Convergence plot of Sect. 4.3.1 for spatial domain \(\Omega =(0,1)^2\), right-hand side \((f_1,\textbf{f}_2,u_0) = (0,0,1)\), simplicial and prismatic meshes, and uniform and adaptive refinement

4.3.2 Initial datum with a “singularity” at an interior point

We consider

$$\begin{aligned} (f_1,\textbf{f}_2,u_0) = \big (0,0,\textbf{x}\mapsto \sqrt{(x_1-\tfrac{1}{2})^2+(x_2-\tfrac{1}{2})^2} \sin (\pi x_1) \sin (\pi x_2)\big ). \end{aligned}$$

Clearly, the (at \(\textbf{x}=(\tfrac{1}{2},\tfrac{1}{2})\)) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\). Figure 5 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom. Only adaptive refinement for prismatic meshes leads to the optimal rate 1/3, while uniform refinement yields the rate 0.27. For simplicial meshes, we observe the rates 0.13 and 0.18.

Fig. 5
figure 5

Convergence plot of Sect. 4.3.2 for spatial domain \(\Omega =(0,1)^2\), right-hand side \((f_1,\textbf{f}_2,u_0) = \big (0,0, \textbf{x}\mapsto \sqrt{(x_1-\tfrac{1}{2})^2+(x_2-\tfrac{1}{2})^2} \sin (\pi x_1) \sin (\pi x_2)\big ) \), simplicial and prismatic meshes, and uniform and adaptive refinement

4.3.3 Initial datum with a “singularity” at a boundary edge

We consider

$$\begin{aligned} (f_1,\textbf{f}_2,u_0) = (0,0,\textbf{x}\mapsto x_1^{3/4}(1-x_1)x_2(1-x_2)) \end{aligned}$$

Clearly, the (at \(x_1=0\)) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated \(\textbf{u}=(u,-\nabla _x u)\). Figure 6 displays the resulting error estimators \(\eta \) for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom. As in the previous example, only adaptive refinement for prismatic meshes leads to the optimal rate 1/3, while uniform refinement yields the rate 0.3. For simplicial meshes, we observe the rates 0.23 and 0.27.

Fig. 6
figure 6

Convergence plot of Sect. 4.3.3 for spatial domain \(\Omega =(0,1)^2\), right-hand side \((f_1,\textbf{f}_2,u_0) = (0,0,\textbf{x}\mapsto x_1^{3/4}(1-x_1)x_2(1-x_2))\), simplicial and prismatic meshes, and uniform and adaptive refinement