1 Introduction

The support theorem for stochastic processes has a long history. One of its simplest forms can be phrased as follows: let \(B_t\) be a d-dimensional Brownian motion started from 0; then for any \(\epsilon ,t>0\) we have

$$\begin{aligned} \mathbb {P}(\sup _{s\le t}|B_s|<\epsilon )>0. \end{aligned}$$

This follows from the reflection principle of Brownian motion. Via a Girsanov change of measure, we can deduce that for any continuous \(\psi :[0,t]\rightarrow \mathbb {R}^d\) with \(\psi (0)=0\), we have

$$\begin{aligned} \mathbb {P}(\sup _{s\le t}|B_s-\psi (s)|<\epsilon )>0. \end{aligned}$$

See for example [3], pp. 59–60, (6.5) and (6.6). Both claims rely on the Gaussian structure of the process B.
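For completeness, here is a sketch of the Girsanov step, under the simplifying assumption that \(\psi \) is absolutely continuous with \(\dot{\psi }\in L^2([0,t];\mathbb {R}^d)\) (the general continuous case follows by uniform approximation). Define a new measure Q by

$$\begin{aligned} \frac{dQ}{d\mathbb {P}}=\exp \left( \int _0^t \dot{\psi }(s)\cdot dB_s-\frac{1}{2}\int _0^t|\dot{\psi }(s)|^2\,ds\right) , \end{aligned}$$

so that \(B_s-\psi (s)\) is a Brownian motion under Q. The first claim applied under Q gives \(Q(\sup _{s\le t}|B_s-\psi (s)|<\epsilon )>0\), and since Q and \(\mathbb {P}\) are mutually absolutely continuous, the same holds under \(\mathbb {P}\).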

There is another notion, usually called the Stroock–Varadhan support theorem, that has a different flavour. Consider the parabolic SPDE with periodic boundary conditions

$$\begin{aligned} \partial _t u(t,x)=\partial _x^2 u(t,x)+g(t,x,u)+\sigma (t,x,u)\dot{W}(t,x),\quad x\in [0,1]. \end{aligned}$$
(1.1)

Denote by \(\mathcal {H}:=\{h:[0,T]\times [0,1]\rightarrow \mathbb {R}: h \text { absolutely continuous}, \dot{h}\in L^2([0,T]\times [0,1])\}\) the Cameron–Martin space of the Brownian sheet; \(\mathcal {H}\) is a Hilbert space endowed with the norm \(\Vert h\Vert _{\mathcal {H}}:=\Vert \dot{h}\Vert _{L^2([0,T]\times [0,1])}\). For given \(h\in \mathcal {H}\), let S(h) denote the solution of (1.1) with \(\dot{h}\) in place of the white noise \(\dot{W}\). If we assume \(\sigma \) is Lipschitz, \(u(0,\cdot )\) is Hölder continuous and g has sufficiently many derivatives, it is proved in [2] that the topological support of the probability law of the random variable u in \(\mathcal {C}^\alpha ([0,T]\times [0,1])\) (for some \(\alpha \in (0,\frac{1}{4})\)) is given by the closure of \(\cup _{h\in \mathcal {H}}S(h)\). Here \(\mathcal {C}^\alpha ([0,T]\times [0,1])\) is the Wiener space equipped with the \(\alpha \)-Hölder topology.
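For concreteness, the control problem reads: \(S(h)\) is the (deterministic) solution of the PDE obtained by formally replacing the noise by \(\dot{h}\),

$$\begin{aligned} \partial _t S(h)(t,x)=\partial _x^2 S(h)(t,x)+g(t,x,S(h))+\sigma (t,x,S(h))\dot{h}(t,x),\quad S(h)(0,\cdot )=u(0,\cdot ). \end{aligned}$$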

Support theorems of this flavour have been proved for other classes of SPDEs, starting from [14]; see also [4, 5, 7]. The conclusion is always that the topological support of the solution u is the closure of the set of S(h), the solutions of the SPDE driven by the controls h in place of the noise. A support theorem for singular SPDEs has been obtained in [10], with the feature that the coset structure of the renormalization group plays a key role in characterizing the topological support of the solution. In all these works, the coefficients g and \(\sigma \) are nice enough that the control problem S(h) can be properly solved, and the support of (1.1) can be characterized in terms of its solutions.

In this paper we consider support theorems for SPDEs through the lens of the regularization by noise phenomenon. In the setting of (1.1), this means that the coefficients g or \(\sigma \) (or both) are not necessarily locally Lipschitz continuous. One usually requires \(\sigma \) to be uniformly elliptic, so that the roughness of the driving noise can restore well-posedness of the equation. We generally do not have a Stroock–Varadhan support theorem, since the ODE or PDE for the control process S(h) is in general not well-posed. However, it is still possible to prove that the solution has full support and to obtain small ball probability estimates.

For additive noise, i.e., \(\sigma =I_d\), and assuming the drift g is not too singular, we can remove the drift via a Girsanov transform and show that the solution has full support because white noise does. A related example for finite dimensional SDEs with singular drifts can be found in [12]. The story is the same when \(\sigma \) is Lipschitz continuous and uniformly elliptic.

We are particularly interested in the robustness of the support theorems, in the remaining (hardest) case that \(\sigma \) is only \(\alpha \)-Hölder continuous in u, for \(\alpha \in (0,1]\). More precisely, we assume that for some \(\mathcal {C}_1,\mathcal {C}_2,\mathcal {D}>0\) we have

$$\begin{aligned} |\sigma (t,x,u)-\sigma (t,x,v)|\le \mathcal {D}|u-v|^\alpha ,\quad \mathcal {C}_1\le |\sigma (t,x,u)|\le \mathcal {C}_2 \end{aligned}$$

for any \(t>0\), \(x\in [0,1]\) and \(u,v\in \mathbb {R}\). In this case, neither the Girsanov transform nor the control process S(h) yields the answer in the same way as before.

Before stating our support theorems, it is crucial to discuss well-posedness of (1.1). For \(\sigma \) \(\alpha \)-Hölder continuous in u with \(\alpha >\frac{3}{4}\), strong existence and strong uniqueness have been established in [13] (see also [11] for a different perspective, also with \(\alpha >\frac{3}{4}\)). In general we may consider probabilistic weak solutions to (1.1) without knowledge of weak uniqueness (weak existence follows from [9]). As long as \(\alpha >0\), the lower bound in our main theorem still holds; in particular, every weak solution to (1.1), considered as a probability law on \(\mathcal {C}([0,T]\times [0,1])\) (with the supremum norm), has full topological support, in the sense that the support is given by \(\{w(t,x)\in \mathcal {C}([0,T]\times [0,1]):w(0,x)=u(0,x)\}\).

Our strategy to characterize the support of (1.1) goes as follows. Since \(\sigma \) is non-degenerate, one expects that (1.1) lies in the same universality class as the linear stochastic heat equation

$$\begin{aligned} \partial _t u(t,x)=\partial _x^2u(t,x)+\dot{W}(t,x),\end{aligned}$$
(1.2)

which is also known as the Edwards–Wilkinson universality class. In this universality class we observe nontrivial limiting behavior under the 1:2:4 scaling relation \(u(t,x)\mapsto \epsilon ^{-1}u(\epsilon ^{-4}t,\epsilon ^{-2}x)\) (Footnote 1). From this scaling relation we can expect fairly sharp two-sided probability estimates for the solution of (1.1) on small scales, even when \(\sigma \) is not constant in u. Such computations have been carried out in [1], which obtains matching small ball probability estimates for solutions of (1.1) when \(\sigma \) is Lipschitz continuous in u with a sufficiently small Lipschitz constant.
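A quick heuristic (ignoring boundary conditions) explains the ratios 1:2:4: if u solves (1.2) and \(v(t,x):=\lambda u(at,bx)\), then using \(\dot{W}(at,bx)\overset{d}{=}(ab)^{-1/2}\dot{W}(t,x)\) we find that v solves, in law,

$$\begin{aligned} \partial _t v=\frac{a}{b^2}\,\partial _x^2 v+\lambda \sqrt{\frac{a}{b}}\,\dot{W}, \end{aligned}$$

so scale invariance forces \(a=b^2\) and \(\lambda =b^{-1/2}\); taking \(b=\epsilon ^2\) gives \(a=\epsilon ^4\) and \(\lambda =\epsilon ^{-1}\), i.e., amplitude, space and time scale in the ratio 1:2:4.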

When \(\sigma \) is merely Hölder continuous, we expect it to induce a highly nonlinear stretching of space and time, in the following sense. Suppose u is approximated by a Gaussian field \(u'\) on a space-time grid that respects the 1:2 parabolic scaling. When \(\sigma \) is Lipschitz with a small Lipschitz constant, one can prove via stochastic calculus that u stays close to \(u'\) on the desired scale; when \(\sigma \) is merely Hölder, we cannot prove that u stays close to \(u'\) on that scale with high probability. To overcome this, we adjust the 1:2 space-time parabolic scaling by reducing the temporal length scale while keeping the spatial scale fixed. Consequently, we still obtain nontrivial (upper and lower) probability estimates for fine scale properties of the solution, and this is already sufficient to prove the support theorem. The upper and lower bounds on small ball probabilities in this Hölder continuous case, however, do not have matching exponents in \(\epsilon \), reflecting the fact that the irregularity of \(\sigma \) induces a high order stretching of space and time. We note that for the linear stochastic heat equation (1.2), one can obtain small ball probabilities whose upper and lower bounds have matching exponents in \(\epsilon \); see [6], page 168, Theorem 5.1.

We now state the main theorem.

Denote by \(\mathcal {P}\) the predictable \(\sigma \)-field of the noise \(\dot{W}(t,x)\), generated by functions of the form \(f(x,t,\omega )=X(\omega )\cdot 1_A(x)\cdot 1_{(a,b]}(t)\), with \(A\subset [0,1]\) and X an \(\mathcal {F}_a\)-measurable random variable. We say \(h\in \mathcal {P}\mathcal {C}_b^2\) if h is \(\mathcal {P}\)-measurable and, almost surely, \(h,\partial _t h,\partial _x^2h\) are bounded by a fixed constant C.

An important remark before the statement: if \(\sigma \) is \(\alpha \)-Hölder in u for some \(\alpha >0\), then by a compactness argument probabilistic weak solutions to (1.3) always exist, see for example [9]. However, there is no proof in the literature that the solution is unique for general \(\alpha \). See Remark 1.3 for pathwise well-posedness results for \(\alpha \) in certain regimes.

Theorem 1.1

Consider the solution \(u(t,x)\in \mathbb {R}\) of the stochastic heat equation with Neumann boundary conditions on [0, 1]

$$\begin{aligned} \partial _t u(t,x)=\frac{1}{2}\partial _x^2 u(t,x)+g(t,x,u(t,x))+\sigma (t,x,u(t,x))\dot{W}(t,x),\quad u(0,x)=u_0(x).\end{aligned}$$
(1.3)

Assume \(u_0,h\in \mathcal {P}\mathcal {C}_b^2\) with

$$\begin{aligned} \sup _{x\in [0,1]}|u_0(x)-h(0,x)|<\epsilon /2, \end{aligned}$$

and that for some constants \(\mathcal {D},\mathcal {C}_1,\mathcal {C}_2>0\), \(\alpha \in (\frac{3}{4},1],\) we have

$$\begin{aligned}&|\sigma (t,x,u)-\sigma (t,x,v)|\le \mathcal {D}|u-v|^\alpha ,\\&\mathcal {C}_1\le |\sigma (t,x,u)|\le \mathcal {C}_2 \end{aligned}$$

for all \(x\in [0,1]\), \(u,v\in \mathbb {R}\) and \(t\ge 0\), and

$$\begin{aligned} \sup _{t>0,x\in [0,1],u\in \mathbb {R}}|g(t,x,u)|<\infty . \end{aligned}$$

Then for any \(\beta >2-\alpha \) we may find positive constants \(C_0,C_1,C_2,C_3\) and \(\epsilon _0\) depending on \(\beta ,\mathcal {C}_1,\mathcal {C}_2\) and \(\sup _{t,x,u}|g(t,x,u)|\), such that for any \(0<\epsilon <\epsilon _0\), we have

$$\begin{aligned} C_0\exp \left( -\frac{C_1 T}{\epsilon ^{2+4\beta }}\right) \le P\left( \sup _{0\le t\le T,x\in [0,1]}|u(t,x)-h(t,x)|\le \epsilon \right) \le C_2\exp \left( -\frac{C_3 T}{(1+\mathcal {D}^2)\epsilon ^{4+2\alpha }}\right) .\end{aligned}$$
(1.4)

If we only assume \(\alpha >0\), then the lower bound in (1.4) holds, that is,

$$\begin{aligned} C_0\exp \left( -\frac{C_1 T}{\epsilon ^{2+4\beta }}\right) \le P(\sup _{0\le t\le T,x\in [0,1]}|u(t,x)-h(t,x)|\le \epsilon ).\end{aligned}$$
(1.5)

Moreover, there exists a \(\mathcal {D}_0>0\) depending only on \(\mathcal {C}_1\), \(\mathcal {C}_2\) such that whenever \(\mathcal {D}<\mathcal {D}_0\), the estimates (1.4) and (1.5) also hold for \(\beta =2-\alpha \) (the values of the various numerical constants may change).
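As a consistency check on the exponents: when \(\alpha =1\) and \(\mathcal {D}<\mathcal {D}_0\), we may take \(\beta =2-\alpha =1\), and then

$$\begin{aligned} 2+4\beta =6=4+2\alpha , \end{aligned}$$

so the upper and lower bounds in (1.4) have matching powers of \(\epsilon \), consistent with the Lipschitz case with small Lipschitz constant treated in [1].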

Since h is arbitrary, this in particular implies that the solution u has full support on Wiener space with respect to the supremum norm. More precisely:

Corollary 1.2

Assume \(\alpha \in (0,1]\). Let \(\mu \) denote the probability law of any possible weak solution to the SPDE (1.3) on \(\mathcal {C}([0,T]\times [0,1]).\) Denote by

$$\begin{aligned} \Omega _{u_0}([0,T]\times [0,1])=\left\{ w(t,x)\in \mathcal {C}([0,T]\times [0,1]): w(0,x)=u_0(x)\right\} , \end{aligned}$$

then for any such \(\mu \) there holds

$$\begin{aligned} {\text {Supp}}\mu =\Omega _{u_0}([0,T]\times [0,1]), \end{aligned}$$

where \(\mathcal {C}([0,T]\times [0,1])\) is endowed with the topology generated by the supremum norm.
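The deduction of Corollary 1.2 from Theorem 1.1 is standard; here is a minimal sketch, using that smooth functions are dense in \(\Omega _{u_0}([0,T]\times [0,1])\) for the supremum norm. Given \(w\in \Omega _{u_0}([0,T]\times [0,1])\) and \(\epsilon >0\), choose (e.g., by mollification) a deterministic \(h\in \mathcal {P}\mathcal {C}_b^2\) with \(\sup _{t,x}|w(t,x)-h(t,x)|<\epsilon /2\), so that in particular \(\sup _x|u_0(x)-h(0,x)|<\epsilon /2\). Then by (1.5) and the triangle inequality,

$$\begin{aligned} P\left( \sup _{0\le t\le T,x\in [0,1]}|u(t,x)-w(t,x)|\le 2\epsilon \right) \ge P\left( \sup _{0\le t\le T,x\in [0,1]}|u(t,x)-h(t,x)|\le \epsilon \right) >0, \end{aligned}$$

so every such w lies in \({\text {Supp}}\,\mu \); the reverse inclusion holds because every solution is continuous with \(u(0,\cdot )=u_0\).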

Theorem 1.1 is not yet satisfactory in that the upper and lower bounds in (1.4) leave a very wide gap and are thus likely sub-optimal. This is to be compared with the main result in [1], where the upper and lower estimates have matching powers of \(\epsilon \) whenever the Lipschitz constant of \(\sigma \) is sufficiently small. We believe that the potentially sub-optimal estimate (1.4) arises from purely technical limitations, but these limitations are hard to remove in our infinite dimensional setting. Indeed, for a finite dimensional diffusion with no drift, one can represent the solution as a time-changed Brownian motion, and the support theorem readily follows. This is not the case for SPDEs, where one can usually only approximate the solution by Gaussian random variables, as done in [1] and in this paper. (This possibly also explains why small ball probability estimates for finite dimensional diffusions have long been known and are easy to prove, while the corresponding estimates for SPDEs were obtained only recently.) As discussed in Footnote 1, on small scales a 1d Gaussian field is expected to have a 1:2:4 scaling, but if we compose this Gaussian field with a Hölder continuous function which is not Lipschitz, the 1:2:4 scaling is distorted. Our main interest in this fine scale property is that we approximate the solution u by a Gaussian field \(u'\), and when the optimal 1:2:4 scaling is violated, our proof only gives an upper bound on the distance between u and \(u'\) that is much larger in magnitude than what is considered optimal. To sum up, this microscopic Gaussian approximation procedure is precisely the place where the 1:2:4 scaling is needed, and it is what leads to suboptimal estimates for Hölder continuous \(\sigma \). If one could prove the support theorem and small ball probability estimates without this Gaussian approximation procedure, one might obtain sharper, or even matching, estimates improving (1.4).

Remark 1.3

The technical assumption \(\alpha \in (\frac{3}{4},1]\) is only used to match with the well-posedness results in [13] (see also [11]).

For the upper bound proved in Sect. 2.3, we are not sure whether the proof carries over to all \(\alpha >0\), because we have to solve another SPDE (2.4) (Footnote 2) on the same probability space, and this procedure requires strong well-posedness for every diffusion coefficient with the given Hölder continuity.

Remark 1.4

For simplicity we have worked on the unit interval [0, 1], but everything carries over to finite intervals [0, J] for any \(J>0\). In this paper we assume the solution u is real-valued, but at least the lower bound (1.5) carries over to the vector-valued case \({\textbf {u}}(t,x)\in \mathbb {R}^d\), \(d\in \mathbb {N}_+\), without change. These extensions can be found in [1] when \(\sigma \) is Lipschitz. The upper bound in (1.4) might be hard to extend to higher dimensions, as the corresponding pathwise well-posedness results are lacking.

There are a few remaining questions. One is to obtain support theorems in the Hölder semi-norm rather than the supremum norm. Fairly sharp results have been obtained when \(\sigma \) is Lipschitz continuous in u (see [8] for recent progress), but adapting the estimates in the existing literature to our Hölder continuous \(\sigma \) seems out of reach for now. Another, more fundamental, question: if we only assume \(\sigma \) is uniformly elliptic, it is not clear whether the condition \(\alpha >\frac{3}{4}\) on the Hölder index is necessary for (strong or weak) well-posedness of (1.3). We believe that \(\alpha >0\) is enough for weak well-posedness but could not give a proof. Note that when \(\sigma \) is not assumed to be uniformly elliptic, \(\alpha >\frac{3}{4}\) is a sharp condition; see [15] for the case \(\alpha <\frac{3}{4}.\)

2 Proof of main theorem

2.1 Reduction to simple cases

We quote the relevant reduction steps from [1], Section 2.2, for the sake of completeness. We first show that after some simple reductions we may assume \(g=0\) and \(h=0\). These reductions follow from the Girsanov theorem and the non-degeneracy of \(\sigma \).

Equation (1.3) can be rewritten as

$$\begin{aligned} \partial _t u(t,x)=\frac{1}{2}\partial _x^2 u(t,x)+\sigma (t,x,u(t,x))[\dot{W}(t,x)+\sigma ^{-1}g(t,x,u(t,x))]. \end{aligned}$$

For each \(t>0\), let \(P_t\) denote the underlying probability measure restricted to \(\mathcal {F}_t\). Consider the probability law \(Q_t\) defined by

$$\begin{aligned} \frac{dQ_t}{dP_t}=\exp \left( \int _0^t \int _0^1 \sigma ^{-1}g(s,y,u(s,y))\, W(dy\,ds)-\frac{1}{2}\int _0^t\int _0^1 \left( \sigma ^{-1}g(s,y,u(s,y))\right) ^2 dy\,ds\right) .\end{aligned}$$
(2.1)

By the Girsanov theorem, \(\dot{W}(t,x)+\sigma ^{-1}g(t,x,u(t,x))\) is a space-time white noise under \(Q_t\). Denote by A the event

$$\begin{aligned} A=\{\sup _{s\in [0,T],y\in [0,1]}|u(s,y)-h(s,y)|<\epsilon \}, \end{aligned}$$

then by the Cauchy–Schwarz inequality

$$\begin{aligned} Q_T(A)=E^{P_T}\left[ 1_A\frac{dQ_T}{dP_T}\right] \le \sqrt{P_T(A)}\sqrt{E^{P_T}\left( \frac{dQ_T}{dP_T}\right) ^2}\le \sqrt{P_T(A)}\,M,\end{aligned}$$
(2.2)

where M depends only on \(T,\mathcal {C}_1\) and \(\sup _{t,x,u}|g(t,x,u)|.\) This implies that the lower bound in (1.4) for general g can be deduced from the lower bound in the case \(g=0\).
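Indeed, under \(Q_T\) the shifted noise is white, so the \(g=0\) lower bound applies to \(Q_T(A)\), and rearranging (2.2) would give, say,

$$\begin{aligned} P_T(A)\ge \frac{Q_T(A)^2}{M^2}\ge \frac{C_0^2}{M^2}\exp \left( -\frac{2C_1T}{\epsilon ^{2+4\beta }}\right) , \end{aligned}$$

which is a bound of the same form after renaming constants (the factor \(M^{-2}\), which is exponential in T, is absorbed by enlarging \(C_1\), since \(\epsilon ^{-2-4\beta }\ge 1\)).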

For the upper bound, a similar argument holds: one only needs to swap P and Q in (2.2) and replace g by \(-g\) in (2.1).

Now we show why we may take \(h=0\). This is outlined on page 6 of [1], but we reproduce it here for completeness. Let \(H:=\partial _t-\frac{1}{2}\partial _x^2\), write \(h_0(x):=h(0,x)\), and consider the process

$$\begin{aligned} w(t,x)=u(t,x)-u_0(x)-h(t,x)+h_0(x), \end{aligned}$$

so that \(w(0,x)=0.\) If we set \(\sigma _1(t,x,w)=\sigma (t,x,u)\) and

$$\begin{aligned} g_1(t,x,w)=g(t,x,u)-Hu_0(x)-Hh(t,x)+Hh_0(x), \end{aligned}$$

we have

$$\begin{aligned} \partial _t w(t,x)=\frac{1}{2}\partial _x^2 w(t,x)+g_1(t,x,w)+\sigma _1(t,x,w)\cdot \dot{W}(t,x). \end{aligned}$$

Since \(u_0,h\in \mathcal {P}\mathcal {C}_b^2\), we have \(\sup _{t,x,w}|g_1(t,x,w)|<\infty \), so we are reduced to the case \(h=0\) and \(u(0,x)\equiv 0\), \(x\in [0,1]\).
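Note how the assumption \(\sup _x|u_0(x)-h(0,x)|<\epsilon /2\) enters this reduction: since \(u-h=w+(u_0-h_0)\), we have the inclusion

$$\begin{aligned} \left\{ \sup _{0\le t\le T,x\in [0,1]}|w(t,x)|\le \epsilon /2\right\} \subset \left\{ \sup _{0\le t\le T,x\in [0,1]}|u(t,x)-h(t,x)|\le \epsilon \right\} , \end{aligned}$$

so small ball estimates for w centered at 0 transfer to small ball estimates for u centered at h, at the cost of a factor of 2 in \(\epsilon \).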

2.2 Sharp two-sided estimates

Recall the heat kernel on [0, 1] is given by

$$\begin{aligned} G(t,x)=\sum _{n\in \mathbb {Z}}(2\pi t)^{-1/2}\exp \left( -\frac{(x+n)^2}{2t}\right) . \end{aligned}$$

Consider the noise term \(\textbf{N}\) defined as

$$\begin{aligned} \textbf{N}(t,x):=\int _0^t\int _0^1 G(t-s,x-y)\sigma (s,y,u(s,y))W(dyds). \end{aligned}$$

We quote the following large deviations estimate from [1], Proposition 3.4 and Remark 3.1, which is a very precise formulation of the 1:2:4 scaling of Gaussian processes:

Proposition 2.1

Assume that \(\sup _{s,y}|\sigma (s,y,u(s,y))|\le \mathcal {C}<\infty \). Then we can find universal constants \(K_1\) and \(K_2\) such that, for any \(\alpha ,\lambda ,\epsilon >0\) (here \(\alpha \) is a free parameter, not the Hölder exponent of \(\sigma \)),

$$\begin{aligned} \mathbb {P}\left( \sup _{0\le t\le \alpha \epsilon ^4,x\in [0,\epsilon ^2]}|\textbf{N}(t,x)|>\lambda \epsilon \right) \le \frac{K_1}{1\wedge \sqrt{\alpha }}\exp \left( -K_2\frac{\lambda ^2}{\mathcal {C}^2\sqrt{\alpha }}\right) . \end{aligned}$$
(2.3)

Remark 2.2

The estimate (2.3) also holds if the supremum is taken over \(x\in [(k-1)\epsilon ^2,k\epsilon ^2]\) for any \(k\in \mathbb {N}_+\) with \(k\epsilon ^2\le 1\). This is because the proof of (2.3) in [1] only uses a modulus of continuity estimate for \(\textbf{N}(t,x)\) in t and x (see [1], Lemma 3.3), and this estimate is independent of the location of x in [0, 1]. The proof also uses a path of grid points from (0, 0) to (t, x), which in this case can be replaced by a path starting from \((0,(k-1)\epsilon ^2)\).

We fix a sufficiently small \(c_0>0\) with \(0<c_0<\min \{(\frac{K_2}{36\log K_1 \mathcal {C}_2^2})^2,1\}\), and define the discretized time mesh as follows (Footnote 3):

$$\begin{aligned} t_n=nc_0\epsilon ^4,\quad n\ge 0, \end{aligned}$$

and denote by \(I_n:=[t_n,t_{n+1}]\) the n-th time interval. Choose \(\theta =\theta (\mathcal {C}_1,\mathcal {C}_2)>0\) sufficiently large (with the precise condition given in [1], (2.11)) and set \(c_1^2=\theta c_0\); the spatial mesh points are chosen as

$$\begin{aligned} x_n=nc_1\epsilon ^2,n\ge 0. \end{aligned}$$

This time-space mesh respects the parabolic 1:2 scaling. Fix a terminal time \(T>0\) and define the indices

$$\begin{aligned} n_1:=\min \{n\ge 1:t_n>T\},\quad n_2:=\min \{n\ge 1:x_n>1\}. \end{aligned}$$
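One way to read the condition on \(c_0\) (this is our gloss on the choice, assuming \(K_1>1\)): applying (2.3) with \(\mathcal {C}=\mathcal {C}_2\), parameter \(\alpha =c_0\) and \(\lambda =\frac{1}{6}\), we get the bound \(\frac{K_1}{\sqrt{c_0}}\exp (-\frac{K_2}{36\mathcal {C}_2^2\sqrt{c_0}})\), and

$$\begin{aligned} \sqrt{c_0}<\frac{K_2}{36\log K_1\,\mathcal {C}_2^2}\quad \Longrightarrow \quad K_1\exp \left( -\frac{K_2}{36\,\mathcal {C}_2^2\sqrt{c_0}}\right) <1, \end{aligned}$$

so for \(c_0\) small the bound is nontrivial (the polynomial prefactor is likewise dominated by the exponential as \(c_0\downarrow 0\)).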

Write \(p_{i,j}=(t_i,x_j)\), and consider the following two sequences of events

$$\begin{aligned} A_n=\{|u(t_{n+1},x)|\le \frac{\epsilon }{3}, \quad x\in [0,1], \text { and } |u(t,x)|\le \epsilon ,t\in I_n,x\in [0,1]\}, \end{aligned}$$

and

$$\begin{aligned} F_n=\{|u(p_{n,j})|<\epsilon \text { for all }j\le n_2-2\}. \end{aligned}$$

The strategy of proof is first to fix the u component of \(\sigma \) and obtain an estimate in the Gaussian case, then deduce the general case via an interpolation argument. For the Gaussian case (when \(\sigma \) does not depend on u), we quote the following result from [1], Proposition 2.1:

Proposition 2.3

Under the assumptions of Theorem 1.1, assume further that \(g=0\), \(u_0(x)\equiv 0\) and \(\sigma (t,x,u)\) does not depend on u.

Then there exist constants \(\epsilon _0,C_4,C_5>0\), depending only on \(\mathcal {C}_1\) and \(\mathcal {C}_2\), such that for any \(0<\epsilon <\epsilon _0,\)

$$\begin{aligned} P(F_n\mid \cap _{k=0}^{n-1}F_k)\le C_4\exp (-C_5\epsilon ^{-2}), \end{aligned}$$

and we can find constants \(C_6,C_7>0\) which depend only on \(\mathcal {C}_1,\mathcal {C}_2\) such that for any \(0<\epsilon <\epsilon _0,\)

$$\begin{aligned} P(A_n\mid \cap _{k=0}^{n-1}A_k)\ge C_6\exp (-C_7\epsilon ^{-2}). \end{aligned}$$

Now we prove the general case (i.e., \(\sigma \) depends on u).

2.3 Upper bound, general case

Define a function

$$\begin{aligned} f_\epsilon (z)={\left\{ \begin{array}{ll} z,\quad \quad |z|\le \epsilon ,\\ \frac{\epsilon }{|z|}z,\quad |z|>\epsilon ,\end{array}\right. } \end{aligned}$$

so that \(|f_\epsilon (z)|\le \epsilon \) and \(f_\epsilon \) is 1-Lipschitz (it is the nearest-point projection onto \(\{|z|\le \epsilon \}\)). We solve the following SPDE

$$\begin{aligned} \partial _t v(t,x)=\frac{1}{2}\partial _x^2 v(t,x)+\sigma (t,x,f_\epsilon (v(t,x)))\cdot \dot{W}(t,x)\end{aligned}$$
(2.4)

with \(v(0,x)=u_0(x)\), which is well-posed because \(\sigma (t,x,f_\epsilon (v))\) is \(\alpha \)-Hölder continuous in v, with \(\alpha >\frac{3}{4}\).

As long as \(|u(t,x)|\le \epsilon \) for all \(x\in [0,1]\) and \(t\in [0,t_1]\), we have \(v(t,x)=u(t,x)\) on \([0,t_1]\), so we may carry out the proof for v.

The point is to compare v with an auxiliary process \(v_g\) defined by

$$\begin{aligned} \partial _t v_g(t,x)=\frac{1}{2}\partial _x^2 v_g(t,x)+\sigma (t,x,f_\epsilon (u_0(x)))\cdot \dot{W}(t,x), \end{aligned}$$

with \(v_g(0,x)=u_0(x)\), where the diffusion coefficient is independent of \(v_g\). The subscript g in the notation \(v_g\) stands for Gaussian.

The difference process \(D(t,x):=v(t,x)-v_g(t,x)\) is a stochastic integral satisfying

$$\begin{aligned} D(t,x)=\int _0^t\int _0^1 G(t-s,x-y)[\sigma (s,y,f_\epsilon (v(s,y)))-\sigma (s,y,f_\epsilon (u_0(y)))]\, W(dyds). \end{aligned}$$

Define

$$\begin{aligned} H_j=\{|v(p_{1,j})|\le \epsilon \} \end{aligned}$$

and consider the events

$$\begin{aligned} A_{1,j}&=\{|v_g(p_{1,j})|\le 2\epsilon \},\\ A_{2,j}&=\{|D(p_{1,j})|\ge \epsilon \}. \end{aligned}$$

Since \(v=v_g+D\), it is clear that \(H_j\subset A_{1,j}\cup A_{2,j}\). Define another sequence of events

$$\begin{aligned} B_n=\{|u(t,x)|\le \epsilon ,\ t\in I_{n-1},x\in [0,1]\},\quad n\ge 1. \end{aligned}$$

Then clearly \(B_n\subset F_n\) and on \(B_n\), we have \(u(t_n,x)=v(t_n,x)\). Therefore

$$\begin{aligned} P(B_1)\le P\left( \cap _{j=1}^{n_2-2}H_j\right) \le P\left( \cap _{j=1}^{n_2-2}\left( A_{1,j}\cup A_{2,j}\right) \right) . \end{aligned}$$

The elementary inclusion \(\cap _{j}(A_{1,j}\cup A_{2,j})\subset (\cap _{j}A_{1,j})\cup (\cup _{j}A_{2,j})\) implies

$$\begin{aligned} P(B_1)\le P(\cap _{j=1}^{n_2-2}A_{1,j})+\sum _{j=1}^{n_2-2}P(A_{2,j}). \end{aligned}$$

We now apply Proposition 2.3 to the process \(v_g\) to deduce

$$\begin{aligned} P(\cap _{j=1}^{n_2-2}A_{1,j})=P(|v_g(p_{1,j})|\le 2\epsilon , \quad j=1,\ldots ,n_2-2)\le C_2\exp (-C_3\epsilon ^{-2}). \end{aligned}$$

By Hölder continuity of \(\sigma \) in u, we deduce that

$$\begin{aligned} |\sigma (s,y,f_\epsilon (v(s,y)))-\sigma (s,y,f_\epsilon (u_0(y)))|\le \mathcal {D}(2\epsilon )^\alpha , \end{aligned}$$

so that by Proposition 2.1 and Remark 2.2, we have for \(j=1,\ldots ,n_2-2,\)

$$\begin{aligned} P(A_{2,j})\le K_1\exp \left( -\frac{K_2}{4\epsilon ^{2\alpha }\mathcal {D}^2\sqrt{c_0}}\right) . \end{aligned}$$
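Explicitly, this is (2.3) applied on \([0,t_1]=[0,c_0\epsilon ^4]\), with the parameter \(\alpha \) there equal to \(c_0\), \(\lambda =1\) and \(\mathcal {C}=\mathcal {D}(2\epsilon )^\alpha \) (\(\alpha \) now the Hölder exponent), so that

$$\begin{aligned} \frac{K_2\lambda ^2}{\mathcal {C}^2\sqrt{c_0}}=\frac{K_2}{2^{2\alpha }\mathcal {D}^2\epsilon ^{2\alpha }\sqrt{c_0}}\ge \frac{K_2}{4\,\mathcal {D}^2\epsilon ^{2\alpha }\sqrt{c_0}}, \end{aligned}$$

using \(2^{2\alpha }\le 4\); the prefactor \(K_1/(1\wedge \sqrt{c_0})\) is absorbed into \(K_1\) since \(c_0\) is fixed.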

Therefore

$$\begin{aligned} \begin{aligned} P(B_1)&\le C_2\exp (-C_3\epsilon ^{-2})+\sum _{j=1}^{n_2-2}K_1\exp \left( -\frac{K_2}{4\epsilon ^{2\alpha }\mathcal {D}^2\sqrt{c_0}}\right) \\ {}&\le C_2\exp (-C_3\epsilon ^{-2})+\frac{1}{c_1\epsilon ^2}K_1\exp \left( -\frac{K_2}{4\epsilon ^{2\alpha }\mathcal {D}^2\sqrt{c_0}}\right) \\ {}&\le C_4\exp \left( -\frac{C_5}{8(1+\mathcal {D}^2)\epsilon ^{2\alpha }\sqrt{c_0}}\right) , \end{aligned} \end{aligned}$$

whenever \(\epsilon \) is small enough: since \(\alpha \le 1\), the term with exponent \(\epsilon ^{-2\alpha }\) decays more slowly than the one with exponent \(\epsilon ^{-2}\) and therefore dominates the sum, so we keep the former (Footnote 4). Here \(C_4\) and \(C_5\) are universal constants depending only on \(\mathcal {C}_2\).

This shows that when \(\sigma \) is merely Hölder continuous, i.e., \(\alpha <1\), the \(\epsilon ^{2\alpha }\) term dominates the upper bound. The upper and lower bounds we obtain therefore do not have matching exponents in \(\epsilon \) (they do if \(\alpha =1\)), but both bounds are nontrivial, and in particular they yield the desired support theorem.

By the Markov property, for each \(n\le n_1,\)

$$\begin{aligned} P\left( B_n\mid \cap _{j=1}^{n-1}B_j\right) \le \exp \left( -\frac{C_7}{(1+\mathcal {D}^2)\epsilon ^{2\alpha }}\right) \end{aligned}$$

where \(C_7\) depends only on \(\mathcal {C}_1,\mathcal {C}_2\) and \(\mathcal {D}\).

Therefore

$$\begin{aligned} \begin{aligned} P(|u(t,x)|\le \epsilon ,\ t\in [0,T],x\in [0,1])&=P\left( \cap _{j=1}^{n_1-1}B_j\right) \\ {}&\le \exp \left( -\frac{C_7}{(1+\mathcal {D}^2)\epsilon ^{2\alpha }}\right) ^{\frac{T}{\epsilon ^4}} \\ {}&\le \exp \left( -\frac{C_7 T}{(1+\mathcal {D}^2)\epsilon ^{2\alpha +4}}\right) . \end{aligned} \end{aligned}$$

This establishes the upper bound in (1.4).

2.4 Lower bound, general case

We now prove the corresponding lower bound. The argument roughly follows that of [1], but the final key estimates are different.

Fix some \(\beta >1\) to be determined later and consider a new time mesh:

$$\begin{aligned} \hat{t}_n:=nc_0\epsilon ^{4\beta },n\ge 0, \end{aligned}$$

and the corresponding time intervals \(\hat{I}_n:=[\hat{t}_n,\hat{t}_{n+1}]\). This introduces a finer time grid when \(\epsilon \) is sufficiently small. We analogously define the events

$$\begin{aligned} \hat{A}_n:=\{|u(\hat{t}_{n+1},x)|\le \frac{\epsilon }{3}, \quad x\in [0,1], \text { and } |u(t,x)|\le \epsilon ,t\in \hat{I}_n,x\in [0,1]\}. \end{aligned}$$

Assume that \(|u_0(x)|\le \frac{\epsilon }{3}\) for \(x\in [0,1]\). Define the stopping time

$$\begin{aligned} \tau =\inf \{t\ge 0:\sup _{x\in [0,1]}|u(t,x)-u_0(x)|>2\epsilon \}, \end{aligned}$$

so that on the event \(\hat{A}_0\) we have \(\tau \ge \hat{t}_1.\) Consider the process

$$\begin{aligned} \widetilde{D}(t,x)=\int _0^t\int _0^1 G(t-s,x-y)[\sigma (s,y,u(s\wedge \tau ,y))-\sigma (s,y,u_0(y))]W(dyds), \end{aligned}$$

and the auxiliary comparison process \(u_g\) solving

$$\begin{aligned} \partial _t u_g=\frac{1}{2}\partial _x^2 u_g+\sigma (t,x,u_0(x))\dot{W}(t,x),\quad u_g(0,x)=u_0(x), \end{aligned}$$

and we write \(u(t,x)=u_g(t,x)+D(t,x)\) as before, so that

$$\begin{aligned} D(t,x)=\int _0^t\int _0^1 G(t-s,x-y)[\sigma (s,y,u(s,y))-\sigma (s,y,u_0(y))]W(dyds). \end{aligned}$$

It is clear that \(D(t,x)=\widetilde{D}(t,x)\) for all \(t\le \hat{t}_1\) whenever \(\tau \ge \hat{t}_1\). Consider the event

$$\begin{aligned} \widetilde{B}_0:=\left\{ |u_g(\hat{t}_1,x)|\le \frac{\epsilon }{6},x\in [0,1],\quad |u_g(t,x)|\le \frac{2\epsilon }{3}\forall t\in \hat{I}_0,x\in [0,1]\right\} . \end{aligned}$$

Then we have the following sequence of set inclusions

$$\begin{aligned} \begin{aligned} P(\hat{A}_0)&\ge P\left( \widetilde{B}_0\cap \left\{ \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|D(t,x)|\le \frac{\epsilon }{6}\right\} \right) \\&=P\left( \widetilde{B}_0\cap \left\{ \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|\widetilde{D}(t,x)|\le \frac{\epsilon }{6}\right\} \right) \\ {}&\ge P(\widetilde{B}_0)-P\left( \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|\widetilde{D}(t,x)|>\frac{\epsilon }{6}\right) . \end{aligned} \end{aligned}$$
(2.5)

The equality in the second line needs some explanation. If \(\tau \ge \hat{t}_1\), then \(D=\widetilde{D}\) on \([0,\hat{t}_1]\). On \(\widetilde{B}_0\cap \{\tau <\hat{t}_1\}\), we have \(\sup _x|u(\tau ,x)-u_0(x)|\ge 2\epsilon \) while \(\sup _x|u_g(\tau ,x)-u_0(x)|\le \epsilon \), so that \(\sup _x|D(\tau ,x)|\ge \epsilon \); since \(D=\widetilde{D}\) on \([0,\tau ]\), both events \(\{\sup _{0\le t\le \hat{t}_1,x}|D(t,x)|\le \frac{\epsilon }{6}\}\) and \(\{\sup _{0\le t\le \hat{t}_1,x}|\widetilde{D}(t,x)|\le \frac{\epsilon }{6}\}\) fail there, so the two intersections coincide.

Since \(\beta >1\) and \(\epsilon <1\), one must have \(\hat{t}_1<t_1\), so that by Proposition 2.3,

$$\begin{aligned} P(\widetilde{B}_0)\ge P(\bar{B}_0)\ge C_1\exp \left( -\frac{C_2}{\epsilon ^2}\right) , \end{aligned}$$

where \(C_1,C_2\) depend only on \(\mathcal {C}_1,\mathcal {C}_2\), and where we define

$$\begin{aligned} \bar{B}_0:=\left\{ |u_g({t}_1,x)|\le \frac{\epsilon }{6},x\in [0,1],\quad |u_g(t,x)|\le \frac{2\epsilon }{3}\forall t\in I_0,x\in [0,1]\right\} . \end{aligned}$$

It remains to estimate the probability of the last event in (2.5).

$$\begin{aligned} \begin{aligned}&P\left( \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|\widetilde{D}(t,x)|>\frac{\epsilon }{6}\right) \le \frac{1}{\sqrt{c_0}\epsilon ^2}P\left( \sup _{0\le t\le \hat{t}_1,x\in [0,\sqrt{c_0}\epsilon ^2]}|\widetilde{D}(t,x)|>\frac{\epsilon }{6} \right) \\&\quad \le \frac{1}{\sqrt{c_0}\epsilon ^2}\frac{1}{\epsilon ^{2\beta -2}}K_1 \exp \left( -\frac{K_2}{\epsilon ^{2\alpha }\mathcal {D}^2\mathcal {C}_2^2\sqrt{c_0}\epsilon ^{2\beta -2}}\right) .\end{aligned} \end{aligned}$$

To be more precise about the exponent of \(\epsilon \): we apply Proposition 2.1 with the parameter \(\alpha \) there equal to \(c_0\epsilon ^{4\beta -4}\) (so that \(\alpha \epsilon ^4=\hat{t}_1\)) and with the constant \(\mathcal {C}:=\mathcal {D}(2\epsilon )^\alpha \), where \(\alpha \) now denotes the Hölder exponent. Comparing the exponents, we see that as long as we take \(\alpha +\beta >2\), we can find some \(C_8,C_9\) depending only on \(\mathcal {C}_1,\mathcal {C}_2\) and \(\mathcal {D}\) such that

$$\begin{aligned} P\left( \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|\widetilde{D}(t,x)|>\frac{\epsilon }{6}\right) \le C_8\exp \left( -\frac{C_9}{\epsilon ^{2(\alpha +\beta -1)}}\right) \end{aligned}$$
(2.6)

and finally

$$\begin{aligned} \mathbb {P}(\hat{A}_0)\ge C_1\exp \left( -\frac{C_2}{\epsilon ^2}\right) - C_8\exp \left( -\frac{C_9}{\epsilon ^{2(\alpha +\beta -1)}}\right) , \end{aligned}$$

with the first term dominating since \(2(\alpha +\beta -1)>2\). So we conclude that when \(\epsilon >0\) is sufficiently small, we can find \(C_3,C_4\) depending only on \(\mathcal {C}_1,\mathcal {C}_2,\mathcal {D}\) such that

$$\begin{aligned} \mathbb {P}(\hat{A_0})\ge C_3 \exp \left( -\frac{C_4}{\epsilon ^2}\right) .\end{aligned}$$
(2.7)
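For the record, the comparison of exponents behind (2.7) reads, with constants suppressed:

$$\begin{aligned} \exp \left( -\frac{C_9}{\epsilon ^{2(\alpha +\beta -1)}}\right) =o\left( \exp \left( -\frac{C_2}{\epsilon ^{2}}\right) \right) \quad \text {as }\epsilon \downarrow 0,\quad \text {since }2(\alpha +\beta -1)>2, \end{aligned}$$

so for \(\epsilon \) small the subtracted term in the lower bound for \(\mathbb {P}(\hat{A}_0)\) is at most half of the first term.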

By the Markov property, for each \(n\le \hat{n}_1:=\lfloor \frac{T}{\epsilon ^{4\beta }}\rfloor \) we have

$$\begin{aligned} \mathbb {P}(\hat{A}_n\mid \cap _{j=0}^{n-1}\hat{A}_j)\ge C_3\exp \left( -\frac{C_4}{\epsilon ^2}\right) . \end{aligned}$$

Thus for some constant \(C_1>0\) depending on \(\mathcal {C}_1,\mathcal {C}_2\),

$$\begin{aligned} P\left( |u(t,x)|\le \epsilon ,t\in [0,T],x\in [0,1]\right) \ge \exp \left( -\frac{C_1}{\epsilon ^{2}}\frac{T}{\epsilon ^{4\beta }}\right) \ge \exp \left( -\frac{C_1T}{\epsilon ^{4\beta +2}}\right) . \end{aligned}$$

The various constants \(C_1,C_2,\ldots ,C_8,C_9\) depend only on \(\mathcal {C}_1,\mathcal {C}_2\) and may change from line to line. This establishes the lower bound in (1.4).

Finally, we note that we may estimate (2.6) differently and deduce that, for some constants \(C_8',C_9'\) depending on \(\mathcal {C}_1,\) \(\mathcal {C}_2\) but not on \(\mathcal {D}\),

$$\begin{aligned} P\left( \sup _{0\le t\le \hat{t}_1,x\in [0,1]}|\widetilde{D}(t,x)|>\frac{\epsilon }{6}\right) \le C_8'\exp \left( -\frac{C_9'}{\epsilon ^{2(\alpha +\beta -1)}\mathcal {D}^2}\right) . \end{aligned}$$

Then there exists some \(\mathcal {D}_0\) depending only on \(\mathcal {C}_1\), \(\mathcal {C}_2\) such that, for any \(\mathcal {D}<\mathcal {D}_0\) and \(\beta =2-\alpha \), we can find \(C_3'\), \(C_4'\) depending on \(\mathcal {C}_1\), \(\mathcal {C}_2\) such that (note that (2.7) previously required \(\alpha +\beta >2\))

$$\begin{aligned} \mathbb {P}(\hat{A_0})\ge C_3'\exp \left( -\frac{C_4'}{\epsilon ^2}\right) . \end{aligned}$$

The rest of the estimate proceeds exactly as before. The net benefit of this modification is that whenever \(\mathcal {D}<\mathcal {D}_0\), all the estimates in Theorem 1.1 hold in the case \(\alpha +\beta =2\). This justifies the last assertion of Theorem 1.1.