
Bi-level iterative regularization for inverse problems in nonlinear PDEs

Published 6 March 2024 © 2024 The Author(s). Published by IOP Publishing Ltd
Citation: Tram Thi Ngoc Nguyen 2024 Inverse Problems 40 045020. DOI: 10.1088/1361-6420/ad2905


Abstract

We investigate the ill-posed inverse problem of recovering unknown spatially dependent parameters in nonlinear evolution partial differential equations (PDEs). We propose a bi-level Landweber scheme, where the upper-level parameter reconstruction embeds a lower-level state approximation. This can be seen as combining the classical reduced setting and the newer all-at-once setting, allowing us to, respectively, utilize well-posedness of the parameter-to-state map, and to bypass having to solve nonlinear PDEs exactly. Using this, we derive stopping rules for lower- and upper-level iterations and convergence of the bi-level method. We discuss application to parameter identification for the Landau–Lifshitz–Gilbert equation in magnetic particle imaging.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Parameter identification

1.1. Introduction

We investigate the inverse problem of recovering the spatially dependent parameter θ in the nonlinear evolution system

Equation (1)

with nonlinear model f and linear initial condition A; here, $\dot{u}$ denotes time derivative of u. The parameter identification is carried out with additional linear measurement of the state u

Equation (2)

which is usually contaminated by noise of some level δ, resulting in yδ with $\|y^\delta-y\|\unicode{x2A7D}\delta$.

The parameter identification (1) and (2) can be formulated in a reduced setting with the parameter-to-observation map G as the forward operator

Equation (3)

Essentially, S is a well-defined parameter-to-solution map whose output is the unique solution of the nonlinear PDE (1)

Equation (4)

while L is the linear observation operator

Equation (5)

This abstract setting is the framework considered throughout the paper.

1.1.1. Regularization with state approximation error.

Iterative regularization algorithms for solving the inverse problem (3)–(5), such as Landweber-type or Newton-type methods, require repeated evaluation of $S(\theta)$ to perform an update for the search parameter. In practice, one invokes a PDE solver, for instance a finite difference (FDM) or finite element method (FEM), to instead find a numerical solution $\tilde{u}: = \widetilde{S}(\theta)\approx S(\theta) = u$. The task becomes challenging when the PDE model f is nonlinear. Nonlinear solvers frequently involve an application of the fixed point theorem to handle the nonlinearity, which results in additional internal iterations. The convergence analysis for numerically approximating u for nonlinear PDEs is rather technical. References [2, 3, 5, 8] are a few examples of FEM solvers for the nonlinear Landau–Lifshitz–Gilbert equation in magnetic particle imaging (more details in section 5.2). Approximating u by $\tilde{u}$ introduces additional error into the primary reconstruction process for the parameter θ; as such, it is important that we consider the effect that the propagation of the state approximation error $\epsilon: = \|\tilde{u}-u\|$, or the discretization error in the case of FEM, has on our recovery of θ. This is the motivation for this work.
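To make this cost structure concrete, the following is a minimal numerical sketch of the situation just described, with every ingredient invented for illustration (the smoothing matrix `K` stands in for the parameter-to-observation map $G = L\circ S$, `mu` is the Landweber step size, `tau` the discrepancy constant): each Landweber step needs one forward evaluation, which in a real application hides a nonlinear PDE solve.

```python
import numpy as np

# Toy ill-posed problem: G(theta) = K @ theta with an ill-conditioned
# smoothing matrix K standing in for G = L o S.  All quantities here
# (K, theta_true, mu, tau) are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
K = np.exp(-80.0 * (x[:, None] - x[None, :]) ** 2) / n   # smoothing kernel
theta_true = np.sin(2.0 * np.pi * x)
y = K @ theta_true

delta = 1e-3                                  # noise level, ||y_delta - y|| = delta
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)

# Landweber: theta_{j+1} = theta_j - mu * K^T (K theta_j - y_delta),
# stopped early by the discrepancy principle ||K theta_j - y_delta|| <= tau*delta
# (or by the iteration budget, whichever comes first).
mu = 1.0 / np.linalg.norm(K, 2) ** 2
tau = 2.0
theta = np.zeros(n)
for j in range(5000):
    residual = K @ theta - y_delta            # one "forward solve" per iteration
    if np.linalg.norm(residual) <= tau * delta:
        break
    theta -= mu * K.T @ residual
```

In the bi-level setting of this paper, the line computing `residual` is exactly where an exact solve would be replaced by a lower-level iterate.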

Inspired by this interesting line of research, in particular when the underlying PDE model is nonlinear, we investigate a general framework suitable for inverse coefficient problems. In this spirit, we consider the problem of approximating the solution to the PDE (1) as another inverse problem strategically embedded into the main ill-posed parameter identification problem.

1.1.2. Introducing the lower-level.

Adapting our notation to the framework of iterative methods, we call the iteration for estimating u the lower-level iteration, and the iteration for reconstructing θ the upper-level iteration. To this end, we first recast the lower level problem as an inverse problem in which the unknown is the state u. The forward map, at a given parameter θ, is the PDE model (1)

Equation (6)

with the nonlinear model $f:X\times\mathcal{V}\to \mathcal{U}^*$ and the initial condition $A:X\to H$. We aim to drive the PDE residual to zero, that is, to solve

Equation (7)

with the image $\varphi\in\mathcal{U}^*\times H$ simply being zero. Seeking a good approximation $\widetilde{S}(\theta)$ for the exact state $S(\theta)$ in (4) is equivalent to finding an $\tilde{u}\in\mathcal{V}$ such that the PDE residual (7) is sufficiently small. Here, the choice of function space plays an important role and will be discussed in detail later.

The state approximation error $\epsilon: = \|\tilde{u}-u\|$ in the lower-level iteration must be handled with care: Allowing it to be large saves computational time, yet at the same time, it must be kept small enough such that when it enters the upper-level iteration, one can still obtain an acceptable reconstruction result for θ.

1.1.3. Bi-level as a mixed approach.

The suggested approach can be seen as a combination of the reduced formulation (3)–(5) and an all-at-once approach. The all-at-once formulation is a recent approach introduced in PDE-constrained optimization. This concept was subsequently studied in the context of inverse problems [6, 7, 16, 27, 28, 48], and more recently specifically for time dependent inverse problems [32, 41]. In this setting, one constructs, instead of the reduced map G in (3), the higher-dimensional all-at-once map $\mathbb{G}$:

Equation (8)

By treating u as an independent unknown, one jointly recovers the parameter and the state $(\theta,u)$, bypassing the step of solving the nonlinear PDE (1) exactly. This feature makes the all-at-once approach powerful in practice. A comparison between these formulations can be found in [32, 41].

We remark that the classical reduced approach usually brings in certain structural conditions on the model f to ensure unique existence for the PDE (1). The all-at-once approach (8), as explained, bypasses the need to study unique existence as well as to solve the PDE exactly. This raises the question: can one utilize accessible properties of f and still bypass solving the nonlinear PDE exactly? This is the viewpoint that we take in this study. In particular, we employ coercivity of the PDE model to ensure convergence of the lower-level iteration. Accordingly, as with the all-at-once approach, no nonlinear PDE needs to be solved exactly. In essence, the new perspective of this work is the bi-level formulation as a mixed approach combining the advantages of the reduced setting and the all-at-once setting.

1.1.4. Connection and contrast to other directions.

The importance of analyzing the finite-dimensional approximation error in an inverse problem framework has been pointed out in [40]; in particular, discretization error in FEM was treated in [9, 19, 20, 25, 33]. These stimulating works specialize in inverse problems with linear PDEs, and mainly in the context of variational regularization. Here, we focus on a general framework of inverse coefficient problems with nonlinear PDEs. In addition, our research focuses on Landweber regularization instead of variational regularization. Furthermore, we do not use FEM to approximately solve the PDE, but rather take the perspective of solving the nonlinear PDE as a lower-level inverse problem.

We note that the bi-level viewpoint has been taken up in optimization and is an active research area in its own right [4]. There, the main focus is on optimality conditions rather than, as is the case in inverse problems, on regularization parameters; in our case, the regularization parameters take the form of stopping rules for the Landweber iteration. To the best of the author's knowledge, the bi-level Landweber (or gradient descent) iteration has yet to be discussed for inverse problems.

Interestingly, the idea of approximating the inverse component in a regularized iteration by another iteration can already be found in the existing literature. For Newton-type methods, [17, 26, 43] propose approximating the inverse component $[\alpha_k\text{Id}+G^{{\prime}}(\theta_k)^*G^{{\prime}}(\theta_k)]^{-1}$ by an inner Landweber or Conjugate Gradient (CG) iteration, forming the so-called Newton–Landweber and Newton–CG methods. By contrast, the inner iteration in our work approximates $S(\theta_k) = F(\theta_k)^{-1}$ (recalling that $G = L\circ S$), rather than terms related to the derivative Gʹ. The outer iteration in our algorithm also employs the Landweber method. Following the established pattern, we might refer to our approach as a Landweber–Landweber method.

It is worth remarking on the growing popularity of the model order reduction approach for large-scale inverse problems [14]. This and our approach align in the idea of solving PDEs inexactly. If one replaces the lower-level by other inexact PDE solvers, our upper-level analysis is still well-integrated. In particular, it suggests an adaptive approximation/discretization estimation, e.g. via increasing degrees of freedom in the solvers, that can ensure convergence of the regularized reconstruction.

1.1.5. Contribution.

In a broad sense, we exploit the structure of the inverse coefficient problem $G(\theta) = L\circ S(\theta)$, where in practice, the observation operator L is linear but ill-posed, while the PDE solution map S is nonlinear but well-posed. In addition, we pay attention to nonlinear time-dependent PDEs, a consideration studied relatively infrequently.

The techniques in this study are largely influenced by the seminal works [29, 45] on the use of the tangential cone condition in iterative regularization to handle nonlinearity. Regarding the choice of regularization parameter, we are inspired by the very recent work [35], in which an a priori stopping scheme was introduced, as opposed to a classical a posteriori stopping rule. Although our proof strategy is motivated by these profound results, there are important differences between those works and our analysis. First, we investigate the Landweber method in a bi-level framework, in contrast to the standard single-level setting. Second, our work is tailored particularly to parameter identification in PDEs, rather than to the abstract formulation $F(x) = y$. Third, from a practical point of view, our bi-level algorithm (Algorithm 1) is ready to use in real-life applications, as all the ingredients in the algorithm are explicitly expressed.

Algorithm 1.
  • 1.  
    {Single-level} Iterations with exact S. Update parameter:
    Equation (15)
    where at each j
    Equation (16)
    then compute adjoint
    Equation (17)
  • 2.  
    {Bi-level} Iterations with approximated $\widetilde{S}_{K(j)}$. {Upper-level} Update parameter:
    Equation (18)
    {Lower-level} where at each j, meaning at each $\widetilde{\theta}^{\delta,j}$ , iterate:
    Equation (19)
    {back to Upper-level}Then compute the approximate adjoint
    Equation (20)
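The nesting of the two levels in Algorithm 1 can be sketched numerically as follows. Everything here is an illustrative stand-in rather than the paper's model: the "PDE" is the coercive nonlinear system $u + u^3 = \theta$, the observation is $y = Lu$ with a smoothing matrix $L$, the lower level is a scaled Landweber-type descent on the PDE residual with a fixed number of inner steps playing the role of $K(j)$, and the upper level is a Landweber step for θ that only ever uses the approximate state, never an exact solve.

```python
import numpy as np

n = 30
x = np.linspace(0.0, 1.0, n)
L = np.exp(-60.0 * (x[:, None] - x[None, :]) ** 2) / n   # observation operator (smoothing)
theta_true = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)

def jacobian(u):                       # F'(u) for F(u) = u + u^3 - theta
    return np.eye(n) + 3.0 * np.diag(u ** 2)

# exact state, used only to generate the (noise-free) data
u = np.ones(n)
for _ in range(50):
    u -= np.linalg.solve(jacobian(u), u + u ** 3 - theta_true)
y = L @ u

mu = 0.5 / np.linalg.norm(L, 2) ** 2   # upper-level step size
nu = 0.02                              # lower-level step size
K_inner = 100                          # inner steps per outer step (the role of K(j))

theta = np.ones(n)                     # initial guess for the parameter
u_tilde = np.ones(n)                   # warm-started lower-level state
for j in range(400):
    # lower level: drive the PDE residual F(u) = u + u^3 - theta towards zero
    for _ in range(K_inner):
        r = u_tilde + u_tilde ** 3 - theta
        u_tilde -= nu * jacobian(u_tilde).T @ r
    # upper level: Landweber step with the approximate state u_tilde;
    # the linear solve with F'(u_tilde)^T loosely mirrors the adjoint PDE (13)
    obs_residual = L @ u_tilde - y
    z = np.linalg.solve(jacobian(u_tilde).T, L.T @ obs_residual)
    theta -= mu * z
```

The key structural point, matching remark 1, is that the residual and the adjoint solve are both evaluated at `u_tilde`, the lower-level iterate.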

Concretely, we impose a coercivity assumption borrowed from PDEs theory on the lower Landweber iteration. This yields interesting outcomes such as verification of tangential cone condition (lemma 3), convergence rate (theorem 1) and monotonicity of the residual sequence (proposition 2). For the upper Landweber iteration, we carefully control the interaction between the state approximation error and the noise level, and their propagation through the adjoint process and iterative update. As our main result, we provide explicit adaptive stopping rules, both for lower and upper iteration, confirming convergence of the bi-level scheme (theorem 2).

1.1.6. Structure.

The paper can be outlined as follows: After an introduction of the problem setting in section 1.2, we present the bi-level algorithm in section 2. Section 3 is dedicated to studying the lower-level problem. Section 4 advances the investigation to the upper-level problem. Section 5 verifies the proposed assumptions and discusses the application to magnetic particle imaging. Finally, we summarize our findings and future prospects in section 6.

1.2. Problem setting

1.2.1. Setting

  • The space of the spatially dependent parameter is X, and the data space is denoted by $\mathcal{Y}$. Notationally, calligraphic letters indicate spaces that involve the time variable t.
  • The state space is $\mathcal{U}: = \{u\in L^2(0,T;U)\}$, with $U\hookrightarrow H\hookrightarrow U^*$ forming a Gelfand triple.
  • The smooth state space is $\mathcal{V}: = \{u\in \mathcal{U}:\dot{u}\in \mathcal{U}^*\} = \{u\in L^2(0,T;U): \dot{u}\in L^2(0,T;U^*)\}$.
  • The image space, to which the PDE residual belongs, is $\mathcal{U}^* = L^2(0,T;U^*)$. Our choice of $\mathcal{V}$ ensures well-definedness and boundedness of $\frac{\textrm{d}}{\textrm{d}t}:\mathcal{V}\to\mathcal{U}^*$. Unlike the reduced formulation, where $\mathcal{U}^*$ does not appear, in the all-at-once setting (8) and the combined setting proposed in (6), the PDE image space $\mathcal{U}^*$ plays a clear role, guaranteeing well-definedness of the lower-level problem. The state space $\mathcal{V}$ must be sufficiently smooth for well-definedness of the PDE (1) and existence of an adjoint state (lemma 1). This function space setting is inspired by the book [44, Chapter 8].
  • X, $\mathcal{Y}$, $\mathcal{U}$, $\mathcal{V}$, $\mathcal{U}^*$ are all Hilbert spaces, ensuring that our problem remains in a Hilbert space framework.
  • The parameter-to-state (or solution) map $S: X\ni\theta\mapsto u\in\mathcal{V}(\subseteq\mathcal{U})$ is nonlinear and assumed to be Fréchet differentiable.
  • The observation (or measurement) operator $L:\mathcal{U}\ni u\mapsto y\in\mathcal{Y}$ is linear and bounded. In real life, L is usually a compact operator; it cannot capture smoothness of $u\in\mathcal{V}$ due to discrete or noisy measurements. Its domain $\mathcal{U}$ hence does not need to be as smooth as $\mathcal{V}$.
  • The choice of data (or observation) space $\mathcal{Y}$ depends on how data is acquired, e.g. full measurements, partial measurements or snapshots of u; thus we keep $\mathcal{Y}$ as a general Hilbert space. The superscript δ in $y^\delta$ indicates data with noise level δ.
  • The initial value operator $A:X\to H$ is linear and bounded. The continuous embedding $\mathcal{V}\hookrightarrow C(0,T;H)$ [44, section 7.2] enables the time-wise evaluation of u, ensuring well-definedness of the setting $u(t_0) = A\theta\in H$.
  • The PDE model $f:(0,T)\times X\times U\to U^*$ is nonlinear and is assumed to satisfy the Carathéodory condition. This means that for each $\theta\in X$, the mapping f is measurable with respect to t for any $u\in U$, and continuous with respect to u for almost every fixed $t\in(0,T)$. In this way, f can be naturally supposed to induce the similarly denoted Nemytskii operator
    We refer to [44, 47] for details on Carathéodory and Nemytskii mappings.
  • Finally, Ω is a smooth and bounded spatial domain. The time line $[0,T]$ is finite.
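For orientation, one concrete configuration consistent with the bullets above (an illustrative choice, not the one fixed by the paper) is:

```latex
\begin{gathered}
U = H^{1}(\Omega),\qquad H = L^{2}(\Omega),\qquad U^{*} = H^{1}(\Omega)^{*},\\
\mathcal{U} = L^{2}(0,T;H^{1}(\Omega)),\qquad
\mathcal{U}^{*} = L^{2}(0,T;H^{1}(\Omega)^{*}),\\
\mathcal{V} = \{u\in\mathcal{U}:\dot{u}\in\mathcal{U}^{*}\}
\hookrightarrow C([0,T];L^{2}(\Omega)),
\end{gathered}
```

with, for instance, $X = L^2(\Omega)$ and $\mathcal{Y}$ a space of time-discrete or spatially restricted measurements; the chain $H^1(\Omega)\hookrightarrow L^2(\Omega)\hookrightarrow H^1(\Omega)^*$ is the standard Gelfand triple.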

1.2.2. Notation

  • We use the capitalized subscript R for constants relating to the upper-level problem (e.g. (46) and (47)), while the lower-case subscript r is reserved for constants in the lower-level problem (e.g. (31) and (32)). The subscript S is used in constants relating to the parameter-to-state map S (e.g. (22)). Regarding iteration indices, the superscript $(\cdot)^j$ denotes the iterates of the upper-level problem, and the subscript $(\cdot)_k$ is used for the lower-level problem.
  • $B^*$ denotes the Hilbert space adjoint, and $B^\star$ denotes the Banach space adjoint of a linear, bounded operator B. The notation $(\cdot,\cdot)$ means the inner product, and $\langle\cdot,\cdot\rangle$ is the dual pairing (cf proposition 1).
  • $D_U:U\to U^*$ and $I_X:X^*\to X$ are isomorphisms (cf proposition 1). As an example, consider $U = X = H^1_0(\Omega)$. In this case, $D_U$ might be a differential operator, e.g. $(\text{Id}-\Delta)$; and $I_X$ could be an integral operator, e.g. $(\text{Id}-\Delta)^{-1}$.
  • $\|B\|_{X\to Y}$ denotes the operator norm of $B:X\to Y$. In most of the cases, where it is clear which norms are used due to the definition of the operators, we abbreviate, e.g. $\|L\|: = \|L\|_{\mathcal{U}\to\mathcal{Y}}, \|A\|: = \|A\|_{X\to H}$ etc. Nevertheless, in some computations when technicality is involved, we explicitly write out the norm. We also write various other norms in shortened form, e.g. $\|\cdot\|_{L^2}: = \|\cdot\|_{L^2(\Omega)}$, $\|u\|_{L^2(L^2)}: = \|u\|_{L^2(0,T;L^2(\Omega))}$.
  • The constant $C_{X\to Y}$ indicates the norm of the continuous embedding $X\hookrightarrow Y$.
  • Throughout the paper, we assume that the PDE (1) and (2) is solvable, and denote by $\theta^\dagger$ the exact parameter and $u^\dagger$ the corresponding exact solution. $B_R(\theta^\dagger)\subset X$ is the ball of radius R around the ground truth $\theta^\dagger$; similar interpretation applies for the ball $B_r(u^\dagger)\subseteq\mathcal{V}$.
  • For the lower-level problem in section 3, ${_{\theta}{B}_{r/2}(u^*)}\subseteq\mathcal{V}$ denotes the ball of radius $r/2$ centered at $u^*$, where $u^*$ is a solution of $F(\theta)(u) = \varphi$ at a given θ (see (29)). $f^{{\prime}}_\theta,\ f^{{\prime}}_u$ are the partial derivatives of f, while $\nabla f$ is its gradient.

2. Bi-level iteration

2.1. Bi-level algorithm

We now establish the bi-level Landweber algorithm 1. The main idea is to first perform upper-level iteration for θ by the standard Landweber iteration with the exact forward map $G = L\circ S$. We then replace the parameter-to-state map S by an approximation $\widetilde{S}_{K(j)}$ obtained through a lower-level Landweber iteration. $\widetilde{S}_{K(j)}$ enters the forward and backward (adjoint) evaluation in the upper-level iteration, yielding a gradient update step for θ. After Algorithm 1 is constructed, we perform some preliminary error analysis in section 2.2.

First, recalling the setup of the forward map $G(\theta) = L\circ S(\theta)$ in (3)–(5), the standard Landweber iteration for θ in this case reads as

Replacing S by its approximation $\widetilde{S}_{K(j)}$, the Landweber iteration with $\widetilde{S}_{K(j)}$ becomes

Equation (9)

where at each j, that is, each $\widetilde{\theta}^{\delta,j}$, the state approximation $\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j}): = u^{\,j}_{K(j)}$ is the outcome of the lower-level iteration

Equation (10)

with the PDE map $F(\widetilde{\theta}^{\delta,j})$ described in (6) and (7). The stopping rule K(j) for the lower-level iteration as well as $\widetilde{j}^*$ for the upper-level iteration will be the subject of the later sections.

In order to see how the state approximation enters the adjoint $S^{{\prime}}(\widetilde{\theta}^{\delta,j})^*$, we first derive the adjoint process.

Proposition 1 (Adjoint). Let $D_U:U\to U^*$ and $I_X:X^*\to X$ be isomorphisms.

Then, the Hilbert space adjoint of Gʹ with G defined in (3)–(5) is given by

Equation (11)

Equation (12)

where z is the solution to the final value problem at source DU v

Equation (13)

Here, the Banach space adjoints are

Proof. As the parameter-to-state map $S:X\to\mathcal{V}$ in (4) is assumed to be differentiable, $S^{{\prime}}(\theta)\xi = :p\in\mathcal{V}$ solves the sensitivity (or linearized) equation

Equation (14)

where $u = S(\theta)$ is the solution to the original PDE (1). Note that $p\in\mathcal{V}$, with $\mathcal{V}$ being smooth as defined in section 1.2.1, along with the Nemytskii assumption on f ensure $\dot{p}+f^{{\prime}}_u(\theta,u)p\in {\mathcal{U}^*}$ and $f^{{\prime}}_\theta(\theta,u)\xi\in {\mathcal{U}^*}$, confirming well-definedness of the sensitivity equation.

In the following computation, we will respectively use (13), integration by parts and (14), with the help of the isomorphisms $D_U$ and $I_X$, to translate inner products into dual pairings. For every $\xi\in X, v\in\mathcal{U}$, one has

This proves that the adjoints of $S^{{\prime}}(\theta)$ are as shown in (12). Applying the chain rule, we obtain the adjoint of the composition map

as claimed in (11). □
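The forward-sensitivity/backward-adjoint duality underlying proposition 1 can be checked numerically in a deliberately tiny discrete setting. In the sketch below, all ingredients are invented for the demo (explicit Euler in time, a scalar parameter `xi`, random coefficients `a` and `c` playing the roles of $f^{\prime}_u$ and $f^{\prime}_\theta$, and a scalar `A` as the linear initial-condition operator): the sensitivity is propagated forward in time, the adjoint variable solves a final-value recursion backward in time as in (13), and both must produce the same derivative of the output functional.

```python
import numpy as np

rng = np.random.default_rng(1)
N, h = 200, 0.005
a = rng.uniform(-1.0, 1.0, N)        # stand-in for the coefficient f'_u(t)
c = rng.uniform(-1.0, 1.0, N)        # stand-in for the coefficient f'_theta(t)
v = rng.uniform(-1.0, 1.0, N + 1)    # test function in the output <p, v>
A = 0.7                              # stand-in for the initial-condition operator

# forward sensitivity: p_0 = A*xi, p_{k+1} = (1 + h a_k) p_k + h c_k xi;
# output J(xi) = h * sum_k v_k p_k, so dJ/dxi = h * sum_k v_k P_k, P = dp/dxi
P = np.empty(N + 1)
P[0] = A
for k in range(N):
    P[k + 1] = (1.0 + h * a[k]) * P[k] + h * c[k]
dJ_forward = h * np.dot(v, P)

# adjoint: *final*-value recursion z_N = h v_N, z_k = (1 + h a_k) z_{k+1} + h v_k,
# then the gradient is assembled from the adjoint state via the (transposed)
# initial-condition and parameter operators, mirroring the structure of (12)
z = np.empty(N + 1)
z[N] = h * v[N]
for k in range(N - 1, -1, -1):
    z[k] = (1.0 + h * a[k]) * z[k + 1] + h * v[k]
dJ_adjoint = A * z[0] + h * np.dot(c, z[1:])
```

The two numbers agree to machine precision, which is the discrete counterpart of the integration-by-parts argument in the proof.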

With the adjoints detailed, we can now state the Landweber iterations for the exact problem (single-level) and the approximated problem (bi-level).

Remark 1. In the bi-level algorithm, no exact S appears; equivalently, no nonlinear PDE needs to be exactly solved. The state approximation $\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})$ enters the residual $(L\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})-y^\delta)$ as well as the adjoint (20). We emphasize that the term $S^{{\prime}}(\widetilde{\theta}^{\delta,j})^*$ appearing here is not, in fact, the adjoint of the derivative of the exact S evaluated at $\widetilde{\theta}^{\delta,j}$, but an approximation thereof.

2.2. Preliminary error analysis

At this stage, the Landweber algorithms for single-level and bi-level have been established. The two algorithms differ in the state and the adjoint as mentioned in remark 1. We thus examine the system output error and the adjoint error with respect to a given approximation error $\epsilon_{K(j)}$; this is the focus of this section.

The state approximation error $\epsilon_{K(j)}$ will be quantified in the succeeding section. The subscript K(j) alludes to the fact that we will eventually derive a stopping rule K of the lower-level with respect to the upper-level iterate j. For now, we assume that K(j) is given.

Lemma 1 (System output error). Denote the state approximation error $\epsilon_{K(j)}$ at the input $\widetilde{\theta}^{\delta,j}\in B_R(\theta^\dagger)$ by

Equation (21)

and assume that there exists a constant $M_S\gt0$ such that

Equation (22)

Given any other input $\theta^{\,j}\in B_R(\theta^\dagger)$, the system output error is

Equation (23)

Proof. Differentiability of $S:X \to \mathcal{V}$, thus of $S:X \to \mathcal{U}$ due to $\mathcal{V}\hookrightarrow\mathcal{U}$, and application of the mean value theorem imply

meaning

 □
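The triangle-inequality splitting behind lemma 1 can be written out as follows (a reconstruction consistent with (21) and (22), using boundedness of L and the mean value theorem; not the paper's verbatim display):

```latex
\begin{aligned}
\|LS(\theta^{\,j}) - L\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|_{\mathcal{Y}}
&\leqslant \|L\|\,\|S(\theta^{\,j}) - S(\widetilde{\theta}^{\delta,j})\|_{\mathcal{U}}
 + \|L\|\,\|S(\widetilde{\theta}^{\delta,j}) - \widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|_{\mathcal{U}} \\
&\leqslant \|L\|\Big(\sup_{\lambda\in[0,1]}
 \big\|S^{\prime}\big(\widetilde{\theta}^{\delta,j}
 + \lambda(\theta^{\,j}-\widetilde{\theta}^{\delta,j})\big)\big\|_{X\to\mathcal{U}}\,
 \|\theta^{\,j}-\widetilde{\theta}^{\delta,j}\|_{X} + \epsilon_{K(j)}\Big) \\
&\leqslant \|L\|\big(M_S\,\|\theta^{\,j}-\widetilde{\theta}^{\delta,j}\|_{X} + \epsilon_{K(j)}\big).
\end{aligned}
```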

Remark 2.  MS in (22) is a locally uniform bound for solutions of the linearized PDE (14). A sufficient condition for (22) to hold is Lipschitz continuity of the parameter-to-state map S. When θ plays the role of a source term or initial data, results regarding Lipschitz continuity of S for general classes of nonlinear PDE model f exist in the literature [44, theorem 8.32].

We now turn our attention to the discrepancy between the exact adjoint and the approximate adjoint with respect to the state approximation error $\epsilon_{K(j)}$.

Lemma 2 (Adjoint state error). Let the assumptions in lemma 1 hold. Suppose that the solution z of the adjoint PDE

depends continuously on the source term

Equation (24)

Furthermore, assume that $f^{{\prime}}_\theta,f^{{\prime}}_u$ are Lipschitz continuous, with

Equation (25)

and denote $ L_{R,r}: = L_{\nabla f}\left(R+rC_{\mathcal{V}\to\mathcal{U}}\right)+\|\,f^{{\prime}}_\theta(\theta^\dagger,u^\dagger)\|_{X\to{\mathcal{U}^*}}+\|\,f^{{\prime}}_u(\theta^\dagger,u^\dagger)\|_{\mathcal{U}\to{\mathcal{U}^*}}$.

Then, given any other input $\theta^{\,j}\in B_R(\theta^\dagger)$, we obtain the adjoint error

Equation (26)

Proof. The Lipschitz condition (25) implies boundedness of the derivatives, due to

Equation (27)

recalling that R and r are the radii of the balls $B_R(\theta^\dagger)\subset X$ and $B_r(u^\dagger)\subset \mathcal{V}$. We will use this bound in evaluating the term Q2 later in the proof.

Consulting Algorithm 1, subtracting the approximate adjoint state (20) from the exact adjoint state (17), then inserting the mixed term $f^{{\prime}}_\theta(\widetilde{\theta}^{\delta,j},\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j}))^*z$, we see that for any $v\in \mathcal{U}$,

We recall that z and $\widetilde{z}$ are, respectively, the solutions of

Inspecting Q2, we see that the difference $z-\widetilde{z}$ solves

Equation (28)

Now, using (24), we can bound z by the source term $D_U v$. In a similar fashion, we bound $z-\widetilde{z}$ by the source term $\left(-f^{{\prime}}_u(\theta^{\,j},S(\theta^{\,j}))+f^{{\prime}}_u(\widetilde{\theta}^{\delta,j},\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j}))\right)^\star z$, then estimate this source further via the Lipschitz condition (25). More precisely, we have

Using these and the boundedness of the derivative from (27), we can estimate Q1, Q2 and Q3 by the parameter difference $\|\theta^{\,j}-\widetilde{\theta}^{\delta,j}\|$ and by its propagation through the state approximation, $\|S(\theta^{\,j})-\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|$, i.e.

In Q2, we have applied the estimate $\|\,f^{{\prime}}_\theta(\widetilde{\theta}^{\delta,j},\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j}))^\star\|_{\mathcal{U}\to X^*}\unicode{x2A7D} L_{R,r}$ derived in (27).

In the last step, combining Q1, Q2 and Q3, then applying lemma 1 for $L = \text{Id}$ to further estimate $\|S(\theta^{\,j})-\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|_\mathcal{U}$, we arrive at

which is (26); this completes the proof. □

At this point, we have analyzed the system output error and adjoint error in the bi-level algorithm 1 via the state approximation error $\epsilon_{K(j)}$. We now study $\epsilon_{K(j)}$ itself.

3. The lower-level iteration

In this section, we approximate S by $\widetilde{S}$ at a given $\theta\in B_R(\theta^\dagger)$ via the procedure (19) in Algorithm 1. In particular, we examine convergence and the convergence rate of (19) in order to derive the state approximation error $\epsilon_{K(j)}: = \|S(\widetilde{\theta}^{\delta,j})-\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|_\mathcal{U}$ in an explicit form. Ideally, $\epsilon_{K(j)}$ should be controllably small by letting the lower-level scheme iterate until a sufficiently large stopping index K(j) is reached. The lower-level stopping index K(j) will be derived in the next section, when its role in the convergence of the upper-level iteration becomes relevant.

For convenience, we shorten the notation in the lower-level problem (6). Explicitly, for a fixed θ, we introduce the model operator

Equation (29)

thus, ${_{\theta}{\mathbf{F}}}^{{\prime}}$ will indicate the derivative with respect to the state u, and an exact solution $u^*$ to (29) for a given θ satisfies

The procedure (19) in Algorithm 1, dropping the superscript j, now takes the form

Equation (30)

Finally, we recall the use of $(\theta^\dagger,u^\dagger)$ to denote the exact parameter and exact corresponding state of the original inverse problem (1) and (2).

The following result presents the convergence analysis for the lower-level Landweber iterates. We derive the convergence rate in the noise-free case not via the well-known source condition, but rather through a coercivity assumption. The weak convergence analysis is carried out without assuming weak closedness of ${_{\theta}{\mathbf{F}}}$.

Theorem 1 (Convergence and rate). Suppose that at each $\theta\in B_R(\theta^\dagger)$, the equation ${_{\theta}{\mathbf{F}}}(u) = \varphi$ is solvable at some $u^*\in B_{r/2}(u^\dagger)$.

  • i)  
    Assume that there exists a constant $M_r\gt0$ such that
    Equation (31)
    and a constant $\mu_r\gt0 $ such that the weak tangential cone condition (wTC)
    Equation (32)
    holds uniformly for all $\theta\in B_R(\theta^\dagger)$.Then, the whole sequence $\{u_k\}$ as given in (19) and (30) remains in the ball $B_r(u^\dagger)$ for all $k\in\mathbb{N}$. Furthermore, the sequence converges weakly to a solution of ${_{\theta}{\mathbf{F}}}(u) = \varphi$.
  • ii)  
    If additionally ${_{\theta}{\mathbf{F}}}$ is coercive, in the sense that there exists some $C_{coe}\gt0$ such that
    Equation (33)
    for all $u\in B_r(u^\dagger)$ and all $\theta\in B_R(\theta^\dagger)$, then we obtain strong convergence of $\{u_k\}$, with rate
    Equation (34)

Before proceeding with the proof, we make some observation on these assumptions.

Remark 3 (On assumptions of theorem 1). 

  • (i)  
    A sufficient condition for (31) is that $\|\,f^{{\prime}}_u(\theta,u)\|_{X\times\mathcal{V}\to\mathcal{U}^*}$ is uniformly bounded in $B_R(\theta^\dagger)\times B_r(u^\dagger)$. This is due to the fact that we already have boundedness of the linear operators $\|\frac{d}{\textrm{d}t}\|_{\mathcal{V}\to\mathcal{U}^*}$, $\|A\|_{X\to H}$ and $\|(\cdot)_{t = 0}\|_{\mathcal{V}\to H}$, guaranteed by the choice of function spaces suggested in section 1.2.1. The magnitude condition $M_r\lt\sqrt{2}$ is easily attained by scaling.
  • (ii)  
    Assumption (32) is equivalent to
    This is the form of the weak tangential cone condition introduced in [36, 45] with the tangential cone constant $c_{tc}: = (1-\frac{M_r^2+\mu_r}{2})\in(0,1)$. Its strong version (sTC) takes the form
    Moreover, as the initial condition operator A is linear, (wTC) and (sTC) in fact constrain only the nonlinear model f here; furthermore, this constraint is with respect to the state u only, and not to the parameter θ (see also remark 10).
  • (iii)  
    Assumption (33) can be relaxed to the weaker norm $\|u-u^*\|^\alpha_\mathcal{U}$. In this case, (34) yields the rate $\|u_k-u^*\|_\mathcal{U} = \mathcal{O}\left(\frac{1}{k^{1/(2\alpha)}}\right)$; this rate is sufficient for the treatment of the upper-level problem in the later sections, as we need only quantify $\epsilon_{K(j)}: = \|S(\widetilde{\theta}^{\delta,j})-\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|_\mathcal{U}$, the state approximation in the $\mathcal{U}$-norm.

Proof of theorem 1. (i) Fix $\theta\in B_R(\theta^\dagger)$ and ${_{\theta}{\mathbf{F}}}(u^*) = \varphi$. Following the classical approach [18, 29], we show Fejér monotonicity of the error sequence via the boundedness assumption (31) on the derivative and the weak tangential cone condition (32):

Equation (35)

This estimate shows that for each θ, as long as u0 is inside the ball ${_{\theta}{B}_{r/2}(u^*)}$, the whole sequence $\{u_k\}$ remains in ${_{\theta}{B}_{r/2}(u^*)}$. Together with the fact that $u^*\in B_{r/2}(u^\dagger)$, it follows that $u_k\in B_r(u^\dagger)$ for all $\theta\in B_R(\theta^\dagger)$. As a consequence, there exists a subsequence $\{u_{k_n}\}$ weakly convergent to some $\bar{u}\in\mathcal{V}$, and

as $B_r(u^\dagger)$ is closed and convex.

Next, summing the inequality (35) from 0 up to any arbitrary index $K\in\mathbb{N}$ results in

Equation (36)

Thus, together with the uniform bound (31) on the derivative, we have convergence of the image and the gradient step series;

Now, as $\bar{u}\in B_r(u^\dagger)$, applying again the weak tangential cone condition (32), then inserting $\varphi$ and using the weak convergence $u_{k_n}\rightharpoonup\bar{u}$ and the strong convergence ${_{\theta}{\mathbf{F}}}(u_k)\to\varphi$, we have

This implies

proving that the weak limit $\bar{u}$ of any weakly convergent subsequence is in fact an exact solution.

Next, we deduce weak convergence of the whole sequence $\{u_k\}$ from Fejér monotonicity. Similar to [10, lemma 9.1], let $u^*$, $\bar{u}$ be two weak accumulation points, thus exact solutions, as argued above. Then there exist subsequences $\{u_{k_n}\}$, $\{u_{k_m}\}$ with $u_{k_n}\rightharpoonup u^*$ and $u_{k_m}\rightharpoonup\bar{u}$ when n, $m\to\infty$. Accordingly,

In the last line, these four limits exist since $\{u_{k_n}\},\{u_{k_m}\}$ are extracted from the iterate sequence $\{u_k\}$, which has monotone decreasing distance to any true solution, as shown in (35). As these limits coincide, using a sub-subsequence argument, we conclude weak convergence of the full sequence to a solution, that is,

(ii) For the convergence rate, with coercivity (33) (thus $u^*$ is unique) and strong convergence of the image, we have strong convergence of uk to the exact $u^*$; in particular,

By Fejér monotonicity, we obtain the rate

Equation (37)

for the iterate sequence; for the image sequence, we similarly have

Equation (38)

 □
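The contraction mechanism behind (35) can be sketched numerically. The following toy example is purely illustrative and not the PDE setting of this paper: the scalar model $F(u) = u + 0.1\sin(u)$ and the step size stand in for a model with bounded derivative (31) and Lipschitz stability, and the assertions check Fejér monotonicity of the error.

```python
import math

# Illustrative scalar stand-in for the lower-level model F_theta:
# F(u) = u + 0.1*sin(u) has bounded derivative (cf. (31)) and is
# Lipschitz-stable, so the Landweber iteration contracts toward u*.
def F(u):
    return u + 0.1 * math.sin(u)

def dF(u):
    return 1.0 + 0.1 * math.cos(u)

u_star = 1.0            # exact state u*
phi = F(u_star)         # exact right-hand side: the lower level is noise-free

u = 2.0                 # initialization u_0 near u*
mu = 0.5                # illustrative Landweber step size
errors = [abs(u - u_star)]
for k in range(200):
    # u_{k+1} = u_k - mu * F'(u_k) * (F(u_k) - phi)
    u = u - mu * dF(u) * (F(u) - phi)
    errors.append(abs(u - u_star))

# Fejer monotonicity of the error, cf. (35)
assert all(e1 <= e0 + 1e-15 for e0, e1 in zip(errors, errors[1:]))
assert errors[-1] < 1e-8
```

In this well-posed scalar case the error decays monotonically to zero, mirroring (35) and the rate (37).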

Remark 4 (Coercivity—PDE theory). Coercivity (33) is one of the key elements in analyzing unique existence for PDEs [12, 44]. This condition is often stated for the nonlinear model f; after some derivation, it may then be transformed into semi-coercivity of the PDE model, which is ${_{\theta}{\mathbf{F}}}$ in our case.

This, as alluded to in section 1.1, is the motivating point for our study. We have employed well-posedness of the parameter-to-state map S, which is induced from coercivity of the model ${_{\theta}{\mathbf{F}}}$, to iteratively approximate the state u with convergence guarantee and rate.

At the end of this paper (section 5.1, proposition 6, remark 9), we will establish coercivity for some classes of nonlinear parabolic PDEs. This demonstrates that (33) is indeed a suitable and verifiable assumption for the lower-level problem.

Remark 5 (Coercivity—conditional stability). Assumption (33) means Hölder continuity of the inverse of ${_{\theta}{\mathbf{F}}}$ and can be extended to superadditive index functions φ, that is,

Equation (39)

Indeed, (37) then becomes

implying the linear rate. The extended version (39) can be seen as the conditional stability property of a continuous and injective operator.

There is also a relation to the variational source condition

in which the link from the loss functional $\ell$ to the data penalty via the index function φ is the key to convergence rates in variational regularization [13].

Going back to our setting where ${_{\theta}{\mathbf{F}}}$ describes a PDE, we focus on coercivity (33) rather than on the extension (39). In fact, coercivity leads us to the following result.

Lemma 3 (Strong tangential cone condition (sTC)). The strong tangential cone condition (see remark 3-ii)) holds under coercivity (33) and the Hölder continuity assumption

Equation (40)

on the derivative, with some $L_{F^{{\prime}}}\unicode{x2A7E}0$, for all $u,v\in B_r(u^\dagger)$, provided the ball $B_r(u^\dagger)$ satisfies

Equation (41)

In this scenario, all the claims in theorem 1 remain valid.

Proof. (40) and coercivity as in (33) yield the strong tangential cone condition

for $u,u^*\in B_r(u^\dagger)$. Above, the strict inequality (41) ensures that the tangential cone constant $\tilde{c}_{tc}$ is less than one. □

Remark 6 (Convergence is highly feasible). Lemma 3 shows that with coercivity, one only needs Hölder continuity of the derivative as in (3) to confirm convergence of the lower-level iteration. Indeed, the special case of β = 1, that is, that the derivative is Lipschitz continuous, is a natural assumption for convergence of gradient descent in many contexts, e.g. in convex optimization.

Continuing along these lines, we observe some connections between the weak tangential cone condition and other well-known conditions yielding convergence of the gradient descent method, such as convexity and the Polyak–Lojasiewicz (PL) condition. To see this, consider the cost functional $ J(u): = \|{_{\theta}{\mathbf{F}}}(u)-\varphi\|^2 $. Recall that J is said to satisfy the PL condition [34] if there exists a µ > 0 such that for all u,

Lemma 4 (Convexity versus wTC versus Polyak-Lojasiewicz condition). Assume that the equation ${_{\theta}{\mathbf{F}}}(u) = \varphi$ is solvable at $u^*\in B_r(u^\dagger)$. Let ${_{\theta}{\mathbf{F}}}$ be differentiable with (rescaling as necessary) $\|{_{\theta}{\mathbf{F}}}^{{\prime}}(u)\|\unicode{x2A7D} M_r\lt1, \forall u\in B_r(u^\dagger)$. We have the relation:

Proof. (i) As J is Fréchet differentiable with

the subdifferential $\partial J(u)$ is a singleton. Furthermore, convexity yields

for ${_{\theta}{\mathbf{F}}}(u^*) = \varphi$ and any $u\in\mathcal{V}$. Restricting our consideration to $u\in B_r(u^\dagger)$, this is in fact the wTC (32) with the fixed constant $(M_r^2+\mu_r) = 1$; clearly, this implies the standard wTC. (ii) As ${_{\theta}{\mathbf{F}}}(u^*) = \varphi$, we have $\min_u J(u) = 0$. Thus, if the wTC (32) holds and coercivity (33) is attained at α = 1, then

showing that the PL condition is fulfilled. □
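Relation (ii) of lemma 4 can be checked numerically for an illustrative scalar model with derivative bounded below (standing in for α = 1 coercivity), assuming the standard PL form $\frac{1}{2}|J^{\prime}(u)|^2\unicode{x2A7E}\mu\,(J(u)-\min J)$; the constant µ below is the one this toy model supplies, not a constant from the paper.

```python
import math

# Illustrative scalar model with derivative bounded below: F'(u) >= 0.9,
# standing in for alpha = 1 coercivity (33).
def F(u):
    return u + 0.1 * math.sin(u)

def dF(u):
    return 1.0 + 0.1 * math.cos(u)

phi = F(1.0)                 # so min J = 0 is attained at u* = 1

def J(u):
    return (F(u) - phi) ** 2

def dJ(u):                   # J'(u) = 2 F'(u) (F(u) - phi)
    return 2.0 * dF(u) * (F(u) - phi)

# assumed standard PL form: 0.5*|J'(u)|^2 >= mu*(J(u) - min J), here with
# mu = 2*(inf |F'|)^2 = 2*0.81, checked on a grid around u*
mu = 2.0 * 0.9 ** 2
for i in range(-50, 51):
    u = 1.0 + 0.05 * i
    assert 0.5 * dJ(u) ** 2 >= mu * J(u) - 1e-12
```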

Going back to the central point of this section, monotonicity of the error sequence plays the main role in theorem 1; however, the same question for the image sequence in the nonlinear case has, to the best of the author's knowledge, yet to be discussed. The following proposition gives sufficient conditions for monotonicity of the residual.

Proposition 2 (Monotonic residual). Let the conditions in theorem 1 be fulfilled, and assume moreover that (33) holds with α = 1 for any $u,v\in B_r(u^\dagger)$. Assume that ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)^*:\mathcal{V}\to\mathcal{U}^*\times H$ is injective, and moreover

Equation (42)

Then, the residual sequence $\|{_{\theta}{\mathbf{F}}}(u_k)-\varphi\|$ decreases monotonically.

Proof. Consider $u\in \mathop{\mathrm{int}}B_r(u^\dagger)$, the interior of $B_r(u^\dagger)$, and $v: = u+\epsilon h\in \mathop{\mathrm{int}} B_r(u^\dagger)$ with sufficiently small ε > 0. Employing the coercivity condition (33), we have

The open mapping theorem implies that at any $u\in \mathop{\mathrm{int}} B_r(u^\dagger)$, the bounded linear operator ${_{\theta}{\mathbf{F}}}^{{\prime}}(u)\in\mathcal{L}(\mathcal{V},\mathcal{U}^*\times H)$ has closed range. Equivalently, the adjoint ${_{\theta}{\mathbf{F}}}^{{\prime}}(u)^*$ also has closed range. This, together with injectivity, gives us that ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)^*$ is also bounded below by the same constant

Equation (43)

Note that this estimate holds uniformly for all ${_{\theta}{\mathbf{F}}}$ with $\theta\in B_R(\theta^\dagger)$.

Next, using the weak tangential cone condition (32) and the parallelogram law, one gets

Applying boundedness of the derivative as in (31) to the first and to the third term of the right hand side, then moving the third term to the left hand side, we have

Recall that as long as the initial guess satisfies $u_0\in B_{r/2}(u^*)$, all the iterates uk remain in the ball. Moreover, uk belongs to the interior of $B_{r/2}(u^*) $ due to Fejér monotonicity (35); thus, $u_k\in \mathop{\mathrm{int}} B_{r/2}(u^*)\subset \mathop{\mathrm{int}} B_r(u^\dagger)$. This enables us to apply the lower bound (43) to the first term in the above estimate, with $u = u_k$, $y = {_{\theta}{\mathbf{F}}}(u_{k+1})-\varphi$, thus obtaining

Equation (44)

Here, we notice that the lower bound $1/C_{coe}$ cannot exceed Mr , the upper bound for $\|{_{\theta}{\mathbf{F}}}^{{\prime}}(u)\|$; hence, we cannot claim decay of the image sequence yet. Therefore, we further estimate the middle term of (44), again invoking coercivity (33) and the lower bound (43)

Substituting this into (44), we arrive at

where the strict inequality is achieved according to the constraint (42). This shows the claimed monotonicity of the residual sequence. □
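Proposition 2 can likewise be observed numerically on an illustrative scalar model (the same stand-in as before, with α = 1 coercivity since $F^{\prime}\unicode{x2A7E}0.9$): along the Landweber iteration the residual decays monotonically.

```python
import math

# Same type of illustrative scalar model; since F'(u) >= 0.9 > 0, coercivity
# (33) holds with alpha = 1 and F'(u)* is bounded below as in (43).
def F(u):
    return u + 0.1 * math.sin(u)

def dF(u):
    return 1.0 + 0.1 * math.cos(u)

phi = F(1.0)
u = 2.0
residuals = [abs(F(u) - phi)]
for k in range(100):
    u = u - 0.5 * dF(u) * (F(u) - phi)
    residuals.append(abs(F(u) - phi))

# monotone decay of the residual ||F(u_k) - phi||, cf. proposition 2
assert all(r1 <= r0 + 1e-15 for r0, r1 in zip(residuals, residuals[1:]))
```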

Remark 7 (Boundedness from below of ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)$ and its adjoint). Boundedness from below of ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)$ and its adjoint ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)^*$ is in fact the same as invertibility of ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)$, thus of ${_{\theta}{\mathbf{F}}}^{{\prime}}(u_k)^*$. This ties to unique existence analysis of the linearized PDE (14) and the adjoint equation (13), which appears in the iterative Algorithm 1.

We conclude this section by discussing some links from the lower-level to the upper-level, stepping toward parameter recovery.

Discussion 1. 

  • Convergence rate of the state approximation error $\epsilon_{K(j)}: = \|u_{K(j)}-u^*\|_\mathcal{U}$ in the $\mathcal{U}$-norm is sufficient to carry out the analysis of the upper-level (cf remark 3(iii)). In some practical examples, this simplifies verification of the coercivity assumption (33).
  • Consider the lower-level problem ${_{\theta}{\mathbf{F}}}(u) = \varphi$, $\theta\in B_R(\theta^\dagger)$. As long as the exact state $u^*$ and the initialization u0, which might depend on θ, lie in $B_{r/2}(u^\dagger)$, then all iterates will remain in the ball $B_r(u^\dagger)$ (theorem 1), which is independent of θ. This indicates that if we ensure that in the upper-level problem, the parameter iterates θj do not escape the ball $B_R(\theta^\dagger)$, then the lower-level iterates uk are guaranteed to remain in the ball $B_r(u^\dagger)$.
  • The lower-level problem is noise-free, as ϕ is the residual of the PDE model. We can therefore iterate uk until the error $\epsilon_{K(j)}$ is acceptably small, then input this approximate state into the upper-level iteration. This gives rise to a lower-level stopping rule $k\unicode{x2A7D} K$, where the stopping index K depends on j-th iterate of the upper-level iteration and the noise level δ. We will determine $K(j,\delta)$ in the next section.
  • The state approximation error $\epsilon_{K(j)}$ in practical experiments can be observed via plotting the PDE residual. Indeed, they behave in a similar manner (theorem 1, (ii) rate), and the PDE residual even decays monotonically (proposition 2), echoing Fejér monotonicity of the error (theorem 1, (35)).

We summarize the assumptions that have been used.

Assumption 1 (Bi-level algorithm). For all θ, $\hat{\theta}\in B_R(\theta^\dagger)$ and for all u, $\hat{u}\in B_r(u^\dagger)$, the following all hold:

  • 1.  
    Sʹ is uniformly bounded, with
  • 2.  
    The solution z to the adjoint PDE depends continuously on the source h, that is,
  • 3.  
    The derivatives $f^{{\prime}}_\theta$, $f^{{\prime}}_u$ satisfy the Lipschitz condition

for some positive constants $M_S,C_{f_u},L_{\nabla f}$.

Assumption 2 (Lower-level iteration).  ${_{\theta}{\mathbf{F}}}(u): = F(\theta,u) = \varphi$ is solvable at $u^*$. Moreover, for all $\theta\in B_R(\theta^\dagger)$ and for all u, $\hat{u}\in B_r(u^\dagger)$, the following all hold:

  • 4.  
    ${_{\theta}{\mathbf{F}}}^{{\prime}}$ is bounded, with
  • 5.  
    Either the weak tangential cone condition
    holds, or ${_{\theta}{\mathbf{F}}}^{{\prime}}$ satisfies the Hölder condition (see lemma 3)
    Equation (45)
  • 6.  
    ${_{\theta}{\mathbf{F}}}$ is coercive, with

with positive constants $L_{F^{{\prime}}}$, Ccoe , Mr , µr .

4. The upper-level iteration

In the preceding, we approximated the PDE solution by the lower-level iteration. The approximate state $u^{\,j}_{K(j)} = :\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})$ and the associated approximate adjoint will now be fed into the upper-level of Algorithm 1 to update the parameter $\widetilde{\theta}^{\delta,j}$. This section studies convergence of the upper-level iteration under the influence of the noise level δ of the measurement, state approximation and adjoint approximation errors. In particular, we discuss two types of stopping rule: an a posteriori choice inspired by the classical result [29], and an a priori choice inspired by the newer result [35]. We put more emphasis on the latter.

Let us recall from (3) that the reduced map $G = L\circ S$ is a composition of the linear, bounded observation L and the nonlinear, differentiable parameter-to-state-map S. The forward map G has uniformly bounded derivative, satisfying

Equation (46)

for all $\theta\in B_R(\theta^\dagger)$, with MS as in assumption 1.

Assumption 3 (Upper-level iteration). 

  • (a).  
    Strong tangential cone condition (sTC): for all θ, $\hat{\theta}\in B_R(\theta^\dagger)$, the relations
    Equation (47)

hold for some positive constants MR , µR , KR with $(M_R^2+\mu_R)\lt2$.

The first part of (47) is the weak tangential cone condition, while the second part strengthens the assumption (cf discussion 3(ii)). Assumption (47) can also be written as

Equation (48)

for all θ, $\hat{\theta}\in B_R(\theta^\dagger)$, with $C_{tc}: = 1-(M_R^2+\mu_R)/2\in(0,1)$ and $K_R = 1+C_{tc}$. Unlike the tangential cone condition w.r.t u of the lower-level problem, the derivative here is taken w.r.t the parameter θ.

Algorithm 1 presents the upper-level iteration

Equation (49)

where $\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})$ is approximated according to the lower-level iteration (19) and $S^{{\prime}}(\widetilde{\theta}^{\delta,j})^*$ is computed according to the adjoint process (20).
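To make the interplay of the two levels concrete, the following minimal sketch runs the upper-level update (49) on a scalar stand-in problem: the "PDE" is $F(\theta,u) = u+\theta u^3 = 2$, the observation operator L is the identity, and the lower-level tolerance mimics rule (50) below. The model and all constants are illustrative assumptions, not the setting of the paper.

```python
# Minimal bi-level sketch of (49) on a scalar stand-in: the "PDE" is
# F(theta, u) = u + theta*u**3 = 2, the observation is L = id, and the
# lower-level tolerance mimics a rule-(50)-type coupling. Model and
# constants are illustrative assumptions.
def F(theta, u):
    return u + theta * u ** 3

def solve_state(theta, u0, tol):
    """Lower-level Landweber: drive the PDE residual below tol."""
    u = u0
    for _ in range(10000):
        r = F(theta, u) - 2.0
        if abs(r) <= tol:
            break
        u = u - 0.1 * (1.0 + 3.0 * theta * u ** 2) * r
    return u

theta_true = 0.5
y = solve_state(theta_true, 1.0, 1e-14)   # noise-free data y = L S(theta_true)

theta, u_warm = 0.2, 1.0                  # initial guesses
for j in range(200):                      # upper-level iteration (49)
    res = u_warm - y                      # previous outer residual
    tol = max(1e-14, 0.05 * abs(res))     # inner tolerance tied to outer residual
    u_warm = solve_state(theta, u_warm, tol)   # warm-started lower level
    res = u_warm - y
    # S'(theta) by implicit differentiation of F(theta, S(theta)) = 2
    dS = -u_warm ** 3 / (1.0 + 3.0 * theta * u_warm ** 2)
    theta = theta - dS * res              # Landweber update of the parameter

assert abs(theta - theta_true) < 1e-6
```

The inner iteration is warm-started at the previous approximate state, in the spirit of the outlook in section 6.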

4.1. A posteriori stopping rule

In this section, we analyze an a posteriori choice for the stopping index $\widetilde{j}^*$ of the upper-level iteration. The analysis also yields a stopping rule K(j) for the lower-level problem.

Proposition 3 (Boundedness—Fejér monotonicity). Let assumptions 1–3 hold, and let $\widetilde{\theta}^{\delta,j}\in B_R(\theta^\dagger)$ be given. Then $\widetilde{\theta}^{\delta,j+1}$ in (49) is a better approximation, i.e.

(in particular, the whole sequence remains in the ball $B_R(\theta^\dagger)$) if, for some $\gamma(j)\unicode{x2A7E} 0$, the two following conditions are satisfied:

  • 1.  
    The lower-level (19) is iterated until step K(j), such that the state approximation error satisfies
    Equation (50)
    with some nonnegative and sufficiently small $\gamma(j)$.
  • 2.  
    In the upper-level, the residual satisfies the relation
    Equation (51)
    for some ε > 0 with Ctc as in (48) and MR , µR , KR as in (47).

Proof. Starting from (49), by inserting $LS(\widetilde{\theta}^{\delta,j+1})$ and applying Young's inequality $(a+b)^2\unicode{x2A7D} (1+\epsilon) a^2+(1+1/\epsilon)b^2$, ε > 0, one estimates the error discrepancy between two consecutive iterations as

Equation (52)

recalling that $S(\theta)$ is the exact evaluation, while $\widetilde{S}_{K(j)}(\theta)$ is the approximation via the lower-level iteration (19).

For E1, we insert $y^\delta-LS(\widetilde{\theta}^{\delta,j})$. For E2, we recall the state approximation error $\epsilon_{K(j)}: = \|S(\widetilde{\theta}^{\delta,j})-\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})\|$, and use the second part of the tangential cone condition (47). For E3 and E4, we employ boundedness of the derivative from (46), as well as the definition of $\epsilon_{K(j)}$ for E4. Explicitly, one has

Combining these estimates then inserting y and applying tangential cone condition in the equivalent form (48), we get

This estimate reveals that if the state approximation error $\epsilon_{K(j)}$ is sufficiently small in comparison to $\|LS(\widetilde{\theta}^{\delta,j})-y^\delta\|$, we obtain negativity of $e_{j}^\delta$. This motivates us to control the lower-level error $\epsilon_{K(j)}$ by the upper-level residual $\gamma(j)\|LS(\widetilde{\theta}^{\delta,j})-y^\delta\|$ as in (50). Indeed, (50) leads us to

where negativity is achieved if we stop the upper-level iteration according to the rule (51). Fejér monotonicity follows. □

In view of (51), we suggest the a posteriori upper-level stopping rule

Equation (53)

with $\Gamma(j)$ as in (51). By Fejér monotonicity, $\widetilde{\theta}^{\delta,j+1}$ is a better approximation of $\theta^\dagger$ than $\widetilde{\theta}^{\delta,j}$, as long as the discrepancy (53) is respected.

Remark 8 (Stopping rule—consistency check). We wish to emphasize that when S is solved exactly or an analytic solution of the PDE (1) can be derived, then in (51) one can set $K(j) = \infty$, $\gamma(j) = \epsilon = 0$ and bypass the lower-level problem. In this event, (51) reduces to

coinciding with the known discrepancy principle in [29, section 2.2, (2.12)].
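The classical discrepancy principle recovered here can be illustrated on a generic linear ill-posed toy problem, where the Landweber iterates admit a closed form; the diagonal operator and all numbers below are illustrative and unrelated to the setting of the paper.

```python
import math

# Generic linear ill-posed toy problem A = diag(1, 0.1, 0.001) illustrating
# the discrepancy principle; Landweber iterates (x_0 = 0, step 1) admit the
# closed form x_i^(k) = (y_i^delta/s_i)*(1 - (1 - s_i^2)^k). All numbers
# are illustrative.
s = [1.0, 0.1, 0.001]                  # singular values of A
x_true = [1.0, 1.0, 1.0]
delta = 1e-2
noise = [delta / math.sqrt(3.0)] * 3   # deterministic noise, ||noise|| = delta
y_d = [si * xi + ni for si, xi, ni in zip(s, x_true, noise)]

def error(k):
    x = [(yi / si) * (1.0 - (1.0 - si ** 2) ** k) for si, yi in zip(s, y_d)]
    return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, x_true)))

def residual(k):                       # ||A x_k - y_delta||
    return math.sqrt(sum((yi * (1.0 - si ** 2) ** k) ** 2
                         for si, yi in zip(s, y_d)))

tau = 1.5
k_stop = next(k for k in range(1, 10 ** 6) if residual(k) <= tau * delta)

# over-iterating amplifies the noise in the smallest singular direction
# (semi-convergence); the discrepancy stop avoids the blow-up
assert error(k_stop) < error(5 * 10 ** 6)
```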

4.2.  A priori stopping rule

At this point, it is natural to ask how one may obtain the exact evaluation $S(\widetilde{\theta}^{\delta,j})$ required to compare the residual $\|LS(\widetilde{\theta}^{\delta,j})-y^\delta\|$ with τδ as the discrepancy principle (53) suggests, and how to terminate the lower-level iteration w.r.t the upper-level residual (50). A possible scenario is that one may have access to the system output $LS:\widetilde{\theta}^{\delta,j}\mapsto LS(\widetilde{\theta}^{\delta,j})$, despite $S(\widetilde{\theta}^{\delta,j})$ not being directly accessible. In this event, the proposed a posteriori stopping rule (53) can be verified. If this is not the case, an a priori stopping rule $\widetilde{j}^*$ via the noise level δ is desirable.

Proposition 4 (Uniform boundedness). Let assumptions 1–3 hold. All the iterates $\widetilde{\theta}^{\delta,j}$ with $0\unicode{x2A7D} j\unicode{x2A7D}\widetilde{j}^*(\delta)$ remain in the ball $B_R(\theta^\dagger)$ if, for some $\gamma(j)\unicode{x2A7E} 0$, the two following conditions are satisfied:

  • 1.  
    The lower-level problem (19) is iterated until step K(j), such that the state approximation error satisfies
    Equation (54)
  • 2.  
    In the upper-level, the initialization and the prior stopping index satisfy
    Equation (55)

Proof. Similarly to proposition 3, we will evaluate the error $e_{j}^\delta$ between two consecutive iterations. We begin with the estimate (52), where in the last line, we further insert y into E1 and E3 (that is, the terms containing yδ ). So

where A has the same form as $(E_1+E_2+E_3+E_4)$ in (52), with y replacing yδ in E1, E3, and B is the residual generated by inserting y.

In A, we can directly apply the first part of the tangential cone condition (47) to E1; the rest is estimated in a similar manner as in (52). That is,

Notice that, in comparison to E3, A3 has the extra constant $(1+\epsilon^{{\prime}})$ generated from splitting the square term $(a+b)^2\unicode{x2A7D} (1+\epsilon^{{\prime}})a^2+(1+\frac{1}{\epsilon^{{\prime}}})b^2$, $\epsilon^{{\prime}}\gt0$, when inserting y.

In B, we use the second part of the tangential cone condition (47), boundedness of the derivative as in (46), then finally Young's inequality $2ab\unicode{x2A7D} \mu_R a^2+b^2/\mu_R$ for $a: = \|LS(\widetilde{\theta}^{\delta,j})-y\|$, $b: = \delta K_R$. This leads to

We now set $\epsilon = \epsilon^{{\prime}} = \sqrt{2}-1$ in both A and B. In A3, we thus have the constant $(1+\epsilon)(1+\epsilon^{{\prime}}) = 2$; in A4, we have $(1+\frac{1}{\epsilon}) = \sqrt{2}/(\sqrt{2}-1)\lt7/2$, while in B2, we have $(1+\epsilon)(1+\frac{1}{\epsilon^{{\prime}}})\lt5$.

Next, we bound $\epsilon_{K(j)}$ in A2, A4 by $\delta \gamma(j)$ as the rule (54) suggests. In A2, we further apply Young's inequality $2ab\unicode{x2A7D} \mu_Ra^2/2 +2b^2/\mu_R$ for $a: = \|LS(\widetilde{\theta}^{\delta,j})-y\|$, $b: = \|L\|\epsilon_{K(j)} K_R$.

Now summing A and B, we see that $(A_3+B_1+\mu_Ra^2/2)$ is absorbed by part of A1, that is

Equation (56)

Next, summing up $e_{j}^\delta$ for $j = 0\ldots\bar{j}$ for any $\bar{j}\unicode{x2A7D}\widetilde{j}^*(\delta)$ with the upper-stopping index $\widetilde{j}^*$ proposed in (55), we arrive at

Equation (57)

This shows that all the iterates until $\widetilde{j}^*(\delta)$ remain in the ball

and the proof is complete. □
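The numerical constants chosen in the proof above can be verified directly; the following snippet is a check, not part of the proof.

```python
import math

# Direct check of the constants produced by eps = eps' = sqrt(2) - 1 in the
# splittings (a+b)^2 <= (1+eps)*a^2 + (1+1/eps)*b^2 used in the proof above.
eps = math.sqrt(2.0) - 1.0

assert abs((1.0 + eps) ** 2 - 2.0) < 1e-12       # A3: (1+eps)(1+eps') = 2
assert 1.0 + 1.0 / eps < 3.5                     # A4: 1+1/eps = 2+sqrt(2) < 7/2
assert (1.0 + eps) * (1.0 + 1.0 / eps) < 5.0     # B2: sqrt(2)*(2+sqrt(2)) < 5
```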

In comparison to proposition 3, an a priori stopping rule cannot yield Fejér monotonicity of the iterates with noisy data. As monotonicity is essential in ensuring the regularizing property of the Landweber iteration, and also due to the bi-level structure, we need to take particular care regarding stability of the upper-level.

The following proposition is crucial to derive a more concrete stopping rule, which can ensure controllability of the approximation errors propagating to the upper-level. At this point, the preliminary error analysis in section 2.2 comes into play. We remind ourselves that the tilde $\widetilde{(\cdot)}$ denotes the approximation/bi-level scheme; notation without the tilde refers to the exact/single-level scheme.

Proposition 5 (Noise propagation—approximation error). Let the assumptions in proposition 4 hold. Assume moreover that the lower-level precision (54) satisfies

Equation (58)

Then, we achieve the estimate

Equation (59)

for all $j\unicode{x2A7D}\widetilde{j}^*(\delta)$, where

Proof. We begin by comparing the $(j+1)$-iteration of the single-level scheme with exact data, and of the bi-level scheme with noisy data

Inserting $S^{{\prime}}(\theta^{\,j})^*L^*(L\widetilde{S}_{K(j)}(\widetilde{\theta}^{\delta,j})-y^\delta)$ (first estimate below) and using the analysis of the state approximation error derived in lemma 1 (second and third estimates) and of the adjoint approximation error derived in lemma 2 (second estimate), we deduce

Simplify the notation by denoting

where in $B_{\delta,\epsilon_{K(j)}}$, we have invoked uniform boundedness of the iterates in the ball $B_R(\theta^\dagger)$ proved in proposition 4. We now group the estimate in terms of $T^{\,j}, \delta$ and $\epsilon_{K(j)}$, and obtain

Equation (60)

The estimate (60) exposes an important fact: if $\epsilon_{K(j)}$ decays sufficiently fast with respect to δ then we can bound the term $T^{j+1}$ from above. Indeed, (54) suggests running the lower-level iterate until $\epsilon_{K(j)}\unicode{x2A7D}\delta\gamma(j)$. Employing the upper bounds $\bar{\delta}$ and $\bar{\gamma}$ from the statement of the proposition, we have

This results in the recursive relation

Equation (61)

By induction and choosing the same initialization $\widetilde{\theta}^{\delta,0} = \theta^{0}$, i.e. T0 = 0, we arrive at the important estimate

Equation (62)

By (58), $\gamma(j)\unicode{x2A7D}\frac{1}{q^{\,j}}$; we can thus collapse the sum in (62), and arrive at the key estimate

for all $j\unicode{x2A7D}\widetilde{j}^*(\delta)$, as required. □

This immediately suggests a stopping rule:

Assumption 4 (A priori stopping rule). The upper-stopping index $\widetilde{j}^*(\delta)$ satisfies

Equation (63)

with $\hat{\Gamma}$ as in (59).
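The interplay between the recursion (61), the lower-level precision (58) and an a priori stopping index of order $\log(1/\delta)$ can be sketched in a small simulation; the constants below are illustrative, not those of the analysis.

```python
import math

# Illustrative simulation of the recursion (61),
#   T_{j+1} <= C*T_j + c*delta*gamma(j),  gamma(j) = q^{-j},  T_0 = 0,
# with a stopping index j*(delta) of order log(1/delta): the propagated
# error T_{j*} vanishes as delta -> 0. Constants C, c, q are illustrative.
C, c, q = 1.05, 1.0, 2.0

def propagated_error(delta):
    j_star = int(math.log(1.0 / delta) / (2.0 * math.log(C)))
    T = 0.0
    for j in range(j_star):
        T = C * T + c * delta * q ** (-j)
    return T

errs = [propagated_error(d) for d in (1e-2, 1e-3, 1e-4, 1e-5)]
assert all(e1 < e0 for e0, e1 in zip(errs, errs[1:]))   # decreasing with delta
```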

With all the auxiliary results prepared, we now state the main result.

Theorem 2 (Convergence of bi-level iteration). Assume that both of the following hold:

  • (a)  
    The upper-level stopping index $\widetilde{j}^*(\delta)$ is chosen according to (55) and (63).
  • (b)  
    The lower-level stopping index K(j) is chosen according to (58) at each $j\unicode{x2A7D}\widetilde{j}^*(\delta)$.

Then, the bi-level iterates $\widetilde{\theta}^{\delta,\widetilde{j}^*(\delta)}$ in algorithm 1 converge to a solution of the inverse problem (1) and (2) as δ → 0 .

Proof. We begin by claiming that for exact data y, the whole sequence θj generated by the exact/single-level scheme converges in the weak sense to a solution of the inverse problem (1) and (2). Indeed, looking at the proof of proposition 4, in particular the expression (56), by setting δ = 0, $\gamma(j) = 0$ we have

showing Fejér monotonicity. (57) with $\bar{j} = \infty$, $G = L\circ S$ becomes

These two properties are exactly the same as (35) and (36) in theorem 1, the key steps to prove convergence of the iteration, with θ in place of u, and G in place of ${_{\theta}{\mathbf{F}}}$. Therefore, following the line of the proof of theorem 1, part (i), noting that the tangential cone condition (even the strong version) is also assumed here, we can assert that the whole iterate sequence of the single-level algorithm with noise-free data converges to a solution of the inverse problem, i.e.

Moreover, one can claim strong convergence. This is done in [29, theorem 2.4] (which we do not repeat here) by showing that $e_j^{\delta = 0}$ in fact forms a Cauchy sequence; thus,

Equation (64)

This, together with proposition 5 and the stopping rule (63), allows us to conclude convergence of the sequence $\widetilde{\theta}^{\delta_n,\widetilde{j}^*(\delta_n)}$ via the decomposition

where convergence of the first term follows from proposition 5 with the rule (63), and convergence of the second term is (64). The proof ends here. □

5. Discussion on assumptions and application

5.1. Classes of problems with coercivity

As presented in remark 4 of section 3, coercivity (33) is the essential property that we extract from well-posedness of S to achieve the convergence rate for the lower-level iterations. In what follows, we present two classes of PDEs where coercivity can be established: Lipschitz nonlinearity and monotone-type nonlinearity (66).

Let us consider the PDE (29), where the unknowns $\theta: = (a,c,\phi,u_0)\in X$ consist of the diffusion coefficient a, potential c, source term φ and initial state u0. These unknown parameters are embedded in the PDE model via

Equation (65)

in the Hilbert space framework (cf section 1.2)

with bounded Lipschitz domain $\Omega\subset\mathbb{R}^3$.

In (65), the nonlinearity $\Phi:\mathcal{V}\to\mathcal{U}^*$ is a Nemytskii operator induced by $\Phi:U\to U^*$ with (by abuse of notation) $[\Phi(u)](t): = \Phi(u(t))$. We consider

Equation (66)

for all u, $v\in U$ with $L_\Phi$, $M_\Phi\unicode{x2A7E}0$. Clearly, case (b) covers the class of standard monotonic functions.

Proposition 6 (Coercivity verification). The classes of PDEs with Lipschitz or monotone-type nonlinearities (65) and (66) satisfy the coercivity property listed in Assumption 2 with α = 1 and in $\mathcal{U}$-norm. That is, the relation

Equation (67)

holds uniformly for all $\theta: = (a,c,\phi,u_0)\in B_R(\theta^\dagger)$ with a positive constant Ccoe .

The property (67) is also referred to as Lipschitz stability.

Proof. We begin by testing ${_{\theta}{\mathbf{F}}}(u)-{_{\theta}{\mathbf{F}}}(v)$ with $(u-v; 0)\in\mathcal{V}\times H$. Setting $\|u\|_{H_0^1}: = \|\nabla u\|_{L^2}$ due to the Gagliardo-Nirenberg-Sobolev inequality and recalling $a\unicode{x2A7E} \underline{a}\gt0$, we write

Equation (68)

Estimating the c-term by using Hölder's inequality with $p = q = 2$, then by Young's inequality $xy\unicode{x2A7D} x^p/p+y^q/q, p = 4/3,q = 4$, we get

where $C_{H^1\to L^6}^\Omega$ indicates the application of the continuous embedding $H^1(\Omega)\hookrightarrow L^6(\Omega), \Omega\subset\mathbb{R}^3$. For the nonlinear term Φ in (66), one observes

where for the case (a), we have used Young's inequality $xy\unicode{x2A7D} x^2/\epsilon+\epsilon y^2, \epsilon = \underline{a}/8$. Combining the above estimates of the nonlinearity Φ and of the c-term with (68), we have, for all $t\in[0,T]$

Equation (69)

where in the second line, we have applied one more time Young's inequality $ab\unicode{x2A7D} \epsilon a^2+b^2/\epsilon, \epsilon\lt\underline{a}/8$ and grouped the L2-terms with the nonnegative constant

In the second step, taking the integral of (69) over the time interval $(0,t)$, $t\unicode{x2A7D} T$ leads to

Equation (70)

Now, we apply Grönwall's inequality in the form

for (70), where $x(t): = \frac{1}{2}\|u(t)-v(t)\|^2_{L^2}$ is the second term on the left hand side and $\Pi: = \left(\frac{1}{\epsilon}+\frac{1}{2}\right)\|{_{\theta}{\mathbf{F}}}(u)-{_{\theta}{\mathbf{F}}}(v)\|_{\mathcal{U}^*\times H}^2$. We then get

In (70), one can moreover bound the term $(\underline{a}/8-\epsilon)\|u-v\|_{L^2(0,T;H^1(\Omega))}$ on the left hand side by the entire right hand side, thus eventually by $\|{_{\theta}{\mathbf{F}}}(u)-{_{\theta}{\mathbf{F}}}(v)\|_{\mathcal{U}^*\times H}$. Altogether, we obtain

Equation (71)

with some $\widetilde{C}_{\Phi,R}\gt0$. Importantly, as the parameter θ lies in the ball $B_R(\theta^\dagger)$ with finite radius R, the estimate (71) is locally uniform. If only $(\varphi,u_0)$ are the unknown parameters, the radius R is allowed to be arbitrarily large, as ${C}_{\Phi,R}$, thus $\widetilde{C}_{\Phi,R}$, depends only on (c, a). This shows Lipschitz stability of $u\mapsto{_{\theta}{\mathbf{F}}}(u)$, equivalently the desired coercivity. □

Remark 9 (Coercivity with $\mathcal{V}$-norm). By evaluating

in $L^2(0,T;H^{-1}(\Omega))$-norm, one can from the preceding result immediately deduce the stronger estimate

for Lipschitz nonlinearity Φ, case (a) in (66). However, as pointed out in remark 3(iii), a $\mathcal{U}$-norm coercivity is sufficient for the bi-level algorithm.

Remark 10 (Tangential cone condition). In view of corollary 3, the tangential cone condition for the lower-level problem (in u) is clearly satisfied for $\Phi(u) = 0$. When $\Phi(u)$ is as in (66), as the coercivity/Lipschitz stability property is already established in proposition 6, one only needs to assume Lipschitz continuity of its derivative to conclude the tangential cone condition. Thus, convergence of the lower-level iteration is highly achievable.

For the upper-level problem, verification of the tangential cone condition (in θ) is another technical question. We refer to [30] for a study of such a condition in several linear and nonlinear parabolic benchmark problems, to [21] for abstract linear elliptic problems, to [23] for quantitative elastography, to [24] for magnetic resonance elastography, to [36] for electrical impedance tomography, to [11] for full waveform inversion, and recently, to [1, 46] in the context of machine learning.

5.2. Application to magnetic particle imaging

This section discusses a real-life application: magnetic particle imaging (MPI). The material of the following discussion is extracted from the two papers [31, 42]. The work in [31] puts forward the mathematical analysis of the parameter identification problem in MPI; subsequently, [42] proceeds with Landweber and Landweber–Kaczmarz algorithms and numerical experiments. Moreover, an ad hoc bi-level algorithm, in which the lower-level iteration is used as an approximation method for the nonlinear PDE (73), was successfully implemented without full analysis.

5.2.1. State of the art.

MPI is a novel medical imaging technique invented by B. Gleich and J. Weizenecker in 2005 [37]. The underlying concept is to observe the magnetic behavior of iron oxide nanoparticle tracers, which are injected into patients and respond to an external oscillating magnetic field (with a field-free-point). One can thus visualize the particles' concentration, yielding quantitative imaging of blood vessels. MPI offers favorable features: no harmful radiation, high resolution (${\lt}1$ mm) and fast acquisition (${\lt}0.1$ s).

5.2.2. Parameter identification in MPI.

The external magnetic field induces the temporal change $\partial\mathbf{m}/\partial t$ of the particles' magnetization, which by Faraday's law induces a measurable electric voltage

Equation (72)

Here, µ0 is magnetic permeability in vacuum, $\widetilde{a}_l$ is a band-pass filter, c is the known particle concentration and $\mathbf{p}^{\mathrm{R}}$ is the 3D receive coil sensitivity.

Inspired by [15, 38], we use the Landau–Lifshitz–Gilbert (LLG) equation to model the evolution of the magnetization moment vector m on a microscopic scale

Equation (73)

with the relaxation parameter $\alpha\in\mathbb{R}^2$, external magnetic field h and the saturation magnetization constant mS. We highlight that, as (73) includes the relaxation effect (via α and the cross product) and the particle interaction (via $\Delta \mathbf{m}$ and $|\nabla \mathbf{m} \rvert^2 \mathbf{m}$), our model is more realistic than simplified ones, e.g. the Langevin function.

Combining (72) and (73), we investigate the inverse problem of identifying α alongside the magnetization m

Equation (74)

The LLG equation (73) is highly nonlinear, with vectorial solution m; solving (73) numerically is a challenging task in its own right [2, 3, 5, 8] (see also [42, remark 1] for a survey). This motivated our idea of a bi-level approach, in which the lower-level problem is used to approximate m.

We invite the reader to review [31, 42] for various numerical results. To illustrate, figure 1 shows the state approximation $\textbf{m}(t,x)$ in the lower-level, the reconstruction, and the internal operation of the bi-level scheme. The reconstruction with Landweber step size µ = 1 ran 250 upper-level iterations and a total of 11 095 lower-level iterations over 90 000 seconds. Note that this can be sped up by using the larger step size µ = 10. The final state error is below 1%, and the parameter error is below 2% [42, table 13]. In essence, the proposed bi-level scheme demonstrates its robustness through several test cases with experimental parameters, true physical parameters, and noise-free and noisy data. We are currently carrying out an extended numerical comparison of the three approaches—reduced, all-at-once and bi-level.


Figure 1. Left: lower level: applied field $\textbf{h}(t)$ (left), initial state m0 (middle), approximate trajectory $\textbf{m}(t)$ (right). Top right: bi-level convergence via reconstructions. Bottom right: number of lower-level iterations (y-axis) for each of the 250 upper-level iterations (x-axis). These figures are taken from [42, figures 9 and 14]. Reproduced with permission from [42].


6. Conclusion

In this work, we have proposed a bi-level Landweber framework with a convergence guarantee for coefficient identification in ill-posed inverse problems. The method is tailored, but not limited, to nonlinear time-dependent PDE models. In addition, the regularization study in the upper level can be combined with other inexact PDE solvers, such as reduced-order models or principal component analysis, to obtain an end-to-end error analysis.

The author intends to extend the study in the following directions:

  • The lower-level could further include a model inexactness term δPDE . This reflects the case, for instance, when the PDE is not modeled in ideal conditions, or when a simplified version is employed. In this situation, the PDE itself contains a quantity of inexactness, and the problem becomes ill-posed in both levels.
  • In Algorithm 1, the lower-level iteration is initialized at some u0 near $u^\dagger$, and the lower-level iterations between different upper-level iterates do not communicate with each other. Arguing that a small update in the parameter θ causes a correspondingly small update in the PDE solution u, one can initialize the lower-level iteration of step j + 1 at the last iterate of step j, i.e. $u_0^{\,j+1}: = u^{\,j}_{K(j)}$, with the aim of lowering the stopping index $K(j+1)$. Inspired by the all-at-once approach, the author also wishes to investigate the situation where one lower-level iteration is sufficient, i.e. $K(j+1) = K(j)+1$, yielding alternating updates between u and θ.

Replacing the Landweber method by its accelerated version [22, 39] or by another iterative method in either or both levels is also a promising research direction.

Acknowledgments

The author thanks Barbara Kaltenbacher for many valuable comments, and Christian Aarset for thoroughly proofreading the manuscript. The author wishes to thank the reviewers for their insightful suggestions, leading to an improvement of the manuscript. The author acknowledges support from the DFG through Grant 432680300—SFB 1456 (C04).

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).

Footnotes

  • That is, not only at a solution $v = u^*$.
