Abstract
This study focuses on addressing the inverse source problem associated with the parabolic equation. We rely on sparse boundary flux data as our measurements, which are acquired from a restricted section of the boundary. While it has been established that utilizing sparse boundary flux data can enable source recovery, the presence of a limited number of observation sensors poses a challenge for accurately tracing the inverse quantity of interest. To overcome this limitation, we introduce a sampling algorithm grounded in Langevin dynamics that incorporates dynamic sensors to capture the flux information. Furthermore, we propose and discuss two distinct dynamic sensor migration strategies. Remarkably, our findings demonstrate that even with only two observation sensors at our disposal, it remains feasible to successfully reconstruct the high-dimensional unknown parameters.
Export citation and abstract BibTeX RIS
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
1.1. Mathematical model
We give the mathematical statement of our interested inverse problem. Firstly the heat equation is given as follows:
Here is the two-dimensional unit disc. In equation (1.1), the discontinuous source is the characteristic function and the support is unknown. More precisely, if x is included by D, the value of is one; otherwise, it equals zero. We assume that is sufficiently smooth. In this work, we will use the boundary flux data observed by the dynamic moving sensors to recover the unknown support D. The mathematical formulation of the measurements can be written as
Here z(t) is the observation point and located in the boundary ; is the observed area. We employ the notations z(t) and to emphasize that the sensor locations are time-dependent. Our analysis centers on sparse boundary data, where Γ represents a very small subset of the boundary. The limited availability of observations on this sparse boundary presents substantial challenges in recovering χD . To address these challenges and verify the proposed algorithm 1, we opt to rely solely on flux information obtained from two sensors, further enhancing the complexity of the problem.
.
Algorithm 1. Pseudo-algorithm for dynamical locations determination. |
---|
1. Initialization: two observation points (sensors) location p1 and p2, observation time stamps , current observation time , initial PDE simulation time t0 = 0, and the total sampling iteration limits K, the number of samples N; |
2. for k = 1 to K − 1 do |
3. Observe the flux at p1 and p2; |
4. Simulate the PDE solution until from t0 for the computation of ; |
5. Obtain posterior samples (the target) and (solution at ) according to the sampling schemes presented in algorithm 2; |
6. Update two sensors p1 and p2 based on posterior samples (please refer to section 3.2.1 and 4.1.1 for details); |
7. Reset initial time ; |
8. Reset the mean of to be the initial value for the next simulation ; |
9. Pick up and set to be the next observation and the end-of-simulation time from tʹ. |
1.2. Background and literature
In practical applications of inverse problems, sparse boundary data holds great importance. Let us take an example in atmospheric science [11]. The equation (1.1) can be used to describe the diffusion of pollutants [11], where the solution u represents pollutant concentration. In such scenarios, the region of the pollutant source, denoted as D, should be heavily contaminated. To safeguard the health of engineers, the utilization of boundary measurements becomes imperative.
Furthermore, in the realm of practical inverse problems, the project's cost cannot overlooked. This includes expenses related to equipment, labor, computation, and more. This cost-conscious perspective has motivated numerous researchers to seek solutions to inverse problems using sparse data.
However, when opting for sparse boundary data, raises a pertinent question: how do we determine the sensors' locations? Although one can establish the well-posedness of the inverse partial different equation (PDE) problem, the sensors' locations significantly influence the quality of numerical simulation. Unfortunately, in the context of inverse problems, we often have to select observation areas randomly. In practice, it is entirely possible to initially choose suboptimal observation points, which can pose challenges.
To address this dilemma and ensure the quality of our approximations, we propose the concept of using moving observation points. Specifically, if the initial observation points are situated in less favorable locations, we can relocate the facilities to more suitable positions. Consequently, the data obtained from these improved locations will compensate for the adverse effects of the suboptimal points and enhance the accuracy of our approximations.
In line with this idea, the fundamental question we must address is how to determine whether a location is advantageous or not; that is, how to decide when and where to move the observation points.
The inverse source problem of the heat equation is a classical field in the literature of inverse problems, and abundant academic achievements are generated. See [5, 14, 16–19, 25, 31, 32] and the references therein. Nowadays, due to the significant application value of the sparse data, more and more researchers have paid attention to this field. Here we list several references on the inverse problems with sparse data. In [14], the authors consider the inverse source problem of the heat equation on the two-dimensional unit disc. They recover the space-dependent source f(x) by the flux data at finite points of the boundary. The conclusion in [14] is promoted by [32], in which the variable separable source can be uniquely determined by the sparse boundary data. Note that in [32], the spatial term and temporal term are both unknown. In [22, 25], the authors further recover the semi-discrete unknown source, which is investigated in this work, and the considered equation is the parabolic equation on a general domain. Also, for the geometric inverse problems, the authors in [13, 20] recover the manifold by the image of a single specific function. We should note that for such geometric inverse problems, people usually use a whole operator as the measurements.
1.3. Main result and outline
We summarize the contributions as follows.
- 1.We study inverse problems with sparse measurements and moving observation points.
- 2.We introduce a sampling-based approach and a dynamic sensor migration algorithm designed for source tracing. Moreover, we present two migration strategies and validate their effectiveness through two challenging numerical examples.
The rest of this article is organized as follows. In section 2, we will provide an overview of the existing problems. We then introduce the Bayesian methodology and the Langevin-based algorithm in section 3, in which the uniqueness of this inverse problem is also discussed. In particular, we will discuss the proposed sampling methods and proposed dynamical sensors migration algorithm in section 3.1 and 3.2. Finally, we will verify the performance and introduce the sensors migration strategies in section 4.
2. Preliminaries
2.1. The inverse problem settings
This article is an extension of [31, 32], in which the authors prove the uniqueness theorem of the inverse source problem under sparse data. More precisely, under suitable conditions, the flux data generated from two points of the boundary can uniquely determine the unknown source. However, as we mention in the introduction, when we solve this inverse problem numerically, there is a natural question that how to determine the locations of the observation points. This question holds paramount importance in computational work because the precision of the numerical approximation is profoundly influenced by the placement of observation points. To substantiate this assertion, we test a numerical experiment by the algorithm proposed in [32] and obtain the comparison (please see figure 1).
In figure 1, the dotted blue line represents the support of the exact source, while the dashed red line corresponds to the support of the approximated source. The black points located along the boundary represent the observation points. It is evident that the support of the exact source in both figures is the same; the only difference lies in the placement of the observation points. Consequently, the two approximations exhibit significant disparities.
Evidently, in the left figure, the observation points are positioned unfavorably, earning them the label of 'bad' points, while those in the right figure are considered 'good'. However, it is important to recognize that in practical applications of inverse problems, obtaining the exact solution for the unknown source is often infeasible. Therefore, we are left with the task of selecting observation points largely based on chance.
If, by chance, we initially select 'bad' points and do not alter their positions, it results in the production of poor numerical outcomes, as demonstrated in the left part of figure 1. This underscores the reason why we endeavor to leverage data from moving observation points to offset the approximations adversely affected by these 'bad' observation points.
Following [14, 32], the unknown support D of the source can be uniquely determined when the observed area Γ is fixed and only consists of two chosen points on the boundary. Hence, in this work, we will suppose that consists of two points of for each . With the idea of moving observation points, the measurements we used in this work will be represented as
The notation is the path of each observation point, and the essential difficulty of this work is how to determine the path zj . We will provide some rules for determining and give the corresponding numerical experiments. These will be discussed in the next sections.
2.2. Bayesian framework
A highly effective approach for tackling the inverse problem involves adopting the Bayesian inference framework, as referenced in prior works [6, 10, 23, 33, 35]. In this context, we define the measured flux data as and represent the prior distribution of ξ as , where ξ is the unknowns used to parameterize the shape of the source and the inverse quantities of interests (QoIs). Utilizing the Bayesian formula, we can express this relationship as follows:
where represents the likelihood function given the parameter ξ .
We denote the mapping from the space of parameter ξ to the observations as the forward model, labeled as g. Given that observation data inherently contain noise and that noise is also introduced during the forward evaluation, we make the assumption that given ξ , the data d follows a normal distribution with mean and variance σ2. Specifically, the likelihood function takes the following form:
One of the key advantages of employing the Bayesian inference method is our ability to obtain the distribution of the unknown parameters. This capability allows us to quantitatively assess the uncertainty associated with QoIs, such as the outputs of the forward model. The trajectory of the observation points is subsequently determined in a sequential manner using posterior samples of the QoI.
3. Langevin diffusion and Bayesian sampling
In this study, we will employ the Bayesian approach to ascertain the trajectory of observation points and iteratively address the inverse source problem. To attain the desired posterior distribution , various Markov chain Monte Carlo (MCMC) methods can be employed for sampling. Among these methods, the Langevin diffusion (LD)-based MCMC [4, 8, 9, 28] stands out as a computationally efficient choice, particularly when dealing with high-dimensional and multimodal distributions [21, 28].
Under certain regularity conditions (see, e.g. [2, 30]), the following stochastic differential equation (SDE)
known as the preconditioned LD [26, 29], has a stationary distribution
In the equation (3.3), L is an arbitrary symmetric, positive-definite matrix, is the energy function, Wt is the p-dimensional standard Brownian motion, and τ is the temperature parameter. Specifically, let us define the energy function as,
the target posterior distribution can then be sampled by the LD-based MCMC algorithms, where the Markov chain is constructed by discretizing the equation (3.3) without adjustment or with the Metropolis–Hastings (MH) acceptance step [3, 34]. The choice of the temperature parameter τ has crucial influence on the sampling. Motivated by the replica-exchange methods [23, 24, 27] extends the preconditioned LD-based methods to multiple chains with different temperature parameters to accelerate distribution simulation.
In practical sampling scenarios, computing the gradient of the energy function can be computationally expensive, especially when dealing with PDE-based inverse problems that involve time-consuming forward model simulations. Moreover, in this work, the source term exhibits discontinuities concerning the unknown parameters. Therefore, we opt for the gradient-free MH MCMC method [12].
To address the inverse problem, we will employ the adaptive preconditioned Crank–Nicolson MCMC (pCN-MCMC) method within a Bayesian inference framework.
3.1. Adaptive pCN
As the sampling acceptance rate is independent of the dimension of parameters, the method of pCN-MCMC [7] stands as an efficient strategy for addressing large-scale inference challenges. Specifically, it is based on the SDE
which is a special case of equation (3.3), where the prior is the Gaussian distribution , the gradient of the log-likelihood vanishes in the drift term, and the temperature is τ = 1. Applying the Crank–Nicolson schemes [1, 7] to the SDE (3.4), it follows that,
where represents samples from a multivariate normal distribution, and δ denotes the step size. When L = B is chosen, the pCN proposal is resulted,
where
Moreover, the equation (3.5) can also be rewritten as [15]
where
yielding the adaptive pCN scheme, it is used to accelerate the convergence of the pCN MCMC method. One of the optimal choices of C is the posterior covariance matrix, which is not directly available but can be determined by the historical samples empirically. For both the pCN and adaptive pCN scheme, the acceptance rate has the form [15]
where
Please refer to algorithm 2 for more details on the adaptive pCN sampling method.
Algorithm 2. The adaptive pCN MCMC. |
---|
Input: the covariance matrix B of the prior, parameters β1, β2, noise of the observed data σ, the initial guess , current observation time . The number of samples N1 and N, the update frequent number k0. |
Output: posterior samples and the PDE solution at time . |
1. for do |
2. Generate the candidate by the pCN scheme (3.6) |
3. Accept as with the rate |
4. Collect the posterior samples of the parameters and the PDE solution at time . |
5. Compute the empirical covariance matrix of the as C. |
6. for do |
7. if then |
8. Update the empirical covariance matrix of as C |
9. Propose the candidate by the adaptive pCN scheme (3.7) |
10. Accept as with the probability |
11. Store the posterior samples of the parameters and PDE solutions at time . |
3.2. Methodology
As mentioned earlier, the uniqueness of this inverse problem has undergone comprehensive scrutiny in prior works [31, 32]. In this study, we align ourselves with the principles outlined in these [31, 32] to articulate the uniqueness theorem.
Theorem 1. Given two unknown supports D1 and D2 which have sufficiently smooth boundaries, we denote the solutions corresponding to Dj by uj , . Recalling the data (2.2), we assume that there exists such that
where is the set of rational numbers. If
then in the sense of .
Proof. The proof of theorem 1 follows from [31, theorem 3.1] and the analyticity of the natural exponential function straightforwardly. □
Theorem 1 establishes that, under appropriate conditions, we can ascertain the precise support of the source using the data (2.2). However, extensive experimental findings, as presented in [25, 32], suggest that relying solely on fixed-sensor flux observations may not yield accurate source predictions. Furthermore, in both sampling methods [25, 33] and the classical Levenberg–Marquardt iteration, the accuracy of source capture is greatly contingent on sensor placement, particularly when only using a limited number of sensors.
Inspired by theorem 1, we introduce an algorithm that introduces variability in the locations of observation points (sensors). Firstly, two observed positions (sensors) along the boundary at time T1 are selected randomly, the positions are denoted as and , respectively. We then estimate the shape of the source based on the measured flux data collected at p1 and p2 in the framework of Bayesian inference; particularly the adaptive pCN method displayed in algorithm 2 is used to draw the posterior samples. We then move the observation points location at T2 according to the sensors migration rules detailed in sections 3.2.1 and 4.1.1. Moreover, to save the computational cost, we collect the posterior samples of the target inverse QoIs U at time T1 as , N is the number of samples, calculate the mean of and use it as the initial guess of the PDE among the interval , i.e. once the observed locations at time T2 determined, we only simulate the forward model during the time interval , which saves the cost of re-simulating the PDE solution from . The process is repeated until the predicted locations are closed to the current locations. For a comprehensive understanding of the procedure, we direct your attention to algorithms 1 and 2. It is worth noting that the option to select new sensor locations at random is a possibility. However, in section 4.1.1 and 3.2.1, we delve into two distinct approaches for altering the sensor locations, which we detail further.
3.2.1. A sensors migration rule.
Theorem 1 provides us with the possibility of using time-dependent sensors, and we also observe the improved numerical results in section 4. Notably, we even observe that the random sensors relocation strategy (moving the sensors at random at the next observation time) demonstrates superior performance compared to the fixed-sensor approach. However, it remains challenging for us to determine the optimal time to relocate the sensors. To address this issue, we introduce an effective approach that can help us identify when it is appropriate to cease relocating the sensors.
Our approach revolves around relocating the two sensors to locations characterized by the highest flux variance observed across all sampling iterations. To be precise, let the collection of flux at position for all different sampling steps be denoted as . We then proceed to shift one sensor to:
The second sensor is subsequently moved to the position exhibiting the second largest variance. It is important to emphasize that this strategy incurs no additional costs. Precisely, to compute the flux at the sensor locations, we must evaluate the equation's solution using the sampled source, guided by the acceptance rate (3.8). This enables us to compute the flux variance across all sampling iterations at any given point along the domain boundary. Notably, higher variance corresponds to a larger confidence interval and a less confident prediction. Consequently, our strategy involves relocating the sensors to positions characterized by the greatest variance.
4. Numerical experiments
In this section, we will conduct numerical experiments to test the performance of the algorithm. We shall delve into scenarios involving two distinct unknown sources, each characterized by disparate geometries and dimensions pertaining to the unknown parameters. Across both sets of problems, the task involves measuring boundary observation flux at two distinct points along the boundary—an endeavor that has proven to be particularly challenging for conventional implementation methodologies [31, 32]. Employing a comparative approach alongside the fixed observation points method, our analysis reveals a consistent trend: the methods we propose adeptly capture the source.
Consider the problem
the physical domain is . b represents the predefined intensity of the source term. χD denotes the characteristic function related to the domain D, where D is the unspecified region in need of reconstruction. We will specify D in the experiments later. The observations are the flux collected at some time and locations along .
For the convenience of solving the forward model, we transform the Cartesian plane to the polar coordinates, i.e.
the domain is then transformed to be . The equation becomes
The spatial and temporal discretization of the forward model (4.10) is conducted with the finite difference and backward Euler method, respectively. The synthetic measurement data is generated by solving the PDE with the reference source term with the spatial grid and the discretized time step . The discretized time step is set as when simulating the forward model to avoid the 'inverse crime'. The measurement locations are sequentially determined during the inference following algorithm 1.
4.1. Circle-shape source
In this experiment, we consider the following unknown domain D,
where are the unknowns. The corresponding expression under the polar coordinates is
where are the unknown parameters. As the range of the samples is for this algorithm, while the range of our parameters is finite, we use the bijection map
to map to the intervals (0, 1) and , respectively. The ξ become the unknowns endowed with the prior to be inferred. The relationship between ξ and the location of the center is
The statistical properties of η can be estimated through the posterior samples of ξ .
The predefined strength of the source term is b = 50, the ground truth parameter is , implying that the true parameter vector ξ is given by . The measurement error variance is . We perform a total of N1 = 0 initial iterations, followed by iterations, with updates occurring every iterations. In both the pCN and adaptive pCN schemes, the tunable parameters β1 and β2 are set to be equal to ensure an acceptance rate of 30%–40%.
4.1.1. The second sensors migration rule.
In section 3.2.1, we introduce a general sensor relocation strategy that is based on an evaluation of the variance of the flux for all different sampling steps at different points along the boundary. In this section, we introduce the second rule which is specific to this problem but can be easily generalized to problems represented in spherical or polar coordinates.
The parameters ξ1 and ξ2 represent the radius and center coordinates of the target circle. As the observation locations approach the target area, the gathered data becomes more informative. Since the target area is circular, the proximity of the observation locations to the domain can be quantified by measuring the distance between the center and the observation locations. Hence, the moving observation locations are determined based on the samples of ξ2, specifically,
where is the posterior mean of , hθ is the spatial step of the θ direction, yielding and the index of locations along the boundary.
Firstly, we take two measured locations along the edge randomly at time , and draw samples from the posterior density. We then calculate the posterior mean of as , and determine the new sensors location index as and at the next time layer T2. Obviously, the selected observation points p1 and p2 are adjacent. The process is repeated until the observation time is , and the forward model simulation and Bayesian inference are then stopped. The results of the dynamic procedure are displayed in figure 2.
Download figure:
Standard image High-resolution imageFor comparison purposes, we maintain the sensor location fixed at , and flux measurements are taken at time instances , respectively. The measurement error remains consistent for both the fixed location strategy and the strategy involving sequentially determined locations. In other words, for each observed time Ti , the noise introduced to the flux at the fixed locations is identical to the noise introduced to the flux at the sequentially determined locations. In figure 3, the posterior mean of the samples is represented by the red circle, offering an accurate estimate of the target region's center. However, figure 4 reveals a distinct pattern: the variance of the samples acquired through sequentially determined sensor placements is significantly reduced compared to those obtained using the fixed measurement strategy. This outcome is a result of the fact that sequentially determined measurements provide more informative data, leading to a substantial reduction in uncertainty.
Download figure:
Standard image High-resolution imageDownload figure:
Standard image High-resolution imageWe additionally performed a comparative analysis between our proposed method and a random selection strategy for measurement locations. To gain a more comprehensive understanding of the stochastic nature, we conducted two experiments with distinct measurement locations. The trace plots of posterior samples for ξ are presented in figure 5. It is evident that the dynamical sensors placement method, utilizing both the proposed varying method as detailed in section 4.1.1, and the random varying strategy successfully capture the inverse QoIs. Nevertheless, it is important to highlight that the random varying strategy results in a broader confidence interval.
Download figure:
Standard image High-resolution image4.2. High dimensional peanut-shape source
In this example, we examine a problem of higher dimensions with asymmetrical characteristics [31]. Specifically, the unknown domain has the boundary that can be parameterized as
with a smooth, periodic function , which has the form
To ensure the smoothness of the approximation, we follow the paper [31] and set the penalty term to be the H2 norm of , which implies the Gaussian prior density for ξ , where the covariance matrix is diagonal, with entries
The true parameter is , the strength is preset as b = 10, the variance of the measurement error is . The iteration numbers are , , the update frequent number is . The parameters in the pCN and adaptive pCN scheme are tuned separately, both lead to the acceptance rate 30%–40%.
We first assess the random sensor relocation method and showcase the corresponding outcomes in figure 6(b). We compare the proposed method with the fixed-sensor approach, and as shown in figure 6(a), the limited data available to the fixed-location strategy hinders its ability to accurately capture the shape of the target source D. Despite some overlap in measurement locations across different time points, the data collected by randomly relocating sensors proves to be sufficiently informative for capturing the intricate shape of the high-dimensional source. As depicted in figure 7, the samples derived from the randomly moving sensors strategy tend to cluster around or in proximity to the inverse QoIs denoted as ξ , whereas the fixed-sensor approach struggles to identify most of the QoIs, with the exception of ξ1.
Download figure:
Standard image High-resolution imageDownload figure:
Standard image High-resolution imageFollowing the aforementioned rule detailed in section 3.2.1, we sequentially ascertain the measurement locations and present the posterior mean of the prediction samples in figure 8. Notably, the locations at time T5 coincide with those at time T3. Specifically, the suggested locations at time T5 are . This recurrence of the location also appears at T4, indicating a pattern of repeated cycling between the locations and . Consequently, we choose to conclude the sequential process at time T5. This method offers a means to determine when to cease taking measurements, providing insights into the richness of the collected data to some extent. The subfigure includes the 1.5σ confidence interval of the posterior samples for , within which the actual value is predicted.
Download figure:
Standard image High-resolution image5. Concluding remarks
In this work, we consider the inverse source problem of the heat equation. We use the boundary measurements where the observation sensors can be moved. The Bayesian method is used and we attempt some numerical examples. The numerical results illustrate that the Bayesian approach is feasible to solve such an inverse problem which uses the data from the moving observation points.
We outline several directions for future research. First, it is worth exploring the application of the PDE (1.1) in a more general domain. In this study, we confined our attention to the unit disc in , which possesses advantageous geometric properties. Expanding our analysis to encompass general domains in or would undoubtedly introduce additional complexities to the uniqueness argument and reconstruction algorithms.
Second, a significant open question pertains to the optimal strategy for relocating observation sensors. While we proposed a migration rule (3.9) in this work, we acknowledge that it lacks rigorous mathematical proofs, rendering it somewhat empirical in nature. Consequently, an essential avenue for future research involves a thorough investigation of this migration rule to establish its validity and optimize its performance.
Acknowledgments
Na Ou acknowledges the support of Chinese NSF 11901060, Hunan Provincial NSF 2021JJ40557 and Scientific Research Foundation of Hunan Provincial Education Department 22B0333. Zhidong Zhang is supported by National Natural Science Foundation of China (Grant No. 12101627). Guang Lin acknowledges the support of the National Science Foundation (DMS-2053746, DMS-2134209, ECCS-2328241, and OAC-2311848), and US Department of Energy (DOE) Office of Science Advanced Scientific Computing Research program DE-SC0023161, and the Uncertainty Quantification for Multifidelity Operator Learning project (Project No. 81739), and DOE-Fusion Energy Science, under Grant Number: DE-SC0024583.
Data availability statement
The data cannot be made publicly available upon publication because the cost of preparing, depositing and hosting the data would be prohibitive within the terms of this research project. The data that support the findings of this study are available upon reasonable request from the authors.