1 Introduction

Coherent diffraction imaging (CDI) is a method for observing the complex amplitude field of an object from the intensity distribution of a diffraction pattern. In CDI, reference light is not necessary, and the optical hardware is simpler than that of interferometric methods. The complex amplitude object is reconstructed from an intensity image captured with the CDI hardware module by using a phase retrieval (PR) algorithm [1, 2]. CDI with multiple shots to mitigate the ill-posedness of PR is termed ptychography. CDI, including ptychography, is especially useful for imaging in non-visible spectral regions, such as X-rays, ultraviolet, and infrared, because refractive optical elements for those regions are difficult to manufacture [3,4,5,6]. These techniques are also fundamental for computational label-free imaging in the visible spectral region, where specimens without light absorption are visualized by retrieving their phase delay [7,8,9].

CDI has become a promising imaging tool for various applications, e.g., astronomy and biomedicine [10,11,12]. In these applications, aberrating media, such as atmospheric turbulence, biological tissue, and imperfect optical components, significantly degrade the imaging performance. This issue can be solved by a computational inversion process if the aberrating impulse response is known [13,14,15]. However, this approach requires an invasive calibration process to obtain the impulse response of the aberrating media, which restricts the range of applications. Furthermore, the inverse problem in CDI through unknown aberrating media is challenging because of its ill-posedness: not only the complex amplitude object but also the aberrating impulse response must be estimated.

In this paper, we present a method for single-shot CDI through shift-invariant aberrating media without the need for an invasive process. We place a random pinhole array, called a coded aperture, on the pupil plane to reduce the number of unknown variables in the optical transfer function. This is an extension of our previous work on single-shot blind deconvolution in imaging with incoherent light [16]. That work assumed intensity objects, and the method proposed there was not applicable to visualizing phase objects. The phase imaging capability of the present method is especially promising for biomedical imaging, where many specimens are transparent, and visualizing the phase without fluorescence staining is important for stem cell technology and immunotherapy [17]. In previous work on CDI with a coded aperture, modulation by the coded aperture served to spread the diffraction intensity and thereby improve the conditioning of PR [18,19,20,21]. Here, we utilize the coded aperture not only for PR but also for blind deconvolution by taking unknown aberrations in the measurement process into account.

2 Methods

The optical setup of the proposed method is conceptually shown in Fig. 1. In this method, a complex amplitude object f is captured as an intensity image g through an unknown aberrating medium and the coded aperture A, which is a known binary random pattern located on the pupil plane. We assume an existence area s of the object, called the support, which restricts the object size [1, 2]. We also assume a shift-invariant and absorption-free aberration represented by an aberrated pupil function P. Although the aberrating medium is not located on the pupil plane in Fig. 1, the aberrating impulse response remains shift-invariant, and the aberration can be described by the pupil function when the field of view of the imaging system is limited, as indicated by the object’s support s [22,23,24].

Fig. 1 Schematic diagram of proposed CDI through aberrating media with a coded aperture on the pupil plane

This measurement process is written as

$$\begin{aligned} g=|{\mathcal {F}}^{-1}[A\odot P]\otimes (s\odot f)|^2, \end{aligned}$$
(1)

where \({\mathcal {F}}^{-1}[\bullet ]\) is the inverse Fourier transform, \(\odot\) denotes the element-wise product, and \(\otimes\) denotes the convolution. As shown in Eq. (1), we introduce not only the object’s support s but also the pupil’s support A, realized by the coded aperture, to relax the ill-posedness in PR. The coded aperture A reduces the number of unknown variables in the aberrated pupil function P by forcibly setting many pixels on the pupil to zero. At the same time, however, the coded aperture A partially blocks the object’s frequency spectrum. In the reconstruction process, we utilize the object’s support s and regularization of the object f to recover the blocked parts of the object’s frequency spectrum based on compressive sensing, as described below [25,26,27]. For example, an object field can be recovered from a sparsely and randomly sampled frequency spectrum by assuming redundancy of the object field [28, 29]. Our scheme rests on the observation that objects have morphological structure and are therefore more compressible than pupil functions representing aberrations or turbulence.
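For concreteness, a minimal NumPy sketch of the forward model in Eq. (1) is given below, assuming FFT-based (circular) convolution; the function name and array variables mirror the symbols above and are illustrative rather than the authors' implementation.

```python
import numpy as np

def forward_model(f, P, A, s):
    """Sketch of Eq. (1): g = |F^{-1}[A . P] (convolved with) (s . f)|^2.

    f: complex object field, P: complex pupil function,
    A: binary coded aperture, s: binary object support
    (all 2-D arrays of identical shape).
    """
    # The convolution with the coherent impulse response F^{-1}[A . P]
    # is carried out as a product in the Fourier domain.
    field = np.fft.ifft2((A * P) * np.fft.fft2(s * f))
    # Only the intensity of the field reaches the detector.
    return np.abs(field) ** 2
```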

We define the following cost function e, based on the mean squared error, for the inverse problem of Eq. (1):

$$\begin{aligned} e=\frac{1}{N}\Vert {\widehat{g}}-g\Vert _2^2, \end{aligned}$$
(2)

where N is the number of pixels in the captured intensity image g, \({\widehat{g}}\) is an estimate of the captured intensity image g, and \(\Vert \bullet \Vert _2\) is the \(\ell _2\) norm. We find an estimated complex amplitude object \({\widehat{f}}\) and an estimated pupil function \({\widehat{P}}\) that minimize the cost function e in Eq. (2) as follows:

$$\begin{aligned} \mathop {\textrm{arg min}}\limits _{{\widehat{f}},{\widehat{P}}} e({\widehat{f}},{\widehat{P}}). \end{aligned}$$
(3)

This optimization problem is solved with a gradient descent method. The partial derivative of e with respect to \({\widehat{f}}\) is written as

$$\begin{aligned} \begin{aligned} \frac{\partial e}{\partial {\widehat{f}}}&=\frac{4}{N}s\odot {\mathcal {F}}^{-1}[(A\odot {\widehat{P}})^*\\&\quad \odot {\mathcal {F}}[({\mathcal {F}}^{-1}[A\odot {\widehat{P}}]\otimes (s\odot {\widehat{f}}))\odot ({\widehat{g}}-g)]], \end{aligned} \end{aligned}$$
(4)

where \({\mathcal {F}}[\bullet ]\) is the Fourier transform, and the superscript \(*\) denotes the complex conjugate of the variable. The partial derivative of e with respect to \({\widehat{P}}\) is written as

$$\begin{aligned} \begin{aligned} \frac{\partial e}{\partial {\widehat{P}}}&=\frac{4}{N}{\mathcal {F}}[(s\odot {\widehat{f}})]^*\odot A \\&\quad \odot {\mathcal {F}}[({\mathcal {F}}^{-1}[A\odot {\widehat{P}}]\otimes (s\odot {\widehat{f}}))\odot ({\widehat{g}}-g)]. \end{aligned} \end{aligned}$$
(5)

Based on the gradient descent step with the partial derivatives in Eqs. (4) and (5), the object \({\widehat{f}}\) and the pupil function \({\widehat{P}}\) are estimated from the single captured intensity image g. By assuming the object’s support s in Eq. (1), the elements of \({\widehat{f}}\) corresponding to zero elements of s are known to be zero. Similarly, by introducing the coded aperture A on the pupil in Eq. (1), the elements of \({\widehat{P}}\) corresponding to zero elements of A do not influence the captured intensity image g and do not need to be estimated; in other words, they can be set arbitrarily. These elements of \({\widehat{f}}\) and \({\widehat{P}}\) corresponding to zero elements of s and A, respectively, are therefore initialized to zero and are not updated during the iterations because their partial derivatives are zero. The remaining elements of \({\widehat{f}}\) and \({\widehat{P}}\) are updated with the partial derivatives. In this study, the Adam optimizer is used for the gradient descent step [30]. We additionally employ the \(\ell _1\) norm and the total variation norm as regularizers in the inverse problem to promote the sparsity and smoothness of the object \({\widehat{f}}\) [25, 31]. Through this gradient descent process, the shift-invariant aberration is blindly deconvolved.
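The reconstruction described above can be sketched as follows, assuming plain gradient descent in place of Adam and omitting the \(\ell _1\) and total variation regularizers for brevity; the gradient expressions follow Eqs. (4) and (5), and all names are illustrative.

```python
import numpy as np

def reconstruct(g, A, s, n_iter=50000, step=5e-3, seed=0):
    """Sketch of the blind reconstruction: jointly estimate the object f_hat
    and the pupil P_hat from a single intensity image g by gradient descent
    on the cost of Eq. (2), using the gradients of Eqs. (4) and (5)."""
    rng = np.random.default_rng(seed)
    N = g.size
    F, iF = np.fft.fft2, np.fft.ifft2
    # Random initialization restricted to the supports s and A
    f_hat = (s * rng.random(g.shape)).astype(complex)
    P_hat = A * np.exp(1j * rng.uniform(-np.pi, np.pi, g.shape))

    for _ in range(n_iter):
        field = iF((A * P_hat) * F(s * f_hat))      # field at the detector
        r = np.abs(field) ** 2 - g                  # residual g_hat - g of Eq. (2)
        # Eq. (4): gradient with respect to the object estimate
        grad_f = (4 / N) * s * iF(np.conj(A * P_hat) * F(field * r))
        # Eq. (5): gradient with respect to the pupil estimate
        grad_P = (4 / N) * np.conj(F(s * f_hat)) * A * F(field * r)
        f_hat = f_hat - step * grad_f
        P_hat = P_hat - step * grad_P
    return f_hat, P_hat
```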

3 Demonstration

3.1 Simulation

We numerically demonstrated the proposed method, as shown in Fig. 2. The pixel count of each image was \(128\times 128\). The amplitude and phase of the object are shown in Figs. 2(a) and 2(b), respectively. The object’s support s for the reconstruction process was set to a square of \(40\times 40\) pixels at the center of the object plane. The aberrated pupil function P was generated as a uniform random phase distribution with a range of \(\pm \pi\), as shown in Fig. 2(c). We assumed a binary random pattern for the coded aperture A and varied its transmittance ratio, defined as the percentage of light-passing (white) pixels relative to the total pixel count, to verify the impact of the support on the pupil. White Gaussian noise with a signal-to-noise ratio of 30 dB was added to each captured intensity image to show the robustness of our method against measurement noise.
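These simulation conditions can be generated along the following lines (a sketch with hypothetical helper names; the signal-to-noise definition used for the additive noise is one common convention):

```python
import numpy as np

def simulation_setup(shape=(128, 128), ratio=0.2, support=40, seed=0):
    """Sketch of the simulated coded aperture A, aberrated pupil P,
    and object support s used in Sect. 3.1."""
    rng = np.random.default_rng(seed)
    A = (rng.random(shape) < ratio).astype(float)        # binary random aperture
    P = np.exp(1j * rng.uniform(-np.pi, np.pi, shape))   # random phase in +/- pi
    s = np.zeros(shape)
    c = (shape[0] - support) // 2
    s[c:c + support, c:c + support] = 1.0                # centered 40x40 support
    return A, P, s

def add_awgn(g, snr_db=30.0, seed=0):
    """Add white Gaussian noise at the stated signal-to-noise ratio."""
    rng = np.random.default_rng(seed)
    noise_power = np.mean(g ** 2) / 10 ** (snr_db / 10)
    return g + rng.normal(0.0, np.sqrt(noise_power), g.shape)
```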

Fig. 2 Simulation results. a Amplitude and b phase of the object. c Aberrated pupil function. d Captured intensity image, e reconstructed amplitude, and f reconstructed phase at transmittance ratio of 10%. g Captured intensity image, h reconstructed amplitude, and i reconstructed phase at transmittance ratio of 20%

Fig. 3 Reconstruction error calculated with the RMSEs versus transmittance ratio of the coded aperture

In the reconstruction process, the step size and the number of iterations of the Adam optimizer were empirically chosen to be \(5\times 10^{-3}\) and \(5\times 10^4\), respectively. The other parameters of the Adam optimizer were the same as those in the original work [30]. The initial estimates of the object \({\widehat{f}}\) and the aberrated pupil function \({\widehat{P}}\) were set as random distributions with ranges of 0 to 1 for \({\widehat{f}}\) and \(-\pi\) to \(\pi\) for \({\widehat{P}}\). Ten trials with different initializations were performed, and the one achieving the lowest cost function e in Eq. (2) was chosen as the final result. The translation ambiguity inherent in blind deconvolution was compensated for by cross-correlating the original and reconstructed amplitudes. The computational time for the reconstruction was about 45 min with an Intel Core i9-9980HK processor running at 2.40 GHz and 64 GB of RAM.
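The translation compensation can be implemented, for example, with an FFT-based cross-correlation as in the following sketch (circular shifts are assumed, and this is not necessarily the exact procedure used here):

```python
import numpy as np

def compensate_translation(f_ref, f_hat):
    """Shift f_hat so that its amplitude is best aligned, in the
    cross-correlation sense, with the reference amplitude |f_ref|."""
    F, iF = np.fft.fft2, np.fft.ifft2
    xcorr = iF(F(np.abs(f_ref)) * np.conj(F(np.abs(f_hat))))
    dy, dx = np.unravel_index(np.argmax(np.abs(xcorr)), xcorr.shape)
    return np.roll(f_hat, (dy, dx), axis=(0, 1))
```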

The reconstruction errors at coded aperture transmittance ratios from 10% to 100% in steps of 10% were evaluated with the root-mean-square error (RMSE), as shown in Fig. 3. The RMSE between the object’s complex amplitude field f and the reconstructed field \({\widehat{f}}\) was calculated as

$$\begin{aligned} \textrm{RMSE}=\sqrt{\frac{1}{N}\Vert f-{\widehat{f}}\Vert _2^2}. \end{aligned}$$
(6)
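For reference, Eq. (6) corresponds directly to a computation such as:

```python
import numpy as np

def rmse(f, f_hat):
    """RMSE of Eq. (6) between complex fields (N = number of pixels)."""
    return np.sqrt(np.mean(np.abs(f - f_hat) ** 2))
```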

In Fig. 3, the centers and lengths of the error bars are the averages and standard deviations calculated over ten different sets of the aberrated pupil function and the coded aperture. At transmittance ratios of 20% or less, the reconstruction errors were lower, and the best result was achieved at 20%. The reconstruction results at transmittance ratios of 30% or more were almost entirely noise.

The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 10% are shown in Figs. 2(d)–(f), respectively. Slight noise was observable in both the reconstructed amplitude and phase. The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 20% are shown in Figs. 2(g)–(i), respectively. The noise in the reconstructed complex amplitude field was smaller than that at the transmittance ratio of 10%. These results are consistent with the RMSEs in Fig. 3. The reconstructions in Fig. 2 and the analysis in Fig. 3 indicate a trade-off: a smaller transmittance ratio blocks more of the object’s frequency spectrum, whereas a larger transmittance ratio provides a less stringent support on the pupil and thus more unknown variables. In Fig. 3, the difference in the RMSEs between 20% and 30% was larger than that between 10% and 20%. This trend indicates that the number of unknown variables in the aberrated pupil function has a stronger impact on the reconstruction than the blocking of the object’s frequency spectrum. Therefore, the transmittance ratio must be chosen carefully to enhance the performance of single-shot blind deconvolution in CDI with our method.

3.2 Experiment

We performed an optical experiment with the setup in Fig. 4 to verify the proposed method. The optical setup was composed of two 4f systems, where the focal length of each lens was 100 mm. The upstream 4f system implemented the shift-invariant aberration with a turbid medium on its pupil plane. In this experiment, we used a lens as the turbid medium to introduce severe defocus as a reproducible and stable aberration. The downstream 4f system was the CDI module with the coded aperture implemented by a transmissive spatial light modulator (SLM: LC2012 manufactured by HOLOEYE, pixel count: \(1024\times 768\), pixel pitch: 36 µm) operated in the amplitude mode. The light passing through the two 4f systems was captured by a monochrome image sensor (PL-D7512 manufactured by PixeLink, pixel count: \(4096\times 3000\), pixel pitch: 3.45 µm). A transmissive object was made from a sheet of aluminium foil, which was partially covered by a cover glass to induce a phase delay. This object was illuminated with a collimated beam from a laser source (SDL-532-010T manufactured by Shanghai Dream Lasers Technology, wavelength: 532 nm).

Fig. 4 Optical setup composed of two 4f systems. SLM: spatial light modulator

A random binary pattern with \(128\times 128\) pixels was displayed on the SLM as the coded aperture A. In accordance with the simulation results in Fig. 3, we performed the experiment with transmittance ratios of 10% and 20%. The central area of \(428\times 428\) pixels of the captured image was cropped, downsampled to \(128\times 128\) pixels, and provided to the reconstruction algorithm as the captured intensity image g. The object’s support s in the reconstruction process was set to a square of \(40\times 40\) pixels.
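This preprocessing can be sketched as follows; the crop and output sizes follow the text, while the nearest-neighbor resampling is an assumption because the exact downsampling method is not specified:

```python
import numpy as np

def preprocess(raw, crop=428, out=128):
    """Crop the central crop x crop region of the raw camera frame and
    resample it to out x out pixels (nearest-neighbor indexing assumed)."""
    cy, cx = raw.shape[0] // 2, raw.shape[1] // 2
    roi = raw[cy - crop // 2: cy + crop // 2, cx - crop // 2: cx + crop // 2]
    idx = (np.arange(out) * crop // out).astype(int)
    return roi[np.ix_(idx, idx)]
```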

Fig. 5 Experimental results with a rectangle. a Amplitude and b phase of the object observed with OFC without the turbid medium. c Captured intensity image with the turbid medium and without the coded aperture (transmittance ratio of 100%). d Captured intensity image, e reconstructed amplitude, and f reconstructed phase with the turbid medium and the coded aperture at the transmittance ratio of 10%. g Captured intensity image, h reconstructed amplitude, and i reconstructed phase with the turbid medium and the coded aperture at the transmittance ratio of 20%. The length of the scale bar in a is 0.5 mm

Experimental results with a rectangular hole are shown in Fig. 5. The amplitude and phase of the object are shown in Figs. 5(a) and 5(b), respectively. The upper part of the rectangle was covered by a cover glass. The phase of the object was observed by using overlapped Fourier coding (OFC) [32], in which a small square aperture is scanned on the SLM and the phase is retrieved from a number of captured intensity images by a ptychographic algorithm. The turbid medium was not present during the OFC measurement. When a lens with a focal length of 200 mm was used as the turbid medium on the pupil of the upstream 4f system in Fig. 4, the object was defocused in the intensity image captured without the coded aperture, as shown in Fig. 5(c).

The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 10% are shown in Figs. 5(d)–(f), respectively; here, the reconstruction fidelity was low. The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 20% are shown in Figs. 5(g)–(i), respectively. In this case, both the amplitude and phase were reconstructed well. The RMSEs between the complex amplitude field obtained with OFC and the reconstructed complex amplitude fields at the transmittance ratios of 10% and 20% were 0.55 and 0.43, respectively.

Fig. 6 Experimental results with two circles. a Amplitude and b phase of the object observed with OFC without the turbid medium. c Captured intensity image with the turbid medium and without the coded aperture (transmittance ratio of 100%). d Captured intensity image, e reconstructed amplitude, and f reconstructed phase with the turbid medium and the coded aperture at the transmittance ratio of 10%. g Captured intensity image, h reconstructed amplitude, and i reconstructed phase with the turbid medium and the coded aperture at the transmittance ratio of 20%. The length of the scale bar in a is 0.5 mm

Experimental results with two circular holes are shown in Fig. 6. The amplitude and phase of the object were observed by OFC without the turbid medium, as shown in Figs. 6(a) and 6(b), respectively. In this case, the upper right hole was covered by a cover glass. The turbid medium was a lens with a focal length of 100 mm, which induced a more severe defocus than in the previous case. The intensity image captured through the turbid medium without the coded aperture suffered from strong defocus, as shown in Fig. 6(c).

The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 10% are shown in Figs. 6(d)–(f), respectively. The captured intensity image and the reconstructed amplitude and phase at the transmittance ratio of 20% are shown in Figs. 6(g)–(i), respectively. A comparison of the two results shows that the latter exhibited fewer artifacts. The RMSEs between the complex amplitude field obtained with OFC and the reconstructed complex amplitude fields at the transmittance ratios of 10% and 20% were 0.81 and 0.53, respectively.

The results at the transmittance ratio of 20% were better than those at 10% in both experiments of Figs. 5 and 6, in agreement with the simulation results in Fig. 3. The reconstruction errors calculated with the RMSE in the experiments were worse than those in the simulation. This may be caused by alignment errors of the optical components and imperfections of the coded aperture implemented by the SLM, such as a low fill factor. Further calibration and a more precise optical model should improve the experimental performance.

4 Conclusion

We proposed a method for single-shot blind deconvolution of an unknown shift-invariant aberration in CDI. In the method, we used a coded aperture as the support of the aberrated pupil function to reduce the number of unknown variables in the reconstruction process. A complex amplitude object illuminated with coherent light was recovered from a single intensity image captured through turbid media and the coded aperture by using a gradient descent algorithm. Although the coded aperture on the pupil plane blocks the object’s frequency spectrum, the object could still be reconstructed by applying regularization to the object in the spatial domain and by choosing an appropriate transmittance ratio of the coded aperture. We numerically and experimentally demonstrated our method under severe turbulence conditions. In both the simulations and the experiments, including a quantitative evaluation with OFC, the coded aperture with a transmittance ratio of 20% achieved the best performance.

One issue with the proposed method is the need to isolate the object in order to introduce the support on the object plane. This issue may be addressed by employing state-of-the-art regularization and by optimizing the coded aperture [33, 34]. Our method estimates both the amplitude and the phase of an object field through unknown aberrating media non-invasively. It is promising for biomedical imaging to visualize transparent specimens in a label-free manner. Another advantage of our method is the simplicity of the optical hardware, which requires only a single shot and no interferometric measurement. Therefore, the proposed method will contribute not only to biomedicine but also to other fields, such as security and free-space optical communication.