NSClean: An Algorithm for Removing Correlated Noise from JWST NIRSpec Images

Bernard J. Rauscher

doi:10.1088/1538-3873/ad1b36

1. Introduction

JWST is today's premier space observatory for mid- and near-infrared (NIR) astronomy (Gardner et al. 2023). To enable science objectives cutting across astrophysics, JWST carries a suite of four science instruments: a Near Infrared Camera (NIRCam; Rieke et al. 2023), a Near Infrared Imager and Slitless Spectrograph (NIRISS; Doyon et al. 2023), a Mid-infrared Instrument (MIRI; Wright et al. 2023), and a Near Infrared Spectrograph (NIRSpec; Jakobsen et al. 2022). This article concerns NIRSpec and "NSClean," an algorithm and software package that further reduces NIRSpec's already low read noise.

From early on, it was understood that NIRSpec required ultra-low noise detectors. NIRSpec is detector noise limited for all but prism-mode observations. This is in contrast to other JWST instruments that are generally limited by the astronomical background. Consequently, NIRSpec had more stringent noise requirements.

"Total Noise" is intended to account for the combined effects of detector read noise and shot noise on integrated dark current. To measure it, one defines a standard scientific exposure, takes many such exposures (typically >40), and then computes the standard deviation per pixel. Across JWST's NIR instruments, the exposure time was taken to be 1000 s. For NIRCam and NIRISS, median total noise was required to be <10 e⁻ per exposure. For NIRSpec, the requirement was <6 e⁻.¹
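The total noise measurement described above can be sketched in a few lines of numpy. This is only an illustration under stated assumptions: the function name `total_noise` is hypothetical, and the input is simulated Gaussian noise standing in for a stack of real 1000 s dark exposures.

```python
import numpy as np

def total_noise(exposures):
    """Per-pixel standard deviation over a stack of identical exposures.

    exposures : array of shape (n_exposures, ny, nx), in electrons.
    """
    stack = np.asarray(exposures, dtype=float)
    # ddof=1 gives the unbiased sample variance across exposures.
    return np.std(stack, axis=0, ddof=1)

rng = np.random.default_rng(0)
# Simulated stand-in for real darks: 50 exposures of pure 6 e- Gaussian noise.
stack = rng.normal(0.0, 6.0, size=(50, 64, 64))
noise_map = total_noise(stack)
median_noise = float(np.median(noise_map))  # compare with the <6 e- requirement
```

The median of the per-pixel map is the number that would be compared against the requirement.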

This <6 e⁻ noise requirement is the reason why we developed Improved Reference Sampling and Subtraction (IRS²; pronounced IRS-square; Rauscher et al. 2017). In IRS² mode, NIRSpec uses a special clocking pattern and reference correction pipeline step to reduce correlated noise as far as possible using the NIRSpec detector's built-in references. Using IRS², NIRSpec's total noise is slightly less than 6 e⁻ on average, and to within the uncertainties compliant with requirements. IRS² is the recommended readout mode for most observations except for extremely bright targets (JWST User Documentation website 2023).

However, even with NIRSpec's detectors meeting requirements, many NIRSpec observers report seeing faint, correlated read noise in count rate images that complicates calibration. Fortunately, for NIRSpec, much of this can be removed using dark areas of images as references.

Figure 1 shows an example of the correlated noise from an early NIRSpec Integral Field Unit (IFU) observation. We have smoothed the images and stretched the grayscales to emphasize correlated noise that would otherwise be more difficult to see against the background of NIRSpec's ∼6 electrons total noise. One sees a "picture frame" effect, whereby areas near the edges of both detectors on all four sides seem less noisy. In the interiors, one sees faint vertical striping. While the amplitude is small, this correlated noise can undermine accurate photometry when no local sky is available and cause false spectral features when working near the read noise.

NSClean uses blanked off areas of NIRSpec scenes to model and subtract the background, including the correlated noise. Figure 2 shows a typical background pixels mask for IFU mode. The red shaded pixels are used to build the background model. Because it uses more information and allows for structure in the background, NSClean's correlated noise correction is more complete and more uniform than is possible using simpler techniques such as subtracting a rolling median of background columns (Figure 3).

We developed and tested NSClean using IFU data. As Figure 3 shows, the reduction in correlated noise can be dramatic. Although we have not tested NSClean with multi-object spectrograph (MOS) data ourselves, we understand from colleagues on the Early Release Science (ERS) TEMPLATES² team that it works well. They are studying extremely magnified panchromatic lensed arcs with extended star formation. The advantage for TEMPLATES is that the irregularly shaped sources seldom align with a standard 3-slit MOS dither pattern. We are in the early stages of testing a Bright Object Time Series (BOTS) module now. We plan to include that in a future release.

The rest of this paper is structured as follows. In Section 2, we explain the underlying physical cause of the correlated noise that NSClean removes. Section 3 describes the NSClean algorithm. The idea is simple. NSClean approximates the Fourier transform of the background using an algorithm that is robust against missing data and gaps where spectra lie. NSClean then apodizes the Fourier transform using a low-pass filter³ and inverts it to compute the background model. Section 4 discusses the python-3 implementation and computing requirements. We close in Section 5 with a summary.

2. Physical Cause of the Correlated Noise

Our focus in this paper is on the specific correlated noise that NSClean is designed to fix. Readers who want to learn more about NIRSpec's read noise in general may want to see some of our earlier papers. Rauscher (2015) describes the origins of NIRSpec's white and 1/f noise, and provides a python package for simulating it. Rauscher et al. (2017) describes NIRSpec's IRS² readout mode. Without IRS², the residual correlated noise that remains today would be much worse.

The correlated noise that remains after IRS² is a logical consequence of how IRS² works. NIRSpec uses two Teledyne H2RG NIR detector arrays (Loose et al. 2003). Each H2RG provides two types of reference information that can be used to remove correlated read noise. These are the "reference pixels" that form a 4-pixel wide frame on all sides of NIRSpec images and one "reference output" per H2RG. The reference output is not visible in the usual pipeline data products, but it is used most of the time. As described in Rauscher et al. (2017), IRS² is built on principal component analysis (PCA) showing that NIRSpec's read noise is covariance stationary to a high degree of approximation. Informally, this means that the read noise is independent of when one looks.

It turns out that in JWST's NIR detector systems, thermal instability causes noise that is not covariance stationary. There is a picture frame pattern that changes in time at the ∼1 e⁻ level. Rauscher et al. (2013) describe how small temperature fluctuations can drive the picture frame. This is why the vertical banding that is visible in Figure 3(a) seems to fade away near the edges. The relatively quiet edges are in the picture frame while the vertical bands are not. IRS² relies on the reference pixels to see noise in order to remove it. Since the reference pixels are in the picture frame and do not see the vertical banding, IRS² is powerless to remove it.

3. Algorithm

NSClean is built on the Fourier transform of the instrumental background. Our treatment starts in Section 3.1, by reviewing how python's numpy package implements the classical Fast Fourier Transform (FFT; Cooley & Tukey 1965) for fully sampled data. Since NIRSpec's background is not fully sampled (because of astronomical sources), Section 3.2 explains how NSClean computes a statistically optimal approximation to the Fourier transform using all available background samples.

The next two subsections describe the linear algebra that underpins NSClean. Insofar as possible, we have tried to use a consistent, standard notation. Throughout this paper, boldface lowercase letters are vectors and uppercase boldface letters are matrices. When discussing matrix elements, we use superscripts for row indices and subscripts for column indices.

3.1. Numpy's Classical FFT

For dark exposures, one can use numpy's FFT package to compute the Fourier transform of an image column. Like all FFTs, numpy uses a highly efficient factorization of the Fourier matrix, F, to solve the matrix equation,

$\begin{eqnarray}&&{\boldsymbol{F}}{\boldsymbol{f}}={\boldsymbol{d}},\end{eqnarray} \tag{ 1 }$

where f is the Fourier transform of the data, d. For n pixels per column, in numpy the elements of F are,

$\begin{eqnarray}&&{F}_{k}^{m}=\exp \left\{2\pi i\displaystyle \frac{{mk}}{n}\right\}.\end{eqnarray} \tag{ 2 }$

Because NIRSpec's data are real valued and n = 2048 is an even number, m = 0, 1, ..., n − 1 and k = 0, 1, ..., n/2.
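The convention in Equations (1) and (2) can be checked numerically. The sketch below uses a small n as a stand-in for 2048 and random data as a stand-in for a detector column. Note that numpy's default inverse transform carries a 1/n factor, which is written out explicitly here.

```python
import numpy as np

n = 16  # small stand-in for NIRSpec's n = 2048
rng = np.random.default_rng(1)
d = rng.normal(size=n)  # one real-valued "column"

# Full Fourier matrix with elements exp(2*pi*i*m*k/n); rows m, columns k.
m = np.arange(n)[:, None]
k = np.arange(n)[None, :]
F_full = np.exp(2j * np.pi * m * k / n)

# numpy's inverse transform includes 1/n, so d = F_full @ fft(d) / n.
d_from_matrix = (F_full @ np.fft.fft(d)) / n

# For real data only k = 0..n/2 are independent; rfft returns exactly those.
f_half = np.fft.rfft(d)
d_from_half = np.fft.irfft(f_half, n=n)
```

Both `d_from_matrix` and `d_from_half` reproduce `d` to machine precision, and `f_half` has n/2 + 1 elements.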

3.2. NSClean's Fourier Transform

For NIRSpec's incompletely sampled background, NSClean uses weighted least squares to approximate Fourier transforms. The starting point is again Equation (1),

$\begin{eqnarray}&&{\boldsymbol{F}}{\boldsymbol{f}}\approx {\boldsymbol{d}},\end{eqnarray} \tag{ 3 }$

but now as an approximation and with the understanding that F, f, and d are incomplete. F is missing rows where light falls on the detector and columns for frequencies that we choose not to fit. f contains only a few very low frequencies to minimize noise. d is missing rows where the detector is illuminated.

To solve Equation (3) using least squares, we minimize the generalized distance squared,

$\begin{eqnarray}&&{\delta }^{2}={({\boldsymbol{F}}{\boldsymbol{f}}-{\boldsymbol{d}})}^{{\rm{H}}}{\boldsymbol{W}}({\boldsymbol{F}}{\boldsymbol{f}}-{\boldsymbol{d}}),\end{eqnarray} \tag{ 4 }$

using all available background samples. The symbol, "^H," denotes the conjugate transpose, which is also known as the Hermitian transpose. A weight matrix, W, is required to compensate for non-uniform background sampling. NSClean weights inversely by the local sample density squared, ρ⁻²:

$\begin{eqnarray}{\boldsymbol{W}}=\left[\begin{array}{cccc}{\rho }_{00}^{-2} & 0 & 0 & 0\\ 0 & {\rho }_{11}^{-2} & 0 & 0\\ 0 & 0 & \ddots & 0\\ 0 & 0 & 0 & {\rho }_{{n}^{{\prime} }-1\ {n}^{{\prime} }-1}^{-2}\end{array}\right].\end{eqnarray} \tag{ 5 }$

W is diagonal and equal to its conjugate transpose. Section 3.3 describes W in more detail. The quantity ${n}^{{\prime} }\leqslant n$ is equal to the number of background samples. Under these conditions, the least squares solution to Equation (4) is,

$\begin{eqnarray}&&{\boldsymbol{f}}={\left({{\boldsymbol{W}}}^{1/2}{\boldsymbol{F}}\right)}^{+}{{\boldsymbol{W}}}^{1/2}{\boldsymbol{d}}.\end{eqnarray} \tag{ 6 }$

The symbol, "⁺," denotes the Moore–Penrose inverse. Being a Fourier transform, the quantity f is a complex valued vector.
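For readers who prefer the normal-equations form, the step from Equation (4) to Equation (6) can be written out as follows. This is a standard least squares identity rather than anything specific to NSClean. It uses the fact that W is real, diagonal, and non-negative, and the last equality additionally assumes that W^{1/2}F has full column rank.

$\begin{eqnarray}{\delta }^{2}={\left\Vert {{\boldsymbol{W}}}^{1/2}{\boldsymbol{F}}{\boldsymbol{f}}-{{\boldsymbol{W}}}^{1/2}{\boldsymbol{d}}\right\Vert }^{2}\quad \Rightarrow \quad {\boldsymbol{f}}={\left({{\boldsymbol{W}}}^{1/2}{\boldsymbol{F}}\right)}^{+}{{\boldsymbol{W}}}^{1/2}{\boldsymbol{d}}={\left({{\boldsymbol{F}}}^{{\rm{H}}}{\boldsymbol{W}}{\boldsymbol{F}}\right)}^{-1}{{\boldsymbol{F}}}^{{\rm{H}}}{\boldsymbol{W}}{\boldsymbol{d}}.\end{eqnarray}$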

Equation (6) is this paper's key result. NSClean uses this expression to approximate the Fourier transform of the incompletely sampled background.

Figure 4 shows an example of how Equation (6) works in practice. Panel (a) shows a vertical cut through NRS2, which has the most correlated noise of the two detectors. To show detail, Panel (b) plots only the innermost 1024 rows. The blue points are background samples, the orange points are pixels that the background mask marked as potentially illuminated, and the blue line is the model built using Equation (6). As a practical matter, we were able to fit about nine frequencies (≈18 free parameters) before we started to see increased noise due to over fitting. As expected, the blue line passes near the centers of groups of blue points. It is smooth, continuous, and very low noise compared to the pixels themselves.
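A minimal sketch of Equation (6) for a single column follows. It is not the NSClean package code. The function name, the `n_freq` default, the 1/n normalization of F (chosen so that the fitted coefficients match numpy's `rfft` convention), and the use of `np.fft.irfft` to build the model are all assumptions made for illustration. The low-pass apodization step is omitted.

```python
import numpy as np

def fit_column_background(d, is_background, weights, n_freq=9):
    """Sketch of Equation (6) for one detector column.

    d             : (n,) real pixel values
    is_background : (n,) bool, True where the pixel is a background sample
    weights       : (n,) non-negative diagonal of W
    n_freq        : number of low frequencies kept (hypothetical default)
    """
    n = d.size
    m = np.flatnonzero(is_background)[:, None]  # rows: background pixels only
    k = np.arange(n_freq)[None, :]              # columns: low frequencies only
    # Incomplete Fourier matrix; the 1/n matches numpy's irfft normalization.
    F = np.exp(2j * np.pi * m * k / n) / n
    sqrt_w = np.sqrt(weights[is_background])[:, None]
    # f = (W^(1/2) F)^+ W^(1/2) d
    f_low = np.linalg.pinv(sqrt_w * F) @ (sqrt_w[:, 0] * d[is_background])
    # Zero-pad to the n//2 + 1 coefficients irfft expects, then invert.
    f = np.zeros(n // 2 + 1, dtype=complex)
    f[:n_freq] = f_low
    return np.fft.irfft(f, n=n)

n = 256
rows = np.arange(n)
truth = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * rows / n + 0.3)
everywhere = np.ones(n, dtype=bool)
model_full = fit_column_background(truth, everywhere, np.ones(n))

# Blank out a "spectral trace" and fit a flat pedestal from what remains.
gapped = everywhere.copy()
gapped[100:140] = False
flat = np.full(n, 3.0)
model_gapped = fit_column_background(flat, gapped, np.ones(n))
```

With full sampling the fit reduces to the ordinary FFT and recovers a low-frequency signal exactly. With gaps, a flat pedestal is still recovered exactly, but structured backgrounds are only approximated, which is why the number of fitted frequencies and the weights matter.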

3.3. The Weight Matrix, W

The weight matrix compensates for uneven background sampling. Returning to Figure 2, there are often only a few rows of blanked off background pixels between the spectral traces. But near the bottom, middle, and top of each detector, there are much larger areas of background pixels. When nothing is done to compensate for the uneven background sampling, scientifically uninteresting areas of the scene carry far too much weight.

As described earlier, NSClean computes the Fourier transforms of columns individually using weighted least squares fits. After a bit of trial and error, we found that weighting inversely by the local background sample density in columns works well. There is nothing fundamental about this weighting scheme. We imagine that some observers may find better ones for their data.

One could compute the local sample density by convolving the background mask with a tophat function (Figure 5). While effective, the resulting weight curve is quantized in units of the tophat's width. To eliminate the quantization while still approximating the local density, NSClean convolves columns of the background mask with a Gaussian kernel. In the current release, the kernel's standard deviation is hard coded to σ = 32 pixels, which seems to work well for many IFU observations. Going forward, it may be possible to come up with something more elegant.
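The Gaussian-smoothed density weighting can be sketched as follows. This is an illustration, not the package implementation. The function name, the truncation of the kernel at ±4σ, and the zero padding at the column ends are assumptions. The returned values are the diagonal of W as written in Equation (5), i.e. ρ⁻², so that W^{1/2} weights inversely by the density.

```python
import numpy as np

def density_weights(mask_column, sigma=32.0):
    """Diagonal of W (rho**-2) for one column of the background mask.

    mask_column : (n,) bool, True for background samples
    sigma       : Gaussian kernel standard deviation in pixels
    """
    half = int(4 * sigma)  # assumed truncation at +/- 4 sigma
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Local density: smoothed fraction of nearby pixels that are background.
    # mode="same" zero-pads, so density is underestimated near the column ends.
    density = np.convolve(mask_column.astype(float), kernel, mode="same")
    weights = np.zeros(mask_column.size)
    weights[mask_column] = density[mask_column] ** -2
    return weights

mask = np.zeros(2048, dtype=bool)
mask[:600] = True        # large blanked-off area
mask[1000:1010] = True   # a few background rows between traces
w = density_weights(mask)
```

In this toy column, the ten isolated background rows receive much larger weights than pixels deep inside the large blanked-off area, which is the intended compensation.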

3.4. Making Masks

In general, NSClean masks are scene dependent, although depending on the program it may not be necessary to make new masks for every exposure. This is because the specific pixels that are blanked off depend on the instrument configuration in ways that only the instrument model can predict. For MOS modes, the blanked off pixels differ with every microshutter array (MSA) configuration.

This section describes how we made the masks shown in Figure 2. Since we only needed two masks for development purposes, we made them manually based on visual inspection of input images using SAOImage DS9. For real science programs, it would be helpful to automate the process. The TEMPLATES team has tools that create masks from information available in the pipeline. This is described in Section 3.5. Looking forward, it would be beneficial to better understand how to make masks in the presence of complicating factors like scattered light. However, this goes beyond the scope of this first paper.

For this paper, we used the GNU Image Manipulation Program (GIMP) to make masks. The starting point was a NIRSpec "rate.fits" image displayed in SAOImage DS9. We adjusted the grayscale to show spectral traces and illuminated areas. Then, using DS9, we exported the 2048 × 2048 image to Portable Network Graphics (PNG) format. We chose PNG because we knew that it imported well into GIMP.

In GIMP, we created a selection that contained all illuminated pixels and inverted this to get the background sample. We shaded the selection 30% red as shown in Figure 2 and exported the result to another PNG image. The NSClean package includes a python method that is capable of importing a shaded PNG image and converting it to a FITS background pixels mask. There is an example notebook in the distribution that shows how to do this.
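The core of such a PNG-to-mask conversion can be sketched with numpy alone. This is not the method shipped with NSClean. The function name and the `margin` threshold are hypothetical, reading the PNG (for example with pillow) and writing the FITS file (for example with astropy) are left out, and the vertical flip assumes that the PNG was exported from a display with the first FITS row at the bottom.

```python
import numpy as np

def shaded_rgb_to_mask(rgb, margin=20):
    """Return True where a pixel is tinted red, i.e. marked as background.

    rgb    : (ny, nx, 3) uint8 array holding the PNG's red, green, blue planes
    margin : hypothetical threshold on how far red must exceed green and blue
    """
    rgb = np.asarray(rgb).astype(int)  # avoid uint8 wrap-around when subtracting
    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    tinted = (red - green > margin) & (red - blue > margin)
    # PNG rows run top to bottom; flip so that row 0 is the bottom row.
    return np.flipud(tinted)

gray = np.full((8, 8, 3), 120, dtype=np.uint8)
gray[:2, :, 0] = 200  # tint the top two PNG rows red
mask = shaded_rgb_to_mask(gray)
```

After the flip, the tinted rows end up as the last two rows of the boolean mask.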

3.5. Automated Mask Making

This description was provided by Taylor Hutchison and Brian Welch of the TEMPLATES team. It has been lightly edited for consistency.

For the TEMPLATES overview paper (Rigby et al. 2023), the team created masks from the pre-existing stage 2 products (specifically the cal.fits files). These mask out everything except the science pixels.⁴ In addition to the masking provided by the cal.fits files, Rigby et al. masked out the region of the detector where the fixed slits are located and all jump detections (using the data quality flag). Finally, to ensure clean masking, they employed a subtle binary dilation to the cal.fits files to add a small amount of buffering on the mask edges.
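The recipe above might look roughly like the following sketch. It is not the TEMPLATES code. It assumes that non-science pixels are NaN in the cal.fits science array, that the jump flag has bit value 4 in the data quality array, and that the fixed-slit region can be described by a range of rows. The function names are hypothetical, and a small numpy-only dilation stands in for whatever routine the team used.

```python
import numpy as np

JUMP_DET = 4  # assumed bit value of the jump flag in the DQ array

def dilate(binary, iterations=1):
    """Binary dilation with a 3x3 cross, using only numpy."""
    out = binary.copy()
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        out = (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
               | padded[1:-1, :-2] | padded[1:-1, 2:])
    return out

def background_mask(cal_sci, dq, fixed_slit_rows, iterations=1):
    """True where a pixel may be used as a background sample."""
    science = np.isfinite(cal_sci)         # assumed: non-science pixels are NaN
    science = dilate(science, iterations)  # buffer the edges of the traces
    mask = ~science
    mask[fixed_slit_rows, :] = False       # exclude the fixed-slit region
    mask &= (dq & JUMP_DET) == 0           # exclude jump detections
    return mask

cal = np.full((10, 10), np.nan)
cal[4:6, :] = 1.0                          # a toy "spectral trace"
dq = np.zeros((10, 10), dtype=np.uint32)
dq[0, 0] = JUMP_DET
mask = background_mask(cal, dq, slice(8, 10))
```

In the toy example, the trace plus a one-pixel buffer, the fixed-slit rows, and the flagged pixel are all excluded from the background sample.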

4. Implementation

NSClean is written in python-3. We chose python for compatibility with the rest of the JWST pipeline. The current NSClean version is not computationally demanding. The typical cleaning time for one 2048 × 2048 NIRSpec image is a few seconds. This assumes that multithreading is turned on for the python linear algebra libraries as described in Section 4.2.

The current NSClean version works column-by-column. Since there are only 2048 pixels per column, this means that it requires very little RAM, and the time penalty for projecting out Fourier vectors using Equation (6) is small compared to using the FFT algorithm.⁵

We have tested NSClean on JWST Calibration Pipeline Stage 1 products ("rate.fits"). Because the operations are linear, one could in principle apply NSClean prior to slope fitting in Stage 1, although we have not tried this.

4.1. Computing Requirements

The execution time on our development server is about 6 s for one 2048 × 2048 pixel NIRSpec image. The server, which is a few years old, has 8×Intel Xeon cores running at 3.5 GHz and 250 GB of RAM. In practice, NSClean uses only a tiny fraction of the RAM. Although our server has an NVIDIA Quadro M4000 GPU with 8 GB of RAM, in practice we found that NSClean's execution time was about the same on the CPUs as on the GPU. This is because Equation (6)'s matrices are not large when images are processed in columns.

We have also tested NSClean on a 2019 MacBook Pro. Execution time on the MacBook is about 12 s per image. The MacBook has an 8-Core Intel i9 CPU running at 2.3 GHz and 32 GB of RAM. Again, NSClean did not use much of this RAM. According to the Apple ActivityMonitor App, peak usage was about 150 MB.

Our development server had the following software: Oracle Linux Server release 8.7, python-3.10.8, astropy-5.0.4, cupy-11.5.0, numpy-1.22.3, and pillow-9.3.0.

4.2. Multithreading

NSClean is not explicitly multithreaded. In practice, however, we always have multithreading turned on for python's linear algebra libraries. As a result, when we run NSClean, it usually shows all CPUs being used because most of the work is linear algebra.

On our Intel-based computers, this is done by installing the Intel version of numpy and setting an environment variable. For our 8-core server, the python code is as follows.

import os
# Set this before numpy is first imported, since the threading layer reads it at load time.
os.environ["MKL_NUM_THREADS"] = "8"

Our understanding is that on non-Intel computers, similar functionality exists, although the environment variables are different.
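As a hedged illustration, the following environment variables are the ones commonly read by other linear algebra backends. Which of them applies depends on how numpy was built on a given machine, so this is a sketch rather than a recipe that has been tested with NSClean.

```python
import os

n_threads = "8"
# Set these before numpy is first imported; each backend reads only its own.
os.environ["OMP_NUM_THREADS"] = n_threads         # OpenMP-based builds
os.environ["OPENBLAS_NUM_THREADS"] = n_threads    # OpenBLAS
os.environ["VECLIB_MAXIMUM_THREADS"] = n_threads  # Apple Accelerate
```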

When a GPU is used, python's cupy package automatically parallelizes linear algebra over however many GPU cores are available. Our NVIDIA Quadro M4000 has 1664 CUDA cores. However, as a practical matter, on our server the time penalty for uploading/downloading data to the GPU overcame the advantages provided by increased parallelization. We therefore typically run using only the system's CPUs.

4.3. Installing NSClean

NSClean is a standard pip-installable python package. It is available from the NASA JWST website (NASA JWST website 2023). To install it on MacOS or Linux, change into a directory that is in your python path, and download the distribution. Then, use pip to install it,

pip install -e nsclean

This will install nsclean as an editable package in your python path.

5. Summary

Many JWST observers are finding that there is faint vertical banding and a picture frame pattern in pipeline calibrated NIRSpec images. The effect is particularly challenging for IFU observations because it can add spectral features that are not real. This article describes the NSClean python package that uses dark areas of NIRSpec scenes to remove this noise. To use NSClean, one must provide a mask specifying which pixels are to be treated as background. For each count rate image, NSClean then: (1) computes the Fourier transform of the background using an algorithm that can handle missing data, (2) applies a low-pass apodizing filter to reduce noise, and (3) inverts the Fourier transform yielding a background model. When the background model is subtracted from the image, it removes most of the correlated noise. NSClean is simple and computationally undemanding. The NSClean python package is freely available for download from the NASA JWST website (NASA JWST website 2023).

Acknowledgments

This work was supported by NASA as part of the JWST Project. I wish to thank Jane Rigby of the TEMPLATES team for her eagerness to test early versions of NSClean and enthusiastic support ever since. Jane read an early draft of this article and made many helpful comments. I am grateful to TEMPLATES team members Taylor Hutchison and Brian Welch for providing the description of how TEMPLATES makes background masks (Section 3.5) on very short notice. I thank Stephan Birkmann of the JWST NIRSpec Team. Stephan was the first person to describe the issue to me and its impact on early JWST science. I have worked with Stephan ever since the early days of NIRSpec. It has always been, and continues to be, a pleasure.

Facility: JWST (NIRSpec).

Software: JWST Calibration Pipeline (Bushouse et al. 2023), astropy (Astropy Collaboration et al. 2013, 2018).
