Methods for Averaging Spectral Line Data

L. D. Anderson; B. Liu; Dana S. Balser; T. M. Bania; L. M. Haffner; Dylan J. Linville; Matteo Luisi; Trey V. Wenger

doi:10.1088/1538-3873/ad0444

1. Introduction

Averaging spectral line data allows one to increase the signal-to-noise ratio (S/N) of the resultant spectrum. Such averaging is straightforward when the spectra are taken of the same source and have similar noise characteristics. The situation is complicated, however, if the noise or source intensity differs significantly between observations. In such cases, depending on the distribution of peak intensities in the observations and the weighting scheme, averaging spectra can result in a decrease in the S/N. We are interested in exploring the implications of using one averaging or weighting method over another.

This is not an entirely new problem, but general prescriptions are lacking in the astronomical literature. Rosales-Ortega et al. (2012) explored how to maximize the S/N for integral field spectroscopy (IFS) observations and provided code to the community. They argued that the optimal integration method depends on the science case. Zhang & McElvain (1999) dealt with the problem of averaging multiple spectra taken from a chromatography/spectroscopy experiment. They found that for a Gaussian-peaked signal distribution, the maximum S/N is attained when the 38% highest S/N individual spectra are averaged. Unser & Eden (1990) developed a method for maximizing the S/N for a set of 2D images that better accounts for noisy data where the S/N of individual observations is difficult to measure. Adaptive smoothing of 2D images, such as using Voronoi Tessellations or Weighted Voronoi Tessellations, can be used to create spatial regions that meet user-specified S/N criteria (Cappellari & Copin 2003; Diehl & Statler 2006). The ideal averaging method may depend on whether the intensities of the spectra to be averaged are uniform or have a large variance.

In this paper, we explore methods for spectral averaging and provide guidance for multiple use-cases. We focus our analytical treatment on radio spectroscopic observations (i.e., we use the variable "T" for intensity), but the method is applicable to any spectral line data set.

2. The Signal to Noise Ratio

A spectral line has a S/N given by Lenz & Ayres (1992)

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{\langle {T}_{P}\rangle }{\langle \sigma \rangle },\end{eqnarray} \tag{ 1 }$

where C is a constant whose value depends on the line shape, ΔV is the full width at half-maximum (FWHM) line width, Δλ is the spectral resolution (or width of the smoothing kernel), 〈T_P〉 is the average peak line brightness temperature, and 〈σ〉 is the average rms noise. For a spectrum that can be modeled as a Gaussian line with white noise, C = 0.7.
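As a quick illustration of Equation (1), the following minimal sketch evaluates the S/N of a single Gaussian line. The function name `line_snr` and the input values are illustrative assumptions, not taken from the paper; only the default C = 0.7 comes from the text.

```python
import math

def line_snr(t_peak, rms, fwhm, resolution, c_shape=0.7):
    """Equation (1) for a single spectrum.

    c_shape=0.7 is the Gaussian-line value quoted in the text; fwhm and
    resolution must share the same units (e.g., km/s).
    """
    return c_shape * math.sqrt(fwhm / resolution) * t_peak / rms

# Illustrative (assumed) values: a 50 mK line, 10 mK rms,
# 25 km/s FWHM, and 0.5 km/s channels
snr = line_snr(t_peak=0.05, rms=0.01, fwhm=25.0, resolution=0.5)
```

The square-root term counts the number of resolution elements across the line, so a broad line sampled by many channels is detected at a higher S/N than its per-channel ratio T_P/σ alone would suggest.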

Upon averaging n spectra each with (unnormalized) weighting w_i, the average intensity at a given spectral channel is

$\begin{eqnarray}&&\langle T\rangle =\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{i}{w}_{i}}{{\sum }_{i=1}^{n}{w}_{i}}.\end{eqnarray} \tag{ 2 }$

At the line center, the peak line intensity 〈T_P〉 is therefore

$\begin{eqnarray}&&\langle {T}_{P}\rangle =\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{P,i}{w}_{i}}{{\sum }_{i=1}^{n}{w}_{i}}.\end{eqnarray} \tag{ 3 }$

The average (uncorrelated) rms spectral noise is

$\begin{eqnarray}&&\langle \sigma \rangle ={\left[\displaystyle \frac{{\sum }_{i=1}^{n}{\sigma }_{i}^{2}{w}_{i}^{2}}{{\left({\sum }_{i=1}^{n}{w}_{i}\right)}^{2}}\right]}^{0.5}.\end{eqnarray} \tag{ 4 }$

The S/N in the average spectrum is then

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{P,i}{w}_{i}}{{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{2}{w}_{i}^{2}\right)}^{0.5}}.\end{eqnarray} \tag{ 5 }$

Equation (5) is the fundamental equation that governs the increase in S/N when averaging multiple spectra with weighting w_i, assuming uncorrelated noise.
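Equation (5) can be evaluated directly for any set of weights. The sketch below is a minimal NumPy version under the same uncorrelated-noise assumption; the function name `averaged_snr` and the `prefactor` argument (standing in for C(ΔV/Δλ)^0.5) are hypothetical names introduced here for illustration.

```python
import numpy as np

def averaged_snr(t_peak, sigma, weights, prefactor=1.0):
    """S/N of a weighted average spectrum following Equation (5).

    prefactor stands in for C * (dV / dlambda)**0.5 (hypothetical name).
    """
    t_peak = np.asarray(t_peak, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    weights = np.asarray(weights, dtype=float)
    signal = np.sum(t_peak * weights)                  # numerator of Equation (5)
    noise = np.sqrt(np.sum(sigma**2 * weights**2))     # denominator of Equation (5)
    return prefactor * signal / noise

# Four identical spectra with unit weights: the result is (T/sigma) * sqrt(n)
snr_four = averaged_snr([2.0] * 4, [0.5] * 4, [1.0] * 4)
# Rescaling all weights by a constant leaves the S/N unchanged
snr_rescaled = averaged_snr([2.0] * 4, [0.5] * 4, [7.0] * 4)
```

Because the normalization of the weights cancels between numerator and denominator, only their relative values matter.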

We discuss three weighting schemes below. An observer's choice of weighting is dictated by their science goals and the availability of information about their data. If the noise and peak intensity are the same for all spectra such that σ_i = σ₀ and T_P,i = T_P,0, for the weighting schemes considered here Equation (5) reduces to

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{T}_{P,0}}{{\sigma }_{0}}{n}^{0.5}.\end{eqnarray} \tag{ 6 }$

Equation (6) approximates the S/N when averaging multiple spectra taken of the same source with the same integration times and observing conditions. If the noise is correlated between spectra, the noise term in Equation (5) includes covariances between the spectra. The average noise then decreases more slowly with averaging, so the exponent of n in Equation (6) is less than 0.5 and the S/N increases more slowly than in the uncorrelated case. One can recover the dependence on n in Equation (6) by considering only the number of independent spectra.
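To make the effect of correlated noise concrete, the sketch below generalizes the noise term of Equation (4) to a full covariance matrix. The equal pairwise correlation `rho` and all names are assumptions chosen for illustration; real data may have a very different covariance structure.

```python
import numpy as np

def averaged_noise(cov, weights):
    """rms noise of a weighted average given the full noise covariance matrix.

    For a diagonal covariance matrix this reduces to Equation (4).
    """
    w = np.asarray(weights, dtype=float)
    return np.sqrt(w @ np.asarray(cov, dtype=float) @ w) / np.sum(w)

n, sigma0, rho = 16, 1.0, 0.2        # illustrative (assumed) values
# Equal variance sigma0**2 on the diagonal, covariance rho * sigma0**2 elsewhere
cov = sigma0**2 * (rho * np.ones((n, n)) + (1.0 - rho) * np.eye(n))
noise = averaged_noise(cov, np.ones(n))

# Number of independent spectra that would give the same noise via Equation (6)
n_independent = (sigma0 / noise) ** 2
```

In this toy case the 16 correlated spectra average down like only 4 independent ones, which is the sense in which counting independent spectra recovers the n^0.5 scaling.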

2.1. Intensity-noise Weighting

For "intensity-noise weighting,"

$\begin{eqnarray}&&{w}_{i}={T}_{P,i}{\sigma }_{i}^{-2}.\end{eqnarray} \tag{ 7 }$

Using this weighting will bias the average peak line intensity toward the highest values of T_P,i. Substituting Equation (7) into Equation (5) gives

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}{\left({\sum }_{i=1}^{n}{T}_{P,i}^{2}{\sigma }_{i}^{-2}\right)}^{0.5}.\end{eqnarray} \tag{ 8 }$

If all spectra have the same (uncorrelated) noise σ₀, Equation (8) reduces to

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\left({\sum }_{i=1}^{n}{T}_{P,i}^{2}\right)}^{0.5}}{{\sigma }_{0}}.\end{eqnarray} \tag{ 9 }$

If all signal strengths T_P,0 are the same but the noise is variable, as in the case of averaging data taken of the same source under different observing conditions or integration times, Equation (8) becomes

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}{T}_{P,0}{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{-2}\right)}^{0.5}.\end{eqnarray} \tag{ 10 }$

2.2. Noise Weighting

For "noise weighting,"

$\begin{eqnarray}&&{w}_{i}={\sigma }_{i}^{-2}.\end{eqnarray} \tag{ 11 }$

This weighting will bias the average peak line intensity toward spectra with lower noise:

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{P,i}{\sigma }_{i}^{-2}}{{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{-2}\right)}^{0.5}}.\end{eqnarray} \tag{ 12 }$

If all spectra have the same (uncorrelated) noise σ₀, Equation (12) reduces to

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{P,i}}{{\sigma }_{0}}{n}^{-0.5}.\end{eqnarray} \tag{ 13 }$

If all signal strengths T_P,0 are the same, but the noise is variable, we again find Equation (10).

2.3. Uniform Weighting

For "uniform weighting,"

$\begin{eqnarray}&&{w}_{i}=1\end{eqnarray} \tag{ 14 }$

and

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{T}_{P,i}}{{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{2}\right)}^{0.5}}.\end{eqnarray} \tag{ 15 }$

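If all spectra have the same (uncorrelated) noise σ₀, we again find Equation (13). If all signal strengths T_P,0 are the same but the noise is variable, Equation (15) becomes ${\rm{S}}/{\rm{N}}=C{\left({\rm{\Delta }}V/{\rm{\Delta }}\lambda \right)}^{0.5}\,n\,{T}_{P,0}{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{2}\right)}^{-0.5}$, which equals Equation (10) only when all σ_i are equal and is otherwise smaller.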
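The three weighting schemes can be compared numerically with a short sketch. The input values and the helper names `snr_eq5` and `scheme_weights` are illustrative assumptions; the prefactor C(ΔV/Δλ)^0.5 is set to 1 because it is common to all schemes.

```python
import numpy as np

def snr_eq5(t_peak, sigma, weights):
    # Equation (5) with the prefactor C * (dV/dlambda)**0.5 set to 1
    return np.sum(t_peak * weights) / np.sqrt(np.sum(sigma**2 * weights**2))

def scheme_weights(t_peak, sigma):
    return {
        "intensity-noise": t_peak / sigma**2,   # Equation (7)
        "noise": 1.0 / sigma**2,                # Equation (11)
        "uniform": np.ones_like(t_peak),        # Equation (14)
    }

# Illustrative (assumed) peak intensities and rms noise values
t_peak = np.array([1.0, 0.6, 0.3, 0.1])
sigma = np.array([0.10, 0.12, 0.08, 0.15])

snr = {name: snr_eq5(t_peak, sigma, w)
       for name, w in scheme_weights(t_peak, sigma).items()}
eq8 = np.sqrt(np.sum(t_peak**2 / sigma**2))     # closed form of Equation (8)
```

By the Cauchy-Schwarz inequality, Equation (8) is an upper bound on Equation (5) for any choice of weights, so the intensity-noise entry is always the largest of the three when all spectra are averaged.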

2.4. Maximizing the Signal to Noise Ratio

For a given weighting scheme and averaging method, the optimal value for n is often found when the S/N reaches a maximum value S/N ${}_{\max }$ , or when

$\begin{eqnarray}&&\displaystyle \frac{d}{{dn}}({\rm{S}}/{\rm{N}})=0.\end{eqnarray} \tag{ 16 }$

For intensity-noise weighting, or in the case that all spectra have the same values of T_P,i and σ₀, averaging all available spectra will result in the highest S/N (see Equation (6)). For noise and uniform weighting, if the values of T_P,i or σ_i are different, the ideal number of spectra to average may be less than the total number of spectra available.

To determine the ideal number of spectra to average for noise and uniform weighting, our method requires that the spectra be ordered by decreasing S/N. For individual spectra,

$\begin{eqnarray}&&{\rm{S}}{/{\rm{N}}}_{i}\propto \displaystyle \frac{{T}_{P,i}}{{\sigma }_{i}}.\end{eqnarray} \tag{ 17 }$

To determine n and S/N ${}_{\max }$ , one therefore must:

1. compute or estimate the peak line intensity, T_P,i, and the rms spectral noise, σ_i, for all spectra;
2. order the spectra in terms of S/N_i (using Equation (17));
3. determine when the average S/N is maximized (S/N ${}_{\max }$ ), either theoretically using Equation (5) or by fitting the average spectra with a model.

Below, we use this method to estimate S/N ${}_{\max }$ and n for simulated distributions of T_P.
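The three steps above can be sketched as follows, using the theoretical route through Equation (5). This is a minimal illustration, not the authors' code: the function name `maximize_snr` and the example values are assumptions, and the prefactor C(ΔV/Δλ)^0.5 is omitted because it does not change the location of the maximum.

```python
import numpy as np

def maximize_snr(t_peak, sigma, weighting="noise"):
    """Return the indices of the spectra to average and the resulting S/N."""
    t_peak = np.asarray(t_peak, dtype=float)      # step 1: T_P,i for all spectra
    sigma = np.asarray(sigma, dtype=float)        # step 1: sigma_i for all spectra
    order = np.argsort(-(t_peak / sigma))         # step 2: decreasing S/N_i
    t, s = t_peak[order], sigma[order]
    w = 1.0 / s**2 if weighting == "noise" else np.ones_like(t)
    # step 3: Equation (5) for the first n ordered spectra, n = 1..N
    cumulative_snr = np.cumsum(t * w) / np.sqrt(np.cumsum(s**2 * w**2))
    n_best = int(np.argmax(cumulative_snr)) + 1
    return order[:n_best], cumulative_snr[n_best - 1]

# Four spectra with equal noise: only the two brightest should be averaged
selected, snr_max = maximize_snr([0.2, 1.0, 0.05, 0.8], [0.1, 0.1, 0.1, 0.1])
```

In this toy case the faint spectra add more noise than signal, so the maximum is reached after two spectra. With real data, T_P,i and σ_i are themselves noisy estimates, which can scramble the ordering in step 2.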

3. Simulated Signal and Noise Distributions

We perform Monte Carlo simulations to assess the effects of different intensity and noise distributions, as well as weighting schemes.

3.1. Distributions for T_P

We investigate two characteristic distributions for T_P: half-normal and power law. We plot the distributions in Figure 1, for a range of half-normal standard deviations (see Section 3.1.1) and power law indices (see Section 3.1.2). The half-normal distribution is what is measured from a compact source and a Gaussian telescope response, whereas the power law distributions are meant to model diffuse (low power law indices) and compact (high power law indices) sources. For both distributions, we assume that the distribution of noise values is Gaussian, characterized by a mean value of σ₀ and a standard deviation of s_σ (measured in units of σ₀).

3.1.1. Half-normal Distribution for T_P

We explore how a half-normal distribution of T_P affects the derived values of n and S/N. A half-normal distribution can be a good approximation for data sets where the brightest spectra have a much higher signal strength than the mean. If T_P follows a half-normal distribution,

$\begin{eqnarray}&&{T}_{P,i}={T}_{P,\max }\exp \left(-\displaystyle \frac{{i}^{2}}{2{s}_{T}^{2}}\right),\end{eqnarray} \tag{ 18 }$

where ${T}_{P,\max }$ is the maximum line height in the data set and the standard deviation in the distribution of T_P is s_T (measured in units of the index). From Equation (5), the S/N is then

$\begin{eqnarray}&&\begin{array}{l}{\rm{S}}/{\rm{N}}={{CT}}_{P,\max }{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{w}_{i}\exp \left(-\tfrac{{i}^{2}}{2{s}_{T}^{2}}\right)}{{\left({\sum }_{i=1}^{n}{\sigma }_{i}^{2}{w}_{i}^{2}\right)}^{0.5}}\end{array}\end{eqnarray} \tag{ 19 }$

To illustrate the basic functional dependencies, we can assume that the noise is uncorrelated, is the same in all spectra to be averaged, and is equal to σ₀. In this case, for intensity-noise weighting, we have

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C\displaystyle \frac{{T}_{P,\max }}{{\sigma }_{0}}{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}{\left[\displaystyle \sum _{i=1}^{n}\exp \left(-\displaystyle \frac{{i}^{2}}{{s}_{T}^{2}}\right)\right]}^{0.5}.\end{eqnarray} \tag{ 20 }$

For noise and uniform weighting, we have

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C\displaystyle \frac{{T}_{P,\max }}{{\sigma }_{0}}{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\left[\displaystyle \sum _{i=1}^{n}\exp \left(-\displaystyle \frac{{i}^{2}}{2{s}_{T}^{2}}\right)\right]{n}^{-0.5}.\end{eqnarray} \tag{ 21 }$

As can be seen in Equations (20) and (21), in the case of constant noise the S/N depends on the ratio of the maximum line intensity divided by the noise, rather than the individual value of either quantity. We use this ratio to parameterize the simulations.

We create 100 simulated peak signal and noise distributions for values of ${T}_{P,\max }/{\sigma }_{0}$ of 0.01, 0.05, 0.1, 0.5, 1, 2, and 5 in two noise distributions: "constant" noise (all spectra have the same noise value) and Gaussian noise with s_σ = 0.1σ₀ (each spectrum has a noise value drawn randomly from a normal distribution). All signal distributions have a standard deviation s_T = 1.5. For the Gaussian noise trials, we split the analysis into two categories: (1) the estimation of the signal strength is unaffected by the noise; and (2) the estimation of the signal strength is modified by the normal distribution of standard deviation s_σ. The former case represents the theoretical situation when noise does not affect the estimation of the signal; the latter case is more realistic. We analyze both cases to determine how noise affects the analysis.

For each set of distributions, we estimate S/N ${}_{\max }$ using Equation (5). For noise and uniform weighting trials, we additionally compute the signal at the maximum S/N compared to the maximum signal, ${T}_{P,i=n}/{T}_{P,\max }$ . We give our results from all three weighting schemes in Table 1 and show the noise-weighting analysis in Figure 2 (uniform weighting produces nearly identical results).
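For the constant-noise case, the location of the S/N maximum can be checked with a few lines of NumPy. This is a simplified sketch rather than the Monte Carlo analysis described above: it assumes the index i in Equation (18) runs over integers starting at 1, uses an illustrative number of spectra, and works in units of ${T}_{P,\max }/{\sigma }_{0}$ with the prefactor C(ΔV/Δλ)^0.5 omitted, so absolute S/N values are not comparable to the tabulated ones.

```python
import numpy as np

s_T, n_spectra = 1.5, 20                       # s_T from the text; n_spectra is illustrative
i = np.arange(1, n_spectra + 1, dtype=float)
t_rel = np.exp(-i**2 / (2.0 * s_T**2))         # Equation (18) in units of T_P,max

# Noise and uniform weighting coincide for constant noise:
# S/N(n) is proportional to sum(T_P,i) / sqrt(n), in units of T_P,max / sigma_0
snr_vs_n = np.cumsum(t_rel) / np.sqrt(np.arange(1, n_spectra + 1))
n_best = int(np.argmax(snr_vs_n)) + 1
ratio_at_cut = t_rel[n_best - 1]               # T_P,i=n / T_P,max at the S/N maximum

# Intensity-noise weighting of all spectra, Equation (9) in the same units
snr_intensity_noise = np.sqrt(np.sum(t_rel**2))
fraction = snr_vs_n[n_best - 1] / snr_intensity_noise
```

Under these assumptions the maximum falls where the faintest included spectrum has roughly 0.41 of the peak intensity, and the maximized S/N is about 94% of the intensity-noise value.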

**Figure 2.** S/N analysis for half-normal-distributed values of T_P, with noise weighting. The S/N in intensity-noise weighting (not shown) increases without bound whereas uniform weighting (also not shown) produces nearly identical results to those of noise weighting. Panels in the left column show ${T}_{P,{\max }}/{\sigma }_{0}=0.1$ and those of the right column show ${T}_{P,\max }/{\sigma }_{0}=1.0$ . The top row of panels has s_σ = 0 (constant noise) and the bottom row has s_σ = 0.1σ₀. In all panels, solid lines show individual values and dashed lines show integrated values. The blue curves show the S/N (and use the left y-axis), the red curves show T_P, and the green curves show the noise (both use the right y-axis). The shaded regions in the lower panels show the standard deviations from the Monte Carlo simulations. The vertical gray lines indicate the peaks of the S/N distributions; the gray shaded areas show the range within one standard deviation of the S/N distributions.

Table 1. S/N Analysis for Half-normal Intensity Distributions

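| Noise model | ${T}_{P,\max }/{\sigma }_{0}$ | Intensity-noise: S/N ${}_{\max }$ | Noise: S/N ${}_{\max }$ | Noise: ${T}_{P,i=n}/{T}_{P,\max }$ | Uniform: S/N ${}_{\max }$ | Uniform: ${T}_{P,i=n}/{T}_{P,\max }$ |
|---|---|---|---|---|---|---|
| σ_i = σ₀ | 0.01 | 0.037 | 0.035 | 0.41 | 0.035 | 0.41 |
| σ_i = σ₀ | 0.05 | 0.19 | 0.18 | 0.41 | 0.18 | 0.41 |
| σ_i = σ₀ | 0.1 | 0.37 | 0.35 | 0.41 | 0.35 | 0.41 |
| σ_i = σ₀ | 0.5 | 1.9 | 1.8 | 0.41 | 1.8 | 0.41 |
| σ_i = σ₀ | 1.0 | 3.7 | 3.6 | 0.41 | 3.6 | 0.41 |
| σ_i = σ₀ | 5.0 | 19 | 18 | 0.41 | 18 | 0.41 |
| s_σ = 0.1σ₀ | 0.01 | 0.038 | 0.036 | 0.42 | 0.035 | 0.37 |
| s_σ = 0.1σ₀ | 0.05 | 0.18 | 0.19 | 0.42 | 0.18 | 0.39 |
| s_σ = 0.1σ₀ | 0.1 | 0.38 | 0.35 | 0.43 | 0.35 | 0.40 |
| s_σ = 0.1σ₀ | 0.5 | 1.9 | 1.8 | 0.42 | 1.7 | 0.39 |
| s_σ = 0.1σ₀ | 1.0 | 3.8 | 3.6 | 0.43 | 3.5 | 0.39 |
| s_σ = 0.1σ₀ | 5.0 | 19 | 18 | 0.42 | 18 | 0.40 |
| s_σ = 0.1σ₀, T_P,i modified | 0.01 | 0.038 | 0.026 | 0.67 | 0.038 | 0.26 |
| s_σ = 0.1σ₀, T_P,i modified | 0.05 | 0.19 | 0.13 | 0.66 | 0.13 | 0.65 |
| s_σ = 0.1σ₀, T_P,i modified | 0.1 | 0.38 | 0.27 | 0.68 | 0.26 | 0.65 |
| s_σ = 0.1σ₀, T_P,i modified | 0.5 | 1.9 | 1.7 | 0.60 | 1.7 | 0.61 |
| s_σ = 0.1σ₀, T_P,i modified | 1.0 | 3.8 | 3.5 | 0.49 | 3.5 | 0.48 |
| s_σ = 0.1σ₀, T_P,i modified | 5.0 | 19 | 18 | 0.42 | 18 | 0.40 |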


Intensity-noise weighting leads to an increase in the S/N without bound and therefore sets S/N ${}_{\max }$ for any averaging method. For constant-noise half-normal signal distributions and noise or uniform weighting we find:

1. S/N ${}_{\max }\simeq 3.5{T}_{P,\max }/{\sigma }_{0}$;
2. S/N ${}_{\max }$ is obtained when averaging all spectra satisfying ${T}_{P}\gtrsim 0.4{T}_{P,\max }$;
3. S/N ${}_{\max }$ is ∼5% less than that from averaging all spectra using intensity-noise weighting, assuming T_P can be reliably estimated;
4. the S/N can decrease by up to 30% relative to S/N ${}_{\max }$ when averaging spectra down to ${T}_{P}/{T}_{P,\max }\simeq 0.01$;
5. S/N ${}_{\max }$ for noise and uniform weighting is ∼95% that found for intensity-noise weighting.

These relationships also hold for the variable noise distributions when the noise and signal strength are uncorrelated. The above are theoretical best-case scenarios. If noise affects the estimation of the signal strength, as it does for actual data, the inability to reliably order the highest S/N spectra affects the S/N; these effects are larger if the noise is comparable to the signal.

3.1.2. Power Law Distribution for T_P

We perform a similar analysis assuming a power law distribution for T_P:

$\begin{eqnarray}&&{T}_{P,i}={T}_{P,\max }{i}^{-\alpha },\end{eqnarray} \tag{ 22 }$

where the maximum value is ${T}_{{\rm{P}},\max }$ and α is the power law index. The relevant S/N equation is then

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}={{CT}}_{P,\max }{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\displaystyle \frac{{\sum }_{i=1}^{n}{w}_{i}\ {i}^{-\alpha }}{{\left({\sum }_{i=1}^{n}{w}_{i}^{2}{\sigma }_{i}^{2}\right)}^{0.5}}\end{eqnarray} \tag{ 23 }$

In the case of constant noise, for intensity-noise weighting, we have

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C\displaystyle \frac{{T}_{P,\max }}{{\sigma }_{0}}{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}{\left(\displaystyle \sum _{i=1}^{n}{i}^{-2\alpha }\right)}^{0.5}.\end{eqnarray} \tag{ 24 }$

For noise and uniform weighting, we have

$\begin{eqnarray}&&{\rm{S}}/{\rm{N}}=C\displaystyle \frac{{T}_{P,\max }}{{\sigma }_{0}}{\left(\displaystyle \frac{{\rm{\Delta }}V}{{\rm{\Delta }}\lambda }\right)}^{0.5}\left(\displaystyle \sum _{i=1}^{n}{i}^{-\alpha }\right){n}^{-0.5}.\end{eqnarray} \tag{ 25 }$

Once again we see that the S/N is linearly proportional to the ratio of the maximum peak intensity to the rms noise, ${T}_{P,\max }/{\sigma }_{0}$.
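The reduction of Equation (23) to Equations (24) and (25) under constant noise can be verified numerically. The sketch below uses illustrative (assumed) values of α, n, σ₀, and ${T}_{P,\max }$, and sets the prefactor C(ΔV/Δλ)^0.5 to 1.

```python
import numpy as np

alpha, n, sigma0, t_max = 2.0, 50, 0.2, 1.0   # illustrative (assumed) values
i = np.arange(1, n + 1, dtype=float)
t_peak = t_max * i**(-alpha)                  # Equation (22)

def eq23(weights):
    # Equation (23) with constant noise sigma_i = sigma0 and unit prefactor
    return (t_max * np.sum(weights * i**(-alpha))
            / np.sqrt(np.sum(weights**2 * sigma0**2)))

eq23_intensity_noise = eq23(t_peak / sigma0**2)    # weights from Equation (7)
eq24 = (t_max / sigma0) * np.sqrt(np.sum(i**(-2.0 * alpha)))

eq23_noise = eq23(np.full(n, sigma0**-2))          # weights from Equation (11)
eq23_uniform = eq23(np.ones(n))                    # weights from Equation (14)
eq25 = (t_max / sigma0) * np.sum(i**(-alpha)) / np.sqrt(n)
```

Noise and uniform weighting agree exactly here because constant weights of any value cancel in Equation (23); they differ only once the noise varies between spectra.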

We investigate the effect of different power law distributions for α = 0.5 to 4.5 in increments of 0.5 with ${T}_{P,\max }/{\sigma }_{0}=1.0$ , and for ${T}_{P,\max }/{\sigma }_{0}=0.1,0.5,1.0,5.0,10$ and 50 with α = 2.0. We show these results in Figure 3 and in Table 2. We do not consider variable noise (and so set s_σ = 0), which we assume has a minor effect, as it does for the half-normal distributions. For a constant-noise power law signal distribution with noise or uniform weighting, we find:

1. S/N ${}_{\max }$ decreases with increasing power law index and decreasing values of ${T}_{P,\max }/{\sigma }_{0}$;
2. S/N ${}_{\max }$ is obtained when averaging all spectra satisfying ${T}_{P}\gtrsim 0.35{T}_{P,\max }$ for the values of α and ${T}_{P,\max }/{\sigma }_{0}$ investigated;
3. as for the half-normal signal distribution, S/N ${}_{\max }$ is ∼5% less than that from averaging all spectra using intensity-noise weighting;
4. the S/N can decrease by up to 30% relative to S/N ${}_{\max }$ when averaging spectra down to ${T}_{P}/{T}_{P,\max }\simeq 0.01$;
5. S/N ${}_{\max }$ for noise and uniform weighting is ∼95% that found for intensity-noise weighting.

**Figure 3.** S/N analysis for power law-distributed values of T_P, with noise weighting. Uniform weighting (not shown) produces nearly identical results. All panels have ${T}_{P,\max }/{\sigma }_{0}=1.0$ , and clockwise from the top-left panel α ranges from 1 to 4 in integer steps. In all panels, solid lines show individual values and dotted lines show integrated values. The blue curves show the S/N, the red curves show T_P, and the green curves show the noise. The vertical gray lines indicate the peaks of the S/N distributions.

Table 2. S/N Analysis for Power Law Intensity Distributions

| ${T}_{P,\max }/{\sigma }_{0}$ | α | Intensity-noise: S/N ${}_{\max }$ | Noise: S/N ${}_{\max }$ | Noise: ${T}_{P,i=n}/{T}_{P,\max }$ | Uniform: S/N ${}_{\max }$ | Uniform: ${T}_{P,i=n}/{T}_{P,\max }$ |
|---|---|---|---|---|---|---|
| 1.0 | 0.5 | 4.5 | 4.3 | 0.39 | 4.3 | 0.39 |
| 1.0 | 1.0 | 3.7 | 3.5 | 0.36 | 3.5 | 0.36 |
| 1.0 | 1.5 | 3.2 | 3.0 | 0.34 | 3.0 | 0.34 |
| 1.0 | 2.0 | 2.9 | 2.7 | 0.35 | 2.7 | 0.35 |
| 1.0 | 2.5 | 2.6 | 2.4 | 0.33 | 2.4 | 0.33 |
| 1.0 | 3.0 | 2.5 | 2.3 | 0.33 | 2.3 | 0.33 |
| 1.0 | 3.5 | 2.3 | 2.1 | 0.35 | 2.1 | 0.35 |
| 1.0 | 4.0 | 2.2 | 2.0 | 0.35 | 2.0 | 0.35 |
| 1.0 | 4.5 | 2.1 | 1.9 | 0.36 | 1.9 | 0.36 |
| 0.1 | 2.0 | 0.029 | 0.027 | 0.35 | 0.027 | 0.35 |
| 0.5 | 2.0 | 0.14 | 0.13 | 0.35 | 0.13 | 0.35 |
| 1.0 | 2.0 | 0.29 | 0.27 | 0.35 | 0.27 | 0.35 |
| 5.0 | 2.0 | 1.4 | 1.3 | 0.35 | 1.3 | 0.35 |
| 10 | 2.0 | 2.9 | 2.7 | 0.35 | 2.7 | 0.35 |
| 50 | 2.0 | 14 | 13 | 0.35 | 13 | 0.35 |


4. Application to GDIGS Data

The GBT Diffuse Ionized Gas (GDIGS) survey (Anderson et al. 2021) traced the radio recombination line (RRL) emission across the inner Galaxy, over −5° < ℓ < 32°, |b| < 0.5°. The data were collected using the C-band receiver on the Green Bank Telescope (GBT) in total power mode. Within the 4–8 GHz bandpass, GDIGS tuned to 15 usable hydrogen RRLs and averaged their signals to produce the reduced Hnα data set. The reduced data have a spatial resolution of 2.65′, a spaxel size of 30″, and a spectral resolution of 0.5 km s⁻¹. The rms spectral noise per spaxel is ∼10 mK.

We test the above spectral averaging methods using GDIGS data to constrain the ionic ⁴He⁺/H⁺ abundance ratio by number, y⁺. Measurements of elemental abundances provide key constraints for our understanding of Galactic chemical evolution. We define y⁺ as

$\begin{eqnarray}&&{y}^{+}=\displaystyle \frac{{T}_{{\rm{P}},\ \mathrm{He}}{\rm{\Delta }}{V}_{\mathrm{He}}}{{T}_{{\rm{P}},\ {\rm{H}}}{\rm{\Delta }}{V}_{{\rm{H}}}}\end{eqnarray} \tag{ 26 }$

where T_P is the peak line intensity and ΔV is the FWHM line width. The uncertainty on y⁺ is therefore

$\begin{eqnarray}&&\begin{array}{l}{\sigma }_{{y}^{+}}={y}^{+}\left[{\left(\displaystyle \frac{{\sigma }_{{T}_{{\rm{P}},\mathrm{He}}}}{{T}_{{\rm{P}},\mathrm{He}}}\right)}^{2}+{\left(\displaystyle \frac{{\sigma }_{{\rm{\Delta }}{V}_{\mathrm{He}}}}{{\rm{\Delta }}{V}_{\mathrm{He}}}\right)}^{2}\right.\\ +{\left.{\left(\displaystyle \frac{{\sigma }_{{T}_{{\rm{P}},{\rm{H}}}}}{{T}_{{\rm{P}},{\rm{H}}}}\right)}^{2}+{\left(\displaystyle \frac{{\sigma }_{{\rm{\Delta }}{V}_{{\rm{H}}}}}{{\rm{\Delta }}{V}_{{\rm{H}}}}\right)}^{2}\right]}^{0.5},\end{array}\end{eqnarray} \tag{ 27 }$

where σ denotes parameter uncertainties. If the source is optically thin, y⁺ measures the ⁴He⁺/H⁺ abundance ratio directly.
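Equations (26) and (27) are straightforward to evaluate from fitted line parameters. The sketch below uses the intensity-noise row of Table 3 as input; the function name `helium_abundance` and its argument names are hypothetical, and the propagation assumes the four fit uncertainties are independent, as Equation (27) does.

```python
import math

def helium_abundance(t_he, dv_he, t_h, dv_h, e_t_he, e_dv_he, e_t_h, e_dv_h):
    """Return y+ (Equation (26)) and its uncertainty (Equation (27))."""
    y_plus = (t_he * dv_he) / (t_h * dv_h)
    # Fractional uncertainties of the four fit parameters add in quadrature
    frac = math.sqrt((e_t_he / t_he) ** 2 + (e_dv_he / dv_he) ** 2
                     + (e_t_h / t_h) ** 2 + (e_dv_h / dv_h) ** 2)
    return y_plus, y_plus * frac

# Intensity-noise (All) row of Table 3; intensities are normalized to T_P,H = 1.00
y_plus, sigma_y = helium_abundance(
    t_he=0.0482, dv_he=21.6, t_h=1.00, dv_h=28.4,
    e_t_he=0.000677, e_dv_he=0.532, e_t_h=0.00329, e_dv_h=0.133)
```

For this row the helium line-width term dominates the error budget, since its fractional uncertainty is the largest of the four.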

Because the mass of helium is greater than that of hydrogen, its RRL velocity is shifted by ∼ −122 km s⁻¹ from that of hydrogen. Both lines therefore fall within the same GDIGS bandpass and are subject to the same systematic effects.
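The size of this shift follows from the reduced-mass dependence of the Rydberg constant, and can be reproduced with a short calculation. The physical constants below are standard reference values supplied here, not taken from the paper, and the variable names are illustrative.

```python
C_KMS = 299792.458            # speed of light (km/s)
M_E = 5.48579909e-4           # electron mass (u)
M_P = 1.007276467             # proton mass (u)
M_HE4_ATOM = 4.002603254      # neutral 4He atomic mass (u)

def rydberg_factor(core_mass):
    """Reduced-mass factor R_X / R_infinity for an electron bound to a core."""
    return 1.0 / (1.0 + M_E / core_mass)

# The recombining electron orbits a He+ core (the atom minus one electron)
ratio = rydberg_factor(M_HE4_ATOM - M_E) / rydberg_factor(M_P)

# Radio convention: a line at higher rest frequency appears at more negative
# velocity when placed on the hydrogen velocity scale
shift_kms = -C_KMS * (ratio - 1.0)
```

Because the shift depends only on the mass ratio, it is the same for every Hnα and Henα pair, which is why the two lines keep a fixed velocity separation when multiple transitions are averaged.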

To spectrally average GDIGS RRL data using intensity-noise weighting, we:

1. align the spectra in velocity using the velocity centroids from the Automatic Gaussian Decomposition (AGD) results described in Anderson et al. (2021) (which in turn use the code from Riener et al. 2019);
2. average all spectra;
3. remove a fifth-order polynomial baseline and determine the S/N in the average spectrum using Gaussian fits to the hydrogen RRLs.

To spectrally average GDIGS RRL data using noise or uniform weighting, we:

1. determine the S/N and peak intensity for each spaxel using the results from the AGD analysis;
2. align the spectra in velocity using the velocity centroids from the AGD analysis;
3. average spectra, starting with the highest S/N spectrum;
4. remove a fifth-order polynomial baseline from line-free portions of the spectrum;
5. determine the S/N in the average spectrum using Gaussian fits to the hydrogen RRLs;
6. and cease averaging when the average spectrum S/N stops increasing, with a buffer of 100 spectra (once a peak in S/N is reached, continue averaging the next 100 to determine if the S/N peak is local).
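The stopping rule in the last step can be sketched as a simple loop. This is an illustration only: it replaces the baseline removal and Gaussian fitting of the actual procedure with the theoretical S/N of Equation (15), and the function name `average_until_peak`, the synthetic half-normal intensities, and the constant noise are all assumptions.

```python
import numpy as np

def average_until_peak(t_peak, sigma, buffer=100):
    """Add spectra in order of decreasing S/N_i and stop once the running
    S/N has not improved for `buffer` consecutive additions."""
    t_peak = np.asarray(t_peak, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    order = np.argsort(-(t_peak / sigma))
    sum_t, sum_var = 0.0, 0.0
    best_snr, best_n = -np.inf, 0
    for n, idx in enumerate(order, start=1):
        sum_t += t_peak[idx]              # uniform weights, w_i = 1
        sum_var += sigma[idx] ** 2
        snr = sum_t / np.sqrt(sum_var)    # Equation (15), prefactor omitted
        if snr > best_snr:
            best_snr, best_n = snr, n
        elif n - best_n >= buffer:
            break                         # no improvement within the buffer
    return order[:best_n], best_snr

# Synthetic half-normal intensities (4000 values, s_T = 400) in shuffled order
rng = np.random.default_rng(0)
t_sim = rng.permutation(np.exp(-np.arange(4000.0) ** 2 / (2.0 * 400.0**2)))
selected, snr_max = average_until_peak(t_sim, np.full(4000, 0.01))
faintest_ratio = t_sim[selected].min() / t_sim.max()
```

The buffer guards against stopping at a local maximum caused by noise in the measured S/N; in this noiseless sketch the running S/N is unimodal, so the buffer only delays the exit.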

For all weighting schemes, we only use spaxels fit by a single Gaussian component in the AGD. We determine y⁺ for all average spectra by fitting the helium line using velocities from −135 to −110 km s⁻¹ and the hydrogen line using velocities from −20 to +20 km s⁻¹.

We perform this analysis on the GDIGS Hnα data in a $100^{\prime} \times 60^{\prime}$ zone centered on the massive star-forming region W43 that was first analyzed in Luisi et al. (2020). The GDIGS data for this zone comprise 24,000 spectra, of which 19,849 are fit in the AGD with a single hydrogen line. This zone has numerous H ii regions and also diffuse ionized gas (see Luisi et al. 2020). We show the distribution of AGD-derived peak line intensities in Figure 4 for the 1000 highest-intensity values in the field. We also separate this distribution into those derived from spaxels falling within H ii regions defined by the WISE Catalog of Galactic H ii Regions (Anderson et al. 2014, hereafter the "WISE Catalog"), and those that do not fall within H ii regions. The peak line intensities approximately follow a half-normal distribution of 4000 values with s_T = 400 and ${T}_{P,\max }=10\,\,{\rm{K}}$ that is scaled so the minimum value is 0.14 K (a power law with α = 4 also fits fairly well). The exception to this good fit is at intensities ≳0.6 K, where the model over-predicts the data. Thus, the signal distribution is more complicated than the simulated distributions considered here. Most of the values, and a greater fraction of the high-intensity values, are associated with H ii regions.

**Figure 4.** GDIGS data toward W43. Shown are the 1000 highest intensity fitted line height values (T_P), separated into those that are spatially coincident with H ii regions ("H ii") and those that are not ("DIG"; diffuse ionized gas). The "H ii" spectra are more numerous in this field at all intensities studied, and all intensities T_P > 0.4 K are cospatial with H ii regions. A half-normal distribution fits the lower intensity values well, but drastically over-predicts the high values, indicating that the intensity distribution is more complicated than the simple models considered here.

We create five different average spectra and compute y⁺ for each: intensity-noise weighting all spectra, noise weighting with S/N maximization, uniform weighting with S/N maximization, noise weighting all spectra and uniform weighting all spectra. The noise and uniform S/N maximizations use 569 and 1102 of the ∼20,000 spectra, respectively, corresponding to approximate intensity values of T_P > 0.15 K and T_P > 0.14 K.

We show the five average spectra in Figure 5. Each spectrum is independently normalized. All five spectra have the same basic shape, although the S/N maximization spectra have the smallest deviations from a single Gaussian line. In Table 3 we summarize the line height (T_P) and FWHM line width (ΔV) for the H and He RRLs, as well as their 1σ fit uncertainties, y⁺ (Equation (26)) and its 1σ uncertainty (Equation (27)), and the spectral rms σ. The derived values of y⁺ differ depending on the averaging method and the weighting scheme, and these differences are larger than the uncertainties ${\sigma }_{{y}^{+}}$ can account for. As expected, uniformly weighting all spectra results in the largest rms spectral noise; the other spectral noise values are similar.

Table 3. Analysis of GDIGS data^a

| Weighting | H: T_P (K) | H: ${\sigma }_{{T}_{P}}$ (K) | H: ΔV (km s⁻¹) | H: σ_ΔV (km s⁻¹) | He: T_P (K) | He: ${\sigma }_{{T}_{P}}$ (K) | He: ΔV (km s⁻¹) | He: σ_ΔV (km s⁻¹) | y⁺ | ${\sigma }_{{y}^{+}}$ | σ (K) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Intensity-Noise (All) | 1.00 | 0.00329 | 28.4 | 0.133 | 0.0482 | 0.000677 | 21.6 | 0.532 | 0.037 | 0.0011 | 0.0028 |
| S/N Max (Noise) | 1.00 | 0.00287 | 27.9 | 0.112 | 0.0578 | 0.000989 | 21.0 | 0.611 | 0.044 | 0.0015 | 0.0039 |
| S/N Max (Unweighted) | 1.00 | 0.00445 | 29.0 | 0.166 | 0.0571 | 0.000675 | 21.5 | 0.445 | 0.042 | 0.0011 | 0.0037 |
| Noise (All) | 1.00 | 0.00364 | 29.6 | 0.160 | 0.0438 | 0.000733 | 23.1 | 0.739 | 0.034 | 0.0013 | 0.0036 |
| Unweighted (All) | 1.00 | 0.00383 | 29.8 | 0.170 | 0.0460 | 0.00153 | 22.2 | 1.34 | 0.034 | 0.0024 | 0.0076 |

Note. ^a All intensity values are normalized such that the hydrogen line intensity has a value of 1.00.


5. Discussion and Summary

In this paper we explored methods for averaging spectra. Intensity-noise weighting leads to the highest possible S/N. For noise and uniform weighting, averaging the 35%–45% highest intensity individual spectra (assuming similar noise characteristics for each) results in the maximum S/N average spectrum, in agreement with the results of Zhang & McElvain (1999). This average spectrum created from the 35%–45% highest intensity individual spectra has ∼95% the S/N of the intensity-noise weighted average spectrum. Our results are largely independent of the intensity distribution; other peaked signal distributions should have similar results.

We apply our averaging methods to Green Bank Telescope (GBT) Diffuse Ionized Gas (GDIGS) Hnα data (Anderson et al. 2021) to determine the ionic abundance ratio, y⁺. The different averaging methods give values of y⁺ that differ by ∼25%.

Differences in the derived values of y⁺ can be explained by which locations are weighted more heavily during averaging. Intensity-noise weighting favors the spectra with the highest peak intensity. For GDIGS, the highest intensities are found toward discrete H ii regions; the highest intensity diffuse regions are found just outside of the discrete H ii regions (see Luisi et al. 2020). Noise weighting favors the spectra with the lowest noise, whereas uniform weighting weights all spectra evenly. Since H ii regions have bright radio continuum emission, which raises the system temperature and hence the spectral noise, noise weighting can favor the diffuse regions. The S/N maximization method only averages the highest S/N spectra, which means that only the brightest regions may appear in the average, regardless of their noise levels.

That the value derived for y⁺ depends on the weighting scheme employed indicates that there are differences in y⁺ in the GDIGS field studied; if y⁺ were invariant, all averaging techniques would produce the same result. This piece of evidence is not as apparent without averaging, as the He RRL signal that goes into the y⁺ computation is weak and can only be seen in a fraction of the GDIGS spectra. We caution that studies of y⁺ that include a range of intensity values (i.e., from both H ii regions and from diffuse ionized gas, as in our example) will be biased depending on the weighting scheme. In future research with the GDIGS data, we will investigate and model y⁺ over the survey area with these considerations in mind.

The S/N maximizing procedure allows for the creation of more sensitive spectra, and therefore a more accurate determination of y⁺, but the derived y⁺ values in all average GDIGS spectra are low relative to those found previously for Galactic H ii regions. For comparison, an analysis of the 80 high-quality RRL spectra toward H ii regions in Quireza et al. (2006) by Wenger et al. (2013) found y⁺ = 0.075 ± 0.024. Wenger et al. (2013) found y⁺ = 0.068 ± 0.023 in a sample of 54 high-quality RRL spectra toward Galactic H ii regions. For the H ii region W43, which is in the studied field, y⁺ = 0.068 ± 0.0052 (Bania et al. 1997, 2007). It may be that the inclusion of the diffuse ionized gas outside of H ii regions has caused the discrepancy with values derived for H ii regions; we will investigate the cause of the low y⁺ values in a subsequent paper.

Acknowledgments

We thank the anonymous referee, whose helpful comments greatly improved the clarity of this manuscript. This work is supported by NSF grant AST1516021 to L.D.A. T.V.W. is supported by a National Science Foundation Astronomy and Astrophysics Postdoctoral Fellowship under award AST2202340. T.M.B. is supported by NSF grant AST1714688. The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. The Green Bank Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.
