Regionalization of hydroclimate variables in the contiguous United States

Carbone, Gregory J.; Gao, Peng; Lu, Junyu

doi:10.1007/s00704-024-04903-z

Original Paper
Open access
Published: 08 March 2024

Theoretical and Applied Climatology

Abstract

We apply a hierarchical clustering algorithm to the Parameter-elevation Relationships on Independent Slopes Model (PRISM) database. The method employs linkage clustering while forcing spatial contiguity. We apply it to the lower-48 United States, deriving regions that are based on temperature and precipitation averages and anomalies, as well as statistical parameters underlying several drought and intense precipitation measures. Resulting regions make intuitive sense from the perspective of driving influences on temperature and precipitation averages and anomalies, and are compatible with results from another empirically derived clustering scheme. Regions selected for individual variables show high similarity across different time frames. There is slightly less similarity when comparing regions created for different monthly or daily hydroclimate variables, and relatively low similarity between monthly vs. daily measures. It is unlikely that any one regionalization solution could summarize hydroclimate extremes given the wide range of variables used to describe them, but geographically sensitive datasets like PRISM and flexible algorithms provide useful methods for regionalization that can aid in drought monitoring and forecasting, and with impacts and planning associated with heavy precipitation.

1 Introduction

Constructing climate regions has both descriptive and applied purposes. Regionalization can reveal new aspects of data, inform data stratification, or stimulate hypotheses about physical drivers influencing climatological observations (Wilks 2019). It can also provide a rational basis for managing resources influenced by climate variability. Given the threat of drought or heavy precipitation, understanding coherent regions derived from long-term datasets with high spatial resolution would improve both theoretical and applied aspects of hydroclimate extremes.

Early rule-based regionalization schemes include those by Köppen (1900), who devised a simple scheme based on monthly temperature and precipitation values to mirror existing global natural vegetation regions, and by Thornthwaite (1931, 1948), who created regions based on potential evapotranspiration for North America. More recent delineation of climate regions emphasizes more empirical or data-driven approaches (e.g., Fovell and Fovell 1993; Fovell 1997). The practical application of coherent, statistically-based climate regions has prompted several efforts to construct them at regional to global scales. For example, Wolter and Allured (2007), recognizing how such units might benefit drought monitoring and seasonal climate forecasts, established an iterative clustering scheme to distill data from the US Cooperative Network into coherent patterns of temperature and precipitation anomalies. Their rationale was that the spatial extent and pattern of past droughts should inform operational drought monitoring. They further argued that, since seasonal forecasts rely on the response of mid-latitude circulation to tropical Pacific sea-surface temperature and pressure anomalies, defining regions on the basis of historical temperature and precipitation anomalies produces logical spatial units to verify forecasts. Their effort defined functional regions as an alternative to the two dominant schemes used for long-term regional records and for seasonal climate forecasting – the National Oceanic and Atmospheric Administration’s 344 climate divisions and its 102 climate forecast divisions, respectively. The former were constructed using somewhat arbitrary criteria (Guttman and Quayle 1996); the latter, while less constrained by these criteria, overlap significantly with the climate divisions and are not based on statistical properties of the climate record (Wolter and Allured 2007). Bieniek et al. (2012), also recognizing the shortcomings of the established climate divisions, constructed empirical climate regions for Alaska that better reflected temperature and precipitation response to a suite of teleconnections.

Statistically-based climate regions provide an appropriate scale to measure climate change, to avoid shortcomings associated with estimates at individual stations (Wu et al. 2018), to identify trends (Karl et al. 1994a, b; DeGaetano 2001; Bharath and Srinivas 2015), to assess the adequacy of observing networks (DeGaetano 2001; Bharath and Srinivas 2015), or to attribute changes to specific causes (Fazel et al. 2018). Data-derived regions are appropriate for evaluating model output from numerical weather prediction models (e.g., Argüeso et al. 2011) or control runs of general circulation models (Belda et al. 2015), or for creating climate change scenarios via downscaling from coarse to more local spatial scale (Maraun et al. 2010; Winkler et al. 2011; Carbone 2014; Perdinan and Winkler 2015).

Regionalization has many hydroclimate applications, including assessing the spatial and temporal patterns and determining causes of drought. A variety of clustering algorithms have been used to identify the duration, spatial locus and extent, and intensity associated with significant droughts (Andreadis et al. 2005; Xu et al. 2015) or to define homogenous drought monitoring regions (Ali et al. 2019). Spatial clustering informs frequency analysis to uncover drought periodicity and to provide clues for a drought’s causes (Vicente-Serrano 2006; Bordi and Sutera 2007; Santos et al. 2010; Gocic and Trajkovic 2014; Zhang et al. 2017; Manzano et al. 2019). Regionalization also can help improve existing methods for drought forecasts or for deriving future scenarios (Mishra and Singh 2011). Creating rational climate regions is equally relevant in analyzing heavy precipitation and flooding, as well as in storm-water and disaster and risk management (DeGaetano 2001; Du et al. 2014; Gao et al. 2018; Irwin et al. 2017). It is the basis for understanding heavy precipitation regions and is essential to stochastic storm transposition, wherein historic storms are superimposed on different regions as plausible heavy rain events. Resampling the intensity, duration, and spatial extent of a past event provides a complementary method for rainfall and flood frequency analysis (Yin et al. 2016; Wright et al. 2020) but demands coherency in the statistical properties used to construct intensity–duration–frequency (IDF) curves and other measures of probability and risk due to heavy precipitation (Gao et al. 2018). Hence, investigators have looked at spatial coherence in precipitation intensity, including diurnal rainfall patterns (Mooney et al. 2017), annual precipitation extremes (Yang et al. 2010), and variability in mountainous regions (Sugg and Konrad 2020).

While data-driven climate regions are often constructed from point data, gridded data sources abound. Derived from observational networks, reanalysis, or other model output, these data are an important part of modern climatological analysis. Gridded datasets often integrate in-situ measurements with areally-based data using multiple sources to maximize the spatial and temporal resolution. Examples include remotely sensed data, model output, and response variables such as vegetation (Rhee et al. 2008; Bieniek et al. 2012; Zhang et al. 2017). The resulting integrated products can be used to establish functional climate regions that respond to hydroclimate extremes.

In this paper, we apply a hierarchical clustering algorithm to statistical measures of the widely used Parameter-elevation Relationships on Independent Slopes Model (PRISM) temperature and precipitation database. We construct regions based on temperature and precipitation averages and anomalies, as well as the statistical parameters underlying the Standardized Precipitation Index (SPI), the Standardized Precipitation Evapotranspiration Index (SPEI), and 1-, 2-, 3-, and 4-day annual precipitation maxima. Our analysis offers an inherently areal perspective to regionalization with a gridded dataset widely employed in research and operations. The parameters upon which measures of drought or heavy precipitation are grounded both provide a rational basis for regionalization and assist in applications that depend on spatial coherence when applying probability distributions to regional data (Hosking and Wallis 1993; Bonnin et al. 2006). By clustering on the parameters of theoretical probability distributions, we consider a range of possible probability distributions beyond those provided by limited observed records (Wilks 2019). Clustering on these parameters emphasizes anomalies like those represented by the SPI and SPEI, return intervals associated with heavy precipitation, and operational products such as seasonal climate forecasts. Our analysis has three goals:

to apply the regionalization scheme, REDCAP, to the PRISM dataset and to compare our resulting regions against other data-driven spatial analysis and established aggregation units (e.g., climate divisions, forecast regions);
to measure how clustering solutions compare across hydroclimate variables and different periods of record; and
to determine the feasibility of creating coherent regions that simultaneously capture multiple hydroclimate extremes.

While clustering algorithms have been applied routinely to climatological data, this paper offers several innovations. We apply a clustering algorithm that enforces spatial contiguity and has not been used extensively with climatological data. By applying the algorithm to a popularly-used gridded data set, we seek to identify the degree to which different extreme hydroclimate variables can produce similar regions.

2 Data and methods

We conduct all analyses using temperature and precipitation values from the PRISM (all networks) dataset (Daly et al. 2008). PRISM data help us address the challenge associated with the unit size of original datasets. Fovell and Fovell (1993) acknowledged that the climate divisions used to create their regions are more evenly sized in the eastern and southern USA, but more erratically sized in the west. We avoid this latent bias by clustering on the evenly-sized PRISM grid cells. These gridded data are interpolated from several different observation networks to a 4-km grid across the lower-48 United States, incorporating properties of location, elevation, coastal proximity, slope orientation, vertical atmospheric profiles, topographic position, and orographic effectiveness. Like other gridded datasets, PRISM provides the advantage of complete records and spatial coverage. After 2001, PRISM precipitation data incorporated radar inputs. PRISM data have been used as the basis of or to augment other regionalization studies (Abatzoglou et al. 2009; Bieniek et al. 2012; Sugg and Konrad 2020). All variables were derived from monthly PRISM data sets except for annual maximum precipitation, which was derived from the daily PRISM data set.

Our regionalization focuses on hydroclimate variables and includes independent analysis of the following:

Monthly average precipitation (12 variables, 1 for each month)
Monthly average temperature and precipitation (24 variables)
Combined temperature and precipitation anomalies (following Wolter and Allured 2007; Abatzoglou et al. 2009)
Drought indices (parameters used to calculate SPI and SPEI)
- 3-month SPI: two-parameter gamma distribution for each month (24 variables)
- 3-month SPEI: three-parameter log-logistic distribution for each month (36 variables)
Heavy precipitation: 1-, 2-, 3-, and 4-day annual maximum precipitation based on Anderson–Darling distance

We computed average monthly temperature and average total precipitation for each PRISM grid in each month for the period of record, 1895–2018. Monthly anomalies were calculated by subtracting the period-of-record average from each monthly observation. Our analysis for all monthly products includes three different time frames: 1895–2018, 1957–2018, and 1981–2018; in addition, we examined other time frames for specific comparison with previous studies. We do this not only because the parameters used to calculate SPI and SPEI are sensitive to record length (Wu et al. 2005; Vergni et al. 2017; Carbone et al. 2018), but also to consider changes that may be due to climate variability and change, or to changes in the number and distribution of stations used to produce the PRISM grids. While we cannot distinguish between these two factors, we can document how different time frames affect the coherency of regions inherent in this widely used dataset.

We computed SPI values following the method of McKee et al. (1993). Three-month running precipitation totals were calculated for each grid cell and month during the period of record. Probability distribution functions (PDFs) were fit independently for each month to derive gamma shape (alpha) and scale (beta) parameters using maximum likelihood estimation (Wilks 2019). We used the three-parameter log-logistic function to fit the probability distribution of the difference between precipitation and potential evapotranspiration (Vicente-Serrano et al. 2010). We calculated SPEI using the R package SPEIcalc (URL: http://sac.csic.es/spei) developed by Vicente-Serrano et al. (2010). To regionalize the characteristics of heavy precipitation, we summed total rainfall for consecutive 1-, 2-, 3-, and 4-day periods for all grid cells and each year from January 1, 1981, through December 31, 2018.
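As an illustration of the SPI parameter step, the following minimal sketch derives gamma shape and scale values for one grid cell from three-month running totals, one pair per calendar month (24 values per cell). It is not the authors' code. It swaps the iterative maximum likelihood fit for Thom's closed-form approximation to the maximum likelihood solution, drops zero totals, and assumes the input series starts in January. All function names are hypothetical.

```python
import math


def running_totals(monthly, window=3):
    """Sum each run of `window` consecutive months (series is oldest first)."""
    return [sum(monthly[i - window + 1:i + 1])
            for i in range(window - 1, len(monthly))]


def gamma_params_thom(totals):
    """Approximate gamma shape (alpha) and scale (beta) for positive totals.

    Uses Thom's closed-form approximation to the maximum likelihood
    solution; zero totals are dropped here (an assumption of this sketch).
    """
    positive = [x for x in totals if x > 0]
    n = len(positive)
    mean = sum(positive) / n
    a = math.log(mean) - sum(math.log(x) for x in positive) / n
    if a <= 0:
        raise ValueError("totals must vary for the shape to be estimable")
    alpha = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    beta = mean / alpha
    return alpha, beta


def spi_gamma_params_by_month(monthly, window=3):
    """Fit one (alpha, beta) pair per calendar month for a single grid cell.

    Assumes `monthly` starts in January; key m is the calendar month
    (0 = January) in which each running total ends.
    """
    by_month = {m: [] for m in range(12)}
    for offset, total in enumerate(running_totals(monthly, window)):
        end_index = offset + window - 1  # position of the window's last month
        by_month[end_index % 12].append(total)
    return {m: gamma_params_thom(v) for m, v in by_month.items() if v}
```

Because the scale is computed as the sample mean divided by the shape, the product of the two fitted parameters reproduces the mean of the positive totals, which gives a quick sanity check on any fit.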

We adopted a hierarchical clustering method, Regionalization with Dynamically Constrained Agglomerative Clustering and Partitioning (REDCAP; Guo 2008), to identify clusters based on underlying parameters for the probability distributions used to calculate SPI (gamma), SPEI (Pearson-Type III), and 1- to 4-day precipitation (gamma). REDCAP is a widely used clustering algorithm with previous hydroclimate applications (Gao et al. 2018; Yang et al. 2020). REDCAP offers a flexible algorithm that overcomes limitations inherent in conventional clustering methods that do not consider spatial information, one of the key factors that shape climate regions. It uses the average linkage clustering method but directly enforces spatial contiguity during the clustering procedure, forming spatially contiguous regions within which various analyses – such as climate model evaluation and resampling of rare or extreme events – could be conducted (Gao et al. 2015, 2018). By accommodating different variables derived from raw temperature and precipitation and various distance measurements (Table 1), REDCAP allows us to investigate multiple facets of hydroclimate and distill information from raw data.

Table 1 Hydroclimate variables and dissimilarity used in regionalization

REDCAP identifies groups of grid cells (i.e., clusters) from PRISM data that are spatially contiguous and have similar statistical properties defined by each variable (details below). The algorithm initially treats each grid cell as an individual cluster. Then, it iteratively merges pairs of clusters that are spatially contiguous and have the highest similarity (or the lowest dissimilarity/distance). This follows the same clustering procedure as conventional clustering methods (e.g., average linkage clustering) except that REDCAP requires that pairs of clusters merged in each iteration must be spatially contiguous. When all grid cells are merged, a spatially contiguous dendrogram is constructed. REDCAP then partitions the spatially contiguous dendrogram into the desired number of subtrees assigned by a user, each forming a cluster of spatially contiguous grid cells. REDCAP can incorporate different spatial coherence measures by considering the distance and the dissimilarity between pairs of grid cells. We consider dissimilarity between each pair of grid cells using three distinct distance measures (Table 1). The first is Euclidean distance, applied to situations where each grid cell has multiple climate variables, i, associated with that cell. The distance (D) between a pair of grid cells m and n, each of which has k climate variables, is defined by Eq. 1.

$$D=\sqrt{\sum_{i=1}^{k}{\left({m}_{i}-{n}_{i}\right)}^{2}}$$

(1)

Euclidean distance is widely used in clustering algorithms.
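The merging procedure described above can be sketched on a small rectangular grid. This is a simplified illustration and not the REDCAP implementation. It uses rook (edge-sharing) adjacency as the contiguity rule and the Euclidean distance of Eq. 1, and it stops merging once the requested number of clusters remains, rather than building the full dendrogram and partitioning it afterwards. All names are hypothetical, and the brute-force search is only practical for toy grids.

```python
import math
from itertools import product


def euclidean(u, v):
    """Eq. 1: distance between two cells with k climate variables each."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rook_neighbors(cell, shape):
    """Yield the cells that share an edge with `cell` inside the grid."""
    r, c = cell
    rows, cols = shape
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield (r + dr, c + dc)


def contiguous_average_linkage(grid, n_regions):
    """grid: 2D list of feature vectors. Returns a list of sets of (row, col)."""
    shape = (len(grid), len(grid[0]))
    # Start with every grid cell as its own cluster.
    clusters = [{cell} for cell in product(range(shape[0]), range(shape[1]))]

    def adjacent(a, b):
        return any(n in b for cell in a for n in rook_neighbors(cell, shape))

    def avg_dist(a, b):
        total = sum(euclidean(grid[p[0]][p[1]], grid[q[0]][q[1]])
                    for p in a for q in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_regions:
        best = None
        # Only spatially contiguous pairs are candidates for merging.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if adjacent(clusters[i], clusters[j]):
                    d = avg_dist(clusters[i], clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

For example, a 2 × 4 grid whose left half holds values near 0 and whose right half holds values near 10 separates into those two contiguous blocks when two regions are requested.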

The second distance measure follows that used by Wolter and Allured (2007) to compare our results with their statistically-derived climate divisions. Average temperatures and precipitation totals for every three-month season were computed for each grid cell for the entire study period. The anomaly at each grid cell was then calculated by subtracting the average for the same season. Finally, the dissimilarity is determined as one minus the correlation of pairwise anomalies.
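A minimal sketch of this correlation-based dissimilarity for a single variable follows. The text does not specify how temperature and precipitation anomalies are combined, so the sketch handles one seasonal series per grid cell. The `period` argument (the number of seasons per year) and all function names are assumptions made for illustration.

```python
import math


def seasonal_anomalies(values, period):
    """Subtract each season's long-term mean from that season's values.

    `values` is chronological; entries i, i + period, i + 2 * period, ...
    are taken to belong to the same season.
    """
    means = []
    for s in range(period):
        same_season = values[s::period]
        means.append(sum(same_season) / len(same_season))
    return [v - means[i % period] for i, v in enumerate(values)]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_dissimilarity(cell_a, cell_b, period):
    """One minus the correlation of the two cells' seasonal anomalies."""
    return 1.0 - pearson(seasonal_anomalies(cell_a, period),
                         seasonal_anomalies(cell_b, period))
```

Under this definition, two cells whose anomalies move together have a dissimilarity near 0, and cells whose anomalies are exactly opposed have a dissimilarity of 2, regardless of differences in their seasonal means.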

The third distance measure is the Anderson–Darling (AD) distance, which measures the similarity of distributions of annual precipitation maxima. The AD distance for two series of n-year annual maxima at grid cells a and b is defined by Eq. 2.

$$AD\;(a, b)=\frac{n}{2}{\int }_{-\infty }^{\infty }\frac{{({F}_{a}\left(x\right)-{F}_{b}(x))}^{2}}{{F}_{ab}\left(x\right)(1-{F}_{ab}(x))}d{F}_{ab}(x)$$

(2)

where F_a(x), F_b(x), and F_ab(x) are the sample distribution functions of a, b, and a combined sample of a and b, respectively. This measure suits the special nature of statistical properties associated with extreme values, such as annual precipitation maxima.
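One way to evaluate Eq. 2 with sample distribution functions is to replace the integral with a sum over the pooled observations, each carrying probability mass 1/(2n). The sketch below takes that approach. It is an illustrative discretization under those assumptions, not necessarily the computation used in the study, and the function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Sample distribution function: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ad_distance(a, b):
    """Discrete form of Eq. 2 for two equal-length series of annual maxima."""
    if len(a) != len(b):
        raise ValueError("both series must cover the same n years")
    n = len(a)
    sorted_a, sorted_b = sorted(a), sorted(b)
    pooled = sorted(list(a) + list(b))
    total = 0.0
    for x in pooled:
        f_ab = ecdf(pooled, x)
        if f_ab >= 1.0:
            # At the pooled maximum F_a = F_b = 1, so the term contributes 0.
            continue
        diff = ecdf(sorted_a, x) - ecdf(sorted_b, x)
        total += diff * diff / (f_ab * (1.0 - f_ab))
    # dF_ab places mass 1 / (2n) on each pooled observation.
    return (n / 2.0) * total / len(pooled)
```

Identical series give a distance of 0, and the value grows as the two sample distributions separate. The weighting by F_ab(1 − F_ab) gives more emphasis to differences in the tails than an unweighted squared difference would.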

REDCAP produces any user-defined number of regions. We determined the number of clusters for each variable using two approaches. First, we selected the same number of regions used operationally or in prior studies. For example, we used 344 regions, the number of USA climate divisions used operationally and for which a 125-year historical record exists for many climate variables, including several important for hydroclimatology. We used 102 regions, the number of seasonal climate forecast regions used by NOAA’s Climate Prediction Center. We also selected 139 regions, the number established by Wolter and Allured (2007) when clustering three-month temperature and precipitation anomalies from 1978–2006 National Oceanic and Atmospheric Administration Cooperative Observer Program (NOAA COOP) data.

Second, we used the “L method” to determine a statistically-optimal number of regions by analyzing within-region heterogeneity at each hierarchical level (Salvador and Chan 2005). Within-region heterogeneity is defined as the sum of squared deviations in each region plotted against the number of clusters. The L method identifies points of maximum curvature in the plot, assuming that an appropriate number of clusters often coincides with rapid changes in within-region heterogeneity as clusters are merged. Visually, the L method locates the ‘elbow’ in the plot. Mathematically, the L method divides the plot into two parts, those with a lower or higher number of clusters (L_c and R_c, respectively) for each possible number of clusters (c). Separate lines were fitted for L_c and R_c, and total RMSE (Root Mean Squared Error) at c was defined as:

$${RMSE}_{c}= \frac{c-1}{b-1}RMSE\left({L}_{c}\right)+ \frac{b-c}{b-1}RMSE({R}_{c})$$

(3)

where RMSE(L_c) and RMSE(R_c) are RMSEs of the lines in L_c and R_c respectively, and b is the largest possible number of clusters. The statistically optimal number of regions corresponds to the value of c that minimizes RMSE_c. When b is very large, the optimal number results in fine-grain clusters with little practical meaning. To solve this problem, the L method iteratively updates the plot by assigning b to be twice the optimal number of clusters in the last iteration. It keeps searching for the optimal number of regions in the updated plot, ultimately yielding the optimal number of regions at each iteration.
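A single pass of the L method, as expressed in Eq. 3, can be sketched as follows. The sketch assumes the heterogeneity curve is supplied for 2 through b clusters, fits each line by ordinary least squares, and omits the iterative truncation of b described above. Function names are hypothetical.

```python
import math


def line_rmse(xs, ys):
    """RMSE of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sq_err = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq_err / n)


def l_method(heterogeneity):
    """Return the cluster count c that minimizes RMSE_c in Eq. 3.

    heterogeneity[i] is the within-region heterogeneity for i + 2 clusters,
    so the plot covers x = 2 .. b with b = len(heterogeneity) + 1.
    """
    b = len(heterogeneity) + 1
    xs = list(range(2, b + 1))
    best_c, best_rmse = None, math.inf
    # Each side needs at least two points: L_c = 2..c and R_c = c+1..b.
    for c in range(3, b - 1):
        k = c - 1  # number of points in L_c
        left = line_rmse(xs[:k], heterogeneity[:k])
        right = line_rmse(xs[k:], heterogeneity[k:])
        total = (c - 1) / (b - 1) * left + (b - c) / (b - 1) * right
        if total < best_rmse:
            best_c, best_rmse = c, total
    return best_c
```

The weights (c − 1)/(b − 1) and (b − c)/(b − 1) are the fractions of plotted points on each side of the split, so a poor fit on a short segment cannot dominate the total error.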

Regionalization of different climatic variables yielded different sets of regions. We determined spatial coherence of these regions using Strehl and Ghosh’s (2002) index which measures the mutual spatial information shared by two random variables. Among a variety of cluster similarity indices, the Strehl and Ghosh index has several advantages. It is independent of the number of clusters or the number of elements, and it does not rely on distribution assumptions for hypothesis testing. The index is based on mutual information between two sets of clusters C and C’ (I(C, C’)) and is defined as:

$$I(C, C^{\prime})=\sum_{i=1}^{k}\sum_{j=1}^{l}P\left(i,j\right)\log_{2}\frac{P(i,j)}{P(i)P(j)}$$

(4)

where P(i, j) is the probability that an element belongs to cluster C_i in C and to cluster $C^{\prime}_{j}$ in C’ (Eq. 5). The Strehl and Ghosh index is the mutual information between clusters C and C’ standardized by the geometric mean of the entropy of cluster C (H(C)) and C’ (H(C’)) defined by Eq. 6, where P(i) is the probability that an element is in cluster C_i $\in$ C which has n elements (Eq. 7).

$$P (i,j)=\frac{\left|{C}_{i}\cap {C}_{j}^{\prime}\right|}{n}$$

(5)

$$H\left(C\right)=-\sum_{i=1}^{k}P(i)\log_{2}P(i)$$

(6)

$$P\left(i\right)= \frac{\left|{C}_{i}\right|}{n}$$

(7)

We apply the Strehl and Ghosh index for two purposes: 1) to compare spatial similarity of clusters derived for two different variables, and 2) to compare spatial similarity of clusters derived from the same variable across different periods of record.
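Eqs. 4–7 translate directly into a short computation on two label vectors, one cluster label per grid cell for each regionalization. The sketch below is illustrative only. The function names are hypothetical, and the index is undefined (division by zero) when either regionalization consists of a single cluster.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq. 6, with P(i) = |C_i| / n from Eq. 7."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Eq. 4, with P(i, j) = |C_i intersect C'_j| / n from Eq. 5."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(
        (n_ij / n) * math.log2((n_ij / n) / ((count_a[i] / n) * (count_b[j] / n)))
        for (i, j), n_ij in joint.items()
    )


def strehl_ghosh_index(labels_a, labels_b):
    """Mutual information scaled by the geometric mean of the two entropies."""
    return mutual_information(labels_a, labels_b) / math.sqrt(
        entropy(labels_a) * entropy(labels_b)
    )
```

The index depends only on how grid cells are grouped, not on the label values, so two maps with the same regions under different numbering score 1. Two regionalizations that share no information score 0.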

3 Results

3.1 Monthly temperature and precipitation averages

Two distinct “elbows” define statistically optimal solutions for 64 and 19 clusters using monthly temperature and precipitation averages (Table 2). The 64-region solution shows familiar patterns in the eastern two-thirds of the contiguous USA (Fig. 1): a north–south gradient from the Gulf of Mexico to the Great Lakes driven by insolation and temperature, an east–west gradient in the central USA/Great Plains driven by precipitation, and SW-NE oriented clusters reflecting the Appalachian Mountains or proximity to the Atlantic. In the West, the clusters are considerably smaller and highlight the influences of elevation and coastal proximity.

Table 2 “Elbow” breakpoints for each variable

To compare REDCAP clusters with a prior regionalization for monthly temperature and precipitation averages, we use a 14-cluster solution (close to a statistically optimal breakpoint). The 14-cluster REDCAP solution (Fig. 2a) shows general similarities to the well-known Fovell and Fovell (1993) clusters (Fig. 2b). In particular, large latitudinal bands characterize regions in the eastern USA. This north–south temperature gradient, combined with elevation effects and the east–west precipitation gradient, dominates the regional patterns in the western USA. The most striking differences produced by REDCAP are associated with a more refined precipitation gradient across the central Plains and more detailed regions in the western USA reflecting topographic influences on temperature and precipitation. These differences stem from the use of PRISM data, whose gridding algorithm incorporates elevation and slope. By contrast, the climatological division data used by Fovell and Fovell (1993) are aggregated at the coarser-scale climate divisions and without explicit consideration for elevation. Moreover, the climatological division data in mountainous areas are disproportionately constructed from stations at lower elevations.

Also noteworthy is that when forced to 14 clusters, REDCAP produces different clusters for different periods of the PRISM record, e.g., 1931–1981 vs. 1981–2018. While similar general patterns emerge from the two periods, there are distinct differences in the central Plains and western USA (Fig. 2c). From this analysis, it is impossible to distinguish the degree to which such differences represent changes in temperature and precipitation after 1981 as opposed to different stations used to construct the PRISM dataset. One clear difference regarding the latter period is that it overlaps considerably with the period from which PRISM incorporates Snow Telemetry (SnoTel) data in the mountainous west and radar-based precipitation estimates across the USA.

Closer inspection across different periods of record reveals that similarity varies with the number of clusters selected and the record length and/or overlap between periods. Strehl and Ghosh index values, quantifying the mutual information shared by clusters, show that, with forty or more clusters, spatial similarity is consistently high (approximately 0.8) irrespective of record length or specific time periods (Fig. 3). This suggests that, for much of the country, the use of statistical properties from coherent regions formed on the basis of monthly temperature and precipitation averages is relatively insensitive to the period of record used to construct the regions when the total number of regions is sufficiently large. With fewer clusters, mutual shared information decreases rapidly. For example, a ten-cluster solution derived from 1981–2018 average temperature and precipitation data has a relatively low similarity index (approximately 0.65) to ten-cluster solutions based on 1895–1956, 1895–2018, 1957–2018, or 1931–1981 data. By contrast, increasing period length and/or overlap—1895–2018 vs. 1957–2018—results in similarity index values above 0.8 with as few as five clusters.

Since we report the Strehl and Ghosh similarity index values throughout this paper, it is helpful to calibrate these values to a series of maps. The similarity index between the two time periods used in the 14-cluster solution above (Fig. 2a vs. 2c), for example, is 0.69. At 60 clusters, the similarity for the same two time periods (1981–2018 and 1931–1981) is 0.81 (Fig. 4). At 60 clusters, smaller regions in the West capture how dramatic topographic changes influence climatological variables.

REDCAP’s solution for 102 clusters based on monthly temperature and precipitation normals produces regions quite different from NOAA’s standard forecast regions (Fig. 5). REDCAP regions vary more in size and are more elongated and irregular. The REDCAP clusters reveal familiar climate controls. In the eastern USA, the elongated clusters amplify the role of latitude and proximity to the Gulf of Mexico or Atlantic coasts on temperature and precipitation anomalies. But the most dramatic differences between REDCAP’s clusters and NOAA’s forecast regions are in the West, where properties of elevation and slope inherent in the PRISM dataset matter more. These results are not surprising as the current forecast regions – a result of National Weather Service reorganization in the 1990s – retain remnants of state borders with only limited exceptions for physical features. Many forecast region boundaries are determined by either climate divisions or political boundaries rather than by data-driven regions.

3.2 Monthly temperature and precipitation anomalies

Forcing REDCAP to produce 139 clusters to compare against the 139 clusters created by Wolter and Allured (2007) resulted in remarkably similar regional patterns (Fig. 6). While both schemes use the same measure of seasonal temperature and precipitation anomalies and the same time periods, they differ in constraints on spatial contiguity (REDCAP demands spatial contiguity; Wolter and Allured do not) and operate on different datasets (PRISM grids vs. NWS Coop stations). Yet, both schemes produce clusters that reflect elevation, orientation, and proximity to the coast. In the eastern USA, both schemes produce larger regions than the well-established climatic divisions. In the western USA, the derived clusters are often smaller than the standard climate divisions used in mountainous states. When considering more than twenty-five clusters, REDCAP produces similar regions for seasonal temperature and precipitation anomalies (Strehl and Ghosh similarity index of approximately 0.8), irrespective of the period of record. For 139 clusters, the similarity index is approximately 0.83 whether comparing clusters derived from 1895–1956, 1895–2018, 1957–2018, or 1981–2018 data.

3.3 Drought indices

The L method produces breakpoints for monthly temperature-precipitation anomalies, SPEI, and SPI clusters at 74, 56, and 50 respectively (Table 2).

Here, we show that for a mid-range value of 60 clusters, temperature-precipitation anomaly regions are remarkably uniform in size across the USA (Fig. 7a). Clusters based on statistical parameters used to construct SPI or SPEI values vary more in size and more clearly highlight elevation and slope characteristics captured by the PRISM dataset (Fig. 7b and 7c). Yet, the statistical similarity for all possible pairs of these three variables is high – the Strehl and Ghosh similarity index is greater than 0.75 for 60 or more clusters.

In addition, clustering results are robust across a range of time frames for each of these variables. With more than thirty clusters, the Strehl and Ghosh similarity index is greater than 0.75 when comparing clusters derived from 1895–1956, 1895–2018, 1957–2018, or 1981–2018 data for temperature-precipitation anomalies, as well as for SPI and SPEI parameters.

The Strehl and Ghosh similarity index for SPEI vs. monthly temperature and precipitation averages ranges from 0.7 to 0.78 when creating more than 60 regions. The similarity index for SPI vs. monthly temperature and precipitation averages and for temperature and precipitation anomaly vs. monthly temperature and precipitation averages is in the same approximate range, 0.7–0.76, for more than 60 regions. Both are lower than the similarity between the drought indices – SPI vs. SPEI – or the similarity of a single anomaly or drought measure across different time frames.

3.4 Heavy precipitation

We examine clusters forming 11 and 30 heavy precipitation regions based on average breakpoint elbows found in 1- to 4-day annual maxima precipitation totals (Table 2). The resulting regional patterns reflect three general influences: proximity to the Gulf, Atlantic, or Pacific, the east–west moisture gradient in the central USA, and orographic effects (Figs. 8 and 9).

The regions show a moderate degree of similarity across different durations (1- to 4-days) when eleven or more clusters are selected – Strehl and Ghosh similarity index values range from approximately 0.70 to 0.78 from 11 to 30 regions. Strehl and Ghosh similarity index values are also in this range for a single precipitation variable (i.e., 1- to 4-day annual maximum) when compared across different record lengths (e.g., 1981–1999, 2000–2018, 1981–2018). By contrast, similarity index values between any of the daily precipitation intensity measures and any of the monthly variables are considerably lower, ranging from 0.5 to 0.58. Here, we illustrate that similarity values between monthly mean precipitation and SPI are considerably higher than those of either variable compared with 1- to 4-day annual maximum precipitation (Fig. 10).

The clusters created by such analysis could form the basis for regional frequency analysis whereby precipitation data are pooled to calculate intensity–duration–frequency curves (Hosking and Wallis 1993; Irwin et al. 2017; Yang et al. 2010; Burn 2014). Whether such pooling is done independently for each duration period (1- to n-days) depends on the level of similarity required. Clustering of heavy precipitation statistics also reduces interannual variability associated with station measurements (Kunkel et al. 2020).

4 Discussion and summary

Our method creates empirically derived regions by exploiting the underlying statistical structure of important hydroclimate variables. By using PRISM, we define regions with a dataset that is widely employed in research and operations and that incorporates the role of topography, proximity to the coast, and other geographic features. These organic regions are often similar to those derived with other clustering algorithms and often quite different from many commonly used spatial units (e.g., climate divisions, forecast regions). Of course, such units will remain a well-established standard and have practical value in many circumstances. State, county, or other political boundaries are important for management and policy decisions. However, it is important to recognize the drawbacks of using these units to select individual stations that represent a region, to serve as a coherent aggregated spatial unit, or to downscale or aggregate data. While the boundaries of data-driven regions do not stop at political borders, this should not preclude their use. Resource managers concerned with a political unit can use them within their jurisdictions, without concern for how large, small, or variable in size the regions are.

Many clustering algorithms draw on raw data or an underlying statistical structure. While we do not provide a comprehensive evaluation of different clustering schemes here, our method shows high similarity with that of Wolter and Allured (2007), who use station data and employ a different algorithm. This suggests that a number of different data-based clustering solutions can offer a sound approach for aggregation, one that has practical applications for monitoring temperature and precipitation anomalies, and for producing seasonal temperature or precipitation forecasts or assessing their accuracy. It is also worth noting that, unlike traditional methods (e.g., average linkage and Ward’s method used by Wolter and Allured (2007)), our method enforces spatial contiguity, which sacrifices a certain degree of homogeneity within units.
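The trade-off between contiguity and homogeneity can be seen in a minimal sketch of agglomerative clustering in which only spatially adjacent clusters may merge. This is not REDCAP (Guo 2008) itself, which uses more elaborate contiguity rules and a subsequent partitioning step; it is a simplified average-linkage variant on a single scalar attribute, and the cell identifiers, values, and neighbor lists are hypothetical.

```python
def average_linkage(cluster_a, cluster_b, values):
    """Mean absolute difference over all cross-cluster pairs of cells."""
    total = sum(abs(values[i] - values[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def contiguity_constrained_clustering(values, neighbors, n_regions):
    """Merge the most similar pair of adjacent clusters until n_regions remain.

    values:    dict mapping cell id -> attribute value
    neighbors: dict mapping cell id -> set of adjacent cell ids (symmetric)
    Cell ids must be mutually orderable (e.g., ints or (row, col) tuples).
    """
    clusters = {cell: {cell} for cell in values}
    adjacency = {cell: set(neighbors[cell]) for cell in values}
    while len(clusters) > n_regions:
        best = None
        for a in clusters:
            for b in adjacency[a]:
                if a < b:  # evaluate each adjacent pair once
                    d = average_linkage(clusters[a], clusters[b], values)
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:
            break  # remaining clusters are not adjacent to one another
        _, a, b = best
        clusters[a] |= clusters.pop(b)
        # Everything that touched b now touches the merged cluster a.
        for c in adjacency.pop(b):
            adjacency[c].discard(b)
            if c != a:
                adjacency[c].add(a)
                adjacency[a].add(c)
    return list(clusters.values())


# Hypothetical transect of six cells: wet-dry-wet pattern along a line.
cell_values = {0: 1.0, 1: 1.1, 2: 5.0, 3: 5.2, 4: 1.05, 5: 1.0}
cell_neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
regions = contiguity_constrained_clustering(cell_values, cell_neighbors, 3)
print(sorted(sorted(r) for r in regions))
```

In this toy transect, cells 0, 1, 4, and 5 have nearly identical values, so an unconstrained algorithm asked for two groups would place them together; the contiguity constraint keeps them in separate regions because cells 2 and 3 lie between them.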

Our analysis also sought to assess the similarity of regions based on different periods of record and different hydroclimate variables. The monthly hydroclimate variables considered here show high similarity across different time periods when the total number of regions is sufficiently large, typically more than 25 clusters across the conterminous United States. Similarity indices for monthly temperature and precipitation means and for temperature and precipitation anomalies exceeded 0.8 when comparing first and second halves of the record; SPI and SPEI were only slightly lower (~ 0.75). Even heavy precipitation had relatively high similarity (~ 0.7) across different time periods, despite analysis using shorter record lengths. Similarity values across different hydroclimate variables were typically lower than those for a single variable across different time periods. Similarity values for combinations of monthly average temperature, temperature-precipitation anomalies, SPI, and SPEI, and for combinations of 1- to 4-day annual maximum precipitation values, ranged from 0.7 to 0.78. Similarity between any single monthly variable and any single daily precipitation variable was considerably lower (0.5–0.58). This is not surprising given the short-term nature of intense precipitation events and the differences between the drivers of these events and the controls on monthly temperature and precipitation anomalies.

Certainly, no single regionalization accommodates all of the variables commonly used to define hydroclimate extremes like drought or heavy precipitation, making the creation of consistent “hydroclimate extreme” regions challenging. Our findings suggest that no one solution suits our particular selection of hydroclimate extremes. Nor would one solution satisfy the diverse range of research or operational needs for which climate regions might be appropriate (Abatzoglou et al. 2009). Since the definition of clusters depends on the variable of interest and the specific application, flexible algorithms that allow user-defined parameters (e.g., REDCAP) offer a practical tool for decision makers.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Code availability

Software and code used in the study are available from the corresponding author upon reasonable request.

References

Abatzoglou JT, Redmond KT, Edwards LM (2009) Classification of regional climate variability in the State of California. J Appl Meteorol Climatol 48:1527–1541. https://doi.org/10.1175/2009Jamc2062.1
Ali Z, Hussain I, Faisal M, Shoukry AM, Gani S, Ahmad I (2019) A framework to identify homogeneous drought characterization regions. Theoret Appl Climatol 137:3161–3172. https://doi.org/10.1007/s00704-019-02797-w
Andreadis KM, Clark EA, Wood AW, Hamlet AF, Lettenmaier DP (2005) Twentieth-century drought in the Conterminous United States. J Hydrometeorol 6(6):985–1001. https://doi.org/10.1175/JHM450.1
Argüeso D, Hidalgo-Muñoz JM, Gámiz Fortis SR, Esteban-Parra MJ, Dudhia J, Castro-Díez Y (2011) Evaluation of WRF parameterizations for climate studies over southern Spain using a multistep regionalization. J Clim 24:5633–5651. https://doi.org/10.1175/JCLI-D-11-00073.1
Belda M, Holtanová E, Halenka T, Kalvová J, Hlávka Z (2015) Evaluation of CMIP5 present climate simulations using the Köppen-Trewartha climate classification. Climate Res 64:201–212. https://doi.org/10.3354/cr01316
Bharath R, Srinivas VV (2015) Delineation of homogeneous hydrometeorological regions using wavelet-based global fuzzy cluster analysis. Int J Climatol 31(15):4707–4727. https://doi.org/10.1002/joc.4318
Bieniek PA, Bhatt US, Thoman RL, Angeloff H, Pertain J, Papineau J, Fritsch F, Holloway E, Walsh JE, Daly C, Shulski M, Hufford G, Hill DF, Calos S, Gens R (2012) Climate divisions for Alaska based on objective methods. J Appl Meteorol Climatol 51(7):1276–1289. https://doi.org/10.1175/JAMC-D-11-0168.1
Bonnin GM, Martin D, Lin B, Parzybok T, Yekta M, Riley D (2006) Precipitation-frequency Atlas of the United States, NOAA Atlas 14, Volume 2, Version 3.0, NOAA, National Weather Service, Silver Spring, Maryland
Bordi I, Sutera A (2007) Drought monitoring and forecasting at large scale. In: Rossi G, Vega T, Bonaccorso B (eds) Methods and tools for drought analysis management. Springer, Dordrecht, pp 3–27
Burn DH (2014) A framework for regional estimation of intensity–duration–frequency (IDF) curves. Hydrol Process 28:4209–4218. https://doi.org/10.1002/hyp.10231
Carbone GJ (2014) Managing climate change scenarios for societal impact studies. Phys Geogr 35(1):22–49. https://doi.org/10.1080/02723646.2013.869714
Carbone GJ, Lu J, Brunetti M (2018) Estimating uncertainty associated with the standard precipitation index. Int J Climatol 38(S1):e607–e616. https://doi.org/10.1002/joc.5393
Daly C, Halbleib M, Smith JI, Gibson WP, Doggett MK, Taylor GH, Curtis J, Pasteris PP (2008) Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. Int J Climatol 28:2031–2048. https://doi.org/10.1002/joc.1688
DeGaetano AT (2001) Spatial grouping of United States climate stations using a hybrid clustering approach. Int J Climatol 21:791–807. https://doi.org/10.1002/joc.645
Du H, Xia J, Zeng S (2014) Regional frequency analysis of extreme precipitation and its spatio-temporal characteristics in the Huai River Basin, China. Nat Hazards 70:195–215. https://doi.org/10.1007/s11069-013-0808-6
Fazel N, Berndtsson R, Uvo CB, Madani K, Kløve B (2018) Regionalization of precipitation characteristics in Iran’s Lake Urmia basin. Theoret Appl Climatol 132(1–2):363–373. https://doi.org/10.1007/s00704-017-2090-0
Fovell RG (1997) Consensus clustering of US temperature and precipitation data. J Clim 10(6):1405–1427. https://doi.org/10.1175/1520-0442(1997)010%3c1405:CCOUST%3e2.0.CO;2
Fovell RG, Fovell M-YC (1993) Climate zones of the conterminous United States defined using cluster analysis. J Clim 6:2103–2135. https://doi.org/10.1175/1520-0442(1993)006%3c2103:CZOTCU%3e2.0.CO;2
Gao P, Carbone GJ, Guo D (2015) Assessment of NARCCAP model in simulating rainfall extremes using a spatially constrained regionalization method. Int J Climatol 36(5):2368–2378. https://doi.org/10.1002/joc.4500
Gao P, Carbone GJ, Lu J, Guo D (2018) An area-based approach for estimating extreme precipitation probability. Geogr Anal 50(3):314–333. https://doi.org/10.1111/gean.12148
Gogic M, Trajkovic S (2014) Spatiotemporal characteristics of drought in Serbia. J Hydrol 510:110–123. https://doi.org/10.1016/j.jhydrol.2013.12.030
Guo D (2008) Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP). Int J Geogr Inf Sci 22(7):801–823. https://doi.org/10.1080/13658810701674970
Guttman NB, Quayle RG (1996) A historical perspective of U.S. climate divisions. Bull Am Meteorol Soc 77(2):293–304. https://doi.org/10.1175/1520-0477(1996)077<0293:AHPOUC>2.0.CO;2
Hosking JRM, Wallis JR (1993) Some statistics useful in regional frequency analysis. Water Resour Res 29(2):271–281. https://doi.org/10.1029/92WR01980
Irwin S, Srivastav RK, Simonovic SP, Burn DH (2017) Delineation of precipitation regions using location and atmospheric variables in two Canadian climate regions: the role of attribute selection. Hydrol Sci J 62(2):191–204. https://doi.org/10.1080/02626667.2016.1183776
Karl TR, Easterling DR, Knight RW, Hughes PY (1994a) US, national and regional temperature anomalies. In: Boden TA, Kaiser DP, Sepanski RJ, Stoss FW (eds) Trends ’93: A Compendium of Data on Global Climate Change. ORNL/CDIAC-65. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory: Oak Ridge, TN, 686–736
Karl TR, Easterling DR, Groisman PY (1994b) United States historical climatology network – national and regional estimates of monthly and annual precipitation. In: Boden TA, Kaiser DP, Sepanski RJ, Stoss FW (eds) Trends ’93: A Compendium of Data on Global Climate Change. ORNL/CDIAC-65. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory: Oak Ridge, TN, 830–905
Köppen W (1900) Versuch einer Klassifikation der Klimate, Vorzugsweise nach ihren Beziehungen zur Pflanzenwelt [Attempted climate classification in relation to plant distributions]. Geogr Z 6(593–611):657–679
Kunkel KE, Karl TR, Squires MF, Yin X, Stegall ST, Easterling DR (2020) Precipitation extremes: Trends and relationships with average precipitation and precipitable water in the contiguous United States. J Appl Meteorol Climatol 59(1):125–142. https://doi.org/10.1175/JAMC-D-19-0185.1
Manzano A, Clemente MA, Morata A, Luna MY, Begueria S, Vicente-Serrano SM, Martin ML (2019) Analysis of the atmospheric circulation pattern effects over SPEI drought index in Spain. Atmos Res 230:UNSP 104630. https://doi.org/10.1016/j.atmosres.2019.104630
Maraun D, Wetterhall F, Ireson AM, Chandler RE, Kendon EJ, Widmann M, Brienen S, Rust HW, Sauter T et al (2010) Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Rev Geophys 48(3):RG3003. https://doi.org/10.1029/2009RG000314
McKee TB, Doesken NJ, Kleist J (1993) The relationship of drought frequency and duration to time scales. Eighth Conference on Applied Climatology, 17–22 January 1993, Anaheim, California. https://climate.colostate.edu/pdfs/relationshipofdroughtfrequency.pdf
Mishra AK, Singh VP (2011) Drought modeling – A review. J Hydrol 403(1–2):157–175. https://doi.org/10.1016/j.jhydrol.2011.03.049
Mooney PA, Broderick C, Bruyere CL, Mulligan FJ, Prein AF (2017) Clustering of observed diurnal cycles of precipitation over the United States for evaluation of a WRF Multiphysics regional climate ensemble. J Clim 30(22):9267–9286. https://doi.org/10.1175/JCLI-D-16-0851.1
Perdinan, Winkler JA (2015) Selection of climate information for regional climate change assessments using regionalization techniques: an example for the Upper Great Lakes Region, USA. Int J Climatol 35(6):1027–1040. https://doi.org/10.1002/joc.4036
Rhee J, Im J, Carbone GJ, Jensen JR (2008) Delineation of climate regions using in-situ and remotely-sensed data for the Carolinas. Remote Sens Environ 112(6):3099–3111. https://doi.org/10.1016/j.rse.2008.03.001
Salvador S, Chan P (2005) Learning states and rules for detecting anomalies in time series. Appl Intell 23:241–255. https://doi.org/10.1007/s10489-005-4610-3.pdf
Santos JF, Pulido-Calvo I, Portella MM (2010) Spatial and temporal variability of droughts in Portugal. Water Resour Res 46(3):W03503. https://doi.org/10.1029/2009WR008071
Strehl A, Ghosh J (2002) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617. https://doi.org/10.1162/153244303321897735
Sugg JW, Konrad CE (2020) Defining hydroclimatic regions using daily rainfall characteristics in the southern Appalachian Mountains. Int J Climatol 13(7):785–802. https://doi.org/10.1080/17538947.2019.1576785
Thornthwaite CW (1931) The climates of North America, according to a new classification. Geogr Rev 21:633–655
Thornthwaite CW (1948) An approach towards a rational classification of climate. Geogr Rev 38(1):55–94
Vergni L, Di Lena B, Todisco F, Mannocchi F (2017) Uncertainty in drought monitoring by the standardized precipitation index: the case study of the Abruzzo region (Central Italy). Theoret Appl Climatol 128:13–26. https://doi.org/10.1007/s00704-015-1685-6
Vicente-Serrano SM (2006) Spatial and temporal analysis of droughts in the Iberian Peninsula (1910–2000). Hydrol Sci J 51(1):83–97. https://doi.org/10.1623/hysj.51.1.83
Vicente-Serrano SM, Beguería S, López-Moreno JI (2010) A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. J Clim 23(7):1696–1718. https://doi.org/10.1175/2009JCLI2909.1
Wilks D (2019) Statistical Methods in the Atmospheric Sciences, 4th edn. Elsevier, Cambridge
Winkler JA, Guentchev GS, Perdinan T-N, Zhong S, Liszewska M, Abraham Z, Niedźwiedź T, Ustrnul Z (2011) Climate scenario development and applications for local/regional climate change impact assessments: An overview for the non-climate scientist. Part I: Scenario development using downscaling methods. Geogr Compass 5(6):301–328. https://doi.org/10.1111/j.1749-8198.2011.00425.x
Wolter K, Allured D (2007) New climate divisions for monitoring and predicting climate in the U.S. Intermountain West Climate Summary 3(5):2–6. https://climas.arizona.edu/sites/default/files/pdf2007julnewclimatedivisions.pdf. Accessed 16 July 2021
Wright DB, Guo Y, England JF (2020) Six decades of rainfall and flood frequency analysis using stochastic storm transposition: Review, progress, and prospects. J Hydrol 585:124816. https://doi.org/10.1016/j.jhydrol.2020.124816
Wu FF, Yang XH, Shen ZY (2018) A three-stage hybrid model for regionalization, trends and sensitivity analyses of temperature anomalies in China from 1966 to 2015. Atmos Res 205:80–92. https://doi.org/10.1016/j.atmosres.2018.02.008
Wu H, Hayes MJ, Wilhite DA, Svoboda MD (2005) The effect of the length of record on the standardized precipitation index calculation. Int J Climatol 25:505–520. https://doi.org/10.1002/joc.1142
Xu K, Yang DW, Yang HB, Li Z, Qin Y, Shen Y (2015) Spatio-temporal variation of drought in China during 1961–2012: A climatic perspective. J Hydrol 526:253–264. https://doi.org/10.1016/j.jhydrol.2014.09.047
Yang T, Shao QX, Hao ZC, Chen X, Zhang ZX, Xu CY, Sun LM (2010) Regional frequency analysis and spatio-temporal pattern characterization of rainfall extremes in the Pearl River Basin, China. J Hydrol 380(3–4):386–405. https://doi.org/10.1016/j.jhydrol.2009.11.013
Yang W, Deng M, Tang J, Jin R (2020) On the use of Markov chain models for drought class transition analysis while considering spatial effects. Nat Hazards 103:2945–2959. https://doi.org/10.1007/s11069-020-04113-6
Yin Y, Chen H, Xu C, Xu W, Chen C, Sun S (2016) Spatio-temporal characteristics of the extreme precipitation by L-moment-based index-flood method in the Yangtze River Delta region, China. Theoret Appl Climatol 124:1005–1022. https://doi.org/10.1007/s00704-015-1478-y
Zhang Q, Kong DD, Singh VP, Shi PJ (2017) Response of vegetation to different time-scales drought across China: Spatiotemporal patterns, causes and implications. Global Planet Change 152:1–11. https://doi.org/10.1016/j.gloplacha.2017.02.008

Funding

Open access funding provided by the Carolinas Consortium. This research was supported by the National Oceanic and Atmospheric Administration (NOAA) Climate Program Office (grant no. NA16OAR4310163).

Author information

Authors and Affiliations

Department of Geography, University of South Carolina, Columbia, SC, 29208, USA
Gregory J. Carbone
Department of Earth and Ocean Sciences, University of North Carolina Wilmington, Wilmington, NC, 28403, USA
Peng Gao
School of Community Resources and Development, Arizona State University, 411 N. Central Avenue, Suite 550, Phoenix, AZ, 85004, USA
Junyu Lu
The Hainan University, Arizona State University Joint International Tourism College, Hainan University, 58 Remin Road, Haikou, 570004, Hainan Province, China
Junyu Lu

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed largely by Peng Gao and Junyu Lu. All authors contributed to the first draft and editing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Gregory J. Carbone.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Carbone, G.J., Gao, P. & Lu, J. Regionalization of hydroclimate variables in the contiguous United States. Theor Appl Climatol (2024). https://doi.org/10.1007/s00704-024-04903-z

Received: 22 July 2021
Accepted: 23 February 2024
Published: 08 March 2024
DOI: https://doi.org/10.1007/s00704-024-04903-z

1 Introduction

2 Data and methods

3 Results

3.1 Monthly temperature and precipitation averages