1 Introduction

Constructing climate regions has both descriptive and applied purposes. Regionalization can reveal new aspects of data, inform data stratification, or stimulate hypotheses about physical drivers influencing climatological observations (Wilks 2019). It can also provide a rational basis for managing resources influenced by climate variability. Given the threat of drought or heavy precipitation, understanding coherent regions derived from long-term datasets with high spatial resolution would improve both theoretical and applied aspects of hydroclimate extremes.

Early rule-based regionalization schemes include those by Köppen (1900), who devised a simple scheme based on monthly temperature and precipitation values to mirror existing global natural vegetation regions, and by Thornthwaite (1931, 1948), who created regions based on potential evapotranspiration for North America. More recent delineation of climate regions emphasizes more empirical or data-driven approaches (e.g., Fovell and Fovell 1993; Fovell 1997). The practical application of coherent, statistically-based climate regions has prompted several efforts to construct them at regional to globe scales. For example, Wolter and Allured (2007), recognizing how such units might benefit drought monitoring and seasonal climate forecasts, established an iterative clustering scheme to distill data from the US Cooperative Network into coherent patterns of temperature and precipitation anomalies. Their rationale was that the spatial extent and pattern of past droughts should inform operational drought monitoring. They further argued that, since seasonal forecasts rely on the response of mid-latitude circulation to tropical Pacific sea-surface temperature and pressure anomalies, defining regions on the basis of historical temperature and precipitation anomalies produces logical spatial units to verify forecasts. Their effort defined functional regions as an alternative to the two dominant schemes used for long-term regional records and for seasonal climate forecasting – the National Oceanographic and Atmospheric Administration’s 344 climate divisions and its 102 climate forecast divisions, respectively. The former were constructed using somewhat arbitrary criteria (Guttman and Quayle 1996); the latter, while less constrained by these criteria, overlap significantly with the climate divisions and are not based on statistical properties of the climate record (Wolter and Allured 2007). Bieniek et al. (2012), also recognizing the shortcomings of the established climate divisions, constructed empirical climate regions for Alaska that better reflected temperature and precipitation response to a suite of teleconnections.

Statistically-based climate regions provide an appropriate scale to measure climate change, to avoid shortcomings associated with estimates at individual stations (Wu et al. 2018), to identify trends (Karl et al. 1994a, b; DeGaetano 2001; Bharath and Srinivas 2015), to assess the adequacy of observing networks (DeGaetano 2001; Bharath and Srinivas 2015), or to attribute changes to specific causes (Fazel et al. 2018). Data-derived regions are appropriate for evaluating model output from numerical weather prediction models (e.g., Argüeso et al. 2011) or control runs of general circulation models (Belda et al. 2015), or for creating climate change scenarios via downscaling from coarse to more local spatial scale (Maraun et al. 2010; Winkler et al. 2011; Carbone 2014; Perdinan and Winkler 2015).

Regionalization has many hydroclimate applications, including assessing the spatial and temporal patterns and determining causes of drought. A variety of clustering algorithms have been used to identify the duration, spatial locus and extent, and intensity associated with significant droughts (Andreadis et al. 2005; Xu et al. 2015) or to define homogenous drought monitoring regions (Ali et al. 2019). Spatial clustering informs frequency analysis to uncover drought periodicity and to provide clues for a drought’s causes (Vicente-Serrano 2006; Bordi and Sutera 2007; Santos et al. 2010; Gocic and Trajkovic 2014; Zhang et al. 2017; Manzano et al. 2019). Regionalization also can help improve existing methods for drought forecasts or for deriving future scenarios (Mishra and Singh 2011). Creating rational climate regions is equally relevant in analyzing heavy precipitation and flooding, as well as in storm-water and disaster and risk management (DeGaetano 2001; Du et al. 2014; Gao et al. 2018; Irwin et al. 2017). It is the basis for understanding heavy precipitation regions and is essential to stochastic storm transposition, wherein historic storms are superimposed on different regions as plausible heavy rain events. Resampling the intensity, duration, and spatial extent of a past event provides a complementary method for rainfall and flood frequency analysis (Yin et al. 2016; Wright et al. 2020) but demands coherency in the statistical properties used to construct intensity–duration–frequency (IDF) curves and other measures of probability and risk due to heavy precipitation (Gao et al. 2018). Hence, investigators have looked at spatial coherence in precipitation intensity, including diurnal rainfall patterns (Mooney et al. 2017), annual precipitation extremes (Yang et al. 2010), and variability in mountainous regions (Sugg and Konrad 2020).

While data-driven climate regions are often constructed from point data, gridded data sources abound. Derived from observational networks, reanalysis, or other model output, these data are an important part of modern climatological analysis. Gridded datasets often integrate in-situ measurements with areally-based data using multiple sources to maximize the spatial and temporal resolution. Examples include remotely sensed data, model output, and response variables such as vegetation (Rhee et al. 2008; Bieniek et al. 2012; Zhang et al. 2017). The resulting integrated products can be used to establish functional climate regions that respond to hydroclimate extremes.

In this paper, we apply a hierarchical clustering algorithm to statistical measures of the widely used Parameter-elevation Relationships on Independent Slopes Model (PRISM) temperature and precipitation database. We construct regions based on temperature and precipitation averages and anomalies, as well as the statistical parameters underlying the Standardized Precipitation Index (SPI), the Standardized Precipitation Evapotranspiration Index (SPEI), and 1-, 2-, 3-, and 4-day annual precipitation maxima. Our analysis offers an inherently areal perspective to regionalization with a gridded dataset widely employed in research and operations. The parameters upon which measures of drought or heavy precipitation are grounded both provide a rational basis for regionalization and assist in applications that depend on spatial coherence when applying probability distributions to regional data (Hosking and Wallis 1993; Bonnin et al. 2006). By clustering on the parameters of theoretical probability distributions, we consider a range of possible probability distributions beyond those provided by limited observed records (Wilks 2019). Clustering on these parameters emphasizes anomalies like those represented by the SPI and SPEI, return intervals associated with heavy precipitation, and operational products such as seasonal climate forecasts. Our analysis has three goals:

  • to apply the regionalization scheme, REDCAP, to the PRISM dataset and to compare our resulting regions against other data-driven spatial analysis and established aggregation units (e.g., climate divisions, forecast regions);

  • to measure how clustering solutions compare across hydroclimate variables and different periods of record; and

  • to determine the feasibility of creating coherent regions that simultaneously capture multiple hydroclimate extremes.

While clustering algorithms have been applied routinely to climatological data, this paper offers several innovations. We apply a clustering algorithm that enforces spatial contiguity and has not been used extensively with climatological data. By applying the algorithm to a popularly-used gridded data set, we seek to identify the degree to which different extreme hydroclimate variables can produce similar regions.

2 Data and methods

We conduct all analyses using temperature and precipitation values from the PRISM (all networks) dataset (Daly et al. 2008). PRISM data helps us address the challenge associated with the unit size of original datasets. Fovell and Fovell (1993) acknowledged that the climate divisions used to create their regions are more evenly sized in the eastern and southern USA, but more erratically sized in the west. We avoid this latent bias by clustering on the evenly-sized PRISM grid cells. These gridded data are interpolated from several different observation networks to a 4-km grid across the lower-48 United States, incorporating properties of location, elevation, coastal proximity, slope orientation, vertical atmospheric profiles, topographic position, and orographic effectiveness. Like other gridded datasets, PRISM provides the advantage of complete records and spatial coverage. After 2001, PRISM precipitation data incorporated radar inputs. PRISM data have been used as the basis of or to augment other regionalization studies (Abatzoglou et al. 2009; Bieniek et al. 2012; Sugg and Konrad 2020). All variables were derived from monthly PRISM data sets except for annual maximum precipitation, derived from the daily PRISM data set.

Our regionalization focuses on hydroclimate variables and includes independent analysis of the following:

  • Monthly average precipitation (12 variables, 1 for each month)

  • Monthly average temperature and precipitation (24 variables)

  • Combined temperature and precipitation anomalies (following Wolter and Allured 2007; Abatzoglou et al. 2009)

  • Drought indices (parameters used to calculate SPI and SPEI)

    • 3-month SPI: two-parameter gamma distribution for each month (24 variables)

    • 3-month SPEI: three-parameter log-logistic distribution for each month (36 variables)

  • Heavy precipitation: 1-, 2-, 3-, and 4-day annual maximum precipitation based on Anderson–Darling distance

We computed average monthly temperature and average total precipitation for each PRISM grid in each month for the period of record, 1895–2018. Monthly anomalies were calculated by subtracting the period-of-record average from each monthly observation. Our analysis for all monthly products includes three different time frames: 1895–2018, 1957–2018, and 1981–2018; in addition, we examined other time frames for specific comparison with previous studies. We do this not only because the parameters used to calculate SPI and SPEI are sensitive to record length (Wu et al. 2005; Vergni et al. 2017; Carbone et al. 2018), but also to consider changes that may be due to climate variability and change, or to changes in the number and distribution of stations used to produce the PRISM grids. While we cannot distinguish between these two factors, we can document how different time frames affect the coherency of regions inherent in this widely used dataset.

We computed SPI values following the method of McKee et al. (1993). Three‐month running precipitation totals were calculated for each grid and month during the period of record. Probability distribution functions (PDFs) were fit independently for each month to derive gamma shape (alpha) and scale (beta) parameters using maximum likelihood estimation (Wilks 2019). We used the three-parameter log–logistic function to fit the probability distribution of the precipitation—potential evapotranspiration difference (Vicente-Serrano et al. 2010). We calculated SPEI using the R package SPEIcalc (URL: http://sac.csic.es/spei) developed by Vicente-Serrano et al. (2010). To regionalize the characteristics of heavy precipitation, we summed total rainfall for consecutive 1-, 2-, 3-, and 4-day periods for all grid cells and each year from January 1, 1981, through December 31, 2018.

We adopted a hierarchical clustering method, Regionalization with Dynamically Constrained Agglomerative Clustering and Partitioning (REDCAP; Guo 2008), to identify clusters based on underlying parameters for the probability distributions used to calculate SPI (gamma), SPEI (Pearson-Type III), and 1- to 4-day precipitation (gamma). REDCAP is a widely used clustering algorithm with previous hydroclimate applications (Gao et al. 2018; Yang et al. 2020).REDCAP offers a flexible algorithm that overcomes limitations inherent in conventional clustering methods that do not consider spatial information, one of the key factors that shape climate regions. It uses the average linkage clustering method but directly enforces spatial contiguity during the clustering procedure forming spatially contiguous regions within which various analyses – such as climate model evaluation and resampling of rare or extreme events – could be conducted (Gao et al. 2015, 2018). By accommodating different variables derived from raw temperature and precipitation and various distance measurements (Table 1), REDCAP allows us to investigate multiple facets of hydroclimate and distill information from raw data.

Table 1 Hydroclimate variables and dissimilarity used in regionalization

REDCAP identifies groups of grid cells (i.e., clusters) from PRISM data that are spatially contiguous and have similar statistical properties defined by each variable (details below). The algorithm considers each grid cell as an individual cluster. Then, it iteratively merges pairs of clusters that are spatially contiguous and have the highest similarity (or the lowest dissimilarity/distance). This follows the same clustering procedure as conventional clustering methods (e.g., average linkage clustering) except that REDCAP requires that pairs of clusters merged in each iteration must be spatially contiguous. When all grid cells are merged, a spatially contiguous dendrogram is constructed. REDCAP then partitions the spatially contiguous dendrogram into the desired number of subtrees assigned by a user, each forming a cluster of spatially contiguous grid cells. REDCAP can incorporate different spatial coherence measures by considering the distance and the dissimilarity between pairs of grid cells. We consider dissimilarity between each pair of grid cells using three distinct distance measures (Table 1). The first is Euclidian distance, applied to situations where each grid cell has multiple climate variables, i, associated with that cell. The distance (D) between a pair of grid cells m and n each of which has k climate variables is defined by Eq. 1.

$$D=\sqrt{{\sum }_{i=1}^{k}\begin{array}{cc}& {\left({m}_{i}-{n}_{i}\right)}^{2}\end{array}}$$
(1)

Euclidian distance is widely used in clustering algorithms.

The second distance measure follows that used by Wolter and Allured (2007) to compare our results with their statistically-derived climate divisions. Average temperatures and precipitation totals for every three-month season were computed for each grid cell for the entire study period. The anomaly at each grid cell was then calculated by subtracting the average for the same season. Finally, the dissimilarity is determined as one minus the correlation of pairwise anomalies.

The third distance measure is Anderson–Darling (AD) distance measuring the similarity of distributions of annual precipitation maxima. The AD distance for two series of n year annual maxima at grid cell a and b is defined by Eq. 2.

$$AD\;(a, b)=\frac{n}{2}{\int }_{-\infty }^{\infty }\frac{{({F}_{a}\left(x\right)-{F}_{b}(x))}^{2}}{{F}_{ab}\left(x\right)(1-{F}_{ab}(x))}d{F}_{ab}(x)$$
(2)

where Fa(x), Fb(x), and Fab(x) are the sample distribution functions of a, b, and a combined sample of a and b, respectively. This measure suits the special nature of statistical properties associated with extreme values, such as annual precipitation maxima.

REDCAP produces any user-defined number of regions. We determined the number of clusters for each variable using two approaches: First, we selected the same number of regions used operationally or in prior studies. For example, we used 344 regions, the number of USA climate divisions used operationally and for which a 125-year historical record exists for many climate variables, including several important for hydroclimatology. We used 102 regions, the number of seasonal climate forecast regions used by NOAA’s Climate Prediction Center. We also selected 139 regions, the number established by Wolter and Allured (2007) when clustering three-month temperature and precipitation anomalies from 1978–2006 National Oceanographic and Atmospheric Administration Cooperative Observer Program (NOAA COOP) data.

Second, we used the “L method” to determine a statistically-optimal number of regions by analyzing within-region heterogeneity at each hierarchical level (Salvador and Chan 2005). Within-region heterogeneity is defined as the sum of squared deviations in each region plotted against the number of clusters. The L method identifies points of maximum curvature in the plot, assuming that an appropriate number of clusters often coincides with rapid changes in within-region heterogeneity as clusters are merged. Visually, the L method locates the ‘elbow’ in the plot. Mathematically, the L method divides the plot into two parts, those with a lower or higher number of clusters (Lc and Rc, respectively) for each possible number of clusters (c). Separate lines were fitted for Lc and Rc, and total RMSE (Root Mean Squared Error) at c was defined as:

$${RMSE}_{c}= \frac{c-1}{b-1}RMSE\left({L}_{c}\right)+ \frac{b-c}{b-1}RMSE({R}_{c})$$
(3)

where RMSE(Lc) and RMSE(Rc) are RMSEs of the lines in Lc and Rc respectively, and b is the largest possible number of clusters. The statistically optimal number of regions corresponds to the value of c that minimizes RMSEc. When b is very large, the optimal number results in fine-grain clusters with little practical meaning. To solve this problem, the L method iteratively updates the plot by assigning b to be twice the optimal number of clusters in the last iteration. It keeps searching for the optimal number of regions in the updated plot, ultimately yielding the optimal number of regions at each iteration.

Regionalization of different climatic variables yielded different sets of regions. We determined spatial coherence of these regions using Strehl and Ghosh’s (2002) index which measures the mutual spatial information shared by two random variables. Among a variety of cluster similarity indices, the Strehl and Ghosh index has several advantages. It is independent of the number of clusters or the number of elements, and it does not rely on distribution assumptions for hypothesis testing. The index is based on mutual information between two sets of clusters C and C’ (I(C, C’)) and is defined as:

$$I\;(C, C^{\prime})={\sum\;}_{i=1}^{k}\;{\sum\;}_{j=1}^{l}\;P\left(i,j\right)lo{g}_{2}\frac{P(i,j)}{P(i)P(j)}$$
(4)

where P(i, j) is the probability that an element belongs to cluster Ci in C and to cluster \({C}_{j}{\prime}\) in C’ (Eq. 5). The Strehl and Ghosh index is the mutual information between clusters C and C’ standardized by the geometric mean of the entropy of cluster C (H(C)) and C’ (H(C’)) defined by Eq. 6, where P(i) is the probability that an element is in cluster Ci \(\in\) C which has n elements (Eq. 7).

$$P (i,j)=\frac{\left|{C}_{i}\cap {C}_{j}^{\prime}\right|}{n}$$
(5)
$$H\left(C\right)=-{\sum }_{i=1}^{k}P(i)lo{g}_{2}P(i)$$
(6)
$$P\left(i\right)= \frac{\left|{C}_{i}\right|}{n}$$
(7)

We apply the Strehl and Ghosh index for two purposes: 1) to compare spatial similarity of clusters derived for two different variables, and 2) to compare spatial similarity of clusters derived from the same variable across different periods of record.

3 Results

3.1 Monthly temperature and precipitation averages

Two distinct “elbows” define statistically optimal solutions for 64 and 19 clusters using monthly temperature and precipitation averages (Table 2). The 64-region solution shows familiar patterns in the eastern two-thirds of the contiguous USA (Fig. 1): a north–south gradient from the Gulf of Mexico to the Great Lakes driven by solar insolation and temperature, an east–west gradient in the central USA/Great Plains driven by precipitation, and SW-NE oriented clusters reflecting the Appalachian Mountains or proximity to the Atlantic. In the West, the clusters are considerably smaller and highlight the influences of elevation and coastal proximity.

Table 2 “Elbow” breakpoints for each variable
Fig. 1
figure 1

Sixty-four regions based on monthly temperature and precipitation normals from PRISM data, 1981–2018

In order to compare REDCAP clusters with a prior regionalization for monthly temperature and precipitation averages, we use a 14-cluster solution (close to a statistically optimal breakpoint). The 14-cluster REDCAP solution (Fig. 2a) shows general similarities to the well-known Fovell and Fovell (1993) clusters (Fig. 2b). In particular, large latitudinal bands characterize regions in the eastern USA. This north–south temperature gradient, combined with elevation effects and the east–west precipitation gradient dominate the regional patterns in the western USA. The most striking differences produced by REDCAP are associated with a more refined precipitation gradient across the central Plains and more detailed regions in the western USA reflecting topographic influences on temperature and precipitation. These differences stem from the use of PRISM data that incorporates elevation and slope into its gridding algorithm. By contrast, the climatological division data used by Fovell and Fovell (1993) are aggregated at the coarser-scale climate divisions and without explicit consideration for elevation. Moreover, the climatological division data in mountainous areas are disproportionately constructed from stations at lower elevations.

Fig. 2
figure 2

REDCAP clustering into 14 regions based on monthly temperature and precipitation normals from PRISM data, 1931–1981 (a) and 1981–2018, (c) Digital version of Fovell and Fovell’s (1993) 14 regions using climate division data, 1931–1981 (b)

Also noteworthy is that when forced to 14 clusters, REDCAP produces different clusters for different periods of the PRISM record, e.g., 1931–1981 vs. 1981–2018. While similar general patterns emerge from the two periods, there are distinct differences in the central Plains and western USA (Fig. 2c). From this analysis, it is impossible to distinguish the degree to which such differences represent changes in temperature and precipitation after 1981 as opposed to different stations used to construct the PRISM dataset. One clear difference regarding the latter period is that it overlaps considerably with the period from which PRISM incorporates Snow Telemetry (SnoTel) data in the mountainous west and radar-based precipitation estimates across the USA.

Closer inspection across different periods of record reveals that similarity varies with the number of clusters selected and the record length and/or overlap between periods. Strehl and Ghosh index values, quantifying the mutual information shared by clusters, show that, with forty or more clusters, spatial similarity is consistently high (approximately 0.8) irrespective of record length or specific time periods (Fig. 3). This suggests that, for much of the country, the use of statistical properties from coherent regions formed on the basis of monthly temperature and precipitation averages is relatively insensitive to the period of record used to construct the regions when the total number of regions is sufficiently large. With fewer clusters, mutual shared information decreases rapidly. For example, a ten-cluster solution derived from 1981–2018 average temperature and precipitation data has a relatively low similarity index (approximately 0.65) to ten-cluster solutions based on 1895–1956, 1895–2018, 1957–2018, or 1931–1981 data. By contrast, increasing period length and/or overlap—1895–2018 vs. 1957–2018—results in similarity index values above 0.8 with as few as five clusters.

Fig. 3
figure 3

Similarity of regions based on monthly temperature and precipitation averages across four different periods: (a) 1957–2018 vs 1981–2018, (b) 1895–2018 vs. 1981–2018, (c) 1895–2018 vs. 1957–2018, and (d) 1931–1981 vs. 1981–2018

Since we report the Strehl and Ghosh similarity index values throughout this paper, it is helpful to calibrate these values to a series of maps. The similarity index between the two time periods used in the 14-cluster solution above (Fig. 2a vs. 2c), for example, is 0.69. At 60 clusters, the similarity for the same two time periods (1981–2018 and 1931–1981) is 0.81 (Fig. 4). At 60 clusters, smaller regions in the West capture how dramatic topographic changes influence climatological variables.

Fig. 4
figure 4

Sixty-four regions based on monthly temperature and precipitation normals from PRISM data, 1981–2018 (a) and 1931–1981 (b)

REDCAP’s solution for 102 clusters based on monthly temperature and precipitation normals produces regions quite different than NOAA’s standard forecast regions (Fig. 5). REDCAP regions vary more in size and are more elongated and irregular. The REDCAP clusters reveal familiar climate controls. In the eastern USA, the elongated clusters amplify the role of latitude and proximity to the Gulf of Mexico or Atlantic coasts on temperature and precipitation anomalies. But the most dramatic differences between REDCAP’s clusters and NOAA’s forecast regions is in the West where properties of elevation and slope inherent in the PRISM dataset matter more. These results are not surprising as the current forecast regions – a result of National Weather Service reorganization in the 1990s – retain remnants of state borders with only limited exceptions for physical features. Many forecast region boundaries are determined by either climate divisions or political boundaries rather than by data-driven regions.

Fig. 5
figure 5

NOAA’s 102 forecast regions (a) 102 regions based on normal T and P from 1981 to 2018 created by REDCAP (b)

3.2 Monthly temperature and precipitation anomalies

Forcing REDCAP to produce 139 clusters to compare against the 139 clusters created by Wolter and Allured (2007) resulted in remarkably similar regional patterns (Fig. 6). While both schemes use the same measure of seasonal temperature and precipitation anomalies and both the same time periods, they differ in constraints on spatial contiguity (REDCAP demands spatial contiguity, Wolter and Allured do not) and operate on different datasets (PRISM grids vs. NWS Coop stations). Yet, both schemes produce clusters that reflect elevation, orientation, and proximity to the coast. In the eastern USA, both schemes produce larger regions than the well-established climatic divisions. In the western USA, the derived clusters are often smaller than the standard climate divisions used in mountainous states. When considering more than twenty-five clusters, REDCAP produces similar seasonal temperature and precipitation anomalies (Strehl and Ghosh similarity index of approximately 0.8), irrespective of the period of record. For 139 clusters, the similarity index is approximately 0.83 whether comparing clusters derived from 1895–1956, 1895–2018, 1957–2018, or 1981–2018 data.

Fig. 6
figure 6

Comparison of 139 clusters of monthly, October 1978—September, 2006 temperature and precipitation anomalies derived by Wolter and Allured (2007; colored dots) vs. REDCAP (black boundary lines)

3.3 Drought indices

The L method produces breakpoints for monthly temperature-precipitation anomalies, SPEI, and SPI clusters at 74, 56, and 50 respectively (Table 2).

Here, we show that for a mid-range value of 60 clusters, temperature-precipitation anomalies regions are remarkably uniform in size across the USA (Fig. 7a). Clusters based on statistical parameters used to construct SPI or SPEI values vary more in size and more clearly highlight elevation and slope characteristics captured by the PRISM dataset (Fig. 7b and 7c). Yet, the statistical similarity for all possible pairs of these three variables is high – the Strehl and Ghosh similarity index is greater than 0.75 for 60 or more clusters.

Fig. 7
figure 7

Sixty-cluster solutions for: (a) temperature-precipitation anomalies, (b) SPEI parameters, and (c) SPI parameters

In addition, clustering results are robust across a range of time frames for each of these variables. With more than thirty clusters, the Strehl and Ghosh similarity index is greater than 0.75 when comparing clusters derived from 1895–1956, 1895–2018, 1957–2018, or 1981–2018 data for temperature-precipitation anomalies, as well as for SPI and SPEI parameters.

The Strehl and Ghosh similarity index for SPEI vs. monthly temperature and precipitation averages ranges from 0.7 to 0.78 when creating more than 60 regions. The similarity index for SPI vs. monthly temperature and precipitation averages and temperature and precipitation anomaly vs. monthly temperature and precipitation averages is in the same approximate range – 0.7—0.76 for more than 60 regions. Both are lower than the similarity between the drought indices – SPI vs. SPEI – or the similarity of a single anomaly or drought measure across different time frames.

3.4 Heavy precipitation

We examine clusters forming 11 and 30 heavy precipitation regions based on average breakpoint elbows found in 1- to 4-day annual maxima precipitation totals (Table 2). The resulting regional patterns reflect three general influences: proximity to the Gulf, Atlantic, or Pacific, the east–west moisture gradient in the central USA, and orographic effects (Figs. 8 and 9).

Fig. 8
figure 8

REDCAP 11-cluster solution optimizing similarity in annual maxima for (a) 1-day, (b) 2-day, (c) 3-day, and (d) 4-day precipitation

Fig. 9
figure 9

REDCAP 30-cluster solution optimizing similarity in annual maxima for (a) 1-day, (b) 2-day, (c) 3-day, and (d) 4-day precipitation

The regions show a moderate degree of similarity across different durations (1- to 4-days) when eleven or more clusters are selected – Strehl and Ghosh similarity index values range from approximately 0.70 to 0.78 from 11 to 30 regions. Strehl and Ghosh similarity index values are also in this range for a single precipitation variable (i.e., 1- to 4-day annual maximum) when compared across different record lengths (e.g., 1981–1999, 2000–2018, 1981–2018). By contrast, similarity index values between any of the daily precipitation intensity measures and any of the monthly variables range are considerably lower, ranging from 0.5 to 0.58. Here, we illustrate that similarity values between monthly mean precipitation and SPI are considerably higher than either variable compared with 1- to 4-day annual maximum precipitation (Fig. 10).

Fig. 10
figure 10

Similarity index for sample pairs of monthly average precipitation (p), monthly SPI, and 1- to 4-day annual maximum precipitation

The clusters created by such analysis could form the basis for regional frequency analysis whereby precipitation data are pooled to calculate intensity–duration–frequency curves (Hosking and Wallis 1993; Irwin et al. 2017; Yang et al. 2010; Burn 2014). Whether such pooling is done independently for each duration period (1- to n-days) depends on the level of similarity required. Clustering of heavy precipitation statistics also reduces interannual variability associated with station measurements (Kunkel et al. 2020).

4 Discussion and summary

Our method creates empirically derived regions exploiting the underlying statistical structure of important hydroclimate variables. By using PRISM, we define regions with a dataset that is widely employed in research and operations and also incorporates the role of topography, proximity to the coast, and other geographic features. These organic regions are often similar to those derived with other clustering algorithms and often quite different than many commonly-used spatial units (e.g., climate divisions, forecast regions). Of course, such units will remain a well-established standard and have practical value in many circumstances. State, county, or other political boundaries are important for management and policy decisions. However, it is important to recognize that there are drawbacks to using these when selecting individual stations to represent a region, as an aggregated coherent spatial unit, or for downscaling or aggregating data. While the boundaries of data-driven regions do not stop at political borders, this should not preclude their use. Resource managers concerned with a political unit can use them within their jurisdictions, not worrying about how large, small, or differently sized they are.

Many clustering algorithms draw on raw data or an underlying statistical structure. While we do not provide a comprehensive evaluation of different clustering schemes here, our method shows high similarity with that of Wolter and Allured (2007) who use station data and employ a different algorithm. This suggests that a number of different data-based clustering solutions can offer a sound approach for aggregation, one that has practical applications for monitoring temperature and precipitation anomalies, and for producing seasonal temperature or precipitation forecasts or assessing their accuracy. It is also worth noting that unlike traditional methods (e.g., average linkage and Ward’s method used by Wolter and Allured (2007)), our method enforces spatial contiguity which sacrifices a certain degree of homogeneity of units.

Our analysis also sought to assess similarity of regions based on different periods of record and different hydroclimate variables. The monthly hydroclimate variables considered here show high similarity across different time periods when the total number of regions was sufficiently large – typically more than 25 clusters across the conterminous United States. Similarity indices for monthly temperature and precipitation means and for temperature and precipitation anomalies exceeded 0.8 when comparing first and second halves of the record; SPI and SPEI were only slightly lower (~ 0.75). Even heavy precipitation had relatively high similarity (~ 0.7) across different time periods, despite analysis using shorter record lengths. Similarity values across different hydroclimate variables were typically lower than a single variable across different time periods. Combinations of monthly average temperature, temperature-precipitation anomalies, SPI, and SPEI, and combinations of 1- to 4-day annual maximum precipitation values ranged from 0.7 to 0.78. Similarity between any single monthly variable and any single daily precipitation variable was considerably lower (0.5–0.58). This is not surprising given the short-term nature of intense precipitation events and differences between the drivers of these events versus the controls on monthly temperature and precipitation anomalies.

Certainly, no single regionalization accommodates the variables commonly used to define hydroclimate extremes like drought or heavy precipitation, making the creation of consistent “hydroclimate extreme” regions challenging. Our findings suggest that no one solution suits our particular selection of hydroclimate extremes. Nor would one solution satisfy the diverse number of research or operational needs for which climate regions might be appropriate (Abatzoglou et al. 2009). Since the definition of clusters depends on the variable of interest and specific application, flexible algorithms that allow user-defined parameters (e.g., REDCAP) offer a practical tool for decision makers.