1 Introduction

Severe storms’ events, such as hailstorms and thunderstorms winds, have significant impacts on human activities and infrastructures. Namely, their increasing occurrence in space and in time has a significant impact in terms of economic damages and heavy losses (Kossin et al. 2017). As stated by the Insurance Information Institute (2021), among severe storms’ types, hailstorms and thunderstorms winds result to be the most dangerous events, in terms of insurance losses (Gao and Shi 2022), as well as life losses (Kunkel et al. 2013). Hailstorms and thunderstorms winds’ events are described as severe local storms and are classified as severe convective ones (Brooks 2013). Severe convective storms earn their designation based on the magnitudes of their grasps. A convective storm related to a hailstorm’s event is considered severe if the hail diameter extension (National Center for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) of the United States of America (USA) 2022b) is higher than 2.50 cm \(\approx 0.98^{\prime\prime}\) (inches), while that related to a gust wind’s event is considered severe if its wind speed is higher than 95.00 km/h \(\approx 0.03\) km/s (Kunkel et al. 2013). Both thunderstorms winds’ and hailstorms processes are sensible to climate change (Raupach et al. 2021) as their relationship with environmental, meteorological and climatological covariates (Brown and Murphy 1996; Brooks et al. 2003) has been investigated through regression techniques, such as Principal Component Regression (PCR), Partial Least Squares Regression (PLSR—Eccel et al. 2011), Poisson counting regression models (Allen et al. 2015) and intensity estimation (Gensini and Allen 2018).

The spatio-temporal dynamics and the occurrence of severe storms’ events are complex and require sophisticated modeling techniques (Sobash et al. 2011; Dehshiri and Firoozabadi 2023; Fortuin et al. 2022), that are not even enough satisfactory to describe their spatio-temporal structural and reproduction characteristics. Spatio-temporal dynamics of hailstorms have recently been evaluated according to replicated spatial point processes’ models (Gao and Shi 2022), and that of thunderstorms winds have been evaluated through temporal Poisson renewal processes (Li 2000). Furthermore, extreme environmental hazards’ processes involving the most violent gusty winds, i.e., tornadoes, are already evaluated from the spatio-temporal point processes point of view (González et al. 2014, 2019; Valente and Laurini 2020).

Spatio-temporal point processes’ models have been used to understand the mechanism of the spatio-temporal evolution of phenomena (González et al. 2016), in specialized fields, such as epidemiology (Quesada et al. 2017) and seismology (Adelfio and Chiodi 2020). However, recently, their application has been developed in other fields such as ecology/biology (Soriano-Redondo et al. 2019), social sciences (Tang et al. 2022), finance (Adelfio et al. 2020). Spatio-temporal point processes allow the description of a point pattern’s first-order characteristics, related to the spatio-temporal locations or in asymmetric relationships with the available environmental covariates, and its second-order characteristics. The latter involves the spatio-temporal interactions or interpoint distances. In summary, point processes allow for the recognition of the so-called generating process, i.e. the underlying spatio-temporal point process (Diggle 2006a, b; Illian et al. 2007).

An important issue in the assessment of the dynamics of thunderstorms winds and hailstorms processes relates to their mark variables, i.e. their extension and wind speed, respectively (Gao and Shi 2022). Models for marked spatio-temporal point patterns are suitable for the evaluation of spatio-temporal environmental hazards’ processes (Mateu and Ignaccolo 2015), such as for seismological phenomena (Adelfio and Chiodi 2020). Our interest is to evaluate the historical processes of severe hailstorms and thunderstorms winds that occurred in the USA, from 1996 to 2022. More precisely, we aim to evaluate the dynamics of hailstorms and thunderstorms winds using stochastic point processes theory, and according to the marked spatio-temporal self-exciting point processes approach. As well as many other natural phenomena, thunderstorms winds and hailstorms processes are characterized by an additional random variable indicating the power of occurrence at each spatio-temporal location. This power is measured in terms of the so-called mark, which represents the phenomenon’s driving force. Marked self-exciting point processes, as well as log-Gaussian Cox processes for the unmarked case and patterns where the driving force is represented by the random intensity function, are suitable models for assessing the underlying processes of marked clustered patterns. Besides the aggregated nature of the data, they require an additional assumption regarding the real-valued driving random variable, or mark, that makes them marked. Each marked self-exciting process has a spatio-temporal distribution (and the associated spatio-temporal conditional intensity function) which depends on the mark distribution. Marked self-exciting point processes assume that the process self-excites with a higher frequency of offspring events conditional to a high value of the mark; i.e. the higher the mark value, the higher the offspring productivity, in the immediate nearest neighbourhood, according to some specific spatial and temporal reproduction laws. As well as seismic sequences, both hailstorms’ and thunderstorms winds’ after-sequences could be produced starting from spatio-temporal decay laws. The subject of the analysis is of great importance and allows for the explanation, as well as the possible forecast, of catastrophic events, damages and hazards resulting from severe convective storms, using the dataset provided by the National Center for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) of the USA (2022a).

This work aims to understand the spatio-temporal dynamics of severe hailstorms and thunderstorms winds coherently with stochastic processes’ theory. We are here interested in the evaluation of thunderstorms winds’ and hailstorms’ processes in terms of Windowed ETAS models (Nicolis et al. 2015), over several temporal (in particular yearly) sub-patterns.

In order to fit and attempt to explain the spatio-temporal data related to the historical severe storms in the USA, according to the marked self-exciting spatio-temporal point processes theory, the formulation of the extended Hawkes (1971)—the Epidemic Type-Aftershocks Sequence (ETAS—Ogata 1988)—model is considered.

The paper is organized as follows: after this brief introduction, Sect.  2 introduces the point process framework related to spatio-temporal point processes and the ETAS model; Sect.  3 shows the case study, the exploratory analysis and the model fitting results; Sect.  4 provides the interpretation of ETAS model fitting; and finally, Sect. 5 provides a brief discussion and some concluding remarks on methodology and future developments.

2 Spatio-temporal point processes and conditional intensity function (CIF)

Let X be a marked spatio-temporal point process, so a random countable subset \(X \in \mathbb {R}^{2}\times \mathbb {R}^{+}\times \mathcal {M}\), where a point \((\varvec{u}, t, m)\in X\) corresponds to an event at \(\varvec{u}\in \mathbb {R}^{2}\) occurring at time \(t\in \mathbb {R}^{+}\), with attached mark value \(m \in \mathbb {R}^{+}\). A marked point pattern, which is a realization of a marked spatio-temporal point process X, is a finite set of distinct points of size \(n > 0\), \(\{\varvec{u}_{i}, t_{i}, m_{i} \}_{i=1}^{n}\), within a bounded spatio-temporal region \(W\times T \times \mathcal {M}\subset \mathbb {R}^{2}\times \mathbb {R}^{+}\times \mathbb {R}^{+}\), with area \(|W |> 0\), length \(|T |>0\) and marking structure \(\mathcal {M} \in \mathbb {R}^{+}\) (Daley et al. 2003). Let N be a counting measure, such that \(N(A\times B)\) denotes the number of (unmarked) points of a set \((A\times B)\bigcap X\), with \(A\subseteq W\) and \(B \subseteq T\).

First- and second-order properties, representing the point process characteristics associated with the mean and the covariance functions, respectively, are described by the corresponding first, \(\lambda (\varvec{u}, t, m)\), \(\forall \{(\varvec{u}, t, m)\} \in X\), and second-order, \(\lambda ^{(2)}\left( (\varvec{u}, t, m), (\varvec{v}, l, m^{\prime })\right)\), \(\forall \{(\varvec{u}, t, m), (\varvec{u}, l, m^{\prime })\} \in X\), intensity functions. A refined version (Illian et al. 2007) of the intensity function, \(\lambda _{\varvec{\theta }}(\varvec{u}, t, m)\), is the so-called conditional intensity function, \(\lambda (\varvec{u}, t, m |\mathcal {F})\), where \(\mathcal {F}\) indicates the filtration, given by a sequence of sub \(\sigma\)-algebras to which the evaluation of the pattern is subject (each time, whenever a new event occurs). The conditional intensity function is a stochastic process itself and completely characterizes the associated spatio-temporal point pattern (Adelfio and Chiodi 2014; Daley et al. 2003). If the filtration is represented by the past history of the process, including the marks: \(\mathscr {H}_{t} = \left\{ (\varvec{u}_{i}, t_{i}, m_{i}): t_{i} < t\right\}\), then the conditional intensity function is defined as:

$$\begin{aligned} \lambda (\varvec{u}, t, m \mid \mathscr {H}_{t}) = J_{\mathcal {M}}(m)\lambda (\varvec{u}, t \mid \mathscr {H}_{t}) \end{aligned}$$
(1)

where \(J_{\mathcal {M}}(m)\) is a probability density function describing the mark distribution; i.e., the mark density distribution function, which describes the expected number of offsprings of an event of magnitude m, where \(m \ge m_{0}\); \(\lambda (\varvec{u}, t\mid \mathscr {H}_{t})\) is the conditional intensity function of the unmarked spatio-temporal point pattern; \(\mathscr {H}_{t}\) is the past history of the marked point pattern (Ogata 1988; Stindl and Chen 2022; Zhuang et al. 2002). No inference about the mark density distribution function is provided in the paper.

Under the assumption of underlying marked self-exciting spatio-temporal point process, the conditional intensity function of the corresponding unmarked spatio-temporal point pattern, \(\lambda (\varvec{u}, t \mid \mathscr {H}_{t})\), is defined as:

$$\begin{aligned} \lambda (\varvec{u}, t |\mathscr {H}_{t}) = \displaystyle \lim _{\Delta \varvec{u}, \Delta t \rightarrow 0} \frac{\mathbb {E}\left[ N([\varvec{u}, \varvec{u} + \Delta \varvec{u}]\times [t, t+\Delta t]) |\mathscr {H}_{t} \right] }{\Delta \varvec{u}\Delta t} \end{aligned}$$
(2)

where \(\mathbb {E}[\cdot ]\) is the expectation operator; \(\Delta \varvec{u}\) and \(\Delta t\) are spatial and temporal infinitesimal increments; \(\mathscr {H}_{t}\) is the spatio-temporal occurrence history of the process up to time t. This is also a conditional expectation, and represents the history-dependent (conditional) probability that an event occurs in the volume \(\{[\varvec{u}, \varvec{u} + \Delta \varvec{u}]\times [t, t+\Delta t]\}\), around the location \((\varvec{u}, t) \in X\) and up to time t (Adelfio and Ogata 2010).

2.1 The Epidemic Type-Aftershocks Sequences (ETAS) model

A branching process is a Markov process in which each individual in the nth generation produces random numbers of individuals in the \((n+1)\)th generation, according to a probability distribution function which is repeated from individual to individual. Branching processes are generally used to model reproduction phenomena (Adelfio and Chiodi 2014).

The conditional intensity function of a spatio-temporal branching model corresponds to the sum between a term indicating spontaneous or background activity and a term indicating induced or triggered activity. The triggered component evaluates the interaction activity with the events characterizing the past history of the process, and the parametric conditional intensity function is:

$$\begin{aligned} \lambda _{\varvec{\theta }}(\varvec{u}, t |\mathscr {H}_{t}) = \mu f(\varvec{u}) + \tau _{\varvec{\phi }}(\varvec{u}, t)\quad \text {with:}\hspace{0.5em} \tau _{\varvec{\phi }}(\varvec{u}, t) = \displaystyle \sum _{t_{j} < t} \nu _{\varvec{\phi }}(\varvec{u} - \varvec{u}_{j}, t-t_{j}\mid m_{j}) \end{aligned}$$
(3)

where \(\varvec{\theta } = (\mu , \varvec{\phi })\) is the vector of parameters of the background (\(\mu\)) and the induced (\(\varvec{\phi }\)) intensity, respectively; \(f(\varvec{u})\) is the spatial density; \(\tau _{\varvec{\phi }}(\varvec{u}, t)\) is the conditional triggered intensity to the mark value attached to the corresponding point. Under the assumption of spatio-temporal separability, the latter, \(\forall j: t_{j} < t\), can be further factorized (Stindl and Chen 2023; Molkenthin et al. 2022) as the product between the spatial occurrence rate, \(g(\varvec{u}-\varvec{u}_{j})\), the temporal occurrence rate, \(t(t-t_{j})\), and a boost function, which expresses the proportionality with the shifted/truncated exponential law associated to the magnitude distribution, \(\kappa (m_{j})\), i.e.:

$$\begin{aligned} \nu _{\varvec{\phi }}(\varvec{u}-\varvec{u}_{j}, t-t_{j}\mid m_{j}) = f(t-t_{j})g(\varvec{u}-\varvec{u}_{j}) \kappa (m_{j}) \end{aligned}$$
(4)

An Epidemic Type-Aftershocks Sequence (ETAS—Ogata 1988) model, which can be considered as an extension of the Hawkes (1971) model, assumes a branching process of descendants (Adelfio et al. 2006), and is defined according to a parametric conditional intensity function which is expressed as:

$$\begin{aligned} \lambda _{\varvec{\theta }}(\varvec{u}, t |\mathscr {H}_{t}) = \mu f(\varvec{u}) + \kappa _{0}\displaystyle \sum _{t_{j}<t}\frac{e^{\alpha (m_{j}-m_{0})}\left\{ (\varvec{u}-\varvec{u}_{j})^{2}+d \right\} ^{-q}}{(t-t_{j}+c)^{p}} \end{aligned}$$
(5)

where the triggered component results from the product between three components, which are those listed below:

  • the Modified Omori Law (Omori 1895; Utsu 1961), representing the density of aftershocks in time, defining the occurrence rate of aftershocks at time t, following the earthquake of time \(t_{j}\), which is represented by a non-stationary Poisson process (Ogata 1983), and which is expressed as:

    $$\begin{aligned} f(t-t_{j})=\frac{\kappa _{0}}{(t-t_{j}+c)^{p}} \quad \text {with:} \hspace{0.2em} t > t_{j} \end{aligned}$$
    (6)

    where c and p are the temporal displacement and decay parameters; \(\kappa _{0}\) is the normalizing constant linked to the aftershock productivity;

  • the density of aftershocks in space, associated with the spatio-temporal location of the same event,

    $$\begin{aligned} g(\varvec{u}-\varvec{u}_{j}) = \frac{1}{[(\varvec{u}-\varvec{u}_{j})^{2} + d]^{q}} \end{aligned}$$
    (7)

    where d and q are the spatial displacement and decay parameters, respectively;

  • the boost function, \(\kappa (m_{j})\), which indicates the mark density distribution function and is proportional to the assumed shifted exponential density \(J_{\mathcal {M}}(m) = \beta \exp \{-\beta (m_{j}-m_{0})\}\) in (3), as follows:

    $$\begin{aligned} \kappa (m_{j}) = \exp \{\alpha (m_{j} - m_{0})\} \propto \beta \exp \{- \beta (m_{j} - m_{0})\} \quad \text {with:} \hspace{0.2em} m_{0} \le m_{j} \end{aligned}$$
    (8)

    where: \(m_{j}\) represents the magnitude of the jth event; \(\beta = b\log (10)\) is the rate value, which is a function of the b parameter describing the Gutenberg–Richter recurrence law (Gutenberg and Richter 1944—as detailed in Sect.  3.4); \(m_{0}\) is the completeness magnitude threshold; \(\alpha\) describes the expected number of offsprings generated by a single event.

The modified Omori Law in (6) describes the decay of the number of aftershocks after a mainshock event. The decay rates of aftershocks, including the modified Omori Law, are both described by power law decay functions.

The log-likelihood function to be maximized is expressed as:

$$\begin{aligned} \ell _{X}(\varvec{\theta }; X) = \log \left[ \mathcal {L}_{X}(\varvec{\theta }; X) \right] = \displaystyle \sum _{i=1}^{n}\log \left[ \lambda _{\varvec{\theta }}(\varvec{u}_{i}, t_{i} |\mathscr {H}_{t}) \right] - \displaystyle \int _{0}^{T}\int _{W}\lambda _{\varvec{\theta }}(\varvec{u}, t |\mathscr {H}_{t})d\varvec{u}dt \end{aligned}$$
(9)

2.2 Parameters estimation

The set of parameters \(\varvec{\theta } = \{\mu , \kappa _{0}, c, p, \alpha , d, q\}\) is generally estimated through the maximum likelihood (ML) approach (Ogata 1988) or by implementing the expectation-maximization (EM) algorithm to maximize the expected complete data log-likelihood function (Veen and Schoenberg 2008). The background spatial intensity, \(f(\cdot )\), is generally estimated through non-parametric methods (Chiodi and Adelfio 2008), involving kernel-based techniques. Namely, the latter can be estimated through a weighted Gaussian Kernel estimator (Sheather 2004; Silverman 1981), whose formulation is:

$$\begin{aligned} \hat{f}_{\varvec{\Sigma }}(\varvec{u}) = \frac{\displaystyle \sum _{i=1}^{n}\rho _{i}\mathcal {K}(\varvec{u}-\varvec{u}_{i}, \varvec{\Sigma })}{\displaystyle \sum _{i=1}^{n}\rho _{i}} \end{aligned}$$
(10)

where the weight \(\rho _{i}\) is the estimated probability that the event \((\varvec{u}_{i}, t_{i})\) belongs to the background component, and is given by:

$$\begin{aligned} \rho _{i} = \frac{\hat{\mu }\hat{f}_{\varvec{\Sigma }}(\varvec{u}_{i})}{\hat{\lambda }_{\varvec{\hat{\theta }}}(\varvec{u}_{i}, t_{i}|\mathscr {H}_{t})} \end{aligned}$$
(11)

where: \(\hat{\mu }\) is the estimated temporal background intensity; \(\hat{f}_{\varvec{\Sigma }}(\varvec{u}_{i})\) is the Gaussian Kernel estimate of the ith point; \(\hat{\lambda }_{\varvec{\hat{\theta }}}(\varvec{u}_{i}, t_{i}|\mathscr {H}_{t})\) is the estimated conditional intensity value of the ith point, all estimated by the ETAS model.

For the evaluation of the bandwidth, the classical Silverman’s rule of thumb can be used, as well as the cross-validation one proposed in Adelfio and Ogata (2010), or the semi-parametric, simultaneous and alternated Forward Likelihood-based Predictive (FLP) approach, proposed by Chiodi and Adelfio (2017), which maximizes the predictive information. For further details about the FLP approach, see Chiodi and Adelfio (2008) and Adelfio et al. (2013).

3 Results

3.1 Data

The Storm Events Database (National Center for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) of the USA 2022a) is an integrated database containing information on severe weather events that occurred in the USA, from 1950 to 2023. For each severe event, a variety of variables are collected, including severe event type, magnitude (severity), magnitude type, spatio-temporal location, start and end time, event azimuth, direct and indirect impacts, such as the cost of damage to property and crops. Events are recorded if they are of sufficient intensity to cause loss of life, injury, significant property damage and/or disruption of commerce; or if they are considered rare and unusual enough to attract media attention; or if they are associated with other meteorological or climatic events; or if they are consistent with concurrently recorded maximum or minimum temperature values or precipitation amounts. Preliminary information and data are included in the Storm Prediction Center (2023) reports, during the first 120 days from the day of occurrence.

The corresponding documentation is prepared and published by the National Weather Service (1999). The recorded severe occurrences cover 40 types of meteorological and climatic events, including severe convective storms and, in particular, thunderstorms winds and hailstorms. The latter are the most frequent and spatio-temporally dynamic (Kelly et al. 1985) storms in the USA’s history of natural disasters, and they are the only two types of severe weather events that have their own magnitude accompanying them.

In this application, the listed variables are considered for each severe convective storm event:

  • Spatial location, related to longitude and latitude values, recorded in decimal degrees and converted into km;

  • Temporal location, related to the exact occurrence’s hour and day;

  • Magnitude value, i.e., hail extension for hailstorms, recorded and analysed in inches, wind speed for thunderstorms winds, recorded in MPH and analysed in km/s.

The total number of recorded hailstorms’ events, from 1996 to 2022, is 288, 126. The total number of recorded thunderstorms winds’ events, from 2003 to 2022, is 240, 040. They are yearly analysed because of the lack of knowledge about the generating processes of both severe convective storms, i.e., as a first understanding of the hypothesized dynamics of both processes.

The marked spatio-temporal distributions of hailstorms’ (green, Fig. 1a) and thunderstorms winds’ (blue, Fig. 1b) selected events, that occurred within the USA territory, are represented in Fig. 1.

Fig. 1
figure 1

Marked spatio-temporal distribution of hailstorms’ (a) and thunderstorms winds’ (b) selected events

In Fig. 1, events are represented according to their spatio-temporal locations. Spatial (longitude and latitude) and temporal locations are represented in km and days, respectively. The time domain corresponds to the temporal distance, in days, from the occurrence of the first event. Each point is represented with size and color proportional to the recorded magnitude value, represented by hail extension in inches (for hailstorms, Fig. 1a) and thunderstorms winds’ speed in km/s (for thunderstorms winds, Fig. 1b): the higher the recorded magnitude value, the darker and larger the point, for both spatio-temporal patterns.

Only events belonging to the complete catalogues—according to the completeness magnitude thresholds values, as detailed in Sect.  3.4—are shown, i.e., only hailstorms whose hail extension is greater than 0.8 inches and thunderstorms winds with wind speed greater than 1.35 km/s. The total number of hailstorms and thunderstorms winds above the corresponding magnitude threshold, is 188, 075 and 163, 783, respectively. Both patterns are yearly analysed from the marked self-exciting point processes’ point of view. In particular, the occurrence of hailstorms is analysed for each year ranging from 1996 to 2022; the occurrence of thunderstorms winds is yearly analysed for the years that go from 2003 to 2022 (due to the lack of occurrence for the years 1996/2002).

3.2 Exploratory analysis

Figure 2 shows the temporal distribution of the total number of daily recorded hailstorms’ (green, Fig. 2a) and thunderstorms winds’ (blue, Fig. 2b) events.

Fig. 2
figure 2

Temporal distribution of the total number of daily recorded hailstorms’ (a) and thunderstorms winds’ (b) events

As shown in Fig. 2, both patterns follow a seasonal yearly pattern, and a higher number of events is recorded during mid-year. The highest number of severe hailstorms (Fig. 2a) is 488, which occurred on March 21, 2011. More than 400 severe hailstorms are observed on April 9 and 10, 2009, and on May, 21/22 and 24/26, 2011. In 2011, the higher severe hailstorm occurrence with hail extension greater than 0.8 inches are recorded. The highest number of severe thunderstorms winds (Fig. 2b) is 721, which occurred on June 29, 2012. More than 500 severe thunderstorms winds are recorded on 29/30 June, 2012 and on February 28 and March 1, 2017. Finally, at least one severe convective storm is observed within the USA, per type, between January 1st, 2003 and December 31st, 2022.

Figure 3 shows the spatial distributions of the magnitudes, and in particular of the hail extensions, in inches (green, Fig. 3a), and the wind speeds, in km/s (blue, Fig. 3b); the longitude and latitude coordinates’ values are shown in km, and the higher the recorded magnitude value, the darker and larger the associated point.

Fig. 3
figure 3

Spatial distribution of hail extension (a) and wind speed (b)

As shown in Fig. 3, both hailstorms’ and thunderstorms winds’ patterns are mainly detected in the areas from the Central to the Northern-East. As shown in Fig. 3a, events with the highest hail extension, ranging from 5 to 8 inches, are recorded in the Central areas, and other events with higher hail extension are observed in the Northern-East areas. Some aggregations, as well as a dominating general trend towards spatial heterogeneity and sparsity, conditionally to the extension, are observed. As shown in Fig. 3b, thunderstorms winds’ events with the highest wind speeds are grouped in well-defined areas, i.e. nearest to each other if with higher wind speed values. Otherwise, thunderstorms winds’ pattern shows a dominating general trend towards spatial aggregation, conditionally to wind speed.

Figure 4 shows the average daily distributions of hail extension (green, Fig. 4a) and wind speed (blue, Fig. 4b) over the completeness magnitude threshold values, with associated daily minimum and maximum values (grey).

Fig. 4
figure 4

Average, minimum and maximum daily distribution of hail extension (inches, a) and wind speeds (km/s, b) patterns

As shown in Fig. 4, seasonal yearly and grouping behaviour is observed in both hailstorms’ and thunderstorms winds’ average magnitude values, conditionally to the magnitude values. For each year, during 1th and 4th quarters and during 2nd and 3th quarters, respectively, the lowest and highest average daily magnitude values are observed. As shown in Fig. 4a, the average daily hail extension mainly ranges between 0.8 and 2, and a whole decreasing trend (from 2007 to 2019), which becomes increasing (from 2020 to 2022) is observed. As shown in Fig. 4a, the average wind speeds mainly range between 1.35 and 2.20, though some peaks over 2.20 km/s are observed in 2003, 2005, from 2009 to 2012 and from 2017 to 2022. A general increasing trend, as well as increasing variability, from 2003 to 2022, is observed.

3.3 ETAS models identification

The aggregation behaviour of the yearly spatio-temporal patterns is evaluated according to the empirical unweighted K-functions (Ripley 1976; Gabriel and Diggle 2009), whose formulation is:

$$\begin{aligned} K(r, h) = 2\pi \displaystyle \int _{0}^{h}\int _{0}^{r}\frac{\lambda ^{(2)}\left( (\varvec{u}, t), (\varvec{v}, l)\right) }{\lambda ({\varvec{u}}, t)\lambda (\varvec{v}, l)}drdh \quad r, h > 0 \end{aligned}$$
(12)

where: \(r = \Vert \varvec{u}-\varvec{v}\Vert\), is the Euclidean distance (or distance in norm \(L_{2}\)) between spatial locations \(\varvec{u}\) and \(\varvec{v}\); \(h = |t-l|\) is the Manhattan distance (or distance in norm \(L_{1}\)) between temporal locations t and l. Under Complete Spatial and Temporal Randomness (CSTR), the second-order function takes values as that describing the spatio-temporal homogeneous Poisson process: \(K(r, h) = 2\pi hr^{2}\). The unbiased weighted spatio-temporal K-functions’ estimator, \(\hat{K}_{I}(r, h)\), is:

$$\begin{aligned} \hat{K}_{I}(r, h) = \frac{|W ||T |}{n(n - 1)} \displaystyle \sum _{(\varvec{u}, t) \in X} \sum _{(\varvec{v}, l) X_{\backslash \{(\varvec{u}, l)\}}} \frac{\mathbbm {1}\left[ \Vert \varvec{u} - \varvec{v} \Vert \le r, |t - l |\le h \right] }{\hat{\lambda }(\varvec{u}, t)\hat{\lambda }(\varvec{v}, l)w\big ((\varvec{u}, t), (\varvec{v}, l)\big )} \end{aligned}$$
(13)

where: \(\hat{\lambda }(\varvec{u}, t)\) and \(\hat{\lambda }(\varvec{v}, l)\) are the inhomogeneous intensity values evaluated by the model, at points \((\varvec{u}, t)\) and \((\varvec{v}, l)\); \(w\left( (\varvec{u}, t), (\varvec{v}, l)\right)\) is a correction weight which adjusts for possible biases due to the spatio-temporal domain definition (Baddeley and Turner 2000; Gabriel et al. 2013); \(\mathbbm {1}\left[ \cdots \right]\) is the indicator operator, which takes a value of 1, if the statement \(\cdots\) is true and 0 otherwise. Any positive deviation from CSTR indicates spatio-temporal clustering with respect to the spatio-temporal Poisson distribution.

Finally, the chosen correction method is the translation (Ohser 1983) one, which allows amendment with respect to events lying further away from spatial and temporal bounded domains, with respect to those assessed for the event (Gabriel and Diggle 2009).

It is important to highlight that LISTA functions, such as the K and pair correlation functions, are also commonly used for the assessment of point processes’ models’ fitting. In this case, weighted second-order functions have to be computed under the assumption of known first-order intensity function, according to that estimated by the model. Further information on weighted second-order diagnostic methods can be found in Adelfio and Schoenberg (2008), Adelfio and Chiodi (2008) and Adelfio (2007).

Figure 5 shows the unweighted empirical spatio-temporal K-functions calculated for each yearly sub-pattern under translation edge correction, for hailstorms’ and thunderstorms winds’ patterns. In Fig. 5 green and blue surfaces are the empirical unweighted K-functions of the yearly hailstorms’ (green, Fig. 5a) and thunderstorms winds’ (blue, Fig. 5b) sub-patterns, respectively, whose intensity function estimates are computed under the assumption of spatio-temporal homogeneity \(\left( \hat{\lambda }(\varvec{u}, t) = \hat{\lambda }(\varvec{v}, l) \equiv \hat{\lambda } = \frac{N(W\times T)}{|W||T |}\right)\); black surfaces are the theoretical K-functions, computed under CSTR and grey surfaces are upper and lower \(95\%\) envelopes obtained with 39 Montecarlo simulations within the spatio-temporal domain where patterns are observed.

Fig. 5
figure 5

Empirical unweighted K-functions for each yearly hailstorms’ (a) and thunderstorms winds’ (b) pattern, theoretical K-function (black) and \(95\%\) envelopes (grey) under CSTR

As shown in Fig. 5, yearly thunderstorms winds’ patterns (Fig. 5a) are the most aggregated and less variable, while hailstorms’ patterns (Fig. 5b) are the least aggregated and most heterogeneous, along the years of observation. Once we have proved the aggregation behavior of yearly sub-patterns for each severe convective storm, we will try to evaluate them according to the marked self-exciting point processes’ model fitting.

The proposed application for the evaluation of severe convective storms’ underlying processes is based on a declustering technique for clustered processes. The whole pattern is hypothesized to be a superposition of a spatio-temporal Poisson (background) and a clustered (triggered) point pattern. For the resulting pattern to be declustered, the definition of two intensity functions—the former for the background pattern and the latter for the triggered one—is needed. We decide to assume a self-exciting spatio-temporal underlying generating process, subject to a driving random variable (the magnitude).

3.4 Empirical choice of the magnitude threshold

Based on the ETAS model fitting, an important parameter to be chosen in advance is the completeness magnitude threshold, \(m_{0}\), which is assumed to be the minimum value above which all events are included in the catalogue; i.e. the minimum value assumed by the model for the magnitude distribution. The completeness magnitude threshold is linked to the frequency-magnitude distribution (Ishimoto 1939), which for seismic processes is shown to be well approximated by an exponential function, and in particular the Gutenberg–Richter law (Gutenberg and Richter 1944), which is expressed as:

$$\begin{aligned} \log _{10}(N_{m}) = a - bm \end{aligned}$$
(14)

where \(N_{m}\) is the cumulative number of earthquakes with a magnitude value greater than \(m \in \mathcal {M}\); \(\log _{10}(\cdot )\) is the logarithm with base 10; a and b are constants. The completeness magnitude threshold is the value of the maximum frequency at which events are observed.

The log-cumulative frequencies plot, with corresponding completeness magnitude threshold values (in violet) of hailstorms’ (in green, Fig. 6a) and thunderstorms winds’ (in blue, Fig. 6b) patterns is shown in Fig. 6.

Fig. 6
figure 6

Log-cumulative frequency plot of hail extension (a) and wind speed (b), with corresponding completeness magnitude threshold’s values lines

Both values of the completeness magnitude threshold are graphically identified, through the log-cumulative frequency plots. For each plot, the former is identified as a point for which there is a strong change in the line slope. The completeness magnitude threshold values are \(m_{0} = 0.80\) inches, for hailstorms’ pattern, and \(m_{0} = 1.35\), for thunderstorms winds’ pattern.

3.5 Yearly models fitting

We fit an ETAS model for each year of observation, from 1996 to 2022 for the hailstorms pattern, and from 2003 to 2022 for the thunderstorms winds’ pattern. The temporal parameters are estimated by setting up the day on which the events occur, while the spatial parameters refer to longitude and latitude values recorded in kilometers. We tried to consider a linear predictor within which we inserted a linear predictor containing trigonometric functions (such as \(\sin (\cdot )\) and \(\cos (\cdot )\) functions), to capture the daily within-year seasonality effect (clearly shown in Subsect.  3.2), but the estimated parameters for the same covariates are not statistically significant. In this paper, we used the R algorithms in the etasFLP package (Chiodi and Adelfio 2023). For parameter estimation and bandwidth lengths evaluation an Expectation-Maximization (EM) type algorithm—with a weighting step—is used. Namely, during the Expectation step, the bandwidth lengths are evaluated, using Silverman’s rule of thumb and according to the set of weights [each computed with the formulation in (11)] attributed to each point; during the Maximization step, the log-likelihood function (9) is maximized with a Newton-type algorithm, to obtain the set of parameter estimates. The corresponding stop criterion is defined according to some convergence criterions. For the details on the used algorithm, see: Chiodi and Adelfio (2017).

The results of the ETAS model fitting are shown in Figs. 7 and 8, for marked spatio-temporal hailstorms’ and thunderstorms winds’ patterns, respectively. In Figs. 7 and 8, the black lines represent the values of the parameter estimates, and the colored lines above and below are the associated ± estimated standard errors.

Fig. 7
figure 7

Yearly parameter estimates of the ETAS models for hailstorms’ patterns

According to the results shown in Fig. 7, the of background events (Fig. 7a) varies between 0.5 and 2 (according to the estimated minimum and maximum values, in 2007 and 2012) each day. The aftershocks productivity parameter (Fig. 7b) estimate is the highest in 2003 and the lowest in 1998. Several peaks are observed, in 2003, 2006 and 2009, where the background component of the patterns recorded the highest amount of offsprings generation. The spatial (Fig. 7e) and temporal (Fig. 7c) displacement parameter estimates, as well as the spatial (Fig. 7f) and the temporal (Fig. 7d) decay parameter estimates, follow an overall decreasing trend, i.e. the pattern’s ability to reproduce itself decreases over years. According to the estimated parameters of the offspring generation (Fig. 7g) over the years of observation, the hail extension value has an increasingly positive effect on the spatio-temporal reproductivity of the phenomenon, although its estimated uncertainty is the highest.

Fig. 8
figure 8

Yearly parameter estimates of the ETAS models for thunderstorms winds’ patterns

As shown in Fig. 8, the daily rate of background events (Fig. 8a) varies between 0.5 and 1 (according to the estimated minimum and maximum values, in 2010 and 2017); the aftershock productivity parameter (Fig. 8b) follows a variable trend, i.e., peaks are continuously observed, as well as for the temporal displacement (Fig. 8d) parameter, according to which the temporal decay is highly variable along the years of observation. The spatial decay rate (Fig. 2a) follows a constant and mainly uncertain trend, and the estimated spatial displacement parameter’s value (Fig. 8c) slowly decreases until it maintains a constant trend. Finally, the estimated magnitude’s offspring generation’s (Fig. 8g) parameters follow a decreasing trend over the years of observation, i.e., thunderstorms winds’ speed has a decreasing effect on the spatio-temporal reproductivity of the phenomenon.

Finally, Fig. 9 shows the yearly number of events included within the catalogues (Fig. 9a, c) and the estimated proportions of background events (Fig. 9b, d) by the yearly ETAS models, for hailstorms’ (first row) and thunderstorms winds’ (second row) patterns.

Fig. 9
figure 9

Total number of events and estimated proportions of background events by each yearly ETAS model, for hailstorms’ (a, b) and thunderstorms winds’ (c, d) patterns

As shown in Fig. 9, the number of recorded severe (with high magnitude values) events increases until 2011, for both patterns (Fig. 9a, c), and decreases in the following years. Regarding the yearly proportions of background events for the hailstorms’ patterns (Fig. 9b): the maximum number of estimated mainshocks occurs in 1998, when \(16\%\) of events are classified as mainshocks, and linearly decreases until 2008, when \(4\%\) of events are classified as mainshocks. After 2008, the estimated proportion of mainshock events linearly increases until 2022. About the yearly proportions of background events for the thunderstorms winds’ patterns (Fig. 9d): the maximum number of estimated mainshocks occurs in 2022, when \(\approx 6\%\) of events are classified as mainshocks. The lowest number of mainshock events is estimated at 2010, and for the remaining years, it varies between \(2.3\%\) and \(5.5\%\). For both severe convective storms’ sub-patterns, the lower the number of severe convective storms, the higher the proportions of events classified as mainshocks, by the ETAS models.

3.6 Second-order diagnostics

The evaluation of the model fitting, in terms of residual analysis, is carried out by computing the yearly weighted K-functions, with the estimated conditional intensity functions by the fitted ETAS models, under translation edge correction. Figure 10 shows the yearly weighted K-functions (the black and grey surfaces are the same used in Fig. 5), for each yearly hailstorms (green, Fig. 10a) and thunderstorms winds (blue, Fig. 10b) pattern.

Fig. 10
figure 10

Weighted K-functions for each yearly hailstorms (a) and thunderstorms winds’ (b) pattern, computed under the assumption of known conditional intensity function by the fitted ETAS models, theoretical K-function and \(95\%\) envelopes (grey) under CSTR

As shown in Fig. 10, most of the weighted surfaces are completely covered by the \(95\%\) envelopes around the theoretical K-function under CSTR, and some are close to them. As shown in Fig. 10a, one surface is above the \(95\%\) envelopes, and refers to the year 2008, for the associated severe hailstorms yearly process; similarly, as shown in Fig. 10b, one surface is above the \(95\%\) envelopes and refers to the year 2011, for the corresponding severe thunderstorms winds yearly process. In both cases, there are some patterns that have not been fully captured by the fitted ETAS models (for the same years, and the associated patterns, the analysis should be further deepened).

4 Interpretation of the ETAS models’ results

According to the results of the fitted ETAS models, the analysed hailstorms patterns recorded within the USA, between 1996 and 2022, are described according to the following interpretation:

  • the number of severe hailstorms increases over time, and at least 1 hailstorm daily occurs, from 1996 to 2022, in the USA;

  • the proportion of mainshock events, with respect to the detected severe events (events estimated as parents of a group of offspring events), decreases with time, i.e. all minor events are produced according to the spatio-temporal decay of the mainshock events, whose effect spatio-temporally lasts with an ever decreasing (and almost constant) displacement;

  • the hail extension has an increasingly positive and strong effect on the offspring generation of small grasps;

  • the higher the occurrence of daily mainshock events, the higher the aftershocks productivity effects.

while about the analysed thunderstorms winds’ patterns recorded within the USA, between 2003 and 2022, we have:

  • the proportion of mainshock events tends to be low and increases in 2022, when the number of severe convective (thunder)storms is the lowest recorded within the observation period;

  • the higher the number of mainshock events, the higher the aftershock productivity rate;

  • a decreasing temporal displacement effect along the years of observation is associated with a variable temporal decay, i.e. the mainshock effects on the generation of offspring contract and expand with some variability along the years of observation;

  • the decreasing spatial displacement effect is always associated with a constant and approximately equal to 1 spatial decay, i.e. there is no variability around the estimated spatial decay, despite the decreasing trend of the spatial displacement;

  • the wind speed of thunderstorms winds has a decreasing positive effect on the spatio-temporal phenomenon reproductivity.

5 Conclusions

In this paper, we focused on the use of stochastic process theory to evaluate the spatio-temporal behaviour of historical severe convective storms that occurred in the USA, starting from the available Storm Events Database (National Center for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) of the USA 2022a). We considered the hailstorms’ and the thunderstorms winds’ processes and evaluated their spatio-temporal aggregation behaviour from the marked self-exciting point processes point of view.

Our aim was to find a way to capture the yearly spatio-temporal behaviours, i.e. by dividing the temporal domain into sub-intervals, each associated with the corresponding year. We chose the temporal Windowed ETAS model approach, so that we could evaluate the behaviour of the sub-processes for each year of observation. We aimed to fit yearly spatio-temporal ETAS models for our first understanding of the most frequent severe convective storm processes (related to hailstorms and thunderstorms winds) that occurred in the USA (Kelly et al. 1985), between 1996 and 2022, from the marked self-exciting point processes point of view.

We evaluated the yearly patterns’ behaviours and discovered how processes tend to spatio-temporally aggregate in all years of observations, conditionally to the magnitude values: hailstorms’ pattern showed an aggregation behaviour mainly observed in the Central areas of the USA and along the days of the year (for each year), as well as a general trend towards sparsity and heterogeneity of the spatial hail extension within the whole territory; thunderstorms winds’ pattern showed a spatial aggregation behaviour mainly observed in several sub-areas located from the Central to the Northern-East areas of the USA territory, as well as the highest temporal variability along each year of observation. For both patterns, given the aggregated nature of the data, it seemed reasonable to assume spatio-temporal self-exciting underlying processes. We then computed the yearly local indicators of spatio-temporal association (LISTA—Siino et al. 2018), i.e. the unweighted spatio-temporal K-functions. Taking into account the just-evaluated spatio-temporal aggregation behaviour of the yearly patterns, we fitted several yearly spatio-temporal ETAS models, such that yearly patterns were evaluated as self-exciting types. The yearly ETAS models were fitted according to the choice of the completeness magnitude threshold, i.e. by selecting only the events that make the catalogue complete. The complete magnitude threshold values were graphically assessed, according to the magnitude-frequency distributions plots (describing an exponential Richter-Gutenberg decay law). The chosen completeness magnitude threshold values were \(m_{0}=0.80\) inches, for hail extension, and \(m_{0}=1.35\) km/s, for thunderstorms winds’ speed. Beyond these values, the catalogue should be complete, i.e. without missing events. The number of selected severe convective storms increased until 2011, and then slowly decreased in the successive years.

For all years of observation, the estimated parameters are always significant and we noticed how in some cases they follow a coherent (increasing or decreasing) long-term trend, or how they show a variable, jagged and (in some cases) uncertain trend.

For hailstorms’ pattern, after 2008, a decrease in the spatial and temporal displacements and an associated decrease in the spatial and temporal decay laws is estimated, along the years of observation, so the spatio-temporal reproduction effects around the parents’ centers, in terms of spatio-temporal distances, tend to reduce, as well as the aftershock productivity (and the number of offspring events) tends to be subject to the occurred mainshocks at highest spatio-temporal distances. The recorded magnitude values tend to have an increasing positive effect on the offspring generation, while the marginal aftershocks productivity parameter is estimated to have an increasing value, until 2010, to gradually decrease until 2017, and again to slowly increase along the remaining years of observation. The average number of days between two mainshocks tends to vary between the minimum temporal interval, corresponding to a half of day, and a maximum temporal interval of two days. For thunderstorms winds’ patterns, a decrease in the spatial displacement is observed, as well as a tendency of the temporal displacement and decay laws estimated parameters, to assume lower and lower values, along the years of observation. For thunderstorms winds’ patterns, the spatial decay law estimate is approximately equal to 1, and at a higher number of mainshocks events, a higher aftershocks productivity is always associated. Finally, for thunderstorms winds’ patterns, a tendentially decreasing trend in the magnitude effect is estimated along the years of observation.

Finally, the estimated proportions of mainshock events tend to decrease for the hailstorms’ patterns: a mainshocks percentage between 4% and \(16\%\) is always observed, and those for thunderstorms winds’ patterns tend to vary along the years, until 2022, when the same assumes the highest value.

In further applications, we will consider the influence of covariates, as well as atmospheric and environmental variables, and eventually, several and different declustering techniques could be experimented.