1 Introduction

Flood Frequency Analysis (FFA) aims to describe the relationship between flood magnitude and its frequency. It is used in the design of civil infrastructure, risk assessment and mitigation. One of the basic hypotheses underlying FFA is that the hydrological quantity of interest can be studied in a probabilistic framework under the assumption that flood observations can be considered to be a sample of an independent identically distributed random variable (Stedinger et al. 1993). A number of probability models have been used in FFA, the most common being Lognormal, Gumbel (GUM), Generalized Extreme Value (GEV), Pearson type 3 (PE3), Log-Pearson type 3 (LP3), Generalized Pareto (GPA) and Generalized Logistic (GLO) (e.g., Cunnane 1989; Laio et al. 2009; Rao and Hamed 2000).

As noted by Klemes (2000), a tension has existed since the beginnings of the science of hydrology, driven by the need to extrapolate values of interest from observations limited in time and space (Fuller 1914; Hazen 1914a, b). Warnings “to recognize the nature of the physical processes involved and their limitations in connection with the use of statistical methods” were also raised by Horton (1931). One of the more pressing problems in FFA considered unresolved at the end of the twentieth century (e.g., Bobée and Rasmussen 1995; Singh and Strupczewski 2002) concerns the need to better integrate knowledge of hydrological processes into the FFA methods used for engineering problems, in order to reduce the gap between practice and research in hydrology.

The development of asymptotic Extreme Value Theory provided a basis for a more rigorous foundation in the choice of probability model. Gumbel (1958), in the first chapter of his book on the Statistics of Extremes, while focusing on “the flood problem”, stated: “until recent years, engineers employed purely empirical procedures for the analysis of hydrological data and, especially, the evaluation of the frequencies of floods and droughts”.

The asymptotic properties of extreme value distributions have recently been revisited exploiting new theoretical insights and larger datasets (El Adlouni et al. 2008; Koutsoyiannis 2004a, b; Papalexiou and Koutsoyiannis 2013; Papalexiou et al. 2013; Serinaldi and Kilsby 2014). Koutsoyiannis (2004a) acknowledged that the convergence to the asymptotic distributions can be too slow compared to the usual length of extreme event samples and called for renewed attention to the development of exact analytical models for the analysis of extremes (see also De Michele 2019; Lombardo et al. 2019). In fact, parallel to the asymptotic Extreme Value Theory is the theory of exact distributions of extremes, first presented by Todorovic (1970) and Todorovic and Zelenhasic (1970). This theory seeks a stronger link between the physics of floods (rainfall) and stochastic processes. In particular, it allows one to derive the distribution of the Annual Maximum (AM) value from hydrological series characterized by Poissonian occurrences and an assigned distribution of exceedances. In this framework the distributions of the GEV family have been recognized as the exact distributions of annual maxima arising from a Generalized Pareto distribution of exceedances (Madsen et al. 1997). Furthermore, Rossi et al. (1984) derived, from a mixture of exponential exceedances, the Two-Component Extreme Value (TCEV) distribution, whose investigation is the main focus of this paper.

The interest in the use of mixed distributions dates back to Potter (1958) who observed the so-called dog-leg in flood frequency plots of Eastern United States. He stated that switching between different flood-producing processes (e.g., rainfall runoff and snowmelt) may lead to such a step-change in the slopes of the flood frequency curve. Cunnane (1985) recognized that flood distributions can be affected by the presence of different subpopulations due to different causes for flood generation. He also stated that a mixture of two subpopulations might be useful to explain the presence of very large outliers and the condition of separation of skewness observed by Matalas et al. (1975).

The physical processes affecting flood generation are numerous and may include different types of atmospheric forcing (e.g., Gaál et al. 2015; Hirschboeck 1987; House and Hirschboeck 1997; Merz and Bloschl 2003), different runoff generation mechanisms (saturation/infiltration excesses, e.g., Iacobellis and Fiorentino 2000; Sivapalan et al. 1990), and influence of fluvial geomorphological features (Archer 1981; Beven and Wood 1983; Woltemade and Potter 1994).

In extensive analyses of rainfall extreme values (Papalexiou and Koutsoyiannis 2013; Serinaldi and Kilsby 2014) the often-observed heavy tail of the frequency distribution has been attributed to the mixture of different processes (extreme and non-extreme events) or to temporal fluctuations of the parent distribution driven by oscillations of climate and other physical mechanisms controlling the rainfall process.

Open issues in flood-type classification are discussed by Sikorska et al. (2015), particularly the uncertainty associated with flood classification. A number of recent studies highlight the growing interest in this field of research due to availability of large datasets, new kinds of observations and more powerful (analytical and computational) tools (e.g., Barth et al. 2016; Smith et al. 2011, 2018; Szolgay et al. 2016; Villarini 2016; Villarini and Smith 2010; Villarini et al. 2011; Tarasova et al. 2020).

In recent decades rising attention has been devoted to the influence of climate and environmental change on flood frequency, the role of changing flood generating mechanisms or other sources of “permanent change” in time series (e.g., Koutsoyiannis 2013, Bloschl et al. 2017, 2019; Do et al. 2017; Hesarkazzazi et al. 2021; Ishak and Rahman 2019; Tabari 2020; Vormoor et al. 2015, 2016; Yan et al. 2017). To account for change, several authors (e.g., Khaliq et al. 2006; Salas and Obeysekera 2014; Strupczewski and Kaczmarek 2001; Strupczewski et al. 2001a, b) have suggested the use of distributions with parameters dependent on time or other covariates accounting for trends in mean and/or variance. Zeng et al. (2014) suggested the use of mixed distributions when a change point is detected in time series analysis.

This paper investigates whether the use of mixed distributions coupled with Poissonian occurrences can be used to model the frequency of flood series affected by low-frequency (decadal) climate fluctuations. In many areas of the world, consistent results have been obtained by studying the influence of global climate indices such as the El Niño/Southern Oscillation (ENSO) on hydrological variables (e.g., Vecchi and Wittenberg 2010). In the coastal basins of Eastern Australia, Verdon et al. (2004) observed lower rainfall and increased air temperatures in El Niño (warm phase of ENSO) years when compared with La Niña years. Moreover, it has been shown that multidecadal variations in the Pacific Decadal Oscillation (PDO) and Interdecadal Pacific Oscillation (IPO) indices modulate the frequency of ENSO events (Kiem and Franks 2004; Kiem et al. 2003; Power et al. 1999). Both the IPO and PDO indices are subject to chaotic effects, which prevents deterministic prediction of the climate state (Franks 2004). Nonetheless, according to Kiem et al. (2003), these indices are characterized by persistent phases at decadal scales. Kiem et al. (2003) also showed, by using stratified data, that flood distributions in Eastern Australia are affected by low-frequency climate oscillation related to ENSO and its multidecadal IPO modulation.

We focus on the theoretical properties of exact distributions of extremes arising from compound processes and provide a general theoretical framework aimed to extend the use of mixed distributions to time series affected by climate fluctuations at interannual scales. In particular, for modeling occurrences that may take place in years characterized by different interannual conditions, we exploit properties of thinning and superposition of Poissonian processes within the classical theoretical framework of compound Poisson processes.

The descriptive capacity of the TCEV distribution and the physical consistency of its estimated parameters with the proposed theoretical development are assessed in a case study involving flood series in eastern Australia. We find that the TCEV is a good candidate for modelling the frequency of flood series affected by low-frequency (decadal) climate fluctuations. However, its parameters were found to sometimes lie outside the TCEV domain on the L-moment ratio diagram (LMRD) presented by Gabriele and Arnell (1991). We therefore extend the LMRD domain of the TCEV distribution to provide a graphical tool to help assess whether the TCEV is a reasonable option for observed data in comparison with many other distributions of extremes.

The paper is structured as follows. Section 2 introduces the theory underlying the derivation of TCEV distribution. In Sect. 3 a theoretical background is first developed for processes with interannual modulation of flood risk and then followed by the analytical derivation of AM flood models based on mixture processes. The conditions under which the TCEV distribution emerges as the AM distribution are identified. Section 4 describes the updating of the TCEV L-moment ratio diagram. Section 5 presents application of the TCEV to Eastern Australian AM flood data, previously shown to be affected by multidecadal variation in IPO phases. A discussion of results and their physical interpretation is reported in Sect. 6. Conclusions and final remarks are summarized in Sect. 7.

2 Theoretical background and development

2.1 Distribution of the largest exceedance in a year

The main concepts of the theory of exact distributions of extremes are described in the works of Todorovic (1970), Todorovic and Yevjevich (1969), Todorovic and Zelenhasic (1968, 1970) and Zelenhasic (1970). Consider a streamflow time series represented by a positive random variable \(Q\) and let \(n\) be the number of exceedances over a certain threshold \({q}_{0}\) during the time interval \(\left[0,t\right].\) The \(j\)-th exceedance (\(j=1,2,\ldots,n\)), occurring at time \({t}_{j}\), assumes the value:

$$X_{j} = Q_{j} - q_{0}$$
(1)

This is a marked point stochastic process (e.g., Snyder and Miller 2012). We adopt a block maxima approach, where the highest of all recorded values for a given time period or “block” is investigated. Given our focus on annual maxima, the interval \(\left[0,t\right]\) is set equal to one year. According to Todorovic (1970), we assume that both the exceedances \({X}_{j}\) and their number of occurrences, \(N\), are random variables. Therefore, as observed by Lombardo et al. (2019), \(n\) can be considered a realization of \(N\).

If exceedances are independent and identically distributed as \(H(x)=\Pr\left\{X\le x\right\}\), the cumulative distribution function (cdf) of the largest exceedance in any year, \(Y=\max\left\{{X}_{1},\dots ,{X}_{n}\right\}\), can be shown to be (Zelenhasic 1970, p. 8)

$$F(y) = \Pr({Y \le y}) = \mathop \sum \limits_{n = 0}^{\infty } \left\{ {\left[ {H(y)} \right]^{n} p_{N}(n)} \right\}$$
(2)

where \({p}_{N}(n)\) is the probability of observing \(n\) exceedances in a year.

Suppose \(N\) follows a Poisson distribution. Then:

$$p_{N}(n) = \frac{{\lambda^{n} e^{ - \lambda } }}{n!}$$
(3)

where \(\uplambda\) represents the mean number of exceedances per year. It then follows that

$$F(y) = e^{ - \lambda } \left\{ {1 + \mathop \sum \limits_{n = 1}^{\infty } \frac{{\left[ {\lambda H(y)} \right]^{n} }}{n!}} \right\} = \exp \left\{ { - \lambda \left[ {1 - H(y)} \right]} \right\}$$
(4)

According to Todorovic (1978), a common distribution function for \(H(x)\) can be of the exponential type, i.e.,

$$H(x) = 1 - e^{ - x/\beta }$$
(5)

with \(\beta\) equal to the mean exceedance. Substituting Eq. (5) into Eq. (4) leads to:

$$F(y) = \exp \left( { - \lambda e^{ - y/\beta } } \right)$$
(6)

which is the Gumbel distribution. It should be remarked that, as presented in Gumbel (1958), this distribution also holds for negative values. If \(H(x)\) corresponds to the Generalized Pareto distribution, then \(F(y)\) follows a GEV distribution (Madsen et al. 1997).
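As a sanity check of Eqs. (3)–(6), the compound process can be simulated directly: the annual maximum of a Poisson number of exponential exceedances should follow the Gumbel distribution of Eq. (6). Below is a minimal Monte Carlo sketch; the parameter values are illustrative assumptions, not taken from this paper.

```python
import math
import random

random.seed(42)

lam, beta = 5.0, 10.0   # illustrative rate and mean exceedance (assumed values)
n_years = 20000

def poisson(lam):
    # Knuth's method; adequate for small rates
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Annual maximum Y = max(X_1..X_N), N ~ Poisson(lam), X_j ~ Exp(mean beta);
# Y = 0 (no exceedance in the year) with probability exp(-lam)
annual_max = []
for _ in range(n_years):
    n = poisson(lam)
    y = max((random.expovariate(1.0 / beta) for _ in range(n)), default=0.0)
    annual_max.append(y)

# The empirical cdf should match Eq. (6) within sampling error
for y in (10.0, 20.0, 40.0):
    empirical = sum(v <= y for v in annual_max) / n_years
    theoretical = math.exp(-lam * math.exp(-y / beta))
    assert abs(empirical - theoretical) < 0.02
```

The `default=0.0` in `max` handles years with no exceedance, whose probability \(e^{-\lambda}\) is exactly the mass that Eq. (4) assigns at \(y=0\).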

2.2 Mixture distributions and two-component Poissonian model

In flood frequency analysis, mixture distributions were developed to account for the presence of two (or more) physical processes responsible for extreme hydrological events. Two broad classes of mixtures have been identified (Alila and Mtiraoui 2002; Waylen et al. 1993).

The first class consists of additive mixture models (Moran 1959), which are appropriate when only one process generates the annual maximum flood in a year. Mixtures of lognormal distributions were studied by Alila and Mtiraoui (2002) and Singh and Sinclair (1972). More complex additive mixture models were developed by Grego and Yates (2010), Shin et al. (2015), and Yan et al. (2017, 2019).

The second class consists of compound models (Gumbel 1958), which are appropriate when two or more different processes may generate the annual maximum flood. Early applications were developed by Waylen and Woo (1982) and Rossi et al. (1984) followed by Strupczewski et al. (2009), Rulfova et al. (2016) and Fischer et al. (2016).

2.3 Two-component Poisson models

Define \({H}_{1}(x)\) and \({H}_{2}(x)\) as the distributions of exceedances of the first and second components. In the mixture process approach, the distribution of an exceedance arising from the two components is:

$$H(x) = p H_{1}(x) + \left( {1 - p} \right) H_{2}(x)$$
(7)

where \(p\) is the probability of the exceedance being sampled from \({H}_{1}(x).\)

If the annual numbers of exceedances of the two components, \({N}_{1}\) and \({N}_{2}\), follow Poisson distributions with parameters \({\uplambda }_{1}\) and \({\uplambda }_{2}\), then the total annual number of exceedances \(N={N}_{1}+{N}_{2}\) is also Poisson distributed with parameter \({\uplambda =\uplambda }_{1}+{\uplambda }_{2}\). This property is known as superposition of Poisson processes. Under the hypothesis that component exceedances are exponentially distributed, combining Eqs. (4) and (7), Rossi et al. (1984) obtained:

$$F(y) = \exp \left( { - \lambda_{1} e^{{ - \frac{y}{{\theta_{1} }}}} - \lambda_{2} e^{{ - \frac{y}{{\theta_{2} }}}} } \right)$$
(8)

which represents the Two-Component Extreme Value (TCEV) distribution with constraints \({\uplambda }_{1}>{\uplambda }_{2}\ge 0\). The first two parameters (\({\uplambda }_{1}\) and \({\uplambda }_{2}\)) represent the mean number of independent peaks in a year for each component, while the last two parameters (\({\theta }_{1}\) and \({\theta }_{2}\)) represent the mean peak amplitude of the ordinary and the outlying components respectively.
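Equation (8) is straightforward to evaluate numerically, and design quantiles for a given return period can be obtained by inverting the cdf. The sketch below uses bisection; the parameter values are illustrative assumptions, not estimates from the case study.

```python
import math

def tcev_cdf(y, lam1, theta1, lam2, theta2):
    """Eq. (8): cdf of the annual maximum for the two-component process."""
    return math.exp(-lam1 * math.exp(-y / theta1)
                    - lam2 * math.exp(-y / theta2))

def tcev_quantile(T, lam1, theta1, lam2, theta2):
    """Flood magnitude with return period T years, by bisection on Eq. (8)."""
    p = 1.0 - 1.0 / T
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tcev_cdf(mid, lam1, theta1, lam2, theta2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative parameters: a frequent "ordinary" component and a rarer,
# larger "outlying" component (lam1 > lam2 >= 0)
q100 = tcev_quantile(100.0, lam1=10.0, theta1=8.0, lam2=0.5, theta2=30.0)
assert abs(tcev_cdf(q100, 10.0, 8.0, 0.5, 30.0) - 0.99) < 1e-9
```

Bisection is used rather than a closed form because, unlike the Gumbel case, the TCEV cdf is not analytically invertible.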

The TCEV can be re-expressed using the dimensionless parameters:

$$\theta_{*} = \frac{\theta_{2} }{\theta_{1} },\qquad \lambda_{*} = \frac{\lambda_{2} }{\lambda_{1}^{1/\theta_{*} } }$$
(9)

to give

$$F(y) = \exp \left( { - \lambda_{1} e^{{ - \frac{y}{{\theta_{1} }}}} - \lambda_{*} \lambda_{1}^{{1/\theta_{*} }} e^{{ - \frac{y}{{\theta_{*} \theta_{1} }}}} } \right)$$
(10)
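The equivalence of the two parameterizations can be verified numerically: substituting Eq. (9) into Eq. (10) recovers Eq. (8) exactly, since \(\lambda_{*}\lambda_{1}^{1/\theta_{*}}=\lambda_{2}\) and \(\theta_{*}\theta_{1}=\theta_{2}\). A brief sketch with illustrative parameter values:

```python
import math

# Illustrative TCEV parameters (assumed values)
lam1, theta1, lam2, theta2 = 12.0, 5.0, 0.8, 20.0

theta_star = theta2 / theta1                  # Eq. (9)
lam_star = lam2 / lam1 ** (1 / theta_star)    # Eq. (9)

def F_eq8(y):
    # Eq. (8), original parameterization
    return math.exp(-lam1 * math.exp(-y / theta1)
                    - lam2 * math.exp(-y / theta2))

def F_eq10(y):
    # Eq. (10), dimensionless parameterization
    return math.exp(-lam1 * math.exp(-y / theta1)
                    - lam_star * lam1 ** (1 / theta_star)
                    * math.exp(-y / (theta_star * theta1)))

for y in (1.0, 10.0, 50.0):
    assert abs(F_eq8(y) - F_eq10(y)) < 1e-12
```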

3 Modeling low-frequency climate variability

The primary focus of this paper is on exploiting properties of two-component Poisson models to carry out flood frequency analysis accounting for the presence of low-frequency variability (e.g., ENSO, PDO), which modulates flood risk over multiyear time scales. Let the low-frequency variability be represented by alternating wet/dry epochs (years), here identified by the subscript \(i=1,2\), and assume that their lengths (expressed in number of years) follow Poisson distributions with means \({\mu }_{i}\), so that \(\mu ={\mu }_{1}+{\mu }_{2}\).

The process of occurrences along a timeline of alternating epochs can be modelled exploiting properties associated with thinning and superposition of Poisson processes (e.g., Lewis and Shedler 1979), where:

  • Thinning: suppose \(N\) is a Poisson process (with parameter \(\uplambda\)) counting the number of occurrences, where each event is of one of two types: Type 1 or Type 2. Let \({p}_{j}\) denote the probability that an event is of Type \(j\), with \(j = 1, 2\). The process \({N}_{j}\) counting the events of Type \(j\) is a Bernoulli-thinned Poisson process with parameter \({\uplambda }_{j}={p}_{j}\uplambda\) (Chandramohan and Liang 1985; Isham 1980).

  • Superposition: suppose \({N}_{1}\) and \({N}_{2}\) are independent Poisson counting processes with parameters \({\uplambda }_{1}\) and \({\uplambda }_{2}\) respectively. The sum process \(N = {N}_{1} + {N}_{2}\), consisting of all arrivals generated by process 1 and process 2, is a Poisson counting process with parameter \(\uplambda ={\uplambda }_{1}+{\uplambda }_{2}\).
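Both properties are easy to verify by simulation. The sketch below (with illustrative rates, assumed for the example) checks that a Bernoulli-thinned count keeps the Poisson signature mean = variance, and that the superposed process has the sum of the component rates.

```python
import math
import random

random.seed(1)

lam, p1 = 6.0, 0.3      # illustrative overall rate and Type-1 probability
n_years = 50000

def poisson(lam):
    # Knuth's method; adequate for small rates
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Mark each of N ~ Poisson(lam) events as Type 1 with probability p1
# (Bernoulli thinning); Type 2 takes the rest
n1_counts, n2_counts = [], []
for _ in range(n_years):
    n = poisson(lam)
    n1 = sum(random.random() < p1 for _ in range(n))
    n1_counts.append(n1)
    n2_counts.append(n - n1)

# Thinning: N1 is Poisson with rate p1 * lam, so its mean equals its variance
mean1 = sum(n1_counts) / n_years
var1 = sum((c - mean1) ** 2 for c in n1_counts) / n_years
assert abs(mean1 - p1 * lam) < 0.05
assert abs(var1 - p1 * lam) < 0.1

# Superposition: the total count N1 + N2 has rate lam1 + lam2 = lam
mean_total = sum(a + b for a, b in zip(n1_counts, n2_counts)) / n_years
assert abs(mean_total - lam) < 0.05
```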

Based on these properties of Poisson processes, in the next sections we develop two cases. In Case 1 we assume that the influence of climatic fluctuations is sufficiently strong that the type of flood event is almost deterministically dependent on the climatic epoch, so that different types of events arise only from different epochs. In Case 2 we assume that two (or more) different types of event may occur in any of the climatic epochs. For each case, we derive the marginal annual maximum distribution using the mixed process approach and show which additional assumptions are needed to produce a TCEV distribution of the marginal AM values.

3.1 Case 1: One component per epoch

Suppose the flood generation process in each year of epoch \(i\) is characterized by a Poissonian number of occurrences, \({N}_{i}\), with mean value \(\lambda_{i}^{\prime }\) and distribution of exceedances \({H}_{i}(x)\)

$$H_{i}(x) = \Pr({X \le x|yr \in {\text{epoch }}i}) = \Pr({X_{i} \le x})$$
(11)

reflecting the different interannual hydrological or climatological conditions peculiar to epoch \(i\). We show that the distribution of AM may arise from a timeline of random years characterized by a stochastic alternation of different epochs, under the hypothesis that the type of hydrological event triggered in a year is conditional on the epoch.

In particular, in this case, we assume that in dry (wet) epochs, Type 1 (Type 2) events prevail. In other words, Type \(i\) events are typical of epoch \(i\). Their counting process can be described as a thinned Poisson process \({N}_{1}\) (\({N}_{2}\)) characterized by probability \({p}_{1}\) (\({p}_{2}\)) with

$$p_{i} = \frac{{\mu_{i} }}{\mu }$$
(12)

The superposition of the two thinned Poisson processes \({N}_{1}\) and \({N}_{2}\), obtained from the within-epoch processes with parameters \(\lambda_{1}^{\prime }\) and \(\lambda_{2}^{\prime }\), provides the overall annual process accounting for intra- and inter-annual variability, which is still a Poisson process with parameter

$$\lambda = p_{1} \lambda_{1}^{\prime } + p_{2} \lambda_{2}^{\prime } = \lambda_{1} + \lambda_{2} .$$
(13)

Following this interpretation, the distribution of exceedances is represented by a mixture distribution of two components, descending from Eq. (7):

$$H(x) = \frac{{\lambda_{1} H_{1}(x) + \lambda_{2} H_{2}(x)}}{\lambda }$$
(14)

which when substituted into Eq. (4) with exponential distributions of exceedances \({H}_{1}(x)\) and \({H}_{2}(x)\), results in a TCEV distribution (Eq. 8) of AM with Poisson parameters

$$\lambda_{i} = \frac{{\mu_{i} }}{\mu }\lambda_{i}^{\prime }$$
(15)

Note that, in this case, if the flood generation mechanism is not affected by epoch (i.e., the distribution of exceedances is unconditional on epoch), then \({\theta }_{1}={\theta }_{2}\) and \(F(y)\) reduces to a Gumbel distribution.
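The Case-1 construction can be verified by simulation: drawing a Poisson(\(\lambda\)) number of events per year, each sampled from the mixture of Eq. (14), reproduces the TCEV cdf of Eq. (8) with the thinned parameters of Eq. (15). A minimal sketch with illustrative values (assumed, not taken from the paper):

```python
import math
import random

random.seed(7)

# Illustrative Case-1 values (assumed)
mu1, mu2 = 15.0, 10.0            # mean epoch lengths (years)
p1 = mu1 / (mu1 + mu2)           # Eq. (12)
lam1p, theta1 = 8.0, 6.0         # lam_1' and theta_1 (epoch-1 component)
lam2p, theta2 = 3.0, 18.0        # lam_2' and theta_2 (epoch-2 component)

lam1 = p1 * lam1p                # Eq. (15), i = 1
lam2 = (1 - p1) * lam2p          # Eq. (15), i = 2
lam = lam1 + lam2                # Eq. (13)

def poisson(lam):
    # Knuth's method; adequate for small rates
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Draw Poisson(lam) events per year, each from the mixture of Eq. (14)
n_years = 30000
maxima = []
for _ in range(n_years):
    peaks = []
    for _ in range(poisson(lam)):
        scale = theta1 if random.random() < lam1 / lam else theta2
        peaks.append(random.expovariate(1.0 / scale))
    maxima.append(max(peaks, default=0.0))

# The annual maxima should follow the TCEV of Eq. (8)
for y in (20.0, 40.0, 80.0):
    F = math.exp(-lam1 * math.exp(-y / theta1) - lam2 * math.exp(-y / theta2))
    empirical = sum(v <= y for v in maxima) / n_years
    assert abs(empirical - F) < 0.02
```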

3.2 Case 2: Two components per epoch

Suppose within each epoch \(i\), two independent flood processes, \(k=1,2\), generate floods with Poisson numbers of occurrences \({N}_{k,i}\) with means \(\lambda_{k,i}^{\prime }\) and exceedances with distribution

$$H_{k,i}(x) = P( {X_{k}\leq x |{\text{epoch}} \, i}),\quad \left( {i = 1,2; \;k = 1,2} \right)$$
(16)

where \((k,i)=(1,1), (1,2), (2,1), (2,2)\).

Thus, two different types of hydrological events may be triggered in any year, with the distribution conditional on epoch. Four Poisson processes \({N}_{k,i}\) occur, two in each epoch \(i\) (\({N}_{1,1}\), \({N}_{2,1}\) in epoch 1; \({N}_{1,2}\), \({N}_{2,2}\) in epoch 2), with parameters \(\lambda_{1,1}^{\prime }\), \(\lambda_{1,2}^{\prime }\), \(\lambda_{2,1}^{\prime }\), \(\lambda_{2,2}^{\prime }\) and four exceedance distributions \({H}_{1,1}(x)\), \({H}_{1,2}(x)\), \({H}_{2,1}(x)\), \({H}_{2,2}(x)\).

Considering a Bernoulli thinning of those Poisson processes, applied for each epoch \(i\) to both \({N}_{k,i}\) (for \(k=1, 2\)), the overall Poisson process accounting for any type of event in any year is obtained by the superposition of four thinned Poisson processes with parameters:

$$\lambda_{k,i} = p_{i} \lambda_{k,i}^{\prime } = \frac{{\mu_{i} \lambda_{k,i}^{\prime } }}{\mu }$$
(17)

namely:

$$\begin{array}{*{20}l} {\lambda_{1,1} = p_{1} \lambda_{1,1}^{\prime } \;\;{\text{and}}\;\; \lambda_{2,1} = p_{1} \lambda_{2,1}^{\prime } } \hfill & {{\text{relative to epoch 1}}} \hfill \\ {\lambda_{1,2} = p_{2} \lambda_{1,2}^{\prime } \;\;{\text{and}}\;\; \lambda_{2,2} = p_{2} \lambda_{2,2}^{\prime } } \hfill & {{\text{relative to epoch 2}}} \hfill \\ \end{array}$$
(18)

The mixture distribution of exceedances is obtained as:

$$\begin{aligned} H(x) = & \frac{{p_{1}( {\lambda_{1,1}^{\prime } H_{1,1}(x) + \lambda_{2,1}^{\prime } H_{2,1}(x)} ) + p_{2}( {\lambda_{1,2}^{\prime } H_{1,2}(x) + \lambda_{2,2}^{\prime } H_{2,2} (x)})}}{\lambda } \\ & = \frac{{\lambda_{1,1} H_{1,1}(x) + \lambda_{2,1} H_{2,1}(x) + \lambda_{1,2} H_{1,2}(x) + \lambda_{2,2} H_{2,2}(x)}}{\lambda } \\ \end{aligned}$$
(19)

with

$$\lambda = p_{1}( {\lambda_{1,1}^{\prime } + \lambda_{2,1}^{\prime } }) + p_{2}( {\lambda_{1,2}^{\prime } + \lambda_{2,2}^{\prime } }) = \lambda_{1,1} + \lambda_{2,1} + \lambda_{1,2} + \lambda_{2,2}$$
(20)

and the distribution of AM is found by inserting Eq. (19) in Eq. (4).

If all exceedance distributions are exponential, the AM distribution has eight parameters (\(\lambda_{1,1}\), \(\lambda_{1,2}\), \(\lambda_{2,1}\), \(\lambda_{2,2}\) and \(\theta_{1,1}\), \(\theta_{1,2}\), \(\theta_{2,1}\), \(\theta_{2,2}\)). The AM mixed distribution of Case 1 is trivially obtained from this general formulation by assuming \(\lambda_{1,2}=\lambda_{2,1}=0\).

A TCEV distribution is also obtained by assuming \(\theta_{1}=\theta_{1,1}=\theta_{1,2}\) and \(\theta_{2}=\theta_{2,1}=\theta_{2,2}\), hence:

$$H_{1}(x) = H_{1,1}(x) = H_{1,2}(x) = 1 - e^{{ - x/\theta_{1} }}$$
(21)

and

$$H_{2}(x) = H_{2,1}(x) = H_{2,2}(x) = 1 - e^{{ - x/\theta_{2} }}$$
(22)

stating that the hydrological conditions depending on the epoch affect only the number of occurrences of events of different types, and not their distributions of exceedances. The remaining two TCEV parameters can be shown to be

$$\lambda_{1} = p_{1} \lambda_{1,1}^{\prime } + p_{2} \lambda_{1,2}^{\prime } = \frac{{\mu_{1} \lambda_{1,1}^{\prime } }}{\mu } + \frac{{\mu_{2} \lambda_{1,2}^{\prime } }}{\mu }$$
(23)
$$\lambda_{2} = p_{1} \lambda_{2,1}^{\prime } + p_{2} \lambda_{2,2}^{\prime } = \frac{{\mu_{1} \lambda_{2,1}^{\prime } }}{\mu } + \frac{{\mu_{2} \lambda_{2,2}^{\prime } }}{\mu }$$
(24)
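Because the scale parameters are shared across epochs, Eqs. (23)–(24) follow simply by grouping the four terms of Eq. (19) by event type when Eq. (19) is inserted into Eq. (4). The identity can be checked numerically; all rates below are illustrative assumptions.

```python
import math

# Illustrative epoch probabilities and within-epoch rates (assumed)
p1, p2 = 0.6, 0.4
lam_p = {(1, 1): 9.0, (1, 2): 5.0,   # Type-1 rates in epochs 1 and 2
         (2, 1): 0.4, (2, 2): 1.5}   # Type-2 rates in epochs 1 and 2
theta = {1: 7.0, 2: 25.0}            # shared per-type scales (Eqs. 21-22)

# Thinned rates, Eq. (17)
lam = {(k, i): (p1 if i == 1 else p2) * lam_p[(k, i)]
       for k in (1, 2) for i in (1, 2)}

def F_four(y):
    """Eq. (4) applied to the four-component mixture of Eq. (19)."""
    s = sum(lam[(k, i)] * math.exp(-y / theta[k])
            for k in (1, 2) for i in (1, 2))
    return math.exp(-s)

# TCEV parameters from Eqs. (23)-(24)
lam1 = lam[(1, 1)] + lam[(1, 2)]
lam2 = lam[(2, 1)] + lam[(2, 2)]

def F_tcev(y):
    """Eq. (8) with the grouped parameters."""
    return math.exp(-lam1 * math.exp(-y / theta[1])
                    - lam2 * math.exp(-y / theta[2]))

for y in (5.0, 20.0, 60.0):
    assert abs(F_four(y) - F_tcev(y)) < 1e-12
```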

In the above-described conditions, the TCEV distribution holds as the distribution of AM affected by both intra- and inter-annual mixed processes. This development could easily be extended to more than two types of events by increasing the range of \(k\). The marginal AM always results in a TCEV distribution because the number of components is equal to the number of climatic epochs.

The conditions \(\theta_{1}=\theta_{1,1}=\theta_{1,2}\) and \(\theta_{2}=\theta_{2,1}=\theta_{2,2}\) are consistent with a known property of the exponential distribution (often referred to as the memoryless property). Following this property, the distribution of exceedances above a threshold is independent of the threshold value and, as a consequence, only the frequency of exceedances is affected by epoch, not their distributions.

We find the mixture process approach with Poissonian occurrences and exponential exceedances yields a TCEV distribution with a process-based interpretation for the parameters. This suggests the TCEV is a promising and robust candidate for application to locations where flood risk alternates between dry and wet epochs. This will be evaluated in the case study using sites from eastern Australia which have been shown to exhibit low-frequency variability in flood risk.

4 Model assessment and parameter estimation

Despite the wide interest in mixture distributions and the recognized existence of different processes, the application of the methods described in Sect. 2 is mainly limited by problems connected with parameter estimation and by the difficulty of recognizing and observing different physical processes. To overcome these problems, different strategies have been proposed. Some authors (Singh and Sinclair 1972; Waylen and Woo 1982; Woo and Waylen 1984) suggest separating the events of the mixture components in order to estimate the parameters of each component and, for the first (additive) class, to evaluate the weights of the total distribution. Nevertheless, difficulties are encountered in identifying the component of each single event in a data series when the season or period of the event's occurrence is not sufficient to identify flood components (e.g., Hirschboeck 1987). Other authors (Arnell and Gabriele 1988; Fiorentino et al. 1985, 1987; Rossi et al. 1984), with reference to the TCEV model, recommend regional analysis to estimate parameters (in particular those related to skewness and kurtosis) and discourage at-site application of such models. The absence of a complete methodological framework for assessing the robustness of mixture distributions in FFA (Kjeldsen et al. 2018) and for evaluating the uncertainty and confidence limits of quantiles has been noted by Bocchiola and Rosso (2014). Furthermore, this class of models is not always suitable for comparison by goodness-of-fit tests with homogeneous distributions (Laio et al. 2010; Bocchiola and Rosso 2014).

The L-moment ratio diagram, introduced by Hosking (1990), is considered a valuable aid for the selection of a parent distribution (e.g., Cunnane 1989; Salinas et al. 2014; Stedinger et al. 1993; Strupczewski et al. 2011; Vogel and Fennessey 1993). The LMRD representation of TCEV was developed by Arnell and Beran (1988) and Gabriele and Arnell (1991). To our knowledge the TCEV is the only mixed compound distribution whose domain has been represented in the LMRD. Some of its characteristics are illustrated in Connell and Pearson (2001).

In Fig. 1a, the TCEV domain is the same as that used by Gabriele and Arnell (1991), namely \(2\le {\uptheta }_{*}\le 100\) and \(0.05\le {\uplambda }_{*}\le 1\). The figure compares this domain with the domains of frequently used distributions (GEV, GLO, GPA, PE3).

Fig. 1

L-moment ratio diagrams with TCEV: a as in Gabriele and Arnell (1991); b with the extended domain. For building LMRD (except for TCEV), the R package lmomco (Asquith 2021) was exploited

In Fig. 1b we extend the domain of TCEV parameters in the LMRD by increasing the upper limit for \({\uplambda }_{*}\) from 1 to 5, to cover the wider range of values of \({\uplambda }_{2}\) and \({\uplambda }_{1}\) found in the case study. We kept the original range of \({\theta }_{*}\) with a maximum value of 100 (Arnell and Beran 1988; Gabriele and Arnell 1991). Higher values are also possible, but for practical purposes 100 can be deemed a reliable upper limit. The following features of the extended LMRD are noted:

  1. The TCEV domain moves from the Gumbel point towards higher values of L-skewness and L-kurtosis.

  2. For some combinations of its parameters, the TCEV collapses to the Gumbel distribution as a special case. Holding the inequality \({\theta }_{1}<{\theta }_{2}\) (introduced by Beran et al. 1986), this occurs if only one component is present (e.g., \({\uplambda }_{1}\to 0\) or \({\uptheta }_{1}\to 0\)).

  3. The GEV, GLO and GPA lines lie within the TCEV domain for a wide range of L-skewness and L-kurtosis.

  4. The extension fills a gap in the neighborhood of the Gumbel point. In the extended area the curves for increasing \({\uplambda }_{*}\) (above 1) are progressively closer (almost indistinguishable), while those of \({\theta }_{*}\) are very close to each other, making visual interpretation of sample points difficult. Understanding the physical behavior of the processes underlying the extended area could assume greater importance for a proper assessment of the TCEV model.

It needs to be stressed that other four-parameter distributions can span a domain similar to that of the TCEV in the LMRD. Of particular relevance is the four-parameter Kappa distribution (Hosking 1994), which has the GEV as a special case when the second shape parameter \(h=0\). Hosking showed that the Kappa distribution is the distribution of the largest exceedance in a year if \(H(x)\) corresponds to the Generalized Pareto distribution and the number of events in a year is sampled from a binomial rather than a Poisson distribution. He further showed that as the binomial parameter \(n\) increases, the Kappa converges from below to the GEV line on the LMRD.

While several methods are available for estimating TCEV parameters, practical and implementation issues limit the choice. In general, the method of moments is not considered a reliable approach for distributions with three or more parameters, due to the high bias associated with relatively small samples. In TCEV studies only a few methods have been used: Rossi et al. (1984) suggested a Maximum Likelihood (ML) approach; Beran et al. (1986) derived TCEV Probability Weighted Moments (PWMs); Arnell and Beran (1988), Gabriele and Arnell (1991) and Gabriele and Iiritano (1994) reported L-moment expressions for TCEV parameters. Connell and Pearson (2001) applied a least squares approach. For the evaluation of TCEV parameters we use unbiased L-moment estimators (Hosking and Wallis 1995; Wang 1996), exploiting the computational approach for deriving TCEV parameters from sample L-moments provided by Gabriele and Iiritano (1994).
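For readers wishing to reproduce the estimation step, the unbiased sample L-moments and L-moment ratios can be computed from probability weighted moments. The sketch below implements the standard unbiased estimators (Hosking 1990); it does not reproduce the TCEV inversion of Gabriele and Iiritano (1994), only the sample statistics it starts from.

```python
def sample_lmoments(x):
    """Unbiased sample L-moments (l1, l2, t3, t4) from probability
    weighted moments (Hosking 1990); requires len(x) >= 4."""
    xs = sorted(x)
    n = len(xs)
    b0 = b1 = b2 = b3 = 0.0
    for j, v in enumerate(xs, start=1):
        b0 += v
        b1 += v * (j - 1) / (n - 1)
        b2 += v * (j - 1) * (j - 2) / ((n - 1) * (n - 2))
        b3 += v * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0                             # L-location (sample mean)
    l2 = 2 * b1 - b0                    # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2     # (l1, l2, t3, t4)

# A symmetric sample has zero L-skewness
l1, l2, t3, t4 = sample_lmoments([1, 2, 3, 4, 5])
assert abs(t3) < 1e-12
```

Applied to an annual maximum series, the returned \((t_3, t_4)\) pair is what is plotted on the LMRD of Sect. 4.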

5 Case study: dataset and results

This section evaluates the ability of the TCEV to fit annual maxima of peak flows for eastern Australia, where multidecadal variability is known to affect flood risk. The aim is not to compare the TCEV against other models but to evaluate its potential in such a region.

Annual maximum instantaneous discharge data for New South Wales (NSW) and Queensland (QLD) employed in this study were obtained from Australian Rainfall and Runoff (ARR) Project 5 (P5) Regional Flood Methods (Rahman et al. 2015). The dataset retrieved is very large and comprises flood data from 372 gauged sites, with sample sizes ranging between 20 and 102 years and observations from 1910 to 2012. The same sites are provided online by the Bureau of Meteorology (BoM) of the Australian Government and are freely available at http://www.bom.gov.au/waterdata/. To maximize record length, we extended the P5 project data up to 2020 by retrieving data from this latter source. Locations were selected with the aim of sampling the eastern coast of Australia at different latitudes. Hence, we focused on stations with longer records spread along the coast of Eastern Australia; they are all located in New South Wales and Queensland. The main features of the catchments and data series are reported in Table 1, while their geographical positions are shown in Fig. 2.

Table 1 Summary of key features of eastern Australian case study sites
Fig. 2

Location of case study sites in New South Wales and Queensland

Table 2 reports estimates of the TCEV parameters and the sample L-moment ratios (\({t }_{3}, \, {t }_{4}\)).

Table 2 TCEV parameters and unbiased L-moment at-site estimates for case study sites

Figure 3 presents the LMRD for the case study sites, showing that all sites lie within the extended TCEV domain of Fig. 1b. With respect to the curves of the three-parameter distributions shown in the LMRD, all sites lie below the Generalized Logistic line: 1 of the 12 sites lies between the GLO and GEV lines, while the remaining 11 lie below the GEV line, closer to the Generalized Pareto and Pearson type 3 curves.

Fig. 3

L-moment ratio diagram with sample unbiased estimates for selected stations in Eastern Australia

Figure 4 presents TCEV probability plots for all sites with parameters estimated using L-moments.

Fig. 4

Gumbel probability plots with L-moment fitted TCEV for selected sites in Eastern Australia (Cunnane plotting position has been applied)

A good fit to the observed annual maximum series is obtained at all sites. However, of greater interest is the evidence of two components, signalled by the presence of a dog-leg, i.e., a step-change between the slopes of the extremal straight-line asymptotes of the cdf. Plots at all but one site (Pascoe River at Fall Creek) clearly display such a sharp transition in curvature.
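For readers reproducing plots of this kind, a minimal sketch of the two ingredients of Fig. 4 follows: the Cunnane plotting position \(p_i=(i-0.4)/(n+0.2)\) mapped to the Gumbel reduced variate, and the TCEV cdf \(F(x)=\exp(-\lambda_1 e^{-x/\theta_1}-\lambda_2 e^{-x/\theta_2})\). The parameter values in the test are illustrative placeholders, not the at-site estimates of Table 2:

```python
import numpy as np

def cunnane_gumbel_variate(n):
    """Cunnane plotting positions p_i = (i - 0.4)/(n + 0.2) for the i-th
    smallest of n observations, mapped to the Gumbel reduced variate
    y = -ln(-ln p), the abscissa of a Gumbel probability plot."""
    i = np.arange(1, n + 1)
    p = (i - 0.4) / (n + 0.2)
    return -np.log(-np.log(p))

def tcev_cdf(x, lam1, theta1, lam2, theta2):
    """TCEV cdf F(x) = exp(-lam1*exp(-x/theta1) - lam2*exp(-x/theta2)),
    defined for x >= 0; component 1 is the ordinary component, component
    2 the outlying one."""
    return np.exp(-lam1 * np.exp(-x / theta1) - lam2 * np.exp(-x / theta2))
```

Plotting the sorted observations against `cunnane_gumbel_variate(n)`, together with the fitted `tcev_cdf` curve, reproduces the dog-leg whenever \(\theta_2\) is well separated from \(\theta_1\).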

6 Discussion

One of the main motivations for this work comes from the observation that, in some regions, the flood regime is subject to the influence of low-frequency climate variability. This is the case for Eastern Australia, for which several studies have highlighted the influence of decadal climate variations on flood phenomenology (Franks and Kuczera 2002; Micevski et al. 2006), thus challenging the classical hypothesis that annual maximum floods can be treated as identically distributed. Using flood frequency data of the same region stratified through a robust ENSO classification, Kiem et al. (2003) showed that significantly different regional flood index distributions are obtained when data series are stratified by multidecadal IPO phases. In particular, they found a strong control by La Niña events, modulated in magnitude by multidecadal IPO processes, and a strong shift in the flood frequency curves when comparing extremes that occurred in different climatic epochs.

The results in Sect. 5 suggest the TCEV distribution can adequately fit the marginal distribution over the entire range of observed values without the explicit need to resort to climate covariates or to the removal of potentially influential low values. Since Potter (1958), a dog-leg shaped cdf has been considered a key signature of the presence of different flood generation processes, and Figure 4 clearly displays this signature. Nevertheless, we see a good fit as a necessary but not sufficient condition for justifying the use of the TCEV. To tackle this with rigor would require a clear and complete understanding of all physical processes relating climatic patterns to flood generation processes, thus allowing for a continuous simulation approach similar to that described in Samuel and Sivapalan (2008). Such a task is well beyond the scope of this study; nonetheless, some headway can be made by reviewing our understanding of the hydroclimate mechanisms responsible for floods in Eastern Australia and assessing whether these mechanisms are consistent with the assumptions in Sect. 3 that led to the TCEV.

Kiem et al. (2003) show that IPO negative phases have a significantly higher frequency of La Niña events than IPO positive phases. While La Niña years are known to have above average rainfall, it is of interest to understand what features of rainfall events change in La Niña years. In his work on building a high-resolution stochastic rainfall model, Frost (2003) analyzed the statistics of storm events using long-term high-resolution rainfall records at Sydney and Brisbane. He found there was little difference in average storm duration and intensity for La Niña, El Niño and Neutral years. What differed significantly was the average duration of dry spells—for example, at Sydney, average dry spells were 41.6 h in La Niña years, 45.3 h in Neutral years and 49.4 h in El Niño years.

The increased frequency of rainfall events in La Niña years would be expected to result in wetter antecedent moisture conditions. This was confirmed by Pui et al. (2011) when investigating the influence of IPO phases on flood events using a large dataset of rainfall and flow across Australia. They concluded that the significant differences in flood characteristics across the two IPO phases are due more to differences in antecedent soil moisture conditions than differences in rainfall. In La Niña years wetter antecedent moisture conditions increase the occurrence of significant flood events and increase the magnitude of the flood for the same rainfall.

Following these observations, we may identify wet/dry epochs according to the different IPO phases (\(IPO<-0.5\) and \(IPO>-0.5\)). Hence the two-component-per-epoch model of Sect. 3.2 (i.e., Case 2, as defined by Eqs. (21) to (24)) appears conceptually consistent with the reviewed evidence. The most significant difference between IPO wet and dry epochs is the frequency of La Niña years. If component 1 is associated with non-La Niña years and component 2 with La Niña years, we would expect the exceedance distributions to satisfy \({H}_{1}(x)>{H}_{2}(x)\), which for exponentially distributed exceedances implies \({\theta }_{2}>{\theta }_{1}\). This is consistent with the clear separation between \({\theta }_{2}\) and \({\theta }_{1}\) reported in Table 2.
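The stochastic-ordering claim can be checked numerically: for exponential exceedances, \(\theta_2>\theta_1\) is equivalent to \(H_1(x)>H_2(x)\) for every \(x>0\). A minimal check with illustrative scale values (not the Table 2 estimates):

```python
import numpy as np

# Illustrative scales with theta2 > theta1 (outlying component heavier).
theta1, theta2 = 5.0, 15.0
x = np.linspace(0.01, 100.0, 500)

# Exceedance cdfs of the two exponential components.
H1 = 1.0 - np.exp(-x / theta1)  # ordinary (non-La Nina) component
H2 = 1.0 - np.exp(-x / theta2)  # outlying (La Nina) component

# theta2 > theta1 makes component 1 stochastically smaller: H1(x) > H2(x).
assert np.all(H1 > H2)
```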

Table 2 also reveals a clear separation consistent with the inequality \({\uplambda }_{1}>{\uplambda }_{2}\). Kiem et al. (2003) show that in IPO positive phases there is about a 12% chance of a La Niña year, while in IPO negative phases there is a 50% chance. Using the results reported by Frost (2003), we would expect up to 20% more rainfall events in a La Niña year than in a non-La Niña year. As a result, we would expect \(\lambda_{1,1}^{\prime }\) to be considerably greater than \(\lambda_{2,1}^{\prime }\) in an IPO positive phase, and \(\lambda_{1,2}^{\prime }\) to be slightly less than \(\lambda_{2,2}^{\prime }\) in an IPO negative phase. On balance this suggests \({\uplambda }_{1}>{\uplambda }_{2}\) is plausible. Moreover, the increase of \(\lambda_{2,2}^{\prime }\) (counterbalanced somewhat by a lower \({p}_{2}\) weight) relative to \(\lambda_{1,2}^{\prime }\) inflates \({\uplambda }_{2}\) and contributes to the fact that 5 of the 12 sites in Table 2 have \({\uplambda }_{2} > 1\), and hence a higher \({\uplambda }_{*}\). In fact, \({\uplambda }_{*}\) exceeds 0.7 at 6 of the 12 sites and exceeds 1 at 3 sites, thus locating the TCEV parameters in the extended area of the TCEV domain shown in Fig. 3.

To provide a numerical assessment of the proposed theoretical derivation of the TCEV from thinning and superposition of Poisson processes, we performed Monte Carlo experiments whose underlying process is consistent with the physical evidence documented above and with the two cases defined in Sect. 3. For the characterization of the TCEV parameters, following Kiem et al. (2003), we assumed the probability of falling into the wet epoch equal to the fraction of years characterized by \(IPO<-0.5\). The results of this analysis, reported in the Appendix, support the agreement of the TCEV distribution with this class of underlying processes.
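A minimal sketch of a simulation of this kind is given below. The wet-epoch probability, component rates and scales are illustrative placeholders, not the values used in the Appendix experiments. Each year draws its epoch, then Poisson event counts per component, then exponential exceedances whose largest value is the annual maximum:

```python
import numpy as np

def simulate_annual_maxima(n_years, p_wet, rates_dry, rates_wet,
                           theta1, theta2, rng):
    """Simulate annual maxima from a two-component-per-epoch process:
    each year falls in the wet epoch with probability p_wet; the event
    counts of the two components are Poisson with epoch-dependent rates
    (lam1, lam2), and exceedances are exponential with component scales
    theta1, theta2. A year with no events contributes a zero maximum."""
    maxima = np.empty(n_years)
    for y in range(n_years):
        lam1, lam2 = rates_wet if rng.random() < p_wet else rates_dry
        n1, n2 = rng.poisson(lam1), rng.poisson(lam2)
        events = np.concatenate([rng.exponential(theta1, n1),
                                 rng.exponential(theta2, n2)])
        maxima[y] = events.max() if events.size else 0.0
    return maxima

rng = np.random.default_rng(42)
# Placeholder rates: component 2 (La Nina type) is much more frequent
# in the wet epoch, mirroring the qualitative evidence reviewed above.
am = simulate_annual_maxima(10000, p_wet=0.3,
                            rates_dry=(30.0, 0.5), rates_wet=(35.0, 2.0),
                            theta1=5.0, theta2=15.0, rng=rng)
```

The empirical distribution of `am` can then be compared with a fitted TCEV, e.g., on a Gumbel probability plot, to reproduce the dog-leg behaviour discussed in Sect. 5.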

The theoretical development of the TCEV in Sect. 3 and the analysis of the case study suggest the TCEV is a robust choice when dealing with mixed processes. Several studies report that different mechanisms are responsible for flood runoff generation in a significant portion of Australian catchments (e.g., Johnson et al. 2016). Trancoso et al. (2016) analyzed the hydrological regime of a large number of catchments in Eastern Australia, highlighting three dominant mechanisms in the streamflow hydrograph (i.e., Horton overland flow, subsurface stormflow and return flow). Jarihani et al. (2017) applied a coarse-scale water balance to the Upper Burdekin catchment, a large basin in Queensland whose hydrological behavior is considered representative of basins across northern Australia and Queensland. They observed that, at the annual scale, runoff is triggered by saturation overland flow from November to March (due to wetter seasonal conditions) and by Hortonian overland flow from March to October, with weights depending on local hydrological factors. Similar results are reported in plot studies in Australian savannas (McIvor et al. 1995; Roth 2004; Silburn et al. 2011). Considering that high rainfall variability is linked with ENSO events, with above-average rainfall during La Niña events and drier conditions during El Niño events, they also concluded that the impact of these changes on runoff events is mainly related to a generalized increase in antecedent soil moisture and a higher frequency of different types of events.

The use of the TCEV is warranted without the need to identify a priori the type of each event in the available annual maximum series or the events' marginal annual maximum distributions (Pelosi et al. 2020). The marginal distribution of the annual maxima of such a mixed process may also be pursued using additive mixture distributions, but their use requires knowledge of the marginal distributions of the two event types as well as their rates of occurrence in the different climate epochs. Applying additive mixture models to annual maximum flood series affected by low-frequency climatic signatures raises problems of identifying the different periods with reference to their influence on flood generation mechanisms. Open problems include (i) whether the climate phenomena and phases are always clearly distinguishable and historically identifiable without uncertainty; (ii) whether the flood generation processes are well recognized and it is possible to know how many years have maxima belonging to the different types; and (iii) the probability that in some given years (as could arise from climatic phases and phenomena) more than one type of event may occur.

With regard to TCEV parameter estimation, the original formulation of the TCEV imposed two physical constraints: on the values of the variable (\(x\ge 0\)) and on the mutual values of two parameters, i.e., \({\uplambda }_{1}>{\uplambda }_{2}\ge 0\). The latter can be justified by the conceptual nature of the flood generation mechanisms. In fact, its definition (Rossi et al. 1984, p. 852) relies also on a semantic distinction between the two components, assuming that the “ordinary” component generates “frequent” floods, while the “outlying” component is responsible for rare and heavy events. The constraint \({\theta }_{2}>{\theta }_{1}>0\) is theoretically justified by the condition imposed by Beran et al. (1986) in deriving the analytical expressions of the moments and PWMs (\({\theta }_{*}>1\)).

Kjeldsen et al. (2018) stated that the use of a TCEV distribution should be considered reasonable whenever two processes generate an event in every year, “e.g., at least one rainfall and one snowmelt flood event in each year”. Actually, as shown in Sect. 2, the TCEV arises as the probability of the annual maximum in a process with Poissonian occurrences, where the discrete Poisson variable may take the value zero with probability \(exp(-\uplambda )\), and its mean \(\uplambda\) may be less than 1. To our knowledge, in almost all TCEV applications in regional analysis of annual maximum rainfall and floods, the mean annual number of extraordinary events (\({\uplambda }_{2}\)) takes values less than 1. In this paper we show that, following the arguments and results of Sects. 4 and 5, the TCEV is also applicable when \({\uplambda }_{1}\) and \({\uplambda }_{2}\) take similar values, both possibly greater than 1, with a ratio close to 1 or even smaller. This confirms that the TCEV can be considered a candidate parent both in the presence of infrequent phenomena, as in its regional applications (see Castellarin et al. 2012), and for high-frequency (e.g., seasonal; Strupczewski et al. 2012) or low-frequency phenomena.

The LMRD of Fig. 3 also shows that all but one of the 12 sites fall below the GEV line, while all of them lie within the domain of the four-parameter Kappa distribution (Hosking 1994). As noted in Sect. 2.1, the GEV is the distribution of the largest exceedance in a year if \(H(x)\) is a Generalized Pareto distribution, while the Kappa is obtained if \(H(x)\) is a Generalized Pareto distribution and the number of events in a year is sampled from a binomial distribution. While the sample of 12 sites is small, the separation shown in Fig. 3 is conceptually consistent with the two-component foundation of the TCEV, but also compatible with a Kappa or other three-parameter distributions (GEV, GLO, GPA, PE3).

Collectively, this evidence is believed to provide reasonable physical support for the hypotheses leading to the choice of a TCEV distribution when the dependence of the process on physical (e.g., climatic) factors is assessed. Further insights into the effect of multidecadal climatic fluctuations and ENSO events on the flood cdf could be obtained through a deeper analysis of stratified series, which in turn could shed light on the presence of multiple runoff generation mechanisms within each epoch.

We conclude with a comment on the methods used for parameter estimation and distribution assessment. As mentioned in the introduction, a number of studies have highlighted the slow convergence to asymptotic distributions and the consequent need for long sample records. Following these recommendations, we chose from among the longest flood series available in the coastal basins of Eastern Australia. The record lengths range from 53 to 101 years, producing, especially in the shorter series, a possible bias in the evaluation of the higher-order L-moments. On the other hand, in this paper we do not pursue the comparison between different probabilistic models. In fact, the use of mixed distributions both calls for and enables the development of new criteria for model selection grounded in the knowledge and investigation of physical processes.

7 Conclusions

A theoretical formulation is presented supporting the use of compound mixture distributions to describe flood populations affected by low-frequency climate oscillations such as the decadal IPO and ENSO-related phenomena. The formulation exploits the superposition and thinning properties of Poissonian counting processes and shows that the TCEV is a promising and robust candidate for this purpose. This is confirmed by the good results obtained by fitting the TCEV distribution to the longest available series of annual maximum floods in the coastal basins of Queensland and New South Wales in Eastern Australia. The TCEV is shown to adequately fit the marginal distribution over the entire range of observed values without the explicit need to resort to climate covariates or to the removal of potentially influential low values. Furthermore, a literature review of evidence regarding rainfall and flood generation in the different IPO epochs is shown to be not inconsistent with the assumptions underpinning the TCEV.

Previous applications of the TCEV have focused on regional analysis. This study shows that the TCEV can be applied to at-site analyses of flood series (with lengths ranging from 53 to 101 years) where the dog-leg signature of different flood generation processes is evident. It also extends the domain covered by the TCEV distribution in the L-moment ratio diagram and shows, for the Eastern Australian case study, that all sites lie within the TCEV extended domain.

This study shows that the choice of a parent distribution can be guided by a deeper knowledge of the climate and runoff controls affecting floods. The incorporation of physically based reasoning into flood frequency analysis has already been undertaken in a significant number of studies using derived-distribution or mixed-population approaches. Nonetheless, at the operational level, FFA handbooks and guidelines worldwide largely rely on curve fitting and statistical testing. While we acknowledge this is necessary, it is not necessarily sufficient: curve fitting cannot subject a probability model to scrutiny beyond the observed data. When the intent of FFA is to extrapolate well beyond the data, a deeper form of model scrutiny is required.