
A zero-inflated model for spatiotemporal count data with extra zeros: application to 1950–2015 tornado data in Kansas

Environmental and Ecological Statistics

Abstract

In many tornado climate studies, the number of tornado touchdowns is the primary outcome of interest. These counts are typically spatiotemporally correlated and contain many zeros because tornadoes rarely touch down at a specific location within a given time interval. To model spatiotemporal count data with excess zeros, we propose a spatiotemporal zero-inflated Poisson (ZIP) model that is easy to interpret and computationally simple. Technically, we embed a modified conditional autoregressive model in the ZIP model to describe the spatial and temporal correlations, and we deliberately let the probability of a pure zero in the ZIP model depend on location but not on time. Applied to the longitudinal tornado touchdown data for the state of Kansas from 1950 to 2015, our model suggests that the spatial correlation among counties and the corresponding temperature are significant factors associated with tornado touchdowns. The model also yields estimates of the probability of no tornado touchdowns for each county over time. These estimated probabilities help reveal the pattern of touchdowns and identify the risk areas across Kansas. Moreover, these estimates can be iteratively updated as more recent touchdown data become available. The final model for the Kansas tornado touchdown data is evaluated using more recent data.
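The ZIP structure described in the abstract can be illustrated for a single county-year count. The minimal sketch below assumes the standard ZIP form: with probability \(p_i\) (a county-specific pure-zero probability that, as stated above, does not vary with time) the count is a structural zero, and otherwise it follows a Poisson distribution with mean \(\lambda\). The function name `zip_pmf` and the numbers in the usage lines are hypothetical and are not the paper's fitted values.

```python
import math


def zip_pmf(y: int, lam: float, p_zero: float) -> float:
    """P(Y = y) under a zero-inflated Poisson with pure-zero probability p_zero.

    With probability p_zero the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Assumes lam > 0 and 0 <= p_zero < 1.
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros arise from both the structural part and the Poisson part.
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


if __name__ == "__main__":
    # Hypothetical county: 30% pure-zero probability, Poisson mean 1.2.
    p_i, lam = 0.3, 1.2
    print("P(no touchdown) =", zip_pmf(0, lam, p_i))
    print("Poisson-only P(0) =", math.exp(-lam))
    # The ZIP mean is (1 - p_i) * lam.
    print("mean =", sum(y * zip_pmf(y, lam, p_i) for y in range(60)))
```

Under this form the probability of no touchdown is \(p_i + (1-p_i)e^{-\lambda}\), which exceeds the plain Poisson value \(e^{-\lambda}\) whenever \(p_i > 0\). The paper's exact link functions for \(p_i\) and \(\lambda\) are not reproduced here.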






Acknowledgements

We extend our gratitude to the Editors, the Associate Editor, and the two anonymous referees for their valuable time and insightful comments, which have significantly improved this paper. Hong-Ding Yang’s research is funded by the National Science and Technology Council, Taiwan (NSTC 111-2118-M-415-001 and NSTC 112-2118-M-415-001). Chun-Shu Chen’s research is funded by the National Science and Technology Council, Taiwan (NSTC111-2118-M-008-002-MY2).

Author information


Contributions

Conceptualization: H-DY, W-WH, and C-SC. Data curation: H-DY and AC. Formal analysis and discussion: H-DY, W-WH, and C-SC. Figures and tables: H-DY and AC. Methodology: H-DY, W-WH, and C-SC. Writing—original draft: H-DY, W-WH, and C-SC. Writing—review & editing: H-DY, AC, W-WH, and C-SC. Revision: H-DY, W-WH, and C-SC.

Corresponding author

Correspondence to Chun-Shu Chen.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Handling Editor: Luiz Duczmal.

Appendix: derivations of the full conditional distributions for the model parameters


According to the model assumptions for the variance components \(\sigma ^2_t\), \(t=0,1,\ldots ,T\), and the spatiotemporal random process \(\delta _{0i}\), \(i=1,\ldots ,N\), the corresponding full conditional densities have closed forms, so Gibbs sampling can be used to draw posterior samples of \(\sigma ^2_t\) and \(\delta _{0i}\). This appendix derives the full conditional distributions of \(\sigma ^2_t\) and \(\delta _{0i}\). For each \(\sigma ^2_t\), \(t=0,1,\ldots ,T\), the full conditional density is derived as follows:

$$\begin{aligned}
& P\big(\sigma^2_t\mid \varvec{\theta}_{(-\sigma^2_t)},\varvec{\delta}_1,\ldots,\varvec{\delta}_T,\varvec{\delta}_0,\textbf{Y}\big) \\
&\quad \propto P\big(\varvec{\delta}_{t}\mid \sigma^2_t,\phi_t,\varvec{\delta}^{*}_t\big)\times \pi\big(\sigma^2_t\big) \\
&\quad = \frac{1}{(2\pi)^{N/2}\left[\det\big(\sigma^2_t\varvec{V}(\phi_t)\big)\right]^{1/2}}\exp\left\{-\frac{1}{2}(\varvec{\delta}_t-\varvec{\delta}^{*}_t)'\big(\sigma^2_t\varvec{V}(\phi_t)\big)^{-1}(\varvec{\delta}_t-\varvec{\delta}^{*}_t)\right\} \\
&\qquad \times \frac{1}{\Gamma(a_t)b_t^{a_t}}\left(\frac{1}{\sigma^2_t}\right)^{a_t+1}\exp\left\{-\frac{1}{\sigma^2_t b_t}\right\} \\
&\quad \propto \left(\frac{1}{\sigma^2_t}\right)^{\frac{N}{2}+a_t+1}\exp\left\{-\frac{1}{\sigma^2_t}\left[\frac{1}{2}(\varvec{\delta}_t-\varvec{\delta}^{*}_t)'\big(\varvec{V}(\phi_t)\big)^{-1}(\varvec{\delta}_t-\varvec{\delta}^{*}_t)+\frac{1}{b_t}\right]\right\},
\end{aligned}\tag{A1}$$

where \(\varvec{\delta }^{*}_t=\varvec{W\alpha }\) for \(t=0\) and \(\varvec{\delta }^{*}_t=\varvec{\delta }_0\) for \(t=1,\ldots ,T\). According to (A1), the full conditional distribution of \(\sigma ^2_t\), for \(t=0,1,\ldots ,T\), is

$$\begin{aligned} \sigma ^2_t\mid \phi _t,\varvec{\delta }_t,\varvec{\delta }^{*}_t \sim IG \left( \frac{N}{2}+a_t,\left[ \frac{1}{2}(\varvec{\delta }_t-\varvec{\delta }^{*}_t)'(\varvec{V}(\phi _t))^{-1}(\varvec{\delta }_t-\varvec{\delta }^{*}_t)+\frac{1}{b_t}\right] ^{-1}\right) . \end{aligned}$$
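The inverse-gamma full conditional above translates directly into one Gibbs step. The sketch below rests on stated assumptions: `V_inv` is a precomputed inverse of \(\varvec{V}(\phi_t)\) (its conditional autoregressive form is specified in the main text, not in this appendix), vectors and matrices are plain Python lists, and all function names are hypothetical. The draw is made on the precision \(1/\sigma^2_t\), which is gamma distributed under the parameterization used in (A1), and then inverted.

```python
import random


def quad_form(x, M):
    """Return x' M x for a vector x and a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def sigma2_full_conditional(delta_t, delta_star_t, V_inv, a_t, b_t):
    """Shape and scale of the IG full conditional of sigma^2_t.

    Parameterization as in (A1): IG(a, b) means 1/sigma^2 ~ Gamma(shape=a, scale=b).
    delta_star_t is W alpha for t = 0 and delta_0 for t >= 1.
    """
    resid = [d - m for d, m in zip(delta_t, delta_star_t)]
    shape = len(delta_t) / 2.0 + a_t
    rate = 0.5 * quad_form(resid, V_inv) + 1.0 / b_t
    return shape, 1.0 / rate


def draw_sigma2(delta_t, delta_star_t, V_inv, a_t, b_t, rng=random):
    """One Gibbs draw of sigma^2_t: sample the precision, then invert it."""
    shape, scale = sigma2_full_conditional(delta_t, delta_star_t, V_inv, a_t, b_t)
    precision = rng.gammavariate(shape, scale)  # gammavariate(alpha, beta): beta is the scale
    return 1.0 / precision


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-in for V(phi_t)^{-1}
    print(sigma2_full_conditional([1.0, -1.0], [0.0, 0.0], identity, 2.0, 1.0))  # (3.0, 0.5)
    print(draw_sigma2([1.0, -1.0], [0.0, 0.0], identity, 2.0, 1.0, random.Random(2024)))
```

Sampling the precision with a gamma generator and taking its reciprocal avoids needing a dedicated inverse-gamma sampler.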

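Here \(IG(a,b)\) follows the parameterization of the prior factor in (A1), under which \(1/\sigma ^2_t\) has a gamma distribution with shape \(a\) and scale \(b\). Next, we show that the full conditional density of \(\delta _{0i}\), \(i=1,\ldots ,N\), is Gaussian. That is,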

$$\begin{aligned}
& P\big(\delta_{0i}\mid \varvec{\theta},\varvec{\delta}_1,\ldots,\varvec{\delta}_T,\varvec{\delta}_{0(-i)},\textbf{Y}\big) \\
&\quad \propto P\big(\delta_{0i}\mid \sigma^2_0,\phi_0,\varvec{\alpha},\varvec{\delta}_{0(-i)}\big)\times \prod^{T}_{t=1} P\big(\delta_{ti}\mid \sigma^2_t,\phi_t,\varvec{\delta}_0,\varvec{\delta}_{t(-i)}\big) \\
&\quad = \frac{1}{\sqrt{2\pi\sigma^2_0}}\exp\left\{-\frac{1}{2\sigma^2_0}\left[\delta_{0i}-\Big(\varvec{W}'_{i}\varvec{\alpha}+\phi_0\sum_{j\in N_i}c_{ij}\big(\delta_{0j}-\varvec{W}'_{j}\varvec{\alpha}\big)\Big)\right]^2\right\} \\
&\qquad \times \prod^{T}_{t=1}\frac{1}{\sqrt{2\pi\sigma^2_t}}\exp\left\{-\frac{1}{2\sigma^2_t}\left[\delta_{ti}-\Big(\delta_{0i}+\phi_t\sum_{j\in N_i}c_{ij}\big(\delta_{tj}-\delta_{0j}\big)\Big)\right]^2\right\} \\
&\quad \propto \exp\left\{-\frac{1}{2}\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\left[\delta_{0i}-\Big(\delta^{*}_{ti}+\phi_t\sum_{j\in N_i}c_{ij}\big(\delta_{0j}-\delta^{*}_{tj}\big)\Big)\right]^2\right\} \\
&\quad = \exp\left\{-\frac{1}{2}\left[\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\times\frac{\prod^{T}_{k=0}\sigma^2_k}{\sum^{T}_{s=0}\prod_{k\ne s}\sigma^2_k}\right]\times\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\big(\delta_{0i}-d^*_{ti}\big)^2\right\} \\
&\quad = \exp\left\{-\frac{1}{2}\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\times\sum^{T}_{t=0}\frac{\prod_{k\ne t}\sigma^2_k}{\sum^{T}_{s=0}\prod_{k\ne s}\sigma^2_k}\big(\delta^2_{0i}-2\delta_{0i}d^*_{ti}+d^{*2}_{ti}\big)\right\} \\
&\quad \propto \exp\left\{-\frac{1}{2}\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\times\left[\delta^2_{0i}-2\delta_{0i}\sum^{T}_{t=0}\frac{\prod_{k\ne t}\sigma^2_k}{\sum^{T}_{s=0}\prod_{k\ne s}\sigma^2_k}d^*_{ti}\right]\right\} \\
&\quad \propto \exp\left\{-\frac{1}{2}\sum^{T}_{t=0}\frac{1}{\sigma^2_t}\times\left[\delta_{0i}-\frac{\sum^{T}_{t=0}d^*_{ti}/\sigma^2_t}{\sum^{T}_{s=0}1/\sigma^2_s}\right]^2\right\},
\end{aligned}\tag{A2}$$

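where \(\delta ^{*}_{ti} = \varvec{W}'_{i}\varvec{\alpha }\) for \(t=0\) and \(\delta ^{*}_{ti} = \delta _{ti}\) for \(t=1,\ldots ,T\), and \(d^*_{ti}\equiv \delta ^{*}_{ti}+\phi _t\displaystyle \sum _{j\in N_i}c_{ij}(\delta _{0j}-\delta ^{*}_{tj})\). The second-to-last step of (A2) uses the fact that the weights \(\prod _{k\ne t}\sigma ^2_k\big /\sum ^{T}_{s=0}\prod _{k\ne s}\sigma ^2_k\), \(t=0,1,\ldots ,T\), sum to one, so the coefficient of \(\delta ^2_{0i}\) reduces to one; the last step then completes the square using the identity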

$$\begin{aligned} \sum^{T}_{t=0}\frac{\prod_{k\ne t}\sigma^2_k}{\sum^{T}_{s=0}\prod_{k\ne s}\sigma^2_k}\times d^*_{ti} = \sum^{T}_{t=0}\frac{\prod^{T}_{k=0}\sigma^2_k}{\sum^{T}_{s=0}\prod_{k\ne s}\sigma^2_k}\times\frac{d^*_{ti}}{\sigma^2_t} = \sum^{T}_{t=0}\left(\sum^{T}_{s=0}\frac{1}{\sigma^2_s}\right)^{-1}\times\frac{d^*_{ti}}{\sigma^2_t}. \end{aligned}$$
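The identity above says that weighting by \(\prod_{k\ne t}\sigma^2_k\) is the same as weighting by the precisions \(1/\sigma^2_t\). A quick numerical check, with arbitrary illustrative values for \(d^*_{ti}\) and \(\sigma^2_t\) (not taken from the paper) and hypothetical function names, is:

```python
import math


def product_weighted_mean(d, s2):
    """Left-hand side: sum_t [prod_{k != t} s2_k / sum_s prod_{k != s} s2_k] * d_t."""
    prods = [math.prod(v for k, v in enumerate(s2) if k != t) for t in range(len(s2))]
    total = sum(prods)
    return sum(p / total * dt for p, dt in zip(prods, d))


def precision_weighted_mean(d, s2):
    """Right-hand side: sum_t (d_t / s2_t) / sum_s (1 / s2_s)."""
    return sum(dt / v for dt, v in zip(d, s2)) / sum(1.0 / v for v in s2)


if __name__ == "__main__":
    d = [0.5, -1.0, 2.0]  # illustrative d*_{ti}, t = 0, 1, 2
    s2 = [1.0, 2.0, 4.0]  # illustrative sigma^2_t
    print(product_weighted_mean(d, s2), precision_weighted_mean(d, s2))  # both ≈ 0.2857 (= 2/7)
```

The precision-weighted form on the right is the more practical one to compute: the left-hand side multiplies \(T\) variances together, which can underflow or overflow when \(T\) is large, whereas sums of precisions are far better behaved.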

Thus, the full conditional distribution of \(\delta _{0i}\) is Gaussian with mean \(A_{2i}A^{-1}_1\) and variance \(A^{-1}_1\), where \(A_1=\displaystyle \sum ^{T}_{t=0}\frac{1}{\sigma ^2_t}\) and \(A_{2i}=\displaystyle \sum ^T_{t=0}\frac{d^*_{ti}}{\sigma ^2_t}=\displaystyle \sum ^T_{t=0}\frac{1}{\sigma ^2_t}\left( \delta ^{*}_{ti}+\phi _t\displaystyle \sum _{j\in N_i}c_{ij}(\delta _{0j}-\delta ^{*}_{tj})\right)\) for \(i=1,\ldots ,N\).
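The expressions for \(A_1\) and \(A_{2i}\) give a single-site Gibbs update for \(\delta_{0i}\). The sketch below assumes the neighbour sets \(N_i\) and weights \(c_{ij}\) are supplied as plain Python lists (their construction is described in the main text, not here) and that \(\varvec{W}'_j\varvec{\alpha}\) is precomputed; all function names are hypothetical.

```python
import random


def delta0_full_conditional(i, delta0, delta, w_alpha, sigma2, phi, neighbors, c):
    """Mean A_{2i}/A_1 and variance 1/A_1 of delta_{0i} given everything else.

    delta0    : current delta_0 (length N)
    delta     : delta[t-1][j] = delta_{tj} for t = 1..T
    w_alpha   : w_alpha[j] = W_j' alpha (length N)
    sigma2    : [sigma^2_0, ..., sigma^2_T]
    phi       : [phi_0, ..., phi_T]
    neighbors : neighbors[i] = list of indices j in N_i
    c         : c[i][j] = weight c_ij
    """
    A1 = 0.0
    A2 = 0.0
    for t in range(len(sigma2)):
        # delta*_t is W alpha for t = 0 and delta_t for t >= 1.
        star = w_alpha if t == 0 else delta[t - 1]
        d_star = star[i] + phi[t] * sum(c[i][j] * (delta0[j] - star[j]) for j in neighbors[i])
        A1 += 1.0 / sigma2[t]
        A2 += d_star / sigma2[t]
    return A2 / A1, 1.0 / A1


def gibbs_sweep_delta0(delta0, delta, w_alpha, sigma2, phi, neighbors, c, rng=random):
    """Update each delta_{0i} in turn, always using the latest values of its neighbours."""
    for i in range(len(delta0)):
        mean, var = delta0_full_conditional(i, delta0, delta, w_alpha, sigma2, phi, neighbors, c)
        delta0[i] = rng.gauss(mean, var ** 0.5)
    return delta0


if __name__ == "__main__":
    # Two hypothetical neighbouring counties, one time point (T = 1).
    neighbors = [[1], [0]]
    c = [[0.0, 1.0], [1.0, 0.0]]
    mean, var = delta0_full_conditional(
        0, [0.0, 2.0], [[1.0, 3.0]], [0.0, 0.0], [1.0, 1.0], [0.5, 0.5], neighbors, c
    )
    print(mean, var)  # 0.75 0.5
```

Visiting sites one at a time and overwriting `delta0` in place is a standard single-site Gibbs scheme; the updates for the remaining model parameters are not covered by this appendix and are not sketched here.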

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, HD., Chang, A., Hsu, WW. et al. A zero-inflated model for spatiotemporal count data with extra zeros: application to 1950–2015 tornado data in Kansas. Environ Ecol Stat 31, 1–25 (2024). https://doi.org/10.1007/s10651-023-00586-3



  • DOI: https://doi.org/10.1007/s10651-023-00586-3
