Abstract
In tornado climate studies, the number of tornado touchdowns is often the primary outcome of interest. These counts are usually generated under a spatiotemporal correlation structure and contain many zeros because tornadoes are rare at any specific location and time interval. To model spatiotemporal count data with excess zeros, we propose a spatiotemporal zero-inflated Poisson (ZIP) model, which is easy to interpret and computationally simple. Technically, we embed a modified conditional autoregressive model in the ZIP model to describe the spatial and temporal correlations, where the probability of a pure zero in the ZIP model is purposely designed to depend on location but not on time. Illustrated with longitudinal tornado touchdown data for the state of Kansas from 1950 to 2015, our model suggests that the spatial correlation among counties and the corresponding temperature are significant factors associated with tornado touchdowns. Through the model, we can also estimate the probabilities of no tornado touchdowns for each county over time. These estimated probabilities help us understand the pattern of touchdowns and identify risk areas across Kansas. Moreover, the estimates can be updated iteratively as more current touchdown data become available. The final model for the Kansas tornado touchdown data is evaluated using more recent data.
Acknowledgements
We extend our gratitude to the Editors, the Associate Editor, and the two anonymous referees for their valuable time and insightful comments, which have significantly improved this paper. Hong-Ding Yang’s research is funded by the National Science and Technology Council, Taiwan (NSTC 111-2118-M-415-001 and NSTC 112-2118-M-415-001). Chun-Shu Chen’s research is funded by the National Science and Technology Council, Taiwan (NSTC 111-2118-M-008-002-MY2).
Author information
Contributions
Conceptualization: H-DY, W-WH, and C-SC. Data curation: H-DY and AC. Formal analysis and discussion: H-DY, W-WH, and C-SC. Figures and tables: H-DY and AC. Methodology: H-DY, W-WH, and C-SC. Writing—original draft: H-DY, W-WH, and C-SC. Writing—review & editing: H-DY, AC, W-WH, and C-SC. Revision: H-DY, W-WH, and C-SC.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Additional information
Handling Editor: Luiz Duczmal.
Appendix: derivations of the full conditional distributions for the model parameters
According to the assumptions for the variance components \(\sigma ^2_t\); \(t=0,1,\ldots ,T\), and the spatiotemporal random process \(\delta _{0i}\); \(i=1,\ldots ,N\), the corresponding full conditional densities can be determined exactly, so Gibbs sampling can be used to generate posterior samples of \(\sigma ^2_t\) and \(\delta _{0i}\), respectively. This appendix derives the full conditional distributions of \(\sigma ^2_t\) and \(\delta _{0i}\). For each \(\sigma ^2_t\), \(t=0,1,\ldots ,T\), the full conditional density can be derived in the following manner:
where \(\varvec{\delta }^{*}_t=\varvec{W\alpha }\) for \(t=0\) and \(\varvec{\delta }^{*}_t=\varvec{\delta }_0\) for \(t=1,\ldots ,T\). According to (A1), the full conditional distribution of \(\sigma ^2_t\), for \(t=0,1,\ldots ,T\) is
Next, we show that the full conditional density of \(\delta _{0i}\); \(i=1,\ldots ,N\) is a Gaussian density. That is,
where \(\delta ^{*}_{ti} = \varvec{W}'_{i}\varvec{\alpha }\) for \(t=0\) and \(\delta ^{*}_{ti} = \delta _{ti}\) for \(t=1,\ldots ,T\), and \(d^*_{ti}\equiv \delta ^{*}_{ti}+\phi _t\displaystyle \sum _{j\in N_i}c_{ij}(\delta _{0j}-\delta ^{*}_{tj})\). The last result of (A2) follows from
Thus, the full conditional distribution of \(\delta _{0i}\) is a Gaussian distribution with mean \(A_{2i}A^{-1}_1\) and variance \(A^{-1}_1\), where \(A_1=\displaystyle \sum ^{T}_{t=0}\frac{1}{\sigma ^2_t}\) and \(A_{2i}=\displaystyle \sum ^T_{t=0}\frac{d^*_{ti}}{\sigma ^2_t}=\displaystyle \sum ^T_{t=0}\frac{1}{\sigma ^2_t}\left( \delta ^{*}_{ti}+\phi _t\displaystyle \sum _{j\in N_i}c_{ij}(\delta _{0j}-\delta ^{*}_{tj})\right)\) for \(i=1,\ldots ,N\).
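The Gaussian form stated above can be checked numerically. The following Python sketch is illustrative only and is not the authors' implementation. It assumes that, up to terms free of \(\delta _{0i}\), the exponent of the full conditional is \(-\frac{1}{2}\sum ^T_{t=0}(\delta _{0i}-d^*_{ti})^2/\sigma ^2_t\), which is the form implied by the stated mean \(A_{2i}A^{-1}_1\) and variance \(A^{-1}_1\). The function names, the toy inputs, and the encoding of \(c_{ij}\) as a matrix `C` with zeros for non-neighbours are hypothetical.

```python
import numpy as np

def d_star_for_site(i, delta_star, delta0, phi, C):
    """d*_{ti} = delta*_{ti} + phi_t * sum_{j in N_i} c_ij (delta_0j - delta*_{tj}), t = 0..T.

    delta_star: array (T+1, N); delta0: array (N,); phi: array (T+1,).
    C[i, j] holds c_ij for neighbours j of i and 0 otherwise (assumed encoding).
    """
    neighbour_term = (delta0[None, :] - delta_star) @ C[i]  # shape (T+1,)
    return delta_star[:, i] + phi * neighbour_term

def gaussian_params(d_star_i, sigma2):
    """Mean A_2i / A_1 and variance 1 / A_1 of the full conditional of delta_0i."""
    A1 = np.sum(1.0 / sigma2)
    A2i = np.sum(d_star_i / sigma2)
    return A2i / A1, 1.0 / A1

def draw_delta0_i(d_star_i, sigma2, rng):
    """One Gibbs draw of delta_0i from N(A_2i / A_1, 1 / A_1)."""
    mean, var = gaussian_params(d_star_i, sigma2)
    return rng.normal(loc=mean, scale=np.sqrt(var))

# Toy inputs (made up for illustration; not the Kansas data).
rng = np.random.default_rng(0)
T, N = 3, 4
delta_star = rng.normal(size=(T + 1, N))
delta0 = rng.normal(size=N)
phi = np.full(T + 1, 0.2)
C = (np.ones((N, N)) - np.eye(N)) / (N - 1)  # every other site is a neighbour
sigma2 = np.array([0.5, 1.0, 2.0, 4.0])

d = d_star_for_site(0, delta_star, delta0, phi, C)
mean, var = gaussian_params(d, sigma2)
draw = draw_delta0_i(d, sigma2, rng)

# Completing the square: the weighted sum of squares equals
# (x - mean)^2 / var plus a constant that does not depend on x.
x = 0.7
lhs = np.sum((x - d) ** 2 / sigma2)
rhs = (x - mean) ** 2 / var + np.sum(d ** 2 / sigma2) - mean ** 2 / var
print(np.isclose(lhs, rhs))
```

In a full Gibbs sweep the sites would be updated one at a time, since \(d^*_{ti}\) depends on the current values of \(\delta _{0j}\) at the neighbours of \(i\).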
About this article
Cite this article
Yang, HD., Chang, A., Hsu, WW. et al. A zero-inflated model for spatiotemporal count data with extra zeros: application to 1950–2015 tornado data in Kansas. Environ Ecol Stat 31, 1–25 (2024). https://doi.org/10.1007/s10651-023-00586-3
DOI: https://doi.org/10.1007/s10651-023-00586-3