Abstract
Overdispersion often occurs when fitting a binomial, multinomial or Poisson model to count data. In the Bayesian setting, failure to allow for overdispersion leads to the posteriors for the parameters being too narrow. A simple and natural approach is to incorporate parameter heterogeneity in the model, e.g. by adding a random effect to the linear predictor. However, overdispersion can also be caused by a lack of independence, which may not be straightforward to model explicitly. In addition, there may still be some residual overdispersion after allowing for heterogeneity or lack of independence. In many settings where overdispersion is present, it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model. When this is the case, we propose estimating the amount of overdispersion, and discuss the link between this estimate and the use of a posterior predictive p-value to check lack-of-fit. We also provide a residual plot that can be used to check the assumption of proportionality. We show how to use the estimate of overdispersion to make a simple adjustment to the posterior distribution for each parameter, analogous to the use of quasi-likelihood in the frequentist setting. We use two examples, regression modelling of count data and estimation of survival from a mark-recapture study, to illustrate the calculation of the estimate of overdispersion, and the resulting adjustment to the posteriors. We perform simulation studies based on the examples to assess the frequentist coverage properties of the adjusted posteriors. In both simulation studies, the adjusted posteriors lead to credible intervals that have approximately the correct coverage for a range of overdispersion scenarios. Our approach provides a new, simple and robust tool for Bayesian modelling of overdispersed data, when it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model.
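As a rough illustration of the frequentist quasi-likelihood analogue mentioned above, the sketch below computes the standard Pearson estimate of overdispersion for an intercept-only Poisson model and inflates the model-based standard error accordingly. It is not the Bayesian procedure proposed in the paper. The function names `pearson_dispersion` and `inflate_se`, the simulated data, and the truncation of the estimate at 1 are all assumptions made for this example.

```python
import numpy as np


def pearson_dispersion(y, mu, n_params):
    """Pearson estimate of overdispersion for a Poisson model: X^2 / (n - p)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Under a Poisson model the assumed variance equals the mean.
    pearson_x2 = np.sum((y - mu) ** 2 / mu)
    return pearson_x2 / (y.size - n_params)


def inflate_se(model_se, c_hat):
    """Quasi-likelihood style adjustment: scale a model-based SE by sqrt(c_hat)."""
    # Estimates below 1 are truncated at 1 (an assumption of this sketch).
    return model_se * np.sqrt(max(c_hat, 1.0))


# Hypothetical data: negative binomial counts with mean 5 and variance 10,
# i.e. variance twice that assumed by a Poisson model.
rng = np.random.default_rng(1)
y = rng.negative_binomial(n=5, p=0.5, size=200)

mu_hat = y.mean()                          # Poisson MLE for an intercept-only model
model_se = np.sqrt(mu_hat / y.size)        # SE of mu_hat under the Poisson model
c_hat = pearson_dispersion(y, mu_hat, n_params=1)
adjusted_se = inflate_se(model_se, c_hat)  # SE after allowing for overdispersion
```

The adjustment relies on the proportionality assumption stated in the abstract: the true variance is taken to be a constant multiple of the model variance, so a single scalar rescales the uncertainty for every parameter.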
Data availability
The data in Sects. 3.1 and 3.2 are publicly available at https://github.com/davidatkaritane/ees-datafiles.
Author information
Contributions
DF and PWD conceived the idea for the paper; PWD wrote the code for the initial simulations; DF wrote the code for the examples and the final simulations; DF led the writing of the manuscript. MP proposed the residual plot in Figs. 1 and 2. All authors contributed critically to the drafts and gave approval for publication.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Fletcher, D., Dillingham, P.W. & Parry, M. A simple and robust approach to Bayesian modelling of overdispersed data. Environ Ecol Stat 30, 289–308 (2023). https://doi.org/10.1007/s10651-023-00567-6
DOI: https://doi.org/10.1007/s10651-023-00567-6