
A simple and robust approach to Bayesian modelling of overdispersed data

  • Research
  • Environmental and Ecological Statistics

Abstract

Overdispersion often occurs when fitting a binomial, multinomial or Poisson model to count data. In the Bayesian setting, failure to allow for overdispersion leads to the posteriors for the parameters being too narrow. A simple and natural approach is to incorporate parameter heterogeneity in the model, e.g. by adding a random effect to the linear predictor. However, overdispersion can also be caused by a lack of independence, which may not be straightforward to model explicitly. In addition, there may still be some residual overdispersion after allowing for heterogeneity or lack of independence. In many settings where overdispersion is present, it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model. When this is the case, we propose estimating the amount of overdispersion, and discuss the link between this estimate and the use of a posterior predictive p-value to check lack-of-fit. We also provide a residual plot that can be used to check the assumption of proportionality. We show how to use the estimate of overdispersion to make a simple adjustment to the posterior distribution for each parameter, analogous to the use of quasi-likelihood in the frequentist setting. We use two examples, regression modelling of count data and estimation of survival from a mark-recapture study, to illustrate the calculation of the estimate of overdispersion, and the resulting adjustment to the posteriors. We perform simulation studies based on the examples to assess the frequentist coverage properties of the adjusted posteriors. In both simulation studies, the adjusted posteriors lead to credible intervals that have approximately the correct coverage for a range of overdispersion scenarios. Our approach provides a new, simple and robust tool for Bayesian modelling of overdispersed data, when it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model.
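As a rough illustration of the idea described in the abstract, the following Python sketch estimates an overdispersion factor for Poisson counts and uses it to widen a set of posterior draws. It rests on several assumptions that do not come from the article: the estimator is the classical Pearson statistic divided by the residual degrees of freedom, the adjustment rescales the draws about their mean by the square root of the estimate (by analogy with quasi-likelihood standard errors), and the names `pearson_overdispersion` and `inflate_draws`, the toy counts and the simulated draws are all hypothetical. The estimator and adjustment used in the article itself may differ.

```python
import math
import random
import statistics


def pearson_overdispersion(observed, fitted, n_params):
    """Pearson chi-square over residual degrees of freedom.

    Assumes a Poisson working model, so the model variance equals the fitted mean.
    """
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / (len(observed) - n_params)


def inflate_draws(draws, c_hat):
    """Rescale posterior draws about their mean so the variance grows by c_hat.

    Estimates below 1 are truncated to 1, so the posterior is never narrowed.
    """
    centre = statistics.fmean(draws)
    scale = math.sqrt(max(c_hat, 1.0))
    return [centre + scale * (d - centre) for d in draws]


# Hypothetical counts, with an intercept-only Poisson fit (fitted value = sample mean).
observed = [2, 9, 1, 12, 0, 7, 3, 14]
fitted = [statistics.fmean(observed)] * len(observed)
c_hat = pearson_overdispersion(observed, fitted, n_params=1)

# Stand-in for MCMC draws of the log mean from the unadjusted Poisson model.
random.seed(1)
draws = [random.gauss(math.log(fitted[0]), 0.1) for _ in range(4000)]
adjusted = inflate_draws(draws, c_hat)

print(f"estimated overdispersion: {c_hat:.2f}")
print(f"posterior sd before: {statistics.stdev(draws):.3f}, after: {statistics.stdev(adjusted):.3f}")
```

The rescaling leaves the posterior mean unchanged and multiplies the posterior variance by the estimated factor, which is only sensible when the variance of the response is proportional to that assumed by the model.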



Data availability

The data in Sects. 3.1 and 3.2 are publicly available at https://github.com/davidatkaritane/ees-datafiles.


Author information


Contributions

DF and PWD conceived the idea for the paper; PWD wrote the code for the initial simulations; DF wrote the code for the examples and the final simulations; DF led the writing of the manuscript. MP proposed the residual plot in Figs. 1 and 2. All authors contributed critically to the drafts and gave approval for publication.

Corresponding author

Correspondence to David Fletcher.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 263 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fletcher, D., Dillingham, P.W. & Parry, M. A simple and robust approach to Bayesian modelling of overdispersed data. Environ Ecol Stat 30, 289–308 (2023). https://doi.org/10.1007/s10651-023-00567-6


  • DOI: https://doi.org/10.1007/s10651-023-00567-6
