A simple and robust approach to Bayesian modelling of overdispersed data

Fletcher, David; Dillingham, Peter W.; Parry, Matthew

doi:10.1007/s10651-023-00567-6

Environmental and Ecological Statistics, Volume 30, pages 289–308 (2023)

Published: 12 June 2023


Abstract

Overdispersion often occurs when fitting a binomial, multinomial or Poisson model to count data. In the Bayesian setting, failure to allow for overdispersion leads to the posteriors for the parameters being too narrow. A simple and natural approach is to incorporate parameter heterogeneity in the model, e.g. by adding a random effect to the linear predictor. However, overdispersion can also be caused by a lack of independence, which may not be straightforward to model explicitly. In addition, there may still be some residual overdispersion after allowing for heterogeneity or lack of independence. In many settings where overdispersion is present, it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model. When this is the case, we propose estimating the amount of overdispersion, and discuss the link between this estimate and the use of a posterior predictive p-value to check lack-of-fit. We also provide a residual plot that can be used to check the assumption of proportionality. We show how to use the estimate of overdispersion to make a simple adjustment to the posterior distribution for each parameter, analogous to the use of quasi-likelihood in the frequentist setting. We use two examples, regression modelling of count data and estimation of survival from a mark-recapture study, to illustrate the calculation of the estimate of overdispersion, and the resulting adjustment to the posteriors. We perform simulation studies based on the examples to assess the frequentist coverage properties of the adjusted posteriors. In both simulation studies, the adjusted posteriors lead to credible intervals that have approximately the correct coverage for a range of overdispersion scenarios. Our approach provides a new, simple and robust tool for Bayesian modelling of overdispersed data, when it is reasonable to assume that the variance of the response variable is proportional to that assumed by the model.
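The abstract describes two steps: estimating the amount of overdispersion, and using that estimate to adjust each parameter's posterior in a way analogous to quasi-likelihood. The Python sketch below illustrates the idea under stated assumptions: a Poisson model, a classical Pearson-type estimator c_hat = X^2 / (n - p), and an adjustment that widens each marginal posterior about its mean by sqrt(c_hat), just as quasi-likelihood multiplies frequentist standard errors by sqrt(c_hat). The estimator and adjustment used in the paper may differ. The function names (`pearson_overdispersion`, `inflate_posterior`, `interval_95`), the counts, and the stand-in posterior draws are all hypothetical.

```python
import math
import random
import statistics


def pearson_overdispersion(y, mu, n_params):
    """Pearson-type overdispersion estimate for a fitted Poisson model:
    sum((y - mu)**2 / mu) / (n - p). A value near 1 suggests the Poisson
    variance is adequate; a value above 1 suggests the variance is larger
    by roughly that factor. Assumes all fitted means mu are positive."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / df


def inflate_posterior(draws, c_hat):
    """Widen the draws for one parameter about their mean by sqrt(c_hat).
    Assumption: this mirrors quasi-likelihood, where standard errors are
    multiplied by sqrt(c_hat); it is not necessarily the paper's exact rule."""
    centre = statistics.fmean(draws)
    scale = math.sqrt(c_hat)
    return [centre + scale * (d - centre) for d in draws]


def interval_95(draws):
    """Equal-tailed 95% interval from the 2.5% and 97.5% sample quantiles."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 1/40, ..., 39/40
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical counts in two groups. A Poisson GLM with one parameter
    # per group has fitted means equal to the group means (2 and 7 here).
    groups = [0, 0, 0, 1, 1, 1]
    y = [0, 5, 1, 2, 14, 5]
    means = {g: statistics.fmean([yi for yi, gi in zip(y, groups) if gi == g])
             for g in set(groups)}
    mu = [means[g] for g in groups]
    c_hat = pearson_overdispersion(y, mu, n_params=2)
    print(f"c_hat = {c_hat:.3f}")  # 127/28, about 4.536

    # Stand-in for MCMC draws of log(mean) for group 1: a rough normal
    # approximation (total count 21), hypothetical rather than real output.
    rng = random.Random(1)
    draws = [rng.gauss(math.log(7.0), 1 / math.sqrt(21)) for _ in range(4000)]
    lo, hi = interval_95(draws)
    lo_adj, hi_adj = interval_95(inflate_posterior(draws, c_hat))
    print(f"unadjusted 95% interval: ({lo:.3f}, {hi:.3f})")
    print(f"adjusted 95% interval:   ({lo_adj:.3f}, {hi_adj:.3f})")
```

Because every draw is widened by the same factor, the posterior mean is unchanged while the posterior standard deviation and the width of the equal-tailed interval are multiplied by sqrt(c_hat), about 2.13 in this example. With c_hat = 1 the draws are returned unchanged.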


Similar content being viewed by others

Flexible models for overdispersed and underdispersed count data

Article Open access 04 February 2021

The Tilted Beta-Binomial Distribution in Overdispersed Data: Maximum Likelihood and Bayesian Estimation

Article 09 June 2022

On Poisson-exponential-Tweedie models for ultra-overdispersed count data

Article 11 August 2020

Data availability

The data in Sects. 3.1 and 3.2 are publicly available at https://github.com/davidatkaritane/ees-datafiles.


Author information

Authors and Affiliations

David Fletcher Consulting Limited, Karitane, New Zealand
David Fletcher
Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
Peter W. Dillingham & Matthew Parry
Coastal People Southern Skies Centre of Research Excellence, University of Otago, Dunedin, New Zealand
Peter W. Dillingham
Te Pūnaha Matatini Centre of Research Excellence, University of Auckland, Auckland, New Zealand
Matthew Parry


Contributions

DF and PWD conceived the idea for the paper; PWD wrote the code for the initial simulations; DF wrote the code for the examples and the final simulations; DF led the writing of the manuscript. MP proposed the residual plot in Figs. 1 and 2. All authors contributed critically to the drafts and gave approval for publication.

Corresponding author

Correspondence to David Fletcher.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information

Supplementary file 1 (PDF 263 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fletcher, D., Dillingham, P.W. & Parry, M. A simple and robust approach to Bayesian modelling of overdispersed data. Environ Ecol Stat 30, 289–308 (2023). https://doi.org/10.1007/s10651-023-00567-6

Received: 03 November 2022
Accepted: 24 May 2023
Published: 12 June 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10651-023-00567-6
