Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up

  • Original Paper
Statistical Methods & Applications

Abstract

Determining the sample size is a key step in designing a cohort study that must attain a preset statistical power for comparing the time-to-event outcomes of two groups. In complex survival analysis study designs, the time-to-event data for the two groups can be sampled from a single cohort using a variety of procedures, or they can be drawn from a collection of different cohorts. Adopting a unified study design, in which the observations from various sampling schemes or independent cohort studies are combined, may mitigate the logistical constraints on recruiting a sufficient number of subjects. We derive sample size formulae for data collected from combined incident and prevalent cohort studies with and without follow-up. We show analytically that a combined cohort study requires fewer observations from each of its component cohorts than a study using data collected from a single cohort alone. We describe how our sample size formulae may be generalized to arbitrary collections of cohort samples and demonstrate, using simulated cohort data, that the proposed combined cohort testing procedure achieves empirical power comparable to that of the same procedure applied to data drawn from a single cohort study.
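For orientation only, the general idea of fixing a power level and solving for a sample size can be sketched with a standard textbook approximation (Schoenfeld's formula for a two-sided log-rank test under proportional hazards). This is not the combined-cohort formula derived in the paper, which is not reproduced in this preview. The function names and the `event_prob` parameter are hypothetical and introduced purely for illustration.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, alloc=0.5):
    """Total events needed by Schoenfeld's approximation (illustrative only).

    hazard_ratio: assumed hazard ratio between the two groups (not 1).
    alloc: assumed proportion of subjects allocated to the first group.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    if not 0 < alloc < 1:
        raise ValueError("alloc must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    events = (z_alpha + z_beta) ** 2 / (
        alloc * (1 - alloc) * math.log(hazard_ratio) ** 2
    )
    return math.ceil(events)


def required_subjects(hazard_ratio, event_prob, alpha=0.05, power=0.80, alloc=0.5):
    """Inflate the event count by an assumed overall probability of an event."""
    if not 0 < event_prob <= 1:
        raise ValueError("event_prob must lie in (0, 1]")
    events = required_events(hazard_ratio, alpha, power, alloc)
    return math.ceil(events / event_prob)


if __name__ == "__main__":
    print(required_events(0.5))
    print(required_subjects(0.5, event_prob=0.5))
```

The sketch covers only a single, conventionally sampled cohort with right censoring; it makes no adjustment for the length-biased sampling of prevalent cohorts, which is the setting the paper addresses.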

Data availability statement

We do not analyse any datasets, as our work is theoretical and mathematical in nature. The simulation code and the datasets it generated are available from the corresponding author on reasonable request.

Acknowledgements

We thank the reviewers and the editorial board for their suggestions, which, we believe, led to an improved manuscript.

Author information

Corresponding author

Correspondence to James H. McVittie.

Ethics declarations

Conflict of interest

The author(s) declare that they have no competing interests that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 3, 4, 5, and 6.

Table 3 Empirical power levels and standard errors calculated from 50,000 simulated individual cohort data sets
Table 4 Empirical power levels and standard errors calculated from 50,000 simulated individual cohort data sets
Table 5 Empirical power levels and standard errors calculated from 50,000 simulated combined cohort data sets
Table 6 Empirical power levels and standard errors calculated from 50,000 simulated combined cohort data sets
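
As a rough guide to the precision such tables can offer, the sketch below computes an empirical power and its Monte Carlo standard error from a count of rejections. It assumes the usual binomial standard error, sqrt(p(1 - p)/N); the preview does not state how the reported standard errors were calculated, and the function name and the rejection count are hypothetical.

```python
import math


def empirical_power(rejections, n_sims):
    """Return (empirical power, binomial Monte Carlo standard error).

    rejections: number of simulated data sets in which the null was rejected.
    n_sims: total number of simulated data sets.
    """
    if n_sims <= 0 or not 0 <= rejections <= n_sims:
        raise ValueError("need 0 <= rejections <= n_sims and n_sims > 0")
    p_hat = rejections / n_sims
    std_err = math.sqrt(p_hat * (1 - p_hat) / n_sims)  # assumed binomial SE
    return p_hat, std_err


if __name__ == "__main__":
    # Hypothetical count: 40,000 rejections out of 50,000 simulated data sets.
    p_hat, std_err = empirical_power(40_000, 50_000)
    print(f"power = {p_hat:.3f}, SE = {std_err:.5f}")
```

Under this assumption, 50,000 replicates bound the standard error of any empirical power by sqrt(0.25/50,000), roughly 0.0022, so differences of a few tenths of a percentage point between designs are at the edge of what the simulation can resolve.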

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

McVittie, J.H. Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up. Stat Methods Appl (2024). https://doi.org/10.1007/s10260-024-00744-2

  • DOI: https://doi.org/10.1007/s10260-024-00744-2
