Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up

McVittie, James H.

doi:10.1007/s10260-024-00744-2


Original Paper
Published: 14 February 2024


Statistical Methods & Applications


Abstract

Determining the sample size is a key step in designing a cohort study that must achieve a prespecified statistical power when comparing the time-to-event outcomes of two groups. In complex survival analysis designs, the time-to-event data for the two groups may be sampled from a single cohort using a variety of procedures, or drawn from a collection of different cohorts. Adopting a unified study design, in which observations from various sampling schemes or independent cohort studies are combined, may mitigate the logistical constraints on recruiting a sufficient number of subjects. We derive sample size formulae for data collected from combined incident and prevalent cohort studies with and without follow-up. We show analytically that a combined cohort study requires fewer observations from each of its component cohorts than a study using data collected solely from a single cohort. We describe how our sample size formulae may be generalized to arbitrary collections of cohort samples and demonstrate, using simulated cohort data, that the proposed combined cohort testing procedure achieves empirical power comparable to that obtained when the same procedure is applied to data drawn from a single cohort study.
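The sketch below is not the author's derivation; it is a minimal illustration, under stated assumptions, of why pooling independent cohorts can reduce the number of subjects needed from each one. It assumes a two-sided Wald test of θ = log(HR) with equal allocation between the two groups, that each subject contributes a fixed, user-supplied amount of Fisher information about its group's log hazard, and that information adds across independent cohorts. For uncensored exponential lifetimes each subject contributes one unit of information about log λ; the prevalent-cohort value used below, and the helper names `required_information`, `n_single_cohort` and `prevalent_needed`, are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def z(p: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)


def required_information(hazard_ratio: float, alpha: float = 0.05,
                         power: float = 0.80) -> float:
    """Fisher information for theta = log(hazard_ratio) needed so that a
    two-sided level-alpha Wald test has the requested power (normal approximation)."""
    theta = log(hazard_ratio)
    return (z(1 - alpha / 2) + z(power)) ** 2 / theta ** 2


def n_single_cohort(hazard_ratio: float, info_per_subject: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group when all data come from one cohort type.

    Hypothetical assumption: each subject carries `info_per_subject` units of
    information on its group's log hazard, so with n subjects per group
    Var(theta_hat) ~= 2 / (n * info_per_subject).
    """
    target = required_information(hazard_ratio, alpha, power)
    return ceil(2 * target / info_per_subject)


def prevalent_needed(hazard_ratio: float, n_incident: int, info_incident: float,
                     info_prevalent: float, alpha: float = 0.05,
                     power: float = 0.80) -> int:
    """Prevalent-cohort subjects per group still required after n_incident
    incident-cohort subjects per group.

    Hypothetical assumption: information is additive across independent cohorts.
    """
    target = required_information(hazard_ratio, alpha, power)
    shortfall = 2 * target - n_incident * info_incident
    return max(0, ceil(shortfall / info_prevalent))


if __name__ == "__main__":
    # Uncensored exponential lifetimes: one unit of information per subject.
    print(n_single_cohort(2.0, info_per_subject=1.0))  # 33
    # Hypothetical prevalent-cohort information of 2.0 units per subject.
    print(prevalent_needed(2.0, n_incident=20, info_incident=1.0,
                           info_prevalent=2.0))  # 7
```

Under these illustrative inputs, 20 incident and 7 prevalent subjects per group (27 in total) reach the same information target as 33 incident subjects alone, which mirrors the qualitative claim above. The actual per-subject information from a prevalent cohort with or without follow-up depends on the sampling scheme and censoring pattern and would have to be derived for the design at hand.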


Data availability statement

No datasets were analysed, as this work takes a theoretical and mathematical approach. The simulation code and the datasets it generates are available from the corresponding author on reasonable request.


Acknowledgements

We thank the reviewers and the editorial board for their suggestions, which we believe led to an improved manuscript.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Regina, 3737 Wascana Parkway, Regina, SK, S4S 0A2, Canada
James H. McVittie


Corresponding author

Correspondence to James H. McVittie.

Ethics declarations

Conflict of interest

The author declares no competing interests that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 3, 4, 5, and 6.

Table 3 Empirical power levels and standard errors calculated from 50,000 simulated individual cohort data sets


Table 4 Empirical power levels and standard errors calculated from 50,000 simulated individual cohort data sets


Table 5 Empirical power levels and standard errors calculated from 50,000 simulated combined cohort data sets


Table 6 Empirical power levels and standard errors calculated from 50,000 simulated combined cohort data sets


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

McVittie, J.H. Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up. Stat Methods Appl (2024). https://doi.org/10.1007/s10260-024-00744-2


Accepted: 11 January 2024
Published: 14 February 2024
DOI: https://doi.org/10.1007/s10260-024-00744-2
