Abstract
Determining the sample size is key in the design of a cohort study that requires a preset statistical power for comparing the time-to-event outcomes of two groups. In complex survival analysis study designs, the time-to-event data for the two groups can be sampled from a single cohort using a variety of procedures, or drawn from a collection of different cohorts. Assuming a unified study design, in which the observations from various sampling schemes or independent cohort studies are combined, may mitigate the logistical constraints on acquiring a sufficient number of subjects. We derive sample size formulae for data collected from combined incident and prevalent cohort studies with and without follow-up. We show analytically that a combined cohort study requires fewer observations from each of its individual cohort components than a study using data collected solely from a single cohort. We describe how our sample size formulae may be generalized to arbitrary collections of cohort samples and demonstrate, using simulated cohort data, that the proposed combined cohort testing procedure achieves empirical power comparable to that of the same procedure applied to data drawn from a single cohort study.
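For orientation only, the kind of calculation the abstract refers to can be sketched for the simplest single-cohort case. The sketch below is not the combined-cohort formula derived in the article. It assumes exponential lifetimes, fully observed failure times from an incident cohort (no censoring, no length bias), equal group sizes, and a two-sided Wald test of the log hazard ratio, for which the variance of the estimated log hazard ratio is approximately 2/n with n subjects per group. The function name and its parameters are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def exp_sample_size_per_group(hazard_ratio, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided Wald test of log(hazard ratio).

    Assumptions (illustrative, not taken from the article): exponential
    lifetimes, every failure time fully observed, equal allocation.
    Under these assumptions Var(log HR estimate) is approximately 2 / n.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z = NormalDist().inv_cdf          # standard normal quantile function
    z_total = z(1 - alpha / 2) + z(power)
    # Solve z_total = |log HR| / sqrt(2 / n) for n and round up.
    return ceil(2 * z_total ** 2 / log(hazard_ratio) ** 2)


if __name__ == "__main__":
    # Hazard ratio of 2, 5% two-sided level, 80% power.
    print(exp_sample_size_per_group(2.0))
```

Censoring, length-biased sampling in a prevalent cohort, or pooling several cohorts changes the information contributed by each observation, so this baseline would have to be replaced by the design-specific formulae the article derives.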
Data availability statement
We do not analyse any datasets, as our work is theoretical and mathematical. The simulation code and the corresponding generated datasets are available from the corresponding author(s) on reasonable request.
Acknowledgements
We thank the reviewers and editorial board for their suggestions, which, we believe, led to an improved manuscript.
Ethics declarations
Conflict of interest
The author(s) declare that they have no competing interests that could have appeared to influence the work reported in this paper.
About this article
Cite this article
McVittie, J.H. Determining sample sizes for combined incident and prevalent cohort studies with and without follow-up. Stat Methods Appl (2024). https://doi.org/10.1007/s10260-024-00744-2
DOI: https://doi.org/10.1007/s10260-024-00744-2