1 Introduction

Classical demography imputes population ageing to low fertility, not to low mortality (Coale 1956; Keyfitz 1975). When mortality is very high, its decline may even lead to population rejuvenation, due to improved infant and child survival, as it happened at the beginning of the demographic transition (Coale 1972; Chesnais 1990, 1992). Extreme cases aside, Coale (1957) showed that, had fertility remained constant, the age structure of Sweden would have been practically the same in 1860 and 1950, despite strong mortality reduction. Bengtsson and Scott (2005, 2010) updated the exercise and confirmed that, with constant fertility, the proportion of people aged 65 and over in Sweden would have remained practically unchanged between 1900 and 2000. In both cases, so the argument goes, huge mortality progress proves of very small consequence on the population age structure, contrary to intuition.

After World War II, however, with improvements in survival concentrated at old and very old ages (Vaupel 2010) and the emergence of a sort of ageing “at the apex” (Bourgeois-Pichat 1979), or “from the top” (Preston et al. 1989; Caselli and Vallin 1990), some “revisionist” scholars, as Lee and Zhou (2017) call them with specific reference to Preston and Stokes (2012), challenged the idea that fertility is always the major driver of population ageing. Murphy (2017), for instance, shows that in 11 European populations in the past 65 years or so, ageing was mainly due to low mortality. However, the idea of ageing from the top remains controversial: Lee and Zhou (2017) based on their own counterfactual analysis, maintain that even in modern populations, low fertility remains the main cause of population ageing.

While Murphy (2021) warns that the results of counterfactuals and population projections depend on the base year selected for the analysis and on population momentum, the main qualitative conclusions remain the same: fertility decline is at the root of population ageing, although the role of improving survival is no longer negligible nowadays.

In this paper, we show that mortality is, in the long run, a very good predictor of the age structure of a population and that its decline drives population ageing. To do this, we exploit two sets of empirical data, for various countries and several years. Our dependent variable is the share of individuals by age, or current (population) age structure. Gender differences are ignored in what follows, for several reasons. First, we want our readers to focus on the basics of our approach, which is non-customary in this field of research and rather complex in itself. Second, studies on the causes of population ageing rarely, if ever, work separately by sex. Finally, our preliminary analyses, not reported here, did not reveal any systematic gender difference in the relations that we are about to analyse.

Our independent variable is the age structure of the stationary population associated with the current, cross-sectional life table, i.e. the Lx series: this we will call reference age structure. Several scholars question the validity of period life tables as indicators of the “true” evolution of survival, maintaining that cohort life tables should be used instead (e.g., Borgan and Keilman 2019). While a theoretical discussion about this would be too long in this paper, we indirectly contribute to the debate by showing that an empirical and strong relationship exists, for any given population, between period life tables and population age structures at (about) the same dates.

Our analysis relies on cointegration (Sect. 2). Cointegration analysis has been widely used in the field of mortality (Arnold and Sherris 2013, 2016; Gaille and Sherris 2011; Lazar and Denuit 2009; Yang and Wang 2013; Zhou et al. 2014), but not, to the best of our knowledge, to analyse the evolution of the age structure of a population. Readers who are not familiar with cointegration can nonetheless follow our line of reasoning, which is trivial: we test whether the reference age structure (the proportion of individuals of age x in the stationary population associated with the period life table of year t) “attracts” the current age structure of a population, and whether this small, but persistent force of attraction eventually prevails, and shapes population age structures.

To simplify, two opposite outcomes are possible:

1) Our predicting capacity proves limited. In this case, the effect of mortality (or, at least, of period mortality) on the age structure, if any, is small and the classical result of demographic analysis holds: something else (low fertility arguably, but definitely not mortality decline) is at the root of population ageing;

2) A large part of the age structure dynamics can be explained by the evolution of survival (as described by a succession of period life tables), and this in all the countries under scrutiny (eight, see below) and for the entire period of observation (the last two centuries or so).

Our results point in the latter direction: period life tables “explain” a large part of observable population age structures. This, we argue in our conclusions, may lead scholars to re-evaluate the role of mortality as a driver of population ageing.

The paper is organized as follows. The next section is devoted to the formal description of the test that we will use in this paper. The third section presents the data and some descriptive statistics. The fourth section contains a preliminary analysis of the relationship between the current and the reference age structure: even without great sophistication, the two age structures turn out to be very closely connected. The fifth section confirms these results using a more refined approach: cointegration and the test introduced in Sect. 2. The sixth and last section discusses our findings and their implications.

2 Testing cointegration between the current and the reference age structure

Let \({C}_{x,t}\) denote the population age structure expressed as the proportion of individuals of age x in year t

$${C}_{x,t}=\frac{{P}_{x,t}}{{P}_{t}}$$
(1)

,

where \({P}_{x,t}\) and \({P}_{t}\) stand for the population of age x and in total, respectively, in year t.

Similarly, the reference age structure of the population (i.e., the age structure of the current stationary population) can be defined as:

$${K}_{x,t}=\frac{{L}_{x,t}}{\sum _{x}{L}_{x,t}}$$
(2)

where \({L}_{x,t}\) indicates the person-years lived at age x in the life table of year t. However, we will use the log transformations of \({K}_{x,t}\) and \({C}_{x,t}\), \({k}_{x,t}\) and \({c}_{x,t}\) respectively, for two main reasons: to circumvent one of the limitations of proportions (they are bounded in the 0–1 interval, with the lower limit, 0, particularly disturbing) and to better approximate linearity in the relationship between the two series.

The main goal of this paper is to test whether a long-run linear relationship exists between \({c}_{x,t}\) and \({k}_{x,t}\) for any given age x, and for varying t:

$${c}_{x,t}={m}_{x}+{\gamma }_{x}{k}_{x,t}+{\epsilon }_{x,t},$$
(3)

where \({m}_{x}\) and \({\gamma }_{x}\) are the model age-specific coefficients and \({\epsilon }_{x,t}\) is the error term. If this relationship exists for a given age x, it becomes possible to predict the proportion of individuals of that age at time t from the reference proportion \({k}_{x,t}\). If this holds for all ages \({x}_{s}\), the entire age structure of year t can be derived from the life table of that year.

To understand the rationale underling Eq. (3), let us start from the opposite assumption, that mortality and fertility are independent of each other. In this case, long historical phases would be possible, and would be observed, with constant or increasing fertility and declining mortality, or vice versa, and, consequently, with no long-run tendency towards equilibrium between the current and the reference age structures. However, such situations have never been observed in sufficiently large populations and for protracted periods. Fertility and mortality have always been relatively close to each other, sometimes at high levels (before the demographic transition), sometimes at low levels (after it), and sometimes following a declining trend (during the demographic transition). Alternative scenarios, incidentally, are difficult to imagine, because they would lead to population explosion or disappearance. This logical and empirical regularity is the main reason that leads us to posit that, in the long run, the age structure of a population cannot be too far away from its reference counterpart.

Equation (3) does not rule out the possibility that also preceding values of kx affect the current value of cx,t. Let us assume, for instance, that the “true” dynamic of the age structure is given by the following Autoregressive Distributed Lag (ADL) model:

$${c}_{x,t}={\beta }_{0,x}+{\beta }_{1,x}{c}_{x,t-1}+{\beta }_{2,x}{k}_{x,t}+{\beta }_{3,x}{k}_{x,t-1}+{\epsilon }_{x,t}$$
(3’)

.

It can be proved that this form of dynamics corresponds to a long-run relationship of the type of Eq. (3) where \({m}_{x}=\frac{{\beta }_{0,x}}{1-{\beta }_{1,x}}\) and \({\gamma }_{x}=\frac{{\beta }_{2,x}+{\beta }_{3,x}}{1-{\beta }_{1,x}}\) (see e.g. Johnston and DiNardo 2007:245; Pesaran 2015; Box et al. 2016. For a simplified approach, see Giles 2013).

If a long-run linear relationship between \({c}_{\text{x},t}\) and \({k}_{\text{x},t}\) actually exists, we are also interested in assessing whether Eq. (3) can be approximated by simpler models such as:

$${c}_{x,t}={\gamma }_{x}{k}_{x,t}+{\epsilon }_{x,t},$$
(4)
$${c}_{x,t}={k}_{x,t}+{\epsilon }_{x,t},$$
(5)

Model (4) assumes that \({m}_{\text{x}}=0\), while model (5) introduces the additional assumption that \({\gamma }_{\text{x}}=1\) for all x’s. The interest of Eq. (5) lies in its simplicity: it says that the current population age structure of a population can be approximated by simply looking at its reference age structure, i.e. the life table of that year. This makes the estimation of \({m}_{\text{x}}\) and \({\gamma }_{\text{x}}\) unnecessary, and saves the need to collect and analyse long time series of both sets of data (current and reference age structures).

Unfortunately, assessing the existence of a long-run relationship between two time series of the type described in Eq. (3) proves difficult, because spurious correlation must be ruled out (Granger and Newbold 1974). If the “innovations” that affect these series have permanent effects on them (the series have a “unit root”, in the econometric jargon), their dynamics are non-stationary (i.e., their means, variances and autocovariances depend on time) and the two series may appear to be correlated even if they are not. Arguably, permanent innovations play a fundamental role in the long-run dynamics of mortality and fertility. Think of vaccines, penicillin and antibiotics, coronary bypass, the formation of antibiotic-resistant bacteria, the spread of smoking and alcohol drinking, etc.: all with permanent effects on the evolution of mortality. Likewise, the reduction of infant and child mortality, the increase in the cost of children, the new role of women (higher education and greater participation in the labour market), etc., have permanently changed the evolution of fertility.

Two series are said to be cointegrated if:

  1. 1.

    Innovations produce permanent effects (unit roots) and

  2. 2.

    The series are not spuriously correlated in the long run, with a functional relation of the type described in Eq. (3).

In classical cointegration analysis, therefore, one must first test for the presence of unit roots in the series under consideration, and then, if these are present, test for cointegration. The cointegration test is based on the residuals of Eq. 3, not on its coefficients (Engle and Granger 1987). If the residuals turn out to be stationary (their mean, variance and covariance are independent of time), a long-run relationship between the two series is likely to exist. This test is usually performed via the Augmented Dickey-Fuller (ADF) unit root test, although other solutions are also possible.

To understand the rationale of this approach, let us define \({\widehat{c}}_{x,t}={m}_{x}+{\gamma }_{x}{k}_{x,t}\) as the log-proportion of individuals of age x in year t predicted by Eq. 3 based on the reference age structure (better: on the reference log-proportion of individuals of age x in year t). If the current and the reference age structure are cointegrated, the current log-proportion of individuals aged x, \({c}_{x,t}\), will show a tendency to “revert” to its long-run equilibrium value \({\widehat{c}}_{x,t}\). In practice, cointegration means that \({c}_{x,t}\) and \({\widehat{c}}_{x,t}\) cannot be too far away from each other, because some “force” pushes \({c}_{x,t}\) towards \({\widehat{c}}_{x,t}\). In this case, the residuals \({\widehat{c}}_{x,t}-{c}_{x,t}\) will show a stationary dynamic (never diverging too much), and this explains why classical cointegration tests focus on residuals.

Unfortunately, the standard test used to check whether time series are stationary often proves inconclusive. Its “low power” derives ultimately from the fact that three conditions must be met:

1 and 2) The test must not reject the null hypothesis of a unit root for each of the two series under scrutiny (this is to confirm that innovations do produce permanent effects in both series);

3) The test must reject the null hypothesis of a unit root when it is applied to the analysis of residuals (this is to prove that one of the series tends to revert on the other).

These conditions are rarely met in practice: not necessarily because the underlying hypotheses are false, but because of “noise”, such as data errors and, possibly, other intervening variables. Therefore, other, more powerful approaches have been proposed. Among these, the so-called “bounds test” (Pesaran et al. 2001), which, in a way, verifies the three aforementioned conditions with a single test.

The general idea behind this approach is to test the existence of a possible tendency of a series to revert towards its long-run equilibrium path by estimating the following conditional error correction model (ECM) for a specific and constant age x, and for varying t:

$$\varDelta {c}_{x,t}={\alpha }_{x,0} {+\delta }_{x,1}{c}_{x,t-1}+{\delta }_{x,2}{k}_{x,t-1}+\sum _{i=1}^{p}{\alpha }_{x,i}\varDelta {c}_{x,t-i}+\sum _{i=0}^{q}{\beta }_{x,i}\varDelta {k}_{x,t-i}+{\epsilon }_{x,t},$$
(6)

where \({\alpha }_{i}\), \({{\upbeta }}_{i}\) and \({\delta }_{\text{i}}\) are model parameters, \(\varDelta {c}_{a,t}={c}_{a,t}-{c}_{a,t-1}\), \(\varDelta {k}_{a,t}={k}_{a,t}-{k}_{a,t-1}\), and p and q are lags, to be discussed shortly.

Equation 6 presents several advantages. First, it can be estimated with ordinary least squares (OLS).

Second, the lags p and q do not need to be predetermined: a statistical procedure, based on the Bayesian information criterion (BIC) will suggest the best combination of the two. To determine the best p and q (lags) of our ECM, we started from a 3·4 grid search. In practice, for each country in our analysis we estimated the model 12 times, with different combinations of p = 1, …, 3 and q = 0, …, 3. In this case, and deviating from what we did in the rest of our analyses, we decided to work with quinquennial data: therefore, we are considering lags of up to 5·3 = 15 years. In most cases, the best model (with the lowest BIC value) has p = 1 (77% of the cases) and q = 0 (74% of the cases).

Third, the parameters of Eq. 3 in the main text can be derived from those of Eq. 6, because:

$${m_x} = {{{\alpha _{x,0}}} \over { - {\delta _{x,1}}}}\,{\rm{and}}\,{\gamma _x} = {{{\delta _{x,2}}} \over { - {\delta _{x,1}}}}.$$
(7)

Fourth, the existence of a cointegration relationship between \({c}_{x,t}\) and \({k}_{x,t}\) can be tested with the F statistics on the null hypothesis \({H}_{0}:{\delta }_{x,1}={\delta }_{x, 2}=0\). The distribution of this statistic is non-standard, but its critical values, calculated with Monte Carlo simulations, are tabulated in Pesaran et al. (2001, Table CI(iii). These critical values are generally greater than those employed in the standard F test, which makes the rejection of the null hypothesis (of no cointegration) more difficult.

For any given significance level, two critical values are offered: \({F}_{U}\) and \({F}_{L}\), upper and lower, respectively (Pesaran 2015:526). This, which incidentally justifies the name “bounds test”, means that three possible outcomes are possible:

  1. 1)

    \(F>{F}_{U}\) signals the likely existence of a long-run relationship;

  2. 2)

    \(F<{F}_{L}\)indicates that the long-run relationship is unlikely to exist; and, in between,

  3. 3)

    \({F}_{L}<F<{F}_{U}\) leads to a “suspension verdict”: the inference is inconclusive.

The third outcome of the bounds test may have various possible causes: for instance, it emerges when the series have “a different order of integration”, which means that only one of them has a unit root, with innovations inducing permanent effects. In this case, further analyses are needed to verify the possible cointegration between the two series.

If \({c}_{x,t}\) and \({k}_{x,t}\) turn out to be cointegrated, one can estimate Eq. 3 with OLS and use the \({k}_{x,t}\) series and the estimated coefficients to predict the evolution of \({c}_{x,t}\). The proportion of the overall variance of \({c}_{x,t}\) explained by \({k}_{x,t}\) is given by the simple \({R}^{2}\) statistics of the regression model.

3 The data

For our analysis, we use data taken from the Human Mortality Database (HMD) on eight populations located in Europe, North America and Oceania (Table 1). We selected these populations based on two main criteria:

  1. 1.

    The data start at least in the 1930s so the series span at least 85 years;

  2. 2.

    The effects of the two world wars are limited, either because the national territory was spared (as in the case of Australia, Canada, New Zealand, Sweden, Switzerland and the USA) or because the data allow us to focus only on the civilian population (as in the case of France and England-Wales).

Table 1 Summary statistics

We used data on exposures to calculate the current proportions of individuals of age x in year t (\({C}_{x,t}\)), and period life tables to compute their reference counterpart, i.e. the proportions of person-years at age x out of the total (\({K}_{x,t}\)). We focused on the ages between 0 and 99 years, by five-year age groups (0–4; 5–9; …; 95–99).

The countries covered by our analysis have time series of varying lengths. Typically, data for the European populations are available since the 19th century, occasionally earlier (Swedish data, for instance, begin in the 18th century). However, we decided to start in 1820, or as early as possible after that, because older data are often scarcely reliable. In non-European countries, the series start typically between 1910 and 1930.

The age structures of our populations changed considerably over time. The mean age, for instance increased from about 25–26 years to about 38–39 years, while the mean age of the corresponding reference populations (i.e., the stationary populations associated with the current period life tables) passed from 30 to 33 to 39–40 years (Table 1). The relative weights of the youngest and the oldest age groups changed markedly in the last two centuries in both the current and the reference populations, while the relative weights of the central age groups barely varied. At these central ages, the modest variability of our independent variable \({k}_{x,t}\) limits the explanatory power of the ECM (Eq. 6); in practice, however, this is less problematic than it seems, because also the dependent variable \({c}_{x,t}\) barely changes. The case of Sweden, the country that we will systematically use as an example in this paper, is shown in the box plots of Fig. 1. France (see below) and other countries not presented here for reasons of space behave similarly.

Fig. 1
figure 1

Median values and variability of kx and cx by five-year age classes in Sweden (1820–2017)

Note: kx=ln(Kx); cx=ln(Cx). Five-year ages are classes indicated with their central age

One of the difficulties that our analysis must face is represented by mortality crises: e.g., the cholera epidemics of the 19th century and the Spanish flu epidemics of 1918. These discontinuities may introduce several forms of distortion. The most problematic are probably those linked to fertility swings, which induce baby booms and busts and create “waves” in the age structure \({C}_{x,t}\) (but not in the reference age structure \({K}_{x,t}\), our independent variable) of the subsequent 100 years or so. This reduces the explanatory power of our model, which, however, remains high, as we will see shortly.

Another factor to keep under control is linearity, which is assumed in our two fundamental Eqs. (3 and 6), but which may not exist in real life. Figure 2 suggests that the linearity assumption generally holds, except for a small slope change in the passage to older ages (above 60 years). More refined statistical analyses (interpolation with higher order polynomials, not shown here) indicate that there is also a slight (but statistically significant) convexity after 60 years, and a concavity before this age. This deviations from linearity indicate that some heterogeneity likely exists in the values of the \({m}_{x}\) and \({\gamma }_{x}\) parameters of Eq. (3).

Fig. 2
figure 2

Relationship between kx and cx for all the countries and years of Table 1

Note: kx=ln(Kx) and cx=ln(Cx). Small values of Kx and Cx, and therefore lower values of kx and cx, (bottom left), characterise older ages

Source: See Table 1

4 The long-run relationship between the reference and the current age structure: a naïve analysis

Before tackling the cointegration analysis of Sect. 5, let us present our argument (existence of a long-run relationship between the current and the reference age structure) in the simplest possible way.

Table 2 refers, once again, to Sweden and it shows, at different ages, the estimates of the parameters of Eqs. 3 and 4, along with their standard errors and the associated R2. The last three columns of Table 4 report the mean squared errors (MSE) of three models: the full one (Eq. 3), the restricted one (intercept set to zero, Eq. 4) and the twice-restricted one, with zero intercept and unitary slope (Eq. 5). The general idea behind this exercise is to assess how close, on average, the current age structure is to its reference counterpart, and whether the use of simpler models is statistically justified.

Table 2 Parameter estimates by age of Eqs. 3, 4 and 5 for Sweden

Note that because of the non-stationary nature of the \({c}_{x,t}\) and the \({k}_{x,t}\) series, the estimated standard errors of Table 2 are likely biased. The \({R}^{2}\) values, on the other hand, may reflect a spurious correlation between the series, although this seems unlikely, because the estimated slopes are positive in 18 cases out of 20 (full model: Eq. 3, Table 2). In other words, at almost all ages, the current and the reference proportions of individuals tend to move in the same direction, decreasing or increasing together. The only exceptions are two central age classes (35–39 and 40–44 years) where, however, variability is limited in both cx,t and kx,t (Fig. 1), which also leads to very low \({R}^{2}\) values

The estimates of Table 2 can be used to model, or “predict”, the evolution of the current age structure over time. Figure 3.A presents three predicted proportions of individuals by age: the blue dotted line represents the evolution of \({K}_{x,t}\) (reference age structure and Eq. 5); the red, solid line represents the evolution of the current proportion of individuals of age x (\({C}_{x,t}\)), and the orange-dashed line represents the prediction of this proportion \(\left({\widehat{C}}_{x,t}\right)\) based on the \({K}_{x,t}\) series and on the estimated age-specific parameters of Table 2 (Eq. 3, full linear model).

Fig. 3
figure 3

A Current (red, solid line), reference (blue, dotted line) and predicted (orange, dashed line) proportion of individuals in selected age-groups (Sweden, 1820–2017)

Note: predictions based on the full model (Eq. 3). Please mind the different scales on the y-axis

Our models, based exclusively on the current life table and the associated stationary population, capture remarkably well the general evolution of the age structure in the past two hundred years in Sweden, in France (Fig. 3.B), and in the other countries of Table 1 not shown here. Of course, our simplified models cannot accurately depict the fluctuations around the underlying trend, i.e. the age-structure effects of (a) previous mortality-affecting events such as wars and epidemics, and (b) fertility and migration, totally ignored, here.

Fig. 3
figure 4

B Current (red, solid line), reference (blue, dotted line) and predicted (orange, dashed line) proportion of individuals in selected age-groups (France, 1820–2017)

With the model proportions of Fig. 3.A, we can also estimate the entire age structure of the Swedish population at different epochs (Fig. 4.A). Both Fig. 3.A and 4.A show that the close correspondence between the current C and the predicted \(\widehat{C}\) values dates back to very long ago. In other words, the correlation between survival (period life tables) and the shape of the age structure (which includes population ageing) seems to predate not only the strong improvements in old-age mortality that materialized after the 1950s (Murphy 2017), but also the onset of the demographic transition, which started shortly after 1860, in Sweden.

Fig. 4
figure 5

A Current (red, solid line), reference (blue, dotted line) and predicted (orange, dashed line) proportion of individuals in selected periods (Sweden)

Figure 3.A and 4.A also show that the twice-restricted version of Eq. 3 (i.e., Eq. 5 – no intercept, unitary slope) roughly works: the reference proportion of individuals aged x is generally close to its current counterpart (Kx,t~Cx,t). The approximation (as measured with the MSE for instance) is generally only slightly worse than in the full model (and it is even better than it in recent years), and the interpretation is much simpler. As a first approximation, the age structure of the reference population in year t gives a good idea of the current age structure of that population in the same period, which is a way of saying that survival (more precisely: current survival) “explains” most of the current age structure of the population. At young ages (< 30), the current proportion of individuals tends to be slightly higher than the reference, while at older ages (> 55) the reverse is true. This is not surprising in a population with a four-fold increase in the period (from 2.5 million in 1820 to more than 10 million in 2018). Indeed, this small distortion disappears altogether in recent times (years 2000–2004; last panel of Fig. 4.A), when the effects of the demographic transition are over.

Similar results emerge also for the other countries of our dataset. France for instance experienced a very early, but also a very gradual demographic transition: consequently, the evolution of its age structure deviates from that of Sweden. This notwithstanding, and despite the shocks of two World wars, the current and the reference age structures remain close one to each other during the entire period under observation (Fig. 3.B and 4.B).

Fig. 4
figure 6

B Current (red, solid line), reference (blue, dotted line) and predicted (orange, dashed line) proportion of individuals in selected periods (France)

Figure 5 shows the median R2 value associated with the estimate of Eq. 3 at different ages in all the eight countries included in this analysis. Results are unsatisfactory (low R2) between 25 and 54 years, but this shortcoming is less serious than it seems, because also the variability of both \({k}_{x,t}\) and \({c}_{x,t}\) is very limited at these ages. Instead, where the changes in the age structure of the population are more relevant, at young and old ages, the part of the variance that our model can “explain” becomes substantial, frequently above 75%, especially past age 60 years.

Fig. 5
figure 7

Median R2 (explained variance) of Eq. 3 by five-year age groups (all the countries and years of Table 1)

Note: The points represent the median explained variance of Eq. 3 by age group (all countries). The shaded area indicates the 95% band

5 Cointegration tests

In this section we present the results of the cointegration tests that we carried out considering separately each pair \({c}_{x,t}\) and \({k}_{x,t}\) for a fixed age x and varying t. The purpose of these tests is to exclude spurious correlations between the two series, which their stochastic trends make possible. We did this for each of the eight countries of Table 1, and for 20 five-year age groups x (0–4, 5–9, …, 95–99).

Table 3 summarizes the main results. Its first row says that the existence of a long-run relationship between the two series (\({c}_{x,t}\) and \({k}_{x,t}\)) is deemed likely in 107 cases (67% of the total), unlikely in 41 cases (26%) and uncertain in the remaining 12 cases (8%). Adopting a 5% significance level, instead of 10%, a long-run relationship is deemed likely in 60% of the cases (32.5% unlikely and 7.5% uncertain).

Table 3 Summary of results on the relationship between ca,tand ka,t(all the countries and years of Table 1)

To assess the reliability of our tests and to identify possible violations of model assumptions, we performed three standard kinds of diagnostic on the residuals:

  1. 1)

    The Box-Pierce test of autocorrelation (Box and Pierce 1970)

  2. 2)

    The Shapiro-Wilk test of normality (Royston 1982) and

  3. 3)

    The score test for non-constant error variance (heteroscedasticity; Cook and Weisberg 1983)

Based on these analyses, we identified the “problematic” cases that, due to autocorrelation, non-normality or heteroscedasticity, are not perfectly fit for this test. Removing them, however, which leaves us with 114 “well-behaved” time series, does not affect our results: in 66% of the cases, a long-run relationship can be identified (Table 3, second row). Overall, the tests results robustly indicate that in about two thirds of the cases, the two series (\({c}_{x,t}\) and \({k}_{x,t}\)) are likely cointegrated. Admittedly, there is still a relatively large proportion of “exceptions”, which we impute to the noisiness of our time series, connected with the profound demographic shocks of the period. The most relevant of these are due to the two World wars, both for their mortality effects and, more importantly, for the discontinuities they induced in the number of births: all of these effects remain visible for decades after their occurrence, especially in France (Fig. 3.B). This notwithstanding, our results appear supportive of the existence of a true (not spurious) long-run relationship between the current and the reference age structure.

Our tests can be broken down by country, as we did in Fig. 6.A. In four of them (Australia, Canada, Sweden and the US) a long-run relationship emerges very frequently, in more than 75% of the 20 age groups considered. The remaining countries (England-Wales, France, New Zealand and Switzerland) do not perform as well, but even in these cases a cointegration relationship appears to be likely in more than 50% of the series (age groups). Readers may note that the four countries where our model works best (Australia, Canada, Sweden and the US) are those that were comparatively less affected by the two World wars. Once again, this indicates that sudden mortality or fertility crises, with their long lasting “wave” effects on the age structure, weaken the relationship between the two series, ca,t and ka,t. However, even in the worst cases, the capacity of the model to predict the age structure based solely on the reference structure remains remarkably high (see e.g. Figure 4.A and 4.B).

Fig. 6
figure 8

Bounds test

Note: The 20 (5-year age group) series of each of the eight countries are observed for an unequal number of years (see Table 1)

Source: see Table 1

Figure 6.B breaks down our results by age group. In the age range between 15 and 44 years, a cointegration relationship can be found in most of the series scrutinized. Furthermore, a cointegration relationship can be found in more than 50% in all remaining age class except two: the 4–9 and 75–79 groups.

Beside wars, the variability of our results by country and age may be due to the effects of epidemics. The countries where our tests perform best generally present shorter time series starting in the 1930s, well after the end of the deadliest epidemics of the 19th century. Sweden is an exception, of course, but this country has been relatively spared by epidemics throughout its history, in comparison with other European countries. The age pattern of the cointegration test seems to be consistent with this interpretation (perturbing role of epidemics): the test performs better in the central age range (15–44 years), where epidemic-related mortality is generally lower.

Summing things up, a long-run relationship between cx,t and kx,t can be detected in about two thirds of the cases. Considering also the several disturbing factors that we could not keep under control (e.g. data quality, epidemics, wars, and the typically low statistical power of cointegration tests), we take this as a good indication in favour of our hypothesis that the current and the reference age structure are cointegrated.

6 Conclusions and interpretation of results

In this paper, we showed that a long-term equilibrium exists between the current and the reference age structure of a population. We did that in three different ways. First, we advanced a “logical” argumentation. If such equilibrium did not exist, the dynamics of mortality and fertility would be independent of each other, and this would lead to frequent cases of population explosion or extinction – which is not the case. Secondly, we used predictions and we showed that in the eight countries for which we had sufficiently long series, the evolution of the current age structure can be satisfactorily predicted on the sole base of mortality (cross-sectional life tables). Finally, we performed a series of tests to ascertain whether the current and the reference age structures of the eight countries under scrutiny are cointegrated. In this last part of our analysis we had to face some difficulties, in part because historical demographic series are generally noisy (due to wars, epidemics, famine, etc.) and shorter than it would be ideal, and in part because cointegration tests are characterized by a low statistical power. However, using the relatively more powerful bounds test, we found cointegration to be likely in about two thirds of the age-specific series under scrutiny.

None of the different methods that we employed to prove the existence of a long-term relationship between the current and the reference age structure can be considered conclusive in itself. Taken together, however, they seem supportive of the existence of the long-term equilibrium that we hypothesized. This means that most of the change observed in the proportions of young, adult and old people can be derived from the change in survival, and this for a very long time interval, dating back to as much as possible with the available data. The correlation does not seem to be spurious (i.e., due to common stochastic trends) and is rather strong. A simplified version of this finding (which emerges when the intercept of the regression is forced to zero and the slope to one) is that the reference age structure Kx,t represents an acceptable approximation of the current age structure Cx,t.

In 2017, Murphy argued that, since mid-20th century, most of the evolution in the age structure in 11 European countries depends on the evolution of mortality. He could not go back in time for more than a century because of the data requirements of his technique, the PHE decomposition (Preston et al. 1989). Conversely, we could: our method, less data demanding, allows us to use longer series, up to almost two centuries in the case of Sweden and France, for instance.

The implications of our findings are several. The first, and possibly the most important, is that demographers may want to reconsider the relative role traditionally attributed to fertility and mortality in shaping population age structures. In the same vein, our findings provide a standard against which to evaluate current age structures – their peaks and troughs, their peculiarities, the existence and strength of a “demographic bonus” or “dividend” (that is, a structurally favourable situation). All this is easier to detect, and is more objectively measured, if a standard derived from the reference age structure (or the reference age structure itself) is explicitly adopted.

Our findings also contribute to re-evaluate the use of cross-sectional life tables, whose validity has sometimes been questioned (e.g. by Borgan and Keilman 2019). Not only are they much more practical than their longitudinal alternatives: they provide a valid measure of the effects of mortality in any given period. This, incidentally, justifies their use in practical fields, such as pension laws, because they constitute not only a timely but also a valid basis for adjusting retirement ages as survival progresses. Whether this is done in the best possible way is, of course, a different question.