1 Introduction

The area of mortality is of particular interest to researchers and actuaries. This field of study includes the examination of historical mortality trends, the inspection of mortality projections, and the exploitation of data suitable for the underwriting, pricing, and analysis of life assurance and pensions products (see Ridsdale et al. 2010). The twentieth century witnessed longevity improvements in many high-income countries. Against this background, it is of utmost significance to understand mortality behaviours related to countries, which share similar socio-cultural roots, in order to evaluate the impact of longevity risk and for predicting possible future impacts in the insurance sector, especially with reference to life assurance and pensions. In other words, it is important to identify persistent groups of countries with homogeneous mortality behaviours related to the evolutionary process of longevity improvements.

Dong et al. (2020) summarize the history of the development of mortality modelling into four main stages. The first stage is characterized by the creation of deterministic and one-dimensional models; the second step concerns the deterministic and two-dimensional models; the third stage relies on the development of stochastic models. The model that marks the beginning of this new era is the Lee-Carter model (Lee and Carter 1992). It is one of the earliest stochastic mortality models in the literature for a single population. The last stage is characterized by research concerning multi-populations mortality models. The milestone of this research line is the innovative paper of Li and Lee (2005), which extends the Lee-Carter model to a group of populations.

In parallel with the growing interest in the topic of mortality, there has also been a massive development of statistical methodologies to treat multi-dimensional mortality data. Indeed, the embedding of additional sources of information and the introduction of multivariate data, i.e. age, year and country, has called for the exploitation of multivariate methods developed years earlier in other fields of research.

Among these methodologies, multi-way tensor decomposition models have been applied to the topic of mortality only recently. In fact, tensor algebra was established in the sixties, but for many years its applications were confined to psychometrics and chemometrics (see Harshman and Kiers 1987; Harshman 1978). Only in more recent times, they have been used in statistics, where the first attempts of using these models concern three dimensions (see, e.g., Kiers and Van Mechelen 2001), and the analysis of mortality data (see Russolillo et al. 2011; Giordano et al. 2019; Dong et al. 2020; Cardillo et al. 2023). In particular, Russolillo et al. (2011) proposed a three-way extension of the Lee-Carter model by considering death rates aggregated for time, age groups and country. Giordano et al. (2019) applied the three-way Lee-Carter model developed in Russolillo et al. (2011) to a group of countries by extending the Lee-Carter model to a three-way structure. Dong et al. (2020) and Cardillo et al. (2023) generalised the model used in Russolillo et al. (2011) by applying different tensor decompositions and by considering death rates aggregated for age, year, and country/gender, and addressed the forecasting problem of multi-population mortality.

As emerges from these papers, a researcher, who analyses multidimensional mortality data, usually deals with statistics stored in different matrices, where the information is replicated over different occasions. For example, in the context of underwriting, pricing, or forecast, an actuary manages a greater amount of information and could have to deal with the death rates (or with the number of deaths) by age, year, and replicated for different countries. In other words, mortality data displays both a cross-sectional and a temporal dimension and can be easily stored in a three-way array or tensor. Tucker3 (Tucker 1966; Giordani and Kiers 2018) represents the most common model for three-way analysis in the actuarial field.

Following this line of research, we propose an in-depth inspection of country mortality data to uncover how the evolving relationships among multi-population log central death rates induce distinguishable subsets associated with the socio-cultural attitudes of such countries. First, we investigate the patterns of co-occurrences of log central death rates through time to determine similarities and differences across countries. Secondly, we exploit the nonnegative three-way DEcomposition into DIrectional COMponents (DEDICOM) model, with the aim to find persistent subsets of countries with similar mortality behaviours and to forecast future trajectories of such components.

The DEDICOM model was introduced for two-way data, stored in a standard matrix, by Harshman and co-authors (see Harshman 1978; Harshman et al. 1982; Harshman and Kiers 1987). In a nutshell, DEDICOM is an algebraic model which provides information on latent components in the data that can be regarded as “properties” or “aspects” of the objects, by also providing patterns of relationships among these components. In DEDICOM, individual objects can have substantial weights on more than one of the latent components or patterns, meaning that DEDICOM identifies types of correspondence patterns that have distinctive properties, and these are then linked to the individuals that exhibit mixtures of these patterns in their particular history. The DEDICOM model has also been formulated for three-way data (see Harshman 1978; Kiers 1993; Lundy et al. 2003; Bader et al. 2006, 2007; Phan et al. 2010). In this context, one is able to better investigate the relationships among the components by aggregating trends over time and also to study the strength of each component’s participation in the third mode, i.e., time.

The intrinsically temporal nature of the methodology allows us also to shed light on similarity patterns among countries during various temporal phases. Moreover, we exploit the temporal activity of each component to develop a forecasting exercise in which we predict future similarity patterns between country pairs.

Results indicate that data can be described by two different communities that reveal some peculiar aspects of the mortality phenomenon. The first community is represented by the countries of Eastern Europe, which experienced higher mortality than the countries of Western Europe. Component-2 refers to the remaining countries, including countries that present very high levels of life expectancy, like Japan and Northern European countries, and Italy, France, and Spain, which are emerging as longevous populations. Finally, we have exploited the component information, together with recurrent neural networks, for forecasting future trajectories of similarities among countries’ mortality behaviours. This approach reveals that, on average, similarity increases until 2021, decreasing afterwards, meaning that the initial process of convergence among country log central death rates is replaced by a diverging behaviour.

The paper is structured as follows: Sect. 2 encompasses the identification and description of the database, the construction of the three-dimensional similarity tensor, and the description of the decomposition method. Section 3 reports the findings of the analysis by showing the results of the tensor decomposition and of the similarity forecasts. Finally, Sect. 4 concludes.

2 Data description and methodology

2.1 Data information and pre-processing phase

We consider 29 countriesFootnote 1 out of the 41 collected in the Human Mortality Database (HMD),Footnote 2 an archive of death information on various countries worldwide. The 29 countries analyzed were selected according to the availability of the data over the period 1960-2015, which is widely recognized as the most reliable period for longevity study. Two more countries, Luxembourg and New Zealand, were removed due to the small size of the population. The countries included in our analysis are listed in Table 1.

The methodology requires structuring the data in three dimensions: country, age, and time. The age dimension is organized into 21 classes from 0 to 99 years, with a step of 5 years, except for the first two classes, which represent respectively the individuals aged 0, and aged 1-4. The analysis is developed for the total population, considering both genders.

Table 1 Countries considered from HMD

We consider an initial array with three dimensions (country, age, time). The dimension of the array is 30 \(\times\) 21 \(\times\) 55, for a total of 34,650 observations. In particular, following Li and Lee (2005) that generalized the Lee-Carter model, we collect (with a different order) the information related to the log central death rates in the three-dimensional array \(\mathcal {Y}\), such that each entry contains the specific information of the log central death rate for population i, of age j at time k, i.e. \(y_{ijk}=log(m_{ijk})\). The central death rate for an individual in country i, of age j at time k is computed as:

$$\begin{aligned} m_{ijk}=\frac{d_{ijk}}{L_{ijk}}, \end{aligned}$$

where \(d_{ijk}=l_{ijk}-l_{i(j+1)k}\) is the expected number of deaths for population in country i, between age j and \(j+1\) in year k and \(l_{ijk}\) is the expected number of living individuals at age j in year k in country i initially made up of \(l_{0}\) individuals. The variable \(L_{ijk}\simeq \frac{1}{2}[l_{ijk}+l_{i(j+1)k}]\) is the risk exposure of the total population i, of age j in year k (assumed to be equal to the population at mid-year). To investigate the similarity patterns among countries through the application of the DEDICOM model, we need to create a country \(\times\) country similarity matrix \(\textbf{X}(k)\) for each year k, starting from the observations contained in the original array \(\mathcal {Y}\). To accomplish this purpose, we describe each country through the vector \(\textbf{y}(k)\) containing the logarithm of the central death rates for each class, this means that we consider the rows of each matrix country \(\times\) age for every single year k. Once we have constructed the yearly vectors of countries’ log central death rates \(\textbf{y}(k)\), we compute the cosine metric for measuring the distances among each pair of countries. Given two countries, whose time-k log central death rates for age classes are collected in vectors \(\textbf{y}_{i}(k)\) and \(\textbf{y}_{ l }(k)\), the time-k cosine distance between country i and country \(l\) is defined as

$$\begin{aligned} \textbf{C}_{i l }(k)=1-\cos \left( \textbf{y}_{i}(k), \textbf{y}_{ l }(k)\right) , \end{aligned}$$

where the \({\text {cosine}}\) \((\cos )\) is the angle between the two vectors:

$$\begin{aligned} \cos \left( \textbf{y}_{i}(k), \textbf{y}_{ l }(k)\right) =\frac{<\textbf{y}_{i}(k), \textbf{y}_{ l }(k)>}{\left\| \textbf{y}_{i}(k)\right\| _F\left\| \textbf{y}_{ l }(k)\right\| _F}, \end{aligned}$$

with the symbols \(<\circ>\) and \(\Vert \circ \Vert _F\) indicating the inner product and the Frobenious norm, respectively. We rely on the cosine distance because our primary aim is to consider the gap among countries’ mortality rate vectors in terms of orientation and not by their length. For the sake of comparison we have also created a tensor of Euclidean distances among vectors where the previous formula has been replaced with \(\textbf{C}_{i l }(k)=||\textbf{y}_{i}(k)- \textbf{y}_{ l }(k)||.\) Finally, from the cosine (Euclidean) distance, we produce the similarity tensor by collecting the matrices:

$$\begin{aligned} \textbf{X}_{i l }(k)=\max (\textbf{C}(k))-\textbf{C}_{i l }(k), \end{aligned}$$

where \(\max (\textbf{C}(k))\) is the maximum value of the time-k cosine distance matrix.

In a nutshell, the stronger the similarity (i.e., the force that connects two countries’ mortality characteristics), the shorter the length of the links connecting the countries. In other words, pairs of countries that are similar receive higher weights since they are placed nearby from each other, while values approaching zero are assigned to pairs with highly dissimilar characteristics.

2.2 DEcomposition into DIrectional COMponents (DEDICOM)

Starting from the similarity matrix \(\textbf{X}(k)\) obtained at each year k, we build the three-way tensor (see Bader et al. 2006; Avdjiev et al. 2019; Spelta et al. 2018; Pecora and Spelta 2017), \(\mathcal {X} \in \mathbb {R}^{N \times N \times K}\), where \(N=30\) represents the number of countries and \(K=55\) denotes the number of years. Thus, the tensor is composed of the 55 slices \(\textbf{X}(k) \in \mathbb {R}^{30 \times 30}\). The tensor \(\mathcal {X}\) encompasses both the cross-sectional and temporal information of the evolving similarities that characterize the countries. To reveal the hidden structure of such traits and their activity patterns, lower-dimensional components need to be identified and extracted from the data. In other words, in order to retrieve the mesoscale structure of the temporal similarity tensor, we exploit the DEcomposition into DIrectional COMponents (DEDICOM) model ( see Harshman 1978; Bader et al. 2006, 2007). The formulation of DEDICOM with P components, given the three-way tensor \(\mathcal {X}\), is as follows:

$$\begin{aligned} \textbf{X}_{k} \approx \textbf{A} \textbf{D}_{k} \textbf{R} \textbf{D}_{k} \textbf{A}^{T}, k=1,2, \ldots , K, \end{aligned}$$

where we use \(\textbf{X}_{k} = \textbf{X}(k)\) to simplify the notation. This linear algebra model maps the country-to-country similarity structures onto a community-level structure of P latent components and comprises:

  • \(\textbf{A}\): \(N \times P\) matrix, where each value \(a_{i p}\) reflects the strength of country i belonging to component p;

  • \(\textbf{D}_k\): \(P \times P\) diagonal matrix with diagonal elements describing the role of each component at time k;

  • \(\textbf{R}\): \(P \times P\) matrix, where each value \(r_{p q}\) is a measure of the interaction between components p and q.

The DEDICOM model is very general and contains several models as special cases. First of all, DEDICOM can be applied to both symmetric and asymmetric similarities. In the asymmetric case, \(\textbf{R}\) is an asymmetric matrix, whereas, in the symmetric case, \(\textbf{R}\) is a symmetric matrix and DEDICOM coincides with the so-called PARAFAC2 model (Harshman 1972). When \(\textbf{R}=\textbf{I}\), i.e., no information on the relationships among the components is provided by the model, the DEDICOM model reduces to the INDSCAL one (Carroll and Chang 1970). All the previous models analyze data stored in the three-way tensor \(\mathcal {X}\). If \(K=1\), data are available for only one year and \(\mathcal {X}\) is nothing but a standard (two-way) matrix, say \(\textbf{X} \in \mathbb {R}^{N \times N}\). In such a case, the matrices \({\textbf{D}}_k\) disappear and the model, called DEDICOM for two-way data, can be formalized as

$$\begin{aligned} \textbf{X}=\textbf{A}\textbf{R}\textbf{A}^{T}. \end{aligned}$$

Before entering into model estimation, some mathematical notation is needed. The mode-n matricized version of a tensor \(\mathcal {X}\) is denoted by \(\textbf{X}_{(n)}\). The Khatri-Rao, Kronecker, and Hardamard products and elementwise division are denoted by \(\odot\), \(\otimes\), \(\circledast\), and \(\oslash\), respectively. The product of a tensor and a matrix along the mode-n is denoted as \(\mathcal {X}=\mathcal {G} \times _{n} \textbf{A}\), or \(\textbf{X}_{(n)}=\textbf{A} \textbf{G}_{(n)}\). The contracted product of two three-way tensors \(\mathcal {A} \in \mathbb {R}^{I \times J \times K}\) and \(\mathcal {B} \in \mathbb {R}^{P \times Q \times R}\) along mode 1 returns the four-way tensor defined as \(\mathcal {C}=\langle \mathcal {A}, \mathcal {B}\rangle _{1} \in \mathbb {R}^{J \times K \times Q \times R}\), or \(c_{j k q r}=\sum _{i} a_{i j k} b_{i q r}\), for \(I=P\), and the contracted product along the modes 1 and 2 returns a matrix defined as \(\textbf{F}=\langle \mathcal {A}, \mathcal {B}\rangle _{1,2}=\langle \mathcal {A}, \mathcal {B}\rangle _{-3}=\textbf{A}_{(3)} \textbf{B}_{(3)}^{T} \quad \in \mathbb {R}^{K \times R}\).(Fig. 1)

Fig. 1
figure 1

Graphical representation of the model. Starting from the HMD mortality data representing log central death rates for the population in country i, at age j in time k, yearly country-by-country cosine similarity matrices \(\textbf{X}_k\) are created. Those matrices are embedded into a three-way tensor of similarities \(\mathcal {X}\) of dimensions country \(\times\) country \(\times\) year. The tensor is thus decomposed by the DEDICOM model, which is able to provide information on latent components in the matrix \(\textbf{R}\) and to associate individual countries with component weights stored in matrix \(\textbf{A}\). Moreover, thanks to the relationship of this decomposition with the Tucker model, the methodology yields also information on the strength of each component’s activity at time k (matrix \(\textbf{D}_k\))

2.3 Estimation

First, we introduce a relationship between the DEDICOM and Tucker models. By denoting a core tensor \(\mathcal {G} \in \mathbb {R}^{P \times P \times K}\), whose frontal slices are represented by

$$\begin{aligned} \textbf{G}_{k}=\textbf{D}_{k} \textbf{R} \textbf{D}_{k}, \end{aligned}$$

the DEDICOM model of a data tensor \(\mathcal {X}\) relates to the particular (constrained) Tucker2 decomposition:

$$\begin{aligned} \mathcal {X}=\mathcal {G} \times _{1} \textbf{A} \times _{2} \textbf{A}. \end{aligned}$$

Generally, the updating rule for \(\textbf{A}\) can be derived by minimizing the Frobenius cost function using the Alternating Least Squares (ALS) approach:

$$\begin{aligned} J(\textbf{A})=\left\| \mathcal {X}-\mathcal {G} \times _{1} \textbf{A} \times _{2} \textbf{A}\right\| ^{2}. \end{aligned}$$

Alternatively, multiplicative updating rules can be derived by using Nonnegative Matrix Factorization (NMF). By matricizations of the tensor approximation along modes 1 and 2, we have:

$$\begin{aligned} \textbf{X}_{(1)}=\textbf{A} \textbf{G}_{(1)}(\textbf{I} \otimes \textbf{A})^{T}, \quad \text{ and } \quad \textbf{X}_{(2)}=\textbf{A} \textbf{G}_{(2)}(\textbf{I} \otimes \textbf{A})^{T}, \end{aligned}$$

or conveniently in a matrix factorization of the concatenation matrix:

$$\begin{aligned} \tilde{\textbf{X}}=\left[ \textbf{X}_{(1)} \textbf{X}_{(2)}\right] =\textbf{A}\left[ \textbf{G}_{(1)} \textbf{G}_{(2)}\right] (\textbf{I} \otimes \textbf{A})^{T}=\textbf{A} \tilde{\textbf{G}}(\textbf{I} \otimes \textbf{A})^{T}, \end{aligned}$$

where \(\tilde{\textbf{G}}=\left[ \textbf{G}_{(1)} \textbf{G}_{(2)}\right]\). Assuming nonnegativity constraints since similarities are always non-negative, the above model can be considered as a special form of NMF (see Lee and Seung 1999; Pecora et al. 2016), where the objective is to estimate the nonnegative matrices \(\textbf{A}\) and \(\tilde{\textbf{G}}(\textbf{I} \otimes \textbf{A})^{T}\). Thus, a multiplicative updating rule can be derived for \(\textbf{A}\):

$$\begin{aligned} \begin{aligned} \textbf{A} \leftarrow&\textbf{A} \circledast \left( \tilde{\textbf{X}}(\textbf{I} \otimes \textbf{A}) \tilde{\textbf{G}}^{T}\right) \oslash \left( \textbf{A} \tilde{\textbf{G}}(\textbf{I} \otimes \textbf{A})^{T}(\textbf{I} \otimes \textbf{A}) \tilde{\textbf{G}}^{T}\right) \\ =&\textbf{A} \circledast \left( \left\langle \mathcal {X}, \mathcal {G} \times _{2} \textbf{A}\right\rangle _{-1}+\left\langle \mathcal {X}, \mathcal {G} \times _{1} \textbf{A}\right\rangle _{-2}\right) \\ {}&\oslash \left( \textbf{A}\left( \left\langle \mathcal {G} \times _{2} \textbf{A}, \mathcal {G} \times _{2} \textbf{A}\right\rangle _{-1}+\left\langle \mathcal {G} \times _{1} \textbf{A}, \mathcal {G} \times _{1} \textbf{A}\right\rangle _{-2}\right) \right) . \end{aligned} \end{aligned}$$

Although the core tensor \(\mathcal {G}\) does not exist explicitly in the DEDICOM model, it takes a particular role in updating the matrices \(\textbf{D}_k\), \(k=1,\ldots , K\), and \(\textbf{R}\), and can be updated using the multiplicative updating rule as in Cichocki et al. (2009):

$$\begin{aligned} \mathcal {G} \leftarrow \mathcal {G} \circledast \left( \mathcal {X} \times _{1} \textbf{A}^{T} \times _{2} \textbf{A}^{T}\right) \oslash \left( \mathcal {G} \times _{1} \textbf{A}^{T} \textbf{A} \times { }_{2} \textbf{A}^{T} \textbf{A}\right) . \end{aligned}$$

The matrix \(\textbf{R}\) can be updated from the core tensor \(\mathcal {G}.\) The vectorization of each slice \(\textbf{G}_{k}\) is given by

$$\begin{aligned} {\text {vec}}\left( \textbf{G}_{k}\right) =\left( \textbf{D}_{k} \otimes \textbf{D}_{k}\right) {\text {vec}}(\textbf{R})=\left( \textbf{d}_{k} \otimes \textbf{d}_{k}\right) \circledast {\text {vec}}(\textbf{R}), \end{aligned}$$

where \(\textbf{d}_{k}\) is the column vector containing the diagonal elements of \(\textbf{D}_{k}\). Hence, the mode-3 matricized version of the core tensor \(\mathcal {G}\) can be formulated as: \(\textbf{G}_{(3)}^{T}={\text {diag}}\{{\text {vec}}(\textbf{R})\}(\textbf{D} \odot \textbf{D})\), where \(\textbf{D}\) is the \(P \times K\) matrix with columns equal to \(\textbf{d}_{k}\), \(k=1,\ldots ,K\). Thus, the updating rule for \(\textbf{R}\) is given by:

$$\begin{aligned} {\text {vec}}(\textbf{R}) \leftarrow \left( \left( \textbf{G}_{(3)}^{T} \circledast \textbf{G}_{(3)}^{T}\right) \textbf{1}\right) \oslash \left( \left( \tilde{\textbf{D}} \circledast \textbf{G}_{(3)}^{T}\right) \textbf{1}\right) , \end{aligned}$$

where \(\tilde{\textbf{D}}=\textbf{D} \odot \textbf{D} \in \mathbb {R}^{P^{2} \times T}\).

The relation between the core tensor \(\mathcal {G}\) and the matrices \(\textbf{D}\) and \(\textbf{R}\) can also be expressed as

$$\begin{aligned} \textbf{G}_{k}=\left( \textbf{d}_{k} \textbf{d}_{k}^{T}\right) \circledast \textbf{R}, \end{aligned}$$

or equivalently

$$\begin{aligned} \textbf{P}_{k}=\textbf{G}_{k} \oslash \textbf{R}=\textbf{d}_{k} \textbf{d}_{k}^{T}, \end{aligned}$$

where the matrix \(\textbf{d}_{k} \textbf{d}_{k}^{T}\) is the rank-1 approximation of the matrix \(\textbf{P}_{k}\). In practice, instead of directly estimating the vector \(\textbf{d}_{k}\) from the matrix \(\textbf{P}_{k}\), it is more convenient to use the symmetrized form:

$$\begin{aligned} \textbf{P}_{k}=\left( \textbf{G}_{k}+\textbf{G}_{k}^{T}\right) \oslash \textbf{R}=2 \textbf{d}_{k} \textbf{d}_{k}^{T}. \end{aligned}$$

Since the matrix \(\textbf{P}_{k}\) is symmetric, it can be approximated by the eigenvector corresponding to the largest eigenvalue \(\textbf{P}_{k} \approx s \textbf{u} \textbf{u}^{T}\), where \(\textbf{u}\) is the leading eigenvector corresponding the largest eigenvalue s of the matrix \(\textbf{P}_{k}\). Being the matrix \(\textbf{P}_{k}\) non-negative (and irreducible), from the Perron-Frobenius theorem the leading eigenvector is guaranteed to be non-negative. Therefore, the vector \(\textbf{d}_{k}\) is derived as:

$$\begin{aligned} \textbf{d}_{k}=\sqrt{s / 2} \textbf{u}. \end{aligned}$$

3 Results

We report here a two-fold walkthrough of our results: in the first place, we display the results of the DEDICOM model applied to the cosine-based similarity tensor and the roles of the countries for the components. In doing so, we also compare such a solution with the ones of its competitors, namely DEDICOM applied to the Euclidean-based similarity tensor, two-way DEDICOM, and INDSCAL. Secondly, we show how to exploit the components extracted by the DEDICOM model for predicting countries’ similarities in future timesteps.

3.1 The DEDICOM components

We start our analysis by providing an overview of the model fitting for a different number of components P. This is used in order to select the model complexity. Moreover, as with many tensor decompositions, also the DEDICOM model could be affected by the presence of local minima. To account for this feature, for each P we have run 100 times the DEDICOM model using different starting points at each run. Figure 2 shows the percentage fits for the different number of components. Each coloured circle refers to a model run with different starting conditions, while the black line shows the maximum fit for each component. In order to discover homogeneous traits at the component level that may reveal countries’ different mortality patterns, we employ the DEDICOM model with \(P=2\) components, which explains about 98.7% of the original data variability. We have opted for two components to have a balance between model fit and stability of the results.

Fig. 2
figure 2

Model Fit. The figure reports the percentage fit of the DEDICOM model while varying the number of components P and also for different starting conditions as shown by the blue circles. The black line shows the maximum fit for each component. At \(P=2\), the DEDICOM results in a fit equal to \(98.7\%\)

Next, we are interested in discovering relevant patterns, at the component level, that may reveal homogeneous mortality behaviours related to the evolutionary process of longevity improvements in different countries. Against this background, we normalize the loading matrix \(\textbf{A}\) so that the columns have unit lengths. In this way, the elements of \(\textbf{A}\) have comparable sizes and describe membership degrees relating the N countries to the P components. Moreover, we also normalize the interaction matrix \(\textbf{R}\) so that its diagonal elements are equal to one. Finally, we compensated these two normalizations in the matrices \(\textbf{D}_k\), \(k=1, \ldots , K\). In formulae, setting \(\textbf{T} ={\text {diag}}\left\{ {|| \textbf{a}_1||^{-1/2}, \cdots , || \textbf{a}_P} ||^{-1/2} \right\}\), and \(\textbf{V}={\text {diag}}\{ r_{11}^{-1/2}, \cdots , r_{PP}^{-1/2}\}\) where \({\text {diag}}\{\circ \}\) is the operator which creates a diagonal matrix with the given argument values along the diagonal and \(\textbf{a}_p\) (\(p=1,\ldots ,P\)) is the p-th column of \(\textbf{A}\), we have:

$$\begin{aligned}{} & {} {\textbf{A}} \leftarrow \textbf{A}\textbf{T},\\{} & {} {\textbf{R}} \leftarrow \textbf{V}\textbf{R}\textbf{V}\\{} & {} {\textbf{D}}_k \leftarrow \textbf{T}^{-1}\textbf{D}_k \textbf{V}^{-1} = \textbf{V}^{-1}\textbf{D}_k \textbf{T}^{-1}, \end{aligned}$$

Once the information on latent components in the data is extracted, the DEDICOM model associates each country to more than one of the latent components or patterns with different weights. In a nutshell, DEDICOM identifies the communities by grouping countries that have mortality profiles that are similar in age and temporal dimensions.

Results are reported in Fig. 3 where the left sub-plot displays the temporal profiles embedded in the matrix \(\textbf{D}\). This matrix describes the level of synchronization of death similarities within each component, i.e., a high value of an element of \(\textbf{D}\) denotes the importance of a component for a year, i.e., in such a year, similarities among countries are strongly explained by the component involved. The right panel refers to the interaction matrix \(\textbf{R}\) among the two components. By taking into consideration both the cross-sectional and temporal features of country similarities, the DEDICOM model allows us to investigate how components respond to events affecting the logarithm of the central death rates. Indeed, the investigation of \(\textbf{D}\) reveals how the similarities summarized by the components have changed over time, i.e. the level of synchronization among components’ members decreased over time.

Fig. 3
figure 3

Temporal activity patterns \(\textbf{D}\) and latent components’ interaction \(\textbf{R}\). The left sub-plot reports the level of synchronization of death similarities within each component. The right panel refers to the interaction of patterns among the two components

The DEDICOM model provides a soft membership distribution of countries inside components. Specifically, subgroups of countries can be retrieved independently from each other and countries can belong to more than one community. On the other hand, we are mainly interested in a hard partition scheme, where overlapping is not allowed, this is obtained by assigning countries to the component in which they have the highest degree of membership. Figure 4 reports the WordCloud charts of such a hard partition scheme, where each coloured country is assigned to the component in which it has the highest value of \(\textbf{A}\). In particular, the right panel refers to Component-1, and the left panel identifies Component-2. The sizes of the words correspond to the country component membership degrees in \(\textbf{A}\) and the coloured countries are those having the highest membership degrees in the given component.

Fig. 4
figure 4

WordCloud chats of the hard partition scheme. The figure reports the hard partition scheme, where each coloured country is assigned to the component in which it has the highest degree of membership. In particular, the left panel refers to Component-1 and the right panel identifies Component-2. The size of the words corresponds to the country membership degrees in \(\textbf{A}\) and the coloured countries are those characterized by the highest membership degrees in the given component

To better understand the results obtained, firstly, it is interesting to observe the composition of the dataset. We have considered 30 countries, specifically, 26 from Europe, and 4 from the rest of the world (Japan, USA, Canada, and Australia). Component-1 is characterized by Eastern European countries, where longevity patterns are related to the improvement in social economic conditions with a direct reflection in the healthcare system (see Tóth 2021). Component-2 includes the remaining countries some of which are well-known for their longevity record, such as Japan. Others, such as Italy, France, and Spain are considered as emerging longevous populations, as underlined by Nigri et al. (2022). Also, the Northern European countries fall into this component due to their considerable levels of life expectancy at birth (Christensen et al. 2010).

To shed light on the main drivers of the component dynamics we have investigated the average behavior of the log central death rates in each group. Figure 5 illustrates the characteristics of the components and how they have been changing over time. Specifically, in the figure, the average log central death rate of each component is displayed as a dashed line, while standard errors are shown as coloured areas and identify the variability of component features.

Fig. 5
figure 5

Death characteristics of the components over time. The figure shows the average log central death rate of each component, together with the standard errors and how they have been changing over time. In these panels, countries have been associated with each component according to the hard partition scheme

Figure 5, clearly explains the different behaviour of the two components, which globally show distinct dynamics, with remarkable differences starting from the age of 20 onwards.

We can observe a peculiar behaviour related to older ages (75 and older). In contrast to the constant decreasing trend for Component-1, Component-2 shows a high variability around a stable trend up to the ’90 s, then starting to decrease.

For the sake of comparison, we report in Appendix A the results derived from the application of DEDICOM using the Euclidean-based similiarity tensor. Moreover, we also give the results of the DEDICOM model for two-way data by averaging the similarity tensor over the time dimension, and the INDSCAL model. Results for the two-dimensional DEDICOM and INDSCAL models are reported in Appendix B and C, respectively.

Before discussing these results, we remind that the INDSCAL model does not give information on the relationships among the components, which are offered by the DEDICOM solution (in particular by the off-diagonal elements of \(\textbf{R}\) which, by taking into account that the various components reveal different mortality patterns, help to assess the proximities between pairs of mortality patterns). Finally, although the DEDICOM solution for two-way data is interesting, by averaging the similarities across the years we lose the relevant information on the temporal activity patterns (stored in the matrix \(\textbf{D}\)). Obviously, this noticeably limits the impact of the analysis because it prevents us from carrying out the forecast analysis.

Looking at the results, we observe that DEDICOM applied to the Euclidean-based similarity tensor has a poorer performance in terms of fit, if compared with the previously-described cosine-based solution, with a percentage of data explained of around 95.5%. Moreover, unlike DEDICOM applied to the cosine-based similarities, DEDICOM applied to the Euclidean-based ones highlights mortality dynamics that do not find evidence in the demographic literature (see Fig. 13). We can also draw similar conclusions for two-way DEDICOM, where, for instance, Component-1 brings together countries with very different mortality dynamics such as Japan and France with Lithuania and Latvia (see Fig. 16). Finally, also the INDSCAL model does not provide results consistent with the effective mortality dynamics historically observed in the countries analyzed. For example, Component-1 of INDSCAL is mainly driven by Iceland (see Fig. 18), which has no particular demographic features such as to characterize a component.

3.2 Forecasting similarities among countries

This section concerns the application of the DEDICOM model to the country similarities prediction. Specifically, in order to forecast the country similarity matrix, we make use of the components extracted by the DEDICOM model to assign the scores to the similarities \(\textbf{X}_{il}(k)\) according to their likelihood of being present in the future. The temporal profiles are captured in the columns \(\textbf{d}_k\) of the matrix \(\textbf{D}\). Different components may have different trends, for example, they may have increasing, decreasing, or steady profiles. In our heuristic approach, we use the temporal profiles \(\textbf{d}_k\) as a basis for predicting the similarities in future timestamps. In particular, we employ a simple long short-term memory (LSTM) neural network to forecast future patterns associated with each \(\textbf{d}_k\). LSTM networks are a type of artificial neural network designed to recognize patterns in sequences of data and are explicitly designed to avoid the long-term dependency problem.

The core components of an LSTM network are a sequence input layer and an LSTM layer. The sequence input layer loads the time series data \(\textbf{d}_k\) into the network. An LSTM layer learns long-term dependencies between time steps of sequence data. The diagram reported in Fig. 6 illustrates the architecture of a simple LSTM network for regression (see Hochreiter and Schmidhuber 1997). The network starts with a sequence input layer followed by an LSTM layer. The network ends with a fully connected layer and a regression output layer. Here, we point out that the development of the most accurate neural network to deal with predictions on short time series is beyond the scope of the paper. We rather want to show how, by forecasting the entries of \(\textbf{D}\), while fixing the matrices \(\textbf{A}\) and \(\textbf{R}\), one is able to unveil future similarities. Before forecasting similarities among countries, we inspect the ability of the LSTM neural network to forecast the temporal profiles derived from DEDICOM. Due to the scarcity of data points, we use, as an in-sample set, a data time window of 44 years, which equals the \(80\%\) of the data points with the aim of forecasting the remaining observations. Figures 7 and 8 report the results of the exercise.

Fig. 6
figure 6

Schematic diagram of the LSTM neural network. The network starts with a sequence input layer that reads vectors \(\textbf{d}_k\) followed by an LSTM layer, which extracts temporal patterns. The network ends with a fully connected layer and a regression output layer

Specifically, Fig. 7 shows, for each component, the behaviour of the observed activity \(\textbf{d}_k\), in grey, and the forecasted dynamic \(\tilde{\textbf{d}}_k\), represented by the black dashed line.

Fig. 7
figure 7

Observed and forecasted patterns of the activity of each component. The figure shows, for each component, the dynamic of the observed activity through \(\textbf{d}_k\) with a grey line, together with the forecasting results from \(\tilde{\textbf{d}}_k\) obtained by the application of the LSTM neural network (black dashed line)

Figure 8 evaluates the goodness of the forecasts through some prediction accuracy measures, namely the Mean Average Error (MAE), the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE). From the figures, it appears that the forecasted activity patterns of Component-1 exhibit the highest performances, being closely related to the observed \(\textbf{d}_k\) while errors are higher for Component-2.

Fig. 8
figure 8

Forecasting error bars. The figure reports the Mean Average Error (MAE), the Mean Squared Error (MSE), and the Root Mean Squared Error (RMSE) associated with the forecast of each component

In order to forecast future similarity matrices \(\tilde{\textbf{X}}_k\), we make use of the DEDICOM formulation. In particular, since our dataset encompasses information up to the year 2015, our aim is to forecast similarity patterns among countries for the next 10 years, namely from 2016 to 2025. To accomplish this purpose, we use all the temporal information embedded into \(\textbf{d}_k\) to produce pure out-of-sample forecasts \(\tilde{\textbf{d}}_k\). Given these projections, and according to the matrices \(\textbf{A}\) and \(\textbf{R}\) extracted by DEDICOM, we find the forecasted similarity matrices \(\tilde{\textbf{X}}_k\) as follows:

$$\begin{aligned} \tilde{\textbf{X}}_{k} \approx \textbf{A} \tilde{\textbf{D}}_{k} \textbf{R} \tilde{\textbf{D}}_{k} \textbf{A}^{T} \text{ for } k=2016, 2017, \ldots , 2025. \end{aligned}$$

The results of the forecasting exercise can be observed in Fig. 9 which shows a three-dimensional heatmap, where each slice of the graph represents the similarity matrix forecasted for a given year. Indeed, on the x- and y-axis we report country tickers, while the z-axis represents the yearly forecasted similarities. The intensity of the colour reflects the similarity between country pairs. The colour-bar associates colours with the numerical values of the similarity. The panel shows how, on average, the similarity fluctuates with an unclear pattern; this means that the initial process of convergence among country log central death rates is replaced by a diverging behaviour where each country follows its own trajectory. Moreover, the behaviour of some eastern European countries such as Bulgaria, Belarus and Latvia which tease apart from the rest of the countries is also noticeable.

Fig. 9
figure 9

Forecasted similarity matrices. The figure reports the forecasted cosine similarities between country pairs of countries for the years 2016-2025. The colour-bar maps colours into numerical values of the similarity (color figure online)

Fig. 10
figure 10

Dendrograms of forecasted similarity for the years 2022 and 2025. The figure show two dendrograms associated with the hierarchical structures of the years 2022 and 2025, where different colours represent different groups, in upper and central panels. The bottom panel reports the evolution of the sum of the distances along with the residuality coefficient

This behaviour is also visible in Fig. 10, which reports the dendrograms obtained for the years 2022 and 2025, in the top and central panels, respectively. The dendrograms show the hierarchical clustering derived from merging the forecasted country similarities at different scales. In other words, this panel allows us to understand how the hierarchical clustering progressively aggregates countries based on their similarities, in these specific years. Colours represent the component that can be found at a distance that is the \(80\%\) of the maximum link of the dendrogram. We observe a single cluster encompassing all the countries except for some Eastern European countries breaking away from the leading group.

To have a deeper understanding of how this global trend affects components’ behaviour we filter the cosine similarity matrices \(\tilde{\textbf{X}}_{k}\) using the Maximum Spanning Tree (MST) (see Spelta and Araújo 2012) and we then compute the residuality coefficient as the ratio between the sum of the forecasted similarities \(\tilde{\textbf{X}}_{k}\) smaller and larger than L(k), i.e. the threshold, which ensures the connectivity of the MST. The residuality coefficient is formulated as:

$$\begin{aligned} \rho (k)=\frac{\sum _{\tilde{\textbf{X}}_{k}<L(k)} \tilde{\textbf{X}}_{k}}{\sum _{\tilde{\textbf{X}}_{k}\ge L(k)} \tilde{\textbf{X}}_{k}}. \end{aligned}$$

We report the evolution of the residuality coefficient along with the sum of the forecasted similarities in the bottom panel of Fig. 10. While the total similarity shows fluctuating trend, the residuality index displays a decreasing pattern until the year 2021, increasing afterwards meaning that similarities below the threshold L(k) have increased more than those above L(k).

4 Conclusion and discussion

In this paper, we have applied the DEDICOM model to the similarity tensor computed on multi-population log central death rates, starting from an initial array of three dimensions (country, age, time) obtained from the Human Mortality Database. The DEDICOM model is an algebraic model, which provides information on latent components in the data that can be regarded as “properties” or “aspects” of the objects, by also supplying patterns of relationships among these components. In the DEDICOM model, individual objects can have substantial weights on more than one of the latent components or patterns, meaning that DEDICOM identifies types of correspondence patterns that have distinctive properties, and these are then linked to the individuals that exhibit mixtures of these patterns in their particular history. We have created yearly similarity matrices by starting from the cosine distances computed for each pair of countries defined by log central death rates vectors. We have shown how the DEDICOM model is able to extract meaningful relational patterns by reporting a two-fold walkthrough of our results: in the first place, we have displayed results of the tensorial model and membership degrees of the countries associated with the components. Secondly, we have exploited such membership degrees for predicting countries’ similarities in future timesteps.

Compared to the INDSCAL model which does not offer information on the relationship among components, DEDICOM considers this relationship to disclose more flexible and enhanced mortality patterns. Another interesting comparison is with the two-way DEDICOM model that can be applied to tensors by, for instance, averaging the similarities across the years. In this case, on one hand, we lose the relevant information on the time trend which is known to be fundamental to describing the mortality dynamics and, on the other hand, it prevents us from carrying out the forecasting.

The DEDICOM model identifies a persistent subgroup of countries with homogeneous mortality behaviours related to the evolutionary process of longevity improvements. In particular, we found two different communities that reveal some peculiar aspects of the mortality phenomenon. The first community is represented by the countries of Eastern Europe, which experienced higher mortality than the countries of Western Europe. As highlighted by Meslé and Vallin (2017), for some of them (like Poland or the Czech Republic), the main challenge relates to their ability to follow Western countries that are lowering mortality at old and very old ages. Aiming at this, their healthcare systems should change to meet the needs of the elderly population. Component-2 refers to the remaining countries, including countries that present very high levels of life expectancy, like Japan and Northern European countries, and Italy, France, and Spain, which are emerging as longevous populations.

Finally, we have exploited the component information, together with recurrent neural networks, for forecasting future trajectories of similarities among countries’ mortality behaviours. Despite the merits of our approach to unveil future countries’ similarities, we are aware of its main limitations which derive from the over-simplified neural network structure. To mitigate this issue, future works may address the statistical significance of the forecasting exercises by carrying out more refined architectures. On the other hand, our work, by specifically describing the mesoscale interactions between components and their evolution in time, could help to design appropriate actions against longevity risk that may impact the stability conditions of life assurance and pensions.