1 Introduction

Principal component analysis (PCA) is a statistical method employed in many applied fields. The statistical and mathematical properties of PCA allow practitioners to derive a set of independent components from large datasets. This collection is obtained by decomposing the eigenstructure of the variance–covariance (VC) matrix, which returns loadings corresponding to eigenvectors and component scores corresponding to new sets of coordinates (Jolliffe 2002).

PCA is also used to synthetise different variables into multidimensional (i.e., composite) indicators (OECD 2008). Component scores from PCA may be considered less subjective indicators since weights are not arbitrarily assigned but data driven. De Muro et al. (2011) and Mazziotta and Pareto (2019), among others, criticise this use of PCA, as its weights are defined through a purely statistical technique and do not always reflect the actual relevance of the particular variables. Despite these limitations, PCA is largely used to define composite measures (Decanq and Lugo 2013).

Standard PCA is also mainstream in the analysis of geographically distributed data (Demšar et al. 2013). However, conventional PCA is often applied to spatial objects, while “geographical effects do not play any role in the PCA itself” (Demšar et al. 2013; p. 111). Hence, adapting PCA to spatial issues represents a relevant objective for researchers and practitioners.

Indeed, spatial data present particular characteristics that must be considered when applying statistical techniques. In the literature, two different spatial effects can be considered: spatial dependence and spatial heterogeneity (Anselin 1988). While the first implies the analysis of spatial autocorrelation produced by contagion between spatial units (LeSage and Pace 2009), the second considers spatial instabilities in estimated coefficients due to spatial regimes or heteroskedasticity (Anselin 1988; Postiglione et al. 2013; Murakami and Griffith 2019).

In recent decades, the literature has largely focused on the analysis of spatial dependence. Conversely, spatial heterogeneity has been considered only in a smaller and more recent number of contributions. Examples of techniques that account for spatial heterogeneity in regression models are geographically weighted regression (GWR, Fotheringham et al. 2002) and spatially varying coefficients (Wheeler and Calder 2007).

Presumably, the prevalence of studies about spatial dependence can be explained by two factors. Primarily, the analysis of dependence has been explored to avoid potential biases due to spatial autocorrelation, particularly while estimating regressions for geographical data that are popular in theoretical and empirical research. Second, sophisticated algorithms are often needed for modelling spatial heterogeneity, especially in a cross-section framework (Andreano et al. 2017; Billè et al. 2017). This has left room for the analysis of cross-sectional heterogeneity as one of the open challenges for spatial analysts (Postiglione et al. 2013; Murakami and Griffith 2019).

PCA is not exempted from a deeper discussion about spatial effects. In this sense, a first contribution that considered spatial dependence in PCA, denoted as spatial PCA (i.e., sPCA), was given by Jombart et al. (2008). sPCA defines spatial components through an objective function that combines spatial dependence with standard decomposition of the VC matrix. Essentially, this method finds new composite measures that no longer maximise the variance of the scores (as in PCA) but instead maximises the product of their variance and Moran’s \(I\) (Moran 1950). For an interesting application of sPCA to the definition of a composite measure of well-being, see Giacalone et al. (2022).

Concerning spatial heterogeneity, this has been mainly investigated for PCA through geographically weighted principal components analysis (GWPCA; Fotheringham et al. 2002). In GWPCA, the local VC matrices are obtained by using a kernel function so that GW components are influenced more by closer observations, and spatial instabilities are approached by obtaining loadings and scores at each unit. However, the loadings and components from GWPCA are not always straightforward, as results must be listed in wide sets of local loadings or mapped in terms of locally relevant variables (i.e., winning variables; Harris et al. 2011).

Following this narrative, this paper contributes to the development of PCA for geographical data (Wartenberg 1985; Jombart et al. 2008; Lloyd 2010; Harris et al. 2011; Sarra and Nissi 2020; Cartone and Postiglione 2021) in the special case of spatial heterogeneity. In this sense, our contribution substantially differs from Jombart et al. (2008) and Giacalone et al. (2022), who used sPCA to examine spatial dependence.

To this end, we consider spatial heterogeneity as a criterion to divide the sample of observations into smaller homogeneous groups, individuating subpopulations as a solution of a combinatorial optimisation problem. We introduce a new algorithm that extends the application of simulated annealing (SA) in spatial regression (Postiglione et al. 2013) to the case of PCA (hereafter, SA-PCA). When solving this optimisation problem, the proposed algorithm decomposes the VC matrix to compute eigenvalues, eigenvectors, and component scores for a parsimonious number of regimes and overcomes the hypothesis of spatial homogeneity.

In this paper, we also note that the SA-PCA algorithm can be an alternative to the use of cluster techniques on geographical units when using coordinates and/or other attributes (for example, when using \(k\)-means) and computation of PCA to each cluster. This method may be denoted as a cluster-specific PCA. In a different manner, Libório et al. (2022) have already extended the use of \(k\)-means clustering to PCA for computing piecewise composite indicators.

However, two main differences exist between a cluster-specific PCA and SA-PCA. First, by using \(k\)-means, clusters are calculated at the first stage, and after, components are obtained by applying PCA on groups. Conversely, local loadings in SA-PCA are individuated by directly modelling instability in the VC matrix, and information from PCA is used step by step in the algorithm for the determination of the groups. Second, in our method, spatial information is explicitly accounted for. Thus, while cluster-specific PCA is meant to individuate components without considering any geographical issue, our proposal aims to tackle unobserved spatial heterogeneity by considering space directly in the objective function.

Furthermore, this contribution differs from GWPCA for two main reasons. First, SA-PCA does not address nonstationarity in continuous space, as it directly considers the problem of structural differences due to multiple regimes. In fact, spatial heterogeneity can also be present in the form of discrete heterogeneity for which parameters vary between spatial regimes or groups of regions (Anselin 2010). Second, in this paper, heterogeneous eigenvectors are individuated for a limited number of clusters. Thus, in the presence of these features, SA-PCA addresses some critical aspects of GWPCA, especially to improve the interpretability of its results.

Since the use of PCA for composite indicators has been extensively explored (Pampalon and Raymond 2000; Havard et al. 2008), we start from the definition of a composite measure to evaluate the consequences of spatial heterogeneity. Then, a multidimensional indicator of local services for 121 municipalities in the province of Rome is defined. This province is selected as it is one of the most densely populated in Italy.

In the application, various aspects are discussed. SA-PCA, GWPCA and cluster-specific PCA are performed to treat spatial heterogeneity, and the results are compared. The results show that spatial heterogeneity is generally relevant in PCA and that SA-PCA can be used to efficiently individuate endogenous groups and to capture local characteristics. This may enrich the insight in terms of ad hoc policies at the local level.

Last, as highlighted by Jombart et al. (2008), it is meaningless to compare sPCA eigenvalues to the sum of all eigenvalues as in PCA. Additionally, the percentage of the total criterion associated with an eigenvalue in sPCA cannot be used as a direct rule to choose the number of components in terms of the explained variance. Furthermore, since eigenvalues may be both positive and negative in sPCA, it can be difficult to select more representative components for PCA. By considering this last drawback and our focus on spatial heterogeneity, we limited the comparison of SA-PCA to standard PCA, GWPCA, and cluster-specific PCA.

The layout of the paper is as follows. Section 2 is devoted to summarising the methodological contribution of the paper, with a review of the main characteristics of PCA and applying SA to identify zones of local stationarity for eigenvalues and eigenvectors. Section 3 contains the description of the data collected from the ISTAT Statistical Atlas of Municipalities and the results of the composite indicator. In this section, we also apply our algorithm to compare the results of SA-PCA to those of PCA, GWPCA, and cluster-specific PCA. Finally, Sect. 4 presents some concluding remarks and outlines the future research agenda.

2 Methodology

PCA is based on the analysis of a centred matrix \({\mathbf{X}}_{nm}\) with \(n\) statistical units and \(m\) variables. The essential idea of PCA is the representation of units in \(q\)-dimensional subspaces (with \(q < m\)), retaining the maximum amount of statistical information. The reduction in data dimensionality allows easier interpretative analysis.

A primary result in PCA is (Jolliffe 2002):

$${{\varvec{\Sigma}}} = {\mathbf{A\Lambda A}}^{t}$$
(1)

where \({{\varvec{\Lambda}}}\) is the diagonal matrix of eigenvalues, \({\mathbf{A}}\) is the corresponding matrix of loadings (i.e., the eigenvectors), and \({{\varvec{\Sigma}}}\) is the VC matrix. The eigenvalue \(\lambda_{j}\) in \({{\varvec{\Lambda}}}\) represents the variance of the principal component \({\mathbf{Y}}_{j}\) defined as:

$${\mathbf{Y}}_{j} = {\mathbf{Xa}}_{j}$$
(2)

where \({\mathbf{a}}_{j}\) is the \(j\)-th column of the loading matrix \({\mathbf{A}}\) of \({{\varvec{\Sigma}}}\) and represents the contribution of each variable in \({\mathbf{X}}\) to the \(j\)-th principal component \({\mathbf{Y}}_{j}\).

Basically, for an adequate \(q\), the component scores related to components \(q + 1\) to \(m{ }\) represent the Euclidean distances alongside the axes of the corresponding orthogonal vectors to a q-dimensional linear subspace. The first q components are chosen so that this subspace contains the highest proportion of the total variance. The first \(q\) components are described by:

$${\mathbf{Y}}_{q} = {\mathbf{XA}}_{q}$$
(3)

where \({\mathbf{Y}}_{q}\) is the \(n \times q\) score matrix and \({\mathbf{A}}_{q}\) is the \(m \times q\) loading matrix with only the first \(q{ }\) columns of \({\mathbf{A}}\).

Jolliffe (2002) describes an important property that is very useful for the definition of our algorithm. Making use of singular value decomposition theorem, data matrix \({\mathbf{X}}\) can be written as:

$${\mathbf{X}} = \mathop \sum \limits_{j = 1}^{r} {\mathbf{d}}_{j} l_{j}^{1/2} {\mathbf{a}}_{j}^{t} = \mathop \sum \limits_{j = 1}^{r} l_{j}^{ - 1/2} {\mathbf{Xa}}_{j} l_{j}^{1/2} {\mathbf{a}}_{j}^{t} = \mathop \sum \limits_{j = 1}^{r} {\mathbf{Xa}}_{j} {\mathbf{a}}_{j}^{t}$$
(4)

where \({\mathbf{d}}_{j}\) is the \(j\)-th column of the matrix \({\mathbf{D}}\) (with \({\mathbf{D}}^{t} {\mathbf{D}} = {\mathbf{I}}_{r}\)), \(l_{j}^{{}}\) denotes the \(j\)-th eigenvalue of \({\mathbf{X}}^{t} {\mathbf{X}}\) (with \(l_{j}^{{}} = \frac{{\lambda_{j} }}{n - 1}\)) and \({\mathbf{a}}_{j} \user2{ }\) is the \(j\)-th column of the matrix \({\mathbf{A}}\)(with \({\mathbf{A}}^{t} {\mathbf{A}} = {\mathbf{I}}_{r}\)). The rank of \({\mathbf{X}}\) is supposed to be \(r\), and thus, \(l_{j}^{{}} = 0\) (and \({\mathbf{Xa}}_{j} = 0)\) for \(j = \left( {r + 1} \right), \left( {r + 2} \right), \ldots ,m.\)

Equation (4) can be written element by element as:

$$x_{il} = \mathop \sum \limits_{j = 1}^{r} d_{ij} l_{j}^{1/2} a_{lj}$$
(5)

where \(d_{ij}\) is the \(\left( {i,j} \right)\)-th entry of \({\mathbf{D}}\) and \(a_{lj}\) is the \(\left( {l,j} \right)\)-th entry of \({\mathbf{A}}\).

Retaining a number of \(q\) components (with \(q < r\)), Eq. (5) can be written as:

$${}_{q}^{{}} \tilde{x}_{il} = \mathop \sum \limits_{j = 1}^{q} d_{ij} l_{j}^{1/2} a_{lj}$$
(6)

where \(d_{ij} l_{j}^{1/2} a_{lj}\) is the part of \(x_{il}\) corresponding to the \(j\)-th component for \(j = 1, 2, \ldots ,q\). Householder and Young (1938) and Gabriel (1978) highlighted that \({}_{q}^{{}} \tilde{x}_{il}\) represents the best rank \(q\) approximation to \(x_{il}\), in the sense that \({}_{q}^{{}} \tilde{x}_{il}\) minimises the following function:

$$\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{l = 1}^{m} ({}_{q}^{{}} x_{il} - x_{il} )^{2}$$
(7)

where \({}_{q}^{{}} x_{il}\) is any possible rank q approximation to \(x_{il} .\) According to (6) and (7), an \(n \times m\) residual matrix \({\mathbf{S}}\) can be defined as the difference between the data matrix \({\mathbf{X}}\) and the best rank \(q\) approximation \({}_{q}^{{}} {\tilde{\mathbf{X}}}\) as:

$${\mathbf{S}} = {\mathbf{X}} - {}_{q}^{{}} {\tilde{\mathbf{X}}}$$
(8)

where \({}_{q}^{{}} {\tilde{\mathbf{X}}} = \mathop \sum \limits_{j = 1}^{q} {\mathbf{Xa}}_{j} {\mathbf{a}}_{j}^{t} = {\mathbf{XA}}_{q} {\mathbf{A}}_{q}^{t}\) (see Harris et al. 2015). Equation (8) represents the core function of our proposed algorithm.

In standard PCA, the implicit hypothesis is that the VC structure of the process is homogenous throughout the geographical area under investigation. This assumption is obviously not always realistic for geographically distributed data (Harris et al. 2015). Therefore, it is necessary to relax this hypothesis to consider potential heterogeneity in principal components.

The first appropriate PCA technique for spatial data is represented by GWPCA (Fotheringham et al. 2002; Harris et al. 2011). In this approach, Eq. (1) can be generalised as (Harris et al. 2015):

$${{\varvec{\Sigma}}}\left( {g_{i} ,h_{i} } \right) = {\mathbf{A}}\left( {g_{i} ,h_{i} } \right){{\varvec{\Lambda}}}\left( {g_{i} ,h_{i} } \right){\mathbf{A}}\left( {g_{i} ,h_{i} } \right)^{t}$$
(9)

where \({{\varvec{\Lambda}}}\left( {g_{i} ,h_{i} } \right)\) is the diagonal matrix of local eigenvalues, \({\mathbf{A}}\left( {g_{i} ,h_{i} } \right)\) is the corresponding matrix of local eigenvectors, \({{\varvec{\Sigma}}}\left( {g_{i} ,h_{i} } \right)\) is the local VC matrix, and \(\left( {g_{i} ,h_{i} } \right)\) are the geographical coordinates of spatial unit \(i\). The corresponding local component scores \({\mathbf{Y}}\left( {g_{i} ,h_{i} } \right)\) are:

$${\mathbf{Y}}\left( {g_{i} ,h_{i} } \right) = {\mathbf{XA}}\left( {g_{i} ,h_{i} } \right)$$
(10)

GWPCA represents a valid tool to model continuous spatial heterogeneity in PCA. The output of GWPCA consists of different loadings and component scores defined for each spatial unit. For example, if Eq. (10) identifies composite indicators, GWPCA defines completely different indices for each spatial unit as a function of distinct loadings. This may produce remarkable difficulties in the interpretation of the results.

To simplify the interpretation of the phenomena, we propose applying SA to PCA to identify groups of spatial units that share the same eigenstructure (i.e., identifying the same composite indicators). This approach recalls Postiglione et al. (2013) and the enhancements proposed by Postiglione et al. (2017) and Billè et al. (2017). Our contribution is described in the next subsection.

2.1 The proposed algorithm

The main idea of the methodology is that the appropriate treatment of spatial heterogeneity in PCA is substantially equivalent to partitioning an area into groups of geographical zones that are not necessarily conterminous and have similar component scores. Following this, the output is not represented by different loadings at each spatial unit as for GWPCA but by distinct loadings for every group of regions identified by the SA algorithm.

SA is a stochastic relaxation algorithm that was originally introduced in statistical mechanics by Metropolis et al. (1953) and Kirkpatrick et al. (1983). SA is a random-search technique that is based on the analogy between the way in which a metal freezes into a minimum energy crystalline structure (i.e., the annealing process) and the search for a minimum in a more general system. This approach constitutes the basis of an optimisation technique for the solution of many combinatorial problems.

Geman et al. (1990) observed that a spatial combinatorial optimisation problem might be described through a Markov random field (MRF). The probability measure of an MRF by using Gibbs distribution is defined through the energy function \(U\left( {{\mathbf{X}},{\mathbf{k}}} \right),\) which in our algorithm represents the objective function to be minimised, and a control parameter, \(T\) (see Geman and Geman 1984; Postiglione et al. 2013). \(U\left( {{\mathbf{X}},{\mathbf{k}}} \right)\) depends on observed data \({\mathbf{X}}\) and the label vector \({\mathbf{k}} = \left( {k_{1} ,k_{2} , \ldots ,k_{i} , \ldots ,k_{n} } \right)\), which categorises the heterogeneous zones, identifying clusters of regions.

\(U\left( {{\mathbf{X}},{\mathbf{k}}} \right)\) is defined by considering two different effects: a measure of the goodness of fit of the model and a proximity constraint that describes the extent of aggregation of the spatial units. At the \(c\)-th iteration of the procedure, the energy function is defined as:

$$U\left( {{\mathbf{X}},{\mathbf{k}}} \right) = \beta I_{c} \left( {{\mathbf{X}},{\mathbf{k}}} \right) - \left( {1 - \beta } \right)V_{c} \left( {\mathbf{k}} \right)$$
(11)

In (11), \(I_{c}\) is the interaction term at iteration c calculated as:

$$I_{c} \left( {{\mathbf{X}},{\mathbf{k}}} \right) = \mathop \sum \limits_{i = 1}^{m} \mathop \sum \limits_{l = 1}^{n} s_{il}^{2}$$
(12)

where \(s_{il}\) are the entries of the residual’s matrix \({\mathbf{S}}\) defined according to (8). The second term of (12) is a penalty constraint defined through a Potts model as:

\(V_{c} \left( {\mathbf{k}} \right) = \mathop \sum \limits_{r = 1}^{n} \mathop \sum \limits_{z = 1}^{n} b_{rz} 1_{{\left( {k\left( l \right)_{r} = k\left( l \right)_{z} } \right)}}\) (13).

Specifically, \(b_{rz}\) is the element \(\left( {r,z} \right)\) of a binary contiguity matrix, \(1_{{\left( {k\left( l \right)_{r} = k\left( l \right)_{z} } \right)}}\) is the indicator function of the event and \(k\left( l \right)_{r} = k\left( l \right)_{z}\), and \(\left( {1 - \beta } \right)\) is a parameter that discourages configurations with nonconterminous units. The parameter \(\left( {1 - \beta } \right)\) is chosen by the researcher and models the importance of the proximity between the spatial units. We note that, unlike Postiglione et al. (2013), the two parts of the energy function (11) are balanced with complementary weights to better control the cooling process.

At the initial value of control parameter \(T_{0}\), each unit \(i\) is randomly classified as \(k_{i,0}\), where \(k_{i,0} \in \left\{ {1,2, \ldots ,K} \right\}\), and \(K\) is the number of clusters. This step defines the initial configuration \(F_{0}\). At the \(\left( {c + 1} \right) - th\) iteration, given a current configuration \(F_{c}\), a different configuration \(F_{c} \ne F_{c + 1}\) is randomly chosen, defining a new energy function \(U(F_{c + 1} )\) that is compared with the previous one \(U(F_{c} ).\) The old configuration \(F_{c}\) is substituted by the new \(F_{c + 1}\) in accordance with the probability:

$$Pr_{c,c + 1} = {\text{min}}\left\{ {1,{\text{exp}}\left( { - \frac{{U(F_{c + 1} ) - U(F_{c} )}}{{T_{c} }}} \right)} \right\}$$
(14)

Probability (14) avoids local minima by defining a positive probability for the change in configuration also when the objective function \(U\left( F \right)\) increases. In essence, more likely patterns (i.e., configurations with lower states of energy) are always accepted, but it is also possible to accept a poorer configuration.

Another relevant issue that should be addressed before applying our SA-PCA algorithm is the choice of the number of groups of regions to be considered. Many criteria exist to determine the optimal number of clusters (Gordon 1999). As highlighted by Harris et al. (2015), the variance levels of the components of S are a measure of the “goodness of fit” of the projected subplanes. Hence, it seems reasonable to use these in our approach to determine the optimal number of groups. Following the idea by Krzanowski and Lai (1988), we consider the pooled within-group covariance matrix \({\mathbf{W}}_{K}\) for the components of S calculated for any number of partitions of the dataset and, in particular, \(W_{K} = trace\left( {{\mathbf{W}}_{K} } \right)\). The optimal number of groups \(K\) maximises the following function:

$$KL\left( K \right) = \left| {\frac{Diff\left( K \right)}{{Diff\left( {K + 1} \right)}}} \right|$$
(15)

with:

$$Diff\left( K \right) = \left( {K - 1} \right)^{2/p} W_{K - 1} - K^{2/p} W_{K}$$
(16)

where \(K\) is the number of groups and \(W_{K - 1}\) and \(W_{K}\) are the traces of the pooled within-group covariance matrix for the components of S for \(K - 1\) and \(K\), respectively. This approach is used in our empirical application.

The main steps of our algorithm for considering spatial discrete heterogeneity in PCA are summarised in Appendix 1.

3 Empirical results

In this case study, we calculated a composite indicator of local services in the province of Rome to evaluate the effects of spatial heterogeneity and to assess the capacity of SA-PCA. In recent studies, there has been rising interest in composite indicators of well-being and development at the local level (Salvati and Carlucci 2014; Fusco et al. 2018). Even in Italy, a discussion on well-being under a multivariate perspective is justified by the attention given by official statistics (see, for a broader discussion, Mazziotta and Pareto 2019). Therefore, in this paper, we have calculated an indicator for targeting disparities in access to private and public services as well as for assisting policy-makers in evaluating development at the municipality level.

In the construction of the index, we have briefly considered some theoretical aspects suggested by OECD (2008) to better understand the phenomenon under investigation. Since there is no unique and generally accepted definition of a composite indicator to evaluate local governance, we have taken inspiration from recent literature in the field of local services and well-being (D’Inverno and De Witte 2020; Tommaselli et al. 2021). Accordingly, various domains at the local level, such as economy, education, health, social care services, environment, local mobility, and public transport, have been considered.

Moreover, other variables inspired by recent literature for the case of Italy have been added. We have used the number of businesses as a relevant indicator for territorial development and potential in wealth generation (Scaccabarozzi et al. 2022). We have included the number of accommodation facilities, as they are able to increase local attractiveness and preserve cultural heritage (see, for Italy, Cracolici and Nijkamp 2009). A variable for local production of quality agricultural goods (Protected Geographical Indicator Agriculture, PGI) has been added, as promoting those businesses can support socially and environmentally sustainable development (Calcagnini and Perugini 2019; ISTAT 2022).

The province of Rome has been selected as it is one of the most populated areas in Italy but also as a region that presents a significant amount of spatial heterogeneity. In fact, due to the wide range of densely urban areas and rural or inner municipalities in the region, it seems particularly suitable to observe the effects of spatial heterogeneity on composite indicators (Salvati et al. 2019; Cartone and Postiglione 2021).

Official data from the Statistical Atlas of MunicipalitiesFootnote 1 by the Italian National Statistical Institute (ISTAT) have been employed in the study. Specifically, we have considered the 121 municipalities (“Comuni”) in the province. Depending on data availability, data were collected for 2015, except for two variables available only in the database for 2014. Hence, through this choice, we can consider a sufficient number of variables for the phenomenon and in a more recent period than 2011, the last census year. In Table 1, the various domains as well as the variables included in the indicators are reported.

Table 1 Variables used for the derivation of a composite indicator of local services from the statistical atlas of italian municipalities (ISTAT)

As a first step, standard PCA is applied after data have been standardised to zero mean and unit variance. Complete results from PCA suggest that all first five components have eigenvalues above one and should be accounted for following Kaiser’s rule (see Appendix 2a). However, for simplicity, only the first two components are more deeply explored in Table 2.

Table 2 Loadings, eigenvalues, proportion of variance, and cumulative proportion associated with the first five components of PCA on 121 municipalities in the Province of Rome

From Table 2, we observe that loadings of the first component are characterised by a higher magnitude of variables related to water, business and social welfare, all negative in sign.

By looking at the results, two issues can be raised with reference to global PCA. On one hand, the first two components account for less than half of the total variance. Hence, by considering a whole map estimation, the representativeness is quite low. Cartone and Postiglione (2021) noted that this situation may be linked to spatial heterogeneity, as discarding spatial instabilities may sometimes result in a poor specification of composite indicators.

Furthermore, there may be difficulties in gaining evidence from loadings of global estimation. Global PCA returns averaged weights over the whole area under investigation, and this simplification can preclude a detailed assessment of local services at the municipality level. Accordingly, the scores for the first two global PCs are mapped in Fig. 1A and B, while the other selected components are reported in Appendix 2b. The extent of spatial instabilities for the first two components is also evaluated by using local G statistics (Ord and Getis 1995).Footnote 2 To do so, a k-nearest contiguity matrix accounting for neighbours of the five closest municipalities is used (see Fig. 1C and D).

Fig. 1
figure 1

Quantile maps of scores for global PCA. First component scores are mapped on the left A while second component is on the right B and relative Getis—Ord clusters are mapped in C and D

In fact, although some characteristics may be common to any municipality in the province, the relevance of local services may vary depending on various factors, such as history and tradition, as well as geographical features (see, for public services, Narbón-Perpiñá and De Witte 2018). Hence, to locally examine profiles of municipalities, we explicitly account for spatial heterogeneity.

One alternative in order to relax spatial homogeneity is to use GWPCA. As mentioned before, this technique allows loadings to locally change by computing the VC matrix by using a kernel function, and it can also be applied to wider datasets (Kallio et al. 2018; Trogu and Campagna 2018). As a starting point for GWPCA, we perform a Monte Carlo (MC) test to verify the significant nonstationarity of eigenvalues, as suggested by Harris et al. (2011). Figure 2 reports the results for the test, highlighting the significant nonstationarity of eigenvalues (p value = 0.041).

Fig. 2
figure 2

Monte Carlo test for the stationarity of eigenvalues

According to Fotheringham et al. (2002), GW technique results are relatively insensitive to the choice of the kernel, while bandwidth selection is a crucial step. For this reason, bandwidth is carefully investigated in this paper by exploring cross-validation (CV) functions for various components and bandwidth levels. Again, the CV function is calculated equivalently to Harris et al. (2011) and by using the residuals for various values of the bandwidth based on an adaptive bisquare kernel. The choice of an adaptive kernel is justified by the presence of irregularities in the spatial configuration.

In Fig. 3, the CV functions calculated for each component are shown. For the number of components, \(q=2, 4,\) and \(5\), a minimum of the CV function is obtained. However, to make results from GWPCA comparable to those of global PCA, we consider \(q=5\) and a bandwidth of 33 according to the minimum of the CV function.

Fig. 3
figure 3

Cross-validation scores for an adaptive bisquare kernel

An often-used method to visualise results from GWPCA is to map the winning variables, i.e., indicating the variable corresponding to loadings with higher magnitude in absolute value at each location. For the phenomenon under study, we observe that the winning variables change considerably, a result in line with the performed MC test.

In Fig. 4, the maps of the winning variables of the first two components are shown. According to the first component, water supply can be considered the winning variable in the municipality of Rome and in the northeast, while the number of businesses in tourism is the winning variable in the south. The second component is also characterised by significant differences in terms of winning variables, with the north being more influenced by public transportation and the south by childcare structures.

Fig. 4
figure 4

Map of the winning variables for the first A and second B local component obtained with GWPCA

If maps of winning variables give evidence on the differences across the study area, they cannot be considered themselves as maps of strictly homogeneous groups. Indeed, in GWPCA, the structure of eigenvectors changes in any spatial unit by construction. Thus, using this technique, it is not directly possible to identify spatial regimes. To overcome this shortcoming, we employ our novel algorithm SA-PCA to endogenously individuate subgroups of spatial units. This methodology relaxes the assumption of homogeneity of the VC matrix, and it provides endogenous spatial regimes.

Parameters related to the cooling process have been set according to previous successful experiences in the literature (Stander and Silverman 1994; Fouskakis and Draper 2002; Postiglione et al. 2013). To avoid falling into local minima, the cooling rate should be set in the interval between 0.80 and 0.99. Hence, the level of \(\theta\) is chosen to be equal to 0.95. Additionally, the level of the initial temperature \({T}_{0}\) is set at approximately 0.05, like Postiglione et al. (2013). After studying the behaviour of the energy function (11) for various levels of the two parameters, this combination ensures the convergence of the algorithm while preventing entrapment in local minima at the same time.

Regarding the choice of \(1-\beta\), we investigate various spatial configurations for certain levels of the parameter. Some examples of these spatial configurations are reported in Fig. 5. We observed, for 5 groups, that choosing a value of \(1-\beta\) lower than 0.50 (Fig. 5A) leads to very agglomerated spatial groups. In fact, increasing the penalty term forces contiguous spatial units to be part of the same group. This may result in a poorer optimisation process, and this circumstance may not be preferable in a context such as the province of Rome, which presents local pockets that are hard to capture by strictly conterminous groups. Conversely, when the contiguity constraint is set very low (Fig. 5B), the interpretability of groups decreases, as units of the same cluster tend to be scattered. Hence, in this application, our choice consists of a slight level of 0.20 for \(1-\beta\), which reveals only partially scattered groups and good adaptation to the phenomenon under investigation.

Fig. 5
figure 5

Spatial configuration obtained by SA-PCA for 5 groups with different levels of \(1-\beta\), (A) \(1-\beta =0.45\), (B) \(1-\beta =0.05\)

Another relevant choice to be addressed before applying SA-PCA is the number of groups. Although in composite indicators the choice of groups can often be linked to specific research aims (Nardo et al. 2005), certain selections may lead to suboptimal solutions. Hence, this choice must be properly considered by researchers to obtain more reliable and interpretable configurations.

Nevertheless, the proposed SA-PCA algorithm is studied to trim the objective function by a constraint that discourages nonconterminous groups. As mentioned before, this feature encourages more conterminous spatial configurations that tend to be more robust to outliers in the geographical area under study (Benedetti et al. 2013). Additionally, robustness to spatial outliers obtained by using constraints may help the stability of the grouping procedures (García-Escudero et al. 2010). In addition to being less sensitive to outliers, groups obtained by SA-PCA preserve a certain amount of proximity between units despite the number of groups allowed, which can help interpret underlying spatial features over the study area.

In the application, we first obtain partitions as solutions of the SA algorithm and then calculate the \(KL\left(K\right)\) function for various levels of \(K\). The results for the criterion are summarised in Table 3. By maximising the \(KL\) statistic on the residuals obtained by the solution to the optimisation problem, we find that a level of \(K=5\) is the most convenient to apply SA-PCA.

Table 3 Criterion values for different numbers of groups \(K\) obtained by SA-PCA

Figure 6 presents the map of the clusters of units obtained when using our SA-PCA algorithm. Group 1 includes the City of Rome and neighbours’ municipalities mainly in the centre of the province. A sizeable number of southern municipalities are in Group 2. Group 4 is largely composed of areas located in the west of the province, while Group 3 consists of two smaller pockets in the east of the province. The rest of the eastern municipalities are mostly included in Group 5.

Fig. 6
figure 6

Groups of spatial units obtained by SA-PCA

In terms of local characteristics, Group 1 includes densely populated municipalities in the centre of the province, while Group 2 is comprised of coastal municipalities largely linked to the city of Rome (such as Anzio, Nettuno, and Velletri). Group 3 consists of small and scarcely populated municipalities in the mountain area in the east (e.g., Arsoli, Subiaco, and Vallinfreda). Last, Group 4 includes mainly rural municipalities on the hills at the borders, especially those in the south, known as Castelli, and Group 5 includes a variety of larger municipalities often considered to be Roman suburbs (e.g., Guidonia Montecelio and Tivoli).

As expected, the five clusters show substantial differences in the underlying phenomenon. The compositions of eigenvectors for the first and second components are shown in Tables 4 and 5, respectively. From the results, we see that considering spatial heterogeneity leads to differences resulting from local instabilities in the loadings. From Table 4, those differences among the five different groups can be further appreciated.

Table 4 Loadings of the first component for each group obtained with SA-PCA
Table 5 Loadings of the second component for each group obtained with SA-PCA

In Group 1, business and water are the major drivers together with the presence of welfare aids. The value of tourism is also consistent with the presence of tourists that come yearly across the city of Rome and Fiumicino airport. For Group 2, business is important and positively correlated with the phenomenon under investigation. Additionally, PGI and welfare are positively connected to the phenomenon. For Group 3, health, banks, and public transport represent the most relevant loadings, contributing to an increase in the first component scores. For Group 4, the level of local service in those municipalities is mainly related to banks. In Group 5, the increasing presence of museum and PGI mainly contribute to the positive development of local services.

Table 5 indicates the loadings for the second components of the five groups generated by SA-PCA. Once again, the evidence emphasises structural heterogeneity suggested by diverse values of eigenvectors. For instance, the second component of Group 1 indicates the need for facilities to take care of children at an early stage (high magnitude of kindergartens).

In Fig. 7, performances for global PCA (dashes), cluster-specific PCA (dots), GWPCA (dashes and dots) and SA-PCA (line) for the first five components are reported in terms of average proportion of variance accounted, while the plotted points indicate unit-by-unit fit for GWPCA. In the application, it also appears that SA-PCA helps to specify components according to structural differences in the VC matrix. This last feature is justified by a major increase in the proportion of variance accounted when compared to that of global PCA. Even when compared to that of GWPCA, representativeness increases by approximately five percent in terms of the average proportion of variance explained. Finally, our SA-PCA presents higher performance also when compared with that of cluster-specific PCA, calculated by using the \(k\)-means algorithm in the first step and standard PCA, for each cluster, in the second step.

Fig. 7
figure 7

Performance (average proportion of variance of the first five components) of global PCA (dashes), cluster-specific PCA (dots), GWPCA (dashes and dots), and SA-PCA (line). Grey circles show the by-unit GWPCA performance of 121 municipalities of Rome

In Fig. 8, the first component indicators are shown for SA-PCA (a), GWPCA (b), and cluster-specific PCA (c) to highlight some differences between the various definitions. SA-PCA reports higher levels for the composite indicator mainly for municipalities on the coast and in the northern part of the province. GWPCA shows higher levels for large municipalities in the immediate surroundings in eastern Rome. Moreover, cluster-specific PCA returns higher achievements mainly in Rome, at the northeast, and on the southeast side. Those differences may be connected to the diverse treatment of spatial issues. In SA-PCA, for instance, consideration of discrete heterogeneity slightly shifts the performance of Rome compared to when using cluster-specific PCA, where spatial effects are not explicitly considered. Conversely, this feature seems to favour seaside towns connected to the main city.

Fig. 8
figure 8

First component composite indicator for SA-PCA a, GWPCA b, and cluster-specific PCA c

Finally, in addition to improving fit performance, the empirical application shows that SA-PCA can directly address nonstationarity by identifying endogenous clusters of units. Therefore, the components (i.e., indicators) obtained from SA-PCA are perfectly comparable within groups, as they are directly obtained by loadings from the same spatial cluster. This feature may be of help for regional policy-makers in two ways. On the one hand, instead of relying on unit level indicators as in GWPCA, SA-PCA can define a limited number of clusters to model discrete heterogeneity, which is in many cases a reasonable assumption (Anselin 1988, 2010). Moreover, being easily interpretable, indicators from SA-PCA allow for local policies to be more accurately set, providing better support multilevel governance.

4 Concluding remarks

Spatial heterogeneity is a relevant issue when standard statistical techniques are applied. In this paper, this spatial effect has been investigated deeper to add to the previous literature in the field of PCA. To this end, we developed a novel methodology to consider heterogeneity in the form of various spatial groups. SA-PCA offers a novel approach to tackle unobserved spatial heterogeneity by using a spatially constrained algorithm.

As seen above, while GWPCA considers continuous nonstationarity, SA-PCA relaxes the hypothesis of spatial homogeneity by providing groups as a solution to a combinatorial problem. By identifying spatial clusters of units, SA-PCA allows us to individuate loadings and scores for units that share the same structure of the VC matrix in the same group.

In this paper, we benchmark this new methodology against other options while defining composite indicators. SA-PCA, GWPCA and cluster-specific PCA were applied to calculate a composite indicator of deprivation in 121 municipalities in the province of Rome. Here, SA-PCA generally had better representativeness than that of GWPCA and cluster-specific PCA in terms of the average proportion of variance explained. In this sense, SA-PCA offers a plausible alternative in the case of spatial heterogeneity. Additionally, SA-PCA allows for a more straightforward interpretation than does GWPCA because SA-PCA utilizes a limited number of different loadings and score sets.

Last, we note that the algorithm shows some limitations when applied to very large datasets, as a wider number of observations and variables can add to the computational burden. In future studies, the application to large datasets can be deepened by considering alternative algorithms extended to PCA for the same purpose.