1 Introduction

Various empirical studies have examined the factors that influence residential location, such as the demographic and socio-economic characteristics of households, the economic and structural attributes of the housing stock, the social attributes of the neighbourhood and accessibility attributes (Digambar et al., 2010; Sanit et al., 2013; Montgomery & Curtis, 2006; Andrew & Meen, 2006; Belart, 2011; Eliasson, 2010; Guo & Bhat, 2007; Habib & Miller, 2009; Limbumba, 2010). Limbumba (2010) clusters the location models or theories into three main themes: (i) those that address accessibility to the CBD and the workplace, (ii) those that address the life cycle or life stage, and (iii) those that address the neighbourhood, the environment and the community. Digambar et al. (2010), by contrast, categorise location theories into four main streams: geographic models, social models, economic models and hybrid models. These models or theories seek to explain the rationale behind households’ decisions to choose certain residential locations.

Geographic models focus on parameters of accessibility such as distance to the workplace, shopping destinations, social facilities and amenities. In these models, cost implications are important in terms of affordability and choice (Digambar et al., 2010; Muth, 1969; Straszheim, 1980). Social models emphasize life-cycle factors such as the age and structure of households, together with neighbourhood characteristics, quality of life, environmental pollution, community relations, ethnic and cultural ties and social recognition, as the main explanatory variables (Rossi, 1955; Speare, 1974a, 1974b; Alba & Logan, 1991). Economic models focus more on economic factors such as housing prices and quality, subsidies and taxes, and the availability of housing finance (Ellickson, 1971, 1973, 1977; Goodspeed, 1998). Hybrid models, on the other hand, introduce the role of spatial externalities such as neighbourhood prestige, pollution and school quality (Digambar et al., 2010). They also treat distance from the CBD and the nature of land use in the neighbourhood as aspects that affect residential location choice (Digambar et al., 2010; Smith et al., 1988; Werezberger, 1995).

Discrete choice theory is another branch of research in residential location choice modelling. The discrete choice models are the standard techniques for modelling choice derived from the random utility theory (Suel, 2016; Walker, 2001). Discrete choice models analyse and explain the residential location choice differently from monocentric bid-rent theories. Discrete choice models consider the preference variations of the households and households’ decision-making processes in the residential location choice (Kim et al., 2005). They propose that the locational equilibrium of households is achieved by a probability distribution which indicates the odds that a household chooses one location among discrete choices (Anas, 1982; Kim et al., 2005). Moreover, discrete choice models assume that the ultimate goal of households’ behaviour is to maximize the combined utility from the set of commodities in the market subject to the budget constraint (Kim et al., 2005; Ogu, 2002; Sato, 2003). In other words, given the demographic characteristics of households such as marital status, employment status and household size, individual households choose a particular location based on their income level. The income level of the households is a very important factor that determines the affordability of a house in a particular location. This, in turn, affects other household characteristics such as the quality of the house, the socio-economic characteristics, economic attributes of housing stock, quality or structural attributes, social attributes of the neighbourhood and accessibility attributes.
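To illustrate the random-utility logic described above, the following minimal sketch computes choice probabilities under a multinomial logit specification, which is one common discrete choice form. The cited studies are not stated to use this particular specification; the function name `choice_probabilities`, the location labels and the utility values are hypothetical and serve only as an illustration.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    # Subtracting the largest utility improves numerical stability
    # and leaves the probabilities unchanged.
    v_max = max(utilities.values())
    exp_v = {loc: math.exp(v - v_max) for loc, v in utilities.items()}
    total = sum(exp_v.values())
    return {loc: e / total for loc, e in exp_v.items()}

# Hypothetical systematic utilities of three locations for one household.
utilities = {"inner_city": 1.2, "suburb": 0.8, "township": 0.3}
probs = choice_probabilities(utilities)
for loc, p in probs.items():
    print(f"{loc}: {p:.3f}")
```

The probabilities sum to one, and the location with the highest utility receives the highest probability, which is the sense in which locational equilibrium is expressed as a probability distribution over discrete alternatives.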

Against this background, it is evident that there is little consensus on the effects of these factors on residential location choice behaviour. This is partly because the factors interact extensively and partly because very few researchers have organised them into a model in order to investigate the relationships between them. Moreover, the housing market is complex and heterogeneous, and so are housing users’ cognitive structures for housing attributes and their choice behaviours. The interactions between the multiple physical factors (such as type of building, location of the house, access to roads and access to shopping centres) and behavioural factors (such as attitudes, beliefs, knowledge, norms, perceptions and reputation) that influence residential location choices are very complex. This further shows that it is not easy to empirically measure and quantify residential location choice behaviour.

This study addresses the limitations of previous studies in two ways. First, in addition to the survey method, factor analysis is used as an empirical tool to reinforce the results obtained. Second, unlike previous works that mainly explore urban areas, the present study focuses explicitly on South Africa, where urbanization poses challenges, the population is growing and housing challenges are apparent.

2 Housing market context in South Africa

The provision of adequate housing is a perennial and emotive issue in South Africa, which can be traced back to the apartheid era and the adoption of draconian rules that led to segregation along racial lines. During this period, the government explicitly introduced segregation laws, which were used as tools to exclude the majority of the black population in the country. The various forms of segregation were applied to Africans, Indians and people of colour, and they extended to almost every area of life. In the urban areas, houses were provided in planned, segregated residential areas, whereas at the regional scale or in rural areas, houses were provided in the demarcated homelands. It is also important to note that, unlike in the current dispensation, the government did not provide decent houses to the majority of the population during this period. Additionally, the majority of the people were not allowed to move freely in the country. As a result, housing choice was non-existent, and the theory of choice and preferences in the housing market was not tested in the South African context. These laws created an exclusive housing market that favoured the White minority. Most black people were placed in homelands according to their ethnic groups and were not allowed to reside where they wished. During the apartheid era, townships in South Africa were provided with insufficient housing (Mamba, 2008).

Moreover, the provision of houses during this period was characterised by forced removals, influx control and the provision of rental houses in the early 1970s (Bailey, 1995; Goodlad, 1996). As a result, the majority of the people had very limited choices in the housing market. Most people in the urban areas were restricted to rental housing or hostels, which were mainly single-sex male hostels. The Group Areas Act of 1950 (GAA) introduced segregation measures in the housing market. The implementation of this Act led to large-scale forced removals in South Africa in places such as Sophiatown (Johannesburg), District Six (Cape Town), Cato Manor (Durban), Lady Selborne (Pretoria), South End (Port Elizabeth) and Duncan’s Village (East London). Before these removals, people lived freely in the racially mixed inner-city slums. These removals paved the way for massive township development such as Pretoria-Witwatersrand-Vereeniging (PWV) and, later, the establishment of others like Soweto, Kathorus, Daveyton, KwaMashu, Mamelodi and Gugulethu (Harrison & Todes, 2013).

This study contributes to the literature on two fronts. Firstly, after democracy, the South African housing market faced structural change and transition in terms of both the supply of and the demand for houses, particularly for the Black majority, which had been excluded from participation in the housing market under apartheid. After the apartheid era, South Africa faced a housing backlog and shortage, which the new democratically elected government of 1994 needed to address. Moreover, due to the policies of exclusion during apartheid and the segregation laws in the housing delivery process, households in this country did not have the liberty to choose their places of residence freely. The majority of the people were confined to certain restricted areas. Since the dawn of democracy, people have been free to choose the places of residence and the houses they want, as long as they can afford them. This change also opened the gates of urbanization in the country, as more people who were previously confined by apartheid laws flocked to the towns and cities.

Secondly, most studies in the existing literature investigate the factors influencing residential location choice behaviour by employing conventional models, which have drawbacks such as non-dynamism. These approaches have failed to provide an intelligible view of the processes behind residential location choice behaviour and to take into account the role of households’ behaviours in residential location choice. The findings of this study will shed more light on how to implement housing policies effectively in order to circumvent overcrowding vis-à-vis the limited housing available. Most policy initiatives in the housing market are designed without really taking into account the interactions between the behavioural and deterministic variables that determine or predict households’ preferences and choices. Too frequently, policy initiatives are based on conventional research paradigms that rely on statistical techniques or quantitative methods which do not take into account the behavioural factors influencing households’ decisions in residential location choice.

Most importantly, the results of this study will deepen the understanding of professionals, academics and policy-makers of the unobserved underlying factors that drive people’s behaviour in the selection of residential properties or locations for housing development projects. This understanding will also be useful to property developers and surveyors, who need in-depth knowledge of the important factors that influence residential locations and attract household investment. If property developers or policy-makers are able to understand the self-connection between housing users and the housing market or housing choice, they will be better equipped to design effective policy interventions in the housing market or human settlements. It will also allow property developers to be profitable in terms of demand, supply and consumer preference. Against this backdrop, this study investigates the underlying factors influencing residential location choice behaviour in South Africa by employing exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) techniques.

3 Methodology

This paper employs a quantitative approach, with data collected by means of a structured questionnaire (see Appendix A in the supplementary material) with a 5-point Likert scale. The questionnaire, which contained 90 indicator variables identified from the literature, was administered online to a total of 266 households in South Africa. The collected data were analysed in two stages: the first stage involved EFA, while the second stage involved CFA. Model acceptability was determined from goodness-of-fit based on a two-index strategy.

EFA was employed in the first stage to determine the unobserved (latent) factors driving residential location choice decisions among households. EFA is preferred since it is more suitable for determining, in a conceptual and statistical manner, the exact number of latent factors underlying the set of items in each model construct (Wipulanusat et al., 2017). The approach also provides valuable insights into the dimensionality of the latent variables and confirms the reliability of the measurement scales underpinning the model constructs (Wipulanusat et al., 2017). CFA was utilised in the second stage to affirm the results and to provide a foundation for subsequent model assessment and refinement (Wipulanusat et al., 2017). The CFA results were used to demonstrate whether the model had acceptable levels of fit, convergent validity, discriminant validity and unidimensionality.

4 Results and discussion

This section discusses the results from the two-stage approach of both EFA and CFA.

4.1 Exploratory factor analysis

We applied the exploratory factor analysis procedure to uncover the unobserved (i.e., latent) factors, considered as main determinants underscoring the influence of the identified constructs that measure residential location choice.

4.1.1 Reliability analysis and internal consistency of the data

The Kaiser–Meyer–Olkin (KMO) test proposed by Kaiser (1974) and Cronbach’s alpha test (Hair et al., 2010) were used as diagnostic tests to evaluate sampling adequacy and internal consistency (reliability), respectively, across all items from the questionnaire survey. The value of Cronbach’s alpha typically ranges from 0 to 1 (Tabachnick & Fidell, 2007). For analytical purposes, the values are interpreted as follows: > 0.9 = excellent; > 0.8 = good; > 0.6 = questionable; > 0.5 = poor; < 0.5 = unacceptable. The items within a single measure are internally consistent if the value of Cronbach’s alpha is close to 1, while the opposite holds if the value is close to 0 (Tabachnick & Fidell, 2007). The KMO value must be at least 0.5 for factor analysis to be satisfactory (Kaiser, 1974). The criteria for KMO values are: 0.5 = barely acceptable; 0.7 to 0.8 = acceptable; 0.9 = very good.
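As a minimal sketch of the internal consistency measure discussed above, Cronbach’s alpha can be computed from item variances and the variance of the total score, using the standard formula alpha = k/(k − 1) × (1 − Σ item variances / variance of totals). The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the survey.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical 5-point Likert responses: 3 items, 6 respondents.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 1, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

When all items move together perfectly, the sum of the item variances is small relative to the variance of the total score and alpha approaches 1.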

The reliability of an instrument or questionnaire concerns the consistency, stability and dependability of its scores. For this purpose, Cronbach’s alpha was used to test the internal consistency of each factor. According to Blunch (2008), internal consistency is excellent if the alpha value is higher than 0.9 and acceptable if it is at least higher than 0.7. Table 1 shows that the internal consistency of Dwelling Unit Features, Services Provided by the Government, Households’ Self-Congruence and Functional Congruence, Green Building Features and Stakeholders’ Relations is excellent, since their Cronbach’s alpha values are higher than 0.9, whilst the internal consistency of Demographic Characteristics and Neighbourhood Features is acceptable, since their Cronbach’s alpha values are higher than 0.7.

Table 1 KMO and Cronbach’s alpha of the factors

Next, the Kaiser–Meyer–Olkin (KMO) Measure of Sampling Adequacy and Bartlett’s Test of Sphericity were executed to determine construct validity and to confirm that the collected data were appropriate for an exploratory factor analysis. Specifically, the KMO and Bartlett’s sphericity results for all factors are significant at the 1% critical level. In addition, the results reported in Table 1 confirm the reliability and internal consistency of the data for all factors, with Cronbach’s alpha values that exceed 0.7.
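Bartlett’s Test of Sphericity can be sketched as follows, using the standard statistic χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom, where R is the p × p correlation matrix of the items. The helper names and the correlation matrix are hypothetical; only the sample size of 266 is taken from the text, and the sketch returns the statistic and degrees of freedom rather than a p-value.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical correlation matrix of three items; 266 households as in the survey.
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
chi2, df = bartlett_sphericity(corr, n_obs=266)
print(f"chi2 = {chi2:.1f}, df = {df}")
```

A statistic that exceeds the chi-square critical value for the given degrees of freedom rejects the hypothesis that the correlation matrix is an identity matrix, which is the condition under which factor analysis would be inappropriate.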

4.1.2 Pattern matrix

Before proceeding with the factor analysis, the next step was to evaluate the extraction communalities of the administered survey questions in order to avoid misspecification bias and unreliable inference. Communalities show how much of the variance in a variable is accounted for by the extracted factors. Tables 2, 3, 4, 5, 6, 7 and 8 present the communalities of each survey question (variable) from the EFA. The communalities represent the variance in each item calculated before and after the factor analysis. Items with extraction communalities of less than 0.50 were dropped from further analysis (Hair et al., 2010).
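The screening rule above can be sketched as follows. The sketch assumes an orthogonal factor solution, in which the communality of an item equals the sum of its squared loadings; with an obliquely rotated pattern matrix the factor correlations would also enter the calculation. The item labels and loadings are hypothetical.

```python
def communalities(loadings):
    """Communality of each item = sum of its squared loadings across factors."""
    return {item: sum(l ** 2 for l in row) for item, row in loadings.items()}

def retain_items(loadings, threshold=0.50):
    """Keep items whose communality is at least the threshold."""
    h2 = communalities(loadings)
    return [item for item, value in h2.items() if value >= threshold]

# Hypothetical loadings of four items on two factors.
loadings = {
    "Q1": [0.78, 0.10],
    "Q2": [0.65, 0.30],
    "Q3": [0.20, 0.35],
    "Q4": [0.15, 0.81],
}
kept = retain_items(loadings)
print(kept)
```

In this illustration, item Q3 has a communality of about 0.16 and would be dropped, while the other three items are retained.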

Table 2 Demographic characteristics: factor loadings, eigenvalues and percentage of variances
Table 3 Dwelling features: factor loadings, eigenvalues and percentage of variances
Table 4 Neighbourhood features: factor loadings, eigenvalues and percentage of variances
Table 5 Services provided by the government: factor loadings, eigenvalues and percentage of variances
Table 6 Households’ self-congruence and functional congruence: factor loadings, eigenvalues and percentage of variances
Table 7 Green building features: factor loadings, eigenvalues and percentage of variances
Table 8 Stakeholders’ relations: factor loadings, eigenvalues and percentage of variances

Having demonstrated, through the satisfactory results of the KMO and Cronbach’s alpha tests, that the data collected from the survey (i.e., the respondents’ feedback) are reliable and consistent, the next step is to extract the common components (or latent factors) underlying the influence of the seven (7) factors, namely Demographic Characteristics, Dwelling Unit Features, Neighbourhood Features, Services Provided by the Government, Households’ Self-Congruence and Functional Congruence, Green Building Features and Stakeholders’ Relations. The extraction is based on the pattern matrix generated in the EFA model of each of these seven (7) factors, and the results are discussed in the sections that follow. Tables 2, 3, 4, 5, 6, 7 and 8 present the factor loadings, eigenvalues and percentage of variance explained by the extracted constructs for all seven (7) factors.

The results of the estimated EFA model for demographic characteristics are reported in Table 2 and the scree plot is presented in Fig. 1. The EFA model for the 12 demographic characteristics items confirmed three (3) latent factors as the main factors underlying the influence of demographic characteristics as a construct measuring residential location choice. These factors are associated with (i) household structure, (ii) socioeconomic characteristics and (iii) ethnicity and race. The three factors jointly account for about 57% of the variance in the data, and individually they explain about 40%, 10% and 7% of the variance, respectively (Table 2). On this basis, the first factor appears to be the most important, followed by the second and then the third, with loadings of 4.03, 3.33 and 3.22, respectively. By interpretation, the main factors influencing residential location choice in demographic characteristics can be explained by the three (3) extracted latent factors, with varying specificities.
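The percentages of variance reported for each factor follow from the eigenvalues. As a minimal sketch, assuming standardised items so that the total variance equals the number of items, the share of each factor is its eigenvalue divided by the number of items. The eigenvalues and the item count below are hypothetical and are not taken from Table 2.

```python
def variance_explained(eigenvalues, n_items):
    """Percentage of total variance per factor, assuming standardised items
    so that total variance equals the number of items."""
    return [100 * ev / n_items for ev in eigenvalues]

# Hypothetical eigenvalues of three retained factors for a 10-item construct.
eigenvalues = [4.0, 1.5, 1.1]
shares = variance_explained(eigenvalues, n_items=10)
cumulative = sum(shares)
print([round(s, 1) for s in shares], round(cumulative, 1))
```

The cumulative figure is simply the sum of the individual shares, which is how the joint percentage for a construct relates to the percentages of its factors.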

Fig. 1 Demographic characteristics: scree plot

Moreover, the results of the estimated EFA model for dwelling unit features are reported in Table 3 and the scree plot is presented in Fig. 2. The EFA model for the 16 dwelling unit features items confirmed three (3) latent factors as the main factors underlying the influence of dwelling unit features as a construct measuring residential location choice (Table 3). These factors are associated with: (i) physical features of the building, (ii) children’s features and (iii) intangible features of the building. The components of these factors derived from the factor analysis are presented in Table 3. The three factors jointly account for about 64% of the variance in the data. Based on the rotated sums of the factor loadings, the first factor appears to be the most important, followed by the third factor and then the second factor, with loadings of 6.66, 4.76 and 2.85, respectively. By interpretation, the main factors influencing residential location choice in dwelling unit features can be explained by the three (3) extracted latent factors, with varying specificities.

Fig. 2 Dwelling unit features: scree plot

Similarly, the results of the estimated model for neighbourhood features are reported in Table 4 and the scree plot is presented in Fig. 3. The pattern matrix generated in the EFA model (Table 4) shows that two (2) unobserved factors were extracted from neighbourhood features as a construct measuring residential location choice. These factors are associated with (i) municipal services and (ii) accessibility features. The components of these factors derived from the factor analysis are presented in Table 4. The two factors cumulatively explain roughly 49% of the variance in the data. The first factor explains about 38% of the variance of the measured construct, while the second factor explains a smaller share of about 10%. Based on this, the first factor appears to be more important than the second factor. By interpretation, the main features that largely define the influence of neighbourhood features on residential location choice can be explained by the two (2) extracted latent factors, with varying specificities.

Fig. 3 Neighbourhood features: scree plot

Additionally, the results of the estimated EFA model for the services provided by the government are reported in Table 5 and the scree plot is presented in Fig. 4. The pattern matrix generated in the EFA model shows that two (2) unobserved factors were extracted from services provided by the government as a construct measuring residential location choice. These factors are associated with (i) municipal utilities and (ii) redundant services. The components of these factors derived from the factor analysis are presented in Table 5. The two factors jointly explain about 66% of the variance in the data. The first factor explains about 58% of the variance of the measured construct, while the second factor explains a smaller share of about 9%. Based on this, the first factor appears to be more important than the second factor (Table 5). By interpretation, the main features that largely define the influence of the services provided by the government on residential location choice can be explained by the two (2) extracted latent factors, with varying specificities.

Fig. 4 Services provided by the government: scree plot

The pattern matrix generated in the EFA model (Table 6) and the scree plot (Fig. 5) show that three (3) unobserved factors were extracted from households’ self-congruence and functional congruence as a construct measuring residential location choice. These factors are associated with: (i) similar household, (ii) quality and reputation, and (iii) government or municipal services. The three factors jointly explain about 65% of the variance in the data. The first factor explains about 39% of the variance of the measured construct, the second factor about 18% and the third factor about 8%. Based on this, the first factor appears to be more important than the second and third factors (Table 6). By interpretation, the main features that largely define the households’ self-congruence and functional congruence in residential location choice can be explained by the three (3) extracted latent factors, with varying specificities.

Fig. 5 Households’ self-congruence and functional congruence: scree plot

Turning to the pattern matrix generated in the EFA model (Table 7) and the scree plot (Fig. 6), only one (1) unobserved factor was extracted from green building features as a construct measuring residential location choice. This factor explains about 67% of the variance in the data, and on this basis it appears to be important (Table 7). By interpretation, the main features that largely define the influence of green building features on residential location choice can be explained by the one (1) extracted latent factor.

Fig. 6 Green building features: scree plot

The pattern matrix generated in the EFA model (Table 8) and the scree plot (Fig. 7) show that two (2) unobserved factors were extracted from stakeholders’ relations (SR) as a construct measuring residential location choice. These factors are associated with: (i) communication management and (ii) engagement management. The two factors jointly explain about 65% of the variance in the data, with individual shares of about 57% and 8%, respectively (Table 8). Based on the rotated sums of the factor loadings, the first factor appears to be more important than the second factor, with loadings of 4.74 and 3.94, respectively (Table 8). By interpretation, the main features that largely define the influence of stakeholders’ relations on residential location choice can be explained by the two (2) extracted latent factors, with varying specificities.

Fig. 7 Stakeholders’ relations: scree plot

4.2 Confirmatory factor analysis

The unidimensional model for residential location choice (RLC) is presented in this section. The analysis was conducted in two steps. Firstly, a series of confirmatory factor analyses (CFAs) was conducted to test the measurement equivalency of each of the eight proposed latent constructs (the individual latent models are attached in Appendix B in the Supplementary Notes) and of the manifest or composite variables for RLC represented in the hypothesised model of RLC in Fig. 8. The cases analysed for the full latent variable model came from a sample of 521. Of the total sample, 266 cases had positive weights, while 255 had missing variables, which were corrected with the maximum likelihood method of missing data correction. The model had 91 observable variables in the distributed questionnaire. The robust maximum likelihood estimation method was utilised to analyse the covariance matrix of the model. The raw data used for the analysis were not transformed, since data transformation can lead to an incorrect specification.

Fig. 8 An integrated conceptual model of residential location choice (RLC)

The CFA results defined the relations between the observed and unobserved variables. In other words, they provided the link between the scores on a measurement instrument and the underlying constructs they are designed to measure. This was done to reaffirm the factor structure of the observed and unobserved variables and hence the construct validity. Secondly, the fit of the entire measurement model underlying the hypothesised structural model was tested. The structural model defined the relationships amongst the different exogenous variables and specified how each exogenous variable directly or indirectly influences changes in the values of other exogenous constructs in the model, thus defining the endogenous or dependent variable (RLC). All analyses, including the tests of the hypothesised structural equation models, were performed using the Equations (EQS) software.

The results revealed that the eight exogenous or independent variables incorporated in the full structural model worked very well, based on the parameter estimates and their statistical significance. Covariances between the exogenous factors were added to the model to establish the relationships among them. Covariances between the exogenous variables and the outcome variables were also added to rule out the possibility that any of them might serve as an indicator of any of the proposed factors. Furthermore, the distribution of residuals was symmetrical and centred around zero, which indicates a well-fitting model (Byrne, 2013:94; Aigbavboa, 2013). The analysis of the measurement models likewise showed that the models (the CFAs of the latent variables) worked very well, based on the parameter estimates and their statistical significance. On this basis, it was feasible to test the full latent variable model.

A confirmatory factor analysis of the full latent model was conducted. The full structural model hypothesised that demographic characteristics (DCE), dwelling unit features (DUF), neighbourhood features (NBF), services provided by the government (SPG), the assessment of the households’ self-congruity (HSC), functional congruity (HFC), green building features (GBF) and stakeholders’ relations (SR) define RLC. Figure 8 presents the SEM model, which is founded on the general hypothesis of the study that overall RLC is directly related to the influence of the exogenous variables.

The cases analysed for the full latent variable model (Fig. 8) came from a sample of 521. Of the total sample, 266 cases had positive weights, while 255 had missing variables, which were corrected with the maximum likelihood method of missing data correction. The model had 88 observable variables in the distributed questionnaire. The robust maximum likelihood estimation method was utilised to analyse the covariance matrix of the model. The raw data used for the analysis were not transformed, since data transformation can lead to an incorrect specification (Shook et al., 2004:399).

4.2.1 Analysis of residual covariance estimate

The results revealed that the absolute standardised residual values of all unobserved constructs in this model, as shown in Table 9, are less than 2.58. Furthermore, 100% of the residuals fell within the acceptable range of −0.1 to +0.1, which shows that their average off-diagonal absolute values were close to zero. The distribution of residuals was therefore symmetrical and centred around zero, as expected of a well-fitting model (Byrne, 2013:94). These findings suggest that the hypothesised RLC model in this study fits the sample data well. Since this initial assessment of the structural model residuals indicated a good fit, further goodness-of-fit tests on the measurement model are justified.
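The two residual checks described above can be sketched as follows: every absolute standardised residual is compared with the 2.58 cut-off, and the share of residuals within ±0.1 is computed. The function name `residual_summary` and the residual values are hypothetical and are not the study’s output.

```python
def residual_summary(std_residuals, cutoff=2.58, band=0.1):
    """Summarise standardised residuals against the two thresholds in the text."""
    abs_vals = [abs(r) for r in std_residuals]
    return {
        "max_abs": max(abs_vals),
        "all_below_cutoff": all(v < cutoff for v in abs_vals),
        "share_within_band": sum(v <= band for v in abs_vals) / len(abs_vals),
        "mean_abs": sum(abs_vals) / len(abs_vals),
    }

# Hypothetical standardised residuals (off-diagonal elements).
residuals = [0.02, -0.05, 0.08, -0.01, 0.04, -0.09]
summary = residual_summary(residuals)
print(summary)
```

A share of 1.0 within the band corresponds to the statement that 100% of the residuals fall between −0.1 and +0.1.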

Table 9 Factor loadings and Z-statistics of estimated parameters
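
The residual screening described above can be illustrated with a minimal sketch. It assumes the standardised residuals are available as a symmetric matrix; the function name `residual_summary` and the demonstration matrix are hypothetical and are not taken from the study's data. Only the 2.58 cut-off and the ±0.1 band come from the text.

```python
def residual_summary(std_residuals, z_cutoff=2.58, band=0.1):
    """Summarise the off-diagonal entries of a symmetric standardised residual matrix."""
    n = len(std_residuals)
    # Take each pair of variables once (lower triangle, diagonal excluded).
    off = [abs(std_residuals[i][j]) for i in range(n) for j in range(i)]
    return {
        "max_abs": max(off),
        # Criterion from the text: every absolute value below 2.58.
        "all_below_cutoff": all(v < z_cutoff for v in off),
        # Share of residuals inside the -0.1 to +0.1 band.
        "share_within_band": sum(v <= band for v in off) / len(off),
        "mean_abs_off_diagonal": sum(off) / len(off),
    }


# Hypothetical 3 x 3 residual matrix, for illustration only.
demo = [
    [0.00, 0.04, -0.02],
    [0.04, 0.00, 0.07],
    [-0.02, 0.07, 0.00],
]
summary = residual_summary(demo)
print(summary)
```

A share of 1.0 within the band corresponds to the "100%" statement in the paragraph above.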

4.2.2 Structural model hypothesis testing

In evaluating the goodness-of-fit of the structural model, the feasibility of the model was judged by further inspecting the obtained solution, namely the statistical significance of the parameter estimates, the standard errors and the test statistics (Raykov et al., 1991). It was generally hypothesised that households’ overall RLC is directly related to the influence of the exogenous variables in predicting their choices of places of residence in South Africa. Results from the SEM analysis yielded support for all the exogenous variables. The hypothesised relationships between the exogenous and endogenous factors were significant, and all were positive in direction. Examination of the correlation values, standard errors and test statistics in Table 9 revealed that all eight test statistics (Z-values) were greater than 1.96 (p < 0.05) and that the signs were appropriate, all being positive. This suggested that all eight (8) measured variable estimates influence RLC considerably and are statistically significant. Therefore, the general hypothesis that households’ overall RLC is directly related to the influence of the exogenous variables in predicting their choices of places of residence in South Africa could not be rejected. Furthermore, the assessment of the outcome variables of overall RLC revealed that all standardised factor loadings were generally large and statistically significant.
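
The significance test applied to each path can be sketched as follows, assuming the usual definition of the test statistic as the parameter estimate divided by its standard error. The construct names and numbers below are hypothetical placeholders, not values from Table 9. Only the 1.96 critical value comes from the text.

```python
def z_tests(estimates, critical=1.96):
    """Return the Z-value, significance and sign check for each (estimate, standard error) pair."""
    results = {}
    for name, (estimate, std_error) in estimates.items():
        z = estimate / std_error  # test statistic: estimate over its standard error
        results[name] = {
            "z": z,
            "significant": abs(z) > critical,  # two-tailed test at p < 0.05
            "positive": estimate > 0,          # sign check described in the text
        }
    return results


# Hypothetical estimates and standard errors, for illustration only.
demo = {
    "construct_a": (0.62, 0.08),
    "construct_b": (0.45, 0.10),
    "construct_c": (0.10, 0.09),
}
results = z_tests(demo)
for name, row in results.items():
    print(name, round(row["z"], 2), row["significant"], row["positive"])
```

In this toy input, the third path would fail the test because its Z-value falls below 1.96. The study reports that none of its eight paths did.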

4.2.3 Statistical goodness-of-fit of the indices

Goodness-of-fit indices were examined to determine how well the factor model (Fig. 8) represented the data. The fit indices used were the Normed Fit Index (NFI), Comparative Fit Index (CFI), Root Mean Square Error of Approximation (RMSEA) and Standardised Root Mean Square Residual (SRMR) (Hair et al., 2010). The findings in Table 10 show that all the goodness-of-fit indices met the decision criteria and are acceptable. This suggested that the model in Fig. 8 is a good fit for RLC outcomes, and it is accepted as the overall integrated RLC model.

Table 10 Robust fit indices for structural model (Fig. 8)
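
A comparison of fit indices against decision criteria can be sketched as below. The index values are those reported in the paper's conclusions. The cut-offs are commonly quoted conventions, included here as assumptions; they are not necessarily the criteria listed in Table 10. NFI and NNFI are left out because the criteria applied to them are not restated in the text.

```python
# Index values as reported in the paper's conclusions.
reported = {"SRMR": 0.060, "CFI": 0.921, "RMSEA": 0.053}

# Illustrative cut-offs (assumption, not the paper's Table 10 criteria).
# "max" means the index should not exceed the threshold; "min" means it should reach it.
cutoffs = {
    "SRMR": ("max", 0.08),
    "CFI": ("min", 0.90),
    "RMSEA": ("max", 0.08),
}


def check_fit(values, criteria):
    """Return True or False per index depending on whether its criterion is met."""
    verdicts = {}
    for name, (direction, threshold) in criteria.items():
        value = values[name]
        if direction == "max":
            verdicts[name] = value <= threshold
        else:
            verdicts[name] = value >= threshold
    return verdicts


verdicts = check_fit(reported, cutoffs)
print(verdicts)
```

Keeping the criteria in a separate dictionary makes it easy to substitute whichever thresholds an analyst adopts.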

4.2.4 Internal reliability and construct validity of the SEM model

The reliability and validity of the measured model were evaluated using four measures, namely composite reliability (CR), average variance extracted (AVE), maximum shared variance (MSV) and average shared variance (ASV) (Fornell & Larcker, 1981). The results presented in Table 11 revealed that composite reliability for all indicator variables in the model exceeded 0.7. This demonstrated that all these factors had adequate internal consistency and that the overall amount of variance in the indicators was accounted for by the latent construct. Moreover, this value showed that all the indicators were sufficient to represent the construct (Fornell & Larcker, 1981; Mgiba, 2016). Similarly, the overall amount of variance in the indicators accounted for by the latent construct is revealed by the average variance extracted (AVE). The AVE values in this measurement model were greater than 0.5, except the one for demographics, demonstrating that the variables within each factor shared more variance with their own factor than with the other factors. In terms of AVE, green building features best explained RLC, followed by government services and stakeholder relations. Finally, the MSV values for all constructs were greater than the ASV values. Therefore, all the constructs successfully passed the reliability, convergent validity and discriminant validity tests. This also suggested that the model in Fig. 8 was a good fit for RLC outcomes. Therefore, these eight (8) variables were the most significant driving factors influencing residential location choice (RLC).

Table 11 Reliability and construct validity of the latent variables
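
For readers unfamiliar with the two reliability measures, the sketch below computes CR and AVE from standardised factor loadings using the standard Fornell and Larcker (1981) formulas, with the error variance of each indicator taken as one minus its squared loading. The loadings are hypothetical and are not taken from Table 11.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    # Error variance of a standardised indicator: 1 - loading^2.
    error = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardised loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardised loadings for one five-item construct.
demo_loadings = [0.80, 0.75, 0.70, 0.85, 0.78]
cr = composite_reliability(demo_loadings)
ave = average_variance_extracted(demo_loadings)
print(round(cr, 3), round(ave, 3))
print("CR above 0.7:", cr > 0.7, "| AVE above 0.5:", ave > 0.5)
```

The 0.7 and 0.5 thresholds are the ones applied in the paragraph above.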

4.2.5 Discriminant validity of the SEM model

Table 12 presents the inter-construct correlation matrix for all paired latent variables in the measured model, which was used to assess the discriminant validity of the measured constructs. Where discriminant validity is lacking, variables correlate more highly with variables outside their parent factor than with variables within it, which implies that the latent factor is better explained by some other variables (Fornell & Larcker 1981; Mgiba, 2016). Discriminant validity of the research constructs was therefore checked in this study by evaluating whether the correlations among latent constructs were less than 1.0. According to Wipulanusat et al. (2017) and Kline (2015), discriminant validity provides evidence that a construct is distinct from other constructs and captures phenomena and concepts that other constructs do not. Initial evidence of discriminant validity is provided by inspecting the correlation coefficient between each pair of constructs. If two constructs have a high correlation coefficient (i.e. greater than 0.850), they might reflect the same concept and should be combined into a single construct (Tabachnick & Fidell, 2007; Wipulanusat et al., 2017). Unidimensionality can be established when the variables load on only a single construct. In order to be considered unidimensional, all model fit indices must meet the acceptable level (Koufteros, 1999; Wipulanusat et al., 2017).

Table 12 Inter-construct correlation matrix

As the inter-construct correlation matrix in Table 12 shows (Fornell & Larcker, 1981; Mgiba, 2016), the inter-correlation values for all paired unobserved constructs were less than 1.0. This suggested discriminant validity amongst these constructs (Nunnally and Bernstein, 1994; Mgiba, 2016).
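
The correlation-based checks in this section can be sketched as follows. The code flags construct pairs whose correlation exceeds 0.850 and derives MSV and ASV as the maximum and the mean of each construct's squared correlations with the other constructs. The construct labels and the correlation matrix are hypothetical, not the values in Table 12. A common rule of thumb, stated here as an assumption and not as the paper's own criterion, additionally requires both MSV and ASV to be smaller than the construct's AVE.

```python
def high_correlation_pairs(corr, names, threshold=0.85):
    """List construct pairs whose absolute correlation exceeds the threshold."""
    return [
        (names[i], names[j], corr[i][j])
        for i in range(len(names))
        for j in range(i)  # each pair once
        if abs(corr[i][j]) > threshold
    ]


def shared_variance(corr, names):
    """MSV = largest squared correlation with another construct; ASV = their mean."""
    out = {}
    for i, name in enumerate(names):
        squared = [corr[i][j] ** 2 for j in range(len(names)) if j != i]
        out[name] = {"MSV": max(squared), "ASV": sum(squared) / len(squared)}
    return out


# Hypothetical inter-construct correlation matrix, for illustration only.
names = ["A", "B", "C"]
corr = [
    [1.00, 0.40, 0.30],
    [0.40, 1.00, 0.90],
    [0.30, 0.90, 1.00],
]
flagged = high_correlation_pairs(corr, names)
sv = shared_variance(corr, names)
print(flagged)
print(sv)
```

In this toy matrix, constructs B and C would be candidates for merging under the 0.850 rule.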

5 Discussion and analysis

The study generally hypothesised that RLC is related to the influence of demographic characteristics. The results suggest that marital status, head of household, age, number of children and gender (DEC2; DEC3; DEC4; DEC5 and DEC1) have a significant effect on residential location choice. Residential choice has also been found to depend on household demographics such as household size, life cycle and income (Ubani et al., 2017). These findings concur with the existing literature (Alkay, 2011; Reed and Mills, 2007; Wang and Li, 2006). Moreover, the study generally hypothesised that RLC is related to the influence of dwelling unit features. The results from SEM showed that the location of the living room, size of the bedrooms, size of the kitchen, size of the bathrooms and size of the wardrobes or closets (DUF2; DUF3; DUF4; DUF5 and DUF6) are the most significant factors influencing RLC. The study revealed that improvements to the dwelling unit, such as the amount of privacy in the units, the size of the wardrobes and the overall appearance of the dwelling, improve the satisfaction of the occupants. The implication of these findings is that dwelling unit features have a direct influence on residential location choice. Hence, residential location choice can be enhanced by improving dwelling unit features such as the location of the living room, size of the bedrooms, size of the kitchen, size of the bathrooms and size of the wardrobes or closets.

In addition, the study generally hypothesised that RLC is related to the influence of neighbourhood features. The results from SEM showed that the closeness of the residence or residential area to public transportation, availability of free parking on the street (municipal pavements/parking bays) outside the residence, availability of accessible walkways and access to main roads, and proximity of the residence to a public recreation area or leisure parks (NBF 8; NBF 7; NBF 9 and NBF 10) are the most significant factors influencing RLC. These findings replicate the results of the majority of studies on residential location choice that consider neighbourhood features (Axhausen et al., 2004; Bürgle 2006; Chen et al., 2008; Zhou and Kockelman 2008; Eluru et al., 2009; Habib & Miller 2009; Lee & Waddell 2010; Belart 2011; Pinjari et al., 2009, 2011; Zolfaghari et al., 2012). Moreover, the study generally hypothesised that RLC is related to the services provided by the government. The results from SEM showed that the availability of a functional drainage system, timely collection of refuse or garbage by municipal workers, provision of uninterrupted electricity supply, availability of an uninterrupted and clean water supply, and the provision of good quality public services by the municipality (SPG 1; SPG 2; SPG 7; SPG 8 and SPG 11) are the most significant factors influencing RLC. These findings are consistent with previous studies (Aigbavboa, 2013; Tiebout, 1956). The findings in the current study are significant in that they suggest a checklist of the most important services that governments need to provide in order to achieve sustainable human settlement solutions, particularly at the municipal level. This checklist includes the availability of a functional drainage system, timely collection of refuse or garbage by municipal workers, provision of uninterrupted electricity supply, availability of an uninterrupted and clean water supply, and the provision of good quality public services by the municipality.

Furthermore, the study generally hypothesised that RLC is related to the influence of households’ self-congruence. The results from SEM showed that the following items (HSC 4; HSC 5; HSC 6; HSC 7 and HSC 8) are the most significant factors influencing RLC: how similar the typical household in the area is to how others see me, how similar the typical household in the area is to how I would like to be, how similar the typical household in the area is to how I would like to see myself, how similar the typical household in the area is to how I would like others to see me, and how similar the typical household in the area is to how I ideally like to be seen by others. These findings are consistent with the literature (Aguirre-Rodriguez et al., 2012; Hosany & Martin, 2012; Johar & Sirgy, 1991; Sirgy, 1982; Sirgy et al., 2005). Likewise, the study generally hypothesised that RLC is related to the influence of households’ functional congruence. The findings from the SEM model suggested that the features of households’ functional congruence have a direct influence on predicting residential location choice. The findings also revealed that these features had a significant association with the latent variables in predicting the endogenous variable (i.e., residential location choice). The results showed that the following items (HFC1, HFC2, HFC3, HFC4 and HFC5) are the most significant factors influencing RLC: the place where the house is located has good amenities for households, the place where one stays is of high quality as a residential location, this place has long been regarded as a high-quality residential location, this place has a long history and good reputation as a residential location, and this place is convenient for the location of houses.

Similarly, the study generally hypothesised that RLC is related to the influence of green building features. The results from SEM showed that having flood retention measures within the location (e.g. ponds, rivers, green roofs and rainwater retention), having energy-efficient cooling through plants and spaces, having buffering for urban sprawl (e.g. establishing a green belt), having strong biodiversity, and having a reduction in waste heat and greenhouse gas emissions through energy efficiency, transit access and walkability (GBF4, GBF7, GBF9, GBF10 and GBF13) are the most significant factors influencing RLC. This suggests that the presence of green building features in the model has a greater influence on RLC. The findings in the current study are significant in that they suggest the most important indicators or variables that governments need to consider in order to raise awareness of greenhouse gas emissions in the building sector among relevant stakeholders and so help achieve the global carbon emission goals. Equally, the study generally hypothesised that RLC is related to the influence of stakeholders’ relations. The results from SEM showed that the quality of communication with other households in the area, quality of communication with local and regional community professionals and developers, quality of communication with local institutions, frequency of communication with local stakeholders and the likelihood of maintaining a good relationship with local stakeholders (SR1; SR2; SR3; SR4; SR5) are the most significant factors influencing RLC. The findings suggest that when stakeholders’ relations are incorporated into the housing delivery process, the outcomes are more likely to suit local circumstances in the housing market, ensure community ownership and ensure sustainable development in this market.

6 Conclusions

From the outset, this paper set out to empirically examine the factors influencing residential location choice in South Africa. We add value to the existing literature by evaluating these factors with a multi-pronged analytical approach consisting of a questionnaire survey and factor analysis. To the best of our knowledge, this is the first attempt to assess factors influencing residential location choice using this methodology, particularly in South Africa. The overall model postulated that RLC is directly related to the influence of the exogenous variables in predicting or determining overall RLC outcomes. The results were obtained from an SEM analysis that investigated whether the indicator variables (questionnaire items) measured the constructs that they were supposed to measure. Results were also presented to establish whether the number of statistically significant factors in the model was feasible. Equally, the measurement reliability of the model and its construct validity were reported.

The structural model analysis (full latent model) was conducted, and it validates the hypothesised integrated residential location choice model. The influence of the independent variables (latent variables) on the dependent variable (endogenous variable) was also reported. Considering the feasibility and statistical significance of all goodness-of-fit estimates, with particular reference to the index values of SRMR (0.060), NFI (0.835), NNFI (0.913), CFI (0.921), RMSEA (0.053) and the RMSEA 95% confidence interval (0.048–0.058), and the lack of any substantial evidence of model misfit, the model was deemed fit. Therefore, it was concluded that there is no need to further improve the structural model's fit. Further findings from the SEM results showed that all eight exogenous variables influence RLC in South Africa and that their influence was statistically significant. It can be concluded that the eight-factor model schematically portrayed in Fig. 8 represents an adequate description of residential location choice in South Africa.

Based on these findings, it is important, first, that policymakers in South Africa pay close attention to the novel features or constructs uncovered in this paper in order to improve policy design and interventions in the housing market. Second, we assert that these features or constructs should be taken into account to improve inclusiveness and social cohesion in the South African housing market. Nonetheless, although interesting and valuable findings have emerged from this study, it is not without limitations. The findings might reflect experiences specific to the South African housing market. Therefore, similar research in other countries might show some differences and disparities in the findings.