1 Introduction

As China's urbanization continues, the tension between the rapidly rising cost of living in large cities and housing affordability is becoming more apparent among middle/low-income tenant workers. At the fourth session of the 13th National People's Congress on 5 March 2021, the Chinese government work report stated that the main tasks for China during the 14th Five-Year Plan period include “improving the housing market system and housing security system, and enhancing the quality of urbanization development” and a “focus on raising the income of low-income groups and expanding the middle-income group” to help new citizens and young people alleviate their housing difficulties. On the one hand, the high cost of living restricts the ability of middle/low-income tenant workers to spend money elsewhere, which negatively affects the health of the economy and the ability of this group to improve their standard of living. On the other hand, rising prices and rents in urban cores, driven by economic agglomeration, have left middle/low-income tenant workers with no choice but to live in inferior conditions or farther away from job hubs. There is therefore a critical need in China to address the living and working conditions of middle/low-income tenant workers in large cities.

The spatial mismatch hypothesis (SMH) and the jobs–housing imbalance in large cities have long been a matter of concern and discussion [1,2,3,4,5,6]. According to several studies, people's interactions between their jobs and residences are significantly influenced by the surrounding built environment [7, 8]. Studies on transit-oriented development (TOD), in particular, contend that excellent built environment design, through its effect on commuting patterns, can enhance urban environments and traffic situations [9,10,11,12]. There have been a number of studies on the jobs–housing and commuting behavior of disadvantaged groups such as ethnic minorities and poor families [13,14,15,16], but there has been little done in the Chinese context on the housing and commuting habits of middle/low-income tenant workers in large cities. Because of affordability issues, these workers have fewer housing alternatives, poorer living conditions, greater travel distances, and longer commute times than other groups [17]. The mechanisms underlying the impact of the built environment on the selection of rental housing by low- and middle-income groups and the relationship with commuting distances need to be further clarified as an important starting point for improving the living conditions and jobs–housing status of middle/low-income tenant workers. The current lack of research may make it difficult to comprehend important topics like rental housing, settlement planning, and building regulations for middle/low-income populations.

This study draws on data from a questionnaire survey of tenant workers in Beijing (2020), the third economic census in Beijing (2013), the census of geographical conditions in Beijing (2019), resident population data obtained through big data methods (2018), and points of interest related to jobs–housing, such as transportation facilities (2018). From the perspective of middle/low-income tenant workers, the study investigates the nonlinear relationship between residential built environment variables and the spatial distance between work and housing. Its purpose is to offer a scientific basis for site planning strategies for rental housing and settlements, and for associated urban transport policies, in order to improve the jobs–housing situation of middle/low-income tenant workers.

2 Literature Review

2.1 Analysis of the Objective Influencing Factors and Mechanisms of Jobs–Housing Balance

The modifiable areal unit problem (MAUP), a source of statistical bias, has long influenced the spatial scale and calculation methods concerning jobs–housing [18, 19]. According to some researchers, the government should coordinate the ratio of jobs to residential units through urban planning and tax policies in order to achieve a balance between employment and habitation in particular areas, as there are real-world constraints that prevent the market from performing its regulatory function [2, 20]. Therefore, jobs–housing balance is also seen as an urban planning and public policy issue, and effective administrative orders and planning controls can have a large impact on the spatial relationship between jobs and housing [3]. Other academics, on the other hand, have questioned the limitations placed on policy to encourage a balance between housing and employment: if more jobs are added close to where people live or more housing is added close to where they work, how likely is it that people will choose to live or work in that area [21]? They contend that, as a result of market forces, the spatial relationship between housing and work in cities will continue to evolve and improve over time, eventually leading to a balanced development of both. According to the co-location hypothesis, individuals will logically select their home depending on their preferences and other living amenities in order to maximize the overall utility of life. In order to increase their profitability, businesses will also periodically modify their spatial locations to take into account the effects of agglomeration and the spatial dispersion of possible personnel and customers. Therefore, many academics think that trying to strike a balance between work and housing through policy action is useless [22,23,24,25].

As society becomes more productive, urban inhabitants spend more time on leisure activities and less time working [26]. The location of employment and home may not be the most crucial aspect; therefore, choosing a place to live is based on a complicated array of circumstances [3]. The significance of the connection between work and home is reviewed in this context. Researchers are increasingly focusing on the mechanisms behind employment accessibility [27]. Employment variables including commuting time, commuting distance, commuting mode, and employment rate have demonstrated a very strong link with employment accessibility as a crucial indicator of jobs–housing balance [15, 16, 28,29,30]. The spatial relationship between work and housing is still an important issue to take into account for individual inhabitants, even though the macro function of the jobs–housing balance theory has been questioned by many. Employment accessibility can significantly influence housing decisions, particularly for disadvantaged populations like middle/low-income tenant workers [31,32,33]. The spatial relationship between renters’ jobs and housing is ultimately determined by employment accessibility, which is one of the most crucial factors to consider when choosing a place to live, particularly in China's large cities, where tenants are typically employed residents whose place of employment is relatively fixed.

Numerous studies have demonstrated that elements of the built environment affect the link between people's jobs and housing as well as their commuting habits [34, 35]. The built environment affects this link primarily through the characteristics of land use. According to a number of studies, there is a positive association between the jobs–housing situation and the land use mix, and this has an effect on how people commute [19]. A high land use mix enhances the likelihood that individuals will live close to where they work by balancing the supply of jobs and housing in a particular location. In an area with a low land use mix, by contrast, the uneven supply of and demand for jobs or residential units makes it impossible to meet the employment needs of local residents or to rely solely on local residents to fill jobs in the area [7, 8]. As a result, in places with a lower mix, the separation of jobs and housing is frequently more obvious. For long-distance commuting, workers are more inclined to use more efficient motorized transportation modes [36]. Communities with a greater land use mix also offer people in employment more locations for everyday activities (shopping, entertainment, dining, etc.) than areas with a single land use purpose. Other researchers contend that large-scale employment centers and residential areas are more likely to produce an imbalance between jobs and housing, and that large concentrations of residential areas have a greater impact on the separation of jobs and housing than large-scale employment centers [8, 21].

2.2 Study of the Relationship Between Jobs–Housing and Commuting Patterns of Renters in Large Cities

The construction of rental housing has emerged as a key strategy for addressing China's housing crisis during the 14th Five-Year Plan era as a result of the housing crisis brought on by high housing costs in recent years. However, the rental housing market in China, particularly in large cities with a continuous inflow of population, suffers from a lack of supply, poor living conditions, excessive rent increases, a lack of protection for tenants' rights and interests, and a lack of management regulations, as there are no mature and perfect supporting systems and regulations for the design and construction, investment and financing, operation, and management of rental housing [37]. Thus, the significance of studies on urban rental housing has become increasingly apparent, and interest has steadily grown.

The contrast between the quickly rising housing costs and the constrained financial resources of the middle/low-income tenant workers has grown more pronounced in recent years as a result of the rapid increases in housing prices and rent in major cities. On the one hand, the high cost of living restricts the capacity of middle/low-income tenant workers to spend money in other places, which is detrimental to the economy's potential to grow [38]. On the other hand, driven by the agglomeration effect of the economy, the increasing housing prices and rents in the central city force the middle/low-income tenant workers to move to places far away from the employment centers, which further perpetuates the imbalance between jobs and housing and increases the daily traffic pressure, thus increasing the operating costs for the whole city [39, 40]. The unequal socioeconomic growth between various regions within large cities is made worse by the unfavorable position of middle/low-income tenant workers in the competition for good spatial locations [41]. In order to improve the well-being of middle/low-income tenant workers, promote social equity in large cities, relieve traffic pressure in large cities, and promote sustainable development, jobs–housing balance and good accessibility to employment are crucial. Additionally, the internal logical relationship between the built environment and commuting distance needs to be further developed and explored.

2.3 Advances in Nonlinear Research Methods in the Built Environment and Commuting Behavior

Although numerous studies have examined how built environment factors affect residents' commuting, most assume a linear relationship between the built environment and commuting behavior and thus omit nonlinear effects. A primary drawback of classic linear regression is that it assumes each variable has a constant marginal effect, making it difficult to identify thresholds or other nonlinear associations between variables [42]. Recent studies in transportation and planning have shown that in most cases there is a significant nonlinear association between the built environment and residential travel behavior, and that the trends of these nonlinear effects differ among built environment factors [43, 44].

With the development and popularization of artificial intelligence and machine learning in recent years, various novel techniques have been created and are being employed in the study of nonlinear interactions in urban problems. Ding et al. [42] evaluated the nonlinear processes of the effects on residents' commuting behavior and used machine learning methods to verify the influence of the built environment on residential and employment sites separately. Other researchers have used machine learning techniques to investigate the relationship between the built environment in the vicinity of metro stations and the intention to commute by subway [45, 46]. According to the findings, there is a positive correlation between job density and subway journeys up to a certain point, after which there is no longer a connection. The link between walking distance and geographical features was investigated by Tao et al. [46] using registered survey data gathered in the American metropolitan districts of Minneapolis and St. Paul, Minnesota. The findings revealed a nonlinear association between spatial features and walking distance. Using a machine learning model based on building census data from 2008 and 2014 in Shenzhen, Yang et al. [47] investigated the nonlinear relationship between subway station accessibility and land development intensity (i.e., change in building area). They came to the conclusion that subway station accessibility indicators were more significant in predicting changes in building area than other transportation modes. Additionally, when stations are close to significant employment centers or commercial areas, the relationship between neighboring stations and urban vibrancy is strengthened, and the relationship between rail corridors and land use characteristics is also strengthened [48].

Overall, a large number of studies have verified the relationship between built environment factors and residents' commuting behaviors, and the urgent need to solve the housing problem in mega cities also requires an in-depth exploration of the internal logic between the built environment of subdistricts and commuting distance. However, most existing studies assume a linear relationship between the built environment and commuting behavior, ignoring the more complex and variable nonlinear relationships that may exist between them, and the results of these studies lack precision and effectiveness as a basis for policy guidance. This paper will use multi-source data to expand the nonlinear relationship between the built environment and commuting distance from the perspective of middle/low-income tenants, with the aim of providing a scientific basis for the development of strategies to improve the jobs–housing conditions of middle/low-income tenants and the selection of residential sites.

3 Data

3.1 Research Subjects and Study Areas

This paper focuses on middle/low-income tenant workers in Beijing. Following the identification criteria and calculation methods reported in a previous study [49], and based on the average disposable income of urban residents in Beijing in 2020 (69,434 yuan), middle/low-income renters are defined as those with a monthly after-tax income of less than 6000 yuan. To ensure a proper analysis of the micro factors influencing the jobs–housing spatial relationship of middle/low-income tenant workers, and taking into account the general characteristics of this group in Beijing and the practical significance of the study, the research subjects must also meet the following three requirements:

  • The housing lease is a market-oriented lease.

  • The renters have lived in Beijing for more than six consecutive months.

  • The renters have been employed in Beijing.

The questionnaires were collected between February and May 2020; 4176 questionnaires were distributed, of which 3819 were valid, a validity rate of 91.45%. As shown in Figure 1, the study covers nine administrative districts in Beijing: the six central districts (Xicheng, Dongcheng, Haidian, Chaoyang, Shijingshan, and Fengtai), as well as Changping, Daxing, and Tongzhou districts, where the rental population is relatively large.

It should be emphasized that although the survey was conducted while COVID-19 was spreading, respondents whose place of residence or employment was affected by the outbreak were asked to answer the survey questions according to their situation before the outbreak. The majority of respondents (83.3% of the valid sample) were 16–29 years old. Rental transaction data for 2019 and 2020 from the Lianjia website show that this age group accounts for a relatively large proportion of renters and that the difference in rental transaction volume between the two years is relatively small.

3.2 Data Collection, Statistical Analysis, and Preprocessing

3.2.1 Data Sources

The research data in this paper include the following: (1) resident population data (2018) and employment data (2013) at the subdistrict level obtained from Baidu Huiyan big data and the third economic census in Beijing, respectively; (2) socioeconomic data at the individual level of the middle/low-income tenant workers obtained through questionnaires (2020); (3) transportation facilities data and road network data mined on Internet maps using big data methods (2019); and (4) building volume and road area data of different functional types obtained from the census of geographical conditions in Beijing (2019).

3.2.2 Variable Settings

As shown in Table 1, in order to study the relationship between the built environment around the residential areas of middle/low-income tenant workers and their commuting distance, the variables involved in this study are divided into three categories: (1) factors related to the built environment surrounding the residence of the middle/low-income tenant workers are considered independent variables, including rail station density, bus station density, road area ratio, intersection density, resident population density, job density, residential building ratio, office building ratio, commercial building ratio, and land use mix; (2) the socioeconomic characteristics of the middle/low-income tenant workers are considered control variables, including age group, education status, marital status, monthly after-tax household income, rental type, household registration status, and commuting mode; (3) the spatial distance between the workplace and residence of middle/low-income tenant workers is regarded as a dependent variable (Fig. 1).

Table 1 Definition and expression of variables
Fig. 1 Research area

3.2.3 Descriptive Analysis

As shown in Table 2, the questionnaire sample has the following socioeconomic characteristics: nearly 80% of respondents are under the age of 30; the education level is relatively high (more than 80% have a college degree or above); more than three fourths of the respondents have an average monthly after-tax income of less than 4000 yuan; the vast majority (nearly 95%) do not have a Beijing hukou; and co-renting with others while having a private room is the main rental arrangement (about two thirds). In terms of commuting characteristics (Table 2), the sample mainly used public transportation (“bus/job shuttle” and “subway/light rail,” about 58.8% of the sample) and slow modes (“walking/cycling,” approximately 33.8%) to commute, while only 7.4% commuted by car (“private car/taxi”). Among the respondents, those who used “subway/light rail” had the longest commuting distances, with a median of 11.63 km.

Table 2 Descriptive analysis of samples’ commute distance

3.2.4 Data Preprocessing

  (1) Multicollinearity test of independent variables

    Although gradient-boosting decision trees (GBDT), a decision tree-based model, is not affected by multicollinearity in prediction, considering that this study has many independent variables and involves certain causal analysis, we use the correlation coefficient method to test the multicollinearity of 10 built environment independent variables.

    As shown in Table 3, the correlation coefficients of the variables “crsden” and “rdper,” “resFAR,” “busFAR,” and “landuse” are greater than 0.7, indicating that there is a multicollinearity problem between these built environment independent variables. In order to ensure the reliability of the data results, the three variables “crsden,” “resFAR,” and “busFAR” are excluded from the analysis in this paper.

    Table 3 Multicollinearity test
  (2) Calculation of land use mix

    This paper refers to existing research and the information entropy method to calculate the land use mix [50]. The specific formula is as follows:

    $${\text{Landusemix}}_{{\text{i}}} = \frac{{ - \sum\nolimits_{{{\text{k}} = 1}}^{K} {P_{{{\text{ki}}}} \ln \left( {P_{{{\text{ki}}}} } \right)} }}{\ln K}$$
    (1)

    The constraints are:

    $$\sum\nolimits_{{{\text{k}} = 1}}^{K} {P_{{{\text{ki}}}} } = 1,\quad K = 6$$
    (2)

    Here, \({\text{Landusemix}}_{{\text{i}}}\) is the land use mix of subdistrict \({\text{i}}\) represented by entropy, \(K\) is the number of land use types, and \(P_{{{\text{ki}}}}\) is the proportion of land use type \({\text{k}}\) in the total construction volume of subdistrict \({\text{i}}\). According to the data on the building functional types in the Beijing geographical census, this paper selects residential, office, commercial, cultural and entertainment, and public services as the types of building functions (land use) related to the jobs–housing activities and daily activities of middle/low-income tenant workers. All other building (land use) types are grouped into one category, and thus \(K = 6\). According to Eq. (1), the \({\text{Landusemix}}_{{\text{i}}}\) value lies between 0 and 1. The higher the value, the more balanced the distribution of buildings of various functional types in the subdistrict, that is, the higher the land use mix. Conversely, the lower the value, the more uneven the distribution of buildings of various functional types in the subdistrict, that is, the lower the land use mix.
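The two preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`pearson`, `collinear_pairs`, `land_use_mix`) and all numeric values are hypothetical, and only the 0.7 threshold, the variable names "crsden", "rdper", and "landuse", and the entropy formula of Eq. (1) come from the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = sorted(columns)
    pairs = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs

def land_use_mix(volumes):
    """Entropy-based land use mix of one subdistrict, following Eq. (1).

    `volumes` holds the building volume of each of the K land use types.
    """
    total = sum(volumes)
    k = len(volumes)
    shares = [v / total for v in volumes]
    # A type with zero share contributes nothing, since p*ln(p) -> 0 as p -> 0.
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(k)

# Invented toy values for three of the built environment variables.
toy_columns = {
    "crsden": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rdper": [2.0, 4.0, 5.0, 8.0, 10.0],
    "landuse": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = collinear_pairs(toy_columns)
print(flagged)
print(land_use_mix([40.0, 25.0, 15.0, 5.0, 5.0, 10.0]))
```

In this toy input only the "crsden"/"rdper" pair exceeds the threshold, so one of the two would be a candidate for exclusion. A subdistrict with equal volumes in all six categories yields a mix of 1, and a single-use subdistrict yields 0.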

4 Modelling Approach

In order to better explore the mechanism underlying the influence of the built environment around the residence on the commuting distance of the middle/low-income tenant workers, this study introduces the gradient-boosting decision trees (GBDT) model in the field of machine learning for analysis. As a machine learning algorithm, GBDT has the following main advantages [42]: (1) The GBDT model can easily capture the possible nonlinear relationship between the measured variables; (2) GBDT can continuously adjust and correct the influence weights of independent variables through phased learning data to improve the accuracy of model prediction; and (3) GBDT can solve the multicollinearity problem between independent variables to a certain extent.

Considering the close relationship between different commuting modes and commuting distance, this study divided the sample into four groups (slow traffic, ground bus, rail transit, and car) according to their commuting modes, and then substituted them into GBDT for analysis.

First, we assume that \(x\) is a set of independent variables (including factors related to the built environment around the residence, commuting mode, and individual socioeconomic characteristics of the middle/low-income tenant workers) and that \(F\left( x \right)\) is the approximation function of the dependent variable \(y\) (job–residence distance). GBDT estimates the function \(F\left( x \right)\) from the basis function \(I\left( {x;\varepsilon_{{\text{m}}} } \right)\) over multiple iteration rounds. According to existing research [42, 51], the GBDT is expressed as follows:

$$F\left( x \right) = \sum\limits_{{{\text{m}} = 1}}^{{\text{M}}} {f_{{\text{m}}} \left( x \right)} = \sum\limits_{{{\text{m}} = 1}}^{{\text{M}}} {\alpha_{{{\text{jm}}}} I\left( {x;\;\varepsilon_{{\text{m}}} } \right)}$$
(3)

The parameter \(\varepsilon_{{\text{m}}}\) denotes the split variables, split positions, and leaf-node means of each regression tree \(I\left( {x;\varepsilon_{{\text{m}}} } \right)\), and \(\alpha_{{{\text{jm}}}}\) is estimated by minimizing a specified loss function \(L\left( {y,\,f\left( x \right)} \right) = 1/2\left( {y - f\left( x \right)} \right)^{2}\). In addition, Friedman [52] proposed using the gradient boosting algorithm to estimate the parameters, that is, using the negative gradient of the loss function in place of the residuals in the boosting algorithm. The specific formula can be derived in the following steps [42, 53]:

First, initialize the weak learner \(f_{0} \left( x \right)\):

$$f_{0} \left( x \right) = \arg \min_{\varepsilon } \sum\nolimits_{{{\text{i}} = 1}}^{3819} {L(y_{{\text{i}}} ,\;\varepsilon )}$$
(4)

Then, for iteration rounds \({\text{m}}\left( {{\text{m}} = 1,\,2,\,3,...,\,{\text{M}}} \right)\):

  (a) Calculate a negative gradient (residual) \(\varepsilon_{{{\text{im}}}}\) for each middle/low-income tenant worker sample \({\text{i}}\left( {{\text{i}} = 1,\,2,\,3,...,\,3819} \right)\):

    $$\varepsilon_{{{\text{im}}}} = - \left[ {\frac{{\partial L\left( {y_{{\text{i}}} ,\,f\left( {x_{{\text{i}}} } \right)} \right)}}{{\partial f\left( {x_{{\text{i}}} } \right)}}} \right]_{{f\left( x \right) = f_{{{\text{m}} - 1}} \left( x \right)}}$$
    (5)
  (b) Using the residuals \(\varepsilon_{{{\text{im}}}}\) obtained in step (a) as the new target values, fit a new regression tree \(f_{{\text{m}}} \left( x \right)\) to the training data \(\left( {x_{{\text{i}}} ,\;\varepsilon_{{{\text{im}}}} } \right)\), \(\left( {{\text{i}} = 1,\,2,\,3,...,\,3819} \right)\). The corresponding leaf node regions are \(A_{{{\text{jm}}}}\), where \({\text{j}}\left( {{\text{j}} = 1,\,2,\,3,...,\,{\text{J}}} \right)\) indexes the leaf nodes of the regression tree and \({\text{J}}\) is the number of leaf nodes.

  (c) Calculate the best-fit value \(\varepsilon_{{{\text{jm}}}}\) for leaf region \({\text{j}}\):

    $$\varepsilon_{{{\text{jm}}}} = \arg \mathop {\min }\limits_{\varepsilon } \sum\limits_{{x_{{\text{i}}} \in A_{{{\text{jm}}}} }} {L\left( {y_{{\text{i}}} ,\,f_{{{\text{m}} - 1}} \left( {x_{{\text{i}}} } \right) + \varepsilon } \right)}$$
    (6)
  (d) Update the strong learner \(f_{{\text{m}}} \left( x \right)\):

    $$f_{{\text{m}}} \left( x \right) = f_{{{\text{m}} - 1}} \left( x \right) + \sum\limits_{{{\text{j}} = 1}}^{{\text{J}}} {\varepsilon_{{{\text{jm}}}} I\left( {x \in A_{{{\text{jm}}}} } \right)}$$
    (7)

Finally, the computation is concluded and the final learner \(f\left( x \right) = f_{{\text{M}}} \left( x \right)\) is obtained.
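As a worked illustration of steps (a) to (c), and assuming the squared loss \(L\left( {y,\,f\left( x \right)} \right) = 1/2\left( {y - f\left( x \right)} \right)^{2}\) stated above, the quantities in Eqs. (4) to (6) take simple closed forms. Here \(N\) is shorthand, introduced only for this illustration, for the sample size (3819 in this study), and \(\left| {A_{{{\text{jm}}}} } \right|\) is the number of samples falling in leaf region \(A_{{{\text{jm}}}}\).

The negative gradient in Eq. (5) is the ordinary residual:

$$\varepsilon_{{{\text{im}}}} = - \left[ {\frac{\partial }{{\partial f\left( {x_{{\text{i}}} } \right)}}\,\frac{1}{2}\left( {y_{{\text{i}}} - f\left( {x_{{\text{i}}} } \right)} \right)^{2} } \right]_{{f\left( x \right) = f_{{{\text{m}} - 1}} \left( x \right)}} = y_{{\text{i}}} - f_{{{\text{m}} - 1}} \left( {x_{{\text{i}}} } \right)$$

The initial learner in Eq. (4) is the sample mean of the dependent variable:

$$f_{0} \left( x \right) = \frac{1}{N}\sum\limits_{{{\text{i}} = 1}}^{N} {y_{{\text{i}}} }$$

The best-fit leaf value in Eq. (6) is the mean residual within the leaf:

$$\varepsilon_{{{\text{jm}}}} = \frac{1}{{\left| {A_{{{\text{jm}}}} } \right|}}\sum\limits_{{x_{{\text{i}}} \in A_{{{\text{jm}}}} }} {\left( {y_{{\text{i}}} - f_{{{\text{m}} - 1}} \left( {x_{{\text{i}}} } \right)} \right)}$$

Each result follows from setting the derivative of the summed squared loss with respect to the constant being fitted to zero.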

In this study, in order to suppress the overfitting problem that may occur during GBDT operation, we limit the residual learning results of each regression tree by introducing the learning rate factor \(\phi \left( {0 < \phi \le 1} \right)\) (shrinkage) [42].

$$f_{{\text{m}}} \left( x \right) = f_{{{\text{m}} - 1}} \left( x \right) + \phi \cdot \sum\limits_{{{\text{j}} = 1}}^{J} {\varepsilon_{{{\text{jm}}}} I\left( {x \in A_{{{\text{jm}}}} } \right)} ,\;0 < \phi \le 1$$
(8)

Each tree's contribution is multiplied by the learning rate factor \(\phi\). A smaller learning rate, however, requires more regression trees to minimize the loss function and thus greatly increases the number of learners. Another parameter is the complexity of the regression tree (the number of leaf nodes). In order to capture the complex interrelationships between variables, it is often necessary to increase the number of leaf nodes of the regression tree. The optimal fit of the GBDT model therefore depends on the combined effect of the learning rate, the number of regression trees, and tree complexity.
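The boosting loop with shrinkage can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model fitted in this study: it uses a single explanatory variable, two-leaf regression trees (J = 2), squared loss, and invented toy data, and the names `fit_stump`, `fit_gbdt`, and `predict_gbdt` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree to the residuals (assumes >= 2 distinct xs)."""
    best = None
    # Candidate split positions: every distinct value except the largest,
    # so that both leaves are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    """Return the leaf mean of the region that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gbdt(xs, ys, n_trees=200, learning_rate=0.1):
    """Gradient boosting with squared loss and shrinkage, as in Eq. (8)."""
    f0 = sum(ys) / len(ys)              # initial learner: mean of y
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Step (a): negative gradient of the squared loss = residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Steps (b) and (c): fit a tree to the residuals; leaf values are means.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Step (d) with shrinkage: add only a fraction of the new tree.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return f0, stumps, learning_rate

def predict_gbdt(model, x):
    """Evaluate the final learner f_M at a single point x."""
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Invented toy data: commuting distance (km) falling with land use mix.
toy_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_y = [12.0, 12.0, 11.0, 8.0, 5.0, 4.0, 4.0, 4.0]
model = fit_gbdt(toy_x, toy_y)
print([round(predict_gbdt(model, x), 2) for x in toy_x])
```

Lowering `learning_rate` while holding `n_trees` fixed leaves a larger training error, which is the trade-off between learning rate and number of trees described above.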

This study uses the mean absolute percentage error (MAPE) to calculate the residual of predicted values to test the fit of the GBDT model [54].

$${\text{MAPE}} = \left( {\frac{1}{R}\sum\limits_{{{\text{r}} = 1}}^{R} {\left| {\frac{{P_{{\text{r}}}^{\prime} - P_{{\text{r}}} }}{{P_{{\text{r}}} }}} \right|} } \right)\; \times \;100\%$$
(9)

Here, \(R\) is the number of middle/low-income tenant workers in the sample, and \(P_{{\text{r}}}\) and \(P_{{\text{r}}}^{\prime}\) are the actual and predicted jobs–housing spatial distances of middle/low-income tenant worker \(r\), respectively. The smaller the value of \({\text{MAPE}}\), the better the fit of the model; conversely, the larger the value of \({\text{MAPE}}\), the worse the fit. Generally speaking, when the value of \({\text{MAPE}}\) is between 0 and 15%, the model prediction result is considered good, that is, the model passes the goodness-of-fit test.
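Eq. (9) translates directly into code. The sketch below is illustrative only: the function name `mape` and the example distances are hypothetical, and it assumes every actual distance is nonzero, since the formula divides by the actual value.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, following Eq. (9)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each prediction, measured against the actual value.
    ratios = [abs((p - a) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Invented example: actual vs. predicted jobs-housing distances in km.
actual_km = [10.0, 20.0]
predicted_km = [11.0, 18.0]
score = mape(actual_km, predicted_km)
print(score)
```

Both toy predictions are off by 10% of the actual distance, so the score sits within the 0–15% band that the text treats as a good fit.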

5 Results and Discussion

5.1 The Impact of Built Environment Factors on the Commuting Distance of Middle/Low-Income Tenant Workers Who Use Slow Modes of Transport to Commute

From the calculation results (Table 4), it can be seen that for middle/low-income tenant workers, the factors related to the built environment around their residential areas (accounting for 78.719% relative importance) have an important impact on their commuting distance. In contrast, the individual socioeconomic characteristics of middle/low-income tenant workers (accounting for 21.281% relative importance) have a relatively small effect on their commuting distance.

Table 4 The relative influence of built environmental factors on the job–housing relationship of middle/low-income tenant workers who commute via a slow mode

Among the factors related to the built environment surrounding the residential area, land use mix (20.907% relative importance), rail transit station density (19.597%), bus station density (18.763%), and job density (14.406%) have the greatest impact on the commuting distance of middle/low-income tenant workers. Land use mix is negatively correlated with commuting distance: the higher the land use mix around the residence, the shorter the commuting distance, and vice versa. This negative relationship is most pronounced when the land use mix is in the range of 0.3–0.5 (Fig. 2). The jobs that middle/low-income tenant workers hold are relatively uniform and widely distributed in space, so diverse land use functions shorten the distance to the workplace, which in turn makes commuting on foot or by bicycle feasible. In addition, rail transit station density and bus station density are positively correlated with commuting distance: the more stations around the residence of middle/low-income tenant workers, the greater their commuting distance. Better accessibility of public transport stations improves commuting efficiency for middle/low-income tenant workers with limited living and travel budgets, thereby expanding the radius of job selection. In other words, for medium- to long-distance commuting, rail transit and buses are a clear substitute for slow modes, and middle/low-income tenant workers who still commute on foot or by bicycle usually live closer to their workplaces.

Fig. 2 Nonlinear influence of built environment factors on the commute distance of middle/low-income tenant workers

Among the individual socioeconomic factors, only monthly after-tax household income (9.560% relative importance) and gender (5.678%) affect the spatial relationship between work and residence in this group. The higher the monthly after-tax income of middle/low-income tenant workers who commute by a slow mode, the shorter the commuting distance. Men commute farther than women. Nonetheless, the overall effect of income and gender differences on jobs–housing distance is very limited.

5.2 The Impact of Built Environment Factors on the Commuting Distance of Middle/Low-Income Tenant Workers Who Use Ground Public Transportation

The calculation results (Table 5) show that, for middle/low-income tenant workers who use ground public transportation, the built environment factors around their residential areas (86.277% relative importance) influence commuting distance far more than individual socioeconomic factors (13.723%). Among the built environment factors, road network density has the greatest relative importance, at 30.431%. As shown in Fig. 2, the road network density of the subdistrict has a positive impact on the commuting distance of middle/low-income tenant workers who commute by bus in the range of 2–5 km/km², and beyond this range further increases in road network density do not increase commuting distance any further. Higher road density can help to reduce traffic congestion to a certain extent, allowing people to travel farther in a fixed amount of time. In addition, rail station density (18.954%), bus station density (11.678%), job density (10.430%), and land use mix (8.648%) are also major factors affecting the distance between jobs and residence for this group. In particular, when rail station density is less than 0.4 stations/km², it is negatively correlated with jobs–housing distance for middle/low-income tenant workers who use public transport to commute. By contrast, bus station density is positively correlated with the spatial distance between work and residence among this group: the higher the density of bus stops, the longer the commuting distance. A higher bus station density can significantly improve accessibility for this group. The job density of the subdistrict of residence also negatively affects the commuting distance of middle/low-income tenant workers, and this effect is most pronounced below 5000 jobs/km².
Land use mix in the range of 0.35–0.6 negatively affects the spatial distance between the workplace and residence of middle/low-income tenant workers who commute by public transport: the lower the land use mix, the greater the jobs–housing distance.

Table 5 The relative influence of built environmental factors on the job–housing relationship of middle/low-income tenant workers who commute by bus

5.3 The Impact of Built Environment Factors on the Commuting Distance of Middle/Low-Income Tenant Workers Who Use Rail Transit to Commute

As shown in Table 6, built environment factors (90.567%) dominate the spatial relationship between work and residence for middle/low-income tenant workers who commute by rail, whereas individual socioeconomic factors (9.433%) are far less influential. Among the built environment factors, job density (34.166%), bus station density (24.787%), road network density (15.412%), and rail station density (9.699%) have the most significant effects.

Table 6 The relative influence of built environment factors on the job–housing relationship of middle/low-income tenant workers who commute by urban rail transit

In particular, the fewer jobs there are in the neighborhood, the greater the distance between work and residence for middle/low-income tenant workers who commute by rail, especially when job density is below about 4000 jobs/km². Bus station density and rail station density are positively correlated with the spatial distance between work and residence for this group: more accessible transport facilities are associated with longer commuting distances. In contrast, the road network density of the subdistrict has a negative effect on this distance, mainly within the range of 2–5 km/km².

5.4 The Impact of Built Environment Factors on the Commuting Distance of Middle/Low-Income Tenant Workers Who Use Cars to Commute

The previous sections show that the built environment around residential areas is the dominant factor influencing the commuting distance of middle/low-income tenant workers who commute by slow modes and public transport. However, for those who rely on motor vehicles for commuting, the effects of the built environment (49.037%) and individual socioeconomic factors (50.037%) on their jobs–housing status are almost the same (Table 7). The effect of the built environment on commuting behavior therefore differs significantly across transport modes, which corroborates the findings of case studies in other countries [42].

Table 7 The relative influence of built environmental factors on the job–housing relationship of middle/low-income tenant workers who commute by car

Among the built environment factors, bus station density (14.638%) negatively affects the jobs–housing distance of middle/low-income tenant workers who commute by motor vehicle. Similarly, job density (13.716%) is negatively correlated with the distance for this group. In particular, the spatial distance between work and residence of middle/low-income tenant workers who commute by car increases significantly when the job density in the subdistrict where they live is less than about 3000 jobs/km². Among the individual socioeconomic factors, the most influential is rental type: the better the rental conditions, the greater the distance between work and residence for this group, and conversely, the poorer the conditions, the closer they live to their workplace. With limited budgets, middle/low-income tenant workers are forced to make trade-offs between commuting distance and living conditions. In addition, educational level is negatively correlated with jobs–housing distance for middle/low-income tenant workers who commute by car: the higher the level of education, the shorter the distance between work and residence, and vice versa.

6 Conclusion

This article takes Beijing as a case study and draws on multi-source data, including a questionnaire survey of middle/low-income tenant workers, the census of geographical conditions, the economic census, and Internet big data. GIS methods are used to calculate the actual commuting distance along road and line networks, and a gradient-boosting decision tree model is constructed to analyze in depth the nonlinear impact of built environment factors around the residential area on the commuting distance of middle/low-income tenant workers.
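Network-based commuting distance differs from straight-line distance because it follows the links a commuter can actually travel. As a minimal sketch, assuming the network is available as a weighted graph of link lengths in kilometres, the shortest network distance between a residence and a workplace can be computed with Dijkstra's algorithm. The node names, link lengths, and the function `network_distance` are hypothetical; the GIS tooling used in this study is not specified here.

```python
import heapq

def network_distance(graph, source, target):
    """Shortest path length from source to target over non-negative link lengths."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, length in graph.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")  # target is not reachable from source

# Hypothetical toy network: link lengths in km between intersections.
toy_network = {
    "home": {"a": 1.2, "b": 2.5},
    "a": {"home": 1.2, "b": 0.8, "work": 4.0},
    "b": {"home": 2.5, "a": 0.8, "work": 2.1},
    "work": {"a": 4.0, "b": 2.1},
}

commute_km = network_distance(toy_network, "home", "work")
print(f"network commuting distance: {commute_km:.1f} km")
```

Here the shortest route runs home, a, b, work, which is shorter than either of the two-link alternatives.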

The study findings indicate that the built environment near the residences of middle/low-income tenant workers significantly affects how far they must commute. The most significant factors are land use mix, bus station density, resident population density, accessibility of subway stations, and employment density. Contrary to earlier research findings on the factors influencing motor vehicle commuters, the socioeconomic characteristics of middle/low-income tenant workers have very little bearing on their commute distance (only around 7% relative importance). Commute distance is also significantly affected by the mode of transportation used: the more efficient a commuting mode, the greater the distance that can be travelled. This finding is consistent with the notion of a travel time budget. For middle/low-income tenant workers, increasing commuting efficiency can reduce the time needed to reach the workplace and thus expand their employment choices.

The research findings above establish that the built environment surrounding the residential neighborhoods of middle/low-income tenant workers in large cities has a considerable impact on their actual commute distance. In addition, this paper identifies the significant elements of the built environment and describes their nonlinear effects on commute distance. To some extent, it clarifies which built environment indicators should take priority, and within what parameter ranges, when these indicators are optimized and adjusted in urban planning and design at medium and micro spatial scales over the long term. The design and optimization of the built environment, as an effective path to improving the working and living conditions of middle/low-income tenant workers in major cities in China, can help improve commuting efficiency, reduce excessive commuting, improve work efficiency, and thus reduce the development imbalance between different income groups in major cities.

In terms of policy implications, elements such as land use mix, resident population density, and employment density should be considered when choosing sites for affordable rental housing or community planning, and these indicators should be matched as closely as possible to their favorable ranges. Additionally, the public sector needs to concentrate on improving the connection between rental properties and public transportation hubs based on the travel characteristics and preferences of middle/low-income tenant workers. A reasonable distribution and supply of shared bicycles, as well as more parking spaces for bikes and electric bicycles, can increase the connectivity and accessibility of bus stops and subway stations. At the same time, policies should encourage the separation of pedestrians and vehicles in the road network and enhance walkability while optimizing the slow traffic system in rental housing and residential neighborhoods.

Due to limitations in data acquisition, the employment data in this study came mainly from the third economic census in Beijing in 2013, which is less recent than the other data. Additionally, the variables chosen to represent built environment factors are not yet sufficiently diverse. These limitations may have affected the outcomes of the analysis. In future research, we will therefore apply more current data and techniques to increase sample sizes, enhance variable settings, and improve data accuracy. To strengthen the theoretical foundation of the research, we will also compare how environmental factors affect rental housing groups at different income levels and examine the internal mechanisms that drive these differences.