1 Introduction

Mobility as the fabric of human societies has been studied for understanding the mechanism of movements [17], diffusive processes [8, 9], and its association with socioeconomic attributes [1015] at both individual and population levels. Taking advantage of rich data, several population-level mobility models were developed to describe travel patterns as a function of geographical factors, e.g., population distribution and travel distance. The gravity model [1618], intervening opportunities model [19, 20], and radiation model [4, 2123] have been the leading concepts for the population-level mobility models [24].

Among such mobility models, the gravity model has remained the representative population-level model, capturing the traffic in a simple form resembling Newton’s gravity law; the traffic volume from one region to the other is proportional to the product of population sizes of origin and destination regions and inversely proportional to the distance between those regions [16, 17]. Precisely, its generalization is written as follows:

$$\begin{aligned} T_{ij} = G \frac{m_{i} m_{j}}{r_{ij}^{\gamma}}, \end{aligned}$$
(1)

where \(T_{ij}\) is the traffic volume between the origin region i and the destination region j, and \(r_{ij}\) is the geographical distance between regions i and j. \(m_{i}\) and \(m_{j}\) are their population sizes or traffic volumes, while G is a coefficient. The distance exponent γ is to denote the deterrence of mobility on distance, and it usually takes a value ranging from 0.5 to 3 [25]. Note that the deterrence function of the gravity model has been observed in multiple forms, such as an exponential function [26], a combination of the exponential and power-law forms [24], and the Hill function [27], besides the common power-law form. The gravity model has been applied to systems of various spatial interactions, including urban mobility [2833], intercity mobility [18, 34, 35], and general inter-regional interactions [3643].

For most applications of the gravity model, the entire set of data for a given geographical unit, e.g., cities or countries, has been analyzed to result in a single distance exponent, effectively ignoring the heterogeneities within the unit. Such heterogeneities can be accounted for by grouping the regions within the unit and then by separately estimating the distance exponent for each pair of origin and destination groups. This possibility of having multiple values of the distance exponent within the same unit has recently been suggested in a theoretical study considering heterogeneous population landscapes [44].

In our work, we test the validity of such “multiple” gravity laws on the commuting data in the twelve largest cities of the United States of America (USA) only using their traffic volumes, not the population data. For each city, we first divide urban areas into 10 groups by their traffic volumes and estimate the value of the distance exponent between each pair of groups. We find the varying patterns of the distance exponent for all cities analyzed and their commonality across different cities. Then, we compare the accountability of the multiple gravity laws to that of the conventional gravity model concerning only a single distance exponent for the entire city. We also discuss the origin of the multiple exponents in terms of the core-periphery structure of the city and travel costs. Our findings suggest that the gravity model on urban landscapes could be fine-tuned to incorporate the broad spectrum of urban movements for better understanding, estimation, and prediction.

2 Data and methods

We analyze the commuting dataset processed from the LEHD Origin-Destination Employment Statistics (LODES) data in 2018 [45], where LEHD stands for the Longitudinal Employer-Household Dynamics project of the United States Census Bureau. The LODES data is a census survey connecting homes (origins) and workplaces (destinations) at a census block group (CBG) level. Here, census blocks are the smallest geographical unit for sampling the data, and a CBG consists of clusters of blocks, typically containing a few thousand people [46]. We use the Python module geopandas to derive geometric centroids of CBGs from geographical boundaries of CBGs in 2018 [47]. The LODES data includes the number of trips between such CBGs.

In our work, we choose the twelve most populated Metropolitan Statistical Areas (MSAs) in the USA, which we call cities hereafter. For each city, we divide the entire city into \(1\text{ km} \times 1\text{ km}\) square cells within the MSA boundary in 2018 [48] on the Universal Transverse Mercator (UTM) coordinate system. Then each cell may contain several CBGs; the CBGs whose centroids are located in the same cell are merged to represent the traffic volume of the cell. Cells containing no CBGs’ centroids are ignored for the analysis. Table 1 shows the number of cells and the total number of trips between cells for each city.

Table 1 Twelve most populated USA cities for the analysis with information on the number of cells and the number of trips within each city

We describe the data analysis framework [see also Fig. 1(a–c)]. Each city has N cells, and the numbers of trips, \(T_{i\to j}\), from the ith cell to the jth cell for the pair of \(i,j\in \{1,\ldots ,N\}\) are given. Note that \(T_{i\to j}\neq T_{j\to i}\) in general, since the commuting dataset only describes one-way trips from homes to workplaces. To illustrate the bidirectional traffic flow, we symmetrize the traffic volume between two cells as follows:

$$\begin{aligned} T_{ij}\equiv T_{i\to j}+T_{j\to i}. \end{aligned}$$
(2)

Note that \(T_{ii}=2T_{i\to i}\) by definition. The total number of trips for the ith cell is obtained by summing \(T_{ij}\) over all js:

$$\begin{aligned} T_{i}\equiv \sum_{j=1}^{N} T_{ij}. \end{aligned}$$
(3)
Figure 1
figure 1

Data analysis framework (a–c) and its application to the case of Chicago (d,e). (a) For a given city, cells in a square grid are grouped into 10 groups according to their traffic volumes \(T_{i}\) in Eq. (3). The traffic volume \(T_{ij}\) and geographic distance \(r_{ij}\) between a cell i of group k (red) and a cell j of group \(k'\) (blue) are identified. (b) Then, the distance exponent \(\gamma _{kk'}\) is estimated using the gravity model in Eq. (5). (c) Estimated values of the distance exponent, i.e., \(\gamma _{kk'}\) for \(k,k'\in \{1,\ldots ,10\}\), form the exponent matrix Γ. (d) Conventional estimation of the distance exponent using the whole set of data for Chicago, resulting in a single distance exponent \(\gamma _{\mathrm{s}}\approx 0.53\) (dashed line) in Eq. (4). (e) Empirical confirmation of multiple gravity laws within Chicago with different values of \(\gamma _{kk'}\) for some cases with \(k=k'\) (dashed lines)

The conventional gravity model assumes that the rescaled traffic volume between cells i and j, defined as \(T_{ij}/(T_{i}T_{j})\), decays with the geographical distance between those cells, \(r_{ij}\), as

$$\begin{aligned} \frac{T_{ij}}{T_{i}T_{j}}\propto r_{ij}^{-\gamma _{\mathrm{s}}} \quad \text{{for}} \ i,j\in \{1,\ldots ,N\}, \end{aligned}$$
(4)

where \(\gamma _{\mathrm{s}}\) is the distance exponent for the entire set of pairs of cells in the city. Here we have rescaled \(T_{ij}\) by the multiplication of traffic volumes of cells i and j, i.e., \(T_{i}T_{j}\), not by their populations as mentioned for Eq. (1). We estimate the value of \(\gamma _{\mathrm{s}}\) by means of the ordinary least squares linear regression using the equation: \(\log (T_{ij}/T_{i}T_{j})=A-\gamma _{\mathrm{s}}\log (r_{ij})+\epsilon \) with a constant A and an error term ϵ. The fitting range of \(r_{ij}\) is limited to that showing the scaling behavior. Throughout the paper, we set the lower and upper bounds of the fitting range to 1 km and 50 km, respectively. We also calculate the \(R^{2}\) value to quantify the quality of the fitting. For the case of Chicago we estimate \(\gamma _{\mathrm{s}}\approx 0.53\) (\(R^{2}\approx 0.17\)) [Fig. 1(d)].

To investigate the possible variation of the gravity law within the city, we sort the cells according to their traffic volumes, \(T_{i}\), and then sequentially group them into 10 groups with an equal size of \(N/10\). These groups are denoted by \(G_{k}\) for \(k=1,\ldots ,10\), in ascending order of traffic volumes. The group of \(k=1\) is for cells with the smallest traffic volumes, while the group of \(k=10\) is for cells with the largest traffic volumes. Here we have grouped cells with respect to their traffic volumes in accordance with the previous theoretical work [44] that has shown the possibility of multiple gravity laws. We remark that there are alternative grouping methods, e.g., in Refs. [11, 49]. Now we estimate the distance exponent for each pair of groups, say k and \(k'\), assuming the following functional form:

$$\begin{aligned} \frac{T_{ij}}{T_{i}T_{j}}\propto r_{ij}^{-\gamma _{kk'}}\quad \text{{for}}\ i\in G_{k}\ \text{{and}}\ j\in G_{k'}. \end{aligned}$$
(5)

This functional form leads to the equation for the linear regression as \(\log (T_{ij}/T_{i}T_{j})=A-\gamma _{kk'}\log (r_{ij})+\epsilon \). Using the estimated values of \(\gamma _{kk'}\) for all possible pairs of \(k,k'\in \{1,\ldots ,10\}\), we obtain the exponent matrix \(\Gamma =[\gamma _{kk'}]\), as depicted in Fig. 1(c). Note that \(\gamma _{kk'}=\gamma _{k'k}\) as we have symmetrized the traffic volumes between cells such that \(T_{ij}=T_{ji}\), leaving us with 55 distinct values of \(\gamma _{kk'}\). In the case of Chicago, different scaling behaviors are observed with different values of \(\gamma _{kk'}\). The results for several cases with \(k=k'\) are shown in Fig. 1(e); e.g., \(\gamma _{1,1}\approx 0.10\) (\(R^{2}\approx 0.014\)) and \(\gamma _{10,10}\approx 0.94\) (\(R^{2}\approx 0.37\)) among others. Our data analysis framework can be applied to any mobility datasets as long as both \(T_{ij}\) and \(r_{ij}\) are available.

3 Results

We apply the data analysis framework described in the previous Section to the twelve most populated cities in the USA as listed in Table 1. We first observe in Fig. 2 that cells with the largest traffic volumes are concentrated at one or more centers of the cities. For example, cities such as Dallas and Houston have a single center, while cities like Los Angeles apparently have more than one center and those centers are distributed over the cities. On the other hand, cells with smaller traffic volumes are scattered over the cities.

Figure 2
figure 2

Traffic landscapes and exponent matrices for the twelve most populated cities in the USA. Each panel consists of the traffic landscape of the city on the map (top) and the exponent matrix derived from the traffic volumes between cells in the city (bottom). In the landscapes, each cell is colored according to the group it belongs to, following the color bar at the bottom of the figure. A higher value of k indicates the larger traffic volume of the group. For comparison, we mark the estimated distance exponent using the whole data of the city by a white bar in the color bar for exponent values

The exponent matrices visualized in Fig. 2 show that for each city the distance exponent \(\gamma _{kk'}\) has various values according to the group indices k and \(k'\), which clearly evidence the multiple gravity laws within the city. For comparison, we mark as a white bar the value of the distance exponent, \(\gamma _{\mathrm{s}}\), i.e., when using the whole set of pairs of cells in the city, in the color bar for the exponent values in Fig. 2. The finding of various values of \(\gamma _{kk'}\) is consistent with our expectation that various scaling behaviors can be observed within the same city, depending on the populations or traffic volumes of both origin and destination cells [44]. Thus, our method can indeed reveal the hidden diversity of gravity laws that would be overlooked otherwise. In addition, one can say that the distance exponent \(\gamma _{\mathrm{s}}\) might show an average behavior of the multiple gravity laws.

Interestingly, we find a common pattern in the exponent matrices of different cities, despite the huge diversity of the population, geographical constraints, and travel patterns of those cities; e.g., one can see a variety of geographical constraints such as sea, lakes, mountains, and/or neighboring cities in Fig. 2. To be precise, the value of the distance exponent \(\gamma _{kk'}\) tends to be higher between groups of larger traffic volumes. The pairs of groups with large traffic volume and small traffic volume also tend to show higher values of \(\gamma _{kk'}\) than those between groups with small traffic volumes. A possible explanation for such observations will be discussed later.

We examine the quality of the distance exponent estimation in terms of \(R^{2}\) values. In Fig. 3(a), we show the \(R^{2}\) value distribution for \(\gamma _{kk'}\) [Eq. (5)] of each city in a box-and-whisker plot. The distribution is compared to the \(R^{2}\) value for \(\gamma _{\mathrm{s}}\) [Eq. (4)] of the same city, depicted as a black thick vertical line. We observe that the \(R^{2}\) value for \(\gamma _{\mathrm{s}}\) is not always better or worse than those for \(\gamma _{kk'}\). To look at more details, we group 55 exponent values into two subsets of roughly the same number of pairs; one subset includes 25 cases with \(k+k'> 11\) (i.e., pairs for relatively large traffic volumes), and the other subset is for 30 cases with \(k+k'\leq 11\) (i.e., pairs for relatively small traffic volumes). As shown in Fig. 3(b), it turns out that the \(R^{2}\) values for \(\gamma _{kk'}\) with \(k+k'> 11\) tend to be much larger than that for \(\gamma _{\mathrm{s}}\) in most cities, while the \(R^{2}\) values for \(\gamma _{kk'}\) with \(k+k'\leq 11\) show the opposite tendency. The larger \(R^{2}\) values for cases with larger k and/or \(k'\) indicate that multiple gravity laws can explain the mobility pattern for high-traffic regions better than the conventional gravity model characterized by a single distance exponent. On the other hand, the smaller \(R^{2}\) values for cases with smaller k and/or \(k'\) might be due to the fact that cells with relatively small traffic volumes are scattered over the cities and mostly located at the periphery.

Figure 3
figure 3

\(R^{2}\) values in the distance exponent estimation for the twelve most populated cities in the USA. (a) The box-and-whisker plot for each city describes a distribution of \(R^{2}\) values for 55 distance exponents in the exponent matrix, \(\Gamma =[\gamma _{kk'}]\). The black thick vertical line indicates the \(R^{2}\) value for the distance exponent, \(\gamma _{\mathrm{s}}\), when using the whole set of data for each city. (b) The same results as in (a) are presented but by separating the cases with \(k+k'>11\) (pink plots) from the other cases with \(k+k'\leq 11\) (blue plots)

In order to understand the observed common pattern in the exponent matrices, we consider two main factors, namely, the traffic landscape and the travel cost. We find that the effects due to the traffic landscape seem to explain the observed exponent matrices to some extent, which can also be argued in terms of the travel cost. As evident in the traffic landscapes of Fig. 2, all the cities analyzed might be considered to have a so-called “core-periphery” structure whether the number of central areas or centers is one or more [50]. We remark that in contrast to cells with large traffic volumes, comprising the center(s), cells with small traffic volumes are scattered over the cities; see a schematic diagram for the core-periphery structure in Fig. 4. It implies that travel distances between cells in the periphery tend to be more diverse than those between cells in the center, thus possibly weakening the distance dependence of traffic volumes between cells in the periphery. Indeed, as shown in Fig. 2, the distance exponent \(\gamma _{kk'}\) tends to have smaller values (e.g., ≲0.2) for smaller k and \(k'\), while \(\gamma _{kk'}\) has an overall higher value when either k or \(k'\) gets larger.

Figure 4
figure 4

Schematic diagram for multiple gravity laws in a centralized urban landscape. The dark shaded circle at the center and the bright shaded ring at the periphery denote groups of \(k=1\) and \(k=10\), respectively. Distance exponents \(\gamma _{10,10}\), \(\gamma _{1,1}\), and \(\gamma _{1,10}\) characterize the distance dependence of traffic volumes between or within groups. Arrows visualize some possible trips between or within groups

Just because the possible travel distances are diverse does not necessarily mean that the real travel distances are diverse. To examine this issue, we calculate the standard deviation of travel distances between cells in two groups of k and \(k'\) which is defined as

$$ \sigma _{kk'} \equiv \biggl[ \frac{\sum_{i\in G_{k}, j\in G_{k'}}T_{ij} (r_{ij}-\langle r\rangle _{kk'} )^{2}}{\sum_{i\in G_{k}, j\in G_{k'}}{T_{ij}}} \biggr]^{1/2}, $$
(6)

where \(\langle r\rangle _{kk'}\) is an average travel distance between cells in two groups of k and \(k'\):

$$ \langle r\rangle _{kk'} \equiv \frac{\sum_{i\in G_{k}, j\in G_{k'}}{T_{ij}r_{ij}}}{\sum_{i\in G_{k}, j\in G_{k'}}{T_{ij}}}. $$
(7)

As shown in Fig. 5, \(\sigma _{kk'}\) tends to have higher values for smaller k and/or \(k'\) in all cities but Miami. This implies that the diversity of the travel distances between groups seems to be anti-correlated with the distance exponent values, as expected. As for the average travel distance \(\langle r\rangle _{kk'}\), we find in Fig. 6 that the values of \(\langle r\rangle _{kk'}\) for small k and \(k'\) are overall comparable to those for large k and \(k'\), both of which are shorter than those between groups of small traffic volumes and large traffic volumes. This tendency is denoted by long and short arrows in Fig. 4. Considering the fact that the dataset analyzed is for commuting between homes and workplaces, people living in the periphery do not travel so far from their homes, while people going to work to the center from the periphery (or the other way around) need to travel farther than others.

Figure 5
figure 5

Standard deviations of travel distances between groups [Eq. (6)] for the twelve most populated cities in the USA. The color bar for the distance is in km. Note that cities have different ranges of distance in the color bar. Standard deviations of travel distances between groups of relatively small traffic volumes are overall larger than those of relatively large traffic volumes

Figure 6
figure 6

Average travel distances between groups [Eq. (7)] for the twelve most populated cities in the USA. The color bar for the distance is in km. Note that cities have different ranges of distance in the color bar. Average travel distances between groups tend to be the largest between groups with the largest traffic volumes and the smallest traffic volumes than other pairs

Next, we argue the effect of the travel cost for understanding the observed exponent matrices. The larger value of the distance exponent implies the stronger effect of the distance on the traffic volume, e.g., due to the higher travel cost per distance traveled. Here the travel cost can be measured in terms of elapsed time or transportation cost. In this sense, the larger values of the distance exponent within central areas or between central and peripheral areas could be due to the higher travel cost per distance. On the other hand, the smaller values of the distance exponent between peripheral areas could be partly due to the lower travel cost per distance. These arguments might be the case considering various factors in the central areas such as congestion, traffic signs, and speed limits that tend to increase the travel cost [51]. On the other hand, travels between peripheral areas tend to suffer from such factors less often, e.g., by taking highways. Yet our explanation is speculative at most, calling for more detailed empirical analyses in the future.

4 Discussion

In summary, we have devised the data analysis framework for urban mobility patterns and applied it to the dataset of the twelve most populated cities in the USA. We have found that the intra-city mobility patterns can be successfully characterized by multiple gravity laws, which means that the distance exponent value depends on the traffic volumes of the origin and destination regions within the same city. These findings are in contrast to the conventional gravity model characterized by a single distance exponent for a given dataset or area of interest. The common pattern in the distance matrices of different cities is observed, implying some common mechanisms behind such observations.

The dataset we have analyzed has some limits. First, it includes only trips for commuting, but not other kinds of mobility such as shopping or tourism. Second, the dataset does not provide detailed information on the travel trajectory and cost, hampering further analysis to study the mechanisms for the multiple gravity laws.

In particular, information on travel trajectories within city centers and peripheral areas as well as between centers and peripheral areas must be relevant to understanding microscopic mechanisms behind the observed common pattern in exponent matrices. For example, with such information, one could study in more detail the impact of traffic conditions along trajectories between regions on their traffic volumes. Also, different geographical features, such as mountains and lakes, among cities might affect the travel trajectories in such cities, hence help us better understand the variation in our empirical findings. This approach will be complementary to the typical framework of gravity models only considering Euclidean distances between regions in the city.

Our findings suggest that the variation of distance exponent values can be used as an indicator to measure the appropriate dispersion of travel costs, as city centers with high traffic volumes tend to have large values of the exponent. For example, the difference in the exponent values before and after introducing new public transportation, e.g., subway or high-speed train, may be used to infer whether the new transportation has improved or redistributed the travel costs across the city.

Finally, we discuss possible future works. To investigate the mechanisms for the multiple gravity laws within the cities, one can study mathematical models considering the heterogeneous core-periphery structure of urban population and/or different travel costs depending on the mode of travel, etc. Based on the understanding of the mechanisms, one is expected to enhance the prediction and optimization of the mobility pattern within the city.