Main

Constructing a science of cities has become a crucial task for our societies, which are growing ever more concentrated in urban systems. Better planning could be achieved with a better understanding of city growth and how it affects society and the environment1,2. Various important aspects of cities, such as urban sprawl, infrastructure development or transport planning, depend on the population evolution over time, and multiple theoretical attempts have been made to understand this crucial phenomenon.

Growth of cities and Zipf’s law

So far, most research on city growth has assumed that the stationary state of a set of cities is described by Zipf’s law. This law is considered to be a cornerstone of urban economics and geography3, and states that the population distribution of urban areas in a given territory (or country) displays a Pareto law whose probability density decays with exponent 2 (that is, \(P(S)\propto {S}^{-2}\)); equivalently, the city populations sorted in decreasing order versus their ranks follow a power law with exponent 1. This alleged regularity through time and space is probably the most striking fact in the science of cities and has for more than a century triggered intense debate and many studies1,2,5,10,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28. This result characterizes the hierarchical organization of cities and, in particular, quantifies the statistical occurrence of large cities. Zipf’s law implies that in any country the largest city is typically twice as large as the second largest, three times as large as the third largest, and so on. It is a signature of the very large heterogeneity of city sizes: cities are not governed by optimal considerations that would lead to one unique size but, on the contrary, city sizes are broadly distributed and follow some sort of hierarchy16. The empirical value of the Pareto exponent informs us about the hierarchical degree of a system of cities: a large exponent corresponds to a population more equally distributed among cities, whereas for small exponent values the corresponding system of cities is very heterogeneous, with a few megacities.
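The equivalence between the density and rank-size formulations can be checked numerically. The Python sketch below (assuming NumPy; the function names and sample sizes are our own illustrative choices) draws city sizes from a Pareto law with survival function \(P(S>s)={s}^{-\alpha }\) for s ≥ 1, so that the density decays as \({S}^{-(1+\alpha )}\), and fits the slope of log size against log rank. The slope should be close to −1/α, so a rank-size exponent of 1 corresponds to α = 1, that is, a density exponent of 2.

```python
import numpy as np


def pareto_sample(alpha: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n sizes with survival function P(S > s) = s**(-alpha), s >= 1 (inverse transform)."""
    u = 1.0 - rng.random(n)  # uniform on (0, 1], avoids division by zero
    return u ** (-1.0 / alpha)


def rank_size_slope(sizes: np.ndarray) -> float:
    """Least-squares slope of log(size) versus log(rank), with rank 1 for the largest city."""
    ordered = np.sort(sizes)[::-1]
    ranks = np.arange(1, ordered.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(ordered), 1)
    return float(slope)


rng = np.random.default_rng(0)
for alpha in (1.0, 1.4):
    slope = rank_size_slope(pareto_sample(alpha, 100_000, rng))
    print(f"alpha={alpha}: rank-size slope {slope:.3f} (expected about {-1 / alpha:.3f})")
```

A larger α gives a flatter rank-size curve, which is the "more equally distributed population" case described above.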

Studies in economics have suggested that Zipf’s law is the result of economic shocks and random growth processes6,7,8. In a seminal paper, Gabaix10 proved that Gibrat’s law of random growth9, which assumes a population growth rate independent of the size of the city, can lead to a Zipf law with exponent 1, at the expense of the additional and untested assumption that cities cannot become too small. This model remains the most accepted paradigm for understanding city growth. Since then, simplified theoretical models (without empirical support) have suggested that migrations from other cities or countries are decisive in explaining random growth29. However, although most of these theoretical approaches focus on explaining Zipf’s law with exponent 1, recent empirical studies3,4, supported by an increasing number of data sources, have questioned the existence of such a universal power law and have shown that Zipf’s exponent can vary around 1 depending on the country, the time period, the definition of cities used or the fitting method13,21,30,31 (we illustrate this in Extended Data Fig. 1, which shows that no universal population distribution is observed). This has led to the idea that there is no reason to expect Zipf’s law to hold in all cases32.
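Gabaix’s mechanism can be illustrated with a toy simulation. The Python sketch below is not Gabaix’s original derivation but a discrete-time version under our own assumptions (NumPy, parameter values chosen for illustration). It tracks x = ln(S/S_min) for many independent cities: each step multiplies S by a log-normal factor whose mean is exactly 1 (Gibrat’s law with zero average growth of the normalized size), and sizes that fall below S_min are reset to S_min. For a Gaussian random walk with drift −σ²/2, the stationary tail of x is exponential with rate 1, which corresponds to a Pareto tail of S with exponent 1, i.e. Zipf’s law.

```python
import numpy as np


def simulate_gibrat_with_barrier(n_cities: int, steps: int, sigma: float,
                                 rng: np.random.Generator) -> np.ndarray:
    """Evolve x = ln(S / S_min) under multiplicative (Gibrat) growth with a lower barrier.

    Each step multiplies S by exp(-sigma**2 / 2 + sigma * eps), eps ~ N(0, 1), so the
    expected growth factor is exactly 1; sizes below S_min are set back to S_min (x = 0).
    """
    x = np.zeros(n_cities)  # all cities start at the barrier
    drift = -0.5 * sigma ** 2
    for _ in range(steps):
        x = np.maximum(x + drift + sigma * rng.standard_normal(n_cities), 0.0)
    return x


def tail_exponent(x: np.ndarray, x0: float) -> float:
    """Maximum-likelihood rate of the exponential tail of x above x0.

    If x = ln(S / S_min) has an exponential tail with rate zeta, S has a Pareto tail
    with exponent zeta; measuring only above x0 avoids the region near the barrier.
    """
    excess = x[x > x0] - x0
    return float(1.0 / excess.mean())


rng = np.random.default_rng(42)
x = simulate_gibrat_with_barrier(n_cities=10_000, steps=4_000, sigma=0.1, rng=rng)
print(f"estimated Pareto exponent: {tail_exponent(x, x0=1.0):.2f}")
```

Dropping the `np.maximum` call removes the barrier, and the log sizes then spread out indefinitely as a widening Gaussian, which is the log-normal behaviour of plain Gibrat growth.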

Beyond understanding the stationary distribution of urban populations lies the problem of their temporal evolution. As already noted5, the huge number of studies on population distribution contrasts with the few analyses of the time evolution of cities. As discussed in that same work5, cities and civilizations rise and fall many times over a wide range of time scales, and Gabaix’s model is both quantitatively and qualitatively unable to explain these chaotic dynamics.

Therefore, a model able to explain simultaneously the observed stationary population distribution and the temporal dynamics of systems of cities is still missing. In particular, we are currently unable to identify the causes of the diversity of empirical observations about the hierarchical organization of cities, the occurrence of megacities and the empirical instability of city dynamics, seen in the births and deaths of large cities on short time scales. In this respect, we need not just a quantitative improvement of models but a paradigm shift.

In this paper, we show that city growth is dominated by rare events, namely large interurban migratory shocks, rather than by the average growth rate. Rare but large positive or negative migratory flows can destabilize the hierarchy and the dynamics of a city on very short time scales, leading to the disordered dynamics of cities observed throughout history. On the basis of an empirical analysis of migration flows in four countries, we derive below a stochastic equation of city growth that explains empirical observations on both the statistics and the temporal dynamics of cities.

Deriving the equation of city growth

To understand city growth, we require a robust, bottom-up approach, starting from the elementary mechanisms governing the evolution of cities. Without loss of generality, the growth dynamics of a system (such as a country) of cities i of size Si can be decomposed into the sum of an interurban migration term between metropolitan areas and an ‘out-of-system’ term that combines the other sources of growth: natural growth (births and deaths) and migrations that do not occur within the system of cities (international migrations and exchanges with smaller towns). We denote by N(i) the set of neighbours of city i, that is, the cities that exchange a non-zero number of inhabitants with it. Using the four recent migration datasets analysed here (USA, 2012–2017; France, 2003–2008; England and Wales (for simplicity, UK), 2012–2016; Canada, 2012–2016), we find for France and the USA that |N(i)| ∝ \({S}_{i}^{\gamma }\), where γ ≈ 0.5 (Extended Data Fig. 2). The British and Canadian datasets are fully connected, leading to γ = 0. The time (t) evolution of the population size Si can then be written as

$$\frac{\partial {S}_{i}}{\partial t}={\eta }_{i}{S}_{i}+\sum _{j\in N(i)}\left({J}_{j\to i}-{J}_{i\to j}\right),$$
(1)

where the quantity ηi is a random variable accounting for the ‘out-of-system’ growth of city i; the data show that ηi is Gaussian-distributed (Extended Data Fig. 3). The flow \({J}_{i\to j}\) is the number of individuals moving from city i to city j during a period of time dt. If migration flows are exactly balanced (\({J}_{i\to j}={J}_{j\to i}\)), the migration term vanishes and the equation becomes equivalent to Gibrat’s model9, which predicts a log-normal distribution of populations.
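A minimal discrete-time reading of equation (1) makes the bookkeeping explicit. In the NumPy sketch below (the flow-matrix layout and the function name are our own illustrative choices), inflows are column sums and outflows are row sums of a flow matrix, so the migration term only redistributes people between cities; with exactly symmetric flows it vanishes for every city and only the multiplicative Gibrat term remains.

```python
import numpy as np


def step_equation_1(S: np.ndarray, growth: np.ndarray, J: np.ndarray) -> np.ndarray:
    """Advance equation (1) by one period dt.

    S[i]      : population of city i at the start of the period
    growth[i] : out-of-system growth over the period, eta_i * dt
    J[i, j]   : number of people moving from city i to city j during the period
                (zero on the diagonal and for pairs that are not neighbours)
    """
    inflow = J.sum(axis=0)   # entry i: sum over j of J[j -> i]
    outflow = J.sum(axis=1)  # entry i: sum over j of J[i -> j]
    return S + growth * S + inflow - outflow


S = np.array([1000.0, 500.0, 200.0])
J = np.array([[0.0, 30.0, 5.0],
              [10.0, 0.0, 3.0],
              [1.0, 8.0, 0.0]])
S_next = step_equation_1(S, np.zeros(3), J)
# Net migration is -24, +25 and -1 for the three cities, so the total stays 1700.
print(S_next, S_next.sum())
```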

Starting from this general equation (1) is very natural, as it amounts to writing the balance of births, deaths and migrations; however, as is often the case with very general, basic equations, it is difficult to use for making predictions. Simplifications of this equation have been proposed29, in which various assumptions (such as the gravity model for migration) lead to Gibrat’s model but miss the very large fluctuations of migrations, which, as we will see below, are a crucial ingredient. We also note that this general stochastic equation (1) was discussed in another context33 and is a central object in the statistical physics of disordered systems. For cities, the migration flow \({J}_{i\to j}\) depends a priori (and at least) on the populations Si and Sj and on the distance dij between cities i and j. Using a standard gravitational model34,35, we show that for France and the USA the dominant contribution to \({J}_{i\to j}\) comes from the populations, and that distance appears only as a second-order effect (see Supplementary Information for details). This result suggests that \({J}_{i\to j}\) can be represented by a variable of the form \({I}_{0}{S}_{i}^{\mu }{S}_{j}^{\nu }{x}_{ij}\), where the random variables xij have an average equal to 1 and encode the noise as well as multiple other effects, including distance. We denote by \({I}_{ji}={J}_{i\to j}/{S}_{i}\) the probability per unit time and per capita of moving from city i to city j. The left panel of Fig. 1 shows that the ratio \({I}_{ij}/{I}_{ji}\) versus the ratio of populations Si/Sj displays, on average, linear behaviour. With the form above, \({I}_{ij}/{I}_{ji}\propto {({S}_{i}/{S}_{j})}^{1+\nu -\mu }\), so this linearity implies that μ = ν and that we have, on average, a sort of detailed balance \(\langle {J}_{i\to j}\rangle =\langle {J}_{j\to i}\rangle \) (where the angled brackets denote the average over cities), but that, crucially, fluctuations are non-zero.
More precisely, if we define \({X}_{ij}=({J}_{j\to i}-{J}_{i\to j})/({I}_{0}{S}_{i}^{\nu })\), we observe that these random variables Xij are heavy-tailed: they are distributed according to a broad law that decreases asymptotically as a power law with exponent α < 2 (see Supplementary Information for more details and empirical evidence). The sum in the second term of the right-hand side of equation (1) can then be rewritten as

$$\sum _{j\in N(i)}\left({J}_{j\to i}-{J}_{i\to j}\right)={I}_{0}{S}_{i}^{\nu }\sum _{j\in N(i)}{X}_{ij},$$
(2)

and, according to the generalized version of the central limit theorem36 (assuming that correlations between the variables Xij are negligible), the random variable

$${\zeta }_{i}=\frac{1}{{|N(i)|}^{1/\alpha }}\sum _{j\in N(i)}{X}_{ij}$$

follows a Lévy stable law Lα with parameter α (for large enough |N(i)|). This is empirically confirmed in Fig. 1 (right panel): French, US, British and Canadian data are better fitted by a Lévy stable law than by any other distribution, and the estimates of α (using different methods) are given in Table 1. We conclude that the growth of systems of cities is governed by a stochastic differential equation with two independent noises, which reads

$$\frac{\partial {S}_{i}}{\partial t}={\eta }_{i}{S}_{i}+D{S}_{i}^{\beta }{{\zeta }}_{i},$$
(3)

where D ∝ I0, β = ν + γ/α (since |N(i)| ∝ \({S}_{i}^{\gamma }\), the migration term in equation (2) equals \({I}_{0}{S}_{i}^{\nu }{|N(i)|}^{1/\alpha }{\zeta }_{i}\propto {I}_{0}{S}_{i}^{\nu +\gamma /\alpha }{\zeta }_{i}\)) and ηi is a Gaussian noise with mean r (the average growth rate) and standard deviation σ. This growth equation governs the dynamics of large urban populations and is our main result. In equation (3) both noises are uncorrelated and multiplicative, and Itô’s convention seems more appropriate here than Stratonovich’s37 because population sizes at time t are computed independently of the interurban migration terms at time t + dt. Estimates of the various parameters, together with the predicted value of β, are given in Table 2.
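Equation (3) can be integrated numerically with an Euler–Maruyama scheme in Itô form, evaluating both noise coefficients at the start of each step. The Python sketch below is illustrative only: it assumes NumPy, draws the symmetric Lévy noise with the Chambers–Mallows–Stuck method (standard stable law with characteristic function \(\exp (-|k{|}^{\alpha })\)), scales the Lévy increment over dt by \(d{t}^{1/\alpha }\), and, as an extra assumption not made in the text, clamps populations at a floor `s_min`, because large negative Lévy jumps could otherwise drive sizes below zero. All parameter values in the usage lines are hypothetical, not the estimates of Table 2.

```python
import numpy as np


def symmetric_stable(alpha: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Standard symmetric alpha-stable samples (characteristic function exp(-|k|**alpha)),
    drawn with the Chambers-Mallows-Stuck method."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def simulate_eq3(S0, r, sigma, D, alpha, beta, dt, steps, rng, s_min=1.0):
    """Euler-Maruyama (Ito) integration of dS/dt = eta*S + D*S**beta*zeta for all cities.

    eta has mean r and standard deviation sigma per unit time; the Levy increment over
    dt scales as dt**(1/alpha). Clamping at s_min is an assumption of this sketch.
    """
    S = np.array(S0, dtype=float)
    for _ in range(steps):
        growth = r * dt + sigma * np.sqrt(dt) * rng.standard_normal(S.size)
        levy = D * S ** beta * dt ** (1.0 / alpha) * symmetric_stable(alpha, S.size, rng)
        # Both terms use S at the start of the step (Ito convention).
        S = np.maximum(S + growth * S + levy, s_min)
    return S


rng = np.random.default_rng(1)
S = simulate_eq3(np.full(500, 1e4), r=0.01, sigma=0.02, D=0.2, alpha=1.4, beta=0.8,
                 dt=1.0, steps=100, rng=rng)  # hypothetical parameters
print(f"largest/median size after 100 steps: {S.max() / np.median(S):.1f}")
```

Setting D = 0 recovers a Gaussian multiplicative (Gibrat-type) process, which makes it easy to compare the two regimes under the same seed.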

Fig. 1: Migration flow analysis.

a–d, Analysis for France (a), the USA (b), the UK (c) and Canada (d). Left, migration-rate ratio versus the ratio of populations. The straight line is a power-law fit that gives an exponent equal to one. Right, empirical right-cumulative distribution function of the renormalized migration flows ζi compared with Lévy (continuous red lines) and normal distributions (green dashed lines). See Extended Data Fig. 4 for the left-cumulative distribution function.
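The step from the linear behaviour in the left panels to μ = ν can be checked on synthetic data. With \({J}_{i\to j}={I}_{0}{S}_{i}^{\mu }{S}_{j}^{\nu }{x}_{ij}\) and the convention \({I}_{ji}={J}_{i\to j}/{S}_{i}\), one finds \({I}_{ij}/{I}_{ji}={({S}_{i}/{S}_{j})}^{1+\nu -\mu }{x}_{ji}/{x}_{ij}\), so a log-log slope of 1 corresponds to μ = ν. The NumPy sketch below generates flows from this form with hypothetical parameters and mean-one log-normal noise (an assumption made for simplicity; the empirical xij need not be log-normal) and recovers the slope by least squares.

```python
import numpy as np


def synthetic_flows(S, mu, nu, I0, noise, rng):
    """Hypothetical flows J[i, j] = I0 * S_i**mu * S_j**nu * x_ij with mean-one log-normal x_ij."""
    n = S.size
    x = np.exp(noise * rng.standard_normal((n, n)) - 0.5 * noise ** 2)  # E[x] = 1
    J = I0 * np.outer(S ** mu, S ** nu) * x
    np.fill_diagonal(J, 0.0)  # no flow from a city to itself
    return J


def rate_ratio_exponent(S, J):
    """Log-log slope of I_ij / I_ji against S_i / S_j, with I_ji = J[i, j] / S[i]."""
    i, j = np.nonzero((J > 0) & (J.T > 0))
    keep = i < j                 # use each unordered pair once
    i, j = i[keep], j[keep]
    I_ij = J[j, i] / S[j]        # per-capita rate of moving from j to i
    I_ji = J[i, j] / S[i]        # per-capita rate of moving from i to j
    slope, _ = np.polyfit(np.log(S[i] / S[j]), np.log(I_ij / I_ji), 1)
    return float(slope)


rng = np.random.default_rng(0)
S = 10 ** rng.uniform(4, 7, 200)  # hypothetical populations between 1e4 and 1e7
for mu, nu in [(0.7, 0.7), (0.9, 0.6)]:
    J = synthetic_flows(S, mu, nu, I0=1e-6, noise=0.5, rng=rng)
    print(mu, nu, round(rate_ratio_exponent(S, J), 3), "expected", round(1 + nu - mu, 3))
```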

Table 1 Estimates of parameter α
Table 2 Estimates of parameters for the four datasets
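Table 1 reports α estimated with several methods. As one common example of a tail-exponent estimator (not necessarily among those used for Table 1), the Hill estimator uses the k largest absolute values of a sample. The NumPy sketch below applies it to synthetic symmetric Pareto variables with a known exponent, a hypothetical stand-in for the Xij; the function names and the values of k are illustrative.

```python
import numpy as np


def symmetric_pareto(alpha: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Symmetric heavy-tailed samples with P(|X| > x) = x**(-alpha) for x >= 1."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * (1.0 - rng.random(n)) ** (-1.0 / alpha)


def hill_estimator(samples: np.ndarray, k: int) -> float:
    """Hill estimate of the tail exponent from the k largest absolute values."""
    x = np.sort(np.abs(samples))[::-1]   # decreasing order
    log_excess = np.log(x[:k] / x[k])    # log-spacings above the (k+1)-th largest value
    return float(1.0 / log_excess.mean())


rng = np.random.default_rng(0)
X = symmetric_pareto(1.4, 100_000, rng)
for k in (500, 2_000, 10_000):
    print(k, round(hill_estimator(X, k), 3))
```

For real data, whose tails are only asymptotically power laws, the estimate is usually inspected over a range of k rather than at a single value.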

The generalized central limit theorem, together with the broad distribution of interurban migration flows, shows that many details of equation (1) are unnecessary and that the dynamics can be described by the more universal equation (3). Starting from equation (1) is thus less useful than previously thought. The importance of migrations has been previously noted29, but in that work the authors derived a stochastic differential equation with multiplicative Gaussian noise, which we show here to be incorrect: we indeed have a first term with multiplicative noise but also, crucially, another term that is a multiplicative Lévy noise with zero average. This major theoretical shift is absent from previous studies of urban growth and has many crucial implications for understanding both the stationary and dynamic properties of cities.
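The role of the generalized central limit theorem can be illustrated numerically. In the NumPy sketch below (α = 1.4, the sum lengths and the symmetric Pareto input are illustrative assumptions), sums of n independent heavy-tailed variables with tail exponent α < 2 have a typical width, measured by the interquartile range, that grows roughly as \({n}^{1/\alpha }\) rather than the \({n}^{1/2}\) expected for finite-variance noise; this is why ζi is normalized by \({|N(i)|}^{1/\alpha }\).

```python
import numpy as np


def symmetric_pareto(alpha, shape, rng):
    """Symmetric heavy-tailed samples with P(|X| > x) = x**(-alpha) for x >= 1."""
    signs = rng.choice([-1.0, 1.0], size=shape)
    return signs * (1.0 - rng.random(shape)) ** (-1.0 / alpha)


def iqr_of_sums(alpha, n_terms, n_sums, rng):
    """Interquartile range of n_sums independent sums of n_terms heavy-tailed variables."""
    sums = symmetric_pareto(alpha, (n_sums, n_terms), rng).sum(axis=1)
    q25, q75 = np.quantile(sums, [0.25, 0.75])
    return q75 - q25


def width_scaling_exponent(alpha, n_small, n_large, n_sums, rng):
    """Exponent e in IQR(n) ~ n**e, estimated from two sum lengths."""
    ratio = iqr_of_sums(alpha, n_large, n_sums, rng) / iqr_of_sums(alpha, n_small, n_sums, rng)
    return float(np.log(ratio) / np.log(n_large / n_small))


rng = np.random.default_rng(0)
e = width_scaling_exponent(alpha=1.4, n_small=50, n_large=800, n_sums=4_000, rng=rng)
print(f"estimated width exponent {e:.2f}; 1/alpha = {1 / 1.4:.2f}; finite-variance CLT gives 0.50")
```

Finite-size corrections make the estimate approach 1/α only gradually as the number of terms grows.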

No stationary distribution for cities

Equation (3) governs the evolution of urban populations and analysing it at large times gives indications about the stationary distribution of cities. To discuss the analytical properties of equation (3), we assume that Gaussian fluctuations are negligible compared to the Lévy noise and write ηi ≈ r (see Extended Data Fig. 5). The corresponding Fokker–Planck equation (with Itô’s convention) can be solved using the formalism of fractional-order derivatives and Fox functions38,39,40,41, leading to the general distribution at time t that can be expanded in powers of S as (see Supplementary Information for derivation and complete expressions of all terms):

$$P(S,t)=\mathop{\sum }\limits_{k=1}^{\infty }\,{C}_{k}\frac{a{(t)}^{-\alpha \beta -\alpha (1-\beta )k}}{{S}^{1+\alpha \beta +\alpha (1-\beta )k}}$$
(4)

where Ck is a prefactor that depends on α, β and k but not on t or S, and where \(a(t)\propto {\left[\frac{r/{D}^{\alpha }}{{{\rm{e}}}^{r\alpha (1-\beta )t}-1}\right]}^{1/[\alpha (1-\beta )]}\) decreases exponentially at large times. This expansion shows that, at large S, the probability distribution of city sizes is dominated by the order k = 1 and converges towards a Pareto distribution with exponent α, which in general differs from 1. The speed of convergence towards this power law can be estimated with the ratio λ(S, t) of the second to the first term of the expansion in equation (4), which gives:

$$\lambda (S,t)=\frac{{D}^{\alpha }}{r}{\left(\frac{\bar{S}(t)}{S}\right)}^{\alpha (1-\beta )}$$
(5)

where \(\bar{S}(t)\) is the mean city size. If λ(S, t) ≳ 1, the α-exponent regime does not hold in the right tail above the threshold S at time t. Estimates of α and β for the four datasets show that finite-time effects are very important in all cases and that a power-law regime is only reached for unrealistically large city sizes (see discussion in Supplementary Information). Hence, the range of city sizes over which a power-law distribution could be observed may not exist in practice, and there is in general no reason to observe Zipf’s law or any other stationary distribution. We also note that equation (4) implies a scaling form \(P(S,t)={\textstyle \tfrac{1}{S}}F\left({\textstyle \tfrac{S}{\bar{S}(t)}}\right)\) with a country-dependent scaling function F. We confirmed this scaling form for France (the only country for which we had sufficient data); details can be found in Supplementary Information (see also Extended Data Fig. 6).
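For k = 1 the exponent in equation (4), 1 + αβ + α(1 − β), reduces to 1 + α, and each further term adds α(1 − β), so convergence is slow when β is close to 1. Setting λ(S, t) = 1 in equation (5) gives the size above which the k = 1 term dominates, \({S}^{* }=\bar{S}(t){({D}^{\alpha }/r)}^{1/[\alpha (1-\beta )]}\). The plain-Python sketch below evaluates both expressions, assuming β < 1; the parameter values are hypothetical and chosen only to show how quickly S* grows as β approaches 1 when \({D}^{\alpha }/r>1\).

```python
def finite_time_ratio(S, S_mean, D, r, alpha, beta):
    """Equation (5): lambda(S, t) = (D**alpha / r) * (S_mean(t) / S)**(alpha * (1 - beta))."""
    return (D ** alpha / r) * (S_mean / S) ** (alpha * (1.0 - beta))


def power_law_threshold(S_mean, D, r, alpha, beta):
    """Size S* with lambda(S*, t) = 1; for beta < 1, lambda < 1 (Pareto regime) above S*."""
    return S_mean * (D ** alpha / r) ** (1.0 / (alpha * (1.0 - beta)))


# Hypothetical parameter values, for illustration only (not the estimates of Table 2).
S_mean, D, r, alpha = 1e5, 0.5, 0.01, 1.4
for beta in (0.5, 0.7, 0.9):
    print(f"beta={beta}: S* = {power_law_threshold(S_mean, D, r, alpha, beta):.3g}")
```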

In addition, if we perform a power-law fit of the expansion in equation (4), the upper tail of the city-size distribution may be mistaken for a Pareto tail with a spurious exponent that changes with the definition of the upper tail (Extended Data Fig. 7). This might explain the discrepancies reported in the literature on Zipf’s law. As city sizes increase, the apparent exponent changes and can deviate dramatically from 1, as observed in Extended Data Fig. 1. According to our analysis, the apparent exponent should converge towards α, as is indeed observed in, for example, France (α = 1.4) and the USA (α = 1.3).
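To see how a spurious exponent arises, keep only the first two terms of equation (4) and integrate: the survival function becomes \(G(S)=A{S}^{-\alpha }+B{S}^{-\alpha -\alpha (1-\beta )}\), and a power-law fit over a given upper tail measures, approximately, the local log-log slope of G. The NumPy sketch below computes that local slope; the amplitudes A and B stand in for the time-dependent prefactors and are hypothetical (taken positive for simplicity). The apparent exponent depends on where the tail starts and only approaches α at very large S.

```python
import numpy as np


def apparent_exponent(S, alpha, beta, A, B):
    """Local slope -d ln G / d ln S of the two-term survival function
    G(S) = A * S**(-alpha) + B * S**(-(alpha + alpha * (1 - beta))).
    """
    a1 = alpha                          # from the k = 1 term of equation (4)
    a2 = alpha + alpha * (1.0 - beta)   # from the k = 2 term
    t1 = A * S ** (-a1)
    t2 = B * S ** (-a2)
    return (a1 * t1 + a2 * t2) / (t1 + t2)


# Hypothetical amplitudes and exponents, for illustration only.
sizes = np.logspace(3, 12, 4)
print(np.round(apparent_exponent(sizes, alpha=1.4, beta=0.8, A=1.0, B=100.0), 3))
```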

Dynamics: splendour and decline of cities

The validity of our model (equation (3)) can be further tested on the dynamics of systems of cities over long periods of time. This can be done by following the populations and ranks of the system’s cities at different times with the help of ‘rank clocks’, as previously proposed5. That work showed that the micro-dynamics of cities is very turbulent, with many rises and falls of entire cities that cannot result from Gabaix’s model (which is, in essence, Gibrat’s model with a non-zero minimum city size). Figure 2 shows the empirical rank clock for France (from 1876 to 2015) together with the rank clocks obtained with Gabaix’s model and with ours (for the other countries, see Extended Data Fig. 8).

Fig. 2: Rank clocks for France.

We compare the real dynamics of the 500 largest French cities between 1876 and 2015 (left) to Gabaix’s statistical prediction (middle) and to our statistical prediction (right). On the clocks, each line represents a city rank over time where the radius is given by the rank and the angle by time. In this representation, the largest city is at the centre and the smallest at the edge of the disk.
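The rank-clock geometry described in the caption can be sketched as a simple coordinate transform. In the NumPy sketch below, the radius is the rank and the angle encodes time; the clockwise-from-12-o’clock orientation and the mapping of the whole period onto one full turn are our own assumptions, and the ranks in the usage lines are hypothetical.

```python
import numpy as np


def rank_clock_xy(ranks: np.ndarray, years: np.ndarray):
    """Cartesian coordinates of a rank clock.

    ranks[t, i] : rank of city i at date years[t] (1 = largest city)
    The radius is the rank, so the largest city sits near the centre; the angle
    runs clockwise from 12 o'clock and covers one full turn over the period.
    """
    theta = 2.0 * np.pi * (years - years[0]) / (years[-1] - years[0])
    x = ranks * np.sin(theta)[:, None]  # one row per date, one column per city
    y = ranks * np.cos(theta)[:, None]
    return x, y


years = np.array([1876.0, 1946.0, 2015.0])
ranks = np.array([[1, 2, 3], [1, 3, 2], [2, 1, 3]])  # hypothetical ranks of three cities
x, y = rank_clock_xy(ranks, years)
# Each column of (x, y) is one city's trajectory, ready to be drawn as a line.
```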

We see that in Gabaix’s model (middle), the city rank is stable on average and not turbulent: the rank trajectories are concentric and the rank of a city oscillates around its average position. In the real dynamics (left), cities can emerge or die, and very fast changes in rank order can occur, leading to much more turbulent behaviour. In our model (right), the large fluctuations of Lévy noise statistically reproduce such ebbs and flows of cities. More quantitatively, we first compare the average shift per unit time \(d={\sum }_{t}{\sum }_{i=1}^{N}|{r}_{i}(t)-{r}_{i}(t-1)|/(NT)\) over T years and for N cities in the three cases (Table 3) and look at the statistical fluctuations of the rank (see Extended Data Fig. 9): Lévy fluctuations are much better able to reproduce the turbulent dynamics of cities through time. Indeed, the fast births and deaths of cities (due, for example, to wars, discoveries of new resources, incentive settlement policies and so on) are statistically explained by broadly distributed migrations and are incompatible with a Gaussian noise. Second, we compare the predictions of the different models with the empirical data for the time needed to make the largest rank jump (see Extended Data Fig. 10 for France, where a very large jump typically takes of the order of 80 years). We confirm that Gabaix’s model is unable to reproduce these very large fluctuations and that our equation agrees very well with the data.
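The average rank shift d can be computed directly from a table of populations. The NumPy sketch below assumes a fixed set of N cities observed at T + 1 successive dates (the function names are illustrative): it first converts populations to ranks, with rank 1 for the most populous city at each date, and then averages the absolute rank changes over cities and periods.

```python
import numpy as np


def ranks_from_populations(pop: np.ndarray) -> np.ndarray:
    """pop[t, i] -> rank[t, i], with rank 1 for the most populous city at date t."""
    order = np.argsort(-pop, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(pop.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, pop.shape[1] + 1)  # k-th largest city gets rank k
    return ranks


def average_rank_shift(ranks: np.ndarray) -> float:
    """d = sum_t sum_i |r_i(t) - r_i(t-1)| / (N T) for ranks of shape (T + 1, N)."""
    n_periods = ranks.shape[0] - 1
    n_cities = ranks.shape[1]
    return float(np.abs(np.diff(ranks, axis=0)).sum() / (n_cities * n_periods))


pop = np.array([[10.0, 5.0, 1.0],
                [5.0, 10.0, 1.0],
                [1.0, 10.0, 5.0]])  # hypothetical populations: 3 dates, 3 cities
print(average_rank_shift(ranks_from_populations(pop)))
```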

Table 3 Average rank shift per unit time, d

A new paradigm

In this Article, we build a stochastic equation of growth for cities on the basis of microlevel considerations that is empirically sound and that challenges the paradigm of Zipf’s law and current models of urban growth. We show that microscopic details are irrelevant and that the growth equation obtained is universal. A crucial point in this reasoning is that, although we have on average a sort of detailed balance that would lead to a Gaussian multiplicative-growth process, it is the existence of non-universal and broadly distributed fluctuations of the microscopic migration flows between cities that governs the statistics of city populations. The stochastic equation introduced here describes city growth with two sources of noise and predicts an asymptotic power-law regime. However, this stationary regime is not generally reached and finite-time effects cannot be discarded. Our model also statistically reproduces the turbulent micro-dynamics of cities that rapidly rise and fall, in contrast with previous Gaussian-based models of growth5.

In addition, our fundamental result exhibits an interesting connection between the behaviour of complex systems and non-equilibrium statistical physics for which microscopic currents and the violation of detailed balance seem to be the rule rather than the exception11. At a practical level, this result also highlights the critical effect of not only interurban migration flows (an ingredient that is not generally considered in urban-planning theories), but also, more importantly, their large fluctuations—which are ultimately connected to the capacity of a city to attract a large number of new citizens. Our approach, which relies in essence on the population budget description and empirical results, provides a solid ground for future research on the temporal evolution of cities, a central problem in urban science.

Methods

For each of the four countries, we build a graph of migration flows between metropolitan areas from (1) the populations of metropolitan areas and (2) the migration flows between them (described in more detail below).

US migrations

Migration data for the USA are taken from the 2013–2017 American Community Survey (ACS)43. Aggregated metro-area-to-metro-area migration flows and counterflows are given directly for 389 metropolitan statistical areas. More precisely, the ACS asked respondents whether they lived in the same residence one year earlier; for people who lived in a different residence, the location of their previous residence was collected.

French interurban migrations

Migration data for France are taken from the 2008 INSEE report on residential migrations at the town (commune) level for each individual household44. The main residence in 2008 is compared with the main residence in 2003. To work at the urban-area level, we used the 1999 INSEE list of urban areas and aggregated residential migrations at the metropolitan level, enabling us to analyse migration flows between the 500 largest urban areas in France.
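The aggregation step described above (and used in the same way for the UK below) amounts to relabelling each origin and destination with its urban area and summing. The plain-Python sketch below does not assume any particular INSEE or ONS file format: commune codes, area codes and counts are hypothetical, moves touching a commune outside every listed area are treated as out-of-system and skipped, and moves inside a single area are dropped because they are not interurban.

```python
from collections import defaultdict


def aggregate_flows(commune_flows, commune_to_area):
    """Aggregate origin-destination moves between communes into flows between urban areas.

    commune_flows   : iterable of (origin_commune, destination_commune, count)
    commune_to_area : dict mapping a commune code to its urban-area code
    """
    flows = defaultdict(float)
    for origin, destination, count in commune_flows:
        a = commune_to_area.get(origin)
        b = commune_to_area.get(destination)
        if a is None or b is None or a == b:
            continue  # outside the system of cities, or a move within one area
        flows[(a, b)] += count
    return dict(flows)


mapping = {"c1": "A", "c2": "A", "c3": "B"}  # hypothetical commune codes and urban areas
moves = [("c1", "c3", 5), ("c2", "c3", 2), ("c1", "c2", 7), ("c3", "c1", 4), ("c4", "c1", 9)]
print(aggregate_flows(moves, mapping))  # {('A', 'B'): 7.0, ('B', 'A'): 4.0}
```

Restricting the result to the largest areas (for example, the 500 largest in France) is then a simple filter on the area codes.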

UK interurban migrations

Migration data for the UK are taken from the 2012–2016 ONS reports on internal migration between English and Welsh local authorities, which give the square matrix of moves for each year45. To work at the urban-area level, we used the list of local authorities by OECD functional urban areas and aggregated residential migrations at the metropolitan level, enabling us to analyse migration flows between the 41 largest urban areas in England and Wales.

Canadian interurban migrations

Migration data for Canada are taken from 2012–2016 census reports on internal migration between Canadian metropolitan areas46. City-to-city flows between these areas are given for each year between 2012 and 2016 for the 160 largest cities in Canada.