1 Introduction

The problem of the city size distribution (CSD) is part of a more general problem: the size distribution of entities for which a “size” can be defined. There are many examples of such entities across scientific fields: cities (population), countries (area and population), incomes (amount of income), animal species (number of individuals of a particular species), languages (number of people speaking a particular language), words (frequency of a given word in a particular language), mayonnaise (weight of the mayonnaise drops), terrorist attacks (amount of damage caused by a particular attack), etc. For a detailed bibliography we refer the reader to the Internet site www.nlsij-genetics.org/wli/zipf. There are several equivalent mathematical ways to describe a size distribution: the cumulative distribution function, the density (or probability) function, and the rank–size distribution. In the case of cities, the sizes are the populations of the cities.

The first representation, well known from the work of Zipf, is obtained by sorting the cities in decreasing order of size. This yields the function S(R), which is commonly plotted on a double logarithmic scale, i.e., Ln S versus Ln R (the Zipf plot). For a given value of S, R can be read as the number of cities with size larger than or equal to S. This means that the inverse function of S(R), namely R(S), is the cumulative distribution function (in its complementary form, since it counts the cities of size at least S).
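As a minimal illustration of this first representation, the sketch below ranks a few hypothetical populations and checks that, when all sizes are distinct, the rank of each city equals R(S), the number of cities with size at least S. The function names `rank_size` and `cumulative_count` are ours, chosen for illustration only.

```python
def rank_size(populations):
    """Return (rank, size) pairs, rank 1 being the largest city."""
    ordered = sorted(populations, reverse=True)
    return list(enumerate(ordered, start=1))

def cumulative_count(populations, s):
    """R(S): the number of cities whose size is at least s."""
    return sum(1 for p in populations if p >= s)

# Hypothetical populations, for illustration only.
cities = [12_000, 850_000, 47_000, 3_100_000, 230_000]
for rank, size in rank_size(cities):
    # With distinct sizes, the rank of a city equals R evaluated at its own size.
    assert cumulative_count(cities, size) == rank
```

Sorting and counting are two views of the same data: S(R) reads the sorted list by position, while R(S) counts entries above a threshold.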

In order to find how many cities have sizes between a given S and S + dS (dS ≪ S), one can use (−dR/dS) dS. The function D(S) = −dR/dS is the density distribution function. Sometimes the curve R(S) is discontinuous, so that D(S) cannot always be defined, and it is then better to plot the histogram of Ln S. If the function S(R) is a power law, \( S = S_{0}/R^{m} \), the cumulative function and the density function are also power laws: \( R \propto 1/S^{1/m} \) and \( D \propto 1/S^{1 + 1/m} \). These last two functions diverge when S → 0. However, this is only a pseudo divergence, since S does not go to zero but to a minimum size \( S_{\min} \). This case is often called a Pareto distribution, but the terms “Zipf’s law” and “power law” are completely equivalent.
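The power-law relations above can be checked numerically. The sketch below assumes hypothetical values S0 = 10^6 and m = 0.8, inverts S(R) to obtain R(S), approximates D(S) = −dR/dS with a central finite difference, and measures the log–log slopes between two sizes; these should come out close to −1/m = −1.25 and −(1 + 1/m) = −2.25.

```python
import math

S0, m = 1.0e6, 0.8  # hypothetical parameters of S = S0 / R**m

def R_of_S(S):
    """Inverse of S(R) = S0 / R**m, i.e. R = (S0 / S)**(1/m)."""
    return (S0 / S) ** (1.0 / m)

def D_of_S(S, h=1e-4):
    """Density D(S) = -dR/dS by a central finite difference with relative step h."""
    dS = h * S
    return -(R_of_S(S + dS) - R_of_S(S - dS)) / (2.0 * dS)

def loglog_slope(f, s1, s2):
    """Slope of Ln f(S) versus Ln S between the sizes s1 and s2."""
    return (math.log(f(s2)) - math.log(f(s1))) / (math.log(s2) - math.log(s1))

print(loglog_slope(R_of_S, 1e3, 1e5))  # expected to be about -1/m
print(loglog_slope(D_of_S, 1e3, 1e5))  # expected to be about -(1 + 1/m)
```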

For the particular case of systems of cities, it was proposed that the distribution is a power law (Zipf 1941, 1949); consequently, the log–log plots of the three characteristic functions are straight lines: this is Zipf’s law (Footnote 1). Even though the law is not always observed, it has been widely accepted as a good approximation (Rosen and Resnick 1980).

The hidden reason for accepting this law is probably the concept of universality. Universality is one of the inherent characteristics of a law, and it has a strong attractive power. Perhaps scholars have been fascinated by the idea that all the systems of cities in the world obey this very simple law (Soo 2005; Alperovich 1984, 1988). Words like “mystery” (Krugman 1996), “intellectual challenge” (Popescu 2003) and “puzzling regularity” (Gabaix 1999) have been used to express admiration for this phenomenon. The mystery deepens for those who accept that the exponent of the power law is always equal to −1 (i.e., m = 1). Not just an integer, but exactly one!

In the case of the city size distribution, this can be seen as an example of a paradigm, defined by Kuhn (1977) as “…what members of a scientific community, and they alone, share.” A paradigm is usually considered a group of laws shared by a scientific community. In this work, however, we use the term paradigm in an unorthodox way, to refer to a single law. We allow ourselves to do so because this single law is found in many disciplines and is associated with similar phenomena (e.g., the existence of order).

This paradigm, accepted by the community of geographers and economists, claims that in all countries in the world, the relationship between the size and the rank is a power law (i.e., the relationship between the log of the size and the log of the rank is a straight line).

However, some researchers have claimed that Zipf’s law is not always observed (Par 1973, 1976; Alperovich 1984; Laherrere and Sornette 1998; Li 2002; Benguigui and Blumenfeld-Lieberthal 2006, 2007a, b). In fact, a detailed analysis of real CSDs reveals that no unique law fits all existing cases. Even if the power law is considered a good approximation, other formulas give a much better description. Universality is lost. In some cases, a cutoff is introduced in order to recover Zipf’s law. This is, however, an ad hoc procedure, as there is not always a convincing justification for it other than, perhaps, the desire to obtain the power law (see below).

In this work, we present a different approach, based first on the notion that the paradigm should be set aside and that Zipf’s law is not necessarily always valid. Instead, we propose to classify CSDs into universality classes, in analogy with critical phenomena in physics. Based on the shape of the log–log rank–size plot, it is possible to define three universality classes. They are characterized by an exponent α that can be larger than one, equal to one or smaller than one. The case α = 1 recovers Zipf’s law. We analyze the properties of each class and give real examples of these classes.

2 The verification of Zipf’s law

The large number of works devoted to this subject indicates that there are numerous cases where Zipf’s law is valid. As an example of Zipf’s law, we present the case of India (see the results of the analysis in “Appendix 1”). Nevertheless, besides the clear-cut cases, there are other situations worth discussing. The method of verifying the validity of Zipf’s law is relatively simple. It is based on fitting the log–log rank–size curve of a given CSD to the expression:

$$ \ln S = A - m \ln R \tag{1} $$

where m is the Zipf exponent. A good fit of Eq. 1 to the CSD is obtained when Zipf’s law is valid. In fact, there are two completely different approaches to the significance and the verification of Zipf’s law. We call the first the “physicist approach”; this does not mean, however, that all the researchers using this approach are necessarily physicists (Footnote 2). We call the second the “economist approach”, which likewise does not imply that all the researchers using it are economists.
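Fitting Eq. 1 amounts to ordinary least squares on the log–log rank–size points. The minimal pure-Python sketch below assigns ranks by sorting the sizes in decreasing order and returns A, m and the coefficient of determination R²; the function name `fit_zipf` is ours, and the input in the usage line is a hypothetical exact power law.

```python
import math

def fit_zipf(sizes):
    """Least-squares fit of Ln S = A - m Ln R (Eq. 1).

    Returns (A, m, r2), where r2 is the coefficient of determination.
    """
    s = sorted(sizes, reverse=True)
    x = [math.log(r) for r in range(1, len(s) + 1)]  # Ln R
    y = [math.log(v) for v in s]                       # Ln S
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    A = my - slope * mx
    ss_res = sum((b - A - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return A, -slope, 1.0 - ss_res / ss_tot

# Hypothetical exact power law with m = 0.91.
A, m, r2 = fit_zipf([1.0e6 / r ** 0.91 for r in range(1, 301)])
print(round(m, 3), round(r2, 3))  # prints 0.91 1.0
```

On real data the same function returns the m and R² values discussed in the text; as argued there, these numbers alone do not settle whether the system is homogeneous.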

In the first approach, the verification is made separately for each country by fitting the real CSD data to Eq. 1. One can choose statistical criteria (e.g., the coefficient of determination, R²) to decide whether the law is verified. As will be shown next (see further discussion of statistical methods for validating the fit in “Appendix 1”), these criteria are not sufficient. At this stage, we would like to make two remarks. First, in order to decide whether Eq. 1 is verified, one has to fit the data to several functions and compare the results using the same criterion. Naturally, it is not realistic to expect every CSD to be fitted to numerous formulas; we therefore propose to use visual inspection to help decide which formulas might represent the data correctly. This brings us to the second remark: we trust the human mind and believe that visual inspection can indeed give essential information; in particular, it helps decide whether the studied system is homogeneous. Often, a low value of R² can indicate an inhomogeneous system of cities. Yet a high value of R² does not always indicate a homogeneous system of cities. A striking example is the case of Venezuela (Footnote 3). When the CSD of Venezuela is fitted with Eq. 1, the result seems to be an excellent fit: m = 0.91 with R² = 0.997. However, a simple visual inspection (Fig. 1) shows that the system of cities in Venezuela is not homogeneous: it can be divided into two subsystems. This example emphasizes the need for visual inspection of the rank–size relation of the real data on log–log scales. It makes it possible to see (in the simple meaning of the word: to see with the eye) whether the points can be fitted with some mathematical function (not necessarily a straight line). As mentioned above, even if the fit with Eq. 1 is not good, it is widely accepted that Zipf’s law is a good and useful approximation (Rosen and Resnick 1980).
This raises a question (which could be considered naïve): why settle for an approximation when a better description can be achieved with another function? In “Appendix 1” we present the case of the metropolitan areas in the USA and show that a non-linear function yields a much better fit than a straight line.
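The Venezuela example, a high R² hiding two subsystems, can be reproduced on synthetic data. The sketch below builds two parallel power laws with a drop of 0.5 in Ln S after rank 50 (hypothetical values, not Venezuelan data), fits one straight line, and then looks for the largest jump between consecutive residuals. That jump heuristic is our assumption, offered only as a numerical aid to the visual inspection the text recommends.

```python
import math

def ols(x, y):
    """Ordinary least squares y = a + b x; returns (a, b, residuals, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    res = [v - a - b * u for u, v in zip(x, y)]
    r2 = 1.0 - sum(e * e for e in res) / sum((v - my) ** 2 for v in y)
    return a, b, res, r2

# Synthetic two-subsystem data: Ln S drops by 0.5 after rank 50.
ranks = range(1, 201)
x = [math.log(r) for r in ranks]
y = [12.0 - 0.9 * u - (0.5 if r > 50 else 0.0) for r, u in zip(ranks, x)]

a, b, res, r2 = ols(x, y)
# Heuristic: the largest jump between consecutive residuals flags a possible break.
jumps = [abs(res[i + 1] - res[i]) for i in range(len(res) - 1)]
k = max(range(len(jumps)), key=jumps.__getitem__)
print(f"R^2 = {r2:.3f}; largest residual jump between ranks {k + 1} and {k + 2}")
```

The straight line still scores a high R², while the residuals expose the discontinuity that the eye would see on the plot.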

Fig. 1

Log–log rank–size curve of Venezuela. Note the division of the points into two groups, indicating that the cities can be divided into two subsystems

Before examining the second approach, we mention the work of Alperovich (1988, 1995), who used specific methods to check the validity of Zipf’s law. In the second approach, Zipf’s law is considered a statistical law, and the populations of cities are treated as random variables. Consequently, each country or system of cities is a single realization of the underlying distribution. This is a well-known problem in statistics, which we describe next. We start with a given law for a random variable and try to estimate the parameter of this law from a sample containing a finite number of values of the variable. In the present case, the null hypothesis can be expressed as follows: given that the cities of a particular country obey the Pareto distribution, should the value 1 for the exponent m be accepted or rejected (Soo 2005; Urzua 2000; Gabaix et al. 2003; Gabaix and Ioannides 2004; Nishiyama et al. 2008)? Zipf’s exponent is determined for each country, and a statistical test (such as the t-test, based on the t-distribution) is used to decide whether Zipf’s law with m = 1 is acceptable. The conclusion of these works is that in the majority of countries the hypothesis is accepted, despite the fact that the measured exponent m differs from unity. This approach is highly problematic, as it strongly contradicts intuition. It is difficult to accept that the law is not rejected when m is considerably different from one, or when the coefficient R² that measures the quality of the fit is small (say, smaller than 0.97). However, the most questionable aspect of these works is that there is no serious discussion of what to do with the cases of rejection: is the exponent m different from one, or is Zipf’s law not obeyed at all? Finally, sentences such as “the results we obtain depend …whether we believe that Zipf’s law holds” (Soo 2005) are also difficult to accept.
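To make the mechanics of the second approach concrete, the sketch below tests the null hypothesis m = 1 using the OLS slope of Eq. 1, the textbook OLS standard error, and a large-sample normal approximation for the two-sided p-value. This is one simple variant chosen for illustration; the cited works use a variety of estimators and corrections, and this sketch reproduces none of them. The function name `zipf_t_test` is ours.

```python
import math

def zipf_t_test(sizes, m0=1.0):
    """Test H0: m = m0 in Ln S = A - m Ln R using the OLS slope.

    Returns (m, se, t, p); p is two-sided, from a normal approximation.
    """
    s = sorted(sizes, reverse=True)
    x = [math.log(r) for r in range(1, len(s) + 1)]
    y = [math.log(v) for v in s]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    sigma2 = sum((v - a - b * u) ** 2 for u, v in zip(x, y)) / (n - 2)
    se = math.sqrt(sigma2 / sxx)        # OLS standard error of the slope
    m = -b
    t = (m - m0) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return m, se, t, p

# Hypothetical data: an exact m = 1.3 power law with a tiny alternating perturbation.
sizes = [1.0e6 * r ** -1.3 * math.exp(0.002 * (-1) ** r) for r in range(1, 201)]
print(zipf_t_test(sizes))
```

Note that the outcome depends entirely on the ratio between the deviation of m from one and the standard error, not on how far m is from one in absolute terms, which is one way to see why the text finds the conclusions of this approach counterintuitive.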

To summarize this section, we mention three important problems that have yet to receive a satisfactory answer. The first is the problem of the cutoff. It has often been mentioned, and several solutions have been proposed (Par 1985): e.g., to cut off at a fixed number of cities (for example, the first 50 cities), at cities above a given size (say 100,000 or 10,000), or at cities whose total population is a given fraction of the total population of the country. These choices, however, are somewhat arbitrary. We propose to begin with the idea (not often formulated, but implicit in several works) that Zipf’s law may be obeyed only by a homogeneous group of cities, i.e., cities that grew in the same way under the same social, economic and cultural environment. This can be verified by visually examining the rank–size curve. If this curve presents discontinuities or breaks in the derivative (slope), the group of cities cannot be treated as homogeneous. This makes it possible to choose a clear cutoff. Figure 2a presents the case of the agglomerations in India (2001) as an example in which there is a break in the derivative for agglomerations with populations smaller than 200,000. Thus, the group of agglomerations larger than this value forms a homogeneous system.
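A break in slope such as the one in Fig. 2a can also be located numerically, as a complement to (not a replacement for) visual inspection. The sketch below, an assumption of ours rather than the method used in the text, tries every split rank k, fits separate straight lines to ranks 1..k and k+1..n on the log–log plot, and keeps the k with the smallest total squared error. The minimum segment length and the synthetic data (slope −0.8 up to rank 100, then −2) are hypothetical.

```python
import math

def sse_line(x, y):
    """Sum of squared residuals of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    return sum((v - a - b * u) ** 2 for u, v in zip(x, y))

def best_break(sizes, min_seg=3):
    """Rank k minimizing the total SSE of two straight-line fits
    (ranks 1..k and k+1..n) to the log-log rank-size curve."""
    s = sorted(sizes, reverse=True)
    x = [math.log(r) for r in range(1, len(s) + 1)]
    y = [math.log(v) for v in s]
    candidates = range(min_seg, len(s) - min_seg + 1)
    return min(candidates,
               key=lambda k: sse_line(x[:k], y[:k]) + sse_line(x[k:], y[k:]))

# Synthetic curve: slope -0.8 before Ln R = ln(100.5), slope -2 after it.
x0 = math.log(100.5)
def ln_size(r):
    x = math.log(r)
    return 13.0 - 0.8 * x if x <= x0 else 13.0 - 0.8 * x0 - 2.0 * (x - x0)

sizes = [math.exp(ln_size(r)) for r in range(1, 301)]
print(best_break(sizes))  # rank of the last city before the change of slope
```

The cities up to the returned rank would then form the candidate homogeneous group, to be confirmed on the plot.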

Fig. 2

a Rank–size plot of the agglomerations of India. There is a clear break in the slope for agglomerations of approximately 200,000 inhabitants. The straight line is the result of the fit without the agglomerations smaller than 200,000 (see “Appendix 1”). b Rank–size plot of Romania. The shape of this graph precludes fitting it with a continuous function

The second problem is also related to the definition of a homogeneous group of cities. Most of the work on CSD is based on the assumption that the cities belonging to the same country constitute a group worth studying. This assumption ignores the fact that, over the course of history, the borders of countries have occasionally changed. A system of cities in a present-day country might have developed in different periods under different regimes. Only a historical study can reveal such phenomena. Recently, we showed (Benguigui and Blumenfeld-Lieberthal 2006) that Romania is an example of this phenomenon. During its history, Romania was divided into three regions, each under a different regime: the Austrian Empire, the Russian Empire and the Ottoman Empire. These empires were very different; thus, it is not surprising that the rank–size curve of Romania is a broken line that no continuous function can fit (see Fig. 2b).

The third problem concerns the largest cities in the distribution. In some countries, the largest city cannot be fitted by the same homogeneous distribution to which the rest of the cities belong. This phenomenon is known as the “primate city” (Jefferson 1939). Yet we often find cases where several of the largest cities (two or more) form a particular group. This phenomenon appears even in cases where Zipf’s law is obtained. The fit may be good because the quality criterion is based on the entire distribution, and the largest cities are only a small fraction of the total number of cities. Here, too, visual inspection is needed. Fig. 3 gives the rank–size plot of the cities of the Netherlands to demonstrate this phenomenon: the fit to a straight line is very good, yet the four largest cities are not well aligned with the rest of the cities. This leads us to another question: should the study of CSD be refined, and in some cases, should the system of cities be divided into more than one group?

Fig. 3

Rank–size plot of the cities of the Netherlands. The first four cities seem to form a distinct group. A fit with a straight line (not shown) gives a good result, but a visual inspection shows the particularity of this distribution

3 The classification

The general conclusion that one can draw from all the work done on Zipf’s law is that it is not always obeyed. The problem of the CSD should be examined more generally. When considering a homogeneous group of cities, one should look at all the different ways to describe the CSD. For all the cases that do not obey Zipf’s law, other laws should be examined. Is Zipf’s law, then, a particular case? We propose a general answer to this question: the CSD (and also other distributions) can be classified into three classes according to the shape of the log–log rank–size plot (Benguigui and Blumenfeld-Lieberthal 2006, 2007a, b). At this stage we cannot provide a complete theoretical explanation linking the different classes to different urban processes, and further work is needed to clarify these relations. A partial explanation, however, is provided in the next section.

The first class is characterized by a linear relationship between x = Ln Rank and y = Ln Size. As is well known, this is the Pareto distribution; in the case of city size distributions, it is Zipf’s law.

The second class is characterized by a parabola-like x-y curve with a symmetry axis parallel to the y axis (see the example in Fig. 9).

The third class is also characterized by a parabola-like x-y curve, but with a symmetry axis parallel to the x axis (see the case of Israel below, for example).

The x-y curve for the first two classes can be written as (x = Ln Rank, y = Ln Size)

$$ y = y_{0} - m\left( b + x \right)^{\alpha} \tag{2} $$

with α = 1 (and b = 0) for the first class and α > 1 for the second class.

In the case of the third class the x-y curve can be written as:

$$ y = y_{0} + m\left( b - x \right)^{\alpha} \tag{3} $$

with α < 1.
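To make the two parabola orientations concrete, the sketch below evaluates Eqs. 2 and 3 with hypothetical parameters. With α = 1 and b = 0, Eq. 2 reduces to the straight line of Eq. 1; with α = 2 it gives y0 − y = m(b + x)², a parabola whose axis is parallel to the y axis; with α = 1/2, Eq. 3 gives (y − y0)² = m²(b − x), a parabola whose axis is parallel to the x axis.

```python
def eq2(x, y0, m, b, alpha):
    """Eq. 2: y = y0 - m (b + x)**alpha (class 1 if alpha = 1, b = 0; class 2 if alpha > 1)."""
    return y0 - m * (b + x) ** alpha

def eq3(x, y0, m, b, alpha):
    """Eq. 3: y = y0 + m (b - x)**alpha (class 3, alpha < 1); defined for x <= b."""
    return y0 + m * (b - x) ** alpha

xs = [0.5 * i for i in range(11)]             # x = Ln R from 0 to 5 (hypothetical range)
line = [eq2(x, 10.0, 1.0, 0.0, 1.0) for x in xs]  # alpha = 1, b = 0: Zipf straight line
vert = [eq2(x, 10.0, 0.1, 1.0, 2.0) for x in xs]  # alpha = 2: axis parallel to the y axis
horiz = [eq3(x, 7.0, 1.5, 6.0, 0.5) for x in xs]  # alpha = 1/2: axis parallel to the x axis
```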

We call α the shape exponent, since it determines the shape of the log–log rank–size curve. The first two classes are well known, and several mathematical expressions have already been proposed for them. The most popular expression for the function y(x) is a second- or third-order polynomial (Rosen and Resnick 1980). From a practical point of view, our expression is completely equivalent to a polynomial. However, the introduction of the shape exponent enables the classification.
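The text does not specify how Eqs. 2 and 3 are fitted. One simple approach, assumed here, exploits the fact that for fixed α and b both equations are linear in y0 and m: a grid search over (α, b) with ordinary least squares for (y0, m) then suffices. Following the text, Eq. 2 is used when α ≥ 1 and Eq. 3 when α < 1; the offset parametrization of b (b = offset for Eq. 2, b = max(x) + offset for Eq. 3, so that b − x ≥ 0), the grids, and the function name `fit_shape` are our assumptions.

```python
import math

def _ols(u, y):
    """Least-squares line y = c + k u; returns (c, k, sse)."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((a - mu) ** 2 for a in u)
    k = sum((a - mu) * (yi - my) for a, yi in zip(u, y)) / suu
    c = my - k * mu
    return c, k, sum((yi - c - k * a) ** 2 for a, yi in zip(u, y))

def fit_shape(x, y, alphas, offsets):
    """Grid search for the shape exponent alpha.

    Returns (sse, equation, alpha, b, y0, m) for the best combination.
    """
    best = None
    for alpha in alphas:
        for off in offsets:
            if alpha >= 1:                     # Eq. 2: y = y0 - m (b + x)**alpha
                b, eq = off, 2
                u = [(b + xi) ** alpha for xi in x]
            else:                              # Eq. 3: y = y0 + m (b - x)**alpha
                b, eq = max(x) + off, 3
                u = [(b - xi) ** alpha for xi in x]
            y0, k, sse = _ols(u, y)
            m = -k if eq == 2 else k           # y is y0 - m u (Eq. 2) or y0 + m u (Eq. 3)
            if m <= 0:
                continue                       # wrong curvature sign for this equation
            if best is None or sse < best[0]:
                best = (sse, eq, alpha, b, y0, m)
    return best

# Hypothetical class 2 curve: alpha = 1.5, b = 1, m = 0.3, y0 = 12.
x = [math.log(r) for r in range(1, 301)]
y = [12.0 - 0.3 * (1.0 + xi) ** 1.5 for xi in x]
alphas = [round(0.3 + 0.1 * i, 10) for i in range(18)]  # 0.3, 0.4, ..., 2.0
offsets = [0.0, 0.5, 1.0, 1.5, 2.0]
print(fit_shape(x, y, alphas, offsets)[1:4])  # (equation, alpha, b)
```

In practice the grid would be refined around the best cell; near α = 1 the two equations become hard to distinguish, which is consistent with Eq. 2 reducing to Eq. 1 there.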

4 The meaning of the classification

In this section we address an important question: what are the characteristics of the three classes?

4.1 Class 1 (α = 1)

The relation between S and R is a power law, \( S = S_{0}/R^{m} \), where m is the Zipf exponent. The cumulative distribution and the density distribution are also power laws, indicating that the number of cities increases as their size decreases: a pseudo divergence (see the relations between the rank–size distribution, the cumulative distribution and the density distribution presented earlier).

4.2 Class 2 (α > 1)

The cumulative distribution and the density distribution are analogous to those of class 1, i.e., both curves indicate a divergence toward small sizes. However, for the large cities the rank–size curve has a parabolic shape, and it is not a power law.

4.3 Class 3 (α < 1)

The most important property of this class is that the density distribution has a finite value (or even a maximum) for the smallest sizes. Two well-known distributions belonging to this class are the lognormal distribution and the exponential distribution. The distribution of the largest cities can be approximated by a power law.
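The contrast between the classes on the side of the small cities follows directly from Eqs. 2 and 3. Writing z for (b + x)^α in Eq. 2 or (b − x)^α in Eq. 3, one has dy/dx = −mα z^(1 − 1/α), and since R = e^x and S = e^y, the density is D = −dR/dS = (R/S) z^(1/α − 1)/(mα). The sketch below evaluates this expression with hypothetical parameters: for Eq. 2 the density grows toward small sizes, while for Eq. 3 with α < 1 it tends to zero as S approaches its minimum e^(y0), rises, and then falls again, i.e., it exhibits a maximum.

```python
import math

def density_eq2(S, y0, m, b, alpha):
    """D(S) = -dR/dS implied by Eq. 2, y = y0 - m (b + x)**alpha."""
    z = (y0 - math.log(S)) / m              # z = (b + x)**alpha; requires S <= exp(y0)
    R = math.exp(z ** (1.0 / alpha) - b)    # R = exp(x)
    return (R / S) * z ** (1.0 / alpha - 1.0) / (m * alpha)

def density_eq3(S, y0, m, b, alpha):
    """D(S) = -dR/dS implied by Eq. 3, y = y0 + m (b - x)**alpha."""
    z = (math.log(S) - y0) / m              # z = (b - x)**alpha; requires S >= exp(y0)
    R = math.exp(b - z ** (1.0 / alpha))
    return (R / S) * z ** (1.0 / alpha - 1.0) / (m * alpha)

# Power law (Eq. 2 with alpha = 1, b = 0, S0 = 1e6, m = 1): D = 1e6 / S**2.
print(density_eq2(1000.0, math.log(1e6), 1.0, 0.0, 1.0))
# Class 2 (alpha = 1.5): larger density at the smaller size.
print(density_eq2(math.exp(8), 12.0, 0.3, 1.0, 1.5), density_eq2(math.exp(10), 12.0, 0.3, 1.0, 1.5))
# Class 3 (alpha = 0.5, smallest size exp(y0) = 1000): D vanishes near 1000, then peaks.
for S in (1000.001, 1010.0, 2718.0, 7389.0):
    print(S, density_eq3(S, math.log(1000.0), 1.0, 6.0, 0.5))
```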

In a recent work (Benguigui and Blumenfeld-Lieberthal 2007a, b), we proposed a model for the development of a system of cities characterized by two processes. The first is a random increase of the population of each city, and the second is an increase in the number of cities. We showed that the type of the resulting CSD (i.e., the value of the exponent α) depends on the rate at which new cities are created. In particular, a very rapid increase in the number of cities yielded α = 1, whereas small rates led to α either greater or smaller than one. As we explain below, in the case of Israel a linear rate of creation of new cities resulted in a decrease of the exponent α.
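The model is only summarized here, so the following is a hypothetical toy version of its two processes, not the published model: at each step every city is multiplied by an independent lognormal growth factor, and a prescribed number of new cities of a fixed initial size is added. The growth volatility `sigma`, the initial size `new_size`, and the creation schedule are our assumptions. With a linear creation rate of 2t new cities at step t, the number of cities after T steps is 1 + T(T + 1), i.e., it grows as T². The sketch makes no claim about the resulting α, which would have to be measured by fitting Eqs. 2 and 3 to its output.

```python
import math
import random

def simulate(steps, new_per_step, sigma=0.1, new_size=1.0, seed=0):
    """Toy growth model with hypothetical parameters (not the published model).

    Each step: (1) every city is multiplied by an independent factor
    exp(N(0, sigma^2)), i.e. random proportional growth; (2) new_per_step(t)
    new cities of size new_size are added.
    """
    rng = random.Random(seed)
    sizes = [new_size]
    for t in range(1, steps + 1):
        sizes = [s * math.exp(rng.gauss(0.0, sigma)) for s in sizes]
        sizes.extend([new_size] * new_per_step(t))
    return sizes

# Linear creation rate: 2t new cities at step t, so N(T) = 1 + T(T + 1).
cities = simulate(50, lambda t: 2 * t)
print(len(cities))
```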

5 An example of the temporal change of α: the case of Israel

We present a real example to illustrate the meaning of the above classification, particularly when the class of a system changes with time. We chose the Israeli system of cities, as it represents this phenomenon well. Figure 4 presents the rank–size distribution of the Israeli system of cities for cities with populations larger than 1,000 inhabitants (Footnote 4) for the years 1961 and 2005. We fitted the rank–size curves with either Eq. 2 or Eq. 3, which yielded an interesting result. The exponent α is larger than one in 1961 (meaning that the fit is better with Eq. 2), as seen in Fig. 4, and decreases to values smaller than one in the subsequent years (the fit is better with Eq. 3), as seen in Figs. 4 and 5. As mentioned earlier, the rate of creation of new cities in the model was linear, so the number of cities increased as \( N\left( t \right)\sim t^{2} \), which eventually resulted in a decrease in the value of α.

Fig. 4

Rank–size plot for the Israeli system of cities. The lines represent the fit with Eq. 2 for 1961 and with Eq. 3 for 2005

Fig. 5

Variation of the shape exponent α with time for the Israeli cities

Following the above, this means that the density function diverges toward small sizes in 1961, which is not the case in later years. We show this result in Figs. 6 and 7. In Fig. 6 we calculated the density distribution for the small cities (with populations between 1,000 and 10,000) using the parameters of Eq. 2 or Eq. 3 obtained from the fits. The different behaviors are clearly visible: a pseudo divergence in 1961 and no divergence in 2005. Figure 7 shows the histogram of Ln S. Here too, the different behaviors of the small cities can be observed. In 1961 the histogram has large values consistent with a divergence, whereas in 2005 it presents smaller values, shifted toward the larger cities, without divergence (Footnote 5). To emphasize this evolution, Fig. 8 shows the proportion of cities smaller than 5,000 and 10,000 inhabitants over time. The decrease of this proportion with time indicates a process of urbanization, in which the proportion of small cities becomes smaller and smaller. This is our interpretation of the change of the exponent α from a value larger than one to a value smaller than one.
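The two empirical quantities behind Figs. 7 and 8 are straightforward to compute. The sketch below, run on a hypothetical list of populations, returns the share of cities below a threshold (strictly smaller than it, as in “smaller than 5,000”) and a histogram of Ln S; the bin width of 0.5 and the function names are our assumptions.

```python
import math
from collections import Counter

def share_below(sizes, threshold):
    """Fraction of cities whose population is strictly below the threshold."""
    return sum(1 for s in sizes if s < threshold) / len(sizes)

def ln_histogram(sizes, width=0.5):
    """Histogram of Ln S with bins of the given width.

    Returns {lower bin edge: count}, ordered by bin.
    """
    counts = Counter(math.floor(math.log(s) / width) for s in sizes)
    return {k * width: counts[k] for k in sorted(counts)}

# Hypothetical populations of cities with more than 1,000 inhabitants.
sizes = [1200, 3000, 4800, 7000, 15000, 60000, 250000]
print(share_below(sizes, 5000), share_below(sizes, 10000))
print(ln_histogram(sizes))
```

Applied year by year to the same set of places, `share_below` traces a curve like the one in Fig. 8.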

Fig. 6

Density distribution for the small Israeli cities showing the different behaviors in 1961 and in 2005

Fig. 7

Histogram of Ln S for 1961 and 2005

Fig. 8

Variation of the proportion of cities smaller than 5,000 and 10,000 inhabitants among the Israeli cities

6 Conclusion

As in the case of critical phenomena in physics, we propose to define universality classes for the city size distribution. The three classes are characterized by a shape exponent α, which can be equal to one (recovering Zipf’s law), larger than one or smaller than one. The main differences are as follows. On the side of the small cities, the density distribution of class 3 (α < 1) does not diverge; it either goes to a finite value as \( S \to S_{\min} \) (the smallest city) or exhibits a maximum. For α ≥ 1 (classes 1 and 2) the density distribution diverges. On the side of the large cities, for α ≤ 1 (classes 1 and 3) the distribution is a power law, which is not the case for α > 1. We presented the particular example of the Israeli system of cities, for which the shape exponent is larger than one in 1961 and smaller than one in later years. We interpret this change as a process of urbanization with a relative decrease in the number of small cities in Israel.

We believe that by abandoning the paradigm of Zipf’s law (but not Zipf’s law itself!), one can gain new insights into the city size distribution. Although it is not easy to abandon a paradigm, we encourage researchers to explore the advantages of the new classification of CSD presented here. Two examples of these advantages are the possibility of comparing different distributions belonging to the same class and the possibility of following the evolution of a distribution through the change in the exponent α.