
I am pleased to be able to contribute to this volume in honor of David Plane’s contributions to the fields of geography and regional science. Dave and I first met in 1980 at the US Census Bureau, where we were doctoral students working on a project jointly funded by the American Statistical Association, the National Science Foundation, and the Census Bureau. Since that time, we have enjoyed both a career-long collaboration that has been productive and rewarding and a lifelong friendship that has been enjoyable and genuine. Over the course of our careers, we worked together on many topics at the intersection of demography and geography. Some of our early direction and inspiration came from Andrew Isserman, the leader of our project at the Bureau. Following Andy’s passing in 2010, Dave and I coauthored a paper on the Hoover index of population concentration for a special volume of the International Regional Science Review that was put together in memory of Andy. With this chapter, I am delighted to return to this topic. A special word about the title—over the years, Dave and I shared both hotel rooms at conferences and tents in campgrounds while on bike trips. In addition to the many stimulating conversations, Dave learned that I often have vivid dreams—these ranged from falling out of both my bed and then our 35th story window at the Bonaventure Hotel in Los Angeles to shouting away the intruders of my dreams that lurked outside of our tent. I think, then, it is only fitting to include an allusion to this in the title of the chapter.

1.1 Introduction

Substantial change has taken place in the distribution of population across counties over time in the United States. On a long time scale, the population has of course become urbanized, rising from 5.1% urban at the time of the first decennial census in 1790 to 80.7% urban in 2010. An increase has occurred in every decade with the exception of 1810–1820, when the urban population went from 7.3% to 7.2% (US Bureau of the Census 2012).

With respect to more recent change, a plethora of studies emerge whenever new census data are released. A Pew research report notes that urban counties have grown at about the nationwide rate of 13% during the period 2000–2016, while suburban and small urban counties have grown more rapidly (16%) (Parker et al. 2018). Simultaneously, rural counties have grown more slowly, rising just 3% over the period.

Similarly, Frey (2018), in a report from the Metropolitan Policy Program of the Brookings Institution, notes that urban core counties grew more slowly (at an annual rate of approximately 0.5%) than did mature and emerging suburban counties and exurban counties (annual rate of a bit over 1%) during the period 2000–2017. Toward the end of that period, small metropolitan areas began to grow at the expense of large metropolitan areas.

A widely recognized confounding factor in these studies is the definition of terms such as rural, suburban, urban, and exurban. Definitions vary from study to study, and the census definition itself has varied over time. Hall, Kaufman, and Ricketts (2006) provide an overview of many of the available definitions.

1.2 Some Measures of Concentration: With Historical Notes

One of the earliest measures of concentration was described by Lorenz in 1905; the well-known Lorenz curve provides a convenient way to visualize inequality. It is most commonly used for the depiction of income inequality. In that context, after arranging individuals (or groups of individuals) in terms of increasing income, cumulative population is plotted on the horizontal axis against cumulative income on the vertical axis. When there is little inequality, the plot will lie close to the 45° line. As inequality increases, the Lorenz curve bows further outward and away from (and below) the 45° line.

Shortly after, Gini (1912, 1914) introduced a numerical measure of inequality that is directly related to the Lorenz curve. In particular, the Gini coefficient is equal to the fraction of the area lying below the 45° line that is between the 45° line and the Lorenz curve. The Gini coefficient is also equal to half of the relative mean absolute difference between all pairs of incomes:

$$ G=\frac{\sum_{i=1}^n\sum_{j=1}^n\mid {x}_i-{x}_j\mid }{2{n}^2\overline{x}} $$

The measure is variously known as the Gini coefficient (1,170,000), Gini index (871,000), and the Gini ratio (141,000), where the numbers in parentheses indicate the number of “hits” in Google (recognizing of course that this is not a perfect indicator of popularity or use).

As noted by Ceriani and Verme (2012), Gini (1912) first described his index in terms of mean differences, in a publication in Italian that has never been translated into English. In that publication, Gini discusses 13 different measures of inequality, of which the Gini coefficient is one. He draws an explicit connection to the Lorenz curve in his 1914 publication, which was eventually published in English, but not until 2005 (Gini 2005). Ceriani and Verme suggest that Gini’s work became more widely known and disseminated following a note by Gini (1921) in English in Economic Journal, where he brought the work of several Italian researchers to the attention of readers. The note focused on measures of inequality and was written in response to an article on the topic in that journal by Dalton (1920).

The Hoover index of concentration, as it is known to geographers and demographers, is equal to the maximum vertical distance between the Lorenz curve and the 45° line. It is also equal to half of the relative mean absolute deviation (from the mean):

$$ H=\frac{\sum \mid {x}_i-\overline{x}\mid }{2n\overline{x}} $$

In the context of income, it is interpreted as the percentage of total income that would have to be redistributed from the rich (defined as those to the right of the vertical line) to the poor (defined as those to the left of the vertical line), to equalize incomes. In the context of population, it is the percentage of population that would have to move from high-density places to low-density places to equalize population density across all spatial units. The Hoover index (13,000) is known variously as the Pietra index (3000), the Robin Hood index (15,000), and the Schutz index (47,000), where the number in parentheses is again the rounded number of “hits” in a Google search.
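The two formulas above translate directly into a few lines of code. The following is a minimal Python sketch rather than anything taken from the studies cited; the function names `gini` and `hoover` and the four sample values are illustrative assumptions.

```python
def gini(x):
    """Gini coefficient: half the relative mean absolute difference over all pairs."""
    n = len(x)
    mean = sum(x) / n
    total_abs_diff = sum(abs(xi - xj) for xi in x for xj in x)
    return total_abs_diff / (2 * n**2 * mean)


def hoover(x):
    """Hoover index: half the relative mean absolute deviation from the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum(abs(xi - mean) for xi in x) / (2 * n * mean)


incomes = [1, 2, 3, 4]  # hypothetical values, for illustration only
print(gini(incomes))    # 0.25
print(hoover(incomes))  # 0.2
```

For this small sample the Hoover index (0.2) is smaller than the Gini coefficient (0.25), and both are zero when all values are equal.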

Hoover’s 1936 paper is often cited as the one where he introduced what is now known as the Hoover index of concentration. However, in that paper he used the Gini index, and there is no mention of his index of concentration. He cited Gini as an important contributor to measures of inequality and thanked the Nobel Prize-winning economist Wassily Leontief for pointing this out to him, but he does not appear to have noticed at that time that Gini was responsible for the development of the coefficient; instead, he credited Vinci (1934) for interpreting the coefficient.

Hoover introduced his index of concentration in his 1941 paper. Here he credits Florence and Wensley (1939) for developing the measure, citing their book in the context of the extent of localization of manufacturing industries. Florence (1953) himself notes that he and Wensley first reported their initial work on this measure in 1937, in Economic Journal. And interestingly, in this 1937 work, they cite the work of Hoover as one of several researchers working on other measures.

Hoover’s index actually appears to have first been discussed by Pietra (1915; translated in Pietra 2014), who showed the relations between several measures of inequality/concentration. Giorgi (2014) notes that Pietra showed the “relations existing between the mean difference, the simple mean deviations from the arithmetic mean and the median, and their geometrical interpretation”. Schutz (1951) was a graduate student at Berkeley when he published his explication of the index (apparently without knowledge of and certainly without attribution of the earlier work of Pietra, Florence and Wensley, and Hoover). Despite the existence of earlier formulations, the term “Schutz index” is used much more often than the other names for the measure, perhaps due to its introduction and use within the field of economics.

1.3 Selected Review of Studies of Population Concentration and Deconcentration in the United States

The following review is meant to be illustrative and representative of the types of studies that have focused on population concentration and its measurement in the United States; it is by no means meant to be a comprehensive review.

Duncan et al. (1961) used the Hoover index to show how the US population became increasingly concentrated during the first half of the twentieth century. Lichter (1985) also used the Hoover index, focusing upon the latter half of the century, and the differing degrees of concentration by race.

Plane and Mulligan (1997) argue for the use of the Gini coefficient in population and migration research. They calculate and interpret several Gini coefficients in the context of the US migration system, concentrating on the measurement of the amount of spatial focusing that occurs—either within the entire system or within the sets of inflows and outflows. Rogers and Sweeney (1998) and Rogers and Raymer (1998) apply these measures and the coefficient of variation in their own analyses of US population redistribution.

Lichter and Johnson (2006) examine the spatial patterns of concentration and deconcentration among the foreign-born population that occurred during the 1990s. They find that (a) the foreign-born are dispersing away from metropolitan, gateway cities (although they remain much more concentrated than the native-born) and (b) they are less segregated from other populations than they were in the past—so-called balkanization and isolation are not as acute as they once were.

Plane et al. (2005) examine population distribution in the context of the urban hierarchy. They place their work in historical context by noting that, over the long term, net migration has been primarily up the urban hierarchy, leading to increasing population concentration. They use migration data for the period 1995–2000 to look at net flows up and down the hierarchy; the nature of such flows can differ substantially according to age and life course stages and events (such as college, military service, family formation, and retirement).

By far, the majority of attention to population concentration and deconcentration in the United States has been devoted to the “rural renaissance” of the 1970s and the subsequent relative growth rates of rural, urban, and suburban areas. Vining and Strauss (1977) argued that the recently observed deconcentration was a “clean break” from past trends. At about the same time, McCarthy and Morrison (1977) also noted the increased net in-migration that was being experienced by rural, nonmetropolitan counties, signaling the beginning of a new or at least more complex demographic trend. They also found that retirement and recreation were increasingly important as drivers of this migration.

Gordon (1979) argued that the newfound reversal was perhaps due to growth that was extending from metro areas and spilling over into adjacent nonmetropolitan counties and that the “clean break” hypothesis deserved further scrutiny. He used the Hoover index to support this view using data from 18 countries. Bourne (1980) attributed the decreasing concentration to implicit US policies that favored the growth of exurbia (and the lack of policies that encouraged redevelopment within cities).

Work in this area developed rapidly, with Berry (1980) providing an early review of the counterurbanization trend and John Long (1981) producing a book describing the trends toward deconcentration at various spatial scales. Lichter and Fuguitt (1982) focused some of their attention on population distribution within nonmetropolitan areas, showing that there was deconcentration at that level as well. Like McCarthy and Morrison, they found that economic explanations for demographic change in nonmetropolitan areas were of decreasing importance.

The empirical work also led to the development of various theoretical frameworks within which the changes could be understood more broadly. Morrill (1979, 1980) and Geyer and Kontuly (1993) developed conceptual and theoretical frameworks for population concentration; the latter authors, for example, use data from different countries to suggest that counterurbanization represents the final phase of the first cycle of demographic change in an urban system, and this is followed by a cycle where concentration is again dominant. Morrill (1980) argued against a “clean break” in the 1970s, suggesting that long-term agglomerative forces acted differentially across space and that older and denser places experience out-migration and deconcentration as a stage in the evolution of the geographic landscape.

Fuguitt (1985) was an early reporter of the reversal of the turnaround, reporting that there was a return to concentration in the metropolitan-nonmetropolitan system during the early 1980s. Fuguitt also reviewed the literature on the turnaround, emphasizing the point that the focus on trends in concentration and deconcentration had the beneficial effect of drawing more attention to other facets of migration research, including individual migration behavior, preferences, and the relation of migration to employment. Finally, he pointed out that the research generated by the topic was not only broad in scope but also large in its volume. At that time Fuguitt speculated that a return to concentration seemed unlikely. Cochrane and Vining (1988) also provided early evidence of the reversal of counterurbanization that occurred during the 1980s.

Then, during the 1990s, there was a return to the deconcentration witnessed in the 1970s. Long and Nucci (1997a) documented this return to deconcentration using the Hoover index, with counties as the spatial unit. In a second paper, Long and Nucci (1997b) documented this further, simultaneously (a) correcting an error in the original county-based series of Hoover indexes reported by Duncan et al. (1961), (b) extending the Duncan et al. time series both backward and forward in time, and (c) noting that there was also deconcentration at the county level between 1890 and 1910. Johnson and Cromartie (2006) provided corroborating evidence for the renewed deconcentration that took place during the 1990s by using data from the 2000 census.

Domina (2006) found that trends changed again during the late 1990s and the early years of the twenty-first century, with nonmetro areas once again experiencing net out-migration. Domina used data from the Current Population Survey to show that much of the net out-migration was attributable to those with higher levels of education and to suggest that economic factors now carry important explanatory power in understanding nonmetro migration.

Rogerson and Plane (2013) calculated annual Hoover indexes for the period 1990–2009 at a variety of spatial scales, including the county level. Like Domina, they find concentration to be generally increasing at the county level from the late 1990s. Interestingly, they find that births, deaths, and immigration all caused the index of concentration to increase, but net internal migration on its own would have led to deconcentration during the period.

1.4 Measurement: Disaggregating the Lorenz Curve

When the Lorenz curve is split into two pieces by the vertical line representing the Hoover index and the maximum difference between cumulative populations and cumulative areas, each of the two pieces may then be scaled and transformed into a Lorenz curve. Each of these has an associated Hoover index. If desired, each of these in turn may also be split into two pieces and Hoover indexes calculated for each of the new pieces; this process can continue to be repeated. Thus it is possible to examine inequality along portions of the Lorenz curve—in the application that is the focus here, we may examine population concentration for sets of regions that have a particular range of population density.

The Gini coefficient is always at least as high as the Hoover index. The excess is equal to twice the area between the Lorenz curve and the 45° line that lies beneath the triangle created by the end points of the Lorenz curve and the lower point of the vertical line associated with the Hoover index. The Gini coefficient for the full Lorenz curve may be calculated as the sum of the overall Hoover index and a weighted sum of the Hoover indexes associated with the disaggregation described above, where the weights are equal to the proportion of area (or population, in the case of application to income inequality) associated with that segment of the curve (Rogerson 2013).
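The splitting procedure and the resulting decomposition can be checked numerically. The sketch below is one reading of the approach, not code from Rogerson (2013): it treats the Lorenz curve as piecewise linear, splits it at the largest vertical gap, and applies the same split recursively to each piece. It assumes that each sub-index is weighted by the area share of its piece and rescaled by the population share of that piece. The function names and the four-region data are hypothetical.

```python
def lorenz_points(pop, area):
    """Cumulative area and population shares, regions sorted by increasing density."""
    order = sorted(range(len(pop)), key=lambda i: pop[i] / area[i])
    total_pop, total_area = sum(pop), sum(area)
    ca, cp = [0.0], [0.0]
    for i in order:
        ca.append(ca[-1] + area[i] / total_area)
        cp.append(cp[-1] + pop[i] / total_pop)
    return order, ca, cp


def gini_direct(pop, area):
    """Gini as one minus twice the area under the piecewise-linear Lorenz curve."""
    _, ca, cp = lorenz_points(pop, area)
    return 1.0 - sum((ca[i] - ca[i - 1]) * (cp[i] + cp[i - 1])
                     for i in range(1, len(ca)))


def gini_from_hoover(pop, area, tol=1e-12):
    """Gini as H plus weighted, scaled Hoover indexes of the two pieces (recursive)."""
    order, ca, cp = lorenz_points(pop, area)
    gaps = [a - p for a, p in zip(ca, cp)]
    k = max(range(len(gaps)), key=gaps.__getitem__)  # breakpoint: largest vertical gap
    h = gaps[k]                                      # Hoover index of this (sub)system
    if h <= tol:  # uniform density: nothing left to split
        return 0.0
    low, high = order[:k], order[k:]
    g_low = gini_from_hoover([pop[i] for i in low], [area[i] for i in low], tol)
    g_high = gini_from_hoover([pop[i] for i in high], [area[i] for i in high], tol)
    # Area share is the weight; population share rescales each sub-index.
    return h + ca[k] * cp[k] * g_low + (1 - ca[k]) * (1 - cp[k]) * g_high


pop = [10, 20, 30, 40]    # hypothetical populations
area = [40, 30, 20, 10]   # hypothetical land areas
print(gini_direct(pop, area), gini_from_hoover(pop, area))  # both approximately 0.5
```

In this example the overall Hoover index is 0.4, and the two sub-triangles each add 0.05, which accounts for the gap between the Hoover index and the Gini coefficient.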

We now begin with an illustration for the US population in 2010, at the county level. The Hoover index was 66.16 (after rounding to two decimal places); 16.31% of the population lived on 82.48% of the land (and, of course, the other 83.69% of the population lived on just 17.52% of the land). The former group consisted of 2166 low-density counties, and the latter group had 977 high-density counties. The low-density counties had an average of 17.29 people per square mile; the high-density counties averaged 417.5 people per square mile.

Each of these two groups may be examined as a separate subsystem. The line 0B in triangle 0AB in Fig. 1.1 represents what would be expected if all low-density counties had the same density. Similarly, the line BD in triangle BCD represents what would be expected if all high-density counties had the same population density. There is of course concentration within low-density and high-density subsystems, and this is captured by these two triangles.

Fig. 1.1 Lorenz curve for US population: 2010

Among the 2166 low-density counties, the 795 with the lowest density have 15.96% of the population on 65.73% of the land area (HL = 65.73 − 15.96 = 49.77). Among the 977 high-density counties, 687 have 34.29% of the population on 76.95% of the area, for a Hoover index of HU = 76.95 − 34.29 = 42.66.

These three breakpoints (one for the entire United States and the other two for the low- and high-density subsystems) serve to divide the United States into four categories of population density (labeled 1 through 4 on Fig. 1.1). The coordinates of points A, B, C, and D, representing cumulative population and cumulative area, are given in Table 1.1. The table also reports the absolute values of the differences between cumulative area and cumulative population. Note that the highest of these absolute differences (when multiplied by 100) represents the greatest vertical distance between the Lorenz curve and the 45° line and is equal to the Hoover index of 66.16.

Table 1.1 Areas and populations for US counties, 2010

The triangles for the low-density and high-density subsystems are redrawn in Figs. 1.2 and 1.3, respectively. Part (a) of each figure is the triangle as it appears within the original Lorenz curve in Fig. 1.1; part (b) of each figure has scaled the triangle in part (a) to represent a Lorenz curve for the two-region subsystem that the figure represents.

Fig. 1.2 Lorenz curve for low-density US counties: 2010

Fig. 1.3 Lorenz curve for high-density US counties: 2010

The maximum vertical distances in part (a) of Figs. 1.2 and 1.3 can be thought of as scaled versions of the full Hoover indexes found in part (b). For Fig. 1.2a, the y-coordinate of point P is found as (0.1631/0.8248)(0.5422) = 0.1072. The vertical distance, and hence the scaled Hoover index, is then 0.1072 − 0.0260 = 0.0812. In Fig. 1.3a, point P has a y-coordinate of 0.1631 + (0.9596 − 0.8248)(1 − 0.1631)/(1 − 0.8248) = 0.8070. The vertical distance, and hence the scaled Hoover index, is 0.8070 − 0.4501 = 0.3569.

The Gini coefficient (for the Lorenz curve in Fig. 1.1) may be calculated as a weighted sum of these scaled Hoover indexes, where the weights are the proportion of total land area that is associated with each section of the Lorenz curve. Thus

$$ G=.6616+(.8248)(.0812)+(.1752)(.3569)=.7911. $$

The second and third terms in the sum are also equal to twice the areas of triangles 0AB and BCD in Fig. 1.1 (repeated as Figs. 1.2a and 1.3a), respectively.
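The arithmetic in this calculation can be reproduced in a few lines. The sketch below uses only the rounded coordinates and the rounded Hoover index quoted in the text; the variable names are illustrative, and because the inputs are rounded the results agree with the reported figures only to about four decimal places.

```python
# Coordinates (cumulative area, cumulative population) as quoted in the text.
ca1, cp1 = 0.5422, 0.0260   # point A
ca2, cp2 = 0.8248, 0.1631   # point B
ca3, cp3 = 0.9596, 0.4501   # point C
H = 0.6616                  # overall Hoover index as reported (rounded)

# Height of point P on chord 0B (Fig. 1.2a) and on chord BD (Fig. 1.3a).
y_p_low = (cp2 / ca2) * ca1
y_p_high = cp2 + (ca3 - ca2) * (1 - cp2) / (1 - ca2)

# Scaled Hoover indexes: vertical distance from P down to the Lorenz curve.
scaled_h_low = y_p_low - cp1
scaled_h_high = y_p_high - cp3

# Gini coefficient as the area-weighted sum described above.
G = H + ca2 * scaled_h_low + (1 - ca2) * scaled_h_high
print(round(scaled_h_low, 4), round(scaled_h_high, 4), round(G, 4))
# 0.0812 0.3569 0.7911
```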

More generally, suppose pi is the proportion of total population found in subset i, and ai is the proportion of total area found in subset i (where i is indexed from 1 to 4, corresponding to the sections in Fig. 1.1). Let cpi and cai be the corresponding cumulative proportions. Furthermore, subsets 1 and 2 together comprise the low-density population and subsets 3 and 4 together comprise the high-density population (as determined by the system-wide Hoover index). The Hoover index for the entire system is equal to the absolute value of the difference between cp2 and ca2.

The Hoover index for the low-density subsystem in Fig. 1.2b (say HL) is the absolute value of the difference between cp1/cp2 and ca1/ca2. The Hoover index for the high-density subsystem in Fig. 1.3b (say HU), composed of sections 3 and 4 in Fig. 1.1, is the absolute value of the difference between (ca3 − ca2)/(1 − ca2) and (cp3 − cp2)/(1 − cp2).

The scaled Hoover indexes (H′), i.e., the heights of the two sub-triangles in Fig. 1.1, are found as the difference between the y-coordinate of point P and cp1 in Fig. 1.2a and the difference between the y-coordinate of point P and cp3 in Fig. 1.3a. The y-coordinate of point P in Fig. 1.2a is equal to (cp2/ca2)(ca1); in Fig. 1.3a it is equal to cp2 + (ca3 − ca2)(1 − cp2)/(1 − ca2). Thus

$$ {\displaystyle \begin{array}{c}{\mathrm{H}}_{\mathrm{L}}^{\prime }=\left({cp}_2/{ca}_2\right)\left({ca}_1\right)-{cp}_1\\ {}{\mathrm{H}}_{\mathrm{U}}^{\prime }={cp}_2+\left({ca}_3-{ca}_2\right)\left(1-{cp}_2\right)/\left(1-{ca}_2\right)-{cp}_3\end{array}} $$

and this last equation can be simplified a bit by replacing the difference in cumulative proportions (ca3 − ca2) by the proportion a3 and (cp2 − cp3) by −p3:

$$ {\displaystyle \begin{array}{c}{\mathrm{H}}_{\mathrm{U}}^{\prime }={cp}_2+\left({a}_3\right)\left(1-{cp}_2\right)/\left(1-{ca}_2\right)-{cp}_3\\ {}=\left({a}_3\right)\left(1-{cp}_2\right)/\left(1-{ca}_2\right)-{p}_3\end{array}} $$

The Gini coefficient is then equal to H + (ca2) HL′ + (1 − ca2) HU′.
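A short calculation, offered here as a check rather than as part of the original derivation, relates the scaled indexes to the subsystem Hoover indexes HL and HU. Taking each difference in the order that makes it positive, factoring cp2 out of the first expression and (1 − cp2) out of the second gives

$$ {\displaystyle \begin{array}{c}{\mathrm{H}}_{\mathrm{L}}^{\prime }={cp}_2\left({ca}_1/{ca}_2-{cp}_1/{cp}_2\right)={cp}_2\,{\mathrm{H}}_{\mathrm{L}}\\ {}{\mathrm{H}}_{\mathrm{U}}^{\prime }=\left(1-{cp}_2\right)\left[\left({ca}_3-{ca}_2\right)/\left(1-{ca}_2\right)-\left({cp}_3-{cp}_2\right)/\left(1-{cp}_2\right)\right]=\left(1-{cp}_2\right){\mathrm{H}}_{\mathrm{U}}\end{array}} $$

With the 2010 values, (0.1631)(0.4977) ≈ 0.0812 and (0.8369)(0.4266) ≈ 0.3570, which agree up to rounding with the heights 0.0812 and 0.3569 computed earlier. Under this reading, the Gini coefficient may equivalently be written as H + (ca2)(cp2) HL + (1 − ca2)(1 − cp2) HU.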

1.5 Changes in Population Concentration: 2000–2015

In 2000, when the 2144 counties with the lowest population density are examined (out of a total of 3141 counties), we find that they contain 82.3% of the country’s area, but just 16.7% of the population, leading to a Hoover index of 65.61. These counties have an average population density of 16.14 people per square mile; the remaining 997 high-density counties have an average population density of 374.81 people per square mile. During the first decade of this century, then, there was an increase in population concentration at the county level, as the Hoover index rose from 65.61 to 66.16. By 2015, the US population was even more concentrated at the county level, with H = 66.73.

Among the 2144 low-density counties, the Hoover index was 49.31 in 2000, with the most rural of places (758 of the 2144 counties) containing 65.53% of the land area in this subset but just 16.22% of the population and having an average density of 3.99 people per square mile. The remaining 2144 − 758 = 1386 counties had an average population density of 39.22 people per square mile. As a group, the low-density counties experienced a small increase in concentration, with the Hoover index rising from 49.31 to 49.77 during the decade. Concentration among the low-density counties continued during the first half of the next decade, with the Hoover index rising further to 49.87 by 2015.

Among the 997 high-density counties, the 719 least dense of these counties had 78.06% of the area but just 33.98% of the population (and an average population density of 163.04 people per square mile), leading to a Hoover index of 44.08 in 2000. The remaining high-density counties had an average population density of 1131.2 people per square mile. Within the high-density counties, deconcentration was experienced during the decade, with the Hoover index falling from 44.08 to 42.66. From 2010 to 2015, the index then rose slightly, to H = 42.7. These results are summarized in Table 1.2.

Table 1.2 Summary of changes in population concentration: 2000–2015

Table 1.3 shows the results of further disaggregation, where there are four separate Hoover indexes associated with eight sections of the Lorenz curve. There has been little change in concentration over the period for the subset of lowest density counties. For the next level up the density hierarchy, there has been a steady increase in concentration between 2000 and 2015, with H increasing from 17.93 to 18.95 (and it is interesting to note that this subsystem of counties has the lowest H values, indicating relative uniformity in density). At the next step up the curve, there was also an increase in concentration between 2000 and 2010 and then a slight decrease during the first half of this decade. Finally, for the subset of counties with the highest densities, there was deconcentration during 2000–2010, followed by a slight increase in concentration during 2010–2015.

Table 1.3 Hoover index along four segments of Lorenz curve: 2000 and 2010

1.6 Attributing Change to Particular Counties

The Hoover index does not change when people move from a region on one side of the vertical line that defines the Hoover index to another region on that same side of the line. Migration will only cause change in the index when people move from a region on one side of the vertical line to a region on the other side of the line. When two Lorenz curves are compared, there may be regions that are on one side of the vertical line at one point in time and on the other side of the line at the next point in time. A region on the left side of the vertical line in one year will contribute to increased concentration if it is on the right side of the line in a subsequent year.
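This property is easy to verify on a toy system. The sketch below uses the share-based form of the index, half the sum of absolute differences between each region's population share and area share, which equals the maximum vertical gap of the Lorenz curve. The function name and the four hypothetical regions are assumptions made for illustration, not data from the chapter.

```python
def hoover_from_shares(pop, area):
    """Hoover index as half the summed absolute gap between population and area shares."""
    total_pop, total_area = sum(pop), sum(area)
    return 0.5 * sum(abs(p / total_pop - a / total_area)
                     for p, a in zip(pop, area))


area = [40, 30, 20, 10]        # hypothetical regions
base = [10, 20, 30, 40]        # regions 0 and 1 are below mean density, 2 and 3 above
same_side = [5, 25, 30, 40]    # 5 people move from region 0 to region 1 (both low density)
cross_line = [10, 15, 35, 40]  # 5 people move from region 1 (low) to region 2 (high)

print(hoover_from_shares(base, area))        # about 0.40
print(hoover_from_shares(same_side, area))   # about 0.40, unchanged
print(hoover_from_shares(cross_line, area))  # about 0.45, more concentrated
```

The same-side move leaves the index unchanged only because both regions remain below the mean density after the move; a move large enough to push a region across the line would change the index.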

To illustrate, here we examine the set of counties that had a density of about 40 persons per square mile in 2000 and find those that contributed to the increased concentration that was observed over the next 15 years. In 2000, there were 1386 counties in section 2 of the Lorenz curve (when it is divided into four sections, in order of increasing density); there were 1369 counties in that section in 2015. In 2000, these counties had an average population density of 39.22 persons per square mile, and they ranged in population density from 16.2 to 79.4 people per square mile. By 2015, this section of the curve contained counties with an average population density of 43.43 persons per square mile and densities ranging from 17.7 to 90.8 persons per square mile. The 23 counties listed in Table 1.4 were among the low-density counties in this portion of the Lorenz curve in 2000 but were in the high-density portion in 2015. They therefore contributed to the increased concentration in this subsystem. Many of these grew rapidly and are easily recognizable as amenity-laden places. Others, such as Bacon County, Georgia, did not grow particularly fast but were close to the vertical line in the Lorenz curve of 2000 (near the mean density of 39.22 persons per square mile); even a small amount of growth was sufficient to put them on the right-hand side of the vertical line in the 2015 Lorenz curve.

Table 1.4 Counties contributing to increasing concentration within the high-density portion of the low-density counties

1.7 Discussion and Summary

The Hoover index of concentration has a long history of application in population geography and demography. It has received particularly widespread use in the study of rural population change since the 1970s.

Like the Hoover index, the Gini coefficient may be interpreted in relation to the Lorenz curve. It may also be derived as the sum of the Hoover index, plus weighted, scaled Hoover indexes associated with partitions of the Lorenz curve constructed by dividing the curve at the vertical line associated with the Hoover index. This approach was used here to show how population concentration has changed during the period 2000–2015. Overall increases in concentration were accompanied by concentration for many population density levels, but deconcentration occurred for that set of counties comprising the highest density levels.

It would be interesting to assess the temporal change in Hoover indexes for these subsystems for various demographic components of change. Rogerson and Plane (2013) found increasing concentration for births, deaths, and international migration and deconcentration for internal migration. Whether these trends hold at all levels of population density is unknown. Rural areas, for instance, tend to have a relatively older age structure, and the relatively large number of deaths and small number of births act to increase population concentration in the more dense places among them. This effect might, for example, be less apparent or nonexistent at higher population densities. It would be particularly interesting to know whether net internal migration is leading to deconcentration at all levels of population density or whether only a certain portion of the density hierarchy is witness to that trend.

Finally, it could also be enlightening to examine more closely the geographic distribution of these changes. The focus here has been on groupings of counties by population density. These groupings could easily be mapped, as could the counties that increased or decreased in population density sufficiently to contribute to a change in the Hoover index. Similarly, the analysis described above could be carried out for regional divisions of the United States.