1 Introduction

Geographic information systems (GIS) are computer software used to describe and analyze spatial data (DeMers 2000). Through linking geographic locations with data that describe those locations, GIS provides a basis for examining a range of questions associated with the arrangement of variables in geographic space. Since human beings necessarily inhabit geographic space, population data are inherently spatial. GIS enables explicit investigations of the geography of human population, complementing the range of demographic methods used to measure other population characteristics.

Smith et al. (2001: 365–366) describe recent uses of GIS for demographic research, concluding that this tool makes demographic trend analysis possible for very small spatial areas. Such a capability could open up new possibilities for considering some of the likely interactions of present and projected human populations with nonhuman species. Most such species must retreat or, at a minimum, alter their behavior as human beings not only expand their occupation of physical space but also co-opt natural resources and whole ecosystems for human purposes and activities. Smith et al. nonetheless caution that "… GIS-based projection methods are complex and expensive. They require a substantial investment of time and effort …." In this chapter, we show that it is feasible to use GIS to apply simple projection methods to a large number of small area projections. And, while GIS software and its application are not inexpensive, there is almost certainly no less expensive way to create millions of small area projections at once.

2 Recent Population Density Maps and Prior Projections Using Simple Methods

For interdisciplinary analysts, maps of current world population density are fundamental to many environmental and socioeconomic studies. To cite just a few diverse applications, such maps have facilitated demographic analysis of the earth’s biodiversity hotspots (Cincotta et al. 2000), economic studies on the geography of poverty and wealth (Sachs et al. 2001), and greenhouse gas emission inventories used in global climate change simulations (Van Aardenne et al. 2001; Graedel et al. 1993).

Gridded population density maps have been developed at regional and global scales. Methods and data have included both census enumerations (Tobler et al. 1995; CIESIN 2000, 2005a) and modeled allocations of population data to smaller geographic units using supplemental information such as proximity to roads and remotely sensed patterns of night lights (ORNL 2002). However, little theoretical work has been applied to the problem of developing projected versions of such maps, despite the fact that projected versions would likely be of similar interest to the users of present day maps (CIESIN 2005b). One reason for the limited work on projected grids is the obvious challenge of making, or obtaining, demographic projections at small spatial scales for many world regions.

The United Nations routinely develops 50-year projections for the world's approximately 230 countries (United Nations Population Division 2003). These projections rely on state-of-the-art cohort-component modeling, the commonly accepted tool for world and national projections. The United Nations also makes projections to 2030 for each country's urban and rural populations and to 2015 for 30 of the largest urban agglomerations with 750,000 or more inhabitants. These subnational projections are presented as numerical totals for each country, not as spatial data. Moreover, they are based on logarithmic extrapolation of changes in the urban–rural population ratio for each country rather than on the cohort-component methods used in the national projections (United Nations Population Division 2003).

In contrast, present day population maps exist at spatial scales as small as 0.5 min latitude and longitude (~0.9 km at the equator) (ORNL 2002; CIESIN 2004) or 2.5 min latitude and longitude (~4.6 km at the equator) (CIESIN 2000, 2005a). At the latter scale, the maps include density estimates for almost nine million grid cells over the Earth’s land areas. It clearly is not feasible to apply age-cohort modeling to such a vast array of grid cells as it would require assembling vital rate information (fertility, mortality, and migration rates) for each grid cell. Such data are not available, and without them, standard age-cohort projections are not an option.
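The kilometre figures quoted above follow from converting arc-minutes of longitude to distance. The sketch below is illustrative only: it assumes a spherical Earth with the WGS84 equatorial radius of 6378.137 km (GPW's own geodetic conventions may differ), shows that cell width shrinks with the cosine of latitude, and counts the cells in a full global 2.5-minute grid, of which the land cells mentioned above are a subset.

```python
import math

# Assumption: spherical Earth using the WGS84 equatorial radius (km).
EQUATORIAL_RADIUS_KM = 6378.137


def cell_width_km(arc_minutes: float, latitude_deg: float = 0.0) -> float:
    """East-west width of a grid cell spanning `arc_minutes` of longitude."""
    km_per_arcmin = 2 * math.pi * EQUATORIAL_RADIUS_KM / (360 * 60)
    # Meridians converge, so the width scales with the cosine of latitude.
    return arc_minutes * km_per_arcmin * math.cos(math.radians(latitude_deg))


def global_cell_count(arc_minutes: float) -> int:
    """Number of cells in a full global grid (land and ocean)."""
    columns = round(360 * 60 / arc_minutes)
    rows = round(180 * 60 / arc_minutes)
    return columns * rows


for res in (0.5, 2.5):
    print(f"{res}-minute cell: {cell_width_km(res):.2f} km wide at the equator, "
          f"{cell_width_km(res, 60.0):.2f} km at 60 degrees latitude")
print(global_cell_count(2.5))  # 8640 columns x 4320 rows
```

Under these assumptions a 2.5-minute cell is roughly 4.6 km wide at the equator but only half that at 60° latitude, which is why the resolutions are quoted "at the equator".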

But even if cohort-component modeling on such scales were technically possible – in a sense the ultimate spatial disaggregation of world population projections – an important question is whether the accuracy of the resulting projections would justify the vast effort required, compared with simpler methods. Studies of the relative accuracy of simpler versus complex projection methods suggest that the complex methods may perform no better than simple methods – with respect to forecasts of total population – when retrospective analyses of accuracy have been done (Smith 1997; Long 1995; White 1954). In his survey of published ex post comparisons of projection accuracy from simpler methods as compared to more complex causal and cohort models, Smith (1997: 560–561) concluded that "… There is a substantial body of evidence … supporting the conclusion that more complex [population projection] models generally do not lead to more accurate forecasts of total population than can be achieved with simpler models …." Alternative viewpoints on this question are presented in Beaumont and Isserman (1987), Ahlburg (1995), and Keyfitz (1981).

We are aware of only three other published gridded world population projections using simple methods. In the first case, Gaffin et al. (2004) attempted to model a 2025 GPW projection using a "constant-share" method. This method requires a launch year (1990) and a national projection (derived from the 1996 Revision of the United Nations' World Population Prospects). The share of national population in the grid cell is calculated for the launch year and held constant for any future projection. If a grid cell holds 1% of national population in 1990, it is assigned 1% of the projected 2025 national population. This forces all small area trends to conform to the large area trends, so subareas that are losing population prior to the launch year will be seen to gain population thereafter (provided that the large area is gaining population). This kind of trend reversal was noted as problematic and led to the present attempt by the authors to apply more sophisticated ratio methods. Note, however, that Smith et al. (2001) still define the methods we apply in this chapter as "simple" projection methods compared to other trend methods (such as ARIMA time series models).
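A minimal sketch of the constant-share rule described above, using hypothetical numbers: a subarea that shrank from 60 to 50 people before a 1990 launch year, in a nation of 1,000 projected to reach 1,300, keeps its 5% share and is therefore projected to grow. This reproduces the trend reversal noted above.

```python
def constant_share(p_sub_launch: float, p_nat_launch: float,
                   p_nat_target: float) -> float:
    """Hold the subarea's launch-year share of national population fixed."""
    return p_sub_launch * p_nat_target / p_nat_launch


# Hypothetical subarea: 60 people before the launch year, 50 at launch (1990).
pre_launch, launch = 60.0, 50.0
projected = constant_share(launch, p_nat_launch=1000.0, p_nat_target=1300.0)
print(projected)  # 65.0: the earlier decline turns into growth
```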

Thornton et al. (2002) work with historical data at the administrative level within developing countries. They used 1980–1990 administrative unit population growth rates and held these rates constant throughout the forecast period to 2050. Gridding of the administrative areas was then applied using a smoothing algorithm. Their method does not ensure consistency with the United Nations national projections for the countries they study, and discrepancies with the UN 2050 projections arise. In cases where these discrepancies were large, the results were rescaled to match the UN figures.

Finally, the Center for International Earth Science Information Network (CIESIN 2005b) produced future estimates of population out to 2015, released in conjunction with their Gridded Population of the World Version 3 (CIESIN 2005a). These estimates are based on logarithmic extrapolation of historical data at the administrative level within each country. For some countries, administrative boundaries are proprietary and not available for release. A scale factor was applied to all subareas to force compliance with the UN national projections. Population was then distributed over the grid using a proportional allocation method.

In contrast with Thornton et al. (2002) and CIESIN (2005b), we project population at the grid cell level. Our projections combine two different extrapolation methods so as to reduce anomalous results, and we also build agreement with the UN projections into the algorithms.

3 Methods

Trend extrapolation methods, long ignored in favor of cohort-component methods, have made a comeback in recent years due to theoretical advances and their relatively low cost and data requirements (Smith et al. 2001). Ratio methods rely on the existence of independent projections of larger areas. While most small-area projections use states or other administrative boundaries as the subarea of a larger administrative unit, in this chapter, we treat the Gridded Population of the World Version 2 grid cells (described below) as subareas of countries.

To carry out the GIS calculations, we used ArcView 8.3 (ESRI 2002) with ESRI's Spatial Analyst extension for analysis of raster grids. Spatial Analyst allows the user to perform various logical and mathematical operations on grids. Among the simplest operations is arithmetic. Addition, for example, will generate an output grid where each grid cell is assigned a value equal to the sum of the values of the corresponding cells from each input grid (that is, those cells which are in the same spatial location). The result is much the same as creating a spreadsheet whose A1 cell is equal to the sum of the A1 cells from several other spreadsheets.
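The cell-by-cell addition described above can be sketched in plain Python with nested lists standing in for co-registered rasters. This is not the Spatial Analyst API; the function name is hypothetical, and treating None as a missing (NoData) cell that propagates to the output is an assumption of the sketch.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def add_grids(*grids: Grid) -> Grid:
    """Cell-by-cell addition of grids with identical dimensions.

    Each output cell is the sum of the cells at the same row and column in
    every input grid; a None (missing) input cell yields a None output cell.
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    for g in grids:
        if len(g) != rows or any(len(row) != cols for row in g):
            raise ValueError("all grids must have the same dimensions")
    out: Grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            cells = [g[i][j] for g in grids]
            row.append(None if any(c is None for c in cells) else sum(cells))
        out.append(row)
    return out


print(add_grids([[1, 2], [3, None]], [[10, 20], [30, 40]]))
# [[11, 22], [33, None]]
```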

For our base year maps, we work with the 1990 and 1995 Gridded Population of the World, Version 2, databases (GPWv2), maintained and disseminated by CIESIN at Columbia University. These data are partly historical and partly estimated. For example, the United States conducted censuses in 1990 and 2000, so the 1990 data in GPWv2 represent actual census counts, while the 1995 data represent a population estimate between census years.

The GPWv2 and GPWv3 datasets are available at various resolutions down to 2.5 min, or ~4.6 km at the equator (cells narrow in the east–west direction, and so shrink in area, towards the poles). The population is allocated in proportion to the area of each grid cell, with grid cells crossing administrative boundaries assigned an area-proportionate population from each administrative unit. Year 2000 data are now available as part of GPWv3, but we rely on the 1990 and 1995 data from GPWv2, which were available when this analysis was conducted.
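Area-proportional allocation can be sketched as follows, assuming the overlap area of every (grid cell, administrative unit) pair is already known and that the cells tile each unit completely. The identifiers and data layout are hypothetical; GPW's actual processing chain is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allocate_by_area(unit_pop: Dict[str, float],
                     overlaps: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Areal weighting of administrative-unit populations onto grid cells.

    overlaps holds (cell_id, unit_id, overlap_area) tuples. A unit's total
    area is taken as the sum of its overlaps, i.e. the cells are assumed to
    tile each unit completely.
    """
    overlaps = list(overlaps)
    unit_area: Dict[str, float] = defaultdict(float)
    for _, unit, area in overlaps:
        unit_area[unit] += area
    cell_pop: Dict[str, float] = defaultdict(float)
    for cell, unit, area in overlaps:
        # A boundary-crossing cell receives a share from each unit it touches.
        cell_pop[cell] += unit_pop[unit] * area / unit_area[unit]
    return dict(cell_pop)


pops = {"A": 1000.0, "B": 500.0}
pieces = [("c1", "A", 6.0), ("c2", "A", 4.0), ("c2", "B", 5.0), ("c3", "B", 5.0)]
print(allocate_by_area(pops, pieces))
# {'c1': 600.0, 'c2': 650.0, 'c3': 250.0}
```

Cell c2 straddles the boundary and receives 400 people from unit A plus 250 from unit B, while the total of 1,500 is preserved.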

The two ratio methods that we apply require two historical data points: a base year (1990) and a launch year (1995). The first is referred to as the “shift-share” method by Smith et al. (2001). With this, one calculates the annual rate of change in the subarea fractional share of national population between the base year and the launch year:

$$ {f_{\rm{subarea}}}(1995) = \frac{{\displaystyle \left[ {\frac{{{P_{\rm{subarea}}}(1995)}}{{{P_{\rm{national}}}(1995)}} - \frac{{{P_{\rm{subarea}}}(1990)}}{{{P_{\rm{national}}}(1990)}}} \right]}}{{1995 - 1990}}. $$
(2.1)

This trend factor is held fixed and extrapolated to the projection year – 2025 for this study – to yield an extrapolated share. The extrapolated share is then applied to the large area projection for 2025 to generate the small area projection for 2025:

$${P_{\rm{subarea}}}(2025) = {P_{\rm{national}}}{(2025)} \cdot \left[ {\frac{{{P_{\rm{subarea}}}(1995)}}{{{P_{\rm{national}}}{(1995)}}} + {f_{\rm{subarea}}}{(1995)} \times {[2025 - 1995]}} \right]. $$
(2.2)

This method is known to suffer from face validity problems (Smith et al. 2001), such as trend reversals and negative population projections. Both problems occurred in the present projection and are discussed in the results section.
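Equations (2.1) and (2.2) translate directly into a few lines of code. The sketch below uses a hypothetical rural cell that fell from 200 to 150 people between 1990 and 1995 while the nation grew from 1,000,000 to 1,050,000. With a 2025 national total of 1,200,000 (also hypothetical), the extrapolated share turns negative, illustrating the negative-population problem just mentioned.

```python
def shift_share(p_sub_base, p_nat_base, p_sub_launch, p_nat_launch,
                p_nat_target, base_year=1990, launch_year=1995,
                target_year=2025):
    """Shift-share projection following Eqs. (2.1) and (2.2)."""
    share_base = p_sub_base / p_nat_base
    share_launch = p_sub_launch / p_nat_launch
    # Eq. (2.1): annual change in the subarea's share of national population.
    f = (share_launch - share_base) / (launch_year - base_year)
    # Eq. (2.2): extrapolate the share linearly, then apply it to the target.
    return p_nat_target * (share_launch + f * (target_year - launch_year))


# Hypothetical declining rural cell in a growing nation.
cell_2025 = shift_share(p_sub_base=200, p_nat_base=1_000_000,
                        p_sub_launch=150, p_nat_launch=1_050_000,
                        p_nat_target=1_200_000)
print(round(cell_2025, 1))  # -240.0
```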

While we use the simplest shift-share method of linear extrapolation from two data points, Gabbour (1993) describes other shift-share methods that use more than two data points and nonlinear extrapolation methods (such as exponential and logistic). Since more than two data points are used, he calculates a Pearson correlation coefficient for each subarea and retains the method that produces the highest R². These methods remain as possibilities for future investigation.
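As a rough illustration of selecting among trend forms by goodness of fit, the sketch below fits linear and exponential trends to a hypothetical four-point share series and keeps the one with the higher R². It is not Gabbour's implementation: logistic curves are omitted, and computing R² on the original (share) scale for both fits is an assumption of this sketch.

```python
import math
from typing import List, Tuple


def _ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys: List[float], fitted: List[float]) -> float:
    """Coefficient of determination of fitted values against observations."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - ss_res / ss_tot


def best_share_trend(years: List[float], shares: List[float]):
    """Fit linear and exponential trends to one subarea's share series and
    return (name, R^2, trend function) for the better fit."""
    a, b = _ols(years, shares)
    la, lb = _ols(years, [math.log(s) for s in shares])  # log-linear fit

    def linear(t: float) -> float:
        return a + b * t

    def exponential(t: float) -> float:
        return math.exp(la + lb * t)

    candidates = [(name, r_squared(shares, [fn(t) for t in years]), fn)
                  for name, fn in (("linear", linear), ("exponential", exponential))]
    return max(candidates, key=lambda c: c[1])


years = [1980, 1985, 1990, 1995]
shares = [0.010 * 0.9 ** k for k in range(4)]  # hypothetical: -10% per step
name, r2, trend = best_share_trend(years, shares)
print(name, round(r2, 4), trend(2025))  # the exponential form fits best here
```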

The second method we employ is referred to as the “share-of-growth” method (Smith et al. 2001). This also requires a base year (1990) and a launch year (1995). We calculate the proportion of national growth that occurs in each subarea:

$$ {f_{\rm{subarea}}}(1995) = \left[ {\frac{{{P_{\rm{subarea}}}(1995) - {P_{\rm{subarea}}}(1990)}}{{{P_{\rm{national}}}(1995) - {P_{\rm{national}}}(1990)}}} \right]. $$
(2.3)

This fraction is then multiplied by the projected change in national population between the launch year and the projection year and added to the launch year subarea population to obtain the future subarea population projection.

$${P_{\rm{subarea}}}(2025)\! = {P_{\rm{subarea}}}(1995) \!+ {f_{\rm{subarea}}}(1995)\! \times \left[ {{P_{\rm{national}}}(2025) - {P_{\rm{national}}}(1995)} \right]. $$
(2.4)

Unlike the shift-share method, the share-of-growth method reverses a subarea's trend only if the United Nations projections show a trend reversal for the nation as a whole.
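Equations (2.3) and (2.4) can be sketched for every cell of a hypothetical three-cell nation. When the cells sum to the national totals, the $f_{\rm{subarea}}$ values sum to one, so the projections automatically sum to the national target. The cells also reverse direction only when the national projection does.

```python
from typing import List


def share_of_growth(base: List[float], launch: List[float],
                    nat_base: float, nat_launch: float,
                    nat_target: float) -> List[float]:
    """Share-of-growth projection of every subarea, Eqs. (2.3) and (2.4)."""
    d_nat = nat_launch - nat_base
    if d_nat == 0:
        raise ValueError("national population unchanged in base period; f is undefined")
    projected = []
    for b, l in zip(base, launch):
        f = (l - b) / d_nat                                  # Eq. (2.3)
        projected.append(l + f * (nat_target - nat_launch))  # Eq. (2.4)
    return projected


# Hypothetical nation of three cells growing from 1,000 to 1,100.
base, launch = [500.0, 300.0, 200.0], [560.0, 330.0, 210.0]
growth = share_of_growth(base, launch, 1000.0, 1100.0, nat_target=1300.0)
decline = share_of_growth(base, launch, 1000.0, 1100.0, nat_target=1000.0)
print([round(p, 1) for p in growth])   # every cell keeps growing
print([round(p, 1) for p in decline])  # every cell reverses with the nation
```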

Figure 2.1 depicts the four generic trend reversal cases that can occur with this method. In Fig. 2.1a, for example, both the nation and the subarea grow during the base period, and the UN projects a declining national population during the forecast period; the share-of-growth model will then forecast a declining population for the subarea during the projection period. Figures 2.1a and b correspond to a positive value of $f_{\rm{subarea}}$ in (2.3), and Figs. 2.1c and d to a negative value.
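The case distinction above depends only on the signs of three changes. A minimal sketch of such a classifier follows. The function name and its labels are hypothetical, and the "plausible" versus "anomalous" wording follows the discussion of Fig. 2.1 in Section 4.3.

```python
def reversal_case(d_sub_base: float, d_nat_base: float, d_nat_proj: float):
    """Classify a share-of-growth subarea projection by trend reversal.

    d_sub_base, d_nat_base: subarea and national change over the base period.
    d_nat_proj: projected national change over the forecast period.
    Returns None when the subarea's direction of change does not reverse.
    """
    if d_nat_base == 0:
        raise ValueError("f_subarea is undefined when national change is zero")
    national_reverses = (d_nat_base > 0) != (d_nat_proj > 0)
    if d_sub_base == 0 or d_nat_proj == 0 or not national_reverses:
        return None
    f = d_sub_base / d_nat_base
    # f > 0: subarea moved with the nation and reverses with it (Fig. 2.1a, b).
    # f < 0: subarea moved against the nation and now reverses (Fig. 2.1c, d).
    return "plausible (Fig. 2.1a, b)" if f > 0 else "anomalous (Fig. 2.1c, d)"


print(reversal_case(-5.0, -8.0, 220.0))  # declining cell, nation turns to growth
print(reversal_case(3.0, -8.0, 220.0))   # growing cell in a declining nation
print(reversal_case(3.0, 8.0, 220.0))    # no national reversal
```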

Fig. 2.1 Four generic types of trend reversals that can occur with the share-of-growth method (n national population; s subarea population)

4 Results

Comparisons of the shift-share and share-of-growth models with cohort-component projections lead us to strongly prefer share-of-growth. We discuss the results of this comparison, followed by problems specific to each model. We address in detail those circumstances under which share-of-growth fails to perform as well. Finally, we present our criteria for choosing between these models in creating a combined-model gridded population projection.

4.1 Comparison of the Simple Projections with US Census Bureau State Projections

We assess the ratio methods against US Census Bureau state-level projections (US Census Bureau 2005). The gridded projections for the United States are redone using all three ratio methods described (constant-share, shift-share, and share-of-growth) using the GPWv2 1990 and 1995 population grids and using the sum of Census Bureau state-level projections as the large area projection for the United States (which differs from the United Nations projection for the country). The grid cells are then aggregated by state to generate a total 2025 population for each state. We then assess forecast "accuracy" by treating the Census Bureau projections as "observed data." This is, therefore, not a genuine measure of accuracy. Rather, it is a measure of how these ratio methods compare to standard cohort-component methods. We do not assess bias; since ratio methods are designed to conform to an independent large area projection, state-level errors will cancel when summed, and bias is necessarily zero.
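The aggregation and comparison step can be sketched with flattened lists of cell values and state labels, which are stand-ins for the gridded data. In the hypothetical example, the gridded and reference figures share the same national total, so the signed numeric errors cancel exactly. This is why bias is not assessed.

```python
from collections import defaultdict
from typing import Dict, List


def zonal_sum(values: List[float], zones: List[str]) -> Dict[str, float]:
    """Sum grid-cell values by zone (here, the state each cell falls in)."""
    totals: Dict[str, float] = defaultdict(float)
    for value, zone in zip(values, zones):
        totals[zone] += value
    return dict(totals)


def compare_to_reference(cells: List[float], zones: List[str],
                         reference: Dict[str, float]) -> Dict[str, tuple]:
    """Return {state: (numeric error, absolute percentage error)}, treating
    the reference projections as 'observed'."""
    estimates = zonal_sum(cells, zones)
    result = {}
    for state, observed in reference.items():
        error = estimates.get(state, 0.0) - observed
        result[state] = (error, 100.0 * abs(error) / observed)
    return result


# Hypothetical: four cells in two states; both sets of figures total 200.
cells = [40.0, 60.0, 30.0, 70.0]
zones = ["A", "A", "B", "B"]
errors = compare_to_reference(cells, zones, {"A": 90.0, "B": 110.0})
print(errors)
print(sum(e for e, _ in errors.values()))  # signed errors cancel: 0.0
```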

Comparisons to the Census Bureau projections were made using standard mean absolute percentage error (MAPE) and geometric mean absolute percentage error (GMAPE). The mean is a reasonable measure of central tendency for a normally distributed variable, while the geometric mean is a better measure of central tendency for a variable whose logarithm is normally distributed. One way to determine whether a variable is normally distributed is to examine its quantile-normal plot, which plots the quantiles of the variable against those of a normally distributed theoretical variable having the same mean and standard deviation (Hamilton 1992). If the quantile-normal plot of a variable falls approximately on a straight line, the variable is approximately normally distributed.
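The two summary measures differ only in the kind of mean taken over the APEs. A minimal sketch with hypothetical, right-skewed APEs shows how much less the geometric mean is pulled by a single large error. Python's `statistics.geometric_mean` (Python 3.8+) computes the same quantity as `gmape` here.

```python
import math
from typing import Sequence


def mape(apes: Sequence[float]) -> float:
    """Mean absolute percentage error (arithmetic mean of the APEs)."""
    return sum(apes) / len(apes)


def gmape(apes: Sequence[float]) -> float:
    """Geometric mean of the APEs: exp of the mean log APE.

    Undefined if any APE is zero, since log(0) is not finite.
    """
    if any(a <= 0 for a in apes):
        raise ValueError("GMAPE requires strictly positive APEs")
    return math.exp(sum(math.log(a) for a in apes) / len(apes))


apes = [1.0, 10.0, 100.0]  # hypothetical, right-skewed
print(mape(apes), round(gmape(apes), 6))  # 37.0 10.0
```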

A variable whose logarithm is normally distributed is referred to as log-normally distributed. Log-normal distribution is typical of left-bounded, right-skewed variables such as absolute percentage errors (APEs). The log-normal plot of a variable is a plot of the natural log of the variable against a theoretical variable whose natural log is normally distributed and has the same mean and standard deviation. The quantile-normal plots for the share-of-growth APEs and the shift-share APEs do not fall on a straight line, calling into question the usefulness of MAPE as a measure of accuracy. The log-normal plots of the share-of-growth and shift-share APEs are closer to a straight line, justifying the reporting of GMAPE here.
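Whether a plot "falls on a straight line" can be summarized by the correlation between plotted theoretical and observed quantiles. This is a common probability-plot check, not necessarily the procedure used here. The sketch below builds quantile-normal points with the standard library's `statistics.NormalDist`. The plotting positions i/(n+1) and the synthetic log-normal APEs are assumptions of the sketch.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def normal_plot_points(values: Sequence[float]) -> List[Tuple[float, float]]:
    """(theoretical, observed) pairs for a quantile-normal plot.

    The theoretical variable is normal with the sample mean and standard
    deviation; plotting positions i/(n+1) are an assumption of this sketch.
    """
    xs = sorted(values)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    return [(reference.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


def straightness(points: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation of the plotted pairs; 1.0 is a perfectly straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical right-skewed APEs whose logarithms are exactly normal quantiles.
z = [NormalDist().inv_cdf(i / 21) for i in range(1, 21)]
apes = [math.exp(1.5 * q) for q in z]
raw_r = straightness(normal_plot_points(apes))
log_r = straightness(normal_plot_points([math.log(a) for a in apes]))
print(round(raw_r, 3), round(log_r, 3))  # the log-normal plot is straighter
```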

We also note that high outliers (which occurred in our case for small states such as Nevada and the District of Columbia) suggest the utility of transforming the APEs. Emerson and Stoto suggest a maximum of 20 for the ratio of the highest to lowest APE (Emerson and Stoto 1983, cited in Swanson et al. 2000). For all three models we consider, the ratio of the highest to lowest APE is larger than 20. For example, for the constant-share model, the highest APE (24.90% for West Virginia) is approximately 138 times as large as the lowest APE (0.18% for Maryland). For the shift-share and share-of-growth models, this ratio is 267 and 103, respectively. We therefore also report our comparisons in MAPE-R (Table 2.1). This measure of accuracy applies the modified Box–Cox transformation described by Swanson et al. (2000) and then uses regression to re-express the transformed APEs in the same scale as the original APEs.
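The Emerson–Stoto check reduces to one ratio. Applied to the constant-share extremes reported above (West Virginia and Maryland), this minimal sketch reproduces the factor of about 138. The function names are hypothetical.

```python
from typing import Sequence


def max_min_ape_ratio(apes: Sequence[float]) -> float:
    """Ratio of the largest to the smallest absolute percentage error."""
    return max(apes) / min(apes)


def suggests_transformation(apes: Sequence[float], limit: float = 20.0) -> bool:
    """True if the ratio exceeds the Emerson-Stoto guideline of 20."""
    return max_min_ape_ratio(apes) > limit


# Constant-share extremes: West Virginia 24.90%, Maryland 0.18%.
extremes = [24.90, 0.18]
print(round(max_min_ape_ratio(extremes)), suggests_transformation(extremes))
# 138 True
```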

Table 2.1 Absolute percentage error differences between the three ratio method 2025 projections for the U.S. states and the Census Bureau 2025 projections

In conforming to Census Bureau cohort-component projections, the share-of-growth projection method performs noticeably better than either of the other ratio methods. In comparing shift-share with constant-share, we do not feel that the MAPE metric is appropriate for reasons cited above, and we do not feel that the GMAPE metric is appropriate because the APEs produced by the constant-share model are not log-normally distributed. We therefore use the MAPE-R metric for comparison purposes, and by this measure, shift-share appears to perform slightly better than constant-share (9.18% as opposed to 9.29%).

4.2 Problems of the Shift-Share Method

The shift-share method suffers from face validity problems. Like the constant-share method used by Gaffin et al. (2004), it produces trend reversals. A subarea that is growing, but more slowly than the national population, will have a negative trend in its population share. For short-horizon projections, this is unlikely to be a problem, but over long horizons the trend can drive the share low enough to show a population loss even though both the subarea and the nation grew between the base and launch years. Additionally, this method is known to perform poorly for areas losing population. Since many of the GPW grid cells losing population are rural areas that already have low populations (and therefore a very low share of national population), this method can lead to projections of negative population shares (and therefore negative population, an obvious impossibility in human terms). Figure 2.2 shows the areas of the world in which the shift-share method produced negative grid cells.
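How quickly a negative share trend bites can be worked out directly from (2.1): the extrapolated share reaches zero after share(1995)/(-f) years. The sketch uses a hypothetical subarea that grew 2% between 1990 and 1995 while its nation grew 10%. Its share hits zero about 64 years after launch. Already by 2025, with an assumed national total of 1,300,000, it is projected to shrink despite having grown.

```python
def shift_share_trend(p_sub_base, p_nat_base, p_sub_launch, p_nat_launch,
                      base_year=1990, launch_year=1995):
    """Return the launch-year share and its annual change, Eq. (2.1)."""
    share_launch = p_sub_launch / p_nat_launch
    f = (share_launch - p_sub_base / p_nat_base) / (launch_year - base_year)
    return share_launch, f


def years_until_zero_share(share_launch, f):
    """Years after the launch year at which the extrapolated share reaches 0."""
    return share_launch / -f if f < 0 else float("inf")


# Hypothetical subarea growing 2% in 1990-1995 while the nation grows 10%.
share, f = shift_share_trend(1_000, 1_000_000, 1_020, 1_100_000)
print(round(years_until_zero_share(share, f), 2))  # about 63.75 years
p_2025 = 1_300_000 * (share + f * 30)  # hypothetical 2025 national total
print(round(p_2025))  # about 638 people, down from 1,020
```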

Fig. 2.2 Areas with negative 2025 projected population densities using the shift-share method

4.3 Problems of the Share-of-Growth Method

Negative populations are also possible with the share-of-growth method (Fig. 2.3), though not as widespread as with the shift-share method (62 countries with any negative grid cells using share-of-growth, versus 89 such countries with the shift-share method). On the other hand, the share-of-growth method generated grid cells with very large negative values for countries experiencing certain kinds of demographic trends. This is best explained in relation to the issue of trend reversals, summarized in Fig. 2.1.

Fig. 2.3 Areas with negative 2025 projected population densities using the share-of-growth method

Reversals like those in Fig. 2.1a and b are plausible on their face, because the subarea is experiencing the same trend reversal as the nation. However, reversals such as those in Fig. 2.1c and d are anomalous in the absence of further demographic information on the region.

Portugal exemplifies the trend reversal danger of the share-of-growth model. Portugal experienced urban growth and rural decline between the base year and the launch year, with an overall population decrease of 0.08% (the United Nations now estimates that Portugal's population grew from 1990 to 1995; we persist in using the 2002 Revision because it provides a useful example of this pitfall of the share-of-growth method). From the launch year to the target year, however, Portugal is projected to grow by 2.22%. Rural areas in Portugal correspond to Fig. 2.1b and urban areas to Fig. 2.1c. Because base-period growth occurred in the urban areas while the national trend changes sign, the share-of-growth method reverses the trend for urban areas and increases the population in rural areas. These increases in a large number of rural grid cells are offset by decreases in a small number of urban grid cells, with the end result that this method projects negative populations in the urban grid cells. The sum of all negative grid cells comes to 24.84% of the projected national population. Furthermore, zeroing out and renormalizing in this case would produce an absurd distribution of population, with zero population in formerly urban areas.
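Combining (2.3) and (2.4) shows why the Portuguese case is so extreme. Each subarea's projected change is its base-period change scaled by the ratio of projected to base-period national change. Reading the quoted percentages as changes relative to the 1990 and 1995 national populations, respectively (so that the 1995 population is 0.9992 times the 1990 population), that scale factor is roughly −27.7:

$$ P_{\rm{subarea}}(2025) - P_{\rm{subarea}}(1995) = \frac{P_{\rm{national}}(2025) - P_{\rm{national}}(1995)}{P_{\rm{national}}(1995) - P_{\rm{national}}(1990)} \left[ P_{\rm{subarea}}(1995) - P_{\rm{subarea}}(1990) \right] \approx \frac{0.0222 \times 0.9992}{-0.0008} \left[ P_{\rm{subarea}}(1995) - P_{\rm{subarea}}(1990) \right] \approx -27.7 \left[ P_{\rm{subarea}}(1995) - P_{\rm{subarea}}(1990) \right]. $$

An urban cell that gained a hypothetical 1,000 people between 1990 and 1995 would thus be projected to lose roughly 27,700 by 2025. That loss drives the cell negative unless its 1995 population already exceeded that figure.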

The most spectacular failure of the share-of-growth method did not, however, occur for a country experiencing trend reversal. It occurred for Russia, where the national population declined in both periods but the rate of decline was much larger in the projection period than in the base period. During the 30-year projection period, Russia was projected to lose 131 times as many people as it lost during the 5-year base period, an annual rate of loss 21.8 times faster than in the base period. The next largest problem countries were Lithuania (with 10.3 times the rate of loss) and Ukraine (with 3.8 times the rate of loss).
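With a ratio of projected to base-period national change of 131, (2.4) multiplies every subarea's 1990–1995 change by 131. For a grid cell that lost population in the base period, this gives

$$ P_{\rm{subarea}}(2025) = P_{\rm{subarea}}(1995) - 131 \left[ P_{\rm{subarea}}(1990) - P_{\rm{subarea}}(1995) \right], $$

which is negative whenever the 1990–1995 loss exceeds $P_{\rm{subarea}}(1995)/131$, that is, about 0.76% of the cell's 1995 population. The annual-rate comparison follows from dividing the multiplier by the ratio of period lengths: $131 \times 5/30 \approx 21.8$.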

4.4 Combined Model Gridded Projection

As noted in Section 4.1, the share-of-growth projection method performs noticeably better than either of the other ratio methods in reproducing the Census Bureau projections. For this reason, and because the shift-share method appears more susceptible to negative population projections, we selected share-of-growth as the default method for each country in our world projection. We rule out share-of-growth for those countries where the direction of national population change reverses.

Although we performed these calculations for the entire world at once, each country’s grid cells are independently trended and normalized to that country’s national (historical and projected) population. Examining the results of the shift-share and share-of-growth methods for each country confirmed that for the vast majority of countries, the share-of-growth model had the best face validity.

As mentioned above, both methods can produce negative grid cells under the wrong conditions. The negative grid cells in each country were summed and expressed as a percentage of the projected national population. Particularly poor performers are listed in Table 2.2. One of the criteria used to determine the best projection was which method projected the smallest percentage of negative population. In general, a country that performed poorly with one model performed reasonably well with the other one. For many countries, we were able to eliminate the negative grid cell problem entirely by choosing between the models. Nonetheless, the combined model grid still had some countries with negative grid cells – that is, where both methods projected a negative population for some grid cells for a given country – though nothing larger than 2.69% (Gabon, which appears in Table 2.2 rounded to 3%). These remaining negative cells were set to zero and the grid was renormalized to the United Nations national projections. The final combined map for 2025 is shown in Fig. 2.4. A 1995–2025 change map for the combined model is shown in Fig. 2.5.
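The per-country choice and clean-up described above can be sketched as follows. The selection rule is a hypothetical simplification of the criteria in the text: share-of-growth is excluded when the national trend reverses, otherwise the method with the smaller negative percentage is kept, and ties go to share-of-growth. Negative cells are then zeroed and the remaining cells rescaled to the national projection.

```python
from typing import Dict, List, Tuple


def negative_percent(cells: List[float], national_total: float) -> float:
    """Negative person counts as a percentage of the national projection."""
    return 100.0 * -sum(c for c in cells if c < 0) / national_total


def combine_country(sog: List[float], shift: List[float], national_total: float,
                    national_reversal: bool) -> Tuple[str, List[float]]:
    """Hypothetical simplified selection rule, then zero-and-renormalize."""
    candidates: Dict[str, List[float]] = {"shift-share": shift}
    if not national_reversal:
        # Listed first so that it wins ties in min() below.
        candidates = {"share-of-growth": sog, **candidates}
    name = min(candidates, key=lambda k: negative_percent(candidates[k], national_total))
    clipped = [max(c, 0.0) for c in candidates[name]]
    scale = national_total / sum(clipped)  # restore the national projection
    return name, [c * scale for c in clipped]


name, cells = combine_country(sog=[600.0, 500.0, -100.0], shift=[550.0, 460.0, -10.0],
                              national_total=1000.0, national_reversal=False)
print(name, [round(c, 1) for c in cells])  # shift-share wins in this example
```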

Table 2.2 Countries for which at least one projection method resulted in negative person counts totaling greater than 5% of the projected national population

Fig. 2.4 Combined model using the share-of-growth method for most countries and, to minimize anomalies, the shift-share method for the remainder

Fig. 2.5 1995–2025 population density change map

5 Conclusions

We have described the use of GIS for small-area projections using ratio trend extrapolation methods and have commented more generally on the performance of these ratio methods. Our end result is a gridded world population projection. A wall chart based on this gridded projection is available as a PDF download at http://ccsr.columbia.edu/?id=research_population. It is important to bear in mind that all population projections are conditional forecasts, based on assumptions about future trends, and these maps are no exception. In particular, the mapped results are sensitive to the population changes that occurred in the reference period (in this case 1990–1995). For the same reason, we cannot at present incorporate known changes in population distribution that occurred after the reference period.

This projection can be improved and extended in a number of ways. First, we rely on general considerations of face validity in choosing which method to apply to each country. A demographer closely familiar with a particular country may have very good reasons for criticizing our choice. In fact, we cannot rule out the possibility that the simple constant-share model may produce the most plausible projection for a given country. Without applying any new methods, the projection could be improved by having demographers with expertise in each country review it. Indeed, with sufficient financial and other support, the projections could become the vehicle for a dialog with such demographers. This in turn could be expanded into further explorations of human–biodiversity interactions around the world.

Second, the projection can be improved by using high-quality subnational projections, especially for countries covering large areas, such as Russia, Canada, the United States, China, Brazil, and Australia. Countries for which subnational projections are not available can at least be split into urban and rural areas so that United Nations urban–rural projections can be applied. As part of the GPWv3 beta test, CIESIN (2004) has released a GPW-compatible grid of urban extents developed under the Global Urban–Rural Mapping Project (GRUMP). Using this urban extents grid, urban and rural populations can be trended and normalized independently.
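One way to trend and normalize urban and rural cells independently is to run share-of-growth separately within each group, with projected urban and rural totals as the "large areas". The sketch below assumes exactly that. The boolean mask is a hypothetical stand-in for an urban extents grid (no GRUMP file format is assumed), and all numbers are hypothetical.

```python
from typing import List


def share_of_growth_group(base: List[float], launch: List[float],
                          target_total: float) -> List[float]:
    """Share-of-growth (Eqs. 2.3 and 2.4) within one group of cells, using the
    group's own base- and launch-year totals as the 'large area'."""
    d_group = sum(launch) - sum(base)
    if d_group == 0:
        raise ValueError("group total unchanged in base period")
    growth = target_total - sum(launch)
    return [l + (l - b) / d_group * growth for b, l in zip(base, launch)]


def project_urban_rural(base: List[float], launch: List[float],
                        is_urban: List[bool], urban_target: float,
                        rural_target: float) -> List[float]:
    """Trend and normalize urban and rural cells independently."""
    out = [0.0] * len(base)
    for flag, target in ((True, urban_target), (False, rural_target)):
        idx = [i for i, u in enumerate(is_urban) if u == flag]
        group = share_of_growth_group([base[i] for i in idx],
                                      [launch[i] for i in idx], target)
        for i, value in zip(idx, group):
            out[i] = value
    return out


cells = project_urban_rural(base=[100.0, 200.0, 300.0, 300.0],
                            launch=[120.0, 240.0, 290.0, 280.0],
                            is_urban=[True, True, False, False],
                            urban_target=500.0, rural_target=540.0)
print([round(c, 2) for c in cells])
```

Each group then sums to its own projected total, so urban growth cannot be offset by implausible rural losses, or vice versa.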

Third, alternative trend extrapolation methods can be explored. When GPW has matured to the point where several grids are available, regression-based methods such as ordinary least squares, logistic, and ARIMA models can be used.

Finally, of course, as the GPW database continues to be updated, the reference period will lengthen and capture a longer time span, helping to average out more episodic changes in population that may occur in a 5-year or 10-year period.

Despite the uncertainties in mapping population projections – uncertainties which apply to the projections themselves – such exercises can tell us much about the likely future of population distribution worldwide, and hence, perhaps, the future of biodiversity as well. Seeing well-defined geographic areas of projected population decline in sub-Saharan Africa, for example, serves as a reminder that urbanization powerfully influences rural population density despite high natural increase in such areas. The maps make clear that population change is anything but uniform spatially, and demographic diversity is characteristic of nations as well as of the world as a whole. Finally, such maps provide a powerful visual reminder that, despite large areas of projected population decline, the world overall will have many more people in 2025 than it does today. For the vast majority of nonhuman species, absent surprises or sustained efforts that slow human fertility more than the medium projection anticipates, it is likely to be a less accommodating planet.