1 Introduction

In GIScience, different conceptual models underlie the representation of space and the modeling of spatial processes. The two basic models of space are: (1) a field-based one that views geographic data as collections of spatial distributions describing the variation of an attribute of interest over space, and (2) an entity-based model in which space is a void except where occupied by objects of interest (Couclelis 1992). In location analysis, different models of space can lead to very different solutional methodologies. For example, Weber’s least-cost model of industrial location (Weber 1929) in a field perspective is a problem of calculating a least-cost surface in which the “best” location is the lowest point of the surface. This location can be identified visually in a 2.5D rendering of the surface or in a 2D contour map of isodapanes. In an entity perspective, the best location is a cartesian co-ordinate whose (x, y) values can be determined in an iterative optimization algorithm (see for example Kuhn and Kuenne 1962; Cooper 1963).

The representation of a phenomenon also depends on its level of abstraction. Moving from iconic models of space, such as photographs, to analog models of space, such as maps, and then to symbolic models of space expressed in the language of mathematics requires increasing levels of abstraction. At each higher level of abstraction, properties are substituted that are easier to manipulate (Taylor 1977). However, there is a danger that over-simplification might occur in the abstraction process. Point features are the geometric dual of polygon features (White 1980), which enables centroid points to be frequently substituted for polygon features when moving from analog to symbolic models. The resulting aggregation effects of that substitution in location–allocation modeling has been extensively studied (see Hillsman and Rhoda 1978; Goodchild 1979; Casillas 1987; Daskin et al. 1989; Current and Schilling 1987, 1990; Fotheringham et al. 1995; Murray and Gottsegen 1997; Erkut and Bozkaya 1999; Francis et al. 2000; Plastria 2001; Murray and O’Kelly 2002; Emir-Farinas and Francis 2005).

Geographic information system (GIS) technology allows an integration of the different conceptualizations of space as well as a union of models of differing levels of abstraction. It affords a bridge that links together Bunge’s (1962) traverses from pre-map to maps to mathematics. The geometric view of analog models co-exists with the numerical view of symbolic models, and spatial relationships that are visually recognizable in the map can also be recorded and displayed in attribute tables. As such, GIS has the data representation capabilities for reducing representation error in facility location problems and manipulation capabilities that permit the explicit measurement of spatial relationships between objects in facility models (Miller 1996, 2003; Church 2002; Kwan et al. 2003). The purpose of this paper is to use the spatial manipulation capabilities of GIS to construct spatial objects that encapsulate the algebraic structure of the maximum capture location problem (MAXCAP) (ReVelle 1986) that moves beyond the traditional node-in-network representation used in location–allocation modeling.

2 MAXCAP model formulation

The MAXCAP facility location problem was developed to ascertain the location a set of new facilities within an existing competitive environment (ReVelle 1986):

$$ \hbox{Maximize}\quad {\sum\limits_{i \in I} {a_{i} y_{i}}} + {\sum\limits_{i \in I} {(a_{i} /2)z_{i}}} $$
(1)

subject to:

$$ {\mathop \sum \limits_{j \in J}}h_{ij} x_{j} - y_{i} \geq 0 \quad \forall i $$
(2)
$$ {\mathop \sum \limits_{j \in J}}c_{ij} x_{j} - z_{i} \geq 0 \quad \forall i $$
(3)
$$ y_{i} + z_{i} \leq 1 \quad \forall i $$
(4)
$$ {\sum\limits_{j \in J} {x_{j}}} = p $$
(5)
$$ y_{i}, z_{i}, x_{j} = (0,1)\quad \forall i\;\hbox{and}\;j $$
(6)

where:

I :

= the set of demand nodes;

J :

= the set of eligible facility sites;

h ij :

= 1 if the jth eligible facility site is located closer to node i than the nearest competitor site, and 0 otherwise;

c ij :

= 1 if the jth eligible facility site is located equidistant to node i as the nearest competitor site, and 0 otherwise;

y i :

= 1 if a new facility is located closer to node i than the nearest existing competitor site, and 0 otherwise;

z i :

= 1 if a new facility is located equidistant to node i as the nearest existing competitor site, and 0 otherwise;

x j :

= 1 if a facility is sited at j, and 0 otherwise;

a i :

= the market size associated with demand node i;

p :

= the number of new facilities to be located.

The objective function (1) maximizes that total demand captured by the set of new facilities. Constraint sets (2) and (3) ensure that demand can only be captured if a new facility is located closer than or equidistant to any existing competitor site respectively. Constraint set (4) ensures that only the full demand or half of the demand at node i is captured but not both. Constraint (5) fixes the number of new facilities to be opened. Because of the node representation, ReVelle also noted that there are two special cases in which (1) two or more original closest existing facilities are equidistant to node i or (2) two or more entering facilities may be equidistant to the closest existing facility; he then presented an ad hoc method to handle each of these cases.

ReVelle (1986) also showed how the MAXCAP problem could be written in a p-median formulation by the appropriate choice of coefficients in the objective function of the p-median problem. First, let I, J, x j , a i and p be defined as before; then let D i be the distance from node i to the nearest existing competitor facility and S i be the set of all potential facility sites for node i whose distance, d ij , to node i is less than or equal to D i . The p-median formulation of the MAXCAP problem is:

$$ \hbox{Maximize} \quad {\sum\limits_{i \in I} }{\sum\limits_{j \in S_{i}} {a_{ij}^{\prime} y_{ij}}} $$
(7)

subject to:

$$ {\mathop \sum \limits_{j \in S_{i}}}y_{ij} \leq 1 \quad \forall i $$
(8)
$$ x_{j} - y_{ij} \geq 0 \quad \forall i \;\hbox{and}\;j\;\hbox{for\;which}\; j \in S_{i} $$
(9)
$$ {\sum\limits_{j \in J} {x_{j}}} = p $$
(10)
$$ y_{ij}, x_{j} = (0,1) \quad\forall i\;\hbox{and}\; j $$
(11)

where:

a ij :

= a i if d ij < S i or a i /2 if d ij S i ;

y ij :

= 1 if node i is assigned to facility j, and 0 otherwise.

ReVelle’s original formulation had a strict equality relationship for constraint set (8) but allowing the relationship to be less than or equal permits one to eliminate all y ij variables whose d ij values is greater than S i which can greatly reduce the number of decision variables. Other authors (Church and ReVelle 1976; Hillsman 1984) have demonstrated a similar interrelationship between the p-median and the maximal covering problem. It has been proposed that the p-median model provides a unifying framework for a variety of different location–allocation problems (Hillsman 1984; Armstrong and Densham 1990) and that these other location–allocation models be solved using the p-median equivalent. One benefit of this approach is that specialized algorithms developed for the p-median problem could be applied to a wider set of models. However, a number of these specialized algorithms are heuristics (Teitz and Bart 1968; Densham and Rushton 1992) and are not guaranteed to find an optimal answer. Another advantage is that the p-median formulation permits a MAXCAP extension in which demand declines linearly with the distance to the nearest server (ReVelle 1986).

The main disadvantage of this approach is that it increases the computational complexity of the MAXCAP and maximal covering problems. The MAXCAP and maximal covering models have only single subscripted decision variables because these models only locate facilities, whereas the p-median problem has double-subscripted decision variables because it allocates as well as locates. This increases the number of decision variables and constraints. A second problem is that the p-median problem is prone to Source A, B, and C aggregation errors (Hillsman and Rhoda 1978; Current and Schilling 1987) whereas covering problems only have Source A and B aggregation errors (Current and Schilling 1990). The approach here is to retain only single subscripted variables and to simplify the original model structure and in the process eliminate the two special cases discussed above. A p-median formulation is only necessary if one is solving the extension in which demand declines linearly with the distance to the nearest server

Other extensions and variations of the MAXCAP problem have included weighted networks (Eiselt and Laporte 1989), preemptive strategies (Serra and ReVelle 1994), probabilistic demand (Benati 1999; Benati and Hansen 2002), hierarchical systems (Serra et al. 1992), and threshold demand (Serra et al. 1999; Colomé et al. 2003). The MAXCAP problem is in the tradition of spatially competitive models (Hotelling 1929; Ghosh and Craig 1984; Drezner 1994; Thill 2000; Fischer 2002; Plastria and Carrizosa 2004)

The MAXCAP problem is generally defined in terms of a stylized network of nodes and arcs in which the nodes correspond to point locations of demand and the links represent the shortest path connections between these nodes (ReVelle 1986; Eiselt and Laporte 1989). Usually, the set of existing competitor facilities is a subset of the demand nodes and the set of potential facility locations for the new entrant is the set of demand nodes. In a GIS environment, the actual transportation network can be used to represent the cost and direction of movement of demand to facilities. Existing facilities are located at their actual position along this network as point objects and any potential facility can also be located at its actual position on the transportation network.

In a GIS approach, only the set of facility sites need be given; the location and size of demand units are endogenously determined. This follows the strategy used by Church (1984) to find sets of potential facilities given demand locations, only it is reversed here. Assuming minimum product differentiation and that consumers patronize the nearest facility, each potential facility site can capture a region nearer to it than an existing competitor location. For a transportation network, these capture zones would correspond to the portion of network closer to a new facility than an existing one. For example in Fig. 1, points 1, 2 and 3 represent the locations of existing competitor facilities and point 4 is the location of an alternative new site. In Fig. 1a, the capture zone for site 4 is presented in black; the gray region is the zone captured by the three competitor sites. If the new entrant locates at an existing competitor site then it is assumed that the new entrant captures half the portion of the network since it is equally near as the one competitor site but closer than any other competitor site. In Fig. 1b, the capture zone for existing site 1 is shown in black against that of the other two existing sites; if the new entrant locates at 1 it will receive one-half of the demand in that zone.

Fig. 1
figure 1

a Capture zone of site 4—a new site. b Capture zone of site 1—one of the three existing sites

The capture zones for two or more potential sites are not necessarily mutually exclusive and may overlap. Figure 2 presents the intersection of capture zones associated with sites 1 and 4. The intersection of the all capture zones defines each unique demand region. Because each demand region is a subset of one or more larger capture zones, it may be partially or wholly captured by one or more facilities. Based on these demand regions having some spatial extent, the MAXCAP formulation presented here differs somewhat from the original formulations presented by ReVelle (1986). Like the original model, the formulation is still an extension of the maximal covering problem but has a structure closer to the algebraic formulation of the maximal covering problem (formulation II in Church and ReVelle 1974, p. 105) rather than its p-median equivalent. The new formulation minimizes the demand not captured by the new entry facilities:

$$ \hbox{Minimize}\quad{\sum\limits_{i \in I} {a_{i} y_{i}}} $$
(12)

subject to:

$$ {\mathop \sum \limits_{j \in J}}b_{ij} x_{j} + y_{i} \geq 1 \quad \forall i $$
(13)
$$ {\sum\limits_{j \in J} {x_{j}}} = p $$
(14)
$$ x_{j} = (0,1) $$
(15)
$$ y_{i}\geq 0 $$
(16)

where:

I :

= the set of demand regions;

J :

= the set of eligible facility sites;

y i :

= “uncaptured” proportion of a demand region;

x j :

= 1 if a facility is sited at j, and 0 otherwise;

b ij :

= 1 if the potential facility site captures the ith demand region and is not coincident to an existing competitor site; 0.5 if the potential facility site captures the ith demand region and is coincident with an existing competitor site; 0 if the potential facility site does not capture the ith demand region;

a i :

= the market size associated with demand region i;

p :

= the number of facilities to be located.

This revised formulation modifies the maximal covering problem by introducing a coefficient, b ij , that indicates the proportion of demand captured by a new facility. In the maximal covering problem b ij would be a dichotomous (0,1) coefficient, but b ij equals 0.5 whenever a new facility site captures a demand region and is coincident with an existing competitor site. Thus, y i is also not a dichotomous (0,1) variable. Constraint set (13) only allows y i to equal one when no new facility is closer to a demand region than a competitor facility. For this constraint, y i will be zero when a new facility is closer to demand than a competitor facility and 0.5 when the facility is the same distance as the nearest competitor facility. It should be noted that if none of the competitor sites were part of the set of potential new locations, b ij would be a (0,1) coefficient and the formulation would be identical to the maximal covering problem. Constraint (14) sets the number of facilities equal to a fixed value p. Given that the set of facilities sites is given, this formulation requires the estimation of the size and location of individual demand regions and their associated market size (a i ) by GIS operations.

Fig. 2
figure 2

The intersection of site 1 and site 4 capture zones

3 GIS processing methodology

Every linear programming model has a set of rows that correspond to the objective function and constraints of the problem, and a set of columns that correspond to the decision variables. In a geo-relational, vector GIS analysis, every data layer has a feature attribute table in which the set of rows correspond to geographic objects and the set of columns correspond to the attributes of the objects. The attributes can contain information on spatial properties of the objects such as their location, size or length, as well as intrinsic characteristics of the objects, or spatial relationships between pairs of objects. ArcGIS 9.1, the GIS software used in this research, is used to create a data layer in which data values of the attribute table correspond to the coefficients of the MAXCAP formulation given in constraint set (13) and the objective function (12). Once this attribute table is complete, it can be exported as input into a mathematical programming software package.

3.1 Identification of demand regions

As discussed above, it is first necessary to construct the capture zones associated with each potential facility site before demand regions can be identified. The capture zones associated with locating at competitor sites are found simultaneously by using the Network Analyst extension in the ArcMAP module. The “NEW SERVICE AREA” tool, with the competitor sites selected, returns the set of road segments nearest each facility within a facility layer. Because these zones are mutually exclusive, the resulting GIS layer of capture zones will have as many objects as existing facilities. Each individual capture zone is now selected and built as a separate map layer. Next, the capture zones associated with each of the other potential sites are found separately. The “NEW SERVICE AREA” tool is now used with the competitor sites and one additional site selected (see Fig. 1a). The capture zone associated with the additional site is then selected and built as a separate map layer. This process is repeated until all capture zones have been calculated and stored as separate map layers. Although it has been assumed that each potential facility site is of equal importance, differentiated facilities could be included in this approach as the “NEW SERVICE AREA” tool permits different distance weighting by facility site. Instead of each capture zone representing the nearest distance region it would now correspond to the nearest weighted-distance region.

Finally, each capture zone is given an areal extent by buffering its road network. Each capture zone must have an areal extent because the subsequent GIS operations require polygon objects. The width of the buffer can either correspond to a known setback width or can be set to a relatively small width so that the resulting buffer polygons are as similar to the road network as possible. The roads should also be buffered using the option to eliminate semi-circles at the extremities of the road network. Allowing the semi-circles to be added would create spurious intersection polygons in the next step. Each capture zone layer has one row and one column in its attribute table. The single value of the attribute table is set equal to 0.5 if the capture zone is associated with a competitor site and 1 if the capture zone is associated with a non-competitor site.

The demand regions are now constructed by using the “UNION” command in the ArcToolbox module to overlay the capture zones of each of the potential facility sites in succession. The final overlayed layer will result in a map layer with as many features as demand regions. In the associated attribute table, the columns correspond to the decision variables x j and the data values of the columns are the associated b ij coefficients. (It should be noted that if a p-median formulation was necessary, the non-zero b ij coefficients for the ith row of the attribute table at this stage correspond to the set of allocation variables, y ij , belonging to the set S i . Mulitplying these coefficents by a i give the a ij coefficents for each y ij .)

Finally, the columns containing the coefficients associated with the y i decision variables are successively added as new fields using the attribute table editor. After selecting a row of the table, the “FIELD CALCULATOR” is then used to set the value of the new field associated with the selected row equal to one and the values of the unselected rows will automatically be set to zero. The attribute table now contains all of the coefficents in constraint set (13).

The size of the constraint set (13) in this approach will be smaller than in the traditional demand node approach whenever the number of potential facilities is smaller than the number of demand nodes. In the traditional approach in which each node corresponds to the centroid of a demand polygon such as a census tract, the number of constraints in (13) is a function of the scale at which the demand units are given. For example, if there are 50 demand nodes then there would be 50 constraints. However, if at a larger scale the 50 nodes were disaggregated into 1,000 demand nodes, there would now be 1,000 constraints. In the GIS approach, however, the number of demand regions is strictly a function of the number of potential facilities. Adding a new potential facility site to the problem would increase the number of demand regions by approximately six new demand regions but changing the scale at which demand is given would not alter the number of demand regions. It would only affect the estimation of the market size associated with each demand region.

3.2 Establishing market size within each region

The methodology for estimating market size, a i , depends on the available data source. If actual individual customer data were available, the a i values could be calculated by first address-match geocoding each customer location to give it a geo-reference in a point feature map layer. These individual data points could then be aggregated into demand regions using the point-in-polygon operator. This would require that the buffer around the road network that comprises the demand regions was wider than the offset distance used in the geocoding procedure.

If the data source consists of customer information or a surrogate such as total population, which is aggregated into polygon features such as census units, some form of areal interpolation can be used in the estimation process. Although areal interpolation is used to estimate unknown values associated with areal units in a variety of situations that can arise in the processing of map data layers (Mrozinski and Cromley 1999), the technique is most frequently used to solve alternative geography problems in which data values (usually population totals) collected for one geography of source zones units are transferred to different geography of target zones (Goodchild and Lam 1980; Lam 1983; Flowerdew et al. 1991; Fisher and Langford 1996). A pycnophylactic restriction (Tobler 1979) is also imposed on the interpolation process to ensure that the data values associated with the source zones are preserved. Although a variety of areal interpolators have been developed, tests have shown that that dasymetric methods that employ different types of ancillary data improve the interpolated results (Langford and Unwin 1994). Ancillary data that have been used in dasymetric approaches to areal interpolation include remotely sensed land use/land cover data (Fisher and Langford 1996; Mennis 2003) and buffered road networks (Mrozinski and Cromley 1999).

The problem here is to transfer population data collected for a source set of census units to estimate the population (as a surrogate for market size) for the target set of the demand regions. Ancillary data in the form of a road network (used in the previous GIS processing step) already exists but certain road segments that affect travel times and distances such as interstate highways must now be removed before the buffer operation because no one lives along them. This research uses a singly-constrained dasymetric interpolation procedure (Mrozinski and Cromley 1999) to estimate the market size for each demand region. Because there is one a i value for each demand region, the values for a i can be stored in a column of the attribute table.

3.3 Adaptation of ReVelle’s extensions

ReVelle (1986) proposed two extensions to the simple MAXCAP problem. The first allowed the entering firm to have some facilities already in place. This can be implemented in our approach by including these sites along with the existing competitor sites in the first step when the “NEW SERVICE AREA” tool is used to estimate the mutually exclusive capture zones among all existing facilities. The market size for the capture zones associated with the existing sites of the entering firm is then estimated using the techniques discussed above and these capture zones and associated market demand (population) are removed from the system. The remaining steps of calculating the capture zones associated with each of the new potential sites, unioning all capture zones (except those that have been removed) to define each demand region and estimating the market size of these demand regions stay the same.

The second extension allowed demand for the service or product to linearly decline with distance from the facility. As previously noted this extension requires a p-median formulation of the MAXCAP problem. In this GIS approach outlined above, all market regions correspond to a common intersection region of one or more facility capture zones. Once all demand regions have been defined, the market size of the ith market region will be different for each facility associated with that intersection region. Because each demand region has an areal extent, there is no single distance between a demand region and a facility. If the centroid of the demand region was used to calculate an average distance, that would negate the attempts to reduce aggregation errors. Instead it is proposed that because the demand region is a composite of network links, the distance of each link to the facility site be calculated and the market demand associated with each link be estimated using areal interpolation. This market demand is then discounted by a linear distance decay function and the resulting discounted demands are summed over the demand region. This would reduce or eliminate Source A and B aggregation error but not Source C error. The final market demand captured by the entering firm will always be underestimated if two or more facilities associated with any intersection region are selected because all demand will be assigned to one or the other of these facilities rather than the nearest. However, this Source C error can easily be calculated using the discounted demand associated with the closest facility for each link in the demand region rather that discounted demand associated with the facility to which the model assigns it.

4 Test example

The test problem is to locate a set home improvement stores for a new entrant into the southeast New Hampshire market (Fig. 3). The region, the most populated part of New Hampshire, includes its largest cities: Manchester and Nashua, the state capital of Concord, and other major cities like Portsmouth and Derry (Fig. 3). In addition to the densely populated areas, there are less densely populated towns in the north center portion of the study region. The region is closely connected to the Boston metropolitan region to the south and the Portland, Maine, metropolitan area to the east.

Fig. 3
figure 3

The study area of 75 towns in Southeastern New Hampshire

4.1 Data

The existing 14 Home Depot stores in the region were taken as the set of competitor locations (Fig. 4). In addition, 11 other potential sites (Fig. 4) were chosen based on two criteria: (1) the presence of an existing large retail facility was used because the site would need a footprint large enough to accommodate it, and (2) the location did not have a Home Depot store in the same town. The road network based on a TIGER line graph file was downloaded from the ESRI website and clipped to the study area. The road network is given a network data set designation in ArcCatalog to allow distances to be calculated along the road network using the Network Analyst extension in ArcGIS 9.1. Finally, four different census aggregation levels (town, tract, block group and block) were used to examine the impact that the scale of available data would have on the results. Population data for the census blocks were acquired from the University of New Hampshire’s GRANIT website (GRANIT 2006) and towns, tracts, and block groups population was attained from the United States Census (Bureau of the Census 2005). The associated geographic boundary files for the population data regarding the towns, tracts, and block groups were downloaded in a shapefile format from the Environmental Science Research Institute (ESRI) website (ESRI 2006) and blocks were acquired from GRANIT (GRANIT 2006).

Fig. 4
figure 4

Locations of current competitor sites and additional proposed sites

4.2 Processing

First, the 14 existing competitor sites were selected and the capture zone of a new facility that is coincident to these locations was calculated simultaneously by finding the 14 service areas nearest to each of the existing facilities. Next, the capture zone for each of the other 11 sites was determined by adding one new site at a time to the set of the 14 existing Home Depot locations and calculating the nearest road length associated with these current 15 stores. An example of one of these 11 different runs is given in Fig. 5 in which the site at Bedford is added to the 14 Home Depot locations and its capture zone is delineated. The road network associated each of the 25 capture zones was then buffered by a distance of four feet. Such a small distance was chosen to retain the original shape of the road network as much as possible (see Fig. 6). At certain buffer distances individual road segments would coalesce. Next the attribute table for each buffer polygon was edited to retain the appropriate b ij value to each buffer polygon. The 25 capture zones were then overlayed in succession resulting in a final map layer of 89 unique demand regions. Once the unioning process was complete, the attribute table of the final union was expanded to include an additional 89 fields to correspond to the y i decision variables. A value of one was entered into every cell shared by the field and row of the same demand region. This portion of the attribute table represents the uncaptured proportion of demand for any area.

Fig. 5
figure 5

Demand region nearer to a potential facility at Bedford than any existing facility

Fig. 6
figure 6

Example of the buffered road network

The first step in calculating the population of each demand region is to overlay the demand region layer against a census map layer using the “INTERSECT” operator. The result is a map layer with the geography of the 89 demand regions disaggregated into a set of new intersection polygons that has an attribute table containing population and census identification fields from the census layer as well as all of the attribute fields of the demand region layer. The areas of these new intersection polygons are then calculated and stored in a field called NEW AREA. The values in the NEW AREA field are then summarized by the census delineation and this summary table is joined to the attribute table of the intersection layer.

Next, the ratio of the area of each intersection polygon to the summarized area of the census delineation, of which it is a portion, is calculated. The resulting ratio is then multiplied by the population of the associated census delineation and this value is stored in a new field. Once population estimates for each intersection polygon is determined, the population field is summarized by the identification field associated with the demand regions. This summary gives the a i estimate for each of the 89 demand regions and the summary table and is then joined to the attribute table of the demand region layer. Once all the data needed for the MAXCAP formulation is compiled by the GIS processing, the attribute table for the final demand region layer is exported and the data are copied into a SAS file for final solution of the integer program.

4.3 Results

A total of four MAXCAP formulations based on different market size estimates associated with the census delineations were investigated. Each formulation was run 15 times to locate between one and fifteen stores. Comparing results across the different source geographies and number of facilities should give some insight into the robustness of the results with respect to representation error. Casillas (1987) has identified two types of representation error in facility location problems: (1) cost-estimate error—the difference in cost function values due to inaccurate distance values arising from aggregation of demand, and (2) optimality error—difference in cost function values due to incorrect locations. Traditionally, cost-estimate error is the result of Source A and B error. For the MAXCAP problem, cost-estimate error is examined here by comparing differences by source geography in the captured market share. In addition, Erkut and Bozkaya (1999) defined location error as the geographic difference between facility locations based on aggregate demand and those based on more disaggregated demand. Location error is studied here by examining how similar are the combinations of facilities chosen across the different source geographies as well as by computing the average distance separating pairs of store locations.

The aggregate population of the study area is 824,610. Although the areal interpolation procedure was designed to ensure that no population was lost in the interpolation process, round-off error resulted in a small difference in the aggregated estimates (see Table 1). Geographies having fewer observation units (towns, tracts, and block groups) produced aggregate totals very similar to the actual value (the minimum difference was 28 for Towns). The round-off error for the estimates based on block values was much higher because the 16,111 blocks fragmented the interpolation process. At a more disaggregate level, the root mean square (RMS) measurement is used to compare the differences between the demand region population estimates for each pair of formulations. An RMS score of 0.0 means that the two formulations are identical in their estimates and would therefore produce the same MAXCAP results. Overall, formulation estimates were more similar in their market size estimates as the census delineations were similar in their areal extent (Table 2).

Table 1 Aggregated population Ecl
Table 2 RMS scores for demand region population differences between formulations

The results were very similar across the different demand estimates and number of facilities. The maximum difference in the percent market share captured among the four demand estimates was greatest for locating one facility although it was never greater than one percent for any number of facilities. The maximum differences for one through seven facilities were a couple of tenths of a percent less than the maximum differences for eight through fifteen facilities (Table 3). There was no regularity with respect to any geography—no geography was always the highest or the lowest in terms of estimated market share.

Table 3 Estimate of percent market share captured (in percent)

The underlying reason for these results was that the facility locations were almost the same for towns versus tracts, block groups and blocks, and for tracts versus block groups and blocks, and were the same for block groups and blocks across the different number of facilities. Between towns and tracts there was only one pair of stores that differed when five stores were located and one pair differed when fifteen stores were located (Table 4). Between towns and block groups and blocks there was only one pair of stores that differed when five, thirteen, and fifteen stores were located (Table 4). Between tracts and block groups and blocks there was only one pair that differed when thirteen stores were located (Table 4). Although the differences were few (no more than one pairs differed for any number of stores), the expected spatial hierarchy of similarity based on population estimates was present: town locations were more similar to that of tracts than that of block groups or blocks whereas block groups and blocks locations were the most similar.

Table 4 Estimate of location error

Finally, examining individual locations, four of the proposed new sites (Canterbury, Deerfield, Loudon, and Goffstown and four of the existing home depot sites (Somersworth, Portsmouth, Seabrook, and Nashua South) were never chosen. When the number of facilities located was low most of the chosen sites were new ones; as the number of facilities increased a higher percentage of existing locations were found (Table 5). The results suggest that if the new entrant is locating just a few facilities within the region it is better to avoid existing competitor locations. Figure 5a displays the spatial niche that the new entrant would capture in locating one facility by avoiding existing competitor sites. None of the captured area is shared with a competitor. Figure 7 displays the total demand region captured by the new entrant locating seven facilities. Now almost half of the captured demand region is shared with the existing competitor. Finally, Fig. 8 shows the total demand region captured by the new entrant when fourteen facilities are located—the same number as the competitor. More than half of this captured demand region is shared with the competition.

Table 5 Number of new and existing sites chosen
Fig. 7
figure 7

Total demand region captured by a new entrant with seven facilities

Fig. 8
figure 8

Total demand region captured by a new entrant with 14 facilities

5 Conclusions

By using the data representation and manipulation capabilities of GIS to define areal demand units to be captured by potential facilities, the formulation of the maximum capture problem becomes similar to that of the maximal covering problem. The service area delineation and polygon overlay operations within ArcGIS 9.1 permit the construction of the constraint set within the attribute table of a map layer. The geographic representation of demand as areas rather than nodes also eliminates the need for ad hoc solution to two special cases identified by ReVelle (1986) that can arise in a node representation. Neither of these situations can occur in an area representation because demand is contained within the region. All points equidistant define boundaries but no demand is shared at these imaginary boundary lines.

Finally, creating the demand regions limits the size of the problem defining natural aggregations of demand that are scale independent. Areal interpolation procedures can then be used to estimate the size of the demand within each areal unit. The test example suggests that the scale of source population data used in the interpolation process has little impact on the final results. Even with different demand estimates associated with using different source geographies, the selected facilities changed only marginally.