Introduction

Data-driven local inferences on health care delivery and health outcomes across large geographic areas can facilitate targeted and comparative actions so that resources are allocated effectively for maximum impact.1–3 Local inferences rely on high geographic resolution measures, often called small-area estimates. Many existing studies in epidemiology, environmetrics,4,5 and health care planning6–10 have introduced and applied models for deriving small-area estimates from large, national administrative datasets and/or publicly available data.

Small-area estimates at high geographic resolution may not, however, be available for policymaking by federal agencies, state health departments, and community health organizations because of resource constraints in acquiring relevant data and/or a lack of analytic capabilities to derive such estimates.11,12 While local health departments may collect some administrative data that can be used in small-area inferences, they likely do not collect the types of data that can aid in deriving small-area estimates and identifying emerging trends across large geographic regions (e.g., statewide or across multiple states).

Because of this challenge, it is common practice to derive estimates at lower geographic resolution (e.g., county) for statewide or national comparisons.6–10,13,14 However, the division of states into counties varies greatly; for example, a large state such as California has 58 counties, whereas a smaller state such as Georgia has 159 counties. Within-county variations in demographics, population health, and economics can lead to high error rates in county-level estimates. In turn, decisions based on such estimates can suggest interventions that fail to address health care disparities appropriately and/or to improve outcomes. Estimation at the highest geographic resolution is important to overcome these limitations.

Small-area estimation at the highest geographic resolution is particularly critical in the estimation of spatial access, which measures the accessibility and availability of health care services.15 Spatial access is a system outcome that varies across communities because of geographic variations in health care infrastructure and in the population’s choice of health care provider,16 both coupled with variations in state-level health policies, for example, Medicaid reimbursement and eligibility.

To this end, this paper applies a modeling approach for obtaining small-area estimates of spatial access to pediatric primary care that is data “rich” and mathematically rigorous, integrating data and health policy in a systematic way. The model is general and can be applied widely to different types of care, different states, and different countries. The methodology is particularly relevant for deriving local estimates because it employs a systems approach, allowing for trade-offs between supply (providers) and need (patients), and for constraints in other forms of access such as acceptance rate of governmental insurance.17 , 18 The approach characterizes spatial access assuming that not all patients are covered or served by the system of care because of their lack of access in all its forms,19 and it separately provides estimates of spatial access for those served by private and public insurance while accounting for their competition on available resources. It incorporates timely information specified in the Patient Protection and Affordable Care Act (ACA); it considers all options of primary care for children, with preferences depending on the provider type and patient age; and it accounts for patient’s trade-off between accessibility (measured by distance traveled) and availability (measured by congestion at the provider or wait time).

Statistical inference is used to demonstrate the sensitivity of the model in capturing trends within states derived from estimates at different geographic resolution levels, i.e., census tract and county level, with implications in policy decision making on where to focus interventions for improving access and on which type of intervention to consider for more appropriate health care delivery, particularly for different population groups differentiated by insurance type.

We piloted our approach in two states, Georgia and California, selected because of the differences in their administrative geographic subdivisions.

Data Sources

Data: Administrative Subdivision in Georgia and California

Census tract-level data containing geographic information for the two states were downloaded from the Census Bureau’s website.20 Georgia has 1969 census tracts and 159 counties. After eliminating census tracts with missing data, the analysis includes 1951 census tracts. California has 8057 census tracts and 58 counties. After eliminating census tracts with missing data, the analysis includes 7984 census tracts.

Data: Demand for Pediatric Primary Care

Recommendations by the American Academy of Pediatrics21 on the type and frequency of visits are followed, resulting in three age classes that require different numbers of visits/year: age 0–1, age 1–5, and age 6–18, with an average number of visits/year equal to 8, 1.6, and 1, respectively. The patient population is aggregated at the census tract level and located at the census tract centroid. The 2010 SF2 100% census data and the 2012 American Community Survey data are used to compute the total number of children in each census tract and in each age class. Specifically, the following tables were used to compute the values of the parameters for the optimization model: PCT3, B17024, PCT10, PCT7, B19001, and B08201 (see Appendix 2, which provides a detailed description of the methodologies applied to derive model parameters from the different tables).
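
As an illustration of how the visit frequencies above translate into tract-level demand, the following minimal Python sketch multiplies the number of children in each age class by the average visits/year stated in the text. The function name, the age-class labels, and the example tract counts are hypothetical and are not taken from the census tables or from the authors' code.

```python
# Average visits per child per year by age class, as stated in the text.
VISITS_PER_YEAR = {"0-1": 8.0, "1-5": 1.6, "6-18": 1.0}


def tract_visit_demand(children_by_age_class):
    """Return the total needed visits/year for one census tract.

    children_by_age_class maps an age-class label to a child count.
    """
    return sum(
        VISITS_PER_YEAR[age_class] * count
        for age_class, count in children_by_age_class.items()
    )


# Hypothetical tract: 50 infants, 200 children aged 1-5, 600 aged 6-18.
example_tract = {"0-1": 50, "1-5": 200, "6-18": 600}
print(tract_visit_demand(example_tract))  # 50*8 + 200*1.6 + 600*1 visits/year
```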

Medicaid and the Children’s Health Insurance Program (CHIP) are social health care programs for families and individuals with low income and limited resources. They constitute the primary sources of coverage for low-income children in the USA. Several requirements need to be met to be eligible in the programs, including total family income which cannot be greater than a fixed threshold which varies by state. We derive the Medicaid and CHIP eligible population counts using a threshold of 247 % of the Federal Poverty Level (FPL) for the state of Georgia and a threshold of 261 % for the state of California, consistent with the new threshold limits of the ACA.22 It is assumed that patients can be assigned to a provider whose distance is less than or equal to 25 miles, in accordance with the guidelines of the Department of Health and Human Services (DHHS).23 Since patients without a private vehicle must use alternative means of transportation, we also assign a maximum distance threshold for these patients equal to 10 miles.
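
The income and distance thresholds above can be expressed as two simple rules. The sketch below is only illustrative: it assumes family income is already expressed as a ratio to the FPL, it ignores the other eligibility requirements mentioned in the text, and the function and constant names are hypothetical.

```python
# Income eligibility thresholds as a fraction of the Federal Poverty Level,
# taken from the text (247 % for Georgia, 261 % for California).
FPL_THRESHOLD = {"GA": 2.47, "CA": 2.61}

# Maximum travel distances in miles, taken from the text.
MAX_MILES_WITH_VEHICLE = 25.0
MAX_MILES_WITHOUT_VEHICLE = 10.0


def is_income_eligible(income_to_fpl_ratio, state):
    """True if family income does not exceed the state's FPL threshold."""
    return income_to_fpl_ratio <= FPL_THRESHOLD[state]


def can_reach(distance_miles, has_vehicle):
    """True if a provider lies within the applicable distance threshold."""
    limit = MAX_MILES_WITH_VEHICLE if has_vehicle else MAX_MILES_WITHOUT_VEHICLE
    return distance_miles <= limit


# A family at 250 % of the FPL is above Georgia's threshold but below California's.
print(is_income_eligible(2.50, "GA"), is_income_eligible(2.50, "CA"))
# A provider 12 miles away is reachable only for patients with a private vehicle.
print(can_reach(12.0, True), can_reach(12.0, False))
```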

Data: Supply of Pediatric Primary Care

The set of providers and their addresses was obtained from the 2013 National Provider Identifier (NPI) database.24 This dataset provides information on all providers currently being reimbursed for health care services. Individual and institutional records are distinguished by the entity type attribute, where entity 1 corresponds to individuals (e.g., physicians, sole proprietors) and entity 2 corresponds to health care providers that are not individuals. We restricted our network of providers to entity 1 only,25 because the dataset does not report how many providers of a particular type are represented by an organization’s listed taxonomies, raising the possibility of double-counting between individuals and organizations.

We selected only those providers who have, among the declared set of taxonomy codes, one of the following: Family Medicine (207Q00000X), Internal Medicine (207R00000X), Pediatrics (208000000X), and Nurse Practitioner Pediatrics (363LP0200X). We used the Business Practice Address attribute in the dataset to geolocate each selected provider using the Texas A&M Geocoding Services.26 The NPI database is used not only because geolocation information of each provider was needed, but also because of the inconsistencies27 in other databases. Street network distances between census tract centroids and provider addresses are computed using the ArcGIS Network Analyst28 using the National Highway Planning Network downloadable from the Federal Highway Administration website.29 Total distances correspond to the shortest network path between the centroid location and the provider location. If a centroid does not fall onto the street network, it is moved to the closest street.

A maximum provider caseload of 7500 visits/year for an average patient panel size of 2500 patients28 is used. It is assumed that a higher value will significantly impair quality of access. Caseloads of General Pediatricians and Pediatric Nurse Practitioners are assumed to be completely devoted to pediatric care while Family/Internal Medicine physicians are assumed to devote only 10 % of their caseload to children.29
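
The caseload assumptions above amount to a per-provider pediatric capacity. The short Python sketch below encodes them; the provider-type labels and the function name are hypothetical, while the 7500 visits/year cap and the 10 % share come from the text.

```python
# Maximum caseload per provider, from the text (7500 visits/year for a
# panel of 2500 patients, i.e., 3 visits per patient per year).
MAX_VISITS_PER_YEAR = 7500

# Share of the caseload devoted to children, by provider type (from the text).
PEDIATRIC_SHARE = {
    "pediatrics": 1.0,
    "pediatric_nurse_practitioner": 1.0,
    "family_medicine": 0.10,
    "internal_medicine": 0.10,
}


def pediatric_visit_capacity(provider_type):
    """Return the visits/year a provider of this type can devote to children."""
    return MAX_VISITS_PER_YEAR * PEDIATRIC_SHARE[provider_type]


for provider_type in PEDIATRIC_SHARE:
    print(provider_type, pediatric_visit_capacity(provider_type))
```

Under these assumptions, a family medicine physician contributes roughly one tenth of the pediatric capacity of a general pediatrician.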

To account for different acceptance rates of Medicaid/CHIP insurance, varying by state,30 , 31 practice setting, and provider type,32 the 2009 MAX Medicaid claims data is used to obtain county-level estimates of the percentage of primary care physicians participating in the Medicaid program (see Appendix 1 which provides additional information on the data used and methodology used to compute county-level Medicaid acceptance ratios). Using these ratios, the same approach used in a previous work17 is applied, and a subset of providers is randomly selected in each county who accept Medicaid/CHIP insurance such that the total number of selected providers equals the estimated percentage of providers in the county.
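
The random selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure in Appendix 1 or in the earlier work: the function name, the rounding rule, and the fixed seed are hypothetical choices.

```python
import random


def select_accepting_providers(provider_ids, acceptance_ratio, seed=0):
    """Randomly pick the providers in one county assumed to accept Medicaid/CHIP.

    The number selected is the county's acceptance ratio times the number
    of providers, rounded to the nearest integer (an assumption).
    """
    providers = list(provider_ids)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    n_accepting = round(acceptance_ratio * len(providers))
    return set(rng.sample(providers, n_accepting))


# Hypothetical county with 100 providers and a 53 % acceptance ratio.
accepting = select_accepting_providers(range(100), 0.53, seed=1)
print(len(accepting))
```

Because the draw is random, repeating the analysis with several seeds would show how sensitive the access estimates are to which providers happen to be selected.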

Methods

The Mathematical Model

The optimization model in Nobles et al.17 is extended to take into account several additional factors that influence patients’ and providers’ behavior, including different patient age classes, different needs by age class, different provider types, and different patient preferences for provider types.

The decision variables of the model represent the total number of patients in each census tract of a given age class who are assigned to a specific provider. Since the Medicaid/CHIP eligible and privately insured populations may face different barriers to health care, the model considers the two populations separately when matched to providers, while accounting for their competing access to providers accepting both types of insurance.

Our model is based on the assumptions that both families and policy makers value children having a primary care provider, that patients prefer to visit nearby physicians, and that they prefer to schedule visits when the office is not too busy (low congestion); however, when physician congestion is considered too high, families prefer nonphysician providers such as Nurse Practitioners.33 Under these assumptions, the objective function of the optimization model is a weighted sum of the total distance traveled (which needs to be minimized) and the provider preference contingent upon demand volume (which needs to be maximized). The balance between these two components of the objective function is controlled by a trade-off parameter, which is used to define the relative importance of each component in the objective function. Its value is empirically selected such that (i) neither of the two components of the objective function dominates the other, and (ii) the optimized decision results in sufficient spatial autocorrelation to indicate that close neighbors experience similar travel distance and similar congestion level (see Appendix 3 for a detailed description of the empirical selection of the trade-off parameter).

Constraints in the model reflect trade-offs and behaviors in the system. From patients’ perspective, constraints in the resulting matching take into account the obstacles that patients encounter when choosing a provider (such as distance, provider congestion, Medicaid/CHIP insurance acceptance and provider type). From providers’ perspective, constraints are considered so that: (i) the total number of patients assigned to each provider does not exceed maximum caseload capacity, (ii) the total number of assigned patients under Medicaid/CHIP insurance does not exceed Medicaid/CHIP acceptance caseload, and (iii) different provider types have different caseload capacities and different Medicaid/CHIP acceptance levels. The detailed description of the model is provided in Appendix 3.
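
To make the structure of the objective and constraints described above concrete, one plausible form is sketched below in LaTeX. This is not the formulation given in Appendix 3; all symbols are assumptions introduced here for illustration. $x^{g}_{ijk}$ is the number of patients of insurance group $g \in \{M, O\}$ and age class $k$ in census tract $i$ assigned to provider $j$; $d_{ij}$ is the network distance; $u_{jk}$ is a preference weight for provider $j$ by age class $k$; $\lambda$ is the trade-off parameter; $p^{g}_{ik}$ is the population in need; $v_k$ is the number of visits/year for age class $k$; $c_j$ is the provider caseload capacity; and $a_j$ is the fraction of that capacity open to Medicaid/CHIP patients. The sketch omits the dependence of provider preference on congestion that the text describes.

```latex
\begin{aligned}
\min_{x \ge 0}\quad
  & \lambda \sum_{g}\sum_{i,j,k} d_{ij}\, x^{g}_{ijk}
    \;-\; (1-\lambda) \sum_{g}\sum_{i,j,k} u_{jk}\, x^{g}_{ijk} \\
\text{s.t.}\quad
  & \sum_{j} x^{g}_{ijk} \le p^{g}_{ik}
    && \forall\, i, k, g
    && \text{(assigned patients cannot exceed need)} \\
  & \sum_{g}\sum_{i,k} v_k\, x^{g}_{ijk} \le c_j
    && \forall\, j
    && \text{(maximum caseload)} \\
  & \sum_{i,k} v_k\, x^{M}_{ijk} \le a_j\, c_j
    && \forall\, j
    && \text{(Medicaid/CHIP acceptance caseload)} \\
  & x^{g}_{ijk} = 0
    && \text{if } d_{ij} > d^{\max}
    && \text{(maximum travel distance)}
\end{aligned}
```

In this sketch, the first constraint is an inequality so that part of the demand can remain unserved, and the two capacity constraints are what make the Medicaid/CHIP-insured and privately insured populations compete for the same providers.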

The output of the model consists of the optimal assignment of needed demand in each census tract to providers in the network; the demand within a census tract may be assigned to different providers, and/or a proportion of the demand may not be served. Hence, the model provides estimates of the served demand for primary care. The optimization model is implemented using the Optimization Programming Language (OPL)34 and the CPLEX solver on a UNIX system.

Parameters of the model are estimated for each state by integrating the different data sources mentioned in the previous section together with the different health policies (i.e., Medicaid/CHIP eligibility criteria) implemented in each state. Table 1 provides a summarized description of the set of parameters, their values, and the data source we used to determine their value.

TABLE 1 The parameters used in the model, together with their description and the data sources used to set the corresponding value

Spatial Access Measures: Accessibility and Availability

We use the results of the optimization model to measure accessibility and availability of primary care for overall population of children, Medicaid/CHIP-insured children, and privately insured children, both at the county and census tract levels. Accessibility is measured as the average distance a child must travel for each visit to his/her assigned provider. Availability is measured as the congestion a child in the census tract or in the county experiences for each visit at his/her assigned provider. Since children who are not assigned to a provider have the worst possible spatial access, regions whose population is not assigned to any provider are assumed to experience a distance of 25 miles and 100 % congestion.
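
The two measures can be illustrated with a minimal Python sketch. It assumes that each assignment carries the number of children, the travel distance, and the congestion at the assigned provider, that averages are weighted by the number of children, and that unassigned children within a tract receive the same penalty values the text applies to unassigned regions. These weighting choices and all names are assumptions, not the authors' implementation.

```python
# Values assigned to children without a provider, from the text.
UNSERVED_DISTANCE_MILES = 25.0
UNSERVED_CONGESTION = 1.0


def tract_access(assignments, total_children):
    """Return (accessibility, availability) for one census tract.

    assignments is a list of (children, distance_miles, congestion) tuples,
    one per provider serving the tract. Children left unassigned are counted
    at the penalty distance and congestion.
    """
    if total_children <= 0:
        raise ValueError("total_children must be positive")
    served = sum(n for n, _, _ in assignments)
    unserved = total_children - served
    distance = sum(n * d for n, d, _ in assignments)
    congestion = sum(n * c for n, _, c in assignments)
    distance += unserved * UNSERVED_DISTANCE_MILES
    congestion += unserved * UNSERVED_CONGESTION
    return distance / total_children, congestion / total_children


# Hypothetical tract of 100 children: 60 travel 4 miles to a provider at 50 %
# congestion, 20 travel 10 miles to one at 80 %, and 20 are unserved.
print(tract_access([(60, 4.0, 0.5), (20, 10.0, 0.8)], 100))
```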

Statistical Comparison of the Measures at the County and Census Tract Levels

Statistical inference methods are used to compare the distributions of accessibility and availability at the county and census tract levels and to compare the level of disparities in accessibility and availability between the Medicaid/CHIP-insured and the privately insured population when analyzed at the census tract level or at the county level.

In particular, denote by M(s) and by O(s) the spatial access measures (either the availability or the accessibility measure) derived from the optimization model for the Medicaid/CHIP-insured population and for the privately insured population, respectively, where s denotes the spatial aggregation (i.e., either census tract or county). We test whether summaries of the distributions of the two processes, including median and variance, are statistically different. Because both processes are observed over the same spatial units, we apply paired testing procedures.

We apply the nonparametric Wilcoxon test to test the null hypothesis for equality of medians at the census tract level and at the county level ($H_0: \mu_{CTY} = \mu_{CENSUS}$; $H_1: \mu_{CTY} > \mu_{CENSUS}$) of the two dimensions (accessibility and availability) for the two states. The nonparametric Wilcoxon test and a modified version of the nonparametric Wilcoxon test are used to test equality of medians ($H_0: \mu_M = \mu_O$; $H_1: \mu_M > \mu_O$) and equality of variances ($H_0: \sigma^2_M = \sigma^2_O$; $H_1: \sigma^2_M > \sigma^2_O$), respectively, of the two processes M(s) and O(s) both at the census tract level and at the county level (see Appendix 4 for details on the statistical methods used).
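
For the paired comparison of M(s) and O(s), a one-sided Wilcoxon signed-rank test is one natural choice. The sketch below is a minimal pure-Python version using the normal approximation, without tie or continuity corrections. It is an assumption that this variant matches the test described in Appendix 4, and the function name and the example values are hypothetical. The county versus census tract comparison involves samples of different sizes, so this paired form would not apply to it directly.

```python
import math


def wilcoxon_signed_rank_greater(x, y):
    """One-sided paired Wilcoxon signed-rank test (H1: x tends to exceed y).

    Returns (w_plus, p_value). Zero differences are dropped and the p-value
    uses the large-sample normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    # Sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return w_plus, p_value


# Hypothetical distances (miles) for the same eight spatial units.
medicaid = [8.6, 7.1, 9.3, 6.4, 10.2, 5.9, 7.7, 8.8]
private = [6.1, 5.0, 8.9, 4.2, 7.5, 5.1, 6.0, 6.3]
print(wilcoxon_signed_rank_greater(medicaid, private))
```

For small samples or many ties, an exact or tie-corrected implementation from a statistics library would be preferable to this approximation.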

Threshold maps are constructed to visualize where accessibility and/or availability is higher or lower than the 85th percentile and the 15th percentile, respectively.
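
The classification behind a threshold map can be sketched in a few lines of Python. The function name, the label strings, and the use of the "inclusive" percentile method are assumptions; the 15th and 85th percentile cutoffs come from the text.

```python
import statistics


def threshold_flags(values, low_pct=15, high_pct=85):
    """Label each value as 'lower', 'higher', or 'middle'.

    'lower' means below the low_pct percentile and 'higher' means above the
    high_pct percentile of the supplied values.
    """
    # quantiles(n=100) returns the 99 percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low_cut, high_cut = cuts[low_pct - 1], cuts[high_pct - 1]
    labels = []
    for value in values:
        if value < low_cut:
            labels.append("lower")
        elif value > high_cut:
            labels.append("higher")
        else:
            labels.append("middle")
    return labels


# Hypothetical travel distances (miles) for ten spatial units.
distances = [0.4, 1.2, 3.5, 4.9, 6.1, 7.9, 9.3, 12.8, 16.0, 24.5]
print(list(zip(distances, threshold_flags(distances))))
```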

To analyze the association between accessibility and availability at the census tract and county levels, a nonparametric regression method is applied to find a smooth relationship between the two measures using the gam function in the mgcv package in R.35

Simultaneous confidence bands for the difference measure of the two populations are estimated to identify census tracts or counties where the difference in either accessibility or availability between the two populations is statistically significant. Positive or negative significance maps are then derived at the 0.01 significance level (see Appendix 4 for details).

Results

The total number of children in the two states is approximately equal to 12.5 million, and the total number of children eligible for public insurance is equal to 6.8 million. The total number of providers under consideration in the state of Georgia is approximately equal to 8000, while in the state of California, it is approximately equal to 32,000. Fifty-three percent of the providers in the state of Georgia accept Medicaid/CHIP-insured children, for a total of approximately 4200 providers; 59 % of the providers in the state of California accept Medicaid/CHIP-insured children, for a total of approximately 18,800 providers.

Comparing Accessibility and Availability at the County and Census Tract Levels

Summary statistics of the accessibility and availability measures are computed at the county and the census tract levels for each population group in each state and are given in Table 2. Boxplots of the distributions of the accessibility and availability measures for the two states, both at the county and census tract levels, are shown in Fig. 1. Tables 3 and 4 show the results of the statistical tests. Figure 2 shows the threshold maps, with the census tracts (or counties) with accessibility or availability measures higher than the 85th percentile or lower than the 15th percentile (denoted as “higher” or “lower”). Figure 3 shows the smooth nonparametric relationship between availability and accessibility. Figure 4 shows the significance maps at the census tract level and at the county level for both states and for both the accessibility and the availability dimensions.

TABLE 2 Summary statistics for the two distributions (accessibility/distance (miles) and availability/congestion) at the census tract level and at the county level for both states. Each measure is estimated for the entire children population (column “All”) and for the two population groups separately, i.e., Medicaid/CHIP-insured and privately insured (non-Medicaid)
FIG. 1

The boxplots of the distributions of the accessibility dimension (distance (miles)—on the left) and of the availability dimension (congestion—on the right) at the census tract level and at the county level for the Medicaid/CHIP-insured population and the privately insured (non-Medicaid) population for the state of Georgia (top) and for the state of California (bottom). Horizontal lines in the figures represent mean values at the state level for the overall population.

TABLE 3 p values of the statistical test for equality of medians at the census tract level and at the county level ($H_0: \mu_{CTY} = \mu_{CENSUS}$; $H_1: \mu_{CTY} > \mu_{CENSUS}$) of the two dimensions (accessibility and availability) for the state of Georgia and the state of California
TABLE 4 p values of the statistical test for equality of medians ($H_0: \mu_M = \mu_O$; $H_1: \mu_M > \mu_O$) and equality of variances ($H_0: \sigma^2_M = \sigma^2_O$; $H_1: \sigma^2_M > \sigma^2_O$) of the two dimensions (accessibility and availability) both at the census tract level and at the county level between the Medicaid/CHIP-insured population and the privately insured (non-Medicaid, OTH) population
FIG. 2

The threshold maps for both states and for both accessibility (i.e., distance (miles)) and availability (i.e., congestion). In each map on the left, the gray-shaded areas and triangles correspond to counties and census tracts, respectively, where the local estimates are lower than the 15th percentile (denoted “Lower”). In each map on the right, the gray-shaded areas and dots correspond to counties and census tracts, respectively, where the local estimates are higher than the 85th percentile (denoted “Higher”).

FIG. 3

The smoothed nonparametric relationship between accessibility (distance) and availability (congestion) obtained using the “gam” function in the library mgcv in the R statistical software. The resulting function for the county level is shown on the left, and the function for the census tract level is shown on the right. Results for Georgia are on the top, and results for California are on the bottom.

FIG. 4

Significance maps both at the census tract level (on the left) and at the county level (on the right) for the two dimensions of access (i.e., accessibility and availability). Each dot on the map corresponds to a census tract or a county where Medicaid/CHIP-insured population has a statistically significantly lower accessibility (i.e., greater distance) or lower availability (i.e., greater congestion) than the privately insured population, at α = 0.01 significance level. The gray-shaded regions on the maps correspond to counties where the Medicaid/CHIP-insured population does not experience a significantly worse accessibility or availability.

Accessibility

The median distance traveled at the census tract level for the three population groups for Georgia and California is 7.94 miles (standard deviation (SD) = 6.65) and 4.92 miles (SD = 5.86) for the overall population, 8.62 miles (SD = 7.29) and 6.92 miles (SD = 7.03) for the Medicaid/CHIP population, and 6.07 miles (SD = 6.03) and 1.01 miles (SD = 4.73) for the privately insured population, respectively.

The median distance at the county level is greater than the median distance at the census tract level in both states and for all three population groups; the difference is statistically significant in both states (Table 3).

The 85th percentile of the travel distance distribution at the census tract (county) level is 15.84 (19.58) miles in Georgia and 10.55 (15.69) miles in California. The 15th percentile of the travel distance distribution at the census tract (county) level is 1.07 (7.39) miles in Georgia and 0.45 (5.55) miles in California (Fig. 2).

Availability

The median congestion experienced at the census tract level for the three population groups for Georgia and California is 0.43 (SD = 0.27) and 0.49 (SD = 0.25) for the overall population, 0.42 (SD = 0.28) and 0.43 (SD = 0.26) for the Medicaid/CHIP population, and 0.39 (SD = 0.28) and 0.49 (SD = 0.29) for the privately insured population, respectively.

The median congestion at the county level is greater than the median congestion at the census tract level in both states and for all three population groups; the difference is statistically significant in both states (Table 3).

The 85th percentile of the congestion distribution at the census tract (county) level is 0.78 (0.85) in Georgia and 0.81 (0.73) in California. The 15th percentile of the congestion distribution at the census tract (county) level is 0.15 (0.39) in Georgia and 0.23 (0.45) in California (Fig. 2).

The correlation coefficient between availability and accessibility at the census tract level is r = 0.73 for Georgia and r = 0.59 for California, while at the county level it is, respectively, r = 0.85 and r = 0.88 (Fig. 3).

Disparities between Population Groups at the Census Tract and the County Levels

The significance maps (Fig. 4) show that in Georgia (California), the Medicaid/CHIP-insured population has a statistically significantly lower accessibility than the privately insured population in 53 % (74 %) of the census tracts and in 100 % (98 %) of the counties. The Medicaid/CHIP-insured population in Georgia (California) experiences a statistically significantly higher congestion than the privately insured population in 18 % (12 %) of the census tracts and 47 % (31 %) of the counties. As shown in Table 4, the median and variance of distance (accessibility) for the Medicaid/CHIP-insured population are statistically significantly higher than for the privately insured population at both the census tract and county levels for both states. The median congestion (availability) for the Medicaid/CHIP-insured population is statistically significantly higher than for the privately insured population both at the census tract and county levels, only for the state of Georgia.

Discussion

A modeling approach for small-area estimates of spatial access to pediatric primary care is introduced along with a systematic comparison of estimates of spatial access at the census tract and county levels for Georgia and California.

The approach can address the limitations of other existing models, such as simple ratios of providers to population15 and the two-step floating catchment area (2SFCA) method,36 which do not account for unserved demand due to lack of access in all its dimensions or for the trade-offs between supply of and need for care. The proposed approach is able to capture patient and provider preferences17,18 and to incorporate health policy changes by adjusting the values of the model parameters. Additionally, the model is able to capture the differences in the trends across large regions derived from estimates with different resolution levels. The constraints included in the model are basic barriers to accessibility, availability, and acceptability; the model can also take into account other dimensions of access such as accommodation.

For both availability and accessibility, the results show that county-level estimates tend to underestimate spatial access. The medians of the measures estimated at the county level are significantly greater than the medians of the measures at the census tract level, because the within-county distributions of the estimates are highly skewed. Because greater distance and greater congestion correspond to lower access, spatial access is underestimated at the county level. The implication of this finding is that decisions based on the county-level estimates about where to locate or incentivize new practices could be misaligned with need.

The results also show that geographic disparities for each population group (publicly and privately insured) are consistently underestimated when measured at the county level: variability of the measures at the county level is lower than variability at the census tract level on all but one measure reported here.

For the two states, the choice of where to focus interventions for improving access, if based on the county-level estimates, would target regions that do not experience greater needs for improvement in spatial access. For Georgia, many areas of the state would not be identified as “in need” according to the county estimates, although the need is evident from the census tract estimates. For example, Burke County in Georgia, with six census tracts, has an average travel distance at the county level of 17.7 miles, which is lower than 19.5 miles, the 85th percentile of the travel distance distribution at the county level for this state. However, the average travel distances of the six census tracts in the county are 23.6, 22.5, 18.1, 13.3, 8.9, and >25 miles. The high variability of the travel distances at the census tract level is not captured at the county level, and an intervention based on county-level estimates would in this case neglect communities that are potentially most in need.
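
The Burke County example can be checked with a few lines of Python using only the figures reported in this paper. The variable names are hypothetical, the tract reported as ">25 miles" is recorded as 25.0 (an assumption), and the census tract threshold of 15.84 miles is the 85th percentile of the Georgia census tract distribution reported in the Results.

```python
# Thresholds reported in the paper for Georgia (miles).
COUNTY_P85_MILES = 19.5   # 85th percentile, county level
TRACT_P85_MILES = 15.84   # 85th percentile, census tract level

burke_county_average = 17.7
# Tract-level average distances; the ">25" tract is recorded as 25.0.
burke_tracts = [23.6, 22.5, 18.1, 13.3, 8.9, 25.0]

# County-level view: is the county above the county threshold?
county_flagged = burke_county_average > COUNTY_P85_MILES

# Tract-level view: which tracts are above the tract threshold?
tracts_flagged = [d for d in burke_tracts if d > TRACT_P85_MILES]

print(county_flagged)       # False
print(len(tracts_flagged))  # 4
```

The county as a whole is not flagged, yet four of its six census tracts exceed the tract-level 85th percentile, which is the pattern the paragraph above describes.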

County estimates are also misleading when trying to understand which type of intervention is more appropriate. In the two states, the county-level estimates suggest that regions with low (high) availability also have consistently low (high) accessibility. Thus, although appointments may be available, they are at practices that are difficult to travel to. In contrast, the census tract estimates reveal a different relationship; specifically, in Georgia, these estimates show that there are several regions that experience lower distance but higher congestion.

Finally, local estimates are a much better tool for capturing disparities across populations. Our results reveal that disparities in access between Medicaid/CHIP-insured and privately insured populations based on the county-level metrics are underestimated. Specifically, the county-level estimates for availability in Georgia do not capture the statistically significant gaps in some counties in the northern part of the state, as evident from the census tract estimates. In California, according to the county-level estimates for availability, there is no county with a statistically significant difference in the central and south regions. However, the census tract estimates reveal statistically significant differences for many areas in these regions. In terms of accessibility, 100 % of the counties in Georgia show a statistically significant difference between the two population groups; however, the census tract estimates show that this is not true for some areas in the southeastern region of the state.

The study has several limitations. The primary challenge in this study is the limited availability of data. Because of limited state-level information, eligible populations are evaluated considering federal net income thresholds instead of modified adjusted gross income (MAGI)-equivalent thresholds. The procedure for evaluating the MAGI-equivalent threshold depends on state-level policies with too many unknown parameters to be realistically considered. Additionally, because of lack of data, multiple data sources are considered for different years to estimate demand parameters. Finally, Medicaid/CHIP acceptance rates are computed using 2009 data; although this limitation does not affect the overall findings of the analysis, these parameters could be underestimated because of the implementation of the ACA.

A second limitation is the set of assumptions specifying some of the system constraints. For example, the provider capacity is assumed to be uniform across geographies (i.e., 2500 patients or 7500 visits/year). Similarly, the same willingness to travel for all populations in rural and urban areas is assumed. Moreover, variations in the percentage of physicians practicing pediatric primary care after the implementation of the ACA are not accounted for. These assumptions can be relaxed, and the system constraints can be better informed with the acquisition of detailed local-level data.

Our study brings us to several conclusions. Much research in health services considers access to care, either as a primary study topic or as a factor for which to control.8,37–43 These existing studies often rely on measures of access that are too simple to account for systemwide trade-offs, or on measures computed at high geographic aggregation levels. This paper demonstrates that these limitations can result in misleading policies and interventions and that high geographic resolution estimates are needed to understand the nuances in health care access where county-level estimates wash out important differences.

By understanding access at high geographic resolution, it is also possible to separate out different dimensions of access, which may facilitate the design of targeted interventions that will have the highest impact at the community level. Local estimation approaches are the best available tool and can ultimately help improve the health of our nation’s children. The models used in this paper (and the associated code) are available at www.healthanalytics.gatech.edu so that other researchers or health organizations can use them to assist in quantifying measures of access, with no software license required.