Introduction

In this article, we examine demographic and socio-economic predictors of vegetative cover on private residential lands in New York City, New York. Our motivations for this work are both practical and theoretical, and require advanced spatial methods because of the combination of categorical and continuous data, multivariate co-variation, and potential for temporal and spatial dependencies. Our practical motivation is tied to the City’s tree planting goals. In 2007, the City of New York established both a canopy goal of 30 % (Grove et al. 2006b) and a Million Trees campaign as part of its Sustainability Plan called PlaNYC (City of New York 2007, 2011).

Twenty-eight percent of existing and 35 % of possible canopy cover are on private residential lands (O’Neil-Dunne 2012; Fig. 1). Given this distribution of existing and possible canopy cover, conserving existing canopy cover and planting new trees on private residential lands is crucial to the City’s ability to achieve its canopy goal and Million Trees campaign. Therefore, understanding the significance of different demographic and socio-economic factors that affect the motivations, preferences, and capacities of residential landowners to conserve or plant trees could be critical to achieving New York City’s sustainability goals.

Fig. 1
figure 1

The distribution of land uses in New York City, and Tree Canopy (TC) metrics (adapted from O’Neil-Dunne, 2012)

Understanding the land management practices of urban residential households is globally significant as well, more than half the world’s population lives in cities. Urban populations are projected to swell to 84 % by 2,050 and reach 6.3 billion people, which is approximately equal to the Earth’s current total population (United Nations 2010). The behaviors of urban residential landowners affect many phenomena that are important to urban sustainability goals, especially those goals that are directly or indirectly associated with land cover management. For instance, trees provide numerous environmental, social, and economic benefits, especially in urban areas (ACTrees 2012; Forest Service Northern Research Station 2008). Globally, trees are gaining importance as part of cities’ critical infrastructure. Along with New York City, cities such as Philadelphia,Footnote 1 Los Angeles,Footnote 2 Shanghai,Footnote 3 and DenverFootnote 4 have pledged to plant one million trees in their cities, signaling a growing recognition of these benefits.

In addition to its practical consequences, this study is motivated by a desire to advance social–ecological understanding of urban residential systems. Four theories have emerged over time to explain variations in the distribution of tree canopy cover on public and private lands in residential neighborhoods: population density, social stratification, a luxury effect, and an ecology of prestige (Grove et al. 2006a, c; Troy et al. 2007). Population density is an early theory in this literature, proposed by ecologists and based upon measures of human demographics. Population density is hypothesized to determine changes in the distribution of vegetation cover through development (Smith et al. 2005; Marco et al. 2008 in Cook et al. 2012). As more land is modified for housing and transporting people for example, there is less space available for lawns and trees. Population density theory is relevant to both public and private property because changes to land cover occur on both public and private lands as human settlements develop. A second theory is based upon socio-economic measures and theories of social stratification from sociology. Social stratification theory hypothesizes that the distribution of vegetation cover will be based on relative power and income differences among neighborhoods and residents’ varying ability to influence the amount of public investments that are made in their neighborhoods (Logan and Molotch 1987). Social stratification theory speaks particularly to the distribution of tree canopy on public lands, such as street trees and trees in parks, because of the hypothesized relationship between power and its potential influence over public investment.

The luxury effect is the third theory and is also based upon socio-economic measures. The luxury effect hypothesizes that households with more discretionary income are more likely to invest in green amenities such as trees (Hope et al. 2003; Martin et al. 2004; Kirkpatrick et al. 2007; Luck et al. 2009). While homeowners may be allowed to plant trees on public lands such as street trees and parkways, the luxury effect theory is most relevant to the distribution of tree canopy on private lands because homeowners have legal control over these lands. Cook et al. (2012), reviewing more than 250 studies of residential landscapes, identified common associations between socio-economic characteristics of residents and tree canopy, and negative correlations with housing density at a parcel scale. At a neighborhood or larger scale, positive correlations existed between socio-economic factors such as income and education (Iverson and Cook 2000; Hope et al. 2003; Martin et al. 2004; Grove et al. 2006a, c; Mennis 2006; Tratalos et al. 2007; Luck et al. 2009; Zhou et al. 2009; Boone et al. 2010; Landry and Chakraborty 2009; Pham et al. 2012a,b; Romolini et al. 2013). Most of these studies do not consider alternative sociological explanations for the role of socio-economic status nor additional household factors such as family size and life stage (Grove et al. 2006a).

The concept of a luxury effect is relevant to the fourth social theory we discuss: an ecology of prestige (Grove et al. 2006a, c; Troy et al. 2007; Zhou et al. 2009). The theory of an ecology of prestige is based upon demographic (population, ethnicity, and life stage) and socio-economic (income and education) measures. The ecology of prestige theory is rooted in Veblen’s theory of conspicuous consumption (1981, 1981) and reference group behavior theory (Hyman 1942; Merton and Kitt 1950) from sociology to understand social differentiation of urban neighborhoods. Veblen’s theory of conspicuous consumption refers to the behavioral phenomenon of individuals and households and their purchase of consumer goods and services as a means to publicly display their social status and prestige. While initially used by Veblen to describe the “nouveau riche” at the turn of the twentieth century, this behavior has been observed to be common among all social and economic classes. Conspicuous consumption is linked to reference group behavior theory because consumption behaviors represent a means by which households can establish membership in a social group. Reference group behavior theory examines the process of evaluation and self-appraisal, where individuals adopt the values, standards, or norms of a social group as a frame of reference for their own behavior (Merton and Kitt 1950). Lifestyle characteristics are often associated with different social groups. Social differentiation among urban neighborhoods frequently becomes manifest in terms of the different lifestyle choices that households make and how those choices change over time. Some of the characteristics that affect these choices include socio-economic status, family size and life stage, and ethnicity (Grove et al. 2005; Timms 1971; Knox 1994; Short 1996; Gottdiener and Hutchison 2001; Kaplan et al. 2004). Building on this approach to lifestyle choices and neighborhood differentiation, an ecology of prestige is a theory of lifestyle behavior which hypothesizes that many locational choices, environmental management decisions, and expenditures on publicly visible environmentally relevant goods and services at household and neighborhood levels are motivated by group identity and social status associated with different lifestyles and lifestages (Law et al. 2004; Grove et al. 2006a, c; Troy et al. 2007; Boone et al. 2010). Specifically, it is hypothesized that a household’s land management decisions may be influenced by its desire to uphold the prestige of its neighborhood and outwardly express its membership in a given lifestyle group.

From this perspective, housing and yard styles, green grass, and tree and shrub plantings have social meaning. These are not luxury items per se because these behaviors support something valuable, namely a household’s publicly visible contributions to upholding neighborhood group identity, prestige, and other qualities (Scotts 1998; Nassauer et al. 2009; Mustafa et al. 2010). Further, these ecological prestige behaviors may vary among different lifestyle groups, even within relatively similar levels of socio-economic status or affluence (Jenkins 1994; Bormann et al. 2001; Grove et al. 2006a, c; Troy et al. 2007; Zhou et al. 2009b; Boone et al. 2010; Roy Chowdhury et al. 2011). For example, Troy et al. (2007) found in Baltimore, MD that despite similar population density, occupations, and level of educational attainment, neighborhoods predominated by families with children had on average 36 % more of their yards covered in vegetation compared to neighborhoods predominated by younger singles or couples with no children. The ecology of prestige theory is relevant to all four types of property regimes–private, public, community, and open access (Bromley 1991)—because it posits that households will act individually and collectively on these types of lands in order to establish and/or maintain the ecological identity of their neighborhood.

Finally, research has shown that there are temporal lags between neighborhood change in terms of socio-economic status and other lifestyle characteristics and land cover (Grove 1996; Troy et al. 2007; Luck et al. 2009; Boone et al. 2010; Clarke et al. 2013). Trees do not grow instantaneously. Therefore, the characteristics of the people who lived in a particular place when trees were first established may provide important insights into the socio-spatial distribution of vegetation observed today.

Advanced spatial methods are available and may be necessary to comparatively test these four theories because of (1) the high spatial heterogeneity of urban areas; (2) the possibility of temporal lags; (3) use of categorical and continuous data, which may exhibit multivariate co-variation; and (4) potential for spatial dependencies. Many of the existing studies examining the distribution of urban land cover in residential areas are methodologically limited to comparatively evaluate these four theories for several reasons (Cook et al. 2012). First, most of the studies examined by Cook et al. (2012) used spectral analyses of mid-resolution, 30 m land cover data. These data and methods are not able to fully quantify vegetation in the high spatial heterogeneity of urban areas, particularly in residential areas (MacFaden et al. 2012; O’Neil-Dunne et al. 2012; Zhou and Troy 2008; Fig. 2). A second concern related to spatial heterogeneity is that many of these studies used U.S. Census tracts or block groups as the unit of analysis, but did not distinguish between residential and non-residential land uses such as parks and open spaces, and institutional lands within those tracts or block groups. Further, many of these studies did not distinguish between private and public lands–street trees and parks—within residential land uses. This mixing of land uses and ownerships confounds the analyses and limits the theoretical validity of those studies (see Fig. 3a, b). Finally, temporal lags and “landscape legacies” were seldom examined. Thus, the results of past behaviors by previous residents may be incorrectly attributed to current residents (c.f. Grove 1996; Troy et al. 2007; Luck et al. 2009; Boone et al. 2010; Clarke et al. 2013).

Fig. 2
figure 2

Urban areas exhibit substantial socio-ecological heterogeneity across relatively short distances and heights (a). Coarse-scale, 30 m estimates of tree canopy successfully capture larger patches of urban vegetation (b). However, finer-scale measures (6 in) (c) are needed for accurate estimates of smaller patches which often comprise a sizeable portion of the study area. Sources: Orthophotographs (2009), National Land Cover Database (2001), Land Cover for the City of New York (2010 by MacFadden et al. 2012)

Fig. 3
figure 3

a, b This area of Brooklyn typifies the heterogeneity across the study area in terms of both land use and land cover. In the featured block group, private residential land comprises 35 % of area, and contains 26 % of the tree canopy. The public right of way occupies 31 % of the land area, and holds 42 % of the canopy. Combining all land uses within a block group would distort the analyses and results

In addition to the potential for theoretical limitations, there are two methodological challenges to be addressed. First, two types of data can be used to comparatively evaluate the four theories we have presented: categorical and continuous data. Categorical data can be produced using geodemographic segmentation techniques based upon a suite of continuous variables as inputs into the aspatial clustering of spatial data (Troy 2008). Geodemographic segmentation has been used to produce market segments, which contain relatively homogenous social groups that are intended to predict household consumption behaviors. Geodemographic segmentation has emerged as a useful way to compare differences in the amounts and types of urban vegetation across meaningful social groups (Grove et al. 2006a, c; Troy et al. 2007; Zhou et al. 2009; Boone et al. 2010; Bigsby et al. 2014). However, continuous variables are also useful to “unpack” or disaggregate the individual measures that comprise categorical data in order to investigate the relative importance of individual variables. Further, by combining categorical and continuous data, analyses may produce mutually reinforcing conclusions that increase confidence and reduce uncertainty when interpreting results.

Second, spatially explicit methods may be needed because of the risk of spatial dependencies among independent and dependent variables. Spatial dependencies may be present because a household’s land management activities may not occur in socio-spatial isolation. Instead, it is likely that households take notice of and are affected by their neighbors’ behaviors (Nassauer et al. 2009; Harris et al. 2012; Larson and Brumand 2014). This socio-spatial phenomenon would create a spatial dependency and violate the statistical assumptions of independence among observations. Spatial lag and spatial error regression models incorporate spatial autocorrelations in order to more effectively understand underlying spatial patterns.

Research Questions

In this paper, we address four questions. First, which social theory best predicts variations in the distribution of existing and possible canopy cover on private residential lands in New York City: population density, social stratification, luxury effect, or ecology of prestige? Second, are there temporal lags between neighborhood change and present-day land cover? For instance, are demographic and socio-economic measures from 1980, 1990, and 2000 better predictors of the current distribution of vegetation than contemporary demographic and socio-economic indicators? Third, which continuous variables of geodemographic categories are most significant in predicting variations in the distribution of existing and possible canopy cover? Fourth, do spatial statistics enhance our understanding of the relationships among demographic and socio-economic factors and variations in the distribution of existing and possible canopy cover?

Methods

Site Description

New York City is comprised five counties, which are also called Boroughs: Manhattan, the Bronx, Brooklyn, Queens, and Staten Island. According to the US Census Bureau, 8,186,443 people lived in NYC in 2010 (United States Census Bureau 2011). The City of New York’s Parcel Land Use and Tax Ownership (PLUTOTM) database contained 859,146 parcels in 2010, of which 753,418 lots (87.70 %) were zoned as residential land use: (1) One and Two Family, (2) Multi-Family walkup, (3) Multi-Family Elevator, or (4) Mixed Residential/Commercial. Collectively, these residential parcels cover 65,931 acres of land, representing 35 % of total land area and the largest land use category by both area and number of owners (Fig. 1). For comparison, the extensive public park system managed by the Department of Parks and Recreation, who is the second largest public landowner, covers approximately 29,000 acres of land and 14 % of the City’s total area.

Data

There were four primary sources of data for this analysis: (1) Claritas’ Potential Rating Index for Zipcode Markets (PRIZM™) database, (2) U.S. Census block group attributes and geographies, (3) New York City’s PLUTO tax-lot attributes and geographies, and (4) a land cover raster dataset (MacFaden et al. 2012). Claritas’ PRIZM database at the Census block group level was used for the categorical analysis. PRIZM is a geodemographic dataset that was developed originally to characterize household consumption behaviors in order to determine marketing strategies and store locations (Weiss 2000; Claritas 2008). For example, PRIZM has been used to answer four interrelated marketing questions: who are my targets, what are they like, where can I find them, and how can I reach them (Claritas 2008)? The use of PRIZM has not been restricted to the private sector. It has also been used effectively in other contexts such as adoption of stormwater mitigation practices and urban forestry programs (Holbrook 2001). PRIZM categories are organized into a nested hierarchy of urbanization (5 categories), socio-economic status (15 categories), and lifestyle (66 categories). These three levels correspond to the theories we have identified: population density (PRIZM-5), social stratification and luxury effect (PRIZM-15), and ecology of prestige (PRIZM-66). PRIZM is available at the Census block group level for the entire United States.

Census block group attribute data from 1980, 1990, and 2000 were used for time-series analysis. These data were obtained from Geolytics’ Neighborhood Change Database (NCD). NCD consists of realigned/re-aggregated datasets where US Census data from 1980 and 1990 are re-apportioned to 2000 geographies. The NCD is useful because it facilitates time-series analyses that would otherwise be more difficult to conduct due to changing Census boundaries over time. These time-series data were used to address the second research question about temporal lags.

Continuous data were obtained from two sources. Demographic and socio-economic data were acquired from the 5-year American Community Survey (ACS) 2006/2011-estimation product at the block group level. Zoning codes, home values, and building age attributes were accessed from New York City’s PLUTO database (Table 1). These data were used to address the third question: which continuous variables were most significant in predicting variations in the distribution of existing and possible canopy cover.

Table 1 Descriptive statistics of input data

The fourth data source was a hi-resolution land cover dataset. A hi-resolution (6 in.), seven-class land cover data layer for New York City, was created by MacFaden et al. (2012) and summarized to every parcel in City’s tax map in 2010. For this project, only land cover data for private residential properties were included. These land cover data were generated using a combination of Light Detection and Ranging (LiDAR), multi-spectral aerial imagery, and existing GIS vector data sets (e.g., building footprints and road polygons) and a semi-automated approach. First, an Object-Based Image Analysis (OBIA) rule-based system was built to automatically extract land cover information. This was followed by a detailed manual review at a scale of 1:1,250 in which approximately 35,000 corrections were made. Overall accuracy of the land cover layer exceeded 96 %, kappa = 0.95 (MacFaden et al. 2012). The seven classes are tree canopy, grass/shrub, bare earth, water, buildings, transportation (roads and railroads combined), and other paved surfaces. Parcel boundaries from the PLUTO database were used to calculate possible and existing vegetation for each residential parcel.

Geoprocessing

To constrain the analysis to residential lands, block groups were excluded in this analysis if they contained fewer than three residential lots, zero population, or were classified as 100 % open space (Troy et al. 2007). An additional category was created at the PRIZM-66 level to accommodate New York City Housing Authority (NYCHA) properties. NYCHA is responsible for more than 931 acres of tree canopy on approximately 2,410 acres of land that includes 334 developments and 2,597 residential buildings. Approximately, 400,000 people in more than 175,000 households reside in NYCHA housing. If these properties were a city unto itself, then it would form the 21st largest in the country (New York City Housing Authority 2013). For comparison, NYCHA grounds contain more tree canopy (931 acres) than Central Park’s entire land area (843 acresFootnote 5). In short, these properties contain a significant portion of the City’s population and natural resources, and therefore warrant inclusion in this analysis. Block groups comprised exclusively of NYCHA properties which were given their own new category yielding a “PRIZM 66 + 1.” Block groups containing a mix of NYCHA-owned parcels and other residential properties maintained their original classifications from Claritas in the PRIZM system.

Geoprocessing tools were used to summarize all continuous variables at the Census block group level. PLUTO was queried for lots zoned as either One and Two Family, Multi-Family walkup, Multi-Family Elevator or Mixed Residential/Commercial. Average building age and its square were calculated per block group. The squared value was included because trees do not grow instantaneously. Instead, trees planted as part of a new development may exhibit peak canopy levels decades later and then senesce, approximating an upside down “U” shape (Troy et al. 2007). For simplicity, we call the 2006/2011 models “2010.” PLUTO also facilitated the calculation of the percent of the residential area per block group that was classified as a single-family home or a two-story detached home, and the percent area of each block group that was classified as protected open space (NYC Open Data 2013). A total crime index for the year 2008 from Tetrad was tabularly joined to the block groups (Tetrad|AGS 2008). The index is scaled such that 100 is the national average, and a 200 would represent twice the national average.

Possible and existing vegetation data were calculated for Census block groups using the following methods. The amount of each of the seven land cover types was calculated for each residential parcel in the PLUTO database using the Tabulate Area Tool in ArcGIS 10 (ESRI 2010) in order to quantify the amount of possible and existing vegetative cover (EVC) per parcel. The percent possible vegetation cover (PVC) is the individual tax-lot area minus the building envelope as a percentage of area, aggregated to Census block group boundaries (Fig. 4). Then, PLUTO was intersected with the year 2000 block groups. Because the two agencies, City of New York Department of City Planning and the United States Census Bureau, created each dataset independently and for different purposes, there were 2,890 instances where a parcel straddled more than one block group (0.38 % of all parcels considered). Parcels were manually reassigned to the block group that contained the majority of the parcel. Within each Census block group, the residential land area was summed, and then the percent containing building area was subtracted. This represents the amount of space that hypothetically could contain trees, grass, or shrubs (termed “PVC”), and was then divided by the residential land area of that block group.

Fig. 4
figure 4

Parcels are partitioned into Possible Vegetation Cover and not suitable categories. Then, Existing Vegetation Cover (EVC) and its two subtypes are calculated

Possible vegetation cover is the percent of the parcel area that theoretically could contain vegetation of any type: trees, grass, or, shrub. PVC is calculated as the parcel area minus the building footprint area. Grass and shrubs are combined into the same category in the land cover data due to the extreme heterogeneity of urban environments. Existing ground cover (EGC) is the percent of PVC that contain grass or shrub, and existing tree canopy (ETC) is the percent of PVC that is occupied by trees. Thus, there are four continuous response variables: the amount of space for vegetation PVC, the fraction of that area occupied by existing vegetation cover (EVC), and the percent of the possible area that is currently occupied by either tree canopy (ETC) or grass/shrub (EGC, Fig. 4). Table 1 shows the continuous predicting variables, response variables, and categorical sets of predictors. Social stratification models include population density, and lifestyle models include both social stratification and population density variables.

Statistical Analyses

Four stages of analyses were performed. All statistical analyses were preformed using the R Programming language except where noted (R Development Core Team 2012). The first two stages addressed the first research question: which theory best predicts variations in the distribution of existing and possible canopy cover on private residential lands: population density, social stratification, luxury effect, or ecology of prestige? ANOVAs were used in the first stage with PRIZM clusters as categorical predicting variables of PVC, EVC, existing tree cover (ETC), and existing grass/shrubs cover (EGC). In these ANOVAs, each dependent, vegetation variable is explained by the categorical PRIZM data three times, once for each type of PRIZM classification–5, 15, 66 + 1, which represented population density, socio-economic status (stratification and luxury effect), and lifestyle theories, respectively. Twenty-four block groups were excluded in the categorical analyses because there were three or fewer block groups within a particular lifestyle category, but their continuous values were used in the spatial regressions (n = 5,425). Also, not all PRIZM groups occur in NYC—three groups from PRIZM 5, eight groups from PRIZM 15, and 22 groups from “PRIZM 66 + 1” groups—when groups with three block groups or fewer are excluded due to small sample size. Bartlett’s test of homogeneity of variances ensured that the ANOVA assumptions were met. In two instances when these assumptions were not met (P < 0.05), the non-parametric Kruskal–Wallis test was used instead. This first stage of analysis produced 12 categorical models. Subsequently, a multi-model comparison using the Akaike information criterion (AIC) was used to identify the model that was most likely to be the best categorical model (Akaike 1973, 1978). The AIC is a log-penalized maximum likelihood estimator that helps an analyst rank model performance based on a balance between fit and the number of parameters. The AIC is calculated as follows (Eq. (1)):

$${\text{AIC }} = \, - 2 {\text{log}}L\left( M \right) \, + { 2}k$$
(1)

where k is the number of parameters plus one while log L (M) equals the maximized log likelihood for the model. AIC scores form the basis for ranking models: lower scores are better. When two models exhibit 0 ≤ ∆ AIC < 3, they are generally considered equally predictive and parsimonious (Burnhan and Anderson 2002).

In the second stage, the continuous variables that collectively contribute to defining PRIZM categories were used as predicting variables in a series of linear bi-directional stepwise regression models for the same four response variables in stage one. This enabled us to examine which individual variables were most closely associated with the amount and type of vegetation present. Each of the predicting variables was grouped into nested theories: population density, social socio-economic status, and lifestyle to match their PRIZM 5, 15, and 66 + 1 counterparts. The presence of multicollinearity was examined using the variance inflation factor (VIF) for each of the OLS models with the lowest AIC. The VIF score was at or below 7.5 for all models except for building age and building age squared, which are naturally correlated. A VIF score of 10 is considered to be a sufficiently high correlation to warrant further evaluation (O’Brien 2007). A multi-model comparison was used to infer the relative significance of each theory and how individual factors related to the types and extents of cover in residential areas among the 12 continuous models.

The third stage addressed the second research question, which addressed temporal legacies and lags. Continuous variables representing population density and socio-economic status from years 1980, 1990, and 2000 were correlated with measures of current vegetation cover. Time-series analysis using variables associated with lifestyle and life stage were not performed because historic data were not available.

In the fourth stage, spatial lag and spatial error models were created using the regression equations from stages one and two using the free software package GeoDa (Anselin et al. 2006). A first-order queen contiguity spatial weights matrix was used since many block groups form a relatively uniform rectilinear tessellation across the region, which approximates a chessboard configuration from which the weights matrix derives its name. Both Pham et al. (2012a,b), and Raddatz and Mennis (2012) employed the queen contiguity matrix in similar situations when exploring environmental justice issues in Montreal, Ontario and Hamburg Germany, respectively. Conceptually and theoretically both spatial lag and spatial error models are relevant. In the case of lags, it is probable that households consciously or subconsciously take note of the landscapes surrounding their residence. A spatial lag model would shed light on the “diffusion” of local management practices (Nassauer et al. 2009; Fraser et al. 2013; Larson and Brumand 2014). Alternatively, there may be unknown and unmeasured factors influencing the management of private residential lands. In this case, spatial error models would address these factors and reduce the probability of committing a Type 1 error. Therefore, lacking clear guidance on the appropriateness of either method, we estimated both spatial lag and spatial error models.

Results

Categorical Models

Analyses using categorical models were employed to evaluate the significance of the four social theories. There were statistically significant differences in all cases of categorical models and dependent variables except for PRIZM 5 groups and ETC. Of the three analyses of variance of PVC models, PRIZM 66 + 1 (lifestyle augmented with NYCHA) ranked the highest because its AIC score was the lowest, signaling a balance of explanatory power and conformity with Occam’s razor. EVC in aggregate, ETC, and EGC were also best explained by the lifestyle categories (PRIZM 66 + 1, Fig. 5, Table 2a, b).

Fig. 5
figure 5

Box plots of possible vegetation cover (PVC, left) and existing vegetation cover (EVC, right) by PRIZM classes

Table 2 ANOVA model descriptions, results, and comparisons

Continuous Models: OLS and Spatial Regressions

All 36 continuous model outputs are shown in Table 3. Only the most relevant model for each dependent variable is discussed in depth in the following text. The ordinary least squares regressions always displayed the least explanatory power. The spatial error models provided the most explanation for EVC and ETC, while the spatial lag model produced the lowest AIC scores for PVC and EGC. Also of note, population density consistently yielded the lowest R2, and lifestyle models always explained the most variation in all four dependent variables. These findings reinforce the categorical model results and are consistent with previous studies of Baltimore, MD (Grove et al. 2006a; Troy et al. 2007).

Table 3 Continuous model descriptions, results, and comparisons

The regressions of PVC with variables representing population density, socio-economic status (stratification and luxury), and lifestyle (ecology of prestige) theories ranged widely in their explanatory power. Although not directly comparable, the lowest R2 was 0.29 (OLS, population density), while the highest was 0.76 (spatial error, lifestyle), which shows the range of explanatory power in these models. The beta coefficient estimates their level of significance, and model diagnostics for the PVC, and spatial error lifestyle model can be found in Table 4.

Table 4 Regression coefficients and test statistics for spatial lag model of possible vegetation cover (PVC) within Census block groups (n = 5,425) within New York City

Regression equations for EVC also had a wide range of R2 values. Population density, median household income, percent vacancy, building age, crime index, and percent married were each negatively associated with EVC. Overall, the lowest R2 was 0.12 (OLS, population density), and the greatest R2 value was 0.62 (spatial error, lifestyle), again demonstrating the range of the variation explained. The spatial error lifestyle model results for EVC model are shown in Table 5.

Table 5 Regression coefficients and test statistics for spatial error lifestyle model of existing vegetation cover (EVC) within Census block groups (n = 5,425) within New York City

The regressions for ETC provided the least explanatory power when compared to PVC, EVC, and EGC. The R 2 values ranged from 0.05 (OLS, population density) to 0.45 (spatial error, lifestyle). The spatial error lifestyle model results for ETC are displayed in Table 6. Housing density, home value, building age squared, percent African-American population, detached homes, and open space were each positively associated with ETC.

Table 6 Regression coefficients and test statistics for spatial error lifestyle model of existing tree cover (ETC) within Census block groups (n = 5425) within New York City

Existing ground cover regression models performed substantially better than ETC regression models. The lowest R-squared value was 0.23 (OLS, population density), and the highest was 0.68 (spatial lag, lifestyle). After removing insignificant terms, population density, median household income, home value, building age, and percent married were positively associated with realized grass and shrub cover. Results for the spatial lag lifestyle model are shown in Table 7.

Table 7 Regression coefficients and test statistics for spatial lag lifestyle model of existing grass cover within Census block groups (n = 5425) within New York City

Temporal Legacies and Lags

Our third question examined the relationship between time-series demographic and socio-economic measures and contemporary vegetation indicators using correlation analyses. For brevity, we show only the Spearman’s ρ for possible and EVC (Fig. 6a, b). In several cases, times-series measures are better correlated with present-day vegetation. For example, median household income, educational attainment, and family size in year 1980 are positively correlated with PVC in year 2010. Vacancy in 1980 is negatively correlated with vegetation today and is greater in magnitude than in 1990, 2000, and 2010.

Fig. 6
figure 6

Median household income, the percentage of the 25 year or older population with a high school diploma (labeled education), and owner-occupied housing in 1980 were more strongly correlated with possible vegetation cover (PVC) in year 2010 than present-day measures (a)

Discussion

Comparing Social Theories

Our first question asked which theory best explains the distribution of and opportunities for vegetation in New York City. The evidence from both the categorical and continuous models consistently and independently supports the lifestyle theory, termed the ecology of prestige, over the three alternatives: population density, social stratification, and luxury effect. These results raise doubt about the commonly held notion that measures of socio-economic status alone, such as income and/or race, explain which neighborhoods have the most tree canopy and lawns. Fig. 7 is provided as an example.

Fig. 7
figure 7

Affluence alone provides a poor explanation of the spatial variation of vegetation on privately owned urban lands. This Young Digerati (a) block group has more than twice the median household income, but less than half the possible vegetation cover or plantable space than the Urban Elders (c), and nearly 18 times less existing vegetation cover as the Low-Rise Living market segments (d). The Money and Brains market segment (b) has 3.6 times the median household income of Low-Rise Living (d), but only 20 % more plantable and vegetated area

These results from New York City are generally similar to the findings in Baltimore, MD. When comparing the continuous models from New York City and Baltimore, each model pair contains similar beta coefficient magnitudes, direction, and level of significance. We propose that the high level of agreement between the models for each city adds credence to the theory of an ecology of prestige because New York City is ten times the geographic size of Baltimore, with a population nearly 7.6 times larger, and contains substantially more observations (5,425 vs approximately 700).

Temporal Lags

Our examination of contemporary indicators of vegetation with time-series measures suggests the presence of temporal lags. Our results reinforce findings from similar studies that contemporary urban residential landscapes are temporally dependent and that different rates of social and ecological change need to be considered. This corroborates findings from other locations including Baltimore (Grove 1996; Troy et al. 2007; Boone et al. 2010), southeastern Australia (Luck et al. 2009), and Los Angeles (Clarke et al. 2013). These findings also support the idea that past and present planning decisions such as zoning may affect both the amount of available space for vegetation and where people with different lifestyles may choose to live. In particular, block groups with higher incomes and a better-educated population in the 1980s are more strongly correlated with the possible vegetation in year 2010. This combination of available space and lifestyle behaviors can have significant effects on the distribution of vegetation on private residential lands. Because of the ecosystem services that urban vegetation provides, these decisions and management behaviors may have important and long-term environmental justice implications for urban residents.

Continuous Analyses

Our categorical and continuous data analyses provided mutually reinforcing results when comparing social theories of the distribution of and opportunities for vegetation on private residential lands. As we noted earlier, our categorical analyses showed that there were significant differences among lifestyle groups. Our use of continuous variables enabled us to examine individual predictor variables that comprised each category.

Possible Vegetation Cover

Possible vegetation cover was most highly and positively associated with the presence of detached homes, African-American households, single-family, owner-occupied homes, and the amount of nearby open space. Vacancies were negatively associated with PVC. Existing vegetative cover was most often found in neighborhoods with single-family, African-American households living in detached, owner-occupied homes. Vacancy and higher proportions of married families were negatively associated with greater PVC. Thus, the socio-demographic neighborhood profiles associated with both EVC and PVC were relatively similar. Yet, higher property values, more detached and single-family homes, and higher percentage of owner-occupied residential spaces also tend to have more opportunities for additional greening. This may suggest two potential planting site types or markets; lower income, more sparsely populated places being one type, and higher economically valued, detached, owner-occupied homes in neighborhoods with few vacancies also appear to have more potential acreage to devote toward greening on private residential land.

Existing Vegetation Cover

The associations of the component variables were interestingly signed. For example, population density was negatively associated with vegetation (more plants co-occur with less people), but existing vegetation was positively associated with housing density (more plants co-occur with more housing units). However, the beta coefficients are both close to zero, signaling that neither are very influential when contextual associations are considered. When examining all coefficient signs, a “profile” emerges. Existing vegetation is the greatest in areas where homes are highly valued, detached or one family owner occupied in neighborhoods, with more African-Americans, and with relatively more abundant open spaces and fewer vacancies, lower crime, and newer buildings. EVC is negatively associated with presences of married families. Of note, the two spatial, social socio-economic status continuous models also performed well, ranking second and third, so we infer that the relative contributions of nearby open space and marriage are at least mathematically minor.

Existing Tree Cover

In the regression equations for ETC, population density and housing density have opposite signs. Vacancy and crime maintained a weak but significant negative association with existing tree canopy. All other variables except for the aforementioned population density were positively signed in the socio-economic status models. These findings were also confirmed in the lifestyle models, and the percentage of families that are married was negatively associated with canopy. These relationships, however, left more variation unexplained than explained. Canopy distribution appears more complex than grass and shrubs, possibly because trees take longer to establish and grow, so temporally lagged predictors might be necessary to more fully describe their distribution (Boone et al. 2010).

Existing Ground Cover

Lifestyle theory had more support than either of the other candidate theories for EGC when using continuous methods. For all twelve of the ANOVA’s, the AIC score for the lifestyle model was the lowest, indicating that it had the best fit despite the number of variables used in the model. This finding adds support to the theory of an ecology of prestige because grass and shrub cover are highly visible to the public and controlled on a shorter time frame than tree canopy. The EGC, continuous models out performed both existing vegetation and ETC models. This suggests that our inability to explain the variation in ETC reduces our ability to predict the overall variation in all types of EVC.

Spatial Dependencies

We examined whether the inclusion of spatial statistics would improve our understanding of the relationships among demographic and socio-economic factors and the distribution of and opportunities for vegetation on residential lands. There were strong spatial dependencies among nearly all variables (Table 1). We found that the spatial versions of each continuous model always performed better than its aspatial counterpart, based upon lower AIC scores. The spatial autoregressive term in our four continuous models (Tables 47) always had a large beta coefficient and was statistically significant. These results suggest that the distribution of and opportunities for vegetation on residential lands is spatially dependent. Further, the Moran’s I for the residuals of each model was close to zero, which indicates that the spatial autocorrelation was handled effectively and adds confidence to the model estimates. We suggest that similar studies that use only OLS studies may not fully account for spillover effects associated with the influence of neighbors’ land management choices and neighborhood level norms (Nassauer et al. 2009, Fraser et al. 2013; Larson and Brumand 2014).

Conclusion and Management Implications

In this paper, we asked four questions about the distribution of possible and EVC on private residential lands in New York City. One of the motivations for asking these questions is to advance a general understanding of the social–ecological dynamics of urban residential systems. A general understanding depends upon testing theory and methods under different social–ecological conditions. Our results found very similar results for New York City and Baltimore in terms of theory and temporal dependence. Tests for spatial dependence have not been conducted for Baltimore. Results from other studies using similar theoretical constructs and methods have been mixed. For example, the addition of lifestyle and life-stage measures did not improve the ability to explain the distribution of grass cover on private residential lands in a northern Boston suburban area (Giner et al. 2013). Similar work in Raleigh, NC indicated that urban morphological characteristics played a larger role, and socio-economic measures had approximately equal explanatory value compared to lifestyle characteristics when explaining the distribution of tree cover (Bigsby et al. 2014). The differences in the results from these studies could be due to differences in the density of development patterns, fewer lifestyle groups, landscape legacies, or a combination of these and other factors. We suggest that more comparisons are needed using similar methods. One such study is already underway examining similarities and differences among urban areas in different climatic regimes: Miami (tropical), Baltimore, Boston, and Minneapolis/St. Paul (temperate), Los Angeles (mediterranean), and Phoenix (desert, Polsky et al. 2014).

Replication and comparison depend upon similar data and analyses. Several datasets standout in importance. Hi-resolution land cover (<2 m) and parcel boundaries are crucial to distinguish land cover for different land uses and ownership types. The attributes of two social geographies are also critical to the analysis. Assessed home value, housing type, and housing age are attributes that are often available at the parcel level. Census block group data contain information the about number of households, race, and income. Geodemographic segmentations such as the Potential Rating Index of Zipcode Markets (PRIZM) and other commercial categorical segments like ESRI’s Tapestry can also be acquired at the block group level. Unfortunately, education data are no longer available at the block group level. Since 2010, the US decennial Census no longer reports education statistics at the block group level, and education data are not collected by the ACS.

Adding spatial and temporal lags to the analyses revealed that the implicit assumption of statistical independence among observations is unsubstantiated. In the case of spatial dependence, all variables were significantly and highly autocorrelated (Table 1), and the Moran’s I of the regression residuals was near zero after estimating the appropriate spatial lag or spatial error model (Tables 47). This indicates that the models correctly accounted for these dependencies. Time-series measures of population density and socio-economic status also revealed a temporal dependence. Spatial and temporal autocorrelation not only violate statistical assumptions but may be theoretically important as well. Spatial autocorrelation may indicate that residents are affected by their neighbors’ land management choices and neighborhood level norms. Temporal autocorrelation may reflect the role of past social behaviors, that land management decisions have lasting impacts, and that residential social–ecological systems experience different rates of change.

Testing alternative theories about the relationships between urban residential lands and vegetation cover is more than an academic exercise. Understanding which theory best explains the relationships among population density, socio-economic status, group identity, and private residential canopy cover may have significant implications for the design of policies, plans, and management to advance urban sustainability. For instance, if population density best explained the distribution of vegetation, then decision makers would most likely focus on land use planning and urban design. If social stratification were the best explanation for the distribution of vegetation, then decision makers would need to address patterns and processes of environmental dis/investment, which are essential to advancing environmental justice. And if “luxury” were the best explanation of the distribution of vegetation, then decision makers might subsidize the availability of trees through free “give-a-ways” or tree rebates. What we find, however, is that lifestyle is the best explanation of the distribution of trees on residential lands. This is not to say that land planning and urban design are unimportant, or that environmental justice and costs are unimportant. They are important. However, it is also clear from this research that there are different “markets” for trees.

To conserve and enhance tree canopy cover on private residential lands, municipal agencies, non-profit organizations, and private businesses may need to craft different approaches to residents in different market segments instead of a “one-size-fits-all” approach. Different urban forestry practices may be more appealing to members of different market segments, and policy makers can use that knowledge to their advantage. In this case, advocates may consider policies and plans that address differences among residential markets and their motivations, preferences, and capacities to conserve existing trees and/or plant new trees. Targeting a more locally appealing message about the values of trees with a more appropriate messenger tailored to different lifestyle segments may improve program effectiveness for tree giveaways. Ultimately, this coupling of theory and action may be essential to providing a critical basis for achieving urban sustainability associated with land management.