Introduction

Physical activity, defined as bodily movement produced by the skeletal muscles that results in energy expenditure [1], prevents or delays onset of many negative physical and mental health outcomes common among older adults, including but not limited to diabetes, cardiovascular disease, breast and colon cancer, arthritis, dementia, declines in physical capacity, falls, loss of independence, and frailty [2–5]. Despite these protective benefits, fewer than one in four Americans age 65 and over meet recommended physical activity guidelines [6], and nearly a third report engaging in no leisure-time physical activity in the past month [7].

Older people may be particularly sensitive to the influence of the environments they inhabit, both because they tend to spend more time in their residential neighborhood and because limitations in physical capacity may increase the influence of neighborhood-based barriers and threats. Physical disorder, the deterioration of urban landscapes [8], may be an important and modifiable barrier to physical activity, particularly walking, among older adults [9–12]. However, the quantitative evidence base that physical disorder acts as such a barrier is limited. Moreover, only a few studies have examined disorder in relation to activity specifically among older adults [13–18]. Furthermore, to the best of our knowledge, all prior studies of neighborhood disorder and physical activity among older adults have been cross-sectional. By assessing neighborhood exposure before changes in activity, longitudinal analyses can establish stronger evidence for a causal relationship than is available from cross-sectional studies [19].

In this study, we investigated the longitudinal relationship between neighborhood physical disorder and physical activity in community-dwelling older adults, focusing on the between-individual differences that arise with respect to different disorder levels. We hypothesized that disorder would discourage outdoor activity.

Methods

Subjects and Setting

We used data from the New York City Neighborhood and Mental Health in the Elderly Study (NYCNAMES-II), a longitudinal study of 3497 residents of New York City aged 65–75 at baseline in 2011. Sampling and recruitment for NYCNAMES-II have been described previously [20]. Briefly, subjects were initially recruited by phone using a list of telephone numbers purchased from InfoUSA, a data broker that sells geographically targeted lists of individuals’ phone numbers and basic demographic characteristics primarily for sales and marketing purposes. The response rate (i.e., the proportion of persons initially selected from the list who were successfully contacted and who agreed to participate or were determined to be ineligible) was 17%, and the cooperation rate (i.e., the proportion of those successfully contacted who agreed to participate or were determined to be ineligible) was 31%. Seventy percent of subjects (n = 2455) were re-contacted successfully in 2012, and 67% (n = 2355) were re-contacted in 2013. All surveys were conducted by Abt-SRBI, a survey research company, in English or Spanish. Each subject was followed up by telephone once in summer or fall 2012 and once in summer or fall 2013. Final survey weights were raked to New York City population estimates from the 2006–2010 American Community Survey for gender and race/ethnicity and from 2010 Census estimates for educational attainment and borough of residence.
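The raking step mentioned above (iterative proportional fitting) can be sketched as follows. This is a minimal illustration in Python, not the procedure the survey vendor ran: the function name `rake`, the record layout, and the respondents and population margins are all hypothetical.

```python
def rake(records, targets, max_iter=100, tol=1e-8):
    """Adjust unit weights until weighted margins match the target totals.

    records: list of dicts, one per respondent (hypothetical layout).
    targets: {variable: {category: population_total}}.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for rec, w in zip(records, weights):
                totals[rec[var]] = totals.get(rec[var], 0.0) + w
            # Scale every weight so this variable's margin matches its target.
            for i, rec in enumerate(records):
                factor = target[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match: converged
            break
    return weights


# Hypothetical respondents and population margins (not study data).
respondents = [
    {"gender": "F", "borough": "A"},
    {"gender": "F", "borough": "A"},
    {"gender": "F", "borough": "B"},
    {"gender": "M", "borough": "A"},
    {"gender": "M", "borough": "B"},
]
margins = {
    "gender": {"F": 60.0, "M": 40.0},
    "borough": {"A": 50.0, "B": 50.0},
}
raked_weights = rake(respondents, margins)
```

Each pass rescales the weights to match one margin at a time, so the loop repeats until matching one margin no longer disturbs the others.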

Individual Measures

Subjects self-reported sex, age, educational attainment, race/ethnicity, health status, and income. For analysis, we categorized age at baseline as 65–68, 69–71, and 72–75 and categorized household income as <US$20,000, US$20,000–39,999, US$40,000–79,999, and ≥US$80,000. Education levels were reported as less than high school graduate, high school graduate, some college, or college graduate; health statuses were reported as excellent, good, fair, or poor. To maintain a balance of individuals in each racial/ethnic group, we categorized race/ethnicity as Non-Hispanic Black, Non-Hispanic White, Hispanic, and Other.

We assessed past-week physical activity using the Physical Activity Scale for the Elderly (PASE) [21]. PASE has been validated in several older adult populations [22–25] and has been shown to have good correlation (r = 0.68) with doubly labeled water assessment of physical activity [24] and to be more strongly correlated with 6-min test performance than two comparable self-reported older adult physical activity instruments that assessed “typical” activity [23].

Subject physical function was assessed using the nine-item physical function subscale of the Functional Status Questionnaire (FSQ) [26]. At baseline, the sample-weighted mean score for basic activities of daily living was 92.8, and 70.7% of subjects had no difficulties performing any basic activity of daily living.

Neighborhood Measures

Each subject’s perception of neighborhood social cohesion was assessed using an eight-item scale adapted from an instrument developed by the Project for Human Development in Chicago Neighborhoods [27]. Specifically, subjects were asked about the strength of their agreement with the following statements using a four-point Likert-type scale: (1) if there are problems around your neighborhood, your neighbors get together to deal with it; (2) your neighborhood is close-knit; (3) people in your neighborhood generally do not get along with each other (reverse-coded); (4) if you had to borrow $30 in an emergency, you could borrow it from a neighbor; (5) neighbors will keep their eyes open for possible trouble to your place; (6) people in your neighborhood can be trusted; (7) people in your neighborhood don’t share the same values (reverse-coded); and (8) if you were sick, you could count on your neighbors to shop groceries for you. The overall scale had a Cronbach’s alpha of 0.747.
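The internal consistency statistic reported above can be computed as in the following sketch. It assumes a complete response matrix and the standard Cronbach's alpha formula; the function name, the zero-based item indices, and the example responses are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(responses, reverse_items=(), scale_min=1, scale_max=4):
    """Cronbach's alpha for a list of per-subject item-score lists.

    reverse_items holds zero-based indices of reverse-coded items, which are
    flipped on the scale_min..scale_max Likert range before scoring.
    """
    scored = [
        [scale_min + scale_max - x if j in reverse_items else x
         for j, x in enumerate(row)]
        for row in responses
    ]
    k = len(scored[0])
    # Sum of the individual item variances.
    item_var_sum = sum(variance([row[j] for row in scored]) for j in range(k))
    # Variance of each subject's total score.
    total_var = variance([sum(row) for row in scored])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical two-item example in which the second item is reverse-coded.
example = [[1, 4], [2, 3], [3, 2], [4, 1]]
alpha = cronbach_alpha(example, reverse_items={1})
```

For the eight-item scale described above, items (3) and (7) would correspond to `reverse_items={2, 6}` under this zero-based convention.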

To measure neighborhood disorder, we used the novel but validated “virtual street audit” technique, implemented in a computerized system designed to improve the reliability and efficiency of virtual street audits [28]. Trained virtual street auditors used imagery from Google Street View whose initial image capture occurred between August 2007 and October 2011 to assess 532 block faces across New York City for nine indicators of disorder including litter, graffiti, and buildings that appear to be abandoned. Individual items showed kappa scores ranging from 0.34 (for presence of empty alcohol bottles) to 0.80 (for presence of apparently abandoned buildings). Those indicators were then combined using a two-parameter item response theory model to construct a single disorder scale, which had an internal consistency reliability of 0.93. We used kriging, a geospatial modeling technique that incorporates spatial covariance with distance-weighted measurements [29], to provide an estimate of disorder, with confidence levels, at any point in New York City [30, 31]. We then computed estimates at every vertex of a 100 × 100-m grid over the land area of the city and used ArcGIS to compute the mean of the disorder estimates at grid points that fell within each subject’s network buffer. Those mean values constituted our estimates of subjects’ neighborhood disorder levels.

To account for differences in walkability between neighborhoods, we used a validated walkability metric previously described in detail elsewhere [32]. In this measure, the total walkability score is the sum of z-scores of five measures derived from urban planning literature: (1) residential population density, (2) land use mix, (3) intersection density, (4) retail floor area ratio, and (5) subway stop density. This measure has previously been shown to predict BMI [33], engagement in active transport [34], and total physical activity as recorded by accelerometer [35].
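The sum-of-z-scores construction can be illustrated with a short sketch. The component names, the use of the population standard deviation, and the three example neighborhoods are assumptions made for illustration; the validated metric is specified in [32].

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def walkability_scores(neighborhoods, components):
    """Sum the z-scores of each component within each neighborhood."""
    columns = {c: zscores([n[c] for n in neighborhoods]) for c in components}
    return [sum(columns[c][i] for c in components)
            for i in range(len(neighborhoods))]


# Hypothetical component names and values (not study data).
COMPONENTS = ["pop_density", "land_use_mix", "intersection_density",
              "retail_far", "subway_density"]
areas = [
    {"pop_density": 10, "land_use_mix": 0.2, "intersection_density": 50,
     "retail_far": 0.1, "subway_density": 0},
    {"pop_density": 20, "land_use_mix": 0.5, "intersection_density": 100,
     "retail_far": 0.3, "subway_density": 1},
    {"pop_density": 30, "land_use_mix": 0.8, "intersection_density": 150,
     "retail_far": 0.5, "subway_density": 2},
]
scores = walkability_scores(areas, COMPONENTS)
```

Because each component is standardized before summing, the five measures contribute on a common scale regardless of their original units.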

To account for social differences in neighborhoods, we used data from the American Community Survey, area-weighting the estimates for the census tracts comprising each network buffer to compute a personal score for each subject at each wave. Waves 1 and 2 used 2006–2010 5-year averages, and wave 3 used 2008–2012 5-year averages. Following prior studies of the effect of neighborhood disadvantage, we operationalized racial composition as proportion of residents reporting their race/ethnicity as Hispanic or Non-Hispanic Black and operationalized neighborhood socioeconomic status as proportion of adult residents who had completed high school [36].
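Area-weighting amounts to a weighted mean in which each tract's weight is the area it shares with the buffer. The sketch below assumes the overlap areas have already been computed in GIS software; the function name and the numbers are hypothetical.

```python
def area_weighted_estimate(overlaps):
    """Area-weighted mean of tract-level values for one network buffer.

    overlaps: list of (overlap_area, tract_value) pairs, one per census
    tract that intersects the buffer.
    """
    total_area = sum(area for area, _ in overlaps)
    return sum(area * value for area, value in overlaps) / total_area


# Hypothetical buffer overlapping two tracts (areas in square km).
# Tract values are the proportion of adults who completed high school.
buffer_overlaps = [(0.03, 0.80), (0.01, 0.60)]
hs_completion = area_weighted_estimate(buffer_overlaps)
```

Here the tract covering three quarters of the buffer dominates the estimate, giving 0.75.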

Finally, our neighborhood pedestrian risk measure was calculated as the density of unique pedestrian–motorist collisions resulting in an injury or fatality to the pedestrian in 2010. These data have been used in prior analyses of pedestrian collisions and influences on physical activity [31, 37, 38].

Each subject reported his or her home address at each of the three waves. We geocoded these addresses to identify the geographic coordinates of the subject’s home (96% were geocoded to a rooftop; the remainder were assigned to the age 65–74 population-weighted centroid of the reported ZIP code). For each subject, we defined the residential neighborhood as the land area reachable by city streets within 0.25 km of the geocoded home location, an area referred to as a 0.25-km network buffer and frequently used in neighborhood research [37, 39, 40]. We then assigned a mean disorder, mean walkability, and pedestrian risk score to each participant’s residential neighborhood at each wave of follow-up.

Missing Data and Sample Weights

This analysis used data from the 2787 (79.7%) subjects who were successfully re-contacted during wave 2 or wave 3. To account for the missing subjects, we computed inverse probability of observation weights [41] for each subject at each wave as a linear function of gender, race/ethnicity, educational attainment, borough of residence, neighborhood disorder, neighborhood pedestrian injury rate, and self-reported health status. Censored observations at both wave 2 and wave 3 were modestly more common among subjects with male sex, lower educational attainment, and Hispanic ethnicity. Further details on sample weights are given in Appendix 1. Our final analyses used the product of these inverse probability of censoring weights (IPCW) and baseline sample weights such that results are representative of the population of non-institutionalized New York residents aged 65–75 according to the 2010 US Census (n = 571,323). Appendix 2 presents the sample-weighted population demographics in each wave, illustrating that weights were successful in preserving demographic stability.
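The logic of multiplying an inverse probability of observation weight by a baseline sample weight can be sketched as below. As a loudly simplifying assumption, the probability of being observed is estimated here as the observed fraction within strata, whereas the study modeled it from several covariates; all names and data are hypothetical.

```python
from collections import defaultdict


def observation_weights(subjects, strata_keys):
    """Final weight = baseline weight / estimated P(observed | stratum).

    Assumption: P(observed) is the observed fraction within each stratum,
    a stand-in for the regression model described in the text.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_observed, n_total]
    for s in subjects:
        key = tuple(s[k] for k in strata_keys)
        counts[key][0] += s["observed"]
        counts[key][1] += 1
    weights = []
    for s in subjects:
        if not s["observed"]:
            weights.append(0.0)  # censored subjects contribute nothing
            continue
        n_obs, n_total = counts[tuple(s[k] for k in strata_keys)]
        weights.append(s["baseline_weight"] * n_total / n_obs)
    return weights


# Hypothetical wave-2 follow-up data (not study data).
cohort = [
    {"sex": "M", "observed": 1, "baseline_weight": 2.0},
    {"sex": "M", "observed": 0, "baseline_weight": 2.0},
    {"sex": "F", "observed": 1, "baseline_weight": 1.0},
    {"sex": "F", "observed": 1, "baseline_weight": 1.0},
]
final_weights = observation_weights(cohort, ["sex"])
```

The observed man stands in for himself and the censored man, so the weighted total still matches the baseline weighted total.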

Relatively few responses were missing for the subjects who were followed up successfully. For example, no more than 1% of data was missing on any PASE component. Nonetheless, to account for possible bias due to missing data, we used five multiple imputations, computed using IVEWARE [42] to model missing covariates from all available covariates, for all missing responses and used Rubin’s rules to calculate combined estimates.
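Rubin's rules for combining estimates across imputed datasets can be written compactly. The sketch below is a generic illustration; the function name and the five coefficient estimates and variances are made up and are not results from this study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances using Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)           # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from five imputed datasets.
est = [-2.1, -1.9, -2.0, -2.2, -1.8]
var = [0.8, 0.8, 0.8, 0.8, 0.8]
pooled_est, pooled_var = pool_rubin(est, var)
pooled_se = sqrt(pooled_var)
```

The between-imputation term inflates the variance to reflect uncertainty about the missing values themselves.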

Statistical Analysis

We explored the stability of PASE scores and functional status over three waves of data collection using spaghetti plots and by computing ICCs. To explore the demographic patterning of disorder and functional status, we computed mean disorder levels and median functional status scores, stratified by age, sex, educational attainment, and income.
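One common form of the ICC for repeated measures is the one-way random-effects ICC, sketched below for balanced data. The text does not state which ICC variant was computed, so this choice is an assumption, and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for balanced repeated measures.

    scores: list of per-subject lists, each with the same number of waves.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # Mean squares between subjects and within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical PASE scores for three subjects over three waves.
pase_by_subject = [[80, 70, 90], [60, 75, 55], [100, 85, 95]]
icc = icc_oneway(pase_by_subject)
```

Values near 1 indicate that most variation lies between subjects rather than within a subject across waves.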

After plotting disorder and PASE scores to check linearity assumptions, we modeled PASE as a continuous outcome in a longitudinal linear mixed-effects model. Specifically, we first fit a random intercept model predicting PASE score at each wave from neighborhood disorder in that wave, controlling for baseline age, sex, and educational attainment and for time-varying perceived social cohesion, neighborhood walkability, and neighborhood pedestrian injury risk. Next, to investigate whether disorder affected the change in PASE score over time (e.g., if older adults living in more disordered neighborhoods encounter a sharper decline in activity), we fit a random intercept/random slope model with an interaction term between baseline disorder and wave. In this model, the interaction term is interpretable as the association between baseline disorder and change in PASE score over time. Finally, we fit a random intercept/random slope model with an interaction term between time-varying disorder and wave. In this model, the interaction term is interpretable as the association between change in disorder over time and change in PASE score.
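The first two models described above can be written out as follows. The notation is ours and is an assumption for illustration: $i$ indexes subjects, $t$ indexes waves, $D$ is neighborhood disorder, $X_i$ collects baseline covariates, and $Z_{it}$ collects time-varying covariates.

```latex
% Random intercept model
\mathrm{PASE}_{it} = \beta_0 + \beta_1 D_{it} + \gamma^{\top} X_i
  + \delta^{\top} Z_{it} + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{it} \sim N(0, \sigma^2)

% Random intercept / random slope model with a baseline disorder-by-wave interaction
\mathrm{PASE}_{it} = \beta_0 + \beta_1 D_{i1} + \beta_2 t + \beta_3 (D_{i1} \times t)
  + \gamma^{\top} X_i + \delta^{\top} Z_{it} + u_{0i} + u_{1i} t + \varepsilon_{it}
```

In the second equation, $\beta_3$ is the interaction coefficient interpreted as the association between baseline disorder and change in PASE score over time. The third model has the same form with time-varying disorder $D_{it}$ in place of $D_{i1}$.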

Sensitivity Analyses

We performed five sensitivity analyses to test the robustness of our analysis to various assumptions. First, because we were concerned that past-week activity, as assessed by PASE, might be affected by weather and season, we explored the relationship between PASE and both days since June 1 (to test for seasonal effects) and mean past-week “feels like” temperature, computed from weather data for New York City downloaded from the Weather Underground website [43] using formulae for heat index [44] and wind chill [45] published by the National Weather Service. Second, to test the robustness of our conclusions to our choice of longitudinal modeling strategy, we repeated the primary analysis using generalized estimating equations rather than mixed models [46]. Third, to test the robustness of our results to our model for probability of inclusion in any given wave, we re-ran the main analysis using sampling weights supplied by Abt-SRBI for each wave, which were raked to demographic targets as described above but, by design, could not account for disorder, walkability, or self-reported health status. Fourth, because disorder has been associated with crime [47] (though the causal nature of that association is controversial [8, 48–51]), we re-ran our primary analysis incorporating CrimeRisk Index variables acquired from ESRI, Inc. (www.esri.com/data/esri_data/business-overview/crimerisk). These measures were based on the Federal Bureau of Investigation Uniform Crime Reports records and have been used in prior analyses [52, 53]. Finally, since some subjects live in the same larger-scale neighborhood areas, here operationalized as NYC Community Districts, we assessed the possibility of non-independence of observations between subjects by fitting a three-level hierarchical model, clustering on subjects within community districts.
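For the first sensitivity analysis, a “feels like” temperature can be derived from the National Weather Service wind chill formula and the Rothfusz heat index regression, as sketched below. How the two formulae were combined in the study is not stated; the thresholds in `feels_like_f` are a common convention adopted here as an assumption, and the function names are hypothetical.

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS wind chill (degrees F) from air temperature and wind speed."""
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


def heat_index_f(temp_f, rh_pct):
    """NWS heat index (degrees F) via the Rothfusz regression."""
    t, r = temp_f, rh_pct
    return (-42.379 + 2.04901523 * t + 10.14333127 * r
            - 0.22475541 * t * r - 6.83783e-3 * t * t
            - 5.481717e-2 * r * r + 1.22874e-3 * t * t * r
            + 8.5282e-4 * t * r * r - 1.99e-6 * t * t * r * r)


def feels_like_f(temp_f, rh_pct, wind_mph):
    """Assumed rule: wind chill when cold and windy, heat index when hot,
    and the air temperature itself otherwise."""
    if temp_f <= 50 and wind_mph > 3:
        return wind_chill_f(temp_f, wind_mph)
    if temp_f >= 80:
        return heat_index_f(temp_f, rh_pct)
    return temp_f


# Hypothetical daily readings: (temperature F, relative humidity %, wind mph).
week = [(90, 70, 5), (65, 50, 5), (45, 60, 10)]
mean_feels_like = sum(feels_like_f(*day) for day in week) / len(week)
```

Averaging the daily values over the seven days before each interview would give a past-week mean of the kind used as a covariate.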

All analyses used R for Windows Version 3.2.3, including the “survey” package to incorporate survey weights to account for sample design. We used the R “mitools” package to combine estimates across imputations using Rubin’s rules [54].

Results

As compared to the older adult population of New York City, the full NYCNAMES-II baseline sample was disproportionately female, well-educated, and non-Hispanic. Table 1 shows selected demographic characteristics of the full study population and the subset who were re-contacted at each wave of follow-up. Relatively few subjects moved during the follow-up period (0.9% at wave 2 and 2.0% at wave 3, n = 103 overall).

Table 1 Selected characteristics of the NYCNAMES-II study population interviewed at each wave

Functional status at baseline and exposure to neighborhood disorder varied. On average, Hispanics, less educated individuals, and those with lower incomes encountered more disorder. Younger subjects, men, non-Hispanic whites, and those with higher incomes and more education had higher functional status (Table 2). Disorder was not strongly correlated with other neighborhood measures; more broadly, neighborhood measures of interest were only weakly inter-correlated except for pedestrian injury risk and walkability (Table 3).

Table 2 Disorder levels, functional status, and PASE score at baseline for 3497 older adult residents of New York City surveyed in 2011, stratified by demographic and socioeconomic characteristics
Table 3 Spearman’s correlations between selected neighborhood characteristics

PASE scores were correlated within people across waves (ICC over three waves 0.67, Fig. 1). Mean PASE at wave 3 (80 PASE units) was essentially unchanged from mean PASE at wave 1 (81 PASE units), offering little evidence of activity decline across the population over this 2-year period. Disorder and PASE scores were weakly negatively correlated within each wave analyzed cross-sectionally (Spearman’s r = −0.13, −0.12, −0.13, Fig. 2), though all negative correlations were significantly different from zero (p < 0.001 for all three).

Fig. 1 Scatter plots of PASE scores across waves for each subject

Fig. 2 Scatter plots of neighborhood disorder score and PASE score at each wave, with an overlaid unadjusted least-squares regression line showing the negative correlation at each wave

In a mixed longitudinal random intercept model using IPCW weights to account for censoring and controlling for baseline age, sex, race/ethnicity, educational attainment, and functional status, we observed that a one standard deviation increase in disorder was associated with an average of 3.1 (95% CI −4.6, −1.7) units lower PASE score at baseline (Table 4). Adding neighborhood social cohesion, walkability, racial composition, neighborhood socioeconomic status, and pedestrian risk to the models modestly decreased the estimate to 2.0 (95% CI −3.7, −0.2) units lower, or about 6 min of walking/day. In a random slope model including an interaction term between wave and disorder, the estimated coefficient for the disorder/time interaction term was 0.0 (95% CI −0.8, 0.9), providing no evidence for differences in PASE trajectory by disorder.

Table 4 Mean differences in PASE score at baseline and mean differences in changes in PASE score associated with baseline physical disorder and changes in physical disorder over time for 3497 adult residents of New York City surveyed from 2011 to 2013

Sensitivity Analyses

While there was minor seasonal and temperature variation in PASE score, particularly in the gardening item, past-week temperature was not strongly associated with overall PASE score. Analyses using mean past-week “feels like” temperature and days since June 1 as covariates are detailed in Appendix 3.

Coefficient estimates computed using a GEE model rather than mixed model were similar to those computed in our primary analysis (Appendix 4). Similarly, effect estimates computed using a mixed model with Abt-SRBI’s sample weights rather than the weights we computed to incorporate health status and other covariates into the model for loss to follow-up were similar to those computed in our primary analysis (Appendix 5). Estimates incorporating a measure of crime risk were largely unchanged from the main analyses (Appendix 6). Finally, mixed models clustering on community districts were also very similar to the primary analysis (Appendix 7).

Discussion

In this longitudinal study of older adult residents of New York City, we observed the hypothesized inverse association between neighborhood physical disorder and physical activity after controlling for numerous individual and neighborhood covariates. However, while individual subjects’ activity levels fluctuated moderately, mean PASE scores for the whole cohort changed little over the two available years of follow-up, and we observed no interaction between disorder and change in activity over those 2 years. Overall, the two-point PASE score differential per standard deviation of disorder remained constant across all three waves.

PASE scores are abstractions and cannot be directly translated in terms of energy expenditure [21]. However, it is possible to conceptualize the two PASE point differential as achievable through roughly 6 min/day of walking [21, 25]. That is, if the estimated difference in PASE score by neighborhood disorder were interpretable as an intervention effect such that removing disorder in a given subject’s neighborhood would elevate that subject’s activity level, then subjects who currently live in highly disordered neighborhoods and engage in no activity could meet the recommended 30 min/day of walking [55] if all nine indicators of disorder were removed (equivalent to removing about five standard deviations of disorder) [30]. We caution, however, that this interpretation is purely a thought experiment to contextualize our estimated two PASE points per standard deviation of disorder; our data and study design do not support a causal interpretation of the disorder coefficient estimate. Not only did too few subjects move for a meaningful estimate of the effect of changing disorder exposure in this group [56], but the causal identifiability assumptions of conditional exchangeability, treatment–variation irrelevance, and lack of interference between units [57] were also all likely violated to some degree.
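The arithmetic behind this thought experiment, using only the figures quoted above (about 2 PASE points per standard deviation of disorder, about 6 min/day of walking per 2 PASE points, and about 5 standard deviations of disorder removed), is:

```latex
5\ \text{SD} \times \frac{2\ \text{PASE points}}{1\ \text{SD}} = 10\ \text{PASE points},
\qquad
10\ \text{PASE points} \times \frac{6\ \text{min/day}}{2\ \text{PASE points}} = 30\ \text{min/day}
```

The calculation assumes the association is linear across the full range of disorder, which is part of why it should be read only as a contextualizing illustration.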

Evidence from walk-along interviews and other qualitative studies of older adults has contributed to the development of theory suggesting that neighborhood disorder may inhibit physical activity among older adults [58–60]. Several recent cross-sectional quantitative studies generally appear to support this theory, albeit with caveats [13, 15, 17]. Our study provides further support that disorder and activity are inversely associated after controlling for salient factors. We did not find evidence that living amidst disorder led to faster decline in activity levels, though with only 2 years of follow-up and less than 3% of subjects moving to new neighborhoods, our power to detect such effects was limited.

While too few subjects moved over our 2 years of follow-up for us to assess the effect of changing disorder exposure, our longitudinal dataset did allow us to identify changes in physical activity over time. The modestly negative relationship we observed between elapsed time and PASE score (activity decreased by an average of 0.5 PASE units per year, an estimate that was sufficiently small and imprecise as to be compatible with no change occurring at all) did not appear to be differential by neighborhood disorder level. Given that we observed no disparity in activity trajectory by neighborhood disorder, there are four complementary explanations for how the disparity we observed at baseline might have arisen. The first is that consistent residual confounding is responsible for the observed consistent association at each wave. Such confounding would need to be independent of the individual and neighborhood covariates in our model, but we cannot rule this possibility out. A second possible explanation is that residential self-selection is responsible for the emergence of the disparity; that is, on average, subjects selected neighborhoods fitting their activity preferences and retained their age-specific preferred activity level across all waves of follow-up. A third possibility is that the critical period for neighborhood as a cause of activity norms is prior to age 65, the youngest age in our cohort, such that our subjects had already established physical activity norms suited to their neighborhoods prior to recruitment and continued in these activity behaviors through the duration of the study. Finally, consistent with the socio-ecological model of health behavior, it may be that each neighborhood’s support for activity was roughly constant over time and that the differential in activity between subjects results from the differences in support. Future research might explore these mechanisms in more depth.

Strengths

This study has several important strengths. First, as noted above, nearly all prior studies of neighborhood condition and physical activity, especially among older adults, have been cross-sectional [61]. The few longitudinal exceptions [62–64] have not examined neighborhood disorder as an influence. Second, this study used a novel low-cost CANVAS/Google Street View measure of neighborhood disorder that can in principle be deployed in other cities, lowering the costs of future replication studies [28, 30]. Third, because this measure of disorder was ascertained independent of survey response, our results are not subject to same-source bias that might arise in survey-only studies [65, 66]. Fourth, we used advanced statistical techniques to account for both missing covariates and loss to follow-up such that missing data would only bias our findings if they were missing not at random conditional on a comprehensive set of covariates [67]. Finally, our results were robust to sensitivity analyses addressing past-week weather and several alternate modeling approaches.

Limitations

However, like most empirical research, this study also has important limitations. First, the low response rate raises concerns that the sample may not be representative of the older adults in New York City. This low response rate was partially due to a low (57%) contact rate among phone numbers selected from a list of numbers provided by a data vendor; it may be that inaccuracies in address and phone number data included in the list hampered the contact rate, though this hypothesis has not been tested empirically. The cooperation rate among those contacted (31%) was within the 30–40% response rate range typically encountered by New York City Department of Health telephone surveys [68] and in line with response rates reported by a recent test of various survey methodologies conducted in Australia [69]. Concerns about non-response are also somewhat mitigated by the population-based sample design and our use of sample weights in analysis.

A second limitation is that several measures used for this analysis were problematic. Specifically, our social cohesion measure had only mediocre internal consistency in this population, raising the concern that the scale may reflect multiple underlying constructs or may have been interpreted differently by different subjects. Assuming social cohesion independently prevents disorder and encourages physical activity, as has been suggested previously [70], residual confounding due to incomplete control for social cohesion might have biased results away from the null. Similarly, while the PASE questionnaire has been validated in several populations similar to the NYCNAMES-II population [24, 25], all physical activity questionnaires are subject to imperfect recall and reporting biases, which may be particularly strong among older adult populations. While imperfect recall would be expected to bias our results towards the null, if residents of more disordered neighborhoods simply fail to recall past-week activities, perhaps as a result of stressful neighborhood encounters, the resulting systematic bias would artificially inflate the association between disorder and activity. However, our concerns about recall are tempered by a related analysis (S.J.M., Unpublished Manuscript) in which we found that types of activity engaged in were fairly stable across waves, making it unlikely that past-week activity was frequently forgotten as a consequence of transient events. Finally, our measure of functional status, particularly the basic activities of daily living score, was left-skewed with strong ceiling effects. While functional status was not our primary exposure of interest, if our measure failed to capture functional status variation that was positively correlated with activity and negatively correlated with disorder, then our estimates may be inflated due to residual confounding. More broadly, a more sensitive measure might have resulted in an observable association between neighborhood characteristics, physical activity, and changes in functional status, allowing us to control more completely for time-varying confounding by functional status.

A third limitation in this study as in nearly all neighborhood effects studies [71, 72] is residential self-selection—the tendency for people to choose neighborhoods that better support their chosen lifestyles. For example, because disorder can act as a barrier to walking only for subjects who would ever choose to walk, if those subjects on average choose less disordered neighborhoods, then an estimated effect of disorder failing to account for this difference in walking preferences would be biased. We observe, however, that in New York City, as in many North American cities, neighborhood disorder is strongly correlated with race/ethnicity and educational attainment of neighborhood residents, as it was for our study participants. Because we controlled not only for the race/ethnicity and educational attainment of study participants but also for racial/ethnic composition and educational attainment in neighborhoods, confounding introduced by residential self-selection may be somewhat controlled for in our models already.

Conclusions

Our study supports prior observations that older adults living in more disordered neighborhoods are on average somewhat less active than those in more ordered neighborhoods. However, we did not find evidence that the presence of disorder induces faster decline in activity levels among older adults. Whether the between-neighborhood disparity in physical activity levels arose as a result of residual confounding, as a result of residential self-selection, as a result of prior neighborhood influence on activity norms, or as a result of unchanging but differential neighborhood support for activity is an area for future research using datasets with longer follow-up and more dynamic neighborhood conditions.