Introduction

Green, or vegetated, roofs are engineered ecosystems composed of a waterproofing layer, a soil-like substrate, and drought-tolerant plants, all installed on the roof of a building. Modern extensive (thin-soil) green roofs are unique ecosystems in that they: are engineered, typically pairing a mixture of a lightweight aggregate material such as heat-expanded shale and compost with slow-growing succulent plants, often those in the Sedum genus; have shallow, quick-draining engineered “soils”, referred to as substrate hereafter; experience stressful conditions such as drought, high winds, and ultraviolet radiation; and are “young” ecosystems, typically ranging from 0 to 15 years old in the United States, with some modern green roofs reaching over 30 years old in Europe (Oberndorfer et al. 2007). These unique qualities, in addition to the defined edges of the roof that form convenient and manageable ecosystem boundaries and watershed delineations, make green roofs ideal candidates for investigating both classic and cutting-edge ecosystem ecology questions.

Green roofs are increasing in coverage in many cities, primarily because of the benefits they provide, including stormwater retention, urban heat island reductions, insulation and passive cooling, and aesthetics and habitat (reviewed in Oberndorfer et al. 2007). Green roofs and other green infrastructure elements function as engineered riparian areas for urban stream networks (Kaushal and Belt 2012), and the hope is that, like many natural riparian zones, they will improve water quality by intercepting and reducing the flux of nutrients and other pollutants to receiving waters (Berndtsson 2010; Rowe 2011). Green roofs can perform this function, acting in some cases as sinks for heavy metals (Steusloff 1998; Berndtsson et al. 2006) and some forms of nutrients, including ammonium (NH4 +; Berndtsson et al. 2006; Berndtsson 2010; Buffam and Mitchell 2015). However, in contrast to these positive environmental impacts, green roof ecosystems can also serve as a source of metals (Mendez et al. 2011) and nutrients, including nitrate (NO3 −) and total nitrogen (Berndtsson 2010; Teemusk and Mander 2011; Buffam and Mitchell 2015), dissolved organic carbon (DOC; Berndtsson et al. 2009; Mendez et al. 2011; Buffam and Mitchell 2015), and phosphorus, largely as inorganic phosphate (PO4 3−; Monterusso et al. 2004; Berndtsson et al. 2006). Researchers have shown that young green roofs in particular can contribute very high levels of phosphate via runoff to local waterways (reviewed in Berndtsson 2010; Rowe 2011; Buffam and Mitchell 2015).

The contribution of phosphate (PO4 3−), the biologically accessible and water-soluble form of phosphorus (P), to local waterways from green roof ecosystems is a cause for concern. Phosphorus is a limiting nutrient in many unmanaged environments, and inputs to ecosystems via atmospheric deposition are commonly very low. Because of this, and because phosphate precipitates out of solution with a number of naturally occurring soil elements, P is typically held very tightly within natural ecosystems and is therefore relatively low in runoff (Chapin et al. 2011). For these same reasons, additional contributions of P to local waterways are concerning: as green roofs proliferate, with up to 20–30% coverage in dense urban areas (Frazer 2005), large nutrient doses could add to the eutrophication threat that already exists for P-limited aquatic ecosystems (Carpenter et al. 1998).

Most studies of green roof runoff water quality have been essentially snapshots in time, but those few that have extended for more than 1 year have observed decreases in P leaching with increasing roof age (Berndtsson et al. 2006). For example, Köhler et al. (2002) found that the ability of green roofs to retain P increased from 26% in the first year of installation to 80% retention after 4 years. The authors concluded that this trend was likely due to plant establishment increasing nutrient requirements. On the other hand, Van Seters et al. (2009) attributed similar P declines to the decreasing pool of substrate P created by the gradual leaching of available P from the green roof substrate over time. Invoking different mechanisms to explain similar patterns highlights an important knowledge gap: are observed declines in runoff P with roof age caused by plant growth and conversion of phosphate into biomass and soil organic matter, by gradual leaching of a finite P pool, or by some combination of these mechanisms? Both processes would likely result in similar phosphate release patterns in the early stages following roof installation and overall in the long term once the levels in the substrate come into balance with the biotic community. However, the rate of change from source to sink or steady state, and thus the timing, could vary depending on mechanism. Therefore, the levels of nutrients in the substrate relative to the needs of the plant community when it is installed, and during subsequent fertilization events, can potentially have a large and lasting impact on the nutrient dynamics of the system.

Understanding both short-term and long-term dynamics is essential for determining why, and for how long, green roof ecosystems act as sources of P and, conversely, how the design of green roofs can be improved to increase P retention. Small-scale controlled studies comparing newly established green roof plots with and without plants all indicate reductions in phosphate runoff due to green roof plant uptake (Aitkenhead-Peterson et al. 2011; Beck et al. 2011; Vijayaraghavan et al. 2012). For example, Aitkenhead-Peterson et al. (2011) studied 6-month-old planted and unplanted green roof plots and found that plant uptake and leaching were approximately equally responsible for losses of phosphate from their studied substrates. In contrast, a recent study of a full-scale green roof found strong seasonal patterns of PO4 3− in green roof runoff over 2 years, with concentrations reaching their maximum in the summer (when plants were most active) and minimum in the winter (when plants were mostly dormant), indicating that P uptake by plants was relatively small compared to other processes controlling P availability (Buffam et al. 2016). In this relatively young green roof (1–3 years old), availability of labile P (primarily as compost in the initial substrate) was hypothesized to be out of balance with the establishing plant community. Microbial mineralization of substrate organic matter appears to be the dominant process influencing P dynamics, resulting in net P mineralization and P runoff as PO4 3− (Buffam and Mitchell 2015).

Little is known about the longer-term dynamics of P cycling as green roofs age. To further explore how and why green roof P runoff and retention vary with green roof ecosystem development over longer periods of time, we revisited the aforementioned full-scale roof study (Buffam et al. 2016) and assembled 4 years of high frequency P runoff data. By investigating shifts in seasonal and long-term trends in P runoff, and comparing them with changes in the substrate P pool, we sought to explore how long green roofs act as sources of P and the driving mechanisms behind net release of P. To achieve this aim, we measured P concentrations of roof runoff during a 4-year period and carried out a time-series analysis of the concentration data. The result was a best-fit model including green roof age (i.e. trend) and seasonal dynamics for the 4-year P time-course. The model was validated using runoff P data from 2 other extensive green roofs in the same region to determine if similar temporal dynamics were present. We subsequently estimated total loss of P from the main study green roof and compared these losses with measured changes in the substrate P pool to investigate the roles of plant uptake and leaching in driving the P runoff dynamics.

We hypothesized an overall decreasing trend of P concentrations in runoff with increasing roof age, with annual seasonal maximum concentrations in the summer and minimum concentrations in the winter continuing throughout the 4-year sampling period as a result of temperature-mediated microbial mineralization of organic P in excess of plant demand. However, we expected that the amplitude of the warm-season P runoff maxima would be reduced over time as plants continue to establish and grow and thus take up more P as the roof matures, with most of the P uptake occurring in the early summer when plant growth rates should be highest. Additionally, we predicted that the long-term losses via runoff would be reflected in decreases in the substrate P pool, with most of the P loss from the substrate attributable to loss of P in runoff; and with plants overall playing a minor role in P uptake relative to leaching losses.

Methods

Study sites

The region, southwest Ohio and northern Kentucky, where all 3 study sites are located, experiences average high temperatures ranging from 4 °C in January to 31 °C in July, and an average precipitation of 1080 mm per year (National Weather Service, Wilmington, Ohio). Annual precipitation for each sampling year was obtained from NOAA’s Lunken Airfield Weather Station located in Cincinnati, Ohio approximately 7.5 km away from our main study site (Civic Garden Center). For precipitation, the 4-year sampling period consisted of (in order) a wet year (1280 mm), a dry year (670 mm), and 2 near-average years (1000 mm and 940 mm).

All study sites are extensive (substrate depth < 20 cm; Oberndorfer et al. 2007) green roofs. Our primary study site, the Civic Garden Center green roof (CGC), is a 46 m2 green roof located in Cincinnati, Ohio, that was installed in 2010 with 7.6 cm (3 in.) of Tremco (Beachwood, OH) extensive substrate at a 20° slope with a soil stabilization system, all overlain by a 2.5 cm (1 in.) Sedum mat (Dvorak 2015; Buffam et al. 2016). At time of installation the Sedum mat contained the following stonecrop species: Sedum album, S. sexangulare, S. acre, S. hispanicum, S. rupestre, Phedimus spurius, P. kamtschaticus, and P. hybridus. Immediately adjacent to the Civic Garden Center green roof is the Civic Garden Center traditional roof, a 37 m2, 20° sloped roof composed of asphalt shingles. Both the green roof and traditional roof are partially shaded by deciduous trees (Buffam et al. 2016).

Two other green roofs located nearby in Northern Kentucky were sampled for model validation. The 102 m2 Turkeyfoot Middle School green roof (TMS) is located in Kenton County, Kentucky and like the CGC green roof was also installed in 2010. It differs from the CGC green roof in that it was installed with 10 cm (4 in.) of Tremco (Beachwood, OH) extensive substrate, has a 1% slope, and was planted with plugs of a variety of Sedum species. The 483 m2 Sanitation District 1 green roof (SD1), also in Kenton County, Kentucky, is much older than the other sampled roofs (installed in 2003) and has a 2% slope, a 10 cm (4 in.) thick extensive substrate produced by Roofscapes, Inc. (now Roofmeadow; Philadelphia, PA), and was planted with a variety of Sedum spp., Allium spp., and Bouteloua spp. plugs.

Sample collection

Runoff samples from the Civic Garden Center green roof and traditional roof were collected for chemical analyses for precipitation events over a 4-year period (April 2011 to March 2015). Samples were collected either as grab samples taken directly from the downspout during a precipitation-runoff event or following the runoff event from a high-density polyethylene (HDPE) collection bucket. Because of this approach, the runoff P concentrations included in this study are instantaneous concentrations and are not necessarily representative of the entire storm events from which they were taken; however, a pilot study of within-event phosphate dynamics suggested that our end-of-event concentrations from the CGC green roof differed on average by less than 6% from the event mean concentrations (Buffam et al. 2016). All samples included in our analyses were from discrete events, classified as periods of rain preceded by a period of 12 h with no precipitation (Buffam et al. 2016).
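The 12 h dry-period rule for separating discrete events can be illustrated with a short sketch. It assumes rainfall is available as a list of timestamped observations and reads the rule as "at least 12 h without precipitation"; the function name `split_into_events` and the sample timestamps are hypothetical and are not taken from the study.

```python
from datetime import datetime, timedelta

def split_into_events(rain_times, min_dry_gap=timedelta(hours=12)):
    """Group rainfall timestamps into discrete events.

    A new event starts whenever the gap since the previous rainfall
    observation is at least `min_dry_gap` (assumed here to be 12 h).
    """
    events = []
    for t in sorted(rain_times):
        if events and t - events[-1][-1] < min_dry_gap:
            events[-1].append(t)   # dry gap too short: same event
        else:
            events.append([t])     # dry gap reached: start a new event
    return events

# Hypothetical rainfall observations (illustrative only, not study data)
rain = [
    datetime(2011, 4, 1, 6), datetime(2011, 4, 1, 9),
    datetime(2011, 4, 2, 2),   # 17 h after the previous observation
    datetime(2011, 4, 2, 10),  # 8 h after the previous observation
]
events = split_into_events(rain)
print(len(events), [len(e) for e in events])
```

In this illustration the 17 h gap opens a second event, while the 8 h gap does not.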

In total, 183 sampled precipitation-runoff events from the CGC green roof and 181 events from the adjacent traditional roof were collected over the 4-year period and used for the analyses in this study. Following the main study period, ten additional CGC green roof runoff events were sampled between April 2015 and July 2015 and used to test model predictions and to help estimate P fluxes relative to changes in the substrate P pool. Atmospheric deposition samples (including the precipitation from the event and any dry deposition between events) were collected for each rainfall-runoff event using a nearby (approximately 45 m distant) HDPE collection bucket. A total of 156 atmospheric deposition samples were collected over the 4-year period and included in our analyses.

Runoff sample collection from the TMS and SD1 green roofs was less intensive, taking place regularly from July to September 2012, and then beginning again in March 2014 through March 2015. In total, 16 rainfall-runoff events were sampled from the SD1 green roof and 15 from the TMS green roof, representing 3 months in 2012 (July, August, and September), 5 months in 2014 (March, June, August, September, December), and 4 months in 2015 (January, March, April, May).

Chemical analyses of runoff

A portion of each runoff sample from all roofs was analyzed for pH (Orion Ross Ultra Combination pH, Thermo Fisher Scientific, Waltham, MA) and conductivity (Orion Conductivity Cell; Thermo Fisher Scientific, Waltham, MA). The remaining sample was passed through a 0.45 μm filter (Millipore MF™ membrane filter; Millipore, Billerica, MA) and analyzed for PO4 3− using the ascorbic acid method (Murphy and Riley 1962) adapted for a microplate reader (Biotek® Synergy H1 Hybrid Microplate reader; Biotek, Winooski, VT) and total dissolved P using ICP-OES (Thermo-Electron iCAP 6300 Duo ICP-OES; Thermo Fisher Scientific, Waltham, MA).

Substrate collection and analyses

Substrate cores were collected from the CGC and SD1 roofs in August 2013, December 2014, and July 2015 using a 3.2 cm diameter soil corer to a depth of 10 cm. Both aboveground and belowground plant biomass were removed during sampling and areas directly beneath the base of plants were avoided to minimize damage to the system. However, detrital matter from roof plants and any overhead vegetation (i.e. canopy leaf litter) were considered part of the core samples. Nine evenly spaced cores were collected from each roof during each collection date and combined into three pooled samples representing the top, middle, and bottom sections of each roof. Immediately following collection, samples were dried at 105 °C to a constant mass, and then finely ground to <1 mm mesh. Loss on ignition at 550 °C was measured for all dried substrate samples (Dean 1974). Samples were also analyzed for organic and total P content using the dry combustion technique (Saunders and Williams 1955), which includes paired subsamples, one of which is combusted at 550 °C for 2 h to oxidize organic P, followed by an extraction of both subsamples with 0.2 N H2SO4. Extracts from both of the paired subsamples were analyzed for PO4 3−-P using the ascorbic acid method (Murphy and Riley 1962), with total P determined from the combusted subsample and inorganic P determined from the non-combusted subsample. Organic P content was calculated as the difference between total and inorganic P (Saunders and Williams 1955). For comparison, total P levels were also determined using the Bowman and Moir (1993) alkaline EDTA extraction method followed by the acid persulfate digest (Bowman 1989) and ascorbic acid method (Murphy and Riley 1962).

Statistical analyses

The temporal variation in P concentration in the CGC green roof runoff was modeled using linear mixed models fitted with maximum likelihood. The significance of the long-term trend, seasonal trend, and interaction terms in the model was determined using model comparisons with and without the respective terms. The lme4 package in R 3.3.2 statistical software (R Core Team, Vienna, Austria) was used, with yearly intercepts employed as random effects to account for non-independence (Crawley 2012). Total P did not differ from phosphate concentrations (Paired t-test: Mean difference = 0.076 mg/L, t = 1.268, DF = 173, P = 0.206), so statistical analyses used the phosphate data. Analyses were run on three separate PO4 3− concentration datasets: green roof runoff samples, traditional roof runoff samples, and atmospheric deposition samples. For all datasets the concentration data were first converted into a time-series object using the “ts” command in R. Monthly mean PO4 3− concentrations were used because regularly spaced data are required for this analysis. There were on average 3.8 roof runoff samples and 3.4 atmospheric deposition samples collected per month. Model selection was used to test the importance of: roof age in months (Age), which was tested both as an exponential decay relationship (i.e. e^(c*Age)), as expected mechanistically based on a declining pool of P, where the decay constant, c, was determined from the slope of Age as related to ln[PO4] (Crawley 2012), and as a linear relationship with the linear term (i.e. Age) replacing all exponential terms; seasonal variability using sin(2πt) and cos(2πt) functions, with the sin function in this case corresponding to the spring and fall dynamics and the cos function corresponding to summer and winter dynamics (Crawley 2012); and interactions between these seasonal dynamics and roof age. Model selection began with the full exponential model, using the following equation:

$$ [\mathrm{PO4}] = e^{c \ast \mathrm{Age}} + \sin(2\pi t) + \cos(2\pi t) + (1 \mid \mathrm{factor(Year)}) + e^{c \ast \mathrm{Age}} \ast \sin(2\pi t) + e^{c \ast \mathrm{Age}} \ast \cos(2\pi t) $$
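As a rough illustration of how the terms in this equation can be constructed, the sketch below estimates a decay constant from the slope of ln[PO4] against Age and then evaluates the trend, seasonal, and interaction terms for a single month. It is written in Python rather than R, does not fit the mixed model itself, and assumes that t is time in years (Age / 12). The function names and the synthetic concentrations are hypothetical.

```python
import math

def decay_constant(ages, concentrations):
    """Least-squares slope of ln(concentration) against age in months."""
    logs = [math.log(c) for c in concentrations]
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    sxx = sum((x - mean_x) ** 2 for x in ages)
    return sxy / sxx

def model_terms(age_months, c):
    """Fixed-effect terms of the full exponential model for one month.

    Assumption: t is time in years, so the sine and cosine terms
    complete one cycle per year.
    """
    t = age_months / 12.0
    trend = math.exp(c * age_months)
    s = math.sin(2 * math.pi * t)
    k = math.cos(2 * math.pi * t)
    return {"trend": trend, "sin": s, "cos": k,
            "trend:sin": trend * s, "trend:cos": trend * k}

# Synthetic concentrations following an exact exponential decay (not study data)
ages = list(range(1, 49))
synthetic = [3.0 * math.exp(-0.03 * a) for a in ages]
c_hat = decay_constant(ages, synthetic)
terms = model_terms(3, c_hat)
print(round(c_hat, 3), terms)
```

In a full analysis these terms would enter a mixed model with a random intercept for each year; that fitting step is omitted here.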

Models were sequentially simplified and compared using AIC, deviance, and likelihood ratio tests by removing terms, starting with the interaction terms (e.g. e^(c*Age)*sin, e^(c*Age)*cos) and progressing to the additive components (e.g. e^(c*Age), sin, cos). The resulting “best” model was therefore the most parsimonious model with the greatest explanatory power. This “best” model was compared to a null model with [PO4 3−] as a function of a constant and the random effect (i.e. [PO4] ~ 1 + (1|factor(Year))), to ensure that the best model had significant explanatory power. The best-fit model was used to extrapolate PO4 3− concentrations for the future sampling months 49 through 144 using the predict function in R.
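The comparison of nested models by AIC, deviance, and likelihood ratio test can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study; the log-likelihoods and parameter counts are hypothetical, and the helper `chi2_sf` is a small stand-in for a statistical library's chi-square tail function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2:                        # odd df: start from df = 1
        q, a = math.erfc(math.sqrt(z)), 0.5
    else:                             # even df: start from df = 2
        q, a = math.exp(-z), 1.0
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q

def aic(log_lik, n_params):
    """Akaike information criterion from a maximized log-likelihood."""
    return 2 * n_params - 2 * log_lik

def likelihood_ratio_test(log_lik_full, log_lik_reduced, df_diff):
    """Chi-square statistic (difference in deviance) and its P value."""
    stat = 2 * (log_lik_full - log_lik_reduced)
    return stat, chi2_sf(stat, df_diff)

# Hypothetical log-likelihoods and parameter counts (illustrative only)
ll_full, k_full = -20.0, 8
ll_reduced, k_reduced = -26.0, 6
stat, p = likelihood_ratio_test(ll_full, ll_reduced, k_full - k_reduced)
print(aic(ll_full, k_full), aic(ll_reduced, k_reduced), stat, p)
```

A lower AIC and a small P value for the likelihood ratio test both favor keeping the removed terms in the model.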

Using the above monthly time-series approach, 2 green roof runoff data points were missing (January 2014, February 2015) due to little to no runoff during these months, times when the area experienced below-freezing temperatures for prolonged periods. Both missing data points were gap-filled by interpolation using the average of the 2 nearest-neighbor monthly mean concentrations. In addition to the January 2014 and February 2015 gap-filling, atmospheric deposition data were missing from June and July 2011. These points were also interpolated, with both data points set to the average of the May and August 2011 PO4 3− concentrations. To ensure that we did not introduce any undue bias through gap-filling, we performed a sensitivity analysis that compared the best-fit model for PO4 3− from the sequential model selection (described above) using the interpolated gap-filled points, with the same best-fit model using a dataset with the data gaps set to the minimum, mean and maximum PO4 3−-P concentrations found in our dataset. These comparisons indicate that gap-filling with concentrations at and below the mean has minimal impact on the strength of the model. Gap-filling with concentrations at the maximum values observed in the green roof dataset does have a larger impact on the model, likely because both of our data gaps occur in the winter and near the end of the 4-year dataset, points where our model predicts very low PO4 3− concentrations in runoff from our green roof.
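The nearest-neighbor gap-filling described above can be sketched in a few lines. This is an illustrative Python version, assuming missing months are marked with `None` and that no gap sits at the very start or end of the series; the function name and the example values are hypothetical.

```python
def gap_fill(monthly_means):
    """Fill missing monthly means (None) with the average of the nearest
    non-missing values before and after the gap.

    Assumes every gap has at least one observed value on each side.
    """
    filled = list(monthly_means)
    for i, value in enumerate(monthly_means):
        if value is None:
            before = next(v for v in reversed(monthly_means[:i]) if v is not None)
            after = next(v for v in monthly_means[i + 1:] if v is not None)
            filled[i] = (before + after) / 2
    return filled

# Hypothetical monthly mean concentrations in mg/L (not study data)
series = [1.2, None, 0.8, 2.0, None, None, 1.0]
filled = gap_fill(series)
print(filled)
```

Two consecutive missing months receive the same value, which mirrors how the June and July 2011 deposition gaps were both set to the average of May and August.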

Finally, the best-fit model and extrapolations for phosphate from our main study site, the CGC green roof, were validated by comparing them with PO4 3− concentrations from the TMS and SD1 green roofs. Data from the TMS green roof were directly compared with the corresponding sampling time in the CGC runoff model. However, because the SD1 green roof was between 9 and 11 years old when it was sampled for runoff, these data were compared with predicted values from the CGC runoff model when it will be of similar age (i.e. sampling months 100–102 in the CGC best model compared with SD1 data from July–September 2012; and sampling months 108–121 in the CGC best model compared with SD1 data from March 2014–March 2015). The TMS and SD1 roofs are good validation sites because they differ somewhat from the modeled CGC green roof in their design (e.g. substrate materials, slope), establishment techniques (e.g. Sedum mat vs. planted plugs), and age, but are representative of commonly employed extensive green roofs and their plant palettes (cf. Oberndorfer et al. 2007).

Phosphorus mass balance

In the absence of continuous measurements of roof runoff volume, we estimated monthly hydrologic (leaching) P fluxes by multiplying the monthly mean concentration of PO4 3− leaving the roof by an estimate of the amount of runoff for that month. Extensive green roofs have an average retention of between 30 and 80% of incoming precipitation (Berndtsson 2010), most commonly averaging 50–60% (Gregoire and Clausen 2011). We used 55% annual retention, referred to here as the Intermediate Runoff Scenario, and bounded it with a high-end estimate designated here as the High Runoff Scenario (30% retention) and a low-end estimate designated as the Low Runoff Scenario (80% retention). These estimates of hydrologic P flux, together with the substrate P measurements, allow us to calculate a rough mass balance of P for the CGC green roof, along with associated uncertainty.
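The flux estimate can be illustrated with a short calculation. The sketch below assumes that the annual retention fraction is applied uniformly to each month's precipitation and that concentrations are expressed as mg P/L; the function name and the example month (1.0 mg/L, 90 mm) are hypothetical.

```python
# Retention fractions for the three runoff scenarios named in the text
RETENTION = {"High Runoff": 0.30, "Intermediate Runoff": 0.55, "Low Runoff": 0.80}

def monthly_p_flux(conc_mg_per_l, precip_mm, retention):
    """Estimated P flux in g P/m2 for one month.

    1 mm of water over 1 m2 equals 1 L, so mg/L * mm gives mg/m2;
    dividing by 1000 converts mg to g.
    """
    runoff_mm = precip_mm * (1.0 - retention)
    return conc_mg_per_l * runoff_mm / 1000.0

# Hypothetical month: 1.0 mg/L mean concentration and 90 mm of precipitation
fluxes = {name: monthly_p_flux(1.0, 90.0, r) for name, r in RETENTION.items()}
for name, flux in fluxes.items():
    print(name, round(flux, 4))
```

Summing such monthly values over a period gives the cumulative runoff loss for each scenario.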

Results

Phosphorus runoff dynamics

Phosphate (PO4 3−) concentrations in runoff from the CGC green roof ranged from a maximum of 3.85 mg/L in August 2011, the first year of sampling and a little more than 1 year following roof installation, to a low concentration of 0.48 mg/L in February 2012 and November 2014, approximately 2 and 3.5 years following roof installation, respectively (Fig. 1). There was also a clear decline in mean annual PO4 3− concentrations over the course of the study, dropping from an annual mean running from April to March of each year of 2.17 mg/L in 2011–2012 (1–2 year-old roof) to 0.91 mg/L in 2014–2015 (4–5 year-old roof) (Table 1). In contrast, annual mean pH and conductivity in runoff from the green roof did not change (Table 1). Total P concentrations in green roof runoff did not differ from PO4 3−-P levels (Paired t-test: Mean difference = 0.08 mg/L, t = 1.268, DF = 173, P = 0.206), indicating dissolved organic phosphorus (DOP) in runoff was minor compared to PO4 3−.

Fig. 1 Phosphate (PO4 3−-P) concentrations from the Civic Garden Center green roof (grey and black points and green line), traditional roof (red line), and atmospheric deposition (blue line) from April 2011 (Month 1) to March 2015 (Month 48). All sampled concentrations collected from the green roof are shown with grey points, with the mean concentration for that sampling month shown with black points. The green line is the best-fit model for the green roof runoff data, the red line for the traditional roof runoff data, and the blue line for atmospheric deposition data

Table 1 Sampling year (April to March of the following year) summaries for precipitation and Civic Garden Center green roof runoff water quality for the 4-year time-series

Civic Garden Center green roof runoff [PO4 3−] was strongly influenced by both roof age and seasonal effects, with annual summer/autumn peaks and winter valleys in concentration, and a relatively rapid overall decline over the 4-year sampling period (Fig. 1, Table 2) that was best represented by an exponential decay function with a decay constant (c) of −0.032. The full model with an exponential trend term had a lower deviance than the full model with a linear trend term (Δ Deviance = 2.86). Inclusion of both the seasonal (χ2 = 60.71, DF = 4, P < 0.0001) and long-term trends (χ2 = 50.68, DF = 3, P < 0.0001) produced better models than models without these terms (Table 2, Fig. 2). Model selection indicated that in addition to the long-term and seasonal dynamics, the interactions between seasonal and long-term trends were also important (Table 2). The significant interaction terms (Age * sin; Age * cos) indicated that the magnitude of the seasonal swings in PO4 3−-P concentrations decreased significantly over time (Table 3). The best-fit model was the full model, whose residuals were normally distributed and homoscedastic.
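To give a sense of scale for the reported decay constant, the short calculation below converts c = −0.032 into a half-life and into the fraction of the trend term remaining after 48 months. It assumes Age is measured in months, as stated in the Methods, and it describes only the exponential trend term, not the full fitted model with its seasonal terms and coefficients.

```python
import math

c = -0.032                              # decay constant reported in the text (per month, assumed)
half_life_months = math.log(2) / -c     # time for the trend term to halve
remaining_after_48 = math.exp(c * 48)   # fraction of the trend term left after 4 years
print(round(half_life_months, 1), round(remaining_after_48, 2))
```

Under these assumptions the trend term halves roughly every 22 months.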

Table 2 Green roof phosphate model comparisons showing pairs of models (Model 1, Model 2), differences in AIC and deviance values for the model comparison (Δ AIC, Δ Deviance), the results of the χ2 model comparison (χ2, P), the effect tested by the model comparison, and the result of the model comparison stating whether the models were significantly different

Fig. 2 Green roof phosphate (PO4 3−-P) model comparison showing the best-fit model (full model, green line) and mean monthly PO4 3− concentrations (black points) for comparison of model fit to: the “No Trend” model, which is the full model without the long-term trend and random annual intercept (purple line); the “Cos Model”, which is the full model minus all sin (spring and fall seasonal dynamics) terms (red line); and the “Sin Model”, which is the full model minus all cos (summer and winter seasonal dynamics) terms (blue line)

Table 3 Fixed effect parameter estimates of the green roof phosphate (PO4) best model (full model)

Both sin and cos terms and their interactions with Age were significant (Tables 2 and 3). The significance of the cos term, which accounts for the influence of summer and winter seasonal dynamics, was not surprising, as the concentrations had been observed to track seasonally with temperature (Buffam et al. 2016). However, the sin term also had significant predictive power, indicating some influence of spring and fall seasonal dynamics. It appears that the spring and fall seasonal dynamics may be most influential especially in the first and last years of this study (Fig. 2). In the first year, the concentrations of PO4 3− -P peaked in August, which is in between the peaks of the sin and cos curves (Fig. 2). In years 2 and 3 (months 12–36), the full model visually appears to align with the cos only model (i.e. full model minus the sin term; Fig. 2), indicating a diminishing influence of the spring and fall seasons. However, in the final year (months 36–48), the full model and sin curve appear to converge once again, especially in the last several months of sampling (Fig. 2). Overall, these dynamics indicate that the maximum seasonal concentrations have shifted from late summer (August maximum in years 1 and 2) to mid-summer (July maximum in year 3) to late spring (May maximum in year 4) over the 4-year period. These dynamics are also reflected in the additional samples collected between April and July 2015 and the extrapolated model, with the maximum concentrations occurring for both in May in year 5 and then, for the extrapolated model, shifting to April in subsequent years.

The change in model explanatory power resulting from the removal of the Age terms, including the additive and multiplicative (interaction) terms, was the greatest of all the model components, followed closely by the cos and sin terms (Table 2). This indicates that roof age, followed by the summer and winter seasonal dynamics, had the greatest effect on PO4 3− concentrations in runoff from the CGC green roof.

The seasonal and long-term trends in phosphate concentrations were not a consequence of seasonal and long-term patterns in atmospheric deposition, and were also not evident in runoff from the adjacent CGC traditional, shingled roof where the null model provided the best fit (Fig. 1). While the best-fit model for atmospheric deposition did include both the cos (summer and winter seasonal dynamics) and Age, as well as their interaction (AIC = −131.04; Deviance = −143.04), and had better explanatory power than the null model (χ2 = 11.90; P = 0.008), the inputs were minor compared to the greatly elevated concentrations of phosphate in runoff from the green roof (Fig. 1).

Phosphorus runoff declined substantially from year to year for the CGC and TMS green roofs, but less so for the older SD1 green roof. The best-fit model for explaining PO4 3− concentrations from the CGC green roof was able to reasonably predict the dynamics for samples collected outside the model’s date range (i.e. April–July 2015), with actual mean monthly values differing from predictions on average for the 4 months by 0.18 mg/L, or 24%. The CGC green roof full model was not, however, able to fully predict annual mean concentrations of phosphate in runoff from the TMS or SD1 roofs, with the CGC green roof having elevated P runoff concentrations or predicted concentrations compared to the other roofs in all years (Fig. 3). Over the ten months (3 months in 2012, 7 months in 2014–2015) when samples were collected from the TMS green roof and the CGC green roof, the mean difference between the CGC runoff model and the mean monthly concentrations for TMS was 0.59 mg/L PO4 3−-P (Fig. 4b; Adj. R2 = 0.41, RMSE = 0.59, P = 0.028). Runoff phosphate concentrations from the older SD1 green roof differed overall on average from the CGC model extrapolations for 2019–2021, when the CGC green roof will be the same age as the SD1 green roof when it was sampled, by 0.31 mg/L (Adj. R2 = −0.09, RMSE = 0.39, P = 0.941; Figs. 3 and 4c).

Fig. 3 Annual mean phosphate concentrations in green roof runoff as a function of roof age, showing runoff from the Civic Garden Center green roof (CGC; black points), CGC runoff predictions from the best-fit model (CGC Predict; white points), and runoff from the Turkeyfoot Middle School (TMS; red squares) and Sanitation District 1 (SD1; blue triangles) green roofs. Annual mean input of phosphate from atmospheric deposition, based on levels at the CGC site, is shown with a dashed line. Error bars indicate ±1 SE

Fig. 4 Civic Garden Center green roof phosphate (PO4 3−-P) best-fit model (full model; shown with black line) with: (a) interpolated predictions (months 1–48) and extrapolated predicted values (months 49–144); (b) concentrations of all samples collected from the Turkeyfoot Middle School green roof (red points); and (c) extrapolated predictions for sampling months 98 through 122 in black, with the runoff concentrations of PO4 3− from the Sanitation District 1 green roof in blue. For panel b, samples from the Turkeyfoot Middle School green roof from months 49 and 50 are shown to indicate extrapolation potential for the model. Any modeled negative concentrations were set to zero

Extrapolating the CGC runoff model and assuming 55% retention of incoming precipitation by the green roof (Berndtsson 2010; Gregoire and Clausen 2011) indicated that the CGC green roof is not likely to become a net sink (average annual runoff concentration < P in atmospheric deposition) for phosphate in the near future, with levels of P in runoff approaching an asymptote around 11 years following installation but continuing to act as a source (Fig. 3). Contrary to the CGC green roof predictions, the mean annual phosphate concentrations in runoff from the SD1 roof approached the level of incoming phosphate in atmospheric deposition when the roof was 11 years old (Fig. 3).

Substrate organic matter and phosphorus content

Substrate organic matter content did not change significantly over the measured time frame for either the CGC or SD1 roofs (Table 4). In contrast, there was a significant reduction in the substrate P content over time (2013–2015) for the CGC roof and overall lower substrate P in the older SD1 roof (Fig. 5; 2-Way ANOVA: F5,12 = 7.83, P = 0.002). The CGC green roof experienced a 39% substrate P reduction (loss of 188 ± 85 mg P/kg) over nearly 2 years when the roof was between 2 and 4 years old (Tukey HSD: P = 0.019). The older SD1 green roof substrate P pool, which averaged 23% lower than the CGC roof (Tukey HSD: P = 0.013), did not change significantly over the studied time frame (Fig. 5; Tukey HSD: P = 0.680). There were no significant differences for organic P, either by roof or over time (Fig. 5; 2-Way ANOVA: F5,12 = 1.23, P = 0.355).

Table 4 Mean substrate composition of green roof substrates collected from the Civic Garden Center (CGC) and Sanitation District 1 (SD1) green roofs in August 2013, December 2014, and July 2015

Fig. 5

Substrate phosphorus (mg P/kg) over time from the Civic Garden Center (CGC; left panel) and Sanitation District 1 (SD1; right panel) green roofs. For each green roof at each sampling time, 3 pooled substrate samples were collected. Error bars show the mean ± 1 SE for total phosphorus. The * indicates a significant difference (P < 0.05)

Both the dry combustion (Saunders and Williams 1955) and alkaline EDTA (Bowman and Moir 1993) extractions showed similar levels and patterns for total P in the CGC substrates, with the dry combustion method always resulting in higher extracted P and on average differing in extracted P by 11% (1.7 g P/m2). Likewise for the SD1 substrates, the dry combustion method always resulted in higher extracted P, but the 2 methods differed overall on average by 40% (3.9 g P/m2).

Phosphorus mass balance

We estimated the P loss for the CGC green roof using an Intermediate Runoff Scenario (55% flow reduction) bounded by Low (80% reduction) and High (30% reduction) runoff scenarios (Table 5), and compared these estimates with the changes in the substrate P pool (Fig. 6). This mass balance estimate indicates that the losses of P in runoff (0.36–1.25 g P/m2) actually represent only a small fraction of the total P that was lost from the substrate pool (8.92 ± 4.4 g P/m2) between the August 2013 and July 2015 substrate samplings (Fig. 6). The remaining 7.7–8.6 ± 4.4 g P/m2 of P that was lost from the substrate over the 23-month period is unaccounted for.
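The arithmetic behind this mass balance can be reproduced with a short sketch. The numeric values are those reported in the paragraph above; the variable and function names are illustrative, and the sketch simply subtracts the bounding runoff losses from the mean substrate loss without propagating the standard error.

```python
# Values reported in the text for the CGC roof, August 2013 to July 2015 (g P/m2).
SUBSTRATE_LOSS = 8.92             # mean P lost from the substrate pool
SUBSTRATE_LOSS_SE = 4.4           # +/- 1 SE on the substrate loss
RUNOFF_LOSS_RANGE = (0.36, 1.25)  # bounds on P lost in runoff across scenarios


def unaccounted_loss(substrate_loss, runoff_loss):
    """Substrate P loss not explained by runoff (g P/m2)."""
    return substrate_loss - runoff_loss


def runoff_share(substrate_loss, runoff_loss):
    """Fraction of the substrate P loss explained by runoff."""
    return runoff_loss / substrate_loss


low_runoff, high_runoff = RUNOFF_LOSS_RANGE

# The largest runoff loss gives the smallest unaccounted remainder, and vice versa.
unaccounted_range = (
    unaccounted_loss(SUBSTRATE_LOSS, high_runoff),
    unaccounted_loss(SUBSTRATE_LOSS, low_runoff),
)
share_range = (
    runoff_share(SUBSTRATE_LOSS, low_runoff),
    runoff_share(SUBSTRATE_LOSS, high_runoff),
)

print("Unaccounted P (g P/m2):", [round(v, 2) for v in unaccounted_range])
print("Runoff share of loss (%):", [round(100 * s, 1) for s in share_range])
```

With the reported values, runoff explains roughly 4–14% of the substrate P loss, leaving the 7.7–8.6 g P/m2 remainder stated above (each bound still carrying the ± 4.4 g P/m2 uncertainty of the substrate measurement).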

Table 5 Estimates of runoff volume (L/m2) and phosphorus loss (g P/m2) from the Civic Garden Center green roof based on 55% precipitation retention by the green roof (i.e. Intermediate Runoff Scenario), and contributions of phosphorus (g P/m2) in atmospheric deposition. In square brackets, runoff volume estimates for the green roof are bounded using a “High Runoff” scenario where only 30% of incoming precipitation is retained, and a “Low Runoff” scenario where 80% of incoming precipitation is retained

Fig. 6

Mean substrate P (g P/m2) content of the Civic Garden Center green roof (black) compared to estimated changes attributed to cumulative P loss in runoff (grey). The solid grey line indicates changes due to runoff using the Intermediate Runoff Scenario (i.e. 55% retention). The Lower and Upper Bound lines indicate the difference or sum, respectively, of the High (30% retention) or Low (80% retention) Runoff Scenario predictions and 1 SE from the CGC 2013 substrate sampling. Error bars indicate ±1 SE

Discussion

The levels of P in runoff and in the substrates of the studied green roofs varied widely over time and among roofs, and for the main study green roof (CGC) both were on par with fertilized agroecosystems. Phosphate concentrations leaving the CGC green roof started at very high levels but declined substantially after only a few years. In fact, concentrations of PO4 3−-P leaving the green roof during the summer of the first year were on par with observed levels in wastewater (3–10 mg/L; Metcalf and Eddy 1991), levels that can readily contribute to eutrophication in downstream ecosystems (Carpenter et al. 1998). The levels observed in this study are also within the very large range of PO4 3−-P concentrations measured in runoff from other green roof studies of varying ages (e.g. 0.25 mg/L (Gregoire and Clausen 2011); 29 mg/L (Vijayaraghavan et al. 2012)).

The CGC, TMS, and SD1 green roofs differed substantially in their P runoff, with the CGC green roof having elevated concentrations relative to the other roofs. One clear difference between the SD1, TMS, and CGC green roofs is their surroundings. The CGC is partially shaded by deciduous trees (Buffam et al. 2016), and thus subject to additional inputs of nutrients via leafwash and organic matter via leaf fall, compared to the SD1 and TMS green roofs which do not appear to receive significant inputs of materials from outside the roof. However, when expressed as a function of time since construction, all roofs could be visualized on a similar trajectory with declines in P runoff over time (Fig. 3). Thus, while some of the variability among roofs in this study may be due to the differences in initial substrate materials, vegetation communities, plant establishment method, and P inputs, our results are consistent with the hypothesis that these roofs may all have experienced similar and predictable changes in P runoff with roof age.

All three studied roofs employed commercially available and widely used green roof substrate materials and plant communities with substrate depths (~10 cm) similar to other extensive roofs. The study roofs therefore are reasonable representatives of other extensive green roofs constructed during the same time frame, at least in the U.S.

What are the mechanisms driving the observed temporal dynamics?

Our results show an overall decline in runoff phosphate concentrations, primarily driven by the aging of the roof (i.e. Age term) and its interactions with seasonal dynamics (Fig. 1, Table 2), regardless of the large variation in annual precipitation during the 4-year study (Table 1). This result suggests that P concentrations are relatively independent of variation in event characteristics. Notably, conductivity, which is a measurement of dissolved salts, showed no trend over the 4-year sampling period (Table 1), indicating that P is behaving differently than the bulk ions in solution, perhaps because of its status as an essential and frequently limiting nutrient for biotic communities (Vitousek et al. 1998).

In a companion study of the CGC green roof focusing on the first 2 years of measurements, variation in rainfall, including the effects of precipitation intensity and antecedent conditions, had only a minimal impact on runoff concentrations of P, especially compared to mean weekly temperature (Buffam et al. 2016). Therefore, the observed seasonal dynamics, especially the increased release of P during the summer months, are likely a result of increased rates of biogeochemical reactions in the green roof substrate occurring at higher temperatures, resulting in net mineralization (i.e., P release > P uptake) and release of P into runoff waters (Buffam et al. 2016). A high rate of net mineralization is hypothesized to be due to the imbalance between the large pool of organic P in the initial substrate, especially relative to nitrogen, and the low nutrient requirements of the plant community (Buffam and Mitchell 2015). These mechanisms are further supported in this study by the observed declines in substrate P and relatively static substrate organic matter and nitrogen content over time in the CGC green roof (Table 4), indicating that P is leaching preferentially from the substrate.

We were only able to attribute a small proportion of the P lost from the CGC green roof substrate over a nearly 2-year period to leaching losses. We attribute the unexplained P loss to other, unmeasured pools or fluxes, most likely plant uptake and/or a shift over time in the sorption capacity, binding strength, or form of P. Small-scale studies of newly established green roof plots have observed large reductions in phosphate runoff due to plant presence. For example, a plot-scale study estimated that approximately half (1.4 mg PO4 3−-P/kg or 0.23 mg PO4 3−-P/kg/month) of the phosphate lost from their green roof substrates over the first 6 months of green roof plot establishment was due to plant uptake and translocation, with the other half attributable to leaching losses (Aitkenhead-Peterson et al. 2011). In newly established green roof plots, Beck et al. (2011) found planted plots reduced phosphate runoff by 0.53 g PO4 3−-P/m2 compared to unplanted plots during the first two runoff events. These studies show that plant activity can have a measurable impact on P leaching rate in green roofs, either by changing soil properties or by direct uptake and incorporation of P into plant biomass. The amount of P bound in plant biomass on the CGC roof is likely about 1–3 g P/m2, based on a range of 600–700 g/m2 plant biomass for Sedum-dominated extensive green roofs with complete coverage (Getter et al. 2009) and assuming a plant P content of 2–4 mg P/g (Güsewell 2004; Bell et al. 2014; Kulik et al. 2017). As plant cover on the CGC roof was full throughout the time of the current study, we can reasonably assume that the change in plant biomass P over time was less than 2 g P/m2, and thus responsible for only a portion of the >3.3 g P/m2 of unaccounted-for substrate P loss in the roof from 2013 to 2015.
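The plant biomass P estimate above follows from a simple unit conversion, sketched below. The biomass and tissue P ranges are those cited in the paragraph; the function and variable names are illustrative.

```python
def plant_p_pool(biomass_g_per_m2, p_content_mg_per_g):
    """P held in plant biomass (g P/m2) from biomass (g/m2) and tissue P (mg P/g)."""
    return biomass_g_per_m2 * p_content_mg_per_g / 1000.0  # mg -> g


BIOMASS_RANGE = (600.0, 700.0)  # g/m2, Sedum-dominated extensive roofs
P_CONTENT_RANGE = (2.0, 4.0)    # mg P per g plant biomass

# Pair the lowest biomass with the lowest P content, and highest with highest.
low = plant_p_pool(BIOMASS_RANGE[0], P_CONTENT_RANGE[0])
high = plant_p_pool(BIOMASS_RANGE[1], P_CONTENT_RANGE[1])

print(f"Plant biomass P: {low:.1f} to {high:.1f} g P/m2")
```

The bounds work out to 1.2 and 2.8 g P/m2, consistent with the "about 1–3 g P/m2" stated above. Because this is the size of the entire plant P pool, the change in that pool over the study period must be smaller still when cover is already complete.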

Altogether, studies of green roof P dynamics, including this one, suggest that plants have a strong impact on green roof P cycling, but plant uptake is unlikely to explain all the unaccounted-for P lost from the substrate in our study. This points to other age-driven dynamics such as an increased storage of P in compounds resistant to our extraction methods, a shift in the sorption strength of the substrate materials, or other unmeasured loss pathways. Intriguingly, we did observe a shift in P runoff maxima from late summer in the first year to late spring in the final year of this study, potentially indicating a shift in the processes driving the release of phosphate. Agricultural systems can experience losses of P in the spring because microbial decomposition increases relatively rapidly with rising temperature, whereas the plant community breaks dormancy with a relative delay (Chapin et al. 2011). However, more data will need to be collected to confirm whether these seasonal shifts are indeed a symptom of roof age and to determine the underlying mechanisms.

How long will green roofs act as sources of phosphorus?

The observed year-to-year declines in the P content of the CGC green roof substrate and the lack of significant year-to-year change in the older SD1 substrate, together with the leveling off of runoff phosphate, could indicate that these systems either are moving towards (CGC) or have reached (SD1) a steady state. At steady state, P either may become or has already become a limiting nutrient that is tightly held and recycled by biota in the system (Vitousek et al. 1998). Expectations from other systems and the Nutrient Retention Hypothesis (Vitousek and Reiners 1975) suggest that once steady state is reached, the level of P in runoff should approximately equal the level of P in atmospheric deposition or fertilizer inputs. This equilibrium may have been reached for the SD1 green roof by 11 years following installation (Fig. 3). However, predictions based on the CGC green roof model suggest that this green roof is approaching a steady state, like the SD1 green roof, around 11 years following installation, but at a higher level than runoff concentrations from SD1 and concentrations in atmospheric deposition (Fig. 3). These elevated levels may in part be a result of our extrapolation method, in which predicted negative concentrations were set to zero. When negative concentrations are included, mean annual phosphate concentrations are reduced and similar to SD1, but still elevated relative to atmospheric deposition, when the roof is 13 years old (Mean = 0.18 mg/L). It is as yet unclear why the CGC green roof would either progress towards an elevated steady state or be delayed in reaching it; however, additional P inputs to the CGC green roof due to surrounding trees, as discussed earlier, may play a role.

How do green roof phosphorus dynamics compare to other ecosystems?

Even 5 years after green roof installation (i.e., Sampling Year 4; Table 5), P export is high relative to natural systems and even regularly fertilized agricultural systems, often by an order of magnitude or more. Average annual fluxes of PO4 3−-P from the CGC roof, depending on runoff volume estimates, are between 15 and 51 g, or 0.3 to 1.1 g/m2/year (Table 5). By comparison, net P export from Hubbard Brook, New Hampshire watersheds ranged from 0.002 to 0.02 g P/m2/year (Hobbie and Likens 1973). Measurements of dissolved P export from agroecosystems with drainage tiles, which create a relatively shallow drainage system more representative of green roof drainage, range from 0.013 g P/m2/year for an unfertilized cornfield (0.020 g P/m2/year when fertilized; Culley et al. 1983) to 0.044 g P/m2/year from a fertilized grass agroecosystem in New Zealand (0.008 g P/m2/year when not fertilized; Sharpley and Syers 1979). In a review of P export from agricultural systems (Sharpley et al. 2001), levels of dissolved P export as great as those from the CGC green roof were only observed in surface runoff from manure-fertilized systems: an alfalfa field in Minnesota (0.480 g P/m2/year; Young and Mutchler 1976) and a fescue system in Arkansas (0.480 g P/m2/year; Edwards and Daniel 1993).
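To make the cross-system comparison concrete, the following sketch computes how many times larger the CGC export range is than each reference system. The export values are those quoted in the paragraph above; the dictionary labels, function name, and the choice to report the most conservative and most extreme ratio for each system are illustrative assumptions.

```python
# Dissolved P export (g P/m2/year) as (low, high) ranges quoted in the text.
CGC_EXPORT = (0.3, 1.1)

REFERENCE_EXPORT = {
    "Hubbard Brook watersheds": (0.002, 0.02),
    "tile-drained cornfield": (0.013, 0.020),
    "tile-drained grass, New Zealand": (0.008, 0.044),
    "manure-fertilized alfalfa, Minnesota": (0.480, 0.480),
    "manure-fertilized fescue, Arkansas": (0.480, 0.480),
}


def ratio_range(target, reference):
    """Return (smallest, largest) ratio of target export to reference export.

    The smallest ratio pairs the low target value with the high reference
    value; the largest pairs the high target value with the low reference.
    """
    t_low, t_high = target
    r_low, r_high = reference
    return t_low / r_high, t_high / r_low


for name, ref in REFERENCE_EXPORT.items():
    smallest, largest = ratio_range(CGC_EXPORT, ref)
    print(f"{name}: CGC export is {smallest:.1f}x to {largest:.1f}x")
```

On these figures the CGC roof exports 15 to 550 times as much dissolved P as the Hubbard Brook watersheds, while the manure-fertilized systems fall within roughly 0.6 to 2.3 times the roof's export, which is the sense in which only those systems are comparable.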

The level of substrate P in the CGC green roof about 3 years (486 mg P/kg) and 5 years (299 mg P/kg) following its installation was above or similar, respectively, to that of agricultural systems (326 (unfertilized) to 384 (fertilized) mg P/kg; Sharpley et al. 1995). In contrast, the level of P in the older SD1 green roof substrate, which averaged 265 mg P/kg, was 23%–45% less than the aforementioned agroecosystems. These comparisons indicate that the green roofs in this study, and likely many other green roofs that use commercially available substrates amended with large amounts of compost or fertilizer, are initially P-rich systems, even relative to fertilized agricultural systems, which are designed for high productivity and harvest. This early P surplus is especially unnecessary considering that typical green roof plants are slow growing and generally adapted to nutrient-poor conditions (Lundholm 2006; Buffam and Mitchell 2015).

Conclusions, implications, and future studies

Comparisons of P runoff across ecosystems indicate that the young green roofs in this study, and likely many other green roofs shortly following installation, are operating with similar levels of P export as highly managed agricultural systems, which are designed to maximize plant productivity. However, common green roof plants (e.g., Sedum), selected for stress and drought resistance and with reduced nutrient requirements and growth rates, do not require the high P levels delivered in current engineered green roof soils.

Because they have engineered “soils”, green roofs provide a rare opportunity to match the levels of nutrients with the requirements of typical green roof plants. While roof vegetation is likely reducing P runoff for several years following roof installation, the substrate P pool relative to the P requirements of the plant community is clearly too large, leading to high leaching losses of P. For current engineered green roof soils, our model and older SD1 roof observations predict P levels will exceed plant demand for 10+ years following installation. These imbalances warrant P reductions in the substrate of newly installed green roofs and cautious use of fertilizer amendments. This extends to slow-release and organic forms of P, which were the principal source of P runoff in this and most other green roof studies (Buffam and Mitchell 2015). Additionally, more attention should be paid to the nitrogen to phosphorus ratio of green roof substrates, as evidence is mounting that many green roofs are limited by N, not P (Teemusk and Mander 2007; Johnson et al. 2016). Without sufficient available N, adding P yields no gains in plant growth or associated benefits, no matter how much is added. Ultimately, the main goals of the green roof installation (often stormwater runoff reduction and thermal benefits) should be considered, which in many cases do not require a high density of plant biomass.

As green roofs are predicted to last anywhere between 40 and 100 years, more research must be done on older roofs to determine their behavior with regard to P, as well as N and other important elements. We also recommend that green roof ecological studies incorporate more frequent and long-term measurements in order to fully capture roof behavior, rather than a snapshot approach that may greatly over- or under-estimate roof characteristics, depending on when the snapshot was taken.