Introduction

Coastal and estuarine ecosystems around the world are threatened by cultural eutrophication (Nixon 1995) as a result of nonpoint source pollution of nitrogen and phosphorus (Carpenter and others 1998; Conley and others 2009). A prominent symptom of eutrophication in estuarine ecosystems is the decline of dissolved oxygen (DO) to hypoxic or anoxic levels. Hypoxia is now widespread in estuaries and semi-enclosed seas throughout the world (Diaz 2001), most often manifesting seasonally after spring blooms have dissipated and vertical mixing is diminished. The duration, extent, and magnitude of seasonal hypoxia have increased dramatically over the past few decades (Diaz and Rosenberg 2008), prompting concerns about the consequences of hypoxia for marine organisms that are dependent on these ecosystems (Diaz and Rosenberg 1995). Because eutrophication abatement policies are likely to be costly, they are unlikely to be pursued unless their ecological effects and subsequent impacts on the delivery of ecosystem goods and services can be gauged.

A substantial body of research has produced a conceptual model for understanding both the immediate and the lasting effects of seasonal hypoxia on marine fauna. Hypoxia can be lethal for sessile organisms that lack the capacity to reduce their exposure to low dissolved oxygen levels, producing shifts in communities and the species that prey upon them (Pihl and others 1992; Powers and others 2005). The dominant responses of mobile species are distributional shifts (Breitburg 1992; Diaz and Rosenberg 1995; Ludsin and others 2009; Zhang and others 2009). These responses may lead to indirect ecological effects such as reduced growth (Petersen and Pihl 1995; Craig and Crowder 2005; Eby and others 2005; Brandt and others 2009; Kidwell and others 2009), lowered reproductive output (Aumann and others 2006), and increased vulnerability to predators (Long and Seitz 2008). Distributional shifts may also intensify species interactions in refuge habitats (Lenihan and others 2001). Over longer time periods and repeated exposure, benthic species diversity may be reduced (Conley and others 2007) and species composition may shift to favor those with opportunistic life history strategies (Diaz and Rosenberg 1995). Revealing the effects of repeated exposure to hypoxia on populations of mobile species is complicated by uncertainty concerning how local responses act in a compensatory fashion to diminish population-level impacts (Breitburg and others 2009).

The effects of hypoxia on marine organisms are most commonly measured via spatial or temporal comparisons between hypoxia-impacted and reference areas. Thus, the evaluation of seasonal hypoxia appears suited to standard ecological assessment procedures. A crucial element that underpins this approach is the presumption that compared sites are similar to each other in ecologically relevant ways (Underwood 1994). This presumption is often not warranted when evaluating ecological impacts from cultural eutrophication, where the disturbances such as hypoxia are themselves secondary responses that are often related to local environmental features. For instance, hypoxia is more likely in marine systems when nutrient loading is accompanied by strong vertical stratification, low tidal flushing rates, and long water residence times (Diaz and Rosenberg 2008). Thus, it can be difficult to find suitable undisturbed reference sites that share characteristics with sites exhibiting secondary effects of eutrophication. This dependency on local characteristics may make simple comparisons across sites or ecosystems confounded, necessitating more elaborate sampling designs to avoid drawing spurious conclusions (Underwood 1992, 1994). Improvements to the standard Before-After Control-Impact design attempt to control for confounding covariates by sampling multiple regions, each of which shares at least one common characteristic with impacted regions. In this way, the relative effect of the ecological disturbance can be weighed against those attributed to alternative environmental characteristics. These sampling schemes are more robust and less susceptible to drawing spurious conclusions, but they also require large sample sizes (replication in either space or time) to detect significant effects because of their low statistical power (Underwood and Chapman 2003).

Here, we employ a hierarchical “ensemble” statistical approach to measure evidence for sustained and immediate responses to hypoxia among species that comprise the benthic fish, benthopelagic fish, and macroinvertebrate communities in Hood Canal, Washington, USA. Hierarchical ensemble analyses, wherein one assumes a priori that certain species are likely to respond to disturbances in similar ways, strengthen statistical power by “borrowing strength from the ensemble” (Sauer and Link 2002). This strength derives from the capacity to estimate effects on species in a manner that draws upon information contained in data collected on similar species. This approach is already widely used in other applications, typically by integrating observations collected over distinct temporal periods (Harley and others 2004; He and Bence 2007; Stow and Scavia 2009), spatial regions (Qian and others 2009, 2010), or across species (Sauer and Link 2002; Clark 2003; Russell and others 2009).

Our principal aim was to evaluate the magnitude of local density responses exhibited by demersal fauna to seasonal hypoxia in Hood Canal, Washington. Although low dissolved oxygen (DO) conditions in southern Hood Canal have been observed as far back as the 1950s, the temporal and spatial extents of hypoxia have increased over the past two decades, presumably from a combination of natural and anthropogenic influences (Fagergren and others 2004). Low DO (<3 mg/l) and hypoxia (<2 mg/l) are now regular features of bottom waters in the southern portion of the main channel and the surrounding areas. Since regular year-round sampling began in 2004, bottom (>100 m) hypoxia in the southern main channel has been present as early as May, but more commonly appears in late summer (Hood Canal Dissolved Oxygen Program, Citizens Monitoring Program, unpublished data, http://www.hoodcanal.washington.edu/observations/cm_time_series.jsp). Fish kill events occurred in 2002–2004 and 2006 when southerly winds upwelled anoxic water to the surface, prompting concern about the causes and impacts of these events on local fauna. The spatial and temporal extent of effects across species is presently unknown. Our a priori expectations were that sessile species would exhibit reduced densities in the hypoxia-impacted site at all times, but that mobile species would only exhibit a density response when seasonal hypoxic conditions were present.

Methods

Sampling Design and Data Collection

Because potential reference sites are not exactly like the hypoxia-impacted site, we chose multiple reference sites with differing characteristics to allow us to account for these factors in our analyses. This sampling design included a single hypoxia-impacted site, “Hoodsport,” located in southern Hood Canal (Figure 1). One reference site (“Hazel Point”) was chosen because it sits in the same basin as the hypoxia-impacted site but does not experience regular seasonal hypoxia (Figure 1). A second reference site (“Possession Sound”) was chosen because it has a bathymetric profile similar to that of the hypoxia-impacted site (steeply sloped channel bathymetry), but is located in the main Puget Sound basin (Figure 1). Bathymetry may alter local population densities because marine communities are strongly zonated by depth. Thus, steep bathymetry induces strong ecological gradients that may reduce effective habitat availability for some species. Further, previous oceanographic monitoring revealed that sea surface salinities were similar (ca. 27–29) between this site and the hypoxia-impacted site, as both sites have large rivers nearby. The third reference site (“Useless Bay”) was chosen because it has a bathymetric profile similar to that of Hazel Point, but is sited in the same basin as Possession Sound (Figure 1).

Figure 1

Location of sample sites in Hood Canal and central Puget Sound, Washington. Hoodsport is the only site subjected to seasonal hypoxia. Hazel Point is located within Hood Canal, but does not regularly experience hypoxia. Possession Sound has bathymetry similar to that of Hoodsport, whereas Useless Bay has bathymetric features similar to those of Hazel Point. The smaller map indicates the general region where the study was conducted.

Benthic fish, benthopelagic fish, and macroinvertebrates were sampled prior to the onset of summertime hypoxia in June 2007 and during hypoxia in September 2007. At each site, nine trawl stations were sampled in a depth-stratified design. Within each depth stratum, sampling sites were haphazardly selected based on proximity to other site locations (minimum 1-km distance), bottom terrain (avoiding slope gradients larger than 20%), and other logistical concerns related to gear deployment. The goal was to equally distribute the nine trawl stations among three depth categories: shallow (30 m), intermediate (60 m), and deep (100 m). Due to challenging bottom terrain, Hazel Point and Useless Bay were sampled at three shallow, two intermediate, and four deep locations. The same trawl locations were sampled in June and September. One of the deep locations at Useless Bay was identified as an outlier based on unique bottom substrate and subsequent trawl composition and was thus removed from the dataset, resulting in a total of 70 trawl samples for analyses.

All trawls were performed during daytime hours using a 400-mesh Eastern otter trawl with a 3.2-cm mesh codend, a 21.4-m head rope, and a 28.7-m foot rope. Although species-specific size selectivities have not been directly estimated, analysis of length frequency distributions indicates selectivity for fish longer than 80 mm. Tow distances, calculated from start and end coordinates, ranged from 0.24 to 0.62 km. The area swept was calculated by multiplying tow distance by the net opening width, calculated from an empirical relationship between depth and net width (W. Palsson, Washington State Department of Fish and Wildlife, personal communication). The area swept ranged from 0.0027 to 0.0076 km². Fish and invertebrates were sorted by species and quantified in terms of numerical abundance. Numerical abundance was calculated directly or by subsampling a weighed portion of the catch (if abundance exceeded ca. 500 individuals). Subsample weights ranged from approximately 20–50% of total sample weights.
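The area-swept calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes tow distance is the great-circle (haversine) distance between start and end coordinates, and the net width passed in is a hypothetical value, because the empirical depth-to-width relationship is unpublished.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def tow_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between tow start and end points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def area_swept_km2(distance_km, net_width_m):
    """Area swept = tow distance times net opening width (width converted to km)."""
    return distance_km * (net_width_m / 1000.0)


def density_per_km2(count, area_km2):
    """Numerical density: individuals per square kilometre swept."""
    return count / area_km2
```

For instance, a 0.5-km tow with a hypothetical 12-m net opening gives `area_swept_km2(0.5, 12.0)`, about 0.006 km², which falls inside the range of swept areas reported above.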

Benthic grabs (Van Veen) were used to collect sediment samples following each trawling event. The grabs were positioned at approximately equal spacing along the length of the tow transect at three locations. Sediment composition was analyzed for percent gravel (>1.00 mm), sand (0.425–1.0 mm), and fines (<0.425 mm) using sieves from the U.S. Standard Sieve Series. It was assumed that the sediment composition remained constant between the June and the September sampling periods. A Sea-Bird SeaCat 19plus (Sea-Bird Electronics, Inc., Bellevue, Washington, USA) conductivity-temperature-depth profiler with an SBE 43 dissolved oxygen sensor was deployed at multiple locations (n = 4–6) at all four sites to collect temperature, salinity, and dissolved oxygen data profiles. These data were averaged for each combination of location, time period, and depth. Because of equipment malfunction, we were unable to collect environmental data during the September sampling event in the Useless Bay site. We instead used data collected at this site from the same equipment 18 days prior to the biological sampling.

Species Selection

Because we captured over 170 species, many of which were rare or absent in Hood Canal samples, we screened species for inclusion into analysis based on their frequency of occurrence. The goal was to conduct analyses on species for which we had adequate data to evaluate the alternative hypotheses. To be included in the analyses, species had to meet one of four criteria: (1) present in at least 30% of all samples, (2) present in at least 30% of samples in any area, (3) present in at least 30% of samples in any depth stratum, or (4) present in at least 50% of samples in Hood Canal. From species that met one of these criteria, we eliminated one species (copper rockfish; Sebastes caurinus) because it was rare in all Hood Canal samples (occurring in fewer than two samples at either Hood Canal site). This rarity would make precise estimation of impact effects, the goal of this study, impossible. This filtering process produced 16 invertebrate species and 19 fish species for analysis.
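The four screening criteria can be expressed as a simple frequency-of-occurrence filter. The sketch below is illustrative only: the sample record layout (`site`, `depth`, `basin`, `catch`) is hypothetical, and "area" in criterion 2 is assumed to mean a study site.

```python
from collections import defaultdict


def frequency(samples, species):
    """Fraction of samples in which the species was caught at least once."""
    if not samples:
        return 0.0
    present = sum(1 for s in samples if s["catch"].get(species, 0) > 0)
    return present / len(samples)


def max_group_frequency(samples, species, field):
    """Highest frequency of occurrence across groups defined by `field`."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[field]].append(s)
    return max((frequency(g, species) for g in groups.values()), default=0.0)


def passes_screen(samples, species):
    """True if the species meets at least one of the four inclusion criteria."""
    hood_canal = [s for s in samples if s["basin"] == "Hood Canal"]
    return (
        frequency(samples, species) >= 0.30                        # criterion 1
        or max_group_frequency(samples, species, "site") >= 0.30   # criterion 2
        or max_group_frequency(samples, species, "depth") >= 0.30  # criterion 3
        or frequency(hood_canal, species) >= 0.50                  # criterion 4
    )
```

The additional manual exclusion (copper rockfish) described above is not captured by this filter and would be applied afterward.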

Statistical Models

Our approach was to pose alternative hypotheses as statistical models, fit each model to the catch data for each species and quantify the degree of support for each. We then estimated effect sizes from a full model that contained all possible covariates to compare the size and the precision of estimated hypoxia effects derived through conventional and hierarchical ensemble methods. Here, we describe the statistical models that we employed, define the general framework for model selection, and describe the approach used to evaluate alternative hypotheses of interest, that is, the importance of hypoxia versus other environmental features to explain the spatial and temporal variation in catch rate.

We used generalized linear models (GLMs) to represent hypotheses about the underlying processes that govern spatial and temporal patterns in local density (Faraway 2006). In GLMs, linear combinations of predictors \( X_{0} ,X_{1} , \ldots ,X_{\text{n}} \) and coefficients \( \beta_{0} ,\beta_{1} , \ldots ,\beta_{\text{n}} \) are related to the response variable through a specified link function. We used a log-link function, which is commonly used for count data because the predicted mean response is always non-negative. This link function also presumes that effects of predictor variables on the response variable are multiplicative rather than additive. We used a special form of count regression, termed rate models by Faraway (2006), with a negative binomial probability density function. In rate regression, the count of the number of individuals sampled is the response variable, but the natural log of area swept is added to the linear combination of predictors with a fixed coefficient of 1. In this way, the remaining model terms predict the mean density (number of individuals per unit area swept). The negative binomial model was chosen after initial exploration revealed that the data were too overdispersed to be adequately represented by a Poisson function (based on analysis of residual deviance). We fit all GLMs using the R statistical software via the glm.nb routine (MASS package).

We posed the alternative hypotheses explaining differences in species’ densities among sites as alternative configurations of predictor variables. The simplest model was that mean catch rate for sample i was determined only by sampling depth and time period (the “null” model):

$$ \eta_{i} = \beta_{0} + \beta_{T} T_{i} + \beta_{\text{D}} D_{i} + \ln \left( {{\text{area}}_{i} } \right) $$
(1)

where \( \eta_{i} \) is linked to the expected value of the response variable \( Y_{i} \) (number of individuals captured in sample i) by \( E[Y_{i}] = \exp(\eta_{i}) \), \( T_{i} \) is a dummy variable equaling 1 for September samples and 0 for June samples, and \( D_{i} \) is a categorical variable indicating the depth at which each sample was taken. All subsequent models included these two predictor variables.
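To make the role of the area offset concrete, the following sketch evaluates the linear predictor of equation (1) and a negative binomial log-likelihood for a single sample. It is an illustration under stated assumptions rather than the fitted model: the coefficient values are hypothetical, the categorical depth term is represented as a lookup of per-depth effects, and the negative binomial is parameterized with mean mu and dispersion theta (variance mu + mu²/theta).

```python
import math


def linear_predictor(beta0, beta_t, beta_d, september, depth, area_km2):
    """eta_i: intercept + time effect + depth effect + log(area) offset."""
    return beta0 + beta_t * september + beta_d[depth] + math.log(area_km2)


def expected_count(eta):
    """Inverse log link: expected number of individuals in the sample."""
    return math.exp(eta)


def neg_binomial_loglik(y, mu, theta):
    """Log-probability of count y given mean mu and dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


# Hypothetical coefficients: baseline density of 1000 per km^2 in shallow June tows
beta_d = {"shallow": 0.0, "deep": -1.0}
eta = linear_predictor(math.log(1000.0), 0.5, beta_d, 1, "deep", 0.005)
mu = expected_count(eta)  # equals area times predicted density
```

Because the offset has a fixed coefficient of 1, `mu` is simply the swept area multiplied by the density implied by the remaining terms, so doubling the area swept doubles the expected count without changing the estimated density.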

To include hypoxia impacts, we first considered whether the hypoxia-impacted site had different densities across both sampling periods:

$$ \eta_{i} = \beta_{0} + \beta_{\text{T}} T_{i} + \beta_{\text{D}} D_{i} + \beta_{\text{H}} H_{i} + \ln \left( {{\text{area}}_{i} } \right) $$
(2)

where \( H_{i} \) is a dummy variable equaling 1 if the sample came from the hypoxia-impacted site and 0 otherwise, and the coefficient \( \beta_{H} \) provides a measure of differences in species’ densities that were persistent across both sampling periods (before and during hypoxia). We refer to this coefficient as the hypoxia main effect, to note its similarity to ANOVA-based analysis. To consider responses that were only manifest during seasonal hypoxic events, we included an interaction term:

$$ \eta_{i} = \beta_{0} + \beta_{T} T_{i} + \beta_{\text{D}} D_{i} + \beta_{H} H_{i} + \beta_{H \times T} \left( {H \times T} \right)_{i} + \ln \left( {{\text{area}}_{i} } \right) $$
(3)

where the coefficient \( \beta_{H \times T} \) represents the change over time that was unique to the hypoxia-impacted site.
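A short worked derivation shows why the interaction coefficient isolates this change. Let \( \lambda \) denote the expected density at a fixed depth (the notation \( \lambda \) is introduced here for illustration only). Under equation (3), the log of the June-to-September density ratio is \( \beta_{T} + \beta_{H \times T} \) at the hypoxia-impacted site and \( \beta_{T} \) at a reference site, so that

$$ \ln \left( {\frac{{\lambda_{{{\text{impact}},{\text{Sep}}}} /\lambda_{{{\text{impact}},{\text{Jun}}}} }}{{\lambda_{{{\text{ref}},{\text{Sep}}}} /\lambda_{{{\text{ref}},{\text{Jun}}}} }}} \right) = \left( {\beta_{T} + \beta_{H \times T} } \right) - \beta_{T} = \beta_{H \times T} $$

Thus \( \exp (\beta_{H \times T} ) \) is the seasonal change in density at the impacted site expressed relative to the seasonal change at reference sites; the intercept, depth, and hypoxia main effect terms cancel.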

Equations (2) and (3) represent two hypotheses to explain differences in catch rates between hypoxia-impacted sites and reference sites. We sought to evaluate those against alternative hypotheses, namely that catch rates are explained by differences between basins and bathymetry types. We therefore considered expanded versions of equation (1) that test for basin, basin × time, bathymetry, and bathymetry × time effects. Basin and bathymetry were treated as dummy variables. We focused on a limited set (n = 14) of potential models consisting of the null model, all single main effect models, all models with time interaction effects and all pairwise combinations of main effects and interactive effects.

We evaluated the degree of support for each alternative model using the Bayesian Information Criterion (BIC) and BIC weights (BICw). BIC is easily computed directly from the maximum likelihood of each model, and thereby permitted a rapid and simple way to assess the degree of support for hypoxia impacts when alternative explanations for spatio-temporal patterns of abundance were considered. BICw is often considered to be an approximation of Bayes factors (Link and Barker 2006; Ward 2008) that relate the ratio of posterior to prior odds between two competing models (BICw = 1 indicates the best fitting model). Following Raftery (1996), models that have BICw greater than \( \frac{1}{3} \) cannot be dismissed in favor of the best fitting model; the best fitting model therefore has notable support if none of the alternative models have BICw greater than 0.33. In general, BIC more strongly favors simpler models over complex models when compared to the Akaike Information Criterion (Ward 2008).
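The BIC comparison can be sketched as below. The exact formula used for BICw is not given in the text, so this assumes the standard definition of BIC and a weight computed relative to the best model as exp(−ΔBIC/2), which equals 1 for the best fitting model as stated above; a version normalized to sum to 1 is included for comparison. The BIC values in the example are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion from a maximized log-likelihood."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def relative_bic_weights(bic_values):
    """exp(-0.5 * delta BIC) relative to the best model (best model = 1)."""
    best = min(bic_values)
    return [math.exp(-0.5 * (b - best)) for b in bic_values]


def normalized_bic_weights(bic_values):
    """Relative weights rescaled to sum to 1 across the candidate set."""
    rel = relative_bic_weights(bic_values)
    total = sum(rel)
    return [w / total for w in rel]


# Hypothetical BIC values for three candidate models
weights = relative_bic_weights([100.0, 102.0, 110.0])
cannot_dismiss = [w > 1.0 / 3.0 for w in weights]
```

In this example the second model (ΔBIC = 2) has a relative weight of about 0.37 and so could not be dismissed under the one-third rule, whereas the third model (ΔBIC = 10) could.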

Parameter Estimation

An alternative way to evaluate the degree of support for hypoxia impacts is to estimate uncertainty in effect sizes for the hypoxia and hypoxia × time coefficients from a full model that includes all possible predictors. We therefore estimated Bayesian posterior probability densities and 90% credibility intervals for each species’ \( \beta_{H} \) and \( \beta_{H \times T} \) coefficients. The adoption of a Bayesian approach was motivated by the fact that numerical estimation methods for hierarchical ensemble models (Gelman and others 1995; Gelman and Hill 2007) are more advanced in Bayesian applications, particularly when using the negative binomial probability density function.

We used a Markov chain Monte Carlo (MCMC) algorithm to simulate draws from the posterior distributions of each GLM parameter. Gelman and others (1995) provide detailed information about the MCMC algorithm, but we give an overview of our implementation here. We used a 10,000 iteration “burn-in” phase using a Gibbs sampler with uniform symmetric jumping rules and an adaptive step size to reach an acceptance rate between 20 and 40%. These were “thinned” every 100 iterations and the resulting output was used to produce a variance–covariance matrix for the estimated parameters. The main chain was then developed using a Metropolis–Hastings algorithm where 400,000–1,000,000 multidimensional parameter jumps were taken from a multivariate normal distribution that used the estimated variance–covariance matrix scaled by \( 2.4/d^{0.5} \), where d is the number of parameters, as suggested by Gelman and others (1995). This scaling factor was adjusted upward or downward during runtime as needed to maintain an acceptance rate of approximately 25%. We thinned the chain by saving every 100th iteration to reduce autocorrelation in the posterior draws. For each species group, this procedure was implemented 10 times, producing 10 independent MCMC chains. Starting values for each simulation were randomized by finding maximum likelihood parameters for each species individually and randomly perturbing each parameter ±30%. Convergence of chains was evaluated using the \( \sqrt {\hat{R}} \) diagnostic, whereby values close to 1 (<1.2) suggest convergence. Chains were run until convergence was reached (maximum 1,000,000 steps). Posterior kernel densities were calculated from the MCMC chains using a non-parametric Gaussian kernel with adaptive bandwidth. Estimated posterior densities were used to calculate 90% credibility intervals and the probability that \( \beta_{H \times T} \) and \( \beta_{H} \) were less than 0.
We assumed non-informative priors on all model parameters, so that they had little effect on the posterior distributions; for all GLM coefficients we assumed a normal prior distribution with a mean of 0 and standard deviation of 10.
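The main ingredients of this procedure (random-walk Metropolis proposals, thinning, multiple chains with perturbed starting values, and the convergence diagnostic) can be sketched on a toy problem. This is a simplified one-parameter illustration, not the implementation used here: the target posterior, the "data" (an estimate of −1.0 with standard error 0.5), the chain lengths, and the fixed proposal scale are all hypothetical, and the adaptive tuning and multivariate proposals described above are omitted.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def log_posterior(beta):
    """Toy target: Normal(0, 10) prior times a normal likelihood (hypothetical data)."""
    return log_normal_pdf(beta, 0.0, 10.0) + log_normal_pdf(-1.0, beta, 0.5)


def metropolis_chain(log_post, start, step_sd, n_iter, thin, rng):
    """Random-walk Metropolis for one parameter; keeps every `thin`-th draw."""
    current, current_lp = start, log_post(start)
    kept = []
    for i in range(1, n_iter + 1):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i % thin == 0:
            kept.append(current)
    return kept


def gelman_rubin(chains):
    """Potential scale reduction factor (sqrt of R-hat) for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


rng = random.Random(2010)  # fixed seed so the sketch is repeatable
step_sd = 2.4 * 0.5 / math.sqrt(1)  # 2.4 / sqrt(d) times an assumed posterior sd, d = 1
chains = []
for _ in range(10):
    start = -1.0 * (1.0 + rng.uniform(-0.3, 0.3))  # perturb start by up to 30%
    chains.append(metropolis_chain(log_posterior, start, step_sd, 5000, 10, rng))
rhat = gelman_rubin(chains)
pooled_mean = sum(sum(c) for c in chains) / sum(len(c) for c in chains)
```

Values of `rhat` below 1.2 would be read as evidence of convergence under the criterion given above.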

Hierarchical Parameter Estimation

We sought to estimate the average effect size over species that we expected a priori to respond similarly because of their mobility. In a hierarchical Bayesian parameter estimation framework, there is a collection of K species that comprise some group, and for each species there is some GLM parameter \( \beta_{j} \). The hierarchical model assumes that the coefficients for the K species, \( \beta_{j}^{1} ,\beta_{j}^{2} , \ldots ,\beta_{j}^{K} \), are drawn from a hyperdistribution \( g(\mu_{j} ,\sigma_{j}^{2} ) \), a normal distribution with mean \( \mu_{j} \) and variance \( \sigma_{j}^{2} \) (Clark 2003). The goal of the analysis is to estimate the probability distributions for \( \mu_{j} \) and \( \sigma_{j}^{2} \) conditional on the observed data \( Y = y_{1} ,y_{2} , \ldots ,y_{\text{K}} \):

$$ p\left( {\mu_{j} ,\left. {\sigma_{j}^{2} } \right|Y} \right) = \int {p\left( {\left. {\tilde{\beta }_{j} } \right|Y} \right)p\left( {\mu_{j} ,\left. {\sigma_{j}^{2} } \right|\tilde{\beta }_{j} } \right){\text{d}}\tilde{\beta }_{j} } $$
(4)

where \( \tilde{\beta }_{j} \) is the vector of \( \beta_{j}^{1} ,\beta_{j}^{2} , \ldots ,\beta_{j}^{K} \) and \( p\left( {\left. {\tilde{\beta }_{j} } \right|Y} \right) \) is the probability density function for the vector of GLM coefficients conditional on the data.

We modeled the \( \beta_{H} \) and \( \beta_{H \times T} \) coefficients as hierarchical parameters, while all other GLM parameters were modeled independently for each species. Posterior probability distributions for the hierarchical model were estimated using an MCMC routine similar to the one described above. Following standard procedures (Gelman and others 1995), we modeled the prior of the hyperdistribution means \( \mu_{H} \) (hypoxia main effect) and \( \mu_{H \times T} \) (hypoxia × time) with non-informative normally distributed priors (a mean of 0 and standard deviation of 10). We modeled the prior of the hyperdistribution variances such that log(σ) followed a normal distribution with mean 0 and standard deviation equal to 2.5.
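The prior structure for one hierarchical coefficient can be written as a single log-density function. The sketch below is illustrative and assumes the sampler works directly on log(σ), so that the normal prior on log(σ) can be applied without a change-of-variables term; the function and argument names are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def hierarchical_log_prior(betas, mu, log_sigma):
    """Joint log prior for species-level coefficients and their hyperparameters.

    betas      : one coefficient per species (e.g. hypoxia x time effects)
    mu         : hyperdistribution mean, prior Normal(0, 10)
    log_sigma  : log of hyperdistribution sd, prior Normal(0, 2.5)
    """
    sigma = math.exp(log_sigma)
    lp = log_normal_pdf(mu, 0.0, 10.0)
    lp += log_normal_pdf(log_sigma, 0.0, 2.5)
    lp += sum(log_normal_pdf(b, mu, sigma) for b in betas)
    return lp
```

Adding this quantity to the summed negative binomial log-likelihoods of the species in a group would give the unnormalized log posterior that an MCMC routine explores. Coefficients that cluster around `mu` receive higher prior density than widely scattered ones, which is the source of the borrowing of strength across species.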

Results

Environmental Characteristics

As expected, none of the study sites were hypoxic at any depth during June sampling (Table 1), and none of the reference sites had mean DO less than 2.0 mg/l in either the June or the September sampling period. All depth strata in Hoodsport were hypoxic during September. Bottom temperatures were consistently 1°–2° cooler in Hoodsport than the reference sites, but these differences were consistent between June and September (Table 1). There were only minor differences in salinity at the sampling depths among sites and between sample periods (Table 1). In general, the two Hood Canal sites had bottom substrates composed of mixtures of fines/sand, whereas Puget Sound sites were dominated by sand (Table 1). Variance in water column measurement among samples within sites was minimal (CV < 0.05).

Table 1 Mean Temperature, Dissolved Oxygen (DO), Salinity, and Bottom Substrate Composition at the Four Study Regions by Depth

Overall Patterns of Abundance

Total numerical density of the 35 species was roughly equivalent among the four sampling areas in June (Figure 2). Benthic fish density was slightly greater in the hypoxia-impacted region when compared to the three reference sites, whereas benthopelagic fish and sessile invertebrates were less abundant. In September, numerical density was more variable among areas. Overall density of the 35 species declined at Hoodsport, but increased at all three reference sites when compared to June (Figure 2). On average the Puget Sound sites had higher overall abundances when compared to the Hood Canal sites, with Useless Bay having the highest overall density during the September sampling period.

Figure 2

Numerical density of species groups in (A) June 2007 and (B) September 2007, when Hoodsport was subject to hypoxia. Data represent averages over all sample depths.

Model Selection

For sessile invertebrate species, there was strong support for a single model for each species, and in all cases this model included the hypoxia main effect term (Table 2). The hypoxia × time models generally had low model weights (<0.2). Model selection did not reveal strong weights of evidence for either hypoxia main effect or hypoxia × time effects for the other species groups. Among the mobile invertebrates, three species indicated substantial support for models containing the hypoxia main effect (decorator crab, Oregonia gracilis; graceful crab, Cancer gracilis; and red rock crab, Cancer productus), and one had substantial support for the hypoxia × time effect (sand star, Luidia foliolata). Among benthic fishes, one species had strong support for the hypoxia main effect (blackfin sculpin, Malacocottus kincaidi) and another had support for the hypoxia × time effect (slender sole, Lyopsetta exilis). For benthopelagic fish, two species had substantial support for the hypoxia main effect (quillback rockfish, Sebastes maliger; and walleye pollock, Theragra chalcogramma) and two indicated substantial support for the hypoxia × time effect (Pacific hake, Merluccius productus; and spiny dogfish, Squalus acanthias). Thus, model selection did not indicate widespread evidence for immediate responses to hypoxia when compared against alternative explanations for the spatio-temporal pattern of catch rates.

Table 2 Number of Individuals Sampled and Results of Model Selection

Parameter Estimation

Because sessile invertebrate species were often completely absent from the hypoxia-impacted site, it was not possible to estimate hypoxia × time interaction effect sizes for these species. We therefore estimated parameters from the full model minus the interactive term to estimate the hypoxia main effect. These estimates (log response ratios) were strongly negative, with mean values ranging from −4.54 (Metridium spp.) to −11.58 (spiny sea star, Hippasteria spinosa). The estimated posterior probability of a negative hypoxia main effect exceeded 95% for all species (range, 97.3–100%). Because these effects were pronounced (due to species’ absence from the hypoxia-impacted site) and therefore well defined, we did not conduct a hierarchical ensemble analysis for this group of species.

The results from conventional parameter estimation for mobile species generally conformed to the results of model selection; there was not widespread evidence of a consistent hypoxia main effect or a hypoxia × time effect across species. For those species where the model selection indicated substantial evidence for a hypoxia or hypoxia × time effect, the credibility interval for that parameter tended to exclude 0 (Figure 3). Examples of strong correspondence include graceful crab, blackfin sculpin, slender sole, Pacific hake, spiny dogfish, and walleye pollock (Figure 3). However, there were exceptions where there was disagreement. In some cases, model selection indicated hypoxia impacts where parameter estimation did not, for example, red rock crab hypoxia × time effect (Figure 3). In others, parameter estimation suggested effects where model selection did not, for example, Dover sole (Microstomus pacificus) hypoxia main effect (Figure 3). These discrepancies most likely arose because parameter estimation included all possible predictors simultaneously. This can act to diminish the apparent importance of hypoxia if there is high covariance between combinations of predictors that was not considered in the model selection routine, or may enhance precision by better controlling for confounding effects that might have masked the hypoxia-related effects. Note that because no decorator crabs (Oregonia gracilis) were found in the hypoxia-impacted site, they were not included in the parameter estimation because there was no information to estimate the hypoxia × time interaction term.

Figure 3

Bayesian posterior densities of the hypoxia main effect (left column) and hypoxia × time effect (right column) for individual species calculated through conventional methods (empty circles) and hierarchical methods (black circles). Lines denote 90% credibility intervals.

The hierarchical ensemble parameter estimation process led to two changes in parameter estimates for the hypoxia main effect and hypoxia × time effect. The first was that the posterior means tended to move toward the hyperdistribution mean; strongly negative estimates shifted toward zero whereas positive estimates shifted toward or below zero (Figure 4). This is the well-known Bayesian “shrinkage effect,” where individual estimates converge toward the hyperdistribution means (Sauer and Link 2002). The second effect was to enhance the precision of the parameter estimates, as evidenced by reduced 90% credibility ranges for many species (Figure 3). The average percent changes in widths of the 90% credibility intervals for the hypoxia main effect were −20, −13, and −11 for mobile invertebrates, benthic fish, and benthopelagic fish, respectively. The improvements in precision were stronger for the hypoxia × time effect among the mobile invertebrates and benthic fishes (−27 and −21% average change, respectively) but were weaker for the benthopelagic fishes (−2% average change).
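Both changes, shrinkage toward the group mean and narrower intervals, can be illustrated with the textbook normal approximation in which a species-level estimate is combined with a known hyperdistribution by precision weighting. This is a deliberate simplification (the full model estimates the hyperdistribution mean and variance jointly with the species coefficients), and the numbers are hypothetical.

```python
import math


def partial_pool(estimate, se, mu, sigma):
    """Precision-weighted combination of a species estimate and the group mean.

    estimate, se : species-level estimate and its standard error
    mu, sigma    : hyperdistribution mean and standard deviation (assumed known)
    """
    w_data = 1.0 / se ** 2
    w_group = 1.0 / sigma ** 2
    post_mean = (w_data * estimate + w_group * mu) / (w_data + w_group)
    post_sd = math.sqrt(1.0 / (w_data + w_group))
    return post_mean, post_sd


# Hypothetical species: imprecise estimate of -4.0, group mean of -2.0
shrunk_mean, shrunk_sd = partial_pool(-4.0, 2.0, -2.0, 1.0)
```

Here the estimate moves from −4.0 to −2.4 and its standard deviation falls from 2.0 to about 0.89. Imprecisely estimated species are pulled most strongly toward the group mean, while well-estimated species change little.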

Figure 4

Comparison of posterior mean estimates for the hypoxia main effect and hypoxia × time effect estimated through conventional and hierarchical methods (shrinkage plot, sensu Sauer and Link 2002). Lines connect estimates derived from the two methods for a single species with conventional-based estimates on the top and hierarchical-based estimates on the bottom of each plot. Vertical lines indicate that estimated effect size was relatively unchanged, whereas diagonal lines indicate shifts under the hierarchical model.

We asked whether the ability to detect the anticipated effect of hypoxia (a negative hypoxia × time coefficient) on individual species was enhanced by the hierarchical analysis. We calculated the probability that the hypoxia × time effect was negative for both the conventional and the hierarchical analyses and compared them for each species (Figure 5). Apart from those species that had high probabilities of negative effects under both methods, there were only five species for which the hierarchical estimation did not notably change the probability of a negative hypoxia × time effect (Figure 5). For four species, the hierarchical model elevated the probabilities of a negative effect so that they exceeded 95% (probabilities in the conventional model for these species ranged from 80 to 90%), and for three species the hierarchical model elevated these probabilities so that they exceeded 90% (for these three species the probabilities ranged from ca. 60 to 70% in the conventional analysis).
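Given a set of posterior draws for a coefficient, these summaries are straightforward to compute. The sketch below works directly on raw draws using simple empirical quantiles, which is an assumption made for brevity; the analysis described here summarized kernel-smoothed posterior densities instead.

```python
def prob_negative(draws):
    """Posterior probability that the coefficient is below zero."""
    return sum(1 for d in draws if d < 0) / len(draws)


def credible_interval(draws, level=0.90):
    """Equal-tailed interval from the ordered draws (nearest-rank quantiles)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lo_idx = int(round(tail * (n - 1)))
    hi_idx = int(round((1.0 - tail) * (n - 1)))
    return ordered[lo_idx], ordered[hi_idx]
```

Comparing `prob_negative` for the same species under the conventional and hierarchical fits gives the pairs of probabilities contrasted in Figure 5.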

Figure 5

Probability of negative hypoxia response (hypoxia × time) estimated through conventional versus hierarchical analysis. Solid points indicate mobile invertebrate species, empty points indicate benthic fish species, and gray points indicate benthopelagic fish species. A dashed horizontal line at 0.95 is indicated for reference and the solid diagonal line represents the 1:1 line.

The posterior probability distributions for the hyperparameter means provided stronger evidence for a negative hypoxia × time effect while supporting the expectation that the hypoxia main effect was less pronounced (Figure 6). For mobile invertebrates, the mean hypoxia × time effect equaled −2.61, which equates to a 92% density reduction from June to September in the hypoxic site relative to reference sites. The probability of a negative mean effect, P(μH×T < 0), equaled 98%. In contrast, the mean of the across-species distribution of the hypoxia main effect was −1.0 (P(μH < 0) = 85%). Results for benthopelagic fishes were similar: marginal evidence for a negative hypoxia main effect (P(μH < 0) = 79%), but stronger evidence for a negative hypoxia × time effect (P(μH×T < 0) = 95%). The expected value for the mean hypoxia × time effect was −3.7, which equates to a roughly 98% reduction in catch rate of benthopelagic fishes (although the posterior distribution was broad; Figure 6). Benthic fishes exhibited results that most closely matched our a priori expectations, with little support for a negative hypoxia main effect (P(μH < 0) = 20%) but strong evidence for a negative hypoxia × time effect (P(μH×T < 0) = 99%) and an expected value for the mean hypoxia × time effect equaling −1.77 (73% reduction).
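Effect sizes such as those above can be translated into percent reductions if the coefficients are on the natural-log scale. That scale is an assumption here, since this section does not restate the link function, and the helper name `percent_reduction` is hypothetical. Under that assumption a coefficient β implies a reduction of 100 × (1 − exp(β)).

```python
import math

def percent_reduction(beta):
    """Percent decline implied by a coefficient on the natural-log scale (assumed)."""
    return 100.0 * (1.0 - math.exp(beta))

# Point values quoted in the text for two of the groups
mobile_invertebrates = percent_reduction(-2.61)
benthopelagic_fishes = percent_reduction(-3.7)
```

Applied to −2.61 and −3.7, this back-calculation gives values close to the reported reductions for mobile invertebrates and benthopelagic fishes. A percentage computed from a posterior mean need not equal the percentage averaged over posterior draws, so reported values can differ from this point calculation, particularly when the posterior is broad.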

Figure 6

Bayesian posterior densities of the hyperdistribution mean of the hypoxia main effect (left) and hypoxia × time effect (right) for mobile invertebrates (solid line), benthic fishes (gray line), and benthopelagic fishes (dashed line).

Discussion

Here, we quantified species responses to hypoxia in Hood Canal, Washington in a manner that explicitly considered multiple alternative explanations for spatio-temporal variation in catch rates among study sites. We found strong evidence for hypoxia inducing a persistent reduction in densities among five species of sessile macroinvertebrates, even when considering alternative explanations. In contrast, the primary effect among the mobile species was a density reduction that was apparent only when hypoxia was present. This second finding was not revealed through conventional analysis, which treated each species independently, but was made clearer when parameters were estimated in a hierarchical model that presumed that groups of species responded in similar ways. Specifically, the estimated probabilities of negative density responses to hypoxia were enhanced between 10 and 40% for many species. The estimated mean effect sizes integrated over observations on groups of species deemed likely to respond to hypoxia similarly indicated strong density responses of mobile species, with average declines in species’ densities during hypoxia ranging from 73 to 98%.

This study confirms the expectation that impacts to sessile species are primarily demographic (mortality), whereas the ecological impacts to mobile species are produced as a consequence of distributional shifts. Because the magnitudes of density responses during hypoxia were substantial in Hood Canal, they likely have ecological consequences that could be manifest as altered vulnerability to fishing gear and natural predators (Breitburg and others 1997; Seitz and others 2003; Eggleston and others 2005; Altieri 2008; Long and Seitz 2008; Ludsin and others 2009; Parker-Stetter and Horne 2009) and changes in prey availability and competition (Pihl and others 1992; Pihl 1994; Petersen and Pihl 1995; Bell and others 2003; Craig and Crowder 2005; Long and Seitz 2008; Baustian and others 2009; Brandt and others 2009; Vanderploeg and others 2009). Still, linking these immediate responses to specific secondary effects requires an improved understanding of the nature and the extent of distributional shifts. For instance, benthopelagic fishes and some mobile invertebrates may change their vertical distribution in response to deep water hypoxia (Hazen and others 2009), although in Hood Canal this would restrict their distribution to the top 25 m of the water column (based on 2007 water quality conditions). For benthic species, the southern region of Hood Canal has few (<10% of total) areas where the depth is less than 25 m, indicating a paucity of local refuges from low DO. If even a small fraction of displaced benthic animals shifted into shallow habitats within the hypoxia-impacted zone, extraordinarily high densities would result, potentially inducing significant density-dependent responses. The spatial extent of these responses is critical, as it has direct bearing on the connectivity of the hypoxia-impacted region with other regions of Hood Canal and thereby dictates the spatial extent at which impacts may be felt (Lenihan and others 2001).

The sizable density responses to hypoxia suggest the potential for broader ecosystem-level effects of hypoxia on the structure and processing of materials within the Hood Canal food web. Baird and others (2004) used comparisons across time (pre- and post-hypoxia) over 2 years to document pronounced reductions in the transfer of energy from autotrophs to heterotrophs during the most intense periods of hypoxia in the Neuse River Estuary. Much of the ecosystem-level effect was attributed to a seasonal density reduction for specific groups of species (in this case, herbivorous species). Similarly, low dissolved oxygen was also implicated in the seasonal changes in the Chesapeake Bay food web through a reduction in abundance of key heterotroph species (Baird and Ulanowicz 1989). That both of these studies related changes in food web structure and material cycling to reductions in species’ densities suggests that the large density reductions observed in Hood Canal are also likely to be associated with similar ecosystem shifts. Much more intensive sampling on the feeding relations across the entire food web is needed to better characterize these shifts.

Two features of the present analysis of hypoxia impacts are notable. The first was the explicit consideration of alternative explanations for the spatial and the temporal patterns of abundances. By considering that local environmental attributes related to basin or bathymetry type might govern species densities and phenologies therein, we were less likely to attribute spatial or temporal differences in catch rates to hypoxia. The importance of these alternative explanations was most pronounced among the benthic fishes. Had we drawn inferences based only on a comparison between the northern and the southern sites in Hood Canal (thereby not considering the different bathymetric profiles), we would have found notable differences, and attributed these to hypoxia, for nearly one half of the species, although bathymetry often proved to be a better predictor of these differences. Similarly, a simple paired comparison between southern Hood Canal and Possession Sound (both with similar bathymetries but residing in separate basins of Puget Sound) would have led us to conclude that there was a hypoxia impact (immediate or persistent) for four species, whereas our analysis indicated that these differences are better explained by basin effects.

The second notable feature was our focus on estimating effect size so that we could gauge the magnitude of the density response that could be attributable to hypoxia. Central to this effort was our use of an ensemble modeling approach. This proved crucial in revealing unequivocal evidence of density reductions of benthic fish, benthopelagic fish, and mobile invertebrates during hypoxia when species-specific estimates were less precise. We expected that the application of ensemble modeling would greatly enhance the precision of species-level parameter estimates, and we did witness improvements in species-specific estimates of effect sizes (Figure 5). Still, these improvements were not as striking as those noted by other authors (Sauer and Link 2002; Harley and others 2004). One notable feature of our study that may have limited the benefits of the hierarchical model at the species level was that our sampling design and analysis were developed primarily to permit a thorough evaluation of hypoxia impacts against several potential confounding factors. Because of the dependence of hypoxia on local environmental features, our statistical power is fundamentally limited by the absence of closely matched reference sites and the capacity to explain the data under multiple hypotheses.

The most notable benefit from the ensemble modeling approach was the capacity to detect effects at the group level. Here, the results were generally less equivocal, providing strong evidence for a density reduction for mobile species when hypoxia emerged but smaller effects when hypoxia was absent. This capacity to make inferences at a level that aggregates responses of many species may be a more policy-relevant (or ecologically relevant) measure, unless specific species provide unique ecosystem services or play essential roles in organizing communities. This capacity is noteworthy because detecting effects in this system was particularly challenging; multiple hypotheses might explain the data, extensive sampling was not feasible because of permitting requirements, and catch rates were highly variable within sites. Despite these challenges, evidence for immediate density reductions during hypoxia, presumably from distributional responses, was strong. In the present application to ecological impact assessment, the expectation for improved precision at the aggregated species level may represent a substantial advance given the low statistical power of sampling and analysis methods to detect effects from disturbances (Underwood and Chapman 2003).

There were important limitations to our study design that warrant consideration. The first was that we evaluated responses in a single year, but the strength of responses may vary considerably depending on the intensity of hypoxia and other conditions. Second, although we considered multiple explanations, we did not account for all possible differences among sites. We also lack direct data on movements of individuals to confirm that the hypoxia × time effect was a distributional response. One alternative explanation for this response is that recruitment of juveniles was diminished in the hypoxia-impacted site. This could be particularly important for populations that have high reproductive capacity so that the age structure is numerically dominated by juveniles. However, Paulsen (2008) found little difference in length-frequency distributions of dominant fish species between the hypoxia-impacted and the reference sites on either sampling date.

This study provides a foundation for building a fuller understanding of hypoxia impacts in Hood Canal. The pronounced density reductions observed here suggest the potential for profound indirect ecological effects triggered by hypoxia, and these effects might be more widespread and ecologically significant than the direct mortality events that garner public attention. To advance this understanding, much more knowledge is needed regarding the resilience of the food web to intermittent disturbances, the connectedness of populations, communities and food webs, and the full extent of effect sizes produced across the range of hypoxia intensities to which Hood Canal is annually subjected.