Introduction

Hydrophysical conditions substantially influence aquatic communities in lotic systems (Statzner et al., 1988; Gore, 1996). Distributions of benthic organisms are often responsive to heterogeneous conditions, especially at the sediment-water interface, resulting from spatial and temporal variation in discharge and geomorphology (Lancaster & Hildrew, 1993; Layzer & Madison, 1995; Rempel et al., 2000; Merigoux & Doledec, 2004). Interest in understanding physical and hydraulic factors that might drive the distribution and abundance of freshwater mussels has been increasing due to their decline throughout North America (Strayer et al., 2004).

Large-scale patterns of mussel distributions may be constrained by patterns of dispersal of the early life stages, including their parasitic glochidial stage and host fish (Watters, 1992; Vaughn & Taylor, 2000), as well as geomorphology and land use of surrounding watersheds (Arbuckle & Downing, 2002). Past studies of mussel distributions at smaller scales relied on simple physical variables (e.g., depth, current velocity, substrate type) to describe and predict suitable mussel habitat with limited success (e.g., Holland-Bartels, 1990; Strayer & Ralley, 1993; Brim Box et al., 2002). More recently, many studies have provided evidence that mussel occurrence in streams and small rivers was related to complex hydraulic variables such as shear stress and Froude number (Layzer & Madison, 1995; Hardison & Layzer, 2001; Howard & Cuffey, 2003). Strayer (1999) found that mussel beds in two streams in New York were spatially coincident with stable substrates during high discharge events (i.e., floods), suggesting that flow refuges might partly explain the negative relation between shear stress and mussel density as well as the characteristic patchiness of unionids in lotic systems. Similarly, the effects of flow refuges during floods have been observed in distributions of other benthic invertebrates (Lancaster & Hildrew, 1993; Rempel et al., 1999). However, shear stress might also determine mussel distributions by influencing settlement of juvenile mussels (Layzer & Madison, 1995; Hardison & Layzer, 2001), presumably during summer low flows when juveniles of many species drop off their fish hosts, rather than movement of bed load during floods.

Much of the information on factors controlling mussel distributions has been developed on streams or small rivers, whereas large rivers have received much less attention. Large rivers that contain substantial floodplains are fundamentally different from smaller systems in their lateral complexity. Physical and biological processes may be quite different from those in lower-order systems (Johnson et al., 1995). Moreover, floodplain complexity in large rivers implies a different, and perhaps more diverse, set of physical (e.g., bathymetric, hydraulic) conditions available to biota than in smaller systems. Historically, large rivers were likely more hydrologically stable (less flashy) than smaller systems, but the hydrologic regime and hydraulic features of many rivers have been substantially altered in North America and elsewhere by human modification including channel training structures and dams (Gore & Shields, 1995; Tockner & Stanford, 2002). Given the imperiled status of unionid mussels and growing interest in management actions that alter flow patterns in large rivers, including dam removal, water level drawdown, reservoir water release, and habitat rehabilitation projects (Gore & Shields, 1995; Johnson & Janvrin, 2000; Bednarik, 2001; Landwehr et al., 2004), information on factors that influence mussel distributions in large rivers is critical for their conservation.

The lack of published large river studies may be, in part, due to the difficulty and expense of sampling mussels in this environment. Small streams and rivers can often be sampled inexpensively by wading and hand collection, but large rivers frequently require more expensive diver-assisted sampling (e.g., Holland-Bartels, 1990; Christian & Harris, 2005) due to areas of deep and turbid water. Nonetheless, substantial unpublished information on mussel communities is often collected by state, federal and private agencies due to concerns about their declining populations. Because these data are collected for specific purposes, compilation typically results in data sets that are nonuniform or “messy.” However, exploratory analysis techniques can provide a means to draw valid inferences on the inherent structure and interactions among variables of such complicated data (Efron & Tibshirani, 1991; De’ath & Fabricius, 2000; Cablk et al., 2002).

We conducted an exploratory, retrospective analysis of existing mussel data in conjunction with recently developed geographic information system (GIS) layers of physical and hydraulic features in a large river reach. Our specific objectives were (1) to evaluate the role of discharge in structuring mussel distributions and abundance in a reach of the Upper Mississippi River; and (2) to develop exploratory statistical and geospatial models of physical and hydraulic conditions that influence the presence and absence of mussels in this reach.

Materials and methods

Study area

The Upper Mississippi River contains 1,380 km of impounded river that runs from the most upstream dam at St. Anthony Falls in Minneapolis, Minnesota, to the most downstream dam near St. Louis, Missouri. The river has a long history of anthropogenic modification for commercial navigation (Fremling & Claflin, 1984). Between 1878 and 1912, the U.S. Army Corps of Engineers began channelizing the river by dredging and by constructing wing and closing dikes. A series of 29 locks and dams was completed in the 1930s to facilitate a navigation channel with a minimum 2.7 m depth. Currently, dredging is used only intermittently, on an as-needed basis, to maintain depths in the navigation channel.

Our study area was the 38-km long reach of river in Navigation Pool 8, which is between Lock and Dam 7 at Dresbach, Minnesota, and Lock and Dam 8 near Genoa, Wisconsin (Fig. 1). This reach has an aquatic area of about 8,900 ha and contains a diverse mosaic of channel and backwater habitat types (Wilcox, 1993; Table 1) in the floodplain. Navigation Pool 8 is one of six study reaches of the U.S. Geological Survey’s Long Term Resource Monitoring Program (LTRMP), and therefore several GIS layers (e.g., bathymetry, current velocity) were available.

Fig. 1

Locations of recent and historic brail, dive, and sled samples in Navigation Pool 8, Upper Mississippi River. Aggregations of 8 or more samples of a gear in a 0.5 km diameter circular area are designated with larger symbols to improve clarity. Shaded areas denote land. Horizontal lines delineate pool thirds as defined in Table 3. MN = Minnesota; WI = Wisconsin

Table 1 Descriptions of eight aquatic area types, derived from Wilcox (1993), found in the Upper Mississippi River

Study approach

Our general approach was to compile and analyze available information on mussels, which were sampled with three gears, and aquatic habitat in our study reach. We developed GIS layers for complex hydraulic variables, and then extracted georeferenced estimates of potential explanatory variables for each sampling location. We constructed gear-specific statistical models of mussel distributions (i.e., presence–absence, density), and then applied the decision rules in the statistical models to construct geospatial models predicting overall mussel distribution in our study reach. Discharge-specific models of mussel density were compared in a separate analysis to evaluate the effects of discharge on model error.

Data compilation

We assembled mussel data and associated field-collected ancillary data for Pool 8 of the Upper Mississippi River from a variety of sources including both state and federal government agencies and independent consultants (Table 2). The data represent a 28-year period (from 1975 to 2003) and were collected for various study objectives with a variety of gears including dive sampling of quadrats (0.25 m²), tows of a sled-mounted dredge (width 0.43 m; 2.5 cm bar mesh), and brail bars. Sleds were often towed long distances, especially in areas with low mussel densities. To improve spatial consistency and reduce bias towards mussel presence in sled data, we only used data from shorter sled tows that sampled less than 100 m², except that we retained all tows in which mussels were absent, regardless of sampling area, as evidence of areas that were depauperate of mussels. Brails were constructed of a row of blunt, multi-pronged hooks attached by short chains to a wood bar. The area sampled by tows of brails, which capture mussels that clamp their valves onto the brail hooks, was calculated from the bar width and estimates of tow distance along the river bottom, and ranged from 93 to 650 m².

Table 2 Sources, collection years, gear type and number of samples of mussel data used in analyses of mussel distributions in Pool 8 of the Upper Mississippi River

Sample locations were either provided by the study investigators or were estimated with ArcView software (Environmental Systems Research Institute, Redlands, California) from maps provided with the data. Geographic coordinates for dive samples were from global positioning system (GPS) coordinates provided by investigators or, in a few cases, were digitized from locations along a transect bounded by known GPS coordinates. Coordinates for sled tows were based on GPS locations reported by investigators at the start of each tow. Locations of brail samples were digitized based on estimated locations on maps that accompanied the data. Brail samples were the best distributed longitudinally, with most samples located in areas of large channels rather than off-channel areas. Most sled samples were in the lower half of Navigation Pool 8. Dive samples were concentrated in eight areas spread throughout the study reach including channel and off-channel areas, although there were few in the lacustrine impounded area. Sampling in six of these areas was along transects with quadrats purposefully taken at locations in and adjacent to known mussel beds.

Habitat variables

We developed a suite of variables (Table 3) that might influence the distribution of mussels in our study reach. For each sampling location, values of these predictor variables were derived either from the original data source or from existing and newly developed GIS coverages (Table 3). GIS coverages of aquatic area type (Table 1), pool thirds, bathymetry, and current velocity were obtained from the LTRMP. Bathymetry and current velocity models were based on data collected in 1989 and 1990. The bathymetric coverage was derived from an intensive hydrographic survey (Rogala, 1999). Current velocity coverages were based on two-dimensional, depth-averaged (RMA-2) current velocity models (U.S. Army Corps of Engineers, 1996) calculated for three river discharge levels: 2,547 m³/s, 906 m³/s, and 311 m³/s. These levels (hereafter termed Q5, Q50, and Q95) corresponded to river discharges that were exceeded 5, 50, and 95% of the time from 1960 to 1999, respectively (J. Rogala, U.S. Geological Survey, personal communication). All bathymetric and current velocity coverages were bounded by the 0-m depth contour (shoreline) at a water surface elevation of 192.5 m above sea level.
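As an illustration of how exceedance-based discharge levels such as Q5, Q50, and Q95 can be read from a discharge record, the following Python sketch ranks the flows and interpolates the value exceeded a given percentage of the time. The function name and the synthetic record are hypothetical, and the interpolation convention is an assumption; the discharge values reported above came from the 1960 to 1999 record, not from this code.

```python
def exceedance_discharge(flows, percent_exceeded):
    """Return the discharge exceeded `percent_exceeded` percent of the time.

    Linear interpolation between ranked flows is assumed here; other
    plotting-position conventions give slightly different values.
    """
    if not flows:
        raise ValueError("flows must not be empty")
    ranked = sorted(flows)  # ascending order
    # A flow exceeded p% of the time is the (100 - p)th percentile.
    pos = (100 - percent_exceeded) * (len(ranked) - 1) / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ranked) - 1)
    frac = pos - lower
    return ranked[lower] + frac * (ranked[upper] - ranked[lower])


# Hypothetical discharge record (m^3/s), for illustration only.
record = [float(v) for v in range(1, 102)]  # 1, 2, ..., 101
q5 = exceedance_discharge(record, 5)    # high flow, rarely exceeded
q50 = exceedance_discharge(record, 50)  # median flow
q95 = exceedance_discharge(record, 95)  # low flow, almost always exceeded
```

Note that the naming is inverted relative to ordinary percentiles: Q5 is the high-flow condition and Q95 the low-flow condition.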

Table 3 List of variables used in the classification and regression tree (CART) analyses to model mussel distribution in Pool 8 of the Upper Mississippi River

The GIS coverages for shear stress, Froude number, relative substrate stability and slope were computed based on existing data (Table 3) and standard formulae applied to raster data (5-m cell size) using ArcGIS software (Environmental Systems Research Institute, Redlands, California). Formulae in Statzner et al. (1988) were used to calculate Froude number and shear stress (via rearrangement of terms in the shear velocity equation). Substrate roughness, which was needed to develop shear stress coverages, was based on estimates of sediment particle size derived (Håkanson, 1986) from a sediment penetrometer survey (J. Steuer and J. Rogala, U.S. Geological Survey, unpublished data). For samples that were assigned a substrate class by field crews (75% of brail data, 82% of sled data, 99% of dive data), relative substrate stability was computed using estimated shear stress from GIS coverages and entrainment shear stress of the mean particle size, based on the predominant substrate class, using methods in Morales et al. (2006) and Wu & Wang (2002). Pool-wide GIS coverages of relative substrate stability for geospatial modeling (see below) were developed (S. Zigler, unpublished data) based on a coverage of sand and silt substrate classes from LTRMP ponar data (N = 5596). We calculated slope along the river bottom, based on the LTRMP bathymetric coverage, as the angle of change in depth over a plane fit to a 3 × 3 cell neighborhood around each 5-m processing cell in a moving window. Slope was calculated in degrees irrespective of aspect (i.e., compass direction), with possible values ranging from 0 (no change in depth) to approaching 90 degrees (nearly vertical).
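The per-cell calculations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the raster workflow used in the study: the Froude number and shear velocity expressions are the forms commonly attributed to Statzner et al. (1988), the water density and SI units are assumed, and all function names and input values are hypothetical.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
RHO_WATER = 1000.0  # assumed density of fresh water, kg/m^3


def froude_number(velocity, depth):
    """Fr = U / sqrt(g * D), with U in m/s and D in m."""
    return velocity / math.sqrt(G * depth)


def shear_velocity(velocity, depth, roughness):
    """U* = U / (5.75 * log10(12 * D / k)); k is substrate roughness in m."""
    ratio = 12.0 * depth / roughness
    if ratio <= 1.0:
        raise ValueError("roughness too large relative to depth")
    return velocity / (5.75 * math.log10(ratio))


def shear_stress(velocity, depth, roughness):
    """Boundary shear stress tau = rho * U*^2, in N/m^2."""
    return RHO_WATER * shear_velocity(velocity, depth, roughness) ** 2


def slope_degrees(dz_dx, dz_dy):
    """Slope angle of a fitted plane from its depth gradients (m per m),
    irrespective of aspect."""
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical cell: 0.5 m/s depth-averaged velocity, 2 m depth, 1 mm roughness
fr = froude_number(0.5, 2.0)
tau = shear_stress(0.5, 2.0, 0.001)
flat = slope_degrees(0.0, 0.0)
steep = slope_degrees(1.0, 0.0)
```

In a raster setting the same expressions would be applied cell by cell, with the depth gradients for slope estimated from the 3 × 3 neighborhood of each cell.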

Statistical analyses

Data were explored and analyzed with classification and regression tree (CART) models (Breiman et al., 1984; Clark & Pregibon, 1992) developed with CART software (Salford Systems, San Diego, California). Classification and regression trees have been seldom used in ecological studies, but are a powerful analytical method for descriptive and predictive modeling of complex data (Efron & Tibshirani, 1991; De’ath & Fabricius, 2000). Tree models are grown by recursive binary splitting of the data to form a complex of nodes and branches. Beginning with the root node, data are split into two mutually exclusive groups based on a simple decision rule. Decision rules are selected by computer algorithms from candidate explanatory variables with the goal of maximizing homogeneity within each group. The splitting procedure is repeated for successive groups of data (decision nodes) with the overall goal of identifying a combination of splits that form a reasonably small tree with homogeneous final groups (terminal nodes). For complex data such as ours, CART models offer several important advantages in that they are distribution-free, robust to outliers and missing data, do not require a priori selection of variables, and can accommodate threshold responses and context-dependent interactions. Further, classification and regression trees can effectively model variable relations despite significant spatial autocorrelation in data (Cablk et al., 2002).
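To make the splitting step concrete, the sketch below searches a single continuous variable for the threshold that yields the most homogeneous pair of groups. It assumes Gini impurity as the homogeneity measure, a common choice for classification trees; the splitting criterion actually configured in the CART software is not stated here, and the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity for 0/1 labels (0 = absent, 1 = present)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, parent impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate tied values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical shear stress values and mussel presence at six sites
shear = [0.1, 0.2, 0.3, 1.1, 1.4, 1.8]
present = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(shear, present)
```

A full tree repeats this search over every candidate variable at each node and then recurses into the two resulting groups.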

Classification tree models of presence or absence of mussels were constructed for each of the three gear types. We used v-fold cross validation (v = 10) to aid in the selection of the most parsimonious tree and assess error rates (Breiman et al., 1984) because estimates of prediction success (i.e., correct classification of observations) of models tend to be overly optimistic if testing is based on the same data used to construct the model. Briefly, the procedure behind v-fold cross validation is to randomly divide the data into v equal subsets and then drop out each subset in turn (test data), building a series of tree models on the remaining subsets (learn data). The omitted subsets (test data) are then used to estimate the true error rates for the overall, combined tree model. For each gear, our goal was to identify the tree model with the minimum relative, cross-validated error of the overall model (Breiman et al., 1984; Steinberg & Colla, 1997). Only models that had a cross-validated prediction success >60% (i.e., correct classification of test data) for both presence and absence were considered. To guard against overfitting, terminal nodes contained at least five observations, and maximal trees were grown and then pruned automatically via CART algorithms (Steinberg & Colla, 1997). Exploratory trees were grown with the full suite of variables. Variables were deleted stepwise from candidate models when masking effects were evident from strong competitor and surrogate splits (Breiman et al., 1984; De’ath & Fabricius, 2000). To facilitate interpretation, simpler trees were favored over more complex trees that had similar cross-validated prediction scores, and candidate trees with >10 terminal nodes were discarded as too complex.
Relative importance scores of model variables were calculated by summing the improvements in model error for each variable across all nodes for the primary and first five surrogate splits, and then scaled (range, 0–100) to the variable with the highest improvement score (Breiman et al., 1984; Steinberg & Colla, 1997).
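The v-fold cross-validation procedure described above can be sketched as follows. To keep the example short, a hypothetical one-split model (a threshold midway between the class means) stands in for a full classification tree; the fold assignment and the learn/test bookkeeping are the point of the illustration, and all names and data are assumptions.

```python
import random


def make_folds(n, v, seed=0):
    """Randomly partition indices 0..n-1 into v nearly equal subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]


def fit_stump(xs, ys):
    """Hypothetical stand-in model: one split midway between class means."""
    pres = [x for x, y in zip(xs, ys) if y == 1]
    absent = [x for x, y in zip(xs, ys) if y == 0]
    if not pres or not absent:
        majority = 1 if len(pres) >= len(absent) else 0
        return lambda x: majority
    mean_p = sum(pres) / len(pres)
    mean_a = sum(absent) / len(absent)
    threshold = (mean_p + mean_a) / 2.0
    high_class = 1 if mean_p > mean_a else 0
    return lambda x: high_class if x > threshold else 1 - high_class


def cross_validated_success(xs, ys, v=10, seed=0):
    """Proportion of observations classified correctly when held out."""
    correct = 0
    for test_idx in make_folds(len(xs), v, seed):
        held_out = set(test_idx)
        learn_x = [x for i, x in enumerate(xs) if i not in held_out]
        learn_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_stump(learn_x, learn_y)  # built on learn data only
        correct += sum(1 for i in test_idx if model(xs[i]) == ys[i])
    return correct / len(xs)


# Hypothetical data: presence occurs where the predictor is >= 2.0
xs = [i / 10.0 for i in range(40)]
ys = [1 if x >= 2.0 else 0 for x in xs]
folds = make_folds(len(xs), 10)
success = cross_validated_success(xs, ys, v=10)
```

Because every observation is tested exactly once by a model that never saw it, the resulting success rate is less optimistic than one computed on the learn data.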

A regression tree of mussel abundance in dive-quadrat samples, which were the most quantitative and spatially precise data available, was developed similarly to the presence–absence trees using the full suite of explanatory variables. Further, we evaluated the role of discharge in structuring mussel distribution and abundance by bootstrap aggregation (Breiman, 1996; Steinberg & Colla, 1997) of regression trees of mussel abundance in dive-quadrat samples. This procedure is similar to a neural net in that the underlying model structure cannot be directly displayed or understood, but the predictive results are typically more stable and accurate than the results of a single tree. For each discharge level (i.e., Q5, Q50, Q95), the explanatory variables included the discharge-specific current velocity, shear stress, relative substrate stability, and Froude number. Additionally, depth and slope were included in all models because of potential interactions with these discharge-dependent variables. Each tree was built using the least squares splitting criterion (Breiman et al., 1984; Steinberg & Colla, 1997). We aggregated the results of 100 bootstrapped trees; about 37% of the data were randomly excluded from each tree, with the data set brought to full size by resampling with replacement. A 15% random sample of the data was withheld to evaluate the performance of the aggregated trees. Results of the discharge-specific models were evaluated using the predictor mean square error (PMSE).
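The resampling behind bootstrap aggregation can be illustrated with a short Python sketch. It shows why roughly 37% of observations are left out of each bootstrap sample (the expected fraction approaches 1/e), averages the predictions of many resampled models, and computes a mean square error on withheld data. The "model" here is a deliberately trivial stand-in (the mean of each bootstrap sample) rather than a regression tree, and all names and values are hypothetical.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement, restoring the sample to full size."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag_fraction(n, n_trees=100, seed=0):
    """Average fraction of observations excluded from each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        drawn = set(bootstrap_indices(n, rng))
        total += (n - len(drawn)) / n
    return total / n_trees


def bagged_prediction(values, n_trees=100, seed=0):
    """Aggregate trivial stand-in models, each the mean of one bootstrap sample."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trees):
        sample = [values[i] for i in bootstrap_indices(len(values), rng)]
        predictions.append(sum(sample) / len(sample))
    return sum(predictions) / len(predictions)


def pmse(observed, predicted):
    """Mean square error of predictions on withheld observations."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


excluded = out_of_bag_fraction(300)
aggregate = bagged_prediction([float(v) for v in range(100)])
error = pmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Averaging over many resampled models is what stabilizes the aggregated prediction relative to any single tree.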

Geospatial models and GIS analyses

Classification tree models of mussel presence–absence for each gear, and the regression tree model of mussel abundance (dive-quadrat only), were translated into geospatial models of Navigation Pool 8 using ArcGIS. For each model, individual 5-m grid cells within aquatic areas were attributed with values from overlying GIS coverages of explanatory variables. Each cell was classified into a terminal node outcome based on the decision rules of the model; outcomes were predicted presence or absence for classification models, and predicted density based on median mussel abundance in quadrats for terminal nodes of the regression model. We overlaid the geospatial models of presence–absence for each gear to evaluate concordance.
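The translation of tree rules into a geospatial prediction can be sketched as a per-cell classification followed by an area tally. The two decision rules and their thresholds below are placeholders invented for illustration and are not the splits of any fitted model in this study; the variable names and the tiny grid are likewise hypothetical.

```python
def classify_cell(cell):
    """Apply a hypothetical two-rule tree to one grid cell.

    Thresholds are placeholders, not values from the fitted models.
    """
    if cell["shear_q5"] <= 0.5:
        return "absent"   # terminal node: very low shear stress
    if cell["slope"] > 2.0:
        return "present"  # terminal node: higher shear stress and slope
    return "absent"       # terminal node: higher shear stress, low slope


def predicted_area_ha(grid, cell_size_m=5.0):
    """Sum the area (ha) of cells assigned to each predicted outcome."""
    cell_ha = cell_size_m ** 2 / 10000.0  # a 5-m cell covers 0.0025 ha
    counts = {"present": 0, "absent": 0}
    for row in grid:
        for cell in row:
            counts[classify_cell(cell)] += 1
    return {outcome: n * cell_ha for outcome, n in counts.items()}


# Hypothetical 2 x 2 grid of cells attributed from overlying coverages
grid = [
    [{"shear_q5": 0.2, "slope": 5.0}, {"shear_q5": 1.0, "slope": 3.0}],
    [{"shear_q5": 1.0, "slope": 0.5}, {"shear_q5": 2.0, "slope": 4.0}],
]
areas = predicted_area_ha(grid)
```

Summing cell areas by terminal node in the same way gives the areal coverage of each node in the predicted map.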

Results

Sampling site distribution and hydraulic conditions

The spatial distribution of sampling sites differed among gears (Fig. 1). However, data from each gear contained observations from quiescent lentic areas to hydraulically energetic lotic areas, resulting in a wide range of hydraulic conditions (Table 3). Many of the continuous variables in our analyses were significantly correlated (Table 4). This result is not surprising because the complex hydraulic variables (e.g., shear stress, Froude number) use simple physical variables (e.g., depth, velocity) in their calculation, and because ranks of variable types measured at many locations will be similar (but not monotonic) across different discharges (e.g., Shrs_5, Shrs_95). Although collinearity of predictor variables is not problematic for CART models, it can complicate model interpretation.

Table 4 Significant (P ≤ 0.05) Pearson correlation coefficients for pairs of continuous variables used in classification and regression tree (CART) analysis of mussel data (N = 800)

Mussel abundance and distributions

The mussel database contained information on a total of 4,027 individual mussels of 26 species (Table 5). However, about 70% of the mussels belonged to four species: the threeridge (Amblema plicata), fawnsfoot (Truncilla donaciformis), threehorn wartyback (Obliquaria reflexa), and Wabash pigtoe (Fusconaia flava). Threeridge were by far the most numerically dominant and the most ubiquitous; their presence was reported at more than 300 sampling sites. The notable absence of fawnsfoot in sled samples may be partly related to gear bias because the dredge used 2.5-cm mesh size, which might allow small mussels like the fawnsfoot to pass through. Five individuals of the federally endangered Higgins eye (Lampsilis higginsii) were reported in the dive samples, but none were captured by the other gears. Several species that are considered endangered or threatened in Minnesota or Wisconsin were also reported, including the rock pocketbook (Arcidens confragosus), monkeyface (Quadrula metanevra), round pigtoe (Pleurobema sintoxia), mucket (Actinonaias ligamentina), washboard (Megalonaias nervosa), and butterfly (Ellipsaria lineolata).

Table 5 Total number of mussels reported from all data used in analyses, and the number of sites at which each species was present in brail (n = 396), sled (n = 90), and dive (n = 314) samples

Classification tree models

Overall, correct classification of the data used to build the models for presence–absence of mussels was 76% for brail data, 87% for sled data, and 84% for dive data. Cross-validated prediction success of the gear models ranged from about 71 to 76% for the overall model, and >70% for all presence and absence components (Table 6). However, the models substantially differed in their structure, containing from 5 to 9 terminal nodes (Fig. 2).

Table 6 Cross-validated prediction success of classification tree models of presence–absence of mussels in brail, sled and dive samples
Fig. 2

Classification tree models of mussel presence and absence based on brail (a), sled (b), and dive (c) data in Navigation Pool 8, Upper Mississippi River. Models are read from the top down beginning at the root node, which contains all data. At each subsequent decision node, data that satisfy the splitting rule move to the left branch and all other data move to the right branch. For splitting rules using continuous predictor variables, numbers in parentheses are the quantile of the rule value over all data for that gear. Observations in terminal nodes are classed as present (P, solid bar) or absent (A, clear bar) based on the probabilities shown in the graph below each node. Abbreviations for variables are given in Table 3

The brail model was the most distinctive of the three presence–absence models, particularly at the top of the tree. The first two splits, aquatic area type and pool thirds (Fig. 2a), were not used in any other model and operate at a coarser scale than the other explanatory variables. Nonetheless, the first two splits resulted in two relatively homogeneous terminal nodes (i.e., high probability of correct classification) that accounted for 51% of data (nodes 1 and 5, N = 203 of 396 samples). Brail samples taken in floodplain lakes, secondary channels, and tertiary channels had a lower probability of capturing mussels than other aquatic area types (node 1, Fig. 2a). The remaining aquatic area types (navigation channel, channel border) in the upper third of Pool 8 had a high probability of mussel presence (node 5; Fig. 2a). Data in the terminal nodes in the lower portion of the tree (nodes 2, 3, and 4), which pertained to samples in middle and lower thirds of Pool 8, were less homogeneous as evidenced by smaller differences in the probabilities of presence and absence within each node. These nodes were split on combinations of shear stress and relative substrate stability. However, the importance scores of these and other fine-scale variables were low relative to aquatic area type and pool thirds (Table 6). Moreover, an alternative tree structure (not shown) using only aquatic area type and pool thirds had identical first two splits and cross-validated prediction success (71%), but predicted mussel presence only in floodplain shallow aquatic and main channel border areas in the lower third of Pool 8.

The sled model was constructed of splits on relative substrate stability and shear stress at the Q5 and Q95 discharge levels (Fig. 2b). Importance scores for the top variables used in the model were more evenly distributed in contrast with the brail model (Table 6). Samples at sites with very low shear stress at Q5 discharge had low probability of mussel presence (node 1; Fig. 2b). However, sites with moderate to high shear stress values in combination with moderate to highly stable substrates had high probabilities of mussel presence (nodes 2, 4, and 5, Fig. 2b).

The tree model for presence–absence of mussels in dive samples (Fig. 2c) was the most predictive of the three classification models (Table 6). Most terminal nodes were the result of interactions between shear stress and slope (Fig. 2c). Notably, all splits on slopes indicated a positive relation between slope and mussel presence (nodes 1, 2, 6, 9; Fig. 2c). Relative substrate stability variables were also important in the model (Table 6). This was generally due to their strength as surrogate split variables although relative substrate stability at Q50 was used to form terminal nodes 4 and 5 (Fig. 2c), which indicated an association of stable substrates with mussel presence.

Regression tree models

The regression model of mussel density in dive samples (Fig. 3a), which used data from all discharge levels, explained 51% of the deviance in the data. The model was based on interactions between depth (relative importance score 100), slope (relative importance score 36), Froude number (relative importance score 51), and various measures of relative substrate stability (relative importance scores, 51–70). Shear stress at Q5 discharge (relative importance score 44) was also important in forming the model, but did not appear as a primary split variable. Notably, all discharge-dependent variables appearing in the model were calculated at Q95 and Q5 discharges, but not Q50 discharge. Similar to the presence–absence model, higher slopes were associated with higher densities (nodes 6, 7, 9; Fig. 3a and b). The largest number of sites (N = 133) were contained in node 8 (Fig. 3a) that had a combination of less stable substrates at low discharge and low slope, and one of the lowest mussel densities (Fig. 3b). Bootstrap aggregation of the dive data indicated that variables measured at Q95 discharge were about 25% more predictive (PMSE = 14.8) than variables measured at Q50 discharge (PMSE = 20.4); Q5 discharge variables were intermediate (PMSE = 17.1).

Fig. 3

Regression tree model of mussel abundance in 0.25-m² quadrats sampled by diving in Navigation Pool 8, Upper Mississippi River (a). The model is read from the top down beginning at the root node, which contains all data. At each subsequent decision node, data that satisfy the splitting rule move to the left branch and all other data move to the right branch. For splitting rules using continuous predictor variables, numbers in parentheses are the quantile of the rule value over all quadrat data. Box and whisker plots for each terminal node of the model (b). Abbreviations for variables are given in Table 3

Geospatial models

The geospatial models of mussel presence–absence in the three gears substantially differed in the amount of area predicted to have mussels present, especially in the lower impounded area. The dive model (Fig. 4a) was the most restrictive and predicted mussel presence at only 19% (149 ha) of the aquatic area in Pool 8. The order of importance for dive model terminal nodes (Fig. 2c) based on areal coverage were nodes 3 (108 ha), 2 (24 ha), and 7 (7 ha) for mussel presence, and nodes 1 (523 ha), 6 (46 ha), and 8 (30 ha) for mussel absence. Generally, areas predicted to have mussels present in the dive model were along channel borders, especially near planform features that added complexity such as channel bends and islands, and well-connected areas in backwaters especially channels in floodplain shallow aquatic areas. The navigation channel, poorly connected floodplain lakes, and the impounded area were typically classified as areas without mussels.

Fig. 4

Geospatial models of the presence and absence of mussels in Navigation Pool 8, Upper Mississippi River based on classification tree models for dive (a), sled (b), and brail (c) gear types

The geospatial model derived from the sled tree model predicted that the total area with (411 ha) and without (381 ha) mussels was similar (Fig. 4b). The order of importance of terminal nodes (Fig. 2b) based on total area was node 2 (291 ha), 5 (54 ha), and 4 (35 ha) for mussel presence, and nodes 1 (240 ha), 3 (129 ha), and 6 (41 ha) for absence. Similar to the dive model, complex areas along channel borders and channels in backwaters were predicted to have mussels; poorly connected floodplain lakes and most of the navigation channel were classified as areas without mussels. In contrast to the dive model, a substantial portion of the impounded area was classified as having mussels present.

Because the classification tree model for mussel presence–absence sampled by brail relied on broad categorizations of aquatic areas (i.e., pool thirds, aquatic area types) the spatial resolution of the geospatial model was coarser than the dive or sled models (Fig. 4c). Nonetheless, there were some similarities with the other two presence–absence models. These included the classification of floodplain lakes as areas without mussels, and the classification of main channel border areas as areas with mussels. The navigation channel in the lower two thirds of Pool 8 was typically classified as not having mussels.

The geospatial model based on mussel abundance in dive data (Fig. 5) showed a similar pattern to the dive presence–absence model. Areas predicted to have high densities of mussels (>48/m²) constituted only 0.7% (56 ha) of the total aquatic area in Pool 8 and were mostly located in small flowing channels in the floodplain shallow aquatic area, and an inundated channel in the impounded area near the west bank (Fig. 5). The geospatial model predicted the lowest densities (0–11/m²) in the navigation channel, poorly connected backwater areas such as floodplain lakes, and a large portion of the impounded area (82% of the total area; 6,299 ha).

Fig. 5

Geospatial model of mussel density based on the regression tree model of abundance in 0.25-m² quadrats sampled by diving. Mussel densities, which were estimated by terminal nodes in the regression tree, were grouped into low (≤12/m²), medium (13–48/m²), and high (>48/m²) classes for clarity. Arrows indicate regions predicted to have high mussel densities in the abandoned, historic channel in the impounded area (main map), and in the channelized portions of the floodplain shallow aquatic area (inset). Some data were omitted from the uppermost portion of the reach due to lack of data for several predictor variables

Discussion

Our models provided substantial information on mussel distributions in a large floodplain river and demonstrated the utility of exploratory models for leveraging existing data. The overall success of the CART models (70–76% for presence–absence, 51% of deviance for abundance) was surprising given gear biases (see below), limitations of the data, and the large spatial extent of the study reach. Although differences in methods, scale, and resolution in past studies preclude meaningful comparisons of success rate for the few studies with predictive models, our results suggest that hydrophysical conditions in large rivers constrain available habitat for mussels much as they do in smaller systems (e.g., Strayer, 1999; Hardison & Layzer, 2001; Howard & Cuffey, 2003; Stone et al., 2004).

Factors influencing mussel distributions

Past studies have provided abundant evidence that small-scale factors acting near the sediment-water interface, particularly shear stress and substrate stability, are important for structuring mussel distributions in streams and small rivers (Layzer & Madison, 1995; Strayer, 1999; Hardison & Layzer, 2001; Howard & Cuffey, 2003). In general, our exploratory CART models support the premise that mussel distributions can be predicted by shear stress and substrate stability in large rivers, and that these variables are more predictive than simple physical variables (e.g., current velocity, substrate type). Although many past studies reported significant linear relations between mussel abundance and complex hydraulic variables, our models predict that interactions between these and other variables may be important. For example, interactions between complex hydraulic variables and slope were important in the models derived from the dive data.

Slope might influence distributions and abundances of mussels in different ways depending on scale and aspect relative to flow. At a watershed scale, a negative association between slope and mussel abundance in streams has been attributed to hydrologic regimes and erosional–depositional characteristics of streams (Arbuckle & Downing, 2002). However, a positive association between mussel distribution and longitudinal slope of a river channel has been observed for Margaritifera margaritifera, presumably because it represented overall, reach-scale hydraulic and substrate conditions (Hastie et al., 2004). In our study, which measured slope irrespective of aspect to flow at a much smaller scale than previous studies, slope had a positive effect on mussel presence and abundance. The resolution of our slope coverage was insufficient to map small-scale bedforms such as sand dunes; instead, it captured gross planform features. Substantial bottom slope at this scale potentially represents zones of rapidly changing hydraulic conditions at the sediment-water interface.

We were unable to directly evaluate the effect of aspect relative to flow on mussel distribution due to the lack of directional flow data. Moreover, the presence–absence model for dive samples, which associated strong probabilities of mussel presence with high slopes (terminal nodes 2 and 9, Fig. 2c), resulted in areas classed as presence in the geospatial model that appeared to be both parallel and perpendicular to the likely current direction in channels. Slope that is parallel to the flow direction could function quite differently than slope of features perpendicular to flow such as islands, sand bars, bedform depressions, or scour holes. Slope features that are perpendicular to the flow present a range of conditions from upstream-facing or stoss slopes with accelerated flow and higher shear stress areas, to downstream-facing or lee slopes that may be characterized by flow separation, low shear stress, and turbulent, low-velocity eddies (McLean & Smith, 1986; Maddux et al., 2003). Association with slope parallel to flow along channel margins might simply reflect a refuge from unstable substrates that occur in the dune fields common to the faster flowing center of the channel. Other studies have observed higher mussel presence in channel borders and attributed the lack of mussels near the center of the channel to high hydraulic energy, shear stress, and unstable substrates (Brim Box et al., 2002; Howard & Cuffey, 2003).

Most studies, including ours, calculated shear stress using estimates of substrate roughness from particle size data. However, bedform features may have a substantial influence on shear stress and thus substrate stability (Maddux et al., 2003; Kostaschuk et al., 2004). For areas such as the main channel, the lee faces and troughs of sand dunes would provide areas of low shear stress (McLean & Smith, 1986; Maddux et al., 2003), but the effects are presumably ephemeral at any given location because of substrate mobility (i.e., movement of sand dunes) and changes in bedform with discharge. Our estimates of relative substrate stability and Froude number indicate substantial movement of sand dunes is likely in many areas of the navigation channel even at low discharge. Moreover, main channel areas such as the portion of the navigation channel in the middle third of Pool 8, which was the least sinuous and predicted to be without mussels in all three presence–absence models, would have mobile substrates even at very low discharges as indicated by relative substrate stability ratios >1. Conversely, main channel borders, especially areas associated with significant geomorphic complexity (e.g., river bends, islands, side channel entrances), were often identified as areas with mussels present or abundant in the geospatial models. Such areas might be especially important in the UMR and other similar systems because channel training structures (e.g., wing dikes) and dredging for navigation have homogenized depths and accelerated flows in the main channel, which may have contained areas of suitable habitat including stable substrates before the system was altered by humans.
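
As a rough illustration of how these hydraulic variables relate, the sketch below computes Froude number, bed shear stress from a logarithmic velocity law, and a relative substrate stability ratio. It assumes that roughness height equals a representative particle size, that critical shear stress follows a Shields criterion with a dimensionless value of 0.045, and that the ratio is bed shear stress divided by critical shear stress, so that values >1 indicate mobile substrate. These formulas, constants, and input values are assumptions chosen for illustration and may differ from those used to build the GIS coverages in this study.

```python
import math

G = 9.81                # gravitational acceleration, m/s^2
RHO_WATER = 1000.0      # water density, kg/m^3
RHO_SEDIMENT = 2650.0   # assumed quartz sediment density, kg/m^3


def froude_number(velocity, depth):
    """Froude number from mean velocity (m/s) and depth (m)."""
    return velocity / math.sqrt(G * depth)


def bed_shear_stress(velocity, depth, roughness):
    """Bed shear stress (Pa) from one common form of the log velocity law.

    Assumed relation: U / u* = 5.75 * log10(12 * depth / roughness).
    """
    shear_velocity = velocity / (5.75 * math.log10(12.0 * depth / roughness))
    return RHO_WATER * shear_velocity ** 2


def critical_shear_stress(particle_size, shields=0.045):
    """Critical shear stress (Pa) for a particle size (m), assumed Shields value."""
    return shields * (RHO_SEDIMENT - RHO_WATER) * G * particle_size


def relative_substrate_stability(velocity, depth, particle_size):
    """Ratio of bed shear stress to critical shear stress (assumed definition).

    Roughness height is assumed equal to the particle size.
    """
    stress = bed_shear_stress(velocity, depth, particle_size)
    return stress / critical_shear_stress(particle_size)


# Hypothetical main channel cell: 0.8 m/s, 4 m deep, 0.5-mm sand.
fr = froude_number(0.8, 4.0)
rss = relative_substrate_stability(0.8, 4.0, 0.0005)
substrate_is_mobile = rss > 1.0
```

Under these assumptions the hypothetical cell has a ratio above 1 and would be classed as mobile. Because roughness here comes from particle size alone, the sketch shares the limitation noted above: it ignores the form roughness of bedforms such as dunes.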

Several other similarities among geospatial models were apparent despite the substantial differences in the independent data sets that were used in their development. Each of the models generally predicted no or few mussels in backwater areas such as floodplain lakes and poorly connected, low-flow portions of the floodplain shallow aquatic area. Although some of these areas may contain lentic mussel species, populations may be sparse. Such areas could be less conducive to mussels because many are subject to anthropogenic sedimentation (McHenry et al., 1984), episodes of winter hypoxia (Bodensteiner & Lewis, 1992; Gent et al., 1995), high sediment ammonia (Frazier et al., 1996), or freezing to the bottom during winter in shallow areas (Gent et al., 1995).

The models showed substantial differences in the prediction of mussel occurrence in the impounded area except that the inundated channel near the west shore was important for mussels in all geospatial models we developed. That portion of the impounded area is unique because it was the main channel prior to impoundment and still retains some characteristics of a channel including substantial depth, and moderate current velocities and shear stress. However, a much larger portion of the impounded area was predicted to have mussels present in the sled model than the dive presence–absence model. This result was partly due to a number of factors including sampling distributions and methodology that may have biased the model to an unknown extent. Sled samples, which were concentrated in the impounded area (Fig. 1), had a high proportion of samples with mussels present, but also sampled a large area (typically <100 m2) despite our discard of samples from sleds towed over longer distances. Shorter sled tows would better match the scale of the microhabitat data and might have improved our model. Future efforts focused on mussel distributions in the impounded reach, which might be the area most affected by navigation dams, in comparison to the upper reaches of the Pool that substantially retain many characteristics of the pre-impoundment river would be beneficial.

The brail data were unusual in several respects. All brail samples were taken nearly 30 years ago, in a substantially different time period. Thus, mussel populations existing at that time were subjected to somewhat different hydrologic and hydraulic conditions than populations sampled by more recent studies, which could have affected distributions (see below). The time difference also may have contributed to errors in the explanatory variables because our GIS coverages were developed from more recent data. Moreover, brails have rarely been used to sample mussels in more recent years, partly because they are inherently qualitative with poorly understood biases compared to more quantitative sampling methods such as dive-quadrat sampling. Our estimates of physical conditions at brail sample locations were also less accurate than for the dive and sled samples because positions and tow lengths were estimated from archived study maps rather than documented with GPS technology. Consequently, the model greatly depends on broad-scale categorical variables such as aquatic area type and pool thirds that only indirectly represent other physical or biological variables that act on mussel populations. Nonetheless, the brail data were more longitudinally complete than our other data sets, and the brail model corroborates aspects of the sled and dive models, which indicate the relative importance of channel border and floodplain shallow aquatic areas and the relative unimportance of poorly connected floodplain lakes and unstable areas of the main channel.

Role of discharge

Spatial differences in river geomorphology result in variation in flow patterns among discharge levels, and some variation in relative values of hydraulic variables. However, significant relations of hydraulic variables among discharge levels are common (e.g., Layzer & Madison, 1995; this study), and meaningful models of mussel distributions might be built for a given system using information from any consistent discharge level. Nonetheless, our bootstrap analyses of discharge indicated that conditions during high (Q5) and low (Q95) discharge were more predictive of mussel abundance than conditions at average (Q50) flows. This result suggests that rare, episodic events such as floods and droughts were important in structuring the distribution and abundances of mussels.
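
The discharge levels referred to here follow the exceedance convention, in which Q5 is the discharge equaled or exceeded 5% of the time (high flow) and Q95 is the discharge equaled or exceeded 95% of the time (low flow). A minimal sketch of that calculation follows, assuming a simple record of daily discharges and linear interpolation between ranked values. The function name and the synthetic record are hypothetical.

```python
import math


def exceedance_discharge(flows, percent_exceeded):
    """Discharge equaled or exceeded `percent_exceeded` percent of the time.

    Uses linear interpolation between ranked values, so Q5 corresponds to
    the 95th percentile of the record and Q95 to the 5th percentile.
    """
    if not flows:
        raise ValueError("flows must not be empty")
    if not 0 <= percent_exceeded <= 100:
        raise ValueError("percent_exceeded must be between 0 and 100")
    ordered = sorted(flows)
    rank = (100 - percent_exceeded) * (len(ordered) - 1) / 100.0
    lower = int(math.floor(rank))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Synthetic record of 101 daily discharges (m3/s), for illustration only.
daily_flows = [float(q) for q in range(1, 102)]
q5 = exceedance_discharge(daily_flows, 5)     # high flow
q50 = exceedance_discharge(daily_flows, 50)   # median flow
q95 = exceedance_discharge(daily_flows, 95)   # low flow
```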

Our models are consistent with other studies that associated higher mussel abundances with flow refuges constituted by areas of low shear stress or stable substrates during floods (Strayer, 1999; Howard & Cuffey, 2003). Such areas might partly explain the characteristic patchiness of mussel populations observed in many systems, and might be self-reinforcing to a degree if dense beds of adult mussels function to further stabilize sediments (Strayer et al., 2004; Vaughn & Spooner, 2006). However, low discharge models were the most predictive in our study, indicating that effects on juvenile settlement or displacement might be important, which has been suggested in both small streams (Layzer & Madison, 1995; Hardison & Layzer, 2001) and in another reach of the Mississippi River (Morales et al., 2006). Hardison & Layzer (2001) hypothesized that displacement of adults during floods and restrictions on juvenile settlement at lower shear stress might both play a role in determining mussel distributions, but at different spatial and temporal scales. We concur and further speculate that hydrologic history for highly variable systems like the Mississippi River should be expressed spatially in patterns of recruitment and population demographics of mussels. Studies linking hydrologic history, hydraulic models, and spatial characteristics of population demography are needed to clarify the roles that discharge and anthropogenic modifications to hydrology have on mussel populations, and could also help resolve questions regarding the relative importance of positive, negative, and self-reinforcement mechanisms that might affect formation and persistence of mussel beds.

The role of rare events, including both droughts and floods, in structuring populations of mussels is reasonable because mussels are long-lived and relatively sessile. Moreover, mussels in Pool 8 have experienced substantially more severe hydrologic events in the last few decades than were modeled in our data. For example, discharge in Pool 8 during a drought in summer 1989 was only 187 m3/s, about 60% of our lowest modeled discharge (311 m3/s), and was 6,370 m3/s during a spring flood in 2001, about 250% of our highest modeled discharge (2,547 m3/s). Development of hydraulic data for more extreme discharges might lead to further improvements in the predictive success of models when coupled with appropriate mussel abundance data.
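
The percentages in this comparison follow directly from the ratio of observed to modeled discharge, as the short check below shows. The function name is hypothetical; only the four discharge values come from the text.

```python
def percent_of_modeled(observed, modeled):
    """Observed discharge expressed as a percentage of a modeled discharge."""
    return 100.0 * observed / modeled


# Discharges in m3/s, taken from the paragraph above.
drought_percent = percent_of_modeled(187.0, 311.0)    # 1989 drought vs lowest modeled
flood_percent = percent_of_modeled(6370.0, 2547.0)    # 2001 flood vs highest modeled
```

Both values round to the figures quoted, about 60% and 250%.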

Model constraints and conclusions

The approach used in this study allowed us to develop useful statistical and geospatial models of mussel distributions. Although the combination of historical mussel data and ancillary geospatial data layers available for our study reach is unusual, exploratory modeling methods such as CART are underused in ecological studies (De’ath & Fabricius, 2000) and can provide substantial insights even when data are relatively limited (Breiman et al., 1984). To our knowledge, statistical models have not been extended to geospatial models of mussel habitat in previous studies, and doing so provides a first step in understanding overall patterns of mussel distributions in the Upper Mississippi River.

The substantially different sampling distributions and methods of the various mussel data sets we used to develop models undoubtedly introduced biases, but the similarities and differences among independent models were especially revealing. The application of CART models to geographical space also presents unknown spatial errors that may differ among terminal nodes because of the differences in the number and type of GIS layers used to map individual nodes. Moreover, the crude estimation of some variables for GIS layers undoubtedly resulted in additional error. Despite limitations due to error, the application of such CART models to a geospatial context results in spatial models that are highly amenable to future validation, refinement, and hypothesis testing in the field (Urban et al., 2002). Assessment of prediction errors for CART-derived geospatial models using independent data across a range of conditions in other river reaches would be especially helpful for understanding model limitations.

Further development of information on factors that influence mussel distributions in large rivers is crucial because resource managers are frequently faced with decisions affecting at-risk populations in the absence of data. We did not attempt to develop species-specific models because our goal was to better define the broad envelope of conditions that allow mussels to persist in large rivers. Some species differences undoubtedly exist, but many studies have shown that mussel beds are often diverse in the Mississippi River and most species co-occur to some extent (Holland-Bartels, 1990). We believe that development and modeling using our approach for functional guilds or species of particular interest such as the endangered Higgins eye pearly mussel might be beneficial.