Introduction

Production wells are often the primary source of temporal and spatial water quality and hydraulic head data. Such data are used to evaluate the origin, mixing patterns, stratification, and movement of groundwater; to describe the location, geometry, and migration of groundwater contamination; and to calibrate and verify groundwater flow and solute transport models. Because confidence in water quality data is critical, numerous purging and sampling protocols have been developed to help ensure representative water quality data, and considerable attention has been paid to monitoring well design.

Well purging is often needed because stagnant casing storage may not be representative of aquifer water quality. Non-representative water may result from chemically and microbiologically induced changes in borehole water quality, internal ambient well-bore flow and mixing, the effects of pumping rates and time, and the pump location (Barber and Davis 1987; Martin-Hayden 2000a, b; Martin-Hayden and Wolfe 2000). If the aquifer is chemically homogeneous, well purging should result in a representative water quality sample. Sampling protocols include pumping until field parameters stabilize, evacuation of three or more well-bore volumes, low-flow purging, and calculations of purging times and volumes based on theoretical considerations (Barcelona et al. 1994; Barber and Davis 1987; Boylan 2004; Capel et al. 2002; Gibs et al. 1990; Hardy et al. 1989; Knobel 2006; Robbins et al. 2005; Varljen et al. 2006).

Some aquifers have vertical and/or spatial chemical heterogeneity; thus, attention has been paid to monitoring well design, including long- vs. short-screen monitoring wells and discrete and multiport sampling devices (Britt and Tunks 2003; Einarson and Cherry 2002; Gibs et al. 1993). Long well screens (i.e., > ∼2–3 m) can bias the sample by diluting water drawn from contaminated horizon(s) with water from non-contaminated horizons. Short-screen wells and discrete sampling may bias the sample by missing either contaminated or non-contaminated horizons.

Internal and external factors can bias water-quality samples obtained from heterogeneous aquifers. Well bores can induce cross-aquifer contamination in multilayered aquifers (Church and Granato 1996; Henrich 1998; Meiri 1989; Santi et al. 2006; Sloto 1996; Sloto et al. 1992). Cross-contamination may occur with small hydraulic head or thermal differentials under ambient conditions (Elci et al. 2001, 2003; Reilly et al. 1989). Temporal solute variability may also result from physical and chemical heterogeneity in the aquifer and from skin effects (Church and Granato 1996; Reilly and LeBlanc 1998).

Although many processes that can bias well water quality have been investigated, previous work has focused on: (1) theoretical and modeling approaches (Barber and Davis 1987; Elci et al. 2001, 2003; Lacombe et al. 1995; Reilly et al. 1989), and (2) laboratory and field experiments generally using low pumping rates (<5 L/s), short time intervals (<1 day), and low solute concentrations (<100 mg/L; Church and Granato 1996; Hutchins and Acree 2000; Martin-Hayden 2000b; Reilly and LeBlanc 1998). Low pumping rates and short pumping intervals mean the travel distance for water drawn into the well bore is small and only a limited areal extent of the aquifer is involved. Additionally, low solute concentrations can make it difficult to chemically distinguish water-bearing horizons, to fully evaluate well skin effects, and to identify cross-aquifer contamination.

Short well screens and multiport sampling designs are generally preferred over long well screens for water quality sampling and water level measurements. However, long-screened production wells with high pumping rates, which may sample chemically heterogeneous aquifers, are often the only data source. Despite the numerous investigations of water quality bias associated with monitoring wells, the potential for water quality bias due to long-term pumping of high output long-screened production wells is poorly understood.

Because long-screened production wells commonly draw water from heterogeneous aquifers, a critical question is: what do water quality data from long-screened production wells in heterogeneous aquifers tell us about the aquifer? To examine this question, two unconfined aquifer production wells, which are located in the San Luis Valley, Colorado (Fig. 1) and which exhibit large temporal TDS variability, have been investigated.

Fig. 1

Location of the ancestral sump in the San Luis Valley, Colorado, USA (after Mayo et al. 2007). The area north of the groundwater divide is known as the Closed Basin due to internal drainage. Black dots represent the locations of SW and EW wells. Boundaries of the valley are approximate

Geologic and hydrologic conditions

The San Luis Valley, a major agricultural area located in south-central Colorado (Fig. 1), contains approximately 2.5 × 10¹² m³ of groundwater within 1,800 m of the land surface (Romero and Fawcett 1978). The 95 km wide and 170 km long valley was a closed basin from about 4.5 Ma until a few hundred thousand years ago, when the ancestral Rio Grande overflowed the basin and cut a channel through the San Luis Hills in the southern portion of the valley (M.N. Machette, US Geological Survey, Denver, personal communication, 2004). The northern portion of the valley, known as the Closed Basin, remained a region of internal drainage. Mayo et al. (2007) designated the central portion of the Closed Basin as the ancestral sump due to methane (CH₄) evolving, high TDS groundwater in the unconfined and upper portion of the confined aquifer.

In the Closed Basin, the ∼30 m thick unconfined aquifer occurs in the upper part of the Pliocene-Pleistocene Alamosa Formation (Mayo et al. 2007). The Alamosa Formation, which consists of a series of discontinuous lakebed clays and other interbeds that are up to several hundred meters thick, also supports underlying confined groundwater systems (Emery et al. 1973; Hanna and Harmon 1989; Huntley 1976; Powell 1958; Romero and Fawcett 1978). Interbeds in the ancestral sump area include well-sorted fluvial deposits, fine-grained lake sediments, organic sediments, and evaporite minerals deposited in ancestral Lake Sipapu (Mayo et al. 2007).

In the ancestral sump area, the US Bureau of Reclamation constructed 170 long-screened (15–30 m of screen) production wells and 35 monitoring wells, known as SW and EW wells, respectively (Fig. 1). The SW wells have a casing diameter of 0.28 m, a mean screen length of 16.6 m, and a mean well depth of 29.9 m. EW wells have a well casing diameter of 0.1 m, a mean screen length of 4.5 m, and a mean depth of 39.8 m.

Using SW and EW data, maximum TDS concentrations in the ancestral sump are contoured on Fig. 2. The waters have a maximum concentration of more than 44,000 mg/L and evolve from Ca²⁺–HCO₃⁻ type water outside the ancestral sump (mean TDS 247 mg/L) to sump area Na⁺–HCO₃⁻–SO₄²⁻–Cl⁻ rich water (mean TDS 2,619 mg/L). The chemical evolution of the waters is described by Mayo et al. (2007).

Fig. 2

Maximum TDS contours of the unconfined aquifers in the ancestral sump area. Note variable-scale contour intervals

Methods of investigation

Thirty-five SW wells, with a maximum TDS greater than 300 mg/L, have exhibited TDS variability greater than 25% in response to pumping stress. In some wells, the TDS increases during pumping and in other wells the TDS decreases during pumping. Absolute differences between minimum and maximum TDS for individual wells range from 52 to 14,394 mg/L. Wells exhibiting TDS variations are located along a linear trend (Fig. 3), which corresponds to the locations of playa and organic-rich environments in ancestral Lake Sipapu (Mayo et al. 2007). The combined effects of chemical stratification, well-bore mixing due to hydraulic-head-driven ambient well-bore flow, and differential well-bore inflows during pumping stress were suspected as the cause of temporal TDS variability.

Fig. 3

Contour map of maximum temporal TDS variations in unconfined aquifer wells. Locations of SW-67 and SW-89 are shown. Contour intervals are the difference between maximum and minimum values (maximum-minimum) of TDS in mg/L

Two production wells, SW-67 and SW-89 (Fig. 1), were selected for study because they exhibit large temporal water-quality changes in response to pumping stress (Fig. 4). Low TDS in SW-67 corresponds to pumping periods, whereas low TDS in SW-89 corresponds to non-pumping periods (Fig. 5). After periods of pumping, concentrations of Na⁺, HCO₃⁻, SO₄²⁻, and Cl⁻ decreased in SW-67, whereas concentrations of Na⁺, Ca²⁺, SO₄²⁻, Cl⁻, and HCO₃⁻ increased in SW-89 (Fig. 4). The recorded TDS range in SW-67 is 1,030–15,427 mg/L and the recorded TDS range in SW-89 is 123–3,250 mg/L.

Fig. 4

Variations in solute chemistry of (a) SW-67 and (b) SW-89 discharge during a 10-year period. Pumping history prior to sampling is shown in the bar graph above the data plot

Fig. 5

TDS vs. pumping rate for SW-67 and SW-89. Periods of no pumping and pumping ranged from 5 to 334 days prior to sampling. Periods of no pumping are represented as 0 L/s discharge

SW-67 and SW-89 are located about 10.5 km apart (Fig. 1). The unconfined aquifer lithology is not continuous between the two wells. SW-67 lithology is dominated by finer-grained sediments, whereas coarser-grained sands are more common at SW-89 (Fig. 6a,b). In both wells, numerous thin clay and interbedded clay and sand horizons are encountered. Lithologic logs of wells located between SW-67 and SW-89 suggest that horizons commonly pinch out over distances of a kilometer or more.

Fig. 6

(a) Pre- and post-test water quality in SW-67 and observation wells during 8.5-day water-quality test. Upper Stiff diagrams represent pre-test compositions and lower Stiff diagrams represent post-test compositions. Diagrams on the left are for the B-series wells and diagrams on the right are for the A-series wells. (b) Pre- and post-test water quality in SW-89 and observation wells during 23-day water-quality test. Upper Stiff diagrams represent pre-test compositions and lower Stiff diagrams represent post-test compositions. Diagrams on the left are for the B-series wells and diagrams on the right are for the A-series wells

Using SW and deep boring lithologic logs, water levels, and geophysical logs, several water-bearing horizons were identified at SW-67 and SW-89. At each well, a series of nested monitoring wells was constructed in three distinct aquifer horizons traditionally considered and legally defined as part of the unconfined aquifer. At SW-67, a monitoring well was also completed in the upper part of the underlying confined aquifer. Seven monitoring wells were constructed near SW-67 and six were constructed near SW-89 (Table 1).

Table 1 Pre-test monitoring well data at SW-67 and SW-89. Water bearing horizons are local designations only and do not correlate between SW-67 and SW-89

At SW-67, a combined 8.5-day time-drawdown pumping and water quality test, using observation well responses, was performed. During the test, the pumping rate declined from 16.7 to 11 L/s. At SW-89 a 23-day water quality test was performed with a pumping rate that declined from 12.7 to 9.1 L/s. Pumping rates declined in response to falling water levels in the pumping wells (Fig. 7). Because the wells contained permanently installed electric pumps, the impeller speed could not be increased to compensate for the declining pumping rates. Water levels were measured using an electrical sounder. Discharge water was conveyed 300 m from the pumping wells via pipeline and discharged into small ponds.

Fig. 7

Plot of semi-log time-drawdown vs. discharge at SW-67 during 8.5-day aquifer test

Solute, gas, and isotopic samples were collected prior to and at the end of each test. Pumping well water samples were collected from permanently attached sampling faucets. Each SW well has such a faucet because production wells are sampled at least twice a year. Prior to and at the end of the water quality tests, samples were collected from the monitoring wells using a low volume pump. Samples were collected after at least 3 well-bore volumes had been removed and the field parameters pH, temperature and conductivity stabilized. The purging protocol is considered adequate because the monitoring wells were designed to sample discrete water bearing horizons.
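The three-well-bore-volume criterion can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in this study: the function name, the 0.1 m casing diameter (borrowed from the EW wells as a stand-in, because the diameter of the nested monitoring wells is not given here), and the 5 m water column are assumptions chosen only for illustration.

```python
import math


def purge_volume_litres(casing_diameter_m, water_column_m, n_volumes=3):
    """Volume (L) to remove before sampling, expressed as n well-bore volumes.

    The well bore is treated as a cylinder of standing water; storage in the
    filter pack is ignored (a simplifying assumption).
    """
    radius_m = casing_diameter_m / 2.0
    one_volume_m3 = math.pi * radius_m ** 2 * water_column_m
    return n_volumes * one_volume_m3 * 1000.0  # 1 m^3 = 1000 L


# Hypothetical monitoring well: 0.1 m casing, 5 m of standing water
print(round(purge_volume_litres(0.1, 5.0), 1))
```

For these assumed dimensions the purge volume is roughly 118 L, which a low volume pump can remove in a modest amount of time; field-parameter stabilization remains a separate check.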

In addition to field parameters, samples were collected for major ion and isotopic analysis (Table 2). Isotopic analysis included δ²H and δ¹⁸O, δ¹³C, and δ³⁴S. A wide range of solutes and isotopes were collected because it was uncertain which parameters would be most useful. Major ions and stable isotopes were collected to help distinguish geochemical horizons, ambient well-bore mixing, and aquifer cross-contamination. The stable isotopes δ²H, δ¹⁸O, δ¹³C and δ³⁴S were collected because they sometimes provide insight into geochemical horizons independent of water–rock interactions that affect solute compositions. Major ion charge balance errors for SW-67 data are <3% and errors for SW-89 data are typically <5%. Because many confined and some unconfined aquifer waters exsolve gas, the gas was analyzed for C₁–C₅ (i.e., methane, ethane, propane, butane, pentane), CO₂, N₂, O₂, Ar, He, and H₂ content (Table 3). δ¹³C and δ²H were determined on CH₄, and δ¹³C was determined on CO₂ to help determine the origin of the gas.
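A charge balance error of the kind quoted above is conventionally computed from the sums of cations and anions expressed in meq/L. The sketch below uses the common definition 100 × (Σcations − Σanions) / (Σcations + Σanions); whether the laboratory used exactly this form is an assumption, and the sample concentrations are hypothetical rather than taken from Table 2.

```python
# Equivalent weights (mg per meq): molar mass divided by ionic charge.
EQUIVALENT_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.31 / 2,  # cations
    "HCO3": 61.02, "SO4": 96.06 / 2, "Cl": 35.45,               # anions
}
CATIONS = {"Na", "K", "Ca", "Mg"}


def charge_balance_error(sample_mg_per_l):
    """Return the charge balance error (%) for a dict of ion: mg/L values."""
    cations = anions = 0.0
    for ion, mg_per_l in sample_mg_per_l.items():
        meq_per_l = mg_per_l / EQUIVALENT_WEIGHT[ion]
        if ion in CATIONS:
            cations += meq_per_l
        else:
            anions += meq_per_l
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample, not a value from Table 2
sample = {"Na": 460.0, "Ca": 40.0, "HCO3": 900.0, "SO4": 240.0, "Cl": 71.0}
print(f"{charge_balance_error(sample):+.1f} %")
```

A positive result indicates an excess of cations and a negative result an excess of anions.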

Table 2 Isotopic and solute chemical results of SW-67 (8.5 day) and SW-89 (23 day) water quality test
Table 3 Analysis of evolving gas collected at end of SW-67 water quality test

Results

SW-67 and SW-89 water-quality tests

Water quality tests, which included observation-well-water-quality sampling, were conducted at SW-67 and SW-89 in an attempt to better understand the relationship between TDS variability, pumping stress, and aquifer heterogeneity. SW-67 was pumped for 8.5 days and SW-89 was pumped for 23 days. Prior to the tests, SW-67 and SW-89 were not pumped for 123 and 71 days, respectively. Beginning and end-of-test solute compositions are illustrated as Stiff diagrams in Fig. 6a,b. During the tests, SW-67 TDS declined from 10,600 to 3,530 mg/L and the TDS in SW-89 increased from 136 to 2,282 mg/L (Table 2). Water quality stabilization in SW-89 was not achieved until ∼15 days of pumping (Fig. 8). Similar temporal water quality data are not available for the 8.5 day test at SW-67.

Fig. 8

Plot of TDS vs. time during SW-89 23-day water quality test. About 15 days were required to stabilize water quality

At each pumping well, the water-bearing horizons are chemically stratified. Three distinct water types were identified: low to moderate TDS Na⁺–HCO₃⁻ type water, elevated TDS Na⁺–HCO₃⁻–SO₄²⁻ type water, and elevated TDS Na⁺–Cl⁻–Ca²⁺–SO₄²⁻ type water. Na⁺–HCO₃⁻ type water occurs in horizons 1, 3, and 4 at SW-67 and in horizons 1 and 3 at SW-89. Na⁺–HCO₃⁻–SO₄²⁻ type water occurs in horizon 2 at SW-67, and Na⁺–Cl⁻–Ca²⁺–SO₄²⁻ type water occurs in horizon 2 at SW-89 (Fig. 6a,b).

Only the upper and lower horizons at each pumping well maintained stable chemical compositions during the tests. Changes in chemical compositions in horizons 2 and 3 at SW-67 and horizon 2 at SW-89 resulted either from pumping-induced vertical leakage from an adjacent horizon or from the removal of water that invaded the horizon via the pumping well bore during non-pumping periods. Where no pumping-induced vertical leakage occurred, end-of-test monitoring-well chemistries are assumed to be representative of the background solute compositions of the horizon.

Although an in-depth analysis of the chemical evolution is beyond the scope of this investigation, a brief discussion of the chemical evolution will help to provide context for the observed chemical stratification. Diverse carbon histories are evidenced by HCO₃⁻ and δ¹³C contents, the latter varying from 0 to −12‰ (Table 2). End-of-test SW-89 horizons 1 and 3 have HCO₃⁻ concentrations of ∼1.3–3.7 meq/L and δ¹³C of ∼ −12 to −9‰, which are typical of carbon acquired from soil zone CO₂ gas and the dissolution of soil zone carbonate minerals. Most end-of-test waters have δ¹³C compositions of ∼ −7 to −5‰; however, SW-67 horizon 4 has a δ¹³C of about 0‰. Possible explanations for the less negative δ¹³C compositions include: (1) the acquisition of dissolved carbon during a different climatic time, or (2) the acquisition of CO₂ gas or H⁺ from additional sources such as methanogenic reactions. The δ²H and δ¹⁸O isotopic compositions of SW-89 horizons 2 and 3, and SW-67 horizon 2, may suggest recharge during different climatic conditions than during the recharge of other waters, or evaporation at the time of aquifer deposition (Fig. 9). Both of these mechanisms could affect the δ¹³C composition; however, neither mechanism would increase HCO₃⁻ to >20 meq/L.

Fig. 9

Plot of δ¹⁸O vs. δ²H for (a) SW-67 and (b) SW-89 water quality tests. Data are plotted relative to the global meteoric water line (GMWL)

At SW-67, δ¹³C values of −7.4‰ or less, combined with elevated HCO₃⁻ concentrations, are accompanied by in situ production of methanogenic carbon. Methanogenic processes are described by Doelle (1969), Hunt (1979), Whiticar et al. (1986), and Wolin and Miller (1987). Evolving methane gas (CH₄) was evident during the SW-67 test from several monitoring wells (Table 3). The slight odor of H₂S gas occurs at SW-67, SW-89, and other unconfined aquifer wells. H₂S is a product of sulfate reduction and is often associated with methanogenesis. Although the δ¹³C composition of horizon 2 may suggest methanogenesis, the SO₄²⁻ content of this water is too elevated for appreciable anaerobic methanogenesis to have occurred. The apparent contradiction between the δ¹³C and SO₄²⁻ may indicate that different chemical processes are occurring in different horizons. In summary, elevated Na⁺ and HCO₃⁻ contents are attributed to an in situ H⁺-driven methanogenic-cation-exchange-carbonate-mineral-dissolution mechanism, whereas low Na⁺ and HCO₃⁻ contents are attributed to cation exchange.

The elevated SO₄²⁻ content of horizon 2 at both wells is attributed to gypsum dissolution. Gypsum dissolution is evidenced by the positive δ³⁴S compositions (Table 2). Sulfate from reduced sulfur sources (e.g., pyrite) typically has δ³⁴S values of ∼0‰, whereas sulfate from oxidized sources (e.g., gypsum) typically has δ³⁴S compositions >10‰ (Clark and Fritz 1997). The idea of gypsum dissolution in SW-89 horizon 2 is also supported by gypsum saturation (gypsum SI = 0.01). The low SO₄²⁻ concentrations and very positive δ³⁴S compositions of SW-89 horizon 1 and 3 waters are attributed to the original sulfate content of the recharge waters. Most stream waters entering the valley from both the Sangre de Cristo Range and the San Juan Mountains have very low SO₄²⁻ concentrations; however, δ³⁴S data are only available for one stream entering the valley, thus correlation of δ³⁴S contents with Closed Basin groundwaters is problematic (Mayo et al. 2007). The elevated Cl⁻ at SW-89 horizon 2 most likely results from the dissolution of halite in the aquifer matrix.

δ²H and δ¹⁸O compositions suggest that some pre-test waters have been subjected to evaporation (Fig. 9). Well-bore evaporation may be responsible for some pre-test data, although all of the wells were capped. The data also suggest that end-of-test SW-67 horizon 2 and possibly SW-89 horizon 2 waters have been evaporated, whereas groundwaters in horizons above and below have not. The significance of the elevated TDS contents of horizon 2 at both wells and the potentially evaporated water in SW-67 horizon 2 is that, under natural conditions, upward vertical flow from horizon 2 to horizon 1 has been limited. Under current conditions, the vertical gradient is slightly downward from horizon 1 to horizon 2. When the idea of limited vertical flow is considered in light of the closed TDS contours (Fig. 2), it is apparent that a zone or zones of stagnant or nearly stagnant groundwater exist in the subsurface. The lateral continuity of the elevated TDS horizons beyond SW-67 and SW-89 is unknown.
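Plotting samples relative to the GMWL amounts to comparing each measured δ²H with the value the line predicts for the sample's δ¹⁸O. A minimal sketch follows, assuming the commonly used form of the line, δ²H = 8 δ¹⁸O + 10; the function name and the two sample pairs are hypothetical and are not values from Table 2.

```python
def gmwl_offset(delta_18o, delta_2h, slope=8.0, intercept=10.0):
    """Offset (permil) of a sample's delta-2H from the global meteoric water line.

    Negative values plot below the line, which is consistent with (but does
    not by itself prove) evaporation.
    """
    return delta_2h - (slope * delta_18o + intercept)


# Hypothetical (delta-18O, delta-2H) pairs in permil; not values from Table 2
samples = {"on the line": (-14.0, -102.0), "below the line": (-10.0, -85.0)}
for label, (d18o, d2h) in samples.items():
    print(label, gmwl_offset(d18o, d2h))
```

A local meteoric water line, where one is available, can be substituted through the `slope` and `intercept` arguments.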

SW-67 time-drawdown pumping test

Unconfined aquifer vertical gradients are downward near SW-67, but the total head difference is generally less than 0.15 m (Table 1; Fig. 10). However, in horizon 3, well 3A had a static head about 0.3 m less than the corresponding well 3B, possibly due to a facies change between the two wells or vertical leakage from horizon 1. The confined aquifer antecedent water level (horizon 4) was about 1 m above unconfined aquifer levels, indicating upward potential between the two aquifers. No discernible trends in antecedent water levels were measured in the 20 days prior to the test.

Fig. 10

Static water levels in monitoring wells located near SW-67 (1A–4A, and 1B–3B), and near SW-89 (OW-1, OW-3 and OW-4). Water-level elevations are in meters above mean sea level (amsl)

Results of the 8.5-day aquifer test are shown in Figs. 11 and 12. Water levels in unconfined aquifer wells (horizons 1, 2, and 3) initially declined in a straight-line fashion on a semi-log plot (Fig. 11). During the test, the rate of decline decreased after 200 min or less and water levels subsequently increased. The non-linear semi-log slopes are attributed to the declining pumping rate during the test, although facies changes and vertical leakage may also have been factors.

Fig. 11

Semi-log time-drawdown plot of 8.5-day SW-67 test results

Fig. 12

Semi-log time-drawdown plot of 1A, 1B and 4A drawdown during 8.5-day SW-67 test

The rise above the static level in the confined aquifer monitoring well 4A (Fig. 12) is a common response in Closed Basin upper-confined aquifer wells that exsolve methane (CH₄) gas (F. Huss, Rio Grande Water Conservation District, personal communication, 2004). The rise in water level is attributed to the Noordbergum effect, which is the result of three-dimensional deformation of adjacent aquifer materials induced by pumping (Hsieh 1996; Wolff 1970).

Drawdown data and well screen locations suggest that drawdown occurred in each well in each horizon in response to pumping SW-67. Horizon 1 drawdown was <0.3 m, whereas horizons 2 and 3 drawdowns were several meters; drawdowns in the distant B-series wells were less than in the nearby A-series wells. That SW-67 is only screened opposite horizons 2 and 3 suggests that:

  1. Horizon 1 is unconfined

  2. Horizons 2 and 3 are confined or semi-confined

  3. Water from horizon 1 enters the SW-67 well bore by vertical leakage into horizon 2

  4. Lateral hydrodynamic communication occurs in each horizon

Understanding the potential contribution from each horizon to SW-67 is important for evaluating the time-drawdown data. The fact that the horizons are chemically stratified means that it should be possible to use the chemical compositions to help understand the contribution of each horizon to each other and to SW-67.

Assuming the SW-67 well completion isolates horizon 1 from the well bore, drawdown in horizon 1 results from SW-67 pumping-induced vertical leakage. Examination of the SW-67 as-built documentation and discussions with Rio Grande Water Conservation District personnel (F. Huss, Rio Grande Water Conservation District, personal communication, 2004) support this assumption. The fact that the end-of-test TDS of horizon 2 water is less than the pre-test TDS, and that the TDS of both pre-test and end-of-test horizon 2 water is greater than the TDS of horizon 1 water, suggests that ambient leakage from horizon 1 to horizon 2 is minimal. Head differentials between the two horizons of 0.01–0.03 m support this idea. Pumping SW-67 created downward head differentials of ∼2–6 m, which could readily induce vertical leakage from horizon 1 to horizon 2. The potential contribution of horizon 1 to horizon 2 during pumping was calculated assuming that the final horizon 2 chemistry is a mixture of horizon 1 and pre-test horizon 2 waters. Calculations using SO₄²⁻, Cl⁻, and TDS suggest about 25% of end-of-test horizon 2 water is from horizon 1.
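The mixing assumption in the preceding paragraph corresponds to a conservative two-end-member mass balance. Written out, with symbols that are introduced here for illustration only (C₁ for the horizon 1 concentration, C₂,pre and C₂,end for the pre-test and end-of-test horizon 2 concentrations, and f₁ for the fraction of horizon 1 water), the relation is:

```latex
C_{2,\mathrm{end}} = f_1\,C_1 + (1 - f_1)\,C_{2,\mathrm{pre}}
\quad\Longrightarrow\quad
f_1 = \frac{C_{2,\mathrm{pre}} - C_{2,\mathrm{end}}}{C_{2,\mathrm{pre}} - C_1}
```

The expression applies to any solute or bulk parameter that mixes conservatively, which is why SO₄²⁻, Cl⁻, and TDS are suitable tracers here, provided no dissolution or precipitation occurs during the test.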

Mixing ratios were calculated in an attempt to evaluate contributions from horizons 2 and 3 to SW-67 during pumping (Table 4). Mixing calculations only used the A-series end-of-test results for conservative solute species (SO₄²⁻ and Cl⁻) and the isotopic compositions (δ¹⁸O, δ²H, and δ¹³C). End-of-test compositions accommodate the effect of vertical leakage from horizon 1 to horizon 2. The A-series wells were chosen because they are close to SW-67 and their water chemistry could not have been impacted by water flowing near the B-series wells. For SO₄²⁻ and Cl⁻, the calculations suggest that horizon 2 could contribute ∼22–34% of SW-67 discharge and that horizon 3 contributes most of the water discharging from SW-67.

Table 4 Calculation of mixing ratios of horizon 1 and horizon 2 water to yield final SW-67 water

Mixing calculations for the stable isotopes of water (i.e., δ¹⁸O and δ²H) suggest that both horizon 2 and 3 could contribute ∼50% of the water to SW-67. However, the value of the final δ¹³C of SW-67 is less than the δ¹³C value for water near A-series horizon 3 water; thus, the calculated contribution from this horizon is listed as 100% in Table 4. The δ¹³C of water near the B-series horizon 3 well is −5.0‰, which means the combined contributions of water from near the A- and B-series wells are consistent with the end-of-test SW-67 δ¹³C composition. Therefore, based on δ¹³C, both horizons 2 and 3 could contribute to SW-67 discharge.

Although the chemical data suggest that during pumping horizon 2 contributes 30–40% and horizon 3 contributes 60–70% of SW-67 discharge, the data also suggest that ∼25% of horizon 2 water is from horizon 1. The uncertainty in contributions from each horizon, combined with the non-linear drawdown responses, complicates the analysis of the SW-67 observation well time-drawdown data. Because of the uncertainty, both analytical and numerical methods were used to analyze aquifer parameters using the time-drawdown data.

Analytical analysis involved curve matching with respect to delayed yield, vertical leakage, and boundary conditions by the methods described in Lohman (1972) and Batu (1998). Only horizons 2 and 3, which have direct hydraulic communication with SW-67, were analyzed by curve-matching methods. The SW-67 pumping rate was apportioned between the horizons based on the calculated mixing ratios. Assigned Q contributions from horizons 2 and 3 are 27 and 73% of SW-67 discharge, respectively, and only the first 300 min of data were used. The 27 and 73% values were used to account for some water from the vicinity of the B-series wells. Aquifer parameters were calculated using leaky-without-storage log–log type curves. The drawdown data were also evaluated relative to other type curves. Using the Q apportionment method, calculated storativities (S) of horizons 2 and 3 are similar, ∼3–6 × 10⁻⁴ and ∼2–4 × 10⁻⁴, respectively, and calculated transmissivities (T) for horizons 2 and 3 were ∼15 and 45 m²/day, respectively. The rate of vertical leakage between horizons 1 and 2 was not quantified.

Numerical analysis, using a radial flow model similar to Hoffmann et al. (1996), was performed on the first 360 min of data (K.J. Halford, US Geological Survey, personal communication, 2009). The numerical model has the advantage that flow rates do not need to be assigned to specific horizons and all horizons can be analyzed simultaneously. Assumptions included vertical to horizontal anisotropy = 0.2, specific storage = 9.9 × 10⁻⁶, and specific yield = 0.15. Calculated T values for horizons 1, 2, 3, and 4 were 10, 8, 84, and 67 m²/day, respectively.

Calculated T results using both analytical and numerical methods show similar patterns and are relatively consistent with each other. The results should only be viewed as first-order approximations due to the numerous assumptions used in the analysis. Results of the numerical analyses confirm that the transmissivity of horizon 3 is appreciably greater than that of the overlying horizons and that most groundwater discharging from SW-67 originates in horizon 3.
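One way to see why the transmissivity estimates imply that horizon 3 dominates the discharge is to apportion well inflow in proportion to transmissivity. The sketch below is an illustration only: it assumes equal drawdown across both screened horizons and inflow proportional to T, which is a textbook simplification and not a method stated in this study. The function name is hypothetical; the T values are those reported above.

```python
def transmissivity_weighted_fractions(transmissivity_by_horizon):
    """Share of well inflow from each screened horizon, proportional to T."""
    total = sum(transmissivity_by_horizon.values())
    return {name: t / total for name, t in transmissivity_by_horizon.items()}


# T in m2/day for the two horizons screened by SW-67
analytical = transmissivity_weighted_fractions({"horizon 2": 15.0, "horizon 3": 45.0})
numerical = transmissivity_weighted_fractions({"horizon 2": 8.0, "horizon 3": 84.0})

for label, result in (("analytical", analytical), ("numerical", numerical)):
    print(label, {name: round(share, 2) for name, share in result.items()})
```

Under these assumptions the analytical T values give a 25/75 split, close to the 27 and 73% apportionment derived from the mixing ratios, and the numerical T values give roughly a 9/91 split. Both point to horizon 3 as the main source.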

Discussion

SW-67 and SW-89, which are open to heterogeneous aquifers, exhibit extreme water quality variability in response to pumping stress. Each well encounters water-bearing horizons that are chemically distinct from each other. At each well site, chemical differences between the horizons include concentration and in some instances chemical composition. Because pumping well water quality variability is associated with TDS differences between the horizons as well as temporal solute variability within some horizons, the water quality relationships between the pumping wells and the horizons are complex. Understanding this complexity is complicated by the fact that hours to days of well purging are required to stabilize water quality parameters in the production wells. Such purging times greatly exceed typical sampling protocols.

In order to sort out the factors responsible for the temporal water-quality variability in the production wells, several factors need to be evaluated: (1) the natural water quality in each horizon and the spatial distribution of this water quality, (2) water quality changes within horizons due to pumping induced head changes, and (3) water quality changes induced by ambient (i.e., non-pumping) cross-aquifer contamination via the pumping well bore. The following evaluation will only include analysis of SW-67 data, because only one horizon in SW-89 contains two monitoring wells. Two monitoring wells are needed to evaluate spatial relationships.

At SW-67, only horizon 1 monitoring wells have end-of-test compositions that are essentially unchanged from pre-test conditions and that are similar in both the A- and B-series wells. These waters are Na⁺–HCO₃⁻ type with a TDS of ∼2,200 mg/L. The spatial and temporal consistency of pre-test and end-of-test compositions suggests chemical homogeneity in horizon 1 that has not been influenced by pumping stress.

Horizons 2 and 3 monitoring wells exhibit both spatial and temporal chemical heterogeneity. Horizon 2 pre-test compositions in the A- and B-series wells are chemically similar (Fig. 6a). The pre-test composition (Na⁺–HCO₃⁻–SO₄²⁻ type water, TDS of ∼13,000 mg/L) likely represents background conditions because: (1) significant natural vertical leakage from horizon 1 into horizon 2 is unlikely due to the small head differential between horizons 1 and 2, (2) horizon 3 water cannot invade horizon 2 via the SW-67 well bore due to the downward gradient, and (3) horizon 2 waters have the highest TDS. Mixing calculations, discussed previously, suggest that the horizon 2 end-of-test compositions are diluted by a ∼25% contribution from horizon 1 via pumping-induced vertical leakage. The large TDS difference between end-of-test well 2A and well 2B waters may result from greater pumping-induced vertical leakage from horizon 1 in the vicinity of the B-series wells than in the vicinity of the A-series wells. Lower TDS water beyond well 3B is unlikely for several reasons. The gradient from well 2A to well 2B is 0.007, suggesting horizon 3 ambient water occurring west of well 3B should be similar or more saline than water encountered in well 2A. The pre-test horizon 2 compositions support this idea. Because, prior to pumping, the natural groundwater flow is toward well 3B, the TDS beyond the well prior to pumping should also be elevated.

Pre-test and end-of-test horizon 3 water in well 3A exhibits large compositional and concentration differences, whereas both the composition and concentration in well 3B were relatively stable during the test. In well 3A, the TDS declined from ∼6,500 mg/L before the test to ∼700 mg/L at the end of the test, indicating the invasion of substantial amounts of high TDS water into the horizon prior to pumping. Elevated TDS horizon-2 water is the most reasonable source of this water. Because the head differential between well 2A and 3A is very small, only 0.01 m, and only horizon 3 water in the vicinity of the A-series wells was affected, vertical flow in the SW-67 well bore is the most likely avenue for fluid migration between horizon 2 and horizon 3. Mixing calculations using pre-test well 2A and post-test well 3A as end members suggest that ∼50% of the water encountered in well 3A originated in horizon 2. The ∼45-m-thick clay zone separating horizon 4 from horizon 3 would limit vertical leakage from the confined to the unconfined aquifer.
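The ∼50% estimate can be checked approximately with the rounded TDS values quoted in the text. This is a minimal sketch under stated assumptions: TDS is treated as a conservative tracer, the ∼13,000 mg/L pre-test horizon 2 value and the ∼700 mg/L end-of-test well 3A value are taken as end members, and the function name is hypothetical. Which solutes the original calculation actually used is not stated here.

```python
def end_member_fraction(c_mix, c_a, c_b):
    """Fraction of end member A in a conservative two-end-member mixture."""
    if c_a == c_b:
        raise ValueError("end members must have different concentrations")
    return (c_mix - c_b) / (c_a - c_b)


# Rounded TDS values (mg/L) quoted in the text
fraction_h2 = end_member_fraction(
    c_mix=6500.0,   # pre-test well 3A (mixed water)
    c_a=13000.0,    # pre-test horizon 2 (invading end member)
    c_b=700.0,      # end-of-test well 3A (background horizon 3)
)
print(f"{fraction_h2:.2f}")  # prints 0.47
```

The result, about 47%, is consistent with the ∼50% reported.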

The lateral extent of horizon 2 water invasion into horizon 3 was evaluated by two methods: (1) calculation of the volume of water removed from horizon 3 during the water quality test, and (2) calculation of the radius of water invasion into horizon 3 from horizon 2 during a specified time. The volume-of-water-removed calculation utilizes several simplifying assumptions: piston flow, aquifer porosity = 0.2, the pumping time required to remove mixed water from horizon 3 = 8.5 days, average pumping rate = 14 L/s, and saturated thickness = 7 m in horizon 2 and 10 m in horizon 3. Based on these assumptions, the radius impacted in horizon 3 is ∼35 m. The 8.5-day purging time was selected because this was the duration of the drawdown test. Using a shorter purging time would result in a smaller radial impact. It should be noted, however, that the pre-test solute composition in well 3B, located 22.5 m from SW-67, was slightly impacted by water from horizon 2.
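The piston-flow geometry behind the volume-of-water-removed estimate can be sketched as follows. The pumping rate, duration, porosity, and horizon 3 thickness are taken from the text; the `share` of the pumped volume attributed to horizon 3 is a hypothetical parameter introduced here, because the text does not state how the pumped volume was apportioned among horizons.

```python
import math


def piston_radius(volume_m3, porosity, thickness_m):
    """Radius (m) of a cylinder of aquifer whose pore space holds volume_m3."""
    return math.sqrt(volume_m3 / (math.pi * porosity * thickness_m))


pump_rate_m3_s = 14.0 / 1000.0   # 14 L/s
pump_time_s = 8.5 * 86_400.0     # 8.5 days
pumped_volume_m3 = pump_rate_m3_s * pump_time_s

# 'share' is an assumed fraction of the pumped water drawn from horizon 3
for share in (1.0, 0.75, 0.5):
    r = piston_radius(share * pumped_volume_m3, porosity=0.2, thickness_m=10.0)
    print(f"share from horizon 3 = {share:.2f}: radius = {r:.1f} m")
```

If all of the pumped water came from horizon 3 the radius would be about 40 m; a share near three quarters gives a radius close to the ∼35 m quoted above. This is offered only as a consistency check, not as a reconstruction of the original calculation.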

The water invasion estimate involved calculating the ambient flow rate from horizon 2 to horizon 3 via the SW-67 well bore. The flow rate was then used to estimate the radius of water invasion. The flow rate calculation assumed horizons 2 and 3 satisfy the Theis assumptions. Assigned aquifer parameters, based on the results of the time-drawdown aquifer test, were T = 18 m²/day and S = 0.2 for horizon 2, and T = 84 m²/day and S = 10⁻⁴ for horizon 3. Calculated ambient flow rates ranged from ∼0.01 to 0.04 L/s, depending on the assigned head differential.

Using the range of calculated ambient flow rates, the radius of horizon 3 affected and the minimum purging time required to remove the water invaded from horizon 2 were calculated. Assumptions included horizon 3 saturated thickness = 10 m, porosity = 0.2, inflow rate = 0.01–0.04 L/s, and pumping rate = 14 L/s. The calculated radius impacted after 1 year of ambient well-bore flow is ∼7–12 m, and the purging time required to remove horizon 2 water from horizon 3 ranged from ∼4 to 24 h. In addition to the uncertainty in the ambient flow-rate calculations and the porosity of horizon 3, the calculations include the simplifying assumptions of piston flow from the well bore into horizon 3, and that the regional gradient does not impact the shape of the plume (i.e., radial flow from the well).
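The second method can be sketched in the same way. The example below uses the stated inflow rates, porosity, thickness, and pumping rate; the function names are illustrative, and the purge time assumes that all pumped water is drawn from the invaded part of horizon 3, which is a simplification introduced here.

```python
import math

SECONDS_PER_YEAR = 365.25 * 86_400.0


def invaded_volume_m3(inflow_l_s, idle_time_s):
    """Volume delivered by steady ambient well-bore flow during the idle period."""
    return inflow_l_s / 1000.0 * idle_time_s


def invasion_radius_m(volume_m3, porosity=0.2, thickness_m=10.0):
    """Piston-flow radius of the invaded cylinder around the well bore."""
    return math.sqrt(volume_m3 / (math.pi * porosity * thickness_m))


def purge_time_h(volume_m3, pump_rate_l_s=14.0):
    """Hours of pumping needed to withdraw volume_m3 (assumes all pumped
    water comes from the invaded zone)."""
    return volume_m3 / (pump_rate_l_s / 1000.0) / 3600.0


for q in (0.01, 0.04):  # ambient inflow rates, L/s
    v = invaded_volume_m3(q, SECONDS_PER_YEAR)
    print(f"q = {q} L/s: volume = {v:.0f} m3, "
          f"radius = {invasion_radius_m(v):.1f} m, "
          f"purge time = {purge_time_h(v):.1f} h")
```

Under these assumptions the sketch gives radii of roughly 7 and 14 m and purge times of roughly 6 and 25 h, the same order as the reported ∼7–12 m and ∼4–24 h; the remaining differences presumably reflect details of the original calculation that are not stated. The invaded volume scales linearly with idle time, whereas the radius scales with its square root.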

The purging times calculated using the ambient flow-rate method appear to be low, based on field observations, and the 8.5-day purging time assumed in method 1 is likely too long. Short-term synoptic conductivity data are not available to better ascertain the necessary purging time for SW-67. SW-67 had only been idle for 71 days prior to the test. The time between pumping events would also greatly impact the purging time, as the volume of water invading via ambient flow is proportional to the time between pumping events. Synoptic TDS data collected frequently during the 23-day SW-89 test (Fig. 8) provide insight into potential purging times. At SW-89, which had been idle for 123 days prior to the test, ∼15 days of pumping at ∼11 L/s were required to remove all invaded water.

Conclusions

Vertical stratification may affect water quality in sampled wells in the following manner. When the well is not under stress, groundwater from the horizon with the greatest hydraulic head flows into the well bore and displaces water from horizon(s) with lesser hydraulic heads. Water can move up or down the well bore, depending upon the direction of head differentials. Under non-stressed conditions, where the aquifer contains hydrochemically distinct or contaminated horizons, the water chemistry or contaminant concentrations in the well are dominated by the chemistry of the horizon with the greatest hydraulic head. If the well has not been pumped for some time, water from the horizon with the greatest hydraulic head will also move into and mix with groundwater in horizons with lower hydraulic heads.

Pumping removes the mixed groundwater from the well bore and contaminated horizons. Thus, after a well has been pumped for a sufficient time, the water chemistry in the well will represent the chemistry or contamination of each horizon mass weighted for its transmissivity. Numerous schemes have been developed to determine the pumped volume of water necessary to help ensure samples that are representative of the aquifer (Barcelona et al. 1994; Barber and Davis 1987; Gibs et al. 1990; Hardy et al. 1989). The general idea in most schemes is that a limited amount of groundwater extraction is required to obtain representative water-quality data. Inherent in this is the assumption that the influence of well-bore cross-contamination does not extend for a great distance into the aquifer. In most situations, water quality differences between horizons are not great and the true extent of cross-contamination is difficult to quantify.
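The idea that a fully purged long-screened well yields a transmissivity-weighted blend can be illustrated with a short sketch. It assumes each horizon contributes flow in proportion to its transmissivity and, purely for illustration, pairs the horizon 2 and horizon 3 transmissivities and approximate TDS values quoted earlier; a real production well may intersect additional horizons, so the result is not a prediction for SW-67.

```python
def transmissivity_weighted_conc(horizons):
    """Blend concentration for (transmissivity, concentration) pairs,
    assuming each horizon's inflow is proportional to its transmissivity."""
    total_t = sum(t for t, _ in horizons)
    if total_t <= 0:
        raise ValueError("total transmissivity must be positive")
    return sum(t * c for t, c in horizons) / total_t


# Illustrative two-horizon case: (T in m2/day, TDS in mg/L)
horizons = [
    (18.0, 13_000.0),  # horizon 2
    (84.0, 700.0),     # horizon 3
]
blend = transmissivity_weighted_conc(horizons)
print(f"transmissivity-weighted TDS: {blend:.0f} mg/L")
```

The blend matches neither horizon, which is the caveat attached to any "representative" sample from a long-screened well.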

Because substantial water-quality differences exist between aquifer horizons in this study, several observations regarding the potential meaning of water quality samples are possible from the San Luis Valley testing. At SW-89, where chemical stratification is not subtle and thus purging effects can be readily measured, the necessary purging volume greatly exceeded that called for by most protocols. The daily effects of pumping on solute compositions of SW-89 waters are illustrated in Fig. 8. Similar data are not available for SW-67. At SW-89, solute concentrations increased steadily until about days 14–16, when they stabilized. This is equivalent to vacating about 1,600 well-bore volumes before representative water quality was obtained. Even after chemical stabilization, the representative water quality carries the caveat that it does not represent a single horizon but rather mixed water quality.

Observation well data from SW-67 and SW-89 demonstrate that well-bore mixing in long-screened wells can result in appreciable aquifer water mixing away from the well bore under small head differentials. Thus, mixing can influence the solute, isotopic, and contaminant concentrations in nearby short-screened monitoring wells. Mixing of aquifer waters by invasion of water via the well bore is observed in well 3B (SW-67), at a distance of 22.5 m from the long-screened well pumping under a head differential of only 0.11 m.

Another critical issue is what the water quality from a sampled well reveals about the aquifer system. Water quality sampled from SW-67 and SW-89 provided only limited insight into subsurface conditions. The temporal water-quality data did suggest well-bore mixing from two or more horizons. However, the pumping well data did not suggest the existence of four hydrogeochemical horizons, nor did they suggest chemical facies changes over short distances. For example, the δ¹³C of HCO₃⁻ and the major ion compositions of end-of-test SW-67 horizon 2 waters (i.e., 2A and 2B; Table 2) and the beginning and end-of-test TDS differences in SW-67 wells 3A and 3B are fundamentally different from each other, indicating facies changes over a horizontal distance of less than 30 m.

Some monitoring well data also provided misleading information. For example, OW-3 (SW-89) has only 1.5 m of screen, yet the water quality and isotopic composition of samples at the beginning and end of the 23-day test are fundamentally different (Fig. 6b, Table 2). A similar condition occurs in well 3A at SW-67 (Fig. 8a). Daily sampling from SW-89 (Fig. 8) indicates that a considerable volume of water, 1,600 well-bore volumes in this case, must be removed to eliminate cross-contaminated water from the aquifer system. Even in cases where the TDS did not appreciably change between the beginning and end of the test, such as well 3B (SW-67), fundamental solute and isotopic compositional changes occurred (Table 2).

Knowledge of aquifer lithology and aquifer parameters may provide little comfort in assessing the meaning of water quality data. The extent of chemical stratification and aquifer cross-contamination was not apparent from the borehole lithologic and geophysical data at SW-67 and SW-89.

Because chemical stratification is pronounced over relatively short vertical distances, the water quality variability in long- and short-screened wells in the San Luis Valley provides valuable insights into the impact of chemical stratification on water quality samples and on knowledge of the groundwater system gained from samples. Such insight is not readily apparent in groundwater systems where chemical stratification is not pronounced yet subtle differences occur. This is particularly true for groundwater contamination investigations where concentration differences as small as 0.001–0.01 mg/L may be critical. Such small values may be the difference between meeting or exceeding a water quality standard. As an example of how this critical difference may be of importance, Capel et al. (2002) found that purging three well-bore casing volumes may be adequate for major ion analysis but not for some inorganic constituents. In this case, it is likely that different hydrostratigraphic horizons had similar overall water chemistry but not all horizons were nitrate and atrazine contaminated. Thus, conventional purging stabilized field parameters, but either cross-contamination or organic constituents remained, or other factors affecting the reliability of purged samples persisted.

Results of this investigation suggest that: (1) in relatively low-TDS groundwater, cross-aquifer contamination may persist appreciably longer than previously thought and typical well-purging techniques may not result in representative water quality samples, (2) in long-screened production wells, cross-aquifer contamination is common, although it may not be readily apparent when the solute concentrations of the various horizons are similar, and (3) it may not be possible to obtain a non-biased water-quality sample. Therefore, the question remains: you have sampled the well, but what do you now know about the aquifer? Clearly, thoughtful consideration is required when collecting, interpreting, and evaluating water quality results from long-screened wells in heterogeneous aquifers.