Introduction

Since the introduction of the term ‘hyporheic biotope’ by Orghidan (1959), a large number of studies have been dedicated to the hyporheic zone (HZ) and its role as a transition zone between groundwater and surface water bodies (Boulton et al. 1998; Hancock et al. 2005). As a transient storage zone (Bencala et al. 1990; Cardenas 2015), the HZ plays a pivotal role in the transport and attenuation of nutrients and contaminants in rivers, lakes and wetland environments. Despite different research perspectives and interests, many studies can be tied to the quantification of exchange fluxes (EFs) between these landscape compartments and involve predominantly the transport of water and energy. With their variability in space (Elliott and Brooks 1997; Poole et al. 2008) and time (e.g. Stanford and Ward 1988; Harvey et al. 2006), EFs influence biogeochemical reactions and microbial growth in the HZ (Conant et al. 2004; Boano et al. 2014).

Kalbus et al. (2006) review and discuss principles and methods used for the delineation of EFs at various scales, of which environmental heat tracing is one important technique (see, e.g. the reviews of Rau et al. 2014 or Irvine et al. 2016). As a measure for heat, temperature is an important variable to calculate environmental energy fluxes (Hannah et al. 2004; Caissie et al. 2014) and forms a strong link between hydrology, ecology and other geosciences (e.g. Lewandowski and Nützmann 2010 or Zheng et al. 2016).

Heat tracing methods make use of temperature differences in surface water and groundwater. Heat is mainly transported by advection (i.e. the mass transport of water through the porous medium) and heat conduction (i.e. the diffusive heat transport in absence of mass transport) due to spatial and temporal variations in temperature (Anderson 2005; Constantz 2008). Thermal gradients within the HZ depend on the local climate, the physical composition of the riverbed and groundwater flow. Temperatures can be relatively easily measured using temperature sensors and loggers (Schmidt et al. 2014; Irvine et al. 2016); however, the application of heat tracing techniques is limited by heterogeneities within the HZ (Schornberg et al. 2010; Lautz and Ribaudo 2012), parameterization and scale problems (Lautz 2010; Krause et al. 2011) or nonvertical groundwater flow (Ferguson and Bense 2011).

Thermal steady-state assumptions (Bredehoeft and Papadopoulos 1965) require a single temperature profile measured at one point in time to quantify vertical EFs (Arriaga and Leap 2004; Schmidt et al. 2007; Anibas et al. 2011); however, as Anibas et al. (2009) demonstrated, the thermal steady-state assumption is only justified for certain conditions, namely exfiltration in winter and summer (i.e. times when seasonal influences on the vertical temperature gradients in the HZ are small). Exchange fluxes calculated outside these periods lead to erroneous results. Transient analytical solutions (Stallman 1965; Luce et al. 2013) using sinusoidal temperature signals, or the more recently developed spectral solutions (Wörman et al. 2012; Vandersteen et al. 2015; Schneidewind et al. 2016) overcome this limitation. Transient solutions use temperature time series from different depths in the riverbed as input.

This work presents EFs derived from an extensive river and riverbed temperature data set collected at the Aa River site in Belgium. Collected at intervals of several weeks to 2 months over more than 1.5 years, the data set consists of around 650 temperature profiles from 26 spatially distributed measurement points. Earlier publications (Anibas et al. 2008, 2009, 2011) used parts of this data set, but because of the methodological limitations of the thermal steady-state assumption they applied, they could not analyze the entire data set. Here, transient heat tracing techniques are applied; however, the lack of time series data, which spectral solutions such as those of Wörman et al. (2012) or Vandersteen et al. (2015) require, prevents their application to the Aa River data. The research question is therefore whether a series of independent temperature profiles, measured at the same location at intervals too long for them to be treated as temperature time series, can be used with a transient heat transport model to calculate vertical EFs.

To achieve this objective, the temperature profiles measured at different times were linked by a river-temperature time series serving as the upper model boundary. The data set was implemented in the numerical heat transport code STRIVE, resulting in 380 independent simulations. The analysis allowed the quantification of vertical EFs and of model prediction errors, expressed as root-mean-square error (RMSE), which is not possible with a thermal steady-state assumption. It is shown that this transient model can use all riverbed temperature data, independent of the season, which yields a higher temporal resolution of calculated EFs than previously achieved.

Similar to Anibas et al. (2011), the spatially distributed EF point estimates obtained here are upscaled to the investigated river reach to study their spatial–temporal variability and to compare them with those obtained from the earlier thermal steady-state analysis. Together with statistical tests, the transient model is further used to investigate the local hydrological system, i.e. the contributions of EFs at the center of the river and near the river banks and the change of EFs over time. Hydraulic gradient data from piezometer nests placed in the riverbed are used to verify the model results.

Methodology and STRIVE model

Periodic temperature changes at the land surface, due to diurnal and seasonal temperature changes, or nonperiodic ones (e.g. due to shading by clouds or plants, precipitation, changes of magnitude or direction of groundwater flow or anthropogenic influences) are transported into the subsurface via heat advection and heat conduction. Heat transport through the subsurface causes a delay of the temperature signal and its attenuation.

The combined anisothermal heat-fluid transport of an incompressible fluid through a homogeneous porous medium can be written as (Stallman 1965; Domenico and Schwartz 1990)

$$ \frac{\kappa_{\mathrm{e}}}{\rho c}{\nabla}^2T-\frac{\rho_{\mathrm{w}}{c}_{\mathrm{w}}}{\rho c}\nabla \cdot (Tq)=\frac{\partial T}{\partial t} $$
(1)

Here, $T$ is the temperature at any point in the subsurface at any time in K; $c_{\mathrm{w}}$ is the specific heat capacity of the fluid in J kg−1 K−1; $\rho_{\mathrm{w}}$ is the density of the fluid in kg m−3; $c$ and $\rho$ are the specific heat capacity and density of the sediment-fluid matrix in J kg−1 K−1 and kg m−3, respectively; $t$ is the time in s; and $q$ is the seepage velocity or specific discharge vector in m s−1. For presentation reasons, fluxes are reported in mm d−1. Groundwater discharge (a gaining reach) is represented by a negative seepage velocity, while recharge (a losing reach) is represented by a positive seepage velocity. $\kappa_{\mathrm{e}}$ is the effective thermal conductivity of the soil-water matrix in J s−1 m−1 K−1. For a more comprehensive overview of the theoretical background, history, limitations and practical applications of 1D vertical EF estimation, the reader is referred to Anderson (2005), Constantz (2008) or Lautz (2010).

The presented heat tracing technique is an indirect method, so the measured temperatures need to be processed with a heat transport model to obtain estimates of EFs. A variety of numerical codes is available to calculate EFs from HZ temperature measurements, ranging from fully coupled numerical codes such as HydroGeoSphere (Therrien et al. 2010) to simpler ones such as 1DTempPro (Voytek et al. 2014) or STRIVE (STReam RIVer Ecosystem; Buis et al. 2007; Anibas et al. 2009).

Here, inverse modeling is applied with the numerical 1D heat and fluid flow routines of STRIVE, which are based on Eq. (1) and Lapham (1989). STRIVE is a set of modules with internal integration routines embedded in the ecosystem model platform FEMME (Soetaert et al. 2002). For the simulations presented in this work, parameter values were chosen based on Anibas et al. (2011): the effective thermal conductivity $\kappa_{\mathrm{e}}$ was set to 1.8 J s−1 m−1 K−1, $\rho_{\mathrm{w}}$ to 1,000 kg m−3 and $c_{\mathrm{w}}$ to 4,180 J kg−1 K−1, whereas $\rho$ and $c$ were set to 1,970 kg m−3 and 1,370 J kg−1 K−1, respectively.
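The two coefficients of Eq. (1) follow directly from these parameter values. The short Python sketch below (illustrative only, not part of STRIVE) evaluates the thermal diffusivity $\kappa_{\mathrm{e}}/(\rho c)$ and the ratio $\rho_{\mathrm{w}} c_{\mathrm{w}}/(\rho c)$. Treating $q$ as the Darcy flux and calling $\rho_{\mathrm{w}} c_{\mathrm{w}} q/(\rho c)$ the "thermal front velocity" is standard terminology but our framing; the function name is hypothetical.

```python
# Parameter values as listed in the text (after Anibas et al. 2011)
KAPPA_E = 1.8                 # effective thermal conductivity, J s-1 m-1 K-1
RHO, C = 1970.0, 1370.0       # density (kg m-3) and specific heat (J kg-1 K-1) of the matrix
RHO_W, C_W = 1000.0, 4180.0   # density (kg m-3) and specific heat (J kg-1 K-1) of water

RHO_C = RHO * C                          # volumetric heat capacity of the matrix, J m-3 K-1
DIFFUSIVITY = KAPPA_E / RHO_C            # kappa_e / (rho c): conduction coefficient, m2 s-1
ADVECTION_FACTOR = RHO_W * C_W / RHO_C   # rho_w c_w / (rho c): advection coefficient, -


def thermal_front_velocity_mm_d(q_mm_d):
    """Velocity at which a thermal signal is advected for a flux q (same units, sign kept)."""
    return ADVECTION_FACTOR * q_mm_d


print(f"D = {DIFFUSIVITY:.3e} m2 s-1 = {DIFFUSIVITY * 86400:.4f} m2 d-1")
print(f"rho_w c_w / (rho c) = {ADVECTION_FACTOR:.3f}")
print(f"q = -40 mm d-1 -> thermal front velocity {thermal_front_velocity_mm_d(-40.0):.1f} mm d-1")
```

Because the water's volumetric heat capacity exceeds that of the bulk sediment, a thermal signal in this parameterization moves about 1.5 times faster than the flux itself, in the same direction.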

The riverbed was discretized as a vertical, one-dimensional, homogeneous, saturated soil column of 5.0 m depth composed of 100 layers. The layer thickness follows a sine function, from 0.001 m at the model boundaries to 0.08 m towards the center of the model domain. This reduces discretization errors and enhances computational speed. Note that the seepage fluxes calculated with STRIVE represent point estimates and are purely vertical; lateral flow is not taken into account. Furthermore, the model treats the interface between river and riverbed as a sink for the heat transported from the riverbed to the river.
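The exact sine function used by STRIVE is not given in the text. One form that reproduces the stated properties (100 layers, 5.0 m in total, 0.001 m at both boundaries, thickest at the center) is assumed in the sketch below: a half sine added to the minimum thickness and scaled so the layers sum to the column depth. With this assumed form the thickest layer is about 0.079 m, close to the 0.08 m stated. The function name is hypothetical.

```python
import numpy as np


def sine_layer_thicknesses(n_layers=100, depth=5.0, dz_min=0.001):
    """Layer thicknesses following a half sine (an assumed form): dz_min at both
    boundaries, thickest in the middle, scaled so the layers add up to `depth`."""
    shape = np.sin(np.pi * np.arange(n_layers) / (n_layers - 1))  # 0 ... 1 ... 0
    amplitude = (depth - n_layers * dz_min) / shape.sum()
    return dz_min + amplitude * shape


dz = sine_layer_thicknesses()
z_nodes = np.concatenate(([0.0], np.cumsum(dz)))   # layer interfaces from 0 to 5.0 m
print(f"{dz.size} layers, total {dz.sum():.3f} m, "
      f"thinnest {dz.min():.4f} m, thickest {dz.max():.4f} m")
```

Fine layers at the top resolve the rapidly varying river-temperature boundary, while the coarse central layers keep the number of unknowns small.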

STRIVE can be used in two setups: a thermal steady-state mode (Anibas et al. 2011) and a transient mode (Anibas et al. 2009, 2012). While the former can be performed as a simple curve fit of a single temperature profile, the latter requires time-dependent model input and boundary conditions. For a thermal transient simulation, STRIVE uses two or more individual riverbed temperature profiles measured at the same location at different times. Time series of river and riverbed temperatures are used as boundary conditions connecting these profiles in time (Fig. 1). Such measured surface water temperature time series were already used by Silliman and Booth (1993) and Silliman et al. (1995) to quantify recharge through the sediments of a creek. Note that the model uses identical temperature values for the boundary conditions and for the respective measurements at the river–riverbed interface (i.e. at 0.0 m depth) and at 5.0 m depth. Measurements of Anibas et al. (2011) determined 12.2 °C as the long-term average groundwater temperature in the area of the Aa River. Time series measurements at 3.8 m below the surface near weir 3 indicated annual temperature differences of about 1.5 °C. At 5.0 m depth these differences are expected to be smaller, so such changes in groundwater temperature have only a limited influence on the model output for a given simulation period. Therefore, a constant average groundwater temperature is applied at 5.0 m depth, which strongly increases the computational speed of STRIVE.
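The expectation that the annual temperature range shrinks between 3.8 and 5.0 m depth can be made concrete with the classical conduction-only solution for a periodic surface signal, in which the amplitude decays as $\exp(-z\sqrt{\omega/2D})$. The sketch below is a rough estimate under that assumption: it ignores advection (under upward flow, a downward-propagating signal would, if anything, be damped further) and treats the 1.5 °C from the text as the annual range at 3.8 m. With the parameter values above, it gives an annual range of roughly 0.9 °C at 5.0 m.

```python
import math

# Parameter values from the text (after Anibas et al. 2011)
KAPPA_E = 1.8                                 # effective thermal conductivity, J s-1 m-1 K-1
RHO_C = 1970.0 * 1370.0                       # volumetric heat capacity of the matrix, J m-3 K-1
D = KAPPA_E / RHO_C                           # thermal diffusivity, m2 s-1
OMEGA = 2.0 * math.pi / (365.25 * 86400.0)    # angular frequency of the annual cycle, rad s-1


def conductive_amplitude_ratio(z_from, z_to):
    """Amplitude ratio A(z_to)/A(z_from) of a periodic signal under pure conduction."""
    decay = math.sqrt(OMEGA / (2.0 * D))      # inverse damping depth, m-1
    return math.exp(-(z_to - z_from) * decay)


ratio = conductive_amplitude_ratio(3.8, 5.0)
range_at_5m = 1.5 * ratio                     # annual range of ~1.5 degC observed at 3.8 m
print(f"ratio = {ratio:.2f}, annual range at 5.0 m ~ {range_at_5m:.2f} degC")
```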

Fig. 1

Transient thermal modeling using STRIVE. The crosses and the lines indicate the measured and the simulated riverbed temperatures at different depths, respectively. The data come from simulations of point 4 with two and four temperature profiles for simulation periods A and 1, respectively. The final EF results were −43 mm d−1 for period A and −39 mm d−1 for period 1, obtained iteratively by minimizing the difference between the measured and the modeled temperature distribution. The respective RMSEs were 0.11 and 0.36 °C

For a single run of the transient model, the first temperature profile is used to initialize the model, while subsequent profiles are used for model fitting. In an iterative procedure, the model varies the vertical flux term of Eq. (1) to minimize the differences between measured and simulated temperature distribution (Fig. 1). The model was run using two and four temperature profiles as input.
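The inverse procedure can be illustrated with a minimal sketch. It is not STRIVE: it assumes a uniform 0.05 m grid, an explicit finite-difference solution of the 1D form of Eq. (1) with central differences, the river temperature as the upper and 12.2 °C as the lower boundary, and a coarse scan followed by golden-section search in place of the FEMME optimization routines. It covers the two-profile setup (one profile initializes, one is fitted). The profile values and river signal are hypothetical, and the "measured" profile is generated by the same model, so the example only shows that the procedure recovers a known flux.

```python
import numpy as np

# Parameter values quoted in the text (after Anibas et al. 2011)
KAPPA_E = 1.8                  # effective thermal conductivity, J s-1 m-1 K-1
RHO_C = 1970.0 * 1370.0        # volumetric heat capacity of the matrix, J m-3 K-1
RHO_W_C_W = 1000.0 * 4180.0    # volumetric heat capacity of water, J m-3 K-1
T_GW = 12.2                    # constant groundwater temperature at 5.0 m, degC
MM_D = 1e-3 / 86400.0          # 1 mm d-1 in m s-1


def simulate(q_mm_d, z, T0, t_end, river_temp, dt=900.0):
    """Explicit finite differences for Eq. (1) in 1D, z positive downward.
    Negative q_mm_d = upward flow (gaining), as in the text's sign convention.
    river_temp(t) returns the river temperature at time t in s."""
    D = KAPPA_E / RHO_C                       # thermal diffusivity, m2 s-1
    v = RHO_W_C_W / RHO_C * q_mm_d * MM_D     # thermal front velocity, m s-1
    dz = z[1] - z[0]                          # uniform grid (simplification)
    if D * dt / dz**2 > 0.5:
        raise ValueError("explicit scheme unstable: reduce dt")
    T = np.array(T0, dtype=float)
    for k in range(1, int(round(t_end / dt)) + 1):
        lap = (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dz**2   # conduction
        grad = (T[2:] - T[:-2]) / (2.0 * dz)            # advection (central)
        T[1:-1] += dt * (D * lap - v * grad)
        T[0] = river_temp(k * dt)   # upper boundary: river temperature series
        T[-1] = T_GW                # lower boundary: constant groundwater temperature
    return T


def rmse(sim, obs):
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def fit_flux(z, T0, z_obs, T_obs, t_end, river_temp, q_lo=-200.0, q_hi=100.0):
    """Constant flux (mm d-1) minimizing the RMSE at the fitted profile."""
    def cost(q):
        return rmse(np.interp(z_obs, z, simulate(q, z, T0, t_end, river_temp)), T_obs)

    grid = np.linspace(q_lo, q_hi, 31)               # coarse scan, 10 mm d-1 steps
    i = int(np.argmin([cost(q) for q in grid]))
    a, b = grid[max(i - 1, 0)], grid[min(i + 1, grid.size - 1)]
    g = (np.sqrt(5.0) - 1.0) / 2.0                   # golden-section refinement
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = cost(c), cost(d)
    for _ in range(25):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = cost(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = cost(d)
    q_best = 0.5 * (a + b)
    return q_best, cost(q_best)


# Synthetic twin test: all profile values and the river signal are hypothetical.
z = np.linspace(0.0, 5.0, 101)
z_lance = np.array([0.0, 0.10, 0.25, 0.50, 0.90, 5.0])
T_init = np.interp(z, z_lance, [6.0, 7.0, 8.0, 9.3, 10.5, T_GW])
t_end = 30 * 86400.0


def river(t):
    return 6.0 + 4.0 * t / t_end + np.sin(2.0 * np.pi * t / 86400.0)


z_obs = z_lance[1:5]                                  # fitted T-lance depths
T_obs = np.interp(z_obs, z, simulate(-40.0, z, T_init, t_end, river))
q_fit, err = fit_flux(z, T_init, z_obs, T_obs, t_end, river)
print(f"recovered flux {q_fit:.1f} mm d-1, RMSE {err:.4f} degC")
```

The coarse scan guards against converging to a spurious local minimum; the golden-section step then refines the flux within one grid interval.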

Note that in the context of this work, ‘transient’ refers to the transient nature of the measured and simulated temperatures rather than to a transient model output of EFs. The estimated EFs are time-integrated net fluxes for the given simulation period and are therefore constant during that period. A disadvantage compared to the steady-state approach is the computing time: whereas a steady-state fit is obtained with STRIVE in a few seconds, the transient analysis may take up to an hour for a single simulation period on a contemporary PC.

Field site and measurements

Field work was carried out along a 1,425 m stretch of the Aa River in Flanders, Belgium (Fig. 2). The Aa River is a typical Flemish lowland river with a total length of 36.7 km and a catchment area of 237 km2. The yearly average air temperature is around 10.2 °C, whereas the potential evapotranspiration and the annual precipitation are 670 and 830 mm year−1, respectively (Dams et al. 2015). At the field site, the Aa River has an average water depth of around 1.1 m, a width of around 13 m, a slope of 0.48 ‰ and an average discharge of approximately 1.9 m3 s−1. The examined river reach is bounded by two weirs, upstream weir 3 and downstream weir 4, between which the Aa River flows in a straightened bed. The weirs provide relatively stable water levels in the river. The banks and the riparian zone of the Aa River are mostly free of tall vegetation. Several ditches drain the surrounding agricultural lands into the river.

Fig. 2

a The field site at Aa River located within the catchment of the Nete River, Belgium. b Topographic elevation map of the examined river reach with the location of the 26 spatially distributed measurement points and three piezometer nests. The boundary between the Diest Formation downstream and the Kasterlee Formation upstream is indicated by a dotted line and was determined using the database of DOV (2017)

The riverbed is composed of fine-to-medium sand with varying fractions of organic matter, especially at point bars of meanders. The riverbed sediments along the banks are usually more compacted than at the center of the river. This might be the effect of erosion and sedimentation processes after the construction of the riverbed. According to the Flemish geological data base ‘Databank Ondergrond Vlaanderen’ (DOV 2017), the local superficial geology consists of the Tertiary Diest Formation, a heterogeneous sand layer, and the Kasterlee Formation composed of fine sands with fractions of clay. The former appears in the downstream part (Fig. 2, downstream of point 7), while it is overlain by the latter in the upstream part. Together with the underlying Berchem Formation they constitute an aquifer system of around 80 m thickness which is bounded below by the Boom Aquitard (Anibas et al. 2011).

The Aa River is a relatively large system for EF investigations, and it is also one of the best studied in this respect. Anibas et al. (2008) published first results on EFs within a larger multidisciplinary project to quantify exchange processes in river ecosystems (Buis et al. 2007). Anibas et al. (2009, 2011) developed the STRIVE model and analyzed the Aa River data set, concentrating on the thermal steady-state approximation. Vandersteen et al. (2015), Anibas et al. (2016) and Schneidewind et al. (2016) focused their research on the Slootbeek, the major tributary within the examined river section. They developed novel methods for the analysis of riverbed-temperature time series; their work therefore has a different spatial and temporal focus from this publication.

For the examination of EFs, it is advisable to combine different methods, such as temperature tracers with hydraulic head measurements, slug tests and seepage meters (Rau et al. 2010; Wang et al. 2017) or with other tracers (Engelhardt et al. 2011; Gonzalez-Pinzon et al. 2015), to exploit their complementary strengths and reduce the errors of each individual method. Here, (1) consecutive measurements of temperature profiles in the riverbed were combined with (2) continuous measurements of hydraulic gradients and of river and riverbed temperatures. Measurements (1) were performed with a so-called ‘temperature lance’ (‘T-lance’, also known as ‘T-stick’; Anibas et al. 2011), a 2-m-long instrument equipped with a single thermistor (Davis Instruments Model 7817; Hayward, California, USA). For (2), piezometer nests were installed in the riverbed at locations 4, 7 and 13 (Fig. 2), equipped with Diver temperature and hydraulic head data loggers (Schlumberger Water Services, Delft, The Netherlands) and/or StowAway TidbiT temperature data loggers (Onset Computer Corporation, Bourne, Massachusetts, USA) recording hourly at up to three different depths (0.1–0.15, 0.5–0.6 and 0.9–1.2 m). The upper model boundary for STRIVE was created from the combined temperature time series collected in the piezometer nests.

The ease of temperature measurements in the riverbed makes it possible to obtain this parameter at high spatial and temporal resolution, especially at local or reach scales. Twenty-two measurement campaigns were performed at the Aa River reach between 02 Aug 2004 and 08 Feb 2007 to gather riverbed temperatures. Since the piezometer nests were not available from the beginning of the measurements, this paper presents in detail the data from the 14 campaigns starting on 25 Aug 2005. In a single measurement campaign, up to 26 spatially distributed measurement points along and across the Aa River were investigated. Three of these campaigns were performed in ‘summer’ (25–26 Aug 2005, 03–04 Jul 2006 and 08 Sep 2006) and five in ‘winter’ (13–14 Jan 2006, 09–10 Feb 2006, 06 Mar 2006, 14–15 Jan 2007 and 08–09 Feb 2007), i.e. at times with a strong thermal contrast between surface water and groundwater. Two were performed in ‘spring’ (04–05 May 2006, 18–19 May 2006) and four in ‘autumn’ (03–04 Oct 2005, 30 Nov–01 Dec 2005, 28–29 Sep 2006 and 09–10 Nov 2006), i.e. at times when weak thermal gradients may be present in the riverbed. At most measurement points, five temperature measurements were taken at depths of 0.0 m (i.e. top of the riverbed), 0.10, 0.25 and 0.50 m and at the maximum depth that could be reached with the T-stick, which was generally around 0.40–1.20 m (on average 0.90 m). Each measurement point was assigned a number from 1 to 14 and a location identifier: L (left bank), C (center) or R (right bank). The L and R points are located around 1.5 m inward from the river banks and the C points on the center line of the river (Fig. 2). Point 12 was inaccessible most of the time, so no values are presented for it. The measurement points were sampled in campaigns of two consecutive days, a time frame within which, according to Anibas et al. (2011), the hydrologic conditions can be presumed constant.

Results and discussion

Analysis of the river and riverbed temperatures

A wide range of temperatures was observed in the riverbed of the Aa River during the measurement campaigns using the T-lance. Figure 3 shows, for each campaign, the riverbed temperature profile averaged over the 13 measurement points at the center of the river, together with its seasonal classification; the 14 campaigns analyzed in this paper are underlined. The extremes of individual temperature measurements are listed in Table 1; the highest and lowest temperatures were measured at 0.0 m depth. The temperatures show stronger attenuation at the banks, indicating stronger EFs. The measured river temperature time series (Fig. 4a) had an average of 6.0 °C (standard deviation 1.8 °C) in winter and 19.0 °C (standard deviation 2.9 °C) in summer. These measurements provide a good indication of riverbed temperatures in Belgian lowland rivers and show that the riverbed temperature is more variable in summer than in winter.

Fig. 3

The seasonally classified temporal variation of riverbed temperature profiles measured in the Aa River. The profiles show the average of the 13 measurement points at the center of the river taken from 22 measurement campaigns between 02 Aug 2004 and 08 Feb 2007. The 14 measurement campaigns used in this paper are underlined

Table 1 Minimum (Min) and maximum (Max) riverbed temperatures from the temperature profiles measured using the T-lance and the river water temperature time series of the Aa River, Belgium
Fig. 4

a Boundary conditions for the STRIVE model: the time series of the measured river water temperature and the groundwater temperature at 5.0 m depth. b Time series of Darcy-based EFs. c Results of the transient simulation with STRIVE as time series of EFs for the periods A–L. EFs were calculated as averages of the 13 measurement points in the center of the river

Table 2 summarizes the observed temperature differences between all pairs of measurements within a particular campaign at an equal depth of 0.10 m (26 measurement points in space). At this depth, diurnal transient influences on the temperature measurements are already strongly attenuated. The minimum and maximum spatial temperature differences between the observations are 1.8 °C (autumn) and 9.7 °C (summer), respectively. The strong temperature contrast in summer and winter is a clear indication of spatial variation of EFs. Within a year (Fig. 3; Table 3), there is a progression from warmer surface temperatures to warmer riverbed temperatures and vice versa. While summer and winter are times with a pronounced temperature gradient in the HZ, autumn shows lower vertical temperature gradients. Strong seasonal transients led to a fast evolution of thermal gradients in spring and autumn. In autumn, the lowest temperature difference, 0.41 °C, was measured (between 0.1 and 0.5 m depth, as an average of the center points on 03 Oct 2005). A similar behavior is expected for spring; unfortunately, the two spring campaigns presented in this work (i.e. 04 May 2006 and 18 May 2006) are not representative of an entire spring season. The autumn temperature differences are generally larger than the accuracy of the temperature sensor used (0.3 °C), suggesting that most riverbed temperature profiles contain sufficient variation to be used in a heat transport model.

Table 2 Minimum and maximum spatial temperature contrast between observations at 0.1 m depth within the examined river reach for the various measurement campaigns
Table 3 The average measured temperature gradients between 0.1 and 0.5 m depth at the center and at the banks of the river indicated per season

Model validation using RMSE

The lack of a measure of model or output quality is a shortcoming of the thermal steady-state assumption. The transient mode allows STRIVE to apply a model cost function analysis. In this article, model quality is expressed as the RMSE between the measured and the simulated temperatures. Using two temperature profiles for the analysis, the mean RMSE is 0.24 °C, while the mean RMSE for four profiles is 0.42 °C. Higher RMSE values (mean 0.61 °C, range 0.15–1.81 °C) were calculated at the banks of the river reach. This variation may result from the influence of nonvertical groundwater fluxes (Ferguson and Bense 2011), temporal variations of EFs or heterogeneity of the thermal riverbed parameters, none of which are accounted for in the model.
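Written out in its standard form, the RMSE used as the cost measure is given below. Here $N$ is assumed to be the number of fitted temperature observations in one simulation period (all fitted depths of all fitted profiles), and $T_i^{\mathrm{meas}}$ and $T_i^{\mathrm{sim}}$ are the measured and simulated temperatures at the same depth and time; how STRIVE pools the residuals is not stated in the text.

$$ \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_i^{\mathrm{meas}}-T_i^{\mathrm{sim}}\right)^{2}} $$

Because the RMSE carries the units of temperature, it can be compared directly with the 0.30 °C sensor accuracy.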

Using the results from simulations with two temperature profiles at the center points of the river, the seasonal arithmetic mean RMSEs were 0.24 °C in summer, 0.30 °C in autumn, 0.22 °C in winter and 0.36 °C in spring. The respective maximum RMSEs were 0.41, 0.50, 0.33 and 1.05 °C. Figure 5 shows this seasonal variation of RMSE, indicating the mean and the 95% confidence interval of the data set together with the accuracy of the temperature sensor used (0.30 °C). Being mostly below the sensor accuracy, the model fit (i.e. RMSE) shows little seasonal variation. The relatively small RMSE values demonstrate the feasibility of the transient method for the simulation of vertical EFs. However, as with the thermal steady-state assumption (Anibas et al. 2011), seasonal transient temperature changes may influence the quality of the transient model. While the vertical temperature gradient in the riverbed varies annually from almost 0 °C to about 5.0 °C, the accuracy of the temperature sensor and systematic measurement errors remain constant. An RMSE of <0.3 °C is therefore necessary in times of strong vertical temperature gradients (e.g. >1.6 °C, as is common in summer and winter). In times when riverbed temperatures are more uniform with depth (i.e. in autumn and spring), a model fit with an RMSE of 0.3 °C could be achieved with a broader range of EFs. Hence, the changing temperature gradients may lead to a temporally variable model uncertainty, and in times of stronger gradients the model should deliver better EF estimates. This research, however, did not clearly indicate such a dependence. The presented transient simulations used input temperature profiles separated in time by weeks or months, so it is unlikely that all profiles of a simulation show a uniform vertical temperature distribution.
Thus, the transient approach is applicable to all temperature profiles, regardless of the time or season in which they were measured. Preferably, however, the start and/or end of a model run is placed at a time of strong vertical temperature gradients.

Fig. 5

Results from simulations with two temperature profiles at the river’s center points indicate seasonal variations of the RMSE, with a peak in spring. The mean RMSE and the 95% confidence interval band of the data set and the accuracy of the used temperature sensor with 0.30 °C are indicated

Quantitative estimation of EFs: simulations using two riverbed temperature profiles

The STRIVE model does not calculate time-dependent EFs; however, temporal resolution can be achieved by modeling the temperature data sequentially. The last temperature profile used in one simulation initializes the following one, so the outputs of different transient simulations are connected in time, resulting in 12 successive simulation periods (i.e. A–L, as indicated in Table S1 of the electronic supplementary material, ESM). Since the time between riverbed temperature measurements was not constant, the length of the simulation periods varies between about 2 weeks and 2 months. These results can be obtained for each of the 26 spatially distributed measurement points (Table S1 of the ESM). The river-temperature time series must correspond to the measurements at 0.0 m depth, a model requirement that was met well at the center of the river. At the bank points (i.e. L and R), however, the relatively warm or cold upwelling groundwater limits the use of the single river-temperature time series as the model boundary; near point 13L, for example, the difference between groundwater and river temperature could even be felt while standing in the river. In such cases, the measured time series was adjusted to link it to the measured temperature profile. This procedure may introduce errors into the EF calculation, and, as can be seen in Table S1 of the ESM, many of these simulations did not lead to an acceptable result. A complete set of results is therefore only available for the center points (C).
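The sequential chaining can be sketched as pairing consecutive profile dates: the earlier profile initializes a simulation period, the later one is fitted and then initializes the next period. The dates below are the first days of the winter–spring 2006 campaigns listed in the field section; which of the letters A–L they correspond to is documented in Table S1 of the ESM and is not asserted here. The function name is hypothetical.

```python
from datetime import date


def chain_periods(profile_dates):
    """Pair consecutive profiles into simulation periods (start, end, length in days).
    The profile at `start` initializes the run; the one at `end` is fitted and
    then serves as the initial condition of the next period."""
    return [(start, end, (end - start).days)
            for start, end in zip(profile_dates, profile_dates[1:])]


# First day of the winter-spring 2006 campaigns named in the text
dates = [date(2006, 1, 13), date(2006, 2, 9), date(2006, 3, 6),
         date(2006, 5, 4), date(2006, 5, 18)]
for start, end, days in chain_periods(dates):
    print(f"{start} -> {end}: {days} days")
```

These four periods span 14 to 59 days, in line with the "about 2 weeks and 2 months" stated above.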

Figure 4c shows the temporal variation of fluxes for the 12 periods A–L as an average of the 13 center points. For comparison and validation, Fig. 4b shows a time series of EFs derived from a Darcy-based analysis of the piezometer data. The Darcy-based fluxes are higher in magnitude than the STRIVE estimates (averages of −49 and −37 mm d−1, respectively); however, the calculated Darcy-based EFs are very sensitive to the hydraulic conductivity. Since the vertical hydraulic conductivity of the riverbed is not known exactly, the Darcy analysis uses the value of 3.0 × 10−6 m s−1 from Anibas et al. (2011). Both analyses showed lower EFs during summer than during winter, supporting Anibas et al. (2011), who detected similar seasonal trends. The measured data set also covered two similar periods between late summer and winter of two consecutive years (i.e. B–C–D and J–K–L), which yielded very similar weighted averages of −48 and −50 mm d−1, respectively. The Darcy-based analysis showed higher fluxes for the first of these periods, indicating annual variations of EFs. The STRIVE and the Darcy-based estimates therefore did not always match: while both methods showed similar trends for periods C–E, H–I and L, the results for autumn (A, B and C in 2005 and J and K in 2006) and spring (F–G) showed temporal variations that were not reflected in the Darcy-based flux calculation.
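The text does not detail the Darcy calculation. Assuming the usual form $q = -K\,\partial h/\partial z$ with $z$ positive downward (so that negative $q$ means upward flow, matching the sign convention of Eq. (1)) and the hydraulic conductivity quoted above, the sketch below shows the calculation. The piezometer heads and screen depths are hypothetical, as is the function name. Because $q$ is proportional to $K$, any relative error in $K$ carries over one-to-one into the Darcy-based flux, which is why the comparison with STRIVE is so sensitive to this parameter.

```python
K_V = 3.0e-6   # vertical hydraulic conductivity of the riverbed, m s-1 (value from the text)


def darcy_flux_mm_d(k_v, head_shallow, head_deep, z_shallow, z_deep):
    """Vertical Darcy flux q = -K dh/dz (z positive downward), in mm d-1.
    Negative values indicate upward flow (gaining reach).
    Heads in m above a common datum; depths in m below the riverbed."""
    dh_dz = (head_deep - head_shallow) / (z_deep - z_shallow)
    q_m_s = -k_v * dh_dz
    return q_m_s * 1000.0 * 86400.0


# Hypothetical readings: head 0.10 m higher at 0.6 m depth than at 0.1 m depth
q = darcy_flux_mm_d(K_V, head_shallow=10.00, head_deep=10.10, z_shallow=0.1, z_deep=0.6)
q_double_k = darcy_flux_mm_d(2 * K_V, 10.00, 10.10, 0.1, 0.6)
print(f"q = {q:.1f} mm d-1; with K doubled: {q_double_k:.1f} mm d-1")
```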

Figure 4c shows that late summer was a time of relatively high EFs (i.e. periods A and I). In early autumn the fluxes were decreasing (periods B and J), only to rise sharply in late autumn (periods C and K). The EFs stabilized at their highest values in winter (periods D, E and L) and dropped sharply in spring (periods F, G and H). Then low values of EFs were indicated before they rose again in summer (period I). In general, the match between the STRIVE and the Darcy-based analysis was better during summer and winter than during spring and autumn. The differences might be caused by changing model accuracies as explained in section ‘Model validation using RMSE’.

Figure 6 highlights EFs of three representative measurement points (i.e. 4, 6 and 13) in their temporal distribution. Measurement point 4 shows EFs closest to the average of the river reach (compare to Fig. 4c), while point 6 shows the lowest fluxes and point 13 shows the highest EFs of all measurement points. The overall pattern is comparable to the trend in Fig. 4c, but individual variations do exist. While point 6 experienced a change from gaining to losing in the spring periods F and H, points 4 and 13 remained gaining throughout the investigated time. Period G showed very high EFs at point 13, while most other points yielded values closer to the neighboring periods F and H. Since only two temperature profiles were used in STRIVE, a nonoptimal model fit or inaccuracies of the temperature measurements, as discussed in section ‘Model validation using RMSE’, may occasionally lead to erroneous flux estimates.

Fig. 6

Three typical measurement points 4, 6 and 13 from top to bottom with their temporal distribution of EFs. The bars show the periods A–L. Measurement point 4 has flux values close to the average of the whole river stretch, while measurement points 6 and 13 show the lowest and highest fluxes along the Aa River reach, respectively

Figure 7a shows the spatial distribution of EFs along the river reach, with the weighted average EFs of the 12 simulation periods as bars and their standard deviation as whiskers. The fluxes along the river were heterogeneous, with higher fluxes generally occurring in the upper half of the reach. The change in local hydrogeology beneath the HZ about halfway along the reach, near point 7, could account for these variations. A seasonal temporal discretization of the output is shown in Fig. 7b: ‘summer’ consists of simulation periods A and I, ‘autumn’ of periods B, J and K, ‘winter’ of periods C, D, E and L, and ‘spring’ of periods F, G and H. A combination of spatial and temporal differences along the river can be observed: the average EFs in ‘autumn’ and ‘winter’ were almost equally high (with averages of −49 mm d−1). EFs in summer were slightly lower, with an average of −39 mm d−1; only at a few points did summer EFs exceed winter or autumn EFs. The average standard deviation for these three seasons was similar, at 10 mm d−1. Only spring was different: here the EFs were much lower, with an average of only −8 mm d−1. In ‘spring’, the parts of the river with generally low EFs changed flow direction and became infiltrating, while the points with high EFs remained exfiltrating at all times. Since the ‘spring’ periods F, G and H were consecutive, they provide a good indication of EFs in spring 2006; unfortunately, it is not known whether the change in flow direction occurs every year. The low EFs in spring may be caused by lower hydraulic gradients and by evapotranspiration in this agricultural area. Spring is also the driest season.
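The text does not state how the period averages are weighted. Assuming weights proportional to the length of each simulation period, so that longer periods contribute more to the net flux, the weighted average can be sketched as follows. The flux values are hypothetical, and the durations are those of four consecutive 2006 campaign intervals.

```python
def duration_weighted_mean(fluxes_mm_d, durations_days):
    """Mean flux weighted by the length of each simulation period (assumed weighting)."""
    if len(fluxes_mm_d) != len(durations_days):
        raise ValueError("one duration per flux is required")
    total = sum(durations_days)
    return sum(q * n for q, n in zip(fluxes_mm_d, durations_days)) / total


# Hypothetical fluxes (mm d-1) for four consecutive periods of 27, 25, 59 and 14 days
fluxes = [-45.0, -50.0, -20.0, -5.0]
durations = [27, 25, 59, 14]
weighted = duration_weighted_mean(fluxes, durations)
unweighted = sum(fluxes) / len(fluxes)
print(f"weighted {weighted:.2f} vs unweighted {unweighted:.2f} mm d-1")
```

With periods ranging from about 2 weeks to 2 months, the two kinds of average can differ noticeably whenever long and short periods have different fluxes.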

Fig. 7

a Weighted average of EFs calculated in the periods A–L; the whiskers indicate the standard deviation of the fluxes. The spatial distribution of EFs along the center of the Aa River is heterogeneous, with higher fluxes in the upper half of the river section. b Temporal distribution of EFs along the river, classified into summer (periods A and I), autumn (periods B, J, K), winter (periods C, D, E, L) and spring (periods F, G, H). Summer shows lower values than autumn and winter, but the overall pattern is fairly similar. Spring shows the lowest EFs, and some sections of the river show infiltrating conditions (i.e. positive flux values)

Comparing spring with the other seasons gives an average difference in EFs of 38 mm d−1, with nine measurement points fairly close to this number (between 38 and 53 mm d−1), indicating that the temporal variation of EFs was relatively uniform along the river reach. At the center of the river, this variation therefore seems to be independent of the magnitude of the flux, suggesting that it is influenced more by regional hydraulic gradients and their variation than by heterogeneous hydraulic conductivities of the riverbed.
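The rounded seasonal averages reported for the river center are roughly consistent with the 38 mm d−1 figure. The check below assumes an unweighted mean of the three non-spring seasons; since the per-point values are not listed here, it is only an arithmetic consistency check, not a reproduction of the analysis.

```python
# Seasonal mean EFs at the river center, as reported in the text (mm/d)
season_means = {"summer": -39.0, "autumn": -49.0, "winter": -49.0, "spring": -8.0}

# Unweighted mean of the three non-spring seasons (assumption)
others = [season_means[s] for s in ("summer", "autumn", "winter")]
mean_others = sum(others) / len(others)  # about -45.7 mm/d

# Magnitude of the spring reduction in exfiltration
spring_drop = season_means["spring"] - mean_others
print(round(spring_drop, 1))  # 37.7, i.e. about 38 mm/d
```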

The magnitudes of the calculated EFs at the center points are remarkably similar to the findings of Anibas et al. (2011); however, the earlier study could not detect losing (infiltrating) conditions. Such changes in the direction of flow may have biological and geochemical implications and should be a focal point for future investigations.

Simulations using four riverbed temperature profiles

To improve the robustness of the model output, simulations with more than two temperature profiles were performed. This increased the information available for the model fit and reduced the influence of measurement and fitting errors, but came at the cost of temporal resolution. To retain as much temporal resolution as possible, four temperature profiles were used as model input: one to initialize the model and three to fit it (Fig. 1). Apart from five missing values in Table 4, this setup yielded results for all simulations, including the bank measurements L and R. To complete the data set, the missing values were interpolated from the values of neighboring points in space and time. Four distinct periods were investigated: period 1 covered 25 Aug 2005–13 Jan 2006 (autumn–winter 2005–2006), period 2 covered 13 Jan 2006–04 May 2006 (winter–spring 2006), period 3 covered 04 May 2006–08 Sep 2006 (spring–summer 2006) and period 4 covered 28 Sep 2006–08 Feb 2007 (autumn–winter 2006–2007). Two very similar periods in two consecutive years, labelled I (25 Aug 2005–09 Feb 2006) and II (08 Sep 2006–08 Feb 2007), were also analyzed. The lengths of these periods were relatively similar, ranging between 111 and 168 days.
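The stated range of period lengths can be checked directly from the dates given above. This sketch assumes that a period's length is the end date minus the start date (counting both end days would add one day to each).

```python
from datetime import date

# Simulation periods with four temperature profiles (dates as given in the text)
PERIODS = {
    "1": (date(2005, 8, 25), date(2006, 1, 13)),
    "2": (date(2006, 1, 13), date(2006, 5, 4)),
    "3": (date(2006, 5, 4), date(2006, 9, 8)),
    "4": (date(2006, 9, 28), date(2007, 2, 8)),
    "I": (date(2005, 8, 25), date(2006, 2, 9)),
    "II": (date(2006, 9, 8), date(2007, 2, 8)),
}

# Length in days (end date minus start date)
lengths = {name: (end - start).days for name, (start, end) in PERIODS.items()}
print(lengths)
print(min(lengths.values()), max(lengths.values()))  # 111 168
```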

Table 4 Results of the simulated EFs in mm d−1 per measurement point and the different simulation periods

The set of spatially distributed EFs allowed spatial heterogeneities in the EFs to be examined. Upscaling is a scientific challenge, since it must be ensured that spatial heterogeneities are sufficiently accounted for at larger scales (de Marsily et al. 2005). We performed upscaling by spatially interpolating the point measurements, which are representative on a local scale (i.e. up to several meters), onto the reach scale, which covers 19,600 m2. The 26 measurement points may therefore be insufficient to capture the full heterogeneity and magnitude of EFs in the examined region. Studies such as Schmidt et al. (2007), Lautz and Ribaudo (2012) or Wang et al. (2017), however, have indicated that spatial interpolation is a useful tool to examine and visualize spatial trends of exchange processes in the HZ.

Interpolation was performed with ordinary kriging in Esri ArcMap™ 10.1, based on variograms calculated from the EF dataset. Omnidirectional experimental variograms and variogram models were created for the simulation periods with SGeMS (Remy et al. 2009). The average experimental variogram for these periods was fitted with a spherical variogram model with a nugget of 0.63 times the variance, a sill of 1.13 times the variance and a range of 800 m (see Fig. S1 of the ESM). Plots of the spatial distribution of EFs along the Aa River are shown in Figs. 8 and 9. For better visualization, the width of the river is exaggerated by a factor of five in these figures, using Surfer 13.5.583 (64-bit), Golden Software.
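For reference, the fitted spherical model can be written as a small function of the lag distance. This is a sketch of the standard spherical semivariogram under two assumptions: that the stated sill is the total sill (nugget included), and that nugget and sill are expressed as multiples of the EF sample variance, as in the text. It does not reproduce the SGeMS or ArcMap workflow.

```python
def spherical_variogram(h, variance, nugget_ratio=0.63, sill_ratio=1.13, range_m=800.0):
    """Spherical semivariogram gamma(h) for a lag h in metres.

    Assumes the sill is the total sill (nugget included) and that both
    nugget and sill are multiples of the EF sample variance.
    """
    if h <= 0.0:
        return 0.0  # gamma(0) = 0 by definition; the nugget is a jump for h > 0
    c0 = nugget_ratio * variance
    c = sill_ratio * variance
    if h >= range_m:
        return c  # beyond the range the semivariance stays at the sill
    r = h / range_m
    return c0 + (c - c0) * (1.5 * r - 0.5 * r ** 3)


# Semivariance (in units of the variance) at a few lags
for lag in (0.0, 200.0, 400.0, 800.0, 1200.0):
    print(lag, round(spherical_variogram(lag, 1.0), 4))
```

With these parameters the nugget makes up more than half of the sill, so neighboring points a short distance apart are still only weakly correlated.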

Fig. 8

Spatial interpolation of the EFs along the Aa River for a period 1 (25 Aug 2005–13 Jan 2006), b period 2 (13 Jan 2006–04 May 2006), c period 3 (04 May 2006–08 Sep 2006) and d period 4 (28 Sep 2006–09 Feb 2007). White indicates stagnant or small EFs, while green and blue colors indicate strong EFs of more than −130 mm d−1 and red colors indicate average EFs between −30 and −90 mm d−1. The spatial pattern of the EFs varies between the four periods: absolute changes are larger in the upstream part of the river, whereas relative changes in fluxes are greater in the downstream part. The figures were created using a radial basis function for better display, and the width of the river is exaggerated five times. The green crosses indicate the measurement points, starting with 1 L, 1 and 1R downstream on the left-hand side of the subplots and ending with 14 L, 14 and 14R on the right-hand side

Fig. 9

Graphical interpretation of the spatial differences in EFs along and across the Aa River: a results of Anibas et al. (2011) and b data presented in Table 4. The banks generally deliver much higher EFs than the center of the river. While the center values are fairly similar in a and b, the bank values are higher in b than in a. At points 7, 11 and 14 the left bank shows very high fluxes, whereas at point 4 the right bank is dominant. Average values are shown

The contour plots of Fig. 8 indicate average fluxes (standard deviations in brackets) for periods 1 to 4 of −62 (15), −63 (15), −42 (11) and −90 (16) mm d−1, respectively. Period 3, which roughly covers the summer season, gave a result very similar to that reported by Anibas et al. (2011) for summer (−44 mm d−1). Multiplying the average EFs by the surface of the riverbed gives the net groundwater discharge for these periods: 14.1 × 10−3 m3 s−1 for period 1, 14.3 × 10−3 m3 s−1 for period 2, 9.5 × 10−3 m3 s−1 for period 3 and 20.4 × 10−3 m3 s−1 for period 4. The average river discharge also varied over these periods and was 1.37, 1.79, 1.28 and 2.00 m3 s−1 (Waterinfo 2011) for periods 1–4, respectively. The EFs seem to correspond with the river discharge: for periods 1 and 4 the net groundwater discharge of the reach is 1.0% of the river discharge, whereas for periods 2 and 3 the respective values are 0.8 and 0.7%. These values are higher than previously published (Anibas et al. 2011); they not only show the influence of the spring and autumn seasons on groundwater/surface-water interaction, but also underscore the significance of the river banks for the variation and magnitude of EFs.
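The conversion from mean EF to net groundwater discharge and its share of the river discharge can be reproduced with simple unit arithmetic. The sketch assumes that the riverbed area equals the 19,600 m2 reach area stated above and flips the sign so that groundwater discharge into the river is positive; the EF and discharge values are those given in the text.

```python
SECONDS_PER_DAY = 86_400.0
RIVERBED_AREA_M2 = 19_600.0  # reach area stated in the text (assumed = riverbed area)

# Period: (mean EF in mm/d, mean river discharge in m3/s), values from the text
PERIOD_DATA = {
    1: (-62.0, 1.37),
    2: (-63.0, 1.79),
    3: (-42.0, 1.28),
    4: (-90.0, 2.00),
}


def net_groundwater_discharge(ef_mm_d, area_m2=RIVERBED_AREA_M2):
    """Net groundwater discharge (m3/s) into the reach; positive = gaining."""
    # mm/d -> m/d, times area -> m3/d, divided by seconds per day -> m3/s
    return -ef_mm_d / 1000.0 * area_m2 / SECONDS_PER_DAY


results = {}
for period, (ef, q_river) in PERIOD_DATA.items():
    q_gw = net_groundwater_discharge(ef)
    share_pct = 100.0 * q_gw / q_river
    results[period] = (q_gw, share_pct)
    print(f"period {period}: {q_gw * 1e3:.1f}e-3 m3/s, {share_pct:.1f}% of river discharge")
```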

Owing to the influence of bank EFs, the temporal variation of EFs was stronger in the downstream half of the river reach; however, the downstream half has lower EFs and less spatial heterogeneity than the upstream half (Fig. 8). Measurement points 2, 3, 6 and 11 show low EFs, whereas the left bank at points 7, 11 and 14 in particular shows high fluxes. While periods 1 and 2 showed similar average EFs (−62 and −63 mm d−1), period 2 had a higher spatial heterogeneity. Period 3 showed lower EFs (−42 mm d−1) along the entire river reach. Compared with periods 1 and 2, the strongest reduction in EFs occurred at the banks between points 7 and 11, with a decrease of up to 31 mm d−1. The largest changes in EFs occurred between periods 3 and 4, when EFs rose from their lowest to their highest levels. While in the upstream half the EFs rose on average by only 36 mm d−1, in the downstream half they rose by 59 mm d−1. The bulk of this change, however, was caused by high EFs at the banks around points 4 and 7.

The spatial analysis suggests a strong influence of the bank areas. Compared with the center, the banks usually show much higher EFs (Fig. 9; Table 4 and Table S1 of the ESM). For the left bank, EFs were on average 490% higher using simulations with four temperature profiles; at points 11, 13 and 14, the difference between the center and the banks can reach almost an order of magnitude. The differences between the center and the right bank are more subtle, but the average values are still 260% higher using four temperature profiles. A paired two-sample t-test with a significance level (α) of 5% was performed to compare the EFs calculated at the banks and at the center of the river. In both cases the null hypothesis (i.e. the EFs are the same at the left/right bank and at the center) was rejected: the EFs at both the left and the right bank differ significantly from those at the center of the river.
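For readers who want to repeat such a comparison, the paired t statistic works on the point-by-point differences between the two samples (here, bank and center EFs at the same cross-section). The sketch below uses only the Python standard library and toy numbers, not the Table 4 values; it returns the statistic and degrees of freedom, which are then compared with a tabulated two-sided critical value. If SciPy is available, `scipy.stats.ttest_rel` returns the p-value directly.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired two-sample t statistic and degrees of freedom."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    d = [a - b for a, b in zip(x, y)]  # point-by-point differences
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))  # mean difference / its standard error
    return t, len(d) - 1


# Toy data only; compare |t| with the two-sided 5% critical value for df = 3 (about 3.182)
t, df = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
print(round(t, 3), df)  # 4.899 3
```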

The EFs calculated for the banks are higher than comparable values obtained by Anibas et al. (2011), whose thermal steady-state approach yielded respective values of 240% for the left bank and 150% for the right bank. These values, however, are close to those of the transient simulations using two temperature profiles. An unpaired two-sample t-test on the EFs calculated with two and with four temperature profiles showed that, for the left bank, the results based on four profiles differ significantly from those based on two profiles (significance level (α) 5%), whereas the null hypothesis (i.e. the EFs are the same for both methods) could not be rejected for either the center or the right bank. The differences may stem from the different model assumptions, both between the two transient set-ups and between the transient and steady-state models. The two-profile transient model and the steady-state approximation may both underestimate the EFs along the river banks, especially at the left bank.
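The unpaired test treats the two sets of EF estimates as independent samples. The text does not state whether equal variances were assumed; the sketch below uses the pooled-variance (Student) form with toy numbers, not the study's data. With SciPy, `scipy.stats.ttest_ind(x, y, equal_var=True)` gives the same statistic with a p-value, and `equal_var=False` switches to Welch's variant.

```python
import math
from statistics import mean, variance


def unpaired_t(x, y):
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two values")
    # Pooled sample variance weighted by the degrees of freedom of each sample
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, nx + ny - 2


# Toy data only
t, df = unpaired_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```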

In the upper half of the river reach, higher EFs were observed at the left bank, while in the lower half the right bank dominated (Fig. 9). This may be explained by the hydromorphology of the river valley relative to the direction of the regional groundwater flow: flow lines converge towards the outer banks of the river bends, which are exposed to the groundwater flow. Since more measurements were taken in the center of the river, the influence of the banks on the total EFs might be underestimated. The high values of (vertical) EFs could, however, also be biased by horizontal or lateral flows (Ferguson and Bense 2011). Either way, the bank zones should receive more attention in future studies, in which both vertical and horizontal flow components should be examined.

The simulation periods I and II cover late summer to early winter of two consecutive years (Fig. 10). While the spatial patterns look remarkably similar in the figure, the average EFs differ (standard deviations in brackets): −56 (20) mm d−1 for period I and −49 (22) mm d−1 for period II. Period II showed a stronger variation, with EFs ranging from −154 to −15 mm d−1, compared with −106 to −28 mm d−1 for period I. In the upstream half of the river reach, the average difference in flux is only 12 (20) mm d−1, but the variation of the bank EFs is strong: while the EF at point 13 was stronger in period II, the EFs around points 7, 11 and 14 were lower. The average flux difference in the downstream half is only 3 (11) mm d−1, mainly because point 7 L showed a higher value in period I and points 1 and 2 showed higher values in period II. As much as 30% of the river reach showed differences in EFs smaller than the standard deviation, 25% of it in the downstream section. The bulk of the differences in EFs between I and II is therefore due to different flux patterns in the upstream part of the river section, caused by the strong variation in bank EFs. Compared with Fig. 6, where rather uniform temporal EF variations were found along the center of the river, this again indicates a strong influence of the banks on the overall spatial EF pattern. It is assumed that different local and regional flow mechanisms influence the EFs at the center and at the banks: the center is dominated by stable, deeper regional groundwater flow, whereas the banks are influenced by shallower, more strongly fluctuating local groundwater flow. That period I shows higher fluxes than period II is supported by the Darcy-based EF calculation (Fig. 4b), which also shows higher EFs in the first summer–winter period.

Fig. 10

Spatial interpolation of the EFs along the Aa River for a period 1 (25 Aug 2005–13 Jan 2006), b period I (25 Aug 2005–9 Feb 2006) and c period II (8 Sep 2006–8 Feb 2007). White indicates stagnant or small EFs, while green and blue colors indicate strong EFs of more than −130 mm d−1 and red colors indicate average EFs between −30 and −90 mm d−1. The general spatial pattern of the EFs is relatively similar for the different periods, especially in the downstream part of the river; in the upstream half, the pattern is more heterogeneous in time and space. The figures were created using a radial basis function for better display, and the width of the river is exaggerated five times. The green crosses indicate the measurement points, starting with 1 L, 1 and 1R downstream on the left-hand side of the subplots and ending with 14 L, 14 and 14R on the right-hand side

A statistical comparison of periods 1 and I gave insight into the reliability and repeatability of the results. The overlap in simulation time of the two periods is about 5/6; assuming that the EF estimates integrate over the simulation time, a reasonable (spatial) coincidence of the two periods was expected. The two periods indeed showed a remarkably similar spatial pattern (Fig. 10). Period 1 yields higher fluxes, with −62 mm d−1, compared with −56 mm d−1 for period I. The standard deviations are 6 mm d−1 in both cases. The higher EFs of period 1 are caused mainly by higher fluxes at the downstream points 1 L, 1 and 1R and at points 7 L and 13. The close agreement shows that the applied methodology consistently delivers comparable, and hence reliable, EF estimates. To assess whether the results for the three periods 1, I and II are significantly different, paired two-sample t-tests were performed (period 1 vs. period I, period 1 vs. period II and period I vs. period II) with a significance level (α) of 5%. The null hypothesis (i.e. the Aa River has the same EF pattern in the compared periods) was not rejected for any of the combinations; the results for these three periods are therefore not significantly different.
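The overlap of about 5/6 follows from the period dates: period 1 lies entirely within period I, so the shared time is the length of period 1 relative to that of period I. The sketch assumes lengths are counted as end date minus start date.

```python
from datetime import date

period_1 = (date(2005, 8, 25), date(2006, 1, 13))
period_I = (date(2005, 8, 25), date(2006, 2, 9))


def overlap_days(a, b):
    """Number of days shared by two (start, end) periods."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max((end - start).days, 0)  # zero if the periods do not overlap


shared = overlap_days(period_1, period_I)
length_I = (period_I[1] - period_I[0]).days
print(shared, length_I, round(shared / length_I, 2))  # 141 168 0.84
```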

Conclusions

A large hyporheic zone temperature dataset has been reused in a transient thermal model to increase the output information

The presented data set collected in the Aa River covers more than 1.5 years and, to the authors' knowledge, represents the longest investigation of hyporheic EFs using the thermal tracer method. Parts of the data set were studied earlier (Anibas et al. 2008, 2009, 2011); in this study, adapted tools and models are used to show that the dataset can deliver additional information. The thermal steady-state approach of Anibas et al. (2011) does not allow EFs to be quantified for periods such as autumn or spring; this is only possible with transient models such as the one presented here. On the other hand, the dataset is insufficient for advanced models such as 1DTempPro or LPML, which need time series rather than point-in-time data as input. The transient STRIVE model made it possible to exploit more of the temperature information than previously published. EFs were obtained for any period of the year. In addition, the transient thermal model could calculate model quality parameters.

The transient model delivers consistent and reliable results

In general, the obtained RMSEs are fairly low, mostly smaller than the accuracy of the temperature sensor used. A comparison of overlapping simulation periods from independent model runs showed that the applied methodology delivers statistically similar EF estimates, and no statistical evidence of year-to-year changes in EFs was found. While clearly superior in temporal resolution, the transient model, like the steady-state assumption, may not be completely independent of seasonal transient temperature influences. The measured riverbed temperature gradients did not directly correlate with the RMSE; however, in times of low vertical temperature gradients (e.g. autumn or spring), a given RMSE value may correspond to a weaker model fit than a similar RMSE value during times of strong temperature gradients (e.g. winter and summer). Table 5 highlights the most important features of both the steady-state assumption and the transient model.

Table 5 Comparison of the main advantages and limitations of the transient model and the steady-state approximation in STRIVE

Transient model using two temperature profiles for the investigation of temporal EF variation

The use of two measured temperature profiles is the minimum requirement for applying the transient model. This setup allows the best temporal resolution and is suitable for investigating temporal changes when the highest model quality is not required. However, for simulations where strong fluxes were expected, such as at the river banks, this approach mostly did not lead to meaningful results. Time series measurements at the locations of the bank temperature profiles could increase the quality of these EF estimates as well as the temporal resolution of the output. Measuring a large number of such time series, however, would require additional resources beyond those of most field studies.

Transient model using four temperature profiles for the investigation of spatial EF variation

Modeling with four (or more) temperature profiles expands the input information for the model and decreases the influence of individual measurement errors. At the cost of temporal resolution, this approach gave results where the two-profile analysis failed. The main strength of the analysis with four profiles in general, and of the transient model in particular, is the possibility to analyze spatially distributed EF data.

River bank EFs play a pivotal role in groundwater/surface-water interaction in the Aa River

The transient model consistently calculated higher EFs for the bank areas than observed in earlier studies. The results therefore indicate a fairly strong influence of the bank EFs on the spatial and temporal EF variation along the river reach. Since the measurement points at the banks were underrepresented in the spatial survey of riverbed temperatures, the presented results probably still underestimate this influence. The bank areas therefore should receive special attention in future studies. The bank EFs might be influenced by horizontal and lateral flow components as well; such flows should also be considered.

Different mechanisms determine EFs at the center and at the river banks

The center of the river is dominated by relatively stable EFs. Here, the presented analysis indicates EFs very similar to the steady-state simulations of Anibas et al. (2011). These temporally stable EFs emerge from deeper, more regional groundwater flow. The banks, on the other hand, are influenced by shallower local groundwater flow that fluctuates more strongly in space and time.

Changes in the direction of flow from gaining to losing

In earlier studies (Anibas et al. 2008, 2011) it was not possible to reliably quantify losing conditions in the Aa River reach. Here it was shown that parts of the downstream section of the river change their flow direction from gaining to losing in spring. Such changes in the direction of flow are of ecohydrological importance and should be a focal point for future investigations. Long-term observations should especially cover the first 8 months of the year, beginning with the strong winter EFs and focusing on the periods of reduced EFs in spring and their subsequent rise in summer.

Combining hydraulic head and temperature information in reach-scale groundwater/surface-water exchange studies

For any groundwater/surface-water exchange study, the combined collection of time series of river hydraulic head and temperature together with riverbed temperature profiles is advised. Combining hydraulic head and temperature information can help to establish groundwater flow and ecosystem models on a reach scale. Such models could also help to investigate potential horizontal EFs near the banks. To identify the relevant riverbed heterogeneities along and across the river reach, a better estimation of the hydraulic and thermal parameters of the riverbed is advised, e.g. through the systematic application of slug tests and soil sampling.
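Once heads and hydraulic conductivity are available, a vertical Darcy flux gives an independent EF estimate to set against the thermal results. The sketch below is a generic Darcy calculation, not the procedure behind Fig. 4b: it takes z positive downward so that upward (exfiltrating) flow comes out negative, matching the sign convention of the EFs, and the conductivity and head values in the example are hypothetical.

```python
SECONDS_PER_DAY = 86_400.0


def darcy_vertical_flux_mm_d(k_m_s, head_top_m, head_bottom_m, dz_m):
    """Vertical Darcy flux q = -K * dh/dz with z positive downward, in mm/d.

    head_top_m: river stage; head_bottom_m: head at depth dz_m below the riverbed.
    Negative results mean upward flow (exfiltration), as for the EFs in the text.
    """
    gradient = (head_bottom_m - head_top_m) / dz_m  # dh/dz with z downward
    q_m_s = -k_m_s * gradient
    return q_m_s * 1000.0 * SECONDS_PER_DAY  # m/s -> mm/d


# Hypothetical values: K = 1e-5 m/s, head 5 cm higher at 0.5 m depth than in the river
print(round(darcy_vertical_flux_mm_d(1e-5, 0.0, 0.05, 0.5), 1))  # -86.4
```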