Introduction

Recent advances in simulating water transportation using physically-based process-oriented fully distributed models are deepening the understanding of hydrological processes by considering multiple water regimes (e.g., surface water, vadose zone water, groundwater) as an integrated system (e.g., Spanoudaki et al. 2009; Brunner and Simmons 2012; Barthel and Banzhaf 2016; Berg and Sudicky 2019). The fully coupled models can eliminate the interfaces between separate model modules and avoid problems related to theories and software packages for different water regimes (Brunner and Simmons 2012; Barthel and Banzhaf 2016). These approaches have been successfully applied over a wide range of watersheds to characterize: stream flow processes (e.g., VanderKwaak and Loague 2001; Frei et al. 2010; Weill et al. 2013); interaction between surface water and groundwater (e.g., Kollet and Maxwell 2006; Shen and Phanikumar 2010; Ala-aho et al. 2015; Chen et al. 2019; Boubacar et al. 2020); spatiotemporal distribution of contaminated sediments (Mori et al. 2015; Kitamura et al. 2016; Sakuma et al. 2017, 2018); rainfall responses to surface and subsurface flow systems (Li et al. 2008; Miles and Novakowski 2016); water and heat transport processes at the river–groundwater interface (Munz et al. 2017); groundwater residence time distributions (Kollet and Maxwell 2008; Maxwell et al. 2016; Jing et al. 2018; Wilusz et al. 2020); climate change impacts on hydrological processes (Goderniaux et al. 2009; Sudicky 2013; Davison et al. 2018; Erler et al. 2019); artificial recharge (Maples and Fogg 2019); coseismic hydrological changes (Hosono et al. 2019; Tawara et al. 2020); and hydrological cycles (Davison et al. 2018; Hwang et al. 2018; Chen et al. 2019).

Despite significant progress in integrated watershed modeling techniques (Berg and Sudicky 2019), there is an argument about the credibility of the models since, for the majority, model calibrations and validations relied solely on use of hydrographs (McDonnell and Beven 2014; McGuire and McDonnell 2015; Schilling et al. 2019). The hydrograph alone, however, cannot explicitly explore the whole catchment dynamics such as the origin of water, water ages, flow path, storage distribution, and solute transport (Birkel and Soulsby 2015; van Huijgevoort et al. 2016a). Incorporating tracer data is thus critically needed (McDonnell and Beven 2014; Birkel and Soulsby 2015; McGuire and McDonnell 2015; Knighton et al. 2017; Scheliga et al. 2019; Schilling et al. 2019) to build realistic models to explain catchment dynamics explicitly. For this reason, some recent studies (e.g., Fenicia et al. 2010; Birkel et al. 2010, 2011; Davies et al. 2011, 2013; Hrachowitz et al. 2013; Beven and Davies 2015) incorporated tracer data into rainfall-runoff models for characterizing sources of runoff, storage, water ages, and flow paths together with hydrometric data.

Although tracer-aided rainfall-runoff models represent a step towards an enhanced representation of the hydrologic cycle (Knighton et al. 2017; Schilling et al. 2019; Birkel et al. 2020), most of the tracer-aided models are lumped/semilumped conceptual models and applied in small catchments (Hrachowitz et al. 2013; McGuire and McDonnell 2015; Smith et al. 2016; Ala-aho et al. 2017; Birkel et al. 2020). For instance, Van Huijgevoort et al. (2016a, b) and Ala-aho et al. (2017) developed tracer-aided rainfall-runoff parsimonious models for simulating flux, storage, age, and mixing processes of waters in three small catchments (0.5–3.2 km2) in Scotland, UK. Dehaspe et al. (2018) applied the same model for characterizing water and isotope transport processes in a small catchment (3.2 km2) in Costa Rica. Kuppel et al. (2018) developed a fully distributed, physically-based ecohydrological model for characterizing water isotopes (δ18O and δ2H) and age tracking across the hydro-pedological units in a small Scottish catchment (3.2 km2). Piovano et al. (2019) characterized the runoff and water storage using a tracer-aided model in Granger Basin, Canada (7.8 km2). Birkel et al. (2020) developed a semidistributed tracer-aided model for Howard River (126 km2) catchment in Australia. However, their applications are still limited in hydrological modeling mainly due to a lack of long-term measured tracer data (McDonnell and Beven 2014; Birkel and Soulsby 2015) and model complexity in terms of parameterization and computational challenges (Dunn et al. 2010; McDonnell and Beven 2014; Birkel and Soulsby 2015, 2016; Schilling et al. 2019). In addition, an application of these models has not been expanded on a regional scale, although its development is important for conducting reasonable water management at this scale (Rassam et al. 2013; Barthel and Banzhaf 2016). 
Hence, incorporating multiple tracers into this type of model is a necessary task for better characterizing catchment water dynamics on a regional scale.

The main challenge for developing such a model is the collection of long-term tracer measurement data over a regional scale. The model domain of this study addresses regional groundwater flow systems in Kumamoto in the central part of Kyushu Island in southern Japan (Fig. 1); several studies (Taniguchi et al. 2003; Hosono et al. 2013, 2014, 2020; Hossain et al. 2016a, b; Zeng et al. 2016; Kagabu et al. 2017; Okumura et al. 2018) have characterized the groundwater age, flow dynamics, and biogeochemical processes there using several tracers, including groundwater age tracers such as sulfur hexafluoride (SF6), tritium (3H), chlorofluorocarbons (CFCs) and krypton (85Kr), stable isotope ratios of water molecules (δD and δ18O), and aquifer temperature profiles. These studies demonstrated that 3H and 85Kr concentrations are the most useful groundwater age tracers, while SF6 and CFCs cannot be used for proper age determination due to contamination effects (Kagabu et al. 2017; Ide et al. 2020). In addition, δD and δ18O do not vary as significantly in time and space as the former tracers (3H and 85Kr) do, owing to the small variations in altitude over the study area. Moreover, repeated measurements of borehole temperature profiles have been reported for this area (Taniguchi et al. 2003; Miyakoshi et al. 2020). The accumulation of these datasets provides an excellent opportunity to develop a multiple-tracer-aided model on a regional scale in Kumamoto, Japan.

Fig. 1 Study area with locations of the meteorological station, precipitation gauge stations, groundwater monitoring wells, river discharge monitoring stations, sampling points for isotopic measurements, and groundwater temperature measuring wells

The aim of this study was thus to develop a robust model for improving the understanding of catchment dynamics by bridging the physically-based process-oriented fully distributed model and multiple tracers (3H, 85Kr, and groundwater temperature) on a regional scale. The novel contribution of this study is to calibrate a model using multiple tracers together with hydrometric data for simulating the groundwater age. Before simulating groundwater age, the model structure was established through trial-and-error procedures for an area where delineation of the depth of the hydrogeological basement remains uncertain due to absence of impermeable stratum. Based on obtained simulation results, this study explicitly explains the catchment dynamics of the active groundwater flow systems of Kumamoto.

Study area, materials and methods

This study seamlessly simulated surface and subsurface flows, materials (3H and 85Kr) and heat transport, and groundwater age using the GETFLOWS (General-purpose Terrestrial FLuidflOW Simulator) simulation code (e.g., Tosaka et al. 2000, 2010; Itoh et al. 2000; Mori et al. 2015; Tawara et al. 2020). Details of this simulation code (governing equations, verifications and validations) are provided in the Appendix and electronic supplementary material (ESM). Brief descriptions of the study area, prevailing climatic conditions, geological features, hydrogeological settings, and the data used for this study are presented in the following sections, while the steps followed for model development are explained in sections ‘Numerical modeling’ and ‘Performance evaluation’.

Study area

The Kumamoto region is located in the central part of Kyushu Island in the southern part of Japan (Fig. 1). Geographically, the study area extends from 32°40′N to 33°N latitude and 130°30′E to 131°05′E longitude, and it covers an area of about 2,689 km2. Kumamoto is one of the largest groundwater utilization regions in Japan. About one million city dwellers depend entirely on groundwater resources for their domestic and drinking purposes (Oshima 2010; Shimada 2012; Taniguchi et al. 2019). Approximately 8 × 10^7 m3 of groundwater per year is withdrawn from 58 pump stations to meet the water demand (Hosono et al. 2013).

Topography and climate

The landscape of the study area is diverse and includes mountains, highlands, and lowland areas that are open to the Ariake Sea (Figs. 1 and 2a). The study area covers three major river basins, Kikuchi, Shira, and Midori River basins, along with other small watersheds. The area covers the Aso caldera volcanic mountains (1,592 m above mean sea level, amsl). There are three main highlands, Kikuchi, Takayubaru, and Ueki highlands (Fig. 1) which were formed during the last volcanic eruptions of Mt. Aso and have altitude ranging from 100 to 200 m (Miyamoto et al. 1962). The elevation becomes lower westwards in the plain area (elevation <8 m amsl) and towards the Ariake Sea (Fig. 2a).

Fig. 2 Map showing 3D-model meshes with (a) digital elevation at 10 m resolution and (b) geological distribution of the Kumamoto region (modified after Hosono et al. 2019)

The study area is characterized by a warm and humid climate. There are 40 precipitation gauge stations and one meteorological station within the area (Fig. 1). The average annual rainfall for the period of 1956–2007 varied between 1,723 mm (in the southeast) and 4,135 mm (in the northeast) with an average of 2,551 mm. Almost 40% of annual precipitation occurs during the rainy season (June and July), and precipitation is the main source of groundwater recharge in this area. The annual average temperature of Kumamoto meteorological station (Fig. 1) during the period of 1902 to 2017 ranged from 14.7 °C (recorded in 1917) to 18.2 °C (recorded in 1998) with an average of 16.1 °C. Further discussion on the climate of the study area can be found in Rahman et al. (2020a).

Geology

The geological units of the study area can broadly be divided into four groups (Hosono et al. 2020): the geological basement of Paleozoic metamorphic and metasedimentary rocks, Tertiary-Quaternary Pre-Aso volcanic rocks, the Quaternary Aso volcanic rocks (Ono and Watanabe 1985; Miyoshi et al. 2009), and alluvium deposits (Fig. 2b and Fig. S1 of the ESM). The Pre-Aso volcanic rocks crop out at the surface around the western foothill of Mt. Aso and at Mts. Kinpo and Tatsuta. These older volcanic rocks are the major constituents of the hydrogeological basement (Fig. S1 of the ESM). The younger Aso volcanic rocks are distributed over wide areas and are composed mainly of lava flows and pyroclastic flow deposits (Ono and Watanabe 1985), overlain by alluvium and, along the coast, by the Ariake marine clays that were formed during the last transgression (Fig. 2b and Fig. S1 of the ESM). The Aso volcanic rocks are subdivided into four units based on the major volcanic eruption cycles: Aso-1, Aso-2, Aso-3, and Aso-4, from older to younger (Ono and Watanabe 1985; Miyoshi et al. 2009). Of these, Aso-2 is a highly permeable formation as it consists of andesitic lava with joints and many porous structures (Hosono et al. 2013). Impermeable lacustrine sediments, the Futa and Hanafusa layers, lie between Aso-3 and Aso-4 (Fig. S1 of the ESM).

Hydrogeology

The aquifers in the study area are separated by the Futa and Hanafusa impermeable sediment layers into unconfined (named as the first aquifer) and semiconfined to confined aquifers (named as the second aquifer; Fig. S1 of the ESM; Taniguchi et al. 2003; Hosono et al. 2013, 2019, 2020; Rahman et al. 2020b). The first aquifer consists of unwelded Aso-4 pyroclastic deposits and alluvium sedimentary deposits, and the depth of this aquifer varies between a few meters to 90 m. The second aquifer is comprised of Aso-1 to Aso-3 pyroclastic flow deposits. The second aquifer is known as a deep aquifer, and its depth varies between 20 and 250 m. The groundwater in the study area is withdrawn mainly from the second aquifer. A detailed hydrogeological description is provided elsewhere (Hosono et al. 2013, 2019, 2020). There are two major groundwater flow lines, A–A′ and B–B′, which are further divided into four zones, recharge, mixed, discharge, and stagnant zones (Fig. S1 of the ESM) (Hosono et al. 2013, 2014). The cross-sectional views along these two major flow lines are shown in Fig. S1 of the ESM. The Kikuchi, Ueki, and Takayubaru highlands (Fig. 1) are the major groundwater recharge zones. Infiltration of precipitation is the primary source of groundwater recharge. In addition, groundwater recharge also occurs from rivers and the artificial ponding of paddy fields near the mid-stream of the Shira River in groundwater recharge and mixed zones (Hosono et al. 2013; Taniguchi et al. 2019). In these areas, the soil infiltration capacity is very high (30–500 mm/day; Kiriyama and Ichikawa 2004; Takemori and Ichikawa 2007), and it accounts for almost one-third of the total groundwater recharge (Shimada 2012). In contrast, the majority of groundwater discharge occurs at the Ezu Lake in the plain area (Fig. 1 and Fig. S1 of the ESM).

Data source

The model generally requires areal property data such as surface and subsurface geology, meteorology, land use, and land cover. The seamless digital surface geological map of Japan (Geological Survey of Japan 2009) was used for mapping the surface geology (Fig. 2b). The digital elevation map (Fig. 2a) with a 10 m resolution reported by the Geographical Survey Institute of Japan was used for the modeling. Land use and land cover maps (Fig. S2 of the ESM) provided by the Ministry of Land, Infrastructure, Transport, and Tourism (MLIT) of Japan were used to estimate Manning’s roughness coefficient (Table S1 of the ESM), and land-surface hydraulic conductivity (LSHC) values were also collected from MLIT (Table S2 of the ESM). The long-term (1957–2006) streamflow discharge data from 21 monitoring stations and groundwater level data (for the period of 1976–2006) from 43 monitoring stations (see Fig. 1 for their locations) were collected from the MLIT database. The historical water use data such as river intake, paddy irrigation, and groundwater pumping were estimated by the Kumamoto City government for the same period. Precipitation and snow melt data (1956–2007) from 41 monitoring stations (Fig. 1) were collected from the Japan Meteorological Agency (2020) and MLIT (2020). A Thiessen polygon map was produced to show the distribution of precipitation in the area (Fig. S3 of the ESM). Potential evapotranspiration was estimated by the Thornthwaite method (Thornthwaite 1948) using the daily average temperature data of Kumamoto weather station from 1956 to 2007, with the effect of altitude on temperature accounted for by a factor of −0.0059 °C/m. A three-dimensional (3D) geological map of the study area (Fig. 2b) was prepared based on subsurface geological information from geological maps and borehole data reported in domestic reports.
The hydraulic properties such as porosity and hydraulic conductivity were estimated from pumping test data reported in domestic reports. The multiphase flow properties such as intrinsic permeability, relative permeability, and capillary pressure were estimated from well test data, which are reported in the domestic reports. Further detail about the multiphase flow parameters for this study area can be found in Tawara et al. (2020).
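
The potential evapotranspiration step described above can be illustrated with a minimal sketch. It assumes the standard monthly form of the Thornthwaite (1948) equation with 12-h days unless day lengths are supplied, whereas this study worked from daily average temperatures, so the function names, the monthly aggregation, and the station elevation argument are illustrative assumptions rather than the procedure actually implemented. Only the altitude factor of −0.0059 °C/m is taken from the text.

```python
# Assumed factor for the altitude effect on temperature, taken from the text.
LAPSE_RATE_C_PER_M = -0.0059

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def adjust_temperature(t_station_c, elevation_m, station_elevation_m):
    """Shift a station temperature to another elevation with a linear lapse rate."""
    return t_station_c + LAPSE_RATE_C_PER_M * (elevation_m - station_elevation_m)


def thornthwaite_monthly_pet(monthly_temps_c, day_length_hours=None):
    """Monthly potential evapotranspiration (mm/month), standard Thornthwaite form.

    monthly_temps_c: 12 mean monthly air temperatures (deg C).
    day_length_hours: optional 12 mean day lengths; 12 h is assumed if omitted.
    """
    if len(monthly_temps_c) != 12:
        raise ValueError("expected 12 monthly temperatures")
    if day_length_hours is None:
        day_length_hours = [12.0] * 12

    # Annual heat index: only months with positive temperature contribute.
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temps_c if t > 0)
    if heat_index == 0:
        return [0.0] * 12

    exponent = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
                + 1.792e-2 * heat_index + 0.49239)

    pet = []
    for t, hours, days in zip(monthly_temps_c, day_length_hours, DAYS_IN_MONTH):
        if t <= 0:
            pet.append(0.0)
        else:
            unadjusted = 16.0 * (10.0 * t / heat_index) ** exponent
            pet.append(unadjusted * (hours / 12.0) * (days / 30.0))
    return pet
```

For a grid cell lying 1,000 m above the station, adjust_temperature lowers the station temperature by 5.9 °C before the PET calculation; applying the correction cell by cell gives a spatially variable PET field from a single temperature record.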

There is a comprehensive temporal tracer data set for the study area; 47 vertical profiles of borehole groundwater temperature data were used (see Fig. 1 for their locations). Most of the stations have groundwater temperature records at a very fine interval (1 m) from 1 to 221 m in depth, repeatedly measured by previous researchers (Shimano et al. 1992 for records during 1986–1988; Taniguchi et al. 2003 for records during 2001; Uchida et al., Geological Survey of Japan, personal communication, 2017, for records during 2009–2010; and Ikawa et al., Geological Survey of Japan, personal communication, 2017, for records during 2011; Miyakoshi et al. 2020). This study also compiled historical 3H concentration data from 26 measuring stations measured by the Water Resources Development Study Group (1975), Kumamoto Prefecture and Kumamoto City (1986, 1995) and Yamaguchi (2010). Kagabu et al. (2017) measured groundwater 85Kr concentrations in nine wells, and these data were used for validating the model developed in this study. Long-term (1940–2013) atmospheric 85Kr and 3H concentrations, as reported by Kagabu et al. (2017), were also used as model input parameters.

Numerical modeling

GETFLOWS simulation code was used for fully coupled simulation of hydrological processes in Kumamoto region. GETFLOWS is a physically based, process-oriented fully distributed and multi-phase flow watershed modeling simulator (e.g., Tosaka et al. 2000; Mori et al. 2015). The GETFLOWS simulator has already been verified for several domains using analytical solutions, experimental study, and intercomparisons with other numerical codes (e.g., Tosaka et al. 1996, 2000; Itoh et al. 2000; Mori et al. 2015; Kitamura et al. 2016; Sakuma et al. 2017, 2018). Other (unpublished) verifications were also performed for simultaneous transport of heat and water for relevant cases, mainly for the subsurface domains. The details regarding verification and validation (V & V) of GETFLOWS are provided in the Appendix and ESM (Sections S2-S8 of the ESM). The procedures applied for developing the numerical models are presented in the following sections.

Model domain discretization

The integral finite difference method (IFDM) was applied for spatial discretization of the model domain in the GETFLOWS system (Mori et al. 2015). The computational fields were discretized into arbitrarily deformed hexahedral grid blocks (Fig. 2) to adapt to the geological heterogeneity, river networks and complex topography of this study area (Fig. 1). The surface model domain was discretized into 33,274 grids (Fig. 2) with horizontal resolution varying between 100 and 500 m, considering surface topographical and land-use features. Moreover, the grid block system was refined to match the curvature and width of the main river channel. This ensures that no empirical parameters are necessary for connecting or disconnecting the watershed components like rivers, slope, subsurface soils, etc., and reduces the computational burden (Mori et al. 2015). The subsurface in the vertical direction was discretized into 28 layers with varying thicknesses; hence, the total number of analysis grids (998,220) was about one million.

Boundary conditions of the model

No-flow boundary conditions were assigned at the borders in contact with land (Fig. 1) and at the bottom of the model, which was set to 2,000 m below mean sea level. Active groundwater flow is assumed to be negligible below this depth. Furthermore, a zero-meter constant head boundary was assigned for the boundary between the model and the sea area (Ariake Sea; Fig. 1). The seaward boundary of the model domain is approximately 20 km from the coastline and hence does not inhibit groundwater and surface water flowing from the land into the sea. One of the main advantages of the GETFLOWS simulator for the fully coupled model is that it is not necessary to apply any a priori assumptions such as the first-order exchange coefficient or interfacial boundary conditions, which are mandatory for many existing simulators such as HydroGeoSphere (Therrien et al. 2010) and ParFlow (Kollet and Maxwell 2006). In other words, there is no need to assign boundary conditions for groundwater recharge and infiltration rates since this simulator can explicitly compute the interactions between air and water phases (Tosaka et al. 2000; Mori et al. 2015; Tawara et al. 2020). To enable the fully coupled simulation, two layers (an atmospheric layer and a surface layer) were placed over the subsurface layers (K = 3, 4…, 29, 30) for coupling the fluid flows between surface and subsurface water regimes (Fig. S13 of the ESM). Constant atmospheric pressure and temperature were assigned as boundary conditions in the atmospheric layer, K = 1. Precipitation and potential evapotranspiration were assigned as boundary conditions in the surface layer, K = 2. A large value was given for the porosity and permeability of the uppermost layer (K = 1), whereas the effective porosity of the surface layer (K = 2) was set to 1.
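
The layer indexing described above can be summarized in a short sketch. It assumes that every layer contains the same 33,274 grid columns, which is consistent with the reported total of 998,220 analysis grids; the function and constant names are illustrative and do not correspond to GETFLOWS input keywords.

```python
N_SURFACE_GRIDS = 33_274      # horizontal grid columns reported in the text
N_SUBSURFACE_LAYERS = 28      # subsurface layers, K = 3 ... 30
N_COUPLING_LAYERS = 2         # atmospheric (K = 1) and surface (K = 2) layers


def layer_role(k):
    """Return the role of vertical layer index K (1-based) in the model stack."""
    if k == 1:
        return "atmospheric: constant pressure and temperature"
    if k == 2:
        return "surface: precipitation and potential evapotranspiration"
    if 3 <= k <= N_COUPLING_LAYERS + N_SUBSURFACE_LAYERS:
        return "subsurface"
    raise ValueError("K must lie between 1 and 30")


def total_grid_count():
    """Total analysis grids, assuming identical columns in every layer."""
    return N_SURFACE_GRIDS * (N_SUBSURFACE_LAYERS + N_COUPLING_LAYERS)
```

Under this assumption the 28 subsurface layers plus the two coupling layers give 30 layers in total, and 33,274 × 30 reproduces the reported grid count.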

For transport of materials (85Kr and 3H), the same boundaries as the groundwater boundaries stated above were applied, and the process was treated as simultaneous transport of water and materials. Furthermore, the material transport parameters (diffusion coefficient and decay constant) were used for simulating the behavior along the water flows (Table S5 of the ESM), while historical atmospheric concentrations of 85Kr and 3H in rainfall were assigned to the surface layer. In addition, the seamless simulation of groundwater and heat transport processes was performed following Domenico and Palciauskas (1973). A detailed description of the thermal boundary conditions is provided in the ESM (Section S8 of the ESM). In brief, a constant heat boundary condition (the long-term average temperature of Kumamoto meteorological station) was assigned to the surface layer of the model. Heat insulation boundaries were assigned at the borders in contact with land, and a constant heat boundary (107 °C; Dong 2014) was set at the lower surface of the model. The initial geothermal gradient was set to 4.5 °C/100 m, and the heat transport parameters such as heat capacity and thermal conductivity (Dong 2014) were treated as fixed values in this simulation. Dong (2014) conducted a field survey and used one-dimensional (1D) and two-dimensional (2D) analytical solutions and 3D heat transport modeling to assess the specific heat and thermal conductivity values for different geological layers in the Kumamoto region.
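
The role of the decay constant in the material transport can be illustrated with a minimal sketch. The half-lives below are commonly cited values (about 12.32 years for 3H and 10.76 years for 85Kr) and are an assumption here; the decay constants actually used in the model are those listed in Table S5 of the ESM. The sketch also assumes simple piston flow, with no dispersion or mixing, which the full model does account for.

```python
import math

# Commonly cited half-lives in years (assumed values, not taken from Table S5).
HALF_LIFE_YEARS = {"3H": 12.32, "85Kr": 10.76}


def decay_constant(half_life_years):
    """First-order decay constant (1/yr) from a half-life in years."""
    return math.log(2.0) / half_life_years


def decayed_concentration(input_concentration, travel_time_years, tracer):
    """Concentration remaining after a given travel time under piston flow."""
    lam = decay_constant(HALF_LIFE_YEARS[tracer])
    return input_concentration * math.exp(-lam * travel_time_years)
```

After one half-life of travel, decayed_concentration returns half of the input concentration; comparing such decayed values with the historical atmospheric input is what makes 3H and 85Kr informative about groundwater age.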

The model parameters, such as Manning’s roughness coefficient (Table S1 of the ESM), land-surface hydraulic conductivity (Table S2 of the ESM), hydraulic conductivity and effective porosity (Table S3 of the ESM), thermal properties (Table S4 of the ESM), and material transport parameters (Table S5 of the ESM) are provided in the ESM.

Groundwater abstractions

There are 207 production wells in the study area (Fig. S4 of the ESM). The historical daily groundwater abstraction data estimated by the Kumamoto City government were used for modeling purposes. GETFLOWS incorporates the well index approach proposed by Peaceman (1983) to address the impact of groundwater pumping in the model. The Peaceman well index approach is derived for single-phase flow in anisotropic homogeneous ground and is applied to estimate the groundwater production amount at a given pressure within a well shaft located in a coarse grid. This approach provides a good approximation for a grid containing a production well, provided that the grid size is larger than the production well diameter and there is no mutual interference with the boundary conditions.
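
The well index approach can be sketched as follows, using the equivalent-radius expression of Peaceman (1983) for a rectangular grid block with anisotropic horizontal conductivity. The head-based form of the rate equation, the function names, and the optional skin factor are assumptions made for illustration; the exact formulation inside GETFLOWS is not reproduced here.

```python
import math


def peaceman_equivalent_radius(dx, dy, kx, ky):
    """Equivalent well-block radius (same length unit as dx, dy), Peaceman (1983)."""
    ratio_yx = ky / kx
    ratio_xy = kx / ky
    numerator = math.sqrt(math.sqrt(ratio_yx) * dx ** 2
                          + math.sqrt(ratio_xy) * dy ** 2)
    denominator = ratio_yx ** 0.25 + ratio_xy ** 0.25
    return 0.28 * numerator / denominator


def well_index(dx, dy, thickness, kx, ky, well_radius, skin=0.0):
    """Well index linking the pumping rate to the block-to-well head difference.

    With hydraulic conductivities in m/s and lengths in m, the result is in
    m2/s, so that rate = well_index * (block_head - well_head) in m3/s.
    """
    r_eq = peaceman_equivalent_radius(dx, dy, kx, ky)
    if r_eq <= well_radius:
        raise ValueError("grid block is too small relative to the well radius")
    return (2.0 * math.pi * math.sqrt(kx * ky) * thickness
            / (math.log(r_eq / well_radius) + skin))
```

For an isotropic square block the equivalent radius reduces to about 0.198 times the grid spacing, which is why the approximation requires the grid to be much larger than the well diameter.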

Model calibration and two different model settings

The model calibrations were performed by trial and error. The model performance was first assessed by comparing observed and simulated values of river discharges and groundwater levels. During the calibration process, the hydrogeological parameters of the aquifers (Table S3 of the ESM) were first adjusted within the range of values reported in the domestic borehole completion reports and previous relevant studies for this area (Shimada et al. 2012; Ichiyanagi et al. 2012; Hosono et al. 2019). These adjustments confirmed that hydraulic conductivity is the parameter of greatest practical importance, whereas other hydrogeological parameters, such as porosity, were not sensitive enough to be useful for calibration in this study. Thus, hydraulic conductivity values were calibrated for each unit, and the performance of the model was checked by assessing the fit between simulated and observed river discharge and groundwater levels while also taking account of the multiple tracer data. This trial-and-error process was repeated until the best fit between the simulated and observed results was obtained.

Through the aforementioned calibration, a model was first established following preexisting geological maps, assuming that each geological unit has its own particular hydrogeological properties; it was thus assumed that there is no active groundwater flow system in the Pre-Aso volcanic hydrogeological basement rocks beneath the second aquifer (Shimada et al. 2012; Ichiyanagi et al. 2012; Fig. S5a,c of the ESM). This model is called ‘model-1’. Thereafter, another model called ‘model-2’ was established. Model-2 follows nearly the same geological distributions and thus has hydrometric conditions similar to those of model-1. However, a major change was applied in model-2 with respect to the bottom surface of the second aquifer, which was set 100 m deeper than in model-1 (Table S3 and Fig. S5b,d of the ESM). This model structure is based on the new results from a recent boring survey for deeper groundwater exploration (Nakayama et al. 2019). This recent survey confirmed the existence of a permeable layer in the upper part of the previously defined hydrogeological basement, i.e., the Pre-Aso volcanic rocks. Nakayama et al. (2019) documented several groundwater production wells penetrating to a depth of about 100 m into the Pre-Aso volcanic rocks, most of which are continuously operated for groundwater abstraction. Hence, the survey clarified that there is a groundwater flow system in the upper part of the Pre-Aso volcanic rocks; Nakayama et al. (2019) called this the ‘third aquifer’, although the thickness of this deepest aquifer is not clearly defined from their field observations.

After performing calibrations with trial-and-error procedures, the hydraulic conductivity for the new aquifer unit was defined (Table S3 of the ESM), with properties similar to those obtained from model-1 for the aquifer units. In total, more than 100 variable sets were tested through the trial-and-error process used for constructing both models. Although the hydraulic conductivity values were generally similar for both models, the values for the second aquifer (Aso-1 to 3, pyroclastic flow deposit) in model-2 were slightly lower than the values obtained for model-1 (Table S3 of the ESM). For instance, the obtained hydraulic conductivity values for the second aquifer were between 1 × 10^−5 and 1 × 10^0 cm/s for model-1 and between 1 × 10^−5 and 2.5 × 10^−1 cm/s for model-2. In addition, the adjusted hydraulic conductivity value for the aquifer unit Togawa lava was slightly lower in model-2 than in model-1 (Table S3 of the ESM). All calibrated values fall within the ranges given in previous reports for the study area (Shimada et al. 2012; Ichiyanagi et al. 2012; Hosono et al. 2019).

For both models, steady-state simulations were performed to create the initial conditions representative of long-term (1976–2006) average groundwater levels, stream flow discharge, exchange flux and heat. Thereafter, daily time transient simulations for the validation period (2002–2006) were performed by both models using the daily meteorological data, tracer data and water use data.

Groundwater age simulation

Groundwater age simulation is often performed by particle tracking approaches or by the ‘age mass’ concept. Particle tracking approaches generally consider only the advective motion of groundwater and ignore mixing and dispersion processes (Cordes and Kinzelbach 1992; Varni and Carrera 1998; Cornaton and Perrochet 2006). However, some studies have also documented that 3D backward particle tracking approaches (Weissmann et al. 2002; Zhang et al. 2018) that consider dispersion and mixing of groundwater provide reliable groundwater ages in regional-scale aquifer systems. In this study, the mean groundwater age simulation was performed following the age mass concept (Goode 1996). The main governing equation for estimation of mean groundwater age (equivalent to mean transit time) is derived from mass conservation principles and includes advection, dispersion, and mixing of water (Stolp et al. 2010; McCallum et al. 2015). Detailed methods are given by Goode (1996). This approach can better infer the spatial distribution of mean groundwater age than general particle tracking approaches (Cordes and Kinzelbach 1992; Varni and Carrera 1998; Cornaton and Perrochet 2006).
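
The age mass concept can be illustrated with a one-dimensional steady-state sketch, in which mean age A satisfies v dA/dx − D d²A/dx² = 1 with A = 0 at the recharge boundary. This is a simplified form of the Goode (1996) equation for uniform velocity v and dispersion coefficient D; the upwind finite-difference scheme, the zero dispersive flux at the outlet, and all names are assumptions made for illustration and do not represent the 3D solver in GETFLOWS.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    solution = [0.0] * n
    solution[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        solution[i] = d_prime[i] - c_prime[i] * solution[i + 1]
    return solution


def mean_age_profile(length_m, n_cells, velocity_m_per_yr, dispersion_m2_per_yr):
    """Mean age (yr) at nodes x = dx, 2*dx, ..., length_m along a flow line.

    Advection uses an upwind difference (velocity must be positive), dispersion
    a central difference; age is zero at the inlet and the dispersive flux is
    zero at the outlet.
    """
    dx = length_m / n_cells
    adv = velocity_m_per_yr / dx
    dis = dispersion_m2_per_yr / dx ** 2
    lower = [-(adv + dis)] * n_cells
    diag = [adv + 2.0 * dis] * n_cells
    upper = [-dis] * n_cells
    rhs = [1.0] * n_cells          # water ages by one year per year
    lower[0] = 0.0                 # inlet age is zero, so this term vanishes
    diag[-1] = adv + dis           # zero dispersive flux at the outlet
    upper[-1] = 0.0
    return solve_tridiagonal(lower, diag, upper, rhs)
```

With the dispersion coefficient set to zero the profile reduces to the advective travel time x/v, which is what a particle tracking approach would return; a positive dispersion coefficient mixes waters of different ages and shifts the profile away from that piston-flow result.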

Performance evaluation

The river discharge simulation performance was evaluated using the Nash-Sutcliffe Efficiency (NSE) coefficient (Nash and Sutcliffe 1970), root mean square error (RMSE), and coefficient of determination (R2). Simulated groundwater level was evaluated by RMSE, R2, Pearson correlation coefficient (r) and relative interquartile range error (QRE). The QRE is effective for evaluating the time series amplitude of groundwater level. It can be expressed as (Sutanudjaja et al. 2011; Jing et al. 2018):

$$ {Q}_{\mathrm{RE}}=\frac{{\mathrm{IQ}}_{7525}^{\mathrm{md}}-{\mathrm{IQ}}_{7525}^{\mathrm{dt}}}{{\mathrm{IQ}}_{7525}^{\mathrm{dt}}} $$
(1)

where \( {\mathrm{IQ}}_{7525}^{\mathrm{md}} \) is the interquartile range of the simulated (model) groundwater level data and \( {\mathrm{IQ}}_{7525}^{\mathrm{dt}} \) is the interquartile range of the observed groundwater level data.
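
The hydrograph and groundwater level metrics above can be computed as in the following sketch. The quartile interpolation follows the default of Python's statistics.quantiles, which is an assumption because the percentile method used in this study is not stated; the function names are illustrative.

```python
import math
import statistics


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def rmse(observed, simulated):
    """Root mean square error, in the units of the data."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def interquartile_range(values):
    """Difference between the 75th and 25th percentiles."""
    q25, _, q75 = statistics.quantiles(values, n=4)
    return q75 - q25


def qre(observed, simulated):
    """Relative interquartile range error, Eq. (1): (IQ_md - IQ_dt) / IQ_dt."""
    iq_dt = interquartile_range(observed)
    iq_md = interquartile_range(simulated)
    return (iq_md - iq_dt) / iq_dt
```

A positive QRE means that the simulated groundwater level fluctuates with a larger amplitude than the observations, and a negative value means that the amplitude is underestimated.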

Mean absolute error (MAE) and mean error (ME) were used to examine the tracer (3H and 85Kr) output results. Since relatively few measured tracer data are available, evaluations by hydrograph-oriented metrics such as NSE (Nash and Sutcliffe 1970) are less applicable (Ala-aho et al. 2017; Kuppel et al. 2018). However, this study could not estimate any goodness-of-fit statistics for the groundwater temperature simulation results, as the depths of the measured and simulated points did not coincide. Thus, the model performance for groundwater temperature simulation was assessed by visual inspection of the plots of measured and simulated groundwater temperatures in depth profiles.
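
The two tracer metrics can be written as a brief sketch. The sign convention for ME (simulated minus observed, so that a positive value indicates overestimation) is an assumption, as the convention is not stated in the text.

```python
def mean_absolute_error(observed, simulated):
    """Average magnitude of the simulated-minus-observed differences."""
    return sum(abs(s - o) for o, s in zip(observed, simulated)) / len(observed)


def mean_error(observed, simulated):
    """Average signed difference (simulated minus observed); reveals bias."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)
```

Reporting both is useful for small samples: an ME near zero combined with a large MAE indicates errors that cancel out rather than a good fit.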

Results

Steady-state simulation

A steady-state hydrological simulation was performed to create an initial condition for both model-1 and model-2. The steady-state simulation results for streamflow discharges and groundwater levels are shown in Fig. S6 of the ESM. The estimated NSE value for model-1 was 0.83, which indicates that the model performance is very good (Moriasi et al. 2007), with an RMSE of 2.57 m3/s. Model-1 also reproduced groundwater levels with a high correlation coefficient (r = 0.95; Fig. S21 of the ESM) and an RMSE of 6.48 m over a wide range of groundwater levels, between 0.07 and 477.92 m amsl. Groundwater temperature-depth profiles were also simulated at steady state using model-1 (Fig. 3 and Fig. S7 of the ESM); the simulated temperature profiles deviate substantially from the observations for most of the wells. In particular, the simulated groundwater temperature increases with increasing depth, which does not follow the measured trends, because model-1 treats the upper part of the Pre-Aso volcanic rocks as hydrogeological basement rock (Shimada et al. 2012; Ichiyanagi et al. 2012). This finding indicates that the aquifer in the area is much thicker than previously considered. Although the structure of model-1 does not represent the actual field conditions, it reproduced the hydrographs with acceptable results. Several studies (e.g., McDonnell and Beven 2014; Birkel and Soulsby 2015; van Huijgevoort et al. 2016a) have also noted that hydrograph fitting alone cannot adequately characterize catchment dynamics.

Fig. 3

Comparison between groundwater temperatures obtained from steady-state simulation and measured groundwater temperature data. The green dots represent the average of measured data, while the blue and red solid lines show the results of model-1 and model-2, respectively. The three wells represent (a) a shallow well, (b) a medium-depth well, and (c) a deep well

Groundwater temperature-depth profiles were simulated for 47 wells distributed over the study area, covering the groundwater recharge, major flow, discharge, and stagnant zones (Fig. S1 of the ESM). Visual inspection showed that model-2 agreed better with the observations than model-1 for 30 wells, while 12 wells (mainly located in the first aquifer) showed close agreement for both models, and a few of them agreed better with model-1 than with model-2 (Fig. S7 of the ESM). Both models showed a large discrepancy for the remaining five wells. The steady-state streamflow discharges simulated by model-2 showed slightly higher errors than those of model-1 (Fig. S6 of the ESM), whereas the steady-state groundwater levels simulated by model-2 showed lower errors. This result is consistent with the general observation that tracer-aided models reproduce hydrographs with higher error than hydrometrically calibrated models (see review by Schilling et al. 2019).

The simulated groundwater level varies widely in space, following the topography of the area, for both models (Fig. 4a,b). The simulated groundwater level in the plain area was the same for both models, whereas the means and maximums differed in the highland areas (i.e., the recharge areas): the mean elevation was 322.46 m amsl for model-1 and 321.91 m amsl for model-2, and the maximum was 1,548.73 m amsl for model-1 and 1,499.01 m amsl for model-2. Other fluxes, such as groundwater recharge and discharge, also differed between the models (Fig. 4c,d). Figure S8 of the ESM shows good agreement between observed and simulated (model-2) recharge rates at some representative sites, with a high r value (0.86), low MAE (0.49 mm/day), and low ME (0.30). In addition, model-2 captured the artificial recharge rates well. Model-2 estimated a maximum groundwater recharge rate of 613.45 mm/day (Fig. 4), which is in good agreement with some observations (~500 mm/day; Kiriyama and Ichikawa 2004; Takemori and Ichikawa 2007). In contrast, model-1 estimated a maximum groundwater recharge rate of 1,074.10 mm/day, which is probably an overestimate, and the groundwater residence time estimated by model-1 is therefore shorter than under actual field conditions. Figure 4c,d shows that subsurface water discharge occurs along the major rivers and in lake areas, as expected, and this was therefore deemed an adequate result for the model.

Fig. 4

Steady-state simulation results for (a, b) groundwater level and (c, d) water flux in terms of groundwater recharge and discharge. Model-1 results are shown in a and c, and model-2 results in b and d

Transient simulation

Transient-state simulations were carried out with both models for 5 years (2002–2006) and reproduced daily streamflow discharges, groundwater levels, and tracer concentrations (3H and 85Kr). The simulation was performed for 21 river stations. Figure 5 shows results for streamflow discharges at three representative stations; the observed and simulated discharge data agree well for both models. For model-1, the NSE values obtained from the comparison between observation and simulation varied from 0.46 (Dai Roku Hashi station) to 0.90 (Chukobashi station; Table S6 of the ESM) with an average value of 0.70, and R² ranged between 0.59 and 0.92 with an average value of 0.80. For model-2, the average NSE value for seven stations was 0.71, with an R² value of 0.79 (Table S6 of the ESM). With model-2, Dai Roku Hashi station also showed a larger discrepancy between observation and simulation than the other stations.
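For reference, the hydrograph metrics quoted above can be sketched as follows, using the standard definitions of the Nash-Sutcliffe efficiency (one minus the ratio of the squared simulation error to the variance of the observations about their mean) and the root mean square error. The function names and the discharge values are hypothetical; the study's own processing scripts are not described in the text.

```python
import math


def nash_sutcliffe_efficiency(simulated, observed):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for s, o in zip(simulated, observed))
    obs_variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / obs_variance_sum


def root_mean_square_error(simulated, observed):
    """RMSE in the units of the input series (e.g., m3/s for discharge)."""
    squared_error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(squared_error / len(observed))


# Hypothetical daily discharges (m3/s), for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.0, 2.5, 4.0]
print(nash_sutcliffe_efficiency(sim, obs), root_mean_square_error(sim, obs))
```

An NSE of 1 indicates a perfect fit, while an NSE of 0 means the simulation is no better than using the mean of the observations.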

Fig. 5

Comparison between observed and simulated streamflow discharge for stations located in three major river basins in the study area: (a) Jinnai station in the Shira river basin, (b) Hirose station in the Kikuchi river basin, and (c) Mifune station in the Midori river basin

Some examples of transient simulation of groundwater levels are shown in Fig. 6. Instead of the actual groundwater levels, Fig. 6 shows the anomalies of the observed and simulated groundwater levels relative to their long-term mean values, so that the discrepancy between model results and observations can be analyzed. The model performance was evaluated by the Pearson correlation coefficient (r), which indicates the timing, and the relative inter-quartile range error |QRE|, which measures the magnitude of the amplitude error (Sutanudjaja et al. 2011; Jing et al. 2018). Both models showed good agreement with the observed groundwater levels. The r and |QRE| values for the well in Gotsu (Fig. 6a) were 0.84 and 11.60% for model-1 and 0.78 and 0.99% for model-2, respectively. The mean and median values of the model results and observation data are also shown in Fig. 6. The r value for model-1 is higher than that of model-2; however, |QRE| for model-2 is lower than that of model-1. Moreover, the estimated mean and median values of model-2 were closer to the observed values than those of model-1. Similar results were obtained for the well located in Izumi (Fig. 6b), whereas the opposite was found for the well in Koshi (Fig. 6c).
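The two groundwater-level criteria can be sketched as below. The sketch assumes the commonly used definition in which QRE is the difference between the simulated and observed inter-quartile ranges (75th minus 25th percentile) divided by the observed inter-quartile range; the function names and the sample series are hypothetical.

```python
import numpy as np


def anomaly(levels):
    """Groundwater levels relative to their long-term mean."""
    levels = np.asarray(levels, dtype=float)
    return levels - levels.mean()


def pearson_r(simulated, observed):
    """Pearson correlation coefficient, a measure of timing agreement."""
    return float(np.corrcoef(simulated, observed)[0, 1])


def abs_qre_percent(simulated, observed):
    """|QRE| in percent: relative error of the inter-quartile range."""
    q25_sim, q75_sim = np.percentile(simulated, [25, 75])
    q25_obs, q75_obs = np.percentile(observed, [25, 75])
    iqr_obs = q75_obs - q25_obs
    return float(abs((q75_sim - q25_sim) - iqr_obs) / iqr_obs * 100.0)


# Hypothetical levels (m amsl): the simulation has a constant offset and a
# 10% larger amplitude than the observations.
obs = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
sim = 20.0 + 1.1 * (obs - 12.0)
print(pearson_r(anomaly(sim), anomaly(obs)))
print(abs_qre_percent(anomaly(sim), anomaly(obs)))
```

Working with anomalies removes a constant offset between the series, so r reflects timing only and |QRE| reflects amplitude only.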

Fig. 6

Comparison between the observed and simulated groundwater level anomaly at three groundwater observation stations: (a) Gotsu shi, (b) Izumi, and (c) Koshi

The simulation was examined using 43 groundwater monitoring stations. The r value between observed and simulated data for model-1 (42 stations; one station lacked reliable observation records) was 0.96 for the period 2002–2006, while it was 0.94 for model-2 for the same period (Fig. S9 of the ESM). Moreover, the r values calculated between the medians of the observed and simulated data were 0.96 and 0.94 for model-1 and model-2, respectively. This study also examined the seasonal responses of the model results (Fig. S10 of the ESM). These assessments generally show similar patterns, indicating that the model results are not biased toward a particular season. Notably, the error for the summer season, when most of the precipitation occurs, is the highest for model-1, while it is the lowest for model-2. Despite some discrepancies between observed and simulated data, the model results generally captured the seasonal groundwater dynamics well.

Tracer simulation

3H and 85Kr concentrations in groundwater were simulated for all grids of the model domain. The 3H time series for 23 wells and 3 springs over the study area were used to evaluate the simulation performance of the models. Note that tracers such as 3H and 85Kr have no continuous measurement records. For 3H, stations with at least 2 years of measured records during 1957–2010 were used to validate the model results. Figure 7 displays typical comparisons between measured data and model results: Fig. 7a shows groundwater data from the plain area, whereas Fig. 7b,c displays groundwater data from Aso Mountain and spring water from Ezu Lake, respectively. The model-2 results correspond well with the measured data for all three stations. Plots for all stations are shown in Fig. S11 of the ESM. For model-1, the MAE and ME were 10.63 and 6.36 TU, respectively; the ME reflects the large difference between the mean of the measured data (5.88 TU) and that of the simulated data (12.24 TU). In contrast, model-2 reproduced 3H concentrations with reasonable accuracy, with an MAE of 5.71 TU, an ME of −0.27 TU, and a small difference in mean between simulated and measured data of 0.28 TU. Thus, the discrepancy between measured data and simulation results is far smaller for model-2 than for model-1.

Fig. 7

The 3H concentration time series at different locations in the study area. 3H in precipitation is indicated by the blue line. Measured 3H values in groundwater are shown by green dots, and the simulated time series of 3H in groundwater for model-1 and model-2 are shown by light green and purple lines, respectively

Similarly, 85Kr concentration data from nine stations were compared with the simulation results of both models (Fig. 8 and Fig. S12 of the ESM). Model-1 performed unsatisfactorily for the 85Kr concentrations at all stations, because the differences between measured and simulated values were very large. In contrast, model-2 reasonably reproduced the 85Kr concentrations for almost all the measuring stations, except for a station located in Ippongi. At some stations located in the stagnant zones, the measured 85Kr concentrations were nil, and the model-2 results were almost zero for these stations. The notable improvement in the 85Kr simulation results with model-2 is consistent with the results of the steady-state simulation of groundwater temperature.

Fig. 8

The 85Kr concentration time series in the atmosphere and groundwater. The black line, red squares, and green and blue lines indicate 85Kr values in the atmosphere, measured 85Kr in groundwater, and simulated 85Kr in groundwater (model-1 and model-2), respectively. The wells are in (a) the discharge area and (b) the stagnant area. Note that the measured 85Kr concentration was below the detection limit (0.0015 Bq) for the site illustrated in part b

Discussion

Applicability of multiple-tracers-aided hydrological modeling

The development of the two models, together with recent deep bore log data, provides an opportunity to test the applicability of the multiple-tracer-aided model for reducing structural uncertainty. The findings of this study demonstrate that hydrograph fitting alone could not determine the groundwater storage in an area where the lower boundary conditions remained uncertain. Hence, the classical modeling approaches are not suitable for this type of aquifer system. The uncertainty of the model structure can be addressed by incorporating tracer data. Reproduction of tracer data along with hydrograph fitting can increase the credibility of the model and reflect reliable subsurface conditions such as the depth of the bedrock and groundwater storage. These factors have a significant influence on the estimation of groundwater residence time, which is very useful in understanding contaminant transport dynamics (Ameli et al. 2016; Heidbuchel et al. 2013; Kim et al. 2016; Basu et al. 2012). Some studies have determined that tracer data can be used to improve the conceptualization of models as well as to examine their internal consistency when tested at local scale (Tsuboyama et al. 1994; Dunn et al. 2010; Delavau et al. 2017; Schilling et al. 2019). The results reported here demonstrate that the methodology can be extended to a study at regional scale.
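The sensitivity of residence time to bedrock depth and storage can be illustrated with a simple turnover-time estimate, in which the mean residence time of a well-mixed aquifer column equals the stored water volume divided by the recharge flux. This is a deliberately simplified sketch and not the method used in the integrated models; the function name and all parameter values are hypothetical.

```python
def turnover_time_years(saturated_thickness_m, porosity, recharge_mm_per_yr):
    """Turnover time (yr) = stored water depth (m) / recharge (m/yr)."""
    stored_water_m = saturated_thickness_m * porosity
    recharge_m_per_yr = recharge_mm_per_yr / 1000.0
    return stored_water_m / recharge_m_per_yr


# Hypothetical aquifer columns, for illustration only.
shallow_basement = turnover_time_years(100.0, 0.2, 1000.0)  # about 20 yr
deep_basement = turnover_time_years(300.0, 0.2, 1000.0)     # about 60 yr
high_recharge = turnover_time_years(100.0, 0.2, 2000.0)     # about 10 yr
print(shallow_basement, deep_basement, high_recharge)
```

Under these assumptions, tripling the aquifer thickness triples the turnover time, while doubling the recharge halves it, which is why a model with a basement that is too shallow or a recharge that is too high can fit hydrographs and still yield residence times that are too short.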

Regional catchment water dynamics in Kumamoto

As mentioned in the previous section, the updated model (model-2) provides a better representation of the subsurface system of the Kumamoto region, including groundwater storage and the depth of the bedrock. Therefore, model-2 was used to simulate flow paths and mean groundwater age. Figure 9a displays a 3D orthogonal projection of the coupled surface and subsurface streamlines simulated by model-2. The yellow lines indicate relatively faster groundwater flow (1.0–10 m/day) than in other parts of the aquifers (e.g., green lines, 0.1–1.0 m/day) in the study area, where active groundwater flow is facilitated by the high infiltration features in and around the midsection of the Shira River and the presence of highly porous formations of Togawa lava. The red lines represent the rapid flow of streams or surface runoff with higher velocities.

Fig. 9

Spatial distributions of simulated (a) surface and subsurface coupled streamlines in the equilibrium state with the velocity of water flow, and groundwater age in (b) the first and (c) the second aquifers, with water flow directions shown by small gray arrows

Model-2 was also used to simulate the mean groundwater age distribution following the age mass concept, which includes advection, diffusion, and mixing of waters (Goode 1996). The solution of the equations yields mean groundwater ages ranging from a few years to 300 years in the first (unconfined) aquifer in the study area (Fig. 9b). As expected, the groundwater is generally older in the southwestern part near the coastal area in the stagnant zone, and younger waters are found in the recharge areas. Groundwater ages were also visualized for the second (confined) aquifer, which showed older ages than the unconfined aquifer but similar spatial patterns (Fig. 9c). The simulated mean groundwater ages match well with the findings of earlier studies (Momoshima et al. 2011; Kagabu et al. 2017), although those studies considered only point data along two major flow lines (Fig. S1 of the ESM). For example, Kagabu et al. (2017) estimated groundwater age and found young (approximately 16 years) waters in the recharge area (A–A′ flow line), and the simulated groundwater ages for the same locations were within the same range (10–25 years). Although the groundwater age along the B–B′ flow line (>55 years) was not defined precisely owing to limitations of the 85Kr age tracer method (Kagabu et al. 2017), the simulation indicates that the groundwater ages range from ca. 10 to 100 years, except in the plain to coast areas where older (mostly >250 years) groundwater is found (Fig. 9c). The results from model-2 provide a more precise and accurate view of water flow pathways and time ranges at regional scale than those from model-1 or from studies based on point observation datasets.
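To make the age mass concept concrete, the following minimal sketch solves the steady-state mean age equation along a single 1D flow line, v dA/dx − D d²A/dx² = 1, where the source term of 1 expresses that water ages by one year per year. It is an illustrative finite-difference analogue and not the 3D solver used in this study; the function name, the boundary conditions (zero age at the recharge end, purely advective outflow), and all parameter values are assumptions.

```python
import numpy as np


def steady_mean_age_1d(length_m, n_nodes, velocity_m_per_yr, dispersion_m2_per_yr):
    """Solve v*dA/dx - D*d2A/dx2 = 1 for mean age A (yr) on a 1D flow line.

    Assumed boundary conditions: A = 0 at x = 0 (recharge end) and no
    dispersive term at the outlet node (purely advective outflow).
    """
    dx = length_m / n_nodes
    v, d = velocity_m_per_yr, dispersion_m2_per_yr
    matrix = np.zeros((n_nodes, n_nodes))
    rhs = np.ones(n_nodes)  # aging source term: 1 yr per yr
    for k in range(n_nodes):  # unknown k sits at x = (k + 1) * dx
        is_outlet = k == n_nodes - 1
        disp = 0.0 if is_outlet else d / dx**2
        matrix[k, k] = v / dx + 2.0 * disp       # upwind advection + dispersion
        if k > 0:
            matrix[k, k - 1] = -v / dx - disp    # upstream neighbour
        if not is_outlet:
            matrix[k, k + 1] = -disp             # downstream neighbour
    x = dx * np.arange(1, n_nodes + 1)
    return x, np.linalg.solve(matrix, rhs)


# Hypothetical flow line: 10 km long, 100 m/yr velocity, D = 500 m2/yr.
x, age = steady_mean_age_1d(10000.0, 50, 100.0, 500.0)
print(age[-1])  # mean age (yr) at the downstream end
```

For uniform flow with these boundary conditions the dispersion term cancels and the mean age reduces to distance divided by velocity; in a 3D domain, mixing matters where flow paths carrying waters of different ages converge.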

Limitations and implications

Tracer simulation results for some measuring stations (Figs. S7, S11 and S12 of the ESM) still showed large errors, and some of these stations are located in the same areas. Hence, these discrepancies are probably related mainly to local heterogeneities. Some errors might also be related to grid resolution (Bathurst 1986; Hardy et al. 1999) and DEM resolution (Ivanov et al. 2004; Zhang et al. 2016). For example, this study used a grid size (100–500 m resolution) similar to that of other reported studies, but the DEM resolution (10 m) was coarser than that used in some previous works (e.g., van Huijgevoort et al. 2016a, with a 1-m DEM and a 100 × 100 m grid), mainly because this study covered a larger area than previously tested. Model performance for simulating tracers (point data) and hydrographs may improve with subgrid parameterization techniques (Samaniego et al. 2010; Decker 2015) and a high-resolution DEM (Beven 1989; Smith et al. 2004; Ivanov et al. 2004; Zhang et al. 2016). However, as the number of grid cells and the complexity of the model increase, processing and calculation take more time, so computational efficiency must also be improved for more practical application and for general users. In the present study, one set of simulations took ca. 1–2 days, even with parallel computation on more than 100 computers with multicore processors (Intel(R) Core(TM) i7-3960X CPU @ 3.30 GHz).

Although there are some limitations and scope for further improvement of the modeling, the results obtained here and in related studies (e.g., Knighton et al. 2017; Schilling et al. 2019; Birkel et al. 2020) show that incorporating tracers is fundamental to increasing the reliability of the model. Multiple tracer data such as those used in this study (3H and 85Kr concentrations and temperature log data) are not available in many regions of the world, but general hydrochemistry data or dissolved contaminant data may be more commonly available. For instance, some recent studies have started to include dissolved nitrate in models to understand the transport and transformation of nitrogen contamination that is loaded on the ground surface and subsequently carried by water flow through aquifers (e.g., Almasri and Kaluarachchi 2007; Matiatos et al. 2019); nitrogen concentration data are widely available in time and space. It is suggested here that such data can also be used to validate the structure of models designed to explain both water flows and tracer concentrations, by comparing simulated and observed values. It is therefore important to identify other tracers that are ubiquitous and applicable for developing reliable models more globally in the future.

Conclusions

This study presents the results of two different models calibrated using an integrated watershed modeling approach for the Kumamoto region (Japan), which has diverse geomorphological settings. The model results were evaluated for both steady-state and transient simulations. The model calibrated using hydrometric data generally performed well for streamflow hydrographs and groundwater levels. However, hydrograph fitting could not determine the actual groundwater storage, which can instead be constrained by incorporating multiple tracer data. The updated model (model-2), with deeper hydrogeological boundary conditions, successfully reproduced multiple tracers (3H, 85Kr, and temperature) as well as hydrographs with an acceptable error. This model can provide a better explanation of catchment water dynamics in Kumamoto at regional scale. The findings of this study will provide useful information for water resources managers when attempting to understand surface and subsurface hydrological processes and when characterizing the sources, distribution, and transport processes of contaminants. Furthermore, this study confirms that including multiple tracers in the model can reduce its structural uncertainty for an area where the lower boundary of the aquifer is uncertain. The results represent a first step toward understanding detailed catchment dynamics at regional scale using an integrated watershed modeling technique that incorporates multiple tracer data. The findings encourage further studies using tracer data together with hydrometric data for detailed characterization of surface–subsurface hydrological processes.