Introduction

Groundwater modeling is essential for understanding and managing water resources, providing the insights needed for sustainable water supply and environmental protection (Taylor and Peach 2023; Mahzabin et al. 2023; Navarro-Farfán et al. 2024). Despite this role, groundwater models are subject to inherent uncertainties that stem from incomplete understanding of the underlying system, natural variability in subsurface conditions, and the complexity of field conditions (Meyer et al. 2007; Samani et al. 2018a, b; Miro et al. 2021; Taşan et al. 2023; Enemark et al. 2024).

In recent years, there has been growing recognition of the need to address two principal sources of uncertainty: spatial discretization and model complexity. Spatial discretization divides the groundwater system into discrete cells or elements and thereby controls the resolution at which flow and transport processes are represented. Model complexity, in turn, refers to the number and type of parameters incorporated into the model, which determine the level of detail and realism in representing the hydrogeological system.

Spatial models across diverse disciplines, including ecology, meteorology, and hydrology, rely on spatial discretization techniques (Pogson et al. 2012; Raazia and Dar 2021; Higdon et al. 2022; Samani and Kardan, 2022; Choubin et al. 2023; Sun et al. 2024). Similarly, groundwater flow simulations based on MODFLOW solve the governing partial differential equations with the finite difference method, in which the continuous domain is discretized into square or rectangular cells. Selecting the grid size is an important step that depends on a sound understanding of the conceptual aquifer model. Finer cells increase the computational cost, while coarser cells introduce uncertainty into the representation of the conceptual model (Khan et al., 2018; Chen et al., 2020). Spatial resolution also affects how faithfully the model represents reality and how well it integrates with other spatial data. An appropriately chosen resolution therefore improves the model's ability to capture real-world processes and reduces uncertainty in the resulting analyses and predictions (Li et al., 2019; Liu et al., 2021). In short, grid size selection is a trade-off between computational efficiency and model accuracy, and understanding this trade-off supports more reliable model results and better-informed decisions in water resources management, environmental planning, and related domains.

Numerous studies have examined the influence of spatial discretization on model performance, providing insight into surface water-groundwater interactions and hydrological fluxes (Stampfl et al. 2007; Wang et al. 2012; Pogson and Smith 2015; Bomers et al. 2019). Refsgaard (1997) developed four models with element sizes of 500, 1000, 2000, and 4000 m to simulate the Karup catchment in Denmark (440 km2). Calibration and validation were carried out for the model with a 500 m element size, while the other models were generated without recalibration. The results showed that models with element sizes larger than 1000 m simulated surface water-groundwater interaction poorly, particularly in reproducing runoff. However, the author noted that substantial recalibration could improve the performance of the models with coarser element sizes.

Vázquez et al. (2002) studied the Gete catchment in Belgium (586 km2), where a model with a 600 m grid cell size was initially calibrated. Two additional models were then built with grid cell sizes of 300 m (finer grid) and 1200 m (coarser grid), using the same parameter set. The model with the coarser grid (1200 m) failed to simulate hydraulic heads adequately without recalibration, while the model with the finer grid (300 m) produced better results after a simple trial-and-error calibration. These findings underscored the need for recalibration when the grid resolution is changed, in order to obtain optimal parameter values and improve model performance.

Downer and Ogden (2004) focused on the Hortonian Goodwin Creek Experimental catchment (21.2 km2) and the non-Hortonian Muddy Brook catchment (3.64 km2) in the USA. They built a series of 2D vadose zone models with progressively larger vertical cell sizes for each catchment and calibrated them automatically with the shuffled complex evolution method. The calibrated models were then compared in terms of infiltration, runoff, and evapotranspiration fluxes. The results showed that smaller vertical cell sizes yielded more accurate simulations of hydrological fluxes in the unsaturated zone than larger cell sizes.

In the Wüstebach catchment in Germany (0.27 km2), Sciuto and Diekkrüger (2010) built two models with grid cell sizes of 25 and 100 m in the river zone and examined discharge and the spatial pattern of soil moisture. The coarse grid produced higher discharge and lower actual evapotranspiration because it smoothed the soil surface and thereby degraded the representation of topographic information.

Wildemeersch et al. (2014) crafted four models with constant element sizes of 250, 500, 750, and 1000 m for a synthetic catchment inspired by the Condroz region of Belgium. The objective was to assess the effects of element size on the simulation of discharge, hydraulic heads, and prediction uncertainty. Findings indicated that augmenting the element size predominantly impacted the simulation of discharge, primarily attributable to constraints in capturing surface water-groundwater interactions and runoff processes. Additionally, coarsening spatial discretization exacerbated the uncertainty associated with discharge predictions.

In the Ajabshir Aquifer (Iran), Samani et al. (2018a) devised six conceptual models with varying degrees of complexity (6, 10, 10, 13, 13, and 15 parameters). These models underwent comparison employing Bayesian model averaging, model selection criteria, and multicriteria decision-making (MCDM) methods to ascertain model probabilities and weights. Among the employed methodologies, the simplest model consistently garnered the highest model probability predicated on the AIC, AICc, and BIC approaches. Conversely, the KIC and MCDM methods favored a model with average complexity (10 parameters) as the best and least uncertain model. The study underscored the impact of over-complexity on uncertainty in groundwater modeling, aligning with analogous observations posited in other references on model complexity (Haitjema 2011; Simmons and Hunt 2012; Engelhardt et al. 2014).

Previous research has provided valuable insight into the influence of groundwater model complexity and spatial discretization on model performance and uncertainty. However, conceptual groundwater model uncertainty has rarely been examined as a function of the spatial discretization scale. This study aims to fill this gap by evaluating conceptual uncertainty across different spatial discretization scales while also accounting for model complexity. The approach compares model probabilities obtained with Bayesian model averaging (BMA) and model selection criteria. The Najafabad Aquifer in Esfahan, Iran, is used as a case study, drawing on comprehensive hydrogeological studies and modeling. First, five conceptual models with varying degrees of complexity are developed and compared. Their performance is assessed with the model probability computed via BMA using model selection criteria. Next, the least uncertain model identified by BMA is selected, and two additional models are derived from it by using finer (250 m) and coarser (1000 m) grid cell sizes. These two models are then calibrated automatically. Finally, uncertainty is evaluated for all seven alternative models using BMA. The workflow for developing the conceptual models is illustrated in Fig. 1. The findings are intended to help modelers choose an appropriate scale of spatial discretization and level of complexity, and thereby improve the accuracy and reliability of their groundwater models.

Fig. 1 Methodological framework of the proposed groundwater models

Combining conceptual groundwater uncertainty with spatial discretization and model complexity in a single analysis is the main contribution of this study. BMA and model selection criteria provide a consistent framework for evaluating model probabilities, and the use of the Najafabad Aquifer as a case study ties the findings to a real-world hydrogeological setting. Together, these aspects address a research question that has so far received little attention in the literature.

Description of the study area

Location

The study area is the Najafabad Plain, which covers 1075 km2 within the Gavkhoni catchment in Esfahan, Iran (Fig. 2). According to the Emberger climate classification (Emberger 1969), the region is arid. The average annual precipitation is approximately 153 mm, the average temperature is around 15 °C, and evaporation is high, averaging about 2262 mm per year. These climatic characteristics shape the hydrogeological conditions of the Najafabad Plain and make it a pertinent site for groundwater modeling and analysis.

Fig. 2 Location of the study area

Geological settings

The study area exhibits a wide range of geological facies, predominantly composed of limestone, sandstones, shales, and conglomerates. The geological structure of the plain follows a northwest-southeast direction, gradually accumulating alluvial sediments and forming the Najafabad alluvial aquifer. In the northeastern region of the Najafabad aquifer, Quaternary sediments consist of gravel deposits with shale and Cretaceous sandstones, intercalated with layers of limestone and ammonite. Moving towards the northwest, in addition to shale and Cretaceous sandstones, layered sandstones containing conglomerates and yellow dolomitic sandstones are present. The southwestern part of the area displays layered limestone shale alongside Triassic and Jurassic conglomerates. In the southern region, Cretaceous rocks are well-developed, with a maximum thickness of 50 m, characterized by layers of red shale. Overall, the thickness of alluvial deposits increases gradually from the periphery towards the center of the plain. Geophysical data and deep well observations indicate that the alluvial deposits can reach a thickness of approximately 200 m. Analysis of exploration wells penetrating the bedrock in the Najafabad area reveals that the underlying material primarily consists of shale and schist from the second geological period, which belongs to the category of ductile rocks. Despite being subjected to faulting, these ductile rocks generally exhibit low permeability, rendering water transfer challenging even in the presence of faults (Holder and Philip, 2001; Singhal and Gupta 2010).

Hydrogeological setting

The Najafabad Aquifer, characterized as an unconfined aquifer, primarily consists of alluvial deposits that have been transported by the Zayandehrood River originating from the southern highlands and flowing towards the northeast of the aquifer, as well as the Morghab River entering from the northwest and running towards the Zayandehrood River. The topography of the area exhibits variations, with the highest elevation reaching approximately 1885 m in the northwest, while the lowest elevation of around 1575 m is found in the north and northeast regions near the Zayandehrood River (Fig. 2).

Regarding aquifer thickness, the maximum thickness is observed in the northeast, south, and central parts of the aquifer along the Zayandehrood River, reaching up to 200 m. Conversely, the northwestern part of the aquifer exhibits the lowest thickness, measuring approximately 15 m.

Analysis of average water level measurements in observation wells during the 2021 period (Fig. 3) indicates that the predominant groundwater flow direction is from the northwest and west towards the eastern portion of the aquifer.

Fig. 3 Potentiometric surface map and flow lines during the 2021 period (unit: m) and GHB inflow and outflow boundaries

Groundwater depth within the aquifer, based on measurements during 2021, ranges from 6 to 90 m. Even the shallowest groundwater level lies below the maximum depth of evaporation. As a result, the evaporation package was not included in the groundwater flow model for this aquifer.

The hydraulic gradient in the aquifer ranges from 0.003 to 0.04 and decreases from the northwest to the southeast. The lowest hydraulic gradient is observed at the outlet of the Zayandehrood River.

Horizontal hydraulic conductivity, estimated from pumping test data, varies across the Najafabad Aquifer. The highest hydraulic conductivity of approximately 10 m/day is observed in the western part of the aquifer, gradually decreasing from northwest to south and east. The lowest hydraulic conductivity measures around 1 m/day. On average, the hydraulic conductivity in the Najafabad Aquifer is approximately 4 m/day.

Water is extracted from the aquifer through 9,696 deep and shallow wells. In 2021, pumping wells discharged approximately 130 million cubic meters of water from the Najafabad Aquifer. A portion of this water returns to the aquifer as agricultural return flow.

Material and method

A variety of methods exist for selecting the most reliable model with an optimal number of parameters, such as the calculation of model probability (Hill and Tiedeman, 2006). In this study, Bayesian statistics are used to determine the model probability and to select the model that shows strong agreement between observed data and simulated results while keeping the number of parameters parsimonious.

By utilizing Bayesian statistics, the posterior model probability is computed using model selection criteria, enabling the identification of the model that provides the best fit to the observed data. This approach considers both the accuracy of the model's predictions and the complexity of its parameters, ensuring that the selected model strikes an optimal balance between fitting the data well and avoiding excessive complexity. Through this methodology, the study aims to identify the model that not only provides an accurate representation of the observed system but also exhibits a parsimonious parameterization.

Model selection criteria

Model selection relies on several statistical criteria, including the Akaike information criterion (AIC, Akaike 1974), the corrected Akaike information criterion (AICc, Hurvich and Tsai 1989), the Bayesian information criterion (BIC, Rissanen 1978), and the Kashyap information criterion (KIC, Kashyap 1982). These criteria are rooted in statistical theory and provide a framework for comparing alternative models by balancing goodness of fit against model complexity. Consider a set of K alternative models Mk, k = 1, …, K, each characterized by a vector θk of Nk unknown parameters. The model selection criteria are defined as follows:

$${\text{AIC}}_{k} = - 2\ln \left[ {L\left( {\hat{\theta }_{k} |D} \right)} \right] + 2N_{k},$$
(1)
$${\text{AICc}}_{k} = - 2\ln \left[ {L\left( {\hat{\theta }_{k} |D} \right)} \right] + 2N_{k} + \frac{{2 N_{k} \left( { N_{k} + 1} \right)}}{{N - N_{k} - 1}},$$
(2)
$${\text{BIC}}_{k} = - 2\ln \left[ {L\left( {\hat{\theta }_{k} |D} \right)} \right] + N_{k} \ln \left( N \right),$$
(3)
$${\text{KIC}}_{k} = - 2\ln \left[ {L\left( {\hat{\theta }_{k} |D} \right)} \right] - 2\ln p(\hat{\theta }_{k} ) + N_{k} \ln (N/2\pi ) + \ln \left| {\overline{F}_{k} } \right|.$$
(4)

In these expressions, θ̂k is the maximum likelihood estimate of the model parameters θk. The term −2ln[L(θ̂k|D)] is twice the negative log-likelihood (NLL) and measures the goodness of fit between the model and the observed data D. p(θ̂k) is the prior probability of the parameters evaluated at θ̂k, and F̄k = Fk/N is the normalized Fisher information matrix. N is the total number of observations, and Nk is the number of parameters in the model. With the elements of F̄k defined as in Eq. (5), the KIC of Eq. (4) can equivalently be expressed as in Eq. (6):

$$\overline{F}_{k,ij} = \frac{1}{N}F_{k,ij} = - \frac{1}{N}\left. {\frac{{\partial^{2} \ln \left[ {L\left( {\theta_{k} |D} \right)} \right]}}{{\partial \theta_{ki} \partial \theta_{kj} }}} \right|_{{\theta_{k} = \hat{\theta }_{k} }},$$
(5)
$${\text{KIC}}_{k} = - 2\ln \left[ {L\left( {\hat{\theta }_{k} |D} \right)} \right] - 2\ln p(\hat{\theta }_{k} ) - N_{k} \ln (2\pi ) + \ln \left| {F_{k} } \right|.$$
(6)
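As a brief check (added here for clarity, not part of the original derivation), Eq. (6) follows from Eq. (4) because the normalized matrix Fk/N has dimension Nk × Nk, so dividing by N scales its determinant by N to the power −Nk:

$$\ln \left| {\overline{F}_{k} } \right| = \ln \left| {\frac{1}{N}F_{k} } \right| = \ln \left| {F_{k} } \right| - N_{k} \ln N,$$

$$N_{k} \ln \left( {\frac{N}{2\pi }} \right) + \ln \left| {\overline{F}_{k} } \right| = N_{k} \ln N - N_{k} \ln (2\pi ) + \ln \left| {F_{k} } \right| - N_{k} \ln N = - N_{k} \ln (2\pi ) + \ln \left| {F_{k} } \right|.$$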

The Fisher information matrix (Fk) plays a crucial role in the calculation of KIC. The first term, −2ln[L(θ̂k|D)], is shared by all criteria and quantifies the agreement between the simulated outcomes and the observed data; a smaller value indicates a better fit. N is the total number of observations, which allows the number of parameters to be weighed against the amount of data; the preferred model explains the observations well with as few parameters as possible. Nk, the number of parameters, measures model complexity, so that models with additional parameters that do not improve the fit are penalized (Samani et al. 2018a). Because of the Fisher term, KIC may in some cases select a different model than BIC (Ye et al. 2008a,b, 2010).
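The criteria above can be evaluated with a few lines of code. The following is a minimal sketch, not the implementation used in this study: the function name, its arguments, and the example numbers (an NLL term of 100, 40 observations, 6 parameters, and a diagonal Fisher matrix) are hypothetical and chosen only for illustration. AICc is written in its standard form (AIC plus the small-sample correction), and KIC uses the form of Eq. (6).

```python
import numpy as np


def information_criteria(neg2_log_lik, n_obs, n_params, fisher=None, log_prior=0.0):
    """Compute AIC, AICc, BIC and, if a Fisher matrix is given, KIC.

    neg2_log_lik : value of -2 ln L at the maximum likelihood estimate
    n_obs        : number of observations N
    n_params     : number of estimated parameters N_k
    fisher       : (N_k x N_k) Fisher information matrix F_k (optional)
    log_prior    : ln p(theta_hat); 0.0 is an illustrative placeholder
    """
    aic = neg2_log_lik + 2 * n_params
    aicc = aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = neg2_log_lik + n_params * np.log(n_obs)
    result = {"AIC": aic, "AICc": aicc, "BIC": bic}

    if fisher is not None:
        # slogdet is numerically safer than log(det(F)) for large matrices
        sign, log_det = np.linalg.slogdet(np.asarray(fisher, dtype=float))
        if sign <= 0:
            raise ValueError("Fisher matrix must have a positive determinant")
        result["KIC"] = (
            neg2_log_lik - 2 * log_prior - n_params * np.log(2 * np.pi) + log_det
        )
    return result


# Hypothetical example values, for illustration only
example = information_criteria(
    neg2_log_lik=100.0, n_obs=40, n_params=6, fisher=np.eye(6) * 40.0
)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the prior term defaults to zero, which corresponds to ignoring the prior contribution; a real application would supply the log prior density evaluated at the estimated parameters.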

Model probability

In the context of models with varying degrees of complexity, Bayesian statistics offers a comprehensive methodology. This approach entails the calculation of the posterior probability of each model by considering the prior model probability and the marginal likelihood. Neuman (2003) proposed an application of Bayes' theorem to determine the posterior probability p(Mk |D), expressed as:

$$p\left( {M_{k} |D} \right) = \frac{{p\left( {D|M_{k} } \right)p\left( {M_{k} } \right)}}{{\mathop \sum \nolimits_{l = 1}^{K} p\left( {D|M_{l} } \right)p\left( {M_{l} } \right)}},$$
(7)

where p(Mk) is the prior probability of model Mk. The marginal likelihood function, p(D|Mk), is defined as:

$$p\left( {D|M_{k} } \right) = \smallint p\left( {D| \theta_{k} ,M_{k} } \right)p\left( {\theta_{k} | M_{k} } \right) d\theta_{k},$$
(8)

p(θk|Mk) is the prior probability density of θk under model Mk, and p(D|θk,Mk) is the joint likelihood of Mk and θk. The marginal likelihood, also called integrated likelihood or Bayesian evidence, measures overall model fit.

Calculating model probability based on model selection criteria

p(D|Mk) and p(Mk|D) can be calculated as (Ye et al. 2004):

$$p\left( {D|M_{k} } \right) \approx \exp \left( { - \frac{1}{2}IC_{k} } \right),$$
(9)
$$p\left( {M_{k} |D} \right) = \frac{{\exp \left( { - \frac{1}{2}\Delta IC_{k} } \right)p\left( {M_{k} } \right)}}{{\mathop \sum \nolimits_{l = 1}^{K} \exp \left( { - \frac{1}{2}\Delta IC_{l} } \right)p\left( {M_{l} } \right)}},$$
(10)

where ΔICk = ICk − ICmin, ICmin = min{IC1, …, ICK} is the smallest criterion value among the K models, and IC denotes AIC, AICc, KIC, or BIC.
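Equation (10) can be sketched in code as follows. This is an illustrative example only: the function name is hypothetical, equal prior model probabilities are assumed when none are supplied, and the three criterion values are made up rather than taken from the models in this study.

```python
import numpy as np


def model_probabilities(ic_values, priors=None):
    """Posterior model probabilities from information-criterion values (Eq. 10)."""
    ic = np.asarray(ic_values, dtype=float)
    if priors is None:
        # Assumption: equal prior probability for every model
        priors = np.full(ic.shape, 1.0 / ic.size)
    priors = np.asarray(priors, dtype=float)

    # Subtracting the minimum keeps exp() from underflowing for large IC values
    delta_ic = ic - ic.min()
    weights = np.exp(-0.5 * delta_ic) * priors
    return weights / weights.sum()


# Hypothetical criterion values for three alternative models
probs = model_probabilities([100.0, 102.0, 110.0])
for k, p in enumerate(probs, start=1):
    print(f"Model {k}: probability = {p:.4f}")
```

Because only differences in IC enter the exponent, a model whose criterion value is 10 units above the minimum already receives a probability below one percent under equal priors, which is why BMA results are often dominated by one or two models.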

Numerical model developments

Model construction

In this study, seven three-dimensional finite-difference numerical models were constructed using MODFLOW (Harbaugh 2005), with ModelMuse serving as the graphical user interface. Using MODFLOW for groundwater modeling comes with several constraints and challenges, including:

Numerical Approximation Methods: MODFLOW employs finite-difference numerical methods to approximate groundwater flow equations. While effective, these methods may encounter challenges in accurately representing complex hydrogeological settings, such as highly heterogeneous aquifer properties or irregular boundaries.

Computational Efficiency: Depending on the size and complexity of the groundwater system being modeled, MODFLOW simulations can be computationally intensive and time-consuming. Large-scale models with fine spatial discretization may require significant computational resources, limiting the feasibility of detailed simulations in certain scenarios.

Handling Complex Hydrogeological Settings: MODFLOW may face challenges in accurately representing complex hydrogeological settings, such as interconnected surface water-groundwater systems or fractured rock aquifers. Simplifications or assumptions may need to be made, potentially impacting the accuracy of model predictions.

Compatibility with Integrated Modeling Frameworks: Integrating MODFLOW with other models, such as surface water models or contaminant transport models, can be challenging due to differences in numerical formulations, spatial discretization schemes, and input/output formats. Ensuring seamless integration often requires careful calibration and validation efforts.

Parameterization and Calibration: Proper parameterization and calibration of MODFLOW models are essential for obtaining reliable simulation results. However, selecting appropriate parameter values and calibrating the model to observed data can be challenging, particularly in data-scarce environments or when dealing with uncertain hydrogeological parameters.

Model Validation: Validating MODFLOW models against independent data sources is crucial for assessing model reliability and accuracy. However, validating groundwater flow models can be challenging due to limited access to groundwater level measurements or other relevant data, leading to uncertainties in model predictions.

Model Interpretation: Interpreting MODFLOW simulation results requires a deep understanding of groundwater flow processes and the limitations of the modeling approach. Misinterpretation of model outputs can lead to erroneous conclusions and potentially inappropriate management decisions regarding groundwater resources (Gogineni and Chintalacheruvu 2023, 2024; Roy and Chintalacheruvu 2024; Thornton et al. 2022).

The modeling domain covered an area of 1075 km2, corresponding to the Najafabad Aquifer. Given the characteristics of the study area, five distinct conceptual models were developed by incorporating alternative geological interpretations, recharge estimations, and boundary condition implementations, with the aim of capturing the range of hydrogeological conditions within the aquifer. The aquifer domain was discretized with a grid of 127 rows, 86 columns, and one layer, giving a total of 10,922 cells (127 × 86). The grid cell size was set to 500 m in both the x and y directions. In addition, two models were derived from the first model using finer (250 m) and coarser (1000 m) grid cell sizes, as detailed in Table 1.
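To give a sense of how the cell count scales with grid cell size, the short sketch below recomputes the grid dimensions for the three resolutions. It assumes that the rectangular grid extent implied by the 500 m grid (127 rows by 86 columns) is kept fixed and that partial cells are rounded up; the actual grids of models 1a and 1b are those listed in Table 1, so these numbers are indicative only. The function name is hypothetical.

```python
import math


def grid_dimensions(extent_y_m, extent_x_m, cell_size_m):
    """Rows, columns, and total cells for square cells covering a fixed extent."""
    rows = math.ceil(extent_y_m / cell_size_m)
    cols = math.ceil(extent_x_m / cell_size_m)
    return rows, cols, rows * cols


# Extent implied by the 500 m grid described in the text
extent_y = 127 * 500.0  # metres along the row direction
extent_x = 86 * 500.0   # metres along the column direction

for cell_size in (250, 500, 1000):
    rows, cols, n_cells = grid_dimensions(extent_y, extent_x, cell_size)
    print(f"{cell_size:>5} m: {rows} rows x {cols} columns = {n_cells} cells")
```

Halving the cell size roughly quadruples the number of cells, which is the main driver of the higher computational cost of the finer grid.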

Table 1 The introduction of seven alternative conceptual models

For the simulation of water levels, steady-state conditions were adopted to represent the year 2021, during which the groundwater system was assumed to be in equilibrium. All relevant observation and pumping wells, boundary conditions, and recharge inputs were incorporated into the model using the corresponding MODFLOW packages, so that the model reflects the hydrological characteristics of the Najafabad Aquifer.

Numerical model boundaries

To establish the initial groundwater head distribution for the numerical models, the average groundwater levels observed in 40 wells during 2021 were interpolated using the Fitted Surface interpolation method. Since the Najafabad Aquifer is unconfined, the top of the model was set to the surface topography, while the bottom represented the bedrock elevation, determined from geophysical studies and exploration well logs.

For the simulation of pumping wells, the well package in MODFLOW was employed. A total of 9,696 pumping wells were included in the models. In some conceptual models (model numbers 1 and 3), the well package was also used to define river-related data, such as recharge and discharge values. The decision to use the well package instead of the river package was due to inadequate information regarding river properties, such as bed depth, surface water, and sediment thickness. Additionally, in model number 5, the well package was utilized to replace the output flow boundaries by specifying discharge well values.

Based on water table contours and flow lines derived from average groundwater levels in 2021, inflow occurs mainly in the northwest, west, and east of the aquifer, while outflow occurs in the southeast. A no-flow boundary condition was applied in the northeastern parts of the aquifer. General head boundary (GHB) packages were used to represent the inflow and outflow boundaries. The GHB package simulates flow across the boundary as a function of the difference between the head in the cell and the specified boundary head, scaled by a conductance parameter that controls how easily water flows. To ensure that the computed groundwater heads aligned with the actual conditions, linear interpolation was performed on node values within each segment of the grid cells. Conductance in GHB boundary cells was estimated using hydraulic conductivity values in the boundary area. In model number 5, the well package was used instead of GHB to represent the outflow boundary.
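The head-dependent flow described above can be illustrated with a small sketch. In MODFLOW, the GHB flow into a cell is the conductance multiplied by the difference between the boundary head and the cell head. The conductance formula used here (hydraulic conductivity times cross-sectional area divided by flow distance) is one common way to estimate it and is an assumption of this example, as are the function names, the saturated thickness, the distance, and the head values; only the 4 m/day conductivity and the 500 m cell width come from the text.

```python
def ghb_conductance(k_m_per_day, cell_width_m, sat_thickness_m, distance_m):
    """Conductance C = K * A / L in m2/day (assumed estimation approach)."""
    cross_section_m2 = cell_width_m * sat_thickness_m
    return k_m_per_day * cross_section_m2 / distance_m


def ghb_flow(conductance, boundary_head_m, cell_head_m):
    """Flow in m3/day; positive values are inflow to the cell."""
    return conductance * (boundary_head_m - cell_head_m)


# Hypothetical boundary cell: average K of 4 m/day, 500 m cell width,
# 50 m saturated thickness and 500 m distance to the external head
conductance = ghb_conductance(4.0, 500.0, 50.0, 500.0)
inflow = ghb_flow(conductance, boundary_head_m=1700.0, cell_head_m=1698.0)
outflow = ghb_flow(conductance, boundary_head_m=1600.0, cell_head_m=1602.0)
print(f"C = {conductance} m2/day, inflow = {inflow} m3/day, outflow = {outflow} m3/day")
```

The same expression handles both inflow and outflow boundaries: the sign of the head difference determines the flow direction, which is why a single package can represent both boundary types.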

Recharge rates in the Najafabad Aquifer, resulting from rainfall and agricultural return water, were simulated using the recharge package. The volume of recharge varied across different parts of the aquifer due to variations in soil properties, geology, land use, rainfall intensity, and ground surface slope. Rain infiltration volume was calculated using the Thornthwaite water balance method, while agricultural return water was estimated using the Blaney–Criddle method based on agricultural wells and irrigation patterns. Due to the uncertainty associated with this parameter, recharge parameters for different zones were determined during the calibration process. In the seven alternative conceptual models, recharge was introduced with 15 zones and corresponding parameters. In models number 2, 4, and 5, the Zayandehrood River was defined using the recharge package, and four recharge parameters were assigned to represent the river (Fig. 4).

Fig. 4 Recharge zonation defined for modeling software (model number 3, 4, and 5)

Hydraulic conductivity parameter

To estimate the hydraulic conductivity parameter for the numerical models, a combination of pumping tests, exploration well logs, and standard hydraulic conductivity tables was utilized. For models number 1 and 2, the hydraulic conductivity parameter was determined through interpolation techniques, resulting in the definition of a single parameter for the entire model domain (see Fig. 5).

Fig. 5 Hydraulic conductivity interpolation for modeling software (model number 1, 1a, 1b, and 2)

In contrast, models number 3, 4, and 5 employed a zoning method to simulate hydraulic conductivity parameters. These models were divided into seven distinct zones, each assigned its own hydraulic conductivity parameter (as illustrated in Fig. 6). This zoning approach was adopted due to the presence of uncertain information regarding hydraulic conductivity values across the study area.

Fig. 6 Hydraulic conductivity zones defined for modeling software

During the calibration process, the hydraulic conductivity parameters for the different zones were calculated and adjusted to improve the model's performance and accuracy. This calibration phase allowed for the refinement of the hydraulic conductivity estimates, considering the available data and the observed behavior of the aquifer system.

Conceptual models with different spatial discretization

To comprehensively investigate the influence of spatial discretization and complexity on uncertainty in groundwater flow modeling, this study further expanded upon the initial model (referred to as model number one) by developing two additional models. These new models, denoted as model 1a and model 1b, were designed to explore the effects of finer (250 m) and coarser (1000 m) grid cell sizes, respectively (as presented in Table 1).

These two models were added to bring spatial discretization into the analysis of conceptual uncertainty while still accounting for model complexity. This was done by evaluating model probabilities with the Bayesian model averaging method. Comparing the model probabilities of model 1a, model 1b, and the original model makes it possible to assess the interplay between conceptual uncertainty, model complexity, and spatial discretization, and to judge the relative performance of the different discretization and complexity configurations.

Model calibration

Model calibration combined trial-and-error adjustment with automatic parameter estimation to achieve the best possible match between simulated and observed hydraulic heads. Automatic calibration was performed with the UCODE software (Poeter et al., 2006). For the seven alternative models, the hydraulic conductivity and recharge parameters were estimated automatically, while the data for the general head boundaries (GHB), the pumping wells located at the outflow boundary, and the river were calibrated manually.

For models numbered 1, 1a, 1b, and 2, a single parameter was assigned for hydraulic conductivity. In these models, the hydraulic conductivity data were incorporated through interpolation techniques. The process began by defining an initial zone encompassing the entire model domain, where the hydraulic conductivity parameter was set to a value of one. Subsequently, a multiplier coefficient was established for this zone, and hydraulic conductivity data were input as discrete points. These data were then interpolated across the entire model domain, resulting in a comprehensive representation of hydraulic conductivity distribution.
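The single-parameter setup described above can be sketched as follows. This is a minimal illustration, not the workflow used in the study: the interpolation scheme is not named in the text, so inverse-distance weighting is assumed here, and the point values, coordinates, and multiplier are hypothetical. It shows how one calibrated multiplier scales an entire interpolated hydraulic conductivity field.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points is a list of (px, py, value) tuples. IDW is an assumed
    interpolation scheme; the study does not state which one was used.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # the cell centre coincides with a data point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def conductivity_field(points, cell_centers, multiplier):
    """Interpolate point data to cell centres, then apply one multiplier."""
    return [multiplier * idw(points, x, y) for x, y in cell_centers]


# Hypothetical hydraulic conductivity data: (x in m, y in m, K in m/day)
k_points = [(0.0, 0.0, 5.0), (1000.0, 0.0, 15.0), (0.0, 1000.0, 10.0)]
centers = [(0.0, 0.0), (500.0, 500.0)]

# The multiplier is the only quantity the calibration would adjust
field = conductivity_field(k_points, centers, multiplier=1.2)
print(field)
```

Because the zone value is fixed at one, changing the multiplier rescales every cell by the same factor while preserving the spatial pattern of the interpolated data.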

For models numbered 3, 4, and 5, the hydraulic conductivity parameter was divided into seven distinct zones. Each zone was assigned an individual parameter, which was subsequently optimized during the calibration process. Similarly, the recharge parameter was divided into 15 zones across all seven alternative models, and calibration efforts were focused on optimizing the parameters for each respective zone.

Through the integration of manual calibration for specific boundary conditions and river-related elements, along with automatic calibration for hydraulic conductivity and recharge, the models underwent a rigorous and comprehensive calibration process. This approach allowed for the refinement and optimization of model parameters, ultimately enhancing the accuracy and reliability of the groundwater flow simulations.

Model assumptions

During the modeling process, several assumptions were made to facilitate the simulation of groundwater flow dynamics:

Steady-State Conditions: A steady-state situation was adopted to represent the year 2021, during which the groundwater system was assumed to be in equilibrium. This assumption allowed for the simulation of long-term groundwater flow patterns within the aquifer.

Recharge Estimations: Recharge rates in the Najafabad Aquifer, resulting from rainfall and agricultural return water, were simulated using the recharge package. These recharge rates were estimated based on soil properties, land use, rainfall intensity, and irrigation patterns, with uncertainties accounted for during the calibration process.

Model Calibration: The calibration process involved adjusting model parameters, including hydraulic conductivity and recharge rates, to achieve the best match between simulated and observed groundwater levels. This process utilized both manual calibration for specific boundary conditions and automatic calibration techniques for parameter estimation.

Results and discussion

Calibration results

Following calibration, the performance of the seven alternative models was evaluated against the goodness-of-fit criteria outlined by ESI (2007). Two measures were considered: the mean absolute residual (MAR) and the ratio of the residual standard deviation to the range of groundwater heads. According to these criteria, both measures should be below 10% for a model to be considered satisfactory.

In the case of the Najafabad Aquifer, the range of groundwater heads, representing the difference between the maximum and minimum groundwater levels, was determined to be 326 m. Table 2 provides a summary of the statistical measures pertaining to the calibration results obtained from the seven alternative models. Notably, both the MAR and the ratio of the residual standard deviation to the range of heads fell within the range of 1% to 2% for all models (Table 2).

Table 2 Statistical values for calibration and validation for seven alternative conceptual models

Given that the statistical results achieved in the calibration process meet and even surpass the specified calibration targets, it can be concluded that the calibration results of all seven alternative models are deemed acceptable. The performance evaluation demonstrates that the models successfully capture the observed data, exhibiting a high level of accuracy and reliability in simulating the groundwater flow dynamics within the Najafabad Aquifer.
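The two calibration measures can be computed as in the sketch below. It is illustrative only: the observed and simulated heads are hypothetical, the MAR is assumed to be normalized by the head range so that it can be compared with the 10% target, and the population standard deviation is assumed. Only the head range of 326 m and the 10% threshold come from the text.

```python
import statistics

HEAD_RANGE_M = 326.0   # range of groundwater heads reported for the aquifer
TARGET_RATIO = 0.10    # both measures should stay below 10%


def calibration_metrics(observed, simulated, head_range):
    """Return the MAR and the two ratios used as goodness-of-fit measures."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    residuals = [sim - obs for obs, sim in zip(observed, simulated)]
    mar = statistics.fmean([abs(r) for r in residuals])
    residual_sd = statistics.pstdev(residuals)  # population SD (assumption)
    return {
        "mar": mar,
        "mar_ratio": mar / head_range,
        "sd_ratio": residual_sd / head_range,
    }


# Hypothetical heads (m) at four observation wells
observed_heads = [1600.0, 1650.0, 1700.0, 1750.0]
simulated_heads = [1603.0, 1646.0, 1705.0, 1748.0]

metrics = calibration_metrics(observed_heads, simulated_heads, HEAD_RANGE_M)
acceptable = metrics["mar_ratio"] < TARGET_RATIO and metrics["sd_ratio"] < TARGET_RATIO
print(metrics, acceptable)
```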

Model validation

To test the reliability of the calibrated models, a validation step was carried out. The models were run under steady-state conditions for the year 2022, and the simulated water levels were compared with the average water levels observed in the observation wells during that year. During validation with this new set of observation data, certain parameters, such as pumping rates and recharge rates, were adjusted to address minor discrepancies between the observed and calculated water levels.

To evaluate the accuracy of the model validation, the root-mean-square error (RMSE) was employed as a statistical measure, as suggested by Duan et al. (2007) and Diks and Vrugt (2010). Table 2 presents a comparison of the RMSE values obtained during the calibration and validation processes. The model validation results indicate a relatively weaker performance compared to the calibration results, implying a slightly reduced accuracy in reproducing the observed water levels. However, it is important to note that despite this discrepancy, the validation results remain within an acceptable range.
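For reference, the RMSE used to compare calibration and validation performance can be sketched as follows. The function is the standard definition; the head values are hypothetical and serve only to show the calculation.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    squared_errors = [(sim - obs) ** 2 for obs, sim in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical heads (m) for a calibration year and a validation year
calibration_rmse = rmse([1600.0, 1650.0, 1700.0], [1602.0, 1648.0, 1701.0])
validation_rmse = rmse([1598.0, 1647.0, 1695.0], [1602.0, 1643.0, 1699.0])
print(calibration_rmse, validation_rmse)
```

A validation RMSE somewhat larger than the calibration RMSE, as in this made-up example, corresponds to the pattern described above.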

Overall, the validation process reinforces confidence in the calibrated models. Although validation performance is slightly weaker than calibration performance, the models still provide reliable predictions and capture the essential dynamics of the groundwater system in the Najafabad Aquifer.

Input data uncertainty

Addressing input data uncertainty in hydrogeological modeling is crucial because it acts alongside other sources of uncertainty, such as conceptual model uncertainty, complexity uncertainty arising from excessive parameterization, parameter uncertainty, and scenario uncertainty. The potential impacts of input data uncertainty and strategies for addressing it are outlined below.

Potential impacts

Model reliability

Input data uncertainty can directly impact the reliability of hydrological models. Inaccurate or unreliable input data may lead to biased model outputs, reducing the overall confidence in model predictions.

Decision making

Uncertain input data can result in suboptimal decision-making processes, as model outputs may not accurately represent the true state of the hydrological system. This can lead to ineffective or inappropriate management strategies for water resources.

Model calibration and validation

Input data uncertainty can pose challenges during model calibration and validation processes. Inaccurate input data may result in poor model performance and hinder the ability to accurately match observed data.

Uncertainty propagation

Uncertainty in input data can propagate through the modeling process, amplifying uncertainty in model predictions. This can make it difficult to identify the sources of uncertainty and assess their impacts on model outcomes.

Strategies for addressing input data uncertainty

Data quality assessment

Conduct thorough assessments of the quality and reliability of input data sources. This involves evaluating data collection methods, instrumentation accuracy, spatial and temporal resolution, and data consistency.

Data validation and verification

Implement procedures to validate and verify input data against independent sources or field measurements. This helps identify inconsistencies, errors, or outliers in the data.

Data uncertainty quantification

Quantify the uncertainty associated with input data using statistical methods or uncertainty analysis techniques. This provides insights into the range and magnitude of potential errors in the data.

Data assimilation techniques

Incorporate data assimilation techniques to integrate observational data into the modeling process dynamically. This allows for continuous updating and refinement of model inputs based on new information, reducing input data uncertainty over time.

Sensitivity analysis

Perform sensitivity analyses to assess the sensitivity of model outputs to variations in input data. Identifying influential input parameters helps prioritize data collection efforts and focus resources on improving the accuracy of critical input variables.

Data fusion and integration

Integrate diverse sources of information, including remote sensing data, geospatial data, and citizen science data, to enhance the reliability and completeness of input datasets. Data fusion techniques combine multiple data sources to mitigate individual data limitations and improve overall data quality.

Uncertainty propagation analysis

Conduct uncertainty propagation analyses to assess how input data uncertainty propagates through the modeling process and impacts model predictions. Understanding the cascading effects of input data uncertainty on model outputs helps quantify overall model uncertainty and identify critical sources of uncertainty that require mitigation.
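As one possible illustration of uncertainty propagation, the sketch below pushes an uncertain input through a model by Monte Carlo sampling. It is not part of the study's workflow: the model is a deliberately simple Darcy flux (conductivity times hydraulic gradient) standing in for a groundwater model, and the lognormal distribution, its parameters, and the gradient are hypothetical.

```python
import math
import random
import statistics

GRADIENT = 0.002  # hypothetical hydraulic gradient (dimensionless)


def darcy_flux(conductivity):
    """Toy stand-in for a groundwater model: specific discharge in m/day."""
    return conductivity * GRADIENT


def sample_conductivity(rng):
    """Hypothetical input uncertainty: lognormal K with a median of 10 m/day."""
    return rng.lognormvariate(math.log(10.0), 0.5)


def propagate(model, sampler, n_samples=10000, seed=0):
    """Sample the uncertain input, run the model, and summarize the outputs."""
    rng = random.Random(seed)
    outputs = sorted(model(sampler(rng)) for _ in range(n_samples))
    return {
        "mean": statistics.fmean(outputs),
        "p05": outputs[int(0.05 * (n_samples - 1))],
        "p95": outputs[int(0.95 * (n_samples - 1))],
    }


summary = propagate(darcy_flux, sample_conductivity)
print(summary)
```

The spread between the 5th and 95th percentiles gives a direct measure of how much of the input uncertainty reaches the prediction.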

Effect of complexity on groundwater modeling uncertainty

To quantify the uncertainty arising from variations in model complexity, a series of alternative conceptual models was constructed, each with a distinct degree of complexity. The initial model, model number 1, was the simplest configuration, with 16 parameters. Parameters were then added incrementally to the alternative models: model number 2 had 20 parameters, model number 3 had 22 parameters, and models number 4 and 5 each had 26 parameters.

By systematically varying the number of parameters within the alternative conceptual models, the study aimed to investigate the influence of model complexity on the resulting uncertainties. This approach allowed for a comprehensive exploration of the potential sources of variability and offered insights into the impacts of parameterization choices on the overall model outcomes. The diverse range of models, each characterized by a distinct number of parameters, provided a valuable framework for evaluating the associated uncertainties and shedding light on the intricate interplay between model complexity and its corresponding uncertainty levels.
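The penalty that each added parameter incurs can be illustrated with the least-squares forms of AIC, AICc, and BIC. The sketch below rests on several assumptions: Gaussian residuals, additive constants dropped, only the estimated parameters counted in k, and a hypothetical number of observations and sum of squared weighted residuals. KIC is omitted because it additionally requires the Fisher information matrix and the parameter priors. Only the parameter counts (16, 20, 22, and 26) come from the study.

```python
import math


def information_criteria(sswr, n_obs, n_params):
    """Least-squares AIC, AICc, and BIC (additive constants dropped)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    fit_term = n_obs * math.log(sswr / n_obs)
    aic = fit_term + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = fit_term + n_params * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


N_OBS = 60     # hypothetical number of head observations
SSWR = 600.0   # hypothetical fit, held equal to isolate the penalty term

# Parameter counts reported for the alternative models
param_counts = {"model 1": 16, "model 2": 20, "model 3": 22, "models 4 and 5": 26}

criteria = {
    name: information_criteria(SSWR, N_OBS, k) for name, k in param_counts.items()
}
for name, values in criteria.items():
    print(name, {key: round(val, 2) for key, val in values.items()})
```

With the fit held equal, every criterion increases with the parameter count, so a more complex model is preferred only if it reduces the residuals enough to offset its penalty.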

Examining complexity according to model probability

The determination of model probabilities for the various alternative models is a crucial step in assessing their relative merits and uncertainties. Equation 10 is employed for calculating these probabilities, with one of the factors influencing this equation being the prior model probability, denoted as p(Mk). These probabilities can either be uniformly assigned across all models or estimated based on the modeler's expertise and understanding of the study area. Previous studies (Pohlmann et al. 2007; Singh et al. 2010; Ye et al. 2010; Samani et al. 2018a, 2018b) have highlighted the significance of prior probabilities in model selection.

In our investigation, all five alternative models were assigned an equal prior model probability of 1/5. These priors were used to compute posterior model probabilities under different model selection criteria, namely AIC, AICc, BIC, and KIC. The results are presented in Table 3. All criteria assigned the highest model probability to the simplest model, model number 1. The posterior probability was not spread across the alternative models; instead, it was concentrated almost entirely on model number 1 (for example, 99.25% by the KIC method; Table 3). The AIC, AICc, BIC, and KIC methods thus collectively indicated that models 2, 3, 4, and 5 suffered from inappropriate conceptual model definitions.

Table 3 Prior model probability and posterior model probabilities for five conceptual models evaluated with the AIC, AICc, BIC and KIC methods, and model ranking

In this study, an unbiased approach was adopted by initially developing five conceptual models without any predefined assumptions regarding the accuracy of their underlying structures. Subsequently, the model probability comparisons effectively rejected the certainty of conceptual models 2 to 5. Our case study serves as a compelling demonstration of the pitfalls associated with erroneous conceptualization and excessive parameterization, leading to misleading interpretations and diminished model probabilities. By embracing an objective evaluation process, we have highlighted the critical importance of robust conceptual model definition to enhance the reliability and credibility of groundwater modeling outcomes.
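The conversion from criterion values to posterior model probabilities can be sketched as follows. Equation 10 is not reproduced in this section, so the sketch assumes the commonly used form in which each model's weight is its prior multiplied by exp(-ΔIC/2), where ΔIC is the difference from the smallest criterion value, normalized over all models. The criterion values below are hypothetical.

```python
import math


def posterior_model_probabilities(ic_values, priors=None):
    """Posterior model probabilities from information-criterion values.

    ic_values maps model name to criterion value (AIC, AICc, BIC, or KIC).
    priors maps model name to prior probability; equal priors by default.
    """
    names = list(ic_values)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    ic_min = min(ic_values.values())  # subtracting the minimum avoids underflow
    weights = {
        name: priors[name] * math.exp(-0.5 * (ic_values[name] - ic_min))
        for name in names
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical criterion values for five alternative models
hypothetical_ic = {"1": 250.0, "2": 262.0, "3": 268.0, "4": 275.0, "5": 277.0}
posteriors = posterior_model_probabilities(hypothetical_ic)
for name, probability in posteriors.items():
    print(name, f"{probability:.4g}")
```

Because the weights decay exponentially with the criterion difference, a gap of only about ten units is enough to concentrate nearly all of the probability on one model, which is the behavior seen in Table 3.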

Assessing the effect of spatial discretization on the model uncertainty

In order to investigate the relationship between spatial discretization and complexity, two additional models were developed based on model number one, which was identified as the least uncertain model. These two models, labeled as model 1a and model 1b, employed a finer grid cell size of 250 m and a coarser grid cell size of 1000 m, respectively. Both models retained the same level of complexity, characterized by 16 parameters. The recalibration of these models was carried out using automatic parameter estimation techniques to minimize the disparity between simulated and observed hydraulic heads.

Model probabilities were calculated for the seven alternative models, with each model assigned an equal prior probability of 1/7. These probabilities were subsequently used to compute the posterior model probability employing different model selection criteria, including AIC, AICc, BIC, and KIC. The outcomes of this analysis are presented in Table 4.

Table 4 Prior model probability and posterior model probabilities for seven conceptual models evaluated with the AIC, AICc, BIC and KIC methods, and model ranking

The results reveal that model 1a obtained the highest probability and emerged as the model with the least uncertainty (93.42% by the KIC method; Table 4). Conversely, increasing the grid cell size to 1000 m introduces greater uncertainty into the conceptual model. Model 1b, despite sharing the lowest level of complexity (16 parameters), exhibits an even lower probability (5.85E-10, KIC method) than the more complex models 4 and 5 (1E-4 and 3.46E-05, KIC method, respectively). Hence, developing a conceptual model with minimal uncertainty requires careful consideration not only of the selection of optimal parameters but also of the scale of spatial discretization, taking into account the available data.
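To put these probabilities in perspective, a ratio of posterior probabilities can be converted back into a difference in the criterion value. The sketch below assumes equal priors and the exponential weighting commonly used for Equation 10 (weight proportional to exp(-ΔIC/2)); under those assumptions the gap equals -2 ln of the probability ratio. The three probabilities are the KIC values quoted above.

```python
import math


def criterion_gap(p_model, p_reference):
    """Criterion difference implied by two posterior probabilities.

    Assumes equal priors and weights proportional to exp(-0.5 * delta).
    """
    return -2.0 * math.log(p_model / p_reference)


P_MODEL_1A = 0.9342    # KIC posterior probability of model 1a (Table 4)
P_MODEL_1B = 5.85e-10  # KIC posterior probability of model 1b
P_MODEL_4 = 1e-4       # KIC posterior probability of model 4

gap_1b = criterion_gap(P_MODEL_1B, P_MODEL_1A)
gap_4 = criterion_gap(P_MODEL_4, P_MODEL_1A)
print(f"model 1b vs 1a: {gap_1b:.1f}")
print(f"model 4 vs 1a: {gap_4:.1f}")
```

Under these assumptions, model 1b sits about 42 KIC units above model 1a, more than twice the gap of model 4, even though models 1a and 1b have the same number of parameters.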

These findings align with the conclusions of Wildemeersch et al. (2014), whose research also indicated that coarsening spatial discretization leads to increased uncertainty in discharge predictions. By contrast, Vàzquez et al. (2002) reported lower uncertainty with coarser spatial discretization. Several factors could contribute to such variations in results between this study and others:

Study Area and Hydrogeological Conditions: Differences in hydrogeological settings, such as aquifer geometry, lithology, hydraulic properties, and groundwater recharge rates, can lead to variations in model behavior and uncertainty. Studies conducted in geologically diverse regions with distinct hydrological characteristics may yield different results compared to those conducted in areas with similar hydrogeological conditions to the Najafabad Aquifer.

Modeling Approaches and Assumptions: Variations in modeling approaches, including model conceptualization, parameterization, boundary conditions, and calibration methods, can influence model performance and uncertainty. Studies employing different modeling techniques or making different assumptions about groundwater flow processes may produce contrasting results.

Temporal and Spatial Scale: Differences in the temporal and spatial scales of the study area and modeling domain can affect the representation of hydrological processes and uncertainty. Studies conducted at larger spatial scales or over longer time periods may capture additional complexities and sources of uncertainty not accounted for in smaller-scale or shorter-term investigations.

Data Availability and Quality: Variations in the availability and quality of data used for model calibration and validation, such as groundwater level measurements, hydraulic conductivity values, and recharge estimates, can influence model reliability and uncertainty. Studies utilizing more extensive and accurate datasets may yield different results compared to those with limited or lower-quality data.

Model Complexity and Parameterization: Differences in model complexity and parameterization schemes, such as the number of parameters included in the models and the representation of subsurface heterogeneity, can affect model uncertainty. Studies employing more complex models with a higher degree of parameterization may exhibit greater uncertainty compared to simpler models with fewer parameters.

The present study further supports the notion that spatial discretization plays a crucial role in determining the uncertainty of groundwater flow models and underscores the importance of judiciously selecting the appropriate spatial scale to enhance model reliability and accuracy. Overall, by recognizing uncertainty in groundwater modeling, stakeholders can make more informed decisions to sustainably manage and protect valuable groundwater resources.

Acknowledgement of limitations

In this section, we acknowledge the limitations of our study, recognizing potential sources of uncertainty and assumptions that may have influenced the results. While our research endeavors to provide valuable insights into groundwater flow modeling uncertainty, it is essential to acknowledge the constraints and uncertainties inherent in such studies.

One significant limitation lies in the simplifications and assumptions made during the modeling process. Despite our efforts to incorporate diverse hydrogeological conditions and alternative conceptual models, certain simplifications were necessary due to data limitations or modeling constraints. These simplifications may have influenced the model outcomes and should be considered when interpreting the results.

Furthermore, uncertainties associated with input data, such as hydraulic conductivity values, recharge rates, and boundary conditions, may have affected the accuracy of our simulations. While we employed various techniques to estimate these parameters and validate our models, unreliable data sources and measurement errors may still have propagated into the modeling outcomes.

Additionally, our study focused primarily on assessing uncertainty arising from spatial discretization and complexity dynamics. However, other sources of uncertainty, such as parameter estimation methods, model boundary conditions, and scenario uncertainty, were not extensively explored in this research. Future studies could address these additional sources of uncertainty to provide a more comprehensive understanding of groundwater flow modeling uncertainty.

Despite these limitations, our study contributes valuable insights into the interplay between spatial discretization, complexity, and groundwater modeling uncertainty. By acknowledging these limitations, we aim to provide readers with a nuanced understanding of the study's findings and their implications for groundwater modeling practice and research.

Future research

In light of the findings presented in this study, several avenues for future research in groundwater flow modeling emerge, offering opportunities to further enhance our understanding and improve modeling practices.

One promising direction for future research involves exploring additional sources of uncertainty beyond those investigated in this study. Specifically, investigating different parameter estimation methods could provide valuable insights into their impact on model uncertainty and ultimately improve model accuracy. Evaluating the effectiveness of various parameter estimation techniques, such as inverse modeling approaches or machine learning algorithms, in capturing the complex dynamics of groundwater systems could lead to more robust and reliable modeling results.

Furthermore, extending the application of the approach developed in this study to different aquifer systems presents an intriguing opportunity. By applying the proposed methodology to diverse hydrogeological settings with varying geological and hydrological characteristics, researchers can assess the generalizability and robustness of the findings. This broader application can help validate the effectiveness of the methodology across different contexts and contribute to the development of more universally applicable modeling frameworks.

Additionally, future research efforts could focus on integrating advanced modeling techniques with emerging technologies to improve predictive capabilities and address existing challenges in groundwater flow modeling. Leveraging advancements in remote sensing, data assimilation techniques, and computational modeling tools offers the potential to enhance model accuracy, reduce uncertainty, and better inform water resources management decisions.

Overall, exploring these avenues for future research holds the promise of advancing our understanding of groundwater flow processes, refining modeling methodologies, and ultimately contributing to more effective and sustainable management of groundwater resources.

Conclusion

This study aimed to rigorously evaluate the uncertainty of conceptual groundwater flow models in the Najafabad Aquifer, focusing on both complexity and spatial discretization dynamics. By employing Bayesian model-averaging (BMA) and rigorous model selection criteria, we discerned significant insights into the relationship between model complexity, spatial discretization, and groundwater modeling uncertainty.

Key findings of this study include:

1 The simplicity of groundwater flow models, characterized by fewer parameters, correlates with higher model probability and reduced uncertainty. Specifically, among the five models of differing complexity, model 1, with the least complexity, consistently exhibited the highest probability across the model selection criteria, emphasizing the importance of parsimonious model design in capturing groundwater behavior accurately.

2 Spatial discretization plays a pivotal role in modulating uncertainty in groundwater modeling. Our investigation revealed that, at the same level of model complexity, finer spatial discretization significantly reduced uncertainty compared with coarser discretization. Notably, model 1a, with a grid cell size of 250 m, obtained the highest model probability of the seven models, whereas model 1b, with a grid cell size of 1000 m, obtained a lower probability than even the more complex models 4 and 5.

In summary, our findings underscore the critical importance of considering both model complexity and spatial discretization in groundwater modeling endeavors. Simplified models with optimal parameter counts, in conjunction with appropriately chosen spatial discretization scales, offer a robust framework for accurate groundwater predictions and informed decision-making in hydrogeological studies.

Future research efforts should focus on refining methodologies for assessing uncertainty in groundwater modeling, particularly in the context of spatial discretization. Investigating alternative model selection criteria and exploring advanced Bayesian techniques could further enhance our understanding of uncertainty dynamics in groundwater systems. Additionally, incorporating temporal variability in groundwater models could provide a more comprehensive assessment of uncertainty under changing environmental conditions. Addressing these research gaps will contribute to advancing the reliability and applicability of groundwater modeling approaches in hydrogeological studies.