Introduction

Agriculture is the world’s largest user of water in terms of irrigated volume. The growing competition for water between the agricultural and non-agricultural sectors has increased concern about the sustainability of irrigated agricultural systems. The need to increase agricultural production demands an increase in the cropped area regardless of the availability of water for irrigation, thereby making water a limiting factor in many irrigation projects. The great challenge for the coming decades will be to increase food production with less water, particularly in countries with limited water and land resources. Therefore, improved water management practices, especially methods to improve water use efficiency, are warranted in the irrigation sector.

The per capita water availability in India is declining continuously and is likely to reach stress/scarcity levels in some regions within the next few years. This has led to injudicious abstraction of surface and ground water, resulting in several problems including rapidly declining water tables and salt water intrusion in coastal areas. The increased frequency of extreme events (especially drought) may further reduce the water available to meet current irrigation demands. The strategy for allocation and use of irrigation water in the south Indian systems has been neither demand based nor supply based (Mohanakrishnan 1990). Seasonal and in-seasonal allocations depend on the resources available, crop seasons, and priorities for water allocation prevailing in the particular location of the system. The irrigation engineers make the allocation decisions according to the ‘water duty’ specified at the heads of the main and branch/distributary canals in the command area (Pundarikanthan and Santhi 1996). While this kind of water allocation is ideal where sufficient water is available, the current deficit in water availability in the country warrants an optimal water management strategy to improve food production.

Optimal irrigation scheduling under deficit conditions is highly complex since it depends on the interaction of the physical constraints of the irrigation system, soil moisture availability at the time of irrigation, growth stage of the crop, effect of previous and subsequent irrigations on crop growth and yield, and weather conditions (Soundharajan and Sudheer 2009). Traditionally, yield reduction models based upon evapotranspiration (ET) ratios (e.g., Doorenbos and Kassam 1979; Jensen 1968) have been employed by many researchers for deficit irrigation management (Vedula and Mujumdar 1992; Kumar et al. 2006). However, these models have two major limitations: (i) they cannot provide crop yield in absolute terms and (ii) they do not have endogenous optimization capacity (Brumbelow and Georgakakos 2007). Yet another concern is that most ET ratio based yield reduction models treat crop yield reduction as a linear function of crop ET within a growth stage. Researchers have integrated optimization schemes with ET ratio based yield reduction models to arrive at optimal deficit irrigation schedules (Rao et al. 1988; Paul et al. 2000; Prasad et al. 2006). These methods apply crop water production functions to the different crop growth stages and can therefore optimize only the total water requirement of each stage. As a result, identifying the timing of individual irrigation applications becomes difficult.

Recent developments in bio-physical modelling have made it possible to simulate crop growth and development under field conditions. Adequately calibrated and validated crop growth simulation models provide a systems approach and a fast alternative method for developing and evaluating agronomic practices that can utilize technological advances in limited irrigation agriculture (Saseendran et al. 2008). A few researchers have employed crop growth simulation models for irrigation scheduling (Rao and Rees 1992; Talpaz and Mjelde 1988; Soundharajan and Sudheer 2009), and the results were encouraging. Nonetheless, the accuracy of process oriented crop growth models depends on the conceptual representation of physiological processes and on the parameter values used in the mathematical representation (Zhai et al. 2004).

Many of the currently used crop growth models are highly complex and are generally characterized by a multitude of parameters (Varella et al. 2010). Owing to the variability in agro-climatic zones and the specific cultivars, the values of many of these parameters are not exactly known. Further, many of them may not be directly measurable. Therefore, in most cases model calibration is necessary. Model calibration helps reduce the parameter uncertainty, which in turn reduces the uncertainty in the simulated results. During model calibration, selected parameters are allowed to vary within predefined bounds until a sufficient correspondence between the model outputs and actual measurements is obtained. The actual measurements for calibration of crop growth models come from field level experiments. However, the experimental record is generally not long enough for accurate estimation of model parameters, because experimentation on crop systems is necessarily lengthy and expensive in terms of land, equipment, and manpower. In addition, when the number of parameters in a model is large (either due to the large number of sub-processes being considered or due to the model structure itself), the calibration process becomes complex and computationally intensive (Cibin et al. 2010). In such cases, sensitivity analysis (SA) is helpful to identify and rank parameters that have a significant impact on specific model outputs of interest (Saltelli et al. 2000). In general, SA is employed prior to the calibration process in order to identify a candidate set of important factors for calibration so that the complexity of the calibration process can be reduced.

The objective of the current study is twofold: (i) to carry out global SA (GSA) of a rice crop growth simulation model, ORYZA2000 (Bouman et al. 2001), and (ii) to develop and demonstrate an auto-calibration procedure for the ORYZA2000 model for applications in South India. The parameters of the ORYZA2000 model are optimized and validated using data collected from field experiments conducted in three seasons in an agricultural research farm. The remainder of the article is organized as follows. Following this introduction, a brief description of the models and the optimization algorithm is presented. The method of SA employed in this study is also described in detail. Subsequently, the experimental set up and the field data employed in this study are discussed, followed by a discussion of the results of the SA and the auto-calibration.

Materials and methods

In this study, the optimal values for the parameters of ORYZA2000 model were estimated by integrating the model within an optimization algorithm, thereby facilitating an auto-calibration framework. The framework used the data collected from field experiments conducted in three seasons and different treatments. The details of the field experiments, crop growth model, optimizer, and the auto-calibration framework are discussed in the following sections.

Crop growth simulation model—ORYZA2000

ORYZA2000 is an eco-physiological crop growth model that simulates the growth, development, and water balance of rice in situations of potential, water-limited and nitrogen-limited conditions on a daily basis. While there are a few other crop growth models for rice that are available (e.g., RICEMODE (McMennary and O’Toole 1985), WOFOST (Boogaard et al. 1998)), the ORYZA2000 has been extensively used and tested for its efficiency in water limited conditions, and the results were encouraging (Belder et al. 2004, 2007; Feng et al. 2007; Arora 2006), and therefore is considered in the current study.

In the ORYZA2000 model, several modules, such as those for aboveground crop growth, ET, nitrogen dynamics, and soil–water balance, are combined to simulate crop growth under different production conditions. The water dynamics are simulated by a one-dimensional multi-layer soil water balance module, which can simulate the soil water balance for different growing conditions. The model follows a daily computation scheme for the rate of dry matter production of the plant organs and for phenological development. By integrating these rates over time, dry matter production and development stage are simulated throughout the growing season (Bouman and van Laar 2006). Daily dry matter production is related to net radiation, temperature, and leaf area index (LAI). The carbohydrates produced during crop growth are shared among roots, leaves, stems, and panicles using partitioning factors that vary with development stage, which is based on daily heat units and photoperiod. From flowering onwards, leaf loss rate is simulated from an experimentally derived loss rate factor, which is a function of development stage and green leaf biomass (BM). A detailed explanation of the model along with the program source code is given in Bouman et al. (2001) (also available at www.knowledgebank.irri.org/oryzabeta). The ORYZA2000 model assumes that the crop is well protected against diseases, pests, and weeds, and consequently the model does not consider yield reduction due to these factors.

In order to simulate the crop growth, the model requires inputs of management practices, soil properties, and weather data in addition to crop parameters. Except the crop parameters, all of these inputs can be directly obtained from field experiments. The required management practices are crop variety, spacing or plant population, transplanting depth, nursery duration, and fertilizer and irrigation application. Soil properties required are volumetric soil water content at saturation, field capacity (FC) and wilting point and corresponding soil water potential, depth of puddled soil, and saturated hydraulic conductivity of the soil. The weather data include the rainfall and temperature during the growing season. The crop parameters include phenological development parameters and many other parameters related to the process of crop growth, and most of them can be obtained from literature. However, the cultivar specific parameters such as development rates, partitioning factors, relative leaf growth rate, specific leaf area, and leaf death rate are to be calibrated using experimental data (Bouman et al. 2001).

The parameters of ORYZA2000 that are to be calibrated, specifically for the variety and environment under consideration, were identified from Bouman et al. (2001). A list of these parameters along with their recommended range of values is presented in Table 1. These parameters take different values at different development stages (DVS). In Table 1, the DVS is represented in terms of the fraction of heat units accumulated along the crop growth, where a DVS value of 2.0 represents maturity. The parameters RGRLMX and RGRLMN are the maximum and minimum relative leaf growth rates, calculated from the daily increase in temperature sum, which control the leaf area growth. SLATB is the specific leaf area, calculated as a function of DVS, which in turn is used to calculate the leaf growth during the linear phase. During the linear phase of leaf growth (up to DVS = 0.65), there is a fixed relation between leaf weight and LAI. LRSTR is the stem reserve available for growth after the respiration and growth BM losses. FLVTB, FSTTB, and FSOTB are the fractions of shoot BM partitioned to leaves, stems, and storage organs, respectively. Until panicle initiation (DVS = 0.65), BM is partitioned to leaves and stems only, as there is no storage organ component during this phase of crop growth. Between panicle initiation and flowering, all three parameters are active, so the BM is partitioned into all three components. After flowering, the total BM (TBM) produced by the plant is allocated to the storage organs only (Bouman et al. 2001). The relative death rate of leaves (DRLVT) is calculated from the weight of the green leaves as a function of DVS, which affects the LAI after flowering. Except for RGRLMX and RGRLMN, the parameters are highly nonlinear. Bouman et al. (2001) have recommended values for these parameters for two rice varieties, IR72 and IR64, and suggested that the values for any other rice variety would be very close to these recommended values but would nevertheless need to be estimated. As there were 18 model parameters to be estimated, we performed an SA in order to reduce the complexity of calibration and the uncertainty in the estimated parameter values. Sobol’s sensitivity method (Sobol 1993) was employed in the current study for this purpose, and the parameters were optimized using a Genetic Algorithm (GA).

Table 1 ORYZA2000 parameters that influence the crop growth and their recommended range

Sobol’s SA

Sobol’s method (Sobol 1993) is a variance based GSA method in which the total output variance produced by any model within an ensemble is decomposed into variance caused by each parameter of the model. The method is described below following Tang et al. (2007).

Consider a generic model described by:

$$ {\mathbf{y}} = f({\mathbf{x}}|{\varvec{\theta}}) $$
(1)

where f(.) is the function described by the model, y is the output from the model (crop yield in this study) corresponding to the inputs x and \( {\varvec{\theta}} \) is the vector of parameters of the model. Sobol’s variance decomposition is:

$$ D(y) = \sum\limits_{i} {D_{i} } + \sum\limits_{i < j} {D_{ij} } + \sum\limits_{i < j < k} {D_{ijk} } + \cdots + D_{12 \ldots m} $$
(2)

where D(y) is the total variance of the model output, \( D_{i} \) is the variance due to the ith parameter alone, \( D_{ij} \) is the variance induced by the interaction between the ith and jth parameters, and m is the total number of parameters. For this study, the primary interest was each parameter’s individual contribution (first order index) and total contribution (total order index) to the output. The first order and total order Sobol’s sensitivity indices are defined as:

$$ {\text{First}}\,{\text{order}}\,{\text{index: }}S_{i} = D_{i} /D(y) \, $$
(3)
$$ {\text{Total}}\,{\text{order}}\,{\text{index: }}S_{Ti} = \, 1 - (D_{\sim i} /D(y)) $$
(4)

where \( S_{i} \) is the sensitivity of the model output to the ith parameter alone, \( S_{Ti} \) is the total order sensitivity, that is, the sum of the independent and interactive effects of the ith parameter on the output, and \( D_{\sim i} \) is the average variance resulting from all the parameters except the ith.
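As a quick illustration of Eqs. 3 and 4, consider a hypothetical two-parameter model chosen only for this example (it is not ORYZA2000): \( y = x_{1} + 2x_{2} \), with \( x_{1} \) and \( x_{2} \) independent and uniformly distributed on [0, 1]. Because this model is additive, there is no interaction term and the variance decomposes as:

$$ D_{1} = \tfrac{1}{12},\quad D_{2} = \tfrac{4}{12},\quad D(y) = D_{1} + D_{2} = \tfrac{5}{12} $$

$$ S_{1} = \frac{D_{1}}{D(y)} = 0.2,\quad S_{2} = \frac{D_{2}}{D(y)} = 0.8 $$

$$ S_{T1} = 1 - \frac{D_{\sim 1}}{D(y)} = 1 - \frac{4/12}{5/12} = 0.2,\quad S_{T2} = 1 - \frac{1/12}{5/12} = 0.8 $$

Here the first order and total order indices coincide; for a model with interactions, \( S_{Ti} \) would exceed \( S_{i} \) by the share of variance in which the ith parameter acts jointly with others.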

The variance terms in Eqs. 3 and 4, namely D, \( D_{i} \), and \( D_{\sim i} \), are calculated by numerical integration in a Monte Carlo approximation framework (Sobol 1993, 2001; Tang et al. 2007). The total variance D is the statistical variance of the output across the simulations. The Monte Carlo approximations of the variance terms are:

$$ \hat{f}_{0} = \frac{1}{n}\sum\limits_{s = 1}^{n} {f(\theta_{s} )} $$
(5)
$$ \hat{D} = \frac{1}{n}\sum\limits_{s = 1}^{n} {f^{2} (\theta_{s} )} - \hat{f}_{0}^{2} $$
(6)
$$ \hat{D}_{i} = \frac{1}{n}\sum\limits_{s = 1}^{n} {f(\theta_{s}^{a} )\,f(\theta_{( - i)s}^{b} ,\theta_{is}^{a} )} - \hat{f}_{0}^{2} $$
(7)
$$ \hat{D}_{\sim i} = \frac{1}{n}\sum\limits_{s = 1}^{n} {f(\theta_{s}^{a} )\,f(\theta_{( - i)s}^{a} ,\theta_{is}^{b} )} - \hat{f}_{0}^{2} $$
(8)

where n is the Monte Carlo sample size, \( \theta_{s} \) is the sth sampled parameter set in the unit hypercube, and (a) and (b) are two different sets of samples. \( \theta_{s}^{a} \) denotes parameters taken from sample set (a), \( \theta_{is}^{b} \) denotes that the ith parameter is taken from sample set (b), and \( \theta_{( - i)s}^{a} \) denotes that all parameters except the ith are taken from sample set (a). Eqs. 5–8 provide a way to compute the first order and total order sensitivity of each parameter of the model.
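The estimators in Eqs. 5–8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: plain uniform random sampling stands in for the sampler, and `toy_model` is a hypothetical two-parameter function (not ORYZA2000) whose input variances of 1/12 and 4/12 give analytic indices of 0.2 and 0.8. The function and variable names are invented for this example.

```python
import random

def sobol_indices(model, k, n, seed=0):
    """Estimate first order and total order indices with Eqs. 5-8."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(n)]  # sample set (a)
    b = [[rng.random() for _ in range(k)] for _ in range(n)]  # sample set (b)
    fa = [model(row) for row in a]
    f0 = sum(fa) / n                            # Eq. 5
    d = sum(v * v for v in fa) / n - f0 ** 2    # Eq. 6
    first, total = [], []
    for i in range(k):
        # Eq. 7: every parameter from (b) except the ith, which comes from (a)
        mix_b = [rb[:i] + [ra[i]] + rb[i + 1:] for ra, rb in zip(a, b)]
        d_i = sum(x * model(r) for x, r in zip(fa, mix_b)) / n - f0 ** 2
        # Eq. 8: every parameter from (a) except the ith, which comes from (b)
        mix_a = [ra[:i] + [rb[i]] + ra[i + 1:] for ra, rb in zip(a, b)]
        d_not_i = sum(x * model(r) for x, r in zip(fa, mix_a)) / n - f0 ** 2
        first.append(d_i / d)            # Eq. 3
        total.append(1.0 - d_not_i / d)  # Eq. 4
    return first, total

def toy_model(theta):
    # Hypothetical additive model; subtracting 1.5 only centres the output
    # and leaves all variances unchanged.
    return theta[0] + 2.0 * theta[1] - 1.5

first, total = sobol_indices(toy_model, k=2, n=20000)
print("first order:", [round(s, 2) for s in first])
print("total order:", [round(s, 2) for s in total])
```

For an additive function such as this one, the two sets of estimates should agree up to Monte Carlo noise.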

Implementation of SA

Eqs. 5–8 are the original Monte Carlo approximation formulae for estimating the terms in the decomposition of the total variance. However, a robust computation strategy proposed by Liburne et al. (2006) for Sobol’s method is applied in this study for computing the variance terms D, \( D_{i} \), and \( D_{\sim i} \), which is as follows:

  • Choose a base sample dimension (2000 in this study) and generate the sample using a sampling technique (Latin Hypercube sampling in this study).

  • Split the sampled parameter set into two equal matrices (A and B) (see Fig. 1).

    Fig. 1 Illustration of parameter matrix formulation in Sobol’s method. The matrices A and B are the base matrices of sampled parameters. C and D are derived from A and B by swapping the column of the first parameter

  • Derive two matrices \( C_{i} \) and \( D_{i} \) by swapping the ith columns of A and B (see Fig. 1).

  • Perform the Monte Carlo simulation of the model using all the samples in all four matrices (A, B, C, and D), and compute the model performance index for each simulation.

  • Calculate sensitivity following Eqs. 9–11.
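The sampling and matrix-building steps listed above can be sketched as follows. This is a hedged illustration only: `latin_hypercube` is a basic stratified sampler written for this example, the split of the base sample into its first and second halves is an assumption, and the orientation of the swap (\( C_{i} \) as B with its ith column taken from A, \( D_{i} \) as A with its ith column taken from B) is chosen to agree with Eqs. 7 and 8 rather than taken from the study's own code.

```python
import random

def latin_hypercube(n, k, rng):
    """Return n points in the k-dimensional unit hypercube, one per stratum per axis."""
    columns = []
    for _ in range(k):
        # One uniform draw inside each of the n equal-width strata, then shuffle.
        col = [(s + rng.random()) / n for s in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[j][s] for j in range(k)] for s in range(n)]

def build_matrices(sample, i):
    """Split the base sample into A and B and derive C_i and D_i for parameter i."""
    half = len(sample) // 2
    A, B = sample[:half], sample[half:]
    # C_i: matrix B with its ith column replaced by the ith column of A.
    C_i = [rb[:i] + [ra[i]] + rb[i + 1:] for ra, rb in zip(A, B)]
    # D_i: matrix A with its ith column replaced by the ith column of B.
    D_i = [ra[:i] + [rb[i]] + ra[i + 1:] for ra, rb in zip(A, B)]
    return A, B, C_i, D_i

rng = random.Random(42)
base = latin_hypercube(n=2000, k=18, rng=rng)   # sizes quoted in the text
A, B, C_0, D_0 = build_matrices(base, i=0)
print(len(A), len(B), len(C_0), len(D_0))
```

The unit-hypercube values would still need to be rescaled to each parameter's recommended range (Table 1) before being passed to the crop model.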

According to Liburne et al. (2006), the first order and total order equations are:

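$$ S_{i} = \frac{{\hat{D}_{i} }}{{\hat{D}}} = \frac{{\frac{1}{n}Y_{(A)} \cdot Y_{(C_{i})} - f_{0}^{2} }}{{\frac{1}{n}Y_{(A)} \cdot Y_{(A)} - f_{0}^{2} }} $$
(9)
$$ S_{Ti} = 1 - \frac{{\hat{D}_{\sim i} }}{{\hat{D}}} = 1 - \frac{{\frac{1}{n}Y_{(A)} \cdot Y_{(D_{i})} - f_{0}^{2} }}{{\frac{1}{n}Y_{(A)} \cdot Y_{(A)} - f_{0}^{2} }} $$
(10)

where \( Y_{(A)} \), \( Y_{(C_{i})} \), and \( Y_{(D_{i})} \) are the vectors of model outputs obtained with the parameter matrices A, \( C_{i} \), and \( D_{i} \), respectively, \( \cdot \) denotes the dot product, n is the number of rows in each matrix, and the mean is

$$ f_{0} = \frac{1}{n}\sum\limits_{s = 1}^{n} {Y_{(A)}^{(s)} } $$
(11)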

Note that \( S_{i} \) and \( S_{Ti} \) can each be computed in eight different ways (Liburne et al. 2006) by interchanging the simulation matrices A, B, C, and D in Eqs. 9 and 10. The average of these eight estimates is taken as the representative sensitivity value for the parameter.

The total number of simulations required for this computation is ((k + 1) × n), where k is the number of parameters and n is the sample size. Consequently in the current study 38,000 model simulations are performed for the SA (2,000 samples each for 18 parameters along with 2,000 base simulations).
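The run count can be checked with a few lines of arithmetic. The breakdown below is one possible reading, assuming the base sample of 2,000 is split into matrices A and B of 1,000 rows each and that one \( C_{i} \) and one \( D_{i} \) matrix of the same size is simulated per parameter; the variable names are illustrative.

```python
k = 18               # number of ORYZA2000 parameters in the SA
base_sample = 2000   # base sample dimension
rows = base_sample // 2          # rows in each of A and B after the split

runs_base = 2 * rows             # simulations for A and B
runs_swapped = 2 * k * rows      # simulations for C_i and D_i, i = 1..k
total_runs = runs_base + runs_swapped

# Both ways of counting give the figure quoted in the text.
assert total_runs == (k + 1) * base_sample
print(total_runs)
```

Under this reading, the "2,000 samples for each parameter" correspond to 1,000 rows of \( C_{i} \) plus 1,000 rows of \( D_{i} \).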

Optimizer—GA

As discussed earlier, the calibration of the model is a highly complex, non-linear optimization problem. The objective of the optimizer in the current study is to identify the combination of model parameters for which the simulated crop yield most closely matches the measured crop yield. The major concern here is that the objective function, the error between the simulated and measured crop yield, is not a direct function of the decision variables (model parameters in this case). Therefore, despite the existence of a large number of traditional non-linear programming techniques for solving this kind of optimization problem, a search based optimizer is appropriate. In the current study, we employed GA (Holland 1975; Goldberg 1989; Michalewicz 1992) as the optimizer because of its advantages: it searches from a population of points rather than a single point, it uses the objective function value itself rather than derivatives, and it uses probabilistic rather than deterministic transition rules. GA has found a large number of applications in complex optimization problems in various branches of science and engineering (Kohler 1990; Bickel and Bickel 1990; Suckley 1991; Cook and Wolfe 1991). GA has also been used for parameter estimation of bio-physical models (Bulatewicz et al. 2009), and the results were encouraging.

Genetic Algorithm is a random search optimization algorithm inspired by biological evolution that provides a robust method for searching for the optimum solution to complex problems. In a GA, the solution set is represented by a population of strings, each of which comprises a number of blocks representing the individual decision variables of the problem. Strings are processed and combined according to their fitness (the objective function value evaluated using the components in the string) in order to generate new strings that contain the best features of two parent strings. Strings with the highest fitness have the greatest chance of contributing to future generations, similar to the process of natural selection. Initially, the GA suggests a set of candidate solutions to the problem, evaluates the fitness function that is to be optimized, and arrives at the optimal solution through genetic operations in subsequent generations. A detailed description of the GA is beyond the scope of this article, and readers are referred to Goldberg (1989) and Michalewicz (1992).
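The generational cycle described above (evaluate, select, recombine, mutate) can be sketched as follows. This is a minimal real-coded variant written for illustration, not the GA configuration used in the study: the tournament selection, uniform crossover, Gaussian mutation, elitism, and all numeric settings are assumptions, and fitness is treated as an error to be minimised, so the fittest string is the one with the lowest value.

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=40, generations=60,
                      crossover_rate=0.9, mutation_rate=0.1, seed=1):
    """Minimise fitness(params) inside the box given by bounds."""
    rng = random.Random(seed)

    def tournament(population, scores):
        # Binary tournament: the fitter (lower-error) of two random strings wins.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return population[i] if scores[i] <= scores[j] else population[j]

    # Initial population: random candidates drawn from the feasible region.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]

    for _ in range(generations):
        elite = min(range(pop_size), key=lambda idx: scores[idx])
        children = [list(population[elite])]  # elitism keeps the best string
        while len(children) < pop_size:
            p1 = tournament(population, scores)
            p2 = tournament(population, scores)
            if rng.random() < crossover_rate:
                # Uniform crossover: each block is copied from one of the parents.
                child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the feasible range.
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[g] = min(hi, max(lo, child[g] + step))
            children.append(child)
        population = children
        scores = [fitness(ind) for ind in population]

    best = min(range(pop_size), key=lambda idx: scores[idx])
    return population[best], scores[best]

# Toy check: recover a known point inside the unit cube.
target = [0.3, 0.3, 0.3]
best, score = genetic_algorithm(
    lambda p: sum((x - t) ** 2 for x, t in zip(p, target)), [(0.0, 1.0)] * 3)
print([round(x, 2) for x in best], score)
```

In a calibration setting, the lambda would be replaced by a function that runs the crop model with the candidate parameters and returns the error against the field measurements.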

Field experiments

The ORYZA2000 model was calibrated using data from field experiments conducted at Tamil Nadu Agricultural University, Coimbatore, India over 2 years (1999 and 2000). The experiments were laid out in a split plot design with three replications of three different water applications, using a medium duration rice variety. The experiments were continued over three consecutive seasons in the 2 years of study (June–October 1999 (kharif season); September 1999–February 2000 (rabi season); June–October 2000) (Luikham 2001). The water applications considered were (i) application of 5 cm irrigation water depth as soon as the standing water had disappeared, the no deficit condition (IR1); (ii) application of 5 cm irrigation water depth 1 day after the standing water had disappeared (IR2); and (iii) application of 5 cm irrigation water depth 3 days after the standing water had disappeared (IR3). Details of the experiments and crop period are presented in Table 2, which also includes information about rainfall during the crop growth period.

Table 2 Details of experiments on Rice (Oryza sativa)

Nutrients were supplied to the crop at the full recommended levels as per the Crop Production Manual for the area (TNAU 1994) in order to ensure that the crop would not suffer any nutrient deficiency during the experiment. The major properties of the soil in which the crop was grown are presented in Table 3. The soil information collected during the experiment comprised the fractions of sand, silt, and clay; textural class; organic matter (%); soil pH; electrical conductivity (dS m−1); volumetric water content at FC and at permanent wilting point (PWP); and infiltration rate. Using this information, soil properties such as saturated hydraulic conductivity, volumetric water content at saturation, and soil moisture tension at different moisture levels were determined using the pedo-transfer function proposed by Saxton and Rawls (2006). It may be noted that the computed values of moisture content at FC and PWP closely matched the measured values.

Table 3 Soil Properties of the experimental farm

During the experiments, the dates of sowing, emergence, transplanting, active tillering (AT), panicle initiation, flowering, and physiological maturity were recorded in each experimental plot. In order to determine the total crop BM and LAI at different stages of crop growth, crop samples were collected at AT, panicle initiation, flowering, and maturity. At the time of harvest, yield components were measured in terms of total crop yield, weight of 1,000 grains and the straw weight. During the period of experiment, the climatic parameters such as values of minimum and maximum temperature, minimum and maximum relative humidity, sunshine hours, wind speed, and rainfall on each day were recorded.

Auto-calibration of ORYZA2000

The procedure for auto-calibration of the model is presented in Fig. 2 in the form of a flow chart. Initially, the GA generates candidate values for the decision variables (parameters) from the feasible region. Using these values for the decision variables, the ORYZA2000 model simulates the crop growth and yield. The model is provided with the soil properties observed during the experiment. The simulated values are used in evaluating the fitness function, based on which the GA develops the next generation of candidates. The optimization of the fitness function is continued until the maximum number of generations is reached.

Fig. 2 Flow chart of the auto-calibration in simulation-optimization framework

The average daily percolation rate was fine tuned after the calibration of the model in order to represent the actual field conditions by considering the measured values of the water balance components in the experimental field. The contribution of water through capillary pores to the crop root zone was not considered, since the water table of the experimental field is significantly deep. The effective rainfall during the period of crop growth was computed using the procedure outlined by Bouman et al. (2001), in which any amount of rainfall above the field bund height is considered not to supplement the irrigation.

Results and discussions

Sensitivity of ORYZA2000 model parameters

Sobol’s SA suggested that nine of the 18 parameters of ORYZA2000 were sensitive for the cultivars considered in this study. Table 4 presents the ranking of parameters in order of sensitivity according to the value of Sobol’s first order index. The most sensitive parameter is RGRLMX, which influences the leaf growth of the crop. FLVTB, a parameter that governs the transfer of shoot BM to leaf growth in the model simulation, is found to be the next most sensitive at a growth stage corresponding to 0.50 DVS, which represents an active leaf growing stage where more leaf production occurs. As the death of leaves takes place after flowering, the parameter DRLVT (leaf death rate) at the grain filling stage and later (corresponding to 1.00 DVS and above) is also found to be sensitive. The parameters LRSTR and FSTTB (at the growth stage corresponding to 1.00 DVS, which is the grain filling stage) are found to be important in the crop simulation, as these parameters influence the transfer of BM from the stem to the grain during the filling process in the model simulation. It is noted that some of the parameters are not sensitive at all for these cultivars, as their sensitivity index was zero. The results of the SA suggest that calibration should mostly be focused on estimating the values of the first nine parameters, and therefore only these parameters are considered for auto-calibration. For the non-sensitive parameters, the values corresponding to cultivar IR72 were used, as suggested by Bouman et al. (2001).

Table 4 Sobol’s sensitivity indices for ORYZA2000 model parameters

Parameter estimation using auto-calibration procedure

The data from the experiments described above corresponding to the maximum production condition (no deficit condition, IR1) were used for estimation of the ORYZA2000 model parameters (9 sets of experiments). The remaining 18 sets of experimental data were used for assessing the performance of the model (validation). As mentioned earlier, 9 model parameters, viz. BM partitioning factors (4 parameters), stem reserve factor (1 parameter), relative leaf growth rate (1 parameter), and leaf death rate (3 parameters), are estimated during calibration of the model. The data measured during the field experiments, namely LAI and TBM at four different stages of growth and the crop yield at maturity, were used by the GA for computing the fitness function during the auto-calibration procedure.

The selection of the fitness function in any optimization problem is crucial in identifying appropriate values for the decision variables. In the current study, the decision variables were the parameters of the ORYZA2000 model. The model output is in terms of LAI and TBM at different stages of growth, and the total yield of the crop at the end of the growing season. Model parameters are usually estimated by solving a minimization problem in which the sum of squared errors between the simulated and measured values of the model output is minimized. It may be noted that the three outputs from the model differ in their units of expression and in the magnitude of their values. Therefore, a proper combining mechanism has to be selected so that the errors of all three outputs are minimized together. As suggested by Wallach et al. (2001), we used the following fitness function to be evaluated by the GA:

$$ \mathop \sum \limits_{j = 1}^{n} \mathop \sum \limits_{i = 1}^{m} \left( {\frac{{{\text{LAI}}_{i,j}^{\text{o}} - {\text{LAI}}_{i,j}^{\text{s}} }}{{{\text{LAI}}_{i,j}^{\text{o}} }}} \right)^{2} + \mathop \sum \limits_{j = 1}^{n} \mathop \sum \limits_{i = 1}^{m} \left( {\frac{{{\text{TBM}}_{i,j}^{\text{o}} - {\text{TBM}}_{i,j}^{\text{s}} }}{{{\text{TBM}}_{i,j}^{\text{o}} }}} \right)^{2} + \sum \limits_{j = 1}^{n} \left( {\frac{{{\text{Y}}_{j}^{\text{o}} - {\text{Y}}_{j}^{\text{s}} }}{{{\text{Y}}_{j}^{\text{o}} }}} \right)^{2} $$
(12)

where, LAI—leaf area index; TBM—total BM (kg ha−1); Y—yield (kg ha−1); with subscript variables ‘o’ and ‘s’ corresponding to the observed and simulated values, respectively; ‘i’ refers to the stage of crop growth, ‘j’ refers to the season number of the experimental data.
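Eq. 12 translates directly into code. The sketch below is an illustration with invented numbers (one season, two growth stages), not data from the experiments; the function and argument names are hypothetical. Dividing each error by the observed value makes the three terms dimensionless, which is what allows LAI, TBM, and yield to be summed.

```python
def relative_sq_error(observed, simulated):
    """Squared error relative to the observed value (one term of Eq. 12)."""
    return ((observed - simulated) / observed) ** 2

def fitness(lai_obs, lai_sim, tbm_obs, tbm_sim, y_obs, y_sim):
    """Eq. 12: LAI and TBM are indexed [season j][stage i]; yield by [season j]."""
    total = 0.0
    for j in range(len(y_obs)):
        for i in range(len(lai_obs[j])):
            total += relative_sq_error(lai_obs[j][i], lai_sim[j][i])
            total += relative_sq_error(tbm_obs[j][i], tbm_sim[j][i])
        total += relative_sq_error(y_obs[j], y_sim[j])
    return total

# Made-up values for one season and two stages, for illustration only.
value = fitness(
    lai_obs=[[2.0, 4.0]], lai_sim=[[1.0, 4.0]],
    tbm_obs=[[1000.0, 4000.0]], tbm_sim=[[1000.0, 5000.0]],
    y_obs=[5000.0], y_sim=[4500.0],
)
print(round(value, 4))  # 0.25 (LAI) + 0.0625 (TBM) + 0.01 (yield) = 0.3225
```

An observed value of zero would cause a division by zero, so measurements of that kind would need separate handling.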

The calibrated values of parameters are presented in Table 5. Note that the phenological development parameters were directly estimated from effective temperature and observed phenology (dates of transplanting, panicle initiation, flowering and maturity) following Bouman et al. (2001). It may be noted that the calibrated values of parameters are lying in between the recommended value of these parameters for rice varieties IR72 and IR64 (Bouman et al. 2001).

Table 5 Auto-calibrated values for ORYZA2000 model parameters

The effectiveness of the model simulation during calibration and validation is presented in Table 6 in terms of the performance indices. The performance index used in this study is NRMSE, which is the ratio of the root mean square error (RMSE) of the estimation to the observed average value of the variable. The ideal value of this index is zero, and any value close to zero indicates good model performance (small residuals in the model output). It can be observed from Table 6 that the ORYZA2000 model is able to simulate the total crop yield (Y) reasonably well, as the NRMSE during calibration as well as validation is close to zero. It may be noted that the validation RMSE reported in Table 6 is computed from a total of 18 experimental data sets. A low RMSE of 287 kg ha−1 during the validation of the model indicates that the parameters estimated by the auto-calibration framework are able to simulate the crop growth reasonably well in situations other than those for which the model was calibrated. It can also be observed from Table 6 that the estimated parameters of ORYZA2000 are effective in simulating the LAI and the TBM during the validation of the model, though the NRMSE is slightly higher than that for the yield.

$$ {\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {(x_{i}^{\text{o}} - x_{i}^{\text{s}} )^{2} } }}{n}} $$
(13)
$$ {\text{NRMSE}} = \frac{{\sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {(x_{i}^{\text{o}} - x_{i}^{\text{s}} )^{2} } } }}{{\frac{1}{n}\sum\nolimits_{i = 1}^{n} {x_{i}^{\text{o}} } }} $$
(14)

where \( x_{i}^{\text{o}} \) is the ith observed value of the variable, \( x_{i}^{\text{s}} \) is the corresponding simulated value, and n is the number of observations.
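The two performance measures in Eqs. 13 and 14 can be computed as in the short sketch below. The yield values are invented for illustration and are not taken from Table 6; the function names are hypothetical.

```python
import math

def rmse(observed, simulated):
    """Eq. 13: root mean square error between observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def nrmse(observed, simulated):
    """Eq. 14: RMSE divided by the mean of the observed values."""
    return rmse(observed, simulated) / (sum(observed) / len(observed))

# Made-up yields in kg/ha, for illustration only.
obs = [4000.0, 5000.0, 6000.0]
sim = [4300.0, 4700.0, 6300.0]
print(rmse(obs, sim))   # every error is 300 kg/ha, so the RMSE is 300
print(nrmse(obs, sim))  # 300 / 5000 = 0.06
```

Because NRMSE is dimensionless, it allows the fit for LAI, TBM, and yield to be compared on a common scale, which RMSE alone does not.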

Table 6 Performance measures of simulations of ORYZA2000 using auto-calibrated values of parameters (for full season)

Scatter plots comparing the observed and simulated values of LAI, TBM and total yield are presented in Fig. 3a–f for visual examination. It may be noted that the data points show little scatter and lie very close to the ideal 45° line. Nonetheless, the LAI (Fig. 3b) and BM (Fig. 3d) show higher scatter in validation than the others. This can plausibly be attributed to simulation errors at various stages of crop growth, which can be better understood from Table 7, which presents the performance measures of the model at various growth stages.

Fig. 3 Scatter plot of simulated and measured values of LAI (a calibration, b validation), BM (c calibration, d validation), and yield (e calibration, f validation)

Table 7 Performance measures of simulations of ORYZA2000 using auto-calibrated values of parameters (stage wise)

Note from Table 7 that the RMSE values of LAI at flowering and maturity in the validation data are close to each other (1.355 and 1.384, respectively), indicating that the model simulates LAI similarly at these two stages. This result may plausibly be due to the model assumption that the entire amount of BM is allocated to grain filling after flowering, which may not actually be the case in the field. Therefore, the simulated LAI deviates considerably from the measured values, which contributes to the scatter in Fig. 3b.

It can also be noted from Table 7 that the NRMSE for BM simulation at the AT stage (1.85) is much higher than at the other growth stages (all of which are less than unity). It should be noted that the AT of rice crops takes place after transplantation, and the transplanted crop in the actual field may take some time to stabilize and establish growth (transplantation shock). Since the model assumes continuous growth of the crop even after transplantation, the transplantation shock may not be appropriately simulated by the model. This difference in growth rate (between actual and simulated) may be the reason for the high NRMSE value for BM at the AT stage. It may also create a bias in the simulation of BM over the course of crop growth, which may explain the difference between simulated and measured BM at different stages of growth (Fig. 3d).

From Fig. 3e, f, it can be observed that the simulated yield is slightly overestimated compared to the yield measured during the experiments. This can plausibly be attributed to the time taken for flowering to occur in the actual field. It may take a few weeks for all the plants to complete flowering, while the model assumes that the flowering of all plants takes place simultaneously. Since the model uses the BM entirely for grain filling after flowering, it may output a higher yield.

Figure 4 depicts the daily simulation of LAI and TBM over the crop growth period for a typical experimental setup used for validation of the model. The observed values of these variables at different stages of growth are also presented in Fig. 4 for visual comparison. It is noted from Fig. 4 that the pattern of LAI and BM accumulation over the crop growth period is very well represented by the model, as is evidenced by the close agreement between the simulated and observed values of these variables at different growth stages.

Fig. 4 Simulated and measured LAI and BM along the crop growing period (Kharif 1999, Irrigation Treatment IR2)

It is worth mentioning that the yield, LAI and BM simulated using the default values of the parameters (those recommended by Bouman et al. (2001)) for the IR72 and IR64 varieties of rice showed large deviations from those simulated by the model using the estimated parameters, even though the estimated values are close to the default values. The yield simulated by the model when the IR72 parameters are used is 8,103.8 kg ha−1, and that corresponding to the IR64 parameters is 5,703.9 kg ha−1, against a measured yield of 8,660 kg ha−1. Note that the calibrated model simulated a yield of 8,910 kg ha−1, which is sufficiently close to the measured value. Therefore, it can be suggested that calibration of the model parameters is essential when a different variety or cultivar of rice is used in the field.
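The yield figures quoted above can be put on a common footing by expressing each simulated yield as a percentage deviation from the measured yield. The short Python sketch below does this arithmetic using only the four values reported in the text. The helper name `relative_deviation_pct` and the dictionary labels are illustrative choices, not part of the study.

```python
# Measured yield reported in the text, in kg/ha.
measured_yield = 8660.0

# Simulated yields reported in the text, in kg/ha.
simulated_yields = {
    "IR72 default parameters": 8103.8,
    "IR64 default parameters": 5703.9,
    "calibrated parameters": 8910.0,
}


def relative_deviation_pct(simulated, measured):
    """Signed deviation of a simulated value from the measured one, in percent."""
    return 100.0 * (simulated - measured) / measured


deviations = {
    label: relative_deviation_pct(value, measured_yield)
    for label, value in simulated_yields.items()
}

for label, deviation in deviations.items():
    print(f"{label}: {deviation:+.1f}%")
```

With these inputs, the deviations are about −6.4% for the IR72 defaults, −34.1% for the IR64 defaults, and +2.9% for the calibrated parameters.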

From the foregoing discussion, it is evident that the parameters of the ORYZA2000 model can be effectively estimated for any rice variety using field experimental data in a simulation-optimization framework. The estimated parameters generalize well, and can therefore be used in developing appropriate irrigation schedules for the crop.

Summary and conclusions

In the current study, an auto-calibration framework is proposed to estimate the optimal parameter values of the ORYZA2000 model, a rice crop growth simulation model, using a GA. The crop growth simulation model is integrated within the GA, and field experimental data have been employed to calibrate the model using the simulation-optimization framework. Prior to the calibration of the model, an SA was performed to prune the number of parameters to be calibrated.
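The general shape of such a simulation-optimization loop can be sketched in Python as follows. This is only an illustration under strong assumptions: `toy_model` is a hypothetical two-parameter logistic growth curve standing in for the crop model (it is not ORYZA2000), the "observations" are synthetic, and the GA operators and settings (tournament selection, arithmetic crossover, Gaussian mutation, elitism, population size, bounds) are generic choices that are not claimed to match those used in the study.

```python
import math
import random

DAYS = [20, 40, 60, 80, 100]                 # hypothetical sampling days
BOUNDS = [(5000.0, 15000.0), (0.01, 0.30)]   # hypothetical parameter ranges


def toy_model(params, day):
    """Stand-in simulation model: logistic biomass growth (not ORYZA2000)."""
    max_biomass, rate = params
    return max_biomass / (1.0 + 50.0 * math.exp(-rate * day))


# Synthetic "observations" generated from known parameters, for illustration.
TRUE_PARAMS = (10000.0, 0.10)
OBSERVED = [toy_model(TRUE_PARAMS, day) for day in DAYS]


def objective(params):
    """RMSE between synthetic observations and the model run with `params`."""
    errors = [(o - toy_model(params, d)) ** 2 for o, d in zip(OBSERVED, DAYS)]
    return math.sqrt(sum(errors) / len(errors))


def calibrate(generations=60, pop_size=40, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best objective value found in each generation
    for _ in range(generations):
        pop.sort(key=objective)
        history.append(objective(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = min(rng.sample(pop, 3), key=objective)
            parent_b = min(rng.sample(pop, 3), key=objective)
            child = []
            for (lo, hi), a, b in zip(BOUNDS, parent_a, parent_b):
                weight = rng.random()
                gene = weight * a + (1.0 - weight) * b  # arithmetic crossover
                if rng.random() < 0.2:                  # Gaussian mutation
                    gene += rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(max(lo, min(hi, gene)))    # clip to bounds
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=objective)
    history.append(objective(best))
    return best, history


best_params, history = calibrate()
print("best parameters:", best_params)
print("initial best RMSE:", history[0], "final best RMSE:", history[-1])
```

Each objective evaluation corresponds to one run of the simulation model, so in a real application the cost of the calibration is dominated by the number of model runs. This is one reason for reducing the number of calibrated parameters beforehand.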

The results of the study indicate that the calibrated ORYZA2000 model was able to effectively simulate crop growth under full irrigation and water deficit conditions. During the validation of the model, the simulated yield was observed to closely match the measured yield under different experimental treatments. The model was also found to simulate BM production well at various stages of crop growth. It is noted that while the model has 18 parameters, the final yield is not very sensitive to some of them, and these can assume the values recommended by Bouman et al. (2001). The results also suggest that calibration of the model is necessary, as the yield, LAI and BM simulated with the calibrated parameters deviate considerably from those simulated using the default values of the parameters. Overall, the results of the study are highly encouraging, and the calibrated model could be used for developing optimal irrigation schedules for rice under various levels of water deficit.