Keywords

4.1 Introduction

CGMs simulate crop yield as a function of environment, crop genotype, and crop management factors mostly at a daily time step. Crop yield and production-related information are very crucial in determining various agronomic as well as socio-economic policy decisions that could affect the livelihood of a large section of the human population (Kasampalis et al. 2018). These crop models also called crop yield models or agriculture system models are just simplified depictions of the real world (Van Ittersum and Donatelli 2003) represented by a set of mathematical equations used to model the processes of the system (Oteng-Darko et al. 2013). Explanatory process-based CGSMs comprise quantitative descriptions of the processes that control the behaviour of a system (Penning de Vries et al. 1989; Dadhwal 2003) and simulate the diurnal effects of environmental factors on the crop growth and development processes. The process-based crop growth model simulates crop growth processes at a daily timescale starting with the sowing of the crop to the final crop harvestable maturity along with the quantitative information about the crop growth and development at each time step. The equation used in the crop model mathematically represents the elementary process of the “soil-plant-atmosphere” system. Three main modules of the crop model could, therefore, be identified. The soil module describes the processes of water and nutrient transport within the soil profile. The mathematical equations for processes like infiltration, drainage, redistribution, and nutrient transport particularly of nitrate are included in this module (Brisson et al. 1998). The plant module controls two mechanisms: (i) crop growth (biomass production-based on interception and assimilation of photosynthetically active solar radiation and finally limited by senescence) and (ii) crop development that simulates the phenology and drives growth by regulating source and sink (Brisson et al. 1998).

These process-based models are very much input data-intensive, hence posing a challenge to create an input database with reasonable accuracy at a desirable scale to run on an operational basis (Wallach et al. 2001). Hence, the application of these models on a regional scale involves lots of assumptions and uncertainties. RS data could be used that provide spatial information on weather, soil, crop type, crop phenology, and crop condition information on a regional scale. This information is assimilated into to crop growth model after proper validation, thus reducing the uncertainty (Dorigo et al. 2007; Jin et al. 2018). The latest developments in geospatial technology such as mobile phone-based position services and geographical information system (GIS) can also be linked to crop growth models to optimize farm management practices, crop stress, and extreme weather events. These models are also used to test the adaptability of new crop variety under different locations, develop breeding strategies, and predict in-season yield. Boote et al. (1996) carried out a detailed review to find out the potential applications and limitations of CGMs and suggested that a particular model is applicable for a given situation and the same can be replicated to alternate environmental setup with proper calibration. The model prediction accuracy is highly dependent on the limitation of input data which in turn restricts its scalability (Clevers et al. 2002). Nevertheless, CGMs can simulate the impact of economic decisions in terms of crop management factors and weather effects (Batchelor et al. 2002) and thus enable informed decisions making. To summarize, the major limitations in the application of CGMs at a regional scale are the generation of necessary high-quality and accurate input data (Bhatia 2014) which may be cost-intensive and time-consuming.

4.1.1 Crop Growth Models: Scope and Development

Agricultural system modelling started long back in the late 1950s and specific crop modelling activities started a decade later. Since the 1960s, a new era in agriculture sciences started with the modelling of photosynthetic rates of crop canopies leading to the development of Elementary CRop growth Simulator (ELCROS) and BAsic CRop growth Simulator (BACROS) by de Wit (1965). In the 1970s, crop models received significant attention (Pinter et al. 2003) with the first attempt to combine the surveillance capacities of RS data with the predictive feature of crop models under the Large Area Crop Inventory Experiment (LACIE) funded by National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). Such experiments were carried out to estimate wheat production by combining RS data and crop models. They have developed a method of estimating worldwide wheat production using LANDSAT data (Erickson 1984).In the 1980s, the United States Department of Agriculture (USDA) had developed a model for the tropical environment after a thorough understanding of the system and its components (Jones et al. 2003; Roubtsova 2014) under International Benchmark Sites Network for Agrotechnology Transfer (IBSNAT) program. This leads to the development of a Decision Support System for Agrotechnology Transfer (DSSAT) (Johnson et al. 2003). A schematic representation of the development of these process-based CGSMs over time is shown in Fig. 4.1. Process-based CGMs are descriptive and dynamic, simulating the process of crop growth and development through time in a phased manner using different sets of equations. These models are equipped with modular functions defining various physiological and soil processes as a function of driving variables like weather and crop management at each time step (Wallach et al. 2014). Such models are more input data-intensive than empirical or statistical models (Di Paola et al. 2016). Empirical or statistical models describe the crop growth and yield response over sites where historical data are available and mostly fail over different environmental conditions (Jones et al. 2017). This is the most important limitation of empirical models for studying the implication of climate change scenarios on crop growth and development. For example, location-specific crop management practices may evolve to increase the crop adaptive capacity for the future climate change scenarios, but these management practices were not considered in the limited data used to develop a particular empirical model (Jones et al. 2017). In this context, process-based CGMs are used as a potential tool not only for analysing the impact of climate change but also for selecting best management practices by optimizing all input resources to increase crop adaptability.

Fig. 4.1
figure 1

Crop model developments over time. (Modified, Jin et al. 2018)

4.1.2 Minimum Data Requirement and Applications

Agriculture system operates at several spatial scales such as field, farm, regional, or global along with diverse temporal scales like hours, days, seasonal, and annual (Ewert et al. 2011). Many crop models require a large number of input data. For example, World Food Studies (WOFOST) used by the European Union Joint Resources Centre Monitoring Agricultural Resources (MARS) Unit Mission requires data for about 40 input parameters. However, the Food and Agriculture Organization (FAO) has developed the AquaCrop model that requires comparatively a smaller number of parameters (Mkhabela and Bullock 2012). The information related to weather, soil, crop management, phasic development of the crop, growing degree days, etc. are required for all crop growth model (Monteith and Moss 1977). According to Nix (1983), these input datasets may range in timescale between hours, daily, or weekly basis. For the first time, Hunt and Boote (1998) defined a minimum input data needed for operating crop growth model. The input required for a crop growth model can be categorized into a minimum and optimum/desirable for specific applications along with initial parameters (Ritchie and Alagarswamy 2002) as presented in Table 4.1. Weather is a driving force for CGSMs; hence, it needs maximum and minimum temperature, rainfall, solar radiation, etc. on a daily or weekly timescale depending on the defined time step of the model in a prescribed format. Several curve-fitting approaches like extrapolation function and interpolation are being followed over weather data for filling spatial and temporal data gaps in the model operation. Similarly, most of the commonly used models take layer-wise soil input data which includes various soil physico-chemical properties like bulk density (BD), soil texture, soil organic carbon (SOC), water content at field capacity and wilting point, initial nitrogen (N) content, and pH. Feeding the crop model with the crop management information is the most challenging task particularly when a simulation is carried out on a regional scale. The details of these data requirements are discussed in the subsequent section along with the scale and resolution issues involved in spatializing the CGSMs.

Table 4.1 Minimum data requirement for a crop growth simulation model

The economic outputs like grain yield, fruit yield, and biomass can be predicted through crop models (Murthy 2003). The management applications of crop simulation models can be categorized as follows:

Strategic application: To run the model before planting.

Practical application: To run the model before and during crop growth.

Forecasting application: To predict yield both before and during crop growth.

Besides the impact of climate change on crop growth and yield, crop vulnerability and adaptability analysis can be assessed through crop growth simulation modelling (Rosenzweig et al. 2014). These models are also useful to analyse the difference between actual and attainable crop yield, i.e. yield gap analysis (Lobell et al. 2009; van Ittersum et al. 2013). A list of popular CGSMs used worldwide along with their special uses is presented in Table 4.2.

Table 4.2 List of most popular crop growth simulation models, their usage and source

However, most of these models are not able to generate integrated information on global climate change (greenhouse gas emission) and its impact on soil C and N dynamics. They are strong either in environmental impact assessment or in crop growth and soil component. Most of these models are not able to simulate the yield loss due to weed, pest, and disease infestation and damage due to extreme weather events like a hail storm and high wind.

4.2 Spatialization of Crop Growth Simulation Models

CGMs assume the simulated unit to be homogeneous for soil type, weather variables, crop management, irrigation, fertilization, variety, sowing, etc. The “extent” of CGMs is the entire region of interest, which could be a watershed or a big command area or an administrative boundary like district or state. This “extent” may consist of a finite number of smaller homogeneous areas called “support unit” or “unit of simulation”. In reality, the region is often characterized by significant spatial variability, which is difficult to account fully. The application of CGMs on a larger area, than that for which it has been designed, is called the “spatialization” of the CGM. This is the application of crop models on a regional scale with inherent large heterogeneities in the soil, weather, and crop management factors between the units of simulation. Thus, spatialization leads to an analysis of the use of CGMs on units or scales outside the defined domain of validity of the hypotheses and the dedicated scale of the original model. A crop growth model is characterized by a spatial and temporal scale. But this review is confined to the spatial aspects since the focus of this chapter is on the spatialization of the crop growth model. The change of scale here pertains to the transition from a smaller unit of simulation to a bigger region. According to Robert et al., change in the scale of the model involves alteration of the scale of input-output data, validation, and the framework structure of the model. To put it in the proper perspective, the water flow model in hydrodynamics is governed by Navier-Stokes equations at a finer scale of soil pores. It is further generalized by Darcy’s law at the scale of a soil column. The CGMs further upscale it with generalized equations of the flow of water at a plot size of 1 m2 or even under the controlled laboratory condition. In practice, the model parameters are fixed by calibrating the model by experimentation on the scale of an agriculture field. The models can be upscaled by (i) collecting input data for each unit of simulation under the region of interest, (ii) considering the interaction between these simulation units, and (iii) evaluating the performance of the output. All these three levels are linked with the availability of spatial data, scale changes, and associated issues. All these aspects of spatialization are discussed in detail in the following sections.

4.2.1 Issues and Methods Involved in Spatialization

The spatialization of a CGM involves space-time variation of the soil-plant-atmosphere system. It could be addressed in two ways: firstly, by characterizing the environmental variables like soil and weather and their interaction with the biological system, and, secondly, by taking care of the diverse human-induced crop management factors. The environmental data required for running a crop growth model includes the weather variables like maximum and minimum temperatures, precipitation, solar radiation, humidity, and wind speed, along with soil physico-chemical properties. In reality, these data are not available everywhere at a desirable scale; hence, they are usually measured or estimated for a given spatial unit (meteorological station and soil profile) for a limited number of locations within the region of interest. To run crop models on a regional scale, it is, therefore, necessary to estimate these parameters at the required scale for each unit of simulation. This involves a spatial approximation, broadly categorized into three groups of approaches. The first approach includes traditional choropleth mapping without taking random components into consideration. Classical soil mapping techniques (Legros 1996) comes under this approach. Thiessen polygons, trend analysis, or arbitrary weighted averaging of data also belong to this traditional mapping category (Laslett et al. 1987). The second category is based on statistical modelling considering the spatial variability, also termed as geostatistical techniques (Webster and Oliver 1990; Goovaerts 1997). The most popular geostatistical method is kriging or several modified forms of kriging to deal with different types of point-based, continuous, and categorical variables, using normal, log-normal, or other probability density functions. The geostatistical approach involves spatial interpolation to estimate the missing values at a point in space based on known values at neighbouring points. There are different types of spatial interpolation such as gridding, area averaging, and the estimation of missing data from neighbouring stations. These methods vary in their complexity, constraints on inputs, and computation procedures. Various methods like kriging and co-kriging, inverse distance weighting (IDW), and thin-plate splines, etc. are popular in regional soil, weather, and crop analysis (Phillips et al. 1992; Hudson and Wackernagel 1994). IDW and simple kriging are pure geostatistical techniques of spatial interpolation whereas co-kriging takes advantage of additional knowledge obtained from external variables. Geostatistical approaches are most suitable for variables exhibiting stationarity and continuous spatial variations (Voltz and Webster 1990). Hence, it performs better for soil and climatic variables such as mapping of rainfall, temperature (Voltz and Webster 1990), soil texture, and soil pH (Creutin and Obled 1982; Van Meirvenne et al. 1994). The third category of methods is known as mesoscale modelling where the physics of the phenomenon is used to model its spatial behaviour (Takle 1995). Here, the spatial estimation of a variable is made based on the simulation of the processes that control the variable. For example, the prediction of the actual spatial variability of soil physico-chemical properties can be done based on the simulation of soil formation on a landscape scale (Minasny and McBratney 2001). But this kind of process-based approach is being developed to be augmented with the crop models on an operational basis.

As discussed above, the first approach is adopted by the EUMARS project, where the model is made to run on gridded input data as reported by Dallemand and Vossen (1995) and Rijks et al. (1998). An alternative approach was implemented by FAO (Gommes et al. 1998) where the model was made to run on actual available data and the yield output is subsequently interpolated using external variables like normalized difference vegetation index (NDVI) to guide the interpolation. Similarly, satellite enhanced data interpolation (SEDI) method is used for assisted interpolation taking advantage of the correlation between the variables to be interpolated and the environmental drivers (such as crop yield and NDVI/biomass). Hoefsloot (1996) described this concept of interpolation along with the technique of software implementation. This technique is applicable to any parameter of interest having spatial correlation and well distribution over the desirable extent of interpolation.

When the measured values of soil properties at each field are available, one generally relies on soil surveys in order to proceed for the spatialization of the crop growth model. These soil surveys provide information about the intrinsic spatial variability of soil physico-chemical properties. Voltz and Webster (1990) found that when soil properties vary abruptly classification is a better approach than standard kriging method. Thus, soil maps available on different scales can serve as a base for obtaining soil properties at the simulation unit. But in general, CGMs don’t take soil types or soil textural classes as direct input. It requires quantitative measurements of properties like soil depth, percentage of sand silt and clay, bulk density, water-holding capacity, and hydraulic conductivity. Hence, quantitative maps of soil properties need to be created from the existing soil map keeping the scale factor in mind (Leenhardt et al. 1994). Soil maps at finer scales (i.e. 1:10000 or 1:25000) would provide better spatial variability of intrinsic soil properties than the coarser soil maps of 1:100000. In this context, pedo-transfer functions (PTFs) have been developed to derive soil properties difficult to measure from the widely available basic soil properties (Bouma 1989). For example, soil hydraulic property is usually derived from soil textural information using PTFs. Several large soil database such as World Inventory of Soil Emission Potential (WISE) (Batjes 1996), USDA Natural Resources Conservation Service (NRCS) pedon database (NRCS, USDA 1994), UNSODA (Leij et al. 1996, 1997), and Hydraulic Properties of European Soils (HYPRES) (Lilly et al. 1999) have been widely used for the development of PTFs. In India, the National Bureau of Soil Survey and Land Use Planning (NBSSLUP) soil maps available at 250k and 50k are used in PTFs to generate quantitative soil properties and assimilate them to simulate CGMs at a regional scale.

For the spatialization of crop model, two main approaches are used to generate weather inputs, namely, zoning and interpolation (Leenhardt et al. 2006). In the zoning approach, the weather data of a meteorological station available in a specified zone is considered as the representative weather for the entire zone. Another alternative approach is the interpolation of the point weather data using nearest neighbour, arithmetic mean, optimal interpolation, spline function, kriging, etc. to generate spatial weather layers (Creutin and Obled 1982). The lack of observed daily weather data at the required scale is the most challenging constraint to simulate the effects of weather on crop growth and yields (Van Wart et al. 2013, 2015; Grassini et al. 2015). Currently, gridded weather data (GWD) are generated at a regional and global scale on an operational basis and regularly used in CGMs for decision supports (Miner et al. 2013; Mourtzinis et al. 2016). GWD is usually generated from satellite-derived weather information and/or interpolation of weather data from available meteorological stations using stringent empirical algorithms at defined spatial and temporal resolution. Influences of these gridded weather data on the simulation of CGMs are studied by various authors (Angulo et al. 2013; Zhao et al. 2015; Rezaei et al. 2015). These studies are mostly based on GWD at a very coarse spatial resolution like 50–100 km such as NASA-POWER (http:// power.larc.nasa.gov/), NCEP (National Centre for Climate Prediction http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html), and CRU (Climate Research Unit; http://badc.nerc.ac.uk/data/cru/). However, there is a lack of robust assessment of most recently developed GWD with a higher spatial resolution (<20 km2), with respect to their potential usage particularly in the application of crop growth simulation models. The Prediction of Worldwide Energy Resources dataset from NASA (NASA-POWER) has been widely used as weather inputs in various crop models throughout the world. This is a weather database having daily weather attributes including solar radiation and maximum and minimum temperature, for 100 × 100 km raster of the entire globe starting from 1983 to date. These data are derived from satellite observations coupled with Goddard Earth Observing System climate model to obtain complete terrestrial coverage on a daily timescale. The quality evaluation of this NASA-POWER as input to the CGSMs has been carried out with mixed results (White et al. 2008; Bai et al. 2010, Biswal et al. 2014).

A major issue of mapping soil and weather input data is the problem of change in scale. Most often, the measurement units of weather data are smaller than the simulation units; hence, the problem lies in upscaling of the measured or mapped input data. This requires a thorough understanding and knowledge of the variable across space and its aggregation over the simulation unit. In some cases, the change of scale could also be the other way round and requires downscaling of the weather data to suit the simulation unit. For example, Priya and Shibasaki (2001) estimated the required local information from meteorological stations of a national network and digital terrain model with a large scale using a purely statistical approach. In the context of spatial input data generation for CGMs, limited analysis has been done on the sources of error and their propagation. Crosetto et al. (2000) and Tarantola et al. (2000) presented a comprehensive approach to analysing uncertainty and sensitivity through GIS-based models for accurate and precise results, but this study is generic without specific application to CGMs. Poor-quality input data resulting from measurement errors or poor spatial aggregation or disaggregation are often the main source of errors in CGMs. Geographic information system (GIS) along with spatial data analysis plays an important role in integrating the crop model output into a larger geographic area (Delécolle et al. 1992; Ewert et al. 2011). Several researchers have demonstrated the linkage between crop model, GIS, and RS technology for regional crop forecasting (de Wit et al. 2010; Ma et al. 2013); precision agriculture (Seelan et al. 2003); yield gap analysis (Sibley et al. 2014); agro-ecological zoning (Ismail 2012); and crop suitability assessments (Mustafa et al. 2011; Mustak et al. 2015). Leitão et al. (2018) reported that broad-scale RS facilitates cost-efficient fast and periodic monitoring of the ecosystem in a larger area but is less useful for local scale applications (Leitao et al. 2018). With the advancement in RS technology, particularly the development of multispectral and hyperspectral sensors, RS has been proving its potential for upscaling vegetation parameters. But the trade-off among various sensor resolutions, viz. spatial, temporal, spectral, and radiometric, is the major limitation in the application of this technology. Further, it is a nearly impossible task to measure plant parameters in situ across the simulation unit, but we can always follow the empirical or physically based approach for biophysical parameter retrieval using RS technique. Upscaling in environmental research involves a combination of data generated at different temporal and spatial scales (Finke et al. 2002). It is important to consider the “observation scale” (spatial and temporal resolution of the measured data), “modelling scale” (temporal and spatial scale at which model operates), “operational scale” (scale time and space where the process operates), and finally the “geographic scale” (area of interest or target area of the research) in the upscaling protocol (Wu and Li 2009). Understanding the complexity of scale is essential as the mechanism of a model can be different at different scales. Models optimized for a particular scale become ill-performing at another scale without re-optimization (Wu and Li 2009). Heterogeneity is the intrinsic characteristic of the earth’s surface which is a mosaic of different patches of vegetation type, soil, and land use. Even when a landscape looks homogeneous at a particular scale, the possibility of having inhomogeneity increases with the increase in the spatial resolution (Wu et al. 2000). Hence, in the RS domain, heterogeneity is a relative concept which is highly linked with the sensor spatial resolution (Li et al. 2014). Besides heterogeneity, another issue is the “linearity” or “non-linearity” involved in the scaling between the RS measurement and the biophysical parameter of interest (Wu and Li 2009). Besides the issues of spatial resolution, the issues of temporal resolution of the sensors also need to be addressed. Li et al. (2014) demonstrated that in agroecosystem studies, some parameters could be retrieved from sensors with high spatial and low temporal resolution, whereas others need lower spatial and higher temporal resolution. The scale of the system to be modelled always depends on the objective of the study, and thus, identification of suitable scale for monitoring a particular system is the most important factor (Chemin and Alexandridis 2006; Alexandridis et al. 2008), along with different issues of aggregation and disaggregation of RS-derived information with minimal uncertainty (Alexandridis et al. 2010). RS satellites with higher spatial resolution such as SPOT, IKONOS, and Quickbird have a lower temporal resolution and narrow swath. On the other hand, satellites with a lower spatial resolution (coarser than 300 m) such as Terra/Aqua MODIS, NOAA, and AVHRR have daily global coverage. For crop modelling studies, these coarse-resolution satellites are preferred as they can generate a time series of information during the crop season. It should be noted that crop models are not expected to provide spatial information per se; rather, they require spatial information to operate. Thus, the combined use of RS data with crop models provides significant advantages by generating the “missing spatial information” expanding their coverage in two-dimensional spaces (Launay and Guerif 2005). This spatial information is crucial for the varied application of crop models starting from precision farming to regional yield prediction (Azzari et al. 2017). Furthermore, in-season monitoring of crops and providing preharvest yield estimation at various spatial and temporal scales are important for decision-making in trading, logistics, and insurance. This aspect of spatial data generation for running CGSMs is discussed in detail under Sect. 4.3 of this chapter.

4.2.2 Establishment of Spatial Framework

GIS tool enables the point-based crop model to simulate regional crop growth development and yield. Depending on the types of linkages of the model to GIS, three types of interfaces are identified: (i) linking, (ii) combining, and (iii) integrating (Hartkamp et al. 1999). The selection of interface is highly dependent on the factors like the objective of the research and expertise of the user along with the available computational framework. Simple linkage strategies employ GIS for spatially displaying the model outputs using interpolation techniques. In this approach, communication between GIS and the model takes place through unique identifiers of grid cells or polygons existing in input-output files and transferring the data back and forth in ASCII or binary format (Dadhwal 2003). The linkage of WOFOST model to ARC/INFO illustrates this aspect well (van Laanen et al. 1992) though in this approach, the full potential of GIS is not exploited. In combining strategy, the model is configured with interactive tools of GIS enabling automatic data exchange along with displaying model results (Burrough 1996). This approach usually involves complex programming and data management than the “linkage” approach. An illustration of this technique is presented by Engel et al. (1997) in the Agricultural and Environmental GIS (AEGIS) with ArcView. Integration is still more complex than the above-mentioned approaches and involves the incorporation of one system into the other. The application of GIS interfacing in modelling had been initiated in the mid-1980s particularly in the field of hydrological modelling. The GIS-enabled applications of crop models have been demonstrated by various researchers worldwide, such as regional/global crop yield calculation and productivity analysis (Calixte et al. 1992), precision farming, climate change, and agro-ecological zonation (Aggarwal 1993). Lal et al. (1993) had carried out regional productivity analysis using DSSAT-BEANGROW and AEGIS. Han et al. (1995) studied potato yield and N leaching distribution for site-specific crop management by developing an interface between PC ARC/INFO GIS and SIMPOTATO simulation model at South Central Washington state. Aggarwal (1998) suggested a land-use option for Haryana state in India by integrated simulation modeling, expert knowledge, and GIS optimization techniques. In this study, agro-ecological land units were delineated by overlapping maps for soil attributes and climatic normal rainfall in GIS-IDRISI. Similarly, Sehgal (2000) developed a protocol of near-real-time crop monitoring system for crop condition assessment and yield forecasting for the Haryana region of India linking WTGROWS with GIS assimilating RS-derived biophysical parameters. Ines et al. (2002) studied water use efficiency of rice, maize, and groundnut at basin scale for the Laoag River basin in the Philippines using a GIS-enabled crop growth model. Junguo Liu (2009) presented “GEPIC”, a GIS-based model to estimate crop water productivity regionally with a spatial resolution of 30 arc-minutes. Cedrez and Hijmans (2018) computed the potential yield (Yp) of crops for the entire world using WOFOST and LINTUL model.

4.3 Remote-Sensing-Based Retrieval of Crop Biophysical Parameters

Process-based CGSMs can incorporate physiological as well as biological knowledge of plants and are also capable to model the interaction between plants and their environment. In these models, vegetation state variables, such as developmental phase, leaf area index (LAI), above-ground biomass (AGB) are linked to driving variables like nutrient availability, weather conditions, and management factors. The final output of these models is the crop yield or accumulated biomass (Dele’colle et al. 1992). Hence, the information related to the crop canopy state variables such as LAI and AGB is a prerequisite to simulate the CGMs. Several techniques have been used to retrieve canopy state variables from reflective RS observations (400–2500 nm) by many researchers. Moulin et al. (1998) had proposed the use of sensors to parameterize CGMs based on the measurement of actual crop status. Several authors carried out studies related to the potential use of sensors including in situ, proximal sensing, and RS sensors to enhance prediction of CGM (Hoefsloot et al. 2012; Bai et al. 2012). However, the estimation of biophysical attributes in situ is a laborious and time-consuming task (Weiss et al. 2004) even with the help of automation systems. Hence, satellite-based RS data is the only source of information to retrieve biophysical parameters at regional scales (Camacho et al. 2013; Kolotii et al. 2015; Shelestov et al. 2015). Retrieval of biophysical parameters using various spatial and temporal RS data has been an active area of research for the past several decades (Wiegand et al. 1992; Chen et al. 2002; Walthall et al. 2004; Ganguly et al. 2012; Li et al. 2015a, b).

The existing biophysical parameter retrieval methods are empirical or physical in nature. Physically based models simulate spectral response based on input such as leaf constituents, canopy architecture, sun-viewing geometry, background soil (Ganguly et al. 2012; Li et al. 2015a, b), and inverted back using observed spectral response and limited known input parameters. Different techniques like lookup tables (Ganguly et al. 2012), numerical optimization, and machine learning (Walthall et al. 2004; Verrelst et al. 2012, 2013) techniques are successfully used for inversion of the model. The empirical models basically linked biophysical attributes with various RS-based spectral indices (Turner et al. 1999; Fensholt et al. 2004; Verrelst et al. 2012). These models are quite easier to implement, site-specific, and data-driven. Hence, its scalability is limited. The selection of the most sensitive and informative spectral features is important in the empirical approach. The addition of all possible spectral features increases the complexity and dimensionality and required optimization to make the empirical model simple for regional applications. The details of these approaches and methods are discussed in Chap. 3 of this book.

4.3.1 Importance of Remotely Sensed Crop Biophysical Parameters

LAI is the most important crop biophysical parameter and a vital component of the process-based CGMs. It’s a dimensionless quantity representing a one-sided leaf area per unit ground surface area. Spatially explicit measurements and retrieval of LAI from RS data are indispensable to model for simulation of ecological variables and processes at regional scales (Green et al. 1997). Recent studies have successfully demonstrated the retrieval of LAI using different parametric and nonparametric regression as well as physically based models (Cho et al. 2007; Im et al. 2009; Liu et al. 2018; Xie et al. 2019; Upreti et al. 2019). The RS data is also used to retrieve fractional absorbed photosynthetically active radiation (fAPAR) as it is highly related to dry matter production of a crop (Dong et al. 2013). Hilker et al. (2008) had conducted an experiment to retrieve fAPAR using spaceborne RS data for modelling plant dry matter production. They concluded that NDVI and enhanced vegetation index (EVI) are most promising for retrieval of fAPAR. Upreti et al. (2019) retrieved fAPAR through a hybrid approach using a neural network to train PROSAIL canopy reflectance model. Similarly, LUE is one of the key biophysical parameters require to calculate potential plant production. Hilker et al. (2010) have used the photochemical reflectance index (PRI) using narrow bands of 531 and 570 nm to retrieve LUE. However, it is difficult to upscale this retrieval process to canopy level at a regional scale (Rahman et al. 2001; Hilker et al. 2008).

Biomass production estimation is one of the main areas of research in modelling of vegetation growth and development. However, most of the RS satellites can measure AGB, as not able to observe below-ground biomass (Lee 1978). Several kinds of research have demonstrated successful biomass retrieval using RS data (Claverie et al. 2009; Jin et al. 2015a, b).

Water stress is one of the critical limiting factors for the growth and development of crops which leads to a yield gap between actual and potential production. Hence, information regarding plant or crop moisture content is important for crop growth and yield modelling. Many researchers across the globe have used various RS-based indices, spectral information along with climate data to retrieve crop moisture stress. Lee et al. (2010) reported the use of thermal infrared (3–12 μm) for crop water estimation. Jackson et al. (1981) used the crop water stress index (CWSI) to measure crop moisture. Govender et al. (2009) reported the use of middle- and shortwave infrared bands for plant water stress measurement. Two most popular spectral indices, namely, normalized difference water index (NDWI) by Gao (1996) and water band index (WBI) by Penuelas et al. (1993), are being used to measure crop moisture content. Djamai et al. (2019) carried out studies on the retrieval of canopy water content using Sentinel-2 and Landsat-8 data.

Plant nitrogen content (N) is one of the most important biochemical constituents of leaf chlorophyll content and therefore strongly correlated to plant photosynthetic activity (Diacono et al. 2013). Several researchers have found strong correlations between spectral indices and plant chlorophyll content. Yao et al. used NDVI to retrieve chlorophyll in wheat crops. Bagheri et al. (2012) employed soil adjusted vegetation index (SAVI), modified soil adjusted vegetation index (MSAVI), and optimized soil adjusted vegetation index (OSAVI) for leaf chlorophyll retrieval of corn. Jain et al. (2007) used several red-edge bands to retrieve chlorophyll content in potato. Clevers and Gitelson (2012) estimated plant N and chlorophyll content using MERIS and Sentinel-2 data using an empirical approach. Similarly, Miphokasap et al. (2012) used ground-based hyperspectral data for retrieval of canopy N content using the empirical method.

4.3.2 Scale Issues in Remote-Sensing-Based Parameter Retrieval

The retrieval of parameters by inverting models does not express the characteristics of scale explicitly; they may be suitable for homogeneous surface or point measurement (Raffy 1992). Chehbouni et al. 2000 stated that it is not appropriate to use the locally calibrated relationships, between the modelled and observed variables, at a regional scale simply by scaling the parameter. As a result, they need to be reparameterized to adapt to the new circumstances since the driving forces or mechanisms may be totally different at different scales. Hence, a model designed and calibrated at the leaf scale may not hold good at the canopy level. Consequently, the models or algorithms developed at one scale need to be revised for its application to other scales, and the impact of scale on the mechanism of the model or algorithms is to be investigated prior to changing the scale. Scale applicability of basic laws of physics such as Lambert’s law (Li and Strahler 1985), Beer’s law (Albers et al. 1990), and Planck’s law (Li et al. 1999) at the pixel level is being discussed by various researchers. Their results suggested that the scale applicability needs to be considered carefully for retrieval of the model at different scales. Besides this, the heterogeneity of land surface and the linearity or non-linearity of retrieved parameters are also highly related to scale. In reality, heterogeneity is a surface property varying over scenes (Garrigues et al. 2006) and is a relative concept highly dependent on the sensor resolution. As the spatial resolution of the sensor becomes finer, the possibility of pixel heterogeneity increases. The surface heterogeneity greatly affects the parameter retrieval strategy using RS data (Chen 1999). Garrigues et al. (2006) suggested two strategies to minimize the errors in scale change; the first one is to quantify the intra-pixel heterogeneity, and the second is to define proper pixel size to capture the variability and minimizing the intra-pixel variability. There is no arbitrary conclusion about the effect of linearity or non-linearity of retrieval models on scale change. When the retrieval models for different cover types are quite different, the linear retrieval models could also be affected highly by scale effect (Chen 1999). At the same time, when the medium is homogeneous, non-linear retrieval models also cause no scale effect as demonstrated by the Taylor series expansion (Hu and Islam 1997; Garrigues et al. 2006). Chen (1999) suggested that scaling error is more when a non-linear algorithm is applied to mixed pixels with different land cover types. Various authors proposed different techniques to minimize the scale effect in RS-based retrieval of biophysical parameters (Verhoef 1984; Jacquemoud and Baret 1990; Raffy 1992; Tian et al. 2003). Hu and Islam (1997) demonstrated that different parameterization and assumptions in retrieval models can lead to a different conclusion for the same physical process. There are conflicting conclusions in the literature describing whether the products are scale-dependent or scale-free. There is relevant literature addressing these scale-change issues, such as bidirectional reflectance distribution function(BRDF) and albedo (Liang et al. 2002), temperature (Liu et al. 2006), emissivity (Zhang et al. 2004), carbon flux (Thorgeirsson and Soegaard 1999; Sasai et al. 2007), soil moisture (Hu et al. 1997; Oldak, et al. 2002; Manfreda et al. 2007; Das and Mohanty 2008), NDVI and vegetation fraction (Jiang et al. 2006; Tarnavsky et al. 2008), LAI (Hu and Islam 1997; Chen 1999; Fernandes et al. 2004; Garrigues et al. 2006; Jin et al. 2007; Hufkens et al. 2008), net primary production (NPP), and gross primary production (GPP) (Simic et al. 2004; Turner et al. 2004). It can be concluded that we need to change the scale of the retrieval models through appropriate assumptions and approximation. Furthermore, there should be a clear separation of system errors from the errors arising from retrieval models due to scale changes.

4.4 Assimilation of Parameters into the Process-Based Crop Growth Simulation Model

The objective of the spatialization of a crop growth model is to simulate the crop growth and development on a regional scale, where significant spatial variability in soil, weather, and crop state variables exists along with the large uncertainties (Hansen and Jones 2000). These uncertainties result in large errors in crop growth simulation and yield estimation. In this context, RS technology plays an important role by facilitating input data generation for CGMs particularly generating the “missing spatial information” and thereby reducing the uncertainty in the spatialization of CGMs and yield estimation. The recent development in RS technology helps us to generate accurate, reliable, and quantitative information on soil, weather, and crop parameters at the regional scale. Many researchers have retrieved canopy state variables, soil, and weather parameters using different techniques and RS data. A detailed list of some of these studies that are relevant from the crop modelling point of view is presented in Chap. 3 of this book. Several researchers have used RS to retrieve crop state variables or soil properties over large areas, such as fAPAR (Sakowska et al. 2016; Upreti et al. 2019), LAI (Fang et al. 2008; Jiang et al. 2014; Liu et al. 2018; Pasqualotto et al. 2019), fraction of vegetation cover (fCover) (Djamai et al. 2019), biomass (Claverie et al. 2009; Jin et al. 2015a, b), leaf N content (Huang et al. 2013), evapotranspiration (Huang et al. 2015), and soil properties (Dente et al. 2008; Ines et al. 2013; Chakrabarti et al. 2014).

These retrieved biophysical parameters of soil, weather, or crop canopy states need to be integrated with CGMs. In recent years, rapid and parallel development in CGMs as well as in RS and information technology (IT) leads to the development of their combined applications. The availability of higher spatial resolution sensors such as Sentinel-2, SPOT-6, Landsat-8, Rapid Eye, World View-2, and GeoEye-1 with high temporal frequency combined with wide spatial coverage and low operating cost facilitates operational crop growth monitoring and assessment in regional scale. Similarly, there has been rapid development in IT leading to robust computational infrastructures, algorithms, and techniques for processing huge RS data and generating relevant information for improving the predictive capability of CGMs both in temporal and spatial scales (Launay and Gue’rif 2005). In this context, various data assimilation (DA) techniques have been developed allowing a formal and well-understood way to combine the predictions of simulation models with RS or ground-based observations. In this process, the model predictions are matched with the observed data limiting the errors due to poor local parameterization. Further fine-tuning could be done by retrieving the local parameters using RS techniques (Xi et al. 2019). In the context of DA, one needs to first distinguish observed variables (RS or ground-based), state variables (crop model system generated), model parameters (establishing relationships between observed and state variable), and output variables (crop yield in most of the DA) (Delécolle et al.1992). Several algorithms and techniques have been developed worldwide to facilitate DA through the combined use of crop models with RS data (Mass 1988; Guerif and Duke 2000; Dente et al. 2008; Curnel et al. 2011; Wang et al. 2013; Huang et al. 2015) to improve the accuracy of CGMs and in turn estimation accuracy of crop yield in regional scale. Various DA methods usually optimize the difference between the measured evidence (RS observation) and modelled prediction by using Bayes’ rule (Huang et al. 2019a, b). Many techniques have been developed to carry out this Bayesian update, and their relative advantage is based on the assumption made to solve for the posteriori analysis for probability density function (pdf) of the parameter or state variable.

The schematic flow diagram of a typical DA system involving crop model and RS-derived biophysical parameters is presented in Fig. 4.3. A point-based crop growth simulation model is to be calibrated and validated on a field scale, and the most sensitive parameters of the model need to be fixed along with the smart assumption of the less sensitive and difficult to measure parameters in the study region. The calibrated model will then be able to simulate crop growth and development. Then, the calculated uncertainties in the calibrated parameters propagate through the model to account for the limitations in the process of calibration and parameterization. After the calibration, the model is ready to simulate the crop growth by providing local predictions of a large number of biophysical variables such as development state (DVS), LAI, AGB, evapotranspiration (ET), and soil moisture. At the same time, satellite-based RS has the potential to provide an independent estimate of these parameters over large areas. Then the DA methods will seek to update the uncertain model simulations of LAI, AGB, SM, etc. to match the certain observations obtained through earth observation (EO) systems so that pdf is consistent with both the model and observation. The model with the embedded DA process can run in the forward direction towards the harvest to simulate crop growth and yield using the short-term as well as seasonal weather forecasts.

4.4.1 Methods of Remote Sensing Data Assimilation

Extensive reviews on the assimilation of RS-derived biophysical parameters into CGMs have been carried out previously by several authors (Maas 1988; Delecolle et al. 1992; Liang 2005; Dorigo et al. 2007; Lewis et al. 2012; Kasampalis et al. 2018; Jin et al. 2018). Similarly, various techniques have been developed to integrate RS observations in the agroecosystem models. In general, three different strategies are applied which are described by researchers worldwide (Dele’colle et al. 1992; Moulin et al. 1998; Olioso et al. 1999; Makowski et al. 2002; Bach and Mauser 2003). Three broad methods of DA, i.e. calibration, forcing, and updating techniques, have been used globally and are discussed in the following section.

4.4.1.1 Calibration Method

The main aim of the calibration method is to minimize the differences between the RS data and the simulated data of the crop model using an optimization algorithm. The initial parameters of crop models are adjusted to optimally match with the simulated state variables of the crop model with the RS data (Fig. 4.2a). While calibrating the sensitivity, uncertainty analysis of crop models is carried out manually or automatically running the model using a set of realistic parameters within range. Several studies have been carried out using RS DA into crop models using the calibration method. The main disadvantage of the calibration method is to parameterize the complex relation existing among the model variables. The popular algorithms are mentioned as below:

  1. (a)

    Maximum likelihood solution (MLS) (Dente et al. 2008)

  2. (b)

    Simplex search algorithm (SSA) (Launay and Guerif 2005; Ma et al. 2008; Claverie et al. 2009; Ma et al. 2013)

  3. (c)

    Least squares method (LSM) (Zhao et al. 2013)

  4. (d)

    Powell’s conjugate direction method (PCDM) (Fang et al. 2008, 2011)

  5. (e)

    Shuffled complex evolution (SCE-UA) (Shen et al. 2009; Ren et al. 2009, 2011; Jin et al. 2010; Ma et al. 2013; Wang et al. 2014; Huang et al. 2015)

  6. (f)

    Very fast annealing algorithm (VFSA) (Dong et al. 2013)

  7. (g)

    Particle swarm optimization algorithm (PSO) (Wang et al. 2014; Liu et al. 2014; Jin et al. 2015a)

Fig. 4.2
figure 2

Schematic representation of different methods for the assimilation of remotely sensed model state variables in agroecosystem models: (a) calibration, (b) forcing, and (c) updating. (Adopted, Dele’colle et al. 1992)

4.4.1.2 Forcing Method

Forcing methods use the RS data to replace the crop model simulation data (Fig. 4.2b) at each time step. The time step may be daily, weekly, or monthly which may not match with the temporal resolution of satellite data in most of the cases. Under normal circumstances, the temporal resolution of a satellite is less than the time step of the crop model. RS observations are available at a predefined temporal resolution of the satellite observations and generally less frequent than the model time step. Hence, various interpolation techniques like wavelet approaches, linear interpolation, and fast Fourier transformations (Roerink et al. 2000) have been used to fill the gaps between two observations. It helps to derive state variables as per the required time steps of the model. LAI data retrieved from RS images are most often used as an input parameter and state variable into a crop growth model. Huang et al. (2001), Clevers et al. (2002), Schneider (2003), Abou-Ismail (2004), Hadria et al. (2006), Thorp et al. (2010), Tripathy et al. (2013), and Yao et al. (2015) have retrieved LAI using different RS data. The simulated results of crop models were directly replaced by the retrieved LAI to improve the simulated LAI, AGB, and yield of crop models. Morel et al. (2012) used the estimated interception efficiency index (ε) and fAPAR as input into the MOSICAS model for estimating the yield of sugar beet and sugarcane, respectively. DA of RS data into the crop model is easy to operate using the forcing method. During this process, the simulated state variables were only replaced by the retrieved state variables derived from RS data.

4.4.1.3 Updating Methods

The updating method deals with continuous updating of model state variables with RS-based variables as per the availability (Fig. 4.2c). This method is based on the assumption that an updated state variable at each time step better simulates the state variable. It improves the accuracy of the simulated state variable at a succeeding time step. It is also referred to as sequential DA and many algorithms have been developed for this assimilation technique (McLaughlin 2002). These methods provide more flexibility in terms of data availability, but accounts for the errors in both observed and modelled state variables may affect the final output.

4.4.2 Issues in Data Assimilation

The assimilation of RS-derived biophysical and biochemical state variables into CGSMs can improve its predictive performance at a regional scale (Launay and Gue’rif 2005). However, RS-derived state variables may contain some observational error (Bastiaanssen et al. 1998). In forcing method, the model follows the observed state variable and may include observation errors. However, the “calibration” and “updating” methods offer more flexibility in the assimilation of RS-based state variables and their associated errors in the model. Nouvellon et al. (2001) reported that the calibration method could generate more representative parameters based on the simplified physical description of the underlying processes and thus improves model prediction. But it’s only applicable if there are a sufficient number of observations, and the observation error is also small. As it needs more computation time for the optimization process to assimilate RS data, the calibration method finds limited applications. However, this problem can be overcome by testing more robust and less time-consuming procedures such as methods based on extended and non-linear Kalman filtering (KF) (Nouvellon et al. 2001). The updating method has significantly reduced the computational times as compared to the calibration method as it requires a single run. Besides, in updating methods, the model state variables need to be adjusted during the model run itself and often intervene in the model structure and processing loops to a large extent. Walker et al. (2001) concluded that KF is a superior over the forcing method using a synthetic case. It has been successfully demonstrated that RS-derived biophysical variables can be utilized to calibrate parameters and initialize variables such as initial LAI and sowing date (Maas 1988; Guerif and Duke 2000). It can also be used to adjust or replace a state variable (LAI and fAPAR) in agroecosystem models (Bach and Mauser 2003; Launay and Gue’rif 2005). Most of these studies were carried out at subregional to local scales. However, these models can still be operated at the individual field level with high-resolution satellite data such as SPOT, Landsat TM, and Sentinel. Further, at these scales, the spatial and temporal resolution of RS images becomes a critical factor (Dele’colle et al. 1992; Launay and Gue’rif 2005).

4.4.3 Data Assimilation Algorithms

The currently used DA algorithms include KF, ensemble Kalman filter (EnKF), particle filter (PF), hierarchical Bayesian method (HBM), three-dimensional variational data assimilation (3DVAR), and four-dimensional variational data assimilation (4DVAR). All these algorithms are discussed in detail in the following section under the broad categories of the variational approach, KF and Bayesian Monte Carlo approaches. The variational approach can optimize a given criterion such as the minimization of a cost function, hence solving the assimilation problem. It is observed that in a wide range of functions, optimizers are used to solve a generic cost function problem for DA into a crop growth model. 3DVAR can assimilate observations without considering temporal dependency (Lorenc 1986). It can use the complex observation operator, hence making it easier to assimilate for state variables of non-director-related non-linear observations. However, 3DVAR model is limited in practical applications because of higher computational cost. Hence, 4DVAR was developed using 3DVAR algorithm to overcome such problems. 4DVAR integrates the solution over time (Le Dimet and Talagrand 1986). LAI or FAPAR is the mostly used linking variables between satellite observations and models. This is probably due to their straight forward representation or the connecting point within the crop growth simulation model and the wider availability of satellite-derived LAI and/or FAPAR products (Fig. 4.3).

Fig.4.3
figure 3

A schematic framework of spatialized crop growth simulation model with EO data assimilation

Many researchers have explored the empirical relationship between various vegetation indices (VIs) with LAI and fAPAR or using radiative transfer models (RTMs) to convert LAI to reflectance. However, it is important to understand the limitations of the observation operators, in both cases. The assimilation of fAPAR is not necessarily the same as assimilating VIs. KF method cannot be used to address high-dimensional data. Hence, it is often difficult to generate inputs for crop canopy state variables, structure, and model uncertainty. To overcome these problems, Evensen (1994) developed the EnKF. Many studies have demonstrated that the EnKF method is very helpful for DA between crop models and RS data (Crow and Wood 2003; Hadria et al. 2006; De Wit and Van Diepen 2007; Bolten et al. 2009; Nearing et al. 2012). The KF equations hold good only with linear CGMs and linear observation operators and assume all the statistical functions as Gaussian. However, dynamic crop models are often not linear, as the growth process is affected by many factors, such as solar radiation, temperature, moisture, and other crop management factors. These interactions cannot be simulated adequately by linear models. Hence, a standard KF cannot be employed directly for carrying out the assimilation process.

If direct RS-based measurements such as backscatter coefficients, radiance, and reflectance are to be assimilated, the local linear approximation needs to be feasible. If these approximations are available by using emulators (Gómez-Dans et al. 2016), then EKF might be an efficient alternative to the EnKF (Evensen 2009). RS-derived products can give direct, uncertainty-quantified measurement of state vector components like LAI. It can be directly connected to the model predictions. KF will be a good choice if the error related to the data product follows Gaussian distribution. It is a sequential approach to measure state vector at different points of time by considering the probability distribution of the variables for each time frame. It uses a series of measurements related to statistical noise; other errors are observed over time. Hence, the estimation of an unknown variable using such approaches tends to be more accurate than those based on a single measurement alone. However, the use of EnKF is an alternative approach to assimilate such products as most of the CGMs are non-linear.

The filtering approaches are casual as compared to the variational approach due to the only use of past information to assimilate a current observation. The variation approaches need information from the whole assimilation temporal window, resulting in a more constrained problem compared to filters (Huang et al. 2019a). Besides, filtering approaches facilitate near real-time operation with an on-line updating facility. The similarity between KF and 4DVAR is the fact that both follow the Gaussian assumption and Bayes’ rule. However, in CGMs and non-linear observation operators, the uncertainties in the model don’t follow Gaussian distribution, assuming normality in the posterior might be a poor choice. Rather, sampling-based methods like Markov chain Monte Carlo (MCMC) (Gilks and Roberts 1996) is a good choice. It uses the Markov chain to produce samples from the posterior pdf that will work for any problem, provided that the chain is allowed to run for a sufficient number of iterations. Again, a convergence of the chain is hard to diagnose. Hence, many thumb rules (R^ indicator) are usually employed (Cowles and Carlin 1996; Gelman et al. 2013). MCMC methods are more appropriate where the dimensionality of the problem is not very high. These methods become slow when the dimensionality is very high; hence, it is difficult to achieve convergence in the desired timescales. The functional equivalents of MCMC are sequential Monte Carlo methods, such as particle filters (PF). PF facilitates the propagation of non-Gaussian distributions through complicated CGMs. PF shows some potential for DA to integrate RS-derived biophysical parameters with a crop model as compared to widely used EnKF (Jiang et al. 2014; Machwitz et al. 2014; Chen and Cournède 2014). An important consideration for PF is that a large number of particles may be required to reliably describe the posterior pdf, particularly when the dimensionality of the problem increases. The approach appears to be promising for non-linear CGMs. Several researchers have demonstrated the applications of PF for crop model parameterization and uncertainty analysis (Makowski et al. 2002; Iizumi et al. 2009; Dumont et al. 2014). A detailed list of various algorithms and their usages along with crop models and RS data is presented in Table 4.3.

Table 4.3 Previous research on the assimilation of remote sensing parameter in the crop model

4.5 Future Scopes and Challenges

The spatial crop growth model can simulate regional crop growth development and yield using GIS and RS data. Studies conducted worldwide reported high simulation accuracies of the model on a coarser scale. But there is a scope to further improve upon the existing models to operate at a finer simulation unit like village or Gram Panchayat level without losing the accuracy. The crop models used today have their intrinsic limitations. Most of the crop models do not have the modules to simulate the impact of diseases, pest infestation, and climatic disasters such as flooding, hail, strong winds, and high temperature. The genetic coefficients of crop varieties in the crop models are fixed by field trials or from literature, and it is not possible to generate such coefficients on a regional scale by carrying out a large number of field experiments. Similarly, feeding the crop models with spatially divergence crop management information like input for irrigation, fertilization, sowing, etc. is furthermore challenging. Difficulties are also encountered to accommodate high spatiotemporal variations of the soil and weather parameters. Hence, the error is also high in parameterizing the soil properties such as soil moisture, soil texture, soil nutrients, carbon, and nitrogen content and daily weather inputs like maximum, minimum temperature, rainfall, and solar radiation. Low accuracy in the input data results in poor prediction of the model simulation. Most of the models assume uniform field growth situations (like the potential and water-limited production conditions), but in reality, several limiting factors can occur in the field. Hence, the actual field conditions are beyond the defined boundary conditions of the model range. These errors introduced through parameterization of crop model, hence influencing the accuracy of biomass, LAI, and yield estimates both at regional and global scales. To reduce the above-mentioned error, the following general issues need to be addressed:

  • What are the sensitive input parameters for the model?

  • Which of these parameters can be retrieved accurately through RS technique and how?

  • Which are the most suitable assimilation techniques for incorporating RS data into the model?

  • What is the effect of assimilation on simulated output?

  • What is the effect of spatial and temporal resolution on the predictive power of the model?

At the same time, some of the specific methods have been followed to reduce the uncertainty in simulation such as (1) the addition of modules for simulating the impact of diseases, insect/pest infestation, and climatic disasters and (2) combination of global sensitivity analysis such as Morris, extended Fourier amplitude sensitivity test (EFAST), intelligent optimization algorithms (IOA), MCMC, GA, general likelihood uncertainty estimation (GLUE) for parameter optimization and model accuracy improvement. Hence, the development of IOA for carrying out sensitivity analysis, calibration, and validation of crop models along with RS data retrieval and assimilation will be a demanding area of research in the future.

The prediction accuracy of crop models is improved through the assimilation of RS data. However, the RS data mainly obtained using various optical sensors may also contain some errors in a regional domain (Huang et al. 2001; Duchemin et al. 2003; Hadria et al. 2006; Thorp et al. 2010; Yao et al. 2015). Currently, the VIs derived from RS data is difficult to satisfy the requirement of crop models both temporally and spatially. Further, RS has challenges such as directional problem, scale effect and scale transformation, and retrieval techniques and method. These factors impact the retrieval accuracy of canopy state variables using RS data and at the DA chain (X. Jin et al. 2018). Crop area delineation is the foremost prerequisite of spatialization of CGMs, and towards this end, object-oriented image analysis (Blaschke 2010; Qi et al. 2012; Gu et al. 2017) could provide better representative crop maps for the models. Object-oriented classification enables the acquisition of a variety of spatial and textural features from multi-temporal RS images and carries out segmentation followed by crop area delineation. Improved algorithms based on machine learning techniques (Skakun et al. 2015; Guo et al. 2016) are very much useful for pixel-level classification and analysis of multispectral and multi-temporal RS data. Currently, the key problem of many RS-based parameter retrieval is due to the ill-posed issues during the inversion process (Li et al. 2015a,b). Though there is no definitive solution to the inversion problem, the introduction of the prior knowledge could provide better convergence. The uncertainty introduced by DA could be improved by combining different DA algorithms (such as a combination of EnKF and 4DVAR) (Dong et al. 2013). Development in the area of hyperspectral RS data can further improve the estimation accuracy of canopy state variables and soil properties at the field scale based on a combination of spectral shapes and spectral indices (Frels et al. 2018). With the fast development of versatile, lightweight, and low-cost portable sensors on the unmanned aerial vehicle (UAV) platform, newer avenue of RS data generation with the high spatial and temporal resolution is evolving (Bendig et al. 2015; Adao et al. 2017). Though this UAV technology is best suited to acquire high spatial and temporal RS data at the field scale, it does not provide regional-scale data promptly because of the small spatial coverage of UAV. To improve the stress detection and crop monitoring activities through RS, fluorescence-based sensors have been developed recently. In this context, Fluorescence Explorer (FLEX) of the European Space Agency is expected to monitor the photosynthetic activity of vegetation through chlorophyll fluorescence on a global scale. Fluorescence is considered to be a more accurate and earlier indicator of plant growth and stress than other biophysical parameters used (LAI, fAPAR), but the fluorescence data product would at a coarser scale account for its weak signal. Hence, lots of research is required on this fluorescence data retrieval and assimilation into the crop model. The recent trend in the development of constellations of nanosatellites (mass < 10 kg) is another area of research in satellite RS. An operational example is the constellation of Planet Lab’s “Doves”, which are designed to cover the globe daily at 3–5 m spatial resolution. Development of intelligent algorithm to handle big data generated through multiple sensors on different platforms and assimilating these data into process-based CGSMs at the required scale and resolution will become key research directions in the future.

4.6 Conclusions

Crop models and RS had parallel development courses, and both the technology complement each other. At the same time, the spatialization of CGMs demands the synergistic use of both. The combined use of various RS datasets and crop models using new DA methods could improve the retrieval accuracy of crop canopy state variables, soil properties, etc. The major challenge in the spatialization of CGMs is to address various issues and limitations of both techniques. In this chapter, a detailed discussion is carried out on the evolution, scope, and limitation of popular process-based crop growth simulation models at the beginning. The techniques and issues involved in the spatialization of crop models particularly the development of a spatial framework on the GIS environment and addressing the availability of data at various scales are discussed with emphasis on soil, weather, and retrieved crop biophysical parameters. Spatialization involves a change of scales in input, processes, and output. Various limitations of scale change are addressed in this chapter under various sections. The retrieval of biophysical parameters from RS data and its subsequent assimilation into the model is the central theme around which the entire concept of spatialization of the crop growth model revolves around. Different techniques for retrieval of crop biophysical parameters from RS data such as empirical, physical, and hybrid approaches are presented along with some of their recent application. This section also covers scale effect, optimal scale, and pixel heterogeneity and related issues involved in the retrieval process. The concept of RS DA into a crop growth model is discussed along with various algorithms. A list of recent studies on RS DA is presented. An attempt is made to cover all recent development and future scope for research in the area of spatialization of crop growth models.