Introduction

Assessing crop yield at finer scale is necessary to understand the yield variability at farm level, which in turn, is useful to measure and improve the productivity of small and marginal farm holdings through proper decision making on managing the farm level problems. The finer scale yield prediction at village and gram panchayat (GP) level is also required for taking decisions on settling the crop insurance. In India, currently the crop insurance schemes were launched at insurance units, which in some states consists of a single gram panchayat while in some other states a group of gram panchayat forms the insurance unit. Hence, for settling the claim at insurance unit the yield must be predicted before harvest at gram panchayat level for timely settlement with credibility.

All approaches of crop yield prediction using remote sensing may be grouped into three major categories- Statistical models with spectral vegetation indices and linear or non-linear (Artificial Intelligence-based techniques) regressions, remotely sensed data assimilation into process-based crop simulation models and the efficiency-based semi-physical models with remotely sensed inputs. Statistical models using observations from satellites to estimate crop yields date back to the 1970s when spectral signature from satellite imagery was found to be associated with crop yield estimation (Kanga and Özdoğanb 2019). Vegetation index (VI) based regression models are location-specific and cannot be extrapolated seamlessly over larger areas with adequate accuracy (Moulin 1998). These models rely on large number of in situ or synthetic yield data to train statistical models or machine learning algorithms (Johnson 2014; Sibley et al. 2014). Since training data are available only at political units, these studies tend to produce crop yield maps at aggregated units or broad resolutions (Kanga and Özdoğanb 2019). A few recent studies attempted at producing finer-resolution yield maps based on Landsat data by training statistical regression models with synthetic yield data from crop-growth model simulations (Azzari et al. 2017; Jin et al. 2017; Lobell et al. 2015). Such methods are computationally efficient for large-scale mapping but can suffer from potential bias, especially when synthetic samples are not adequately representative of real-world conditions (Kanga and Özdoğanb 2019). The mechanistic and dynamic crop growth simulation models (CSM) can describe the response of plants to their environment both physiologically and quantitatively. Many CSMs have been successfully used to simulate crop yield in India like DSSAT (Singh et al. 2017; Timsina et al. 2008); CropSyst (Singh et al. 2008); InfoCrop (Aggarwal et al. 2006; Aggarwal et al. 2006; Kumar et al. 2013); ORYZA (Yadav et al. 2011).Many studies on the use of remotely sensed data in different CSMs for crop yield prediction have been reported (Tripathy et al. 2013; Liu et al. 2015) and are recently reviewed by Jin et al. (2018). Fully mechanistic approaches can be very accurate, provided they are properly calibrated but are the most computationally demanding, making their use for macro-scale applications limited, unless several simplifying assumptions are made (Grassini et al. 2015). Other methodology adopted to predict regional crop yield at different spatial scales is the efficiency-based semi-physical model that uses the photosynthtically active radiation (PAR) and the physiological parameter, fraction of PAR absorbed by the plant and the maximum radiation use efficiency. These models, also known as light-use efficiency or Production Efficiency Models (PEMs) rely on the conservative and positive response of carbon assimilation to increased solar radiation (Monteith 1972; Monteith and Moss 1977). This makes transferability to other regions less difficult. In addition, the models are relatively easy to parameterize and can be run efficiently over large areas (Marshall et al. 2018). PEM has been revised and used for yield estimation of crops including rice, wheat and soybean by Marshall et al. (2018). In India, this methodology has been used for yield forecasting of wheat (Patel et al 2005, 2006, 2010; Tripathy et al. 2014), mustard (Tripathy et al. 2017), rice (Dwivedi et al. 2019) at district level. However, there are very few studies carried out for crop yield prediction at finer scale for aggregating at block, taluka and GP level. Availability of the new satellite sensors with spatial resolutions of 10 m or finer has opened up new possibilities for farm-level monitoring of crops. With the 5-day revisit of Sentinel-2 since early 2017 with a resolution of 10 m and the roughly daily re-visit of Planetscope since late 2017 with a resolution of ~ 3 m, it is now possible to observe smallholder fields at a much higher frequency than ever before (Jin et al. 2019). The advantage of Sentinel 2 data over other fine resolution data is that it is available free of cost. The spatial resolution of the MSI of Sentinel 2 ranges between 10 and 60 m, but the useful bands for agricultural applications provide data either 10 m or 20 m spatial resolution. Due to their shorter revisit time of 10 days (5 days combining Sentinel 2A and 2B) and more detailed spatial resolution as compared to Landsat (16-day revisit and 30 m spatial resolution) and AWiFS (5 days and 56 m pixel size), more precision in sub-field monitoring can be performed. Thus, the data from Sentinel 2 is gaining relevance for use in agricultural context and even more specifically smallholder farming systems in developing countries (Segarra et al. 2020). Previous studies have highlighted the potential of Sentinel-2 to play a key role in estimating crop yield (Battude et al. 2016; Lambert et al. 2017; Skakun et al. 2017), but so far the potential for mapping within-field variability in yield has yet to be fully explored (Hunt et al. 2019). A simple downscaling technique based on Enhanced Vegetation Index (EVI) from MODIS-TERRA was developed by Shirsath et al. (2020) for generating yield of two rabi season crops (wheat and sorghum) at 500 m resolution. In a country like India where most of the agricultural farms fall under small and marginal land holding, crop yield prediction is required at still finer resolution for generating the aggregated yield at GP and Taluka level. Any vegetation index with visible and NIR bands can explain the greenness of the crop but not the yield as a whole. There is much more challenge to predict yield of kharif season crops using the satellite data in visible bands. The indeterminate and multi-picking crop yield estimation remains as a challenge even at coarser resolution. Besides the prediction challenges, obtaining the accurate reference yield also is equally necessary for the validation of the methodology at such finer level. The reference yield available is mainly the reported yield from government source at higher administrative level like district. These are also not available for the current season, which poses difficulty in testing the model prediction. The yield from crop cutting experiment (CCE) can solve this issue. However, for a multi-picking crop like cotton, this poses problems in terms of labour and time. To minimise the cost one may think of crop cutting once and applying some correction factor based on the actual crop cutting for estimating the final cotton yield. The correction factor suggested in CCE manual for the missed picking is based on the average of CCE conducted elsewhere in a village (CSO 2008). Some reported studies used constant boll weight for all the bolls throughout the crop season (Prostko and Cothren 1998; McCarty and Bowman 2012). However, agronomical, physiological studies showed that boll size, and boll weight reduces at the later stages of crop growth due to various stresses (El-Mohsen and Amein 2016; Thakur 2020). Hence, there is a need to standardize a methodology for generating the correction factor.

All these reviews indicate the gaps and need for yield prediction at GP and taluka level, as well as to address the challenges involved in this. Difficulties in generating reference yield at this finer scale for validation especially for the multi-picking cotton crop states the need for a standardize methodology for the same. The availability and potential of the high-resolution satellite data to address the issue have also been shown. The objective of the present study was to demonstrate the methodology for predicting yield of three major crops of India (rice, wheat and cotton) at GP, taluka and district level through fusion of fine (10 m, 20 m), moderate (56 m, 500 m) and coarse resolution (4000 m) satellite data in the efficiency-based semi-physical model. The novelty of this study lies in the data fusion approach using parameters from multiple spatial resolution of different Geo-LEO satellites within the basic semi-physical model framework. Here the crop yield refers to threshed grain yield for wheat, paddy (rice grain with cover) for rice and seed yield for cotton crops. As this study directly computes yield at 10 m resolution using a semi-physical model, no historical or synthetic yield data series is required to develop model. The crops under study include wheat that is a rabi season crop (Mid October to April), rice, which is a kharif season crop (July-December) and cotton, which is a long duration kharif crop (June–February). In addition, this study also has developed a methodology for estimation of reference yield from CCE of the multi-picking cotton crop. The methodology and output of the study will be useful for the insurance settlement at GP level as well as for farm-scale crop management.

Study Area

The study was carried out in four districts of four different states of India with three different crops. Rice crop was taken in the Cuttack district of Odisha, wheat in Fatehabad of Haryana and Indore district of Madhya Pradesh (MP) while the cotton crop was considered in the Rajkot district of Gujarat (Fig. 1) state of India. These districts fall under different agro-climatic zones with different climate and soil type. The average agricultural field sizes in these regions range from 600 m2 in Cuttack to 6000 m2 in Fatehabad and Rajkot to 1 ha in Indore district. Field size is based on the ground truth sites and croplands in google earth images over the respective districts. Cuttack district is situated in the eastern part of Odisha and lies within geographical bounds of 20°00′N to 20°40′N Latitude and 84°52′E and 86°01′E Longitude. The district is divided into 3 sub-divisions and 14 blocks with 342 GPs. The district is characterized by tropical monsoon climate. Lowest and the highest temperatures recorded for the district are 7.5 °C and 42.0 °C, respectively. The normal annual rainfall is 1500 mm. Major soil types in the district are Alfisol, Ultisol and Entisol. The major kharif season crop is rice and the major rice varieties in the district for kharif season are Lalat, Naveen, and Swarna. The district of Fatehabad is bounded by 28°48′N to 29°17′N Latitudes and 76°29′E to 77°13′ E Longitude. It is divided into 3 talukas and 5 blocks. Climate of the district can be classified into tropical desert and steppe, which is mainly dry with very hot summer and cold winter. The normal annual rainfall of the district is 373 mm. Mean maximum temperature is 41.6 °C and mean minimum temperature is 5.5 °C. Major soil types of the district varied from sandy loam to loamy sand. In Fatehabad, the major rabi season crop is wheat and the major variety is U P-2338, WH-711 and HD 2851. Indore district lies in the Malwa plateau of MP state. The district boundary falls between 22°31′N and 23°05′N Latitudes, and 75°25′E and 76°15′E Longitudes. The district is divided into five talukas and four blocks. The climate of Indore district is hot and humid. Normal annual rainfall of the district is 960 mm. Major soil types in the district are medium black soil. The major wheat variety in Indore is Lok 1. Rajkot district of Gujarat lies between 20°30′N and 23°12′N Latitude and 70°00′E and 71°45′ E Longitude. The district comprises of 14 talukas. The district has semi-arid and sub-tropical climate. Mean annual maximum temperature is 33.8 °C and mean annual minimum temperature is 19.7 °C. Total annual rainfall is around 710 mm. The soils found in the district are mostly of Inceptisol and Entisol order. Texture-wise soils are sandy, loamy sand, clayey and silty type. The major cotton cultivar in Rajkot that falls in the Waged area is G.Cot 12.

Fig. 1
figure 1

Four districts (marked as magenta boundary) in four states (marked as cyan filled polygons) of India taken as study area with the respective taluka boundaries and crop name under study

Data

The data products from both geostationary satellite (GEO) and low earth orbiting satellites (LEO) at different spatial and temporal resolutions were used for this study (Table 1).

Table 1 Satellite data and data products with the respective source

In addition to the satellite data, data from ground observation and ancillary data from published report has been used. These include-

  • Interpolated minimum and maximum temperature data (daily, 5 km) for May 2018-December 2019

  • Maximum radiation use efficiency (ε0) from literature

  • CCE data averaged at GP, taluka and district level

Methodology

Data Processing

Daily insolation data product from INSAT 3D has been downloaded from MOSDAC (www.mosdac.gov.in) over the crop season. The processing of daily insolation involved the conversion of daily insolation to 8-day product (sum), resampling to 10 m resolution and conversion of projection to geographic from the Transverse Mercator projection.

MODIS surface reflectance, NDVI and fAPAR product (8-day composites) with 500 m spatial resolution has been acquired (https://lpdaac.usgs.gov). MOD09A1 or MODIS Surface Reflectance 8-Day L3 Global 500 m file contains seven spectral bands of data in the visible and infrared region. The band widths of these bands are Band 1: 0.620–0.670 µm; Band 2: 0.841–0.876 µm; Band 3: 0.459–0.479 µm; Band 4: 0.545–0.565 µm; Band 5: 1.230–1.250 µm; Band 6: 1.628–1.652 µm and Band 7: 2.105–2.155 µm. For this study, we used band 2 and 6 for computing Land Surface Water Index (LSWI). The MOD13A1 v6 data product contains the MODIS/TERRA Vegetation Indices 8-day L3 Global 500 m data from Terra. MOD15A2H data product from MODIS contains the MODIS/TERRA FPAR 8-day L3 Global 500 m data from Terra. The processing of all MODIS data included the mosaicking of tiles for the study area, resampling to 10 m resolution and conversion to geographical projection from the sinusoidal projection.

Cloud free Sentinel-2A data Level 1C product over the study area was collected for the whole crop season of the respective crops from the Copernicus Open Access Hub. MultiSpectral Instrument (MSI) of Sentinel-2A consists of 13 bands in the visible to the shortwave infrared region of the electromagnetic spectrum at different spatial resolution. These constitute four bands at 10 m in the classical broadband in visible (blue (0.490 µm), green (0.560 µm), red (0.665 µm)); and near-infrared (0.842 µm) region; six bands at 20 m, four narrow bands in the vegetation red edge spectral domain (0.705, 0.740, 0.775, and 0.865 µm), and two longer SWIR bands (1.610 and 2.190 µm); and three bands at 60 m dedicated to atmospheric correction (0.443 µm for aerosols and 0.940 µm for water vapor) and for cirrus detection (1.380 µm) (Segarra et al. 2020). In the present study, we used the two bands at 10 m resolution, red (0.665 µm), NIR (0.842 µm), for computing NDVI and two bands in 20 m resolution, NIR (0.865 µm and SWIR (1.610 µm) for computing the LSWI. These four bands were mosaicked and subset for the respective districts under study were extracted.

Generation of Model Inputs

The surface reflectance in the red and NIR at 10 m resolution from the Sentinel-2A level 1C product was used to compute NDVI for each image. Maximum NDVI of each month was derived at 10 m spatial resolution, monthly composite for each month of the rice-season was generated and used for further analysis. The relationship between the NDVI from Sentinel-2A and MODIS fAPAR at 500 m resolution was derived for each month. Rice is a kharif crop and in Cuttack district, rice season comprises of mid July to end December. MSI-Sentinel 2 cloud-free data was not available for the whole month of August. For the month of September, October, November and December cloud-free data was available only for one or two five-day periods. Hence, the relationship was derived using the monthly composite of NDVI and fAPAR for September, October, November and December. For the month of August, fAPAR of September was used. For wheat, similar regression equation developed by Bairagi et al. (2018 ) was used to generate the fAPAR at 10 m resolution. For cotton, the ratio of fAPAR to Sentinel NDVI was used to generate the fAPAR at 10 m resolution.

Maximum radiation use efficiency (ε0) for each crop was taken from literature. The ε0 of 2.2 g MJ−1 PAR for rice crop (Kiniry et al. 1989), 3.0 g MJ−1 PAR for wheat crop (Stockle et. al. 1992) and 1.8 g MJ−1 PAR for cotton crop (Pinter et al. 1994) were used. The eight-day sum of insolation was used to compute the PAR for the whole crop season and 50% of the insolation was assumed as PAR based on literature (Tsubo and Walker 2005). Water scalar (Tripathy et al. 2014) was computed through normalization of LSWI (Xiao et al. 2002) with respect to LSWImax. The LSWI was computed using surface reflectances in NIR (0.865 µm) and SWIR (1.610 µm) bands of the sentinel data at 20 m resolution. The crop mask was applied before computing the water scalar. Daily average temperature was computed from the daily maximum and minimum temperature of IMD weather data, interpolated to 5 km grid using thin-plate spline technique (Nagori and Chaudhari 2020). The temperature scalar (Raich et al. 1991) in a scale of 0–1 was computed for each 8-day period over the crop season using the daily average temperature and the cardinal temperature for the respective crop (Tripathy et al. 2014). As for kharif crops the temperature scalar is almost one for the regions under study, temperature scalar was used only in case of rabi crop, wheat. Cardinal temperature used for wheat are-Tmax: 35, °C; Tbase: 6 °C and Topt: 25 °C (Tripathy et al. 2014). Planting date for rice was derived using ISODATA classification scheme over the time-series NDVI from MODIS at 16 days interval. Polynomial curve fitting was used to smooth the NDVI of each class. The first positive change in the NDVI of each class was noted as the date of spectral emergence date (SED). A threshold of 10 days prior to SED was noted as the transplanting date of rice. For wheat and cotton, most frequent date of sowing among the selected CCE sites in the district under study was taken as the planting date of the respective district. All of the above input data were resampled to a common target resolution of 10 m. Field observed biomass and grain yield from the CCE were used to calculate the harvest index of the respective crop in the respective taluka. Past two years CCE data were used to compute harvest index (HI) and then averaged for each taluka to avoid human error while collecting CCE data.

Semi-Physical Model for Yield Estimation

We used a semi-physical model based on radiation use efficiency which computed biomass of each crop for a given time step (here 8-day period) from sowing to harvest for each crop using the periodical PAR, fAPAR, Wscalar, Tscalar and maximum radiation use efficiency using Eq. 1 (Tripathy et al. 2014).

$${\text{Biomass}} = \mathop \sum \limits_{{{\text{planting}}}}^{{{\text{Harvest}}}} ~\varepsilon _{0} *{\text{PAR}}*{\text{fAPAR}}*{\text{WS}}*{\text{TS}}$$
(1)

where PAR: Sum of incident photosynthetically active radiation over 8-days (MJ m−2 day−1), fAPAR: fraction of incident PAR which is intercepted and absorbed by the canopy (dimensionless), ε0: Maximum radiation-use efficiency (g MJ−1); WS is the water scalar and TS is the temperature scalar (varies from 0 to 1).

Total biomass was computed for each crop by summing up the 8-day biomass over the crop duration (planting date to harvest date). Harvest date for rice and wheat was derived using the crop duration for the respective cultivar from literature. For rice, the major varieties in Cuttack district were found to be Swarna, Lalat and Naveen (CCE data) and the duration varies from 110 to 160 days [Pathak et al. (2019), NRRI bulletin no 13]. The duration of the major wheat varieties in Fatehabad and Indore district ranged from 160 to 180 days (Directorate of Wheat Development 2016). For cotton, in Rajkot, the crop duration of 170 days (Singh and Kairon 2019), Technical Bulletin, CICR, Nagpur, www.cicr.org.in) was considered. Grain yield was computed using the harvest index derived for each crop for each taluka in each district under study. The whole methodology is shown in Fig. 2.

Fig. 2
figure 2

Methodology for estimating crop yield at 10 m resolution (PAR—Photosynthetically Active Radiation, fAPAR—fraction of PAR absorbed by crop, APAR—Total Absorbed PAR by the crop, LSWI—Land Surface Water Index, NDVI—Normalized Difference Vegetation Index, ε0 and ε—Maximum and actual Radiation Use Efficiency (g MJ−1), respectively)

Reference Yield for Validation from CCE

For rice and wheat, CCE was carried out at harvest of the crop from the selected locations of each taluka in the respective districts under study. The plot size was 5 × 5 m for rice and wheat while 10 × 5 m for cotton crop. For all three crops, CCE locations were generated using a stratified procedure, so-called smart sampling, based on remote sensing and GIS (Chaudhari et al. 2019).

For cotton crop, a new robust methodology based on boll count and adjusted boll weight at the time of second picking was developed. Number of empty bolls available during the second picking was counted, which were picked at first picking. Similarly, number of remaining immature bolls for latter pickings was also counted along with the actual bolls picked during the second picking. The average boll weight of 100 randomly selected cotton bolls was taken at the time of second picking, which was then adjusted with boll weight reduction factor expected at the later pickings. The following equation was used to estimate cotton yield from CCE at second picking.

$$\begin{aligned} {\text{Total}}\,{\text{boll}}\,{\text{count}}\,\left( N \right) & = {\text{Numbers}}\,{\text{of}}\,{\text{Empty}}\,{\text{bolls}}\,{\text{of}}\,{\text{First}}\,{\text{Picking}} + {\text{Number}}\,{\text{of}}\,{\text{bolls}}\,{\text{at}}\,{\text{II}}^{{{\text{nd}}}} \,{\text{Picking}} \\ & \quad + {\text{ Number}}\,{\text{of}}\,{\text{Immature}}\,{\text{bolls}}\,{\text{remained}}\,{\text{on}}\,{\text{the}}\,{\text{plant}}\,{\text{for}}\,{\text{later}}\,{\text{pickings}} \\ \end{aligned}$$
(2)

The reduction factor was computed based on the CCE observations in different strata representing the different vigour of cotton crops (Eq. 3).

$${\text{Reduction Factor}}\,\left( {{\text{RF}}_{i} } \right) = 0.3*\frac{{{\text{BW}}_{i} - {\text{BW}}_{\min } }}{{{\text{BW}}_{\max } - {\text{BW}}_{\min } }}$$
(3)

where BWi is the average boll weight of 100 random bolls for the ith CCE (in g).

An adjusted average boll weight (in g) for ith CCE observation was computed as:

$${\text{Adjusted}}\,{\text{Boll}}\,{\text{Weight}}\,\left( {{\text{BW}}_{a} } \right) = {\text{RF}}_{i} *{\text{BW}}_{i}$$
(4)

Finally, the seed cotton yield (t/ha) for the ith CCE plot was computed as:

$${\text{Seed}}\,{\text{Cotton}}\,{\text{Yield}} = \frac{{{\text{BW}}_{a} *N}}{{100*{\text{PLOT }}\,{\text{SIZE}}\left( {{\text{m}}^{2} } \right)}}$$
(5)

Statistical Evaluation of the Yield Estimation

The model was validated by comparing the estimated and CCE yield data at GP, taluka and district level for the three crops under study. Four different statistical metrics, such as correlation coefficient (r), Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Index of Agreement (IoA, Wilmot 1982) were used to test the model. Following equations were used to compute these statistical indicators.

$$r = \frac{{n(\sum xy) - \left( {\sum x)(\sum y} \right)}}{{\sqrt {\left[ {n\sum x^{2} - (\sum x)^{2} } \right]} \left[ {n\sum y^{2} - (\sum y)^{2} } \right]}}$$
(6)

where x and y are predicted and observed yield values

$${\text{RMSE}} = \left[ {n^{ - 1} \mathop \sum \limits_{i = 1}^{n} x\left( {Pi - Oi} \right)^{2} } \right]^{0.5}$$
(7)
$${\text{MAE}} = n^{ - 1} \mathop \sum \limits_{i = 1}^{n} \left| {Pi - Oi} \right|$$
(8)
$${\text{IoA}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Pi - Oi} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\left| {Pi^{^{\prime}} } \right| + \left| {Oi^{^{\prime}} } \right|} \right)^{2} }},\quad 0 \le {\text{IoA}} \le 1$$
(9)

where n is the number of cases, Pi and Oi are predicted and observed value of ith case, respectively. \(Pi^{^{\prime}} = Pi - \overline{Oi}\)., and \(Oi^{^{\prime}} = Oi - \overline{Oi}\). (\(\overline{Oi }\) is the mean of observed value f all cases).

Results and Discussion

Planting Date of Rice

Two dates of transplanting of rice crop were observed (1st and 2nd week of August 2019) in in Cuttack district of Odisha. Around 75% of the rice pixels showed planting in the second week of August 2019 (Fig. 3). These dates are within the range of the observed dates in the respective talukas.

Fig. 3
figure 3

Transplanting date of rice using time-series NDVI

Finer Scale fAPAR for Rice Crop

The relationship between fAPAR from MODIS at 500 m and the NDVI derived from the red and NIR reflectance of Sentinel 2 showed strong linear relation in each of the four months. The R2 of the relationship varied from 0.88 in October to 0.95 in September and December (Fig. 4). The slope of the regression lines varied from 0.79 to 0.96 with the highest corresponding to the month of October while the intercept was found be the lowest (0.033) in October and highest (0.105) in November. These equations were used to derive the fAPAR at 10 m resolution using the Sentinel 2A NDVI of the respective months. For the month of August, fAPAR was assumed to be the same as that of September as no clear sky Sentinel data was available for developing the relationship in August.

Fig. 4
figure 4

Month-wise relationship between Sentinel-2 NDVI to MODIS fAPAR for rice crop season

Finer Scale Yield Variability of Rice, Wheat and Cotton Crop

Estimated paddy yield was found to vary from 0.9 to 6.8 t ha−1 at pixel-level of 10 m spatial resolution (Fig. 5). Distribution indicated that number of pixels is highest in the yield range of 4–5 t ha−1 category followed by 3–4 t ha−1 and 5–6 t ha−1 category (Fig. 5).

Fig. 5
figure 5

Rice yield (t ha−1) at 10 m resolution and distribution of yield range in Cuttack district

The highest yield was estimated in Mahanga (4.98 t ha−1) Taluka followed by Niali (4.86 t ha−1) and lowest in Banki (3.88 t ha−1) taluka. District average yield was found to be 4.35 t ha−1.

In Indore, the wheat yield varied from 2.5 to 4.5 t ha−1 while in Fatehabad it ranged from 2.5 to 5.5 t ha−1. Maximum number of pixels fall in the range of 3.5–4 t ha−1 followed by > 4.5 t ha−1 range in Indore while in Fatehabad, maximum number of wheat pixels fall in the range > 5.5 t ha−1 followed by 4.5–5.5 t ha−1 range (Fig. 6a, b). The highest yield was obtained for the taluka of Sanwer (4.2 t ha−1) and the lowest yield was estimated in the taluka of Moho (3.3 t ha−1) in Indore district. The yield range from CCE matches with this estimation. In Fatehabad, the lowest yield was estimated in the taluka of Bhuna (4.5 t ha−1) and the highest yield was obtained in Ratia (5.5 t ha−1).

Fig. 6
figure 6

Wheat yield (t ha−1) at 10 m pixel level and distribution of yield range in a Indore district of MP and b Fatehabad district of Haryana

In Rajkot district, the estimated seed cotton yield varied from 0.75 to 1.75 t ha−1 in different cotton pixels. Maximum number of pixels (80% of total cotton pixel) fall in the range of 0.75–1.25 t ha−1 (Fig. 7). The lowest seed yield estimate was obtained in the taluka of Jasdan (0.8 t ha−1) and the highest yield was in the talukas of Jamkandorna and Kotda Sangani (1.6 tha−1).

Fig. 7
figure 7

Seed yield of cotton (t ha−1) and distribution of yield range in Rajkot district of Gujarat

Validation of Estimated Yield

The model results were validated at three different levels GP, Taluka and District. Comparison was made with the CCE yield at these levels and four statistical indicators were generated to evaluate the results. GP level validation was carried out for rice crop in Cuttack district and wheat crop in Indore district. Taluka level validation was carried out for all the crops in the four respective districts.

Validation at GP Level

At GP level, for rice crop in Cuttack district of Odisha, comparison of estimated rice yield with the average rice yield from CCE at the respective GP showed absolute difference of less than 25% for 219 GPs out of the 249 GPs within the 14 talukas of Cuttack district (Fig. 9a). Correlation between estimated and reported yield of these 249 village was found to be 0.44 and RMSE was found to be 16.5% of the mean CCE yield (Table 2). Mean relative deviation for all 249 villages was found to be around 9%. In Indore 18 villages were selected in the 5 talukas for validation at GP level which is well distributed over Indore district to represent the overall wheat crop of the district. Comparison of estimated wheat yield with the average wheat yield from CCE in the respective GP showed absolute difference of less than 10% for all the selected GPs (Fig. 9b), the highest being in the GP of Khemana (8.3%). Correlation between the estimated and CCE yield of those 18 GPs was found to be 0.91 and RMSE was found to be 5.8% of the mean GP yield (Table 2). Mean relative deviation was found to be − 4.15% for the all 18 GPs. The index of agreement was 0.6 for rice and 0.87 for wheat. In majority of the GPs the model underestimated the yield both in rice and wheat (Fig. 8). These results showed that rabi crop (wheat) yield estimation using the current methodology is more accurate than the kharif crop especially in east India (rice in Cuttack district).

Table 2 Statistical indicators for evaluating the model performance at GP level
Fig. 8
figure 8

Comparison of estimated crop yield and the observed yield from CCE at GP level a rice in Cuttack district, b wheat in Indore district (CI confidence interval)

Validation at Taluka Level

For rice crop in Cuttack district of Odisha, comparison of estimated rice yield with the average rice yield from CCE at the respective taluka showed absolute difference of less than 20% for all talukas except for Nischintkoili (33%) (Figs. 9a, 10a). Correlation between estimated and reported yield is 0.43 and RMSE was found to be 15.1% of the mean CCE yield (Table 3). For wheat crop in Indore district of MP, comparison of estimated wheat yield with the average wheat yield from CCE in the respective taluka showed absolute difference of less than 12% for all talukas. Correlation between the estimated and CCE yield was found to be 0.89. In Fatehabad, the absolute difference between the estimated and CCE yield at the five talukas was less than 8% the correlation coefficient was found to be 0.61 (Fig. 9b). The lowest deviation was found in the taluka of Tanwer (1.03%) in Indore district while in Fatehabad the lowest deviation was found in Fatehabad taluka (0.46%). The deviation was highest in the taluka of Hatod (11.4%) in Indore and in Bhuna taluka of Fatehabad (7.38%) (Fig. 10b). Comparison of estimated cotton yield (seed) with the average cotton yield from CCE in the respective taluka of Rajkot district of Gujarat showed absolute difference of less than 15% for all the talukas except for Jasdan taluka where it was around 26% (Fig. 10c). Correlation between the estimated and CCE yield was found to be 0.98 (Fig. 9c). Lowest deviation was found in the taluka of Kotda Sangani (2.5%) (Fig. 10c).

Fig. 9
figure 9

Comparison of estimated and the CCE yield at taluka level a for rice, b for wheat and c for cotton (CI confidence interval)

Fig. 10
figure 10

Mean deviation of estimated crop yield from the average taluka yield from CCE a for rice, b for wheat and c for cotton

The statistical error metrics indicated that the model estimation was the best for cotton crop followed by wheat and rice (Table 3). The correlation coefficient is the highest for cotton (0.98) and the least for rice (0.43). Index of agreement was the highest for cotton crop (0.95). The RMSE was the lowest for wheat in both the districts (Table 3).

Table 3 Statistical indicators for evaluation of the model estimation at taluka level

For wheat, the results from the present study showed a RMSE of 0.23 and 0.21 t ha−1 in Indore and Fatehabad district respectively, which is found to be better than the previous study by Hunt et al. (2019) that reported RMSE of 0.69 t ha−1 for wheat yield estimation at field scale using the vegetation index from MSI-sentinel 2 data. They have also shown improvement in yield estimation with the use of environmental data like temperature and rainfall at 5 km resolution along with the VIs (RMSE 0.61 t ha−1). The better accuracy in wheat yield estimation could be due to the use of PAR and fAPAR, which are related to crop physiology. The accuracy of the kharif rice crop is comparable to the error in other kharif crop as reported by Shirsath et al. 2020 (RMSE of 15.1% as compared to the RMSE of 15.4% for another kharif crop, soybean) and the accuracy for wheat was found to be better than the reported result (RMSE: 5.8% and 4.4% in Indore and Fatehabad districts, respectively as compared to 10.5% in the previous study by Shirsath et al. 2020).

Validation at District Level

The difference in estimated rice yield and the average rice yield of the Cuttack district as derived from CCE was found to be 10.3%. The difference in wheat yield between the estimated and CCE yield of the district was 3.6% in Indore and 1.7% in Fatehabad. In cotton, the difference was found to be − 2.9% for the Rajkot district of Gujarat (Fig. 11). This study with the use of fine resolution data (10 m) has shown better accuracy at district level for wheat crop as compared to the use of moderate resolution data (250 m and 500 m) from MODIS that had resulted a mean deviation of around 25% as reported by Tripathy et al. (2014) and 7–9% as reported by Patel et al. (2006).

Fig. 11
figure 11

Comparison of estimated yield with the average of CCE yield at district level

Discussion

This study used the semi-physical model that uses physiological concepts such as the Photosynthetically Active Radiation (PAR), and the fraction of PAR absorbed by the crop (fAPAR) of crop as well as the major scalars responsible for reducing the potential yield at 10 m spatial resolution and hence, expected to predict yield with higher accuracy at 10 m resolution as compared to the vegetation index alone. Previous study also reported that the efficiency-based semi-physical model represents an important class of remote-sensing based approaches for estimating biomass/yield (Marshall et al. 2018). This study showed the acceptable limits of accuracy for all the three crops under study at GP level, Taluka level as well as district level. The yield estimation of the challenging kharif crop like rice in east India was demonstrated with satisfactory level of accuracy at GP and taluka level (RMSE of 16.5 and 15.1%, respectively). The new crop insurance scheme in India is being implemented with the GP as the insurance unit (Gulati et al. 2018) and the insurance product design and price is based on estimated yield values at the GP level. This methodology is the first step towards getting yield map at the GP and taluka level using satellite input at fine resolution of 10 m with reasonable accuracy, hence has immense importance as far as crop insurance study is concerned. This study will also help for the crop management at farm scale.

There are certain factors that might lead to observed difference between satellite-based fine-scale yield estimates and measured yield data. These are enumerated below:

  1. 1.

    Resampling of satellite-based inputs from various native spatial resolutions including coarser resolution (e.g. PAR, fAPAR) to target finer resolution (10 m) could propagate the errors in the final yield estimates.

  2. 2.

    Determination of certain important biophysical factors such as planting date is definitely associated with certain uncertainty, which is generally prominent especially at finer-scale. This also could contribute to error in yield estimates.

  3. 3.

    Certain assumptions (e.g. extrapolation of September fAPAR to July–August for rice and cotton crops due to data gap, use of reported crop duration single and use of single value for ε0 throughout the crop growth (it varies with growth stage in actual situation) were other source of uncertainty in this approach.

From the results, it was noticed that the error of yield estimates got reduced at coarser administrative units (e.g. Taluka, District) for all the three crops. It was noticed that the RMSE was the highest in the kharif rice followed by cotton and wheat crop at all the administrative units. This can be explained based on availability of optical remote sensing data and crop duration. Wheat being the rabi crop could have been assessed with required number of clear-sky high-resolution satellite data throughout the major growth period. Cotton being long duration kharif crop spreads from June to February. Therefore, availability of high-resolution optical remote sensing data was less than what we could get for rabi crop such as wheat. In case of kharif rice crop with duration of July/August–December, the high-resolution optical remote sensing data was available only during fifty percent of growth period (September to December).

All these source of uncertainty clearly indicates that availability of the fine-resolution optical data during kharif season and the availability of other model input like PAR at fine resolution are the major challenges that need to be answered through future research for improving the yield prediction at fine-scale. One solution towards this may be through the use of high-resolution Synthetic Aperture Radar (SAR) backscatter data and different polarimetric signatures in combination with the high-resolution optical data that will reduce the uncertainty of optical data availability during consistent cloudy-sky period and improve the accuracy in yield prediction especially for kharif crops.

Conclusions

The study has demonstrated a methodology for finer scale yield prediction of three major crops of India namely rice, wheat and cotton at 10 m resolution primarily with optical remote sensing data through fusion of the data at various resolutions from multiple satellites in a semi-physical model. The methodology was validated at GP, taluka and district level. The results of this study showed the immense use of fine resolution remote sensing data for finer scale yield estimation, which can be aggregated at GP and taluka level. The applicability of this research lies in the crop management at farm scale and for insurance settlement as per the new policy of PMFBY where the insurance unit is GP not sub-district. However, the accuracy needs to be validated for more number of years and at other locations for its operational application and the methodology may be tested for other crops. The future thrust area will be to use the fine-resolution satellite data with other modelling approaches like simulation model and machine learning approach for crop yield estimation. One major thrust area will be comparison of various data assimilation and downscaling approaches for using multi-source satellite data for increasing the accuracy in high-resolution crop yield prediction. In addition, the combined use of optical-SAR high-resolution remote sensing data needs to be explored especially for kharif crops for seamless primary productivity simulation using a semi-physical model at fine resolution.