Keywords

1 Introduction

In developing countries, the population explosion is the major issue for food security achievement. The future food grain yield is estimated using standard crop and environmental parameters. With the predicted crop yield, the government gets enough time to prepare itself for alternative food sources to face the food security threat (if arises). However, several optical reflectance-based vegetation indices (VIs) are very good evaluators of the local food security scenario well in advance through the earth observation-based remote sensing (RS) techniques (Lambert et al. 2018). Singha and Swain (2016) described the advantages of RS techniques based on multicriteria decision-making analysis, which has greater potential for sustainable agricultural planning. Integrating spatial analysis and sensor technology in the precision agriculture study can provide valuable crop information for field-specific management (ESA 2019). This also assists in monitoring variable fertilizer application, irrigation scheduling, biomass estimation, and harvesting at actual crop maturity by observing various agronomic parameters (Vallentin et al. 2021). Quick and fast crop yield estimation at the micro- to macro-level is very promising through remote sensing, further assisting in sustainable agricultural planning (Yuan et al. 2015). Assessing crop yield is one of the major issues to enhance the farmer's socioeconomic development and optimize the industrial processing demand. Earth observation-based satellite systems could better monitor the changes in crop growth due to management practices, climate change impact, the emergence of pests and diseases, or water stress (ESA 2019; Yuan et al., 2015). Remote sensing resulting in several spectral VIs has helped monitor crop productivity at the farm level (Shammi and Meng 2020). Machine learning (ML)-based regression models were useful for crop yield prediction through the in-situ filed information and RS-derived VIs in US Corn Belt (Ji et al. 2021). The above-said model could provide a reference for both physical and data organization. Several studies have shown that using ML techniques to analyze the data collected from the RS technique improves crop yield prediction (Shammi and Meng 2020; Guo et al. 2021). Locally field-based multi-temporal satellite remote sensing of crop VIs has good statistical significance with wheat grain yield up to 2 t ha−1 (Ali et al. 2019; Panek et al. 2020). European Space Agency (ESA) provides high-resolution Sentinel 2 (S2) data through open access for the environmental monitoring and assessment of agriculture, wetland vegetation, forestry development, etc. S2 data offers a high spatial, spectral, and temporal resolution which can estimate individual farm-level crop health status, soil moisture, crop stress, and crop yield information through RGB and near-infrared-based spectral VIs. Multiple Linear Regression (MLR), support vector machine (SVM), extreme gradient boosting (XGB), stochastic gradient descent (SGD), and random forest (RF) were employed for field scale prediction of soya yield using cloud-free S2 multispectral images derived from different VIs of specific growing periods (2018, 2019, and 2020) in Upper Austria (Pejak et al. 2022).

The simplified S2-MSI imagery can scientifically create different vegetation biophysical variables from the surface reflectance values such as Normalized Differential Vegetation Index (NDVI) (Rouse et al. 1973); Normalized Differential Water Index (NDWI) (Chandel et al. 2019); Brightness Index (BI) (Cavalaris et al. 2021), Difference Vegetation Index (DVI) (Kussul et al. 2020), Soil-Adjusted Vegetation Index (SAVI) (Nagy et al. 2021); Global Environmental Monitoring Index (GEMI) (Kobayashi et al. 2020); Perpendicular Vegetation Index (PVI) (Lu et al. 2016); Ratio Vegetation Index (RVI) (Quan et al. 2011); and Leaf Area Index (LAI) (Gaso et al. 2019). Using satellite images and VIs allows the farmers to identify different management zones on a commercial farm (Campillo et al. 2018). These indices were used to estimate vegetation characteristics as they mostly serve as indicators for crop dynamics and overall changes in biomass quantity and properties. Moreover, some indices are used to monitor changes in the water content of leaves, while others can suppress the soil's influence or eliminate the influence of the atmosphere.

Several kinds of literature reported that VIs-based crop yield estimation is very popular in RS domains for wheat (Nagy et al. 2018), corn (Panda et al. 2010), maize (Fang et al. 2011), rice (Son et al. 2013), sunflower (Ali et al. 2019). In recent years, vegetation reflectance values have been further analyzed through geostatistical and machine learning techniques as an alternative approach for better crop yield prediction (van Klompenburg et al. 2020). Koutsos et al. (2021) applied the statistical hotspot autocorrelation to identify low-yield areas for special attention to the better management of the agronomic inputs as a sustainable approach. LAI and NDVI are good predictors of crop yield through the logistic regression model (Kunapuli et al. 2015). Marino and Alvino (2021) stated that statistical hotspot analysis is a sound strategy for defining the spatial differentiation of crop yield variation on a small scale with the help of vegetation indices. The US and German organic agriculture used the clustering hotspot algorithm to better support their regional economic development (Marasteanu and Jaenicke 2016). The GIS-based interpolations techniques such as IDW, EBK, and Kriging are the most preferred technique for multi-crop yield mapping (McKinion et al. 2010). RF and SVM allowed predictive models of crop yield estimation using multi-temporal data for site-specific management in different seasons (Shah et al. 2018; Filippi et al. 2019). Even the logistic regression (LR) and Naïve Bayes (NB) models correctly predicted the corn nitrogen stress class with the help of hyperspectral sensors derived VIs (Laacouri et al. 2018; Mupangwa et al. 2020). MODIS-derived LAI estimated rice crop yield on a near real-time basis using gradient boosted regression in India (Arumugam et al. 2021). ML models are very helpful in predicting the number of crop yields for supporting food security in Africa (Chepngetich 2020; Cedric et al. 2022). Different vegetation indices, namely NDVI, OSAVI, RSI, MTCI, and BOP, incorporated with BP neural network, significantly impact the cotton yield prediction mapping in China. Ramos et al. (2020) used multispectral UAV-based 33 VIs to predict maize yield through the ranking-based RF model, where the NDVI, NDRE, and GNDVI VIs are the most precedence indices. On the contrary, Pham et al. (2022) showed that the ML approach is useful for enhancing the rice crop yield through the spatial variability of VCI/TCI data in Vietnam. Five MLS, namely elastic net (EN), linear regression (LR), support vector regression (SVR), and k-nearest neighbor (k-NN) are tested for the potato tuber quality mapping with the NDVI value in Atlantic Canada (Abbas et al. 2020). Sharifi (2020) proposed that integrating satellite remote sensing and ML techniques provides a powerful potential approach for a barley yield prediction map based on seasonal phenological behavior in eastern Iran.

S2 MSI data with 10 m resolution has a greater potential to build a decision-making tool for crop yield estimation through numerous vegetation spectral indices. With the accessibility of open-source Python and SNAP tools along with Geostatistics, ML, and area under the curve (AUC) models, we tried to integrate several biophysical spectral indices for yield estimation of rice and potato crops in Hooghly District (West Bengal, India). Nine major crop biophysical parameters were identified, namely NDVI, NDWI, BI, DVI, SAVI, GEMI, PVI, RVI, and LAI, for accurate crop yield estimation during two crop seasons (summer, 2018, and winter, 2019).

2 Materials and Methods

2.1 Study Area

The study was carried out in the Hooghly Region (Tarakeswar ~ 2,528,500 to 2,530,600 N and 604,500 to 606,500 E in UTM-WGS84 Zone 45 N India) during the 2018 and 2019 crop seasons (Fig. 8.1). The study region experienced rich fertile alluvial soil with high productivity of rice and potato compared to other districts of West Bengal (Singha et al. 2020). The densely populated region is supported by high cropping intensity (184%). The total study area was around 300 ha with an average MSL of 18 m.

Fig. 8.1
figure 1

Study area map

This area is categorized by tropical monsoon climate with an average annual rainfall of 1200‒1700 mm) and temperature of 15‒35 ℃ (Singha et al. 2019). Rice and potato are the main crops, along with jute and vegetables. The growing season for Kharif rice starts with the arrival of the monsoon in July, and crops are harvested from October to December. In the Rabi season, the potato crop is sown in November–December and harvested in March next year. The potato crop has many advantages over other local crops in terms of high crop yield, easy market access, short crop growth, etc.

2.2 Satellite Image Processing

Cloud-free Sentinel-2B MSI imagery was selected for the Kharif rice (October 17, 2018) and Rabi potato (March 6, 2019) season, with a temporal resolution of 10 days and 290 km swath width. The images were processed for crop biophysical status mapping and yield estimation of two crops. The descending node of S2B MSI offers a 5-day temporal frequency with an orbital overpass time of approximately 10:30 a.m. (Drusch et al. 2012).

The Copernicus Open Access Hub data is easily retrieved by being atmospherically and radiometrically corrected S2B bottom of atmosphere (BOA) reflectance with 10 m pixel size imagery (https://sci-hub.copernicus.eu/). The Sentinel Application Platform v.8.0.0 (SNAP-ESA) was used for image pre-processing. The orthorectified images were geo-referenced in WGS84 UTM zone 45 N with Survey of India (SOI) topographical maps (No. 79B/1, 1:50,000 scale). The nine ground control points (GCPs) developed the final study area location map with a handheld GPS receiver e-Trex 20 Garmin). The detailed workflow of this research methodology is presented in Fig. 8.2.

Fig. 8.2
figure 2

Detailed workflow of the research work

2.3 Vegetation Indices (VIs)

This chapter selected the spatial resolution of 10 m for different VIs as input parameters for estimating the spatial variability of rice and potato yield, namely slope-based- (NDVI and RVI), distance-based—PVI, soil noise-based—SAVI, and NDWI, BI, GEMI, LAI (Hatfield et al. 2008). All the VIs maps were developed through ArcGIS 10.7 software for individual farm plots extracted from S2 images for two peak growing seasons. VIs are developed as the combination of numerous wavebands (red, NIR, SWIR, green, blue) and is also related to various canopy estimated parameters (Table 8.1).

Table 8.1 Details of selected vegetation indices (VIs)

2.4 Yield Estimation

During the post-harvest processing, a set of randomly selected 70 farms plot-wise agronomic practice details, along with crop yield information for two crops, were collected. The crop yield data were collected along with a handheld e-Trex GPS receiver (Garmin Ltd., Olathe, Kansas, USA) in October 2018 for Kharif rice and March 2019 for Rabi potato. Rice yield was estimated using a 10 m × 10 m square quadrant at five places for each plot. The yield data is converted into kg/ha. Similarly, potato yield was estimated for five rows, and the crop row's dimensions (length and width) were measured. The yield data in a point vector format at a spatial resolution were converted to raster GeoTiff at 10 × 10 m pixel resolution corresponding to the Sentinel-2 image pixels. Then, the yield maps were developed by the Kriging interpolation technique through ArcGIS 10.7.

2.5 Statistical Analysis

Statistical analysis was carried out to validate the relationship between the VIs, generic crop yield, and estimate yield data for two crops. The normality of the dataset was organized by descriptive statistics integrating minimum, maximum, range, mean, standard deviation (SD), skewness, kurtosis, and coefficient of variation. All the statistical calculations were performed in the ‘R’ software v.4.0.5 (University of Auckland, New Zealand) environment.

2.5.1 Geostatistical Analysis

ArcGIS Geostatistical Analyst tool was also utilized to study the degree of spatial dependence between Sentinel MSI-based VIs and yield data in the ArcGIS environment. The gradation of consistency for rice and potato yielding was determined using the standard equation using the geostatistical semivariogram model (Eq. 8.1) (McKinion et al. 2010).

$$\gamma \left( h \right) = \frac{1}{2N\left( h \right)} + \sum\limits_{i = 1}^{N\left( h \right)} {\left[ {Z\left( {x_{i} } \right) - Z\left( {x_{i} + h} \right)} \right]^{2} }$$
(8.1)

where γ(h) represents semivariance, Z(xi) is the distance between the measured sample at point Xi, and the point Z(xi + h) is the sum of the pairs separated by the lag h.

The semivariogram was fitted using exponential, spherical, Gaussian, and linear models. The residual sum of squares (RSS) was used to determine the form and goodness of fit of the model semivariogram (Eq. 8.2).

$${\text{RSS}} = \sum\limits_{i = 1}^{n} {\left( {y_{i} - f\left( {x_{i} } \right)} \right)^{2} }$$
(8.2)

where yi is the ith value of the variable to be predicted, xi is the ith value of the explanatory variable, and \(f\left({x}_{i}\right)\)  is the predicted value of yi

The statistical cross-validation method was used for the performance of the semivariogram model. The prediction accuracy of crop yield and VIs was assessed through the coefficient of determination and root mean square error value. The best model accuracy indicated the maximum R2 and least RMSE value for a final agreement between crop yield and VIs parameters.

The sill, easily-defined range explains the exponential and spherical models for the plant and soil variability fitting (McKinion et al. 2010). Our study structured the three parameters when fitting the best semivariogram model, namely (i) Nugget C0 represents short-scale randomness, (ii) Sill (C0 + C1) of the semivariogram is equal to the variance of the random variable when growing beyond a certain distance and becomes less or steadier around an edge value, (iii) range is defined the correlation between two properties inclined to be equivalent to zero when the distance h becomes too large. In semivariogram analysis, the trends of spatial dependence (SD) are measured by the nugget/sill ratio C0/ (C0 + C1), %. It is described in terms of three types of spatial dependency levels; (i) strong SD < 25%; (ii) moderate SD—25‒75%; (iii) weak SD > 75% (Singha et al. 2020). The geostatistical kriging analysis is incorporated with the semivariogram that estimates the known values to unknown values for spatial judgment of crop yield variation at a similar level of resolution (Li et al. 2016). Yield maps converted into raster form correspond to the same spatial resolution of S2 images for extracting multi-point values. Then Pearson correlation matrix showed the relationship between the VIs and yield for the two crops.

2.5.2 Hotspot Analysis

Hotspot analysis is a statistical procedure that identifies statistically significant cold and hot spots through the Getis-Ord Gi* statistic. It utilizes a set of features that are weighted to identify these cluster areas. These statistics have three components: Gi Z Score, Gi p-value, and Gi-Bin values of the selected criteria. The resultant Gi Z Score and Gi p-Value define the high or low values of the cluster in a spatial process. A high Z score and a small p-value indicate a significant hot spot positively correlated with a low negative Z score. The more intense the hotspot or coldspot clustering, the more it is distributed using kriging interpolation within the region. This approach identifies high and low clusters of yield data correspondence to VIs with similar areas. The ArcGIS spatial statistics tools did spatial autocorrelation of the VIs in hotspots analysis in the mapping cluster approach. The Getis-Ord Gi* statistic was explained in (Eqs. 8.3, 8.4, and 8.5) (Abdulhafedh 2017).

$$G_{i}^{*} = \frac{{\sum\nolimits_{j = 1}^{n} {w_{i,j} x_{j} } - \overline{X} \sum\nolimits_{j = 1}^{n} {w_{i,j} } }}{{\sqrt[S]{{\frac{{\left[ {n\sum\nolimits_{j = 1}^{n} {w_{i,j }^{2} } - \left( {\sum\nolimits_{j = 1}^{n} {w_{i,j} } } \right)^{2} } \right]}}{n - 1}}}}}$$
(8.3)
$$\mathop{X}\limits^{\rightharpoonup} = \frac{{\sum\nolimits_{j = 1}^{n} {x_{j} } }}{n}$$
(8.4)
$$S = \sqrt {\frac{{\sum\nolimits_{j = 1}^{n} {x_{j}^{2} } }}{n}} - \left( {\overline{X} } \right)^{2}$$
(8.5)

where G indicates the G statistics of i, xj is the VIs and yield of j, wi,j describes the spatial weight between VIs and yield of i and j, X, and S represent mean, variance, and n indicate the sum of the VIs and yield parameters.

When evaluating crop yield and VIs agricultural data with their surrounding neighbors, the hotspots represent positive autocorrelation with high production, and coldspots represent negative autocorrelation with low production in the study area. The attempts were made to carry out hotspot analysis for point data on a microscale.

3 Validation

Validation is carried out using the survey training dataset of rice and potato yield information collected from seventy farm plots in the study area. For validation purposes, we use the area under the receiver operating characteristic (AUROC) curve incorporated into the four ML techniques, namely (i) logistic regression (LR), (ii) support vector machine (SVM), (iii) random forest (RF), and (iv) Naïve Bayes (NB) which measured the consistent relationship between yield data and VIs.

LR is a classification algorithm that calculates a predicted probability for a dichotomous dependent variable based on one or more independent variables (Ahmed and Sajjad 2018). Authentication of the LR model is regulated by the binary composition that indicates the probability of incidence with pair sample space, P (Eq. 8.6)

$$P\left( {y = \left. { \pm 1} \right|x,w} \right) = \frac{1}{{1 + \exp \left( { - y\left( {w^{T} x + b} \right)} \right)}}$$
(8.6)

where b signifies the intercept, T signifies the transfer matrix, and the k-dimensional coefficient, w = (w1, w2, … wk)T, comprises the essential parameters to be assessed. Similarly, SVM, RF, and NB models can be estimated using their respective standard equations.

The model’s prediction accuracy was estimated at 70% for training and 30% for testing data through the area under the AUROC curve (Eq. 8.7) (Allen 2015).

$${\text{AUC}} = \frac{{\sum {\text{TP}} + \sum {\text{TN}}}}{P + N}$$
(8.7)

where TP—true positive and TN—true negative represents the number of farm plots correctly predicted crop yield, P is the total number of plots with heavy specs, and N is the total number of plots without substantial specs.

All-model performances were optimized for choosing the best tuning hyperparameter to determine the high accuracy (AUC) results (Table 8.2). The AUROC curve was produced by the Anaconda python Jupyter notebook v. 6.0.1.

Table 8.2 Optimization of parameters for AUROC validation

4 Results and Discussion

4.1 Descriptive Statistical Analysis

Descriptive statistical analysis was carried out for all the selected VIs and crop yields (Table 8.3). The mean rice and potato yield rate of the study area was 7.11 and 28.04 (t/ha), respectively. The spatial variability of crop yield and VIs data were good estimators of the ecological vegetation process that resulted in a standard deviation (SD), verified a CV value of 23.25% for rice yield and 29.88% for potato crop yield. The VIs of NDVI, NDWI, BI, DVI, GEMI, SAVI, PVI, RVI, and LAI had low CVs in rice crop; similarly, the NDVI, DVI, SAVI, PVI, and RVI had a moderate level of CVs for the potato crop. The NDVI and SAVI ranged between 0.45 to 0.58 and 0.27 to 0.34 for rice crops, and similarly for potato crops between 0.16 to 0.55 and 0.11 to 0.35. The high mean values for RVI were around 3.33, followed by LAI (2.06), GEMI (0.64), and NDWI (0.62) in the rice crop, whereas the potato crop is high mean values of RVI (2.25), followed by LAI (0.80), GEMI (0.56), and NDVI (0.38), respectively. The positive (1.29) and negative (− 2.08) skewness were found with BI and NDWI for the rice crop. However, the negative (− 0.32) skewness found in NDVI and positive (0.44) in the LAI was found in potato crop, creating the symmetrical type distribution. The kurtosis interrelated to DVI, SAVI, and PVI showed platykurtic behavior associated with normal distribution for rice crops. In the case of NDWI and BI, high kurtosis (− 0.89 and 0.27) was observed, which may represent the leptokurtic behavior of distribution for the potato crop.

Table 8.3 Descriptive statistical analysis of selected VIs and crop yields (tons/hector)

4.2 Relationship Between VIs and Crop Yields

Pearson correlations matrix described the interrelation between the selected VIs and crop yield for two specific crop seasons (Tables 8.4 and 8.5). Higher r2 values reported the most significance in crop growth and yield patterns around the experimental farm plots. Rice yield was significant/positively correlated with NDVI, and RVI, while a moderate correlation with SAVI, DVI, and PVI. VIs of RVI, NDVI, SAVI, DVI, and PVI were best positively correlated with the actual rice yield with high r values that ranged from 0.80 and 0.92 (Table 8.4).

Table 8.4 Pearson correlations matrix between VIs and rice yield (n = 70)
Table 8.5 Pearson correlations matrix between VIs and potato yield (n = 70)

Similarly, potato yield was strongly correlated with NDVI and moderately correlated with RVI, SAVI, DVI, PVI, and LAI of r values from 0.761 to 0.908 (Table 8.5). BI was insignificant/negatively correlated with both the crop yield (rice and potato) because of the variability of canopy reflectance and light use efficiency (LUE) with site-specific field management.

4.3 Geostatistical Analysis

Plot-based yield information and extracted VIs datasets were analyzed using the geostatistical semivariogram method (Table 8.6). The lag size of both crops was between 14.90 and 64.46, that’s were depending on the spatial variability of the crop, VI, and soil nutrients. The high lag size was 57.76 for rice (BI) and 64.46 for potato (NDWI) with a spherical semivariogram. Associating range values of rice VIs, the lowest value was measured for GEMI (95.59 m), and the highest value was measured at NDWI (255.01 m). Similarly, the VIs of potato crops varied amply between 246.46 and 349.83 m. Nuggets were highest for PVI (5.29) and lowest for BI, GEMI (0.0001) of rice crops, where the potato crop has the lowest nugget for GEMI variables. The nine VIs of the rice crop revealed a sill size between 0.002 (BI, GEMI) and 6.42 (PVI), which stated a relatively parallel total variance. The VIs of potato crop sill varied from 0.008 (GEMI) to 0.093 (NDVI). The ratio of percentage between the two parts, a nugget to sill variance, ranged noticeably between RVI (0.039%) and PVI (82.37%), where C0 represented 0.039% and 82.37% of C0 + C with strong spatial dependency (0.41%) for rice yield, and the other potato crops where C0 denoted 9.87% to 60.36% of C0 + C with moderate spatial dependency (34.27%) for potato yield. The exponential and spherical models are very good of a fit in semivariogram analysis with the lowest RMSE value, which means they are highly significant. Exponential models were suitable alternatives to the experimental semivariograms for NDVI, NDWI, DVI, SAVI, PVI, RVI, and LAI, whereas BI and GEMI values were best fitted with a spherical model for rice crops (Fig. 8.3). NDVI and BI are very well-explained by the exponential models; whereas NDWI, DVI, GEMI, SAVI, PVI, RVI, and LAI, are the best suited to the spherical model for potato crops (Fig. 8.3). R2 was always > 0.75 estimated for the crop yield, with an error of RMSE very low, particularly concerning average VI. The best R2 values of 0.846, 0.801, and 0.743 were found for RVI, NDVI, and SAVI, respectively, while a weak R2 was found with NDWI and BI for rice crops (Figs. 8.3 and 8.4). On the other hand, the NDVI, SAVI, DVI, PVI, and RVI were associated with the highest corresponding R2 values of 0.824, 0.674, 0.653, and 0.652 for the potato crop. The RMSE of VIs ranged between 0.004 and 0.41, concerning rice yield of 1.77 t/ha. Similarly, the VIs of potatoes range between 0.006 and 0.29, with a potato yield of 8.4 t/ha (Table 8.6). The maps indicated that the northeastern field had a lower crop yield than the southwestern part of the field. The spatial variability of the crop was also exhibited in the generated rice and potato yields.

Table 8.6 Semivariograms model performance with the exponential and spherical models for both the crop VIs and yields analysis
Fig. 8.3
figure 3

Semivariograms graph for rice crop; a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j rice yield

Fig. 8.4
figure 4

Semivariograms graph for potato crop; a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j potato yield

Remote sensing-based S2-derived VIs maps and the responses crop yield maps were established for similar spatial resolution (Figs. 8.5 and 8.6). A total of nine VIs are selected through their associations with crop productivity. The study region also observed a spatial pattern consistent with crop yield distribution. High values found in the south–southwest part of the farm plot were associated with high crop yield compared to the northern part of the region.

Fig. 8.5
figure 5

Vegetation indices and yield distribution map for rice crop; a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j rice yield

Fig. 8.6
figure 6

Vegetation indices and yield distribution map for potato crop; a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j potato yield

4.4 Hot Spots Analysis

The concept of the Z score indicates a significant hotspot and a negative Z score indicates a significant coldspot in Getis-Ord Gi* statistic. The analysis data revealed that the high-yielding areas were mainly concentrated in the RVI, NDVI, and SAVI surfaces. This observation explained that high-yield clusters exist in the RVI, NDVI, and SAVI surfaces with significant cluster distribution.

This analysis suggested that RVI, NDVI, and SAVI indices were superior in guiding maps for ideal demonstration of the spatial variability for rice and potato yield patterns at small farm levels (Figs. 8.7 and 8.8). BI values are highly significant (hotspot) and range between 90 and 95% confidence level in the northeastern region due to soil humidity and pH variability with low crop yield for both crops (Fig. 8.7). Similarly, the other VIs (DVI, GEMI, PVI, and LAI) showed a highly significant (90–95%) cluster with high yield due to better crop practice management in the southwest part of the farm areas for both the crop (Figs. 8.7 and 8.8). In this chapter, the hotspot analysis described the spatial pattern of both crop yield distribution with VIs: coldspot significance level found in the northern part of the farm plot, and in contrary hotspot was in significance level in the south–southwestern side of the farming area. The significance levels of hotspot maps presented a high number of low-distant VIs.

Fig. 8.7
figure 7

Hotspot analysis for rice crop; a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j rice yield

Fig. 8.8
figure 8

Hotspot analysis for potato crop a NDVI, b NDWI, c BI, d DVI, e GEMI, f SAVI, g PVI, h RVI, i LAI, and j potato yield

4.5 AUROC Validation

The outcomes of the validation model of AUROC with four MLAs techniques between the yield and VIs datasets were assessed for both crops (Fig. 8.9). SVM, LR, and RF algorithms led to the spatial assessment of rice and potato crop yield estimation with AUC accuracy values between 80 and 90%, categorizing them as very good predictors, while NB had an accuracy value between 60 and 80%, categorized as a moderate to the good predictor for both the crops. More precisely, the outputs revealed that the three MLAs techniques, SVM, LR, and RF algorithms, produced competitively better performances than the NB ML algorithm for both the crop yield data.

Fig. 8.9
figure 9

AUROC validation between VIs and crop yield (rice and potato); a rice crop, a potato crop

5 Discussion

We have used different spectral vegetation indices, estimated from the Sentinel-2B satellite data, to carry out the geostatistical-based crop yield estimation of rice and potato crops in two different seasons in Tarakeswar, Hooghly region (West Bengal). The spatial variability of crop yielding is delineated by the climate, topography, soil factor, water availability, and farm mechanization parameters (Singha and Swain 2022). A non-destructive yield prediction map was developed using a multiyear dataset. The map was developed using a geostatistical tool and an artificial intelligence model (Panek et al. 2020). A correlation coefficient analysis shows the spatial pattern of the VIs and yield. The VIs of RVI, NDVI, SAVI, DVI, and PVI were found to be significantly positive and correlated to the rice yield in the Kharif season.

The most commonly used VIs utilizes the information in the red and near-infrared (NIR) canopy reflectance or radiances. They are combined in ratios: ratio vegetation index (RVI). RVI is the most frequently used VI with a significant correlation with grain yield (Pinter et al. 2003; Ali et al. 2019). The RVI, which measures the relative importance of rice yield to the farm plot, showed the best correlation with the actual rice yield, followed by the NDVI > SAVI > DVI > PVI > LAI > GEMI > NDWI > BI (Fig. 8.4). The plant signals attained from SAVI and NDVI had high correlation with the Sentinel-2-measured canopy reflectance during the Kharif rice and Rabi potato growing seasons. Similarly, during the Rabi seasons, the R2 values of NDVI, SAVI, DVI, PVI, RVI, NDWI, and LAI decreased toward potato yield distribution. These relationships showed a sound correlation coefficient ranging between, 0.761 and 0.908, with potato crop yield.

A weak correlation was observed between the various parameters, namely NDWI, BI of crop concentration, and actual yield. It was attributed to the fact that the crop's maturity caused an increase in visible reflectance (Kumhálová and Matˇejková 2017). RVI, NDVI, and SAVI indices were the better-controlling factors to variate crop yield. On the other hand, for potato crops, the NDVI, SAVI, DVI, PVI, and RVI indices were the highest R2 values of 0.824, 0.674, 0.653, and 0.652, respectively (Table 8.6). Seasonal NDVI values indicate the best yield estimation factor for rice and potato crop (yield) predictions. The maximum NDVI was retained as the most significant variable for predicting field-level yield. Borowik et al. (2013) reported that the relationship between vegetation above-ground biomass and NDVI reflects each habitat type in seasonal variation. The selected NDVI and RVI indices have comparatively higher correlations with grain yield than SAVI (Ali et al. 2019).

Geostatistical semivariogram analysis was performed with best-fitting exponential and spherical models to develop the RS-based VIs and crop yield maps (Singha et al. 2020). VIs maps combined with RGB and NIR bands have a greater potentiality for plant sensitivity analysis in different crop growing seasons (Pinter et al. 2003). The ML methods such as support vector regression (SVR) and ensemble multilayer perceptron neural networks are successfully predicting the rice yield in the Cauvery Delta Zone (CDZ) and Gujarat India (Bhojani and Bhatt 2020; Yu et al. 2021). The chapter's contribution is simple and efficient in providing effective structures within appropriate site-specific agronomic management. The sill variance depends on canopy cover reflectance with the plant growth condition assumed by the VIs semivariograms analysis (Pradhan et al. 2014). Data of all VIs and crop yield were signified according to the Getis-Ord Gi* statistic for identifying the degree of significance in the study area. In statistical hotspot analysis, high clusters of yield were detected in RVI, NDVI, and SAVI surfaces and low yield while the low significant clusters observed in RVI, NDVI, and SAVI surfaces were superior to guide map and best alternatives for rice and potato yield prediction in a small scale of village level. The hotspot analysis was used to define a cluster's higher or lower values in a spatial process, specifying the number of clusters to be detected (Marino and Alvino 2021). VIs map correspondence to yield map have similar trends in hotspot analysis. Generally, the northern part of the study area has some limitations with coldspots due to high soil pH (acidic soil), least organic carbon, high electrical conductivity, and low farm mechanization level. Similarly, high yield and VIs values were found in the northwestern region with good correlations because there has been remarkable crop growth represented by higher canopy biomass in agreement with the final yield.

The validation process was made through the AUROC machine learning model between VIs and crop yield (rice and potato) for two different seasons. SVM, LR, and RF models have good validation results with > 83% accuracy for both crops. Crop yield estimation can be improved using modern advanced techniques such as hybrid ML and deep learning, satellite data fusion, LiDAR, UAV, IoT, SAR, and GEE cloud with a higher number of influential factors (soil factor, climate, topography, hydrology, AGB, farm mechanization level, and socioeconomic background).

6 Conclusion

The Sentinel-2 mission has opened up new scenarios for monitoring the performance of smallholder farming systems. The various temporal and spatial resolutions of satellite images were possible to estimate a farm's crop yield using crop observation VIs. Evaluation of farm decisions can then be made based on the data collected by observing different crop practice parameters. High-resolution images can now accurately classify crop biophysical data for individual farmlands. The data collected by the NDVI and SAVI revealed a high correlation between the observed canopy reflectance and the plant signals. The hotspot analysis further identified the areas with high crop yields and problem areas with low crop yields. These signals indicated that the various interventions designed to improve smallholder farms’ productivity may not address these communities’ diverse socioeconomic conditions. The results of the crop models may be provided to the farmer to take necessary steps to improve crop productivity, particularly in the northern part of the concerned study area.