1 Introduction

Maize (Zea mays) is a crucial staple crop in Ethiopia. It is mainly grown by smallholder farmers under rain-fed conditions, without irrigation. Production fluctuates from year to year owing to rainfall and temperature variability, variation in soil fertility, plant diseases, and changing crop management practices (MOA 2015). These yield fluctuations remain one of the significant challenges for the country's food security strategy (Cochrane and Bekele 2018; FAO 2015).

The agriculture sector of Ethiopia contributes more than 80% of the Gross Domestic Product (GDP) of the country (MOA 2019). Cereal crops are the primary food source, and maize plays the most critical role in food security. Timely yield estimation for this crop is essential for efficient agricultural decision-making, in particular for planning food imports in case of shortage or exports in case of surplus production. Traditionally, early crop yield estimation in Ethiopia relies on observed data, such as farmers' requests for agricultural inputs and the yearly amount and distribution pattern of rainfall. This technique, however, has turned out to be biased, costly, and associated with significant uncertainties, leading to poor estimates of crop area and poor predictions of crop yield (Balaghi et al. 2008; Ramirez-Villegas and Challinor 2012). Such information is also released late in the season or even after its end, which is too late to take appropriate actions to avert hunger. Locally specific models for reliable annual yield prediction would significantly improve well-informed and timely decision-making in food provision.

The development of earth observation technology suggests that remotely sensed data can be readily employed for agricultural crop yield forecasting. Remote sensing data offer, at modest cost, spatially explicit, large-area, and timely monitoring of the earth’s surface, including crop fields and crop development (Liu and Kogan 2002). Vegetation indices (VIs), which measure vegetation greenness, are used to identify plants and their cover and to discriminate between individual plants. VIs are computed as a combination of different spectral bands, usually including the red and near infrared. VIs have also been found to be closely linked to plant physiology and crop productivity (Meng et al. 2014; Noureldin et al. 2013; Prabhakara et al. 2015; Sakamoto et al. 2014). The relationship of various satellite image derived VIs with biophysical parameters of plants has been investigated, for example by Ban et al. (2016), Bussay et al. (2015), Chipanshi et al. (2015), Darvishzadeh et al. (2009), Prabhakara et al. (2015), Sharma et al. (2015) and Smethurst et al. (2017). Several studies have also examined the relationships between vegetation indices and various crops, including maize (Bolton and Friedl 2013; Chivasa et al. 2017; da Silva et al. 2020; Maresma et al. 2016; Peng and Gitelson 2011; Zhou et al. 2017). These studies have established a positive linear correlation between VIs and crop biophysical variables, such as leaf area index, leaf traits, biomass, and grain yields.

The spectral and temporal changes of satellite images and their response to the biophysical characteristics of crop plants play a significant role in establishing an effective pre-harvest estimation of crop productivity. Multispectral satellite imagery has been used extensively to measure the intensity of spectral reflectance in different bands, from the visible to the mid-infrared regions of the electromagnetic spectrum (Kamal and Bhatia 2010). Many classes of spectral indices have been the subject of substantial research, such as the Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1973), Soil Adjusted Vegetation Index (SAVI) (Baret et al. 1989), Enhanced Vegetation Index (EVI) (Huete et al. 2002), Green Vegetation Index (GVI) (Panda et al. 2010), and Normalized Difference Flood Index (NDFI) (Boschetti et al. 2014). A new group of VIs has also been developed based on the shape and position of the spectral reflectance curve. It makes use of the Red-Edge part of the electromagnetic reflectance spectrum and includes the Red-Edge Normalized Difference Vegetation Index (Red-Edge NDVI) and the Red-Edge Enhanced Vegetation Index (Red-Edge EVI) (Mutanga and Skidmore 2004). The Red-Edge is the steep slope between the low reflectance of the visible region and the higher reflectance of the near-infrared region (0.67–0.79 µm). The red-edge inflection point depends on the chlorophyll amount detected by the sensor and is strongly correlated with the chlorophyll concentration of plant leaves, offering a sensitive indicator of vegetation stress and biomass content (Rossini et al. 2007).

The main aim of this study was to improve decision-making regarding food provision from agriculture and thus contribute to better food security in Ethiopia. The technical goals are (1) to validate the potential of Sentinel-2 MSI to identify and map maize fields; (2) to develop a remote sensing-based technique to monitor maize phenology; and (3) to establish models to predict the expected maize yield.

2 Materials and Methods

2.1 Study Site

Our study area is the district of Abaya Woreda, Oromia Regional State, Ethiopia. It is situated in the Borena Zone, south of the capital city of Addis Ababa, between 6° 17′ 14″ and 6° 28′ 4″ N, and 38° 7′ 54″ and 38° 19′ 19″ E (Fig. 1). The agricultural land in the district where our investigation was carried out covers about 30,000 ha and is mainly cultivated by smallholder farmers. The altitude ranges from 1100 to 1900 m a.s.l. The area has a subtropical climate with an average annual rainfall of 1500 mm (mainly between the beginning of April and October), and the monthly mean temperature ranges from 18 to 25 °C. According to UNDP (2000), the land comprises 41% arable land (28.7% is under annual crops), 35% pasture, 15% forest, and the remaining 9% is swampy and degraded land. Four major crops are cultivated: maize (Zea mays), teff (Eragrostis tef), barley (Hordeum vulgare L.), and haricot beans (Phaseolus vulgaris L.), and there is also some cultivation of sorghum (Sorghum bicolor) (CSA 2018).

Fig. 1 Location of the study area

2.2 Field Data

The maize in our study area was sown at the beginning of April and harvested in October of 2018, which is the suitable cropping season of the study site. Principally, the local variety of ‘Pawuner’ was cultivated. In situ crop information of phenological or growth stages and transition dates, viz, sowing and establishment, vegetative growth, tasselling and silking, yield formation and maturity, harvesting and yield information, were noted regularly during the field observation. We used random sampling with a sample size n = 250 for training data collection.

The maize reference data required for monitoring phenological or growth stages and for yield modelling were collected in a sample of n = 31 square field plots of 100 m² (10 × 10 m), whose positions were recorded by a global navigation satellite system (GNSS) with an accuracy of ± 3 m. This plot size corresponds to a single Sentinel-2 image pixel.

2.3 Satellite Data

A time series of Sentinel-2A MSI images for the 2018 maize growing season was used for classification and for the extraction of vegetation indices (VIs). The imagery is part of ESA's Sentinel mission for monitoring the Earth's surface. The mission has a short revisit time of 10 days at the equator with one satellite and 5 days with the two-satellite constellation under cloud-free conditions, which results in 2–3 days at mid-latitudes (ESA 2016).

The images were originally published in the Copernicus Open Access Hub as Top of Atmosphere (ToA) reflectance, and we obtained them as Level-1C products. Images with a cloud cover exceeding 20% were discarded. Pre-processing, including radiometric calibration and geometric and atmospheric corrections, was completed in Google Earth Engine. Of the 13 spectral bands of Sentinel-2 MSI, which range between 0.443 and 2.190 µm in wavelength and between 10 and 60 m in spatial resolution, we used seven spectral reflectance bands for maize field classification, phenological monitoring, and model prediction: blue (0.458–0.523 µm), green (0.543–0.578 µm), red (0.650–0.680 µm), Red-Edge (0.773–0.793 µm), near infrared (NIR) (0.785–0.900 µm), and short-wave infrared (SWIR) (2.10–2.28 µm). The blue, green, red, and NIR bands have a 10 m spatial resolution, while the Red-Edge and SWIR bands have a native pixel size of 20 m and were resampled to 10 m using the nearest neighbour resampling approach of Park and Schowengerdt (1983).

2.4 Methods

The approach of this study comprises three main steps: (1) classification and mapping of maize crop fields; (2) development of a remote sensing-based crop phenological monitoring technique; and (3) building yield prediction models. Detailed activities implemented in each step are discussed in the following sections, and the overall procedural workflow of the study is illustrated in Fig. 2.

Fig. 2 Methodology flowchart of the study

2.4.1 Processing of Vegetation Indices (VIs)

A composite of Sentinel-2 scenes generated for each phenological stage was used to derive vegetation indices (VIs). Seven VIs were applied as remote sensing-based phenology and yield predictors: Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Green Vegetation Index (GVI), Soil Adjusted Vegetation Index (SAVI), Normalized Difference Flood Index (NDFI), Red-Edge Normalized Difference Vegetation Index (Red-Edge NDVI) and Red-Edge Enhanced Vegetation Index (Red-Edge EVI). The spectral indices were calculated according to the equations in Table 1.

Table 1 Mathematical equations of Sentinel-2 MSI vegetation indices (VIs)
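As an illustration of how such indices are computed from band reflectances, the following minimal sketch implements the widely used textbook forms of NDVI, EVI, SAVI, and a normalized-difference Red-Edge index. These are assumptions: the exact formulations and coefficients applied in this study are those of Table 1, which may differ, and the function names and sample reflectance values below are purely illustrative.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index (standard form)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index with the commonly used default coefficients
    (assumed here; not taken from Table 1)."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor = 0.5 is a common default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def red_edge_ndvi(nir, red_edge):
    """Normalized difference of NIR and Red-Edge reflectance (one common
    Red-Edge variant, assumed for illustration)."""
    return (nir - red_edge) / (nir + red_edge)


# Hypothetical surface reflectances (0-1 scale) for a single 10 m pixel.
pixel = {"blue": 0.03, "red": 0.05, "red_edge": 0.30, "nir": 0.45}

print("NDVI         ", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("EVI          ", round(evi(pixel["nir"], pixel["red"], pixel["blue"]), 3))
print("SAVI         ", round(savi(pixel["nir"], pixel["red"]), 3))
print("Red-Edge NDVI", round(red_edge_ndvi(pixel["nir"], pixel["red_edge"]), 3))
```

The same functions can be applied element-wise to whole image bands once the Red-Edge and SWIR bands have been resampled to the 10 m grid.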

2.4.2 Classification and Validation of Maize Fields

Composite scenes of phenologically adjusted images were used for classification. Each composite was generated by averaging the pixel values (Running et al. 1995) of images acquired in different phenological stages. Because of high cloud cover at the study site in some months (especially July and August), only a few scenes (about four images per monthly composite) could be considered. We used a supervised random forest (RF) classifier (Breiman 2001) to discriminate maize fields (using “ee.Classifier.smileRandomForest” in Google Earth Engine). The RF classifier was chosen for its robustness to overfitting, its modest need for user-defined parameters, its low sensitivity to the number of input variables, the low correlation between its individual classifiers, its low computational demands, its high processing speed, and its ability to reduce noise (Belgiu and Drăguţ 2016; Gislason et al. 2006; Lambert et al. 2018). From the sample of n = 250 ground-truthing plots, a random subset of 70% was used for model building and the remaining 30% for validation. Randomly distributed reference training polygons were first delineated manually in ArcMap for RF modelling, based on the GPS locations recorded during field observation. These reference polygons reflect the spectral properties of the maize crop and of the other land cover classes. The number of trees was set to 500 following the recommendation of Rodriguez-Galiano et al. (2012). Classifications were run at different phenological stages to determine the phenological transition dates that give the most accurate maize field identification. The classification accuracy was evaluated in terms of producer's accuracy, user's accuracy, and overall accuracy, following standard techniques (Congalton 1991).
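The validation step described above can be sketched in plain Python. The block below is an illustrative assumption rather than the Google Earth Engine workflow used in the study: it shows a random 70/30 split of 250 sample identifiers and the computation of overall, producer's, and user's accuracy from reference and predicted labels. The function names, the random seed, and the toy labels are hypothetical.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=42):
    """Randomly split samples into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy_report(reference, predicted):
    """Return overall accuracy plus per-class producer's and user's accuracy."""
    labels = sorted(set(reference) | set(predicted))
    # Confusion matrix as a dict keyed by (reference class, predicted class).
    counts = {(r, p): 0 for r in labels for p in labels}
    for r, p in zip(reference, predicted):
        counts[(r, p)] += 1

    overall = sum(counts[(c, c)] for c in labels) / len(reference)
    producer, user = {}, {}
    for c in labels:
        ref_total = sum(counts[(c, p)] for p in labels)   # row total
        pred_total = sum(counts[(r, c)] for r in labels)  # column total
        # Producer's accuracy = 1 - omission error; user's = 1 - commission error.
        producer[c] = counts[(c, c)] / ref_total if ref_total else float("nan")
        user[c] = counts[(c, c)] / pred_total if pred_total else float("nan")
    return overall, producer, user


# 250 hypothetical plot identifiers, split 70/30.
train_ids, validation_ids = split_samples(range(250))
print(len(train_ids), "training plots,", len(validation_ids), "validation plots")

# Toy validation labels (invented for illustration only).
reference = ["maize"] * 4 + ["other"] * 6
predicted = ["maize", "maize", "maize", "other"] + ["other"] * 5 + ["maize"]
overall, producer, user = accuracy_report(reference, predicted)
print("overall:", overall, "producer:", producer, "user:", user)
```

Low producer's accuracy for the maize class corresponds to high omission error, and low user's accuracy to high commission error, which is how the two error types are discussed in Section 3.1.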

2.4.3 Monitoring of Maize Phenology Development

Crop phenology is the change in the growing phases of plants (Ruml and Vulic 2005). Maize phenologies and transition dates were regularly recorded in the field once a week (for n = 31 maize sample plots). Remote sensing-based spectral reflectance patterns of maize throughout the cropping season were then compared against field observed phenologies. We applied NDVI, EVI, GVI, SAVI, Red-Edge NDVI, and Red-Edge EVI for phenological information, while NDFI was used as an agronomic water level and flooding indicator. The VIs were obtained based on a total of 25 multi-temporal images distributed throughout the maize development stages.
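To make the comparison between a VI time series and field-observed phenology concrete, the following sketch smooths a VI profile with a simple centred moving average and locates the day of peak greenness. The smoothing method, the window size, and the sample NDVI values are assumptions chosen for illustration; the source does not specify how the temporal profiles were smoothed.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def peak_day(days_after_sowing, vi_values, window=3):
    """Return (day, smoothed VI) at the maximum of the smoothed profile."""
    smoothed = moving_average(vi_values, window)
    idx = max(range(len(smoothed)), key=smoothed.__getitem__)
    return days_after_sowing[idx], smoothed[idx]


# Invented NDVI profile for one maize plot (not measured data).
days = [10, 30, 50, 70, 90, 110, 130, 150, 170]
ndvi_profile = [0.15, 0.25, 0.45, 0.65, 0.80, 0.78, 0.60, 0.40, 0.20]

day, value = peak_day(days, ndvi_profile)
print("peak greenness at about", day, "days after sowing")
```

The day returned by such a routine can then be checked against the weekly field records of tasselling and silking dates.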

2.4.4 Yield Predictive Model and Production Estimation

The phenologically adjusted values of the multi-temporal VIs corresponding to the field sampling plots were weighted by the average fraction of each pixel. The values were subsequently used as predictor variables for the yield models. For single VI-based predictive models, linear, exponential, power, logarithmic, and polynomial functions were tested. Multiple regression models were built through a stepwise forward regression approach using the “stats” and “ggplot2” packages in RStudio (RStudio Team 2018). In this way, multiple phenologically adjusted models were established. The next step was to identify the phenological period that allows final grain yields to be predicted both early and accurately. The models were evaluated based on the coefficient of determination (R²), root mean square error (RMSE), and bias (Eqs. 1–3). The R² and RMSE values served as model performance indicators. The VIs that provide the highest accuracy when regressed against observed yield can then be selected as the best predictors of final grain yield.

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}},$$
(1)
$$\text{RMSE}\,(\%) = \frac{\text{RMSE}}{\frac{1}{n}\sum_{i=1}^{n} y_{i}} \times 100,$$
(2)
$$\text{Bias} = \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})}{n},$$
(3)

where \(y_{i}\) is the observed value of maize yield, \(\hat{y}_{i}\) is the predicted value of maize yield, and n is the number of observations.
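Equations 1 to 3 translate directly into a few lines of code. The sketch below is a plain-Python rendering of the three metrics as defined above; the function names and the sample yields (in kg/ha) are hypothetical and serve only to show the calculation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error (Eq. 1)."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


def rmse_percent(observed, predicted):
    """RMSE relative to the mean observed yield, in percent (Eq. 2)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_observed * 100


def bias(observed, predicted):
    """Mean of observed minus predicted values (Eq. 3)."""
    n = len(observed)
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / n


# Invented plot yields in kg/ha (not data from the study).
observed = [3000.0, 3500.0, 4000.0]
predicted = [3100.0, 3400.0, 3900.0]

print("RMSE     :", rmse(observed, predicted))
print("RMSE (%) :", rmse_percent(observed, predicted))
print("Bias     :", bias(observed, predicted))
```

With the sign convention of Eq. 3 (observed minus predicted), a positive bias corresponds to predictions that are, on average, lower than the observations.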

3 Results and Discussion

3.1 Discriminating Maize Fields

The overall accuracy (OA) of maize discrimination varied between the multi-date classifications carried out for the different phenological phases. At the beginning of crop development (between April and July), the OA of the classifications was relatively low (ranging from 55 to 67%). This confirms that it is challenging to discriminate maize at these growth stages because of the spectral mixing of several plant types. For instance, heavy weed infestations, which had not yet been entirely removed at these stages, contribute to high spectral confusion. A high weed infestation during the cropping season leads to substantial omission error, as has also been reported in other studies (Eddy et al. 2014; Lambert et al. 2018).

In our study, classification accuracy improved significantly during the tasselling and silking stages, typically between 80 and 110 days after sowing. During these stages, the OA of maize classification reached about 85%. The maize fields were well weeded by then, which gave the maize a distinct spectral signature and resulted in low commission errors. In addition, maize was fully developed during this period, while other crops, such as bean (Phaseolus vulgaris L.) and barley (Hordeum vulgare L.), and some weed remnants had already reached the dehydration and senescence stages. This allowed a clear spectral discrimination of maize from other vegetation. Kussul et al. (2016) also reported improved crop mapping, with an OA of 87% during early growth stages that increased to 94% at late development stages. Several other studies likewise identified specific temporal ranges for more accurate classification (Mazzia et al. 2020; Vuolo et al. 2018; Waldhoff et al. 2017). The final classified maize fields accounted for about 8% of all land cover in the study site (Fig. 3).

Fig. 3 Maize fields of the study district (2018 growing season), depicting a highly fragmented spatial arrangement of many small and few large fields. The overall accuracy of this classification was estimated to be 85%

We also confirmed that using multi-date composite images from the optimal temporal window gave better maize discrimination accuracy than using a single-date dataset. Belgiu and Csillik (2018) and Palchowdhuri et al. (2018) also reported increases in classification accuracy when using multi-temporal composite images. Despite the potential of the high-resolution Sentinel-2 MSI for highly accurate crop discrimination, our final classification accuracy was slightly lower than that of several other studies (Khaliq et al. 2018; Lebourgeois et al. 2017; Nasrallah et al. 2018; Sonobe et al. 2018; Zheng et al. 2017). This is because the study area consists of highly fragmented farmland with several adjacent agroforestry plant species, which can lead to spectral confusion between species. Various studies have also reported spectral confusion among different species (Forkuor et al. 2014; Hu et al. 2019). In addition, intercropping in our study site, principally of soybean (Glycine max) within the maize fields, can also contribute to spectral confusion.

3.2 Remotely Sensed Maize Phenology Monitoring Technique

The remotely sensed monitoring of maize phenology revealed a distinct temporal pattern, as presented in Fig. 4. The VIs based on plant spectral reflectance (NDVI, EVI, GVI, SAVI, Red-Edge NDVI, Red-Edge EVI) showed almost identical and consistent temporal patterns, whereas NDFI showed irregular patterns that followed the variability of moisture. During the initial period, the VIs were at their lowest values. After a few weeks (around mid-April), NDFI values increased, which denotes the start of rainfall and might also point to agronomic flooding. This is the period when farmers started sowing maize. It is therefore also possible to recognize sowing dates from the smoothed temporal profiles of the remotely sensed VIs. A few weeks later (from 30 to 70 days after sowing), the VIs increased steadily, representing the emergence and rapid growth of the maize plants. The crop reached its period of peak growth between 80 and 110 days after sowing, when the VIs rose to their highest values. The decrease of the VIs around 120–150 days after sowing represents plant maturity and senescence. The VIs dropped to their lowest values around 160–180 days after sowing as the leaves dried out and died, which was the time of harvesting operations in the study area. Under the typical environmental conditions of the growing season, the various plant spectral VI profiles showed similar reflectance characteristics, which confirms numerous previous studies (Liao et al. 2019; Lobell et al. 2003; Sakamoto et al. 2005; Tian et al. 2019). Only NDFI revealed distinct reflectance properties, linked to the frequency and quantity of rainfall in the study site.

Fig. 4 Vegetation indices from Sentinel-2 MSI over the growth period of maize fields, depicting changing phenology for the 2018 crop growing season. Rectangular boxes in different colours represent the phenological stages observed in the field

3.3 Grain Yield Forecasting Models and Production Estimation

We analyzed the capability of six spectral VIs to predict maize grain yield well before the harvesting period. The results showed that VIs could significantly predict grain yield in the middle and late phenological stages. The most accurate predictive models were obtained during the period of peak maize spectral reflectance (between 80 and 110 days after sowing). The predictive models for this phenological stage are presented in Table 2. Models based on a single spectral VI (Models 1–9) performed relatively poorly. Among the single-VI models, those based on the Red-Edge band and on EVI performed better, with all estimates below an RMSE of 700 kg/ha. The most accurate models were achieved through a combination of spectral VIs: Model 10 (EVI and GVI), Model 11 (Red-Edge EVI and SAVI), and Model 12 (NDVI, Red-Edge EVI, and SAVI) all yielded R² values ≥ 80%.

Table 2 Correlation between vegetation indices and maize grain yield, executed from tasselling and silking phenological stages (between 80 and 110 days after sowing) of 2018 crop growing season

Model validation was carried out using regression analysis between actual field observations and predicted yield (Fig. 5). The scatterplots are expressed as production rates per hectare of agricultural land. As shown in Table 2, Models 1–6 were fitted with empirical linear functions, Model 7 with a polynomial function, Models 8 and 9 with exponential functions, and Models 10–12 were based on multiple-variable regressions. According to these predictive models, there was a clear relationship between the observed and predicted grain yields. Model 12 offered the most accurate yield estimates in our study, with an RMSE of 449.66 kg/ha. The model showed a slight overestimation, with an overall computed bias of 3.00 kg/ha. Model 5, which is based on GVI, showed the lowest predictive power, with an RMSE of 784.61 kg/ha.

Fig. 5 Scatterplots between maize observed and predicted yields for 2018 crop growing season. The dashed line represents a 1:1 relationship and the solid line denotes the fitted line

In our study, the yield regression models obtained were accurate enough to be of practical use (Table 2). For both linear and multiple regression models, the accuracy of the yield estimates peaked between 80 and 110 days after sowing, which corresponds to the tasselling and silking stages after green-up. A similar result was obtained using MODIS images, suggesting that grain yield prediction using spectral indices offers high accuracy (Bolton and Friedl 2013). Other studies also found that the correlation peaks, and that early yield prediction is therefore most reliable, during the reproductive stage of maize (Mkhabela et al. 2011; Sacks and Kucharik 2011).

In particular, the simple linear regressions based on GVI and NDVI gave relatively poorer grain yield predictions than the other models, with yield being under- and overestimated, respectively. The poor correlation arises because the high greenness of the cultivated maize variety can easily saturate these VIs, which are known to saturate in dense, high-biomass canopies (Casanova et al. 1998; Xue and Su 2017). In crop fields with low moisture content and low weed infestation, NDVI can be a good yield predictor (Noureldin et al. 2013). In the fragmented small-scale farmland of our study, with its variable moisture and high crop greenness, using EVI, Red-Edge-based indices, and soil-suppressing spectral indices improves the precision of the predictive models. The studies of Clevers and Gitelson (2013), Dong et al. (2015), and Forkuor et al. (2017) also point to the importance of the Red-Edge for minimizing saturation problems in crop analysis and crop mapping.

The multiple regression models offered the highest accuracy of all yield forecasts; Models 10, 11, and 12 featured lower RMSE values of 5.73, 5.09, and 4.50 kg/pixel of grain, respectively (Table 2). The use of more than one spectral index increased the accuracy of the models. This approach allowed a comprehensive assessment of different attributes, such as chlorophyll content, vegetation health, water level, and soil and canopy conditions, through several spectral bands. In addition, our approach offers improved yield estimates compared to predictive models developed using climate variables, such as rainfall and temperature (Ramirez-Villegas and Challinor 2012). If highly accurate predictions are required, models based on multiple VIs are preferable. If a parsimonious yield estimate is sufficient, predictive models with a single spectral index, such as EVI, Red-Edge NDVI, or Red-Edge EVI, can be selected.

Grain yield in the study area can be mapped by summing all pixel yield estimates (10 × 10 m) in each maize field and then converting the sum to a standard production rate per hectare. We produced the spatial distribution maps using the best predictive models, that is, those closest to the 1:1 relationship line (Models 4, 10, 11, and 12) (Fig. 6). With the most refined predictive model (Model 12), a mean grain production of about 3.4 t ha⁻¹ was obtained. With a total area of 1917.74 ha of classified maize fields in the study area, about 6520.32 tonnes of maize grain were estimated to have been harvested in the 2018 cropping season.
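The unit conversions behind these figures can be written out explicitly. The following sketch assumes that each pixel covers 10 × 10 m (0.01 ha), as stated for the Sentinel-2 grid; the function names and the example field of 50 pixels are hypothetical. It reproduces the reported total production from the reported mean yield and maize area, and shows that an RMSE of 4.50 kg/pixel is equivalent to roughly 450 kg/ha, in line with the 449.66 kg/ha reported for Model 12.

```python
PIXEL_AREA_HA = 0.01  # one 10 m x 10 m Sentinel-2 pixel, in hectares


def kg_per_pixel_to_kg_per_ha(value_kg_per_pixel):
    """Convert a per-pixel quantity (kg per 100 m^2) to kg/ha."""
    return value_kg_per_pixel / PIXEL_AREA_HA


def field_yield_t_per_ha(pixel_yields_kg):
    """Sum per-pixel yield estimates of one field and express them in t/ha."""
    total_t = sum(pixel_yields_kg) / 1000.0
    area_ha = len(pixel_yields_kg) * PIXEL_AREA_HA
    return total_t / area_ha


def total_production_t(mean_yield_t_per_ha, area_ha):
    """Total production in tonnes from a mean yield and a cultivated area."""
    return mean_yield_t_per_ha * area_ha


# RMSE of Model 12 expressed per hectare.
print(round(kg_per_pixel_to_kg_per_ha(4.50), 2), "kg/ha")

# Hypothetical field of 50 pixels, each predicted at 34 kg of grain.
print(round(field_yield_t_per_ha([34.0] * 50), 2), "t/ha")

# Reported mean yield and classified maize area.
print(round(total_production_t(3.4, 1917.74), 2), "t")
```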

Fig. 6 Spatial distribution of maize yield predictions for 2018 for different prediction models

Our investigation indicates that VIs can potentially reveal production anomalies. The potential of VIs for crop growth monitoring and for accurate yield prediction about 2 months before the start of the harvesting season supports the government’s readiness to issue early warning alerts. The government can then make reliable agricultural decisions based on crop growth conditions and production status, for example in times of crop growth abnormalities or of grain shortage or surplus.

4 Conclusion

We established approaches for remotely sensed monitoring of maize phenology and for predicting yield well before the harvesting period for the Abaya district of Oromia Regional State in Ethiopia, using spectral indices derived from Sentinel-2 MSI data. We also verified the potential of Sentinel-2 high-resolution imagery to discriminate maize fields in our study site. In addition, we evaluated the suitability of different vegetation indices (VIs) for grain yield prediction. Overall, our findings show that the best time to predict maize grain yield was between 80 and 110 days after establishment. The use of phenologically adjusted spectral indices provides more precise grain yield estimates. Multiple regressions on several VIs also offer more precise yield estimates than single VI-based models.

It is important to note that the small number of sampling plots and the size of the Sentinel-2 MSI grid cells in our study can affect our predictive models. We relied on agricultural land whose owners were willing to cooperate and to provide us with reliable yield information. In addition, the predictive models established in our study use data from a single year and may not be transferable to other years. This limitation arises because no production database exists for past years and because Sentinel-2 imagery has been available only from 2016 onwards. For regular use of Sentinel-2-based predictive models, it would be essential to refine these models with data from many years. Slight geometrical mismatches between image pixels and the georeferenced field plots were also observed. However, it is reasonable to assume that yield variation within a single farm field is relatively small, so the effect of these mismatches on model development is assumed to be negligible. In general, the results of this study suggest that Sentinel-2 MSI-based crop phenological information has the potential to support crop monitoring, mapping, and grain yield prediction in fragmented small-scale farmland, like our study site, for agricultural decision-making.