1 Introduction

Yield prediction is a general term describing an attempt to forecast crop yield in a forthcoming harvest season. Virtually all the efforts aiming at yield prediction involve data and some kind of an underlying model. A variety of different efforts can be considered when talking about yield prediction, depending on the aim of the prediction, the type of the model used (from gut feeling to complex plant physiology-based models to rigorous deep learning algorithms), the type of crop under consideration, the time horizon of the prediction, the scale of yield assessment (intra-field vs. regional), etc.

Yield prediction can be done at a global or national scale, in which case the aim of the prediction is often economic (to predict market prices, for example) or related to the security of supply. As market prices and food supply are complex issues depending, in addition to yield, on the geopolitical situation, logistics, financial issues, etc., yield prediction driven by these aims is not considered in this chapter. Also, the data sources used to predict the yield are different in these efforts, including official statistics, questionnaires to the farmers, etc. Instead, in this chapter we focus on yield prediction performed at the scale of a single crop field (or even at the subfield scale) over which the soil properties and growth conditions can be considered more or less constant. Indeed, this kind of effort will produce information for yield prediction at the global or national scale, but the immediate goal is different.

The aim of the yield prediction effort depends also on the time horizon of the prediction. One can develop a general model using the data from multiple years (together with general knowledge on plant physiology) to inform the farmers which crops and varieties would have higher potential in their particular environment or when to sow the crops. Slightly different aims can be considered when the predictions are made during the very growth season whose yield is being predicted. Having yield prediction maps at the subfield level in the budding phase of plant growth could still inform the farmer on possible actions to be taken before harvest. Also, yield prediction can be performed using either a ‘snapshot’ of data at a certain time or a time series of data (weekly acquired weather data or remote sensing data, for example).

In this chapter we mainly focus on yield prediction based on the data acquired from the growth environment during the particular (and possibly also previous) growth season. While some data remain relatively constant (such as soil properties or climate conditions), other kinds of data can change radically from year to year (weather conditions). The main goals of this kind of yield prediction are to better understand the relationship between the environmental parameters and the yield and to provide as accurate a prediction as possible. The practical value of these kinds of efforts is to provide the involved stakeholders (farmers, food industry, and regulators) with information to support their decision-making (what crops to cultivate and how to improve the growth conditions by means of, for example, drainage, irrigation, soil shaping, or fertilization). The primary type of crops considered is cereals, although the models can be applied also to growing vegetables (but not so well to growing fruits or to the subject area of horticulture).

We will first have a look at the various data sources used in yield prediction and the measures used to assess the goodness of the prediction result. We will then consider two kinds of models—those based on plant physiology and those based on data alone—in separate sections of the chapter. The models based on plant physiology are presented only briefly and mostly for reference purposes, the main emphasis being on data-driven yield prediction models using machine learning.

2 Prerequisites of Yield Prediction

Whatever the method or model of yield prediction, it is always based on data. In the case of physics-based models, the data are used to calibrate or tune the model, while in the case of data-driven models, such as those based on deep learning, the data are used for training the model parameters. In this section we first describe some common sources of data used in calibrating or training yield prediction models. We then discuss various measures used in the evaluation of the accuracy of yield prediction models.

2.1 Source Data

Soil Data

Soil and its properties (composition and structure) play a major role in how plants grow and produce yield. A wide variety of variables can be derived such as:

  • Soil acidity (pH)

  • Cation exchange capacity (CEC)

  • Soil type

  • Soil chemical content (potassium and magnesium, for example)

  • Soil structure (clay or sand content, soil texture, etc.)

  • Water-related properties such as water holding capacity or water permeability.

Different ways can be used to acquire subsets of these variables. A chemical or structural analysis of soil samples in a laboratory setting is the most direct way to estimate soil properties. However, the collected data are sparse and it is often difficult to decide where in the field the samples should be taken, especially in areas where soil properties change abruptly. More efficient sampling of soil chemical content can be done using portable hand-held X-ray fluorescence (XRF) devices (Weindorf and Chakraborty 2020). These devices enable determination of the amount of chemical elements in soil at a certain location in the field without the need for preprocessing the sample. However, the data acquisition is still manual and requires a license to operate the device. Other ways of acquiring soil property data are scanning for Electrical Conductivity (EC) (Stadler et al. 2015) or using Ground Penetrating Radar (GPR) (Linna et al. 2022).

Features such as soil moisture or soil temperature can also be used in yield prediction models. These features are closely related to weather or climate data and can be directly measured using soil sensors. A variety of solutions are available from miniature wireless underground sensor systems (Tiusanen 2013) to tubes equipped with sensors at various depths providing stratigraphic data (Shah et al. 2012) of soil moisture and temperature. On the other hand, soil properties can also be obtained as target variables when applying machine learning methods to remote sensing data (Tantalaki et al. 2019). In this case soil properties, estimated from satellite or drone data, can be fed into the yield prediction models.

Remote Sensing

Remote sensing settings can be divided according to the host platform of the sensor. During recent years Unmanned Aerial Vehicles (UAVs), more commonly called drones, have become popular in remote sensing of agricultural land. Initial expectations of autonomous data acquisition with drones have not been fully realized, as restrictions on using UAVs are in place in most countries. In Finland, for example, there has to be constant visual contact between the operator and the drone, and the operator has to have a license.

The Unmanned Aerial Systems (UASs) used in the context of smart farming usually contain a separate sensor mounted on the UAV platform, while in consumer systems an RGB camera is often integrated into the drone. The most common sensor types used with UAVs include RGB cameras, multispectral cameras, hyperspectral cameras, thermal sensors, and lidar devices (Messina and Modica 2020; Tsouros et al. 2019). While RGB cameras use three wavelength bands in the visual range of 400–700 nm, multispectral cameras typically add one or more bands in the Near-InfraRed (NIR) region. In agricultural applications the main role of these additional bands is to cover the red edge in the spectrum caused by chlorophyll in plants. Hyperspectral cameras differ from the multispectral ones in that they cover a certain spectral range (usually either up to about 1100 nm or about 2500 nm) in consecutive wavelength bands. Thermal sensors (wavelength of 3–8 \(\upmu \)m) measure the surface temperature of the foliage and are mainly used for monitoring plant water stress and detecting plant diseases (Messina and Modica 2020). Lidar is the only active measurement technique in the above list as it measures the reflection of an emitted light beam from the surface. Using lidar techniques, the elevation map of a crop field can be produced. In addition, by analyzing the waveform of the reflected pulse, the structure of the targets can be characterized.

Remote sensing data from high-altitude satellite systems form another important data source in smart farming and yield prediction applications. Data from satellites were available long before UAVs. However, after the launches of higher resolution systems such as Landsat 8 in 2013 and Sentinel 2 in 2015, and after several operators started to offer their data for free over a well-defined interface, the number of studies and services based on these kinds of data has increased significantly. A comprehensive list of satellite missions can be found in the Satellite Missions Catalogue (https://www.eoportal.org/satellite-missions), the most common platforms employed in agricultural applications being Landsat 7&8, Sentinel 2, WorldView 2&3, and Gaofen 1&2. All these missions provide remote sensing data in the optical range of the spectrum starting from 400 nm. The spatial resolution of the data varies from 0.31 m/pixel for commercial WorldView satellites to 10–60 m/pixel for open access Landsat and Sentinel missions. In addition to the optical range, satellite data from Synthetic Aperture Radar (SAR) missions such as Sentinel 1 or TerraSAR-X have been used in crop yield prediction (Alebele et al. 2021). As mentioned above, remote sensing data can either be used directly for developing yield prediction models or they can be used to derive features such as soil or plant moisture, soil temperature, nitrogen level, etc. to be further used in plant physiology-based yield prediction models. Even if satellite data are freely available, processing and interpretation of the data require expert knowledge, and the farmers usually rely on either public or commercial service providers.

Weather Data

Weather data are probably the most common data source when decisions are made on immediate actions in agricultural production. In contrast to other data sources considered, weather data are often freely available from publicly maintained weather stations. Various derived parameters such as growing degree days may also be available. However, if more accurate and location-specific weather data are required, a private weather station can be installed. More advanced weather stations can provide data on a wide variety of environmental factors such as air temperature, wind speed and direction, atmospheric pressure, light intensity, solar radiation, and precipitation. Indeed, weather data are related to soil temperature and moisture, and due to easy access and interpretation, weather data provide valuable additional information for yield prediction models. Physical models for yield prediction usually involve weather-related parameters directly, whereas in data-driven models they can be used as additional data features. Some studies have even built deep learning models solely on weather data to predict crop growth stages (Yue et al. 2020).
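As an illustration of a derived weather parameter, the following minimal sketch accumulates growing degree days from daily minimum and maximum air temperatures. It assumes the common averaging definition (daily mean temperature minus a base temperature, floored at zero); the base temperature of 5 °C and the temperature values are hypothetical, and the appropriate base temperature depends on the crop.

```python
def growing_degree_days(t_min, t_max, t_base=5.0):
    """Daily growing degree days from min/max air temperature (deg C).

    Assumption: simple averaging method; t_base is crop dependent and the
    default here is only a placeholder.
    """
    t_mean = (t_min + t_max) / 2.0
    return max(0.0, t_mean - t_base)


def accumulated_gdd(daily_min_max, t_base=5.0):
    """Sum daily growing degree days over a sequence of (t_min, t_max) pairs."""
    return sum(growing_degree_days(lo, hi, t_base) for lo, hi in daily_min_max)


# Hypothetical week of (min, max) air temperatures in deg C.
week = [(2.0, 10.0), (4.0, 14.0), (6.0, 18.0), (1.0, 5.0),
        (8.0, 20.0), (9.0, 21.0), (7.0, 17.0)]
print(accumulated_gdd(week))
```

The accumulated value can then be used as one additional feature of a data-driven model or as an input of a physiology-based model.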

Yield Maps

To validate yield prediction models, reference data on actual yield are required. The traditional approach to measuring crop yield is to weigh the harvested grain and calculate the average on a field-by-field basis. This kind of yield data can be used if the scope of yield prediction is county level, for example (Wang et al. 2020). To obtain data on intra-field variability of crop yield, yield monitoring devices can be mounted on harvesters. These devices may be based on optical measurement or on kinetic mass flow sensors. Also, accurate logging of the location of the harvester is required using satellite navigation systems. While harvester-mounted yield monitors are becoming more common among farmers, the skills required to extract and preprocess the data often hinder their use locally. Different vendors use different data formats and the data need to be corrected for several factors such as the properties of the grain (moisture level, for example) or incomplete swathes of harvesting. Also, point data obtained from the yield monitors need to be aggregated and rasterized. Other methods have also been proposed for intra-field yield assessment such as manual yield assessment within a standard frame at several locations of the field (Narra et al. 2022). The yield map can then be formed by coarse interpolation of the sampled yield values.
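The aggregation and rasterization step for yield monitor point data can be sketched as follows. This is only a minimal illustration: it assumes that the points have already been corrected and filtered, that their coordinates are given in metres in a projected coordinate system, and that a plain cell average is adequate. The function name, the 10 m cell size, and the sample values are hypothetical.

```python
from collections import defaultdict


def rasterize_yield(points, cell_size=10.0):
    """Average point yield values within square grid cells.

    points: iterable of (x, y, yield_value), with x and y in metres.
    Returns a dict mapping (column, row) cell indices to the mean yield.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical yield monitor readings: (x in m, y in m, yield in t/ha).
points = [(1.0, 2.0, 4.0), (8.5, 9.0, 6.0), (12.0, 3.0, 5.0)]
grid = rasterize_yield(points)
print(grid)
```

Cells without any readings are simply absent from the result; filling them would require interpolation, which is not attempted here.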

2.2 Assessment of Prediction Accuracy

In validating crop yield models for either parameter calibration of physical models or training of machine learning models, some kind of metric is needed to estimate the prediction error. Let us denote the predicted and true yield at location i by \(\hat {y}_i\) and \(y_i\), respectively. The most common error metrics include

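$$\displaystyle \begin{aligned} {} \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |\hat{y}_i - y_i| \end{aligned} $$
(1)

or

$$\displaystyle \begin{aligned} {} \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}, \end{aligned} $$
(2)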

where N is the number of individual units of yield measurement. If the units are of different size (as in the case of yield prediction on a field-by-field basis), the yield values should be normalized by the area of the corresponding field. In the case of intra-field yield assessment, usually yield in equal-sized units (say, 10\({\times }\)10 m) is considered.
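A minimal sketch of the two absolute error metrics, the mean absolute error (MAE) and the root mean square error (RMSE), is given here under their standard definitions. It also shows the area normalization mentioned above for field-by-field data. The field totals and areas are made-up numbers used only for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted yield values."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted yield values."""
    squared = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical field totals (kg) and field areas (ha).
true_total = [12000.0, 30000.0]
pred_total = [10500.0, 33000.0]
areas = [3.0, 6.0]

# Normalize by area so that fields of different size become comparable (kg/ha).
y_true = [t / a for t, a in zip(true_total, areas)]
y_pred = [p / a for p, a in zip(pred_total, areas)]

print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Both metrics are expressed in the unit of the yield itself (here kg/ha), which is why they are mainly comparable across similar crops and conditions.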

The mean absolute error (MAE) and root mean square error (RMSE) metrics are useful if prediction errors obtained for the same crop in similar growing conditions are compared. Otherwise, it would be more useful to calculate relative error metrics such as

(3)

or

(4)

Another popular performance metric of crop yield prediction models is the coefficient of determination \(R^2\). \(R^2\) evaluates how well the true versus predicted yield values follow the linear regression line and can be calculated as

$$\displaystyle \begin{aligned} {} R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \mu_y)^2}, \end{aligned} $$
(5)

where \(\mu _y\) is the average over the true yield values. In their review on crop yield prediction using machine learning, van Klompenburg et al. have found that in 50 selected studies RMSE was used 29, \(R^2\) 19, and MAE 8 times as the metric of the prediction error (van Klompenburg et al. 2020).
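Equation 5 translates directly into code. The sketch below is a plain implementation of that formula; the yield values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination as defined in Eq. 5."""
    mu_y = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted yields (t/ha).
y_true = [3.0, 4.0, 5.0, 6.0]
y_pred = [3.5, 4.0, 5.0, 5.5]
print(r_squared(y_true, y_pred))
```

Note that with this definition a model that always predicts the mean of the true yield obtains a value of zero, and a model that performs worse than that obtains a negative value.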

Other metrics for model efficiency used in the context of yield prediction include

(6)

used in the assessment of crop yield prediction models in Chipanshi et al. (2015), for example, or

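$$\displaystyle \begin{aligned} {} \mathrm{LCCC} = \frac{2 r \sigma_y \sigma_{\hat{y}}}{\sigma_y^2 + \sigma_{\hat{y}}^2 + (\mu_y - \mu_{\hat{y}})^2}, \end{aligned} $$
(7)

where \(\sigma _y^2\) and \(\sigma _{\hat {y}}^2\) are the variances of the true and predicted yield, respectively, \(\mu _y\) and \(\mu _{\hat {y}}\) are the corresponding means, and r is the correlation coefficient between the two variables. Lin's concordance correlation coefficient (LCCC) measures the agreement between predicted and true yield, i.e., how closely the value pairs follow the 1:1 line, and is used, for example, in Filippi et al. (2019). Still other metrics, more suitable for usage in the context of physics-based models, include the Skill Score (SS) (Johnson et al. 2016) and the ecological distance measure (Tian et al. 2020). It is common to use several error metrics in a single study to better characterize the model behavior.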
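As a sketch, LCCC (Lin's concordance correlation coefficient) can be computed in its standard form from the means, variances, and covariance of the true and predicted yield. Population (1/N) variances are assumed here, and the product of the correlation coefficient and the two standard deviations is replaced by the covariance, to which it is equal. The example values are hypothetical and chosen so that the predictions are perfectly correlated with the true values but biased by a constant offset.

```python
def lccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population statistics)."""
    n = len(y_true)
    mu_t = sum(y_true) / n
    mu_p = sum(y_pred) / n
    var_t = sum((t - mu_t) ** 2 for t in y_true) / n
    var_p = sum((p - mu_p) ** 2 for p in y_pred) / n
    # The covariance equals r * sigma_y * sigma_yhat.
    cov = sum((t - mu_t) * (p - mu_p) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)


# Hypothetical yields: predictions are offset by exactly one unit.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]
print(lccc(y_true, y_pred))
```

In this example the correlation coefficient alone would be 1, while LCCC is clearly lower because it also penalizes the systematic offset.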

The above list of yield prediction accuracy assessment measures is not comprehensive, and in individual studies several other metrics have been used. The selection of appropriate metrics should take into account the type of prediction model as well as the intended use of the metrics (i.e., what kind of comparison they are used for).

3 Physics-Based Models for Crop Yield Prediction

There are many plant physiology-based crop growth models available. EU Joint Research Center (JRC) launched the Monitoring Agricultural ResourceS (MARS) initiative in 1988 to acquire information on crop production using remote sensing technology (van der Velde et al. 2019). The crop monitoring and yield forecasting are currently performed by the Food Security Unit of the European Commission’s Joint Research Center using the MARS Crop Yield Forecasting System (MCYFS). Part of this system is the crop simulation module relying on crop models. The main crop growth model used within the MCYFS is the WOFOST (acronym for WOrld FOod STudies) model (de Wit et al. 2019), introduced already in 1989 (van Diepen et al. 1989) and updated continuously since. WOFOST explains crop growth based on the underlying processes such as photosynthesis and respiration. The effects of environmental conditions on these processes are considered when monitoring and forecasting crop growth and yield. WOFOST is open source and numerous implementations of its conceptual framework exist. The model has been used for modeling a wide variety of crops such as wheat, barley, maize, potato, sunflower, and rice in different growing conditions from Europe to China (https://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Crop_Simulation). Other more limited models used in the MCYFS context include:

  • WARM: a simplified user-friendly growth model for paddy rice crops

  • CropSyst: a multi-layer multi-crop model designed to study the effect of cropping systems management on productivity

  • CANEGRO: sugarcane growth model based on daily weather data, soil properties, and data on management.

At the global level, the Food and Agriculture Organization (FAO) of the United Nations has developed the Aquacrop model, widely used to simulate the dependence of crop growth on water and nutrient availability (Steduto et al. 2009). The model is based on converting transpiration into biomass through water productivity. Biomass is connected to yield via the Harvest Index (HI) parameter (see Fig. 1). Similarly to WOFOST, numerous open source implementations of the Aquacrop model exist. In Todorovic et al. (2009) the Aquacrop model is compared with the WOFOST and CropSyst models in the simulation of sunflower growth under different water regimes. The authors note that whereas Aquacrop is water-driven, CropSyst can be considered both water- and radiation-driven and the WOFOST model is carbon-driven. It is found that the performance of the three models is similar in simulating biomass and yield, while Aquacrop requires fewer input parameters. In Mkhabela and Bullock (2012) the performance of the Aquacrop model in simulating yield and soil moisture for wheat is assessed. The model appears to simulate soil moisture better than yield (\(R^2\) of 0.90 vs 0.66, respectively). Aquacrop is compared to the WOFOST model for potato crop in Quintero and Díaz (2020). Both models gave correlation over 0.99 between the true and simulated harvestable biomass.

Fig. 1 Schematic of the Aquacrop model, showing the connection of biomass to yield through the Harvest Index (HI) (https://www.fao.org/3/i6321e/i6321e.pdf)
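The core chain of the Aquacrop model, from transpiration through water productivity to biomass and further through the Harvest Index to yield, can be sketched as follows. The sketch assumes the commonly cited simplified relations, in which biomass is the water productivity multiplied by the sum of daily transpiration normalized by reference evapotranspiration, and yield is the Harvest Index multiplied by biomass. All stress coefficients and other model components are omitted, and the parameter values are hypothetical.

```python
def biomass(transpiration, et0, water_productivity):
    """Biomass from daily transpiration normalized by reference evapotranspiration.

    transpiration, et0: daily values in mm; water_productivity in g/m^2.
    Simplified relation: B = WP * sum(Tr_i / ET0_i).
    """
    return water_productivity * sum(tr / e for tr, e in zip(transpiration, et0))


def crop_yield(biomass_value, harvest_index):
    """Yield as the harvestable fraction of biomass: Y = HI * B."""
    return harvest_index * biomass_value


# Hypothetical three-day example.
tr = [2.0, 3.0, 4.0]    # crop transpiration, mm/day
et0 = [4.0, 5.0, 5.0]   # reference evapotranspiration, mm/day
b = biomass(tr, et0, water_productivity=15.0)
y = crop_yield(b, harvest_index=0.5)
print(b, y)
```

In the full model both the water productivity and the Harvest Index are adjusted for stresses and growth conditions, which is where most of the calibration effort lies.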

A major challenge in applying physics-based crop growth models for yield simulation and forecasting is model calibration. For example, the Aquacrop model has more than 50 input variables or model parameters that should be determined to run the model. Modeling can be performed at a field scale with more precise parameter values or at a regional scale with different calibration for different crops and their varieties as well as different climatic conditions. In Silvestro et al. (2017) the sensitivity of the Aquacrop model to its parameters has been studied using the Morris and EFAST (Extended Fourier Amplitude Sensitivity Test) techniques. In the study, Aquacrop is compared to the simpler SAFYE (Simple Algorithm For Yield expanded with the evapotranspiration component) model (Duchemin et al. 2008) in terms of complexity and plasticity for wet and dry conditions. SAFYE was found to be less complex but also less plastic.

The main aim of the plant physiology-based crop models is usually not to estimate the crop yield as accurately as possible but rather to understand the factors affecting crop growth, biomass generation, and yield production. The target variables in these models can be other than yield (biomass or leaf area index, for example). The brief presentation of these models here is meant to underline the importance of relating the data-driven yield prediction models to the physiology of plant growth. Performing yield prediction using remote sensing or environmental data is, in fact, an indirect way to assess the factors of crop growth and yield production.

4 Data-Driven Yield Prediction Using Machine Learning

In this section yield prediction methods relying completely on the underlying data are discussed, i.e., no physical model of plant growth or growth environment is considered. Although the prediction algorithm can be called a model also in this case, the model is purely computational and its parameters are determined based on the data by some learning algorithm. If the learning algorithm involves training data (i.e., data for which the true yield value is known), it is called supervised, otherwise unsupervised learning or clustering is in question. The amount of training data required to train a supervised learning algorithm depends on the complexity of the computational model, its structure, and the number of parameters.

While recently more attention has been paid to deep learning models, the so-called conventional classification or regression models are still intensively used. Although it is difficult to draw a strict line between the two types of models from the application point of view, the main difference is that the conventional methods are usually based on precalculated features or properties of the data while deep learning models work on raw data. The number of parameters is usually much higher in deep learning models, and therefore, more data need to be used in their training. Deep learning models are usually not as sensitive to occasional errors in data as the conventional methods; on the other hand, they are only as good as their training data and biased training data will produce biased predictions. Deep learning models can accommodate a large amount of available data of different modalities and are capable of combining virtually all the data sources available for a particular task (see Sect. 2.1). The results obtained with deep learning models are difficult to track or interpret, and although methods exist to pinpoint the features in the source data that affect the prediction results most, it is still difficult to relate the performance of the model to certain phenomena.

In the following a brief overview of the conventional machine learning methods and their usage in yield prediction is given. After that, three main types of deep learning models, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer Neural Networks (TNNs) are discussed. The following is not a comprehensive literature review of the usage of these models; the reader is directed to the numerous review papers on the subject. The aim is to provide a general overview on the various methods with examples of their use for yield prediction.

4.1 Feature-Based Methods in Yield Prediction

The conventional classification models can be roughly divided into three categories: regression analysis, Bayesian models, and decision trees.

4.1.1 Regression Analysis

The main idea behind these models is to divide the feature space into subareas based on what is known about the true yield in the form of the training data. For example, the training data samples can be projected to the feature space formed by two or more wavelength bands of a remotely sensed data set and a discrimination curve can be defined to optimally separate the data points according to the true yield values. Probably the most common method in this category is the Support Vector Machine (SVM). In its basic form the SVM works in a two-dimensional feature space producing a linear separation line between two classes (in our case, the data points corresponding to yield higher or lower with respect to a certain threshold). Using modifications of the SVM such as kernel functions, SVMs can fit nonlinear discrimination functions and be used in higher dimensional feature space with more than two classes.
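The basic linear SVM described above can be sketched without any library support. The following is a minimal illustration only: it trains a linear separator by batch subgradient descent on the regularized hinge loss, which is one of several possible training schemes and not the method of any particular study cited here. The two features are assumed to be standardized values of two wavelength bands, and the data, labels (+1 for yield above a threshold, -1 for yield below it), and hyperparameters are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the regularized hinge loss.

    points: list of feature vectors; labels: +1 or -1.
    Returns the weight vector w and bias b of the separating line.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class of a feature vector: +1 (high yield) or -1 (low yield)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical standardized values of two wavelength bands.
points = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0],
          [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0], [-3.0, -2.0]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```

In practice an established library implementation would normally be used, and kernel functions would replace the plain inner product when a nonlinear discrimination function is needed.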

SVMs have been widely used for the classification of remotely sensed data acquired from crop fields, especially when evaluating the performance of more advanced deep learning (DL) methods in their early applications (Kim and Lee 2016; Ji et al. 2018). They are still commonly used in agricultural applications, including yield prediction (Kuradusenge et al. 2023). A common usage of SVMs is in combination with the CNNs (see Sect. 4.2) as a classification layer working on the features provided by the convolutional layers of the CNN (Tao and Wei 2022).

4.1.2 Bayesian Methods in Yield Prediction

In their simplest form, Bayesian yield prediction models are based on the probabilities:

$$\displaystyle \begin{aligned} {} p(Y_k | \mathbf{x}) = \frac{p(Y_k) p(\mathbf{x}|Y_k)}{p(\mathbf{x})}, \end{aligned} $$
(8)

where \(p(Y_k | \mathbf {x})\) is the probability of certain yield range k given feature vector \(\mathbf {x}\) (posterior probability), \(p(Y_k)\) is the prior probability of having yield in the range k, \(p(\mathbf {x}|Y_k)\) is the likelihood that if yield values are in the range k, certain feature vector \(\mathbf {x}\) has occurred, and \(p(\mathbf {x})\) is the probability of having a certain feature vector \(\mathbf {x}\) in the first place (i.e., evidence). Thus, to determine the model one should have the knowledge on how the probability of observing certain source data values (wavelength band values in remote sensing or temperature/precipitation sums, for example) relates to the probability of having yield in certain ranges. Once the probabilities have been determined using the training data, the model can be used for obtaining the posterior probability of future yield values given the input feature vector.
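A minimal sketch of Eq. 8 with a single discretized feature is given below. The prior and the likelihood are estimated from training data by simple counting, and the posterior is obtained by normalizing with the evidence. The feature classes ("wet" and "dry" precipitation sums), the yield ranges ("low" and "high"), and the training samples are all hypothetical.

```python
from collections import Counter


def fit(samples):
    """Estimate p(Y_k) and p(x | Y_k) by counting.

    samples: list of (feature_class, yield_range) pairs.
    """
    n = len(samples)
    range_counts = Counter(k for _, k in samples)
    joint_counts = Counter(samples)
    prior = {k: c / n for k, c in range_counts.items()}
    likelihood = {(x, k): c / range_counts[k] for (x, k), c in joint_counts.items()}
    return prior, likelihood


def posterior(x, prior, likelihood):
    """Posterior p(Y_k | x) for every yield range k (Eq. 8).

    Note: a feature class never seen in training gives zero evidence and
    would need smoothing, which is left out of this sketch.
    """
    unnormalized = {k: prior[k] * likelihood.get((x, k), 0.0) for k in prior}
    evidence = sum(unnormalized.values())  # p(x)
    return {k: v / evidence for k, v in unnormalized.items()}


# Hypothetical training data: (precipitation class, yield range).
samples = ([("wet", "high")] * 3 + [("dry", "high")] +
           [("wet", "low")] + [("dry", "low")] * 3)
prior, likelihood = fit(samples)
post = posterior("wet", prior, likelihood)
print(post)
```

With continuous features the counts would typically be replaced by fitted probability density functions, but the structure of the calculation stays the same.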

The Bayesian method has the additional advantage of obtaining the uncertainty of the predicted yield values. Also, information about the sensitivity of the model output to changes in the input variables is inherently present in the model, while in the case of DL models, Monte Carlo analysis should be performed to assess the sensitivity of the model to its input. Bayesian inference is also widely used with physics-based models. The probabilities in Eq. 8 can be based on physical models and the knowledge on the underlying phenomena instead of using the training data.

An example of maize yield prediction based on temperature and precipitation using Bayesian inference is presented in Shirley et al. (2020). In Bazrafshan et al. (2022) Bayesian analysis is used to quantify the uncertainty of the parameters and input variables of yield prediction models that rely on other techniques such as multi-layer perceptrons or neuro-fuzzy models.

4.1.3 Decision Trees in Yield Prediction

The basic idea behind decision trees is to use expert knowledge in classifying the input feature vector by comparing the values of the features to predetermined thresholds in a step-by-step manner. Like the models described in Sect. 4.1.1, decision trees divide the feature space into subareas; however, the resulting subareas are rectangles bordered by the threshold values used in the tree. From their basic form, decision tree models have developed into ensemble structures where a large number of individual decision trees are applied and their outputs are aggregated according to some rules. These methods are commonly referred to as Random Forest (RF) classifiers. In the context of machine learning, the thresholds used at the tree nodes are determined based on the training data. Also, the structure of the trees can be optimized (referred to as tree pruning). A gradient boosting approach to decision tree ensembles is provided by the eXtreme Gradient Boosting (XGBoost) software library, which includes algorithms for penalization of trees, tree pruning, randomization (to avoid overfitting), and automatic feature selection.
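How a threshold at a tree node can be determined from training data, and how the outputs of several trees can be aggregated, is sketched below in a deliberately reduced form. Each "tree" is a single-node regression stump on one feature, and the ensemble simply averages one stump per feature. This is not a full Random Forest (there is no bootstrapping or random feature selection), and the soil pH values, clay contents, and yields are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the threshold on one feature that minimizes the squared error.

    Returns (threshold, mean_left, mean_right); 'left' means x <= threshold.
    """
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0  # candidate threshold between neighboring values
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        m_left = sum(left) / len(left)
        m_right = sum(right) / len(right)
        sse = (sum((y - m_left) ** 2 for y in left) +
               sum((y - m_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    return best[1:]


def predict_stump(stump, x):
    """Predict yield by comparing the feature value to the learned threshold."""
    thr, m_left, m_right = stump
    return m_left if x <= thr else m_right


def predict_ensemble(stumps, features):
    """Aggregate the stump outputs by averaging (one stump per feature)."""
    preds = [predict_stump(s, x) for s, x in zip(stumps, features)]
    return sum(preds) / len(preds)


# Hypothetical training data: soil pH, clay content (%), yield (t/ha).
ph = [5.0, 5.5, 6.5, 7.0]
clay = [40.0, 35.0, 20.0, 15.0]
yields = [3.0, 3.2, 5.0, 5.2]

stump_ph = fit_stump(ph, yields)
stump_clay = fit_stump(clay, yields)
print(stump_ph, stump_clay)
print(predict_ensemble([stump_ph, stump_clay], [6.8, 38.0]))
```

Each stump splits its feature axis into two intervals, so combining thresholds on several features produces the rectangular subareas mentioned above.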

Several studies applying SVMs to perform yield prediction also use RF classifiers in comparison (Kim and Lee 2016; Jhajharia et al. 2023). In Jhajharia et al. (2023) the RF classifier outperformed several other methods including SVM and LSTM (see Sect. 4.3). This indicates that the conventional prediction models still have their advantages despite the shift in the main focus of machine learning-based yield prediction toward DL models. In Huber et al. (2022) the XGBoost model is compared to DL models in soybean yield prediction with the advantage of a more transparent prediction process. The authors encourage further experiments with the XGBoost model for other crops and geographical areas.

4.2 Convolutional Neural Network Models

Convolutional neural networks are probably the most widely used deep learning neural network architecture so far. The introduction of the pioneering 7-level LeNet-5 architecture meant the beginning of a new era in image analysis (Lecun et al. 1998). The main component of the model is the convolution operation, where a set of trainable kernels is applied to the input image, resulting in a set of features describing the data. The model learns basic features in the first layers and composite features in further layers. A fully connected (FC) network layer is then used after the convolutional layers to perform the classification. Structures where the FC layer is replaced by other classifiers such as the SVM have also been widely used.
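The convolution operation at the heart of a CNN can be illustrated with a few lines of code. The sketch below applies a single kernel to a small single-band image without padding and passes the result through the ReLU activation. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking the operation is a cross-correlation. The image and the kernel are hypothetical; in a real network the kernel values would be learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Rectified Linear Unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 2 x 4 image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
edge_kernel = [[-1.0, 1.0]]  # responds to a dark-to-bright transition

features = relu(conv2d_valid(image, edge_kernel))
print(features)
```

The resulting feature map is nonzero only at the position of the edge, which is the kind of basic feature the first layers of a CNN tend to learn.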

In addition to the feature-extracting convolutional layers, several other properties of the CNNs have contributed to their popularity. The Rectified Linear Unit (ReLU) activation function used after the convolution operator, the batch normalization and pooling layers, as well as using regularization in the loss function used in error backpropagation to avoid overfitting constitute some of the properties behind the success of CNNs. As the most common application area of CNNs is image analysis, they are especially suitable for yield prediction based on remote sensing imagery. However, the kernel filters of CNNs can also be applied to one-dimensional input such as time series. On the other hand, using three-dimensional kernels (3D CNN), sequences of images (or other type of input data) can be used for yield prediction (Nevavuori et al. 2020).

The use of CNNs has been extensively studied in the context of smart farming and agriculture and several comprehensive reviews have been published on the subject. In a review published in 2018, the use of CNNs in agriculture has been considered in a set of 23 papers published between 2014 and 2017 (Kamilaris and Prenafeta-Boldú 2018). It was found that the most popular application areas of CNNs were fruit counting, plant recognition, land cover classification, weed identification, and disease detection, with one paper considering maize yield estimation (Kuwata and Shibasaki 2015). In a later review on using machine learning techniques specifically for crop yield prediction, 50 papers were considered (van Klompenburg et al. 2020). Of these, 30 papers applied deep learning models, CNN being the most popular with 15 cases. In some cases CNNs were combined with LSTMs (see Sect. 4.3) or some modification of the basic CNN architecture (such as Region-based CNN, R-CNN) was used.

4.3 Recurrent Neural Network Models

Recurrent Neural Networks (RNNs) form a subclass of deep learning architectures designed to analyze sequential data. As the term recurrent implies, the output of a network node can be used as an input to the same node at the next step of the sequence, forming loops. Another way to look at the network structure is having multiple nodes operating on consecutive elements of the sequence (feature vectors corresponding to consecutive sampling instances of data sources, for example). In addition to the input values, a state variable from the previous node is fed to each network node. The output of the network can be taken from all nodes forming an output sequence or just from the last node (if, for example, a single crop yield value is to be obtained based on a sequence of input feature vectors). Also, CNN layers can be applied to the input data before feeding them to the RNN nodes to automatically extract features, or FC layers can be applied to the RNN outputs for classification.

A node of an RNN structure is more complex than what is usually considered a node in a conventional neural network or in a CNN architecture, as it contains several trainable parameter matrices and gates. Several modifications of RNN nodes have been introduced. The most popular RNN subclass in agricultural applications seems to be Long Short-Term Memory (LSTM). The main idea behind the LSTM node architecture is to avoid vanishing or exploding gradients when training the network using backpropagation. Two general concepts in the LSTM help it learn temporal features from data. The first is the concept of memory, introduced as the cell state. The second is the concept of gates, which are effectively trainable FC layers that manipulate the cell state in response to new inputs from the data and past outputs of the model. To handle a sequence of data, the model loops over the sequence, altering its cell (C) and hidden (H) states in the process using a combination of learned parameters and nonlinear activation functions.
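
The gate mechanism described above is commonly written as the set of update equations below. This is the widely used textbook formulation, given as an illustration rather than as the exact variant used in any study cited in this chapter. The symbols \(x_t\) (input at step \(t\)), \(W\) and \(b\) (trainable parameters), \(\sigma\) (logistic sigmoid), and \(\odot\) (element-wise product) are notational assumptions, while \(C\) and \(H\) denote the cell and hidden states as in the text.

\[
\begin{aligned}
f_t &= \sigma\left(W_f [H_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [H_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C [H_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [H_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
H_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
\]

In this formulation the cell state \(C_t\) is updated additively, which is generally understood to be the property that lets gradients propagate over many steps with less attenuation.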

In the review by van Klompenburg et al. (2020), eight papers were found that applied either LSTMs or hybrid methods including LSTMs to yield prediction. A more recent review on deep learning methods for crop yield prediction using remote sensing data considered 44 papers (Muruganantham et al. 2022). It was found that since 2018 the number of papers on the subject has been increasing exponentially and that LSTMs are gaining popularity, with 30% of the studies applying this model. Also, various hybrid architectures and subclasses of CNNs and LSTMs have been applied. In Nevavuori et al. (2020) we tested four different models (pretrained CNN, CNN-LSTM, convolutional LSTM, and 3D CNN) for the prediction of wheat, barley, and oats yield based on a sequence of UAV-based RGB data and found that the lowest prediction error was obtained with the 3D CNN model, while the CNN-LSTM model performed in a more stable manner (i.e., did not produce ill-fitted predictions for individual inputs).

4.4 Transformer Networks

Recently, a new deep learning architecture, generally called the transformer network, has been presented. The basic transformer architecture was first introduced in Vaswani et al. (2017) for natural language processing applications such as translation. Transformer networks are based on the encoder-decoder architecture with a connection between the two. Like RNNs, transformer networks are designed for the analysis of sequences of data. However, instead of processing the data sequentially by network nodes, they consider joint information between all pairs of elements of the sequence through a set of computations in the multi-head attention block. A desired output sequence is fed to the decoder part for training the model and is processed by the masked multi-head attention block, the output of which is combined with the information coming from the encoder and fed to another attention block. FC layers are also used in both the encoder and the decoder.
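
The pairwise computation at the core of the attention block can be sketched as single-head scaled dot-product self-attention, following the general form given in Vaswani et al. (2017). The sketch below uses only the Python standard library; the function names, the identity projection matrices, and the toy input are assumptions made for illustration. A multi-head block would run several such heads in parallel with different learned projections and concatenate their outputs.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the sequence X."""
    Q = [matvec(W_q, x) for x in X]   # queries
    K = [matvec(W_k, x) for x in X]   # keys
    V = [matvec(W_v, x) for x in X]   # values
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Score this element against every element of the sequence
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors
    outputs = [
        [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        for w in weights
    ]
    return outputs, weights

# Toy input (assumed): a sequence of three 2-dimensional elements
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # untrained placeholder projections
outputs, weights = self_attention(X, identity, identity, identity)
```

Each row of `weights` sums to one and states how strongly one element of the sequence attends to every other element, regardless of how far apart they are in the sequence.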

Transformer networks have outperformed other deep learning structures in language models and in linguistic Artificial Intelligence. Recently, they have also been applied successfully to image analysis (using the Vision Transformer (ViT) architecture (Dosovitskiy et al. 2021)) as well as to other forms of source data. In image analysis, the image blocks are considered as the elements of a sequence. The blocks are encoded together with information about the position of each block within the image before being fed to the multi-head attention block. When considering yield prediction based on remote sensing data, transformer networks have the advantage of making better use of long-range and multi-level dependencies across the regions within the image (spatial dependencies) as well as long-term time dependencies in a sequence of images.
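
As a simplified illustration of how an image becomes a sequence, the sketch below splits a small single-band image into square blocks and pairs each flattened block with its position in the sequence. The function name `image_to_patch_sequence` and the toy 4 x 4 image are assumptions. In the actual ViT architecture the flattened blocks are additionally passed through a learned linear projection and combined with learned position embeddings, which is omitted here.

```python
def image_to_patch_sequence(image, patch_size):
    """Split a 2-D image (list of rows) into flattened square patches.

    Returns a list of (position_index, flattened_patch) pairs in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            sequence.append((len(sequence), patch))
    return sequence

# Toy 4 x 4 single-band image (assumed); pixel value = 4 * row + column
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_sequence(image, 2)
```

With a patch size of 2, the 4 x 4 image yields a sequence of four elements, each holding four pixel values and a position index.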

At the time of writing this chapter, only a few studies could be found applying transformer networks to crop yield prediction. In Liu et al. (2022) a modified version of the transformer network, called Informer (Zhou et al. 2021), was used for rice yield prediction across the Indian Indo-Gangetic Plains by combining time-series satellite data and environmental variables. The Informer model was found to give, almost consistently, higher \(R^2\) and lower RMSE and MAPE than the other tested models (the Least Absolute Shrinkage and Selection Operator (LASSO), RF, XGBoost, and a modification of the LSTM). In Bi et al. (2022) two transformer networks, the ViT for image analysis and another transformer module for time series analysis, were used for the prediction of soybean yield. The authors claim a reduction of 40% in the prediction error compared to the baseline models of CNN combined with Linear Regression and CNN-LSTM. In other studies, transformer networks have been used for crop disease detection (Jubair et al. 2021) and crop classification (Weilandt et al. 2023). These early studies are promising, and given the success of transformer models in other application areas we can expect rapid growth in their application to crop yield prediction as well.
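
For readers unfamiliar with the error measures mentioned above, the following sketch computes \(R^2\), RMSE, and MAPE using their common textbook definitions. The yield values are made up for illustration and do not come from any cited study, and the cited papers may define \(R^2\) differently (for example, as a squared correlation coefficient).

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the yield itself."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error; requires non-zero observed values."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted yields (e.g. in t/ha)
observed = [4.0, 5.0, 6.0]
predicted = [4.5, 5.0, 5.5]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"R^2  = {r_squared(observed, predicted):.2f}")
```

A better model gives a higher \(R^2\) and lower RMSE and MAPE; RMSE is expressed in the units of the yield, whereas MAPE is relative and therefore easier to compare across crops and regions.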

5 Discussion and Conclusions

This chapter is an attempt to give a brief overview of the techniques and technologies used for the prediction of crop yield. The number of studies dealing with the task has increased exponentially during recent years. One reason might be that smart farming and precision agriculture have gained a lot of attention and that the amount and variety of data available for developing yield prediction methods has also increased, especially in the area of remote sensing. Satellite data have become freely available from various sources, and drones are now within the reach of all interested users. The role of data in agriculture has been intensively discussed, and rules are being developed to determine the ownership and value of data. This gives incentives to develop algorithms and tools that make use of the data and provide additional value for stakeholders. The largest increase in studies concerning yield prediction is related to applying novel deep learning methods to the task. Yield prediction is a favorable task for testing and applying these methods, as the reference data are relatively easy to obtain using yield monitors, for example.

We have included a brief overview of yield monitoring models based on plant physiology in this chapter. This is usually considered a subject separate from machine learning-based yield prediction. This can be justified, as the aims of the two types of models are different and obtaining an accurate yield forecast is not the primary goal of physics-based models. However, we suggest that combining these two branches of research deserves more attention. From the point of view of yield forecasting, the machine learning models can be considered as metamodels for physics-based crop growth models. Also, machine learning can be used within physics-based models to assist in determining model parameters and in the calibration of the model.

There is a virtually infinite set of possibilities to test and evaluate various models for crop yield prediction. The models vary according to the crops and their varieties, climatic conditions, model structures, soil types, crop management, etc. For a model to be used in practical decision-making, its use cases and limitations should be well defined. Linking the deep learning models to physical properties of the growth conditions and plant physiology makes the models more reliable and encourages their use.