1 Introduction

By the middle of this century, the global population will reach roughly 9.7 billion [1], including about 1 billion people who will be chronically undernourished and suffer from multiple nutritional deficiencies [2]. Generous estimates predict that a 60 per cent increase in global food production (over 2005–2007 levels) must be achieved to negotiate this monumental challenge [3]. Besides attaining this production level, a key consideration has to be equitable access to the food produced. Thus, the direct and indirect costs of food production must remain plateaued. This, in turn, can be attained by optimizing inputs such as water, fertilizers, insecticides, pesticides and weedicides; controlling postharvest losses; preserving produce quality during storage; maintaining the cold chain; and so on.

Several challenges must be met while pursuing these increased agricultural production figures. The coming decades will witness severe water scarcity as agricultural water use rises from 3220 to 5152 km3 by 2050 [4]. Similarly, there will be a marked decrease in arable land, as soil erosion will carry away 33 billion tons of arable soil, and fertilizer usage will rise from the present 190 million tons to a new high of 223 million tons [5]. This increase in fertilizer application will contribute to water pollution, raising nitrogen and phosphorus loads to 150 and 130 per cent of current levels, respectively [6]. Crop water requirements need to be optimized with respect to the water stress invoked by deficit. Soil moisture correlates directly with the water available to plants and is therefore widely used as an indicator of water stress [7, 8]. While severe water stress can reduce productivity, affect produce quality and facilitate the onset of disease [9], moderate water stress has been reported to improve the quality of agro-produce [10]. Regulated deficit irrigation (RDI) can be a potent technological intervention for reducing the staggering amount of water used for irrigation [11].

Providing food security and ensuring sustainability in agricultural production while decreasing the environmental impact of agriculture can be made possible by precision agricultural practices [12, 13]. Precision agriculture is a data-driven, technologically enabled, sustainable farm management system that requires the deployment of internet-of-things (IoT) [14] based sensors [15] for crop stress phenotyping [16], assessing nutrient requirements [17], analysing crop growth [18] and using unmanned vehicles for computer vision-based weed and disease identification [19]. All this information is compiled by suitable software tools in smart embedded devices to form a resilient artificial intelligence (AI)-based decision support system for the agroecosystem [20]. Successful realization of precision agriculture applications will reduce production cost and optimize labour, energy and space, ultimately leading to enhanced profits from farming.

It is estimated that between 30 and 50 per cent (1.2–2 billion tons) of the food produced on the planet is not consumed [21]. These losses are shared roughly equally between postharvest losses (quantitative losses due to managerial and logistic issues) and food wastage (qualitative losses attributed to biochemical changes within the food matrix). The blue water footprint of this lost food is about 250 km3 annually [22]. The external and internal quality of food and agro-produce can be assessed cost-effectively and non-destructively by spectroscopic sensing approaches [23,24,25,26]. However, heterogeneity of samples, spectrometers and environments introduces considerable inconsistency into spectral data, leading to numerous problems during quality evaluation [27]. During feature extraction, chemometric models should therefore be robust and inherently unaffected by detection conditions and biological variability of the samples.
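One common way to suppress such sample- and instrument-dependent inconsistencies before chemometric modelling is standard normal variate (SNV) correction. The sketch below is illustrative only; the synthetic spectra and noise level are assumptions, not data from any cited study:

```python
import numpy as np

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row)
    to remove baseline offsets and multiplicative scatter effects."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Two synthetic NIR-like spectra of the same sample: the second carries
# a baseline offset (+0.2) and a multiplicative scatter effect (x1.5).
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 100)) + rng.normal(0, 0.01, 100)
pair = np.vstack([base, 1.5 * base + 0.2])

corrected = snv(pair)
# After SNV the two spectra coincide, since the affine distortion cancels.
residual = float(np.abs(corrected[0] - corrected[1]).max())
```

Because SNV removes exactly the additive and multiplicative components, the two corrected spectra become numerically identical, which is why it is a standard first step before robust model building.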

A penchant for prediction becomes an obsession for humans when uncertainty prevails over outcomes. Agriculture is one such set of activities where uncertainty lurks behind every operation, and for operations involving engineering interventions the associated challenges carry substantial monetary baggage as well. Unreliable expertise in judging and foreseeing unpleasant situations has constantly prompted humans to devise tools and methods to get ahead of time and opt for corrective measures to reap a rich harvest sustainably. A tool of relevance for predicting situations and causes in agriculture is the deep learning network. Deep learning comprises a broad category of machine learning techniques wherein features are learnt in a hierarchical fashion. It can successfully handle computer vision tasks, including image classification, detection and segmentation [28]. In essence, simple modules stacked in numerous layers all learn while simultaneously computing nonlinear input–output mappings. Each module transforms the representation of its input to increase selectivity and invariance. Multiple nonlinear layers make it possible for a deep learning architecture to implement extremely intricate input–output functions while remaining sensitive to minute details. This allows a deep learning model to distinguish, say, a diseased leaf from a healthy leaf while disregarding the background, orientation, lighting or surroundings.

The past decade has seen a deluge of sensors and transducers coupled with various electronic gadgetry to record the responses of the various vectors causing detrimental effects in agricultural production. This plethora of sensors generates massive volumes of data, and interpreting these data to decipher valuable information poses a worthy challenge across all disciplines of agricultural engineering. Deep learning has made the extraction of features from complex nonlinear data simpler, using the convolutional neural network (CNN) [29] and the recurrent neural network (RNN), which includes the long short-term memory (LSTM), bidirectional long short-term memory (bi-LSTM) and gated recurrent unit (GRU) [30,31,32]. Other deep learning architectures include the deep belief network [33], autoencoders [34] and the deep Boltzmann machine.

Deep learning can be used to carry out big data analysis for computer vision [35] applications related to plant water stress management [36] and help in formulating RDI protocols for efficient water management. Extraction of information from spectral data representing local and global features of agro-produce can be effectively carried out by deep learning approaches [37]. Deep learning can handle complex image-based plant phenotyping tasks such as leaf counting [38], disease detection [39] and pixel-wise localization of roots, shoots and ears [40]. All this information can be combined to support the development of intelligent agricultural machinery [41].

During the past century, agricultural engineers have contributed immensely to several path-breaking advancements in agricultural mechanization [42]; these professionals have been instrumental in guiding the agrarian community through its transition from machinery operators to machinery supervisors, thus enabling them with precision agricultural technologies encompassing precision water management, intelligent use of agricultural machinery and smart postharvest management of agricultural produce. This paper embodies an amalgamation of deep learning applications across the different facets of agricultural engineering. A literature search revealed that deep learning has been applied to a wide range of issues related to the subject of this paper; selection and rejection were carried out methodically (Fig. 1) to offer readers a comprehensive and holistic read that elucidates the application of deep learning algorithms to engineering interventions in agriculture. Also highlighted in this paper are ways and means for extracting spatio-temporal features to overcome the limitations of conventional approaches, and how deep learning will obviate the hindrances that have held back the widespread realistic adoption of intelligent, smart, IoT-based engineering applications in agriculture. The paper culminates by putting forward the challenges that contemporary deep learning approaches need to address to enable wider effective application and acceptance.

Fig. 1

Methodology adopted for collection and inclusion of relevant literature pertinent to this review paper

2 Deep learning versus contemporary chemometrics

The near-infrared (NIR) spectrum, spanning 780–2500 nm, has been widely used to register changes in the agri-food system before harvest, in terms of plant attributes, stresses, diseases, yield attributes and weed detection [43]; and after harvest, in terms of varietal differences of the produce, food quality, food contamination, etc. [27]. Excellent results have been demonstrated in the estimation of a range of soil properties, including soil moisture, in the visible and NIR range [44]. In fact, there is an entire gamut of precision agriculture outcomes that can be addressed by deep learning techniques (Fig. 2).

Fig. 2

Figurative representation of the broad scope of application of deep learning techniques in precision agriculture approaches for pre and postharvest operations

Plants undergo various changes in colour and shape, followed by physiological and biochemical changes, in response to attack by pathogens; such attacks often culminate in the onset of disease. Stress induced by disease, water, light or pests has a direct bearing on phytohormones (abscisic acid, auxin and cytokinins), which, for high-throughput analysis, can be deciphered directly only by molecular and serological methods, and indirectly by thermography, fluorescence, spectroscopy or hyperspectral imaging (HSI) and the associated chemometrics. However, susceptibility to ambient environmental conditions and the absence of steady light during imaging restrict the exhaustive use of this technique.

In postharvest agriculture, absorption spectra generally record changes by means of hydrogen-containing groups (e.g. S–H, C–H, N–H, O–H) which are directly related to the proximate composition of agro-products in terms of sugar, protein, fat, acid and water contents. The spectrum is therefore loaded with information on the related biomolecules and other chemical substances. The underlying principle is the Beer-Lambert law, which states that a linear relationship exists between absorbance and the concentration of the absorbing species, so that the absorbance spectrum reflects changes in the chemical composition of the substrate. Variations in the basic biochemical matrix of agro-produce can be effectively captured by different linear and nonlinear chemometric methods. However, linear models often fail to register subtle changes in chemical composition, while nonlinear models always carry the risk of over-fitting. Acquisition of spectral data can never be free of spectral noise. While spectral pre-processing algorithms can handle noise arising from biological variations in chemical composition and from changes in environmental conditions, noise introduced by the physical state of the spectrophotometer drifts the relevance of spectral data far away.
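The Beer-Lambert relation, A = ε·l·c (absorbance proportional to path length and concentration), can be illustrated with a short numerical sketch. The absorptivity value and concentrations below are hypothetical, chosen only to show the linearity that chemometric calibration exploits:

```python
import numpy as np

# Beer-Lambert law: A = epsilon * l * c
epsilon = 120.0   # hypothetical molar absorptivity, L/(mol*cm)
path = 1.0        # cuvette path length, cm

conc = np.array([0.001, 0.002, 0.004, 0.008])   # mol/L
absorbance = epsilon * path * conc

# Inverting the linear relation recovers concentration from absorbance,
# which is the basis of univariate calibration in NIR chemometrics.
estimated = absorbance / (epsilon * path)
```

Real agro-produce spectra superpose many such absorbers plus scatter, which is why multivariate linear and nonlinear chemometric models are needed rather than this single-component inversion.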

It has been widely reported that assigning specific features to soil spectra is difficult because soil is a heterogeneous, complex mixture of materials [45]. Traditional regression models are not suitable for modelling soil moisture content because of the associated nonlinearity and non-stationarity, with parameters that are difficult to measure in the field. These limitations can be minimized by using soft-computing-based, data-driven techniques (machine learning and deep learning) to estimate components of the hydrologic cycle (such as reference evapotranspiration (ET0), runoff and soil moisture) as functions of time and space. The accuracy rendered by these techniques for estimating ET0 and soil moisture (SM) is more or less acceptable; however, their effective use is limited by the quality and span of the time-series data. It is thus well understood that the performance of empirical and ensemble models for predicting short-term daily ET0 depends on the choice of model and the reliability of the input variables. The suitability of these techniques for predicting short-term (1–7 day) ET0 for real-time irrigation scheduling based on actual water requirement is questionable; there is a dire need for models and techniques that can fill this gap.
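As a minimal illustration of such a data-driven short-term predictor (not a validated ET0 model), one can fit a lagged-feature least-squares model to a daily series; the synthetic seasonal series, noise level and 7-day lag below are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic daily ET0-like series (mm/day): seasonal cycle plus noise.
t = np.arange(400)
et0 = 4 + 2 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.2, t.size)

lags = 7  # predict tomorrow's value from the previous 7 days
X = np.stack([et0[i:i + lags] for i in range(t.size - lags)])
y = et0[lags:]

# Train on the first 300 days, test on the remainder (time-ordered split).
Xtr, ytr, Xte, yte = X[:300], y[:300], X[300:], y[300:]
coef, *_ = np.linalg.lstsq(np.c_[Xtr, np.ones(len(Xtr))], ytr, rcond=None)
pred = np.c_[Xte, np.ones(len(Xte))] @ coef

rmse = float(np.sqrt(np.mean((pred - yte) ** 2)))
```

Deep models (e.g. LSTMs) replace the fixed linear map over the lag window with learned nonlinear temporal features, which is where the gains on real, non-stationary hydrologic data come from.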

Deep learning algorithms focus on learning features progressively from data at several levels [46, 47]. As deep learning models learn from data, a clear understanding and representation of the data are vital for building an intelligent system that can make complex predictions. Proper model selection is also crucial, as each architecture has unique features and processes data in different ways. The deep learning architectures applied in the agricultural engineering domain have been based mainly on Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Auto Encoders (AE) [48, 49]. A succinct description of these architectures follows:

The ANN architecture comprises multiple perceptrons, or neurons, at each layer, with neurons in different layers linked by weighted synaptic connections. It consists of an input layer, one or more hidden layer(s) and an output layer (Fig. 3a). An ANN learns patterns by modifying its weights based on the error between actual and predicted output, typically using the back-propagation training algorithm to discover hidden patterns in the dataset. The universal approximation capability and flexible architecture allow ANN models to capture complex nonlinear behaviours in the dataset [50].
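A minimal numerical sketch of such an ANN, with one tanh hidden layer trained by back-propagation on the XOR pattern (hidden-layer size, learning rate, iteration count and random seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy ANN: 2 inputs -> 8 hidden (tanh) -> 1 sigmoid output,
# trained by back-propagation (gradient descent on squared error).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])   # XOR targets

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 0.5

def forward():
    h = np.tanh(X @ W1 + b1)                 # hidden layer
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid output
    return h, out

_, out0 = forward()
initial_mse = float(np.mean((out0 - y) ** 2))

for _ in range(5000):
    h, out = forward()
    # Back-propagate the error signal through the synaptic weights.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

_, out = forward()
final_mse = float(np.mean((out - y) ** 2))
```

The error decreases as the weights are adjusted, which is the "learning patterns by modifying the weights based on error" described above; XOR is used because a network with no hidden layer cannot fit it.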

Fig. 3

Schematic representation of the various DL architectures a ANN b CNN c RNN d AE

CNN is extensively used in computer vision-based systems that automatically extract features and perform tasks such as image classification and semantic segmentation. It has been successfully utilized in several challenging visual analysis tasks in agriculture, such as pest and disease identification, stress detection and weed identification, and has achieved tremendous performance in visual image analysis tasks previously considered to be purely within the human realm [51]. By applying various convolutional filters, the models extract high-level representations of the data, making them versatile for tasks such as image classification (Fig. 3b). A CNN has three main types of layers, namely convolutional, pooling and fully connected layers. The convolutional layer generates the feature map capturing the essential features. Pre-trained CNN models such as LeNet [52], AlexNet [53], VGG16 [54], InceptionV3 [55], GoogleNet [56], ResNet [57], MobileNet [58], Xception [59], DenseNet [60] and Darknet53 [61] have been successfully deployed in several computer vision applications [62].

RNN is a class of artificial neural network that addresses time-series problems involving sequential data (Fig. 3c). Unlike feed-forward neural networks, RNNs can use their internal memory to process sequential data. The distinctive feature of RNNs is their capability to carry information across time steps: the same function is applied to each input, while the output for the present input depends on past computation. After the output is generated, it is copied and fed back into the recurrent network; thus, for decision making, the network considers both the current input and what it has learnt from previous inputs [63].
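The convolution-and-pooling mechanism behind CNN feature maps can be sketched in a few lines; the toy "image" and the hand-picked edge filter below are assumptions for illustration, not part of any cited study:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D cross-correlation: slide the filter over the image
    and produce a feature map, as in a CNN convolutional layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap: np.ndarray, size: int = 2) -> np.ndarray:
    """Max pooling: keep the strongest response in each window,
    giving translation tolerance and a smaller feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A vertical-edge filter responds strongly at the boundary of a
# synthetic patch (bright left half, dark right half).
img = np.zeros((6, 6)); img[:, :3] = 1.0
edge_filter = np.array([[1., -1.], [1., -1.]])

fmap = conv2d(img, edge_filter)
pooled = max_pool(fmap)
```

In a trained CNN the filter weights are learnt rather than hand-picked, and many such filters are stacked in layers so that later feature maps respond to increasingly abstract patterns (lesions, ears, tassels) rather than simple edges.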

AE is a special type of artificial neural network used to learn data encodings in an unsupervised manner (Fig. 3d). The input is compressed by the AE into a lower-dimensional code, from which the final output is reconstructed. The encoder part of the AE performs encoding and data compression and has a decreasing number of hidden units. The latent space of the network holds the compact, compressed form of the input. The decoder part attempts to regenerate the input from the encoded data and has an increasing number of hidden units [64].
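An autoencoder reduced to its essentials (a linear encoder and decoder trained to reconstruct the input) can be sketched as follows; the data dimensions, learning rate and training length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Data lying near a 2-D plane embedded in 8 dimensions.
latent = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 8))
X = latent @ mix + rng.normal(0, 0.05, (200, 8))

# Linear autoencoder: 8 -> 2 (encoder) -> 8 (decoder), trained in an
# unsupervised manner to reconstruct its own input.
We = rng.normal(0, 0.1, (8, 2))   # encoder weights
Wd = rng.normal(0, 0.1, (2, 8))   # decoder weights
lr = 0.01

def recon_error():
    return float(np.mean((X @ We @ Wd - X) ** 2))

before = recon_error()
for _ in range(500):
    Z = X @ We            # latent code (compressed representation)
    Xh = Z @ Wd           # reconstruction
    G = 2 * (Xh - X) / len(X)
    gWd = Z.T @ G         # gradient of reconstruction loss w.r.t. decoder
    gWe = X.T @ (G @ Wd.T)
    Wd -= lr * gWd
    We -= lr * gWe
after = recon_error()
```

Nonlinear activations and deeper encoder/decoder stacks turn this into the AE of Fig. 3d; the principle (force the data through a low-dimensional bottleneck, then reconstruct) is the same.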

3 Production agriculture

Understanding plant phenotyping assumes prominence because it is directly associated with all efforts to increase world food production to meet ever-rising demand. The quantitative study of plant traits such as growth, stress and yield using rapid, non-destructive sensing techniques is an important aspect of high-throughput phenotyping [65,66,67]. In-field measurement of crop parameters can also be accelerated with advances in vision-based technology in agriculture. Phenotyping of plant growth [68], canopy coverage [69], leaf structure [70], weed density and root growth status [71] has been successfully demonstrated and is increasingly important as a way of realizing deep learning-based smart stress management systems. Plant stress occurs when abnormal environmental conditions arise during plant development as a result of biotic (insects, pests, fungi, viruses and weeds) or abiotic (water, temperature, nutrients and toxicity) elements. These plant stresses are capable of threatening global food security. Plant disease outbreaks are a persistent, widespread hazard arising from complicated ecological dynamics, and standard state-of-the-art mechanisms are unable to cope with this challenge. Using image-based stress datasets holds promise and is perceived as a step in the right direction for plant stress management. Significant advances in image processing and machine learning techniques have been made over the last decade. Deep learning-based models have high accuracy and can detect plant stress quickly. This method of stress identification is non-contact, takes less time and yields output that can be used in real-time crop health management.
The standardization of visual assessments, deployment of imaging techniques and application of big data analytics may overcome the limitations of, or improve the reliability and accuracy of, stress assessment in comparison with unaided visual measurement [72, 73]. Compared with traditional computer vision engineering, deep learning helps achieve higher accuracy in crop image detection, stress identification, classification, prediction, quantification and segmentation [72]. The methodology adopted in recent studies deploying deep learning approaches in production agriculture, for measurement of plant characteristics, weed detection, biotic and abiotic stress assessment and yield parameters, is summarized schematically in Fig. 4. The essence of these studies is discussed in subsequent sections.

Fig. 4

Schematic representation of precision input application measures for pre-harvest agriculture in a deep learning work environment

3.1 Measurement of plant attributes

Precise seedling counting, plant stand and panicle count are vital for assessing seedling vigour and estimating crop density and uniformity of emergence for field and plantation crops. Tassel detection and flower counting offer new opportunities for estimating yield and optimizing fruit production without an automatic yield monitoring system, facilitating site-specific crop management [74, 75]. Deep convolutional neural network (DCNN) algorithms (Faster Region-based Convolutional Neural Network (Faster R-CNN), and CNN combined with a support vector machine (SVM)) have significantly advanced approaches for apple flower detection [74]. Wu et al. [76] captured a dataset comprising 147 images under natural, uncontrolled field conditions and validated it appropriately against a previously unseen dataset. A rice seedling dataset consisting of 40 high-resolution aerial images, captured in situ by a red-green-blue (RGB) camera mounted on an unmanned aerial vehicle (UAV) and manually annotated with dots for seedling counting, was analysed with a deep CNN-based technique. Good agreement (> 93%) between manual and automated (UAV image-based) rice seedling counting heralds a new opportunity for high-accuracy yield estimation. Laborious and subjective scoring systems for recognizing cotton flowering patterns and detecting blooms have been replaced by deep learning approaches [77]. The promising results for characterizing flowering patterns among genetic classes and genotypes have been adopted to predict reproductive improvements and were found to be of pivotal importance for crop yield forecasting.
Higher classification accuracy (> 90%) of DCNN for paddy tiller counting [78], Faster R-CNN for characterizing flowering patterns in cotton [79], MobileNet for cotton plant detection using a UAV system [77] and CNN + SVM for flower detection in apple [74] shows the potential for deploying these techniques in online embedded systems for electronically connected yield estimation. TasselNetV2+ outperformed TasselNetV2 for counting wheat (R² = 0.92), maize (R² = 0.89) and sorghum (R² = 0.68) plants using high-resolution (1980 × 1080) field images in less time [80]. It was further reported that, compared with Faster R-CNN, TasselNetV2+ was effective and robust on different plant datasets, namely wheat ear (R² = 0.92), maize tassel (R² = 0.89) and sorghum head (R² = 0.67) counting. This can be attributed to its inherent ability to encode sufficiently good appearance features even at low image resolution, and to its not counting repetitive visual patterns as Faster R-CNN does [81, 82]. TasselNetV2 and TasselNetV2+ perform better than Faster R-CNN [61], whereas Faster R-CNN performs better than TasselNet for detecting maize tassels [80, 82]. ResNet demonstrated far better results than VGGNet in detecting and counting maize tassels from original high-resolution UAV images [82]. An LSTM model incorporating extreme climate events, plant phenology, meteorological indices and remote sensing data across the nine states of the US Corn Belt could predict 76 per cent of corn yield variation [83]. Under extreme weather conditions, the LSTM model proved more robust than other machine learning models such as the least absolute shrinkage and selection operator (LASSO) and random forest (RF) [84, 85].
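The R² values quoted above are coefficients of determination between manual and model counts; a short sketch with hypothetical counts (not data from the cited studies) shows the computation:

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Hypothetical manual vs model tassel counts for six plots.
manual = np.array([12, 18, 25, 31, 40, 47], dtype=float)
model = np.array([13, 17, 26, 30, 42, 45], dtype=float)

r2 = r_squared(manual, model)
```

An R² near 1 means the model's counts track the manual counts almost perfectly; the lower sorghum values above indicate the counts explained less of the plot-to-plot variation for that crop.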

3.2 Abiotic stress assessment

Spectroscopy and imaging are noninvasive abiotic stress identification methods used to discover deficiencies (nutrient, water, seed vitality, etc.) that affect plant vigour. Identification of abiotic stress involves extracting biophysical parameters of plants, such as canopy water content, leaf pigments, canopy nitrogen and light use efficiency, from spectral data. Digital imaging is a simple, low-cost measurement technology that becomes a powerful tool when used with deep learning for stress monitoring applications in precision agriculture. Deep learning has introduced a paradigm shift in 2D RGB image-based plant stress phenotyping [72]. A broad range of deep learning techniques has been used in crop abiotic stress phenotyping, including DCNN [86], AlexNet [87], Faster R-CNN [88], GoogLeNet [87], ResNet [89], RootNav [71], SegNet [90], SW-SVM [91], VGGNet [88, 92] and UNet [71]. Deep learning architectures have succeeded in a vast range of plant abiotic stress phenotyping work, such as crop identification/recognition based on leaf vein morphology [70, 93], leaf counting and tassel detection in maize and sorghum [75, 94,95,96], stalk count and width [97], panicle segmentation in sorghum [98], root localization and feature detection [71, 99], bloom detection, emergence counting and flowering characterization in cotton and apple [74, 77] and soil moisture estimation using thermal images [86].

Deep learning techniques have been used for identifying abiotic stresses in field crops (paddy, maize, soybean, sorghum and wheat) as well as horticultural crops (tomato, potato, okra). VGG-16 architectures proved capable of recognizing and classifying various abiotic stresses in different varieties of paddy crop using 30,000 RGB images, with accuracies of 92.13 and 95.08 per cent, respectively [92]. Non-destructive imaging, such as proximal and remote sensing, has been used for deep learning-based abiotic stress identification under field conditions with varying illumination, background and crop colour, size and shape. It was concluded that the accuracy of object detection depends on the right selection of deep learning tools, an optimal number of high-resolution images and appropriate image dimensions [79]. Across these studies, a common observation was that deep learning-based object detectors such as AlexNet, Faster region convolutional neural network (Faster R-CNN), GoogLeNet, Inception V3, SW-SVM with VGG-16, ResNet and SegNet performed far better than other architectures in identifying plant abiotic stresses.

In a vision-assisted precision agriculture application, a DCNN model was developed to identify water stress in maize and soybean. Three frameworks, i.e. AlexNet, GoogLeNet and Inception V3, were used as unsupervised techniques to precisely separate the visual cues representing water stress on plant leaves. GoogLeNet was found to be superior, with accuracies of 98.3 and 94.1 per cent for maize and soybean, respectively. It was inferred that digital RGB image cues contribute maximally to the deep learning model for decision management [87]. Unsupervised localization of RGB image cues is used to identify the abiotic stress level [16]. To identify and visualize abiotic stresses in horticultural crops (tomato, potato and okra), images of various nutrient (excess or deficiency), soil moisture (excess or deficit) and canopy temperature (low or high) stresses were not available from public databases. AlexNet and GoogLeNet architectures were used in most stress identification studies in vegetable crops, and GoogLeNet outperformed AlexNet in terms of accuracy [87, 100]. All told, recent studies indicate the growing potential of deep learning for plant stress identification and classification; this potentially avoids laborious and expensive stress-region judgment by field specialists and opens up the scope for image-based plant phenotyping, leading to the development of user-friendly precision agriculture tools.

3.3 Detection and classification of plant disease

Plant health monitoring and disease diagnosis are essential in the early stages of plant growth to prevent disease transmission, enabling effective crop management before significant crop damage occurs. Plant disease identification is traditionally done manually, either by visual observation or under a microscope. These methods are time-consuming and labour-intensive, and they carry a substantial risk of misidentification due to the subjective perception of the human mind. The task of plant disease identification can be accelerated by adopting advanced technologies based on image processing and artificial intelligence. Deep learning, which uses good-quality images as source data, is gaining popularity for crop health monitoring and management, in line with the development of artificially intelligent systems. Deep learning architectures such as AlexNet, GoogLeNet, ResNet, VGG and DenseNet have been successfully used to identify and classify plant diseases in food crops such as wheat [101], maize [102], rice [103] and millets [104]; cash crops such as sugarcane [105], tobacco [106], cotton [107] and jute [108]; plantation crops such as coffee [109], coconut [110] and tea [111]; and horticultural crops such as tomato [112], ladyfinger [113], apple [114] and grape [115].

Many researchers have used images of diseased plants from public databases to train deep learning architectures. Mohanty et al. [39] trained deep convolutional neural networks on 54,306 images of healthy and diseased plants from a public database and identified 14 diseases across 26 crops using the GoogLeNet architecture with an accuracy of 99.35 per cent. An open database of 87,848 images of 25 crop types, with 58 distinct plant-disease classes, was used by Ferentinos [116]. The dataset was split into an 80/20 training/testing ratio, the split most commonly used in neural network applications. The AlexNet, AlexNetOWTBn, GoogLeNet, Overfeat and VGG architectures were trained to identify the various classes; VGG achieved the highest success rate, 99.53 per cent. Too et al. [117] achieved an accuracy of 99.75 per cent with the DenseNets architecture using the same database as Mohanty et al. [39]. A deep CNN deployed on 70,295 images of the same database obtained an accuracy of 99.78 per cent with ResNet [118]. The accuracy of disease detection and classification reportedly increased with the evolution of DCNN architectures. Black sigatoka and speckle diseases were identified and classified in banana [100]: the images were obtained from an open source and trained using the LeNet architecture, with features extracted by convolution and pooling layers; the model identified and classified both diseases with 99.72 per cent accuracy. Tomatoes are susceptible to diseases such as late blight, two-spotted spider mite, target spot, leaf mould, mosaic virus and yellow leaf curl virus, which reduce production and impair quality. A collection of 13,262 diseased tomato leaf images from the PlantVillage dataset was used to train AlexNet and VGG16 architectures for non-destructive estimation of the extent of disease [112].
AlexNet showed good classification accuracy (97.49 per cent) at minimum runtime compared with VGG16 (97.26 per cent). Ji et al. [115] proposed a UnitedModel for grape leaf disease detection based on InceptionV3 and ResNet50, and compared it with the VGGNet, GoogLeNet, DenseNet and ResNet architectures. Leaf images (1619 in number) of black rot, esca and isariopsis leaf spot diseases were taken from the PlantVillage dataset. The UnitedModel extracts more representative features by combining the width of InceptionV3 with the depth of ResNet50, yielding 98.57 and 99.17 per cent test and validation accuracy, respectively, for grape leaf disease detection.
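The 80/20 split used in these studies can be sketched as a simple shuffled index partition (the dataset size here is a placeholder, not that of any cited database):

```python
import numpy as np

rng = np.random.default_rng(3)

# 100 hypothetical image indices to be partitioned for training/testing.
indices = np.arange(100)

# Shuffle, then take the first 80% for training and the rest for testing.
perm = rng.permutation(indices)
split = int(0.8 * len(perm))
train_idx, test_idx = perm[:split], perm[split:]
```

Shuffling before splitting matters: the reported accuracies are only meaningful if the test images are disjoint from, and distributed like, the training images.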

Several researchers used cameras and smartphones to capture digital images of diseased plant leaves and trained deep learning algorithms for disease detection and classification. Rangarajan and Raja [113] collected 2554 digital images to classify ten major diseases affecting the leaves of eggplant, hyacinth bean, lime and ladyfinger plants. Six pre-trained CNN models, viz. AlexNet, VGG16, VGG19, GoogLeNet, ResNet101 and DenseNet201, were used to identify and classify the diseases. Among the architectures tested, GoogLeNet performed best, with a validation accuracy of 97.3 per cent. Prune crops such as peach, cherry and apricot are widely grown in temperate and subtropical regions. Virus-infected prune trees show growth depleted by 10–30 per cent, resulting in yield decreases of 20–60 per cent compared with healthy trees [120]. A deep learning approach was used for plant disease and pest detection in prunes [121]. A total of 1995 images of leaves affected by eight different diseases and pests were collected for the experiment. The transfer-learning-based pre-trained models GoogleNet, AlexNet, VGG16, VGG19, ResNet50, ResNet101, Inception-V3, InceptionResNetV2 and SqueezeNet were used for feature extraction, and the extracted features were evaluated with SVM, Extreme Learning Machine (ELM) and k-Nearest Neighbours (kNN) classifiers. The maximum disease detection accuracy, 97.86 per cent, was achieved by ResNet50 with the SVM classifier. Plantation crops such as cotton, coffee, tea and sugarcane are widely cultivated and have high economic value; infestation of such crops with diseases brings a huge economic shock for farmers. Deep learning-based techniques have been used for precise disease management and quality improvement by minimizing yield loss.
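The feature-extraction-plus-classifier pipeline described above can be sketched with synthetic vectors standing in for CNN-extracted features and a small kNN classifier (one of the classifiers compared in [121]); everything below is illustrative, not the cited implementation:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-ins for CNN-extracted feature vectors: two disease
# classes as well-separated clusters in a 16-D feature space.
train_feats = np.vstack([rng.normal(0, 1, (30, 16)),
                         rng.normal(4, 1, (30, 16))])
train_labels = np.array([0] * 30 + [1] * 30)

def knn_predict(x: np.ndarray, k: int = 3) -> int:
    """Classify a feature vector by majority vote of its k nearest
    training features (Euclidean distance)."""
    d = np.linalg.norm(train_feats - x, axis=1)
    nearest = train_labels[np.argsort(d)[:k]]
    return int(np.bincount(nearest).argmax())

query = rng.normal(4, 1, 16)   # a new sample drawn near class 1
pred = knn_predict(query)
```

In the real pipeline the feature vectors come from the penultimate layer of a pre-trained network such as ResNet50, so the shallow classifier only has to separate already-discriminative representations; that is what makes transfer learning work with small leaf-image datasets.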
A total of 13,842 manually captured images of diseased plants were used to train and test a DCNN model for the recognition of smut, grassy shoot, rust and yellow leaf diseases in the sugarcane crop [105]. The sugarcane diseases were successfully identified and classified with an accuracy of 95 per cent. Esgario et al. [109] used smartphones to collect 1747 images of coffee leaves infected with leaf miner, rust, brown leaf spot and cercospora leaf spot diseases. The AlexNet, GoogLeNet, VGG19 and ResNet50 architectures were used for classification and estimation of disease severity, and ResNet50 performed best with an accuracy of 95.63 per cent. Hu et al. [111] collected 144 images of diseased tea plant leaves, showing leaf blight, bud blight and red scab, to improve the performance of the CIFAR10 model for disease identification with a small number of images. The number of model parameters was reduced in the proposed deep learning architecture to improve the detection process. The improved CIFAR10 model correctly identified tea leaf diseases with 92.5 per cent accuracy. Detection of diseases (cercospora, bacterial blight, ascochyta blight and target spot) in cotton leaves could be achieved with 96 per cent accuracy after training a DCNN on 500 manually collected images [107]. Rice, wheat, maize and soybean are cultivated on a large scale worldwide and are considered important food and feed grains. Diseases can spread easily in these crops, resulting in significant yield losses. Numerous studies have been conducted on the identification and classification of diseases in food grain crops using deep learning based techniques. A deep CNN based algorithm has been used to classify blast, bakanae, false smut, brown spot, sheath blight, bacterial leaf blight, sheath rot, bacterial sheath rot, bacterial wilt and seedling blight diseases in the rice crop [103].
The model was trained on images of diseased plants captured with a camera together with images gathered from public sources (500 in total). The accuracy of the DCNN for disease classification was found to be 95.48 per cent. Lu et al. [101] used 9230 wheat plant images from a public database to train deep learning architectures for recognizing powdery mildew, stripe rust, smut, leaf blotch, black chaff and leaf rust diseases in wheat plants. VGG-FCN-VD16 and VGG-FCN-S were found to have recognition accuracies of 97.95 and 95.12 per cent, respectively. Wu et al. [122] identified bacterial rot, downy mildew, pest and spider mite damage in soybean after training deep learning models with 1470 leaf images. ResNet outperformed other architectures such as AlexNet and GoogLeNet, demonstrating an accuracy of 94.29 per cent. Deep learning based approaches for disease detection and classification have thus been employed by numerous researchers in a variety of crops. In addition, deep learning architectures have been updated to improve accuracy and make better predictions in challenging environments. Plant pathologists and farmers will be able to diagnose plant diseases early and take necessary precautions with on-the-go application of deep learning based technologies.
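The transfer-learning pattern that recurs in these studies — a frozen pre-trained backbone used as a feature extractor, with a classical classifier on top — can be sketched in a few lines. This is only a toy illustration on synthetic data: the fixed random projection stands in for a real backbone such as ResNet50, and a nearest-centroid rule stands in for the SVM/ELM/kNN classifiers of [121]; all names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone (e.g. ResNet50 without its
# final layer): here, a fixed random projection from "pixels" to features.
W = rng.normal(size=(64, 8))

def extract_features(images):
    """Map flattened images (n, 64) to feature vectors (n, 8)."""
    return np.maximum(images @ W, 0.0)  # ReLU, as in a CNN head

# Toy dataset: two "disease classes" drawn around different means.
X_healthy = rng.normal(loc=0.0, size=(20, 64))
X_rust = rng.normal(loc=3.0, size=(20, 64))
X = np.vstack([X_healthy, X_rust])
y = np.array([0] * 20 + [1] * 20)

F = extract_features(X)

# Classical classifier on the deep features: nearest class centroid,
# a simple stand-in for SVM/ELM/kNN.
centroids = np.stack([F[y == c].mean(axis=0) for c in (0, 1)])

def predict(images):
    feats = extract_features(images)
    d = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

acc = (predict(X) == y).mean()
```

The design point is that only the small classifier is fitted; the backbone's weights stay frozen, which is what makes the approach practical with a few hundred to a few thousand images.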

3.4 Yield attributes and harvesting

Detection, counting and size estimation are critical tasks for fruit harvesting and yield estimation. Research is progressing in the direction of vision-based systems for autonomous fruit harvesting. In a robotic fruit picking harvester, the vision system and the manipulator system are the two distinct components. Fruits attached to plants between the leaves, stems and branches are identified primarily through the vision system. Numerous researchers have used the feature extraction characteristics and autonomous learning ability of deep learning in the vision system for effective detection, counting and harvesting of fruits. LedNet is a deep learning based framework for real-time apple detection reported to be useful in orchard harvesting [123]. The developed framework was robust and efficient, performing detection tasks with a recall of 0.82 and an accuracy of 0.85. Onishi et al. [124] implemented the VGG16 architecture for detection of apples from stereo camera images. Sa et al. [125] proposed a deep learning based technique for fruit detection after fine-tuning a VGG16 network pre-trained on ImageNet. The output thus obtained could be used for fruit yield estimation and for automatic harvesting. The F1 scores for rock melon, sweet pepper, apple, avocado, orange and mango were 0.85, 0.84, 0.94, 0.93, 0.92 and 0.94, respectively. It was observed during the course of this study that scores were affected by the complexity of fruit shape and similarity of colour with the plant canopy. ResNet50 combined with a Feature Pyramid Network and Mask R-CNN was used for the detection of strawberry [126]. This approach could overcome key limitations of strawberry fruit identification under typical field conditions, such as multi-fruit adhesion, overlapping, field obstacles and varying light conditions around the plants.
The trained model achieved precision, recall and mean intersection over union of 95.78, 95.42 and 89.95 per cent, respectively. Afonso et al. [127] used the Mask R-CNN algorithm for detection of tomato in a greenhouse. The performance of Mask R-CNN for tomato detection was found to be superior to the machine learning approaches used by Yamamoto et al. [128] and the Inception-ResNet based architecture of Rahnemoonfar and Sheppard [129]. Estimating the size of broccoli is crucial for determining its harvestability and yield. Blok et al. [130] used a deep learning algorithm called the occlusion region-based convolutional neural network (ORCNN) for dealing with occlusions and assessed the size of broccoli. On 487 broccoli images, the ORCNN outperformed Mask R-CNN, recording a mean sizing error of 6.4 mm against 10.7 mm for Mask R-CNN. Integration of vision-based systems and deep learning approaches for fruit detection, counting and yield estimation has sped up automation in harvesting [131]. It will be easier to cope with labour-intensive operations by adopting deep learning based technology in the harvesting of agricultural produce.
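The sizing evaluation above can be illustrated with a minimal sketch: a segmented head's diameter is taken as the diameter of a circle with the same area as its mask, and the mean sizing error is the mean absolute difference from ground truth. The circular mask, the 2 mm/pixel scale and the 160 mm ground-truth value below are hypothetical numbers chosen only for illustration, not values from the cited study.

```python
import numpy as np

def equivalent_diameter_mm(mask, mm_per_pixel):
    """Diameter of a circle with the same area as the segmented region."""
    area_px = mask.sum()
    area_mm2 = area_px * mm_per_pixel ** 2
    return 2.0 * np.sqrt(area_mm2 / np.pi)

def mean_sizing_error(pred_diams, true_diams):
    """Mean absolute difference between predicted and true diameters."""
    return float(np.mean(np.abs(np.asarray(pred_diams) - np.asarray(true_diams))))

# Toy example: a circular "broccoli head" mask of radius 40 px at 2 mm/px,
# so the true diameter is about 160 mm.
yy, xx = np.mgrid[0:128, 0:128]
mask = ((yy - 64) ** 2 + (xx - 64) ** 2) <= 40 ** 2

d = equivalent_diameter_mm(mask, mm_per_pixel=2.0)
err = mean_sizing_error([d], [160.0])
```

In a real pipeline the mask would come from an instance segmentation model such as Mask R-CNN or ORCNN rather than being drawn analytically.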

3.5 Weed detection

Weed infestation in crops is one of the most serious issues confronting modern agriculture. Weed control is currently carried out manually with hand tools, with weedicides and with modern weeding machinery. Various ground-based weed identification and management techniques, including artificial neural networks [132], image processing [133], the Internet of Things [134] and spectral reflectance [135], have been studied for weed management in crops. The use of a deep learning approach for selective weeding has been reported to be an effective weed control method [136]. The CNN technique was used to detect weeds in spinach and bean crops using an unsupervised training dataset [137]. The proposed system detected crop rows automatically, identified inter-row weeds, created a training dataset and used CNNs to build a model for detecting crops and weeds from a repository of UAV-collected images. Ferreira et al. [138] used a UAV to capture 400 field images and applied machine learning and deep learning techniques to detect weeds in 15,336 segmented images of soil, soybean, grass and broadleaf weeds. The ConvNets detected weeds more precisely and achieved higher accuracy (> 99 per cent) compared to SVM, Adaboost–C4.5 and RF. The VGGNet, GoogLeNet and DetectNet architectures were used for detection of weeds in bermuda grass. It was observed that VGGNet performed better in the identification of dollar weed, old world diamond-flower and Florida pusley with an F1 score of more than 0.95, whereas DetectNet had a high F1 score of > 0.99 in detecting bluegrass [139]. Deep learning techniques were deployed on 17,509 captured images for classification of eight different weed species [140]. Classification accuracies of 95.1 and 95.7 per cent were obtained with Inception-V3 and ResNet-50, respectively. The ResNet-50 architecture was implemented in real time, yielding an inference time of 53.4 ms per image. Osorio et al. 
[141] used machine learning and deep learning techniques to detect weeds in lettuce crops from drone-collected field images. The F1 scores of SVM, YOLOv3 and Mask R-CNN were 88, 94 and 94 per cent, respectively. Faster R-CNN and the Single Shot Detector were used to detect weeds in mid- to late-season soybeans [142]. Faster R-CNN performed better in terms of precision, recall, F1 score, Intersection over Union and inference time. The ground ivy, dandelion and spotted spurge weeds were successfully detected in ryegrass using deep learning models on a dataset that included 15,486 negative images (no target weeds) and 17,600 positive images (target weeds) [143]. VGGNet outperformed AlexNet and GoogLeNet in detection of weeds, with an F1 score of 0.93 and a recall of 0.99.
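The F1 and Intersection over Union (IoU) metrics that recur throughout these weed detection comparisons can be computed as follows; the box coordinates and true/false positive counts below are made-up numbers for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A predicted weed box shifted 10 px from a 100x100 ground-truth box.
overlap = iou((0, 0, 100, 100), (10, 10, 110, 110))  # ≈ 0.68
score = f1(tp=90, fp=10, fn=10)                       # ≈ 0.90
```

A detection is usually counted as a true positive only when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), which is how the two metrics interact in detection benchmarks.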

The distribution of weeds in the field is usually in patches, but weedicides are sprayed evenly throughout the field, irrespective of the actual requirement. Hence, acquisition of images of the entire field with weed localization using deep learning techniques will be of great help for site specific weed management.

4 Water management

Modelling of hydrologic cycle components such as precipitation, runoff, evapotranspiration (ET) and change in soil moisture is essential for quantification of the water balance for sustainable water resource management and planning of irrigation and drainage systems [144]. Estimation of reference evapotranspiration (ET0) for irrigation water management requires a number of climatic parameters such as temperature, wind speed, relative humidity and solar radiation. Under limited data availability, however, empirical methods based on temperature, humidity or radiation can give a good approximation of ET0. Such empirical methods give reliable estimates only for a particular region, or may overestimate/underestimate values [145,146,147]. Soil moisture prediction/estimation is a challenging task due to its spatio-temporal variability across the field. A number of sensor based, empirical and statistical techniques are in vogue for indirect estimation of soil moisture at local or regional scales. Several studies have highlighted the importance of pedotransfer functions (PTFs) in the estimation of different soil moisture regimes, especially for the estimation of field capacity and permanent wilting point as important indicators of soil moisture content (SMC) [148,149,150]. However, the development of such PTFs requires different soil physical properties, such as soil texture (particle size distribution), bulk density and organic matter content (OMC), which makes the estimation of SMC a tedious task, prone to lower accuracy when considered at a larger scale.
With the advent of high computational power and the capability of deep learning techniques to handle big data, even when accumulated over several decades, these models are increasingly being adopted by virtue of their well proven ability to extract information for prediction and better realization of land–atmosphere interaction, without delving into the complex physical mechanisms of the process under consideration [151, 152]. The following section elaborates the estimation of ET0 and soil moisture using different deep learning techniques with respect to data availability, lead time, interval, etc.

4.1 Estimation of evapotranspiration

Evapotranspiration is considered one of the most important components of the hydrologic cycle, governing irrigation water management, water resources management and hydrologic studies [153,154,155, 157]. The ET0 is derived from a number of climatic variables and, in combination with the crop coefficient (Kc), is applied for the estimation of the water requirement of a particular crop [156]. Different artificial intelligence techniques such as machine learning and deep learning using limited weather data have great potential for indirect estimation of ET0 through the development of associated robust models [145, 157].

Machine learning (ML) and deep learning (DL) models for estimation of ET0 from limited hourly data of a few climatic parameters have been successfully applied by Ferreira and Da Cunha [157]. In this study, a CNN model was applied to estimate daily ET0 using limited meteorological data such as temperature, relative humidity and terrestrial radiation, and the results were compared with those obtained using traditional ML models (i.e. RF, XGBoost and ANN). The performance of the ANN models was slightly better than that of the other traditional models, but the CNN model outperformed all remaining models for all combinations of inputs. Overall, the results indicated that the CNN developed using 24 h hourly data and hourly radiation applied to sequential data reduced the root mean squared error (RMSE) by 15.9–21 per cent, increased the Nash–Sutcliffe efficiency (NSE) by 4.6–8.8 per cent and improved R2 compared to machine learning models at regional and local scales. Besides agriculture and forest dominant locations, estimation of ET0 in urban areas has its own importance for greenery management and dealing with climate change. Although such studies are limited, one of them, by Vulova et al. [158], reported similar performance of a 1D CNN deep learning model and random forest (RF) for prediction of urban ET at the half-hourly scale. In another study dealing with the estimation of ET0 in urban areas, deep learning multilayer perceptron models were applied to estimate daily ET0 in the Indian cities of Hoshiarpur and Patiala, and their performance was compared with the Generalized Linear Model (GLM), Random Forest (RF) and Gradient-Boosting Machine (GBM). The performance of the deep learning models was superior (in terms of NSE, R2, mean squared error (MSE) and RMSE) for estimating ET0 over traditional models like RF, GBM and GLM [159]. Roy [160] evaluated the performance of LSTM and bi-LSTM networks for predicting one-step-ahead ET0.
The bi-LSTM resulted in the highest R2, NSE and index of agreement (IOA), along with lower RMSE, relative root mean squared error (RRMSE) and mean absolute error (MAE), compared to other soft computing techniques such as Sequence-to-Sequence Regression LSTM (SSR-LSTM) and Adaptive Neuro Fuzzy Inference System (ANFIS) models. Better performance of bi-LSTM compared to LSTM for the estimation of ET0 and SM was also reported by Alibabaei et al. [161]. In another study, by Yin et al. [162], a single-layer bi-LSTM model with 512 nodes was applied to forecast short-term daily ET0 for 1–7 day lead times. Hyperparameters such as learning rate decay, batch size and dropout size were determined using the Bayesian optimization method, and the training, validation and testing of the developed model were done at three different locations in a semi-arid region of China. The performance of the bi-LSTM model was evaluated against Penman–Monteith based daily ET0. Among several meteorological input datasets, the bi-LSTM model with only three inputs (i.e. maximum temperature, minimum temperature and sunshine duration) performed best in forecasting short-term daily ET0 at all the meteorological stations. In another study, by Afzaal et al. [163], the highest contributing climatic variables for predicting ET0, namely maximum air temperature and relative humidity, were selected as input variables to LSTM and bi-LSTM models, which were trained using data for the years 2011–2015 and evaluated on 2016–2017. The results showed that both models, i.e. LSTM and bi-LSTM, were suitable for estimating ET0, with lower RMSE (0.38–0.58 mm/day) for all sites during the testing period. Overall, no significant difference in accuracy of LSTM and bi-LSTM compared to the FAO 56 method for prediction of ET0 was observed. Proias et al. [164] applied a time-lagged RNN to predict near-future ET0 in Greece.
Higher values of R2 and lower RMSE were recorded for prediction of ET0, in good agreement with the FAO-56 Penman–Monteith method. A performance comparison of ET0 estimation using a deep neural network (DNN), a temporal convolutional network (TCN) and LSTM with machine learning models such as RF and SVM, as well as models based on empirical relationships, using different climatic datasets (temperature, humidity, radiation) revealed that the temperature-based TCN had higher R2 and lower RMSE compared to the other machine learning and empirical models [165].
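The RMSE, NSE and R2 indicators used throughout these ET0 model comparisons can be sketched in a few lines of NumPy; the observed and simulated series below are arbitrary illustrative numbers, not data from any cited study.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is perfect, 0 is no better than the mean."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r2(obs, sim):
    """Squared Pearson correlation between observed and simulated series."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

obs = np.array([3.1, 4.0, 5.2, 4.8, 3.5])  # e.g. Penman–Monteith ET0, mm/day
sim = np.array([3.0, 4.2, 5.0, 4.9, 3.7])  # hypothetical model output

scores = (rmse(obs, sim), nse(obs, sim), r2(obs, sim))
```

Note that R2 measures correlation only, while NSE also penalizes bias, which is why both are usually reported together.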

A geographical bearing on deep learning performance was observed: LSTM performed better in arid regions, whereas the nonlinear autoregressive network with exogenous inputs (NARX) performed better in semi-arid regions of the US when predicting ET0 at 1–7 day lead times [166]. However, the accuracy of the models reduced once the prediction period exceeded 7 days. Monthly average data of climatic parameters were used as inputs for support vector regression (SVR), Gaussian Process Regression (GPR), BFGS-ANN and LSTM models for predicting ET0 in arid and semi-arid climates of Turkey [167]. Among the different models, the Broyden–Fletcher–Goldfarb–Shanno artificial neural network model performed best for estimation of ET0.
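The 1–7 day lead forecasting set-ups described in this section all rest on the same supervised reframing of the daily ET0 series: a window of past days as input and the value a fixed number of days ahead as target. A minimal sketch, with a synthetic series standing in for real ET0 data:

```python
import numpy as np

def make_windows(series, n_lags, lead):
    """Build (input, target) pairs: n_lags past days -> value `lead` days ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - lead + 1):
        X.append(series[t - n_lags:t])   # the past n_lags observations
        y.append(series[t + lead - 1])   # the value `lead` days after the window
    return np.array(X), np.array(y)

et0 = np.arange(30, dtype=float)            # stand-in daily ET0 series
X, y = make_windows(et0, n_lags=7, lead=3)  # 7-day history, 3-day-ahead target
```

The resulting (X, y) pairs are what an LSTM, bi-LSTM or NARX model is trained on; forecasting each lead time of 1–7 days simply means rebuilding the targets with a different `lead`.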

4.2 Soil moisture estimation

Soil moisture plays an essential role as a variable in studies related to water balance and to hydro-climatological and ecological systems, and it dominates the exchange of water and energy fluxes between land surface states and the atmosphere in many environmental processes [168]. The determination of point soil moisture in terms of field capacity (FC), permanent wilting point (PWP), etc. is often carried out for a small area or limited areal extent and requires detailed laboratory analysis of soil samples, which is generally laborious and time consuming; observations and analyses are also difficult to replicate owing to the high variability of SMC in time and space [169, 170]. Further, retrieving precise soil moisture at local, regional and global scales has a significant role in addressing many practical applications, including weather forecasts [171,172,173], drought and flood potential assessment [174,175,176], biogeochemical process characterization [158] and best agricultural and irrigation practices [178]. Accurate and precise prediction of SMC thus makes a major contribution to effective disaster response, better estimation of crop water requirement and irrigation scheduling, and other applications [179]. Estimation of soil moisture through process-based models is plagued by under-representation of key processes, excessive human intervention and computational expense [180]. Deep learning has tremendous capabilities for soil moisture estimation as an alternative to such conventional physically based models. Song et al. [181] presented a deep belief network coupled with macroscopic cellular automata (DBN-MCA) model, combining DBN and MCA models, for the prediction of SMC in a corn field located in the Zhangye oasis.
Cross-validation results showed that, with static and dynamic variables included as inputs, the DBN-MCA model performed better, with an 18 per cent reduction in RMSE compared to the MLP-MCA model. Tseng et al. [182] presented a simulation system for generating synthetic aerial images and learning from them to simulate local SMCs using traditional as well as deep learning techniques. In most of the experiments, the CNN Correlated Field (CNNCF) method achieved a better test error than the other methods: (a) a constant prediction baseline, (b) linear Support Vector Machines (SVM), (c) Random Forests, Uncorrelated Plant (RFUP), (d) Random Forests, Correlated Field (RFCF), (e) two-layer Neural Networks (NN) and (f) Deep Convolutional Neural Networks, Uncorrelated Plant (CNNUP). In another study [86], CNN-based regression models were applied to estimate soil moisture by integrating plant temperature (represented through thermal infrared images obtained by drone-based sensors) and in situ measurements of soil moisture in an experimental farm. Three different machine learning techniques, including deep learning, ANN and kNN, were applied to estimate FC and PWP using PTFs for combinations of four soil datasets located in the Konya-Çumra plain, Turkey [183]. The deep learning models using inputs of soil physical properties, including aggregate stability, gave the best performance in the estimation of FC for samples of calcareous soils. In another pioneering study, Yu et al. [184] modelled soil moisture using hybrid deep learning techniques at four different depths using SMC, climatological data, SWC and crop growth stage data from seven maize monitoring stations (during 2016–2018) located in Hebei Province, China.
The hybrid modelling technique, comprising a CNN-based ResNet and a bi-LSTM model, performed better than traditional ML-based techniques such as MLP, SVR and RF. To further establish the improved capabilities of deep learning modelling techniques, Yu et al. [185] proposed a hybrid technique combining the capabilities of a CNN and a gated recurrent unit (CNN-GRU), developed using SMC and climatologic data obtained from five representative sites located in Shandong Province, China. Better performance was reported with the proposed hybrid CNN-GRU modelling technique than with the standalone CNN or GRU models in terms of different performance indicators.
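The GRU building block used in these CNN-GRU hybrids can be sketched as a single NumPy update step. The weights below are random stand-ins rather than a trained model, and the input vectors merely play the role of per-time-step features (e.g. those a CNN front end might extract); this is a sketch of the standard GRU equations, not the cited authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update: gates decide how much of the past state to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1.0 - z) * h + z * h_tilde         # blend old and new state

rng = np.random.default_rng(1)
n_in, n_hidden = 4, 8
params = [rng.normal(scale=0.1, size=s)
          for s in [(n_in, n_hidden), (n_hidden, n_hidden)] * 3]

# Run a short sequence of feature vectors through the recurrence.
h = np.zeros(n_hidden)
for x in rng.normal(size=(10, n_in)):
    h = gru_step(x, h, params)
```

The gating is what lets the recurrence carry soil moisture "memory" across long input sequences without the vanishing-gradient problems of a plain RNN.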

Further, deep learning has been proposed as an alternative to conventional physically based models for soil moisture estimation using satellite data. In the past decade, the SMC estimation capabilities of remote sensing products, including the Advanced Microwave Scanning Radiometer (AMSR) [186], the Advanced Scatterometer [187], Soil Moisture and Ocean Salinity (SMOS) [188] and Soil Moisture Active Passive (SMAP) [189], among others, coupled with GIS techniques, have been widely explored, tremendously improving the measurement accuracy and efficacy of SMCs [188]. Microwave remote sensing, with its capability to penetrate clouds and a certain depth into the soil surface, provides estimates of SMC from the soil dielectric properties with good consistency over large spatial scales [190]. In estimation of SMC using satellite imagery, vegetation over the land surface is a major constraint on detecting signals of water stored within the soil profile, as vegetation attenuates soil emissions and also adds its own emissions to the microwave signal, causing further error in the actual emission from the soil profile [191]. Fang et al. [192], in a novel effort, applied a combination of remote sensing and deep learning techniques to develop a CONUS-scale LSTM network for the prediction of SMAP data and showed that the proposed modelling framework exhibits good generalization capability in both space and time. Overall, this study observed that the proposed deep learning approach to modelling soil moisture dynamics and projecting SMAP is efficient even with a shorter dataset. Zhang et al. [193] proposed a deep learning model for the estimation of SMC in China using Visible Infrared Imaging Radiometer Suite (VIIRS) remote sensing imagery as input.
The study demonstrated the capability of the deep learning modelling technique to capture in situ surface SMC from the VIIRS imagery, with a high coefficient of determination (R2 = 0.99) and a low root mean squared error (RMSE = 0.0084). These results were better than the soil moisture products obtained from SMAP and the Global Land Data Assimilation System (GLDAS) (0–100 mm). Lee et al. [194] employed a deep learning modelling technique to estimate soil moisture over the Korean peninsula from thermal products observed by satellite, and compared the performance of the proposed technique with the soil moisture products of AMSR2 and GLDAS. Wang et al. [195] developed a soil moisture inversion model (SM-DBN) using a DBN to extract soil moisture data from Fengyun-3D (FY-3D) Medium Resolution Spectral Imager-II imagery in China. The developed model outperformed conventional linear regression (LR) and ANN models in terms of different performance indicators based on simulated and actual ground measurement data. Masrur Ahmed et al. [196] applied deep learning hybrid models, i.e. complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) combined with a convolutional neural network-gated recurrent unit (CNN-GRU), for prediction of daily time-step surface SMC and demonstrated the prediction capability of the hybrid CEEMDAN-CNN-GRU model. The hybrid CEEMDAN-CNN-GRU model was built by integrating MODIS satellite-derived data, ground-based observations and climate indices, and was tested at important stations in the Australian Murray-Darling Basin.

5 Post production interventions

Postharvest loss of food crops is not only a loss of food but, in a wider perspective, a loss of natural resources and agricultural inputs and, most importantly, a lost opportunity to alleviate hunger. The gargantuan volume of these losses has prompted researchers, policy makers and funding agencies alike to rethink, time and again, the actions and approaches that could reduce, if not eliminate, this malaise. Besides the postharvest losses, which by generous estimates hover around 20–23 per cent [21], there is another component termed "food waste", which amounts to a 30 per cent loss of agricultural produce at the retailer and consumer ends combined [197]. While postharvest losses are quantitative in nature and are mainly caused by inadequate managerial and technical competencies, food waste is primarily associated with unconsumed food due to consumer behaviour, regulations and quality standards.

Rapid quality evaluation of agricultural produce is widely carried out by infrared spectroscopy combined with appropriate chemometrics [198]. The quality of a food product is generally ascertained by its moisture, protein and fat content, or by variations in them [199]. The interaction of incident light with the chemical molecules of a food sample, and its subsequent scattering, expresses the characteristics of the sample. Information about the quality of the food sample is obtained by linear and nonlinear chemometric methods, which provide rapid information about the internal and external quality of the agro-produce [200]. That said, generalization of sequential chemometric methods is a distant possibility due to the inherent heterogeneity of the samples owing to their biological nature. This introduces spectral variability, redundant data and optical noise, all of which hinder feature extraction by the chemometric methods. There is thus widespread divergence between calibration and target datasets, imposing serious limitations on spectral analysis.
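One common chemometric correction for the scatter-induced spectral variability mentioned above is the standard normal variate (SNV) transform, which centres and scales each spectrum individually. A minimal NumPy sketch with a synthetic spectrum (the sine curve and the 1.7×/+0.4 distortion are purely illustrative):

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually,
    a common chemometric correction for scatter-induced variability."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Two copies of the same "spectrum", one with a multiplicative scatter
# effect and a baseline offset; SNV makes them identical again.
base = np.sin(np.linspace(0.0, 3.0, 50))
spectra = np.vstack([base, 1.7 * base + 0.4])
corrected = snv(spectra)
```

Preprocessing of this kind is typically applied before either conventional chemometric models or deep learning models are fitted, precisely to reduce the calibration/target divergence described above.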

It is well established that deep learning models can solve complex problems rapidly, a capability attributed to the robustness of the models bolstered by deep neural network architectures. The automatic feature learning of deep learning models makes them well suited to assessing the postharvest quality of agricultural produce in terms of geographical origin, identification, morphological features, composition, texture, soluble solid content, etc. [49, 201, 202]. Deep learning approaches facilitating image-based applications with improved accuracy and resilience for various postharvest interventions on agro-produce are schematically illustrated in Fig. 5 and discussed in the following sub-sections.

Fig. 5

A general work flow of deep learning applications for postharvest quality control of agro produce

5.1 Identification of varietal variability

Geographical origin has a noteworthy bearing on the various attributes of agricultural products. Environmental variations introduce varietal differences in composition, morphology and economic value. Deep learning models working with infrared spectra have demonstrated adequate instances of success in identifying the geographical origins of many agro products (Table 1). While flavour-causing chemicals determine the success with coffee beans [203, 204], colour and variations in fatty acid ratios made deep learning effective for olive oils [203]. For apples, it is the variations in cellular structure and related external/internal features (e.g. soluble solid content) that enabled successful application of deep learning models [205]. Pure seeds form the basis of a sustainable agricultural ecosystem. Varietal identification of seeds assumes critical importance for growers, as well as for breeders, to ensure the desired productivity and product quality. The literature shows that CNN models can successfully predict the purity of rice seeds [206], hybrid loofah and okra seeds [207] and oat seeds [208]. Herbal medicinal plant seeds that are difficult to recognize visually were also successfully classified by deep learning models [204]. The effectiveness of deep learning classification extends well beyond two-level classification. Deep learning approaches with CNN models, given a sufficient number of training samples, can perform far better than kNN and SVM models [206]. Citing the need for a rapid and efficient means of selecting/classifying loofah (Luffa aegyptiaca) seeds of the intended progeny, an NIR-HSI (975–1648 nm) combined with deep learning approach was developed using 6136 hybrid okra and 4128 loofah seeds of six varieties [207].
The Deep Convolutional Neural Network (DCNN) discriminant analysis model had an accuracy of more than 95 per cent and could be adopted for automated selection of cross-bred progeny. Oat seeds were discriminated by variety using HSI (874–1734 nm) and a DCNN. It was concluded that HSI combined with an end-to-end DCNN could be a potent rapid tool for visualization of accurate (99.2 per cent) variety classification of oat seeds [208]. The same approach reaped similar results for seven varieties of Chrysanthemum comprising 11,038 samples; here, a DCNN based on full-wavelength spectra gave 100 per cent accuracy on both the training and testing sets [209]. The superiority of a deep learning approach based on a CNN model over conventional methods (partial least squares-linear discriminant analysis and PCA with logistic regression) was demonstrated on an NIR dataset for grapevine classification [210]. Considering the decrease in the number of experts for grape variety identification, deep learning techniques were employed for digitization in viticulture [211]. The models were trained with multiple features of a grape plant, e.g. leaves, fruits, etc.; eventually the models were combined into a single model for identification of five grape varieties. The accuracy achieved by the single model, called ExtRestnet (99 per cent), was far superior to the accuracies achieved individually by the Kernelwise Soft Mask (KSM) (47 per cent) and Restnet (89 per cent) models. These findings hold the key to future identification of type-dependent diseases or special fungal diseases in grape.

Table 1 A comprehensive tabulation of the application of deep learning based techniques in various aspects of agriculture in relation to farm, water and postharvest management

5.2 Qualitative analysis of agro produce

As mentioned in the preceding section, one of the major causes of food waste is the rejection or non-consumption of fruits due to poor quality. For most fruits, quality means an adequate amount of the desired taste and uniform tautness across the fruit surface. While taste can be indicated by soluble solid content (SSC), the texture of the fruit is essentially indicated by firmness, which in turn depends on the right amount of moisture, uniform ripening and a puncture-free, bruise-free outer skin. Deep learning based spectral analysis demonstrated considerable success in predicting the SSC and firmness of Korla fragrant pears using an SAE-FNN model with visible and NIR spectral data [212]. The sweetness of orange juice, expressed as saccharose concentration, was predicted more accurately by a three-layer CNN than by conventional chemometric methods [213]. Black goji berries are a storehouse of bioactive components with high medicinal value; a deep learning approach achieved very good results in predicting the phenolic, flavonoid and anthocyanin content of both the juice and the dried berries [214]. Mishandling of fruits during harvest and transportation results in cuts, bruises and fissures on the outer skin and cellular damage deep beneath it. These injuries put the fruits under stress, resulting in enhanced rates of respiration and senescence. This translates into changes in the biochemical properties of the fruits that can be captured by NIR spectra; a deep learning based qualitative analysis of winter jujubes along these lines was carried out by Feng et al. [215]. A 2B-CNN model was found to be highly robust for feature selection in detecting bruising of strawberries when the input dataset was a fusion of spectral and spatial data [204]. Attempts have also been made to estimate the stage of ripeness in strawberries using a combination of HSI and deep learning [216].
Feature wavelengths were selected using a sequential feature selection algorithm, and 530 nm was found to be the most important wavelength under field conditions. The AlexNet CNN (a popular deep learning architecture) achieved a prediction accuracy of 98.6 per cent for ripeness detection in strawberries. Findings of this type can be utilized in the development of real-time precision strawberry harvesting systems. There are other instances of fruit classification based on colour, shape and texture [217,218,219] wherein the absence of deep learning restricted the models to similar fruits of a single species, although the same classification factors could be used more accurately with a k-NN algorithm [220]. The evidence and instances discussed above indicate the strategic superiority of deep learning over traditional data analysis methods; further studies on fruit quality detection are therefore warranted.
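The SSC-prediction setting, mapping a spectrum to a continuous quality value, can be sketched as a small fully connected regression network trained by gradient descent. The synthetic "spectra", band count and layer sizes below are illustrative assumptions, not the SAE-FNN configuration of [212]:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in: 200 "spectra" (30 bands) whose SSC-like target
# depends on two bands plus measurement noise
X = rng.random((200, 30))
y = 8.0 + 6.0 * X[:, 4] - 3.0 * X[:, 17] + 0.05 * rng.standard_normal(200)

# One hidden tanh layer, trained by plain gradient descent on MSE
W1 = 0.1 * rng.standard_normal((30, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal(16);       b2 = 0.0

def mse():
    h = np.tanh(X @ W1 + b1)
    return float(np.mean((h @ W2 + b2 - y) ** 2))

loss0 = mse()
lr = 0.05
for _ in range(500):
    h = np.tanh(X @ W1 + b1)
    err = h @ W2 + b2 - y                 # prediction error per sample
    gW2 = h.T @ err / len(y); gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - h ** 2) # backprop through tanh
    gW1 = X.T @ dh / len(y); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print(loss0, mse())   # training loss falls as the network fits SSC
```

The stacked-autoencoder variants used in the cited work add unsupervised pre-training of the hidden layers before this supervised fine-tuning step.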

There are a few instances where deep learning has been initiated for quality-related evaluation of vegetables. Cucumbers are subject to damage caused by pests and insects and to transportation-induced surface discolouration. A stacked sparse autoencoder (SSAE), in isolation and coupled with a CNN, has been attempted for deep feature representation and classification of damaged cucumbers based on HSI [221]: the defective region was screened out by a CNN model based on the RGB channels, while the mean spectra of this region were used for SSAE-CNN classification, achieving an accuracy of 91.1 per cent. A very simple basis for grading okra is pod length; deep learning models were applied to a dataset of 3200 images [222], and the accuracies exhibited by AlexNet, GoogLeNet and ResNet50 were 63.5, 69.0 and 99.0 per cent, respectively. Potatoes are one of the most popular food crops but are prone to viral infection; deep learning on HSI using a fully convolutional network was successfully applied to the detection of Potato virus Y [223]. Classification of tomatoes into seven selected species was attempted using deep learning [224]: a network comprising four CNNs was trained to predict the tomato species with an accuracy of 93 per cent. Tomatoes were also classified on the basis of exterior surface defects using a dataset of 43,843 images [225]; feature extraction was carried out by trained ResNet classifiers, which proved competent in identifying surface abnormalities. A deep learning based rapid recognition system for identifying nutrition disorders in 11 kinds of tomatoes was developed by pre-processing the dataset with a pre-trained Enhanced Deep Super-Resolution Network [226]; this technique attained an accuracy of 81.11 per cent, much higher than existing techniques with the same objective.
Predicting the actual oil yield from an oil palm plantation is tricky, since the number of mature oil-bearing crowns cannot be judged directly. Deep learning with two different CNNs was applied to identify and count mature and young oil palms from satellite images [227]. The outcomes were exported to geographic information system software for mapping the mature and young palms; accuracies were 92.96 and 95.11 per cent for mature and young oil palms, respectively. Some deep learning work has also been carried out with dates, covering the distinction of healthy dates from imperfect ones and the prediction of date yield. The difference in growing phases between healthy and imperfect dates formed the basis of the modelling [228]; the study covered four classes of dates (Khalal, Rutab, Tamar and defective) using a CNN with the VGG-16 architecture and yielded results with 96.98 per cent accuracy. A dataset of 8000 images of mature and premature dates was used to build a deep learning based tool that predicts date type, maturity and harvesting decision with accuracies of 99.01, 97.25 and 98.59 per cent, respectively [229].

5.3 Detection of food contamination

Contamination of agro produce with foreign materials can arise from poor agricultural inputs (polluted water, inconsistent fertilizers, etc.), improper handling (field dirt, crop residues, etc.) and wrong storage conditions (fungi, beetles, pesticide residues, etc.). Ingestion of such foods may result in detrimental physiological changes in humans, leading to the onset of various co-morbidities. A brief account of the work carried out on the determination of contaminants in food by applying deep learning tools is reported in this section. Published work using traditional machine learning algorithms for the detection of food contaminants exists [230,231,232]; however, the application of deep learning with a similar objective is sparse, and concerted efforts are required to fully utilize the potential of deep learning to replace traditional machine learning methods [200]. Prediction of morbidity arising from gastrointestinal infections caused by contaminated food has been attempted using a DNN [233]. A target region in China was the locale of this study, which comprised 227 contaminants in 119 types of widely consumed foods. The features of the contamination indexes were extracted by a deep denoising autoencoder (DDAE), which is structurally similar to an SAE with multiple hidden layers; the DDAE model was found to perform better (success rate 58.5 per cent) than conventional ANN algorithms. Manual detection of foreign objects lodged at different locations on walnuts (Juglans) is tough and inconsistent, and their complex shape leads to improper image segmentation, making conventional machine vision approaches inefficient. However, a deep learning approach comprising a multiscale residual fully convolutional network was found to be efficient in image segmentation (99.4 per cent) and feature extraction for walnuts [234].
The proposed method could detect and correctly (96.5 per cent) classify leaf debris, paper scraps, plastic scraps and metal parts clinging to the nuts, and a complete cycle of segmentation and detection took less than 60 ms. Pest fragments are common and rampant in stored food samples, and manual inspection is time-consuming and error-prone. A deep learning approach was applied for the rapid identification of 15 beetle species that frequently contaminate stored food products [122]: a convolutional neural network was trained on a dataset of 6900 microscopic images of elytra fragments, and the model performed with an overall accuracy of 83.8 per cent. Pesticide residues are a common contaminant of fruits, and their presence poses a serious threat because fruits are consumed as table food. Apples with four pesticide treatments (chlorpyrifos, carbendazim and two mixed pesticides) at a concentration of 100 ppm were imaged with a hyperspectral camera, giving a dataset of 4608 images per category [235]. The normalized (227 × 227 × 3 pixels) images were used as input to a CNN for the detection of pesticide residue; at a training epoch of 10, the detection accuracy on the test set was 99.09 per cent. This method thus demonstrated an effective non-contact technique for detecting pesticide residues on harvested apples. Adulteration of foods for enhanced monetary returns is a malpractice spread across the globe. Milk is adulterated with a variety of substances that threaten human well-being. Spectral data from infrared spectroscopy were used for binary classification of (un)adulterated samples using a CNN model with Fourier transformed data [236]; the model was found to be 98.76 per cent accurate, far better than gradient boosting and random forest machine learning methods. CNN models were also found to be very accurate in determining the adulteration of different meats: chicken, turkey and pork [237].
Equipped with mid-infrared spectral data, CNN models were also found to be very accurate in classifying strawberry and non-strawberry purees [203]. It can hence be understood that deep learning has successfully touched almost all aspects of food contamination and promises to be a potent, rapid, non-contact and effective means of contamination detection in agro produce.
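The Fourier-transform-then-classify idea reported for milk adulteration [236] can be sketched with a deliberately simplified stand-in: synthetic "spectra" in which adulteration adds a periodic component, FFT magnitude features, and a logistic head in place of the reported CNN. All data and sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, bands = 300, 64
# Synthetic stand-in: adulterated "spectra" carry an extra periodic component
t = np.arange(bands)
labels = rng.integers(0, 2, n)
spectra = rng.random((n, bands)) + labels[:, None] * 0.5 * np.sin(2 * np.pi * 5 * t / bands)

# Fourier-transform the spectra, then fit a logistic head
# (a much simpler stand-in for the CNN used in the cited study)
X = np.abs(np.fft.rfft(spectra, axis=1))          # magnitude-spectrum features
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9) # standardise per feature
w = np.zeros(X.shape[1]); b = 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # sigmoid probabilities
    g = p - labels                                # gradient of the log-loss
    w -= 0.1 * X.T @ g / n
    b -= 0.1 * g.mean()

acc = float(((p > 0.5) == labels).mean())
print(acc)
```

The point of the Fourier step is that the adulteration signature, diffuse in the raw signal, concentrates into a few frequency bins, which even a linear classifier then separates easily.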

5.4 Food quality sensors

Precise non-invasive discrimination of food quality has been made possible by the advent of electronic and multi-sensor technologies. The human visual and gustatory systems can be mimicked with reasonable accuracy by the electronic eye (EE) and the voltammetric electronic tongue (VET), respectively. These instruments rapidly provide comprehensive information about a sample: the EE captures the colour and optical texture of the sample and compiles the result as its overall appearance, while the VET has an array of sensors that are stimulated to produce signals by the dissociated ions of a liquid sample [238, 239]. Instrumentation for the rapid detection of food quality has undergone a paradigm shift with the advent of the EE and VET [240]. A deep learning algorithm was used to extract features and non-destructively discriminate pu-erh tea by storage time (0, 2, 4, 6 and 8 years) with a data fusion strategy applied to the signals from a combined EE and VET [241]. The main parts of the EE system were an eyepiece, a stand and an LED lamp with adaptor; the eyepiece was adjusted to capture a clear image of the pu-erh tea and gather all the relevant information. The VET employed in this study [242] comprised a signal-conditioning circuit and an array of sensors (glassy carbon, tungsten, nickel, palladium, titanium, gold, silver and platinum electrodes, with an auxiliary electrode) in a standard three-electrode configuration for all eight electrodes, with an Ag/AgCl reference electrode. The response signals from the sensors were collected by a DAQ card (NI USB-6002, National Instruments, USA), and LabVIEW software was used to control the DAQ card and analyse the collected signals. Deep learning was introduced to eliminate manual intervention in feature extraction from the EE and VET signals.
1-D and 2-D CNNs served as the deep learning algorithms, leading to an overall improvement in the recognition of data patterns compared with conventional techniques; this was followed by Bayesian optimization for the selection of optimal hyperparameters of the CNN model. The chief novelty of this study lies in the fact that, instead of feeding each data stream separately (EE-CNN or VET-CNN), the data from the two instruments were fused and fed to the deep learning algorithm in a way that led to more accurate and robust classification. This study opens a new window of opportunity for applying deep learning to intelligent sensory analysis, promising a reliable, intelligent and non-destructive quality control tool applicable to products other than pu-erh tea as well.
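The feature-level fusion step can be sketched as follows: flatten each modality, standardise each block so that neither instrument dominates by scale, and concatenate into one joint vector per sample before classification. The feature counts and the 8 × 50 electrode-signal shape are illustrative assumptions, not the actual dimensions used in [241]:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical features: EE gives colour/texture statistics per sample,
# VET gives one response curve per electrode (8 electrodes x 50 points)
ee_features = rng.random((10, 12))       # 10 samples, 12 image statistics
vet_signals = rng.random((10, 8, 50))    # 10 samples, 8 electrodes, 50 points

def standardise(a):
    """Zero-mean, unit-variance scaling per feature column."""
    return (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-9)

# Feature-level fusion: flatten, standardise each block, then concatenate
vet_flat = vet_signals.reshape(10, -1)
fused = np.concatenate([standardise(ee_features), standardise(vet_flat)], axis=1)
print(fused.shape)   # one joint feature vector per sample for the classifier
```

Fusion can alternatively happen later, by concatenating the learned CNN features of each modality rather than the raw signals; the cited study's gain came from letting the model see both streams jointly either way.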

6 Dataset resources

Chemical and physical features of fruits, in terms of shape, nutrient content, maturity, firmness, damage, disease, etc., are reflected in RGB images or in spectral information that can be interpreted and classified using deep learning models. Breakthrough results in deep learning applications for quality evaluation of agro produce will also require good input data. A dynamic dataset of high-quality fruit images called Fruit-360 [131] was developed with the sole purpose of creating deep learning models; care was taken to capture images against a uniform background and with measures to minimize noise. PlantVillage, created by Hughes and Salathe [243], contains 87,848 expertly annotated images divided into 58 classes, each defined as a pair of a plant and a related disease, with some classes representing healthy plants; there are 25 different healthy and diseased plants among the 58 classes, and more than a third of the images (37.3 per cent) were taken in the field under actual conditions. Wheat Disease Database 2017 is a collection of 9230 wheat crop images covering six wheat diseases (smut, powdery mildew, stripe rust, black chaff, leaf rust and leaf blotch) and a healthy class, annotated at image level by agricultural experts [101]. Multi-class Pest Dataset (2018) is a collection of 88,670 images with 582,170 pest objects divided into 16 categories [244]. A large multiclass dataset called DeepWeeds [140], comprising 17,509 images of eight weed species (Chinese apple, parthenium, rubber vine, prickly acacia, lantana, parkinsonia, siam weed and snake weed) and various off-target (or negative) plant life native to Australia, has been developed for deep learning applications.
CropDeep is a collection of 31,147 images of vegetables and fruits under laboratory greenhouse conditions, with over 49,000 annotated instances from 31 different classes, developed exclusively for deep learning based classification and detection of species [29].

Multiple data sources make models robust, and the possibilities of data fusion will surely bolster future research perspectives. Data from different sources can be combined to improve overall data quality and thus ensure high-quality representations [245]; along these lines, an instance of combining spectral and spatial data from HSI was found to exhibit improved performance [204].

The success of a deep learning model increases if the input data originate from a variety of dependent factors; in the case of agricultural produce these can be variety, origin, size or temperature. This type of model training is called multi-task learning, where a single model learns several related tasks simultaneously and shares common knowledge across them [246]. The inbuilt structure of multi-task learning allows a general understanding of feature patterns across various tasks while ignoring noisy and irrelevant data. Another machine learning approach is transfer learning, where the information picked up while learning source tasks is applied to target tasks; this approach improves model performance because training from scratch is avoided [35, 247].
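A minimal sketch of the multi-task idea is a shared trunk feeding several task-specific heads, so that one learned representation serves all tasks. All sizes, and the choice of variety and origin as the two tasks, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.random((5, 32))                    # 5 samples, 32 input features

# Shared trunk: one representation learned jointly for all tasks
W_shared = rng.standard_normal((32, 16))
h = np.maximum(x @ W_shared, 0.0)          # shared hidden features (ReLU)

# Each task keeps its own head, trained against its own labels
W_variety = rng.standard_normal((16, 6))   # head 1: 6 variety classes
W_origin  = rng.standard_normal((16, 3))   # head 2: 3 geographic origins

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

variety_probs = softmax(h @ W_variety)
origin_probs = softmax(h @ W_origin)
print(variety_probs.shape, origin_probs.shape)
```

During training, the losses of the two heads would be summed (possibly weighted) and backpropagated through the shared trunk, which is what forces the trunk to learn features useful to both tasks.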

7 Deep learning model performance and causal analysis

Deep learning model validation quantifies the expected performance of a model based on how well it responds to unseen data; for this reason, validation is often done on data other than the training data. Approaches for generating datasets for deep learning model development include train/validate/test percentage splits, k-fold cross-validation and time-based splits [248]. The effectiveness of a deep learning algorithm is measured using a defined metric on this unseen data, which is then used to compare the predictive power of different models. In particular, for image classification using CNN architectures, metrics such as precision, accuracy, recall and F1-score, along with the Receiver Operating Characteristic (ROC) curve, are quite useful [249]. For object detection models, Intersection over Union (IoU), Average Precision (AP) and mean Average Precision (mAP) [250] are most commonly preferred for comparing model effectiveness. These validation approaches and metrics are applied iteratively over the data to guide feature selection and optimize the hyperparameters. The learning curves of CNN models help identify whether a model is following the right learning trajectory, with an appropriate bias-variance trade-off; analysing the learning curves of different models reveals their relative performance. For example, a model with stable learning curves across the training and validation data will perform well on unseen data. A model's generalization ability and its probability of overfitting or underfitting are key observations from the learning curves [251, 252]. Performance is an important criterion for identifying the right architecture: a model that makes effective use of memory resources can generate quick predictions and often favours real-time processing of data. Ease of retraining is also an important criterion that helps in accommodating changes to existing models.
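The classification and detection metrics named above follow directly from their definitions; a minimal sketch with hand-picked counts and boxes:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

p, r, f1 = prf1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))   # 0.9 0.75 0.818
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))      # 25 / 175, about 0.143
```

AP then averages precision over recall levels for one class, and mAP averages AP over all classes; a detection typically counts as a true positive only when its IoU with a ground-truth box exceeds a threshold such as 0.5.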

DL models have managed to address various complex agricultural engineering problems, but they often fail to make human-level inferences. Especially in deep neural networks with a large number of layers, causality remains a challenge; such models have been observed to perform poorly in generalizing beyond the training data. There are limitations to reliable decision making and robust prediction with ML based on correlational pattern recognition [253,254,255]. In plant stress detection or disease identification, for example, the model is trained on a huge volume of data on the assumption that this will help it generalize the distribution and learn suitable parameters; in reality, the distribution often changes drastically beyond the training data. A CNN model trained to identify plant stress in RGB images may fail to identify stress in a new environment with different lighting conditions. The same applies to object detection algorithms: a YOLO or Faster-RCNN object detector trained to detect fruits may detect a wrong object at a slightly different angle or against new backgrounds [250, 256]. As there is uncertainty in the actual environment, it is often impossible to train a model to cover all possible scenarios. This is highly relevant in agricultural engineering applications, where deep learning models interact directly with the environment: if the model does not possess causal understanding, it fails when dealing with new situations.

A general approach to solving a real-world problem involves collecting a large pool of data, splitting it into training, validation and test sets, and evaluating performance by measuring accuracy; the process is repeated until the desired level of accuracy is reached. Even the benchmark datasets cited above, such as Fruit-360 [257] and PlantVillage [258], fall into the same category. Transfer learning with CNN architectures such as VGG16, AlexNet and GoogLeNet, widely used in agricultural research, produces fine-tuned image classifiers that can identify new types of patterns, but these models may also respond poorly to changes in the environment. The main objective should be to acquire as much knowledge as possible from fewer training examples, and the model should be able to reuse the knowledge gained without continuous retraining in a new environment. It is worth noting that an accurate model may not be sufficient for making informed decisions when it has been trained on statistical regularities instead of causal relations; a causal model is capable of responding to situations the model has not encountered before.
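The transfer learning pattern described here (a frozen pretrained feature extractor with only a new head fitted to the target task) can be sketched as follows. The "backbone" is a stand-in random projection, not a real pretrained network such as VGG16, and the labelled set is synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for a pretrained backbone (VGG16, AlexNet, ...): a FROZEN
# feature extractor whose weights are never updated during transfer
W_frozen = rng.standard_normal((10, 40))
def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)   # frozen ReLU features

# Small labelled set for the new target task (synthetic, illustrative)
X = rng.random((60, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

def features(x):
    """Frozen backbone features plus a bias column for the new head."""
    return np.hstack([backbone(x), np.ones((len(x), 1))])

# Transfer step: fit ONLY the new linear head, by least squares
head, *_ = np.linalg.lstsq(features(X), y, rcond=None)

acc = float((((features(X) @ head) > 0.5) == y).mean())
print(acc)
```

Because only the head is fitted, very few labelled target examples are needed, which is precisely the appeal of the approach when agricultural labels are scarce; full fine-tuning would additionally update the backbone weights at a lower learning rate.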

8 Concluding remarks and way forward

This review paper assembles a wide range of applications of deep learning techniques for inducing precision in agricultural mechanization, water management and postharvest operations. The key findings of this extensive review are presented in an easily referable tabular format structured around the different aspects of deep learning application to the engineering side of precision agriculture (Table 1). Although there are abundant instances of the application of deep learning to precision preharvest and postharvest agricultural operations, there is enough evidence to inspire researchers to develop more creative and computationally sound deep learning models for some of the identified challenges.

The size of the dataset is a vital parameter that gives statistical strength to deep learning models. Collection of quality data is difficult and challenging, and the complexity arises from the fact that the challenges are multi-dimensional. The data acquisition tool needs to remain consistent throughout data collection; otherwise even a large dataset will not assure a robust model, and predictions will be unreliable. This calls for effective data clean-up tools, or, in other words, for models capable of handling uncertainty through concepts such as Bayesian inference [259]. The development of efficient tools will boost the supply of quality data to the model and remove noise and redundancy. Once quality data collection and data cleaning are overcome, the challenge of data labelling remains: it is an activity of considerable monotony with a direct bearing on model efficacy. Deep learning models need to learn from labelled data; as mentioned earlier in this paper, public datasets are an option for deep learning modelling, but the variability of agricultural inputs and outputs arguably limits the use of these datasets for global precision agriculture purposes.

Data augmentation is a potent tool for overcoming the limitations of public datasets. It not only increases the volume of the dataset but also ensures consistency of quality and dimension. The techniques involved range from simple flipping, rotation and overturning of images [260], through amplitude and frequency deformation [261], to the addition of Gaussian noise [262]. A systematic compilation of the performance advantages of these techniques on agricultural data is absent, although desirable.
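The simple image augmentations listed above are one-line array operations; a minimal sketch on a single synthetic RGB image:

```python
import numpy as np

rng = np.random.default_rng(6)
image = rng.random((32, 32, 3))            # one synthetic RGB training image

augmented = [
    np.flip(image, axis=1),                # horizontal flip
    np.flip(image, axis=0),                # vertical flip ("overturning")
    np.rot90(image, k=1, axes=(0, 1)),     # 90-degree rotation
    np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0),  # Gaussian noise
]
print(len(augmented), all(a.shape == image.shape for a in augmented))
```

Each transformed copy keeps its original label, so one labelled image yields several plausible training samples at no extra collection cost; rotations and flips are safe for most produce images because orientation is rarely label-relevant.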

Another way of alleviating the data dependence of deep learning models is to train them with data generated by numerical simulation employing the laws governing the physical phenomena [263]. Such data need to be captured for precision preharvest and postharvest operations as well; care must be taken to curb extrapolation and observational biases [180] so as to fully exploit the advantages of deep learning. The representation of data used for deep learning modelling has to be consistent in format [264], besides being smooth, temporal, spatially coherent, etc. [265]; this consideration has to be kept in mind while collecting data from the precision agriculture domain.

It is foreseen that the future will bring a dramatic proliferation of deep learning applications in pre and postharvest agricultural engineering operations. Interfacing the results of deep learning models with application-based hardware will require understanding and efficient interpretation of the statistically outstanding performance of these models, especially where they outperform human experts [180]. Failures of deep learning networks have been attributed to the difficulty of conclusive interpretation [266]; research should focus on eliminating such instances and strive for self-explanatory models. Deep learning frameworks currently fail to encompass weather and environmental conditions while predicting parameters in the agri-ecosystem domain; it is well understood that weather is a stochastic parameter riding on complex nonlinear relationships with multiple components [240], and there is an immediate need to include weather information during data collection to take full advantage of deep learning models. There are 3D models representing the architectural traits of soil, crops and agricultural commodities across different origins and varieties; coupling these models to a deep learning framework would substantially bolster parameter prediction capabilities while giving due weight to the subject traits. Effective deep learning modelling for soil-water issues and yield estimation can be fulfilled once the deep learning framework unites multimodal data streams from aerial as well as ground-level sensing. Hopefully, approaches will continue to evolve rapidly in the near future, realizing the possibilities that deep learning modelling holds: computer systems able to identify, classify, quantify and predict, leading us to an era of autonomous precision agricultural engineering operations.