1 Introduction

The primary and most crucial activity in agriculture is the early identification of plant infections. Even today, most verification is done manually, which makes it difficult to detect a disease and its type. This issue has raised the importance of automatic infection detection and has compelled the development of methods and systems that can detect disease more reliably. Most of the data are in the form of images that are prone to error, which motivates this review. Identifying and diagnosing diseases effectively relies on acquiring accurate information on disease symptoms (Ma et al. 2015a, b, 2017a). With present computer-vision technologies, segmentation of disease spots on leaves can be considered a primary source of disease information (Barbedo 2017).

Deep Learning (DL) is a branch of Machine Learning (ML), the family of approaches that enables machines to adopt behaviours similar to those of humans. Using these techniques, machines learn from data and reuse what they have learned as experience for subsequent learning and implementation (Jafar et al. 2018). ML is a multi-disciplinary area of learning that has given rise to several research domains, including agriculture. Its techniques can be implemented in various fields of computation to design and develop new algorithms. These algorithms are further applied to various agricultural problems to identify crop diseases at an earlier stage and to classify them by disease type. One deep learning technique, the Convolutional Neural Network (CNN), has proved highly effective in image classification and has achieved great success (Ren et al. 2017; Zhao and Jia 2016; Xu et al. 2017; Sainath et al. 2015; Shelhamer et al. 2017; Ribeiro et al. 2016; Lv et al. 2017).

To address these issues, existing techniques need to be understood better by observing, measuring, and analyzing the massive volume of agricultural data. It is also necessary to understand the technologies for both short-term crop management and large-scale ecosystems (Kamilaris et al. 2016, 2017a). Big data analysis is one of the advanced application areas of deep learning. In general, a deep learning model contains at least three layers, where the neurons of each layer are linked to data features and therefore produce progressively more complex information. Deep learning models learn the input features through a hierarchy of organized networks of neurons (Lowe et al. 2017). Recent studies (Sladojevic et al. 2016; Gómez-Chova et al. 2015; Wang et al. 2017) have concentrated on evaluating deep learning models based on digital photography, hyperspectral imaging, and the associated data analysis to detect plant diseases.

CNN is expected to become the most accepted and authoritative deep learning model for imaging in the future. Presently, computer vision and multimedia (Gómez-Chova et al. 2015) and Natural Language Processing, or simply NLP (Haut et al. 2017), are the most promising deep learning application areas (Sladojevic et al. 2016). In (Haut et al. 2017; Quirita et al. 2017), cloud computing architectures were implemented to assess the feasibility of hyperspectral approaches. Cloud-based technology is capable of handling the collection, analysis, and prediction of agricultural field environment data on one common platform (Kim et al. 2018). Other widely accepted techniques for image analysis include noise filters, the parametric rectifier function, the Multiclass Support Vector Machine (MSVM), stochastic gradient descent, softmax/cross-entropy, and backpropagation. Hence, hyperspectral data analysis is an essential area of research for image processing with considerable potential.

The significance of this paper lies in supporting productivity through early monitoring of plant health with respect to climatic changes, food safety, and sustainability during cultivation. The aim is to support the development of a more sophisticated system and analysis for identifying and classifying images as healthy or diseased.

The rest of this paper is organized as follows. Section 2 provides the research methodology followed during the review of existing work. Section 3 describes the deep learning techniques based on the research questions framed in the previous section. Section 4 identifies and discusses some of the challenging factors that still need to be addressed for better implementation of models. Section 5 summarizes the outcomes of the study and discusses them as observations, followed by future directions in Sect. 6 and the conclusion in Sect. 7.

2 Methodology

Building on the potential benefits of deep learning models in agriculture described in the previous section, this paper examines the current state of solutions in these areas, including technologies and techniques, along with open challenges.

2.1 Planning

The analysis involved collecting journal articles published between 2015 and 2019. First, a keyword-based search was conducted across the SCOPUS-indexed list of journals, IEEE Xplore, and other scientific databases. The search keywords listed in Table 1 were used along with the following query.

Table 1 List of keywords used for the search

[Deep Learning] AND [Convolutional Neural Network] AND [Hyperspectral Data] AND [Crop Diseases] AND [Farming OR Agriculture].

The entire research methodology (RM) followed to prepare this paper is depicted in Fig. 1.

Fig. 1 Research methodology for deep learning techniques review

2.2 Conduction

This phase deals with reviewing and summarizing the selection criteria for evaluating existing CNN models in terms of efficient detection of infections in various crops based on training and testing datasets. The review was carried out by examining existing CNN model solutions against the following research questions.

  1. What are the primary sources for acquiring plant data?

  2. What automated systems have been adopted to recognize and classify plant images?

  3. How can deep learning models improve the performance of automated recognition systems?

  4. What are the various feature extraction techniques selected from deep learning algorithms?

  5. What are the analytical techniques used to improve the quality of an image?

  6. How can deep learning techniques reduce overfitting and improve non-linearity?

  7. How can we optimize the loss function between the training and testing datasets?

During this process, we filtered for papers on deep learning techniques applied mostly to the agriculture domain. This search initially identified 138 papers. After limiting the selection to suitable DL techniques and purposeful conclusions, the number of articles was reduced to 84, which were considered one by one for further analysis.

3 Deep learning techniques

This section presents the Research Methodology (RM) findings that answer the seven research questions, based on data derived from the selected studies. Deep learning is gaining prominent attention and has become an active research field. Training a DL architecture takes a long time, but it achieves remarkable classification accuracy and a significant rate of object recognition.

3.1 Answer to the first research question

For this section, 35 papers were identified by filtering on sources of data acquisition. Finally, 16 papers were considered for analysis.

An appropriate and sufficiently large dataset is needed at every stage of object identification, from training through testing. This phase reviewed and identified the data sources from which the existing literature acquired its input image datasets.

Under controlled conditions, a DCNN model was trained to identify 26 diseases among 14 crop species. The training was conducted with a public dataset of 54,306 images of healthy and diseased crop leaves (Mohanty et al. 2016). In another study, datasets were acquired from 116 spectral signatures of foliar samples at 4 disease levels, and artificial neural network (ANN) techniques were applied to discriminate and classify viruses in oil palm trees (Ahmadi et al. 2017). To facilitate wheat crop disease diagnosis, over 50,000 annotated images of healthy and infected plants were acquired from Plant Village. A dataset covering seven different diseases, with 9230 healthy images, was collected against a plain background from the WDD2017 database (Lu et al. 2017b).

From several sources such as Plant Village and Google websites, 500 maize crop images were collected and divided into nine categories: eight diseased categories and one healthy category (Zhang et al. 2018a, b). Across three datasets, 18,222 maize crop images were obtained from a publicly available repository on the Open Science Framework (Wiesner-Hanks et al. 2018). For the proposed CNN model, a professional unmanned aerial vehicle, the DJI Phantom 3 drone, was used to capture many pictures of a soybean plantation. A database of over 15,000 images, including pictures of soil, soybean crop, broadleaf weeds, and grassy weeds, was created (dos Santos Ferreira et al. 2017).

In (Kim et al. 2018), the primary data source is the IoT sensors installed in the field, and the acquired data is then used for data analysis. In (Ma et al. 2018), over 14,208 images, including augmented images, with symptoms of four classes of cucumber infections were captured in the real field. Nearly 93 images of cucumber downy mildew were obtained with a camera from a greenhouse innovation base at the Tianjin Academy of Agricultural Sciences (Ma et al. 2017a, b). The image data required to train and test the CNN were obtained from farmers (Bai et al. 2017). A dataset of 50,000 apple leaf images from the open-access Plant Village database, with 30 class labels such as healthy and early, middle, and end stages of disease, was used for disease detection (Wang et al. 2017). Some of the other literature considered in this section is listed in Table 2.

Table 2 Hyperspectral data sources for healthy and infected crops

3.2 Answer to the second research question

For this section, 25 papers were identified by filtering on existing automated systems developed to recognize and classify plant diseases. Finally, 20 papers were considered for analysis.

A review of advanced neural network (NN) techniques was conducted to identify how best to use hyperspectral data to determine plant diseases. In the first stage of the investigation, NN mechanisms, models, types, classifications, techniques, and the various algorithms used in processing hyperspectral data were studied (Golhani et al. 2018). The current state of imaging and non-imaging information for earlier disease detection was then highlighted in some detail (Golhani et al. 2018). One DL technique, the CNN, has emerged as the most powerful tool for identifying and classifying images and for extracting nonlinear, invariant, and discriminant features (Paoletti et al. 2017; Lu et al. 2017a). Because a plain artificial neural network handles images poorly, a CNN architecture has been proposed with convolution and pooling layers for feature extraction and a classification layer to classify the image dataset. These layers are further connected to convolutional and max-pooling layers (Sibiya et al. 2019). A deep metric learning-based framework was proposed to classify objects efficiently even when there are few training examples per class (Anshul Thakur et al. 2019). An automated disease diagnosis system, with an architecture based on multiple instance learning, was developed for wheat crop diseases. The proposed method is trained end to end, and the results are evaluated to determine the precision with which it recognizes disorders in crop growth and their infection category (Lu et al. 2017a). The network was trained on an input hyperspectral dataset, and its accuracy varied with the number of epochs and the size of the convolutional filter (Jain et al. 1996). A DCNN model with the CaffeNet architecture has shown remarkable efficiency in disease recognition. Automated image recognition systems for detecting plant diseases have been enabled by the widespread availability of High Definition (HD) cameras, smartphone penetration, and high-performance processors (Mohanty et al. 2016).

An integrated system named Farm-as-a-Service (FaaS) was developed, and its performance was evaluated by predicting diseases in the strawberry crop. The system performed well even in typical communication environments (Kim et al. 2018). A Deep Convolutional Neural Network (DCNN) model was developed to recognize the symptom categories of four cucumber leaf infections. The adopted architecture is a straightforward and fast approach to identifying small-scale images (Ma et al. 2018). A proposed web-based model helps farmers inspect diseases in pomegranate fruit. The user uploads a fruit image, which is then analyzed against a trained dataset to identify any infections (Bhange et al. 2015).

Applying the principles of deep learning, a fully connected network model was developed using CNN to recognize and classify maize crop diseases. The model can identify three different conditions: northern corn leaf blight, common rust, and grey leaf spot (Sibiya et al. 2019). An automated in-field disease diagnosis system for maize was developed using deep learning frameworks. The system integrates disease identification for images trained in wild conditions (Lu et al. 2017b). A new automated identification and measuring system based on a trained machine learning model was developed; it can detect diseases using images collected from the real field (Wiesner-Hanks et al. 2018). Using deep structured learning concepts and computer vision techniques, an integrated approach was designed to detect and distinguish diseased from healthy crop images (Jain et al. 1996).

An IoT-based network model was proposed to provide high-level service to farmers (Kim et al. 2018). To minimize the time taken to predict cucumber leaf disease symptoms such as target spots, downy mildew, powdery mildew, and anthracnose, a deep learning model with CNN was proposed (Ma et al. 2018). An image processing algorithm with hot-spot detection and a statistical inference method was proposed to handle disease identification in real field climatic conditions (Alexander Johannes et al. 2019).

3.3 Answer to the third research question

For this section, 23 papers were identified by filtering on the performance of automated systems in terms of precision in recognizing and classifying plant diseases. Finally, 15 papers were considered for analysis.

Over the past few decades, the performance of DCNNs in image recognition and classification has grown tremendously. Traditional methods rely entirely on predefined features. Image datasets are preprocessed to increase consistency and improve feature extraction before the DCNN model is trained. The most significant preprocessing operations are resizing and reformatting, so that all images have the same size and type. The most common preprocessing step is reducing the image size so that it suits deep learning models (Andreas Kamilaris et al. 2018).

In (Zhang et al. 2018a, b), the images are automatically resized to 224 × 224 pixels and 32 × 32 dots/inch using Python code with the OpenCV framework. In (Amara et al. 2017), each image is resized to 60 × 60 pixels and then converted to grayscale. All dataset images were resized to 800 × 600 pixels before data analysis (Ma et al. 2017a, b; Ma et al. 2018). To improve image processing, all input images were adjusted to 1824:1028 size (Ulzii-Orshikh Dorj et al. 2017). For early identification of diseases, input images were resized to 224 pixels, although this can shrink incipient disease regions that are already small (Srdjan Sladojevic et al. 2016). During data collection, images with fewer than 500 pixels were not considered. The remaining images were resized to 256 × 256 pixels (Srdjan Sladojevic et al. 2016). The most common image sizes are 60 × 60, 96 × 96, 128 × 128, and 256 × 256 (Andreas Kamilaris et al. 2018).
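The resize-then-grayscale step reported for (Amara et al. 2017) can be illustrated with a minimal, dependency-free sketch. The function names, the nearest-neighbour sampling, and the BT.601 luma weights are assumptions made for illustration; the reviewed studies used their own tooling (for example OpenCV) and do not state these details.

```python
def resize_nearest(image, new_height, new_width):
    """Resize an image (list of rows) with nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[(y * old_height) // new_height][(x * old_width) // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to single intensity values.

    The 0.299/0.587/0.114 weights are the common BT.601 luma weights,
    assumed here because the source papers do not specify a formula.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


def preprocess(image, size=60):
    """Resize to size x size pixels, then convert to grayscale."""
    return to_grayscale(resize_nearest(image, size, size))
```

Calling `preprocess(image)` on any RGB image stored as rows of `(r, g, b)` tuples returns a 60 × 60 grid of intensities in the range 0 to 255.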

Image segmentation is also a popular practice, either to increase the number of images in a dataset or to facilitate the deep learning process by identifying regions of disease (Ienco et al. 2017; Rebetez 2016; Sladojevic et al. 2016; Grinblat et al. 2016; Sa et al. 2016; Chen et al. 2017; Bargoti et al. 2016). Overall, smaller images reduce the time taken to train and test CNN models.

3.4 Answer to the fourth research question

For this section, 27 papers were identified by filtering on deep learning algorithms for feature extraction from images. Finally, 16 papers were considered for analysis.

Based on leaf classification, a recognition algorithm was developed using DCNN to identify plant infections. The model can distinguish 13 different plant diseases from healthy leaves and can also distinguish plant leaves from their surroundings (Sladojevic et al. 2016). The popular k-means algorithm was implemented for unsupervised clustering of hyperspectral data. The study also suggested a cloud-based architecture for processing such image datasets efficiently (Juan Mario Haut et al. 2016). Features are chosen manually depending on the application domain. Moreover, feature selection has gained significant attention in the pattern recognition community (Yilmaz et al. 2006). The first of these operations, convolution, extracts the image features. It preserves the spatial association among pixels by learning the characteristics of an image from small portions of the input. The approach proposed in (Jain 1996) relies on image features and exploits visual cues such as shape and color. The technique combines features representing shape and color in images and improves feature retrieval speed. In general, the mathematical definition of convolution is

$$\left( {f*g} \right)\left( t \right) \doteq \mathop \int \limits_{ - \infty }^{\infty } f\left( \tau \right)g\left( {t - \tau } \right)d\tau$$
(1)

For input x, the feature map of the ith convolutional layer, obtained by convolving several feature graphs, is derived in (Zhang et al. 2018a, b) as

$$h_{i}^{c} = f\left( {W_{i} * x} \right)$$
(2)

More precisely, convolution is expressed as (Plant Village Disease Classification Challenge 2018):

$$y_{i^{l + 1} ,j^{l + 1} ,d} = \sum\limits_{i = 0}^{H} \sum\limits_{j = 0}^{W} \sum\limits_{d^{l} = 0}^{D^{l}} f_{i,j,d^{l} ,d} \times x^{l}_{i^{l + 1} + i,\,j^{l + 1} + j,\,d^{l}}$$
(3)

Here \(x^{l}_{i^{l + 1} + i,\,j^{l + 1} + j,\,d^{l}}\) refers to the element of \(x^{l}\) indexed by the triplet \((i^{l + 1} + i,j^{l + 1} + j,d^{l} )\).

For an image, the convolutional filter (Anil K Jain 1996) is represented mathematically as

$$f\left( {x,y} \right)*g\left( {x,y} \right) = \sum\limits_{n_{1} = - \infty }^{\infty } \sum\limits_{n_{2} = - \infty }^{\infty } f\left( {n_{1} ,n_{2} } \right) \cdot g\left( {x - n_{1} ,y - n_{2} } \right)$$
(4)
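For finite images, the infinite sums in Eq. 4 reduce to loops over the pixels of both arrays, because every value outside the arrays is treated as zero. The following is a minimal sketch of that definition in plain Python; the function name and the list-of-rows image representation are assumptions made for illustration.

```python
def convolve2d_full(f, g):
    """Full 2-D discrete convolution of two finite arrays (lists of rows).

    Implements (f * g)(x, y) = sum over n1, n2 of f(n1, n2) * g(x - n1, y - n2),
    with all values outside either array taken to be zero.
    """
    f_rows, f_cols = len(f), len(f[0])
    g_rows, g_cols = len(g), len(g[0])
    out = [[0] * (f_cols + g_cols - 1) for _ in range(f_rows + g_rows - 1)]
    for n1 in range(f_rows):
        for n2 in range(f_cols):
            for m1 in range(g_rows):
                for m2 in range(g_cols):
                    # m1 = x - n1 and m2 = y - n2, so x = n1 + m1 and y = n2 + m2.
                    out[n1 + m1][n2 + m2] += f[n1][n2] * g[m1][m2]
    return out
```

Because the definition is symmetric in f and g, `convolve2d_full(f, g)` and `convolve2d_full(g, f)` give the same result.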

The primary aim of pattern recognition can be unsupervised or supervised classification. Among the most widely used approaches to pattern recognition, statistical learning methods have been gaining attention (Anil K Jain et al. 2000). A binary Bayesian classifier has been used to capture high-level concepts from low-level image features, under the constraint that the test image belongs to one of the classes (Aditya Vailaya et al. 2001). The framework uses a multiscale CNN to understand the process of learning to increase the size of a dataset (Anshul Thakur et al. 2019). The CED algorithm is an optimal design that supervises the creation of the most diverse and precise classifier ensembles (Alzubi et al. 2015). A bagging algorithm called DivBagging (Alzubi 2015a, b) is a diversity-based approach that builds classifier ensembles by training on a variety of bootstrap replicates of the training dataset. This method has been shown to prune ensembles down to selected, accurate base classifiers.

Depending on the pattern, recognition may be supervised or unsupervised. The Multiclass Support Vector Machine (MSVM) is a widely used method for supervised pattern recognition. It transforms the original image data into a high-dimensional space and constructs a single hyperplane or a pair of hyperplanes that separate the various categories (Wenwen Kong et al. 2018). The tomato yellow leaf curl virus was detected using SVM, with classic feature extraction followed by classification (Mokhtar et al. 2015). A DCNN model was compared with conventional Support Vector Machine (SVM) and random forest classifiers and proved to be robust (Ma et al. 2018). SVM has also been used to classify image datasets as infected or non-infected (Bhange et al. 2015).


For a given training set with n samples, the SVM prediction function (Wu et al. 2015) is represented as

$$y = \sum\limits_{j = 1}^{n} \alpha_{j} y_{j} k\left( {x,x_{j} } \right)$$
(5)

where x is the input, k is the kernel function, \(x_{j}\) is a support vector, \(y_{j}\) is the label of \(x_{j}\), and \(\alpha_{j}\) is the weight factor.
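Eq. 5 can be evaluated directly once the support vectors, labels, and weight factors are known. The sketch below assumes an RBF kernel and hand-picked weight factors purely for illustration; in practice these values come from training, and the reviewed studies do not report them. Like Eq. 5, the sketch omits a bias term.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_predict(x, support_vectors, labels, alphas, kernel=rbf_kernel):
    """Evaluate y = sum_j alpha_j * y_j * k(x, x_j) for one input x."""
    return sum(alpha * label * kernel(x, sv)
               for sv, label, alpha in zip(support_vectors, labels, alphas))


# Hypothetical two-class example: one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
```

The sign of `svm_predict(x, SUPPORT_VECTORS, LABELS, ALPHAS)` gives the predicted class: inputs near (0, 0) yield a negative value and inputs near (2, 2) yield a positive one.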

3.5 Answer to the fifth research question

For this section, 18 papers were identified by filtering on image analytical techniques that can be used to improve image quality. Finally, 8 papers were considered for analysis.

Noise filtering is one of the analytical techniques used to edit an image and prepare it for further processing. Related techniques include image enhancement, augmentation, segmentation, and color space conversion. The convolution operation applies filters such as sharpen, blur, edge enhance, edge detect, and emboss to extract features from an input image. Each filter is applied to the red, green, and blue channels of a leaf image by computing a dot product between the input pixels and the filter values (Amara et al. 2017). In recent years, CNN has been used as an unsupervised architecture in deep learning that performs filter convolutions in the image domain (Lowe et al. 2017). A disease spot segmentation method with two pipelined procedures, complete color feature extraction and detection, has been proposed (Bai et al. 2017). The CNN gets activated when it finds the features in an input image. The image quality improvement obtained with various noise filters is depicted in Figs. 2 and 3.

Fig. 2 Examples of filters applied to images. a Blur b Sharpen c Emboss d Edge enhance e Edge detect

Fig. 3 Image after applying various filters (Peifeng et al. 2017). a Original image b HSV image c Filter with HSV ranges

The feature map \(M_{i}\) of the ith convolutional layer is computed as

$$M_{i} = b_{i} + \mathop \sum \limits_{k = 0}^{n} W_{ik} *X_{k}$$
(6)

Here, * is the convolution operation, \({X}_{k}\) is the kth input channel, \({W}_{ik}\) is the sub-kernel for that channel, and \({b}_{i}\) is the bias term. In the proposed algorithm (Esmael Hamuda et al. 2017), an efficient Gaussian blur filter was applied to reduce noise in the input image and enhance its features. Based on image recognition technology (Peifeng et al. 2017), the Sobel operator and a vertical edge detection filter were applied to the grey image of a wheat crop to eliminate its background and extract features from the diseased spot. A flood-fill algorithm then removes the remaining noisy points from the image. Background removal, foreground pixel extraction, and non-green pixel removal are other methods of removing overall noise from datasets.
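As an illustration of Eq. 6, the sketch below builds one feature map by summing the per-channel filter responses and adding the bias. It is a sketch under stated assumptions: the function names are hypothetical, and the kernel is slid over the image without flipping (cross-correlation), which is the convention most deep learning libraries use for their convolution layers.

```python
def correlate_valid(channel, kernel):
    """Slide a kernel over one channel, keeping only fully covered positions."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(channel) - k_rows + 1
    out_cols = len(channel[0]) - k_cols + 1
    return [
        [sum(kernel[u][v] * channel[r + u][c + v]
             for u in range(k_rows) for v in range(k_cols))
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


def feature_map(channels, kernels, bias):
    """Compute M_i = b_i + sum_k W_ik * X_k for a single output map i.

    channels: list of input channels X_k (each a list of rows).
    kernels:  list of sub-kernels W_ik, one per input channel.
    bias:     scalar bias b_i.
    """
    responses = [correlate_valid(x, w) for x, w in zip(channels, kernels)]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[bias + sum(resp[r][c] for resp in responses) for c in range(cols)]
            for r in range(rows)]
```

For an RGB leaf image, `channels` would hold the red, green, and blue planes and `kernels` the three matching sub-kernels of one filter.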

In many networks, the convolutional layer acts as a feature extractor for the image, whose dimensions are then reduced by a pooling layer (Andreas Kamilaris et al. 2018). Such pooling layers are used to obtain spatial invariance and reduce feature map resolution. Each pooled feature map corresponds to one feature map of the preceding layer (Wang et al. 2016). A typical existing CNN architecture that can extract features from an image is depicted in Fig. 4.

Fig. 4 A typical CNN architecture proposed by (Hu et al. 2015) with a max-pooling layer
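The resolution reduction performed by a max-pooling layer can be sketched as follows. The 2 × 2 window and stride of 2 are common defaults assumed here for illustration; they are not taken from the architecture in Fig. 4.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window of a feature map."""
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [max(feature_map[r * stride + u][c * stride + v]
             for u in range(size) for v in range(size))
         for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

With these defaults, a 4 × 4 feature map becomes a 2 × 2 map, and small shifts of a strong activation inside a window leave the output unchanged, which is the spatial invariance mentioned above.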

3.6 Answer to the sixth research question

For this section, 10 papers were identified by filtering on deep learning techniques that reduce overfitting and improve non-linearity. Finally, 3 papers were considered for analysis.

The Rectified Linear Unit (ReLU) layer is used in image classification, and the Parametric Rectified Linear Unit (PReLU) can improve model fitting with little overfitting risk and almost no extra computational cost. The ReLU activation function adds non-linearity to the convolutional network (Anil Jain et al. 1996) and is faster and performs better than the sigmoid function (Plant Village Disease Classification Challenge 2018). Figure 5 depicts the difference between the two activation functions. ReLU is defined as

$$f(x) = \left\{ {\begin{array}{*{20}l} {x,} \hfill & {if\,x > 0} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right.$$
(7)

Fig. 5 Variation between ReLU and PReLU (He et al. 2015)

In (https://code.google.com/archive/p/cuda-convnet/), the standard way to model a neuron’s output f as a function of its input x is

$$f(x) = \tanh (x) \, or \, f(x) = (1 + e^{( - x)} )^{ - 1}$$
(8)

In terms of training time with gradient descent, these saturating nonlinearities are slower than the non-saturating nonlinearity

$$f\left( x \right) = {\max}\left( {0,x} \right)$$
(9)

Classification accuracy can be improved by replacing the non-parametric ReLU activation unit with PReLU. PReLU adds a few extra parameters, equal in number to the total channel count, which is negligible compared with the number of weights.

The Consensus-based Combining Method (CCM) has been shown to improve classification accuracy by combining an ensemble of classifiers. CCM differs from other methods in that it iteratively adjusts the weights after measuring the outputs of all classifiers (Alzubi et al. 2018).
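The activation functions in Eqs. 7 to 9 and the PReLU variant can be compared with a short sketch. The PReLU slope `a` is a learned parameter in practice; the fixed value 0.25 used below is only an assumed starting value for illustration.

```python
import math


def relu(x):
    """Non-saturating rectifier: f(x) = max(0, x)."""
    return max(0.0, x)


def prelu(x, a=0.25):
    """Parametric rectifier: x for positive inputs, a * x otherwise.

    In a real network `a` is learned per channel; 0.25 is an assumption.
    """
    return x if x > 0 else a * x


def sigmoid(x):
    """Saturating nonlinearity: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_gradient(x):
    """Derivative of the sigmoid, which vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

For a large input such as x = 10, `sigmoid_gradient` is close to zero while the slope of `relu` is still 1, which is why saturating units slow down gradient descent.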

3.7 Answer to the seventh research question

For this section, 18 papers were identified by filtering on deep learning techniques that optimize the performance of automated systems by reducing the loss between training and testing datasets. Finally, 6 papers were considered for analysis.

Stochastic Gradient Descent (SGD) is an algorithm that updates the biases and weights using a subset of the training image dataset. It is used to minimize the loss function between training and testing datasets and to optimize the model parameters (Shanwen Zhang et al. 2018a, b). The primary goal is to find the weights that minimize the loss function. Given the partial derivatives of the loss E(W, b), the parameters (W, b) are updated as in Eq. 10.

$$w_{ij}^{l + 1} = w_{ij}^{l} - \alpha \frac{{\partial E\left( {w,b} \right)}}{{\partial w_{ij}^{l} }},\quad b_{i}^{l + 1} = b_{i}^{l} - \alpha \frac{{\partial E\left( {w,b} \right)}}{{\partial b_{i}^{l} }}$$
(10)

During learning, SGD works by randomly choosing a small set of training inputs. The study (Ghazi et al. 2017) used the algorithm to tune the weights according to Eq. 11:

$$w_{t + 1} = \mu w_{t} - \alpha \nabla J\left( {w_{t} } \right)$$
(11)

Here \(\mu\) is the momentum applied to \({w}_{t}\) and α is the learning rate. The models proposed in (Alex et al. 2018) are trained using SGD with a batch size of 128 examples, a momentum of 0.9, and a weight decay of 0.0005. To reduce the training error, the weight w was updated as

$$w_{i + 1} = w_{i} + v_{i + 1}$$
(12)
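The update in Eq. 12 can be sketched on a toy problem. The sketch assumes the standard momentum form for the velocity, v ← μv − α∇J(w), without weight decay, and fits a single weight to hypothetical noise-free data; the function name, data, and hyperparameter values are illustrative and not taken from the cited studies.

```python
import random


def sgd_momentum(xs, ys, alpha=0.1, mu=0.9, batch_size=4, steps=500, seed=0):
    """Fit y = w * x by mini-batch SGD with momentum.

    Assumed velocity rule: v <- mu * v - alpha * grad, followed by the
    weight update of Eq. 12: w <- w + v.
    """
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, v = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # random subset of the training set
        # Gradient of the mean squared error over the mini-batch.
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        v = mu * v - alpha * grad
        w = w + v
    return w


# Hypothetical training data generated from y = 2x.
XS = [i / 10 for i in range(1, 11)]
YS = [2.0 * x for x in XS]
```

Running `sgd_momentum(XS, YS)` drives the weight toward 2, the slope used to generate the data.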

In (Jain et al. 1996), the SoftMax function is applied at the output layer to estimate the probability distribution of a specific event among n different events. For an event \(x_{i}\), the SoftMax function is represented as

$$F\left( {x_{i} } \right) = \frac{{\exp \left( {x_{i} } \right)}}{{\sum\nolimits_{j = 0}^{k} {\exp \left( {x_{j} } \right)} }},\quad {\text{where}}\;i = 0,1,2, \ldots ,k$$
(13)
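Eq. 13 translates into a few lines of code. The sketch below subtracts the largest score before exponentiating, a common numerical-stability step that is an addition for illustration and does not change the result of Eq. 13.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)  # shifting by the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The class with the highest score receives the highest probability, so the predicted label is the index of the largest entry in `softmax(scores)`.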

In general, backpropagation (Jain et al. 1996) was first applied to CNN [55] to adjust the weights of the input variables and is considered one of the most potent learning algorithms (Ioffe and Szegedy 2015). For every node j in the output layer, the error term is computed as

$$\Delta \left[ j \right] \leftarrow f^{\prime}\left( {in_{j} } \right) \times \left( {t_{j} - q_{j} } \right)$$
(14)

For each layer m from M − 1 down to 1, and for every node i in layer m, the error term is

$$\Delta \left[ i \right] \leftarrow f^{\prime}\left( {in_{i} } \right)\sum\limits_{j} {W_{i,j} \Delta \left[ j \right]}$$
(15)

The weight of every connection \(W_{i,j}\) in the network is then updated as

$$W_{ij} \leftarrow W_{ij} + \alpha \, x_{i} \, \Delta \left[ j \right]$$
(16)

Here \(f^{\prime}\) is the derivative of the activation function, \(t_{j} - q_{j}\) is the difference between the target and the actual output of node j, α is the learning rate, and \(x_{i}\) is the input carried by the connection. These steps are repeated until the stopping condition is met, and the trained network is then returned.
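The backpropagation steps above can be sketched for a tiny network with two inputs, two hidden nodes, and one output. Everything specific in the sketch is an assumption made for illustration: the sigmoid activation, the network size, the bias terms, the learning rate, and the logical OR training data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Return (hidden activations, output activation) for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden))
                     + net["b_out"])
    return hidden, output


def backprop_step(net, x, target, alpha):
    """One backpropagation update for a single training example."""
    hidden, output = forward(net, x)
    # Output error term: f'(in) * (target - output); sigmoid f' = o * (1 - o).
    delta_out = output * (1.0 - output) * (target - output)
    # Hidden error terms: f'(in_i) * W_i,out * delta_out (single output node).
    delta_hidden = [h * (1.0 - h) * w * delta_out
                    for h, w in zip(hidden, net["w_out"])]
    # Weight updates: W <- W + alpha * (input to the weight) * delta.
    net["w_out"] = [w + alpha * h * delta_out
                    for w, h in zip(net["w_out"], hidden)]
    net["b_out"] += alpha * delta_out
    for i, delta in enumerate(delta_hidden):
        net["w_hidden"][i] = [w + alpha * xi * delta
                              for w, xi in zip(net["w_hidden"][i], x)]
        net["b_hidden"][i] += alpha * delta


def total_error(net, samples):
    """Sum of squared errors over all samples."""
    return sum((t - forward(net, x)[1]) ** 2 for x, t in samples)


def train(samples, alpha=0.5, epochs=2000, seed=0):
    """Train a 2-2-1 network and return it with the error before and after."""
    rng = random.Random(seed)
    net = {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
        "b_hidden": [0.0, 0.0],
        "w_out": [rng.uniform(-1, 1) for _ in range(2)],
        "b_out": 0.0,
    }
    error_before = total_error(net, samples)
    for _ in range(epochs):  # fixed epoch count stands in for the stopping rule
        for x, t in samples:
            backprop_step(net, x, t, alpha)
    return net, error_before, total_error(net, samples)


# Hypothetical training set: the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

Calling `train(OR_SAMPLES)` returns the trained network together with the squared error before and after training, so the reduction in error can be checked directly.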

A few other significant research contributions on identifying plant diseases using different deep learning models are listed in Table 3.

Table 3 List of significant research contributions claimed to different deep learning models for identification of plant diseases

4 Challenges in CNN implementation

This section identifies and discusses some of the challenges that still need to be overcome. The outcomes can be useful for better image preprocessing and classification, and for detecting crop diseases at an earlier stage.

The primary challenges with CNNs are training on many spectral image inputs and defining their targets. This becomes even more challenging when CNN classifiers are applied to classify variations in the given input data. Analyzing spectral mixtures with CNN classifiers is another highly challenging aspect. In general, the ability of CNN classifiers to classify crop diseases depends on integrating optimal parameters such as texture, color, and object shape. These parameters can be trained easily when the standard images are linear. Training on hyperspectral data is only possible when there are hundreds of contiguous spectral bands. Also, extracting information from adjacent spectral bands in various spectral regions is highly redundant. More challenges remain when adopting more sophisticated CNN technology. Some situations cannot be handled by automated methods based on computer vision and image processing, as researchers differ in how they use them. The models proposed so far are limited in scope and depend on the data acquisition environment. This can lead to capturing different behaviours, which makes image analysis for disease classification and prediction more difficult. Some of the other most impactful challenges are:

  • New computer vision technologies are not yet standardized enough for automatic disease detection.

  • Environmental conditions during input data acquisition can also affect disease classification.

  • Disease symptoms are not well defined, which makes it challenging to delineate healthy and diseased portions.

  • Visual similarities among disease symptoms can force existing methods to rely on subtle variations to discriminate between them.

  • Measuring the severity of diseases and their management was excluded.

  • CNN models trained with smaller datasets may report a higher accuracy rate, but they are not trustworthy.

  • Running a CNN on CPUs has a higher computational cost than running it on GPUs.

4.1 Gaps in the existing systems

The discussion presented here is based purely on the literature reviewed for plant diseases; the findings may differ for other areas of deep learning application.

  • The networks do not provide a better data representation through multiple convolutions.

  • There are no flexible and adaptable models that can classify and predict the more complex cases of plant disease.

  • No image augmentation techniques were used to support feature extraction for precise classification.

  • The overall performance of the models falls short of expectations on larger training and testing datasets.
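To make the augmentation gap concrete, the sketch below shows the kind of basic geometric augmentation that could enlarge a training set. It is an illustration under stated assumptions, not a technique reported in the reviewed studies: images are represented as nested Python lists of pixel intensities to keep the example self-contained, and all function names are hypothetical. A real pipeline would normally rely on an array or deep learning library.

```python
def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return a copy of the original image plus three geometric variants."""
    return [
        [row[:] for row in image],
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]

# Hypothetical 2x3 grayscale leaf patch (pixel intensities).
patch = [[1, 2, 3],
         [4, 5, 6]]
variants = augment(patch)
for variant in variants:
    print(variant)
```

Each labelled image yields four training samples here; the original patch is left unmodified because every function returns a new list.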

5 Results and discussions

This section summarizes and documents the information reviewed from the selected studies.

5.1 Observation 1

Most of the studies employed a convolutional neural network to identify diseases in plants such as oil palm (Ahmadi et al. 2017; Wenwen Kong et al. 2018), maize (Sibiya et al. 2019; Dechant et al. 2017; Zhang et al. 2018a, b; Wiesner-Hanks et al. 2018; Fuentes et al. 2018), wheat (Lu et al. 2017b; Alexander Johannes et al. 2017; Jayme et al. 2015; Peifeng et al. 2017), soybean (dos Santos Ferreira et al. 2017), strawberry (Kim et al. 2018; Ferentinos 2018), cucumber (Ma et al. 2017a, b, 2018; Bai et al. 2017; Yusuke Kawasaki et al. 2015), pomegranate (Bhange et al. 2015), olive leaf (Cruz et al. 2017), rice (Lu et al. 2017a), and potato (Jain et al. 1996; Oppenheim et al. 2017). Similar CNN models were also implemented for disease detection in some vegetables and fruits. Fig. 6 depicts some of the literature considered in this study to identify the subdomains within agriculture.

Fig. 6 Applications of CNN models on various agriculture subdomains

5.2 Observation 2

Most of the researchers (Mohanty et al. 2016; Ahmadi et al. 2017; Lu et al. 2017b; Zhang et al. 2018a, b; Wiesner-Hanks et al. 2018; Jain et al. 1996; Wang et al. 2017) acquired 133,158 images from existing databases, whereas only 29,301 images (dos Santos Ferreira et al. 2017; Kim et al. 2018; Ma et al. 2017a, b, 2018; Bai et al. 2017) were collected from the real field. Current models therefore rely on public datasets rather than on images acquired under actual field conditions. This observation indicates that these CNN models are not adequately trained and tested on images obtained from the real field under various climatic conditions.
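The imbalance between the two sources can be expressed as shares of the total. The short calculation below uses only the two image counts reported above and assumes that the two groups are disjoint; the variable names are illustrative.

```python
# Image counts as reported in Observation 2.
public_images = 133_158
field_images = 29_301

total_images = public_images + field_images
public_share = public_images / total_images
field_share = field_images / total_images

print(f"total images:    {total_images}")
print(f"public datasets: {public_share:.1%}")
print(f"real field:      {field_share:.1%}")
```

Under that assumption, about 82% of the images come from public datasets and about 18% from the real field.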

The performance of CNN models cannot be determined reliably from the standard datasets used by most of the studies. The present article takes the view that existing CNN models can only be trained and tested properly with image datasets captured in the field throughout the crop life cycle. Figure 7 depicts the two primary sources of hyperspectral images: public datasets and the real field.

Fig. 7 Various hyperspectral data acquiring sources

5.3 Observation 3

During CNN training, the error rate was reduced from 0.275 to 0.001 (Sibiya et al. 2019), but the model accuracy varied from one disease class to another. The model identified northern corn leaf blight with 99.9% accuracy, but reached just 87% for other disease classes (Sibiya et al. 2019). The present study holds that detection accuracy should vary only minimally across all classes of crop disease, because separate models are not developed for different disease classes.
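The variation across disease classes can be made explicit by computing accuracy per class from a confusion matrix. The sketch below is illustrative only: the confusion matrix is hypothetical, with counts chosen to mirror the 99.9% and 87% figures quoted above, the labels "class B" and "class C" are placeholders, and "per-class accuracy" is taken here to mean the fraction of each true class predicted correctly.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class predicted correctly (row-wise recall)."""
    result = []
    for i, row in enumerate(confusion):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted.
labels = ["northern corn leaf blight", "class B", "class C"]
confusion = [
    [999, 1, 0],
    [60, 870, 70],
    [50, 80, 870],
]

accuracies = per_class_accuracy(confusion)
spread = max(accuracies) - min(accuracies)
correct = sum(confusion[i][i] for i in range(len(confusion)))
overall = correct / sum(sum(row) for row in confusion)

for label, acc in zip(labels, accuracies):
    print(f"{label}: {acc:.3f}")
print(f"overall accuracy: {overall:.3f}, spread across classes: {spread:.3f}")
```

A single overall accuracy hides the gap between the best and worst class, so reporting the per-class values and their spread gives a clearer picture of whether one network serves all disease classes equally well.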

The CNN model should be developed and trained so that the same network can be applied to the various disease classes of a crop. Figure 8 depicts the performance of existing CNN architectures tested on multiple disease classes and shows how the models failed when handling them.

Fig. 8 Performance evaluation of the CNN model

5.4 Observation 4

The leading CNN architectures used to develop the models are GoogLeNet, Cifar, CaffeNet, VGG-FCN-S, VGG-FCN-VD16, and VGG16. The reviewed studies (Lu et al. 2017b; Dechant et al. 2017; Zhang et al. 2018a, b; Wang et al. 2017) reveal that 86% of existing models are not very efficient at disease identification and classification. The present study holds that the existing models still need improvement in their development and implementation so that higher accuracy can be achieved. Figure 9 depicts how deep convolutional models improved their performance when identifying multiple disease classes.

Fig. 9 Results presenting the efficiency of DCNN in dealing with disease prediction

6 Future directions

Familiarity with the state of the art and with research methodology would support researchers, and not only in applying deep learning techniques. As follow-up work, the study plans to apply deep learning practices to agriculture for identifying and predicting various crop diseases at an early stage of infection. It is observed that DL can perform plant disease identification, such as distinguishing healthy from diseased plants, as well as disease classification. It is time to examine DL applications for the recognition and classification of possible diseases, and for many other tasks in which neural networks are involved.

In future, more DL approaches are expected to reach a higher level of performance. Ultimately, researchers need to test their models, whether under development or already developed, on datasets acquired from the real field.

7 Conclusion

This paper surveyed deep learning techniques and assessed how far they are applicable in the agriculture domain. In the course of this work, 84 relevant articles were identified by examining the area of agriculture. We focused on the data sources, the models employed, the pre-processing techniques adopted, and the overall efficiency of the proposed CNN models. The outcomes indicate that most existing CNN models are limited in their ability to process original image data in its unstructured form. Deep learning systems need systematic engineering and design expertise to transform unstructured data into feature vectors from which the subsystems can detect and classify specific patterns in the input data.

The purpose of this survey is to prompt researchers to implement deep learning techniques for image-based plant disease identification and classification.