1 Introduction

The agricultural sector plays a primary role in the economic development of many countries. It has already contributed substantially to the economic prosperity of both developed and under-developed countries. Among the factors having a considerable impact on natural crop growth are pests and weeds. They affect both the quality and the quantity of agricultural products, which can be a central cause of economic loss [32]. However, accurate detection and classification of crop diseases at their initial stages can help reduce this economic loss [10]. Crop diseases can be manually inspected and classified by an expert, but this approach is quite unrealistic in terms of time and cost [26]; therefore, there is room for an intelligent decision system. The infection mostly reveals itself on the leaves, so the chances of detection increase if computer vision based methods are utilized efficiently [5] (Table 1).

Table 1 Abbreviations table

Normally, diseases can be detected and classified by examining a few features, especially those related to color, texture, and shape [23, 36]. For example, in the case of the cucumber crop, detection and classification of diseases is a challenging task for several reasons, such as the complexity of multiple diseases, irregularity of regions, and the high number of extracted features [3]. The most common cucumber diseases, which affect the growth of cucumber, are powdery mildew, anthracnose, downy mildew, angular leaf spot, and scab [10]. Several computer vision based methods have been implemented lately for cucumber disease detection and classification, such as support vector machines (SVM), artificial neural networks (ANN), visual saliency maps, region growing, color information based approaches, probabilistic methods, and thresholding [12, 19, 37]. These methods automatically detect and classify cucumber diseases with good accuracy.

There are a few challenges researchers may face while detecting and classifying cucumber leaf diseases. Some of them relate to low contrast, background complexity, and the selection of irrelevant features. A low contrast image has a direct impact on segmentation accuracy, and the degradation is mostly reflected in the later classification stage. Similarly, cucumber leaf disease segmentation is itself one of the most challenging tasks, where the primary challenges are changes in scale, origin, shape, and color. Finally, irrelevant and redundant features may cause problems at the classification stage in terms of average accuracy and execution time.

In computer vision and machine learning, the advent of deep neural networks, particularly convolutional neural networks (CNNs), has delivered excellent performance compared to handcrafted features [13]. Several deep learning models have been introduced in the literature and show notable performance. However, the importance of any model is judged by its error rate and number of parameters. The AlexNet deep CNN model [17], introduced in 2012, has an error rate of 15.3% and 60 million parameters. ZFNet [6], developed in 2013 with a structure similar to AlexNet, has an error rate of 14.8% and 4 million parameters. The well-known GoogleNet deep CNN model [31] was developed in 2014 and achieved an error rate of 5.1% for a single model and 3.6% for an ensemble; the total number of parameters of this model is 4 million. In 2014, the VGG deep models [29] were developed, which include 16 and 19 convolutional layers and achieved an error rate of 7.3% with 138 million parameters. In the computer vision community, this model is widely used for feature extraction and shows superior performance compared to the earlier methods [21].

In this article, a novel method is proposed for cucumber leaf disease detection and classification based on an improved saliency method and deep feature selection. The proposed method incorporates four primary steps: a) contrast enhancement; b) diseased spot segmentation; c) deep feature extraction and selection; and d) classification. To validate the proposed method, five different diseases are selected: powdery mildew, downy mildew, anthracnose, corynespora, and angular leaf spot. The major contributions are:

  1. A new preprocessing method is proposed, which combines local contrast enhancement, top-hat & hessian based filtering, and HSV color space transformation. The local contrast is first improved to highlight the edges of the salient (lesion) regions, which are then further enhanced using top-hat and hessian based filtering. A 3D median filter removes noise from the sharpened image, which ultimately clears the diseased spots after the HSV color transformation.

  2. A novel binary segmentation method is developed in which color features are calculated from the 3D HSV matrix, producing two values: mean deviation and harmonic mean. Based on the calculated mean deviation, a binary cluster is formed from a chi-square distance matrix, which is further refined using a harmonic mean threshold. This procedure removes the inutile fragments from the segmented binary image.

  3. A new feature selection methodology is implemented, which selects the most principal features on the basis of three statistical measures: local entropy (L-Entropy), local standard deviation (L-SD), and interquartile range (IQR).

2 Related work

Several techniques have been implemented for the detection and classification of cucumber leaf diseases. It can easily be concluded from the existing literature that the segmentation and feature extraction steps play a vital role in accurate classification. Pixia et al. [24] initially pre-processed the data by removing noise prior to the segmentation and classification steps. In the second phase, the noise free images are segmented, and features based on color, texture, and shape are extracted. Finally, classification is performed using K-nearest neighbor (KNN). In their experiments, three types of cucumber leaf diseases are considered: downy mildew, powdery mildew, and anthracnose. Similarly, Tian et al. [36] utilized the same features as [24] and later classified them using SVM. They concluded that a combination/fusion of features provides better results than shape features alone. Ma et al. [20] constructed a deep neural network (DNN) based model for cucumber leaf disease recognition. Data augmentation is performed before processing by the DNN, which then segments the symptoms that discriminate the cucumber diseases from the complex background. To evaluate the presented method, a total of 14,208 cucumber leaf images are utilized, and the method shows good performance. Guo et al. [11] implemented an automatic system for cucumber leaf disease classification in which they considered color and texture features and later performed clustering using a Bayesian technique. Four types of diseases are considered in their experimental results, namely downy mildew, anthracnose, powdery mildew, and gray mold, achieving recognition rates of 94.0%, 86.7%, 88.8%, and 84.4%, respectively. Zhou et al. [41] presented a new method of cucumber disease detection by calculating 11 parameters of two color spaces (RGB and HSV). The extracted features are later utilized by SVM to achieve a recognition rate of 90% for downy mildew. Zhang et al. [39] introduced a new sparse representation based cucumber leaf disease recognition method. It primarily consists of three steps: 1) k-means clustering to segment infected regions; 2) shape and color feature extraction; and 3) sparse representation of the extracted features. The introduced method efficiently decreases the computational time and achieved a recognition rate of 85.7%. Zhang et al. [38] presented a singular value decomposition (SVD) based cucumber leaf disease recognition method. In their research, the infected regions are initially segmented using the watershed algorithm and later divided into blocks to extract SVD features. Three cucumber leaf diseases, downy mildew, blight, and anthracnose, are utilized in their experiments, achieving an average recognition rate of 91.63% with an average computation time of 24 s. Zhang et al. [40] introduced a new method of cucumber disease recognition based on superpixel fusion, EM segmentation, and pyramid HOG (PHoG) features. The introduced method consists of four steps: 1) dividing the diseased spot parts into small regions using superpixels; 2) implementing the EM algorithm to detect diseased spots; 3) extracting PHoG features of the segmented regions; and 4) classifying the features using SVM. Five types of cucumber leaf diseases are used for experiments, including angular leaf spot, scab, powdery mildew, downy mildew, and anthracnose, achieving an average recognition rate of 91.48%.

From the above literature, it can be concluded that color, texture, and shape features, along with an efficient feature selection method, are of great importance for an improved recognition rate. It can also be seen that algorithms using color and texture features perform much better than those using shape features only. In the subsequent sections, we explain our proposed method along with its improved results compared to several existing algorithms.

3 Proposed method

In this section, the proposed cucumber disease detection and recognition method is described in detail. The proposed method consists of five primary steps: a) image enhancement; b) infected region segmentation; c) deep feature extraction; d) feature selection; and e) recognition. The flow diagram of the proposed method is shown in Fig. 1, where each step comprises a series of sub-steps.

Fig. 1
figure 1

Proposed flow diagram of cucumber disease detection and classification

3.1 Contrast enhancement

The contrast enhancement step is very important in digital image processing and plays a vital role in the diagnosis of infected regions. Contrast enhancement methods maximize the differentiability between the infected part and the background. In this article, an efficient contrast enhancement method is presented, which is a combination of a series of sub-steps including local contrast enhancement, top-hat & hessian based filtering, image sharpening & 3D median filtering, and HSV color space transformation. Each step follows the given sequence and produces a new enhanced image. This series of pre-processing steps not only improves the image quality but also separates the foreground from the background.

In the first step, the local contrast of the input image is increased while keeping the strong edges. Two variables α and β are initialized, where α represents the threshold value and β is the smoothing variable; they are initialized as 1.5 and 2, respectively. Suppose an input image I(x, y) has pixels Ik. For the scaling factor, the mean of I(x, y) is calculated using (1).

$$ M_{I(x,y)}=\mu(I_{k})+ \beta(I_{k}-\mu(I_{k})) $$
(1)

where μ(Ik) denotes the average value over I(x, y). The resultant image of (1) has a different appearance with reduced detail. This problem is resolved by implementing a local filter to control the contrast details. Mathematically, the local filter is defined as:

$$ L_{filter}(x, y)=\gamma_{\alpha, \beta}(I_{k})+C $$
(2)
$$ L_{I(x,y)}=L_{filter}(x, y)\left \{ M_{I(x,y)} \right \}_{\alpha, \beta} $$
(3)

where γ represents a local filter, C is a constant value initialized as 3, and LI(x, y) is the image with improved local contrast, shown in Fig. 2b. This filter controls the contrast of the image and produces an enhanced image with higher detail. Afterwards, the top-hat filter is applied to the local contrast image using a structuring element b, which is initialized as 6. This operation enhances the foreground contrast and makes it more visible, which helps in the symptom segmentation step. Mathematically, top-hat filtering is defined as follows:

$$ Top(x,y)=L_{I(x,y)}-L_{I(x,y)}\circ b $$
(4)

where Top(x, y) is the resultant top-hat filtered image, as shown in Fig. 2c. In the third step, the hessian filter [30] is applied to identify the edges of the infected regions. The infected regions or symptoms that are not clear after the top-hat filtering operation become visible after this step. Mathematically, the hessian filter is defined as follows:

$$ Hes(x,y)=max_{\sigma}F(Top(x,y),\sigma) $$
(5)

Hes(x, y) is the hessian filtered image, shown in Fig. 2d, σ is a normalization parameter, \(F=\sigma \left | \frac {\lambda _{1}+ \lambda _{2}}{2} \right |\), and λ1 and λ2 represent the eigenvalues. As a subsequent step, the Hes(x, y) image is sharpened and then subjected to median filtering [8], shown in Fig. 2e. The median filter removes the noise that was scaled up in the previous step, making the infected regions clearer, Fig. 2f. As a final step, the RGB image is transformed into HSV color space (the standard relations are given below), which clearly disassociates the infected regions from the healthy background. The formulation of the median filtering and HSV transformation of the symptoms is given below:

$$ HSV(x,y)=F1(Median(x,y)) $$
(6)

where Median(x, y) represents the 3D median filtered image, \(F1 \in \left \{ H(x,y), S(x, y), V(x, y) \right \}\), and H(x, y), S(x, y), and V(x, y) are the hue, saturation, and value channels, respectively. Mathematically, these channels are computed as follows:

$$ H(x,y)=60^{\circ} \times H^{\prime},\quad H^{\prime}=\left\{\begin{array}{lll} \left | \frac{\phi_{G}-\phi_{B}}{C_{r}} \right | & if & M1=\phi_{R}\\ \frac{\phi_{B}-\phi_{R}}{C_{r}}+2 & if & M1=\phi_{G} \\ \frac{\phi_{R}-\phi_{G}}{C_{r}}+4& if & M1=\phi_{B} \end{array}\right.,\quad S(x,y)=\left\{\begin{array}{ll} 0 & if \ \ M1=0 \\ \frac{C_{r}}{M1} & Otherwise \end{array}\right.,\quad V(x,y)=\max(\phi_{R}, \phi_{G}, \phi_{B}) $$

where ϕR, ϕG, and ϕB denote the red, green, and blue channels, computed as \(\phi _{R}=\frac {R}{255}\), \(\phi _{G}=\frac {G}{255}\), \(\phi _{B}=\frac {B}{255}\). The variable M1 = max(ϕR, ϕG, ϕB), and Cr represents the chroma component. The effect of the HSV color space transformation on the median filtered image is shown in Fig. 2g, h. In the subsequent sections, the HSV image is utilized for further processing.
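A minimal Python/OpenCV sketch of this enhancement chain is given below for illustration. The parameter values (α = 1.5, β = 2, C = 3, structuring element of size 6) follow the text, but the local-contrast filter is approximated with an unsharp-style blend, the hessian step is approximated with the Frangi filter from scikit-image, and the helper name `enhance_leaf` is hypothetical; treat it as a sketch of the pipeline rather than the exact implementation.

```python
import cv2
import numpy as np
from skimage.filters import frangi  # stand-in for the hessian-based filter

def enhance_leaf(bgr, alpha=1.5, beta=2.0, C=3, se_size=6):
    """Sketch of the preprocessing chain: local contrast -> top-hat ->
    hessian-type filtering -> sharpening + median filter -> HSV transform."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Eq. (1): stretch pixels about the mean with smoothing factor beta
    mu = gray.mean()
    M = mu + beta * (gray - mu)

    # Eqs. (2)-(3): local-contrast filter with constant C, approximated here
    # by an unsharp-style blend weighted by alpha
    blur = cv2.GaussianBlur(M, (5, 5), 0)
    L = np.clip(alpha * (M - blur) + M + C, 0, 255).astype(np.uint8)

    # Eq. (4): top-hat filtering with structuring element b of size 6
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    top = cv2.morphologyEx(L, cv2.MORPH_TOPHAT, se)

    # Eq. (5): hessian-based enhancement of lesion edges (Frangi as a stand-in)
    hes = frangi(top.astype(np.float32) / 255.0)
    hes = (255 * hes / (hes.max() + 1e-8)).astype(np.uint8)

    # Sharpen, then suppress the amplified noise with a median filter
    sharp = cv2.addWeighted(L, 1.0, hes, 1.0, 0)
    med = cv2.medianBlur(sharp, 3)

    # Eq. (6): HSV transform used by the later segmentation stage
    # (the paper applies a 3D median filter to the colour image first)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    return med, hsv
```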

Fig. 2
figure 2

Contrast enhancement results; a original image; b local contrast image; c top-hat filtered image; d Hessian filtered image; e sharpened image; f median filtered image; g HSV transformed image; h histogram plot of HSV image

3.2 Infected region segmentation

Segmentation of infected regions in cucumber leaf images is one of the primary steps in object classification [2, 15]. An accurate segmentation method has a direct impact on the feature extraction and classification steps [40]. Mostly, we concentrate on color features for the segmentation, and several existing methods have utilized color features [1, 9], but a meaningful combination of multiple features produces better results. In this article, a new Sharif saliency-based (SHSB) method is proposed, which is based on color features, chi-square distance, mean deviation (MD), harmonic mean (HM), and fusion with active contour segmentation. Initially, the enhanced HSV image is used for the extraction of color features. Five parameters, namely mean, variance, standard deviation (SD), singular value decomposition (SVD), and skewness, are calculated as color features, which are later fused with SFTA texture features using simple concatenation. The HM and MD are calculated from the fused feature vector, and these values are utilized for the selection of the maximum and minimum pixel values of the image. Mathematically, MD and HM are calculated as follows:

$$ \begin{array}{@{}rcl@{}} MD&=&\frac{\sum\left |f_{i}-\mu \right |}{f_{N}}\\ HM&=&\frac{f_{N}}{{\sum}_{i=1}^{f_{N}}\frac{1}{f_{i}}} \end{array} $$
(7)

where μ represents the mean of the features, MD is the mean deviation, HM is the harmonic mean, fi denotes the ith extracted feature, and fN represents the total number of features (Table 2).

Table 2 Nomenclature table

Afterwards, the chi-square distance is calculated from the enhanced HSV(x, y) image to find the minimum and maximum distance pixels. The chi-square distance is defined as follows:

$$ \mathbf{D}=\sqrt{\frac{1}{r} \sum\limits_{j=1}^{p_{N}}(\frac{X_{rj}}{X_{r}}-\frac{X_{cj}}{X_{c}})} $$
(8)

where Xr denotes the total number of rows in the image, Xc the number of columns, Xrj the row pixels, Xcj the column pixels, and pN the total number of pixels in the image. A selection function is adopted, which selects the minimum and maximum distance pixels based on MD. The minimum distance pixels are selected for the final saliency image and are calculated using the following relation.

$$ P_{i}(\chi^{2})=\left\{\begin{array}{lll} Min & if & \mathbf{D}\geq MD \\ Max & if & \mathbf{D}< MD \end{array}\right. $$
(9)

where MD represents the mean deviation and Pi(χ2) is the chi-square distance between image pixels. Finally, the selected pixels are compared with the HM value to select the maximum pixels based on the following objective function.

$$ Fin(Sal)=\left\{\begin{array}{lll} \max & if & P_{i}(\chi^{2})\geq HM \\ \min& if & P_{i}(\chi^{2}) < HM \end{array}\right. $$
(10)

where Fin(Sal) is the final image. It is quite clear from (10) that if the minimum distance pixels selected after (9) are greater than the calculated HM value, then the selected value is 1; otherwise it is 0. To improve the segmentation results, an active contour-based segmentation method is also utilized. The V channel is selected and fed into the active contour model for segmentation, as shown in Fig. 3.
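The following NumPy sketch illustrates one reading of the two-stage thresholding in (7)-(10): MD and HM are computed from the fused colour/SFTA feature vector, a per-pixel chi-square style distance is thresholded by MD, and the surviving pixels are refined by HM. The helper name `shsb_threshold` and the exact scaling of the distance map are assumptions made for illustration.

```python
import numpy as np

def shsb_threshold(fused_features, v_channel):
    """Loose sketch of Eqs. (7)-(10): mean deviation (MD) splits pixels by a
    chi-square style distance, and the harmonic mean (HM) refines the selection."""
    f = np.asarray(fused_features, dtype=np.float64)
    MD = np.abs(f - f.mean()).sum() / f.size            # Eq. (7), mean deviation
    HM = f.size / (1.0 / (f + 1e-12)).sum()             # Eq. (7), harmonic mean

    # Eq. (8), simplified: row/column normalised distance of the V channel
    v = v_channel.astype(np.float64)
    row_prof = v.sum(axis=1, keepdims=True) + 1e-12
    col_prof = v.sum(axis=0, keepdims=True) + 1e-12
    D = np.sqrt(np.abs(v / row_prof - v / col_prof))
    D = D / (D.max() + 1e-12) * f.max()  # bring D onto the feature scale (assumption)

    # Eq. (9): tag pixels as minimum/maximum distance relative to MD
    min_mask = D >= MD
    # Eq. (10): pixels kept by (9) become foreground if they also exceed HM
    return (min_mask & (D >= HM)).astype(np.uint8)
```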

Fig. 3
figure 3

Active contour segmentation results; a channel selection from HSV transformation; b mesh plot of selected channel; c segmented image

The resultant binary image from the active contour model is fused with the proposed segmented image by implementing a union operator. The primary reason is to obtain better segmentation results, because a few diseased pixels are removed after implementing (9). The fusion process is defined as follows:

Let ψ(X1, X2) be a sample space, where X1 denotes the proposed segmented image and X2 denotes the active contour segmented image. The union operation is performed as:

$$ \begin{array}{@{}rcl@{}} X_{1}\cup X_{2}&=&\left \{ X_{1}\cup X_{2} \right \}\cap{\varPhi}\\ P(X_{1}\cup X_{2})&=&P(X_{1})\cup P(X_{2})\cap P({\varPhi}) \end{array} $$
(11)

where P(Φ) denotes the zero pixel values and P(X1 ∪ X2) represents the fused image. Morphological operations are performed on the resultant image to remove the inutile fragments and clear the boundaries. In Fig. 4, the proposed segmentation result is shown, which is later fused with the active contour result to improve the segmentation performance.
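A short sketch of this fusion and clean-up step is given below, using scikit-image; the morphological Chan-Vese function stands in for the active contour model, and the iteration count, closing radius, and minimum fragment size are illustrative assumptions.

```python
import numpy as np
from skimage.morphology import remove_small_objects, binary_closing, disk
from skimage.segmentation import morphological_chan_vese

def fuse_masks(shsb_mask, v_channel, min_size=64):
    """Eq. (11): union of the SHSB binary map with an active-contour mask,
    followed by morphological clean-up of inutile fragments."""
    ac_mask = morphological_chan_vese(v_channel.astype(float), 60) > 0
    fused = np.logical_or(shsb_mask.astype(bool), ac_mask)     # union operator
    fused = binary_closing(fused, disk(3))                      # smooth boundaries
    fused = remove_small_objects(fused, min_size=min_size)      # drop tiny fragments
    return fused.astype(np.uint8)
```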

Fig. 4
figure 4

Proposed segmentation results. The HSV transformed image is utilized as an input and a mapped segmented image is obtained as an output

The proposed segmentation results are analyzed with respect to the ground truth images in Fig. 5, which clearly shows that the proposed method performs exceptionally well. For testing the proposed segmentation algorithm, 45 images are selected from the database, and their results are shown in Fig. 6. The accuracy of the proposed method is also calculated with respect to the ground truth image, according to the following equation.

$$ Accuracy=\frac{SM(x,y)-DF(x,y)}{{\varPsi}(X_{3},X_{4})} $$
(12)

where SM(x, y) represents the number of pixels on which the ground truth image and the segmented image agree, DF(x, y) represents the number of differing pixels, and Ψ(X3, X4) represents the total number of pixels. Accuracy results are also presented in Table 3, where the maximum achieved accuracy is 97.23% and the average accuracy is 92.49%.
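A direct implementation of (12) is only a few lines; the sketch below follows the equation as written (matching minus mismatching pixels over all pixels), with the helper name `seg_accuracy` chosen for illustration.

```python
import numpy as np

def seg_accuracy(seg, gt):
    """Eq. (12): pixel-level agreement between a binary segmentation and its
    ground truth, expressed as a percentage."""
    seg = seg.astype(bool)
    gt = gt.astype(bool)
    same = np.count_nonzero(seg == gt)   # SM(x, y)
    diff = np.count_nonzero(seg != gt)   # DF(x, y)
    return 100.0 * (same - diff) / seg.size
```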

Fig. 5
figure 5

Analysis of proposed segmentation; a original image; b segmented image (proposed); c mapped image; d ground truth image; e mapped ground truth image

Fig. 6
figure 6

Steps of the proposed segmentation. The effect of each step is demonstrated from top to bottom

3.3 Deep Features extraction

Table 3 Sample results of proposed segmentation method

Feature extraction is one of the key steps for accurate classification in the field of machine learning [13, 16, 22, 27, 28, 35]. Several feature extraction techniques exist in the literature, which specifically deal with shape [4], texture [41], color [19], and geometric features [25]. In the agricultural domain, color and texture features are more important than others, due to the range of variations in image samples. The primary concern here is to extract and utilize the most relevant features, which ideally produce the best classification results. In this article, deep features are extracted using the pre-trained VGG-VD-19 [29] and VGG-S,M,F [7] models. These deep features are initially fused serially and later subjected to a max pooling step with a window size of (2 × 2). The resultant features are fused again to produce a single vector prior to feature selection. The detailed architecture of the proposed feature extraction and selection steps is shown in Fig. 7.

Fig. 7
figure 7

A proposed flow diagram of feature extraction and feature selection steps

3.3.1 VGG-VD-19

The pre-trained VGG-VD models (VGG-16, VGG-19) were introduced by [29] in 2014. In this article, the VGG-19 pre-trained model is selected for deep feature extraction. It incorporates 16 convolution layers, 16 Relu layers, and 3 fully connected layers (FCL) with 2 Relu layers and 1 softmax function for classification. For feature extraction, the fully connected layers FC6, FC7, and FC8 are selected, where the input variable for all FCLs is x0 and the output layers are x38, x40, and x42 for FC6, FC7, and FC8, respectively. The individual dimensions of the extracted feature vectors for FC6, FC7, and FC8 are presented in Table 4, and the fused vector [34] has dimensions (1 × 9192), shown in Fig. 7. Finally, max pooling is performed on the fused vector to identify the maximum features, generating a vector of size (1 × 2298). The resultant vector obtained after the max pooling step is later fused with the VGG-M deep vector for classification.
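The sketch below shows how such FC-layer activations could be pulled from a pre-trained VGG-19 and fused serially; it uses the torchvision ImageNet weights as a stand-in for the MatConvNet models used in the paper (VGG-M is not available in torchvision), and the 2 × 2 max pooling window is approximated by a 1D pooling over four consecutive fused features (9192 / 4 = 2298).

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# classifier indices 0/3/6 of torchvision's VGG-19 correspond to FC6/FC7/FC8
# (4096 + 4096 + 1000 = 9192 features after serial fusion)
vgg19 = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def vgg19_fc_features(img_pil):
    """Serially fuse FC6/FC7/FC8 activations (1 x 9192), then max-pool to 1 x 2298."""
    x = preprocess(img_pil).unsqueeze(0)
    with torch.no_grad():
        x = vgg19.features(x)
        x = vgg19.avgpool(x).flatten(1)
        fc6 = vgg19.classifier[0](x)                   # 4096-d
        fc7 = vgg19.classifier[3](torch.relu(fc6))     # 4096-d
        fc8 = vgg19.classifier[6](torch.relu(fc7))     # 1000-d
    fused = torch.cat([fc6, fc7, fc8], dim=1)          # 1 x 9192, serial fusion
    pooled = torch.nn.functional.max_pool1d(fused.unsqueeze(1), kernel_size=4)
    return pooled.squeeze(1)                           # 1 x 2298
```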

Table 4 Description of extracted deep features including input, output, FC layer and final feature vector

3.3.2 VGG-S,M,F

The VGG-S,M,F pre-trained deep models were introduced by [7] in 2014; six pre-trained models are publicly available for feature extraction. In this article, the VGG-M model is selected for deep feature extraction, consisting of 5 convolution layers, 3 FCLs, 7 Relu layers, 1 softmax layer, 1 norm layer, and 1 pooling layer. For feature extraction from the VGG-M model, the FCLs (FC6, FC7, and FC8) are selected. For each layer, the inputs, output parameters, and the corresponding feature vector are defined in Table 4. The extracted feature vector is fused using the serial based method [34] prior to the max pooling step. The resultant vector incorporates the features with maximum information at a reduced size, (1 × 2298).

3.3.3 Features selection

Selection of the most optimal features is a crucial task, especially when the provided information is highly redundant. Considering this fact, the selection of principal features identifies the most salient features on the one hand, but may remove relevant information on the other. Therefore, the selection criteria should be robust enough to minimize the information loss. The primary objective here is to maximize the classification accuracy by selecting the most optimal features. For a robust selection, the VGG-19 and VGG-M feature vectors are fused using the serial based method. Let FV1 and FV2 be two feature vectors of a sample space Ω, where FV1 represents the VGG-19 feature vector of size (1 × 2298) and FV2 is the VGG-M feature vector of size (1 × 2298); the serial based fusion ξ(i) is given as \(\xi (i)=\left (\begin {array}{l} FV1\\ FV2 \end {array}\right )\). It clearly shows that if the size of FV1 is 1 × u and FV2 is 1 × v, then the length of the fused vector is ζ = (u + v), so the length of the fused features ζ is (1 × 4596), shown in Fig. 7. Afterwards, a new feature selection algorithm is implemented, which is based on three parameters: local entropy, SD, and interquartile range (IQR). Each parameter is calculated separately for all features and sorted in ascending order. Mathematically, the formulation is defined as follows:

$$ \begin{array}{@{}rcl@{}} En(\zeta (i))&=& E[-\log(P(\zeta(i)))]\\ &=&-\sum\limits_{i=1}^{\zeta}P(\zeta(i))\log_{2}P(\zeta(i))\\ En_{vec}(\zeta(i))&=&Sort(En(\zeta (i)), \varrho ) \end{array} $$
(13)
$$ SD(\zeta(i))=\sqrt{\frac{\sum_{i=1}^{\zeta}\left(\zeta_{i}-\bar{\zeta}\right)^{2}}{\zeta}} $$
(14)
$$ SD_{vec}(\zeta(i))=Sort(SD(\zeta(i)), \varrho) $$
(15)
$$ IQR(\zeta(i))=\vartheta_{U}- \vartheta_{L} $$
(16)
$$ IQR_{vec}(\zeta(i))=Sort(IQR(\zeta(i)), \varrho) $$
(17)

where Envec(ζ(i)) is the entropy sorted vector, ϱ denotes the ascending sort order, SDvec(ζ(i)) denotes the sorted SD vector, \(\bar {\zeta }\) is the mean of the feature vector ζ(i), IQRvec(ζ(i)) denotes the IQR sorted vector, 𝜗L is the lower quartile, and 𝜗U is the upper quartile, where 𝜗U = max(ζ(i)) and 𝜗L = min(ζ(i)), respectively. The resultant vector still contains some redundant information; therefore, only the top 2000 features from each sorted vector are selected and later fused to construct a feature vector, implementing a parallel fusion strategy [34]. During fusion, the maximum value feature is preferred for final selection using the expression below; an example is demonstrated in Fig. 8.

$$ SLF(vec(i))=\left\{\begin{array}{llll} En_{vec} &if &En_{vec}\geq SD_{vec} \ \& \ IQR_{vec} \\ SD_{vec} &if & SD_{vec}> En_{vec} \ \& \ IQR_{vec} \\ IQR_{vec} &if & \ IQR_{vec}> En_{vec}\ \& \ SD_{vec} \end{array}\right. $$
(18)

where SLF(vec(i)) represents the final selected vector with dimensions (1 × 2000), Fig. 7. The final selected vector SLF(vec(i)) is then fed to a multi-class support vector machine (M-SVM) [18] for classification.
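A NumPy sketch of one possible reading of (13)-(18) is given below: the three criteria are computed per feature column over the training samples, each ranking keeps its top 2000 columns, and the three candidate sub-vectors are fused by an element-wise maximum. The histogram-based entropy estimate, the bin count, and the helper name `select_features` are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import entropy, iqr

def select_features(X, top_k=2000):
    """Rank fused feature columns by local entropy, SD and IQR (Eqs. 13-17),
    keep the top_k of each ranking, and fuse by element-wise max (Eq. 18)."""
    # X: (n_samples, 4596) serially fused VGG-19 + VGG-M features
    def col_entropy(col):
        hist, _ = np.histogram(col, bins=32, density=True)
        return entropy(hist + 1e-12, base=2)

    en = np.apply_along_axis(col_entropy, 0, X)   # Eq. (13)
    sd = X.std(axis=0)                            # Eqs. (14)-(15)
    rng = iqr(X, axis=0)                          # Eqs. (16)-(17)

    # sort each criterion and keep the strongest top_k columns
    idx_en = np.argsort(en)[-top_k:]
    idx_sd = np.argsort(sd)[-top_k:]
    idx_iqr = np.argsort(rng)[-top_k:]

    # Eq. (18): element-wise maximum over the three candidate sub-vectors
    selected = np.maximum.reduce([X[:, idx_en], X[:, idx_sd], X[:, idx_iqr]])
    return selected                                # (n_samples, top_k)
```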

Fig. 8
figure 8

An example of the proposed feature selection from three extracted vectors

4 Experimental results

In this section, the proposed method is validated on the selected image dataset [39, 40]. Several experiments are conducted to validate the proposed method on the basis of deep VGG-19 & VGG-M features, a serial-based fusion of deep VGG-19 & VGG-M features, and finally the proposed selected features. These experiments are performed in two different steps. In the first step, classification results are calculated using the infected images (selected diseases), and in the second step, classification is performed on all selected diseases along with the healthy images. For performance evaluation, we utilized a 50:50 ratio for training and testing, shown in Table 5, and later validated our results with 10-fold cross validation [33]. Multi-class SVM (M-SVM), decision tree (DT), cubic SVM (C-SVM), logistic regression (LR), fine k-nearest neighbor (F-KNN), neural network (NN), and ensemble subspace discriminant analysis (ESDA) methods are utilized for classification. The performance of the classification methods is calculated based on six measures: false positive rate (FPR), false negative rate (FNR), sensitivity, precision, AUC, and accuracy. All experiments are performed in MATLAB 2017a on a desktop computer with a Core-i7 CPU, 8 GB RAM, and a Sapphire R9-290 GPU with 8 GB of memory. A detailed description of the number of training samples, testing samples, and the steps selected for the experiments is presented in Table 5.
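For readers who want to reproduce the evaluation protocol outside MATLAB, the sketch below shows an equivalent 50:50 hold-out split with 10-fold cross validation using scikit-learn; the RBF kernel and its hyper-parameters are illustrative assumptions, not the exact M-SVM configuration used in the paper.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate(X, y):
    """X: (n_samples, 2000) selected features, y: disease labels."""
    # 50:50 hold-out split, stratified over the disease classes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)

    # multi-class SVM (one-vs-one by default in scikit-learn)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
    clf.fit(X_tr, y_tr)
    holdout_acc = clf.score(X_te, y_te)

    # additional 10-fold cross validation, as in the paper's protocol
    cv_acc = cross_val_score(clf, X, y, cv=10).mean()
    return holdout_acc, cv_acc
```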

Table 5 Description of training, testing samples with respect to selected diseases

4.1 Results

In this section, four different categories of experiments are performed with the proposed algorithm: a) deep VGG-19 (1 × 9192); b) deep VGG-M (1 × 9192); c) serial based feature fusion (1 × 4596); and d) proposed feature selection (1 × 2000). For each experiment, two tests are conducted; details are presented in Table 5.

4.1.1 VGG-19 (1 × 9192)

In this experiment, features extracted from the pre-trained VGG-19 CNN model are utilized. Usually, the FC layers are used for classification; therefore, we used FC6, FC7, and FC8 for DCNN feature extraction and later fused them using the serial-based method, shown in Fig. 7. The dimensions of the fused extracted features are (1 × 9192), which are later passed to the classifiers for recognition. Two different tests are performed for computing the classification results. In the first test, all diseases are selected, achieving a classification accuracy of 93.1%, FNR of 6.9%, AUC of 0.991, precision of 93.72%, sensitivity of 92.78%, and FPR of 0.012 with M-SVM, Table 6. The performance of M-SVM is also confirmed by the confusion matrix given in Fig. 9. In the second test, all classes including the healthy class are classified, achieving the best classification accuracy of 93.7%, FNR of 6.3%, sensitivity of 92.95%, precision of 94.25%, and FPR of 0.010 using M-SVM, as given in Table 6. It is quite clear from Table 6 that M-SVM performs efficiently compared to the other classification methods, which can also be seen in the confusion matrix in Fig. 10. Our primary objective in conducting two different tests is to show that the classification accuracy does not degrade as the number of classes increases, Table 6.

Fig. 9
figure 9

Confusion matrix for Test 1 using VGG 19 pre-trained model

Fig. 10
figure 10

Confusion matrix for Test 2 using VGG 19 pre-trained model

Table 6 Evaluation results on VGG 19 pre-trained model

4.1.2 VGG-M (1 × 9192)

In this experiment, the deep CNN (DCNN) features are extracted from the pre-trained VGG-M model. For feature extraction, we used FC6, FC7, and FC8 and fused their features into one vector by the serial-based method, Fig. 7. Similar to experiment 1, two different tests are performed for the classification results. In the first test, the maximum performance is obtained using M-SVM, achieving a classification accuracy of 91.4%, FNR of 8.6%, AUC of 0.989, sensitivity of 91.60%, and FPR of 0.018. The classification results of M-SVM are presented in Table 7 and the confusion matrix is shown in Fig. 11. In the second test, the maximum recognition accuracy is achieved with M-SVM: a classification accuracy of 93.9%, FNR of 6.1%, AUC of 0.994, precision of 94.27%, and FPR of 0.007, as shown in Table 7 and the confusion matrix in Fig. 12. From the above tests, it is clear that the maximum classification accuracy obtained on the original deep CNN features generated by VGG-19 and VGG-M is 93.9%. These results are not sufficient for a good system; therefore, in this research, we perform max pooling after deep CNN feature extraction and then fuse the information of both models to achieve better classification results.

Fig. 11
figure 11

Confusion matrix for Test 1 using VGG M pre-trained model

Fig. 12
figure 12

Confusion matrix for Test 2 using VGG M pre-trained model

Table 7 Evaluation results on VGG M pre-trained model

4.1.3 Serial based feature fusion (1 × 4596)

In this step, the extracted DCNN features are utilized after performing max pooling with a window size of (2 × 2), as shown in Fig. 7. After max pooling, the two vectors generated from both DCNN models are fused serially to form a single vector with dimensions (1 × 4596). Following the same pattern as Section 4.1.2, two different tests are performed to obtain the classification results. For test 1, the achieved classification accuracy is 94.5%, FNR 5.5%, AUC 0.995, precision 94.68%, and FPR 0.011 for M-SVM, as presented in Table 8 and the confusion matrix in Fig. 13. In test 2, the classification accuracy improves up to 96.1%, as can be seen in Table 8 and the confusion matrix in Fig. 14. The classification results show that the fused vector improves the results compared to the classification rate with individual features. This method performs well in terms of accuracy but consumes more time (on average 50 seconds for each test); therefore, the proposed method embeds the concept of feature selection.

Fig. 13
figure 13

Confusion matrix for Test 1 using serial based fusion approach

Fig. 14
figure 14

Confusion matrix for Test 2 using serial based fusion approach

Table 8 Evaluation results of serial based fusion approach

4.1.4 Proposed feature selection (1 × 2000)

In this section, the results of the proposed feature selection method are presented in terms of classification accuracy, FP rate, execution time, etc. The feature selection process relies on three parameters: local entropy, SD, and local range (IQR). These three measures generate three parallel vectors, which are later sorted in ascending order to select the top 2000 features prior to the final classification stage. Following the same pattern as above, two different tests are performed. In test 1, the maximum recognition rate achieved is 98.1%, with FPR 0.001, sensitivity 98.16%, and AUC 0.999 using M-SVM, Table 9 and the confusion matrix in Fig. 15. The minimum execution time per image is 10.91 s on M-SVM, which is much improved compared to the other classification methods. In test 2, the maximum recognition rate achieved is 98.4%, with FPR 0.001, sensitivity 98.36%, precision 98.55%, and AUC 0.999 using M-SVM. The results are shown in Table 9 and the confusion matrix in Fig. 16. The minimum execution time for test 2 is 11.60 s with M-SVM, which is slightly higher than in test 1. The primary reason for the longer execution time in test 2 is the addition of one more class. Overall, M-SVM performs well with the proposed feature selection method compared to the other classification methods.

Fig. 15
figure 15

Confusion matrix for Test 1 using proposed selection method

Fig. 16
figure 16

Confusion matrix for Test 2 using proposed selection method

Table 9 Evaluation results of Proposed selection approach

To further validate our results, we also selected smaller sets of top features, 500 and 1000. With the top 500 features, the achieved classification accuracy is 92.1%, Table 10. With the top 1000 features, the classification accuracy is 93.1%. Compared with the accuracy achieved with 2000 features, it is clear that the 2000 selected features give improved results, due to their higher variance, compared to 500 and 1000.

Table 10 Evaluation results on 500 and 1000 selected features

4.2 Discussion

In this section, we analyze the proposed method in terms of segmentation and recognition accuracy. As explained, the proposed method comprises four primary steps: preprocessing, diseased spot segmentation, deep CNN feature extraction/selection, and recognition. Each step incorporates a series of sub-steps, as shown in Fig. 1. Initially, the preprocessing and segmentation results are computed, as can be seen in Figs. 2, 4, 5, and 6. Additionally, the results in tabular form are presented in Table 3, with a maximum accuracy of 97.23% and an average accuracy of 92.49%. For cucumber disease spot recognition, deep CNN features are extracted from the pre-trained DCNN models VGG-19 and VGG-M, as shown in Fig. 7. For each extracted set of features, two tests are conducted, presented in Table 5.

The VGG-VD-19 deep feature evaluation results are presented in Table 6, with maximum achieved recognition accuracies of 93.1% and 93.7%, which are also confirmed by the confusion matrices in Figs. 9 and 10. Later, the classification accuracy is also computed using the VGG-M pre-trained deep network, obtaining accuracies of 91.4% and 93.9%, given in Table 7. The classification accuracy of the VGG-M network is also confirmed by the confusion matrices in Figs. 11 and 12. However, the individual results of the DCNN pre-trained models are not satisfactory, and existing methods report better accuracy. Therefore, we proposed two further steps: a) fusion of both pre-trained feature sets into one vector; and b) selection of the best DCNN features. The recognition performance of the fusion method is better compared to the individual DCNN features. The results can be seen in Table 8, with maximum accuracies of 94.5% and 96.1%, and their confusion matrices in Figs. 13 and 14. The fusion process achieves satisfactory results, but its execution time is high compared to the individual DCNN networks. To further improve the classification accuracy and minimize the execution time, a feature selection technique is proposed, achieving maximum recognition accuracies of 98.1% and 98.4% on M-SVM, which are significantly improved compared to all other experiments. The proposed recognition results are given in Table 9 and are also confirmed by the confusion matrices in Figs. 15 and 16.

As demonstrated in Fig. 1, the proposed architecture includes several sub-steps; therefore, we compute intermediate results to analyze the effect of these steps on the final recognition accuracy. The intermediate results are computed with five different structures: i) acquisition, preprocessing, V channel selection, feature fusion, and classification; ii) acquisition, SHSB, feature fusion, and classification; iii) acquisition, preprocessing, feature fusion, selection, and classification; iv) acquisition, SHSB, feature fusion, selection, and classification; v) the complete proposed structure. The results of each structure are given in Table 11. From Table 11, we observe that with structure 1 (St1) the error rate increases up to 21.60% and 23.04% for tests 1 and 2, respectively. The validation of this structure shows that the segmentation process is essential for better accuracy. With St2, the error rate is reduced to 17.06% and 15.94% for tests 1 and 2; in this structure, the preprocessing step is eliminated, but the SHSB method still provides better performance. Similarly, for St3 and St4, the error is reduced to 6.16%. Finally, the proposed structure achieves an error rate of 1.60%. From these five structural experiments, we conclude that the segmentation process (SHSB) helps increase the overall accuracy but consumes more time, whereas the selection process minimizes the computational time and increases the overall accuracy.

It is evident from the results that the proposed feature selection method outperforms the other methods by a clear accuracy margin. We have also considered the problem of long execution times in our proposed approach, and the results are quite favorable, as can be seen in Fig. 17. The average execution time on VGG-19 features is 150.23 seconds and on VGG-M features is 193.18 seconds. After the fusion of both models' features into one vector, the execution time increases to an average of 198.76 seconds. With the proposed method, the execution time is reduced to 16.51 seconds, with higher classification accuracy.

Fig. 17
figure 17

Time comparison of multiple scenarios

Table 11

The robustness of our proposed method is also demonstrated by a fair comparison with existing methods, Table 11. In [11], three cucumber diseases are classified (powdery mildew, downy mildew, and anthracnose) with recognition accuracies of 88.8%, 94.0%, and 86.7%. In [38], downy mildew, blight, and anthracnose are classified, obtaining an average accuracy of 91.63% in 24 seconds. In [40], four different diseases are classified, achieving an accuracy above 90% with an execution time of 32 seconds. Compared with the above methods [11, 38,39,40], our proposed approach performs significantly well and classifies five disease classes along with the healthy class. The recognition accuracy of the proposed method is above 96%, with a maximum recognition accuracy of 98.4% on angular leaf spot and powdery mildew. Moreover, the average execution time of the proposed method is 16.51 seconds, which further supports the utility of our method. Labeled results are shown in Fig. 18.

Fig. 18
figure 18

Proposed labeled results

Table 11 Comparison of proposed method with existing techniques: PI = per image

4.3 Limitations and future work

Feature extraction is the most important step for any automated cucumber disease recognition system, and most recent studies rely on various feature types. The current limitation of existing methods is the selection of more discriminant features for better recognition accuracy. In this work, the major shortcoming is the complex structure, since the proposed system comprises several essential steps: preprocessing, infected region segmentation, deep feature fusion, feature selection, and finally classification. This structure consumes considerable time during the training process, which is not suitable for large datasets. Moreover, the proposed Sharif saliency-based segmentation method fails when the input images have low contrast. Therefore, in future work, we will try to improve the segmentation method so that it works well in low contrast environments. Additionally, a few more measures need to be added to the feature selection step for improved classification results.

5 Conclusion

Early detection and classification of plant diseases is a critical task, especially in the field of computer vision. Due to several key factors, such as low contrast input images, irregularity of leaf spots, minimal color variation between foreground and background, and a high number of extracted features, many existing methods fail to meet the desired recognition rate, especially when dealing with cucumber leaf diseases. Therefore, this article is an effort to address these problems with an improved pre-processing technique and the selection of the most prominent features.

With the improved segmentation technique, cucumber leaf disease spots are accurately detected. Afterward, the deep features are extracted prior to the selection step, which uses three statistical parameters: L-Entropy, L-SD, and IQR. These parameters are sorted in ascending order to select the top 2000 features, which are later fused using a parallel fusion technique. The experimental results show that the proposed deep feature selection method performs significantly better than the individual features as well as other existing methods. Additionally, the proposed method shows a major advantage in the form of lower execution time. Among several factors, the key aspects that require more focus are further improvements to the segmentation and feature selection methods. It is quite obvious that the better the infected regions are identified, the better the features we will obtain.