1 Introduction

The agricultural landmass is more than just a feeding source in the current world. The economic growth of developing nations depends heavily on agricultural productivity. Leaf-based identification of crop diseases at an early stage [8] therefore plays a vital role in sustaining this economic growth. Leaf image analysis of plants reveals the symptoms of various types of diseases [52]. Image processing is applied to explore the various regions in the images of plant leaves. There are different types and levels of image analysis techniques. The initial level of image analysis covers specific points and region localization. Point, corner, and contour-based image analysis [25] provides useful information to mark diseases such as early blight, late blight, and target spot. Neto et al. [45] worked on leaf extraction using clustering and a genetic algorithm, based on connected-components fuzzy clustering with genetic optimization.

The second level of image analysis covers prediction based on specific features. Arbelaez et al. [5] presented contour detection-based image segmentation that determines regions of interest hierarchically through tree formation. Rumpf et al. [50] presented hyperspectral data analysis to differentiate diseased from healthy sugar beet leaves; different spectral vegetation indices containing physiological information were used as attributes for automatic classification, and the differentiation of plants inoculated with particular diseases was analyzed using a support vector machine (SVM) classifier. Leaf recovery, in addition to image segmentation and classification, was performed by Teng et al. [60] under arbitrary imaging conditions. Xu et al. presented a color and texture feature-based leaf extraction that utilized the intensity histogram, a derived histogram termed the differential histogram, the Fourier transform, and wavelet packets [66]. The third level of image analysis increases the depth of feature study and computes the most significant attributes through different feature optimization approaches; for example, feature selection is performed using a genetic algorithm to obtain details suitable for disease diagnosis. Wang et al. [61] presented an adaptive thresholding technique to segment single leaves from the leaf images.

Soares et al. [58] studied leaf image segmentation in semi-controlled conditions (LSSCC) [56, 57]. Biswas et al. applied de-correlation stretching to enhance the image colors and employed segmentation through fuzzy C-means clustering (SFCMC) [10]. Aparajita et al. [4] worked on automated identification of potato diseases such as late blight by employing statistical features-based adaptive thresholding for segmentation (SFATS). Wu et al. presented a multi-feature fusion recognition model (MFFRM) [64] for the classification of tomato images.

Plant disease detection and classification (PDDC) was performed by Barbedo et al. [6]. Yanikoglu et al. [68] worked on a plant recognition system for automatically identifying the plant in a given image. Image recognition-based plant disease detection (IRPDD) was performed in [48], where a clustering-based binary classification separates lesion pixels from healthy pixels to obtain the segmented image; after segmentation, texture, color, and shape features are extracted from the lesion images. Sabrol et al. [51] presented classification using a classification tree [53] with color, texture, and shape features of localized healthy and disease-affected tomato leaf images. Islam et al. studied a plant disease diagnosis system through machine learning (PDDML) [27]. The PDDML performed L*a*b* color-based segmentation and extraction of GLCM features with statistical traits for the classification of plant diseases; this automated method classifies diseases (or their absence) on potato plants, and disease prediction using SVM was analyzed on the segmented images. Patil et al. [46] worked on potato leaf images to develop automated disease management techniques (ADMT) [18, 54]. The ADMT performed image segmentation using blob analysis and morphological filters.

Grand-Brochier et al. [22] studied the application of user input-stroke interaction along with the utilization of color distance maps. A deep convolutional neural network for plant disease identification was presented by Wang et al. [62]. Kaur et al. [28] presented a k-means clustering-based semi-automatic method (KMCSAM) for plant disease classification. A genetic approach-based attribute selection for apple disease identification and recognition (GAFSADR) was presented by Khan et al. [30]. The GAFSADR system utilized hybrid filtering through Gaussian, box, and median filters, followed by correlation-based segmentation; finally, the color, histogram, and Local Binary Pattern (LBP) features are fused by comparison-based parallel fusion. The KMCSAM worked on L*a*b* color-based segmentation, with color and texture attributes extracted for each cluster to train the classifier; the features are clustered first before training the classifier for the specific class. Mu et al. [44] presented a geometric features and Haar wavelet-based feature extraction method that readily provides tree leaf features. Kurmi et al. [37, 38] presented a leaf image-based crop disease classification (LICDC) that extracted Fisher vector (FV) features for the detection of disease classes. All these methods compute image-level features based on texture, color, and shape, which cannot differentiate the different types of patterns present in the leaf region. Such pattern-specific features can be extracted through the adaptive analytic wavelet transform (AAWT) and FV.

The existing work faces impediments of different types for multi-class categorization of the dataset, and the implementations still lack segmentation accuracy. Segmentation of disease-affected leaves is crucial to determine the disease type, but it needs a priori information and covers very few diseases. The work therefore needs to be extended and integrated to cover various crops and their diseases. Methodology upgrades, as well as database enlargement and refinement, are needed in order to achieve better accuracy.

The contribution of this paper lies in the third level of image analysis and overcomes the above limitations. Initially, the regions of interest are extracted by color space transformations. The L*a*b* color space [12, 31, 63] maps the RGB (red, green, blue) values into a higher-dimensional color space from which the intended region can be easily extracted, in terms of a saliency map, to determine optimal color coefficients; this separates the leaf region from the background. Following the leaf contour extraction, different types of features are computed using the adaptive analytic wavelet transform (AAWT), BoW, and Fisher vector for the classification of images. The AAWT decomposes the preprocessed image into different sub-band images. Afterwards, Relief F and a box-counting algorithm are employed to extract the different entropy and fractal dimension (FD) attributes, respectively.

We recommend the fusion of spatial domain features with frequency domain features to extract the best possible discriminative information of the selected domains. Further, the significance of the extracted features is investigated using LDA, and less significant features are removed from the feature set used for classification in order to reduce the classification complexity. The features used for classification are an ensemble of bag of visual words features and Fisher vectors extracted from the preprocessed image, together with 13 handcrafted features extracted from the internal parts of the leaf. The extracted feature sets and their combinations are fed to logistic regression, multilayer perceptron, and SVM classifiers. The proposal offers improved classification accuracy for various plant diseases.

The remainder of the paper is organized as follows: Section 2 explains the materials and method. In Section 3, we discuss the results computed from different models and compare them with the proposed system. Finally, Section 4 concludes the work and gives future research directions.

2 Materials and method

Leaf image-based disease analysis needs a set of images to learn from and a technique that is able to predict the disease by utilizing this learned dataset knowledge. Here, we first discuss the dataset used to develop the predictive methodology, followed by leaf region localization, feature extraction, and classification.

2.1 Dataset

The dataset used in our study is taken from PlantVillage, which is freely available for research. It contains leaf images of 14 plants with a good number of images for each category. Here, we utilize only three plants, bell pepper, potato, and tomato, which are members of the nightshade family (Solanaceae) [24]. The effect of diseases on the nightshade family differs across the locations where the crops grow and also varies with the time at which the crop is grown. To manage crop production, it is essential to identify and control these diseases. The details of these datasets are given below:

2.1.1 Dataset 1: Bell pepper

The database of plant leaf images for the image-based disease study is accessible as the PlantVillage data. It comprises data of 14 different crops containing 54,309 labeled images. The bell pepper dataset has two classes, bacterial spot and healthy. Sample images from the database are depicted in Fig. 1. The image counts include 997 bacterial spot samples and 1478 healthy samples, as shown in Table 1. The Latin name of the pathogen [15, 41] causing bacterial spot disease is Xanthomonas vesicatoria.

Fig. 1
figure 1

Bell pepper leaf images. (a) and (b) bacterial spots (c) and (d) healthy images

Table 1 Dataset 1: Bell pepper leaf images

2.1.2 Dataset 2: Potato

Sample potato leaf images from the dataset are shown in Fig. 2. It comprises three different categories of potato leaf images: healthy, early blight, and late blight. The Latin names of the pathogens [15, 41] causing the plant diseases are mentioned in parentheses with the disease. The image sets of early blight (Alternaria solani) and late blight (Phytophthora infestans) contain 1000 images each, whereas the healthy category has 152 images, as given in Table 2.

Fig. 2
figure 2

Potato leaves. (a) healthy; (b) early blight; (c) late blight

Table 2 Dataset 2: potato leaf images

2.1.3 Dataset 3: Tomato

A set of images of tomato leaves having various diseases is taken from the PlantVillage dataset. There are ten different categories that have nine disease classes and one healthy class. The sample images are shown in Fig. 3.

Fig. 3
figure 3

The sample images of tomato dataset from top left to right: the bacterial spot, early blight category, healthy class, late blight, the leaf mold; bottom left to right: the septoria leaf spot, leaves affected by spider mites, the target spot, the mosaic virus, and the yellow-leaf curl virus

The tomato database consists of 1404 images of the target spot class, 373 images of mosaic virus, 3209 samples affected by yellow leaf curl virus, and 2127 leaf images with bacterial spot. It has 1000 images of the early blight category, 1591 healthy images, and 1909 of the late blight class. There are 952 images of leaf mold, 1771 of septoria leaf spot, and 1676 of tomato leaves affected by spider mites, as shown in Table 3. The Latin names of the pathogens [15, 41] that cause the tomato diseases are also included in Table 3.

Table 3 PlantVillage Tomato Dataset

2.2 Proposed method

Some of the leaf images contain artifacts (noise), so a preprocessing step comprising intensity histogram equalization is applied to counter this problem. In this section, a novel methodology for the automated classification of crop disease stages based on AAWT is developed. The block diagram of the proposed system is shown in Fig. 4. In preprocessing, the input images are resized to 360 × 480 pixels using bi-cubic interpolation to obtain a common resolution and to decrease the computational time.
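As an illustration, the following minimal OpenCV sketch performs the bi-cubic resize to 360 × 480 pixels and an intensity histogram equalization; applying the equalization on the luminance channel of a YCrCb conversion is one plausible reading of the text, and the input file name is hypothetical.

```python
import cv2

def preprocess(path):
    """Resize to 360 x 480 pixels (bi-cubic) and equalize the intensity histogram."""
    img = cv2.imread(path)                                             # BGR image
    img = cv2.resize(img, (480, 360), interpolation=cv2.INTER_CUBIC)   # (width, height)
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])                  # equalize luminance only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

leaf = preprocess("leaf_sample.jpg")   # hypothetical input file
```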

Fig. 4
figure 4

Block diagram of the proposed methodology

The proposed segmentation methodology comprises a three-stage system. The first step separates the foreground, i.e., extracts the leaf region from the background. The next step performs initial leaf identification using the AAWT technique. The result of the AAWT decomposition is then utilized for feature extraction, along with BoW and FV, for the classification of leaf images.

2.2.1 Leaf foreground extraction

Seed point initialization requires prior information about the considered object, in this case the leaves. Plant leaf images readily offer color-based seed point marking for leaf region segmentation. Color space transformation is utilized for this process. The conversion of an image from the RGB color space to the L*a*b* color space provides an extension of the spectral range. The RGB color space analysis is illustrated in Fig. 5.

Fig. 5
figure 5

Color analysis of disease affected bell pepper leaf in RGB color space

The light effect is reduced by normalization of the RGB color space, denoted as (Rnorm, Gnorm, Bnorm). The perceptual color spaces are CIELab and CIELuv. For the color conversion from RGB to a perceptual color space, the RGB color space is first converted to the CIE XYZ color space, which provides the basis for conversion to the perceptual color spaces. The component L is identical for the CIELab and CIELuv color spaces. The value of L indicates the lightness, which is independent of the other two components. The conversion process is formulated as follows:

$$ \begin{bmatrix} X \\Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.431 &0.342 & 0.178 \\ 0.222 &0.707 &0.071 \\ 0.020 &0.130 & 0.939 \\ \end{bmatrix} \begin{bmatrix} R_{norm} \\G_{norm} \\ B_{norm} \end{bmatrix} $$
(1)
$$ L=\left\{ \begin{array}{llr} 116 \times \left( \frac{Y}{Y_n} \right)^{\frac{1}{3}} -16, & \frac{Y}{Y_n} > 0.008856 \\ 903 \times \frac{Y}{Y_n}, & \frac{Y}{Y_n} \le 0.008856 \end{array}\right. $$
(2)
$$ a = 500 \times \left( f \left( \frac{X}{X_n}\right)- f \left( \frac{Y}{Y_n}\right) \right) $$
(3)
$$ b = 200 \times \left( f \left( \frac{Y}{Y_n}\right)- f \left( \frac{Z}{Z_n}\right) \right) $$
(4)

where Xn,Yn, and Zn are base tristimulus values explained in the CIE chromaticity representation [21] and

$$ f(t)=\left\{ \begin{array}{llr} t^{\frac{1}{3}} , & t > 0.008856 \\ 7.787 \times t + \frac{16}{116}, & t \le 0.008856 \end{array}\right. $$
(5)
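For concreteness, the conversion in (1)-(5) can be transcribed directly with NumPy as below; the white-point values Xn, Yn, Zn are not given in the text, so taking them as the response of the matrix in (1) to a unit RGB vector is an assumption, as is the [0, 1] normalization of the input.

```python
import numpy as np

M = np.array([[0.431, 0.342, 0.178],
              [0.222, 0.707, 0.071],
              [0.020, 0.130, 0.939]])        # RGB -> XYZ matrix of (1)
Xn, Yn, Zn = M.sum(axis=1)                   # assumed white point: response to R = G = B = 1

def f(t):
    """Piecewise function of (5)."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def rgb_to_lab(rgb):
    """rgb: H x W x 3 array with values normalized to [0, 1]; returns L*, a*, b* per (2)-(4)."""
    X, Y, Z = np.moveaxis(rgb @ M.T, -1, 0)
    yr = Y / Yn
    L = np.where(yr > 0.008856, 116.0 * np.cbrt(yr) - 16.0, 903.0 * yr)   # (2)
    a = 500.0 * (f(X / Xn) - f(Y / Yn))                                   # (3)
    b = 200.0 * (f(Y / Yn) - f(Z / Zn))                                   # (4)
    return L, a, b
```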

A global thresholding method is employed to segregate the intensity extent of the leaves, i.e., to separate the image pixels encompassing the leaves from the background pixels. Initially, the primary seed points are chosen using these color-based properties of the processed image. In this work, the green channel (G), which contains more significant detail than the red (R) and blue (B) channels, is extracted and passed through contrast limited adaptive histogram equalization (CLAHE) to enhance the pixel contrast. The result of the color-based leaf foreground extraction is illustrated in Fig. 6.
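A minimal OpenCV sketch of this foreground step is given below; the text names global thresholding and CLAHE but not the exact threshold rule or CLAHE settings, so Otsu's rule and the clip limit and tile size used here are assumptions.

```python
import cv2

def leaf_foreground(bgr):
    """Separate leaf foreground and background using the contrast-enhanced green channel."""
    g = bgr[:, :, 1]                                              # green channel (BGR order)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # assumed CLAHE settings
    g_eq = clahe.apply(g)
    _, mask = cv2.threshold(g_eq, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # global threshold (Otsu assumed)
    foreground = cv2.bitwise_and(bgr, bgr, mask=mask)
    background = cv2.bitwise_and(bgr, bgr, mask=cv2.bitwise_not(mask))
    return foreground, background
```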

Fig. 6
figure 6

Sample foreground extraction images of bell pepper from Fig. 1, (a) original image (b) foreground separated image (c) the remaining background

The preprocessed images are decomposed into different sub-band images using AAWT image decomposition. Furthermore, Relief F and the box-counting algorithm are employed to extract the significant entropy features, namely Kapoor entropy (KE), Yager entropy (YE), and Renyi entropy (RE), and the FD features, respectively [65]. The extracted feature values are then ranked using Fisher's LDA dimensionality reduction strategy. Finally, the SVM classifier is used for classification.

2.3 Attribute extraction

Feature extraction is used to obtain relevant data from the decomposed images. To classify the crop diseases, texture attributes are applicable to evaluate the coarseness, smoothness, and pixel regularities.

2.3.1 Adaptive analytic wavelet transform

The AAWT is an advanced version of the discrete wavelet transform (DWT) decomposition technique, which is a useful tool for medical imaging. The flexible time-frequency covering is the most significant feature of the AAWT [7]. Figure 7 depicts the different aspects of time-frequency covering of the wavelet framework. It offers various attractive characteristics, such as high frequency resolution and essential control over the dilation factor, the number of decomposition levels, the Q-factor (QF), and the redundancy (r). Additionally, it is appropriate for 2D signal processing because it comprises Hilbert transform companions of atoms. It can also build a narrow chirplet framework for discrete signals that is applicable for time-frequency analysis [55]. Iterated filter banks (FBs) are used to obtain the transform, which provides fast processing for 2D signals. The AAWT with iterated FBs is shown in Fig. 8. It contains two high-pass channels and one low-pass channel; one high-pass channel is utilized for negative frequency inspection and the other for positive frequency investigation. The transition bands for the filters in Fig. 7 are depicted in Fig. 9.

$$ \ Q_{F}=\frac{\omega_{0}}{\Delta \omega} $$
(6)
Fig. 7
figure 7

AAWT wavelet framework representing time-frequency interrelation

Fig. 8
figure 8

AAWT transform with iterated FBs

Fig. 9
figure 9

Transition bands of the filters of Fig. 7

where ω0 represents the central frequency, Δω denotes the bandwidth, and QF is the control parameter. The time localization of the wavelet function is handled by the redundancy [70]. The AAWT defines the QF, redundancy, and dilation factor using the parameters e, f, g, h, and β, where e and f are employed for up- and down-sampling of the high-pass channel, g and h for up- and down-sampling of the low-pass channel, respectively, and β is a positive coefficient defined in terms of QF as:

$$ \ \beta=\frac{2}{Q_{F} + 1} $$
(7)

In the AAWT decomposition, the parameters e, f, g, h, and β are used as controlling parameters of the wavelet. The AAWT decomposition is carried out using iterative FBs consisting of high-pass and low-pass channels at each level of iteration [23]. Negative and positive frequencies are differentiated by the low-pass and high-pass channels of the FBs, respectively. The frequency response of the high-pass filter is represented as:

$$ H(\omega)=\left\{ \begin{array}{ll} (ef)^{\frac{1}{2}}, & |\omega|< \omega_{p} \\ (ef)^{\frac{1}{2}}\, \theta\!\left(\frac{\omega-\omega_{p}}{\omega_{s}-\omega_{p}}\right), & \omega_{p} \le \omega \le \omega_{s} \\ (ef)^{\frac{1}{2}}\, \theta\!\left(\frac{\pi-(\omega_{s}-\omega_{p})}{\omega_{s}-\omega_{p}}\right), & -\omega_{s} \le \omega \le -\omega_{p} \\ 0, & |\omega| \ge \omega_{s} \end{array}\right. $$
(8)

The low-pass filter frequency response is represented as:

$$ G(\omega)=\left\{ \begin{array}{ll} (gh)^{\frac{1}{2}}\, \theta\!\left(\frac{\pi-(\omega_{s}-\omega_{p})}{\omega_{s}-\omega_{p}}\right), & \omega_{0} \le \omega \le \omega_{1} \\ (gh)^{\frac{1}{2}}, & \omega_{1} \le \omega \le \omega_{2} \\ (gh)^{\frac{1}{2}}\, \theta\!\left(\frac{\omega-\omega_{2}}{\omega_{3}-\omega_{2}}\right), & \omega_{2} \le \omega \le \omega_{3} \\ 0, & \omega \in (0, \omega_{0}) \cup (\omega_{3},2\pi) \end{array}\right. $$
(9)

where the dilation factor is d, the Q-factor is Δf/f, and the shift parameter is Δt; \(\omega_{p} = \frac{(1-\beta)\pi+\varepsilon}{e}\); \(\omega_{s} = \frac{\pi}{f}\); \(\omega_{0}=\frac{(1-\beta)\pi+\varepsilon}{g}\); \(\omega_{1}=\frac{e\pi}{fg}\); \(\omega_{2}=\frac{\pi-\varepsilon}{g}\); \(\omega_{3}=\frac{\pi+\varepsilon}{g}\); and \(\varepsilon \le \frac{e-f+\beta f}{e+f}\pi\).

The 𝜃(ω) can be given by

$$ \theta(\omega)=\frac{[1+\cos(\omega)][2-\cos(\omega)]^{\frac{1}{2}}}{2}, \quad \text{for } \omega \in [0, \pi] $$
(10)

The constraints for the selection of QF parameter is represented as:

$$ 1-\frac{e}{f} \le \beta \le \frac{g}{h} $$
(11)
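The parameter relations (7), (10), and (11) can be checked with a few lines of NumPy, as sketched below; the example values of e, f, g, h, and QF at the end are purely illustrative and not taken from the paper.

```python
import numpy as np

def theta(w):
    """Transition function theta(omega) of (10), defined on [0, pi]."""
    w = np.asarray(w, dtype=float)
    return 0.5 * (1.0 + np.cos(w)) * np.sqrt(2.0 - np.cos(w))

def beta_from_qf(qf):
    """beta = 2 / (QF + 1), as in (7)."""
    return 2.0 / (qf + 1.0)

def parameters_valid(e, f, g, h, qf):
    """Check the selection constraint (11): 1 - e/f <= beta <= g/h."""
    beta = beta_from_qf(qf)
    return (1.0 - e / f) <= beta <= (g / h)

print(parameters_valid(e=5, f=6, g=1, h=2, qf=3))   # beta = 0.5 -> True for these values
```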

In this work, the AAWT decomposition is applied iteratively, using the iterated filter bank, on the preprocessed green channel. The AAWT-decomposed components are suitable for extracting significant detail for the classification of crop disease stages, as shown in Fig. 10. The AAWT components are band-limited, and the sub-band capturing the highest pixel variation in the previous level is taken for further decomposition [3]. This iterative approach effectively extracts finer detail from the previously decomposed sub-band images. The AAWT is successfully applied to discriminate healthy leaves and crop disease stages. The implementation of the AAWT for analytic wavelet-based feature (AWF) extraction can be accessed from http://web.itu.edu.tr/ibayram/AnDWT/

Fig. 10
figure 10

Iterative AAWT features for bell pepper (top two rows for healthy and bacterial spot), potato leaf (bottom three rows), images: healthy, early blight, and late blight, and tomato sample images (a) original images, (b)-(e) iterative AAWT components

Here, the entropy and FD attributes are extracted [26]. The entropy traits such as KE, RE, YE have been employed to quantify the uncertainty.

Non-Shannon entropies are used because of their higher dynamic range [9]. The probability px can be represented as \(p_{x} =\frac{y}{r\times c}\), where px denotes the probability of pixel value x appearing y times and r × c represents the image size. The KE, RE, and YE are defined as [9]:

$$ KE= \frac{1}{b-a}\log_{2} \left( \frac{{\sum}_{x=0}^{X-1}{p_{x}^{a}}}{{\sum}_{x=0}^{X-1}{p_{x}^{b}}} \right) $$
(12)
$$ RE= \frac{1}{b-a}\log_{2} \left( {\sum}_{x=0}^{X-1}{p_{x}^{a}} \right) $$
(13)
$$ YE= 1-\frac{{\sum}_{x=0}^{X-1}|2p_{x}-1|}{r\times c} $$
(14)
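A direct transcription of (12)-(14) for an 8-bit grey-level image histogram is sketched below; the exponents a and b are not specified in the text, so the defaults shown are assumptions.

```python
import numpy as np

def pixel_probabilities(img):
    """p_x = y / (r * c): relative frequency of each grey level x in an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    return hist / img.size

def kapoor_entropy(p, a=0.5, b=0.7):
    """KE of (12) with assumed exponents a and b."""
    p = p[p > 0]
    return (1.0 / (b - a)) * np.log2(np.sum(p ** a) / np.sum(p ** b))

def renyi_entropy(p, a=2.0, b=1.0):
    """RE of (13), keeping the 1/(b - a) scaling used in the text."""
    p = p[p > 0]
    return (1.0 / (b - a)) * np.log2(np.sum(p ** a))

def yager_entropy(p, n_pixels):
    """YE of (14); n_pixels = r * c."""
    return 1.0 - np.sum(np.abs(2.0 * p - 1.0)) / n_pixels
```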

Fractal dimension (FD) features: FD provides a measure of the irregularity and roughness of self-similar structures for texture evaluation [2]. The crop leaf images have texture that can be captured using fractals. An image surface S, scaled by f, is defined as self-similar only if S is the union of non-overlapping copies (Sf) of itself [16].

The FD is calculated as

$$ FD= \frac{log(S_{f})}{log \frac{1}{f}} $$
(15)

where \(f= \frac{\text{scale value}}{\text{original scale}}\). A modified sequential box-counting algorithm is very useful for calculating fractal features [1]. In this approach, the sequential box-counting (SBC) algorithm is used; initially the grid size is fixed to a power of 2, and the final size is set to (r×c).
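The box-counting estimate of (15) can be sketched as follows for a non-empty binary leaf mask; the grid sizes are powers of 2, as described above, and the dimension is the slope of log(count) against log(1/scale).

```python
import numpy as np

def box_counting_fd(mask):
    """Estimate the fractal dimension (15) of a non-empty binary mask by box counting."""
    mask = np.asarray(mask, dtype=bool)
    sizes = 2 ** np.arange(1, int(np.log2(min(mask.shape))))     # box sides: powers of 2
    counts = []
    for s in sizes:
        h = mask.shape[0] // s * s                               # crop to a multiple of s
        w = mask.shape[1] // s * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.any(blocks, axis=(1, 3)).sum())         # boxes touching the object
    # FD is the slope of log(count) versus log(1/size)
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope
```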

2.3.2 Bag of visual words

The BoW extraction is an automated pipeline performing rotation- and scale-invariant geometry-based attribute extraction [43]. The shape-based scale-invariant feature transform (SIFT) features are extracted from the original images using the BoW system. The SIFT features of patches are utilized to construct the coding table of visual words. The coding table, containing a fixed number of codewords (or visual words), is constructed from the descriptors as shown in Fig. 11.
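A minimal sketch of such a BoW pipeline, assuming OpenCV's SIFT implementation (OpenCV ≥ 4.4) and scikit-learn's KMeans, is given below; the codebook size of 120 follows the feature counts reported in the conclusion and is otherwise an assumption.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(gray_images, n_words=120):
    """Cluster SIFT descriptors from the training images into a visual-word codebook."""
    sift = cv2.SIFT_create()
    descriptors = []
    for img in gray_images:
        _, des = sift.detectAndCompute(img, None)
        if des is not None:
            descriptors.append(des)
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(np.vstack(descriptors))

def bow_histogram(gray_image, codebook):
    """Encode one image as a normalized histogram over the visual words."""
    _, des = cv2.SIFT_create().detectAndCompute(gray_image, None)
    words = codebook.predict(des)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()
```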

Fig. 11
figure 11

The features extraction system using bag of visual words (BoVW)

2.3.3 Fisher vectors

Fisher vectors (FV) can be understood as a high-dimensional representation of dense vectors [40]. The dense vectors are represented in terms of a Gaussian Mixture Model (GMM) fitted to the attributes by encoding the derivatives of the log-likelihood of the model with respect to its parameters [34]. The GMM is trained with diagonal covariances, and the derivatives with respect to the means and variances additionally average the first- and second-order differences between the GMM centers and the dense features:

$$ {\Phi}_{k}^{(1)}= \frac{1}{N \sqrt{w_{k}}} {\sum}_{p=1}^{m} \alpha_{p} (k) \left( \frac{x_{p} - \mu_{k}}{\sigma_{k}} \right) $$
(16)
$$ {\Phi}_{k}^{(2)}= \frac{1}{N \sqrt{2w_{k}}} {\sum}_{p=1}^{m} \alpha_{p} (k) \left( \frac{(x_{p} - \mu_{k})^{2}}{{\sigma_{k}^{2}}} -1 \right) $$
(17)

Here {wk, μk, σk}k are the GMM internal measures termed the weights, mean values, and diagonal covariances, respectively. αp(k) indicates the soft assignment weight of the pth attribute xp to the kth Gaussian. An FV Φ is computed by stacking the differences: \({\Phi}=[{\Phi}_{1}^{(1)},{\Phi}_{1}^{(2)},\dots,{\Phi}_{K}^{(1)},{\Phi}_{K}^{(2)}]\). The encoding captures how the attribute distribution of a given image deviates from the distribution fitted to the features of all the training images. The features of the diagonal-covariance GMM-based FV are first decorrelated by PCA; the PCA is applied to the SIFT features with dimensionality reduction by half, from 128 to 64. The dimensionality of the Fisher vector is 2Kd, with K Gaussians in the GMM and patch feature vector dimensionality d. The FV dimensionality is high; for d = 64 and K = 512 it is 65,536, which is still lower than the dimensionality of the stacked dense features (1.5M in this case). Following [40], the FV performance can be improved by applying L2 normalization.
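A compact sketch of the FV encoding in (16)-(17), using a diagonal-covariance GMM from scikit-learn, is given below; the commented fitting step with PCA to 64 dimensions and K = 512 Gaussians follows the sizes quoted above, and the variable names are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """First- and second-order FV encoding of (16)-(17) for one image's local descriptors."""
    x = np.asarray(descriptors, dtype=float)                 # m x d local features
    N = x.shape[0]
    q = gmm.predict_proba(x)                                 # soft assignments alpha_p(k), m x K
    mu, sigma, w = gmm.means_, np.sqrt(gmm.covariances_), gmm.weights_
    diff = (x[:, None, :] - mu[None, :, :]) / sigma[None, :, :]                         # m x K x d
    phi1 = (q[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(w))[:, None]               # (16)
    phi2 = (q[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0) / (N * np.sqrt(2 * w))[:, None]   # (17)
    fv = np.hstack([phi1.ravel(), phi2.ravel()])             # length 2Kd
    return fv / np.linalg.norm(fv)                           # L2 normalization, following [40]

# Fitting (placeholders): PCA to 64 dimensions, then a 512-component diagonal GMM.
# pca = PCA(n_components=64).fit(train_descriptors)
# gmm = GaussianMixture(n_components=512, covariance_type="diag").fit(pca.transform(train_descriptors))
```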

2.4 Feature selection

To design a strong learning system, robust attribute selection helps to enhance the classification performance. Before selection, feature normalization is an important step to bring all features into a common range for analysis; attribute normalization boosts the system performance, whereas skewed values may mislead the classifier and restrain its performance. Generally, the data are standardized either by a z-score normalization function [39], which yields zero mean and unit standard deviation, or by scaling the values into the range 0 to 1. In the proposed approach, a feature X is min-max normalized as:

$$ X_{norm} = \frac{X-X_{min}}{X_{max}-X_{min}} $$
(18)

Not all features contribute equally to the classification problem; hence, the selection of highly ranked features among the available attribute set is important to obtain better classification performance. The significance of the selected features is assessed [42]: features with a higher rank are more valuable for the classification of crop disease stages than lower-ranked features. LDA is an appropriate machine learning approach for classification and provides the highest possible discrimination among the stages of crop disease using Fisher's discriminant index (FDI) [20]. LDA offers maximum class separability by maximizing the ratio of between-class to within-class scatter [1]. Maximum discrimination of classes can thus be obtained with Fisher's LDA. In this work, LDA reduced the dimensionality of twenty-four selected features to thirteen robust features without affecting the variance. The dimensionality-reduced ranked features are described in Table 4. The LDA using Fisher's discriminant index provides the thirteen highest-ranked robust features among the entire feature set [67].
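One plausible reading of this selection step, combining the [0, 1] normalization of (18) with a per-feature Fisher discriminant index ranking, is sketched below; the paper's exact LDA-based ranking may differ, and keeping thirteen features follows the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def fisher_discriminant_index(X, y):
    """Per-feature Fisher index: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_top_features(X_train, y_train, n_keep=13):
    """Normalize to [0, 1] as in (18) and return indices of the n_keep highest-ranked features."""
    Xn = MinMaxScaler().fit_transform(X_train)
    ranking = np.argsort(fisher_discriminant_index(Xn, np.asarray(y_train)))[::-1]
    return ranking[:n_keep]
```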

Table 4 Results of dimensionality reduced ranked AAWT features for potato dataset (mean ± standard deviation)

2.5 Classification

The classification stage follows feature extraction. Using the approach described above, we extracted the handcrafted features and the machine learning-based features; we now discuss the classification models applied to them. For all classifier models, 10-fold cross-validation is utilized in the experiments.

2.5.1 Logistic regression

Logistic regression (LR) [69] is used to predict the disease based on the available feature set. It is a probabilistic, statistical classification model that measures the relationship of a categorical dependent variable to one or more independent variables through the probability scores of the dependent variable. LR generates coefficients for the features that indicate their significance level and is formulated to predict a logit transform of the probability associated with the features. The LR model is formulated as:

$$ logit(P_{LLF})= \beta_{0} + \beta_{1} x_{1} + ... + \beta_{15} x_{15} $$
(19)

where PLLF represents the probability of the presence of leaf localized features (LLF) and the logit transformation in terms of logged odds is given as:

$$ odds= \frac{P_{LLF}}{1-P_{LLF}} $$
(20)

and

$$ P_{LLF}= \frac{\exp(\beta_{0} + \beta_{1} x_{1}+\dots+ \beta_{15}x_{15})}{1+\exp(\beta_{0} + \beta_{1} x_{1}+\dots+ \beta_{15}x_{15})} $$
(21)
$$ \frac{P_{LLF}}{1-P_{LLF}}=exp(\beta_{0} + \beta_{1} x_{1}+...+ \beta_{15}x_{15}) $$
(22)
$$ logit(P_{LLF})=ln\left( \frac{P_{LLF}}{1-P_{LLF}}\right) $$
(23)

The concept used in LR is parameter optimization by maximizing the likelihood of observing the samples, instead of error minimization as in ordinary linear regression. The implementation uses L2-regularization through LIBLINEAR [17].
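With scikit-learn, this corresponds to the configuration sketched below (the liblinear solver applies one-versus-rest for multi-class problems, matching the one-versus-all evaluation used later); the feature and label variable names are placeholders.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

lr = LogisticRegression(penalty="l2", solver="liblinear", C=1.0)   # L2-regularized, LIBLINEAR backend
# scores = cross_val_score(lr, X_features, y_labels, cv=10)        # X_features, y_labels: placeholders
```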

2.5.2 Multilayer perceptron model

A multilayer perceptron model (MLP) [49] is employed for the multi-class classification; with nonlinearity, it is given as:

$$ O^{0} = x, O^{l} = F^{l} (W^{l} \hat{o}^{l-1}) ~~~~~~~~ for ~~l= 1,...,L. $$
(24)

where x represents an input vector that is treated as the "zeroth layer output". The notation \( \hat{o}^{l-1} \) indicates that a 1 is prepended to the vector so that the bias terms of layer l form the first column of the matrix Wl. The sigmoid activation function, denoted Fl, is applied to all vector components. The softmax function is used as the output-layer activation of the four-layer MLP model, which has one input layer, two hidden layers, and one output layer. The two hidden layers of the MLP model have 7 and 8 neurons, respectively. The fully connected feed-forward MLP model is implemented with sigmoid activation. A learning rate of η = 0.15 is used during backpropagation to train the model, with the input features as the training parameters and the image labels as the target variable. The gradient is computed with the stochastic gradient descent algorithm as the weight error, and the parameters are adjusted to move the MLP one step closer to the error minimum. The 10-fold cross-validation is used in the experiments.
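A minimal Keras sketch of this four-layer network, under the stated layer sizes and learning rate, is given below; the categorical cross-entropy loss for one-hot labels is an assumption, since the text does not name the loss function.

```python
from tensorflow import keras

def build_mlp(n_features, n_classes):
    """Four-layer MLP: input, hidden layers of 7 and 8 sigmoid units, softmax output."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(7, activation="sigmoid"),
        keras.layers.Dense(8, activation="sigmoid"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.15),   # eta = 0.15
                  loss="categorical_crossentropy",                      # assumes one-hot labels
                  metrics=["accuracy"])
    return model
```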

2.5.3 Support vector machine

The SVM is used for binary and multi-class classification of the dataset. Binary classification through SVM provides high flexibility for the separation of classes [13, 35]. An adjustable linear decision surface called a hyperplane is used, which takes the decision and splits the space into the number of classes. The concept of a nonlinear hyperplane is based on the soft-margin SVM. A kernel-based SVM is used to discriminate the different diseases and their stages; a kernel-based multiclass SVM is used with multiple features. The kernel function maps the data into another feature space in which the features become linearly separable [14].

$$ \text{Minimize } \frac{1}{2} {\sum}_{i=1}^{n} w_{i}^{2} +C {\sum}_{i=1}^{N} \zeta_{i} $$
(25)

subject to \( y_{i} (\bar{w}\cdot\bar{x}_{i}+b) \ge 1-\zeta_{i} \) for i = 1, ..., N.

The value of C provides an adjustable margin in the SVM. Nonlinearity based on feature mapping is also utilized by transforming into a Hilbert space, which facilitates the transformation from one domain to another. The Gaussian kernel offers a nonlinear projection that creates additional separation between the data points in the mapped high-dimensional space. The Gaussian kernel is formulated as

$$ K(\bar{x_{i}},\bar{x_{j}})= exp(- \gamma ||\bar{x_{i}}- \bar{x_{j}}||^{2}) $$
(26)

A k-fold cross-validation strategy with k = 10 is used for SVM model training and testing, with C = 1 and γ = 1 obtained through hyperparameter tuning [32]. The mean of the optimal hyperparameters is taken to obtain a robust estimate. The parameters C, γ, and the kernel are optimized for the SVM, while the activation, solver, and learning rate are optimized for the MLP classifier.
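A hedged scikit-learn sketch of this tuning procedure is shown below; the grid values other than C = 1 and γ = 1 are illustrative assumptions, and the data variables are placeholders.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1, 10], "kernel": ["rbf"]}   # illustrative grid
search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")          # 10-fold CV
# search.fit(X_features, y_labels)      # placeholders; reported optimum: C = 1, gamma = 1
```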

3 Result and discussion

The performance of the proposed leaf image segmentation technique is evaluated using the PlantVillage dataset.

3.1 Experimental setup

The proposed segmentation approach was implemented in Python using the Keras API [29], the Scikit-learn library [47], and the TensorFlow back-end [59]. An NVIDIA Quadro K5200 graphics card was used for the image data processing on a computer with an Intel® Xeon® Processor E5-2650 v3 @ 2.30 GHz and 8 GB RAM. The proposed technique classifies the dataset into multiple classes, as mentioned in the dataset description in Section 2.1. The performance comparison shown here reports the average performance over all categories, and the one-versus-all concept is utilized to compare the multi-class classification. 70% of the images are used for training and 30% for validation of the results. Inside the training set, 10-fold cross-validation is performed for the training and testing experimentation.

3.2 Performance parameters

The performance measures of the proposed system are defined in terms of the classification accuracy (Ac) [11, 33, 36] and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve [19]. A higher AUC value indicates better performance.

3.2.1 Accuracy

The accuracy measure, where TP and TN denote the counts of true positive and true negative predictions, is given as follows:

$$ Ac=(T_{P}+T_{N})/(Total~ data~ samples) $$
(27)

3.2.2 Area under the curve (AUC)

The classification performance is evaluated continuously through the ROC curve, which provides the trade-off between sensitivity (the true positive rate) and the false positive rate (i.e., 1 - specificity).
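Both measures can be computed with scikit-learn as sketched below; the macro-averaged one-versus-rest AUC is an assumption consistent with the one-versus-all comparison described in Section 3.1.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Accuracy as in (27) and one-vs-rest AUC; y_score holds per-class probabilities."""
    acc = accuracy_score(y_true, y_pred)
    auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
    return acc, auc
```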

3.3 Performance evaluation

The classification performance is evaluated for the bell pepper, potato, and tomato datasets using three classifiers: LR, MLP, and SVM. The classification is performed by extracting different sets of features along with their combinations: BoW, FV, and AWF, their pairwise combinations BoW+AWF and FV+AWF, and the combination of all three, BoW+FV+AWF, as the proposed feature set. The performance is evaluated in terms of accuracy (%) and AUC. The classification results in terms of accuracy and AUC using LR, MLP, and SVM for the bell pepper, potato, and tomato datasets are given in Table 5.

Table 5 Classification result for bell pepper, potato, and tomato datasets

The first classification experiment is performed on the original image set of the bell pepper dataset, as mentioned in Table 1. The SVM classifier achieves 88.33% and 89.23% accuracy using BoW+AWF and FV+AWF, respectively, with AUC > 0.88, as shown in Table 5. The MLP classifier with FV+AWF gives an average accuracy of 88.36% and AUC = 0.851. The tabular analysis shows that for two-class classification with the three classifiers LR, SVM, and MLP, the feature combinations BoW+AWF, FV+AWF, and BoW+FV+AWF perform better than the individual feature types. Experiment 2 uses a different classification setup on the potato dataset, as mentioned in Table 2. The LR gives 86.57% classification accuracy using FV+AWF features with 0.842 AUC, as shown in Table 5. The MLP classifier achieves 88.69% and 88.27% accuracy using BoW+AWF and FV+AWF, respectively, with AUC > 0.86. The SVM classifier achieves 88.84% and 89.21% accuracy using BoW+AWF and FV+AWF, respectively, with AUC > 0.87. The combination BoW+FV+AWF offers 94.13% accuracy with 0.966 AUC.

Experiment 3, performed on the tomato dataset, gives 86.99% classification accuracy using LR with FV+AWF features and 0.864 AUC, as shown in Table 5. The MLP classifier achieves 83.732% and 88.32% accuracy using BoW+AWF and FV+AWF, respectively, with AUC > 0.86. The SVM classifier achieves 86.85% and 87.28% accuracy using BoW+AWF and FV+AWF, respectively, with AUC > 0.86. The proposed combination BoW+FV+AWF offers 91.89% accuracy with 0.939 AUC.

The average AUC and accuracy (Ac) comparison of the proposed method with the state-of-the-art methods is shown in Fig. 12.

Fig. 12
figure 12

The ROC plot illustrating a comparative analysis of the proposed approach with state-of-the-art classification approaches

The GAFSADR [30] shows an AUC of 0.914 and the LICDC [38] method shows an AUC of 0.950. The proposed method achieves an AUC of 0.961, which is about 5% better than the GAFSADR [30] method.

The detailed comparison of the proposed method with the state-of-the-art methods is given in Table 6. The proposed technique classifies the different leaf images into multiple healthy and unhealthy classes. The referenced methods are used in the manuscript for performance analysis; their simulation codes were run on the considered dataset classification problem, and the results of the proposed method are compared with these state-of-the-art methods for validation. The classification accuracy of the PDDC [6] method is 0.860. The IRPDD [48] method utilized texture, color, and shape features that miss the internal patterns as well as the frequency-based features in the leaf region, which limits its classification accuracy to 0.888. The PDDML [27] method uses only the GLCM and statistical features of the segmented region, which may be efficient for a particular type of dataset, and achieves an accuracy of 0.868. The KMCSAM [28] performs clustering of features before training the classifier, which does not help much to improve the results, and offers an accuracy of 0.864. The GAFSADR [30] method gave accuracies of 0.874, 0.818, and 0.815 for the pepper, potato, and tomato leaf image datasets, respectively. A set of different features using FV and AWF along with BoW has been introduced in the proposed model, which improves the classification performance. The average accuracy of the proposed method (BoW+FV+AWF using the SVM classifier) is 0.941 (94.1%). The average AUC reported is 0.961, which is 8% and 12% better than the 0.836 and 0.864 provided by GAFSADR [30] and KMCSAM [28], respectively, as shown in Table 6.

Table 6 Average accuracy (Ac) and area under curve (AUC) for Datasets-1, 2, and 3

3.4 Complexity analysis

The computational complexity of PDDC [6], IRPDD [48], PDDML [27], and LICDC [38] is O(N3). GAFSADR [30] requires O(N2 × n) for n training images, while its prediction system during testing requires O(N2) computations. The computational complexity of the ADMT [54] method is O(N2), and its segmentation and refinement steps require \( O(N^{2}L_{B} +N_{P} {L_{B}^{2}}) \) and O(N2LB), respectively. The computational complexity of the KMCSAM [28] and of the proposed method is O(N3). The classification performance of the proposed method is improved to a good extent without increasing the computational complexity.

4 Conclusion

The detection of diseases in crops is based on the visual observation of structural deformations in leaf images. This paper recommends the fusion of spatial domain features with frequency domain features to extract the best possible discriminative information of the selected domains. Further, the significance of the extracted features is investigated using LDA, and less significant traits are removed from the feature set used for classification in order to reduce the classification complexity. In this method, the AAWT decomposes the preprocessed images into various sub-band images. Afterwards, Relief F and a box-counting algorithm are employed to extract the different entropy and fractal dimension attributes, respectively. The extracted feature values are optimized using the LDA strategy. The attributes utilized to classify the images are a combination of 120 BoW features and 120 FVs extracted from the preprocessed image, together with 13 AAWT features of the leaf region. The performance of the proposal is analyzed on the PlantVillage datasets of bell pepper, potato, and tomato images using LR, MLP, and SVM classifiers. The proposal offers improved classification accuracy for various plant diseases, and the simulation outcomes validate the performance superiority of the proposed classification approach over existing methods in the field. The presented classification technique offered an average accuracy of 94% on the considered PlantVillage database. Future work can address deep learning-based feature extraction and classification.