1 Introduction

Farmland remains the essential source of food crops in today's world, and the economies of developing countries depend heavily on agricultural productivity. Early, leaf-based detection of plant diseases in agriculture therefore has a paramount impact on sustaining these economies [1].

Soares et al. [2] presented a study on leaf image segmentation in semi-controlled conditions (LSSC) [3]. Singh et al. worked on localization-based classification using a soft-computing technique (LCSCT) [4]. Biswas et al. reduced redundancy to enhance the color differences in the image and performed segmentation through fuzzy C-means clustering (SFCC) [5]. Aparajita et al. [6] developed an automatic system for late blight disease detection in potato leaf images, which segments the leaf image using statistical features-based adaptive thresholding (SFAT). Yanikoglu et al. [7] worked on a self-operating plant identification system to identify the plant variety in a given image set. An image pattern recognition system for plant disease prediction (IRPD) was designed in [8]. Sabrol et al. [9] performed classification using color, shape, and texture features to separate healthy and diseased tomato leaf images. Islam et al. presented a plant disease diagnosis approach using machine learning (PDML)-based image processing [10], which classifies healthy and unhealthy potato plants; disease classification with a support vector machine was performed on the segmented images. Patil et al. [11] introduced automated disease management techniques for potato (ADMT) [12].

Leaf color-based image analysis for disease identification and classification was performed in [13]. Lowe et al. [14] analyzed hyperspectral images for plant health monitoring and prediction. Devi et al. [15] analyzed rice leaf diseases using the wavelet transform, the scale-invariant feature transform (SIFT), and the gray-level co-occurrence matrix, and employed multiclass SVM and Naive Bayes classifiers to classify the leaf images. Khamparia et al. [16] proposed a hybrid technique for crop disease detection that combines features from an autoencoder and a convolutional neural network (CNN) model. Rangarajan et al. [17] used transfer learning with the pre-trained Visual Geometry Group 16 (VGG16) network for eggplant disease classification; the dataset was evaluated with augmentation using grayscale images along with other color spaces such as hue-saturation-value (HSV) and YCbCr [18].

Wang et al. [19] developed a deep CNN-based plant disease severity estimation system. Kaur et al. [20] presented a k-means clustering-based semi-automatic technique (KCM) to classify plant diseases. Khan et al. [21] devised a genetic algorithm-based feature selection [22] for apple disease identification and recognition (GFSD). Sladojevic et al. [23] presented a deep learning technique to identify and classify plant diseases from leaf images. Brahimi et al. [24] worked on a CNN-based algorithm for symptom detection and classification in tomato (SDCT) leaves; a great advantage of CNNs is automatic feature extraction directly from raw images. Ferentinos et al. [25] also applied deep CNN models to plant disease detection and diagnosis (PDDD) through plant leaf images. Bharali et al. [26] worked on deep learning-based leaf image analysis (DLLA) to identify and classify plant diseases. Hang et al. [27] presented deep learning for disease identification and classification (DLDIC) in plant leaves.

Existing techniques struggle with the multi-class classification of diseased leaves across different crop varieties and suffer from high time complexity, and the accuracy of their segmentation results still falls short. In this work, a novel leaf properties-based localization technique is presented for region-of-interest segmentation and classification. The proposed algorithm improves on conventional object localization techniques in terms of precision and computational time. To segment the disease-affected areas, we fuse a seed-based region growing technique [28]; the region growing is performed using a mixture model for leaf area refinement. The SIFT feature-based Gaussian distribution of the localized images is used for discriminant feature extraction, and the compact representation of these features as Fisher vectors provides an effective basis for classifying healthy and diseased leaves [29].

The rest of the paper is organized as follows. Section 2 describes the datasets and the essential conceptual background. Section 3 presents the proposed image segmentation technique. The image object properties-based feature extraction and classification models are discussed in Section 4. Section 5 reports the results obtained with various systems and compares them with the proposed approach. Finally, Section 6 concludes the work and outlines future research directions.

2 Dataset

2.1 PlantVillage bell pepper dataset

The PlantVillage dataset is a public repository of plant images for image-based disease analysis. It covers 14 different crops with 54,309 labeled images. The bell pepper subset has two categories, healthy and diseased. Sample images from the database are shown in Fig. 1, and the number of images per class is given in Table 1: 997 images of the bacterial spot class and 1478 images of the healthy class.

Fig. 1
figure 1

Bell pepper leaves: the upper row shows healthy images and the lower row depicts bacterial spot images

Table 1 The PlantVillage bell pepper dataset

2.2 PlantVillage potato dataset

The PlantVillage database also provides image sets for the potato crop. There are three classes of potato images, including a healthy class. The early blight and late blight sets contain 1000 images each, while the healthy set contains 152 images (Table 2).

Table 2 The PlantVillage potato dataset

2.3 PlantVillage tomato dataset

The PlantVillage database contains several kinds of disease-affected tomato leaf images. There are ten classes of tomato leaf images, including a healthy class. The target spot class has 1404 images and the mosaic virus class has 373. The yellow leaf curl virus class has 3209 images and the bacterial spot class has 2127. The early blight and healthy classes have 1000 and 1591 images, respectively, the late blight and leaf mold classes have 1909 and 952 images, and the Septoria leaf spot and spider mites classes have 1771 and 1676 images, as shown in Table 3.

Table 3 PlantVillage tomato dataset
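The per-class counts reported above can be reproduced directly from the folder structure of a local PlantVillage copy. The sketch below is a minimal example; the directory layout and paths are assumptions and may need to be adapted.

```python
from collections import Counter
from pathlib import Path

# Hypothetical layout: one sub-folder per class, e.g.
#   plantvillage/tomato/Tomato___healthy/*.jpg
#   plantvillage/tomato/Tomato___Late_blight/*.jpg
DATASET_ROOT = Path("plantvillage/tomato")  # assumed path, adjust to the local copy

def class_counts(root: Path) -> Counter:
    """Count the images available in each class folder."""
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        n_images = sum(1 for f in class_dir.iterdir()
                       if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
        counts[class_dir.name] = n_images
    return counts

if __name__ == "__main__":
    for name, n in class_counts(DATASET_ROOT).items():
        print(f"{name}: {n} images")
```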

3 Proposed image segmentation method

The complete model for plant image-based disease classification is shown in Fig. 2. It consists of two steps: 1) image localization through segmentation and 2) feature extraction for classification.

Fig. 2
figure 2

The process flow chart of the proposed localization-based classification algorithm

The proposed preprocessing methodology is a three-stage system. The first stage covers foreground separation, i.e., extraction of the leaf from the background; the second covers initial seed detection; and the third the region growing process. Fig. 2 delineates the workflow of the proposed algorithm. Each stage is explored in depth in the following subsections.

3.1 Preprocessing

The images are contaminated with several types of noise in the leaf region. The primary source is speckle noise, introduced by environmental effects on the sensors at the time of image acquisition. To overcome it, we apply preprocessing to these images: every pixel is inspected and a histogram is built for the pixels that lie within the leaf region. The histogram of pixel intensities then undergoes an auto-thresholding process [30], and pixels with intensity values lower than the localized threshold are taken as background.
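A minimal sketch of this preprocessing step is given below, assuming OpenCV and an 8-bit input; an Otsu auto-threshold on the intensity histogram stands in for the auto-thresholding process of [30], and the median filter is an illustrative choice for speckle suppression.

```python
import cv2
import numpy as np

def remove_background_noise(image_bgr: np.ndarray) -> np.ndarray:
    """Suppress background pixels using a histogram-based auto-threshold (Otsu)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Light smoothing reduces speckle noise before thresholding
    gray = cv2.medianBlur(gray, 3)
    # Otsu picks the threshold from the intensity histogram automatically
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Pixels below the threshold are treated as background and zeroed out
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
```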

3.2 Primary seed selection

Seed point initialization primarily requires prior information about the considered leaf object. For leaf region segmentation, the plant leaf images readily provide a color-based seed point indication. Color analysis shows that the green component has the highest intensity values in the leaf area of the image, as shown in Fig. 3; the red component has moderate values, while the blue component is the least intense. The green-to-blue (G/B) ratio therefore indicates seed points much more effectively than the green-to-red (G/R) ratio.
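As an illustration of this seed initialization, the sketch below computes a G/B ratio map and binarizes it with Otsu's threshold (cf. Fig. 3); the exact ratio map, scaling, and threshold used in the paper may differ, so this should be read as an approximation.

```python
import cv2
import numpy as np

def gb_ratio_seed_map(image_bgr: np.ndarray) -> np.ndarray:
    """Binary seed map from the green-to-blue (G/B) ratio of a leaf image."""
    b, g, _ = cv2.split(image_bgr.astype(np.float32) + 1.0)  # +1 avoids division by zero
    ratio = g / b
    # Rescale the ratio to 8 bits so that Otsu's threshold can be applied
    ratio_u8 = cv2.normalize(ratio, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, seed_map = cv2.threshold(ratio_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return seed_map  # non-zero pixels are candidate seed locations
```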

Fig. 3
figure 3

The color component (Comp.) analysis of leaf images (from top to bottom): original image; red, green, and blue color components; the green-to-red (G/R) ratio in the fifth row with the binary G/R map in the sixth row; and the green-to-blue (G/B) ratio in the seventh row with the binary G/B map in the last row (the maps are created using Otsu's threshold)

Global thresholding techniques can be applied to separate the intensity levels of the foreground, i.e., the image pixels belonging to the leaf area, from the background [31]. However, inspection of the leaf images shows that the mean intensity values are unevenly distributed across the background, so the leaf borders may be misleading and acceptable results cannot be obtained with global thresholding alone. As a remedy, our approach combines local and global thresholding. In local regions, the regional thresholding technique employs an adjustable valley (threshold) point, which mitigates the intensity variations across the background [32]; the regional thresholding is then constrained to the foreground obtained with the global threshold. The window size as well as the sliding step have a remarkable impact on the reliability of the chosen thresholding approach, as shown in Fig. 4.
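A simplified sketch of the combined local/global thresholding is given below, assuming an 8-bit grayscale image and a square window. The adjustable valley point of [32] is approximated here by a per-window Otsu threshold bounded by the global one; the bounding band is an assumption for illustration only.

```python
import cv2
import numpy as np

def local_global_foreground(gray: np.ndarray, win: int = 9) -> np.ndarray:
    """Foreground mask from a global Otsu threshold refined window by window."""
    t_global, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = np.zeros_like(gray, dtype=np.uint8)
    for r in range(0, gray.shape[0], win):
        for c in range(0, gray.shape[1], win):
            patch = gray[r:r + win, c:c + win]
            if patch.min() == patch.max():        # flat patch: use the global threshold
                t_local = t_global
            else:
                t_local, _ = cv2.threshold(patch, 0, 255,
                                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            # Keep the local threshold within a band around the global one (assumption)
            t = float(np.clip(t_local, 0.8 * t_global, 1.2 * t_global))
            mask[r:r + win, c:c + win] = (patch > t).astype(np.uint8) * 255
    return mask
```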

Fig. 4
figure 4

The effect of window size variation for foreground extraction from the background: a original image; foreground for window sizes b 3\(\times \)3, c 5\(\times \)5, d 7\(\times \)7, and e 9\(\times \)9

Precise marking of the leaf region in the image requires a square window of size 9\(\times \)9 or greater; otherwise, under-segmentation occurs, as depicted in Fig. 4. The converse also holds. Therefore, the choice of an appropriate window size is of utmost importance to this process. The results obtained with various window sizes are illustrated in Fig. 4.

3.3 Region growing

Neighboring pixels that are equivalent in terms of their properties are selected from the given initial seeds and appended during region growing. Nonlinear operations that act on the morphological features, or shape, of an image are labeled morphological operations. Repeated morphological erosion [33] is employed to contract the leaf boundary and obtain a solid initial marking of the leaf. The region growing framework is first employed to acquire the circumference of the homogeneous region; however, the obtained outcome is not an exact replica of the leaf area. To mitigate this, the mixture model-based region growing technique [28] is applied. It extends the region growing method and is employed to refine the outcome: global Gaussian distributions are combined with the prior knowledge acquired from the region growing step.
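The iterative erosion used to obtain a solid initial leaf marker can be sketched as follows (OpenCV assumed); the structuring element and the number of iterations are illustrative choices, not values taken from the paper.

```python
import cv2
import numpy as np

def shrink_to_marker(leaf_mask: np.ndarray, iterations: int = 5) -> np.ndarray:
    """Contract the leaf boundary by repeated erosion to get a solid seed marker."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    marker = leaf_mask.copy()            # 8-bit binary mask (0 / 255)
    for _ in range(iterations):
        eroded = cv2.erode(marker, kernel)
        if cv2.countNonZero(eroded) == 0:  # stop before the marker disappears
            break
        marker = eroded
    return marker
```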

Let Y be an image containing a pattern X of K classes, in which a point l with intensity \(y_l\) is categorized as a member of class \(x_l \in \{1,\ldots,K\}\). The mathematical description of the \(k^{th}\) class is the conditional probability \(P(y_l|x_l = k)\) [34]. The \(k^{th}\) class of the model is given in terms of a linear mixture model (MM):

$$\begin{aligned} \xi _k (y)= (1-\alpha _k) \xi _{Sk} (y-K_0) + \alpha _k \xi _B (y-K_0) \end{aligned}$$
(1)

The parameters y, \(K_0\), and \(\alpha _k\) represent the pixel intensity, the system offset, and the MM mixing parameter, respectively. The term \(\xi _B\) denotes the distribution of background pixels, which follows a normal distribution with mean \(K_0\) and variance \(v_B\), while the intensity statistics of the \(k^{th}\)-class pixels, \(\xi _{Sk}\), are given in terms of a negative-binomial distribution (NBD) with mean \(\mu _{Sk}\) and variance \(v_{Sk}\) [34].

Region growing is driven by a homogeneity property that establishes a regional threshold from the intensity values of the considered data. In general, it uses the background statistics and the signal distributions of the confocal data collection, together with a linear mixture model, to compute the probability that a given pixel belongs to the background or the foreground. These probabilities define the rules of the region growing approach.

The initial seed is taken as the center of the considered region. A locality-based homogeneity property is then evaluated around the seed in the image. The dimension of the considered volume is a trade-off between the locality of the segmentation and the goodness of fit of the Gaussian mixture model (GMM); by default it is set to \(N/8 \times M/8 \times 3\), where M and N are the image stack dimensions. Patches are taken no smaller than \(32\times 32\times 3\) so that enough data points are available to fit the GMM.
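A simplified sketch of the mixture-model-guided region growing is given below. A two-component Gaussian mixture on grayscale intensities (scikit-learn) stands in for the full model of Eq. (1), the "brighter component is leaf" assumption replaces the NBD/normal pairing, and the acceptance rule simply grows into neighbors that are more likely foreground than background.

```python
from collections import deque

import numpy as np
from sklearn.mixture import GaussianMixture

def grow_region(gray: np.ndarray, seed) -> np.ndarray:
    """Seeded region growing guided by a two-component GMM (leaf vs. background)."""
    h, w = gray.shape
    r0, c0 = seed                                   # (row, col) of the initial seed
    # Fit the GMM on a local patch around the seed (roughly an N/8 x M/8 neighborhood)
    ph, pw = max(32, h // 8), max(32, w // 8)
    patch = gray[max(0, r0 - ph // 2):r0 + ph // 2, max(0, c0 - pw // 2):c0 + pw // 2]
    gmm = GaussianMixture(n_components=2, random_state=0).fit(
        patch.reshape(-1, 1).astype(float))
    fg = int(np.argmax(gmm.means_.ravel()))         # assume the brighter mode is the leaf
    # Posterior probabilities for every possible 8-bit intensity (lookup table)
    lut = gmm.predict_proba(np.arange(256, dtype=float).reshape(-1, 1))

    grown = np.zeros((h, w), dtype=bool)
    grown[r0, c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not grown[rr, cc]:
                if lut[gray[rr, cc], fg] > lut[gray[rr, cc], 1 - fg]:
                    grown[rr, cc] = True            # more likely leaf than background
                    queue.append((rr, cc))
    return grown
```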

Fig. 5
figure 5

The original image with the seed region and intermediate results for iterations (Ite) = 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100

Otsu [35] thresholding-based segmentation is an optimal option for multi-modal distributions [36]. Here, the background is modeled as a normal distribution and the signal as a negative binomial, fitted to the distribution of crop pixel intensities through an EM approach [28]. The iterative region growing process for different iterations (Ite = 10, 20, ..., 100) is shown in Fig. 5.

4 Feature extraction and classification

Feature extraction is employed to reduce the complex image content to a compact representation for the classification system.

4.1 Fisher vectors (FV)

The FV [37, 38] is computed using the Fisher kernel (FK), which describes a feature by a gradient vector of a probability density function, built on the scale-invariant feature transform (SIFT) [29]. The compact representation is obtained by fitting a GMM to the features and encoding the derivatives of the model's log-likelihood with respect to its parameters, as shown in Fig. 6. A set of local descriptors is computed under the specified probability distribution, and each descriptor is associated with a gradient that indicates how well the data fit the Gaussian components.

The FK, combining well-defined statistics of the local descriptors with the fitted Gaussian centers, defines the FV. A complete set of statistics, such as the mean and covariance of every individual Gaussian component, is used to represent each object of interest.

Fig. 6
figure 6

The step-by-step process flow of the Fisher vector (FV) extraction for leaf images; black arrows indicate the training phase and sky-blue arrows the testing phase

In our experiments, we use a patch size of 24\(\times \)24 for scale-invariant feature transform (SIFT) feature extraction, followed by principal component analysis (PCA) to reduce the SIFT descriptor size from 128 to 64. The FV size for K Gaussian components is 2Kd, where d is the feature vector dimension; for K = 256 and d = 64, the FV dimensionality is 16384. The FV performance is further improved using the signed square-root followed by \(L_2\)-normalization [29].
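The sketch below illustrates this pipeline (SIFT, PCA to 64 dimensions, GMM, FV with signed square-root and L2 normalization). It uses OpenCV's SIFT and scikit-learn, encodes only the gradients with respect to means and diagonal variances, and should be read as an approximation under those assumptions rather than a reproduction of the implementation.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def sift_descriptors(gray: np.ndarray) -> np.ndarray:
    """SIFT descriptors (128-D) for a grayscale leaf image."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def fisher_vector(desc: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """FV of one image: gradients w.r.t. means and variances, then power/L2 normalized."""
    n, _ = desc.shape
    q = gmm.predict_proba(desc)                      # (n, K) posteriors
    w, mu = gmm.weights_, gmm.means_                 # (K,), (K, d)
    sigma = np.sqrt(gmm.covariances_)                # (K, d), covariance_type="diag"
    fv = []
    for k in range(gmm.n_components):
        diff = (desc - mu[k]) / sigma[k]             # (n, d)
        g_mu = (q[:, k, None] * diff).sum(0) / (n * np.sqrt(w[k]))
        g_sig = (q[:, k, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w[k]))
        fv.extend([g_mu, g_sig])
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # signed square-root
    return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalization

# Training-side fitting (illustrative): PCA to 64-D, then a K-component diagonal GMM
# all_train_desc: stacked SIFT descriptors from the training images
# pca = PCA(n_components=64).fit(all_train_desc)
# gmm = GaussianMixture(n_components=256, covariance_type="diag").fit(pca.transform(all_train_desc))
# fv = fisher_vector(pca.transform(sift_descriptors(image)), gmm)
```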

4.2 Classifiers

A simple neural network classifier, a multilayer perceptron (MLP) [39], is used for the two-class classification model with some nonlinearity. The nonlinearity is introduced by the rectified linear unit (ReLU) activation function in all constituent layers, while a softmax function is used as the activation of the final classification layer of the four-layer MLP.
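A minimal Keras sketch of such a four-layer MLP over the Fisher vectors is given below; the layer widths, optimizer, and training settings are illustrative assumptions, not values reported in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(input_dim: int, n_classes: int = 2) -> keras.Model:
    """Four-layer MLP: ReLU hidden layers and a softmax classification layer."""
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(1024, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_mlp(input_dim=fisher_vectors.shape[1])
# model.fit(fisher_vectors, labels, epochs=50, batch_size=32, validation_split=0.1)
```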

A two-class support vector machine (SVM) [40] offers greater flexibility of class separation and is among the most prominent of the available classifiers [41]. The fundamental mathematical formulation of the SVM is

$$\begin{aligned} \text {Minimize} \quad \frac{1}{2} \sum _{j=1}^{n} w_{j}^2 + C \sum _{j=1}^{N} \zeta _j \end{aligned}$$
(2)

subject to the constraint \( y_j (\bar{w}\cdot \bar{x}_j+b) \ge 1-\zeta _j \) for \(j= 1,2,3,\ldots ,N\).

The soft-margin tuning parameter C provides an adjustable margin to the SVM. Additionally, nonlinearity is introduced by the radial basis function (RBF) kernel, which can be interpreted as a Hilbert-space transformation of the image features. Experiments using tenfold cross-validation are performed with kernel parameter \(\gamma \) = 1 and \(C=1\) for both training and testing of the classification model.
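The corresponding setup can be sketched with scikit-learn as below, using the RBF kernel with C = 1 and γ = 1 and tenfold cross-validation as stated above; the feature matrix and label names are placeholders.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# RBF-kernel SVM with the parameters reported in the text (C = 1, gamma = 1)
svm = SVC(kernel="rbf", C=1.0, gamma=1.0)

# Tenfold cross-validation on the Fisher vectors and their class labels
# scores = cross_val_score(svm, fisher_vectors, labels, cv=10)
# print("mean accuracy:", scores.mean())
```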

4.3 Performance measures

The performance measures used to evaluate the segmentation work are the \( F_1 \)-score, the Dice coefficient (DC) [41], and the modified Hausdorff distance (MHD) [42]. The accuracy (Ac) [43, 44] and the receiver operating characteristic (ROC) curve [45], along with the area under the curve (AUC) [45], are used to evaluate the classification technique; a higher AUC indicates better classification. The evaluation metrics used to measure classifier performance are given as:

$$\begin{aligned} TPR (Sensitivity)= \frac{TP}{TP+FN}, \quad TNR= \frac{TN}{TN+FP} \end{aligned}$$
(3)
$$\begin{aligned} FPR= \frac{FP}{FP+TN} \end{aligned}$$
(4)

where TP is the number of correctly classified positive samples, TN the number of correctly classified negative samples, FP the number of incorrectly classified negative samples, and FN the number of incorrectly classified positive samples. TNR is the true negative rate (majority class) and FPR the false positive rate (minority class).

$$\begin{aligned} Precision= \frac{TP}{TP+FP} ~~\text {and}~~ Recall = \frac{TP}{TP+FN} \end{aligned}$$
(5)
$$\begin{aligned} G\text {-}mean= \Big ( \prod _{k=1}^{m}Recall_k\Big )^{\frac{1}{m}} \end{aligned}$$
(6)

where m is the number of classes. The G-mean balances the accuracies on the minority and majority classes; it rewards improving the accuracy of each class even on imbalanced databases, because the overall accuracy alone does not provide enough information for class imbalance problems [46]. G-mean thus measures the effectiveness of a technique under skewed class distributions [46]. The ROC curve [47] characterizes the system by varying the cut-off value applied to the model score, yielding pairs of TPR (Y-axis) and FPR (X-axis) values. The AUC [45, 48] summarizes the classifier, with the ideal point at (0, 1) indicating that all minority samples are correctly classified and no majority sample is wrongly assigned to the minority class; the higher the AUC, the better the classifier. The accuracy is defined as follows:

$$\begin{aligned} Ac=(TP+TN)/(TP+FP+FN+TN) \end{aligned}$$
(7)
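For completeness, the classification metrics above can be computed with scikit-learn as in the sketch below; `y_true`, `y_pred`, and `y_score` are placeholders for the test labels, predicted labels, and classifier scores of a two-class problem.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score

def report_metrics(y_true, y_pred, y_score):
    """Accuracy, AUC, TPR/TNR/FPR and G-mean for a two-class problem."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)                       # sensitivity / recall, Eq. (3)
    tnr = tn / (tn + fp)                       # Eq. (3)
    fpr = fp / (fp + tn)                       # Eq. (4)
    recalls = recall_score(y_true, y_pred, average=None)       # per-class recall
    g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))   # Eq. (6)
    return {
        "Ac": accuracy_score(y_true, y_pred),  # Eq. (7)
        "AUC": roc_auc_score(y_true, y_score),
        "TPR": tpr, "TNR": tnr, "FPR": fpr, "G-mean": g_mean,
    }
```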

5 Results and discussion

The proposed localization-based classification method is evaluated on the PlantVillage datasets of three crops: bell pepper, potato, and tomato leaves. The model was implemented in Python using the Keras [49], Scikit-learn [50], and TensorFlow [51] packages on a personal computer (Core-i5 CPU, 2.30 GHz clock, 4 GB RAM).

5.1 Evaluation of segmentation work

The LSSC [2] technique attains an \(F_1\)-score of 0.877 with a DC of 0.721 and an MHD of 10.72. The SFAT [6] technique offers an \(F_1\)-score of 0.875 and a DC of 0.763 with an MHD of 10.25. The proposed segmentation technique provides average \( F_1 \)-score, DC, and MHD of 0.916, 0.824, and 7.29, respectively, which is about 5\(\%\) and 7\(\%\) better than the SFAT [6] technique in \(F_1\)-score and DC.

Table 4 Comparison of average \( F_1 \)-score, Dice coefficient, and MHD for segmentation techniques

The computational complexity of the SFCC [5] and SFAT [6] techniques is of the order \(O(N^3)\), while that of the LSSC [2] and the proposed technique is of the order \(O(N^2)\).
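For reference, the segmentation metrics used in Table 4 can be computed on binary masks as sketched below (NumPy/SciPy assumed); the modified Hausdorff distance follows the standard symmetric mean-of-minimum-distances definition on the mask pixel sets, and, for binary masks, the \(F_1\)-score coincides with the Dice coefficient.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient (equivalently F1) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-12)

def modified_hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    """Modified Hausdorff distance between the pixel sets of two binary masks."""
    a = np.argwhere(pred.astype(bool))
    b = np.argwhere(gt.astype(bool))
    d = cdist(a, b)                      # pairwise distances between mask pixels
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())
```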

5.2 Classification work evaluation

The classification performance using MLP and SVM for bell pepper, potato, and tomato datasets is given in Table 5.

Table 5 The comparison of MLP and SVM classifiers for the bell pepper, potato, and tomato datasets

For the tomato dataset, the accuracy and AUC are 0.892 and 0.865 with the MLP model, and 0.918 and 0.933, respectively, with the SVM classifier. For the potato dataset, both classifiers show moderate Ac and AUC values. For the bell pepper dataset, the accuracy and AUC are 0.928 and 0.912 with the MLP model, whereas the SVM classifier achieves 0.955 and 0.951, respectively. The average accuracy and AUC of the MLP model are 0.916 and 0.901, while the SVM classifier shows an average accuracy of 0.944 with an AUC of 0.947.

The detailed comparison of the proposed classification technique with the state-of-the-art techniques is given in Table 6.

Table 6 Accuracy (Ac) and AUC of the proposed technique without and with segmentation (Segment.) for the bell pepper, potato, and tomato leaf image datasets

Among the state-of-the-art techniques, the LCSCT [4] technique shows an accuracy of 0.894 and an AUC of 0.922 for the tomato dataset, and an accuracy of 0.921 with an AUC of 0.891 for bell pepper image classification. The GFSD [21] technique gives accuracies of 0.918 and 0.859 for the pepper and potato leaf images, respectively, and 0.856 for the tomato images. The proposed technique without segmentation offers 0.916 Ac and 0.897 AUC for the bell pepper dataset, and accuracies of 0.893 and 0.866 with AUCs of 0.917 and 0.881 for potato and tomato, respectively. With segmentation and the SVM classifier, the proposed technique achieves 0.928 accuracy and 0.933 AUC for the tomato dataset and 0.955 accuracy with 0.951 AUC for the bell pepper dataset. The average performance of the proposed technique is 0.943 accuracy with 0.947 AUC.

The average AUC comparison of the proposed technique with the state-of-the-art techniques is depicted in Fig. 7.

Fig. 7
figure 7

The ROC curves comparing the proposed technique with different state-of-the-art techniques

The mean AUC value of each technique is depicted in the figure. The KCM [20] gives the lowest value at 0.767 AUC, whereas the DLDIC [27] provides 0.944. The proposed classification technique offers 0.945 AUC, outperforming all the mentioned state-of-the-art techniques.

The G-mean comparison of different approaches is shown in Fig. 8.

Fig. 8
figure 8

The G-mean analysis of the proposed technique against different state-of-the-art techniques

The G-mean given by the PDML [10] technique is 0.843. The DLLA [26] technique gives a G-mean of 0.943, while the DLDIC [27] technique offers 0.948. The proposed technique without segmentation provides a G-mean of 0.927, whereas the full proposed technique offers a G-mean of 0.952.

The training time analysis (in seconds) of the different techniques is shown by the bar graph in Fig. 9. The DLLA [26] technique requires an average of 9574 seconds to train on the datasets, the LCSCT [4] technique takes 7200 seconds, and the DLDIC [27] technique is trained in 8596 seconds.

Fig. 9
figure 9

The training time (in seconds) of the proposed technique compared with the state-of-the-art approaches

The time taken by the proposed technique without segmentation is 1837 seconds, while the training of the proposed localization-based classification technique takes 2100 seconds. The classification accuracy of DLDIC [27] is on par with the proposed technique, but its time complexity is about four times higher.

6 Conclusion

Digital image analysis techniques have been utilized for numerous applications in agriculture, such as vigor diagnosis, vegetation measurement, and phenotyping. The proposed localization-based classification technique classifies the leaf images of three crops for different kinds of disease detection. Leaf localization was performed using the natural color properties of the leaf images and mixture model-based region growing. Diseased image properties, such as spots and damaged leaf regions in the localized image, provide the discriminative characteristics that separate diseased from healthy images and are readily captured by Fisher vector extraction. Classification performance is measured in terms of accuracy and AUC, and the G-mean is analyzed for all three crop datasets to account for data imbalance. The average accuracy and AUC using the SVM are 0.943 and 0.947, respectively, with a 0.953 G-mean score and a training time of 2100 seconds. Overall, the proposed classification technique performs better than the state-of-the-art techniques. This work can be extended to disease detection and classification for other crops with improved accuracy.