1 Introduction

1.1 Background and motivation

India's agriculture sector plays a pivotal role in its economy. However, the adverse impact of plant diseases on crop yield and quality is a pressing concern. Plant diseases lead to reduced productivity, increased production costs, and heightened economic losses, so swift and accurate disease detection is essential to curbing these losses. According to the Global Food Security Index, India ranks 68th, and plant disease is one of the main reasons behind this ranking [1]. The convergence of image processing and deep learning offers a powerful tool to tackle this challenge. Convolutional Neural Networks (CNNs) have proven highly effective in image classification tasks, motivating their application to agriculture [2, 3].

In this paper, we present a model that classifies potato diseases from images using a combination of convolutional, max pooling, dropout, and dense layers. Transfer learning is applied: a model trained on one task is reused on a second, related task. The focus is on storing the knowledge acquired while solving one problem and applying it to a different but related problem. MobileNet, Inception, and ResNet, which are CNN architectures commonly used for image classification, serve as the pre-trained models [4, 5, 6].

MobileNet is a computer vision model developed by Google and open-sourced through TensorFlow, designed specifically for mobile devices. This class of deep convolutional models is lightweight, with significantly smaller model sizes and faster processing speeds than larger counterparts. In addition to its streamlined design, MobileNet exposes two global hyperparameters, the width multiplier and the resolution multiplier, to further optimize computational efficiency. Overall, MobileNet represents an efficient solution for mobile-based computer vision tasks [7, 8].

The Inception technique is a type of deep neural network composed of recurring components referred to as Inception modules. As shown in Fig. 1, an Inception module uses a filter size of 1 × 1 to learn patterns across the depth of the input, while filter sizes of 3 × 3 and 5 × 5 are used to learn patterns across all dimensional components of the input, namely height, width, and depth. Combining the patterns from the selected filter sizes increases the representational power of the network (Fig. 2).

Fig. 1 Representation of an inception module

Fig. 2 Comparative analysis of related previous work by different researchers in terms of accuracy

ResNets are notable for their ability to handle the "vanishing gradient" problem that can occur in deep neural networks. This problem arises when gradients become too small during backpropagation, leading to slow convergence and accuracy degradation. ResNets use "residual blocks" that allow for "shortcut" connections that can bypass layers, allowing for easier flow of information and thus improving performance. ResNets have been successful in many computer vision tasks, including image classification, object detection, and segmentation, and have set numerous records in various benchmarks. Overall, ResNets have played a significant role in the advancement of deep learning and computer vision research [9].
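The shortcut connection described above can be illustrated with a minimal NumPy sketch. The two-layer form of the residual function, the helper names, and the toy weights are assumptions made for illustration; they are not taken from the ResNet implementation used in this work.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute y = relu(F(x) + x), where F(x) = relu(x @ w1) @ w2.

    The "+ x" term is the shortcut connection: the input bypasses the
    two weight layers and is added back to their output.
    """
    fx = relu(x @ w1) @ w2
    return relu(fx + x)

# Toy input with three features (hypothetical values).
x = np.array([[1.0, -2.0, 3.0]])

# With all-zero weights F(x) is zero, so the block reduces to relu(x):
# the signal still flows through the shortcut.
zero_weights = np.zeros((3, 3))
y = residual_block(x, zero_weights, zero_weights)
```

Because the input is added back unchanged, the block can pass information through even when its weight layers contribute little, which is the intuition behind the easier flow of information in deep ResNets.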

Several approaches were employed to enhance the accuracy of the potato disease classification model through transfer learning and ensemble techniques. Specifically, a deepstack ensemble model was developed to integrate the predictions of multiple individual models, thereby improving overall accuracy. Furthermore, a stacking transfer learning ensemble model was constructed, leveraging pre-trained transfer learning models such as MobileNet, Inception, and ResNet to generate more accurate predictions. Finally, the resulting ensemble model was used to make the final predictions on the input dataset. These techniques collectively contribute to optimizing the performance of the potato disease classification model [10, 11].

Early blight of potato, which is caused by the pathogen Alternaria solani, is a fungal disease that affects potato plants. It is characterized by the presence of circular, dark spots on the leaves and stems of the potato plant. Without proper treatment, this disease can drastically reduce the potato yield and even cause the plant to die prematurely.

1.2 Practical applications

The practical application of accurately identifying and addressing potato blight through advanced technology enables early intervention, optimizing crop yield, reducing economic losses, and promoting sustainable agriculture for enhanced food security.

1.3 Research gap

While the potential of CNNs in disease detection is recognized, the adaptation of specific CNN architectures for potato disease classification remains an underexplored area. Moreover, the synergy of transfer learning and ensemble techniques, specifically in the context of potato disease classification, has not been comprehensively addressed. Bridging these gaps is critical to providing agriculturalists with an efficient and accurate disease detection mechanism.

This research bridges the gap between technology and agriculture by introducing an innovative approach to potato disease classification. The integration of CNNs, transfer learning, and ensemble techniques underscores the potential of modern computational methods in tackling real-world challenges. Ultimately, this study contributes to India's journey towards enhanced food security and agricultural sustainability.

1.4 Contributions

This research paper presents a comprehensive approach to address the challenge of potato disease detection. By combining the strengths of image processing, transfer learning, and ensemble techniques, the study contributes to the advancement of agricultural technology and food security. The novel contributions of this work are as follows:

  1. Model Architecture: The proposed model leverages convolutional, max pooling, dropout, and dense layers to effectively capture disease-related features from potato leaf images.

  2. Transfer Learning: The integration of MobileNet, Inception, and ResNet models showcases the power of transfer learning in repurposing pre-trained models for accurate disease classification.

  3. Ensemble Techniques: The deepstack ensemble model and stacking transfer learning ensemble model exhibit the potential of ensemble methods to enhance accuracy and robustness.

  4. Practical Application: The study's outcomes hold significant practical implications for farmers and agricultural industries. Early disease detection can prevent yield losses and bolster food security.

1.5 Paper organization

The remainder of the paper is organized as follows: Section 2 discusses related work, Section 3 describes the proposed model, Section 4 discusses the results, and Section 5 concludes the paper.

2 Related work

Salem et al. (2019) [12] developed a deep learning approach for the early detection and classification of three potato leaf classes: early blight, late blight, and healthy. The study used a convolutional neural network (CNN) architecture to classify potato diseases based on images of potato leaves. The model was trained on a dataset of 1,500 images and achieved an accuracy of 92.50% in classifying the three classes.

Zhou et al. (2020) [13] developed a potato disease classification method based on CNNs and transfer learning. The study used the VGG16 architecture pre-trained on the ImageNet dataset as the base CNN model. The dataset consisted of potato leaf images covering six different diseases and a healthy class. The method achieved an accuracy of 96.25% for disease classification.

Sarkar et al. (2020) [14] developed a framework that addresses some of the key challenges associated with traditional potato disease classification methods, including low accuracy rates and time-consuming processes. The authors used a dataset of potato images and applied various deep learning models, including VGG-16, ResNet-50, and Inception-v3, to create an ensemble of models for disease classification. The ensemble outperformed the individual models and traditional machine learning methods. The authors also conducted extensive experiments comparing the framework with existing state-of-the-art methods; the results indicated that it outperforms these methods and achieves high accuracy rates across multiple potato diseases.

Padhy et al. (2020) [15] developed an automated system for potato disease classification using deep learning techniques. They applied various deep learning techniques, including CNNs and transfer learning, to a dataset of potato images to create a system for disease classification. The results show that the proposed system outperforms traditional machine learning methods and achieves high accuracy rates across multiple potato diseases.

Ali et al. (2020) [16] developed an approach for potato disease classification using CNNs with feature fusion. Their approach integrated multiple features extracted from different layers of the network, and combining features from these sources improved the performance of the CNNs. The model achieved promising results on a publicly available dataset, demonstrating the effectiveness of the approach in accurately classifying potato diseases.

Qu et al. (2020) [17] conducted a comparative study of different methods for potato disease classification. They evaluated the performance of various models, including CNNs, Recurrent Neural Networks (RNNs), and Support Vector Machines (SVMs), and found that CNNs performed the best. The ResNet50 model achieved the highest accuracy of 97.3% in potato disease classification, followed by InceptionV3 with 95.8% accuracy. The study also demonstrated the effectiveness of data augmentation techniques in improving the classification accuracy of the models. Overall, the findings suggest that deep learning-based approaches can be highly effective in potato disease classification and can assist in early detection and prevention of diseases, ultimately leading to improved crop yield and quality.

Wang et al. (2020) [18] developed a deep learning-based potato disease identification system that uses transfer learning to achieve high accuracy in identifying various potato diseases. The system was evaluated on a large-scale potato disease dataset, demonstrating the effectiveness of the proposed method. Transfer learning was shown to significantly improve the classification accuracy of the system, making it more robust to different types of potato diseases. The results suggest that deep learning techniques can be an effective tool for potato disease identification and monitoring, which can potentially lead to better crop management and increased crop yield.

The framework of Sarkar et al. (2020) [14] incorporates multiple pre-trained models, including Inception-v3, VGG-16, and ResNet-50, to learn rich and diverse representations of potato disease images. The ensemble model then combines the predictions of these models to improve the overall accuracy and robustness of the classification system.

Shariff et al. (2021) [19] developed a potato disease classification system using transfer learning and CNNs. Their model was trained on a large dataset of potato disease images and achieved high accuracy in disease detection and classification. They also compared their approach with other state-of-the-art methods, demonstrating the superiority of their proposed method. The results suggest that transfer learning is a promising technique for improving the accuracy and efficiency of CNN-based potato disease classification.

Ma et al. (2021) [20] developed a hybrid deep learning framework for potato disease detection and classification. Their approach combined CNNs with other techniques, including Principal Component Analysis (PCA) and Support Vector Regression (SVR), to achieve high accuracy in disease detection and classification.

Zaman et al. (2021) [21] developed an approach that addresses some of the key challenges associated with traditional potato disease identification methods, including low accuracy rates and time-consuming processes. The authors used a dataset of potato images and applied various deep learning models, including VGG-16, ResNet-50, and DenseNet-121, to create an ensemble of models for disease identification. The results show that the ensemble outperforms the individual models and traditional machine learning methods. The authors conducted extensive experiments to evaluate the approach's performance and compared it with existing methods.

Maity et al. (2022) [22] developed a multi-task learning approach for potato disease detection and classification using deep learning techniques. Their approach combined multiple tasks, including image segmentation and disease classification, to achieve high accuracy in detecting and classifying multiple diseases.

Khan et al. (2022) [23] carried out a thorough survey of the predictors developed for the identification of antifreeze proteins (AFPs). They provided a brief description of the applied datasets, feature descriptors, model training classifiers, performance assessment parameters, and web servers. The review highlighted the drawbacks of the developed predictors as well as the best-performing predictors. The authors also outlined future directions, including more effective feature descriptors, appropriate feature selection techniques, and efficient classifiers that could enhance the performance of predictors for fast and accurate identification of AFPs. TargetFreeze and iAFP-Ense are available as online web servers.

Surya et al. (2022) [24] provided a comprehensive review of different techniques for potato disease detection and classification. They discussed various approaches, including machine learning, deep learning, and image processing techniques, and highlighted the strengths and weaknesses of each approach.

The detection and classification of potato diseases have been studied by several researchers using deep learning approaches such as CNNs and transfer learning. Researchers have proposed various frameworks and techniques to improve the accuracy of disease identification, including feature fusion, ensemble models, and hybrid frameworks, and these methods have shown promising results. The papers listed in Table 1 mainly focus on the detection and classification of potato diseases using deep learning techniques. Identifying crop diseases is of great importance because they affect crop yield and agricultural productivity.

Table 1 Comparative study of related works

There is still a need for a transfer learning-based model that predicts disease more accurately. In this research, a detailed study and comparison of various existing machine learning algorithms is carried out. The existing algorithms for the detection of potato disease are implemented in the Google Colab environment (GPU + TPU with 32 GB RAM) on a dataset of 730 images collected from a primary source as well as from PlantVillage. Various image pre-processing techniques (augmentation, Canny edge detection, noise reduction, etc.) are also used for feature extraction from potato leaf images. Three important models (MobileNet, ResNet, and Inception) are studied, implemented, and compared on performance evaluation metrics such as accuracy, F1-score, recall, and precision, and are then compared with the stacking ensemble model.

3 Proposed model

The proposed scheme integrates edge detection, data augmentation, pre-trained feature extraction models (MobileNet_V2, Inception_V3, and ResNet V2), and deep stacking to achieve enhanced accuracy in classifying diseases in potato plants, presenting a novel approach that combines various techniques for improved disease detection.

The proposed model for disease detection is based on three classes of the Potato dataset obtained from PlantVillage. Figure 3 depicts the proposed approach, which starts with data pre-processing, including the crucial step of data normalization to scale and standardize the input data. The Canny edge detection algorithm is applied to identify the edges of objects, specifically plant leaves, to indicate the presence of diseases. To expand the dataset, data augmentation techniques such as flipping and rotating images are used. Pre-trained feature extraction models, namely MobileNet_V2, Inception_V3, and ResNet V2, are used for the image classification task. The deep stacking approach enhances performance by stacking several models together. Deep learning methods and ensemble learning techniques are combined in this way to enhance disease detection accuracy.

Fig. 3 Block diagram of proposed model

The proposed model involves the following steps:

  1. Data Collection: The first step involves collecting the required data for the research.

  2. Data Pre-processing: The collected data is then pre-processed to remove inconsistencies and prepare it for further analysis.

  3. Model Development: A model is developed using appropriate algorithms and the pre-processed data.

  4. Model Training: The model is trained using the collected data to improve its performance.

  5. Model Evaluation: The trained model is evaluated to assess its accuracy and effectiveness in achieving the desired outcome.

  6. Performance Analysis: The performance of the proposed model is analyzed using various metrics such as accuracy, precision, recall, F-score, and loss.

  7. Result Interpretation: The results obtained from the model are interpreted, and conclusions are drawn based on the findings.

3.1 Data collection

The proposed model for detecting potato diseases is based on a dataset consisting of three classes, sourced from PlantVillage and a primary source. The dataset was divided into 706 training images and 151 testing images. In the training set, 300 images each were used for the Early Blight and Late Blight classes, and 106 images were used for the Healthy class.

For the purpose of training and testing the proposed model, an 80:20 dataset split was used. A detailed summary of the dataset used in this research is provided in Table 2. To implement the proposed model, the images were uploaded to the cloud and processed using Google Colab. Some examples of Late Blight, Early Blight, and Healthy images are shown in Figs. 4 and 5.

Table 2 Details of the dataset used to train and test the proposed model

Fig. 4 Potato leaf images

Fig. 5 Target dataset distribution

3.2 Pre-processing

Data normalization, also known as data scaling, is a crucial step in pre-processing data before feeding it into machine learning algorithms. It involves transforming the features of the dataset to a standard scale, typically between 0 and 1, or to a mean of 0 and a standard deviation of 1. This ensures that all features contribute equally to the analysis and prevents any particular feature from dominating the learning process because of its larger range.

3.2.1 Data normalization

Data normalization plays a crucial role in improving the accuracy of plant image disease detection models. It involves scaling and standardizing the input data. For three-channel (RGB) images, the process typically involves calculating the mean values of the red, green, and blue channels over the entire image dataset, here using a list comprehension. In this dataset, the red channel values tend to be concentrated at lower values, with a slight positive skew. The green channel values are more uniform, with a larger peak at around 135, indicating that green is more pronounced in these images. The blue channel values are the most uniform, with minimal skew, but show great variation across images.
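The channel statistics and scaling described above can be sketched as follows. This is a minimal illustration, assuming the images are held in a NumPy array of shape (N, H, W, 3) with 8-bit values; the function names and the random stand-in data are hypothetical and are not the authors' code.

```python
import numpy as np

def channel_means(images):
    """Mean of each RGB channel over a batch shaped (N, H, W, 3)."""
    # List comprehension over the three colour channels.
    return [float(images[..., c].mean()) for c in range(3)]

def normalize(images):
    """Scale 8-bit pixel values to the [0, 1] range."""
    return images.astype(np.float32) / 255.0

# Hypothetical stand-in for the leaf images: four random 8x8 RGB images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

means = channel_means(images)   # [mean_red, mean_green, mean_blue]
scaled = normalize(images)      # same shape, values in [0, 1]
```

Dividing by 255 gives the 0 to 1 scaling; subtracting the channel means and dividing by the channel standard deviations would give the zero-mean, unit-variance alternative.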

3.2.2 Image processing

The Canny edge detection algorithm is used to identify the edges of objects. It identifies the edges of the plant leaves, and this information is used to locate areas of the plant that are affected by disease. A digital image of the plant leaves is captured and processed using the edge detection algorithm, which analyses the edges of the leaves and looks for changes in the texture and colour of the leaves that indicate the presence of disease.

After the image is smoothed, the next step of the Canny algorithm is to find the intensity gradient of the image, that is, how the pixel intensity changes in different directions across the image. A Sobel kernel is applied to the smoothed image in both the horizontal and vertical directions to obtain the first derivative in the horizontal (Gx) and vertical (Gy) directions. Using these two images, the edge gradient and direction for each pixel are calculated.

$$\text{Edge Gradient},\; G=\sqrt{G_x^2+G_y^2}$$
(1)
$$\text{Angle},\; \theta=\tan^{-1}\left(\frac{G_y}{G_x}\right)$$
(2)
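A minimal NumPy sketch of the gradient step follows. It computes the gradient magnitude of Eq. (1) and takes the direction as the arctangent of Gy over Gx, using the standard 3 × 3 Sobel kernels. The helper names and the toy image are assumptions for illustration; in practice a library routine such as OpenCV's Canny implementation would normally be used.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) derivatives.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def sobel_gradient(gray):
    """Return the edge gradient magnitude and direction for a grayscale image."""
    gx = filter2d_valid(gray, SOBEL_X)
    gy = filter2d_valid(gray, SOBEL_Y)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    angle = np.arctan2(gy, gx)  # quadrant-aware arctangent of gy / gx
    return magnitude, angle

# Toy image with a vertical step edge: left half dark, right half bright.
gray = np.zeros((5, 6))
gray[:, 3:] = 1.0
magnitude, angle = sobel_gradient(gray)
```

For this toy image the magnitude is non-zero only in the two output columns that straddle the step, and the angle is zero there because the intensity changes only in the horizontal direction.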

The bounding box is determined by finding the most extreme edges towards the four sides of the image, and the coordinates of the box are used to crop the image. The bounding box is visualized in Fig. 9 using a red border.
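The cropping step can be sketched as below, assuming the edge detector returns a binary edge map of the same height and width as the image. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def bounding_box(edges):
    """Return (top, bottom, left, right), inclusive, of the edge pixels."""
    rows = np.flatnonzero(edges.any(axis=1))  # rows containing an edge pixel
    cols = np.flatnonzero(edges.any(axis=0))  # columns containing an edge pixel
    if rows.size == 0:
        raise ValueError("edge map contains no edge pixels")
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

def crop_to_box(image, box):
    """Crop the image to the bounding box."""
    top, bottom, left, right = box
    return image[top:bottom + 1, left:right + 1]

# Toy edge map with three edge pixels (hypothetical values).
edges = np.zeros((6, 8), dtype=bool)
edges[1, 2] = True
edges[4, 5] = True
edges[2, 6] = True

image = np.arange(6 * 8).reshape(6, 8)
box = bounding_box(edges)
cropped = crop_to_box(image, box)
```

The box spans the topmost, bottommost, leftmost, and rightmost edge pixels, so the crop keeps the leaf region and discards the surrounding background.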

3.2.3 Data augmentation

Data augmentation enlarges a dataset by applying transformations to the original images. One such transformation is flipping the image either vertically (up-down) or horizontally (left-right). This does not change the content of the image much but produces a new sample. Applying such transformations to many images yields a more varied dataset, which helps the model generalize to new and different images.

$$\text{Image}=A_{ijk}$$
(3)
$$\text{Horizontal flip}: A_{ijk}\rightarrow A_{i(n+1-j)k}$$
(4)
$$\text{Vertical flip}: A_{ijk}\rightarrow A_{(m+1-i)jk}$$
(5)
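Equations (4) and (5) correspond to reversing the column index and the row index of the image array, respectively. A minimal NumPy sketch, assuming an image stored as an array of shape (m, n, channels); the function names and toy image are illustrative only:

```python
import numpy as np

def horizontal_flip(image):
    """Eq. (4): reverse the column index j (left-right flip)."""
    return image[:, ::-1, :]

def vertical_flip(image):
    """Eq. (5): reverse the row index i (up-down flip)."""
    return image[::-1, :, :]

# Toy image with m = 2 rows, n = 3 columns, and one channel.
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)

flipped_lr = horizontal_flip(image)
flipped_ud = vertical_flip(image)
```

Applying the same flip twice returns the original image, which is a quick sanity check for an augmentation pipeline.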

In convolution, a kernel (a 2D matrix) moves across an entire image and calculates dot products with each window it passes over. This process can be represented by an equation in which the kernel h moves across the image and, at each step, the dot product of h with a sub-matrix (window) of the image matrix f is taken. Data augmentation makes small changes to the images, such as flipping or rotating them, so that the kernels can learn more about the different features in the images.

3.2.4 Convolutional kernel function

The convolution operator is a mathematical operation used in CNNs to extract features from images. A small filter slides over the image and, at each position, the overlapping values are multiplied element-wise and summed. The convolutional kernel function is:

$$\text{Conv}(f,h)=\sum_{j}\sum_{k}h_{jk}\cdot f_{(m-j)(n-k)}$$
(6)

Conv(f,h) represents the convolution operation applied to the input signal f using a filter h.

\({h}_{jk}\) denotes the filter coefficient at position (j,k).

\({f}_{\left(m-j\right)\left(n-k\right)}\) represents the value of the input signal f at the position (m − j, n − k), where m and n are the row and column indices at which the convolution is evaluated.

The variables used in the equation are as follows:

f: The input signal, typically represented as a matrix or an array, on which the convolution operation is performed.

h: The filter or kernel, which is a smaller matrix that slides over the input signal f to perform the convolution operation.

m: The row index of the input signal f.

n: The column index of the input signal f.

j: The row index of the filter h.

k: The column index of the filter h.

Conv(f,h): The result of the convolution operation applied to the input signal f using the filter h.
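Equation (6) can be written out directly as nested sums. The sketch below is a deliberately simple, unoptimized NumPy version that treats values outside f as zero (a "full" convolution); the boundary handling, function name, and toy matrices are assumptions for illustration. Note that most deep learning libraries implement the closely related cross-correlation, which skips the kernel flip implied by the (m − j, n − k) indexing.

```python
import numpy as np

def conv2d_full(f, h):
    """Conv(f, h)[m, n] = sum_j sum_k h[j, k] * f[m - j, n - k].

    Positions of f that fall outside the array are treated as zero.
    """
    f_rows, f_cols = f.shape
    h_rows, h_cols = h.shape
    out = np.zeros((f_rows + h_rows - 1, f_cols + h_cols - 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            total = 0.0
            for j in range(h_rows):
                for k in range(h_cols):
                    r, c = m - j, n - k
                    if 0 <= r < f_rows and 0 <= c < f_cols:
                        total += h[j, k] * f[r, c]
            out[m, n] = total
    return out

# Toy input signal and filter (hypothetical values).
f = np.array([[1.0, 2.0],
              [3.0, 4.0]])
h = np.array([[1.0, 0.0],
              [0.0, -1.0]])

result = conv2d_full(f, h)
```

With this filter each output value equals f at (m, n) minus f at (m − 1, n − 1), which makes the result easy to verify by hand.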

The convolution operation involves sliding the filter h over the input signal f and computing the element-wise product of the filter coefficients \({h}_{jk}\) and the corresponding values from the input signal \({f}_{\left(m-j\right)\left(n-k\right)}\).

The sum of these products is calculated to obtain the output value at a specific position in the output matrix resulting from the convolution operation. Convolution is a fundamental operation in image processing and signal processing, widely used in tasks such as image filtering, feature extraction, and deep learning operations like convolutional neural networks (CNNs).

Blurring makes an image less sharp and clear; here it is implemented by adding noise to the image. Blurring is useful for data augmentation because it creates new images that are similar to the original but with minor details obscured. This can help make the model more robust and accurate by exposing it to a wider range of variations in the data. The blurring transformation is represented by the equation

$$A'_{ijk}=A_{ijk}+\mathcal{N}(0,\,0.1)$$
(7)
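A minimal sketch of the noise transformation in Eq. (7) is given below. It assumes that pixel values are scaled to [0, 1] and that 0.1 is the standard deviation of the Gaussian noise (the text does not say whether it denotes the standard deviation or the variance). The clipping step and the function name are additions for illustration.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, seed=None):
    """Add zero-mean Gaussian noise to an image scaled to [0, 1].

    sigma is assumed to be the standard deviation of the noise.
    """
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(loc=0.0, scale=sigma, size=image.shape)
    # Clipping keeps pixel values in the valid range (an added assumption).
    return np.clip(noisy, 0.0, 1.0)

# Toy mid-grey RGB image (hypothetical values).
image = np.full((4, 4, 3), 0.5)
noisy = add_gaussian_noise(image, sigma=0.1, seed=0)
```

Fixing the seed makes the augmentation reproducible, which is convenient when comparing models trained on the same augmented data.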

3.3 Model development

3.3.1 Transfer learning

Transfer learning is a machine learning technique that involves using knowledge gained from training one model and applying it to a different but related problem. In transfer learning, a pre-trained model that has already learned to recognize general patterns from a large dataset is used as a starting point for a new model. In this research, three popular pre-trained models are used for transfer learning: MobileNet, Inception, and ResNet.

3.3.2 MobileNet model

In this approach, the pre-trained MobileNet_V2 feature extraction model provided through Google's TensorFlow Hub is used. The pre-trained model is frozen, which means that its weights are fixed and only the final classification layer is trainable. The model is compiled with the Adam optimizer and the Sparse Categorical Cross-Entropy loss function. MobileNet_V2 serves as a feature extractor for the plant images. Using a pre-trained model in this way leverages the knowledge learned from millions of images and avoids the need to train a model from scratch, which can be time-consuming and computationally expensive.

3.3.3 Inception model

In this approach, the pre-trained Inception_V3 model is used as another feature extractor for the image classification task. The Inception model uses a more complex architecture with multiple parallel convolutional layers of different sizes to extract features from an image. This allows the model to capture both fine-grained and coarse-grained features, which can make it more accurate than some simpler models. Inception v3 has a larger architecture than MobileNet v2.

3.3.4 ResNet model

In this approach, the pre-trained ResNet V2 model is used as yet another feature extractor. ResNet (short for "Residual Network") is a very deep neural network, that is, a network with many layers. It was designed to address the problem of vanishing gradients. Compared to MobileNet and Inception, ResNet can perform better on more complex image classification tasks because of its deeper architecture and its use of skip connections.

3.3.5 Ensemble model using stacking

Ensembling is a technique that combines multiple models to improve overall performance. Specifically, a stack ensemble is used, in which a Stacking Classifier acts as the meta-learner and combines the predictions of multiple base models to make a final prediction. The Stacking Classifier consists of two levels: the first level consists of the three pre-trained models, which serve as the base models, and the second level is the meta-learner, which combines the predictions of the base models.

This research used an Ensemble Deep Stacking approach that used three pre-trained models: MobileNet, Inception, and ResNet. The Ensemble Deep Stacking technique combines the predictions of multiple models to improve the overall accuracy of the model.

By combining the strengths of these three models using Ensemble Deep Stacking, the research aimed to improve the accuracy of potato disease detection and classification. The proposed approach takes advantage of the unique features and strengths of each pre-trained model, resulting in a more robust and accurate model for potato disease detection.
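The two-level structure described above can be sketched in NumPy as follows. This is an illustrative stand-in rather than the authors' implementation: the base-model outputs are simulated with random class probabilities, and the meta-learner is assumed to be a multinomial logistic regression trained by gradient descent. All names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stack_features(base_probs):
    """Level 1 -> level 2: concatenate each base model's class probabilities."""
    return np.concatenate(base_probs, axis=1)

def fit_meta_learner(x, y, n_classes, lr=0.5, epochs=200):
    """Meta-learner: multinomial logistic regression via gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = (softmax(x @ w + b) - onehot) / len(y)
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(x, w, b):
    """Final ensemble prediction: most probable class per sample."""
    return softmax(x @ w + b).argmax(axis=1)

# Simulated predictions of three base models on 60 samples, 3 classes.
rng = np.random.default_rng(0)
n_samples, n_classes = 60, 3
y = rng.integers(0, n_classes, size=n_samples)
base_probs = [
    softmax(2.0 * np.eye(n_classes)[y] + rng.normal(size=(n_samples, n_classes)))
    for _ in range(3)
]

meta_x = stack_features(base_probs)          # shape (60, 9)
w, b = fit_meta_learner(meta_x, y, n_classes)
ensemble_pred = predict(meta_x, w, b)
```

In practice the meta-learner should be fitted on base-model predictions for data the base models were not trained on (for example a held-out split), so that it does not simply learn their overfitting.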

4 Results and discussions

4.1 Model training and evaluation

4.1.1 MobileNet

Table 3 shows the MobileNet hyperparameter values. Based on Fig. 6, the model performs very well after just two epochs. The training and validation accuracies are almost identical, indicating that the model is not overfitting. The peak accuracy achieved during training is 98.9%, while the validation accuracy reaches up to 97.2%, which is a good indication of the model's generalization ability.

Table 3 MobileNet hyperparameter values

Fig. 6 Accuracy graph of the MobileNet model

Additionally, the loss values of both training and validation datasets decrease steeply after the second epoch, with the minimum values being 7.0% for training and 4.0% for validation. This signifies that the model is learning effectively and making progress towards minimizing the loss function.

4.1.2 Inception

The accuracy vs epochs graph in Fig. 7 indicates that the Inception model shows significant improvements in accuracy after just one epoch. The model achieves a peak training accuracy of 96.3%, and the validation accuracy reaches up to 94.2%. This suggests that the model is learning and generalizing well to unseen data.

Fig. 7 Accuracy graph of the Inception model

The loss vs epochs graph reveals a rapid decrease in losses for both the training and validation datasets. The model achieves low training and validation losses of only 9% and 15%, respectively. This indicates that the Inception model performs well in minimizing the loss function and effectively learns from the training data.

4.1.3 ResNet

The accuracy vs epochs plot in Fig. 8 shows that the ResNet model's accuracy improves significantly after only two epochs. By the 10th epoch, the model achieves an accuracy of 97% for training and 92.8% for validation. This indicates that the model is learning well and generalizing to unseen data.

Fig. 8 Accuracy graph of the ResNet model

Moreover, the losses vs epochs graph indicates that both the training and validation losses decrease significantly after the second epoch. By the 10th epoch, the model achieves low training and validation losses of 9% and 18%, respectively. This suggests that the model is effectively learning and making progress towards minimizing the loss function.

4.1.4 Performance analysis

The experiments were carried out using the Keras framework and TensorFlow-GPU on Google Colab. To evaluate the proposed method's performance, metrics derived from the confusion matrix, namely accuracy, precision, recall, and F-score, were used together with the loss.

The accuracy was calculated based on the positive and negative classes, providing insights into how well the model performs in classifying the images into their respective categories. The precision, recall, and F-score provide further information on the model's ability to predict the positive and negative classes correctly.

  A. Accuracy = (TP + TN) / (TP + TN + FP + FN)

  B. Precision = TP / (TP + FP)

  C. Recall = TP / (TP + FN)

  D. F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Where,

TP: True Positive

TN: True Negative

FP: False Positive

FN: False Negative
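The four formulas can be checked with a few lines of Python. The following is a minimal sketch; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 score from the four counts."""
    total = tp + tn + fp + fn
    # Guard every division so that empty inputs give 0.0 instead of an error.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here precision (45/50) is higher than recall (45/55), and the F1 score lies between the two because it is their harmonic mean.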

A confusion matrix represents the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions on a set of data points. Fig. 9 shows the confusion matrix of the proposed ensemble model, which predicts three classes: "potato early blight," "potato late blight," and "potato healthy." In Fig. 9, potato early blight is labelled with 60, potato late blight with 57, and potato healthy with 29.

Fig. 9 Confusion matrix representation of the proposed ensemble model
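For a three-class problem, TP, TN, FP, and FN are obtained per class by treating that class as positive and the other two as negative. The sketch below illustrates this one-vs-rest reading. The matrix entries are invented for illustration and are not the values of Fig. 9; the row/column convention (rows are true classes, columns are predicted classes) is also an assumption.

```python
CLASSES = ["potato early blight", "potato late blight", "potato healthy"]

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# These counts are made up and do NOT reproduce Fig. 9.
cm = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 1, 9],
]


def per_class_counts(matrix, k):
    """One-vs-rest TP, TN, FP, FN for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # true k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


# Overall accuracy is the sum of the diagonal divided by the total count.
overall_accuracy = (sum(cm[i][i] for i in range(len(cm)))
                    / sum(sum(row) for row in cm))

report = {}
for k, name in enumerate(CLASSES):
    tp, tn, fp, fn = per_class_counts(cm, k)
    report[name] = {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

print(f"overall accuracy: {overall_accuracy:.3f}")
for name, row in report.items():
    print(name, row)
```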

4.2 Comparative analysis of the transfer learning models with the proposed stacking ensemble model

Table 3 presents the results obtained from the different models, each trained and validated using 857 images and 10 epochs. The results indicate that the Stack Ensemble model performs better than the other models in terms of accuracy on the PlantVillage dataset.

Table 4 and Fig. 10 present the performance evaluation of the different deep learning models for potato disease classification using a dataset of 857 images with three classes. The models were trained over 10 epochs, and their validation accuracy on unseen data was recorded. MobileNet, a lightweight CNN model well suited to mobile and resource-constrained applications, achieved a validation accuracy of 97.20%. Inception, which uses inception modules to capture different features across multiple dimensions, achieved 94.20%. ResNet, which is known for its ability to address the vanishing gradient problem and has achieved success in various image classification tasks, achieved 92.80%. The Stack Ensemble, which combines the predictions of the individual models, achieved the highest validation accuracy of 98.86%.

Table 4 Results from the different techniques for PlantVillage dataset
Fig. 10 Comparative analysis of proposed work with previous works

Overall, the "Stack Ensemble" model outperformed the individual models, achieving the highest accuracy of 98.86%. This suggests that combining the strengths of different models through ensemble techniques can result in enhanced classification performance for potato disease detection.
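The idea of stacking can be illustrated with a small, dependency-free sketch: the class probabilities produced by the base models on held-out samples are concatenated into one feature vector, and a meta-learner is trained on those vectors. Everything specific here is an assumption made for illustration: the meta-learner (a softmax regression trained by stochastic gradient descent), the function names, the hyperparameters, and the probability values, which are invented and are not outputs of the trained MobileNet, Inception, or ResNet models.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_features(*model_probs):
    """Concatenate the per-model class-probability vectors of one sample."""
    return [p for probs in model_probs for p in probs]


def train_meta(X, y, n_classes, lr=0.5, epochs=500):
    """Fit a softmax-regression meta-learner with per-sample gradient steps."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax([sum(w * v for w, v in zip(W[k], x)) + b[k]
                             for k in range(n_classes)])
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_feat):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict_meta(W, b, x):
    """Return the class index with the highest meta-learner score."""
    scores = [sum(w * v for w, v in zip(W[k], x)) + b[k]
              for k in range(len(b))]
    return max(range(len(scores)), key=scores.__getitem__)


# Invented held-out predictions: (model A, model B, model C) probabilities
# over (early blight, late blight, healthy), followed by the true label.
held_out = [
    (([0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]), 0),
    (([0.70, 0.20, 0.10], [0.75, 0.15, 0.10], [0.80, 0.10, 0.10]), 0),
    (([0.10, 0.80, 0.10], [0.20, 0.70, 0.10], [0.60, 0.30, 0.10]), 1),  # model C is wrong
    (([0.05, 0.90, 0.05], [0.10, 0.85, 0.05], [0.10, 0.80, 0.10]), 1),
    (([0.05, 0.05, 0.90], [0.10, 0.10, 0.80], [0.05, 0.15, 0.80]), 2),
    (([0.10, 0.20, 0.70], [0.15, 0.10, 0.75], [0.20, 0.20, 0.60]), 2),
]

X = [stack_features(*probs) for probs, _ in held_out]
y = [label for _, label in held_out]
W, b = train_meta(X, y, n_classes=3)

train_predictions = [predict_meta(W, b, x) for x in X]
new_sample = stack_features([0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7])
new_prediction = predict_meta(W, b, new_sample)
print(train_predictions, new_prediction)
```

In the third held-out sample one base model votes for the wrong class, yet the meta-learner can still recover the correct label from the other two, which is the intuition behind the ensemble's gain over any single model.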

4.3 Discussion of results

The results presented in this section showcase the performance of three different deep learning models, namely MobileNet, Inception, and ResNet, as well as a Stack Ensemble approach, for the task of potato disease classification. The models were trained and evaluated on a dataset of 857 images with three classes of diseases over a span of 10 epochs. The graphical analyses of accuracy and loss evolution over epochs demonstrate distinct patterns for each model.

MobileNet exhibited rapid convergence, achieving impressive accuracy (98.9% for training, 97.2% for validation) within a short period, underscoring its efficiency for resource-constrained applications. Inception, characterized by its intricate architecture, showed substantial accuracy gains after only one epoch, reaching 96.3% for training and 94.2% for validation, while effectively minimizing losses.

ResNet displayed consistent learning, achieving 97% accuracy for training and 92.8% for validation by the 10th epoch, with concurrent loss reduction. The Stack Ensemble model, which combines the individual models, outperformed all others with a validation accuracy of 98.86%, highlighting the efficacy of ensemble techniques in enhancing classification accuracy. These results underscore the suitability of MobileNet for lightweight deployment, Inception's feature-capturing ability, ResNet's gradient-handling capabilities, and the strength of ensemble models in achieving superior performance.
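One simple way to compare how well the models generalize is the gap between training and validation accuracy. The short sketch below computes it from the accuracies reported in this section, along with the gain of the Stack Ensemble over the best single model. The variable names are illustrative, and only figures stated in the text are used.

```python
# Peak accuracies (%) as reported in this section.
reported = {
    "MobileNet": {"train": 98.9, "val": 97.2},
    "Inception": {"train": 96.3, "val": 94.2},
    "ResNet": {"train": 97.0, "val": 92.8},
}
ENSEMBLE_VAL = 98.86  # validation accuracy (%) of the Stack Ensemble

# Train/validation gap in percentage points; a smaller gap suggests
# less overfitting on this dataset.
gaps = {name: round(acc["train"] - acc["val"], 1)
        for name, acc in reported.items()}

# Best single model by validation accuracy, and the ensemble's gain over it.
best_single = max(reported, key=lambda name: reported[name]["val"])
ensemble_gain = round(ENSEMBLE_VAL - reported[best_single]["val"], 2)

for name, gap in gaps.items():
    print(f"{name}: gap = {gap} percentage points")
print(f"ensemble gain over {best_single}: {ensemble_gain} percentage points")
```

On these figures MobileNet has the smallest gap and ResNet the largest, and the ensemble improves on the best single model by 1.66 percentage points.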

5 Conclusion

Detecting and diagnosing plant diseases in a timely and accurate manner is a critical issue that affects food security worldwide. The traditional method of manual identification of plant diseases is not only time-consuming but also requires expertise, making it difficult to achieve accurate and consistent results. This is where deep learning techniques come into play. In this research, a deep learning model has been developed to detect and diagnose plant diseases, with a focus on potato late blight and early blight. The model achieves an accuracy of 98.86%, which is a significant improvement over the traditional methods. The model was trained on a dataset of 857 images sourced from the PlantVillage dataset on Kaggle, and it can still be improved by training it on a larger dataset. This model is a step forward in reducing the human effort and expertise required to detect plant diseases. The results obtained from this model can help farmers to detect diseases at an early stage, thereby preventing losses in yield and ensuring food security. This model can also be extended to other crops, providing a solution to the problem of plant disease detection in a broader sense. Overall, this model has the potential to make the diagnosis of plant diseases more efficient, accurate, and accessible. Future work involves enhancing the deep learning model's robustness by incorporating multi-modal data sources, such as infrared imagery or spectroscopic data, to further improve accuracy and disease detection capabilities.