1 Introduction

Agriculture is a major component of the Indian economy. In terms of economic value, mango is the most valuable fruit crop in India (Rao et al. 2021). The mango crop occupies 39% of the total area under fruit crops in India (Mango 2024), and India is the world’s top mango producer, accounting for more than 50% of global production. Agriculture is the primary source of income for more than 70% of rural households (Tharanathan et al. 2006; Birthal et al. 2014). Plants need to be protected from disease in order to produce high-quality food. Food security can be affected by climate change, pollinator decline, water supply quality, plant diseases, and other factors. The Food and Agriculture Organization (FAO) of the United Nations has estimated that pests and diseases destroy up to 40% of the world’s food crops annually (Food and Agriculture 2024). Mango is a vital tropical fruit, significant to global trade and agriculture. However, the sustainable production of mangoes is challenged by several diseases that adversely affect yield and quality (Ploetz 2003). Early and efficient detection of these diseases is crucial to implementing preventive actions and ensuring crop health. Plant pathology is the scientific study of plant diseases. Diagnosing plant diseases through visual inspection can be very challenging. Because a wide variety of plant species is cultivated worldwide, many plant diseases have evolved over time, and several of them show similar symptoms. As a result, farmers, agronomists, and plant pathologists may be unsure which disease actually afflicts a plant (Too et al. 2019; Matheyambath et al. 2016). An inaccurate diagnosis could lead to an inappropriate course of treatment, which could harm the plant further in the long run while leaving the actual disease untreated.
Furthermore, manual diagnosis is time-consuming and challenging because many plant leaves resemble one another and display disease signs similar to those of other plants. Thus, farmers and agronomists would benefit significantly from a simple automated approach rather than inspecting damaged leaves and forming subjective judgments.

Over the previous decade, the development of automated, computerized tools and systems has advanced significantly because of AI technologies such as computer vision (CV), machine learning (ML), and deep learning (DL) (Chaurasia et al. 2023; Singh et al. 2019; Elbasi et al. 2022). These fully automated tools can quickly and accurately diagnose or identify plant diseases. ML and DL are the primary AI techniques used to make agriculture smart. DL, a subtype of ML, uses multiple layers to convert unstructured input into meaningful information. Many complex tasks, such as feature extraction, transformation, image classification, and pattern analysis, can be completed with DL. Convolutional neural networks (CNNs) are well suited to DL applications that require the detection or classification of a large number of classes. We employ a CNN as the classification approach because CNNs have demonstrated strong performance in applications such as object identification and image recognition. The CNN is recognized as one of the best tools for pattern detection in image processing, and its structure is intricate, with several information processing stages. Figure 1 shows sample images of diseased and healthy mango leaves. The objective of the study is to build a model that can classify mango leaf images into eight classes: Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, Sooty Mould, and Healthy. To this end, we employ CNNs with a transfer learning (TL) approach to classify diseased and healthy mango leaves. The proposed deep transfer learning-driven (DTLD) model is trained and assessed using a variety of algorithms, datasets, and validation methodologies in order to address the mango leaf disease problem described above. After some preprocessing, we separate the data into training and testing datasets. Performance is evaluated in terms of training and validation accuracy.
In the proposed DTLD model, we use five pre-trained models with TL to reduce the training time and computational resources required by the CNN models for the eight-class problem.

Fig. 1 Diseased and healthy mango leaf images

1.1 Motivation

Agriculture plays a significant role in sustaining human life and economies worldwide. Plant diseases threaten crop output and quality, among other challenges facing agriculture. Mango trees are widespread in tropical and subtropical regions, but they are susceptible to several diseases that cause losses for growers (Ploetz 2007). This research aims to increase crop yield, for which early disease diagnosis is essential to prevent the spread of diseases and minimize agricultural losses. Our goal is to create a program that can identify diseases in mango leaves quickly and accurately by using CNNs, which have demonstrated remarkable performance in image identification tasks. Early identification will enable farmers to respond quickly, resulting in an overall boost in crop output, and cutting-edge technologies like CNNs enable precision agriculture (Shaikh et al. 2022). The proposed strategy takes a targeted approach to disease control, ensuring that treatments, including pesticides, are applied precisely where required. This helps farmers maximize their resource utilization while reducing the adverse environmental effects of agricultural activities. With the advent of cutting-edge technologies like CNNs in the agricultural industry, smart and technology-driven farming approaches are becoming increasingly common. By developing a consistent method for identifying mango leaf disease, we hope to make conventional farming more productive and sustainable. As the world’s population continues to rise, ensuring food security is becoming more and more crucial (Food security 2024). The proposed research is in line with the overall objective of reducing the impact of diseases on crop productivity to secure food supplies.

1.2 Contribution

The contributions of the work are summarized as follows:

  • We have prepared a dataset from open sources of different mango leaves. We have filtered them into 8 categories: Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, Sooty Mould, and Healthy.

  • We have proposed a deep transfer learning-driven (DTLD) model with pretrained EfficientNetV2 and VGG16. In this model, we remove the top layers and freeze the feature extraction layers of the pretrained VGG16 model to adapt it to the diseased mango leaf classification problem. VGG16 gives higher accuracy, while EfficientNetV2 takes less time than VGG16.

  • We replace the original fully connected (FC) layers with a flatten layer, a dense layer with the ReLU activation function, and a softmax layer. This way, we reduce the number of parameters and increase the network’s sensitivity when classifying diseased mango leaves.

  • We use the adaptive moment estimation (Adam) optimizer to minimize the model’s loss, with sparse categorical cross-entropy as the loss function for the multiclass data.

  • The proposed DTLD can handle large datasets with ease.

  • The proposed DTLD model with pretrained EfficientNetV2 and VGG16 can achieve up to 99.28% and 99.76% accuracy respectively.

  • The DTLD model can process larger datasets with minimal training time and lower computational power.

The remainder of this article is structured as shown in Fig. 2. Section 2 presents related work. Section 3 presents the problem formulation. Section 4 defines the proposed methodology. The results and analysis are shown in Sect. 5, along with the predictions of the proposed CNN with transfer learning. Section 6 presents the conclusion and future scope.

Fig. 2 Structure of paper

2 Related work

To increase crop quality and productivity, it is essential to recognize and classify mango plant leaf diseases. Diseased plants can cause severe financial losses for individual farmers. In earlier research on mango leaf identification, Kien Trang et al. presented an image-based mango leaf disease identification method using a deep neural network with contrast enhancement and transfer learning from the PlantVillage dataset (Trang et al. 2019). The model was trained to distinguish 3 common diseases from the healthy mango leaf using a dataset on mango diseases and contrasted with additional trained models. The proposed model achieved an accuracy of 88.46%. U.S. Rao et al. used a transfer learning approach to classify diseased mango and grape leaves using a pre-trained deep learning model called AlexNet (Rao et al. 2021). They trained and tested the model on self-acquired 1266 images of fresh and diseased mango leaves. The proposed model achieved an accuracy of 89% for mango leaf disease classification. Venkatesh et al. used a proposed modified VGGNet model named V2IncepNet, which combines the Inception module’s most significant features with VGGNet’s, in their study (Nagaraju et al. 2020). Basic features are extracted by the VGGNet module, and the Inception module handles image classification and high-dimensional feature extraction. 2268 color images of mango leaves were included in their data collection, comprising 1070 color images downloaded from Plantvillage and 1198 color images taken in real-time self-capturing in the field. The experiment’s result was that the suggested model could accurately identify the level of anthracnose disease infection on mango leaves, up to 92%. Pham et al. used artificial neural networks (ANNs) to detect early disease in plant leaves by identifying microscopic disease blobs in mango leaf images (Pham et al. 2020). 
Following a pre-processing phase with a contrast enhancement technique, every infected blob is segregated throughout the entire dataset. They used well-known CNN models (AlexNet, VGG16, and ResNet-50) with improved transfer learning approaches and achieved accuracies of 78.64%, 79.92%, and 84.88%, respectively. The ANN outperformed the CNN models and achieved an accuracy of 89.41%. Thaseentaj and Ilango used a customized deep CNN model for the detection and classification of South Indian mango leaf diseases (Thaseentaj and Ilango 2023). For their study, they collected a dataset of diseased and healthy mango leaf images of different classes, including Anthracnose, Leaf Blight, and Powdery Mildew. The customized deep CNN model performed well and obtained 93.34% classification accuracy. Kumar et al. introduced a novel deep-learning CNN model to identify Anthracnose disease in mango leaves (Kumar et al. 2021). They captured a real-time dataset of healthy and diseased leaves on farms in Karnataka, Maharashtra, and New Delhi for training and testing the proposed model. The dataset was divided into 80% for training and 20% for testing, and the proposed model achieved an accuracy of 96.16%. Singh et al. used a multilayer convolutional neural network (MCNN) to classify mango leaves infected by the fungal disease Anthracnose (Singh et al. 2019). The dataset of 1070 photos of mango tree leaves was taken at the Shri Mata Vaishno Devi University in Katra, J&K, India, for training and testing the MCNN model. The dataset includes images of healthy and diseased leaves, and the suggested MCNN model achieved a classification accuracy of 97.13%. Rajbongshi et al. used the CNN models DenseNet201, InceptionResNetV2, InceptionV3, ResNet50, ResNet152V2, and Xception with transfer learning techniques to detect mango leaf disease (Rajbongshi et al. 2021).
They evaluated the overall performance metrics on a dataset of 1500 mango leaf images covering different mango leaf diseases such as anthracnose, gall machi, powdery mildew, and red rust. They found that the DenseNet201 model achieved the highest accuracy of 98%. Shaik and Swamykan used CNN techniques to identify 13 distinct fungal and bacterial mango leaf diseases (Shaik and Swamykan 2023). The images were taken in the Chittoor district of Andhra Pradesh state, a mango-growing region of India. Popular CNN architectures, such as GoogLeNet, EfficientNet, and ResNet-50, were trained on the dataset of 1100 images of various diseased mango leaves. Their study found that the EfficientNet model classified mango leaf diseases with the highest accuracy, at 98.7%. Jayanthi and Kumar used CNNs to classify leaves from mango plants into healthy and diseased categories (Jayanthi and Kumar 2024). They compared the performance of the CNN models AlexNet, ResNet-50, and VGG-16 for identifying diseased mango leaves. The models were trained on the Mendeley dataset; AlexNet achieved an accuracy of 94.54% and consumed less training time. ResNet-50 and VGG-16 achieved testing accuracies of 98.56% and 98.26%, respectively. Because of heterogeneity in the dataset and the need for multi-class disease detection, deep transfer learning is necessary and gives more accurate results.

The research (Chouhan et al. 2020) addresses a crucial issue in agricultural health management by utilizing cutting-edge ML approaches for disease identification. Mango output is significantly impacted by the fungus anthracnose, which affects mango leaves. As a result, effective and precise detection techniques are required. The authors suggest a novel method for segmenting and identifying sick areas on mango leaves that combines RBF neural networks with web-facilitated systems. While previous research has shown that various approaches, such as image processing and neural networks, can detect plant diseases, this study is the first to integrate these techniques with web-based facilitation to improve accessibility and real-time application (Iftikhar et al. 2024). RBF neural networks are particularly noteworthy for effectively segmenting complex illness patterns because of their robust classification capabilities and quicker training times. The authors highlight the system’s efficacy in distinguishing healthy and infected leaf areas by carefully outlining the dataset gathering, pre-processing, and neural network training steps. The RBF network performs better than other neural network models regarding accuracy and computational efficiency. Integrating web-based technologies enhances practical application and promotes timely intervention and disease management by enabling farmers and other agricultural experts to access and use the system remotely (Kalfas et al. 2024). This study contributes substantially to precision agriculture by providing a reliable method for early disease diagnosis, essential for reducing crop losses and improving the quality of yields. However, model complexity and accuracy over different datasets are an issue. The future scope of this work is to add more sophisticated deep-learning models to increase segmentation accuracy and broaden the system’s detection capabilities to encompass additional plant illnesses. 
Other applications of DL for forecasting and prediction are presented in (Jin and Xu 2024a; Xu and Zhang 2021a; Jin and Xu 2024; Jin and Xu 2024b; Jin and Xu 2024c; Jin and Xu 2024). The study on price forecasting for crude oil, heating oil, and natural gas (Jin and Xu 2024a) investigates the use of neural networks for price forecasting in these markets. Neural networks are used because they can capture intricate, non-linear correlations in time-series data. By taking into account variables like past price data, market demand, geopolitical developments, and economic indicators, the study seeks to improve forecasting accuracy. The study shows how well neural network models predict changes in energy market prices by contrasting them with more conventional forecasting techniques (Xu and Zhang 2021a; Jin and Xu 2024; Jin and Xu 2024b, 2024c; Jin and Xu 2024). For stakeholders, policymakers, and energy traders looking for sound approaches to risk management and decision-making, this offers useful insights.

Agricultural commodity price forecasting is introduced in Xu and Zhang (2022a). Neural network models for house price forecasting in China are introduced in Xu and Zhang (2021b). By comparing neural network models with more traditional forecasting methods, the study demonstrates how well these models predict changes in energy market prices (Xu and Zhang 2022b). This provides useful information for legislators, energy dealers, and stakeholders seeking sensible methods for risk management and decision-making (Xu and Zhang 2022c). The work of Meenakshi (2023) introduces the automatic detection of diseases in the leaves of medicinal plants using a modified logistic regression algorithm. ResNet-based classification for tomato and potato leaf disease detection and SVM were also compared (Kalaivani et al. 2024; Deshpande and Patidar 2023). CNN models with different convolution layers, AlexNet and MobileNet, classify plant leaf diseases into 26 classes, as introduced in Indira and Mallika (2024). A transfer learning approach is employed in a health care system (Shukla et al. 2024) and to improve plant leaf disease prediction through enhanced deep feature representations (Naralasetti and Bodapati 2024). ML approaches for health care systems, intrusion detection systems, and a comparative analysis of plant disease detection are illustrated in Tripathi et al. (2023), Bajpai et al. (2023), and Deepti (2023), respectively. For intrusion detection and for plant leaf recognition and classification, the random forest (RF) and whale optimization algorithm (WOA) are addressed in Bajpai et al. (2024) and Pankaja and Suma (2020), respectively. Dey et al. (2022) investigate the use of adaptive neuro-fuzzy inference systems (ANFIS) and artificial neural networks (ANN) in the detection of pathogens affecting betel leaves, a crop that is highly profitable in many areas.
Fungal pathogens are the main cause of betel leaf diseases, which can result in significant losses for agriculture and call for accurate and efficient detection techniques. The authors discuss the drawbacks of manual inspection methods, which are frequently laborious and prone to human error. They then suggest using ANN and ANFIS classifiers to improve and automate pathogen detection. ANFIS, which combines the learning ability of neural networks with fuzzy logic for handling uncertainty, and ANNs, which are known for their capacity to model intricate patterns and relationships within data, are both used to categorize healthy and diseased leaves based on image attributes (Rashed and Popescu 2024). The paper describes the dataset preparation, feature extraction techniques, and training procedures for both classifiers, and it emphasizes the need to choose relevant features to enhance classification performance. A comparative investigation shows that although both ANN and ANFIS models attain high accuracy, the ANFIS classifier, because of its fuzzy logic component, performs better when processing imprecise input and yields more interpretable results. Farmers can benefit from a scalable and effective solution when these intelligent technologies are integrated into disease detection frameworks, as it can greatly reduce reliance on expert knowledge. Future research topics suggested by the authors include developing more user-friendly interfaces for wider adoption, exploring alternative machine learning models, and integrating a wider variety of datasets. By demonstrating the potential of ANN and ANFIS classifiers to enhance plant disease management techniques, this study advances precision agriculture. Multivariate time series classification (MTSC) based on DL is presented in (Xiao et al. 2024a; Xiao et al. 2024b).
This work also uses a densely knowledge-aware network (DKN) for multivariate time series classification (MTSC) with a vision transformer model. Another application of the vision transformer model, for the health care system, is presented in Singh et al. (2023). In Xiao et al. (2023), CapMatch, a semi-supervised contrastive capsule transformer method with feature-based knowledge distillation (KD), is presented as a way to streamline current semi-supervised learning (SSL) approaches (Xiao et al. 2024c) for wearable human activity recognition (HAR).

The research gaps in mango leaf disease detection with deep learning models include the use of CNN models for detecting diverse diseases, data diversity, overfitting, model complexity, efficiency, and overlearning. In light of this, deep transfer learning models show immense potential for reliably and precisely detecting mango leaf disease. The detailed problem formulation based on the existing research gaps is presented in the next section.

3 Problem formulation

The “king of fruits,” the mango, is grown in approximately 1500 varieties throughout India, of which about a thousand are grown commercially. Mango crops are an essential part of India’s agricultural economy and culture, but several diseases have a negative impact on their quality and productivity. Detecting diseases effectively is critical to minimizing losses and maintaining crop health. Traditional methods generally rely on visual inspection; because they are subjective and disease identification is complex, they are often insufficient. Thus, an automated technique to identify mango leaf disease is essential for sustainable farming. Mango leaf disease identification is challenging because of human error, symptom similarities, changing climatic conditions, and the requirement for real-time, non-destructive techniques. Common mango leaf diseases include Sooty Mould, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, and Anthracnose. Each of these diseases calls for a different treatment. An automated system that uses cutting-edge technology such as ML and image processing is proposed to solve these problems. With an emphasis on accuracy, efficiency, real-time operation, non-destructive procedures, cost-effectiveness, user-friendliness, environmental adaptability, and handling changes in leaf size, shape, and colour, this system seeks to identify and classify mango leaves as healthy or diseased. Convolutional neural networks (CNNs) are used in a pipeline that covers pre-processing, model training, validation, testing, and deployment via a web interface or mobile application. This approach diagnoses diseases quickly and accurately, allowing farmers to take prompt, appropriate action to protect their crops.
Adopting such an automated detection system can transform disease management in mango farming by improving crop health and productivity and strengthening the overall sustainability of mango agriculture. Future developments might incorporate extra features like remote sensing, predictive analytics, and agricultural advice services to further promote efficient crop management.

4 Proposed methodology

This section describes the methodology of the proposed deep transfer learning-driven (DTLD) model, with a brief discussion of the dataset and preprocessing techniques. Figure 3 shows the workflow of the proposed model. The methodology is divided into 7 phases. The first phase is dataset collection. Data preprocessing is done in the second phase, and image data augmentation is done in the third phase. The model is trained in the fourth phase. Model testing, feature extraction, and classification are done in phases 5, 6, and 7, respectively.

Fig. 3 Workflow of the proposed model

4.1 Dataset

The dataset used in this research was taken from the open-source Mendeley Data repository (Dataset 2023; Ali et al. 2022). The dataset consists of 4000 images of mango leaves in 8 classes, i.e., 500 images per class. The images are available in various resolutions, and each image is stored in RGB format with a depth of 24 bits. The images were captured using mobile phones. This dataset is used for training and testing the model. The 8 classes are anthracnose, bacterial canker, cutting weevil, die back, gall midge, powdery mildew, sooty mould, and healthy.

4.2 Data preprocessing

The dataset contains images of various resolutions and sizes. Bringing the images to a common shape is necessary to improve the model’s performance and avoid overfitting. Resizing and rescaling techniques change the image’s shape (Saponara and Elhanashi 2021). Resizing adjusts the width and height to 240 × 240. During rescaling, the width-to-height ratio is maintained, so rescaling does not result in skewing or distortion. The dataset is divided into 3 sets: training, validation, and testing. The training set is used for model training, the validation set is used for model validation during training, and the testing set is used for final model testing.
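As a minimal sketch of the three-way split described above, the following Python snippet partitions the 4000 images (500 per class) into training, validation, and testing sets class by class. The 80/10/10 ratio, the random seed, the function name `stratified_split`, and the file names are illustrative assumptions; the split proportions actually used are not stated in this section.

```python
import random

CLASSES = [
    "Anthracnose", "Bacterial Canker", "Cutting Weevil", "Die Back",
    "Gall Midge", "Powdery Mildew", "Sooty Mould", "Healthy",
]

def stratified_split(samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Split (path, label) pairs into train/val/test sets, class by class."""
    by_class = {}
    for path, label in samples:
        by_class.setdefault(label, []).append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, paths in by_class.items():
        paths = sorted(paths)
        rng.shuffle(paths)
        n_train = round(len(paths) * train_frac)
        n_val = round(len(paths) * val_frac)
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits

# Hypothetical file names standing in for the 500 images of each class.
samples = [(f"{c}/{i:03d}.jpg", c) for c in CLASSES for i in range(500)]
splits = stratified_split(samples)
print({name: len(items) for name, items in splits.items()})
```

Splitting per class keeps the three sets balanced, which matters here because every class contributes the same number of images.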

4.3 Image data augmentation

Image data augmentation is used in DL and ML to expand the size and diversity of training datasets and thereby reduce overfitting (Ali et al. 2022). It can enhance model performance and extend datasets that are small. Although numerous approaches for augmenting image data exist, we employ only the following techniques.

  • Flipping One of the most popular techniques for augmenting visual data is flipping (Yang et al. 2022). Using this technique, an image can be flipped both vertically and horizontally. It assists in making the model invariant to object orientation, such as top-to-bottom or left-to-right.

  • Rotation To augment the data, the image is rotated right or left about its center by an angle between 1° and 359° (Yang et al. 2022).
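The two augmentations above can be illustrated on a tiny image stored as a nested list. This is only a sketch: the function names, the nearest-neighbour sampling, and the zero fill value for pixels that rotate in from outside the frame are assumptions, and a real pipeline would normally rely on an image library rather than hand-written loops.

```python
import math

def flip_horizontal(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]

def rotate(img, degrees, fill=0):
    """Rotate about the image center using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

leaf = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(flip_horizontal(leaf))
print(flip_vertical(leaf))
print(rotate(leaf, 180))
```

For angles that are not multiples of 90°, some corner pixels fall outside the source image and take the fill value, which is one reason rotation ranges are usually tuned per dataset.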

4.4 Training model

In this phase, the model architectures and the optimizer are briefly discussed as follows:

  • Model architecture Transfer learning applied to deep neural networks is called deep transfer learning. In this study, we use inductive transfer learning for multiclass classification, defined as follows: Given a source domain \({D}_{S}\) and a learning task \({T}_{S}\), a target domain \({D}_{T}\) and a learning task \({T}_{T}\), inductive transfer learning aims to help improve the learning of the target predictive function \({f}_{T}(\bullet )\) in \({D}_{T}\) using the knowledge in \({D}_{S}\) and \({T}_{S}\), where \({T}_{S}\ne {T}_{T}\). In deep transfer learning model development, we first load the pre-trained CNN model without a classification layer, then freeze the convolutional layers and add a custom classification layer for multiclass classification (Plested and Gedeon 2022).

The architecture of the DTLD model based on the VGG16 model is shown in Fig. 4. In this model, we load the pre-trained VGG16 model without the classification layer and then freeze its convolutional layers. We then add a custom classification layer for multiclass classification. We use the pre-trained parameters of the VGG16 model for mango leaf classification.
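To make the effect of freezing concrete, the sketch below counts the parameters of the standard VGG16 convolutional base (which stay frozen) and of a custom head made of a flatten layer, one ReLU dense layer, and an 8-way softmax layer (which are trained). The 240 × 240 input follows the resizing step, but the hidden width of 256 units is an assumption chosen purely for illustration; the head size used in the DTLD model is not given in this section.

```python
# Output widths of the 3x3 convolutions in the five VGG16 blocks.
VGG16_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def vgg16_base(input_size=240, in_channels=3):
    """Return (parameter count, output side length, output channels)."""
    params, size, channels = 0, input_size, in_channels
    for block in VGG16_BLOCKS:
        for width in block:
            params += conv_params(channels, width)
            channels = width
        size //= 2  # every block ends with 2x2 max pooling
    return params, size, channels

frozen, side, channels = vgg16_base()
flat = side * side * channels   # length of the flattened feature vector
HIDDEN_UNITS = 256              # assumption: illustrative head width
NUM_CLASSES = 8
trainable = dense_params(flat, HIDDEN_UNITS) + dense_params(HIDDEN_UNITS, NUM_CLASSES)

print(f"frozen base parameters:    {frozen:,}")
print(f"trainable head parameters: {trainable:,}")
```

Under these assumptions only the head is updated during training, so the gradient computation and weight updates touch a fraction of the total parameters.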

Fig. 4 Architecture of the DTLD model (VGG16)

The architecture of the DTLD model based on the DenseNet121 model is shown in Fig. 5. In this model, we load the pre-trained DenseNet121 model without a classification layer and then freeze its convolutional layers. We then add a custom classification layer for multiclass classification. We use the pretrained parameters of the DenseNet121 model for mango leaf classification. In the DenseNet layout, the \(l\)th layer receives the feature-maps of all preceding layers, \({x}_{0}, . . . , {x}_{l-1}\), as input:

$${x}_{l} = {H}_{l}([{x}_{0},{x}_{1}, . . . , {x}_{l-1}])$$

where [\({x}_{0},{x}_{1}, . . . , {x}_{l-1}\)] refers to the concatenation of the feature-maps produced in layers \(0, . . . , l-1\) (Huang et al. 2017).
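A small sketch can show what the concatenation implies for layer widths: if every layer \({H}_{l}\) emits a fixed number of new feature-maps (the growth rate), the input of layer \(l\) grows linearly with depth. The function name and the example values (64 initial channels, growth rate 32, 6 layers) are illustrative assumptions, not figures taken from this section.

```python
def dense_block_inputs(initial_channels, growth_rate, num_layers):
    """Return the input width of each layer and the block's output width."""
    feature_maps = [initial_channels]        # channels of x_0
    inputs = []
    for _ in range(num_layers):
        inputs.append(sum(feature_maps))     # width of [x_0, ..., x_{l-1}]
        feature_maps.append(growth_rate)     # x_l = H_l(...) adds new maps
    return inputs, sum(feature_maps)

inputs, output_channels = dense_block_inputs(64, 32, 6)
print(inputs)
print(output_channels)
```

Each layer therefore sees every earlier feature-map directly, at the cost of wider inputs in deeper layers.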

Fig. 5 Architecture of the DTLD model (DenseNet121)

The architecture of the DTLD model based on the DenseNet201 model is shown in Fig. 6. In this model, we load the pre-trained DenseNet201 model without a classification layer and then freeze its convolutional layers. We then add a custom classification layer for multiclass classification. We use the pre-trained parameters of the DenseNet201 model for mango leaf classification.

Fig. 6 Architecture of the DTLD model (DenseNet201)

Figure 7 depicts the architecture of the DTLD model based on the MobileNet model. In this model, we load the pretrained MobileNet model without a classification layer and then freeze its convolutional layers. We then add a custom classification layer for multiclass classification. We use the pretrained parameters of the MobileNet model for mango leaf classification. In MobileNet, the standard convolutional layer is parameterized by a convolution kernel \(K\) of size \({D}_{K}\times {D}_{K} \times M\times N\), where \({D}_{K}\) is the spatial dimension of the kernel (assumed to be square), \(M\) is the number of input channels, and \(N\) is the number of output channels (Howard et al. 2017). The output feature map for standard convolution, assuming stride one and padding, is computed as:

Fig. 7 Architecture of the DTLD model (MobileNet)

$${G}_{k,l,n} = \sum_{i,j,m}{K}_{i,j,m,n} \cdot {F}_{k+i-1,l+j-1,m}$$

Standard convolutions have the computational cost of:

$${D}_{K} \cdot {D}_{K} \cdot M \cdot N \cdot {D}_{F} \cdot {D}_{F}$$

where the computational cost depends multiplicatively on the number of input channels \(M\), the number of output channels \(N\), the kernel size \({D}_{K} \times {D}_{K}\), and the feature map size \({D}_{F} \times {D}_{F}\).

Depthwise convolution with one filter per input channel (input depth) can be written as:

$${\widehat{G}}_{k,l,m} = \sum_{i,j}{\widehat{K}}_{i,j,m} \cdot {F}_{k+i-1,l+j-1,m}$$

where \(\widehat{K}\) is the depthwise convolutional kernel of size \({D}_{K} \times {D}_{K} \times M\), and the \(m\)th filter in \(\widehat{K}\) is applied to the \(m\)th channel in \(F\) to produce the \(m\)th channel of the filtered output feature map \(\widehat{G}\).

Depthwise convolution has a computational cost of:

$${D}_{K} \cdot {D}_{K} \cdot M \cdot {D}_{F} \cdot {D}_{F}$$
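The two cost expressions can be compared numerically. The sketch below also adds the cost of the 1 × 1 pointwise convolution, \(M \cdot N \cdot {D}_{F} \cdot {D}_{F}\), which Howard et al. (2017) pair with the depthwise step to form a depthwise separable convolution; that term is not stated in the text above and is included here as an assumption about the full MobileNet layer. The layer sizes are illustrative values only.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution: DK*DK*M*N*DF*DF."""
    return dk * dk * m * n * df * df

def depthwise_conv_cost(dk, m, df):
    """Multiply-adds of a depthwise convolution: DK*DK*M*DF*DF."""
    return dk * dk * m * df * df

def pointwise_conv_cost(m, n, df):
    """Multiply-adds of the 1x1 convolution that mixes channels: M*N*DF*DF."""
    return m * n * df * df

# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
DK, M, N, DF = 3, 512, 512, 14

standard = standard_conv_cost(DK, M, N, DF)
depthwise = depthwise_conv_cost(DK, M, DF)
separable = depthwise + pointwise_conv_cost(M, N, DF)
ratio = separable / standard

print(f"standard:  {standard:,}")
print(f"depthwise: {depthwise:,}")
print(f"separable: {separable:,}")
print(f"separable / standard = {ratio:.4f}")
```

The ratio equals \(1/N + 1/{D}_{K}^{2}\), so for 3 × 3 kernels the separable form needs roughly eight to nine times fewer multiply-adds than the standard convolution.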

The architecture of the DTLD model based on the EfficientNetB2 model is shown in Fig. 8. In this model, we load the pretrained EfficientNetB2 model without a classification layer and then freeze its convolutional layers. We then add a custom classification layer for multiclass classification. We use the pretrained parameters of the EfficientNetB2 model for mango leaf classification.

  • Loss function and optimization To direct the model during training, defining an effective loss function is essential. Because mango leaf classification is a multiclass problem, sparse categorical cross-entropy is used as the loss function. Optimization techniques like Adam update the model weights to reduce the loss (Kingma and Ba 2014; Choi et al. 2019).

  • Adam optimizer Adaptive moment estimation (Adam) is an iterative optimization method that reduces the loss function while a neural network is being trained. Adam can be described as stochastic gradient descent with momentum combined with RMSprop. The parameters are updated according to the following formulas:

Fig. 8 Architecture of the DTLD model (EfficientNetB2)

The default setting for the tested machine learning problems is hyperparameters \(\alpha =0.001, {\beta }_{1}=0.9, {\beta }_{2}=0.999\). On vector all operations are element wise.

$$\mathrm{ADAM}\left({H}_{t},{\alpha }_{t},{\beta }_{1},{\beta }_{2},\epsilon \right)$$
$${m}_{0}=0,\quad {v}_{0}=0$$

\({m}_{t+1}={\beta }_{1}{m}_{t}+\left(1-{\beta }_{1}\right)\nabla l\left({\theta }_{t}\right)\) // Update biased first moment estimate.

\({v}_{t+1}={\beta }_{2}{v}_{t}+\left(1-{\beta }_{2}\right)\nabla l{\left({\theta }_{t}\right)}^{2}\) // Update biased second raw moment estimate.

\({b}_{t+1}=\frac{\sqrt{1-{\beta }_{2}^{t+1}}}{1-{\beta }_{1}^{t+1}}\) // Bias correction.

\({\theta }_{t+1}={\theta }_{t}-{\alpha }_{t}\frac{{m}_{t+1}}{\sqrt{{v}_{t+1}}+\epsilon }{b}_{t+1}\) // Update parameters.

where \(l\) represents the loss function, \(\nabla l\left(\theta \right)\) is a stochastic estimate of the true gradient, \(\theta \in {\mathbb{R}}^{d}\) represents the model parameters, \(t\) is the time step, \(m\) is the biased first moment estimate, \(v\) is the biased second moment estimate, \(b\) is the bias-correction factor, and \(\epsilon\) is a small constant that prevents division by zero.
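The Adam update can be sketched in a few lines of Python for a single scalar parameter. This is a minimal illustration rather than the optimizer used in the experiments: it uses the standard moment coefficients \((1-\beta_1)\) and \((1-\beta_2)\), places \(\epsilon\) in the denominator for numerical stability as in Kingma and Ba (2014), and applies the update to a toy quadratic loss \(l(\theta)=\theta^2\) that is assumed purely for demonstration. The function names are hypothetical.

```python
import math


def adam_step(theta, m, v, grad, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 0-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad          # biased first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # biased second raw moment estimate
    # Bias-correction factor b_{t+1}.
    b = math.sqrt(1.0 - beta2 ** (t + 1)) / (1.0 - beta1 ** (t + 1))
    theta = theta - alpha * m / (math.sqrt(v) + eps) * b
    return theta, m, v


def minimize_quadratic(theta0=1.0, steps=100):
    """Run Adam on the toy loss l(theta) = theta**2, whose gradient is 2 * theta."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(steps):
        theta, m, v = adam_step(theta, m, v, grad=2.0 * theta, t=t)
    return theta


theta_after_one, _, _ = adam_step(1.0, 0.0, 0.0, grad=2.0, t=0)
theta_after_100 = minimize_quadratic()
print(theta_after_one, theta_after_100)
```

With the default settings, the first step moves the parameter by approximately \(\alpha\) regardless of the gradient scale, which is the effect of the bias-correction factor.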

  • Hyperparameter optimization Adjusting hyperparameters like learning rate and batch size is a critical step (Yu and Zhu 2020). This process goes through several iterations to determine the best configuration for the mango leaf disease classification task.

  • Training process The proposed DTLD model is trained for 50 epochs on the prepared mango leaf dataset. In each epoch, the model processes all the training data, adjusts its weights and parameters, and refines its understanding of mango leaf features. During training, monitoring metrics such as accuracy and loss is crucial for assessing model performance.
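The sparse categorical cross-entropy loss named in the list above can be sketched as follows. This is an illustrative pure-Python version, not the TensorFlow implementation used in the experiments; the probabilities and labels are hypothetical. "Sparse" means that each label is an integer class index rather than a one-hot vector.

```python
import math


def sparse_categorical_crossentropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    probs:  list of per-sample probability lists (each sums to 1)
    labels: list of integer class indices (not one-hot vectors)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


# Hypothetical predictions for two samples over three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
loss = sparse_categorical_crossentropy(probs, labels)
print(round(loss, 4))
```

The loss falls towards zero as the probability assigned to the correct class approaches 1, which is the quantity the optimizer drives down over the training epochs.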

4.5 Feature extraction

The technique of converting unprocessed data into manageable numerical features while maintaining the information included in the original data set is known as feature extraction. It produces better outcomes than simply applying machine learning to the raw data. Feature extraction is a crucial ML technique to increase the effectiveness of algorithms used for tasks like object identification, facial recognition, and image categorization. Techniques for extracting features are:

  • Handcrafted features Conventional image processing methods rely on manually designed features such as color histograms, local binary patterns (LBP), and histograms of oriented gradients (HOG). These features were designed with the assistance of domain expertise and experience.

  • Deep learning-based features As a result of the advancements in DL, CNNs are now applicable models for automatically extracting features. CNN layers use hierarchical feature learning to extract complicated patterns and representations from the input data automatically (Jogin et al. 2018).

Feature extraction is the cornerstone of merging machine learning and image processing because it enables systems to extract relevant information from visual input for various applications. This suggests that machines will increasingly be able to comprehend and interpret visual data with greater efficiency and accuracy. Features extracted by VGG16, pretrained on the large ImageNet dataset, are used in our proposed model.
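To make the notion of a handcrafted feature concrete, the following sketch computes a normalized per-channel color histogram, one of the conventional descriptors mentioned above. It is only an illustration and is not part of the proposed model, which relies on CNN features; the pixel values, the bin count, and the function name `color_histogram` are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of 8-bit RGB pixels.

    pixels: iterable of (r, g, b) tuples with values in 0..255
    Returns a flat feature vector of length 3 * bins.
    """
    hist = [0.0] * (3 * bins)
    width = 256 // bins          # intensity range covered by each bin
    count = 0
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            hist[channel * bins + min(value // width, bins - 1)] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


# Hypothetical 2x2 leaf patch: three green pixels and one brown spot.
patch = [(30, 200, 40), (25, 190, 35), (35, 210, 45), (120, 80, 30)]
features = color_histogram(patch)
print(features)
```

A fixed-length vector of this kind can be fed to a classical classifier, whereas a CNN learns its feature maps directly from the pixels.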

4.6 Testing model

A subset of the dataset is kept for validation to assess the model’s performance during training. This set is not used for training; instead, it is a benchmark for evaluating how effectively the model generalizes to new data.

  • Evaluation metrics Mango leaf disease classification is frequently evaluated using the F1 score, recall, accuracy, and precision. Together, these metrics provide a comprehensive measure of the model’s performance.

  • Confusion matrix A confusion matrix is an excellent technique for assessing a model’s classification performance (Navin and Pankaja 2016). It provides data on the percentage of each class that was correctly or incorrectly classified, providing insight into potential reasons why some classes may have been misclassified.

  • Fine-tuning Researchers may apply more modifications to the model based on the validation results. This iterative process involves adjusting hyperparameters or changing the architecture to increase the model’s overall performance.
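The confusion matrix described in the list above can be built from true and predicted labels as in the following sketch. The labels are hypothetical and the helper names are illustrative; in practice a library routine such as Scikit-learn's `confusion_matrix` would typically be used.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for three classes (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
print(accuracy(cm))
```

Off-diagonal entries show which pairs of classes are confused with each other, which is the information used when deciding what to adjust during fine-tuning.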

4.7 Classification

In a CNN architecture, fully connected (FC) layers are used for classification, pooling layers for downsampling, and convolutional layers for feature extraction. During training, the model adjusts its parameters to minimize a predetermined loss function using techniques like stochastic gradient descent and backpropagation. Data augmentation is widely used to enhance generalization and prevent overfitting. The rectified linear unit (ReLU) is a non-linear activation function used in the fully connected layers except the output layer (Dubey et al. 2022). The advantage of the ReLU function is that not every neuron is activated simultaneously: a neuron is deactivated only when the output of the linear transformation is less than zero. Mathematically, it is defined as

$$f\left(x\right)=\left\{\begin{array}{ll} x, & x\ge 0\\ 0, & x<0\end{array}\right.$$

or

$$f\left(x\right)=\max (0,x)$$

The model’s performance is gradually improved during the training process by iteratively going over the entire training dataset. After being trained, the CNN can categorize previously unseen data. During the classification phase, Softmax activation is widely used to create probability distributions across classes for multiclass classification (Nwankpa et al. 2018). For an input vector \(z\) and \(k\) classes, the softmax function is

$$\mathrm{Softmax}\left({z}_{r}\right)=\frac{{e}^{{z}_{r}}}{{\sum }_{j=1}^{k}{e}^{{z}_{j}}}$$

where \(e\) is the base of the natural logarithm.

The input is subsequently assigned to the class with the highest probability, which gives the predicted mango leaf disease category.
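The ReLU and softmax definitions above, together with the final class assignment, can be sketched as follows. The logits are hypothetical, and subtracting the maximum logit before exponentiation is a common numerical-stability step that does not change the result; it is not stated in the text.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def softmax(z):
    """Softmax over a list of logits, shifted by max(z) for numerical stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(z, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(z)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical logits for three of the dataset's classes.
classes = ["Anthracnose", "Healthy", "Sooty Mould"]
label, prob = predict([1.0, 3.0, 0.5], classes)
print(label, round(prob, 3))
```

The softmax outputs are positive and sum to 1, so the largest logit always corresponds to the predicted class.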

5 Result and analysis

The efficacy of the DTLD model has been simulated, validated, and tested using NumPy (Harris et al. 2020), Scikit-learn (Pedregosa et al. 2011), TensorFlow 2.10.1 (Tensorflow 2023), and Python 3.9 (Python 2023). The system is an HP G5 workstation with an Intel Core i9-10900K CPU (2.70 GHz, 20 cores), 32 GB of RAM, the Windows 11 operating system, and an NVIDIA RTX 3070Ti GPU with 8 GB of VRAM. We have trained and tested the DTLD model on a dataset (Ali et al. 2022) of around 4000 images for mango leaf disease detection. The dataset consists of images of 7 disease categories, i.e., Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, and Sooty Mould, together with Healthy leaves. The performance of the DTLD model is evaluated in two cases, described in the following subsections.

In case 1, the dataset is divided in a ratio of 80:10:10, i.e., 80% for training, 10% for validation, and 10% for testing; this split is referred to as D1. In case 2, the dataset is divided in a ratio of 70:15:15, i.e., 70% for training, 15% for validation, and 15% for testing; this split is referred to as D2.
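The two splits can be illustrated with a short sketch. The splitting procedure is not described in the text, so a plain shuffled (non-stratified) split with a fixed seed is assumed here, and `split_indices` is a hypothetical helper; 4000 is the approximate dataset size quoted above.

```python
import random


def split_indices(n, train_frac, val_frac, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # whatever remains goes to the test set
    return train, val, test


d1 = split_indices(4000, 0.80, 0.10)   # D1: 80:10:10
d2 = split_indices(4000, 0.70, 0.15)   # D2: 70:15:15
print([len(part) for part in d1])
print([len(part) for part in d2])
```

D2 leaves fewer images for training but gives larger validation and test sets, so its test estimates rest on more samples.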

5.1 Case 1: to evaluate the performance of the DTLD model on the dataset (80:10:10)

The DTLD model based on DenseNet121 achieved an overall accuracy of 92.25% on dataset D1; its training curves are depicted in Fig. 9. To identify mango leaf disease, the model was trained for 50 epochs. Figure 9a shows that the training accuracy gradually increases to 92.22%, and the overall accuracy reaches 92.25%. Figure 9b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 23.16%, whereas the validation loss is 28.49%.

Fig. 9 Training, validation accuracy and loss curve of the DTLD model (DenseNet121) over D1 dataset

The confusion matrix of the DTLD model based on DenseNet121 is used to evaluate the test classification efficiency (Fig. 10). The confusion matrix contains the true positive, true negative, false positive, and false negative counts, and higher diagonal values indicate more accurate predictions. The numbers of correctly classified and misclassified images per class are: Anthracnose, 93 and 7; Bacterial Canker, 103 and 11; Cutting Weevil, 86 and 7; Die Back, 96 and 1; Gall Midge, 82 and 20; Healthy, 93 and 0; Powdery Mildew, 91 and 10; and Sooty Mould, 94 and 6.
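As a consistency check, the overall accuracy can be recomputed from the per-class counts quoted above for DenseNet121. The sketch below only sums the reported numbers; it makes no assumption about how the individual errors are distributed across the other classes.

```python
# (correctly classified, misclassified) counts quoted in the text for DenseNet121 on D1.
counts = {
    "Anthracnose": (93, 7),
    "Bacterial Canker": (103, 11),
    "Cutting Weevil": (86, 7),
    "Die Back": (96, 1),
    "Gall Midge": (82, 20),
    "Healthy": (93, 0),
    "Powdery Mildew": (91, 10),
    "Sooty Mould": (94, 6),
}

correct = sum(c for c, _ in counts.values())
wrong = sum(w for _, w in counts.values())
overall_accuracy = correct / (correct + wrong)
print(f"{correct} correct, {wrong} misclassified, accuracy = {overall_accuracy:.2%}")
```

The counts give 738 correct out of 800 images, i.e., 92.25%, which agrees with the overall accuracy reported for this model.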

Table 1 shows the performance of the DTLD model based on DenseNet121 in terms of precision, recall, and F1 score. The precision of Die Back reaches 98.97%, the recall of Cutting Weevil and Die Back is high (1.0), and the F1 score is also high for the Die Back class (99.48%). Similarly, the precision of Healthy mango leaves is 1.0, whereas the recall of Sooty Mould is low (70.68%) and the F1 score of Bacterial Canker is low (94.50%).

Table 1 Precision, recall and F1-score of the DTLD model (DenseNet121)
Fig. 10 Confusion matrix of the DTLD model (DenseNet121)

The DTLD model based on DenseNet201, trained and tested on the D1 dataset, achieved an overall testing accuracy of 91.35%; its training curves are depicted in Fig. 11. To identify mango leaf disease, the model was trained for 50 epochs. Figure 11a shows that the training accuracy gradually increases to 96.09%, and the overall accuracy is 91.35%. Figure 11b shows that the training and validation losses slowly decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 16.64%, whereas the validation loss is 35.21%.

Fig. 11 Training, validation accuracy and loss curve of the DTLD model (DenseNet201) over D1 dataset

The confusion matrix of the DTLD model based on DenseNet201 is used to evaluate the test classification efficiency (Fig. 12). The numbers of correctly classified and misclassified images per class are: Anthracnose, 100 and 5; Bacterial Canker, 95 and 13; Cutting Weevil, 97 and 0; Die Back, 79 and 9; Gall Midge, 68 and 34; Healthy, 92 and 0; Powdery Mildew, 111 and 0; and Sooty Mould, 91 and 6.

Fig. 12 Confusion matrix of the DTLD model (DenseNet201)

Table 2 shows the performance of the DTLD model based on DenseNet201 in terms of precision, recall, and F1 score. The precision of Die Back reaches 89.77%, the recall of Anthracnose, Bacterial Canker, and Die Back is high (1.0), and the F1 score is high for the Healthy class (98.40%). Similarly, the precision of Cutting Weevil, Healthy, and Powdery Mildew mango leaves is 1.0, whereas the recall of Sooty Mould is low (77.12%) and the F1 score of Bacterial Canker is low (93.60%).

Table 2 Precision, recall and F1-score of the DTLD model (DenseNet201)

The DTLD model based on MobileNet achieved an overall testing accuracy of 97.36% on the D1 dataset; its training curves are depicted in Fig. 13. To identify mango leaf disease, the model was trained for 50 epochs. Figure 13a shows that the training accuracy gradually increases to 97.41%, and the overall accuracy reaches 97.36%. Figure 13b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 8.28%, whereas the validation loss is 13.83%.

Fig. 13 Training, validation accuracy and loss curve of the DTLD model (MobileNet) over D1 dataset

The confusion matrix of the DTLD model based on MobileNet is used to evaluate the test classification efficiency (Fig. 14). The numbers of correctly classified and misclassified images per class are: Anthracnose, 109 and 4; Bacterial Canker, 85 and 9; Cutting Weevil, 100 and 0; Die Back, 94 and 1; Gall Midge, 91 and 4; Healthy, 100 and 0; Powdery Mildew, 94 and 2; and Sooty Mould, 101 and 6.

Fig. 14 Confusion matrix of the DTLD model (MobileNet)

Table 3 shows the performance of the DTLD model based on MobileNet in terms of precision, recall, and F1 score. The precision of Die Back reaches 98.95%, the recall of Cutting Weevil and Healthy is high (1.0), and the F1 score is also high for the Cutting Weevil class (100%). Similarly, the precision of Anthracnose mango leaves is 96.46%, whereas the recall of Powdery Mildew is low (89.32%) and the F1 score of Gall Midge is low (93.33%).

Table 3 Precision, recall and F1-score of the DTLD model (MobileNet)

Figure 15 shows the training curves of the DTLD model based on EfficientNetB2, which achieved a testing accuracy of 99.28% on the D1 dataset. Figure 15a shows that the training accuracy gradually increases to 99.12%, and the overall testing accuracy reaches 99.28%. Figure 15b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 10.80%, whereas the validation loss is 0.06%.

Fig. 15 Training, validation accuracy and loss curve of the DTLD model (EfficientNetB2) over D1 dataset

The confusion matrix of the DTLD model based on EfficientNetB2 is used to evaluate the test classification efficiency (Fig. 16). The numbers of correctly classified and misclassified images per class are: Anthracnose, 110 and 1; Bacterial Canker, 94 and 0; Cutting Weevil, 96 and 0; Die Back, 99 and 0; Gall Midge, 106 and 0; Healthy, 99 and 0; Powdery Mildew, 96 and 1; and Sooty Mould, 97 and 1.

Fig. 16 Confusion matrix of the DTLD model (EfficientNetB2)

Table 4 shows the performance of the DTLD model based on EfficientNetB2 in terms of precision, recall, and F1 score. The precision of Anthracnose reaches 99.10%, the recall of Bacterial Canker, Cutting Weevil, Gall Midge, and Powdery Mildew is high (1.0), and the F1 score is also high for the Bacterial Canker and Cutting Weevil classes (100%). Similarly, the precision of Bacterial Canker, Cutting Weevil, Die Back, and Healthy mango leaves is 1.0, while the lowest values are still high: the recall of Sooty Mould is 98.98% and its F1 score is 98.98%.

Table 4 Precision, recall and F1-score of the DTLD model (EfficientNetB2)

The DTLD model based on VGG16 achieved an overall testing accuracy of 99.76% on the D1 dataset; its training curves are depicted in Fig. 17. Figure 17a shows that the training accuracy gradually increases to 99.50%, and the overall testing accuracy reaches 99.76%. Figure 17b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 1.90%, whereas the validation loss is 0.13%.

Fig. 17 Training, validation accuracy and loss curve of the DTLD model (VGG16) over D1 dataset

The confusion matrix of the DTLD model based on VGG16 is used to evaluate the test classification efficiency (Fig. 18). The numbers of correctly classified and misclassified images per class are: Anthracnose, 94 and 0; Bacterial Canker, 108 and 0; Cutting Weevil, 92 and 0; Die Back, 91 and 0; Gall Midge, 99 and 0; Healthy, 114 and 0; Powdery Mildew, 105 and 1; and Sooty Mould, 96 and 0.

Fig. 18 Confusion matrix of the DTLD model (VGG16)

Precision, recall, and F1-score were calculated using the following formulas:

$$Precision = \frac{{True\;Positives}}{{True\;Positives + False\;Positives}}$$
$$Recall = \frac{{True\;Positives}}{{True\;Positives + False\;Negatives}}$$
$$F1\text{-}Score = \frac{2*Precision*Recall}{{Precision + Recall}}$$
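The three formulas can be applied to raw counts as in the sketch below. The counts are an assumption chosen for illustration: they treat the single Powdery Mildew error reported for the VGG16 model on D1 as a false positive (105 true positives, 1 false positive, 0 false negatives), which reproduces the 99.06% precision and 99.53% F1-score reported for that class.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1-score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed counts for one class: 105 true positives, 1 false positive, 0 false negatives.
p, r, f1 = precision_recall_f1(tp=105, fp=1, fn=0)
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```

The guards against zero denominators only matter for classes that are never predicted or never present; they do not affect the values in this example.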

Table 5 shows the performance of the DTLD model based on VGG16 in terms of precision, recall, and F1 score. The dataset is divided into a small number of simple classes and is preprocessed before training, which makes it easier for the model to achieve near-perfect scores. The precision of Powdery Mildew reaches 99.06%, the recall of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Healthy, and Powdery Mildew is high (1.0), and the F1 score is also high for the Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, and Healthy classes (100%). Similarly, the precision of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Healthy, and Sooty Mould mango leaves is 1.0, while the lowest values are the recall of Sooty Mould (98.97%) and the F1 score of Powdery Mildew (99.53%).

Table 5 Precision, recall and F1-score of the DTLD model (VGG16)

The accuracy of all DTLD models on the D1 dataset is shown in Fig. 19. We denote the DTLD model based on DenseNet121 as DTLD1, the model based on DenseNet201 as DTLD2, the model based on MobileNet as DTLD3, the model based on EfficientNetB2 as DTLD4, and the model based on VGG16 as DTLD5. The DTLD5 model achieves the highest accuracy among them but takes more time in training and testing than DTLD4: the VGG16 deep transfer learning model achieved an accuracy of 99.76%. The DTLD4 model achieved an accuracy of 99.28% and takes less time to train, validate, and test than DTLD5.

Fig. 19 Comparative accuracy of all DTLD models on D1 dataset

5.2 Case 2: to evaluate the performance of the DTLD model on the dataset (70:15:15)

All DTLD models were also trained and tested on the D2 dataset. The DTLD model based on DenseNet121 achieved an overall accuracy of 87.25% on the D2 dataset, which is noticeably lower than the other models. The DTLD model based on DenseNet201 achieved an overall accuracy of 92.28%. The DTLD model based on MobileNet, which achieved an overall testing accuracy of 93.28% on the D2 dataset, is depicted in Fig. 20. To identify mango leaf disease, the model was trained for 50 epochs. Figure 20a shows that the training accuracy gradually increases to 97.77%, and the validation accuracy reaches 93.75%. Figure 20b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 6.84%, whereas the validation loss is 21.57%.

Fig. 20 Training, validation accuracy and loss curve of the DTLD model (DenseNet) over D2 dataset

The confusion matrix of the DTLD model based on DenseNet is used to evaluate the test classification efficiency (Fig. 21). The numbers of correctly classified and misclassified images per class are: Anthracnose, 154 and 29; Bacterial Canker, 149 and 5; Cutting Weevil, 146 and 0; Die Back, 137 and 2; Gall Midge, 136 and 8; Healthy, 146 and 4; Powdery Mildew, 146 and 0; and Sooty Mould, 123 and 31.

Fig. 21 Confusion matrix of the DTLD model (DenseNet)

Table 6 shows the performance of the DTLD model based on DenseNet in terms of precision, recall, and F1 score. The precision of Die Back reaches 98.56%, the recall of Anthracnose, Cutting Weevil, and Die Back is high (1.0), and the F1 score is also high for the Cutting Weevil class (100%). Similarly, the precision of Powdery Mildew mango leaves is 99.32%, whereas the recall of Powdery Mildew is low (73.00%) and the F1 score of Sooty Mould is low (87.86%).

Table 6 Precision, recall and F1-score of the DTLD model (DenseNet121)

The DTLD model based on EfficientNetB2 achieved a testing accuracy of 99.84% on dataset D2; its training curves are depicted in Fig. 22. Figure 22a shows that the training accuracy gradually increases to 99.28%, and the overall testing accuracy reaches 99.84%, which is higher than on D1. Figure 22b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 7.79%, whereas the validation loss is 1.71%. The results show that EfficientNetB2 performs well on both datasets D1 and D2. It also takes less time for training and testing than the other models.

Fig. 22 Training, validation accuracy and loss curve of the DTLD model (EfficientNetB2) over D2 dataset

The confusion matrix of the DTLD model based on EfficientNetB2 is used to evaluate the test classification efficiency (Fig. 23). The numbers of correctly classified and misclassified images per class are: Anthracnose, 136 and 0; Bacterial Canker, 151 and 0; Cutting Weevil, 145 and 0; Die Back, 169 and 0; Gall Midge, 163 and 1; Healthy, 161 and 1; Powdery Mildew, 141 and 0; and Sooty Mould, 148 and 0.

Fig. 23 Confusion matrix of the DTLD model (EfficientNetB2)

Table 7 shows the performance of the DTLD model based on EfficientNetB2 in terms of precision, recall, and F1 score. The precision of Gall Midge reaches 99.39%, the precision of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Powdery Mildew, and Sooty Mould is high (1.0), and the recall is also high for the Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, and Powdery Mildew classes (100%). Similarly, the F1 score of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, and Powdery Mildew mango leaves is 1.0, while the lowest values are the recall of Sooty Mould (99.33%) and its F1 score (99.66%).

Table 7 Precision, recall and F1-score of the DTLD model (EfficientNetB2)

The DTLD model based on VGG16 achieved an overall testing accuracy of 98.91% on the D2 dataset; its training curves are depicted in Fig. 24. Figure 24a shows that the training accuracy gradually increases to 99.14%, and the overall testing accuracy reaches 98.91%. Figure 24b shows that the training and validation losses gradually decrease from epoch 1 to epoch 50. At 50 epochs, the training loss is 8.61%, whereas the validation loss is 3.82%.

Fig. 24 Training, validation accuracy and loss curve of the DTLD model (VGG16) over the D2 dataset

The confusion matrix of the DTLD model based on VGG16 is used to evaluate the test classification efficiency (Fig. 25). The numbers of correctly classified and misclassified images per class are: Anthracnose, 94 and 0; Bacterial Canker, 108 and 0; Cutting Weevil, 92 and 0; Die Back, 91 and 0; Gall Midge, 99 and 0; Healthy, 114 and 0; Powdery Mildew, 105 and 1; and Sooty Mould, 96 and 0.

Fig. 25 Confusion matrix of the DTLD model (VGG16)

Table 8 shows the performance of the DTLD model based on VGG16 in terms of precision, recall, and F1 score. The precision of Powdery Mildew reaches 99.06%, the recall of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Healthy, and Powdery Mildew is high (1.0), and the F1 score is also high for the Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, and Healthy classes (100%). Similarly, the precision of Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Healthy, and Sooty Mould mango leaves is 1.0, while the lowest values are the recall of Sooty Mould (98.97%) and the F1 score of Powdery Mildew (99.53%).

Table 8 Precision, recall and F1-score of the DTLD model (VGG16)

5.3 Comparative analysis

In this section, the DTLD models are compared with one another and with other existing schemes. The accuracy of all DTLD models over the D2 dataset is shown in Fig. 26. We have also analyzed the accuracy of the DTLD models over both the D1 and D2 datasets. On the D2 dataset, the DenseNet121-based model (DTLD1) achieves an accuracy of 87.25%, the DenseNet201-based model (DTLD2) 92.28%, the MobileNet-based model (DTLD3) 93.28%, the EfficientNetB2-based model (DTLD4) 99.84%, and the VGG16-based model (DTLD5) 98.91%. The DTLD4 model achieves the highest accuracy among them and takes less time in training and testing than DTLD5. The DTLD4 model also shows consistently high accuracy over both datasets and takes less time to complete training, validation, and testing. The proposed EfficientNetB2 deep transfer learning model achieved 99.84% and 99.28% accuracy over the D2 and D1 datasets, respectively.

Fig. 26 Comparative accuracy of all DTLD models on D1 & D2 datasets

To demonstrate the efficacy of the proposed model, we have also compared it with other existing models. Figure 27 shows the comparative analysis with 5 other CNN models. Several deep learning models, including Trang et al.’s ResNet (Trang et al. 2019), Rao et al.’s AlexNet (Rao et al. 2021), Thaseentaj et al.’s customised deep CNN (Kumar et al. 2021), Singh et al.’s MCNN (Singh et al. 2019), Rajbongshi et al.’s DenseNet201 (Rajbongshi et al. 2021), and the proposed model, were employed to categorise the diseases using datasets of mango leaves. The proposed model distinguishes itself by offering superior performance on the target dataset in addition to quicker and more effective training. According to existing studies, ResNet (Trang et al. 2019), AlexNet (Rao et al. 2021), customised deep CNN (Kumar et al. 2021), MCNN (Singh et al. 2019), and DenseNet201 (Rajbongshi et al. 2021) had accuracies of 88.46%, 89%, 93.34%, 97.13%, and 98%, respectively, whereas the proposed model achieved the highest accuracy of 99.84%. Additionally, the proposed DTLD model based on EfficientNetB2 has fared better than the other models in practically every evaluation criterion employed in this study, because it takes less time in training and testing and also gives better results over the D1 and D2 datasets. As an application of this work, farmers, buyers, or sellers may use a website or web app, to be developed, to predict diseases and take precautions to protect their yield. This would improve financial stability and aid in disease prevention.

Fig. 27 Comparative analysis among other CNN models

6 Conclusion

A deep transfer learning driven model for mango leaf disease detection is presented in this work. Several algorithms, dataset splits, and validation techniques were used to train and evaluate the proposed optimized model. After preparation, the data was separated into training, validation, and testing datasets. The rectified linear unit (ReLU), a non-linear activation function, was used in the fully connected layers except the output layer; its advantage is that not every neuron is activated simultaneously. The model’s performance is gradually improved during the training process by iteratively going over the entire training dataset. After being trained, the CNN can categorize previously unseen data. During the classification phase, Softmax activation is used to create probability distributions across classes for multi-class classification. The results showed that the DTLD model based on VGG16 achieved the highest accuracy on the D1 dataset (99.76%), while the proposed DTLD model based on EfficientNetB2 achieved the highest accuracy on the D2 dataset (99.84%) and required less training and testing time. The proposed model is also lightweight and optimized owing to its reduced number of parameters and layers. The effectiveness of deep transfer learning models relies heavily on the availability of large, annotated datasets. In the future, performance may be enhanced further using data augmentation, hyperparameter tuning, and additional preprocessing approaches and optimizers.