1 Introduction

The tomato plant (Solanum lycopersicum) is rich in nutrients and is one of the most widely consumed vegetable crops in the world. Tomato is an edible plant that grows on well-drained soil. Many farmers and gardeners plant tomatoes because the crop is their primary source of income, especially small-scale farmers. In India, a land area of around 3,50,000 is covered for harvesting, and production grows by 53,00,000 tons (Korav et al. 2020). However, production sometimes decreases because of diseases on the leaves of the plants and a lack of knowledge among farmers about plant disease. Diseases occur to a large extent at every stage of growth. Diseases that commonly affect the plant include late blight, bacterial spot, leaf mold, and early blight. They are caused by (1) abiotic factors such as environmental effects, nutritional excess, light, and temperature; (2) biotic factors such as fungi, viruses, and bacteria; and (3) pests such as worms, bugs, and flies. These factors change the shape, color, texture, and form of the plant. The changes are difficult to discern, which makes early detection and treatment difficult, yet detecting these diseases can prevent a significant loss of yield (Eli-Chukwu 2019). Early detection and treatment may help prevent some crop losses and may also increase the quality and quantity of the final agricultural product. Maintaining a manual list of all the symptoms and signs caused by diseases in real time is difficult, and monitoring plants across a wide field requires a significant amount of manual labor (Patricio and Reider 2018).

In agriculture, identifying diseases with the naked eye is a difficult task for farmers. With the arrival of artificial intelligence (AI) techniques, several approaches have been developed for identifying disease at an early stage (Polder et al. 2019). Among these, deep learning is advantageous because the convolutional neural network (CNN) model performs feature extraction and recognition automatically, which suits the automatic identification of tomato diseases. A CNN extracts features using weight values and connections between local values, and has shown strong results in identifying diseases in plants such as banana, tomato, and cucumber (Liakos et al. 2018). CNNs process visual imagery well. The artificial neural network (ANN) is a three-layer feed-forward network with “input”, “hidden”, and “output” layers. To capture the spatial relationships in image data, a set of hidden layers works together: convolutional, pooling, and fully connected layers, along with a normalizing layer (Chen et al. 2021). However, recognition performance varies with the layout of the recognition model and with the constant interference of the natural environment in images of tomato leaves. A review of the literature shows that several problems remain: (1) Image quality: captured images contain noise and extraneous data, so when they are fed to a CNN for classification, the extracted features differ from those obtained from the original laboratory data. (2) Identifying disease at an early stage is difficult, and images captured at different stages give different results. (3) Collecting a large dataset is not straightforward because of the natural environment, and a sufficient quantity of data is often not available.
Transfer learning addresses the problem of repurposing a pre-trained CNN model. It trains a model by reusing what an earlier model has produced as the input to a new one (Kaur and Gautam 2021). As a result, models trained from the beginning take longer to train than models that start from pre-trained weights, and the latter also show improved performance (Mishra and Gautam 2021).

Farmers often lack information and knowledge about leaf diseases (biotic and abiotic) in crops, which are not always visible to the naked eye. Detecting leaf disease at an early stage helps farmers cure their plants and crops. Disease changes the morphological properties of the plant, which affects its growth. It is therefore necessary to identify the texture features of the leaf using deep learning techniques. This work implements the proposed deep learning model, Modified InceptionResNet V2 (MIR-V2), for the identification and detection of tomato leaf diseases. To characterize the morphological properties of leaves, the work focuses on the CNN model, which provides better accuracy and efficiency in lesion detection. Furthermore, a large amount of data covering classes such as late blight, bacterial spot, leaf mold, and early blight is collected from the repository. With this stress, the quantity and impact of the yield loss caused by these lesions are identified.

This work uses the Modified InceptionResNet V2 (MIR-V2) with four different frameworks and achieves better accuracy than the existing InceptionResNet-V2 (IR-V2) model. The IR-V2 model is an assembly of the pre-trained Inception V1 and ResNet V2 models and has 162 convolutional layers. The existing model has difficulty discriminating between images with similar morphological properties, and therefore gives only average classification performance. Figure 1 shows the data flow for tomato disease characterization using transfer learning.

Fig. 1 Dataflow of tomato disease characterization using transfer learning

This work presents a modified pre-trained CNN model, InceptionResNet-V2, that integrates information in the form of weights, which is then transferred for feature extraction using the transfer learning approach. To increase the accuracy of diagnosing different forms of tomato leaf disease, the deep learning-based model with transfer learning has been fine-tuned. The main contributions of this work are:

  1. A novel modified CNN architecture is proposed, based on the basic Inception ResNet V2 models, for maintaining the knowledge of tomato leaf disease. The number of pooling layers is reduced, and the skip connection method is used to extract more information from lower layers. Algorithm 1 describes the complete steps.

  2. To enhance the efficiency of the InceptionResNet-V2 models, this work presents a processing method based on robust principal component analysis (RPCA) and image enhancement for a likely tomato leaf image.

  3. The proposed framework MIR-V2 achieves the best accuracy over the basic InceptionResNet-V2 model.

The flow of the article is: Sect. 2 describes the related work. Section 3 deals with materials and methods. Section 4 deals with the building model, and Sect. 5 explains the result and discussions. Finally, Sect. 6 concludes the work.

2 Related work

Controlling crop loss and increasing output requires effective procedures for distinguishing healthy leaves from diseased ones. This section reviews deep learning and artificial intelligence strategies currently available for detecting plant diseases. One group of authors used photos of tomato leaves to identify diseases: features were extracted from segmented diseased portions using histogram and geometric extractors and classified using an SVM classifier with varying kernels. Kaur (2020) used distinct color and texture traits for the identification and detection of disease in soybean crops. The accuracy achieved with the color property is 92%, which could be improved with more accurate images. Gao et al. (2018) identified fungal and bacterial infection with a bacterial foraging-optimization-based radial-basis-function neural network (BRBFNN). A region-growing algorithm extracts information from a plant leaf based on seed points with similar qualities, with an accuracy of 93.6%. Yasrab et al. (2021) proposed a pre-trained CNN model for feature extraction with an SVM as the classifier. The approach performs well because the CNN extracts good-quality features and the SVM classifier improves performance; it is evaluated with F1-score, precision, and an accuracy of 93.4%. Sethy et al. (2020) presented pre-trained models (AlexNet and VGG16) to classify tomato crop diseases with 13,262 images and achieved 97.49% accuracy. Pellas et al. (2017) implemented a model for the recognition of plant leaf lesions in millet crops, using the transfer learning technique of the CNN model for classification and detection. The central idea of transfer learning is that the output of one model becomes the input of another. The detection accuracy of 95% achieved with this model is better than that of other models. Gehlot and Saini (2020) discussed a pre-trained AlexNet CNN model using transfer learning for the classification of tomato leaf disease and achieved 95.75% accuracy. Suh et al. (2018) proposed a simple CNN model with 8 hidden layers for the identification of diseases in the tomato plant. This study presents a modified pre-trained CNN model, InceptionResNet-V2, that integrates information in the form of weights, which is then transferred for feature extraction using the transfer learning approach. To increase the accuracy of diagnosing different forms of tomato leaf disease, the AI-based model has been trained using fine-tuning. After many rotations, the model predicts the class of lesions with 98.30% training and 98.0% testing accuracy.

Ahmad et al. (2018) used a hybrid principal component analysis feature extraction technique based on the whale-optimization algorithm. The extracted features are then fed to a DL classifier for classification of tomato plant leaf disease, and the result is evaluated in terms of accuracy and severity rate with loss metrics. Artificial intelligence is the overarching domain of deep learning, in which machines are given the ability to learn from new data instances and then adapt to the basic domain of machine learning. Gyamerah et al. (2019) compared segmentation models with the random forest to find the diseased part that affects crop production. Performance is measured with the “Root Mean Square Error” (RMSE) and “Mean Absolute Error”, with an RMSE value of 0.7. Maddikunta et al. (2021) discussed remote sensing devices for capturing data efficiently, along with challenges related to devices, UAVs, and potential requirements suitable for agriculture. Santa Cruz (2021) used different CNN models for the identification of plant leaf disease. Model accuracy was studied by varying parameters of the CNN models such as “batch size”, “number of epochs”, and “dropout”; the accuracy achieved is 97% with Inception V3. Yin et al. (2020) discussed a classification technique for the recognition of disease in hot pepper. Eight CNN models, including VGG16, VGG19, ResNet50, and DenseNet121, are used to extract features from images. Image features such as color and edges are extracted to compute a similarity-based index. The performance of the models is calculated at different levels; the highest accuracy achieved is 98.32% with the ResNet 50 model. Abbas et al. (2021) noted that deep learning models have been of great help in detecting plant diseases, and that the models depend on the number of images, the size of the images, and their quality. Using the C-GAN model and a pre-trained DenseNet121 model, tomato leaf disease is identified with an accuracy of 99.51% for 5 classes, 98.65% for 7 classes, and 97.11% for 10 classes. Chen et al. (2020a, b) noted that detecting disease in plant leaves is necessary for the growth of a crop, since some lesions have a disastrous impact on the safety and growth of the plant. For the automatic identification and detection of disease, a VGG16 model pre-trained on ImageNet is selected, and the accuracy achieved is 92.00% for predicting disease in rice leaves. Malathi and Gopinath (2021) introduced a “Deep Learning” approach for the recognition of pests in images of plant leaves. Before pest detection, data augmentation is carried out for clear detection of data. The fine-tuned “ResNet-50” model achieves an accuracy of 95.012%, which is better than other pre-defined CNN models. Chen et al. (2020a, b) combined two deep learning models, Dense-Net and the Inception module, for the classification of rice leaf disease, achieving an average accuracy of 94.07%.

Kamath et al. (2019) implemented a re-trained model for the classification of apple leaf disease with Inception V3. The disease reduces the growth of the crop, so identifying and detecting it at an early stage helps the farmer protect the crop. The approach extracts the features first, then down-samples them with the variance technique and removes unwanted data from the image. The accuracy achieved with the classifier after feature extraction is 97%. Xie et al. (2020) discussed the proposed Efficient Net model for the detection of disease in plant leaves. Manual identification of disease is slow; computer diagnosis makes the process more efficient, and accurate classification is achieved with an accuracy of 99.01%. Raja et al. (2020) introduced an ensemble deep learning technique for the classification of citrus pests in plant leaves. Three levels of diversity are taken into account: classifier-level, feature-level, and data-level diversity. Data augmentation is used in the training phase to increase the number of training samples and improve classifier generalizability. The accuracy achieved after tenfold cross-validation is 99.0%. Nevavuori et al. (2019) discussed a technique for identifying fungal disease, its amount, and its location in strawberry leaves. Different deep learning techniques are evaluated and optimized for the detection of disease. For the identification of powdery mildew in the strawberry plant, the accuracy achieved is 98.11% with the ResNet 50 model. Several authors have suggested deep learning models such as ResNet 50 (Malathi and Gopinath 2021), Dense-Net (Chen et al. 2020a, b), Efficient Net (Xie et al. 2020), Deep Ensemble Network (Vallabhajosyula et al. 2022), and Inception V3 (Qiang et al. 2019) for the detection and classification of different leaf diseases. Other authors suggested a modified deep learning model with a Gabor filter for the diagnosis of tumor formation in breast cancer (Ghoushchi et al. 2021), case-based reasoning methods for memory application analysis (Khanmohammadi et al. 2022), and feature selection with a random forest classifier on video games (Al-Asadi and Tasdemir 2021). Storey et al. (2022) discussed segmentation techniques for the detection of apple leaf diseases using Mask R-CNN. Object detection and segmentation are performed using Mask R-CNN with three different backbones, and the best accuracy is achieved with the ResNet-50 backbone. Hemalatha et al. (2022) proposed a convolutional neural network-based deep learning classification model for the identification of sugarcane leaf disease. Most of the diseases had gone undetected and led to huge losses for farmers. With this model, the accuracy achieved for the detection of disease is 96%.

In the literature, despite the use and usefulness of several CNNs for detecting crop diseases, no research has shown that they can be used to detect multi-class diseases on tomato leaves. A DL technique that could quickly assess multi-class diseases in tomato fields would help growers know when targeted biotic stress applications are needed and would reduce disease-scouting labor requirements. This paper demonstrates a modified InceptionResNet V2 model for the identification and detection of tomato leaf disease on a large dataset.

3 Materials and methods

This section describes the study and collection of a multi-class tomato leaf disease dataset in Sect. 3.1, followed by pre-processing and augmentation in Sect. 3.2 and enhancement of the dataset in Sect. 3.3.

3.1 Dataset description

A total of 9600 tomato leaf images are collected from the standard repository and ground truth. Of these, 9025 images come from the PlantVillage and PlantDoc datasets and 575 images come from a ground-truth dataset, all with dimensions of \(256\times 256\times 3\). The datasets cover different classes of leaf disease: Leaf mold, Yellow leaf curl virus, Late blight, Septoria leaf spot, Target spot, Spider mites, mosaic virus, and healthy leaves (Hughes and Salathe 2015). Each image contains only one disease, as shown in Fig. 2. The dataset is partitioned 70:30, so the training and testing sets contain 840 and 360 images from each class, respectively. Modified InceptionResNet-V2 was used to implement the classification function. Table 1 shows the total number of images used for implementing the task.

Fig. 2 Sample of tomato leaf images

Table 1 Count of tomato leaf image dataset

3.2 Pre-processing and augmentation

Pre-processing is a necessary step in preparing the input data for developing and training deep learning models, and it improves the model’s accuracy and efficiency. It improves data quality and so helps in mining useful insights from the data (Kamath et al. 2019). The images were rescaled and resized because all of the photos in the gathered dataset had RGB coefficients in the range 0–255 and varying dimensions (h × w). During the pre-processing phase, all pixel values were rescaled to the range 0–1, and all photos were resized to \(224\times 224\times 3\) pixels, as shown in Fig. 3a.
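The rescaling and resizing step can be illustrated with a minimal NumPy sketch. It assumes 8-bit RGB input and nearest-neighbour sampling; the interpolation method is not stated in this work, and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(image, size=(224, 224)):
    """Resize an H x W x 3 uint8 image by nearest-neighbour sampling
    (an assumed interpolation) and rescale pixel values to [0, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    resized = image[rows[:, None], cols]       # shape: size[0] x size[1] x 3
    return resized.astype(np.float32) / 255.0  # 0-255 -> 0-1

# Synthetic 256 x 256 x 3 image standing in for a dataset sample.
raw = np.random.default_rng(0).integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
x = preprocess(raw)
```

In practice a library resize with bilinear or bicubic interpolation would likely be used; the sketch only shows the change of shape and value range.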

Fig. 3 a Image after pre-processing, b image after augmentation

The initial training image dataset was expanded using image augmentation. Common manipulations used in the augmentation process are shearing, rotation, vertical and horizontal flipping, and random zooming (Xie et al. 2020). Figure 3b shows the rotated image samples used in Modified InceptionResNet-V2 for the characterization task. This operation was carried out using the ImageDataGenerator class of the Keras deep learning library. The complete flow of the process is given in Fig. 4. All the steps are explained one by one in the next section.
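As an illustration of two of these manipulations, the following NumPy sketch applies random flips and rotations by multiples of 90 degrees. It is not the Keras pipeline used in this work: shearing, zooming, and arbitrary-angle rotation need interpolation and are omitted, and the function name `augment` is hypothetical.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an H x W x C image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]            # vertical flip
    k = int(rng.integers(0, 4))       # number of 90-degree rotations
    out = np.rot90(out, k=k, axes=(0, 1))
    return np.ascontiguousarray(out)

rng = np.random.default_rng(0)
img = np.arange(4 * 4 * 3).reshape(4, 4, 3)
aug = augment(img, rng)
```

Flips and 90-degree rotations only move pixels, so the augmented image contains exactly the same pixel values as the original.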

Fig. 4 Data flow of this work

The data flow in Fig. 4 shows the steps from data collection to the accuracy of the diseased part. First, the data is pre-processed and augmented, taking the original image to a resized image. The images are then enhanced using the RPCA technique. Next, the data is divided in a 7:3 ratio and used to train the proposed CNN model. Both the proposed and pre-trained models are trained and tested on the given tomato leaf dataset. Finally, the accuracy of the proposed model is calculated and compared with that of the pre-trained model.
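The 7:3 division can be sketched as a shuffled split. The example assumes 1200 images per class (9600 images over 8 classes), which gives the 840 and 360 images per class reported in Sect. 3.1; the function name and the fixed seed are illustrative choices.

```python
import random

def split_70_30(items, train_fraction=0.7, seed=0):
    """Shuffle the items and split them into training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)           # reproducible shuffle
    cut = round(len(shuffled) * train_fraction)     # size of the training part
    return shuffled[:cut], shuffled[cut:]

# 1200 image indices for one class (assumed equal class sizes).
train, test = split_70_30(range(1200))
```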

3.3 Enhancement of tomato leaf dataset

The tomato leaf image dataset contains images with different intensity levels, so image quality is improved using the robust principal component analysis (RPCA) method (Raja et al. 2020). This method works with the original image, a low-rank matrix, and a sparse matrix. The sparse matrix analysis is expressed in Eq. (1). The general data matrix contains the primary structural information as well as possible minor noise information. The input matrix is decomposed into two matrices: a “low-rank” matrix, which contains the important basic data, and a “sparse” matrix, which contains the noisy data.

$$\underset{p,u}{\mathrm{min}} \, rank(p)+\beta {\Vert u\Vert }_{0}, \quad \text{s.t.} \quad ob{j}_{top}=p+u$$
(1)

where \(p\) represents the “low-rank” matrix, \(rank\left(p\right)\) is the rank of that matrix, \(u\) is the “sparse” matrix, and \({\Vert u\Vert }_{0}\) is the “l0-norm” of \(u\). The relaxed form in Eq. (2), which replaces the l0-norm with the l1-norm, can recover the low-rank and sparse parts with high probability:

$$\underset{p,u}{\mathrm{min}} \, rank(p)+\delta {\Vert u\Vert }_{1}, \quad \text{s.t.} \quad ob{j}_{top}=p+u$$
(2)

In Eq. (2), \(p\) represents the main component of an image; the input image \({T}_{{RGB}_{noise}}\) is decomposed into the “low-rank” matrix \(p\) and the “sparse” matrix \(u\). Because the MIR-V2 area to be recognized in the tomato leaf image had sparsely similar properties to the whole image in this investigation, the “RPCA” decomposition was applied as pre-processing to the crop field images. To improve the MIR-V2 value in the original tomato image, the proposed model combines the original image with the sparse matrix to obtain the enhanced image, with \(ob{j}_{top}=p+u\). This process is known as sparse enhancement (SE) (Nevavuori et al. 2019). Figure 5 shows a few tomato leaf photos that have been enhanced.
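A decomposition of this kind can be sketched with principal component pursuit, the usual convex relaxation of RPCA, in which the rank is replaced by the nuclear norm and the l0-norm by the l1-norm. The solver below is a standard augmented-Lagrangian iteration and not necessarily the one used in this work; the default weight \(\delta = 1/\sqrt{\max(m,n)}\), the penalty parameter `mu`, and all function names are assumptions.

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding, the proximal operator of the l1-norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(obj, delta=None, max_iter=1000, tol=1e-7):
    """Split a float matrix obj into low-rank p and sparse u with obj ~ p + u."""
    m, n = obj.shape
    if delta is None:
        delta = 1.0 / np.sqrt(max(m, n))        # common default weight (assumed)
    mu = m * n / (4.0 * np.abs(obj).sum())      # penalty parameter (heuristic)
    p = np.zeros_like(obj)
    u = np.zeros_like(obj)
    y = np.zeros_like(obj)                      # Lagrange multipliers
    for _ in range(max_iter):
        # Low-rank update: threshold the singular values.
        U, s, Vt = np.linalg.svd(obj - u + y / mu, full_matrices=False)
        p = (U * shrink(s, 1.0 / mu)) @ Vt
        # Sparse update: element-wise soft-thresholding.
        u = shrink(obj - p + y / mu, delta / mu)
        residual = obj - p - u
        y = y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(obj):
            break
    return p, u

# Synthetic test matrix: rank-2 structure plus a few large sparse entries.
rng = np.random.default_rng(0)
low = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
sparse = np.zeros((30, 30))
sparse[rng.random((30, 30)) < 0.05] = 5.0
M = low + sparse
p, u = rpca(M)
```

For an RGB image, one option would be to apply the decomposition to each channel separately; how the channels are handled in this work is not stated.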

Fig. 5 Enhanced tomato leaf images

4 Building model

A CNN is a feed-forward artificial neural network (ANN) whose connection structure is modeled on the organization of the animal visual cortex. CNNs are made up of “neurons” with learnable “weights” and “biases”. Each neuron receives inputs, computes a dot product, and optionally applies a non-linearity, as in a simple neural network. One pooling layer, two dropout layers, and three convolutional layers, together with fully connected layers, precede the output layer, for a total of ten layers. The image size is \(224\times 224\times 3\) pixels. Feature extraction consists of a sequence of 3 convolutional layers, pooling layers, and activation functions (Barth et al. 2019). Filters of size \(3\times 3\) with strides are used in the convolution layers shown in Fig. 6. Each convolution layer extracts relevant information from an input image using filters with a set of weight parameters.
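The action of a single \(3\times 3\) filter with a stride can be illustrated on one channel as follows. The sketch assumes no padding and uses an averaging kernel as a stand-in for learned weights; the function name `conv2d` is hypothetical.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            top, left = r * stride, c * stride
            patch = image[top:top + kh, left:left + kw]
            out[r, c] = np.sum(patch * kernel)   # dot product of patch and filter
    return out

image = np.ones((224, 224))          # one channel of a 224 x 224 input
kernel = np.ones((3, 3)) / 9.0       # illustrative averaging filter
feature_map = conv2d(image, kernel, stride=1)
```

Without padding, the output side length is \((224-3)/s+1\) rounded down, so a stride of 1 gives 222 and a stride of 2 gives 111.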

Fig. 6 Basic architecture of InceptionNet

4.1 InceptionNet and ResNet V2

This work modifies the InceptionResNet-V2 model using the basic Inception and ResNet V2 CNN architectures. A detailed description of the InceptionNet and ResNet V2 models, which take a \(224\times 224\times 3\) tomato leaf image as input, is given below. In the InceptionNet architecture, the number of channels changes from 32 to 192, with corresponding up-sampling and max-pooling layers, shown as (\(28\times 28\times 32\)) to \((28\times 28\times 192\)). The feature map has been reduced to restore an input image, and revised as (\(28\times 28\times 64\)). Figure 6 depicts the architecture, in which a total of 256 layers are used: 4 up-sampling, 4 max-pooling, and 248 convolutional layers in the InceptionNet CNN model. The convolutional layers produce a feature map for an image imported into InceptionNet (width × height × 3) with a standardization process, and max-pooling reduces the feature maps to a specified size for the up-sampling operation. As the feature maps are large compared with the original input image, de-convolved features are required. The result is forwarded to the softmax classifier, which looks at the pixels of every class (Tran et al. 2019). Figure 7 illustrates this CNN model as a ResNetV2. ResNetV2 was originally developed for biological picture segmentation, but it has proven to be effective in a variety of other fields.

Fig. 7 Basic ResNet V2 architecture

There are two main reasons for using this CNN model. First, it can extract exhaustive features from local information through convolution layers. Second, it provides the best accuracy for a limited number of samples (Greene and Groenendyk 2021). The classical ResNetV2 model consumes large computational resources and is slow, so the proposed CNN model simplifies these factors.

4.2 Modified InceptionResNetV2 (MIR-V2)

The proposed work uses the Residual network (ResNet), which won first place in the ILSVRC and COCO competitions in 2015. The most commonly used deep CNNs in transfer learning are “AlexNet”, “GoogLeNet”, “VGG”, and “InceptionV3”. Deep CNNs have some drawbacks, such as vanishing gradients and degradation issues (Szegedy et al. 2017). The InceptionResNet-V2 deep convolutional neural network incorporates the residual connection and outperforms Inception-V3, Inception-V4, and InceptionResNet-V1. The standard input size for this architecture is \(224\times 224\times 3\). The main skeleton of the model is based on the inception model. The original inception model pools more times, so the low-level features of tiny targets fade or may vanish. As a result, the number of pooling layers was first reduced. In addition, to boost the detection rate, the model can be made smaller and the loss of small targets minimized while preserving accuracy. When the InceptionNet skip connection (SC) was applied, the spatial information of the same level was connected upward in the up-sampling process on the bottom layer, and the up-sampled result was then convolved (Le et al. 2020). Finally, as illustrated in Fig. 8, batch normalization was applied in each convolutional layer for data stability.
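The two mechanisms named here, the residual (skip) connection and batch normalization, can be sketched in NumPy. The block below is a generic residual unit and not the MIR-V2 block itself: it assumes a \(1\times 1\) convolution as the residual branch, batch normalization without learned scale and shift, and a ReLU activation; the function names are hypothetical.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalise each channel over the batch and spatial axes
    (no learned scale or shift in this sketch)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, weights):
    """y = relu(x + F(x)), where F is a 1x1 convolution followed by batch norm."""
    fx = batch_norm(x @ weights)       # 1x1 convolution == per-pixel matrix product
    return np.maximum(x + fx, 0.0)     # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 3))      # batch of 2, 4 x 4 pixels, 3 channels
w = rng.standard_normal((3, 3)) * 0.1      # illustrative 1x1 convolution weights
y = residual_block(x, w)
```

Because the input is added back unchanged, the block reduces to the activation of its input when the residual branch outputs zero, which is one reason skip connections ease the vanishing-gradient and degradation issues mentioned above.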

Fig. 8 Layout of Modified InceptionResNet-V2

Based on InceptionResNet-V2 and the skip connection approach, this study offers four different models. The acronyms used in the proposed CNN model are listed in Table 2, where “Convolutional Layer”, abbreviated CL, uses a \(3\times 3\) kernel map for padding. CL64 and CL128 represent 64 and 128 masks, respectively, while the kernel size of the outer layer is (1, 1). Furthermore, the sigmoid function, which maps the data to the range 0–1 for binary problems, was used as the activation. The following section introduces the full architecture, and Fig. 9 displays the entire model flow.

Table 2 Abbreviations used in proposed CNN model

Fig. 9 Flow chart of proposed Modified InceptionResNet-V2 model

4.3 MIR-V2 algorithm for tomato leaf disease

The input color tomato leaf image \({T}_{RGB}\) (\(224\times 224\times 3\)) is enhanced using robust principal component analysis (RPCA) to check the noise (\({T}_{{RGB}_{noise}}\)) and image quality against the original image (\({T}_{RGB}\)). The description is given in Algorithm 1.

Algorithm 1

The rank and quality of the image can be enhanced by achieving \(\underset{M,T}{\mathrm{min}}rank(M)\). A few images (\({T}_{RGB}\)) are already intensified, so there is no need to enhance their quality. The pre-processed image is first passed to the modified max-pool layers of basic IR-V2, and the max-pool layers are updated where layer (l) is within 256. This work updates the last 3 max-pool layers of basic InceptionResNet-V2 (3MPL-IRV2), where the input is \(C64 \left(3\times 3\times 256\right)\), and removes one skip connection, finally achieving \(C64 \left(3\times 3\times 255\right)\). This work checks the accuracy of the modified MIR-V2 algorithm against basic IR-V2, with MIR-V2 finally achieving the best accuracy.

4.4 Description of Modified Max Pooling in MIR-V2

4.4.1 3-Max pool layer InceptionResNet-V2 (3MPL-IR-V2)

To maintain the knowledge of microscopic targets, the number of layers is reduced which is directly based on the Inception model. As a result, little target data would be lost during pooling, and the number of weights needed to train the entire network may be reduced. Figure 10 shows the 3 max pool layer InceptionResNet-V2 architecture, which only has three layers of convolution after lowering the inception model. After each convolution, batch normalization was performed to ensure that the data was free of the gradient problem and that the activation function’s following layer was not rendered inactive or saturated. Max pooling has five layers, three of which were updated using the inception model and three of which were passed.
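The effect of reducing the number of pooling layers on spatial resolution can be illustrated as follows. The sketch assumes \(2\times 2\) max pooling with stride 2, which halves each side of the feature map; the pooling window is not stated in this section, and both function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on an H x W array (H and W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def size_after_pooling(size, n_pools):
    """Side length of the feature map after n_pools halvings."""
    return size // (2 ** n_pools)

pooled = max_pool_2x2(np.arange(16).reshape(4, 4))
sizes = {n: size_after_pooling(224, n) for n in (5, 3, 2)}
```

Under this assumption a \(224\times 224\) input shrinks to \(7\times 7\) after five pooling layers but only to \(28\times 28\) after three and \(56\times 56\) after two, so more of the spatial detail of small lesions survives.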

Fig. 10 Basic architecture of 3 max pool layer of InceptionResNet-V2

4.4.2 3-Max pool layer with skip connection (3MPL-SC)

After that, the inception skip connection (SC) was introduced into the 3-layer InceptionResNet network. With the skip connection, the InceptionResNet pooling depth was lowered by two layers (Papapanou et al. 2018). With this architecture, the spatial information of the sample is enhanced during up-sampling, which improves detection capability. Figure 11 shows the 3-layer skip-connection Inception-Res-Net network architecture.

Fig. 11 3 Max pool layers skip connection (SC) InceptionResNet-V2

4.4.3 2-Max pool-layer InceptionResNet-V2 (2MPL-IR-V2)

As stated in the previous section on the network model, the use of a skip connection and an additional pooling layer limited the number of convolutions in the convolutional layer. The purpose was to test whether some detection potential for leaf disease remained at a two-layer pooling depth, as shown in Fig. 12.

Fig. 12 2 layer skip connection (SC) InceptionResNetV2 architecture

4.4.4 2-Max pool layer with skip connection (2MPL-SC)

Figure 13 depicts the bottom layer in the encoding step of the 2MPL-SC network model, which was utilized to extract feature information from low-level data in the hopes of improving detection efficiency.

Fig. 13 2 Max pool layer skip connection (SC) InceptionResNet-V2 architecture

4.5 The loss function of MIR-V2

In a convolutional neural network, the loss function calculates the error rate of the model, estimated from the predicted values in the confusion matrix. A minimum error rate indicates a certain recognition capability for classification with cross-entropy (Liu et al. 2016). However, the MIR-V2 algorithm works with few images when classifying the diseased part of the data, which creates an imbalanced image data problem for training the model. We employed the following loss function strategy with adaptable data weights for the detection of lesions.

4.5.1 Binary cross-entropy function (CEF)

Cross-entropy is used as the loss function in CNNs, denoted CE (Bansal 2020). The binary form is expressed in Eq. (3):

$$BCE \left(j,\widehat{j}\right)= -\left(j\,\mathrm{log}\left(\widehat{j}\right)+\left(1-j\right)\mathrm{log}\left(1-\widehat{j}\right)\right)$$
(3)

where \(j\) represents the “ground truth” and \(\widehat{j}\) represents the predicted result for the disease data category. The loss for a perfect classification model is zero, and the log loss over all samples is the average of the individual sample losses. When the loss is estimated for two categories of equal importance but with extremely uneven sample counts, the category with few samples is likely to be ignored.
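The binary cross-entropy of Eq. (3) and its average over samples can be written as a short Python sketch. The clipping constant `eps`, which avoids taking the logarithm of zero, and the function names are implementation assumptions.

```python
import math

def binary_cross_entropy(j, j_hat, eps=1e-12):
    """Loss for one sample: j is the ground truth (0 or 1), j_hat the prediction."""
    j_hat = min(max(j_hat, eps), 1.0 - eps)     # keep the logarithms finite
    return -(j * math.log(j_hat) + (1 - j) * math.log(1 - j_hat))

def mean_bce(labels, predictions):
    """Log loss over all samples: the average of the per-sample losses."""
    losses = [binary_cross_entropy(j, p) for j, p in zip(labels, predictions)]
    return sum(losses) / len(losses)

loss_uncertain = binary_cross_entropy(1, 0.5)   # equals log(2)
loss_perfect = binary_cross_entropy(1, 1.0)     # close to zero
```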

4.5.2 Weighted cross-entropy function (CEF)

Pan et al. (2019) introduced the weighted cross-entropy function (WCE) to address unequal sample numbers. To raise the weight of the diseased tomato leaf class, WCE introduces a hyperparameter \(\gamma\) in the range (1, 2). Equation (4) gives the WCE value, where \(i\) is the ground truth and \(\widehat{i}\) is the prediction:

$$WCE\left(i,\widehat{i}\right)= -\left(\gamma\, i\,\mathrm{log}\left(\widehat{i}\right)+\left(1-i\right)\mathrm{log}\left(1-\widehat{i}\right)\right)$$
(4)
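The weighting can be sketched as follows, assuming the usual form of weighted cross-entropy in which the ground-truth label \(i\) multiplies the \(\gamma\)-weighted positive term. The function name, the default \(\gamma = 1.5\), and the clipping constant are illustrative assumptions.

```python
import math

def weighted_cross_entropy(i, i_hat, gamma=1.5, eps=1e-12):
    """Weighted cross-entropy for one sample.

    i      -- ground-truth label, 0 (healthy) or 1 (diseased)
    i_hat  -- predicted probability of the diseased class
    gamma  -- weight on the positive term; the text gives the range (1, 2)
    """
    i_hat = min(max(i_hat, eps), 1.0 - eps)  # assumed safeguard against log(0)
    return -(gamma * i * math.log(i_hat) + (1 - i) * math.log(1 - i_hat))

# A missed diseased pixel costs gamma times more than under plain cross-entropy,
# while the cost of a healthy pixel is unchanged.
print(weighted_cross_entropy(1, 0.2, gamma=2.0))
print(weighted_cross_entropy(0, 0.2, gamma=2.0))
```

With \(\gamma = 1\) the expression reduces to the binary cross-entropy of Eq. (3).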

4.5.3 Balanced cross-entropy function (CEF)

Balanced cross-entropy (BCE) calculates the loss with class weights (Li and Fine 2008): a weight \(\beta\) is applied to the positive samples and \(1-\beta\) to the negative samples, so that the contribution of the over-represented tomato image samples in IR-V2 and MIR-V2 training is suppressed, as shown in (5):

$$BCE\left(i,\widehat{i}\right)=-\left(\beta\, i\,\mathrm{log}\left(\widehat{i}\right)+(1-\beta )(1-i)\mathrm{log}\left(1-\widehat{i}\right)\right)$$
(5)
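A short sketch of the balanced variant follows, assuming the usual form in which the ground-truth label \(i\) multiplies the \(\beta\)-weighted positive term and \(1-i\) multiplies the \((1-\beta)\)-weighted negative term. Choosing \(\beta\) as the fraction of negative samples is a common heuristic and is an assumption here, not a setting reported in the paper; all names are illustrative.

```python
import math

def balanced_cross_entropy(i, i_hat, beta, eps=1e-12):
    """Balanced cross-entropy for one sample with class weight beta in (0, 1)."""
    i_hat = min(max(i_hat, eps), 1.0 - eps)  # assumed safeguard against log(0)
    positive_term = beta * i * math.log(i_hat)
    negative_term = (1 - beta) * (1 - i) * math.log(1 - i_hat)
    return -(positive_term + negative_term)

def beta_from_labels(labels):
    """Assumed heuristic: beta = share of negative samples, so the rare class weighs more."""
    return 1.0 - sum(labels) / len(labels)

labels = [0] * 9 + [1]           # hypothetical batch, 10% diseased
beta = beta_from_labels(labels)  # the rare diseased class receives the larger weight
print(beta, balanced_cross_entropy(1, 0.2, beta), balanced_cross_entropy(0, 0.2, beta))
```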

5 Results and discussions

The models were tested and trained with 400 and 300 instances of the dataset. This section discusses the results of the modified pre-trained model.

5.1 Training data accuracy

The raw images are \(256\times 256\times 3\) and are resized during pre-processing to \(224\times 224\times 3\). After pre-processing, the image window for the training dataset slides 70 pixels to the right and 60 pixels towards the left side, or to the left edge, to avoid overlapping image regions. Once a row has been traversed, the window moves down 70 pixels and returns to the left edge. This procedure was repeated until the entire image of the tomato field was covered. 70% of the image data was set aside for training, and each image of size \(224\times 224\times 3\) was analyzed separately.
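The raster-style traversal described above can be sketched as a generator of window positions. This is only an illustration under stated assumptions: the function name, the field-image size, and the use of a single horizontal step are hypothetical, and the text does not fully specify how the 60-pixel offset is applied.

```python
def window_origins(image_w, image_h, window=224, step_x=70, step_y=70):
    """Return the (x, y) top-left corners of every window, row by row.

    The window moves right by step_x until it no longer fits, then moves
    down by step_y and restarts at the left edge.
    """
    origins = []
    y = 0
    while y + window <= image_h:
        x = 0
        while x + window <= image_w:
            origins.append((x, y))
            x += step_x
        y += step_y
    return origins

# Hypothetical 364 x 294 field image: three columns and two rows of windows.
for origin in window_origins(364, 294):
    print(origin)
```

Note that with a 224-pixel window and a 70-pixel step, neighbouring windows share pixels; setting both steps equal to the window size would give strictly non-overlapping crops.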

Each image was \(224\times 224\times 3\), and 30% of the images were set aside as test data. Model training was carried out after the dataset had been separated into training and testing data, as illustrated in Fig. 14. First, the model was trained on the training data and its ground truth. The lesion-detection performance was then calculated against the ground truth of the testing data using the trained model parameters.

Fig. 14. A detailed description of the hybrid convolutional neural network model for data execution
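The 70/30 separation into training and testing data described in this section can be sketched as a shuffled split. The function name, the fixed seed, and the file-name pattern are illustrative assumptions; the paper does not state how the images were shuffled or selected.

```python
import random

def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle the items reproducibly and cut them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical image file names standing in for the 224 x 224 x 3 crops.
images = [f"leaf_{k:04d}.png" for k in range(1000)]
train_images, test_images = split_dataset(images)
print(len(train_images), len(test_images))
```

Keeping the test images strictly separate from the training images is what allows the test ground truth to measure detection performance on unseen data.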

5.2 Evaluation of detection results

In this work, accuracy at different operating points is measured with two methods, “ROC” and “AUC”. The first is the “receiver operating characteristic (ROC)” (Singh et al. 2022), which has attracted considerable interest in signal processing and medical diagnosis applications. ROC analysis measures the effectiveness of lesion detection in terms of the true positive rate and the false positive rate. The second is the “area under curve (AUC)”, which condenses the ROC curve into a single value and is likewise applied to medical images and other real-life images. In image processing, the ROC curve is used to assess the performance of the detector, as depicted in Fig. 15.

Fig. 15. Measure the effectiveness of four classes of tomato leaf
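To make the ROC and AUC measures concrete, the sketch below builds the curve from predicted scores by sweeping the decision threshold and integrates it with the trapezoidal rule. The labels and scores are made-up values, the function names are illustrative, and the code assumes that both classes are present in the data.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, threshold high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical lesion scores: 1 = diseased, 0 = healthy.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

An AUC of 1.0 corresponds to a detector that ranks every diseased sample above every healthy one, while 0.5 corresponds to random ranking.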

The global and local target detection results obtained with the IR-V2, 3MPL-IR-V2, 3MPL-SC, 2MPL-IR-V2, and 2MPL-SC deep learning models are shown in Fig. 16 and summarized in Table 3. The proposed modified CNN model (MIR-V2) achieved 98.92% accuracy.

Fig. 16. Resulting images of the global and local target detection results for IR-V2 and our proposed model

Table 3 Detection comparison of different classes in terms of accuracy

The modified MIR-V2 classifies eight classes of leaf images: Yellow_Leaf_Curl_Virus, Mosaic_Virus, Target_Spot, Spider_Mites, Septoria_Leaf_Spot, Leaf_Mold, Healthy, and Late_Blight. Each class is classified with a different accuracy. The pre-trained deep learning model InceptionResNet-V2 reaches an accuracy of 94.69% with a recall of 42.43%. After the number of layers in InceptionResNet-V2 is modified, the accuracy increases to 98.92%. Recall rises together with precision: precision increases from 75 to 85.27%, with a recall of 80.85%. Figure 17 shows the training and validation accuracy of the MIR-V2 CNN model.

Fig. 17. Training and validation accuracy of MIR-V2 CNN model (2MPL-SC)

5.3 Performance analysis

In this study, the outcomes of all methods were compiled into a confusion matrix and summarized with four measures: “Accuracy”, “Precision”, “Recall”, and “F1-score”. Accuracy is the share of test samples that the neural network classifies correctly. Precision is the share of the network's positive detections that are truly positive, as given in Eq. (6).

$$Precision= \frac{TP}{TP+FP}$$
(6)

The true positives (TP) are diseased tomato leaf pixels that are correctly predicted as diseased; the false positives (FP) are non-damaged or healthy pixels that are wrongly predicted as diseased; the false negatives (FN) are diseased pixels that are present in the ground-truth area of the leaf but are not identified by the algorithm; and the true negatives (TN) are healthy pixels that are correctly predicted as healthy. Recall is the share of the truly positive pixels that the neural network identifies, as given in Eq. (7):

$$Recall= \frac{TP}{TP+FN}$$
(7)

The F1-score is the harmonic mean of precision and recall, Eq. (8):

$$F1\text{-}Score= \frac{2\times Precision \times Recall}{Precision+Recall}$$
(8)
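Equations (6) to (8) translate directly into code. The counts below are hypothetical and serve only to show the arithmetic; they are not results from the paper, and the function names are illustrative.

```python
def precision(tp, fp):
    """Eq. (6): share of positive predictions that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (7): share of actual positives that are found."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Eq. (8): harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

# Hypothetical pixel counts for one diseased class.
tp, fp, fn = 80, 20, 10
p = precision(tp, fp)
r = recall(tp, fn)
print(p, r, f1_score(p, r))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model cannot obtain a high F1-score by maximizing precision alone or recall alone.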

In terms of the confusion matrix, a “True Positive” is a diseased tomato leaf classified as diseased, and a “True Negative” is a healthy leaf classified as healthy. A “False Positive” is a healthy leaf incorrectly diagnosed as diseased, and a “False Negative” is a diseased leaf incorrectly identified as healthy. The complete confusion matrix is given in Fig. 18.

Fig. 18. Confusion matrix of tomato leaf analysis
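For a multi-class confusion matrix, the four counts can be derived per class in a one-versus-rest fashion. The sketch below assumes that rows hold the actual class and columns the predicted class; the 3 x 3 matrix is hypothetical and does not reproduce the values in Fig. 18.

```python
def per_class_counts(matrix):
    """Return a dict of TP, FP, FN, TN for each class of a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    counts = []
    for k in range(n):
        tp = matrix[k][k]                               # class k predicted as k
        fn = sum(matrix[k]) - tp                        # class k predicted as another class
        fp = sum(matrix[r][k] for r in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                       # everything not involving class k
        counts.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn})
    return counts

def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total

# Hypothetical matrix for three classes (rows: actual, columns: predicted).
matrix = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
for k, c in enumerate(per_class_counts(matrix)):
    print(k, c)
print(overall_accuracy(matrix))
```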

This work proposes a CNN model (MIR-V2) that modifies the three max-pool layers, giving the enhanced variants 3MPL-IR-V2, 3MPL-SC, 2MPL-IR-V2, and 2MPL-SC shown in Figs. 10, 11, 12 and 13, respectively. The modified 2MPL-SC variant achieves the highest accuracy of all variants, 98.92%. Precision increases with accuracy and drops once, at 3MPL-IR-V2, as shown in Fig. 19. Before modification of the IR-V2 model, the maximum precision reaches 87.43%; after the modification, the model achieves a precision of 85.27% with the 2MPL-IR-V2 variant. Similarly, recall increases from 42 to 82% before modification of the model; after further changes, it remains constant through the last variant.

Fig. 19. Modified InceptionResNet-V2 evaluation graph

Table 4 compares pre-trained CNN models with the modified pre-trained InceptionResNet-V2. Different CNN-based pre-trained models have been used for the detection and classification of leaf diseases, but these models covered only a small number of classes. The C-GAN with DenseNet121 (Abbas et al. 2021) used 2 classes of tomato leaf images and achieved an accuracy of 97.92%, while other CNN-based pre-trained deep learning models achieved accuracies of 92.00% and 95.01%. A model such as AlexNet (Ashwinkumar et al. 2022) produces results with a smaller number of images and fast detection.

Table 4 Different pre-trained CNN models with an accuracy rate

The proposed model improves significantly on existing models; Table 4 compares the different models in terms of dataset, model name, and accuracy. The proposed model achieved an accuracy of 98.92%. It can also be applied to the diagnosis of medical images and to other research areas. Table 5 describes the structure of the proposed model, including the number of layers used in each model, the kernel size, and the map size. The four frameworks derived from the proposed model have different kernel sizes.

Table 5 Network structure description

The 2MPL-SC model with a kernel size of \(3\times 3\) achieved better accuracy than the other models. In this paper, the Modified InceptionResNet-V2 (MIR-V2), a CNN model, was used in conjunction with a transfer learning approach to recognize diseases in images of tomato leaves. The proposed model is trained on an open and self-collected dataset that includes seven tomato leaf disease classes as well as healthy leaves. It is therefore important to prepare an efficient hybrid CNN model that recognizes the different disease classes in images of tomato leaves and achieves maximum accuracy across them. It is also important to implement the model on a mixed-crop dataset with fewer dropouts and with suitable maximum epochs, batch size, and learning rate.

6 Conclusion

This study presents a neural network-based technique for improving the evaluation of the leaves of different tomato plants. The dataset comprised 9600 images covering leaf mold, late blight, septoria leaf spot, spider mites, target spot, mosaic virus, yellow curl virus, and a healthy category. Based on the findings, training with the transfer learning model becomes more mature than training the model from the ground up. Furthermore, a subsequent feature-reduction step should improve classification accuracy. Algorithms for image processing and classification have gained popularity. The dataset was cleaned by reducing the image resolution. The proposed Modified InceptionResNet-V2 (MIR-V2) CNN model achieved a high accuracy and F1-score of 98.92% and 97.94%, respectively, whereas the basic IR-V2 CNN model achieved a lower accuracy of 96.97%. The experimental results show that our suggested models outperform existing tomato leaf disease detection algorithms and state-of-the-art deep learning models in terms of overall performance. The dataset was divided using an 80:20 strategy. The fundamental drawback of this work is data availability, which is mitigated in part by incorporating a data augmentation stage. Similarly, selecting the most relevant features is critical; otherwise, the total classification accuracy may suffer. The approach did well in classifying grape leaf diseases, but a guaranteed, universal strategy for selecting the most valuable features is still absent. The plant leaf disease dataset can be expanded by increasing the diversity of plants and the number of classification classes. Another direction for future work is the CNN-based classification of tomato leaves by nutrient deficiency and disease.
Future studies will seek the most accurate approach and combine the detection technique with an accurate diagnosis technique.