
1 Introduction

Herbal plants represent a form of biodiversity that is frequently employed as an alternative to conventional medicine. Almost every part of a herbal plant, notably the leaves, can be used as an ingredient in traditional medicine, since leaves are simpler to acquire than fruit or roots. Herbal plants play a crucial part in maintaining human health [1]. Nearly 80% of people still rely on traditional medicine, given that medical treatment is not accessible to everyone and is expensive by comparison [2]. For ages, people have employed herbal plants to prevent and treat illness [3]. Because of the wide variety of medicinal plant species [4] and the difficulty in differentiating between them, the public is often unaware of which herbal plants exist around them. Recognizing and distinguishing between these kinds of herbal plants requires significant expertise and knowledge. It is therefore important to preserve knowledge of herbal plants so that people can more easily identify the different kinds and use them when necessary [5].

In general, plant species are still identified manually by comparing and recognizing photographs, particularly leaf photographs, against leaves of plants that are already known. Manual identification, however, leaves room for mistakes, because many varieties of herbal plants have nearly identical leaf color and similar texture and shape. Only botanists and other individuals with specialized knowledge of herbal plants are qualified to identify the different varieties in this manner.

Since not everyone has the requisite knowledge and experience in this area, the community views this method of introducing herbal plants as ineffective for differentiating between the various types of herbal plants [6].

Numerous researchers have used technological advancements to classify herbal plants by identifying their leaves. Identification from leaves is simpler because leaves are more prominent and accessible than roots [7], which are the part of the plant buried in the ground [8]. Consequently, a system for intelligent and precise herbal leaf identification is required, and most of the earlier research has been devoted to identifying leaves. Numerous classification techniques are employed in such identification systems to help users recognize herbal leaves, and thereby the types of herbal plants, without specialized botanical or anatomical knowledge.

2 Related Work

Recently, research has been done to identify various kinds of herbal leaves. The study in [9] selected therapeutic herbs in order to assess their content value, using computer vision-based feature extraction. In another experiment, [10] authenticated herbal Aralia plants using ITS2 sequences and multiplex SCAR markers. A classification of herbal leaves using the SVM method was carried out in a different study [11], in which the Scale Invariant Feature Transform (SIFT) was used to extract image features; SIFT features were chosen because they are robust to affine transformations, noise, and changing illumination. Laws' mask analysis with SVM as the classifier was also used to classify 5 different types of leaves [12], obtaining an accuracy of 90.27%. Using the CNN approach, the identification of Thai medicinal herbs was carried out in [13]. In this paper, we train our models on leaves of plants that are mainly found in India. Ten types of herbal plants are considered: Apta (Bauhinia racemosa), Vad (Ficus benghalensis), Indian Rubber Tree (Ficus elastica Roxb. ex Hornem.), Karanj (Pongamia pinnata), Kashid (Senna siamea (Lam.) Irwin & Barneby), Sita Ashok (Saraca asoca (Roxb.) Willd.), Pimpal (Ficus religiosa), Nilgiri (Eucalyptus globulus), Sonmohar (Peltophorum pterocarpum), and Villayati Chinch (Pithecellobium dulce) [14]. The approach uses the ResNet-18, ResNet-50, and MobileNet-V2 convolutional neural network architectures to build the models.

3 Architectures

In this paper, three architectures with two variations each are used: ResNet-18 with and without freezing, ResNet-50 with and without freezing, and MobileNet-V2 with and without freezing.

3.1 ResNet-18

In the article “Deep Residual Learning for Image Recognition” by Kaiming He et al., ResNet-18, a convolutional neural network architecture, was introduced [15]. It is one of the most compact versions of the ResNet model family, which is renowned for its excellence in image recognition tasks.

The ResNet-18 architecture has 18 weighted layers: 17 convolutional layers and one fully connected layer. The network can learn more effectively thanks to the architecture's use of residual connections, which let information propagate between layers without being lost. This is achieved by adding skip connections that bypass one or more layers of the network.

The first layers of ResNet-18, such as the convolutional layers, batch normalization, and activation functions, are comparable to those of other convolutional neural networks. What makes ResNet-18 distinctive is its use of residual blocks, each consisting of two convolutional layers together with a skip connection that bypasses them.
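
For illustration, a minimal sketch of such a residual (basic) block in PyTorch might look as follows; this is a simplified assumption of the block structure, not the full torchvision implementation.

```python
import torch.nn as nn


class BasicBlock(nn.Module):
    """Simplified ResNet-18-style residual block (sketch, not the torchvision code)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Skip connection: identity if shapes match, otherwise a 1x1 projection.
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # residual (skip) connection
        return self.relu(out)
```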

ResNet-18 has been applied to multiple image recognition tasks, such as classification, object detection, and segmentation, and has produced strong results on numerous benchmark datasets, including ImageNet, CIFAR-10, and CIFAR-100. Its compact size and relatively low computational complexity make it a popular choice for applications where computational resources are limited.

3.2 ResNet-50

ResNet-50 is a convolutional neural network that was first presented in the article "Deep Residual Learning for Image Recognition" by Kaiming He et al. [16]. It is one of the larger models in the ResNet family, which is renowned for its performance in image recognition tasks.

The ResNet-50 architecture has 50 weighted layers: 49 convolutional layers and one fully connected layer. Like ResNet-18, it makes use of residual connections, which help the network learn by passing information through the layers without losing it. This is achieved by adding skip connections that bypass one or more layers of the network.

ResNet-50 is deeper than ResNet-18 and can therefore learn more intricate features from the data. With its larger bottleneck residual blocks, more convolutional layers, and more filters, it can learn richer representations of the input data.
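
For comparison, a rough sketch of a ResNet-50-style bottleneck residual block is shown below; it is simplified relative to the torchvision implementation, with the expansion factor of 4 taken from the original ResNet paper.

```python
import torch.nn as nn


class Bottleneck(nn.Module):
    """Simplified ResNet-50-style bottleneck block (sketch only)."""

    expansion = 4  # output channels = mid_channels * expansion

    def __init__(self, in_channels, mid_channels, stride=1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),   # 1x1 reduce
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, bias=False),  # 3x3
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),  # 1x1 expand
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + self.shortcut(x))
```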

ResNet-50 has been applied to multiple image recognition tasks, such as classification, object detection, and segmentation, and has been shown to produce state-of-the-art results on numerous benchmark datasets, including ImageNet, COCO, and PASCAL VOC. Its size and computational complexity make it a suitable option for tasks that demand high accuracy and for which sufficient processing resources are available.

Overall, ResNet-50 is a potent and popular deep learning model that has shown promise in a variety of image identification applications.

3.3 MobileNet-V2

In 2018, Google unveiled MobileNetV2, a convolutional neural network architecture [17]. It is specifically made for embedded and mobile devices, and it is optimized for high precision and minimal computational expense.

MobileNetV2 builds on the original MobileNet architecture by combining depthwise separable convolutions with linear bottlenecks to lower the network's computational cost. In a depthwise separable convolution, a single filter is applied to each input channel first, and the resulting channels are then combined by a 1x1 pointwise convolution. Compared to conventional convolutions, this requires fewer parameters and less computation.
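
As a small illustrative sketch in PyTorch (channel sizes are arbitrary, and the ReLU6 activations and batch normalization are assumed, MobileNet-style, rather than stated in the text), a depthwise separable convolution can be written as a grouped 3x3 convolution followed by a 1x1 pointwise convolution:

```python
import torch.nn as nn


def depthwise_separable_conv(in_channels, out_channels, stride=1):
    """Depthwise 3x3 conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, 3, stride=stride, padding=1,
                  groups=in_channels, bias=False),      # depthwise: groups == in_channels
        nn.BatchNorm2d(in_channels),
        nn.ReLU6(inplace=True),
        nn.Conv2d(in_channels, out_channels, 1, bias=False),  # pointwise: mixes channels
        nn.BatchNorm2d(out_channels),
        nn.ReLU6(inplace=True),
    )
```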

Linear bottlenecks are used to avoid losing information as the data passes through the narrow, low-dimensional layers of the network: the final 1x1 projection in each block has no activation function. Each block consists of a 1x1 expansion convolution followed by a ReLU6 activation, a 3x3 depthwise convolution, and a final 1x1 linear projection convolution. This keeps the number of parameters and the computational cost low while still allowing the network to learn intricate features.

To make better use of the network's capacity, MobileNetV2 also introduces inverted residuals. An inverted residual places the shortcut connection between the narrow bottleneck layers, with an expansion layer in between that increases the number of channels; this is the inverse of a classical residual block, which places the shortcut around the wide layers. In this way the network can learn more complicated features without adding many parameters or much computation.
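
A simplified sketch of an inverted residual block with a linear bottleneck is shown below; the expansion factor of 6 is the default from the MobileNetV2 paper, and this is not the exact torchvision implementation.

```python
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Simplified MobileNetV2-style inverted residual block (sketch only)."""

    def __init__(self, in_channels, out_channels, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_channels * expand_ratio
        self.use_residual = stride == 1 and in_channels == out_channels
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 1, bias=False),            # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),                      # 3x3 depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_channels, 1, bias=False),           # 1x1 projection
            nn.BatchNorm2d(out_channels),                              # no activation: linear bottleneck
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```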

It has been demonstrated that MobileNetV2 can achieve state-of-the-art performance on a range of image recognition tasks, including segmentation, object detection, and classification. It is frequently used in embedded and mobile applications where high accuracy is needed but computational resources are constrained.

4 Experimental Setup

The leaf dataset consists of 3000 images across 10 classes, with 300 images per class. We trained our models on 70% of the data and reserved 30% for validation. Before applying the models, preprocessing is performed on the dataset. First, the input images are scaled to 224 × 224; then the RandomHorizontalFlip transformation flips each image horizontally with a probability of 0.5 [18], i.e., there is a 50% likelihood that each image will be mirrored, which changes the orientation but not the content of the image. Each image is converted from a NumPy array or PIL image object into a tensor, with every pixel represented as a floating-point value between 0 and 1 (0 denoting black and 1 denoting white). The Normalize transformation then normalizes the pixel values using mean and standard deviation values computed from the statistics of the dataset used to train the original model. This ensures that the input data have statistical features similar to the data the model was trained on, which can help improve the model's generalizability and accuracy. The dataset is organized into 10 folders, one per class, each containing 300 images of that class; a split function is used to divide the data into features (images) and class labels.

The model weights are initialized from IMAGENET1K_V1 [19]. "IMAGENET1K_V1 weights" refers to the pre-trained parameters of a deep learning model that were learned on the ImageNet-1K dataset for image classification. When a new deep learning model is created and trained on a similar dataset, these weights are often used to initialize its parameters. The weights, typically stored as a large file, comprise the learned parameters of the model, including the weights and biases of its various layers, as determined during training on the ImageNet-1K dataset. Before each update we set all gradients to zero, so that gradients from earlier iterations do not interfere with the gradients computed for the current batch of data. In the frozen variant, the entire network except the top layer is frozen: the parameters are frozen by setting "requires_grad = False", so that their gradients are not computed in backward(). For optimization, stochastic gradient descent (SGD) is applied. SGD is a variation of the gradient descent approach in which the model parameters are updated using small batches of data rather than the complete dataset at once; the parameters are adjusted in the direction of the negative gradient of the loss function with respect to the parameters.

$$\theta = \theta - \eta \, \nabla_{\theta} J\!\left(\theta; x^{(i)}; y^{(i)}\right)$$
(1)

where J(θ) is the objective function, \(\eta\) is the learning rate, and \(\nabla_{\theta}\) is the gradient of the objective function with respect to the parameters. Especially for big datasets, this makes computing the gradient faster and more efficient. The learning rate is a hyperparameter that controls how big a step is taken in the direction of the negative gradient. It is commonly set to a low value, such as 0.01 or 0.001, and can be changed during training to enhance the performance of the model. In this paper, the learning rate is set to 0.001 [20]. We use an LR scheduler to adjust the learning rate; a learning rate scheduler is a method for modifying the learning rate during training to enhance the model's performance [21], the learning rate being the hyperparameter that regulates the size of the parameter-update steps taken by the optimization method [22]. We trained the model for 25 epochs. The loss of the model is calculated using the cross-entropy loss [23], which combines the softmax function with the negative log likelihood loss. The softmax function turns a set of logits, or unnormalized scores, from the model's output into a probability distribution over the classes.

$$ f(s)_{i} = \frac{e^{s_{i}}}{\sum_{j}^{C} e^{s_{j}}}, \qquad CE = -\sum_{i}^{C} t_{i} \log\left(f(s)_{i}\right) $$
(2)

where \(s_i\) is the logit (score) for class i, e is the exponential, C is the number of classes, and \(t_i\) is the ground-truth label in one-hot encoded form. The negative log likelihood loss then measures the difference between the true probability distribution of the target class and the predicted probability distribution. Finally, the backbone is connected to a fully connected layer with 10 outputs, one per class.
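
As a minimal end-to-end sketch of this setup in PyTorch/torchvision: the input size, flip probability, 70/30 split, IMAGENET1K_V1 weights, layer freezing, SGD with learning rate 0.001, 25 epochs, and cross-entropy loss follow the description above, while the dataset path, batch size, momentum, scheduler step size, and ImageNet normalization statistics are illustrative assumptions rather than values reported in the paper.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

# Preprocessing: resize to 224x224, random horizontal flip (p=0.5),
# tensor conversion, and normalization (standard ImageNet statistics assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# One folder per class, 300 images each ("leaf_dataset" is a placeholder path).
dataset = datasets.ImageFolder("leaf_dataset", transform=preprocess)
train_size = int(0.7 * len(dataset))                 # 70/30 train/validation split
train_set, val_set = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# ResNet-18 with IMAGENET1K_V1 weights (torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Frozen variant: disable gradients for the backbone ...
for param in model.parameters():
    param.requires_grad = False
# ... and replace the head with a trainable fully connected layer with 10 outputs.
model.fc = nn.Linear(model.fc.in_features, 10)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()                    # softmax + negative log likelihood
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # momentum is an assumption
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)              # illustrative LR schedule

for epoch in range(25):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()            # clear gradients from the previous iteration
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()                 # SGD step: theta <- theta - lr * grad
    scheduler.step()                     # adjust the learning rate

    # Validation accuracy after each epoch.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch + 1}: val accuracy {100.0 * correct / total:.2f}%")
```

For the unfrozen variants, the freezing loop is simply omitted so that all layers are updated during training.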

5 Results

Using the above setup with frozen backbone layers, the ResNet-18 architecture is used to build the first model. Its accuracy stands at 88.16%, as shown in Fig. 1 (left). A variation of this model is also built, shown in Fig. 1 (right), in which the layers are not frozen, so backpropagation is not blocked; in this case the accuracy is 94.16%. The same setup is used with the ResNet-50 architecture, as in Fig. 2: the accuracy with and without freezing is 90.50% and 95.33%, respectively. In the case of MobileNet-V2, no bias terms are used and there is no additional fully connected layer; the accuracy with frozen layers is 92.83%, and 93.66% without freezing.

Fig. 1.

(Left) Accuracy w.r.t. the number of epochs using the ResNet-18 architecture without freezing. (Right) Accuracy w.r.t. the number of epochs using the ResNet-18 architecture with freezing.

Fig. 2.

(Left) Accuracy w.r.t. the number of epochs using the ResNet-50 architecture without freezing. (Right) Accuracy w.r.t. the number of epochs using the ResNet-50 architecture with freezing.

Fig. 3.

(Left) Accuracy w.r.t. the number of epochs using the MobileNet-V2 architecture without freezing. (Right) Accuracy w.r.t. the number of epochs using the MobileNet-V2 architecture with freezing.

Figure 3 shows the accuracy of MobileNet-V2 with respect to the number of epochs. The graphs above plot each model's accuracy in relation to the number of epochs. Training curves are shown up to 20 epochs because overfitting was observed beyond this point. ResNet-50 without freezing outperformed all the other models. Sample outputs of the models, identifying plants from their leaves, are shown in Fig. 4.

Fig. 4.

Sample predictions made by the above architectures.

The performance of all models, both with and without freezing layers, is summarized in the table below with respect to the number of parameters, epochs, the optimizer, and accuracy (Table 1).

Table 1. Model performance w.r.t. number of parameters

6 Conclusion and Future Scope

We were able to identify 10 different kinds of leaves from plants with valuable medicinal properties; most of these plants are used in Ayurveda for curing many diseases. Based on 3000 photos of medicinal plant leaves, we developed six image classification models (ResNet-18 with freezing, ResNet-18 without freezing, ResNet-50 with freezing, ResNet-50 without freezing, MobileNet-V2 with freezing, and MobileNet-V2 without freezing) for this study. The dataset of leaf images is publicly available. We found that the ResNet-50 architecture without frozen layers outperformed the other models. We used accuracy to measure the performance of the models and obtained 95.33% in this setting. This study can be extended by creating a mobile application and adding more classes of leaves.