1 Introduction

Thousands of plant species grow around us, and many of them are used as medicines. Traditional methods of preparing medications from medicinal plants, handed down by our forefathers, are used extensively in the pharmaceutical industry and in research, and they have saved many lives. To this day, many indigenous peoples and communities rely on medicinal plants. Plants are still identified manually on the basis of morphological characteristics (Begue et al. 2017). A professional botanist's knowledge is essential for recognising known plants and discovering new ones. People use their eyes, noses, hands, or other organs to assess the shape, colour, taste, and texture of the whole plant or of individual parts (leaf, flower, fruit, or bark), and then decide the species of a medicinal plant from reference material or experience (Kan et al. 2017). However, manual identification is time-consuming and difficult, and it depends heavily on the person's knowledge, experience, and skill. Many researchers have therefore studied the automatic classification of plants based on their characteristics, which makes it easier for non-specialists to recognise medicinal plants. With advances in image processing and pattern recognition, computer-based image identification is now widely used in practice, and most such systems rely on machine learning techniques.

Machine learning is a data analysis technique. It takes data as input, learns to identify patterns in it, and then makes decisions with little human intervention. Its performance improves with training. Machine learning is employed in a variety of fields, including medicine and finance. By automating parts of a system, it reduces people's workload.

Although the methods are very similar, the systems produced so far differ in the number of steps they use to automate classification. These steps include preparing the collected leaves, pre-processing the images to identify their distinguishing characteristics, classifying the leaves, building the database, training for identification, and finally evaluating the results. The most important factor in recognition accuracy is the ability to group similar objects together while distinguishing between different types of objects.

2 Brief Literature Review

Begue et al. (2017) used a computer vision algorithm to extract many shape-based features from the leaves of medicinal plants. Machine learning methods were then used to classify 24 plant species. The most accurate classifier was Random Forest, with an accuracy of 90.1%.

Dudi and Rajesh (2019) proposed a medicinal plant recognition system based on a CNN and machine learning, using 32 species and 1800 images from the Flavia dataset for both training and testing. They employed ANN, SVM, KNN, and Naïve Bayes (NB), among other machine learning techniques, and achieved an accuracy of 98%.

Kan et al. (2017) propose a strategy for classifying leaf images of 12 types of medicinal plants. First, after the leaf images are pre-processed, shape and texture features are extracted. Second, the classification performance of several models, namely the BP neural network (BPNN), the probabilistic neural network (PNN), the K-nearest neighbour (KNN) classifier, and the support vector machine (SVM), is compared experimentally. The proposed SVM approach, which uses both shape and texture features, proved effective and practicable for classifying images of medicinal plant leaves.

Venkataraman and Mangayarkarasi (2016) propose a vision-based technique for constructing an automated plant recognition system. The system is designed so that even a person unfamiliar with the field can identify the plants they see and learn their therapeutic properties. The work discusses the creation of the feature set, which is a key stage in recognising any plant species.

Shah et al. (2017) propose a leaf classification system based on a dual-path deep CNN. The dual-path CNN carries out the two main operations:

  i. Both shape and texture properties are examined.

  ii. The resulting features are optimised for classification.

The authors report that the method outperforms other CNN approaches, with an accuracy of about 99.28% on the Flavia dataset.

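| References | Classifier | Accuracy (%) | Dataset | Training | Testing | Species |
|---|---|---|---|---|---|---|
| 1 | Random Forest | 90.1 | NM | NM | NM | 24 |
| 2 | ANN, KNN | 98 | 1800 | 1400 | 320 | 32 |
| 3 | SVM | 93.3 | 1800 | 240 | 120 | 12 |
| 4 | Non-classifier | * | 5 | * | * | 1 |
| 5 | Dual-Path CNN | 99.28 | 6630 | 5500 | 800 | 29 |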

3 Medicinal Plant Recognition

3.1 Classification Method

(A) VGG16

VGG16 is a CNN model that has been pre-trained on images from the ImageNet dataset. Its name reflects its 16 weight layers. It is a very large network, with approximately 138 million parameters. Its convolutional layers use 3 × 3 filters with a stride of 1 and same padding, and its max-pooling layers use 2 × 2 filters with a stride of 2. This arrangement of convolution and max-pooling layers is followed consistently throughout the architecture. As illustrated in Fig. 1, VGG16 ends with three fully connected (FC) layers followed by a softmax. Each convolutional layer applies a fixed number of kernels, which varies across the network, to its input in a simple convolution operation. The first convolutional layers extract edge and colour features from the input images, and the outputs of these layers are the feature maps of the input images.
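The figure of roughly 138 million parameters can be checked with a short calculation. The sketch below assumes the standard VGG16 configuration (thirteen 3 × 3 convolutional layers with the channel widths listed, followed by fully connected layers of 4096, 4096, and 1000 units); the channel widths and the function names are illustrative assumptions rather than details given in this chapter.

```python
# Channel widths of the 13 convolutional layers in the standard VGG16
# configuration (assumed here; the text does not list them).
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
FC_UNITS = [4096, 4096, 1000]  # three fully connected layers, 1000 ImageNet classes


def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * in_ch * out_ch + out_ch


def fc_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def vgg16_param_count(input_channels=3, final_map=7):
    """Sum the trainable parameters of all 16 weight layers."""
    total = 0
    in_ch = input_channels
    for out_ch in CONV_CHANNELS:
        total += conv_params(in_ch, out_ch)
        in_ch = out_ch
    # After five 2 x 2 poolings a 224 x 224 input becomes 7 x 7 x 512.
    in_units = final_map * final_map * in_ch
    for out_units in FC_UNITS:
        total += fc_params(in_units, out_units)
        in_units = out_units
    return total


print(vgg16_param_count())  # 138357544
```

Most of the parameters sit in the first fully connected layer, which is why the network is so large despite its small 3 × 3 filters.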

Fig. 1 VGG16 architecture

The number and size of the filters used in a convolutional layer determine the size of its feature maps. The output of the last convolutional layer is the input to the first fully connected layer. The first two fully connected layers each have 4096 channels, whereas the third performs 1000-way ILSVRC classification and hence has 1000 channels. The softmax layer receives the output of the last FC layer and performs the classification by calculating a probability for each species. All hidden layers use the rectified linear unit (ReLU) nonlinearity.
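The role of the softmax layer can be illustrated with a few lines of plain Python. This is a minimal sketch, not the implementation used in the experiments; the three scores are made-up values standing in for the outputs of the last fully connected layer.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three species; the values are illustrative only.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most probable species
print(predicted, round(sum(probs), 6))
```

The predicted class is simply the one with the largest probability, so the softmax does not change the ranking of the scores; it only rescales them into a distribution.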

(B) VGG19

VGG19 takes an RGB image with a fixed size of 224 × 224 as input, so the input matrix has the shape (224, 224, 3). The only pre-processing is the subtraction of the mean RGB value from each pixel. The network uses kernels of size 3 × 3 with a stride of 1 pixel, which lets them cover the whole image. Spatial padding is used to retain the image's spatial resolution. Max pooling is performed over a 2 × 2 pixel window with a stride of 2. A rectified linear unit (ReLU) is then applied to introduce nonlinearity into the model, which improves classification and reduces computation time; earlier models depended on tanh or sigmoid functions, and ReLU proved considerably better. Finally, VGG19 has three fully connected layers: the first two have 4096 channels each and the third has 1000 channels for 1000-way ILSVRC classification. The last layer is a softmax function, as seen in Fig. 2.
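How the spatial resolution changes through the network follows from the usual output-size formula, floor((n + 2p - k) / s) + 1, for input size n, kernel k, stride s, and padding p. The sketch below traces a 224 × 224 input through the five blocks, assuming the standard VGG19 layout of 2, 2, 4, 4, and 4 convolutions per block and a padding of 1 pixel for the 3 × 3 convolutions; both are assumptions not spelled out in the text.

```python
def out_size(n, kernel, stride, padding):
    """Output width/height of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# Convolutions per block in the standard VGG19 layout (assumed: 2, 2, 4, 4, 4).
BLOCKS = [2, 2, 4, 4, 4]


def trace_sizes(n=224):
    """Return the feature-map size at the input and after each pooling layer."""
    sizes = [n]
    for convs in BLOCKS:
        for _ in range(convs):
            n = out_size(n, kernel=3, stride=1, padding=1)  # size preserved
        n = out_size(n, kernel=2, stride=2, padding=0)      # size halved
        sizes.append(n)
    return sizes


print(trace_sizes())  # [224, 112, 56, 28, 14, 7]
```

The 16 convolutional layers in `BLOCKS` plus the three fully connected layers give the 19 weight layers that the name refers to.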

Fig. 2 VGG19 architecture

(C) RESNET

A Residual Network (ResNet) is a convolutional neural network that can use hundreds or even thousands of convolutional layers. As seen in Fig. 3, ResNet adds identity (skip) connections that bypass stacked layers and reuse the activations from preceding layers, so a block that initially does nothing simply passes its input through. ResNet also relies heavily on batch normalisation, which normalises the inputs to each layer and thereby mitigates covariate shift and improves the network's performance. The identity connections protect the network against the vanishing gradient problem.
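The identity connection can be sketched in a few lines. In the simplified example below, small linear maps stand in for the convolutional layers of a real residual block and batch normalisation is omitted; the function names and the three-element input are illustrative assumptions.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply vector v by a square weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear maps with a ReLU between."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) = 0, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

Because the input is added back to the block's output, the gradient always has a direct path through the addition, which is the intuition behind the protection against vanishing gradients.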

Fig. 3 ResNet architecture

4 Experimental Result

(A) Data collection

A medicinal plant dataset from Mendeley Data is used. It includes Alpinia Galanga (Rasna), Citrus Limon (Lemon), Moringa Oleifera (Drumstick), and many other plant and herb species. The dataset contains about 1800 photos of 30 species. Each folder holds 50–100 high-quality photos and is named after the species it contains. The dataset is used to train and test the model. Sample images are shown in Fig. 4.

Fig. 4 Sample images of Mendeley dataset (Brassica Juncea, Basella Alba, Moringa Oleifera, and Citrus Limon)

(B) Experiment

Seventy per cent of the medicinal plant dataset is used for training and 30% for testing. The training set contains approximately 1284 images and the validation set approximately 551 images, covering 30 classes: Alpinia Galanga (Rasna), Amaranthus Viridis (Arive-Dantu), Artocarpus Heterophyllus (Jackfruit), Azadirachta Indica (Neem), Basella Alba (Basale), Brassica Juncea (Indian Mustard), Carissa Carandas (Karanda), Citrus Limon (Lemon), Ficus Auriculata (Roxburgh fig), Ficus Religiosa (Peepal Tree), Hibiscus Rosa-sinensis, Jasminum (Jasmine), Mangifera Indica (Mango), Mentha (Mint), Moringa Oleifera (Drumstick), Muntingia Calabura (Jamaica Cherry-Gasagase), Murraya Koenigii (Curry), Nerium Oleander (Oleander), Nyctanthes Arbor-tristis (Parijata), Ocimum Tenuiflorum (Tulsi), Piper Betle (Betel), Plectranthus Amboinicus (Mexican Mint), Pongamia Pinnata (Indian Beech), Psidium Guajava (Guava), Punica Granatum (Pomegranate), Santalum Album (Sandalwood), Syzygium Cumini (Jamun), Syzygium Jambos (Rose Apple), Tabernaemontana Divaricata (Crape Jasmine), and Trigonella Foenum-graecum (Fenugreek). The dataset is then fed into the model, which is trained and assessed under several hyperparameter settings, namely the number of epochs, the batch size, and the optimiser.
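The 70/30 split can be sketched as follows. This is an illustration rather than the code used in the experiment: the total of 1835 images is simply the sum of the two set sizes reported above, and the function name, the random seed, and the use of a plain shuffled split (rather than, say, a per-class split) are assumptions.

```python
import random


def split_indices(n_images, train_fraction=0.7, seed=0):
    """Shuffle image indices and split them into training and held-out sets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = int(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]


# 1835 is the sum of the reported set sizes (1284 + 551); the seed is arbitrary.
train_idx, val_idx = split_indices(1835)
print(len(train_idx), len(val_idx))  # 1284 551
```

Fixing the seed keeps the split reproducible across runs, which matters when different models and hyperparameter settings are compared on the same held-out images.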

The experiment is carried out using Google Colab on a machine with an Intel Core i3 processor, 8 GB of RAM, and 2 GB of AMD M3 graphics.

We aimed to evaluate classification with different convolutional neural networks. VGG19 reaches an accuracy of 0.9964, but its loss is higher than that of VGG16. RESNET50 also performed well, with an accuracy of 0.9951, but because of its large loss it is not a suitable classifier for this experiment. Finally, VGG16 performs best with the Adam optimiser, 20 epochs, and a batch size of 32, where it reaches an accuracy of 0.9952. The results of the different CNNs are given in Table 1.

Table 1 Results of the different CNN models

RESNET

Out of all the experiments in RESNET50, we find that opt = adam, epoch = 6, batch_size = 64 has the highest accuracy, i.e. 0.9951.

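VGG16

| Optimiser | Epoch | Batch_size | Loss | Accuracy | Val_loss | Val_acc |
|---|---|---|---|---|---|---|
| SGD | 6 | 32 | 1.9052 | 0.4877 | 1.8079 | 0.3891 |
| SGD | 6 | 64 | 0.2303 | 0.9426 | 0.2501 | 0.9339 |
| SGD | 10 | 32 | 0.1398 | 0.9776 | 0.1686 | 0.9572 |
| SGD | 10 | 64 | 0.2728 | 0.9309 | 0.2265 | 0.9416 |
| SGD | 20 | 32 | 0.3474 | 0.9124 | 0.3055 | 0.9339 |
| SGD | 20 | 64 | 0.2019 | 0.9611 | 0.5116 | 0.8911 |
| Adam | 6 | 32 | 0.0828 | 0.9776 | 0.1253 | 0.9689 |
| Adam | 6 | 64 | 0.0549 | 0.9864 | 0.1198 | 0.9611 |
| Adam | 10 | 32 | 0.0776 | 0.9747 | 0.1851 | 0.9533 |
| Adam | 10 | 64 | 0.0093 | 0.9971 | 0.1147 | 0.9650 |
| Adam | 20 | 32 | 0.0290 | 0.9952 | 0.0749 | 0.9728 |
| Adam | 20 | 64 | 0.0129 | 0.9971 | 0.1524 | 0.9533 |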

Out of all the experiments in VGG16, we find that opt = adam, epoch = 20, batch_size = 32 gives the best result, with an accuracy of 0.9952 and the highest validation accuracy (0.9728).

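VGG19

| Optimiser | Epoch | Batch_size | Loss | Accuracy | Val_loss | Val_acc |
|---|---|---|---|---|---|---|
| SGD | 6 | 32 | 0.0546 | 0.9883 | 0.1148 | 0.9650 |
| SGD | 6 | 64 | 0.0273 | 0.9871 | 0.1128 | 0.9767 |
| SGD | 10 | 32 | 0.0062 | 0.8948 | 0.0635 | 0.9767 |
| SGD | 10 | 64 | 0.0100 | 0.9971 | 0.0774 | 0.9728 |
| SGD | 20 | 32 | 0.0147 | 0.9987 | 0.1808 | 0.9300 |
| SGD | 20 | 64 | 0.0363 | 0.9964 | 0.1805 | 0.9705 |
| Adam | 6 | 32 | 0.1512 | 0.9591 | 0.7713 | 0.8599 |
| Adam | 6 | 64 | 0.2317 | 0.9292 | 0.6767 | 0.8282 |
| Adam | 10 | 32 | 0.0202 | 0.9971 | 0.5598 | 0.9572 |
| Adam | 10 | 64 | 0.0192 | 0.9971 | 0.4418 | 0.9494 |
| Adam | 20 | 32 | 0.1231 | 0.9640 | 0.3721 | 0.9339 |
| Adam | 20 | 64 | 0.0772 | 0.9734 | 0.4292 | 0.8949 |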

Out of all the experiments in VGG19, we find that opt = SGD, epoch = 10, batch_size = 32 has the highest accuracy, i.e. 0.9964.

Figs. 5 and 6 show that VGG16 is the best model: with the Adam optimiser, 20 epochs, and a batch size of 32, its accuracy and validation accuracy are higher than those of the other models.
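One way to make this kind of selection explicit is to rank the configurations by validation accuracy. The sketch below does so for the VGG16 rows obtained with the Adam optimiser, copied from Table 1. The selection rule (highest validation accuracy, with lower validation loss as a tie-break) is an assumption for illustration; the chapter does not state the exact criterion that was used.

```python
# VGG16 rows with the Adam optimiser, copied from Table 1:
# (epochs, batch_size, loss, accuracy, val_loss, val_acc)
VGG16_ADAM = [
    (6, 32, 0.0828, 0.9776, 0.1253, 0.9689),
    (6, 64, 0.0549, 0.9864, 0.1198, 0.9611),
    (10, 32, 0.0776, 0.9747, 0.1851, 0.9533),
    (10, 64, 0.0093, 0.9971, 0.1147, 0.9650),
    (20, 32, 0.0290, 0.9952, 0.0749, 0.9728),
    (20, 64, 0.0129, 0.9971, 0.1524, 0.9533),
]


def best_by_val_acc(rows):
    """Return the row with the highest validation accuracy (ties: lower val_loss)."""
    return max(rows, key=lambda r: (r[5], -r[4]))


epochs, batch_size, _, acc, val_loss, val_acc = best_by_val_acc(VGG16_ADAM)
print(epochs, batch_size, acc, val_acc)  # 20 32 0.9952 0.9728
```

Ranking on validation rather than training accuracy favours the configuration that generalises best to unseen images; under this rule the selected row also has the lowest validation loss of the six.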

Fig. 5 Accuracy of the models at different epoch and batch size using adam optimiser

Fig. 6 Comparison of all the models at their best

5 Conclusion

In this work, we evaluated different CNN models for classifying medicinal plants and found that, of the models used, VGG16 performed best in terms of accuracy, loss, validation loss, and validation accuracy. Moreover, because a CNN extracts features on its own, it makes medicinal plant recognition practical for real-life applications.

In the future, other state-of-the-art models could be used to improve accuracy. Furthermore, alternative machine learning algorithms could be applied and their performance compared with that of the CNNs evaluated here.