
1 Introduction

Plants are necessary for all life on Earth because they supply food, oxygen, and medicines. A thorough understanding of medicinal plants is essential to aid the identification of new medicinal plant species and thereby support the pharmaceutical industry, the environment, and agricultural production and sustainability. According to the Botanical Survey of India, India possesses 8000 medicinal plant species. India is the main supplier of medicinal herbs, with a global herbal trade of $120 billion. Medicinal plants are considered fairly safe to use, with few or no side effects; a great advantage is that these treatments are in harmony with nature. Medicinal plants provide a rich source of chemicals that can be used in the production of pharmacopoeial, non-pharmacopoeial, and synthetic drugs. Changes in leaf characteristics, for example, are used by botanists and agronomists as a comparison tool in their plant studies, because leaf traits in deciduous plants, annual plants, and evergreen perennials can be observed and analyzed throughout the year. Despite several attempts (including powerful computer vision algorithms), medicinal plant identification remains a challenging and unsolved problem in computer vision. India is a biologically diverse country, and several articles have suggested potential solutions to the problem. Medicinal plant identification has been advocated for medication development, drug research, and the preservation of rare medicinal plant species. The first component of this survey investigates the current methods used by agronomists to identify medicinal plants. In Sects. 2 and 3, we classify papers related to medicinal plant recognition by the type of data utilized to identify the medicinal plants, namely structured and unstructured data. For identifying medicinal plants using structured data, approaches such as SVM, extreme learning machines, artificial neural networks, K-nearest neighbors, and random forests [1,2,3,4,5,6] have been applied to recognize medicinal leaves. Convolutional neural networks with various architectures such as MobileNet, AlexNet, and VGG-16 [7,8,9], among many other approaches, have been tried for identifying medicinal plants using unstructured data such as images. In this paper, we look at all the prior strategies and assess how effective they were in overcoming this obstacle. The survey's last section summarizes what we learned from this body of research, as well as how the methodologies have evolved over time.

2 Different Approaches Based on Data Type Format

2.1 Structured Data

Kan et al. [1] propose an automatic identification method using leaf images of medicinal plants. The leaf is first photographed in color (Fig. 1).

Fig. 1 Experiment flowchart (source: classification of medicinal plant leaf image based on multi-feature extraction, p. 2)

The images are sharpened and denoised after the petiole is removed. After grayscale conversion, the images are binarized. Edge, texture, and shape features are gathered. Threshold segmentation is used to separate the leaves from the background and obtain the leaf contours: pixels with values greater than the threshold Q (Q = 0.7) are set to white, and the rest to black. From the medicinal plant leaf images, the approach derives shape features (SF) as well as five texture features (TF). The geometric shape features include eccentricity, rectangularity, and circularity. Haralick et al. introduced the GLCM, a well-known second-order statistical method for characterizing the texture qualities of images. The main concept behind this approach is to build a matrix based on the probability that two pixels satisfy a particular displacement relationship given the gray values of the image; the matrix is defined by the distance and angle between the two pixels. The GLCM can thus be used to characterize the texture features of medicinal plant leaf images. The five texture features are contrast, entropy, correlation, energy, and inverse difference moment. Invariant moments help to capture an image's spatial characteristics: the moment invariant is a valuable indicator for describing image content since it exhibits translation, rotation, and scale invariance. In this experiment, seven invariant moments are used to characterize the leaf shape characteristics of the medicinal plants. The researchers utilized 240 leaf photographs from 12 different medicinal plants, splitting each plant's 20 leaf photographs into two groups: 10 training and 10 testing samples. An SVM model was then trained using the 120 training examples before being utilized to categorize the 120 testing samples. The second and third steps were repeated several times to obtain the average recognition rate and classification result. BP network, PNN, and KNN classifiers were also used in comparative experiments. The average recognition rate of all four classifiers exceeds 70% when using only the 10 shape features (the three geometric features plus seven invariant moments), with PNN highest at 78.3%. When just the five texture features (TF) are utilized, the average identification rate surpasses 50% for all classifiers, with the SVM classifier highest at 58.3%. When both texture and shape features (TF + SF) are used, the average detection rate of the four classifiers improves significantly, and the SVM classifier achieves a detection rate of 93.3%, higher than the other three classifiers. The findings show that medicinal plants may be automatically categorized by combining multi-feature analysis of leaf images with an SVM. This publication gives a valuable conceptual framework for the research and development of classification models for medicinal plants.
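As a concrete illustration of this TF + SF pipeline, the sketch below computes GLCM texture statistics and basic shape descriptors and feeds them to an SVM, using scikit-image and scikit-learn. The variable names (`gray_images`, `binary_masks`, `labels`) and the GLCM settings are assumptions, not the authors' code, and scikit-image's homogeneity property stands in for the inverse difference moment.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import label, regionprops
from sklearn.svm import SVC

def texture_features(gray_u8):
    # GLCM at distance 1, angle 0, on a uint8 image; five statistics as above
    glcm = graycomatrix(gray_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    tf = [graycoprops(glcm, p)[0, 0]
          for p in ("contrast", "correlation", "energy", "homogeneity")]
    p = glcm[:, :, 0, 0]
    tf.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))  # GLCM entropy
    return tf

def shape_features(binary):
    # the largest connected region is assumed to be the leaf
    r = max(regionprops(label(binary)), key=lambda reg: reg.area)
    minr, minc, maxr, maxc = r.bbox
    circularity = 4 * np.pi * r.area / r.perimeter ** 2
    rectangularity = r.area / ((maxr - minr) * (maxc - minc))
    return [r.eccentricity, circularity, rectangularity]

# gray_images, binary_masks, labels are assumed to be prepared upstream
X = np.array([texture_features(g) + shape_features(b)
              for g, b in zip(gray_images, binary_masks)])
clf = SVC(kernel="rbf").fit(X, labels)
```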

The Flavia leaf dataset (which comprises 1907 leaf photos belonging to 32 distinct plant species) is utilized in Muammer Turkoglu and Davut Hanbay's research article [2]; the authors offer a method that extracts attributes from segmented leaf parts rather than from the full leaf. The overall image's features are then established by merging the information obtained from each piece. Examples include vein-based features, GLCM, color, and Fourier descriptors, which together determine the characteristics of each leaf. The extreme learning machine (ELM) approach was used to classify and test these feature parameters. Leaf pictures were preprocessed by removing the background and converting the RGB photographs to gray scale. To segment the leaf pictures, suitable thresholding settings were identified by trial and error. Erosion and filling operations were also used to reduce potential pixel blockages. The image data are normalized once the characteristics have been extracted. ELM hidden-layer input weights and biases are initialized randomly and, unlike in traditional networks, remain fixed throughout the process; the output weights are determined by the least squares method. The 1907 leaf samples were randomly partitioned before each fold of a ten-fold cross-validation model. With this strategy, an average success rate of 98.36% was reached, and a comparison against ANN and LS-SVM was made. The suggested approach minimized the influence of deformations on the identification and classification performance of certain plant leaves. Unlike earlier studies, the presented strategy addresses these problems and can be improved by adding more feature extraction methods.
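The ELM described here is simple to state in code: the hidden weights are random and fixed, and only the output weights are solved, by least squares. Below is a minimal NumPy sketch under those assumptions; `X` and the one-hot target matrix `Y` are assumed to be the normalized feature vectors and labels described above.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random, fixed hidden weights;
    output weights solved in closed form by least squares."""
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        # hidden-layer weights and biases are drawn once and never updated
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                    # hidden activations
        self.beta, *_ = np.linalg.lstsq(H, Y, rcond=None)   # least squares
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# With Y as a one-hot class matrix, argmax of predict() gives the class index.
```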

In the experiment done by Yigit et al. [3], an ANN, the NB algorithm, the random forest algorithm, KNN, and SVM were used for autonomous identification of leaves. A total of 637 leaves from 32 distinct plants were used in this investigation. Image processing methods were used to extract the visual features (VF) of every leaf: 22 VF in four groups covering texture, pattern, dimension, and color. To explore the impact of these groups on classification performance, 15 alternative combinations of the four groups were created. The models were then trained using data from 510 plant leaves, and their accuracy was calculated using data from 127 leaves. The SVM model is the most accurate identifier for all category combinations, according to the testing, with an accuracy of 92.91%. With an accuracy of 87.40%, a combination of the D#6, C#6, and P#5 groups comes in second. Because employing the fewest number of VF is a significant aspect of the classification process, the correlation-based feature selection (CFS) method is utilized to select the 16 most effective VF for identification. The SVM model also delivers the best results for these 16 VF, with an accuracy of 94.49%. The efficiency of the proposed method was then put to the test on diseased and defective leaves: 637 healthy leaves were combined with 33 damaged or infected ones. With 536 randomly selected leaves for training (80% of all leaves) and 134 leaves for testing, the SVM model achieved an accuracy of 92.53%. The P#5 feature group is demonstrated to be the most effective feature group quantitatively in this investigation. Furthermore, the edge Fourier transform feature is shown to be the most effective attribute in the P#5 group. The findings imply that, if AI models are built and trained appropriately, they may reliably identify plants even when samples are diseased or damaged.
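scikit-learn does not ship a CFS implementation, so the sketch below substitutes a univariate `SelectKBest` filter (a plainly different, simpler selector) only to show the shape of the pipeline: select 16 of the 22 features, then classify with an SVM. `X` and `y` are assumed feature and label arrays.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: 22 visual features per leaf, y: species labels (assumed prepared)
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=16),  # keep 16 features, as in the study
    SVC(kernel="rbf"),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```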

In their paper, Azadnia and Kheiralipour [4] offer an ANN for categorizing medicinal plants. Forty sites were chosen for the investigation, with four plant specimens obtained from each. Six collections of medicinal plants were collected, numbered, and characterized from A1 to A6. The photos were preprocessed before the features were extracted. Preprocessing included picture segmentation, which divides the sample photos into various segments and treats just the desirable portions of the images. Certain texture, shape, and color features were recovered following the selection of significant attributes. Before extracting the features, the image processing approach created different color spaces for the photographs, such as I1I2I3, HSI, CrCgCb, and NRNGNB. To discover crucial color-based features, the skewness, kurtosis, mean, variance, and standard deviation metrics were extracted from the investigated medicinal plant photographs. Homogeneity, entropy, energy, and correlation are the texture-related metrics that were quantified; the texture properties were assessed after computing the GLCM of the pictures. Following feature extraction, the best features were picked for classification, and the medicinal plants under investigation were identified using effective attributes gleaned from the sample images. An ANN was utilized for the categorization of the many types of medicinal plant samples. To analyze the ANN models, the obtained characteristics were divided into training, testing, and validation sets, comprising 60, 20, and 20% of the dataset, respectively. The ANN models were assessed using statistical metrics such as the correlation coefficient of the test data (r), CCR, and MSE. The results showed that the algorithm can properly categorize different types of medicinal plants. In agricultural commodities, texture, shape, and color extraction algorithms have been widely and successfully used for several applications, such as classification and identification. The design and implementation of robust algorithms for color, texture, and shape extraction was a major contribution of the study. As a result, the integrated machine vision system has strong potential for categorizing and identifying various agricultural goods based on valuable texture, shape, and color features.
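A minimal sketch of the five color statistics named above (mean, variance, standard deviation, skewness, kurtosis), computed per channel with SciPy. The input array layout is an assumption, and the same function applies unchanged to any of the color spaces listed once the image has been converted.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def color_moments(img):
    """Five statistics per channel for an H x W x C image array."""
    feats = []
    for ch in range(img.shape[-1]):
        v = img[..., ch].ravel().astype(np.float64)
        feats += [v.mean(), v.var(), v.std(), skew(v), kurtosis(v)]
    return np.array(feats)
```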

In their paper, Pacifico et al. [5] report an automated classification of medicinal plant species based on color and texture features. The first step was to collect images of a variety of medicinal plant species, which were subsequently processed. Preprocessing began with background removal; the RGB images were then converted to grayscale and, in turn, to binary images. After preprocessing, several color properties are extracted, including mean, standard deviation, skewness, and kurtosis. The gray-level co-occurrence matrix (GLCM) is used to compute textural properties for each leaf image segment; the GLCM counts how many times certain gray-level combinations appear in an image. Weighted K-nearest neighbors (WKNN), random forest (RFC), and a multilayer perceptron trained with backpropagation (MLP-BP) were used to classify the medicinal plant species. The trials employed a ten-fold cross-validation method. The authors found that, in contrast to the other classifiers selected from the literature, RFC and MLP-BP achieve the highest scores in all four classification metrics, with MLP-BP doing marginally better than RFC, and that the DT classifier surpassed all KNN and WKNN variants. They also observed that as the k value increases, both KNN and WKNN perform worse. After conducting an overall assessment that considered both classification metrics and average execution time, the authors concluded that the random forest classifier would be the best choice among the selected classifiers for building an automatic classification system for the proposed dataset.
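The comparison protocol described above is straightforward to reproduce with scikit-learn: the sketch below runs ten-fold cross-validation over RFC, a distance-weighted KNN (WKNN), and an MLP. The hyperparameters are illustrative assumptions, and `X`, `y` are the color/GLCM feature vectors assumed prepared upstream.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
    "WKNN": KNeighborsClassifier(n_neighbors=5, weights="distance"),
    "MLP-BP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                            random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # ten-fold CV, as in the paper
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```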

The major goal of the study by Sharma et al. [6] was to use image processing techniques to identify leaves. The authors first acquired the photographs, then converted them to grayscale images using the equation:

$$ C_{{{\text{gray}}}} = 0.2989 * C_{{{\text{red}}}} + 0.5870 * C_{{{\text{green}}}} + 0.1140 * C_{{{\text{blue}}}} $$
(1)

where C denotes a channel, and red, green, and blue denote the different color channels of the picture.

The distinctive properties of leaves help to identify them. The histogram of oriented gradients (HOG) approach was used for this. The essential idea behind HOG is that locally distributed gradients of light intensity are an effective way to describe an object's shape and structure. The image may be divided into cells representing equally spaced, separated regions. These regions can be grouped into blocks, which are then normalized to preserve consistency in the face of illumination or photometric changes; normalization of the extracted properties is calculated across blocks. The 900 parameters calculated include length, width, aspect ratio, perimeter, form factor, circularity, and compactness. K-nearest neighbors and the artificial neural network backpropagation algorithm are the two supervised learning methodologies applied in the final stage. K-nearest neighbors achieved a 92.3% accuracy rate. When the two methodologies were compared, artificial neural networks were found to be the better option, with 97% accuracy.
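A short sketch of this pipeline using scikit-image: grayscale conversion per Eq. (1) followed by HOG extraction. The HOG cell/block settings are assumptions, and `image` is an assumed RGB array.

```python
import numpy as np
from skimage.feature import hog

def to_gray(rgb):
    # Eq. (1): weighted sum of the R, G, B channels
    return (0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1]
            + 0.1140 * rgb[..., 2])

# image: an assumed H x W x 3 RGB array; descriptor feeds KNN or an ANN
descriptor = hog(to_gray(image), orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), block_norm="L2-Hys")
```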

2.2 Unstructured Data

In the article by Varghese et al. [10], a system was created to detect plants using transfer learning. Six medicinal plant classes were considered in the study, including Coleus Aromaticus, Annona Muricata, Hemigraphis Colorata, Aloe Barbadensis Miller (Aloe vera), and Azadirachta Indica (neem), with about 500 photographs per class. Feature extraction is performed using a convolutional neural network (CNN), and once the features have been obtained, the model is trained using the MobileNet architecture. Additional layers such as pooling layers, fully connected layers, and normalization layers are referred to as hidden layers, since their inputs and outputs are masked by the activation function, and the final convolution is often followed by a ReLU layer. Backpropagation is frequently applied through the final convolution, yielding increasingly accurate weights. The input image is processed by several kernels in the convolution layer, each with its own function: some filters handle color identification, while others handle edge detection. Lines, corners, and other basic elements are discovered in the early layers, while complete contours, the object's shape, and other high-level features are revealed in the final layers. More features are obtained as the number of convolutional layers increases, and these data are arranged into feature maps. After each convolutional layer, pooling is performed; for instance, a 2 × 2-pixel filter slides across the feature map, and the maximum value of those four pixels is saved in one pixel of the following layer. This reduces the dimensions and allows a more focused examination of key traits. In a fully connected layer, each neuron is connected to every other neuron, and the result is obtained from the last layer. The trained model is then converted to a TensorFlow Lite-compatible network using the TensorFlow Lite converter, and the plant name is output as the label with the highest probability. The model was trained in two stages of 12 epochs each. The resulting system was 95% accurate, which is comparable to human performance on the dataset.
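A minimal Keras sketch of the transfer learning recipe described above: a frozen MobileNet base, a new softmax head for the six classes, and conversion with the TensorFlow Lite converter. `train_ds` is an assumed `tf.data` pipeline of (image, label) pairs, and the optimizer and loss choices are illustrative.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNet(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                              # reuse ImageNet features
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(6, activation="softmax"),  # six plant classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=12)                      # one 12-epoch stage

# Convert for on-device inference, as in the paper's TensorFlow Lite step
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
open("plants.tflite", "wb").write(tflite_model)
```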

Pechebovicz et al. [7] suggest a smartphone app to help universities, students who have never encountered the species, and healthcare professionals recognize Brazilian medicinal plants. They collected 10,162 downloadable photographs encompassing different stages of plant life, visual backdrops, and ecosystems. The dataset was created using a Google photo download service that retrieves photographs based on keywords or key phrases, guided by the Brazilian Ministry of Health's list of medicinal and common hazardous plants. Data augmentation methods such as rotation, cropping, mirroring, random zooms, random distortions, and skewing were applied using Python's Augmentor module. After augmentation, the total number of photographs rose to 151,128, with 5230 in the hazardous class and a target of 2000 in each of the other classes. The authors employed the MobileNetV2 architecture with initial weights learned from the ImageNet dataset (width multiplier 1, input size 224). The model has a total of 2,629,032 parameters, with 371,048 trainable and 2,257,984 non-trainable. By training only the top layers of MobileNetV2 with the new classifier, the model was tuned to increase performance. Layers 101 through 155 were then unfrozen and the model was retrained; this stage has 2,233,640 trainable and 395,392 non-trainable parameters. Several epoch counts (5, 10, 20, 30) were used in both the training and fine-tuning phases, and a comparison was made. The final output was fine-tuned after 20 epochs of training. The overall validation accuracy was 0.4254, while the top-5 accuracy was 0.685. The model embedded in the smartphone uses the weights from the final training and fine-tuning step.
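The two-stage fine-tuning can be sketched as follows, assuming Keras: first train only the new head on a frozen MobileNetV2, then unfreeze a band of layers (from layer 101 onward, approximating the paper's 101 through 155) and retrain at a low learning rate. `train_ds` and `num_classes` are assumptions.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3),
                                         pooling="avg")
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=20)          # stage 1: train the new top only

# Stage 2: unfreeze upper layers of the base and retrain with a small LR
base.trainable = True
for layer in base.layers[:100]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=20)
```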

Using deep features to characterize the original plant leaf image, Prasad and Singh [8] transferred a model from object identification to plant leaf research. The authors suggested utilizing VGG-16 to construct features, which were then condensed using PCA to appropriately represent medicinal plant leaves for categorization (Fig. 2). During preprocessing, the leaf is photographed against a dark background using the recommended approach for obtaining leaf samples quickly and clearly. To create classification feature maps, the image was transformed to lαβ color space and passed through the fully connected convolutional layers. The paper also describes how to capture good photographs: the setup requires a basic transparent triangular glass structure set over a black backdrop with LED illumination to prevent dispersion effects. The camera head is positioned opposite the board at a distance L to best capture the leaf wedged between the transparent sheets. The distance between the camera and the leaf is L, and the angle between the camera and the object plane is θ; for consistency, this paper uses θ = 90° and L = 1 m.

Fig. 2 Proposed leaf capture setup block diagram (source: medicinal plant leaf information extraction using deep features, p. 2)

The image is in the RGB color space. This is a device-dependent color space that needs to be converted to the device-independent CIE lαβ color space using the following equations.

$$ I\mathop{\longrightarrow}\limits^{xyz}I_{xyz} \mathop{\longrightarrow}\limits^{l\alpha \beta }I_{l\alpha \beta } $$
(2)

where I is the captured image,

l is the lightness coefficient, ranging from 0 (pure black) to 100.

For any measured color of intensity li, the coordinates (α, β) on a rectangular-coordinate system perpendicular to the l axis at li locate the color attributes (hue and saturation).

Since there is no direct transformation between RGB and lαβ, we first convert RGB to the XYZ color space using:

$$ \begin{array}{*{20}c} X \\ Y \\ Z \\ \end{array} = \begin{array}{*{20}c} {{\text{R}} \times 0.4124 + {\text{G}} \times 0.3576 + {\text{B}} \times 0.1805} \\ {{\text{R}} \times 0.2126 + {\text{G}} \times 0.7152 + {\text{B}} \times 0.0722} \\ {{\text{R}} \times 0.0193 + {\text{G}} \times 0.1192 + {\text{B}} \times 0.9505} \\ \end{array} $$
(3)

where R, G, and B are the color channels of the picture.

$$ \begin{array}{*{20}c} l \\ \alpha \\ \beta \\ \end{array} = \begin{array}{*{20}c} {116*Y^{\prime} - 16} \\ {500*\left( {X^{\prime} - Y^{\prime}} \right)} \\ {200*\left( {Y^{\prime} - Z^{\prime}} \right)} \\ \end{array} $$
(4)

The lightness coefficient, l, ranges from 0 (pure black) to 100 (pure white). The color attributes for each measured shade of intensity li are found using the coordinates of points on a rectangular coordinate grid perpendicular to the l axis at location li (hue and saturation). At the grid origin (α, β) = (0, 0), the color is achromatic, meaning that it does not show any color: a grayscale range from black (the darkest color) to white (the lightest color), with positive α suggesting a reddish-purple hue, negative α a bluish-green hue, positive β yellow, and negative β blue. After the picture has been transformed to the device-independent lαβ color space, the VGG-16 feature map is computed. This feature map is re-projected into a PCA subspace to improve the effectiveness of species identification. To illustrate robustness, the researchers used two distinct kinds of plant leaf datasets. The first is the openly available ICL leaf benchmark dataset. The learning rate was 0.0015. The results were derived from the cited articles and contrasted with the proposed techniques. The researchers devised two methods for recognizing plant leaves, l-VGG-16 and l-VGG-16+PCA, achieving an accuracy of about 94.1%. This paper presents an improved deep network design for automated plant leaf species recognition: a VGG-16 architecture in PCA space performs better when lαβ color space is used as input. The study also presents a novel capture mechanism that allows precise recording of shape and texture information with negligible leaf folds. Leaf information is extracted using a multi-scale deep network, producing a rich feature vector, which is finally reduced using PCA to lower the classification cost.
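A sketch of the deep-feature + PCA stage, assuming Keras and scikit-learn: VGG-16 acts as a fixed feature extractor and PCA compresses the descriptors. The lαβ conversion step described above is omitted here (standard ImageNet preprocessing is used instead), and `images` and the component count are assumptions.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# VGG-16 as a fixed extractor; global average pooling yields a
# 512-dimensional descriptor per leaf image
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                  input_shape=(224, 224, 3), pooling="avg")

def deep_features(batch_rgb):
    # batch_rgb: (N, 224, 224, 3) array of images in [0, 255], assumed
    x = tf.keras.applications.vgg16.preprocess_input(
        batch_rgb.astype("float32"))
    return vgg.predict(x, verbose=0)

F = deep_features(images)
# n_components must not exceed min(N, 512); 64 is an illustrative choice
F_reduced = PCA(n_components=64).fit_transform(F)
```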

Naresh and Nagendraswamy [11] presented a symbolic approach for recognizing plant leaves. Modified local binary patterns (MLBP) are proposed for retrieving textural information from plant leaves. Leaves of the same species can have a variety of textures depending on their age, origin, and environment. Thus, clustering is utilized to pick a number of class representatives, and intra-class variations are recorded using interval-valued symbolic features. The mean and standard deviation of a 3 × 3 grid of pixels within the image are utilized as local threshold parameters. The whole picture is then processed to produce a modified LBP histogram, which is used as a texture descriptor to characterize the image. The chi-square distance is used to compare unknown leaf samples of a species with the reference leaves provided in the knowledge base. The plant leaves' textural qualities are retrieved first, followed by symbolic representation; a matching strategy is then utilized to classify the submitted test sample. To keep categorization simple, a basic nearest neighbor classifier is utilized.

$$ {\text{LBP}}_{P,R} = \mathop \sum \limits_{i = 1}^{P} S\left( {G_{i} - G_{c} } \right)*2^{i - 1} $$
(5)

where \(S\left( {G_{i} - G_{c} } \right) = \left\{ {\begin{array}{*{20}c} 1 & { {\text{if}}\,\left( {G_{i} - G_{c} } \right) \ge 0} \\ 0 & {\text{ otherwise}} \\ \end{array} } \right.\)

P is the number of equally spaced pixels at radius R from the center pixel, and Gc and Gi are the gray values of the center pixel and the ith neighborhood pixel, respectively.

The basic LBP depends on a hard threshold determined by the gray-value difference between the center and neighboring pixels. Rather than using a hard threshold, the authors developed a strategy that takes into consideration the closeness of neighboring pixels to the mean (μ) and standard deviation (σ) of the whole neighborhood. The resulting pattern is adaptive in nature and captures the structural relationship between the gray values of pixels in the immediate neighborhood. Let Gi be the gray values of the surrounding pixels around the center pixel Gc in a circular topology of radius R with P neighbors. The mean and standard deviation of the gray values of all these pixels are computed as follows:

$$ \mu = \frac{{\mathop \sum \nolimits_{i = 1}^{P} Gi + Gc}}{{\left( {P + 1} \right)}} $$
(6)
$$ \sigma = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{p} \left( {Gi - \mu } \right)^{2} + \left( {Gc - \mu } \right)^{2} }}{{\left( {P + 1} \right)}}} $$
(7)

where μ is the mean and σ is the standard deviation of the entire neighborhood.

The authors use hierarchical clustering to pick multiple class representatives for each class by grouping similar leaf samples according to their textural patterns. The similarity score is calculated by comparing the test leaf sample to all of the knowledge-base reference leaf samples. The chi-square distance was utilized to compute the dissimilarity between reference leaf samples expressed as interval-valued symbolic feature vectors and a test leaf sample with crisp feature vectors. The researchers utilized a standard 1-nearest neighbor classification technique to classify the given test leaf as one of the known plant species. A range of datasets, including the UoM medicinal plant dataset, Flavia dataset, Foliage dataset, and Swedish leaf dataset, were used to evaluate the approach. Overall accuracy for the suggested method was 98%. When compared to traditional crisp representation tactics, the proposed symbolic representation approach has been demonstrated to be effective in reducing the number of reference leaf samples required to train the system while retaining classification accuracy. The proposed classification system has surpassed other well-known classification approaches. Furthermore, the proposed method provides comparable computational and storage efficiency relative to existing systems.
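The exact MLBP rule is given in the original paper; the NumPy sketch below is only an illustration of Eqs. (5)-(7), thresholding each 3 × 3 neighborhood against its own mean μ (with σ also computed, as it feeds the interval-valued symbolic representation), followed by the chi-square distance used for matching.

```python
import numpy as np

def mlbp_histogram(gray):
    """Adaptive LBP sketch per Eqs. (5)-(7): neighbors are thresholded
    against the neighborhood mean rather than the center pixel alone."""
    H, W = gray.shape
    g = gray.astype(np.float64)
    # 8 neighbors of each interior pixel, clockwise (P = 8, R = 1)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = g[1:-1, 1:-1]
    neigh = np.stack([g[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
                      for dy, dx in offsets])
    mu = (neigh.sum(axis=0) + center) / 9.0                     # Eq. (6)
    sigma = np.sqrt(((neigh - mu) ** 2).sum(axis=0)
                    + (center - mu) ** 2) / 3.0                 # Eq. (7)
    codes = np.zeros_like(center, dtype=np.int32)
    for i in range(8):
        codes += (neigh[i] >= mu).astype(np.int32) << i         # Eq. (5), adapted
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

def chi_square(h1, h2, eps=1e-12):
    # dissimilarity between two normalized histograms, as used for matching
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```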

Tan et al. [9] proposed D-Leaf, a new CNN-based approach. To preprocess the leaf images and extract the features, three different convolutional neural network (CNN) models were utilized: pre-trained AlexNet, fine-tuned AlexNet, and D-Leaf. Support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (k-NN), Naive Bayes (NB), and CNN were the five machine learning techniques used to classify these features. The leaf samples for this study came from three different areas at the University of Malaya in Kuala Lumpur, Malaysia. A total of 43 tropical tree species were collected, with 30 samples per species. The leaf images were resized into square dimensions, as the CNN models required for their inputs. All of the images were first converted from RGB to gray scale. The Sobel operator was used to isolate the region of interest from the photos. After segmentation, the images were post-processed and skeletonized to acquire a clear vascular architecture. A CNN was used to retrieve the leaf's properties. The authors produced a model termed D-Leaf, which was compared with pre-trained AlexNet and fine-tuned AlexNet. The model was built to extract characteristics from photographs rather than fine-tuning the AlexNet model (Fig. 3). It contains three fully connected layers: the first (FC4) and second (FC5) fully connected layers, with 1290 neurons, received C3's output, while the third fully connected layer (FC6) had 43 neurons, matching the number of classes in the dataset. The D-Leaf model was trained using the stochastic gradient descent with momentum approach with a batch size of 100. A training subset called a mini-batch updates the weights and evaluates the loss function gradient. Vein morphometric measurements such as the number of end points, number of branches, and number of areoles were also derived, and the leaf area was used to compute the density of veins, branching points, end points, and areoles. The features of D-Leaf's FC5 layer were then classified using five distinct classifiers: SVM, ANN, k-NN, NB, and CNN. D-Leaf with SVM had an accuracy of 82.75%, ANN 94.88%, k-NN 82.44%, NB 81.86%, and CNN 79.03%. Cross-validation (CV) was used to prevent overfitting problems because the dataset included just 30 samples per species. The cross-validation process produces numerous small train-test splits from the original training data: the data are divided into k subsets, known as folds, and in regular k-fold cross-validation the model is trained repeatedly on k−1 folds, with the remaining fold serving as a test set. With cross-validation, the model is calibrated with various hyperparameters using only the original training data, which reduces overfitting by keeping the test set as a completely unseen dataset for selecting the final model. Five-fold and ten-fold CV were investigated using the D-Leaf method; the accuracy of five-fold cross-validation was 93.15%, and that of ten-fold cross-validation was 93.31%. Aside from this dataset, D-Leaf was evaluated on the Malaya Kew, Flavia, and Swedish datasets, with accuracies of 90.38, 94.63, and 98.09%, respectively. The researchers found that combining the CNN with an ANN, a feed-forward neural network with a single hidden layer of 80 neurons that used the default scaled conjugate gradient function and the attainment of a minimum gradient as the stopping criterion, produced the best results.

Fig. 3 Architecture of AlexNet (source: deep learning for plant species classification using leaf vein morphometric, p. 6)
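The vein-oriented preprocessing Tan et al. describe (grayscale, Sobel, thresholding, skeletonization) can be sketched with scikit-image as below; the Otsu threshold is an assumption standing in for whatever binarization the authors used.

```python
from skimage import filters, morphology
from skimage.color import rgb2gray

def vein_skeleton(rgb):
    """Grayscale -> Sobel edges -> binarize -> skeletonize,
    yielding an approximation of the leaf's vascular architecture."""
    gray = rgb2gray(rgb)
    edges = filters.sobel(gray)                    # Sobel gradient magnitude
    mask = edges > filters.threshold_otsu(edges)   # assumed threshold rule
    return morphology.skeletonize(mask)
```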

In the Western Ghats of India, Sabu et al. [12] proposed a computer vision technique for recognizing ayurvedic medicinal plant species. Image capture, image preprocessing, feature extraction, and classification are the four stages of the proposed system. A 14.1-megapixel camera was used to take the leaf photos. For preprocessing, the maximum-area region of the image is identified, which is taken to correspond to the leaf area. This region is cropped in the second preprocessing step and captured as a leaf picture, which is then converted to binary, and morphological operations are conducted. All of the cropped photos were then resized to the same height of 1200 pixels. The authors evaluated two techniques for feature extraction: speeded-up robust features (SURF) and the histogram of oriented gradients (HOG). A set of interest points is first detected in the image using the SURF feature descriptor; the strongest 20 are chosen, resulting in a SURF feature vector of 20 × 64 dimensions. The dimensionality of HOG features varies with picture resolution, since every point in an image has an oriented gradient; the authors limited the HOG feature vector to a maximum of 100,000 values. The feature vectors are then concatenated to form a single normalized feature vector of 101,280 values. SURF features are extracted using the integral image concept, which is computed using the following equation.

$$ S\left( {x,y} \right) = \mathop \sum \limits_{i = 0}^{x} \mathop \sum \limits_{j = 0}^{y} I\left( {i,j} \right) $$
(8)

where S denotes the integral image, x and y denote the coordinates of the picture, and I denotes the picture.

In computer vision and image processing, the histogram of oriented gradients is a feature descriptor for object detection. This approach counts how often a gradient orientation appears in a specific region of an image. To increase accuracy, HOG is computed across a dense grid of uniformly spaced cells using an overlapping local contrast normalization procedure. The dataset was then split in half, with one half used to train and the other half used to test the KNN classifier. The fused feature vectors and corresponding class labels were processed using the KNN classifier, and the results appear to be sufficient for constructing real-world apps. The performance of the proposed system was assessed using statistical techniques, including computation of the precision, recall, F-measure, and accuracy metrics. When tested with a KNN classifier, the proposed technique achieves near-perfect accuracy by combining SURF and HOG features.
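Eq. (8) is the standard integral image, which is what makes SURF's box filters cheap: once it is built, any rectangle sum costs four lookups. A NumPy sketch:

```python
import numpy as np

def integral_image(img):
    """Eq. (8): S(x, y) = sum of I(i, j) for all i <= x, j <= y."""
    return np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)

def rect_sum(S, r0, c0, r1, c1):
    """Sum of pixels in the inclusive rectangle (r0, c0)-(r1, c1)."""
    a = S[r1, c1]
    b = S[r0 - 1, c1] if r0 > 0 else 0.0
    c = S[r1, c0 - 1] if c0 > 0 else 0.0
    d = S[r0 - 1, c0 - 1] if r0 > 0 and c0 > 0 else 0.0
    return a - b - c + d
```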

Dileep and Pournami [13] did effective work on identifying medicinal plants based on leaf features such as shape, size, color, and texture. The proposed dataset, AyurLeaf, incorporates leaf samples from 40 medicinal plants: 2400 pictures of more than 30 leaves each from 40 different plant species. A consistent naming convention is used to label each picture: the plant species name followed by a unique sequence number. Because the sampling guarantees that the dataset is varied in terms of plant species, the algorithm may produce more accurate classification results. Using the GIMP image editor, just the leaf region is selected and cropped, and each image is saved in the JPG (Joint Photographic Experts Group) format, with the naming method used to label the images. They used RGB pictures, and if the photos weren't in the 227 × 227 × 3 format, they padded the image to make it N × N, then scaled the padded image to 227 × 227 × 3. The AyurLeaf CNN framework is designed and constructed based on AlexNet. The initial layer, the input layer, determines the size of the input images. In the first convolution layer, 90 filters of size 7 × 7 with a stride of 2 are utilized. A ReLU layer bounds the output, which is then followed by a max-pooling layer with a 2 × 2 filter size; this layer reduces the output to around 50% of its original size. After the max-pooling layer, a second convolution layer operates with 512 kernels of size 5 × 5 and a stride of 2. After that, there is a ReLU layer and then a max-pooling layer with a 3 × 3 filter size and stride 1. The following two layers are consecutive convolution layers, both using 3 × 3 kernels with a stride of 2, with 480 and 512 filters, respectively. A ReLU layer and a max-pooling layer with filter size 2 × 2 and stride 2 are added after these layers. The output of the last max-pooling layer is received by the first fully connected layer, which has 6144 neurons. The second fully connected layer also consists of 6144 neurons. The third fully connected layer has 40 neurons, corresponding to the number of medicinal plant classes to be distinguished. Lastly, the output of the third fully connected layer is sent to the softmax classification layer, which generates classification probabilities for each species. The proposed CNN model performs four tasks: image acquisition, image preprocessing, feature extraction, and classification. Image capture using a flatbed scanner requires sampling the photos and guaranteeing image quality. For feature extraction, variant CNN models with varied numbers of layers, filters, and training parameters were created. The AyurLeaf dataset was used to compare the results of AlexNet, fine-tuned AlexNet, and the D-Leaf CNN model. The final findings were presented as training curves showing how the accuracy and loss functions evolved over time. According to the study, the proposed AyurLeaf technique, based on AlexNet, has an average accuracy of 96% on the validation set.
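A Keras sketch of the AyurLeaf architecture as described above. Padding and activation details not stated in the text are assumptions chosen so the shapes compose (the flattened output here is 2 × 2 × 512); this is an illustration, not the authors' implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input((227, 227, 3))
x = layers.Conv2D(90, 7, strides=2, activation="relu")(inputs)   # 90 x 7x7, stride 2
x = layers.MaxPooling2D(2)(x)                                    # 2x2 pool
x = layers.Conv2D(512, 5, strides=2, activation="relu")(x)       # 512 x 5x5, stride 2
x = layers.MaxPooling2D(3, strides=1)(x)                         # 3x3 pool, stride 1
x = layers.Conv2D(480, 3, strides=2, activation="relu")(x)       # 480 x 3x3, stride 2
x = layers.Conv2D(512, 3, strides=2, activation="relu")(x)       # 512 x 3x3, stride 2
x = layers.MaxPooling2D(2, strides=2)(x)                         # 2x2 pool, stride 2
x = layers.Flatten()(x)
x = layers.Dense(6144, activation="relu")(x)
x = layers.Dense(6144, activation="relu")(x)
outputs = layers.Dense(40, activation="softmax")(x)              # 40 plant classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```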

In this study, Shah et al. [14] utilize a dual-path deep convolutional neural network (CNN) to learn shape-dependent features and texture qualities. These paths eventually join in a multilayer perceptron, which integrates complementary information on silhouette and texture to categorize leaf images more effectively. The leaf image and the texture patch are created by the authors and input to the CNN model's two paths. The texture-patch input is made by upscaling the original leaf photo by 2×, sharpening, and cropping the core section of the leaf to produce a 144 × 192-pixel patch that substantially captures elements relevant to leaf texture and venation. The CNN's feature extraction part is composed of convolutional and max-pooling layers. Each convolutional layer is followed by a batch normalization (BN) layer. With the exception of the last layer, the rectified linear unit (ReLU) is the activation used in all stages. To create the leaf's joint shape–texture representation, the features gathered from both pathways are integrated; this composite representation is sent into a multilayer perceptron for classification. The images were augmented to increase accuracy while avoiding overfitting. On datasets including Flavia, Leafsnap, and ImageClef, the model performed well, with accuracies of 99.28, 95.61, and 96.42%, respectively. The authors also introduced the reduced shape context, a novel shape feature that is computationally efficient and outperforms standard hand-crafted features, making it superior to most traditional approaches. As leaf classification techniques, they compared the dual-path CNN, a uni-path CNN, a texture-patch CNN, the marginalized shape context with an SVM classifier, the curvature histogram, and the multiscale distance matrix. The proposed technique outperforms all others with an average accuracy of 97%.
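The dual-path idea reduces to two convolutional towers whose flattened outputs are concatenated into an MLP head. The Keras functional sketch below shows that wiring; the filter counts, tower depths, and input sizes (other than the 144 × 192 patch) are assumptions, and `num_classes` is assumed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # conv -> batch norm -> ReLU -> max-pool, as described above
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.MaxPooling2D(2)(x)

shape_in = tf.keras.Input((192, 144, 1), name="leaf_image")     # shape path
tex_in = tf.keras.Input((192, 144, 1), name="texture_patch")    # texture path

x1, x2 = shape_in, tex_in
for f in (32, 64, 128):                 # assumed tower configuration
    x1 = conv_block(x1, f)
    x2 = conv_block(x2, f)

# joint shape-texture representation feeds the MLP classifier head
joint = layers.Concatenate()([layers.Flatten()(x1), layers.Flatten()(x2)])
joint = layers.Dense(256, activation="relu")(joint)
outputs = layers.Dense(num_classes, activation="softmax")(joint)

model = tf.keras.Model([shape_in, tex_in], outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```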

Samuel Manoharan [15] recommends a two-stage framework for recognizing medicinal plants or herbs. Stage one involves edge-based herbal plant recognition, and the second step performs classification-based medicinal plant detection (Fig. 4). The author used different edge detection techniques, namely the Prewitt, Canny, Laplace, and Sobel operators, for detecting edges in the first stage, which in turn helps speed up identification. A knowledge-based controller then chooses the edge detection operator based on the measurement, which can improve the accuracy of the detection process. In the second stage, raw input images are initially processed to improve image clarity. Once a picture is preprocessed, the color, shape, length, and width of the images are utilized to extract the features. The selected features are extracted using the chi-square method to improve classification. CNN classifiers are then utilized to classify the extracted leaf data. The shape information is passed through a visual-cortex-like stage via the contour of the picture, and then various features such as shape, color, and dimensions of the leaves are sent to a tertiary stage. Later, a CNN with numerous convolution filters processes the features to obtain suitable feature vectors for accurate classification. The factorization takes different parameters to improve the effectiveness of the model. An inception structure in the pooling layers is utilized to recognize the herbal plant leaf. The results of Phase 1 and Phase 2 are compared with an XOR gate operation: complete identification is reported once an identical result is obtained in both phases, i.e., following the XOR gate behavior. In this study, 250 leaf samples with front and back images were utilized, with 80% of the data used for training and 20% used as test data. The various groups of features of herbal plant leaf structures are compared against the given input plant leaves. The approach considered by the author does not rely on the assumption that all leaves have the same aspect ratio, centroid, and roundness. The author also compares the various edge detection techniques and suggests that the Sobel operator is more accurate when the length of the object (leaf) is considered, while the Canny edge operator performs well when the breadth of the object (leaf) is considered. The study achieved an accuracy of 92%, surpassing single classifiers such as CNN, the Canny edge detector, and image segmentation techniques, thereby performing better than other proposed approaches.

Fig. 4 Phases 1 and 2 architecture (source: flawless detection of herbal plant leaf by machine learning classifier through two stage authentication procedure, p. 131)
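The four edge operators compared in stage one can be reproduced with OpenCV; Prewitt has no built-in, so its kernels are applied with `filter2D`. The file path is illustrative, and the Canny thresholds are assumptions.

```python
import cv2
import numpy as np

gray = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Sobel: gradient magnitude from horizontal and vertical derivatives
sobel = cv2.magnitude(cv2.Sobel(gray, cv2.CV_64F, 1, 0),
                      cv2.Sobel(gray, cv2.CV_64F, 0, 1))

# Laplace: second-derivative operator
laplace = cv2.Laplacian(gray, cv2.CV_64F)

# Canny: hysteresis-thresholded edges (thresholds assumed)
canny = cv2.Canny(gray, 100, 200)

# Prewitt: apply its kernels manually
kx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64)
prewitt = cv2.magnitude(cv2.filter2D(gray, cv2.CV_64F, kx),
                        cv2.filter2D(gray, cv2.CV_64F, kx.T))

edges = {"sobel": sobel, "prewitt": prewitt,
         "laplace": laplace, "canny": canny}
```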

Comparison of different models for classification of medicinal plants is shown in Table 1.

Table 1 Comparison of different models for classification of medicinal plants

3 Conclusion

Identifying medicinal plants presents a wide range of challenges that still need to be addressed. In this paper, we have identified a thorough list of improvements that can be made in recognizing medicinal plants using deep learning methodologies. We have also discussed some popular methods implemented to overcome these challenges in part or as a whole. The impact of identifying medicinal plants using technologies like deep learning will be significant for agronomists, farmers, doctors, and the pharmaceutical industry. Once developed, applications for identifying medicinal plants will give agronomists, the pharma industry, and ayurvedic practitioners an excellent opportunity to revolutionize the way the medicine industry currently works. Each of the papers examined above has found its own distinctive way of approaching the problem, utilizing various kinds of data or learning models. Taken together, these techniques demonstrate that there are promising new ways to develop an automated system for identifying medicinal plants. These factors contribute to our objective in writing this review: to reveal new answers and a gap in the field that readers can fill with their own inventive and original solutions to this widespread problem. From our examination of the research gap, we can clearly see that each of the aforementioned approaches has drawbacks; however, the desire to find better methods is always the driving force, and we hope this survey serves as a helpful tool.