1 Introduction

Demand for automated recognition systems is rising. Like automated face recognition, handwritten digit recognition and human identity recognition [1], plants can also be recognized automatically from leaf images. A vast number of plant species exist around the globe. The leaf of each species carries a distinctive pattern by which plants can be classified according to their species. Manual identification of plants from leaves requires considerable domain expertise, comparable to the knowledge of a botanist. Time is also a key factor that pushes us from manual to automatic identification. An automated leaf recognition system can overcome these drawbacks of manual identification. Such a model can be applied in many areas, including agriculture, forestry, environmental science and medicine.

An automated recognition system can be built using machine learning (ML), deep learning (DL) [2], pattern analysis, image processing, etc. Any traditional ML-based classification model needs an explicit feature extraction step [3, 4]. Image processing techniques are used to extract the distinctive features present in the image and assemble them into a feature vector. A classifier is then built and fed with the values of this feature vector. This is traditional ML-based classification. Feature selection and feature extraction are the most difficult steps in building such a model, because the accuracy of the classifier depends on them. In a Convolutional Neural Network (CNN) [5, 6], which is a modern DL architecture, this explicit feature extraction step is not required [3, 7]: features are extracted by the layers of the CNN through a series of convolution operations [3, 8].

In 2021 we proposed a CNN model [9] for automatic classification of plants from leaf images. Its overall accuracy was 93.98% on the Flavia dataset and 94.66% on the Swedish dataset. The main focus of the present work is to obtain an optimal feature set for an automated plant recognition system with higher accuracy. Different feature combinations have been evaluated to find out which set of features yields the highest accuracy. This feature set has also been used with different classifiers, and the results remain almost consistent in each case.

2 Literature review

Because of its numerous application areas, automatic plant classification based on leaf images has received considerable attention from researchers. Some relevant works are discussed in this section. An automated model for leaf recognition was proposed by Wu et al. [10] in 2007. Geometric and morphological features such as height, width and diameter were used. Principal Component Analysis (PCA) was applied to reduce the dimensionality of the feature vector, and with a Probabilistic Neural Network (PNN) they achieved 90.31% accuracy on the Flavia [10] dataset. Lavania et al. [11] proposed a model for automated leaf recognition using the Scale-Invariant Feature Transform (SIFT) algorithm in combination with contour-based edge detection. They also used the Flavia dataset to train and test their model, but the accuracy was only 87.5%. PNN was also used as a classifier by Hossain et al. [12]. Shape features of the leaf such as area, perimeter, major axis and minor axis formed their feature vector. This model was trained and tested on the same Flavia dataset with 91.41% accuracy. Prasad et al. [13] proposed a model for a mobile plant species classification system. A K-Nearest Neighbour (KNN) classifier was used after extracting geometric features and applying the polar Fourier transform. The accuracy of the model was 91.34%. Two models for identifying plant species from leaf patterns were proposed by Bao et al. [14]. In the first model, Histogram of Oriented Gradients (HOG) features were extracted from the leaf images and classified by a Support Vector Machine (SVM). The accuracy achieved was 92% on the Flavia dataset and 98% on the Swedish dataset. In the second model a CNN was used as the classifier, with accuracies of 95.5% and 98.22% on the Flavia and Swedish datasets, respectively. Hieu et al. [15] developed automatic plant identification models using deep convolutional feature extraction models such as MobileNetV2, VGG16, ResnetV2 and InceptionResnetV2. These models were tested with an SVM classifier; the best accuracy was 83.9%, obtained with MobileNetV2. Ren et al. [16] proposed a model for leaf image recognition using multi-scale overlapped block Local Binary Pattern (LBP). After extracting LBP features they achieved 96.67% accuracy on the Swedish dataset using an SVM classifier. Siravenha et al. [17] proposed a plant classification model based on leaf texture, using the Discrete Wavelet Transform (DWT) and Gray Level Co-occurrence Matrix (GLCM) texture features. This model was tested on the Flavia dataset using an Artificial Neural Network (ANN) classifier with 91.85% accuracy.

The above discussion shows that the models developed so far have some drawbacks. Higher accuracy should be achievable with fewer features. There is still a need for better models for automatic leaf recognition that reach higher accuracy with fewer features, so that execution time can be minimized. Hence there is scope for further improvement in this domain. The present study addresses these drawbacks and proposes a better leaf recognition model.

The rest of the paper is organized as follows. Section 3 presents the system design view of our model. Section 4 describes the proposed model. The experimental setup is discussed in Sect. 5. The results of our model are analyzed in Sect. 6. Finally, conclusions are drawn in Sect. 7.

Fig. 1 Use-case diagram of the proposed model

3 System design

This section briefly discusses the system design view of our model. The design of a system can be visualized through the Unified Modelling Language (UML) [18], which is based on the concepts of Object Oriented Software Engineering (OOSE) [19]. UML diagrams visualize the behaviour and structure of a system, and different diagram types represent its behavioural or structural information. Among these, the Use Case diagram [20, 21] has been used in this study to represent the dynamic behaviour of the system. The Use Case diagram of our model is depicted in Fig. 1.

Fig. 2 Resulting images obtained from the image preprocessing step

Fig. 1 shows that our system has two types of users: naive users and system experts. The workflow of the system is as follows. The naive user selects an image and uploads it to the server. Image pre-processing and feature extraction are carried out by the system expert, who also trains the model from the existing database. After training is complete, the system predicts the output, i.e. the class to which the leaf belongs. This result is sent back to the naive user.

4 Description of the proposed model

The following steps have been followed to build our leaf classification model: Image Acquisition, Image Preprocessing, Feature Extraction and Classification.

4.1 Image acquisition

To build any classification model we need to collect data. The leaf images have been collected from two well-known datasets, Flavia and Swedish. The Flavia dataset contains 1906 leaf images of 32 different species. The Swedish dataset is a collection of 1125 leaf images of 15 different species.

4.2 Image pre-processing

All the images of the two datasets are color images. Before the feature extraction step, some image pre-processing is needed to obtain accurate feature values. In this study the images have first been converted to gray scale using Eq. 1.

$$\begin{aligned} G=c_1+c_2+c_3 \end{aligned}$$
(1)

where \(c_1=0.2989*R\), \(c_2=0.5870*G\) and \(c_3=0.1140*B\). Here R, G and B on the right-hand sides are the red, green and blue channel values of a pixel, while G on the left-hand side of Eq. 1 denotes the resulting gray level. A smoothing operation has then been performed on these gray scale images to reduce any noise, using a Gaussian filter [22]. Figure 2 shows some resulting images obtained from this preprocessing step.
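The preprocessing step can be sketched roughly as follows. This is a minimal NumPy illustration, not the implementation used in the study: the function names, the 5×5 kernel size, the value of sigma and the edge padding are all assumptions, since the text only states that a Gaussian filter was applied.

```python
import numpy as np

def to_gray(rgb):
    """Weighted sum of the R, G, B channels with the coefficients of Eq. 1."""
    rgb = np.asarray(rgb, dtype=np.float64)
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb @ weights  # (H, W, 3) @ (3,) -> (H, W)

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel (size and sigma are assumed values)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def smooth(gray, size=5, sigma=1.0):
    """Filter a gray image with the Gaussian kernel, replicating border pixels."""
    gray = np.asarray(gray, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros_like(gray)
    h, w = gray.shape
    # Accumulate the weighted, shifted copies of the padded image.
    for dy in range(size):
        for dx in range(size):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel is normalized to sum to one, smoothing leaves flat regions unchanged and only attenuates local fluctuations.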

4.3 Feature extraction

Feature extraction is the most challenging step in any machine learning-based classification model [3]. Our previous work used a CNN, so no explicit feature extraction was needed. Here this step is required because traditional ML models have been used for classification. Finding the combination of features that gives the best accuracy is quite difficult [3]. In this work, texture features and region-based shape features have been used in different combinations in order to obtain an optimal feature set.

4.3.1 Texture feature

GLCM [17] and LBP [23] feature descriptors have been taken as texture features in this study. Depending on properties such as smoothness and roughness, each image exhibits some pattern, which is termed the texture of the image. It is a function of the spatial arrangement of pixel intensity values in the image.

1. GLCM texture feature: Here we consider the GLCM for texture analysis, based on which 5 of the 14 Haralick [24] texture features have been included in the feature vector. The dimension of the GLCM is equal to the number of gray levels of the quantized image: if there are \(L_G\) gray levels, the size of the GLCM is \(L_G \times L_G\). Every cell of the GLCM, designated by \(g_{d,\theta }(p,q)\), contains the number of times gray levels p and q appear together at distance d and angle \(\theta \) (\(0^{\circ }\), \(45^{\circ }\), \(90^{\circ }\), \(135^{\circ }\)). After constructing the GLCM in these 4 directions, the second-order statistical features proposed by Haralick [24] have been computed.

i. Correlation: Correlation [17, 25] measures the dependency between a pixel and its neighbour.

ii. Contrast: An image is a collection of pixels, each with an intensity value. Contrast [17, 25] quantifies the changes in intensity between neighbouring pixels. For a coarse texture contrast becomes high, whereas for an acute texture contrast is low. Equation 2 [17, 25] has been used to compute the contrast of an image.

$$\begin{aligned} CONT_{d,\theta }= \sum _{p=0}^{L_G-1}\sum _{q=0}^{L_G-1}(p-q)^{2}g_{d,\theta }(p,q) \end{aligned}$$
(2)

iii. Inverse difference moment: The Inverse Difference Moment (IDM) [17, 25] is used to determine whether the texture of an image is homogeneous. IDM measures homogeneity in terms of similarity between pixels. Equation 3 [17, 25] defines the IDM of an image.

$$\begin{aligned} IDM_{d,\theta }=\sum _{p=0}^{L_G-1}\sum _{q=0}^{L_G-1}\frac{1}{1+|p-q|^{2}}g_{d,\theta }(p,q) \end{aligned}$$
(3)

iv. Entropy: In image texture analysis, entropy [17, 25] measures the degree of randomness or non-uniformity of the gray level distribution. Its mathematical formulation is given in Eq. 4 [17, 25].

$$\begin{aligned} ENT_{d,\theta }=-\sum _{p=0}^{L_G-1}\sum _{q=0}^{L_G-1}g_{d,\theta }(p,q)*\log g_{d,\theta }(p,q) \end{aligned}$$
(4)

v. Angular second moment: The Angular Second Moment (ASM) [17] measures the degree of uniformity of the gray level distribution of an image. ASM is defined by Eq. 5 [17].

$$\begin{aligned} ASM_{d,\theta }=\sum _{p=0}^{L_G-1}\sum _{q=0}^{L_G-1}g^2_{d,\theta }(p,q) \end{aligned}$$
(5)
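The GLCM construction and Eqs. 2 to 5 can be illustrated with the following sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the matrix is normalized to joint probabilities (a common convention that the text does not state explicitly), the natural logarithm is used for entropy, and the offset (dy, dx) is one possible way of encoding the distance d and angle \(\theta \). Correlation is left out because no formula for it is given in the text.

```python
import numpy as np

def glcm(img, dy, dx, levels):
    """Co-occurrence matrix for the pixel offset (dy, dx).

    With rows pointing downwards, (0, d), (-d, d), (-d, 0) and (-d, -d)
    correspond to 0, 45, 90 and 135 degrees (an assumed convention).
    The counts are normalized so that the entries sum to 1 (assumption).
    """
    img = np.asarray(img, dtype=np.intp)
    h, w = img.shape
    g = np.zeros((levels, levels), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                g[img[y, x], img[y2, x2]] += 1
    return g / g.sum()

def texture_features(g):
    """Contrast (Eq. 2), IDM (Eq. 3), entropy (Eq. 4) and ASM (Eq. 5)."""
    p, q = np.indices(g.shape)
    nonzero = g[g > 0]  # skip empty cells so that log(0) never occurs
    return {
        "contrast": np.sum((p - q) ** 2 * g),
        "idm": np.sum(g / (1.0 + (p - q) ** 2)),
        "entropy": -np.sum(nonzero * np.log(nonzero)),
        "asm": np.sum(g ** 2),
    }
```

For example, `texture_features(glcm(quantized, 0, 1, levels=8))` would give the four values for the horizontal direction at distance 1, where `quantized` is a hypothetical image whose gray levels have been reduced to the range 0 to 7.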

2. Local binary pattern: In computer vision and image processing, LBP [23] is one of the most popular texture feature descriptors. As the name implies, LBP describes an image locally. In this study a 5×5 neighbourhood has been considered to obtain the LBP code. Equation 6 [23] gives the LBP in decimal form for a pixel at location \((m_c,n_c)\). The notation (P,R) denotes P sampling points on a circle of radius R within the neighbourhood.

$$\begin{aligned} LBP_{P,R}(m_c,n_c)=\sum _{p=0}^{P-1}f(i_p-i_c)2^p \end{aligned}$$
(6)

in which \(i_p\) (\(p=0,\ldots ,P-1\)) are the gray level values of the P sampling pixels on the circle of radius R in the neighbourhood and \(i_c\) is the gray level value of the central pixel. The function f(x) [23] is defined as follows.

$$\begin{aligned}f(x)={\left\{ \begin{array}{ll} 1&{}x\ge 0\\ 0&{}x<0 \end{array}\right. } \end{aligned}$$

In this study, the value of (P,R) has been taken as (16,2). Figure 3 represents this circular neighbourhood with 16 sampling points and radius 2. Figure 4 depicts LBP of some sample images.
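The LBP code of a single pixel for (P,R) = (16,2) can be sketched as below. The sampling points on the circle generally fall between pixel centres, so this sketch uses bilinear interpolation to obtain \(i_p\); that choice, the starting angle and the function names are assumptions, as the text does not say how the sampling values are obtained. The weight of the p-th sampling point is taken as \(2^p\).

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly interpolated intensity at the real-valued position (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

def lbp_code(img, yc, xc, P=16, R=2):
    """LBP code of the pixel (yc, xc); it must lie at least R pixels from the border."""
    img = np.asarray(img, dtype=np.float64)
    center = img[yc, xc]
    code = 0
    for p in range(P):
        angle = 2.0 * np.pi * p / P
        y = yc - R * np.sin(angle)
        x = xc + R * np.cos(angle)
        # f(i_p - i_c) = 1 when the sample is at least as bright as the centre.
        if bilinear(img, y, x) - center >= 0:
            code += 2 ** p
    return code
```

With 16 sampling points the code ranges from 0 (centre brighter than every sample) to \(2^{16}-1\) (every sample at least as bright as the centre).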

Fig. 3 5×5 neighbourhood with 16 sample points and radius 2

Fig. 4 Local Binary Pattern of some sample images

4.3.2 Hu invariant moment

The shape of an object in an image can be represented by a feature descriptor called the moment. Geometrical features of an image can be obtained by calculating its moments. Equation 7 [26, 27] gives the image moment of order \((u+v)\).

$$\begin{aligned} MT_{uv}=\sum _M\sum _N M^u N^v I(M,N) \end{aligned}$$
(7)

in which \(u,v\in \{0,1,2,\ldots \}\) and I(M,N) is the pixel intensity value at location (M,N). \(MT_{uv}\) is called the raw moment. To obtain the central moments of an image, the centroid coordinates \({\bar{M}}\) and \({\bar{N}}\) are subtracted from M and N, respectively. The mathematical formulation of the central moment is given in Eq. 8 [26].

Table 1 Detailed description of the Flavia [10] dataset. It contains 1906 leaf images, of which 80% have been taken for training and 20% for testing
Fig. 5 Description of the Swedish [28] dataset. It contains 1125 images with 75 samples per species. Approximately 80% have been used for training and 20% for testing

$$\begin{aligned} \alpha _{uv}=\sum _M\sum _N(M-{\bar{M}})^u (N-{\bar{N}})^v I(M,N) \end{aligned}$$
(8)

in which \({\bar{M}}=\frac{MT_{10}}{MT_{00}}\), \({\bar{N}}=\frac{MT_{01}}{MT_{00}}\). The normalized central moment, which is scale invariant, is given in Eq. 9 [26].

$$\begin{aligned} \delta _{uv}=\frac{\alpha _{uv}}{\alpha _{00}^p} \end{aligned}$$
(9)

in which \(p=\frac{u+v}{2}+1\) and \(\alpha _{00}=MT_{00}\). In the present work, the set of 7 transformation-invariant moments proposed by Hu [27] has been used. Equations 10 to 16 [25,26,27] have been used to compute these values.

$$\begin{aligned}&HM1=\delta _{20}+\delta _{02} \end{aligned}$$
(10)
$$\begin{aligned}&HM2=(\delta _{20}-\delta _{02})^2 + 4{\delta _{11}}^2 \end{aligned}$$
(11)
$$\begin{aligned}&HM3=(\delta _{30}-3\delta _{12})^2 + (3\delta _{21}-\delta _{03})^2 \end{aligned}$$
(12)
$$\begin{aligned}&HM4=(\delta _{30}+\delta _{12})^2 + (\delta _{21}+\delta _{03})^2 \end{aligned}$$
(13)
$$\begin{aligned}&\begin{aligned} HM5={}&(\delta _{30}-3\delta _{12})(\delta _{30}+\delta _{12})[(\delta _{30}+\delta _{12})^2 \\&- 3(\delta _{21}+\delta _{03})^2]+(3\delta _{21}-\delta _{03})(\delta _{21}+\delta _{03})\\&[3(\delta _{30}+\delta _{12})^2-(\delta _{21}+\delta _{03})^2] \end{aligned} \end{aligned}$$
(14)
$$\begin{aligned}&\begin{aligned} HM6={}&(\delta _{20}-\delta _{02})[(\delta _{30}+\delta _{12})^2-(\delta _{21}+\delta _{03})^2] \\&+4\delta _{11}(\delta _{30}+\delta _{12})(\delta _{21}+\delta _{03}) \end{aligned} \end{aligned}$$
(15)
$$\begin{aligned}&HM7={} (3\delta _{21}-\delta _{03})(\delta _{30}+\delta _{12})[(\delta _{30}+\delta _{12})^2 \nonumber \\&\qquad \qquad -3(\delta _{21}+\delta _{03})^2]+(3\delta _{12}-\delta _{30})(\delta _{21}+\delta _{03}) \nonumber \\&\qquad \qquad [3(\delta _{30}+\delta _{12})^2-(\delta _{21}+\delta _{03})^2] \end{aligned}$$
(16)
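The chain from raw moments (Eq. 7) through normalized central moments (Eqs. 8 and 9) to the Hu invariants can be sketched as follows. For brevity only HM1 to HM4 (Eqs. 10 to 13) are computed; the remaining three follow the same pattern. The function names are hypothetical, and treating M as the row index and N as the column index is an assumption.

```python
import numpy as np

def raw_moment(img, u, v):
    """MT_uv of Eq. 7, with M as the row index and N as the column index."""
    m, n = np.indices(img.shape)
    return np.sum((m ** u) * (n ** v) * img)

def normalized_central_moment(img, u, v):
    """delta_uv of Eq. 9, built from the central moment alpha_uv of Eq. 8."""
    m, n = np.indices(img.shape)
    mt00 = raw_moment(img, 0, 0)
    m_bar = raw_moment(img, 1, 0) / mt00
    n_bar = raw_moment(img, 0, 1) / mt00
    alpha = np.sum(((m - m_bar) ** u) * ((n - n_bar) ** v) * img)
    return alpha / mt00 ** ((u + v) / 2.0 + 1.0)

def hu_first_four(img):
    """HM1 to HM4 following Eqs. 10 to 13."""
    img = np.asarray(img, dtype=np.float64)

    def d(u, v):
        return normalized_central_moment(img, u, v)

    hm1 = d(2, 0) + d(0, 2)
    hm2 = (d(2, 0) - d(0, 2)) ** 2 + 4 * d(1, 1) ** 2
    hm3 = (d(3, 0) - 3 * d(1, 2)) ** 2 + (3 * d(2, 1) - d(0, 3)) ** 2
    hm4 = (d(3, 0) + d(1, 2)) ** 2 + (d(2, 1) + d(0, 3)) ** 2
    return np.array([hm1, hm2, hm3, hm4])
```

A quick sanity check of the invariance is to compare the values for a shape with those of a shifted copy or a copy rotated by 90 degrees; they should agree up to floating-point error.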

After the feature extraction step, PCA [10] has been performed to reduce the dimensionality of the feature vector. This set of features has been found to be optimal for maintaining a consistently high accuracy.

Table 2 Accuracy details obtained from the Flavia [10] dataset. A total of 32 species have been identified. Precision, recall and F1-score values range from 0 to 1. Higher values indicate better accuracy. The details of the abbreviations are given in Table 1
Table 3 Accuracy details obtained from the Swedish [28] dataset. A total of 15 species have been identified. Precision, recall and F1-score values range from 0 to 1. Higher values indicate better accuracy. The details of the abbreviations are given in Fig. 5

4.4 Classification using ANN

After the feature extraction step, a classifier must be chosen for the classification task. In this study ANN, SVM and Random Forest (RF) have been used as classifiers. The accuracy obtained from these classifiers using different combinations of features is discussed in the results section. Here the ANN consists of one input layer, two hidden layers and one output layer. The input layer has 26 neurons, as there are 26 features. Each of the two hidden layers has 40 neurons. The sigmoid activation function has been used in each hidden layer and the softmax activation function at the output layer.
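The forward pass of such a network can be sketched in a few lines of NumPy. The layer sizes 26, 40 and 40 and the sigmoid and softmax activations come from the description above; everything else is an assumption made for illustration, in particular the random weight initialization, the function names, and the choice of one output neuron per species (32 for Flavia, 15 for Swedish). Training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_classes, n_features=26, n_hidden=40, seed=0):
    """Random weights and zero biases for the 26-40-40-n_classes network."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, n_hidden, n_hidden, n_classes]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Class probabilities for a batch x of shape (n_samples, 26)."""
    a = x
    for w, b in params[:-1]:
        a = sigmoid(a @ w + b)   # two hidden layers with sigmoid activation
    w, b = params[-1]
    return softmax(a @ w + b)    # output layer with softmax activation
```

The predicted species of each sample is then the index of the largest probability, for example `forward(x, params).argmax(axis=1)`.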

5 Experimental setup

5.1 Dataset

Two well-known datasets, Flavia [10] and Swedish [28], have been used in this study to classify leaves. These datasets have been widely used by researchers working in this area. The Flavia dataset contains leaf images of 32 different species. All of them are color images of dimension 1600*1200. The dataset contains 1906 leaf images in total. 80% of the data have been taken for training and the remaining 20% have been used to test the model. Details of this dataset are given in Table 1.

The Swedish [28] dataset contains 15 varieties of leaves, each with 75 samples, giving 1125 sample images in total. All of them are color images of varying dimensions. Here also 80% of the data have been taken for training and the remaining 20% have been used for testing. Sample images of this dataset are shown in Fig. 5.

Table 4 Accuracy obtained from different combinations of features using ANN in case of the Flavia [10] dataset
Table 5 Accuracy obtained from different combinations of features using SVM in case of the Flavia [10] dataset
Table 6 Accuracy obtained from different combinations of features using RF in case of the Flavia [10] dataset
Table 7 Accuracy obtained from different combinations of features using ANN in case of the Swedish [28] dataset
Table 8 Accuracy obtained from different combinations of features using SVM in case of the Swedish [28] dataset
Table 9 Accuracy obtained from different combinations of features using RF in case of the Swedish [28] dataset
Table 10 Accuracy comparison of our proposed ANN model with other existing models that used the Swedish [28] dataset

5.2 Standard indexes

After building the model, its accuracy must be measured through performance metrics. Most classification models use precision, recall and f1-score [29] for this purpose, and these three metrics have been used in this study. Equations 17, 18 and 19 [29] define precision, recall and f1-score, respectively.

$$\begin{aligned}&{\textit{Precision}}=\frac{TS}{(TS+ FS)} \end{aligned}$$
(17)
$$\begin{aligned}&{\textit{Recall}}=\frac{TS}{(TS+ FR)} \end{aligned}$$
(18)
$$\begin{aligned}&F1\_{\textit{Score}}=\frac{2*({\textit{Precision}}*{\textit{Recall}})}{({\textit{Precision}}+{\textit{Recall}})} \end{aligned}$$
(19)

In Equations 17 and 18, TS, FS and FR represent True Selection [29], False Selection [29] and False Rejection [29], respectively.
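As a small worked example of Eqs. 17 to 19, the sketch below computes the three metrics per class from true and predicted labels. The function name and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    results = []
    for c in range(n_classes):
        ts = int(np.sum((y_pred == c) & (y_true == c)))  # True Selection
        fs = int(np.sum((y_pred == c) & (y_true != c)))  # False Selection
        fr = int(np.sum((y_pred != c) & (y_true == c)))  # False Rejection
        precision = ts / (ts + fs) if ts + fs else 0.0
        recall = ts / (ts + fr) if ts + fr else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        results.append((precision, recall, f1))
    return results
```

For the labels `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, class 0 has TS = 1, FS = 0 and FR = 1, so its precision is 1, its recall is 0.5 and its f1-score is 2/3.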

Fig. 6 Confusion matrix for the Flavia dataset using our optimal feature set and ANN classifier [10]

Fig. 7 Confusion matrix for the Swedish dataset using the optimal feature set and ANN classifier [28]

Fig. 8 Comparison of accuracy of our proposed ANN model with PNN-1 [10], PNN-2 [12], SIFT [11], KNN [13], ANN [17], HOG-SVM [14] and CNN model [9]

6 Results and discussions

The previous section described the datasets and the performance metrics. The results obtained on the two datasets are discussed in this section.

Table 4 shows that using GLCM, Hu Invariant Moment and LBP individually, accuracies of 62.30%, 61.25% and 89.79%, respectively, have been achieved with ANN on the Flavia dataset. Accuracy increases significantly when these features are combined with each other, and the maximum accuracy has been achieved using all of them together. It has also been verified that adding any other feature to this set does not change the accuracy significantly. The same trend has been found with the SVM and RF classifiers; the accuracy for different feature combinations can be seen in Tables 5 and 6 for SVM and RF, respectively. Per-species accuracy details for the Flavia dataset, with precision, recall and f1-score using the ANN classifier, are given in Table 2.

Similarly, for the Swedish dataset Table 7 shows that using GLCM, Hu Invariant Moment and LBP individually, accuracies of 53.33%, 73.33% and 90.66%, respectively, have been achieved. Combining these features with each other increases the accuracy significantly. Finally, 98.22% accuracy has been achieved using all of these features together with the ANN classifier. Here also, adding any other feature to this set does not increase the accuracy significantly. The accuracy obtained using different feature combinations with the other classifiers is given in Tables 8 and 9. Per-species accuracy details for the Swedish dataset, with precision, recall and f1-score using the ANN classifier, are given in Table 3.

We now compare the accuracy of our model with other existing models, including our previous CNN-based model [9]. The bar graph in Fig. 8 shows the accuracy obtained by different models on the Flavia dataset. A significant improvement in overall accuracy has been achieved compared to our previous model. Moreover, the overall accuracy of our present model (95.54%) exceeds that of all other existing models shown there. For the Swedish dataset, the accuracy comparison of our model with others is given in Table 10, in which the references have been taken from [30]. Here also our model outperforms all other existing models shown in terms of overall accuracy (98.22%). The confusion matrices corresponding to the highest accuracy achieved in the present study for the Flavia and Swedish datasets are shown in Figs. 6 and 7, respectively.

7 Conclusion and future scope

The present study focuses on building an automated plant recognition system that can classify plants from leaf images using an optimal feature combination. Two well-known datasets, Flavia and Swedish, have been used to train and test our model. On both datasets we achieve better results than in our previous work. On the Flavia dataset, accuracies of 95.54%, 94.50% and 91.88% have been achieved using ANN, SVM and RF, respectively. Similarly, on the Swedish dataset, accuracies of 98.22%, 97.77% and 95.55% have been achieved using ANN, SVM and RF, respectively. Although we obtain better results than our previous work and the other existing models considered, there is still scope for further improvement. All the leaf images in the two datasets are scan-like images, so working with leaf images on complex backgrounds will be a challenging task. The present work focuses on the spatial domain only; working in the frequency domain would also be an interesting direction for future research.