1 Introduction

Angiosperms are flowering plants comprising 416 families and 295,383 species [29]. Both angiosperms and gymnosperms [6] are seed-producing plants; nevertheless, the former have distinctive features [26]: flowering organs, endosperm within the seeds, and fruits that cover the seeds.

The flowering organs (i.e., flowers) are the most remarkable feature [3]. They provide the plant with a more species-specific breeding system [31], which guarantees a ready means of diverging into different species without crossing back into related ones. This faster speciation makes angiosperms adaptive to a wider range of ecological niches [21].

Most people cannot identify them because of the enormous number of species (over 250,000) [13]. They need to consult specialists, read flower monographs, or search the internet in order to identify the families, genera, and species of the flowers [2]. A feasible alternative is computer vision based on a handheld digital camera or a mobile phone [19]. Scholars have shown increasing interest in this field.

In the last decade, Saitoh, Aoki and Kaneko [28] photographed blooming flowers in focus with the background defocused. They selected a route to extract the boundary by combining normalized cost (NC) with a piecewise linear discriminant (PLD); their aim was to minimize the sum of local costs divided by the route length. Nilsback and Zisserman [22] developed a visual vocabulary (VV) method with a nearest neighbor classifier (NNC). Nilsback and Zisserman [23] computed three features, (i) hue-saturation-value (HSV), (ii) scale-invariant feature transform (SIFT), and (iii) histogram of oriented gradients (HOG), and then used a support vector machine (SVM) as the classifier. Guru, Sharath and Manjunath [14] extracted features from the gray-level co-occurrence matrix (GLCM) and Gabor filter response (GFR), and employed the k-nearest neighbors (KNN) algorithm. Guru, Sharath Kumar and Manjunath [15] later improved their method by introducing a new color texture moment (CTM) and replacing KNN with a probabilistic neural network (PNN). Cheng and Tan [8] combined 100 SIFT features, 40 pyramid histogram of oriented gradients (PHOG) features, and 64 color histogram features, and used a sparse representation-based classifier (SRC). Sari and Suciati [30] took only the a* and b* channels (ABC) of the L*a*b* color space, obtained texture features by segmentation-based fractal texture analysis (SFTA), and used a KNN classifier with cosine distance. Vasudevan, Joshi, Shekokar, Kumar, Kumar and Guru [35] obtained the flower skeletons and then used Delaunay triangulation (DT) to extract features, with a symbolic classifier (SC).

Meanwhile, there are apps that identify flowers on the iOS and Android platforms, for example, “Plant Snapp”, “Like Thar Garden”, “Garden Answers Plant Identification”, “Flower Checker”, and “Plantifier”. Nevertheless, those apps are mainly based on color and shape features, so they may perform badly on petals with similar colors and shapes.

This study proposes a new color feature, the most abundant color index (MACI), and uses a relatively new texture feature, the fractional Fourier entropy (FRFE). The combined feature set showed promising results. In the classification stage, we used a single-hidden-layer feedforward neural network (SLFN) and applied the weight decay technique to avoid overfitting.

The highlights of this study comprise four points: (i) we proposed a hybrid feature combining MACI and FRFE, and showed the superiority of the combined features over either individual feature; (ii) we used weight decay for generalization, and our experiments showed its effectiveness; (iii) our method gives better performance than six state-of-the-art approaches and AlexNet; (iv) grid searching was used to find the optimal parameters of our classifier.

The rest of the paper is organized as follows: Section 2 describes the angiosperm dataset, which contains three genera (Hibiscus, Orchis, and Prunus). Section 3 presents the proposed most abundant color index, introduces the fractional Fourier entropy, and gives the background of the classifier and weight decay. Section 4 provides the experimental results and discussions. Finally, Section 5 gives the concluding remarks.

2 Materials

We collected three genera of flower petals (Hibiscus, Orchis, and Prunus), each containing 40 images. The flowers were collected in the wild in China. We placed them under a glass plate over a piece of black cloth, so that they spread out flat. The images were captured with a Canon EOS 80D digital camera, whose CMOS image sensor captures 24.2-megapixel images with a 3:2 aspect ratio. The shutter speed was set to 1/8000 s, with a shutter lag time of 0.06 s.

There are different species within each genus, and the scale, pose, and illumination vary across images. The original size of each image is 6000 × 4000 pixels. A region-growing method [27] was used to remove the background automatically, since the originally captured picture is very large and contains a mass of redundant spatial information. After removing the background, we placed the petal in the center and resized the image to 400 × 400.

Figure 1 shows samples of the petal images. To improve the generalization of the classification system, the petal images were captured under different rotations, angles, and illumination conditions. We did not carry out any scale normalization, pose normalization, or illumination compensation, because we would like our classifier to be robust to scale change, pose variation, and illumination variation.

Fig. 1 Samples of the petal image dataset: (a) Hibiscus, (b) Orchis, and (c) Prunus

3 Methodology

3.1 Most abundant color index

A color histogram (CH) [16] was employed in this study; nevertheless, it contains 64 features, most of which are close to zero [1]. Therefore, we proposed a new color feature scheme that extracts the indexes of the several most abundant color bins. This new scheme is named the most abundant color index (MACI).

We first discretized the color space from the original 256 × 256 × 256 = 16,777,216 colors into 4 × 4 × 4 = 64 discrete color bins [36], and counted the number of pixels in each of the 64 bins. Figure 2(a) shows an original rose image in RGB space; Figure 2(b) shows the 64 discrete color bins; Figure 2(c) shows the color histogram of the rose image, where the y-axis denotes the number of pixels and the x-axis the 64 color bin indexes. Figure 2(d) shows the sorted color histogram, where the x-axis now denotes the sorted rank. We can observe that the five most abundant indexes are 53, 58, 57, 52, and 32, respectively.

Fig. 2 Sorted color histogram of a rose image

The pseudocode for calculating MACI is listed in Algorithm 1. Here we chose the five most abundant indexes based on experience. The advantages of MACI are twofold: first, it can extract color information from any type of color image; second, it uses fewer features than the color histogram.

Algorithm 1 Pseudocode of MACI.

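A minimal Python sketch of Algorithm 1 is given below. The exact bin ordering is not prescribed by the text; here we assume the index 16·⌊R/64⌋ + 4·⌊G/64⌋ + ⌊B/64⌋, and NumPy is used for brevity.

```python
import numpy as np

def maci(img, k=5):
    """Most abundant color index (MACI): indexes of the k largest bins
    of a 4 x 4 x 4 color histogram. img is an (H, W, 3) uint8 RGB image.
    The bin ordering below is an assumption, not prescribed by Algorithm 1."""
    q = img // 64                                       # quantize each channel to 4 levels
    bins = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]   # flatten to a bin index in 0..63
    hist = np.bincount(bins.ravel(), minlength=64)      # pixel count per bin (Fig. 2c)
    return np.argsort(hist)[::-1][:k]                   # k most abundant bin indexes (Fig. 2d)
```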

3.2 Fractional Fourier entropy

Suppose x(t) denotes a signal in the time or spatial domain; then the corresponding fractional Fourier transform (abbreviated as FRFT) [7] is defined as:

$$ {F}_a(u)={\int}_{-\infty}^{\infty }x(t)\mathrm{\mathcal{L}}\left(t,u|a\right)\mathrm{d}t $$
(1)

where u represents the frequency [4], a the angle of the FRFT, and ℒ the transform kernel:

$$ \mathrm{\mathcal{L}}\left(t,u|a\right)=\sqrt{1-j\cot a}\times \exp \left( j\pi \left({t}^2\cot a-2 ut\csc a+{u}^2\cot a\right)\right) $$
(2)

Here j represents the imaginary unit, and exp(·) represents the exponential function. From basic mathematics, we know that if a is a multiple of π, both the “csc” and “cot” operators diverge to infinity [10]. Taking limits, we can transform Eq. (2) into

$$ \mathrm{\mathcal{L}}\left(t,u|a\right)=\begin{cases}\delta \left(t-u\right), & \mathrm{if}\ a/\pi =2k\\ \sqrt{1-j\cot a}\,\exp \left( j\pi \left({t}^2\cot a-2 ut\csc a+{u}^2\cot a\right)\right), & \mathrm{if}\ a/\pi \ne k\\ \delta \left(t+u\right), & \mathrm{if}\ a/\pi =2k+1\end{cases} $$
(3)

where k represents an arbitrary integer, and δ is the Dirac delta function.

An illustration of the FRFT of the rectangular function is plotted in Fig. 3. The rectangular function r(t) is defined as

$$ r(t)=\begin{cases}0, & \mathrm{if}\ |t|>1/2\\ 1, & \mathrm{if}\ |t|<1/2\\ 1/2, & \mathrm{if}\ |t|=1/2\end{cases} $$
(4)
Fig. 3 FRFT curves of r(t): (a) a = 0/10; (b) a = 2/10; (c) a = 4/10; (d) a = 6/10; (e) a = 8/10; (f) a = 10/10 (the red and blue lines denote the real and imaginary parts, respectively)

Here |.| represents the absolute value. The FRFT results of r(t) with angles a of 0/10, 2/10, 4/10, 6/10, 8/10, and 10/10 are shown in Fig. 3(a-f), respectively.

When the FRFT extends to the two-dimensional situation, we have two angles: a for the x-axis and b for the y-axis. The combined angle vector (a, b) serves as the rotation angles for the 2D-FRFT. In this study, we chose, based on experience, 36 different 2D-FRFTs in total, as shown in Fig. 4. That is, we chose 36 angle vectors: (0, 0), (0, 0.2), …, (0, 1), (0.2, 0), (0.2, 0.2), …, (0.2, 1), …, (1, 0), (1, 0.2), …, (1, 1).

Fig. 4 The 36 angle vectors (a, b), each component ranging from 0 to 1 with a step of 0.2

Yang, Sun, Dong, Liu and Yuan [37] proposed a novel image feature dubbed the fractional Fourier entropy (FRFE), which combines the fractional Fourier transform with the Shannon entropy. Sun [34] applied FRFE to build an intelligent pathological brain detection system. Cattani and Rao [5] applied FRFE to tea category identification. All of these studies obtained promising results. In this study, we employed FRFE by calculating the Shannon entropy over the 36 FRFT spectra. The pseudocode of FRFE is presented in Algorithm 2.

Algorithm 2 Fractional Fourier entropy.

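A sketch of Algorithm 2 follows. Since the entropy is computed over FRFT spectra, we treat each normalized magnitude spectrum as a probability distribution; frft2(img, a, b) is a hypothetical 2D-FRFT routine implementing Eqs. (1)-(3), which we assume is available.

```python
import numpy as np

def shannon_entropy(spectrum):
    """Shannon entropy of a spectrum treated as a probability distribution."""
    p = np.abs(spectrum).ravel()
    p = p / p.sum()                    # normalize magnitudes to probabilities
    p = p[p > 0]                       # convention: 0 * log(0) = 0
    return -np.sum(p * np.log2(p))

def frfe_features(img, frft2):
    """36 FRFE values: one entropy per 2D-FRFT spectrum on the angle grid of Fig. 4.
    frft2(img, a, b) is an assumed 2D fractional Fourier transform routine."""
    angles = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
    return np.array([shannon_entropy(frft2(img, a, b))
                     for a in angles for b in angles])
```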

3.3 Classifier

The 5 MACIs and 36 FRFEs were concatenated and then submitted to a single-hidden-layer feed-forward neural network (SLFN). The number of hidden neurons was set to 15 by the grid-searching method (see Section 4.9). Thus, a 41-15-3 SLFN was initialized with random weights and biases, as shown in Fig. 5.

Fig. 5 Structure of the SLFN for petal classification

The SLFN is trained in a supervised learning scheme [32]. Suppose the loss function is E(ω), where ω represents the weights and biases. The backpropagation (BP) learning algorithm [12] consists of two phases: (i) propagation and (ii) weight update.

In the propagation phase, we first forward propagate an input through the network to generate an output [25]; afterwards, we compute the gradients by backward propagation [33]. In the weight update phase, the gradient is multiplied by a learning rate η, and this term is subtracted from the current weight [11]. Mathematically, BP can be written as

$$ {\omega}_k\leftarrow {\omega}_k-\eta \frac{\partial E}{\partial {\omega}_k} $$
(5)

The above procedure repeats until the performance of SLFN meets our termination requirement.
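To make the two phases concrete, below is a minimal NumPy sketch of one propagation/update cycle for the 41-15-3 SLFN. The tanh hidden activation, softmax output, cross-entropy loss, and random initialization are our assumptions; the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (15, 41)), np.zeros(15)   # input -> hidden
W2, b2 = rng.normal(0, 0.1, (3, 15)), np.zeros(3)     # hidden -> output

def bp_step(x, y, eta=0.01):
    """One BP cycle of Eq. (5). x: (41,) feature vector; y: (3,) one-hot label."""
    # phase (i): propagation
    h = np.tanh(W1 @ x + b1)                     # hidden activations (assumed tanh)
    z = W2 @ h + b2
    p = np.exp(z - z.max()); p /= p.sum()        # softmax output
    # backward propagation of the cross-entropy gradient
    dz = p - y
    dW2, db2 = np.outer(dz, h), dz
    dh = (W2.T @ dz) * (1 - h ** 2)              # tanh derivative
    dW1, db1 = np.outer(dh, x), dh
    # phase (ii): weight update, w <- w - eta * dE/dw
    W2 -= eta * dW2;  b2 -= eta * db2
    W1 -= eta * dW1;  b1 -= eta * db1
```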

3.4 Weight decay

“Weight decay (WD)” [24] is a powerful regularization method that can reduce the test error and resist overfitting, at the expense of increasing the training error. A weight decay factor λ is introduced, and the first term of Eq. (5) is multiplied by (1 − λ):

$$ {\omega}_k\leftarrow \left(1-\lambda \right){\omega}_k-\eta \frac{\partial E}{\partial {\omega}_k} $$
(6)

where η represents the learning rate, E the loss function, λ the weight decay factor, and ω_k the weights and biases at the k-th step.

The weight decay term modifies the weight update rule [38]: the new algorithm shrinks the weight vector at each step before performing the usual weight update of standard BP [20]. In this study, we kept the weight decay factor fixed, i.e., unchanged during the training procedure, and found that λ = 0.1 performed best for our task.
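In code, Eq. (6) changes only the update lines of the bp_step sketch above; with the fixed factor λ = 0.1, each parameter is first shrunk by (1 − λ):

```python
lam = 0.1                                        # fixed weight decay factor
# Eq. (6): shrink every parameter, then apply the usual gradient step
W2[:] = (1 - lam) * W2 - eta * dW2;  b2[:] = (1 - lam) * b2 - eta * db2
W1[:] = (1 - lam) * W1 - eta * dW1;  b1[:] = (1 - lam) * b1 - eta * db1
```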

3.5 Experiment design

We did not divide the dataset into separate training, validation, and test sets, because our dataset is small and further division would yield a training set too small for reliable training. To estimate the test error without reducing the size of the training set [18], this study employed a ten-fold cross validation method.

We divided the dataset into ten folds, each containing 4 Hibiscus, 4 Orchis, and 4 Prunus images. We then used nine folds for training and the remaining fold for testing, and recorded the performance of the trained SLFN on the test fold. This procedure was repeated 10 times, as shown in Fig. 6; each time a different fold was chosen as the test set [17]. Finally, the performances over all test folds were combined into an overall test performance. To further reduce the variance of the test estimate, we ran the 10-fold cross validation 10 times.
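The protocol can be sketched with scikit-learn as follows; MLPClassifier stands in for our SLFN, and its alpha parameter is an L2 penalty, which is related to, but not identical with, the weight decay of Eq. (6):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

def ten_by_ten_cv(X, y):
    """10 runs of stratified 10-fold CV; returns the mean overall accuracy.
    X: (120, 41) combined MACI + FRFE features; y: (120,) genus labels."""
    accs = []
    for run in range(10):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=run)
        for tr, te in skf.split(X, y):
            clf = MLPClassifier(hidden_layer_sizes=(15,), alpha=0.1,
                                learning_rate_init=0.01, max_iter=1000)
            clf.fit(X[tr], y[tr])
            accs.append(clf.score(X[te], y[te]))
    return float(np.mean(accs))
```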

Fig. 6 10-fold cross validation (A-J are the 10 folds of the petal dataset)

The flowchart of our method is provided in Fig. 7, and its pseudocode is described in Algorithm 3.

Fig. 7 Flowchart of the proposed method

Algorithm 3 Pseudocode of our proposed system.

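At a high level, Algorithm 3 chains the pieces sketched earlier. In the hypothetical outline below, rgb2gray and frft2 are assumed helpers, and maci, frfe_features, and ten_by_ten_cv are the sketches from Sections 3.1, 3.2, and 3.5:

```python
import numpy as np

def extract_features(img):
    """41-dim feature vector for one petal image: 5 MACIs + 36 FRFEs."""
    color = maci(img, k=5)                           # Section 3.1
    texture = frfe_features(rgb2gray(img), frft2)    # Section 3.2 (helpers assumed)
    return np.concatenate([color, texture])

# X = np.stack([extract_features(img) for img in images])   # (120, 41)
# accuracy = ten_by_ten_cv(X, y)                            # Section 3.5 protocol
```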

4 Results and discussions

4.1 Feature extraction

A petal image was used to extract both the color feature and the texture feature. Figure 8(a) shows the original petal image, Figure 8(b) shows the color histogram of this petal image, and Figure 8(c) gives the five MACIs. The color vector here is [0, 54, 58, 53, 59].

Fig. 8 The MACI features of a petal image

Then, the texture feature was obtained. The FRFT of the petal image is shown in Fig. 9. The arrangement is consistent with that in Fig. 4; that is, the upper-left subgraph represents the FRFT with (a, b) = (0, 0), and the bottom-right subgraph corresponds to the FRFT with (a, b) = (1, 1). We can deduce that the FRFT provides more information than the standard Fourier transform.

Fig. 9 FRFT of the petal image

4.2 Performance without weight decay

In this experiment, we compared the “SLFN with WD” model (SLFN + WD) with “SLFN without WD”. The parameters were the same as in the previous section: we ran a 10 × 10-fold cross validation, the weight decay factor λ was set to 0.1, the maximum number of iterations was 1000, and the learning rate η was set to 0.01. The overall accuracy was used as the measure. The comparison results are listed in Table 1.

Table 1 With Weight Decay versus W/O Weight Decay

Table 1 shows that the plain SLFN model obtains an overall accuracy of 95.50%, whereas introducing WD significantly increases the overall accuracy to 98.92%. This increase of 3.42 percentage points was obtained under the strict statistical protocol of 10 × 10-fold cross validation; hence, the improvement is meaningful. Connor, Hollensen, Krigolson and Trappenberg [9] showed that Bayesian priors can be implemented in gradient descent as a form of weight decay; we will study this connection in future work.

4.3 Classifier comparison

We compared the proposed single-hidden-layer feed-forward neural network with weight decay against traditional classifiers: the decision tree (DT), support vector machine (SVM), and Bayesian classifier (BC). λ was set to 0.1, and the number of hidden neurons to 15. The comparison results are listed in Table 2 and Fig. 10.

Table 2 Classifier Comparison
Fig. 10 Plot of overall accuracy comparison of the four classifiers

Here DT, SVM, and BC were not combined with WD, since their training algorithms already take overfitting into account. The results in Table 2 and Fig. 10 show that DT, SVM, and BC yielded overall accuracies of 96.67%, 96.92%, and 95.33%, respectively; all three traditional classifiers perform worse than our proposed “SLFN + WD” classifier. The reason is twofold: first, the universal approximation theorem guarantees that an SLFN can approximate any continuous function; second, weight decay shows an excellent ability to resist overfitting, as demonstrated in Section 4.2.

4.4 In-depth statistical analysis

Table 3 lists the results over 10 runs of 10-fold cross validation. Here x-y-z denotes that x, y, and z instances are classified correctly as Class 1 (Hibiscus), Class 2 (Orchis), and Class 3 (Prunus), respectively; x(y) denotes that x instances of all classes are recognized correctly out of y instances.

Table 3 Results over 10 runs of 10-fold cross validation

The results in Table 3 show the identification result of each fold in each run. Recall that there are in total 40 petal images each of Hibiscus, Orchis, and Prunus; hence, there are 4 instances of each class in every fold. Since the 10-fold cross validation was repeated 10 times, we correctly identified in total 396, 394, and 397 instances of Hibiscus, Orchis, and Prunus, respectively. The final averaged overall accuracy is 98.92%.

Recall also that there are different species within each genus, and that the photographing conditions (scale, pose, and illumination) vary across images. Therefore, this result indicates that our method is insensitive to changes in scale, pose, and illumination.

4.5 Comparison to state-of-the-art methods

We compared the proposed MACI + FRFE + SLFN-WD with six state-of-the-art methods: NC + PLD [28], VV + NNC [22], HSV + SIFT + HOG + SVM [23], CTM + GLCM + GFR + PNN [15], ABC + SFTA + KNN [30], and DT + SC [35]. The detailed results are shown in Fig. 11.

Fig. 11 Comparison to state-of-the-art methods in terms of petal recognition. The accuracies are 91%, 81.3%, 72.8%, 79%, 73.63%, 93%, and 98.92%. (NC = normalized cost, PLD = piecewise linear discriminant; VV = visual vocabulary, NNC = nearest neighbor classifier; HSV = hue-saturation-value, SIFT = scale-invariant feature transform, HOG = histogram of oriented gradients, SVM = support vector machine; CTM = color texture moment, GLCM = gray-level co-occurrence matrix, GFR = Gabor filter response, PNN = probabilistic neural network; ABC = a* and b* channels, SFTA = segmentation-based fractal texture analysis, KNN = k-nearest neighbors; DT = Delaunay triangulation, SC = symbolic classifier)

Here NC + PLD [28] yields an accuracy of 91%, VV + NNC [22] of 81.3%, HSV + SIFT + HOG + SVM [23] of 72.8%, CTM + GLCM + GFR + PNN [15] of 79%, ABC + SFTA + KNN [30] of 73.63%, and DT + SC [35] of 93%. Our method outperforms the other six methods with an accuracy of 98.92%, which suggests the effectiveness of the proposed MACI and FRFE features.

The number of features is also an important indicator, as it measures the efficiency of feature extraction. NC + PLD [28] extracted 10 features in total, ABC + SFTA + KNN [30] extracted 58 features, and the other studies did not report their feature counts. In contrast, our method used only 41 features in total, which shows that the size of our feature set is moderate; it may be reduced in further studies.

4.6 Comparison to AlexNet

In this experiment, we compared our method with AlexNet [39], a well-known pretrained 25-layer neural network from the field of deep learning. The AlexNet model in Matlab is trained on a subset of the ImageNet database and can classify 1000 object categories (for instance, pencil, mouse, keyboard, etc.). We invoked the model via the Matlab command “alexnet” and compared it with our method. The parameter settings were the same as in the previous experiments. The comparison results are presented in Table 4.

Table 4 Comparison to AlexNet

Here we see that AlexNet [39] gives an overall accuracy of only 96.08%, less than the 98.92% of our “MACI + FRFE + SLFN-WD” method. The reason is twofold. First, AlexNet [39] can identify 1000 types of objects, but it was not trained specifically for petal identification. Second, the input size of AlexNet [39] is 227 × 227 × 3, so we had to resize the original images to 227 × 227, and this lower resolution may impair the information contained in the original image. Hence, our method gives better performance than AlexNet.

4.7 Analysis on combined features

The combined feature vector includes 5 most abundant color index (MACI) values and 36 fractional Fourier entropy (FRFE) values. In this experiment, we compared the combined feature vector (41 features) with the two individual feature vectors: (i) the 5 MACIs and (ii) the 36 FRFEs. We used the SLFN as the classifier and weight decay as the regularization method. The statistical protocol described in Section 3.5 was used. The results are listed in Table 5.

Table 5 Combined feature vector versus individual feature vector

The comparison results in Table 5 show that the overall accuracy is only 95.25% when we use the 5 MACIs alone, and 97.33% when we use the 36 FRFEs alone. Nevertheless, combining the two feature sets yields an overall accuracy of 98.92%. This result validates the effectiveness of the proposed combined feature vector.

4.8 Optimal weight decay factor

To obtain the optimal weight decay factor, we ran a 10 × 10-fold cross validation. The weight decay factor λ was chosen from [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4], the maximum number of iterations was 1000, and the learning rate η was set to 0.01. The overall accuracy was used as the measure. Figure 12 plots the overall accuracy versus the factor λ.

Fig. 12 Optimal weight decay factor

From the curve in Fig. 12, we observe that the overall accuracy is highest at the optimal weight decay factor λ = 0.1. Moreover, the accuracy shows a decreasing trend as λ increases. The reason is that when λ is large, the shrunken weight term, viz., the first term in Eq. (6), cannot preserve the update rule efficiently, which slows down the training of the SLFN.

4.9 Grid searching of number of hidden neurons

In this experiment, we used the grid-searching method to validate that 15 is the optimal number of hidden neurons. We varied the number of hidden neurons from 5 to 20 in steps of 1, keeping the other settings the same as before: the combination of MACIs and FRFEs as features, SLFN + WD as the classifier, and λ = 0.1. Figure 13 shows how the overall accuracy of the 10 × 10-fold cross validation changes with the number of hidden neurons.
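The grid search itself is a simple loop over the candidate neuron counts, reusing the 10 × 10-fold CV protocol (here ten_by_ten_cv is assumed to accept the hidden-layer width as a parameter):

```python
best_n, best_acc = None, 0.0
for n in range(5, 21):                    # 5 to 20 hidden neurons, step 1
    acc = ten_by_ten_cv(X, y, hidden=n)   # assumed parameterized variant
    if acc > best_acc:
        best_n, best_acc = n, acc
print(best_n, best_acc)                   # n = 15 was optimal in our experiments
```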

From Fig. 13, we observe that the optimal number of hidden neurons is 15; the overall accuracy decreases irregularly when the number is below or above 15. This experiment shows that grid searching is effective for tuning the neural network parameters.

Fig. 13 Curve of overall accuracy changing with the number of hidden neurons

5 Conclusion

We proposed a novel angiosperm-genus classification method based on two kinds of features: the most abundant color index and the fractional Fourier entropy. Weight decay was used as the regularization method for the single-hidden-layer feed-forward neural network. The results showed the effectiveness of both the proposed combined feature vector and the weight decay strategy.

This preliminary research covered three main genera (Hibiscus, Orchis, and Prunus) with various species and different photographing conditions. In the future, we will add petal images of other angiosperm genera, and we shall carry out tentative experiments with other classifiers, for example, the twin support vector machine and convolutional neural networks.