1 Introduction

There are about 250,000 named species of flowering plants in the world. Every day, we can see blooming flowers along roadsides and mountain paths, and in gardens, parks, and wild fields. Experienced taxonomists or botanists can generally identify plants from their flowers. However, most people know nothing about these wild flowers, not even their names. To learn the names or characteristics of such plants, we usually have to consult flower guide books or search relevant web pages on the Internet with keywords. Such a keyword-based search is not practical for most people. Since digital cameras are now in widespread use, it would be very useful to identify a blooming plant from flower images taken with a digital camera. The first problem in a flower recognition system is how to accurately extract the flower region from a natural, complex background. Once the flower region is segmented, effective color, shape, and texture features are extracted for subsequent recognition.

Saitoh and Kaneko [11] proposed an automatic method for recognizing wild flowers from a frontal flower image and a leaf image taken by a digital camera. To capture the flower and leaf images, they first placed a black sheet under the flowers or leaves, which is inconvenient and laborious. To separate the flower and the leaf from the background, they modeled the background region with the k-means clustering algorithm. A total of 17 features describing the color and shape properties of the flower and leaf images were extracted, and a neural network was used for recognition. A recognition rate of 95% was obtained for 20 sets of flower and leaf images from 16 species. The main problem with this approach is that taking the images is inconvenient.

Das et al. [3] proposed an approach to indexing flower patent images using domain knowledge of flower colors and their spatial locations. Generally, the colors appearing in flower regions are rarely green, black, gray, or brown, and the background colors are usually visible along the periphery of the image. An automatic iterative segmentation algorithm exploiting this domain knowledge was developed to isolate the flower region from the background. Only the colors in the flower region, instead of all colors in the entire image, were used to index similar flowers. The color features include color names and their relative proportions in the flower region. Their flower indexing system supported queries using natural-language color names or an example image. A flower database consisting of 300 images was used to demonstrate the effectiveness of the approach. However, color information alone, without shape features, cannot recognize flower images effectively.

Hong et al. [7] proposed a flower image retrieval method based on features extracted from the region of interest (ROI), which corresponds to the flower region. A segmentation method was proposed to separate the flower region from the background using a color clustering method and domain knowledge similar to that of Das et al. [3]. The color histogram, which represents the color distribution of the flower region, as well as two shape features were extracted to search for similar flower images. The two shape features, the centroid-contour-distance (CCD) curve and the angle-code-histogram (ACH), characterize the shape of the flower contour. The CCD curve measures the distances from all contour points to the center of the flower region. For each contour point, the angle between two approximating line segments starting from and ending at that point is accumulated to form the ACH. Experimental results on 885 flower images from 14 plant species showed that their approach outperforms the global-color-histogram method of Swain and Ballard [14] and the method of Das et al. [3]. The main problem with this approach is that the CCD curve and the ACH are greatly affected if some petals fall off, bend, curl, or twist.

Zou and Nagy [16] developed a model-based interactive flower recognition system based on the concept of Computer Assisted Visual InterActive Recognition (CAVIAR). In the training process, each training image was interactively segmented to extract the flower region. A domain-specific rose-curve model was then fitted to the silhouette of each flower region. Eight model parameters, including the petal number, the ratio of the outer radius to the inner radius, and the first three moments of the hue and saturation histograms of the pixels within the rose curve, were extracted to recognize flower images. In the recognition process, an initial rose curve of the test flower image is estimated and superimposed on the image, and the first three candidates are displayed according to the model parameters extracted from the initial rose curve. The user can accept one of the recognition results or interactively adjust the parameters of the rose-curve model with mouse operations; the system then re-computes the model parameters and re-ranks the recognition results. This interactive process repeats until the user accepts a recognition result. One major problem of this system is that many user interactions are needed to achieve high recognition accuracy.

Nilsback and Zisserman [10] developed a visual vocabulary that explicitly describes the color, shape, and texture characteristics of flowers. First, each image is automatically segmented into a foreground region (the flower) and a background region using a contrast-dependent prior Markov random field (MRF) cost function [1] optimized with graph cuts. The HSV color values of all pixels in the training images are then divided into Vc clusters using the k-means clustering algorithm, where the number of clusters Vc is optimized on the dataset, and a color vocabulary is constructed from the set of cluster centers (visual words). Each image is thus represented by a Vc-dimensional normalized frequency histogram over the visual words. To describe the shape of each petal, the rotation-invariant scale-invariant feature transform (SIFT) descriptor [8] is computed on a regular grid and optimized over three parameters: the grid spacing M, the radius R of the support region for SIFT computation, and the number of clusters. Vector quantization is then applied to obtain the visual words representing petal shapes, and the corresponding frequency histogram describes the shape characteristic. To model the characteristic patterns on different petals, texture features are computed by convolving the image with the maximum response 8 (MR8) filter bank [15], with performance optimized over the size of the square support regions of the MR8 filters. A texture vocabulary is created by clustering the texture descriptors of all training images, and a frequency histogram is obtained for each image. For each characteristic (color, shape, or texture), the distance between two images is evaluated by the χ2 measure of their frequency histograms. To obtain better performance, the three vocabularies are combined into a joint flower vocabulary with a joint frequency histogram, and a weight vector associated with the joint histogram is introduced to optimize performance. Experimental results on a dataset of 1360 images from 17 flower species showed that the combined vocabulary outperforms each of the individual ones. However, many parameters need to be optimized to achieve a high recognition rate.

Saitoh et al. [13] extended the route tracing method [12] to automatically extract the flower boundary under the assumption that the flower region is in focus and the background is out of focus. The extended route tracing method is based on the Intelligent Scissors (IS) approach [9], which searches for a route that minimizes the sum of local costs according to a number of manually selected points on the visually identified flower boundary. Instead of minimizing the sum of local costs, the extended route tracing method minimizes the average cost, defined as the sum of local costs divided by the route length. Four shape features (the ratio of the route length to the sum of distances between the gravity center and all boundary points, the number of petals, the central moment, and the roundness) as well as six color features (the x and y coordinates and the proportions of flower pixels accumulated in the two largest color cells in the HS color space) were extracted to recognize flower images. Experiments were conducted on 600 images from 30 species with 20 images per species. The recognition rates were 90.7%, 97.7%, and 99.0% when the correct species was included in the top one, top two, and top three candidates, respectively. It is worth noting that the number of petals changes if some petals fall off or are occluded by others.

Cho and Chi [2] proposed a structure-based flower image recognition method. A genetic evolution algorithm with adaptive crossover and mutation operations was employed to tune the learning parameters of the Backpropagation Through Structures algorithm [5]. A region-based binary tree representation, whose nodes correspond to the regions of the flower image and whose links represent the relationships among regions, was constructed to represent the flower image content. Experimental results showed that the structural representation of flower images yields promising performance for flower image recognition in terms of generalization and noise robustness. However, the classification accuracy of the system depends on the selection of the feature values.

Fukuda et al. [4] developed a flower image retrieval system that combines multiple classifiers using the fuzzy c-means clustering algorithm. In their system, flowers were divided into three categories of different structures: gamopetalous flowers, many-petaled flowers, and single-petaled flowers. For each structure, a classifier with a specific feature set was constructed. The fuzzy c-means clustering algorithm was then used to determine the degree of membership of each image in each structure. The overall similarity is a linear combination of the individual similarities computed by each classifier, with the degrees of membership as weights. The test database consists of 448 images from 112 species with 4 images per species. Experimental results showed that the multiple-classifier approach outperforms any single-classifier approach. However, classifying flowers into only three categories according to the number of petals is too coarse a mechanism.

Note that previous researchers extracted color and shape features from the whole image region or the flower boundary, without specifically treating the color and shape characteristics of the pistil/stamen area. Thus, an interactive flower image recognition system, which extracts color and shape features not only from the whole flower region but also from the pistil/stamen area, is proposed to describe the characteristics of flower images more precisely. First, a flower segmentation method is developed to extract the flower boundary with as few user interactions as possible. Furthermore, a simple normalization procedure is employed to make the extracted features more robust to shape deformations, including changes in the number of petals, the relative positions of petals, the poses of petals taken from different directions, and flower sizes. The rest of this paper is organized as follows. Section 2 describes the proposed flower image recognition system. Some experimental results are given in Section 3. Conclusions are given in Section 4.

2 The proposed flower image recognition system

The proposed flower image recognition system consists of three major phases: flower region segmentation, feature extraction, and recognition, as shown in Fig. 1. In the segmentation phase, the proposed system provides an interface allowing a user to draw a rectangular window which circumscribes the flower region. A segmentation algorithm similar to that proposed by Saitoh et al. [13] is then developed to extract the flower region within the rectangular window. In the feature extraction phase, the shape and color features of the whole flower region as well as the pistil/stamen area are extracted to measure the similarity between two flower images. In the recognition phase, the flower image in the database that is most similar to the input image will be found using the extracted features.

Fig. 1 Flow diagram of the proposed flower image recognition system

2.1 Flower region segmentation

In order to extract the flower boundary as accurately as possible, the proposed system provides a simple interactive interface which allows the user to select the flower of interest for recognition. Figure 2 illustrates the steps of the interactive flower region segmentation phase. First, the user draws a rectangular window which circumscribes the flower of interest using mouse click and drag operations. Let P0 denote the center point of the rectangular window, and P1, P2, P3, and P4 denote the middle points of the four boundary lines of the rectangular window, as shown in Fig. 3. For each scan line starting from Pi (i = 1, 2, 3, 4) toward P0, the edge point located on the flower boundary is detected. These four edge points are then regarded as the starting/ending points for boundary tracing. Since the proposed flower edge detection method uses the "local cost" value associated with every pixel on each scan line, we define the local cost first.

Fig. 2 Block diagram for flower region segmentation

Fig. 3 Four boundary lines of the flower bounding rectangular window

2.1.1 Definition of local cost

The local cost (LC) of a pixel on a scan line is defined as follows:

$$ LC = 1 + MG - G, $$
(1)

where G denotes the gradient magnitude of the pixel and MG denotes the maximum gradient magnitude over all pixels in the image. According to this definition, a pixel with a strong edge gradient has a small local cost. In this paper, the Sobel operators (see Fig. 4) are employed to compute the horizontal and vertical gradient magnitudes of a pixel. Let IR(x, y), IG(x, y), and IB(x, y) denote respectively the R, G, and B color values of the pixel located at (x, y). For the R color value, the corresponding gradient magnitude, notated GR(x, y), is defined as follows:

$$ {G_R}\left( {x,y} \right) = \sqrt {{G_{R,H}^2\left( {x,y} \right) + G_{R,V}^2\left( {x,y} \right)}}, $$
(2)

where GR,H(x, y) and GR,V(x, y) denote respectively the horizontal and vertical gradients, derived by convolving the horizontal and vertical Sobel operators with the 3 × 3 image block centered at (x, y), and can be described by the following equations:

$$ \begin{aligned} G_{R,H}(x,y) = {} & \left[ I_R(x+1,y-1) + 2I_R(x+1,y) + I_R(x+1,y+1) \right] \\ & - \left[ I_R(x-1,y-1) + 2I_R(x-1,y) + I_R(x-1,y+1) \right] \end{aligned} $$
(3)

and

$$ \begin{aligned} G_{R,V}(x,y) = {} & \left[ I_R(x-1,y+1) + 2I_R(x,y+1) + I_R(x+1,y+1) \right] \\ & - \left[ I_R(x-1,y-1) + 2I_R(x,y-1) + I_R(x+1,y-1) \right] \end{aligned} $$
(4)
Fig. 4 The Sobel operators: (a) vertical Sobel operator, (b) horizontal Sobel operator

The gradient magnitudes for the G and B color values, notated GG(x, y) and GB(x, y), can be computed in a similar manner. The overall gradient is then defined as the maximum value among GR(x, y), GG(x, y), and GB(x, y):

$$ G\left( {x,y} \right) = \max \;\left\{ {{G_R}\left( {x,y} \right),\,{G_G}\left( {x,y} \right),\,{G_B}\left( {x,y} \right)} \right\} $$
(5)
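The following sketch illustrates how the local cost in (1)-(5) could be computed; it is a minimal Python/NumPy illustration under the above definitions, not the authors' implementation, and the helper name compute_local_cost is introduced here for convenience.

```python
import numpy as np
from scipy.ndimage import convolve

def compute_local_cost(rgb):
    """Per-pixel local cost LC = 1 + MG - G for an RGB image (H x W x 3 array)."""
    # Sobel kernels for the horizontal and vertical gradients of Eqs. (3)-(4);
    # the sign convention is irrelevant because only the magnitude is used.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    ky = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)

    channel_magnitudes = []
    for c in range(3):                          # R, G, B channels
        channel = rgb[:, :, c].astype(float)
        gh = convolve(channel, kx)              # horizontal gradient, Eq. (3)
        gv = convolve(channel, ky)              # vertical gradient, Eq. (4)
        channel_magnitudes.append(np.sqrt(gh ** 2 + gv ** 2))  # Eq. (2)

    g = np.max(channel_magnitudes, axis=0)      # overall gradient, Eq. (5)
    mg = g.max()                                # maximum gradient magnitude MG
    return 1.0 + mg - g                         # local cost, Eq. (1)
```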

2.1.2 Detection of flower edge points

According to the computed local cost associated with each pixel, four profiles of local costs along the lines from each Pi (i = 1, 2, 3, 4) to P0 are generated. In this study, the estimated stamen region is excluded from each profile (see P3 → P5 in Fig. 5). The estimated stamen region is defined as the rectangular window centered at P0 with an area 1/9 of the flower bounding window. For each profile, the 5th percentile of local costs, PLC(5), is evaluated. The threshold value TLC used to find edge points on each profile is defined as the average of the local costs smaller than PLC(5). If the local cost of a point is smaller than the threshold TLC, it is considered a candidate edge point, and the candidate closest to the border of the flower bounding window is regarded as the flower edge point (see e1, e2, e3, and e4 in Fig. 5). These four flower edge points are taken as the starting/ending points of the flower boundary tracing algorithm. In our experiments, about 14.4% (50/348) of the images in our database contain at least one wrongly detected edge point. Figure 6 gives some examples of wrongly detected edge points. The main reasons for these errors are: 1) strong edge points exist within the flower region (see Fig. 6(a)); 2) the contrast between the flower region and the background is not sharp enough (see Fig. 6(b)); 3) neighboring flowers overlap (see Fig. 6(c)); and 4) no flower edge point survives in the profile when the stamen region is excluded (see Fig. 6(d)). For these images, we provide an interactive interface which allows the user to select the correct edge point with the mouse (see Fig. 7).
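A minimal sketch of the per-profile edge point detection described above is given below; the function name detect_edge_point, the profile representation (a 1-D array of local costs ordered from the window border Pi toward P0), and the handling of the excluded stamen region are assumptions for illustration.

```python
import numpy as np

def detect_edge_point(profile_costs):
    """Return the index of the flower edge point on one scan-line profile.

    profile_costs: 1-D array of local costs ordered from the border point Pi
    toward the window center P0, with the estimated stamen region already
    excluded. Returns None when no candidate survives (a case resolved
    through user interaction in the paper).
    """
    # 5th percentile of the local costs on this profile, P_LC(5).
    p5 = np.percentile(profile_costs, 5)

    # Threshold T_LC: average of the local costs smaller than P_LC(5).
    low = profile_costs[profile_costs < p5]
    if low.size == 0:
        return None
    t_lc = low.mean()

    # Candidate edge points have a local cost below T_LC; the one closest
    # to the window border (smallest index) is the flower edge point.
    candidates = np.flatnonzero(profile_costs < t_lc)
    return int(candidates[0]) if candidates.size else None
```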

Fig. 5 An example of the detection of the flower edge point on each profile

Fig. 6 Example images in which the detected edge points are incorrect

Fig. 7 Correction of a wrongly detected edge point: (a) a wrong edge point, e3; (b) the corrected edge point obtained through user interaction

2.1.3 Flower boundary tracing

Let e1, e2, e3, and e4 denote the detected flower edge points. The two lines connecting P1 with P2 and P3 with P4 divide the flower bounding window into four sub-regions (see R1, R2, R3, and R4 in Fig. 8). The flower boundary in each sub-region is traced independently, with the four pairs of edge points (e1, e4), (e4, e2), (e2, e3), and (e3, e1) serving as the starting and ending tracing points of the respective sub-regions. The proposed flower boundary tracing algorithm starts from the starting point and stops when the ending point is reached. The four partial flower boundaries are then combined to form the whole flower boundary (see the yellow curve in Fig. 8).

Fig. 8 The flower bounding window is divided into four sub-regions

The proposed flower boundary tracing algorithm modifies the 2-D dynamic programming graph search algorithm developed by Mortensen et al. [9]. It treats each pixel within the flower bounding window as a vertex in a graph, and an edge in the graph connects a pixel to one of its 8-connected neighboring pixels. The cost associated with an edge is defined as the local cost evaluated at the neighboring pixel. The concept of average path cost, defined as the accumulated cost from the starting point divided by the path length, is employed to decide which direction to move; when the path is extended from the previous pixel to the next pixel, the accumulated cost is updated by adding the average of the previous pixel cost and the next pixel cost. The detailed algorithm of the modified flower boundary tracing procedure is described as follows.

(Algorithm listing: modified flower boundary tracing)
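As a rough illustration of the tracing idea, the sketch below runs a best-first search over 8-connected pixels that always expands the partial path with the lowest average local cost; it is a simplified reading of the procedure described above, not the authors' exact listing, and the function name trace_boundary as well as the tie-breaking details are assumptions.

```python
import heapq
import numpy as np

def trace_boundary(local_cost, start, goal):
    """Trace a low-average-cost path from start to goal over 8-connected pixels.

    local_cost: 2-D array of per-pixel local costs (Eq. (1)).
    start, goal: (row, col) tuples, e.g. two detected flower edge points.
    Returns the traced path as a list of (row, col) pixels.
    """
    h, w = local_cost.shape
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    # Priority queue entries: (average cost, accumulated cost, length, pixel).
    heap = [(float(local_cost[start]), float(local_cost[start]), 1, start)]
    parent = {start: None}
    best_avg = {start: float(local_cost[start])}

    while heap:
        avg, acc, length, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        for dr, dc in neighbors:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < h and 0 <= nc < w):
                continue
            # Extend the path: add the average of the previous and next pixel costs.
            step = 0.5 * (local_cost[r, c] + local_cost[nr, nc])
            n_acc, n_len = acc + step, length + 1
            n_avg = n_acc / n_len
            if n_avg < best_avg.get((nr, nc), np.inf):
                best_avg[(nr, nc)] = n_avg
                parent[(nr, nc)] = (r, c)
                heapq.heappush(heap, (n_avg, n_acc, n_len, (nr, nc)))

    # Backtrack from the goal to recover the traced boundary segment.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path[::-1]
```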

2.2 Feature extraction

The most widely used features for describing flowers are color and shape descriptors. In this paper, the color and shape features of the whole flower region and the pistil/stamen area will be extracted in an attempt to describe the characteristics of the flower images more precisely.

2.2.1 Features of the whole flower region

First, we define the flower region as the internal region enclosed by the segmented flower boundary. In this paper, nine color features, of which the first six were proposed by Saitoh et al. [13], and three shape features are extracted from the whole flower region for recognition.

Color features of flower region

Since the flower images were taken under different environmental conditions, variations in illumination can greatly affect the recognition result. To deal with this problem, we convert each pixel from the RGB color space to the HSV (hue, saturation, and value) space [6] and discard the illumination (V) component. The color features are derived from the primary, secondary, and tertiary flower colors appearing in the whole flower region. First, the HS space is divided into 12 × 6 color cells, represented by Ci, 1 ≤ i ≤ 72 (see Fig. 9). The color coordinate of each cell, defined as the coordinate of the center point of the cell, is represented by a pair of H and S values, (Hi, Si), 1 ≤ i ≤ 72. For each flower region, a color histogram (notated CH(i), 1 ≤ i ≤ 72), which describes the probability associated with each color cell Ci, is generated. Let DC(1), DC(2), and DC(3) denote respectively the three most dominant color cells appearing in the flower region. The color coordinates of these three dominant color cells and their corresponding probabilities are taken as the color features of the flower region. Let (dxi, dyi) and pi denote the coordinate vector and the corresponding probability of DC(i), 1 ≤ i ≤ 3, where dxi = SDC(i) cos(HDC(i)) and dyi = SDC(i) sin(HDC(i)). These color features can be summarized as follows.

  • CF1: x-coordinate value of DC1, dx1

  • CF2: y-coordinate value of DC1, dy1

  • CF3: the probability of DC1, p1

  • CF4: x-coordinate value of DC2, dx2

  • CF5: y-coordinate value of DC2, dy2

  • CF6: the probability of DC2, p2

  • CF7: x-coordinate value of DC3, dx3

  • CF8: y-coordinate value of DC3, dy3

  • CF9: the probability of DC3, p3

Fig. 9 The HS color space is divided into 12 × 6 color cells
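A compact sketch of the computation of CF1-CF9 follows; it assumes per-pixel hue in degrees and saturation in [0, 1] together with a boolean mask of flower-region pixels, and the name dominant_color_features is introduced here for illustration.

```python
import numpy as np

def dominant_color_features(hue_deg, sat, mask, n_hue=12, n_sat=6, top_k=3):
    """Compute CF1-CF9: (dx, dy, p) for the top-k dominant HS color cells.

    hue_deg: per-pixel hue in degrees [0, 360); sat: saturation in [0, 1];
    mask: boolean array selecting flower-region pixels.
    """
    h = hue_deg[mask]
    s = sat[mask]

    # Quantize the HS plane into 12 x 6 cells and build the normalized histogram CH.
    h_bin = np.minimum((h / 360.0 * n_hue).astype(int), n_hue - 1)
    s_bin = np.minimum((s * n_sat).astype(int), n_sat - 1)
    cell = h_bin * n_sat + s_bin
    hist = np.bincount(cell, minlength=n_hue * n_sat).astype(float)
    hist /= hist.sum()

    features = []
    for idx in np.argsort(hist)[::-1][:top_k]:        # DC(1), DC(2), DC(3)
        # Cell center in HS coordinates, converted to Cartesian (dx, dy).
        hc = (idx // n_sat + 0.5) * (360.0 / n_hue)    # center hue (degrees)
        sc = (idx % n_sat + 0.5) / n_sat               # center saturation
        dx = sc * np.cos(np.deg2rad(hc))
        dy = sc * np.sin(np.deg2rad(hc))
        features.extend([dx, dy, hist[idx]])           # x, y, probability
    return np.array(features)                          # [CF1, ..., CF9]
```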

Shape features of flower region

To get the shape features, we first define the centroid (gx, gy) of the flower region as the flower center, which is computed as follows:

$$ {g_x} = \frac{1}{N}\sum\limits_{i = 1}^N {{x_i}}, $$
(6)
$$ {g_y} = \frac{1}{N}\sum\limits_{i = 1}^N {{y_i}}, $$
(7)

where N is the number of pixels on the flower boundary, and xi and yi are respectively the x and y coordinates of the i-th boundary pixel. The distance between the flower center and each boundary pixel is then computed as follows:

$$ {d_i} = \sqrt {{{{\left( {{x_i} - {g_x}} \right)}^2} + {{\left( {{y_i} - {g_y}} \right)}^2}}}, \;1 \leqslant i \leqslant N. $$
(8)

Without loss of generality, let the distances be sorted in increasing order, that is, di ≤ di+1 for 1 ≤ i ≤ N−1. The three shape features (notated SF1, SF2, and SF3) used to represent the shape characteristics of the flower region are described as follows.

  1. 1)

    SF1: A ratio indicating the relative sharpness of the petals, computed from the distances between the flower boundary points and the flower center, defined as follows:

    $$ S{F_1} = \frac{{{R_{10}}}}{{{R_{90}}}}, $$
    (9)

    where R10 and R90 are respectively the averages of all di's that are smaller than the 10th percentile and larger than the 90th percentile of all di's:

    $$ {R_{{10}}} = \frac{1}{{0.1 \times N}}\sum\limits_{i = 1}^{0.1 \times N} {{d_i}}, $$
    (10)
    $$ {R_{{90}}} = \frac{1}{{0.1 \times N}}\sum\limits_{i = 1}^{0.1 \times N} {{ }{d_{N - i}}} . $$
    (11)

    Note that the feature value SF1, defined as the ratio between R10 and R90, will not change greatly when the flower region is broken or captured from different directions.

  2. 2)

    SF2: The average of the normalized distances from every flower boundary point to the flower center, defined as follows:

    $$ S{F_2} = \frac{1}{N}\sum\limits_{i = 1}^N {{D_i}}, $$
    (12)

    where Di is the normalized distance defined as follows:

    $$ {D_i} = \left\{ {\begin{array}{*{20}{c}} {1,} \hfill & {{d_i} \geqslant {R_{90}}} \hfill \\ {\frac{{{d_i} - {R_{10}}}}{{{R_{90}} - {R_{10}}}},} \hfill & {{R_{10}} < {d_i} < {R_{90}}} \hfill \\ {0,} \hfill & {{d_i} \leqslant {R_{10}}} \hfill \\ \end{array} } \right.. $$
    (13)

    Note that defining the feature value SF2 in terms of the averaged normalized values Di makes it invariant to the size of the flower region.

  3. 3)

    SF3: Roundness measure, which indicates how close the shape of the flower is to a circle, defined as follows:

    $$ S{F_3} = \frac{{4\pi S}}{{{L^2}}}, $$
    (14)

    where L is the length of the flower boundary and S is the area of the flower region defined as the total number of pixels in the flower region. When the flower shape is close to a circle, SF3 will be close to 1. Note that this feature value is robust to rotation, translation, and scaling of flower objects.

In summary, the 12-dimensional feature vector used to represent the flower region can be described as follows:

$$f_{F} = [CF_{1} ,\;CF_{2} , \cdots ,\;CF_{9} ,\;SF_{1} ,\;SF_{2} ,\;SF_{3} ]^{{\text{T}}} .$$
(15)
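To make SF1-SF3 in (9)-(14) concrete, the sketch below assumes the boundary is given as an ordered array of (x, y) pixels and that the region area S is supplied separately; flower_shape_features is a name introduced here, not from the paper.

```python
import numpy as np

def flower_shape_features(boundary_xy, region_area):
    """Compute SF1, SF2, SF3 from the flower boundary and region area.

    boundary_xy: (N, 2) array of boundary pixel coordinates, ordered along
    the contour; region_area: number of pixels inside the flower region (S).
    """
    gx, gy = boundary_xy.mean(axis=0)                      # flower center, Eqs. (6)-(7)
    d = np.sort(np.hypot(boundary_xy[:, 0] - gx,
                         boundary_xy[:, 1] - gy))          # sorted distances, Eq. (8)

    n10 = max(1, int(0.1 * len(d)))
    r10 = d[:n10].mean()                                   # mean of the smallest 10%, Eq. (10)
    r90 = d[-n10:].mean()                                  # mean of the largest 10%, Eq. (11)

    sf1 = r10 / r90                                        # petal sharpness ratio, Eq. (9)

    # Normalized distances clipped to [0, 1], Eq. (13), then averaged, Eq. (12).
    nd = np.clip((d - r10) / (r90 - r10), 0.0, 1.0)
    sf2 = nd.mean()

    # Roundness, Eq. (14): boundary length L approximated by the polygon perimeter.
    diffs = np.diff(np.vstack([boundary_xy, boundary_xy[:1]]), axis=0)
    perimeter = np.hypot(diffs[:, 0], diffs[:, 1]).sum()
    sf3 = 4.0 * np.pi * region_area / perimeter ** 2

    return sf1, sf2, sf3
```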

2.2.2 Features of the pistil/stamen area

First, we define an initial estimate of the pistil/stamen area as the square area centered at the flower center with width equal to 2/3 of the petal length, where the petal length is defined as R90 in (11). Let PDC(1) denote the dominant color cell in this estimated pistil/stamen area; note that PDC(1) is found after excluding the primary color of the flower region, DC(1). Then, all image pixels within the square area of width 4/3 of the petal length whose colors fall in the cell PDC(1) constitute the pistil/stamen area.
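The sketch below illustrates this two-step pistil/stamen extraction under the 12 × 6 HS quantization of Section 2.2.1; the helper names hs_cell_index and extract_pistil_stamen_mask, and the choice of returning a boolean mask, are assumptions for illustration.

```python
import numpy as np

def hs_cell_index(hue_deg, sat, n_hue=12, n_sat=6):
    """Map HS values to indices of the 12 x 6 color cells used in Section 2.2.1."""
    h_bin = np.minimum((hue_deg / 360.0 * n_hue).astype(int), n_hue - 1)
    s_bin = np.minimum((sat * n_sat).astype(int), n_sat - 1)
    return h_bin * n_sat + s_bin

def extract_pistil_stamen_mask(hue_deg, sat, center, petal_len, dc1_cell):
    """Return a boolean mask of the pistil/stamen area.

    center: (row, col) of the flower center; petal_len: R90 from Eq. (11);
    dc1_cell: index of the flower region's primary color cell DC(1).
    """
    cells = hs_cell_index(hue_deg, sat)
    rows, cols = np.indices(hue_deg.shape)
    cr, cc = center

    # Step 1: dominant color inside the initial square (width 2/3 of the petal
    # length), excluding the flower's primary color DC(1).
    inner = (np.abs(rows - cr) <= petal_len / 3) & (np.abs(cols - cc) <= petal_len / 3)
    counts = np.bincount(cells[inner], minlength=72)
    counts[dc1_cell] = 0
    pdc1 = counts.argmax()

    # Step 2: all pixels of color PDC(1) inside the wider square
    # (width 4/3 of the petal length) form the pistil/stamen area.
    outer = (np.abs(rows - cr) <= 2 * petal_len / 3) & (np.abs(cols - cc) <= 2 * petal_len / 3)
    return outer & (cells == pdc1)
```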

Since the color and shape of the pistil/stamen area also exhibit some discriminating information for flower image recognition, the dominant color and its corresponding probability will be taken as the color features of the pistil/stamen area. In addition, the mean, standard deviation, and the third central moment of the normalized distance from each pixel in the pistil/stamen area to the center of the pistil/stamen area will be computed as the shape features of the pistil/stamen area.

Color features of pistil/stamen area

For most flowers, the dominant color of the pistil/stamen area is often different from that of the flower region. Thus, the color characteristic of the pistil/stamen area provides additional discriminating information. In this study, the coordinate vector (pdx1, pdy1) and the corresponding probability pp1 of PDC(1) are taken as the color features of the pistil/stamen area, where pdx1 = SPDC(1) cos(HPDC(1)) and pdy1 = SPDC(1) sin(HPDC(1)). These color features can be summarized as follows.

  • PCF1: x-coordinate value of PDC1, pdx1

  • PCF2: y-coordinate value of PDC1, pdy1

  • PCF3: the probability of PDC1, pp1

Shape features of pistil/stamen area

Let the pistil/stamen area consist of M pixels whose coordinates are notated (pxi, pyi), 1 ≤ i ≤ M. Next, the centroid (gpx, gpy) of the pistil/stamen area is computed as follows:

$$ {g_{px}} = \frac{1}{M}\sum\limits_{i = 1}^M {p{x_i}}, $$
(16)
$$ {g_{py}} = \frac{1}{M}\sum\limits_{i = 1}^M {p{y_i}} . $$
(17)

The distance between the centroid and every pixel of the pistil/stamen area is then computed as follows:

$$ p{d_i} = \sqrt {{{{\left( {p{x_i} - {g_{px}}} \right)}^2} + {{\left( {p{y_i} - {g_{py}}} \right)}^2}}}, \;1 \leqslant i \leqslant M. $$
(18)

Without loss of generality, let the distances be sorted in increasing order, that is, pdi ≤ pdi+1 for 1 ≤ i ≤ M−1. These distances are then normalized using the following equation:

$$ P{D_i} = \left\{ {\begin{array}{*{20}{c}} {1,} \hfill & {p{d_i} \geqslant P{R_{90}}} \hfill \\ {\frac{{p{d_i} - P{R_{10}}}}{{P{R_{90}} - P{R_{10}}}},} \hfill & {P{R_{10}} < p{d_i} < P{R_{90}}} \hfill \\ {0,} \hfill & {p{d_i} \leqslant P{R_{10}}} \hfill \\ \end{array} } \right., $$
(19)

where PR10 and PR90 are respectively the averages of all pdi's that are smaller than the 10th percentile and larger than the 90th percentile of all pdi's:

$$ P{R_{{10}}} = \frac{1}{{0.1 \times M}}\sum\limits_{i = 1}^{0.1 \times M} {p{d_i}}, $$
(20)
$$ P{R_{{90}}} = \frac{1}{{0.1 \times M}}\sum\limits_{i = 1}^{0.1 \times M} {{ }p{d_{M - i}}} . $$
(21)

Note that the normalized distance values PDi make the extracted shape features of the pistil/stamen area invariant to the size of the flower image. The three shape features (notated PSF1, PSF2, and PSF3) used to represent the shape characteristics of the pistil/stamen area are defined as follows.

  1. 1)

    PSF1: The mean of the normalized distance values PDi, defined as follows:

    $$ PS{F_1} = {\mu_{PD}} = \frac{1}{M}\sum\limits_{i = 1}^M {P{D_i}} . $$
    (22)
  2. 2)

    PSF2: The standard deviation of the normalized distance values PDi, defined as follows:

    $$ PS{F_2} = {\sigma_{PD}} = {\left( {\frac{1}{M}\sum\limits_{i = 1}^M {{{\left( {P{D_i} - {\mu_{PD}}} \right)}^2}} } \right)^{1/2}}. $$
    (23)
  3. 3)

    PSF3: The third central moment of the normalized distance values PDi, defined as follows:

    $$ PS{F_3} = {m_3} = {\left( {\frac{1}{M}\sum\limits_{i = 1}^M {{{\left( {P{D_i} - {\mu_{PD}}} \right)}^3}} } \right)^{1/3}}. $$
    (24)

Note that these three shape features are typically robust to rotation, translation, and scaling of flower images. In summary, the 6-dimensional feature vector used to represent the pistil/stamen area can be described as follows:

$$f_{P} = [PCF_{1} ,\;PCF_{2} ,\;PCF_{3} ,\;PSF_{1} ,\;PSF_{2} ,\;PSF_{3} ]^{{\text{T}}} .$$
(25)
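A short sketch of PSF1-PSF3 follows, with deviations taken about the mean μPD as in (22)-(24); the normalization mirrors the flower-region case, and pistil_shape_features is a name introduced for illustration.

```python
import numpy as np

def pistil_shape_features(pixel_xy):
    """Compute PSF1-PSF3 from the (M, 2) coordinates of pistil/stamen pixels."""
    gpx, gpy = pixel_xy.mean(axis=0)                       # centroid, Eqs. (16)-(17)
    pd = np.sort(np.hypot(pixel_xy[:, 0] - gpx,
                          pixel_xy[:, 1] - gpy))           # sorted distances, Eq. (18)

    n10 = max(1, int(0.1 * len(pd)))
    pr10, pr90 = pd[:n10].mean(), pd[-n10:].mean()         # Eqs. (20)-(21)
    npd = np.clip((pd - pr10) / (pr90 - pr10), 0.0, 1.0)   # normalized distances, Eq. (19)

    mu = npd.mean()                                        # PSF1, Eq. (22)
    sigma = np.sqrt(np.mean((npd - mu) ** 2))              # PSF2, Eq. (23)
    m3 = np.cbrt(np.mean((npd - mu) ** 3))                 # PSF3, Eq. (24)
    return mu, sigma, m3
```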

The feature descriptor used to represent a flower image consists of all 18 features extracted from the flower region as well as the pistil/stamen area:

$$\begin{array}{*{20}c} {{f = [f(1),\;f(2), \cdots ,\;f(18)]^{{\text{T}}} = [(f_{F} )^{{\text{T}}} \;(f_{P} )^{{\text{T}}} ]^{{\text{T}}} }} \\ {{ = [CF_{1} ,\; \cdots \,,\;CF_{9} ,\;SF_{1} ,\; \cdots ,\;SF_{3} ,PCF_{1} ,\; \cdots ,\;PCF_{3} ,\;PSF_{1} ,\; \cdots ,\;PSF_{3} ]^{{\text{T}}} .}} \\ \end{array} $$
(26)

2.3 Flower image recognition

In the recognition phase, the distances between the input image and all flower images in the database are calculated. The distance between the input image and the i-th image in the database, notated disti, is measured by the weighted city-block distance defined as follows:

$$ dis{t_i} = \frac{{\sum\limits_{k = 1}^{{N_f}} {w(k)} \left| {{f_i}(k) - f(k)} \right|}}{{\sum\limits_{k = 1}^{{N_f}} {w(k)} }}, $$
(27)

where fi(k) is the k-th feature value of the i-th image, f(k) is the k-th feature value of the input image, and w(k) is the weight associated with the k-th feature value. The variable Nf determines which features are used for flower image recognition: if only the features extracted from the flower region are used, Nf = 12; if the features extracted from both the flower region and the pistil/stamen area are used, Nf = 18. In this paper, the normalized top-five recognition rate, NAR5(k), is used as the weight associated with the k-th feature value (1 ≤ k ≤ 18); that is, w(k) = NAR5(k). NAR5(k) is derived by normalizing the top-five recognition rate AR5(k) associated with the k-th feature value:

$$ w(k) = NAR5(k) = \frac{{AR5(k)}}{{\sum\limits_{j = 1}^{{N_f}} {AR5(j)} }} \times 100, $$
(28)

where AR5(k) denotes the recognition rate obtained when the k-th feature value alone is used for flower recognition, and a recognition result is regarded as correct if at least one of the top five candidate species is identical to the input species. Table 1 shows the top-five recognition rate AR5(k) and the weight w(k) associated with each feature value for the first flower image database. Note that these initial weights are distinct for different databases. Based on the computed weighted distances, the top K candidate images with minimum distances to the input image are returned to the user. In this study, the top-K recognition rate is employed to evaluate the recognition performance.
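A minimal sketch of the weighted matching in (27) and (28) follows; the database layout (an array of per-image feature vectors) and the names feature_weights, weighted_distance, and rank_candidates are assumptions for illustration.

```python
import numpy as np

def feature_weights(ar5):
    """Normalize the per-feature top-five recognition rates AR5(k) into weights, Eq. (28)."""
    ar5 = np.asarray(ar5, dtype=float)
    return ar5 / ar5.sum() * 100.0

def weighted_distance(f_query, f_db, w):
    """Weighted city-block distance of Eq. (27) between a query and one database vector."""
    return np.sum(w * np.abs(f_db - f_query)) / np.sum(w)

def rank_candidates(f_query, database, w, top_k=5):
    """Return the indices of the top-k database images closest to the query.

    database: (num_images, Nf) array of feature vectors; w: (Nf,) weight vector.
    """
    dists = np.array([weighted_distance(f_query, f_db, w) for f_db in database])
    return np.argsort(dists)[:top_k]
```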

Table 1 Top five recognition rate AR5(k) and weight w(k) associated with each feature value for the first flower image database

3 Experimental results

3.1 Flower image databases

In this paper, two flower image databases are used to evaluate the performance of the proposed method. The first flower image database constructed by us consists of 348 flower images from 24 species. A summary of these 24 plants is shown in Table 2 and some example images are shown in Fig. 10. The second database constructed by Zou and Nagy [16] consists of 612 flower images from 102 species.

Table 2 Common names and Latin names of the plant species and their corresponding image numbers in the first database
Fig. 10 Some example images in the first database

The flower images in the first database were taken in the field using digital cameras equipped with macro lenses. The lens aperture, specified as an F-number, was set between F2.8 and F5.6. To ensure the robustness of the proposed system, several different cameras, including a SONY T9, a Canon IS 860, and a NIKON S1, were used to take these flower images. In addition, images of the same species were taken from different flowers with different poses. The number of images per species ranges from 4 to 33. Several images contain multiple, tiny, overlapping flowers, as shown in Fig. 10. Before recognition, all flower images were re-scaled to the same size of 400 × 300 pixels.

The second database, constructed by Zou and Nagy, consists of 612 flower images from 102 species and is available at http://www.ecse.rpi.edu/doclab/flowers. Each species has six images, all of the same size, 300 × 240 pixels. Some pictures are quite out of focus, and several contain multiple, tiny, overlapping flowers.

3.2 Recognition results

To show the effectiveness of the proposed approach, we compare its recognition results with those obtained using the feature sets proposed by Hong et al. [7], Zou and Nagy [16], and Saitoh et al. [13]. The overall recognition accuracy is evaluated in a leave-one-out manner: each flower image in the database is taken in turn as the input image, and the remaining flower images are used as the training set.

For the first flower image database, Table 3 compares the recognition rate of the proposed approach with those of the approaches proposed by Hong et al. [7] and Saitoh et al. [13]. We can see that the color features are more discriminative than the shape features. The proposed approach using both shape and color features of the flower region achieves a very high recognition rate, and the best performance is obtained by combining the shape and color features of both the flower region and the pistil/stamen area, which outperforms the approaches proposed by Hong et al. [7] and Saitoh et al. [13]. Figure 11 shows three species in the first flower image database (Zinnia, Pilose Beggarticks, and Marguerite) with similar appearance; the comparison of the recognition rates on these species is shown in Table 4, from which we can see that our approach also outperforms the other two. Specifically, Fig. 12 shows the 27 images of Pilose Beggarticks. These images exhibit different characteristics, including the number of petals, the relative positions of petals, and the shape of petals; furthermore, some images contain bugs on different parts of the flower region. Table 4 shows that the proposed approach yields the best recognition rate on this species, which illustrates that it is more robust to noise and shape variations.

Table 3 Comparison of recognition rate on the first flower image database
Fig. 11 Three species with similar appearance: (a) Zinnia, (b) Pilose Beggarticks, (c) Marguerite

Table 4 Comparison of recognition rate on the three species (Zinnia, Pilose Beggarticks, and Marguerite)
Fig. 12 The 27 images of Pilose Beggarticks in the first image database

For the second database, we also compared the recognition rate of the proposed approach with those of the approaches proposed by Hong et al. [7], Zou and Nagy [16], and Saitoh et al. [13]; Table 5 summarizes the results. From the table, we can see that our approach using the shape and color features of the flower region and the pistil/stamen area always yields a higher recognition rate than the other approaches. Note that the method proposed by Zou and Nagy achieves a recognition rate of 93% only after repeated user interactions, continued until the user accepts the recognition result, and takes a longer computation time (10.7 s) than our approach (4.2 s). Figure 13 shows the images of two flower species (FlowerBridge1 and RosaArkansana) in the second database. The images of the same species were taken from different directions and distances, and thus reveal different shapes and sizes. The comparison of the recognition rates on these two species is shown in Table 6, from which we can see that our approach also outperforms the other methods.

Table 5 Comparison of recognition rate on the second flower image database
Fig. 13 Images of the two species in the second database: (a) FlowerBridge1, (b) RosaArkansana

Table 6 Comparison of recognition rate on FlowerBridge1 and RosaArkansana in the second database

4 Conclusions

In this paper, we have presented an interactive flower recognition system. First, the system provides a simple user interface which allows the user to draw a rectangular bounding window containing the flower region of interest. Then, a boundary tracing algorithm is developed to find the flower boundary as accurately as possible. Besides the color and shape features of the whole flower region, the color and shape features of the pistil/stamen area are also extracted to represent the flower characteristics more precisely. Experiments conducted on two flower image databases consisting of 24 and 102 species, respectively, have shown that the proposed approach achieves a higher recognition rate than the methods proposed by Hong et al. [7], Zou and Nagy [16], and Saitoh et al. [13].