
1 Introduction

Plants play a crucial role in Earth’s ecology by providing sustenance and shelter and by maintaining a healthy, breathable atmosphere. They also have important medicinal properties and serve as sources of alternative energy such as biofuel. Building a plant database for quick and efficient classification and recognition is an important step towards their conservation and preservation. This is especially significant as many plant species are on the brink of extinction due to incessant deforestation paving the way for modernization. In recent years, computer vision and pattern recognition techniques have been applied to build digital plant cataloging systems. Most of these techniques rely on extracting visual properties from plant leaf images and representing them as computer-recognizable features using data modeling techniques. Properties such as shape and texture [1] and shape, texture and color [2] have been used for discrimination, and data modeling techniques such as fractal dimensions [3], Fourier analysis [4] and wavelets [5] have been applied. The current work proposes a plant recognition scheme based on the shape and texture of the leaf: shape is modeled using the Curvelet Transform and invariant moments, while texture is modeled using a ridge filter and statistical measures derived from the filtered image. Experiments using a neuro-fuzzy classifier demonstrate acceptable recognition accuracies. The paper is organized as follows: Sect. 2 outlines the proposed approach, Sect. 3 provides details of the dataset and the experimental results obtained, Sect. 4 compares the proposed approach with some contemporary approaches, and Sect. 5 presents the overall conclusions and the scope for future research.

2 Proposed Approach

As mentioned above, the proposed approach uses a combination of shape and texture features. The feature values are, however, sensitive to the size and orientation of the leaf image. To make them invariant to translation, rotation and scaling, a pre-processing step standardizes these parameters before feature calculation.

2.1 Pre-Processing (PP)

The objective of the pre-processing step is to standardize the scale and orientation of the image before feature computation. The raw image (\(I\)) is typically a color image captured at an arbitrary orientation and size (see Fig. 1). The image is first converted to binary (\(bw\)) and grayscale (\(gs\)) forms. To make the features rotation-invariant, the angle of the major axis of the leaf is extracted from the image and used to rotate it so that the major axis is aligned with the horizontal (\(rg\) and \(rb\)). If visible, the white bounding rectangle is removed so that the leaf lies over a homogeneous background (\(cg\)). At this point, even though the leaf is horizontal, it can have an arbitrary translation with respect to the origin. To make the features translation-invariant, the background is cropped until the leaf just fits within its bounding rectangle (\(pg\) and \(pb\)).

Fig. 1 PP steps of the original image I
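To make the rotation-alignment step concrete, a minimal Python sketch is given below, assuming a binary mask with the leaf as foreground. It estimates the major-axis angle from second-order central moments and rotates the mask so that this axis lies horizontally; trying both rotation signs and keeping the wider result is a small convention-agnostic workaround of our own, not a step from the paper, and the helper name `align_major_axis` is likewise ours.

```python
import numpy as np
from scipy import ndimage


def align_major_axis(bw):
    """Rotate a binary leaf mask so that its major axis is horizontal.

    The principal-axis angle is estimated from second-order central moments.
    Both rotation signs are tried and the result whose foreground bounding
    box is widest is kept, which sidesteps library-specific sign conventions.
    """
    ys, xs = np.nonzero(bw)
    xc, yc = xs.mean(), ys.mean()
    mu20 = np.mean((xs - xc) ** 2)
    mu02 = np.mean((ys - yc) ** 2)
    mu11 = np.mean((xs - xc) * (ys - yc))
    theta = 0.5 * np.degrees(np.arctan2(2.0 * mu11, mu20 - mu02))

    def width_to_height(mask):
        r, c = np.nonzero(mask)
        return (c.max() - c.min() + 1) / (r.max() - r.min() + 1)

    candidates = [ndimage.rotate(bw.astype(float), sign * theta,
                                 reshape=True, order=0) > 0.5
                  for sign in (+1.0, -1.0)]
    return max(candidates, key=width_to_height)
```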

To make the features scale-invariant, the image is rescaled to one of a set of standard dimensions, called ‘segments’. Since the ratio of the major axis to the minor axis, henceforth called the ‘aspect ratio’ (\(R\)), differs between leaf types, rescaling every leaf to a single size would introduce distortions due to non-uniform scaling. Hence a scheme is devised whereby each leaf is scaled to one of 6 pre-determined segments, selected according to the value of this ratio, with little or no distortion. For each leaf image, the output of the pre-processing block is its segment number (\(s\)) and the grayscale and binary versions (\(pg\) and \(pb\)). The segment number assigned to each class, along with its aspect ratio, is tabulated in Table 1. For \(R > 13\), the subsequent feature values were found to give inconsistent results.

Table 1 Pre-defined scaling dimensions mapped to segments
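The remaining translation and scale normalization can be sketched as follows: the aligned mask is cropped to its bounding box and rescaled to one of six segments selected by the aspect ratio \(R\). The aspect-ratio breakpoints and segment dimensions in `SEGMENTS` are placeholders, since the actual values of Table 1 are not reproduced here; only the mechanism follows the text, and the same crop and resize would be applied to the grayscale version pg.

```python
import numpy as np
from skimage.transform import resize

# Placeholder (aspect-ratio upper bound, (height, width)) pairs standing in
# for the six pre-defined segments of Table 1; the real values differ.
SEGMENTS = [(1.5, (256, 256)), (2.5, (192, 384)), (4.0, (128, 448)),
            (6.0, (96, 512)), (9.0, (64, 512)), (13.0, (48, 512))]


def crop_and_rescale(mask):
    """Crop the aligned mask to its bounding box, pick a segment by the
    aspect ratio R = major/minor extent, and rescale to that segment's size."""
    r, c = np.nonzero(mask)
    cropped = mask[r.min():r.max() + 1, c.min():c.max() + 1]
    h, w = cropped.shape
    R = max(h, w) / min(h, w)                       # aspect ratio of the leaf
    for s, (upper, dims) in enumerate(SEGMENTS, start=1):
        if R <= upper:
            pb = resize(cropped.astype(float), dims, order=0) > 0.5
            return s, pb                            # segment number and mask
    raise ValueError("R > 13 gives inconsistent features (see text)")
```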

2.2 Curvelet Transform (CT)

Curvelets were first introduced in [6]. Subsequently, a faster formulation called the Fast Discrete Curvelet Transform (FDCT) was developed, with two variants: one based on unequally spaced Fast Fourier Transforms (USFFT) and one based on a wrapping function. The current work uses the wrapping variant, which decomposes the image into sub-bands at different scales, consisting of different orientations and positions in the frequency domain. An image with dimensions \(M \times N\) subjected to the FDCT yields a set of curvelet coefficients \(C\) indexed by scale \(a\), orientation \(b\) and spatial location parameters \(p\) and \(q\). Here \(0 \le m < M\), \(0 \le n < N\) index the pixels and \(\varphi_{a,b,p,q}\) is the curvelet waveform:

$$C(a,b,p,q) = \mathop \sum \limits_{m = 0,n = 0}^{M - 1,N - 1} f\left( {m,n} \right)\varphi_{a,b,p,q} (m,n).$$
(1)

2.3 Invariant Moment (IM)

For a digital image, the moment \(m\) of a pixel \(P(x,y)\) is defined as \(m = x \cdot y \cdot P(x,y)\). The moment of the entire image is the sum of the moments of all its pixels. More generally, the moment of order \((p, q)\) of an image is given by

$$m_{pq} = \sum\limits_{x} {\sum\limits_{y} {x^{p} y^{q} P(x,y)} } .$$
(2)

Hu [7] proposed seven moment features that describe an image and are invariant to rotation. The first of these, \(\varphi_{1}\), is given by:

$$\varphi_{1} = m_{20} + m_{02} .$$
(3)

To make the moments invariant to translation, the image is shifted so that its centroid coincides with the origin of the coordinate system. The centroid of the image, expressed in terms of the moments, is given by:

$$x_{c} = \frac{{m_{10} }}{{m_{00} }}, y_{c} = \frac{{m_{01} }}{{m_{00} }}.$$
(4)

Then the central moments are defined as

$$\mu_{pq} = \sum\limits_{x} {\sum\limits_{y} {(x - x_{c} )^{p} (y - y_{c} )^{q} P(x,y)} } .$$
(5)

It can be verified that

$$\mu_{00} = m_{00} , \mu_{10} = 0 = \mu_{01} .$$
(6)

To make the moments invariant to scaling, they are normalized by dividing by a power of \(\mu_{00}\):

$$\gamma_{pq} = \frac{{\mu_{pq} }}{{(\mu_{00} )^{\omega } }}, \omega = 1 + \frac{p + q}{2}.$$
(7)

The normalized central Hu moments are obtained by substituting the \(\gamma\) terms for the \(m\) terms in Eq. (3). The first normalized central invariant moment of an image \(I\) is:

$$M_{1} (I) = \gamma_{20} + \gamma_{02} .$$
(8)
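A direct NumPy implementation of Eqs. (2)–(8) might look as follows; the helper name `hu_m1` is ours, and for the binary pre-processed image pb the input is simply the 0/1 mask.

```python
import numpy as np


def hu_m1(img):
    """First normalized central invariant moment M1 = gamma_20 + gamma_02,
    following Eqs. (2)-(8)."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xc = (x * img).sum() / m00                    # centroid, Eq. (4)
    yc = (y * img).sum() / m00

    def mu(p, q):                                 # central moments, Eq. (5)
        return ((x - xc) ** p * (y - yc) ** q * img).sum()

    def gamma(p, q):                              # normalized moments, Eq. (7)
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    return gamma(2, 0) + gamma(0, 2)              # Eq. (8)
```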

2.4 Ridge Filter (RF)

In computer vision algorithms, particularly those dealing with image analysis, edge detection is an important step for recognizing the shape, location and orientation of image objects. In some cases, however, we are more interested in the nature of the surfaces of such objects, and ridge detection is a step towards understanding their corrugated structure. Ridge filters have mostly been used to enhance the ridge and valley structures of fingerprint images as part of minutiae extraction modules [8]. In this paper we use a ridge filter to enhance the vein structures of a leaf and thereby model its texture content.

To compute the ridge information, the grayscale image is first partitioned into a grid of non-overlapping blocks. The standard deviation within each block is evaluated and thresholded to determine whether the block belongs to the object or to the background, generating a mask that delineates the region containing ridge lines. The masked image is normalized to zero mean and unit standard deviation. Image gradients are then calculated from it and the local ridge orientation at each point is estimated. The ridge frequency within each block is obtained by rotating the block according to the orientation estimate and finding the peaks of the grey values projected along the ridges: dividing the distance between the first and last peaks by the number of peaks gives the ridge spacing, whose reciprocal is the spatial frequency. Finally, the frequency and orientation estimates are used to construct a ridge filter that enhances the ridge lines of the original image.
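The steps above follow the classical fingerprint-enhancement pipeline. A condensed sketch of its front end (block-wise segmentation, normalization and local orientation estimation) is given below; ridge-frequency estimation and the final oriented filtering are omitted for brevity, the whole image rather than only the masked region is normalized for simplicity, and the block size and threshold are illustrative values.

```python
import numpy as np
from scipy import ndimage


def ridge_front_end(gray, blk=16, std_thresh=0.1):
    """Block-wise segmentation, normalization and ridge-orientation
    estimation for a grayscale leaf image (values assumed in [0, 1])."""
    g = (gray - gray.mean()) / (gray.std() + 1e-8)   # zero mean, unit std
    h, w = g.shape

    mask = np.zeros_like(g, dtype=bool)              # block std threshold
    for r in range(0, h, blk):
        for c in range(0, w, blk):
            block = g[r:r + blk, c:c + blk]
            mask[r:r + blk, c:c + blk] = block.std() > std_thresh

    gx = ndimage.sobel(g, axis=1)                    # image gradients
    gy = ndimage.sobel(g, axis=0)
    # Smooth the double-angle terms block-wise so opposite gradient
    # directions along the same ridge do not cancel each other.
    gxx = ndimage.uniform_filter(gx * gx, blk)
    gyy = ndimage.uniform_filter(gy * gy, blk)
    gxy = ndimage.uniform_filter(gx * gy, blk)
    orientation = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy) + np.pi / 2.0
    return mask, orientation
```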

2.5 Statistical Measures

A normalized histogram \(p\) is calculated from the image data using a specified number of bins \(N\). Three statistical measures are calculated from this histogram: uniformity (\(U\)), entropy (\(E\)) and third moment (\(T\)). If \(p = [x(1), x(2), \ldots , x(N)]\) denotes the histogram and \(\mu\) the mean of its entries, then:

$$U = \mathop \sum \limits_{i = 1}^{N} [x(i)]^{2}$$
(9)
$$E = - \mathop \sum \limits_{i = 1}^{N} [x\left( i \right).{ \log }_{2} x(i)]$$
(10)
$$T = \frac{1}{{(N - 1)^{2} }}\mathop \sum \limits_{i = 1}^{N} [x\left( i \right) - \mu ]^{3}$$
(11)
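A small NumPy routine for Eqs. (9)–(11) is sketched below; following the equations as written, \(\mu\) is taken as the mean of the histogram entries \(x(i)\), and the bin count \(N\) is left as a free parameter.

```python
import numpy as np


def texture_measures(img, n_bins=256):
    """Uniformity U, entropy E and third moment T of the normalized
    histogram, following Eqs. (9)-(11)."""
    hist, _ = np.histogram(np.asarray(img, dtype=float).ravel(), bins=n_bins)
    p = hist / hist.sum()                              # normalized histogram
    mu = p.mean()                                      # mean of the entries
    U = np.sum(p ** 2)                                 # Eq. (9)
    E = -np.sum(p[p > 0] * np.log2(p[p > 0]))          # Eq. (10)
    T = np.sum((p - mu) ** 3) / (n_bins - 1) ** 2      # Eq. (11)
    return U, E, T
```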

2.6 Features

The binary pre-processed image (\(pb\)) is subjected to the Curvelet Transform. From all the coefficients \(C\), the one with the highest energy (\(CC\)) is retained, as it contains the most significant information about the image shape (see Fig. 2). The shape feature (\(FS\)) is formed by computing \(M_{1}\) from \(CC\) as per Eq. (8):

Fig. 2 Curvelet coefficient CC and ridge structures of a leaf

$$FS = M_{1} (CC)$$
(12)

The grayscale pre-processed image (\(pg\)) is passed through the ridge filter of Sect. 2.4 to enhance the ridge structures of the leaf (see Fig. 2). Three statistical measures, as defined in Eqs. (9)–(11), computed from the filtered image form the texture feature (\(FT\)):

$$FT = \{ U,E,T\}$$
(13)

To study the joint effect of the shape and texture features, a combined feature vector \(FC\) is used for class discrimination and recognition:

$$FC = \left\{ {FS,FT} \right\}$$
(14)
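Putting the pieces together, the sketch below assembles \(FC\) from the shape and texture features, reusing the `hu_m1` and `texture_measures` helpers sketched earlier. The curvelet coefficients are treated here as a list of sub-band arrays, and the FDCT (wrapping) and the full ridge filter of Sect. 2.4 are passed in as caller-supplied callables rather than being tied to any particular library.

```python
import numpy as np


def leaf_features(pb, pg, fdct2_wrapping, ridge_filter):
    """Combined feature vector FC = {FS, FT} of Eqs. (12)-(14).

    'fdct2_wrapping' and 'ridge_filter' are caller-supplied callables: the
    former should return the curvelet sub-band coefficient arrays of pb, the
    latter the ridge-enhanced version of pg (Sect. 2.4).
    """
    coeffs = fdct2_wrapping(pb)                              # sub-band arrays
    cc = max(coeffs, key=lambda c: np.sum(np.abs(c) ** 2))   # highest-energy CC
    fs = hu_m1(cc)                                           # Eq. (12)
    ft = texture_measures(ridge_filter(pg))                  # Eq. (13)
    return np.array([fs, *ft])                               # Eq. (14)
```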

2.7 Classification

A leaf class consists of a set of member images, and each class is characterized by the collection of \(FC\) vectors obtained during a training phase. A test image, with its computed vector, is assigned to the class for which the probability of its feature values belonging to that class is maximum. Since there is no prior mathematical model on which to base the classification, it is performed solely from a set of training observations used to train a Neuro-Fuzzy Classifier (NFC), which combines the advantages of a fuzzy classification scheme with the automatic adaptation procedure of a neural network.
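Since a neuro-fuzzy classifier is not part of common Python libraries, the sketch below only illustrates the surrounding train/test protocol on the \(FC\) vectors, with scikit-learn's MLPClassifier plainly substituted as a stand-in for the NFC; an actual NFC implementation could be dropped in without changing the rest.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X_train, y_train, X_test, y_test):
    """Fit a stand-in classifier on the training FC vectors (one row per
    leaf image, one class label per row) and report test accuracy."""
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(32,),
                                      max_iter=2000, random_state=0))
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)   # fraction of correctly classified leaves
```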

3 Experiments and Results

To study the validity of the proposed scheme, experiments were carried out on 600 images from the Flavia dataset [9], covering 30 classes with 10 images per class for training and 10 for testing. Each image is of size 300 × 225 pixels and in JPG format. Figure 3 shows samples from the dataset. The overall accuracy obtained is 97 %; details are shown in Table 2.

Fig. 3 Samples of the dataset

Table 2 Percentage recognition accuracies

4 Analysis

To put the current work in perspective, the approaches of some contemporary works were applied to the current dataset. The color, texture and shape features of [10] produce an accuracy of 74.3 %; owing to the large number of features, the resource overhead is also high. The LBP-based method of [11] is sensitive to noise, and different LBP patterns were seen to produce incorrect classifications, giving an accuracy of 37 %. The color information of [12] produces incorrect classifications due to the small variations in green shades between leaves, giving an accuracy of 31 %. Fourier basis functions [4] produce an accuracy of 16 %, as they are sinusoidal with infinite support and cannot adequately model transient signals with sharp changes, such as those often encountered along leaf contours.

5 Conclusions

This article discusses a method of characterizing plant leaves using a ridge filter and statistical measures to model texture information, together with curvelet coefficients and invariant moments to model shape. Prior to feature extraction, the leaf image is made invariant to geometric transformations through a pre-processing stage; to avoid distortion for leaves with different aspect ratios, a set of 6 predefined segments is proposed. Classification is performed with a neuro-fuzzy classifier. Experimental results demonstrate that the proposed approach is effective in discriminating between 30 classes of leaf images with a variety of textures and shapes. Future work would proceed along two directions: (1) combining other shape, texture and color based features with the current method, and (2) using other classifiers such as k-Nearest Neighbor and Support Vector Machine.