1 Introduction

There are numerous species of flowers around the world. Flowers are in great demand in the pharmaceutical, cosmetic, floriculture and food industries. Accurate identification of flowers is essential in applications such as flower patent analysis, field observation, plant identification, floriculture and research on medicinal plants. Manual classification of flowers is time-consuming, cumbersome and error-prone. Automating flower classification is therefore essential, but it is a challenging task because of the high similarity among classes [1]. Flower species exhibit both inter-class similarity and intra-class dissimilarity. Large intra-class variations arise from deformation of the flowers, lighting and climatic conditions, and changes in viewpoint [2]. Because of these problems, flower recognition has become a challenging research topic in recent years.

It was observed in [3] that most handcrafted approaches describe images using differences of image gradients, textures and/or colours. As a result, a large gap exists between the low-level representations and the high-level semantics, which leads to low classification accuracy. Deep learning has been found helpful in producing accurate image classification results.

It has been shown that features extracted from a pre-trained CNN can be used directly as a global image representation. Compared with traditional feature extraction methods, deep features can represent the information content of massive image data effectively. In [4], the authors observed that deep learning techniques achieve higher accuracy than classical machine learning methods, and problems such as image classification and identification are handled efficiently by deep learning approaches. Commonly used deep learning networks at present are the Stacked AutoEncoder [5], the Restricted Boltzmann Machine [6], the Deep Belief Network [7] and the Convolutional Neural Network (CNN). Among these, the deep CNN [8] is the most effective for image classification.

In this paper, a CNN-based technique for classifying flower images using deep features extracted from the fully connected layers of AlexNet [9] is presented. A 9192-dimensional feature vector is obtained (4096 features each from f6 and f7, and 1000 features from f8). Discriminative features are then selected by ranking them with the minimum Redundancy Maximum Relevance (mRMR) algorithm [10]. A Support Vector Machine (SVM) [11] with a linear kernel is employed to classify the flower images. The experiments are performed on the Flower 17 database from the Oxford Visual Geometry Group [12] and on the KLUF database [13]. The use of a deep CNN improves robustness, eliminates the need for handcrafted features and improves classification accuracy.

The rest of the paper is organized as follows: Sect. 2 highlights the related work on image classification, Sect. 3 contains the outline of the proposed method, Sect. 4 describes the datasets used and Sect. 5 provides experimental results. Finally, we conclude the paper in Sect. 6.

2 Related Work

Image classification is a vibrant research topic in computer vision, and several approaches have been proposed for classifying images automatically. Until the 1990s, image classification relied on hand-picked features. Classification techniques based on computer vision and image processing use a blend of features extracted from images to improve classification accuracy. Commonly used features are colour, texture, shape and some statistical information.

In [14], the authors developed and tested a visual vocabulary that represents colour, shape and texture to distinguish one flower from another. They found that a combination of these vocabularies yields better classification accuracy than the individual vocabularies. However, the accuracy reported for this approach was 71.76%, which is quite low.

The authors in [15] developed an approach for learning the discriminative power-invariance trade-off for classification. An optimal combination of base descriptors was learned in a kernel learning framework, giving better classification results. Although this approach was capable of handling diverse classification problems, its classification accuracy was not satisfactory.

An improved averaging combination (IAC) method based on simple averaging combination was proposed in [16]. Dominant set clustering was used to evaluate the discriminative power of features. Powerful features were selected and added into averaging combination one by one in descending order. Authors claim that their method is faster. However, classification accuracy is not satisfactory.

Conventional flower image classification methods lack robustness and accuracy because they rely on handcrafted features that might not generalize. A flower classification technique that works on one flower dataset is not guaranteed to work on a different one.

Automated feature extraction is essential for improving classification accuracy, and deep learning techniques are very effective at extracting features from a large number of images. In [17], a flower classification model based on saliency detection and the VGG-16 deep neural network was proposed. The stochastic gradient descent algorithm was used to update the network weights, and transfer learning was used to optimize the model. A classification accuracy of 91.9% was reported on the Oxford flower-102 dataset. In [18], AlexNet, GoogleNet, VGG16, DenseNet and ResNet were analysed for classification of the Kaggle flowers dataset. The VGG16 model was reported to achieve the highest classification accuracy, 93.5%. However, the time complexity of this method was high. In [19], a generative adversarial network was combined with a ResNet-101 transfer learning algorithm, and stochastic gradient descent was used to optimize the training process for flower classification. The Oxford flower-102 dataset was used in that research, and an accuracy of 90.7% was reported.

The authors in [20] used the f6 and f7 layers of AlexNet and the f6 layer of the VGG16 model for deep feature extraction. Feature selection was done using the mRMR algorithm, and an SVM classifier was employed to classify the flower images. A classification accuracy of 96.1% was reported. In this approach, more time is needed because deep features are extracted from two pre-trained networks, i.e. AlexNet and VGG16. In [21], a combination of an improved AlexNet Convolutional Neural Network (CNN), Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) descriptors was used for feature extraction. The Principal Component Analysis (PCA) algorithm was used for dimension reduction. The experiments performed on the Corel-1000, OT and FP datasets yielded a classification accuracy of 96%.

In [22], the authors extracted deep CNN features using VGG19 and handcrafted features using the SIFT, SURF, ORB and Shi-Tomasi corner detector algorithms. The deep features and handcrafted features were then fused. The fused features were classified using various machine learning classifiers, i.e., Gaussian Naïve Bayes, Decision Tree, Random Forest and eXtreme Gradient Boosting (XGBClassifier). The fused features classified with Random Forest provided the highest classification accuracy. The Caltech-101 dataset was used in that work.

A hybrid classification approach for COVID-19 images, combining CNNs with a swarm-based feature selection algorithm (the Marine Predators Algorithm) to select the most relevant features, was proposed in [23]. Promising classification accuracies were obtained. The authors concluded that their approach could be applicable to other image classes as well.

The authors in [24] compared the performance of CNN-based models using VGG16 and Inception against a traditional image classification model using Oriented FAST and Rotated BRIEF (ORB) features and an SVM. Transfer learning was used to improve the accuracy of medical image classification, and the experiments using transfer learning achieved satisfactory results on chest X-ray images. A data augmentation method for flower images was used by the authors in [25]. The Softmax function was used for classification, and a classification accuracy of 92% was observed.

The attractive attributes of a pre-trained CNN as a feature extractor are its robustness and the fact that the network does not need to be retrained. The motivation for using a deep CNN model for flower classification is that feature learning in CNNs is highly automated, which avoids the complexity of extracting various features for traditional classifiers. Hence, we are motivated to use a deep feature extraction approach for flower classification. AlexNet [9], ResNet [26], GoogleNet [27] and VGG 16 [28] are some of the available choices of pre-trained networks for image classification. The AlexNet DCNN [9] was pre-trained on one million images, so its feature values are simple. Other CNNs were trained on more than 15 million images, giving rise to more complex feature values at the fully connected layers. In AlexNet, the fully connected layers provide discriminative features suitable for an SVM classifier. This is the reason for selecting AlexNet as the feature extractor in the proposed work.

The review of related work shows that deep learning based techniques tackle the image classification problem efficiently, but there is still scope for improvement in classification accuracy. In this paper, we present a simple approach for flower classification using deep features extracted from the fully connected layers of AlexNet. Feature ranking and selection are performed to avoid redundant features. A multiclass SVM (MSVM) is then used to classify the flower images. The proposed work is presented below.

3 Proposed Work

The proposed work consists of three stages: deep feature extraction using AlexNet [9], feature ranking and selection using the mRMR algorithm [10], and classification using an MSVM classifier [11]. The block diagram of the proposed flower classification system is shown in Fig. 1.

3.1 Feature Extraction Using AlexNet

AlexNet has eight layers, of which the first five are convolutional layers and the remaining three are fully connected layers. Rectified linear unit (ReLU) activation is used in each of these layers except the output layer. The use of ReLU speeds up the training process. Overfitting is mitigated using dropout layers. The network was trained on the ImageNet dataset, which has one thousand image classes.

In AlexNet, the size of each input image is 227 * 227 * 3. The first convolutional layer has 96 filters of size 11 * 11 with a stride of 4. The size of the output feature map is calculated as

Output feature map size = [(Input image size - Filter size)/Stride] + 1.

Here, (227 - 11)/4 + 1 = 55. Hence, the output feature map is 55 * 55 * 96.
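The calculation above can be checked with a short sketch. The function name `conv_output_size` and the optional `padding` argument are illustrative assumptions; the formula in the text corresponds to zero padding.

```python
def conv_output_size(input_size: int, filter_size: int, stride: int, padding: int = 0) -> int:
    """Spatial size of a convolution output along one dimension.

    With padding = 0 this is the formula given in the text:
    (input size - filter size) / stride + 1.
    """
    return (input_size + 2 * padding - filter_size) // stride + 1


# First AlexNet convolution as described in the text:
# 227 x 227 input, 96 filters of size 11 x 11, stride 4.
side = conv_output_size(227, 11, 4)
print(side, side, 96)  # 55 55 96
```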

Fig. 1. Block diagram of proposed flower classification system

Five convolution operations with different number and size of filters and strides are performed. At the end of fifth convolution, size of feature map is 13 * 13 * 256.

The number of filters generally increases with the depth of the network, resulting in a larger number of features, while the filter size decreases with depth, giving rise to smaller feature maps. The fully connected layers f6 and f7 have 4096 neurons each. The last layer, f8, is the output layer with 1000 neurons.

In the proposed approach, the deep CNN features extracted from the fully connected layers of AlexNet are used. The concatenated feature vector has 9192 elements (4096 features each from f6 and f7, and 1000 features from f8). Reducing the number of features is essential for saving computation cost and time; hence, the mRMR algorithm [10] is used in this work to rank the features and select the distinct ones, as explained in the following subsection. The proposed method is simple, and no pre-processing of the input images is needed.
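To make the layout of the 9192-dimensional vector concrete, the following minimal sketch concatenates per-layer features and maps a position in the concatenated vector back to its source layer. The concatenation order f6, f7, f8 and the helper names `concatenate` and `layer_of` are assumptions made for illustration; the zero-valued inputs stand in for real activations.

```python
# Layer sizes as stated in the text; the ordering f6, f7, f8 is an assumption.
LAYER_SIZES = {"f6": 4096, "f7": 4096, "f8": 1000}


def concatenate(features_by_layer):
    """Join per-layer feature lists into one vector, in LAYER_SIZES order."""
    vector = []
    for name, size in LAYER_SIZES.items():
        layer = features_by_layer[name]
        if len(layer) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(layer)}")
        vector.extend(layer)
    return vector


def layer_of(index):
    """Map a position in the concatenated vector back to its source layer."""
    start = 0
    for name, size in LAYER_SIZES.items():
        if start <= index < start + size:
            return name
        start += size
    raise IndexError(index)


# Placeholder activations (all zeros) standing in for real AlexNet outputs.
dummy = {name: [0.0] * size for name, size in LAYER_SIZES.items()}
vector = concatenate(dummy)
print(len(vector))                                   # 9192
print(layer_of(0), layer_of(4096), layer_of(8192))   # f6 f7 f8
```

A mapping such as `layer_of` is useful after feature selection, when one wants to count how many of the selected features came from each layer.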

3.2 Feature Ranking and Selection: Minimum Redundancy Maximum Relevance Algorithm

If all the available features are used, the model suffers from drawbacks such as high computation cost, over-fitting and difficulty of interpretation. Therefore, distinct features should be selected, which leads to faster computation and more accurate classification of flowers.

In this paper, a filter method called mRMR [10] is used on account of its computational efficiency and its ability to reduce redundant features effectively while keeping the features relevant to the model.

The mRMR algorithm treats each attribute as a separate random variable and uses the mutual information, I(x,y), between two attributes to measure their level of similarity:

$$ I(x,y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left( \frac{p(x,y)}{p_1(x)\,p_2(y)} \right) $$
(1)

where \(p(x,y)\) represents the joint probability distribution function of X and Y, and \(p_1(x)\) and \(p_2(y)\) represent the marginal probability distribution functions of the random variables X and Y, respectively.
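Equation (1) can be estimated from paired samples by replacing the probabilities with relative frequencies. The sketch below assumes the variables are already discrete (continuous deep features would first have to be discretized, a step not specified in the text) and uses the natural logarithm; the function name `mutual_information` is illustrative.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats from paired discrete samples, following Eq. (1)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for p(x, y)
    px = Counter(xs)               # counts for p1(x)
    py = Counter(ys)               # counts for p2(y)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        total += p_xy * log(p_xy / ((px[x] / n) * (py[y] / n)))
    return total


# Identical variables share all their information; independent ones share none.
a = [0, 0, 1, 1]
print(mutual_information(a, a))             # log(2), about 0.693
print(mutual_information(a, [0, 1, 0, 1]))  # 0.0
```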

To simplify the notation, each attribute \(F_i\) is represented as a vector of N values, \(f_i = [f_i^1, f_i^2, f_i^3, \ldots, f_i^N]\). \(f_i\) is treated as a sample of a discrete random variable, and the mutual information between the i-th and j-th attributes is denoted \(I(F_i, F_j)\), where i = 1, 2, …, d, j = 1, 2, …, d and d is the number of features.

Let S be the set of selected features and |S| the number of selected features. The first condition for selecting the best features, called the minimum redundancy condition, is given by

$$ \min W, \quad W = \frac{1}{|S|^2} \sum_{F_i, F_j \in S} I(F_i, F_j) $$
(2)

The second condition, called the maximum relevance condition, is given by

$$ \max V, \quad V = \frac{1}{|S|} \sum_{F_i \in S} I(F_i, H) $$
(3)

where H denotes the target class variable.

The two conditions can be combined in two simple ways, as a difference or as a quotient:

$$ \begin{gathered} \max (V - W) \hfill \\ \max (V/W) \hfill \\ \end{gathered} $$

A search algorithm is required to select the best subset of features. The first feature is selected according to Eq. (3). At each subsequent step, the feature with the highest importance score is added to the selected feature set S.
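The greedy search described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes discrete feature values, scores each candidate by its relevance minus its mean redundancy with the already selected features (the difference form of the combined criterion), and uses the illustrative names `mutual_information` and `mrmr_select`.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def mrmr_select(features, labels, k):
    """Greedy mRMR with the difference criterion (relevance minus redundancy).

    features: list of columns, each a list of discrete values (one per sample).
    Returns the indices of the selected columns, in selection order.
    """
    relevance = [mutual_information(col, labels) for col in features]
    # First feature: maximum relevance to the class label, as in Eq. (3).
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(k, len(features)):
        def score(i):
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected) / len(selected)
            return relevance[i] - redundancy
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: four classes, three binary features.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0, 1} from {2, 3}
    [0, 0, 0, 0, 1, 1, 1, 1],  # duplicate of column 0: fully redundant
    [0, 0, 1, 1, 0, 0, 1, 1],  # separates {0, 2} from {1, 3}: complementary
]
print(mrmr_select(features, labels, 2))  # [0, 2]
```

In the toy data all three columns are equally relevant, but the duplicate column is skipped because its redundancy with the first selected column cancels its relevance.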

3.3 Multi-class Support Vector Machine

The Support Vector Machine is an effective tool that is widely used in image classification [11]. The basic idea of the SVM classifier is to find the best separating hyperplane between two classes, namely the one with the largest margin to the training samples closest to it. The SVM was originally formulated as a binary classifier. Multiclass classification with SVMs is done by breaking the multiclass problem into smaller binary sub-problems, using either the one-versus-one or the one-versus-all strategy. One-versus-one binary classifiers distinguish one class from another class. One-versus-all classifiers separate one class from all other classes.

Let C1, C2, …, Cn be the n classes.

Let S1, S2, …, Sm be the support vectors of these classes.

In general,

$$ C_i = \sum_{k = 1}^{n - 1} \sum_{j = 1}^{m} c_k S_j $$
(4)

where Ci consists of a set of support vectors Sj and separates the nth class from all other classes.

The discriminative features obtained in the second stage help the SVM to classify the flower images.
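The one-versus-one decomposition described in this subsection can be illustrated with a small voting sketch. The pairwise deciders below are toy stand-ins for trained binary SVMs, and the names `one_vs_one_pairs`, `predict_one_vs_one` and `centres` are hypothetical; which decomposition was used in this work is not stated, so this is only an illustration of the idea.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def predict_one_vs_one(x, pairwise_deciders):
    """Majority vote over binary deciders.

    pairwise_deciders maps (class_a, class_b) to a function that returns the
    winner of that pair for input x. The deciders stand in for binary SVMs.
    """
    votes = Counter(decide(x) for decide in pairwise_deciders.values())
    return votes.most_common(1)[0][0]


# One-versus-one needs n(n - 1)/2 classifiers; one-versus-all needs n.
print(len(one_vs_one_pairs(range(17))))  # 136 pairwise classifiers for 17 classes
print(len(one_vs_one_pairs(range(30))))  # 435 pairwise classifiers for 30 classes

# Toy deciders on a 1-D input: each pair picks the class whose centre is nearer.
centres = {"a": 0.0, "b": 5.0, "c": 10.0}
deciders = {
    (p, q): (lambda x, p=p, q=q: p if abs(x - centres[p]) <= abs(x - centres[q]) else q)
    for p, q in one_vs_one_pairs(centres)
}
print(predict_one_vs_one(4.0, deciders))  # b
```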

4 Dataset

We have used publicly available Flower 17 [12] and KLUF [13] datasets in this work.

4.1 Flower 17 Dataset [12]

This dataset consists of 1360 flower images in 17 categories (buttercup, colt's foot, daffodil, daisy, dandelion, fritillary, iris, pansy, sunflower, windflower, snowdrop, lily of the valley, bluebell, crocus, tiger lily, tulip and cowslip). There are 80 images in each category.

The Flower 17 dataset from the Visual Geometry Group at the University of Oxford is a challenging dataset. The images show large variations in scale, pose and illumination. The dataset also has high intra-class variation as well as inter-class similarity. The flower categories were deliberately chosen to have some ambiguity in each aspect. For example, some classes cannot be distinguished on colour alone (e.g. dandelion and buttercup), and others cannot be distinguished on shape alone (e.g. daffodil and windflower). Buttercups and daffodils are confused by colour, colt's foot and dandelions are confused by shape, and buttercups and irises are confused by texture. The diversity within classes and the small differences between categories make the dataset challenging. Hence, handcrafted feature extraction techniques are insufficient for describing these images. Sample images from the Flower 17 dataset are shown in Fig. 2.

Fig. 2. Sample images from Flower 17 dataset [12]

4.2 KLUF Dataset [13]

The KL University Flower Dataset (KLUFD) consists of 3000 images from 30 categories of flowers, with 100 flower images in each category. Sample images from a few categories of this dataset are shown in Fig. 3.

Fig. 3. Sample images from KLUF dataset [13]

5 Experimental Results

The flower image classification problem is evaluated using different combinations of features extracted from the fully connected layers f6, f7 and f8 of AlexNet. The convolutional layers provide low-level features, whereas the fully connected layers provide high-level features that are useful for flower image classification. Hence, we use features from the fully connected layers. f6 and f7 provide 4096 features each and f8 provides 1000 features, giving 9192 features in total. As mentioned in the previous section, such a large number of features increases the computational burden and the storage requirement. Therefore, feature selection is performed and a multiclass SVM classifier is trained on the selected features. Classification accuracy is tested for various combinations of the number of features taken from f6, f7 and f8. Features from f6 and f7 have almost no effect on classification accuracy. The classification results show that 800 features selected from f8 are more useful than the features from f6 and f7 for the flower classification problem. Deep features from the f8 layer of the pre-trained AlexNet are sufficient for classifying the flower images efficiently, so there is no need to use features from other pre-trained networks. The novelty of our method lies in improving accuracy by integrating deep features with a feature selection criterion followed by multiclass classification.

Classification accuracy was compared by splitting the Flower 17 dataset into 75–25% and 60–40% training-testing images.
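For reference, the two partitions correspond to the image counts computed in the sketch below. The split is assumed to be stratified per class with a fixed random seed; neither detail is stated in the text, and the function name `stratified_split` is illustrative.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices per class so each class keeps the same train share."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Flower 17 as described in the text: 17 classes with 80 images each.
labels = [c for c in range(17) for _ in range(80)]
for fraction in (0.75, 0.60):
    train, test = stratified_split(labels, fraction)
    print(fraction, len(train), len(test))
# 0.75 1020 340
# 0.6 816 544
```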

Table 1 shows the effect of the number of features selected from the fully connected layers of AlexNet on classification accuracy. These results are obtained with 5-fold cross-validation.
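The 5-fold protocol can be sketched as below. This is a generic illustration under assumptions: the folds are formed by shuffling all 1360 Flower 17 images with a fixed seed, and how cross-validation was combined with the training-testing partitions is not specified in the text. The name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


sizes = [(len(tr), len(va)) for tr, va in k_fold_indices(1360, k=5)]
print(sizes)  # five folds, each with 1088 training and 272 validation samples
```

The reported accuracy would then be the mean of the accuracies measured on the five validation folds.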

Table 1. Classification accuracy for different number of selected features

It is observed that the proposed approach obtained its highest classification accuracies of 97.7% and 97.8% on the Flower 17 and KLUF datasets, respectively, when the total number of selected features was 800. The experimental results indicate that the features obtained from f8 are the most important for improving classification accuracy: when more features are selected from f8, better classification accuracy is obtained, whereas reducing the number of features from f7 and f6 has very little effect. It is also noticed that splitting the datasets into 75% training and 25% testing images gives better accuracy than a 60% training and 40% testing split, as given in Table 2.

Table 2. Comparison of classification accuracy for various partitions of dataset

Classification accuracy by flower type is given in Table 3. Comparing the confusion matrices for different numbers of features shows that every flower class was classified correctly in most cases except flower class 14 (crocus). This class has very wide intra-class variation, which is why more features from f8, f7 and f6 are required to classify it correctly.
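Per-class accuracy is read off a confusion matrix as the diagonal entry divided by the row sum. The sketch below uses hypothetical counts for three classes, for illustration only; they are not results from this work, and the function names are assumptions.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class predicted correctly (rows = true class)."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    return correct / sum(sum(row) for row in confusion)


# Hypothetical 3-class counts, for illustration only (not results from the paper).
confusion = [
    [20, 0, 0],
    [1, 18, 1],
    [0, 0, 20],
]
print(per_class_accuracy(confusion))  # [1.0, 0.9, 1.0]
print(overall_accuracy(confusion))
```

A single class with low accuracy, such as the middle row here, lowers the overall figure even when the remaining classes are classified perfectly.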

Table 3. Classification accuracy based on Flower type (Flower 17 database)

The proposed classification results are compared with a few existing state-of-the-art approaches in Table 4. A low classification accuracy of 71.76% was obtained by the approach of Nilsback et al. [14], which uses colour, shape and texture features of flower images. An improved classification accuracy of 82.55% was obtained by the authors in [15], who learned the best trade-off for classification over a combination of base kernels whose base features achieve different levels of invariance (such as no invariance, rotation invariance, scale invariance and affine invariance). However, the feature selection in this method was poor.

Wei et al. [16] used the following hand-picked features: HOG shape descriptor, Bag of SIFT, Local Binary Pattern, Gist descriptor, self-similarity descriptor, Gabor filter and gray value histogram. Clustering, ranking and an averaging combination of features were used to yield a classification accuracy of 86.1%. Although the accuracy is satisfactory, this approach is cumbersome because it involves manual feature extraction.

In [3], the authors used first, second and third layer semantic modelling, which provided an accuracy of 87.06%, but it was noticed that the accuracy does not increase when more layers are added.

In [20], 800 features from AlexNet{fc6 + fc7} + VGG16{fc6} are needed to achieve a classification accuracy of 96.1%.

The proposed deep feature based classification with feature ranking and selection outperforms the above-mentioned approaches, giving accuracies of 97.7% on Flower 17 and 98.3% on KLUF with 820 features {AlexNet f6: 10, f7: 10 and f8: 800}. Selecting more features from the f8 layer improves classification accuracy.

The feature selection strategy in our proposed work helped us obtain discriminative features, which lead to better classification accuracy with the multiclass SVM classifier.

Table 4. Comparison with state of the art approaches

Figure 4 shows a comparison of the classification accuracy of existing approaches and the proposed method.

Fig. 4. Comparison of classification accuracy

6 Conclusion

In this paper, an accurate and efficient flower classification system based on deep feature extraction using AlexNet is proposed. In the presented approach, relevant features are selected by ranking them with the mRMR algorithm. It is found that the fully connected layer f8 provides more prominent image features than f6 and f7 for the flower classification problem. A multiclass SVM classifier is then used for classification. The classification accuracy of the proposed method is studied for different numbers of features selected from the fully connected layers. Classification accuracies of 97.7% and 98.3% were observed on the Flower 17 and KLUF databases, respectively, which are higher than those of the existing methods compared. The proposed approach is efficient and useful for flower patent search as well as for identification of flowers for medicinal use.

The contribution of this research lies in combining a deep architecture with a conventional classification algorithm and a suitable feature selection algorithm to achieve higher accuracy than existing works. The proposed method can be applied to other applications that share similar challenges with flower classification; evaluating its performance on different datasets is left for future work.

The classification accuracy of a few classes is low, which reduces the overall accuracy. This is a limitation of the proposed method. By combining AlexNet f8 features with features extracted from other deep CNN models, the classification accuracy for all flower classes in the dataset could be improved.