
1 Introduction

Machine learning (ML) and computer vision are significant application areas introduced with the intent of making machines think and make intelligent decisions. A number of machine learning algorithms have gained immense popularity, and deep learning (DL) algorithms in particular are used substantially in classification tasks because of their ability to deal with large amounts of unstructured training data such as images. Deep learning, a branch of machine learning, has a strong learning capacity and combines feature extraction and classification into a single model. The convolutional neural network (CNN, ConvNet) is a class of deep neural networks that has proven to be powerful in pattern recognition tasks. CNNs capture features automatically from the input images and can therefore categorize images into distinct classes accurately. ConvNets also preserve the spatial and temporal features of the input image, which are essential for obtaining accurate predictions. Deep learning is used in many applications such as virtual assistants, image identification, self-driving cars, language translation, health care and fraud detection, and it is nowadays widely used in plant seedling classification tasks [1,2,3,4,5]. These methods rely on large datasets of labeled seedling images for training, so that the models can accurately categorize new seedling images. The involvement of deep learning algorithms has substantially improved classification accuracy, allowing plant species to be identified more precisely. Plant seedling classification involves the identification and classification of different plant species based on their seedling images; it relies heavily on image processing, since sophisticated methods are needed for image analysis and feature extraction. It is considered a significant research area because of its ability to identify different plant species accurately from their seedlings, and it is especially useful in areas such as agriculture, forestry and environmental conservation.

Plant seedling classification in forestry can help with the identification of tree species, which is essential for managing forests sustainably. In environmental protection, it can aid in identifying invasive plant species that could outcompete native species and damage the ecosystem. Automated classification of plant seedlings using computer vision, machine learning and deep learning techniques has therefore proved to be a promising field for researchers worldwide. Plant seedling classification is nevertheless a challenging task for several reasons. Current climatic conditions pose a substantial obstacle to agricultural growth, so producing crops at lower cost is of utmost importance. In addition, there is huge variation in the appearance of seedlings depending on environmental factors, and these variations make it difficult to develop a highly accurate automated system for distinguishing species. There is therefore a strong need for an automated plant species recognition system based on seedlings, since it can help farmers monitor crop growth, identify pests and optimize the use of fertilizers, water and other resources. Plant seedling classification can thus provide an efficient computational solution to problems of food security, environmental conservation and sustainable agriculture.

In this study, different deep CNN-based pretrained models are used for classification, since CNN models are versatile in feature extraction and can analyze the information in each input image precisely. The pretrained models used in this work are InceptionResNetV2, Xception, InceptionV3, ResNet50 and MobileNetV2. These transfer learning models take the 12 classes of seedling images as input and automatically differentiate between weed species and crop seedlings in their early phases of growth, accurately predicting the class of a given image. The dataset used in this study comprises 4750 images of approximately 960 unique plants belonging to 12 classes, provided by the Aarhus University Signal Processing Group in collaboration with the University of Southern Denmark. Along with investigating the best deep model, this study also conducts a comparative analysis of the performance of each of the individual models.

In this study, several data preprocessing techniques are also applied to obtain reliable results. The study also investigates the significance of deep learning models for plant seedling classification, since deep learning uses neural networks to learn useful feature representations directly from data and is among the most preferred approaches for image classification. The contribution of the work lies in the identification of the Xception deep architecture as the most suitable model for plant seedling classification.

The remaining part of the paper is structured as follows: Sect. 2 presents an extensive study of plant seedling classification. Section 3 describes the proposed approach for plant seedling classification. Section 4 presents the various deep models used in this study. The experimental results are presented in Sect. 5. The performance analysis on the Aarhus dataset is described in Sect. 6. Lastly, the conclusions are given in Sect. 7.

2 Related Work

Automatic identification and classification of good-quality seedlings has emerged as a scientific discipline in the field of agriculture. Since there is a high demand for the production of good-quality seedlings, computational paradigms such as machine learning (ML) and deep learning (DL) need to be adopted. DL, a subset of ML, has achieved remarkable progress in recent years. Deep learning-based pretrained architectures show remarkable improvements in performance, so implementing these models produces better results for classification tasks and greatly reduces the chance of misclassification. A public image database consisting of approximately 960 unique plants belonging to 12 species of plants is presented by Giselsson et al. [4]. The authors also performed segmentation using Naive Bayes for the identification of vegetation pixels in the images. Nkemelu et al. [1] proposed a method for the classification of plant seedlings by exploiting the performance of the traditional machine learning classifiers KNN and SVM along with a custom CNN. The custom CNN model was implemented with and without background segmentation, and the results showed that the custom CNN with background segmentation performs well for classifying seedlings, with an accuracy of 90.26%. They also used the Aarhus University dataset. Elnemr et al. [3] presented a method for plant seedling classification by developing a custom CNN model that automatically discriminates between weed species and crop seedlings at early growth stages. The CNN comprises an input layer, hidden layers and an output layer, and the seedling images were resized to 128 × 128 pixels. They also used the same dataset, and the system achieved an average accuracy of 94.38%. Ashqar et al. [2] addressed plant seedling classification by using a segmented dataset and fine-tuning the VGG16 architecture in two experiments. In the first experiment, they used the original plant seedling dataset and obtained a validation accuracy of 98.57%; in the second, the balanced plant seedling dataset was used and 99.48% accuracy was obtained. Alimboyong et al. [5] employed the AlexNet model for plant species classification using the same dataset containing approximately 4,234 unique plants provided by Aarhus University, achieving a validation accuracy of 99.77% and a testing accuracy of 99.69%. Namratha et al. [6] used the pretrained models ResNet50V2, MobileNetV2 and EfficientNetB0 for the classification of plant seedlings, also on the Aarhus University dataset; their study revealed that EfficientNetB0 has the highest accuracy of 96.52% compared with the other deep models. Gupta et al. [7] used the same dataset and employed five deep learning models (ResNet50, VGG16, VGG19, Xception and MobileNetV2) for classifying plant seedlings, and the results showed that ResNet50 obtained the highest accuracy of 95.23%. Malliga et al. [8] also used the same dataset and implemented a custom CNN and the VGG16 architecture; their results showed that VGG16 is better at classifying seedlings, with a higher accuracy of 90.36% compared with the custom CNN. Ofori et al.
[9] implemented three experiments with the same dataset, training five deep learning models (VGG16, InceptionV3, DenseNet121, ResNet152 and Xception). In the first experiment the models were initialized with random weights, in the second the pretrained models were used as fixed feature extractors, and in the final experiment all the models were fine-tuned. The results showed that accuracy is highest when VGG16, DenseNet121 and ResNet152V2 are fine-tuned, while Xception and InceptionV3 worked well when initialized with random weights. Rahman et al. [10] used the same dataset provided by Aarhus University and implemented the pretrained models LeNet-5, VGG-16, DenseNet-121 and ResNet-50. The authors performed data preprocessing on the images, and their study, which aimed to find the best-performing model among those implemented, found that ResNet-50 proved the best for classifying plant seedlings, with an accuracy of 96.21%. A survey of the existing models is given in Table 1.

Table 1 Literature review

3 Proposed Approach

This approach explores the significance of deep models in plant seedling classification and investigates the best model. It employs a range of deep CNN-based classification models for plant seedling categorization because CNN models are versatile and can analyze the information in each input precisely. In this work, we focus on transfer learning models to perform the classification. The basic idea of transfer learning is to adopt a model that is well trained on a large dataset and reuse it for training on a smaller dataset, so the model can be retrained and applied to different research problems. Since the models are already trained on the ImageNet dataset, richer features are available for classification. The transfer learning models implemented for our plant seedling classification are InceptionResNetV2, Xception, InceptionV3, ResNet50 and MobileNetV2. We have also fine-tuned the models by freezing the last two convolution layers. The workflow is as follows: first, load the pretrained weights of the transfer model and take each of its layers; then freeze the convolutional layers; on top of the frozen layers, add fresh, trainable layers, so that only the custom classifier layers added on top of the pretrained model are trained. In this way, the pretrained model is adapted to our plant seedling classification task: the previously learned CNN features are applied to a fresh dataset, the new layers are trained on our plant seedling dataset, and predictions are made. The proposed methodology is shown in Fig. 1, and a minimal code sketch of this transfer learning workflow is given after the figure.

Fig. 1 Proposed methodology: plant seedling images → data preprocessing → deep models (InceptionResNetV2, Xception, InceptionV3, ResNet50, MobileNetV2) → output classes
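Below is the minimal sketch referred to above, assuming TensorFlow/Keras; the choice of Xception as the base, the image size and the head layers (pooling, dense, dropout) are illustrative assumptions rather than the exact configuration used in the experiments.

```python
# Sketch: transfer learning with a frozen ImageNet backbone (illustrative configuration).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

NUM_CLASSES = 12          # the 12 seedling classes in the Aarhus dataset
IMG_SIZE = (299, 299)     # default input size for the Xception/Inception family

# 1. Load the pretrained convolutional base with ImageNet weights, without its classifier.
base = Xception(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))

# 2. Freeze the convolutional layers so that only the new head is trained.
base.trainable = False

# 3. Add fresh, trainable classifier layers on top of the frozen base.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
```

The same pattern applies to the other four backbones; fine-tuning then amounts to unfreezing selected convolutional layers of the base before retraining on the seedling images.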

3.1 Data Preprocessing

Data preprocessing is extremely important for data analysis since it improves the overall accuracy and reliability of the model. The data preprocessing steps applied before the classification process in this study are shown in Fig. 2.

Fig. 2 Methodology for preprocessing (data augmentation, normalization and label encoding)

(i) Resizing: Resizing is an essential step in preprocessing to make the format of the input images uniform. In this study, the training images are of different sizes, so all the images are resized to a fixed scale of 299 × 299.

(ii) Label Encoding: In the dataset, the 12 categorical labels are mapped to the integers 0, 1, 2, …, 11, one per class.

(iii) Data Augmentation: Data augmentation increases the size of the training set by generating additional images, which leads to better classification results and also helps to avoid overfitting. In this study, ImageDataGenerator is used to augment the images; zooming, rotation, flipping and rescaling are the augmentation techniques employed.

(iv) Splitting the Dataset: The dataset is split in an 80:20 ratio for training and validation. A Keras-based sketch of these preprocessing steps is given after this list.
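The sketch below illustrates these preprocessing steps, assuming TensorFlow/Keras; the directory path, the specific augmentation parameter values and the use of validation_split to realize the 80:20 split are assumptions made for illustration.

```python
# Sketch: preprocessing with Keras' ImageDataGenerator (parameter values are illustrative).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (299, 299)   # all images resized to a fixed scale
BATCH_SIZE = 32

# Augmentation (zoom, rotation, flips) plus rescaling, with an 80:20 train/validation split.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    rotation_range=30,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,
)

# flow_from_directory resizes every image and label-encodes the 12 class folders (0..11);
# class_mode="categorical" yields one-hot labels for the categorical cross-entropy loss.
train_gen = datagen.flow_from_directory(
    "plant_seedlings/train",          # hypothetical dataset path
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "plant_seedlings/train",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="validation",
)
```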

4 Methods—Deep Pretrained Models

The various pretrained CNN models used in this study are described below, followed by a short sketch showing how they are instantiated:

(i) InceptionV3: InceptionV3 factorizes large convolutions into smaller and asymmetric ones, which reduces the number of parameters and the computational cost. Its important properties include label smoothing, factorized 7 × 7 convolutions and the use of an auxiliary classifier to propagate label information to the lower layers of the network. The default input shape is (299, 299, 3), with exactly three input channels.

(ii) InceptionResNetV2: This model belongs to the Inception family but includes residual connections, which are introduced by replacing the filter concatenation stage of the Inception architecture. Residual connections provide shortcuts in the model and have enabled researchers to train increasingly deep neural networks, resulting in even greater performance. The Inception blocks have also been significantly simplified as a result of this change.

(iii) ResNet50: Residual networks (ResNet) are well-known neural networks that serve as the foundation for many computer vision applications. Very deep neural networks must address the vanishing gradient problem, which makes them difficult to train. To overcome this, the activations of a layer can be passed directly to a deeper layer of the network through a skip connection. ResNet's building blocks are residual (identity) blocks: a residual block is created when the activation of a layer is transmitted directly to a deeper layer of the network. The default input shape is (224, 224, 3), with exactly three input channels.

(iv) Xception: The Xception model is similar to the Inception architecture, but the traditional Inception modules are replaced with depth-wise separable convolutions, giving a model that uses its parameters more efficiently and achieves higher accuracy. In Xception, the depth-wise separable convolution consists of a pointwise convolution followed by a depth-wise convolution: a 1 × 1 pointwise convolution is applied first, followed by a channel-wise spatial (depth-wise) convolution on the result. In a depth-wise separable convolution, each input channel receives its own convolutional filter, whereas a standard 2D convolution over multiple input channels is as deep as its input, allowing channels to be combined arbitrarily to produce each element of the output. In addition, Xception's modified depth-wise separable convolution has no intermediate ReLU nonlinearity. The default input shape is (299, 299, 3), with exactly three input channels.

(v) MobileNetV2: MobileNetV2 is a convolutional neural network optimized for use on mobile devices. It is built on an inverted residual structure, with residual connections between the bottleneck layers. It is nearly identical to the original MobileNet except for the inverted residual blocks with bottlenecks, and it has considerably fewer parameters than the original MobileNet. MobileNets support any image size greater than 32 × 32 pixels, with larger image sizes generally giving better performance.
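As mentioned above, the following minimal sketch shows how these backbones can be instantiated, assuming tf.keras.applications with ImageNet weights and the default input shapes noted in the descriptions.

```python
# Sketch: instantiating the five pretrained backbones (ImageNet weights, no classifier head).
from tensorflow.keras.applications import (
    InceptionResNetV2, Xception, InceptionV3, ResNet50, MobileNetV2)

# 299 x 299 inputs for the Inception/Xception family, 224 x 224 for ResNet50 and MobileNetV2.
backbones = {
    "InceptionResNetV2": InceptionResNetV2(weights="imagenet", include_top=False,
                                            input_shape=(299, 299, 3)),
    "Xception":          Xception(weights="imagenet", include_top=False,
                                   input_shape=(299, 299, 3)),
    "InceptionV3":       InceptionV3(weights="imagenet", include_top=False,
                                      input_shape=(299, 299, 3)),
    "ResNet50":          ResNet50(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3)),
    "MobileNetV2":       MobileNetV2(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3)),
}

# Print the size of each convolutional base for a quick comparison.
for name, net in backbones.items():
    print(f"{name}: {net.count_params():,} parameters")
```

Each backbone is then wrapped with the same frozen-base classifier head described in Sect. 3.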

5 Results Analysis

5.1 Aarhus Dataset

The Aarhus University Signal Processing Group, in collaboration with the University of Southern Denmark, contributed 4750 images of roughly 960 different plants categorized into 12 species of plant seedlings collected at early growth stages for this work. The dataset was downloaded from the Kaggle website.

5.2 Experimental Results

The CNN models employed in this study are implemented in Python using the Google Colab environment.

Table 2 compares the accuracy of the various models. The results show that the Xception model gives better accuracy than the other deep models. All five models are trained for 30 epochs using the Adam optimizer with a batch size of 32 and the categorical cross-entropy loss function; a sketch of this training configuration follows Table 2.

Table 2 Comparative analysis
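The training configuration reported above can be sketched as follows, assuming the `model`, `train_gen` and `val_gen` objects from the earlier sketches.

```python
# Sketch: training setup used for all five models (Adam, batch size 32 from the generators,
# categorical cross-entropy, 30 epochs).
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=30,            # all five models trained for 30 epochs
)
```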

The accuracy and loss plots (Figs. 3, 4, 5 and 6) show the performance of the deep architectures for plant seedling classification, and Table 3 reports the accuracy of the Xception model. The figures depict the accuracy and loss curves for the best-performing Xception and InceptionResNetV2 models.
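Curves of this kind can be produced from the Keras training history; the sketch below assumes the `history` object returned by the training sketch above and uses matplotlib for plotting.

```python
# Sketch: plotting training/validation curves from the Keras History object.
import matplotlib.pyplot as plt

def plot_curves(history, metric):
    """Plot training and validation curves for a given metric ('accuracy' or 'loss')."""
    plt.figure()
    plt.plot(history.history[metric], label=f"{metric} train")
    plt.plot(history.history[f"val_{metric}"], label=f"{metric} validation")
    plt.xlabel("epoch")
    plt.ylabel(metric)
    plt.legend()
    plt.show()

plot_curves(history, "accuracy")   # curves as in Figs. 3 and 5
plot_curves(history, "loss")       # curves as in Figs. 4 and 6
```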

Fig. 3 Accuracy curve of the Xception model (training and validation accuracy vs. epoch; training accuracy rises to about 0.99 and validation accuracy to about 0.98)

Fig. 4 Loss curve of the Xception model (training and validation loss vs. epoch; both decay to about 0.2)

Fig. 5 Accuracy curve of the InceptionResNetV2 model (training and validation accuracy vs. epoch; both reach about 0.99)

Fig. 6 Loss curve of the InceptionResNetV2 model (training loss decreases steadily toward 0, while validation loss fluctuates before approaching 0)

6 Performance Analysis on Aarhus Dataset

See Table 3.

Table 3 Performance analysis on Aarhus dataset

7 Conclusion

Classifying plant seedlings and distinguishing them from weeds is an essential process for boosting agricultural yields and reducing losses. Deep learning-based transfer learning models are used in this research to create efficient feature maps and thereby predict reliable distinctions among plant seedling species. In this paper, we investigated the performance of deep architectures and identified the best model for plant seedling classification. The experiments yielded improved results, with all of the models achieving validation accuracies of over 90%. The comparative study on the dataset from Aarhus University, Denmark highlights the efficiency of the Xception model, which obtained an accuracy of 96% for plant seedling identification.