1 Introduction

In recent years, many researchers have taken an interest in deploying deep learning models in practice, particularly the Single Shot MultiBox Detector (SSD) model [1, 2]. SSD is a well-known algorithm for handling large-scale data processing, input noise, and online processing. The Faster region-based convolutional neural network (Faster R-CNN) model is also among the best models available today [3, 4].

SSD is intended for real-time object detection [5, 6]. Faster R-CNN creates bounding boxes using a region proposal network and then uses those boxes to classify objects [3]. Although Faster R-CNN is considered state of the art in accuracy, the entire process runs at 7 frames per second, far below what real-time processing requires. SSD speeds up the procedure by eliminating the region proposal network. To compensate for the resulting drop in accuracy, SSD introduces a few improvements, such as multi-scale features and default boxes [2]. These improvements allow SSD to match the accuracy of Faster R-CNN using lower-resolution images, which increases the speed even further. Table 1 shows that SSD reaches real-time processing speed and even outperforms Faster R-CNN in accuracy [7].

Table 1 Data collection and labeling for flowers [7]

SSD does not employ a dedicated region proposal network. Instead, it reduces detection to a very simple operation: both the location and the class scores are computed using small convolution filters. After extracting the feature maps, SSD applies 3 × 3 convolution filters to each cell to make predictions. These filters compute their results in the same way as ordinary CNN filters.
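As a rough illustration of the prediction step, the sketch below counts the values that the convolutional predictors emit for one feature map. It assumes the formulation of the original SSD description, in which each of k default boxes at a cell receives c class scores and 4 box offsets. The function name, the feature map size, k, and c are illustrative assumptions, not values reported in this study.

```python
def ssd_outputs_per_feature_map(m, n, k, c):
    """Count prediction values for an m x n feature map.

    Each cell predicts, for each of its k default boxes, c class scores
    and 4 box offsets, so (c + 4) * k filters are applied at every cell.
    """
    filters_per_cell = (c + 4) * k
    return filters_per_cell * m * n


# Illustrative assumptions: a 38 x 38 feature map, 4 default boxes per cell,
# and 11 classes (10 flower species plus a background class).
total = ssd_outputs_per_feature_map(m=38, n=38, k=4, c=11)
print(total)  # 86640
```

Because every prediction comes from a single convolutional pass, no separate proposal stage is needed, which is the source of the speed advantage described above.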

Recognition accuracy is an essential property of a model applied in practice. How does noisy input affect the identification process? Here, noise means that the image was taken in a dark environment, in the rain, or with the object partially obscured. This study shows the influence of input noise on recognition accuracy.

2 Research Deployment

Creating a data set is a critical step in training learning models because it affects the trained model's output. The training data cover 10 different flower species and were collected from internet sources.

2.1 Data Collection and Flower Labeling

A total of 500 photos of objects were gathered for model training [8]. The objects (flowers) were labeled and divided into two data sets: the training set accounts for 80% of the collected items, and the test set accounts for the remaining 20%. The items in the training and test sets were chosen at random.
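The random 80/20 division can be sketched as follows. This is a minimal illustration, not the author's actual procedure: the function `split_dataset`, the fixed seed, and the file names are hypothetical stand-ins for the 500 labeled photos.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items and split them into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 500 labeled photos.
images = [f"flower_{i:03d}.jpg" for i in range(500)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))  # 400 100
```

Fixing the seed keeps the split reproducible, so the same images stay in the test set across training runs.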

The LabelImg software was used to label the objects during the image preprocessing stage. Table 2 details the number of photos of each object that were gathered and labeled.

Table 2 Data collection and labeling for flowers

2.2 Operating Model Environment

The experiments were run on a PC with an Intel Core i7-9700F CPU, 32 GB of memory (RAM), a 128 GB solid-state drive, and an NVIDIA GeForce GTX 1050 Ti graphics card.

2.3 Model of Training for Learning

The SSD architecture is based on VGG, with 256 output channels, a 3 × 3 kernel, a 2 × 2 stride, and 1 × 1 padding (Fig. 1).
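To see what these layer settings imply, the sketch below applies the standard convolution output-size formula with a 3 × 3 kernel, stride 2, and padding 1. The input sizes are illustrative assumptions, since the feature map resolutions are not stated here.

```python
def conv_output_size(n, kernel=3, stride=2, pad=1):
    """Spatial size after a convolution: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1


# The input sizes below are illustrative assumptions, not values from the paper.
for size in (38, 19, 10):
    print(size, "->", conv_output_size(size))
```

With these settings each layer roughly halves the spatial resolution, which is how a detector of this kind obtains feature maps at several scales.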

Fig. 1 a The start of model training; b the end of model training (screenshots showing the loss at each step)

The author decided when to stop training by monitoring the TensorBoard graph and histogram of loss over time. As shown in Fig. 2, the training loss ranges from 0 to 1.5 by step 12,000, so training can be stopped once the model reaches this step. The author finished training at step 18,000, where the training loss was 1.5 (Fig. 1b). One training step takes 1.300 s on average.

Fig. 2 Loss chart over time of the model (the loss descends along a concave-upward curve, with fluctuations)
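The reported step time and step count give a rough estimate of the total training time. The short sketch below does this arithmetic; the helper name `training_hours` is introduced only for illustration, and the estimate assumes the average step time held for the whole run.

```python
SECONDS_PER_STEP = 1.3   # average step time reported in Section 2.3
TOTAL_STEPS = 18_000     # step at which training was finished


def training_hours(steps, seconds_per_step):
    """Return the estimated wall-clock training time in hours."""
    return steps * seconds_per_step / 3600


hours = training_hours(TOTAL_STEPS, SECONDS_PER_STEP)
print(f"{hours:.1f} h")  # 6.5 h
```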

2.4 The Real Model Operation

The author built a real-world identification test for recognizing 10 species of flowers, with 15 samples (pictures) for each flower under each of four distinct environmental conditions. The photos were extracted from a Google video source and reflect real-world conditions. In total, 600 images (10 species × 15 samples × 4 conditions) were collected for the identification model [9]. The results of the author's photo identification are archived in reference [10].

2.5 The Performance of the Algorithm

The performance of the recognition process is measured as the number of correctly identified sample images divided by the total number of sample images identified by the model, expressed as a percentage:

$$A\,(\%) = \frac{S}{TS} \times 100$$

where:

A: Accuracy of the algorithm;

S: Number of the correctly identified sample images;

TS: Total number of the identified model sample images.
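As a worked example of this formula, the sketch below computes A for the four lighting conditions, using the counts of correctly identified samples out of 150 that are reported in Section 3. The function name `accuracy_percent` and the condition labels are introduced here only for illustration.

```python
def accuracy_percent(correct, total):
    """A(%) = S / TS * 100, where S is correct and TS is total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total * 100


# Correctly identified samples out of 150 for each lighting condition,
# as reported in Section 3.
results = {
    "fully lit": 150,
    "1/3 darkened": 122,
    "1/2 darkened": 67,
    "fully darkened": 93,
}
for condition, correct in results.items():
    print(f"{condition}: {accuracy_percent(correct, 150):.1f}%")
```

The printed values are 100.0%, 81.3%, 44.7%, and 62.0%, matching the rates given in Section 3.5.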

3 Actual Model Performance

Identification result conventions: a correct identification result is one in which the model's output matches the verified label of the input sample; a false identification result is one in which the model assigns the verified input sample to the wrong species. An unidentified result is one in which the model does not identify any species or recognizes more than one species.
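These conventions can be expressed as a small decision rule. The sketch below is one possible reading, assuming a hypothetical output format in which the model reports a list of species names per image; the function name and the example outputs are made up for illustration.

```python
from collections import Counter


def classify_result(predicted_species, true_species):
    """Label one model output as 'correct', 'false', or 'unidentified'.

    predicted_species is the list of species names the model reported
    for one image (a hypothetical output format).
    """
    distinct = set(predicted_species)
    if len(distinct) != 1:
        # No species, or more than one species, counts as unidentified.
        return "unidentified"
    predicted = next(iter(distinct))
    return "correct" if predicted == true_species else "false"


# Made-up example outputs for three images of a rose.
outputs = [["rose"], ["lily"], []]
tally = Counter(classify_result(p, "rose") for p in outputs)
print(tally)
```

Tallying the three labels over all samples of a condition gives the counts of correct, false, and unidentified results discussed in the following subsections.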

3.1 Identification Results When the Object Is Fully Lit

Table 3 shows the model's recognition results when the image is not shaded. The red box in Table 3 marks the 150 input control samples, and the blue box marks the number of samples identified by the model. All samples were correctly recognized, so the accuracy rate in this scenario is 100%.

Table 3 Identification results when the object is fully lit

3.2 Identification Results When 1/3 of the Object Is Darkened

Table 4 shows that the model correctly recognized 122 objects out of 150 input samples, an identification rate of 81.3%. Four objects had a 100% identification rate in this scenario. Porcelain flowers had the lowest recognition rate, 53.3%: of their 15 samples, the model correctly recognized 8, failed to detect 1, and incorrectly identified 6. In total, there were 22 unidentified objects and 6 incorrectly identified objects in this environment.

Table 4 Identification results when 1/3 of the object is darkened

3.3 Identification Results When 1/2 of the Object Is Darkened

Table 5 shows that the model identified 67 of the 150 objects, an identification rate of 44.7%; no object had a 100% identification rate. The rose species had the best identification accuracy, 86.7%, while the apricot blossom species had the worst, 20%: of its 15 input samples, the model correctly identified three, failed to identify ten, and incorrectly identified two. There were 51 correctly recognized objects in total, with 7 incorrectly identified objects. Moreover, half of the objects were not detected when 1/2 of the object was occluded.

Table 5 Identification results when 1/2 of the object is darkened

3.4 Identification Results When the Object Is Fully Darkened

Table 6 shows that a total of 93 object samples were correctly recognized. With 15 input samples, the model correctly identified one sample, incorrectly identified four (Lily: three samples; Apricot Blossom: one sample), and failed to identify ten. With the objects fully in the dark, there were 28 unidentified samples and four incorrectly recognized samples.

Table 6 Identification results when the object is fully darkened

3.5 Comparison of the Effect of Shade on Model Recognition

To test whether the noise affects model recognition, we compared the recognition accuracy of the model across the different degrees of shading.

The recognition accuracy of the model when the object is fully lit, 1/3 darkened, 1/2 darkened, and fully darkened was 100.0%, 81.3%, 44.7%, and 62.0%, respectively (Table 7). An analysis of variance (ANOVA) showed a significant difference (p < 0.05) in recognition accuracy among the degrees of shading. Accuracy was significantly higher when the object was fully lit or 1/3 darkened than when it was 1/2 darkened or fully darkened (p < 0.05, Least Significant Difference test). However, no significant difference was found between the fully lit and 1/3 darkened cases (p > 0.05, Least Significant Difference test). Similarly, no significant difference was found between the 1/2 darkened and fully darkened cases (p > 0.05, Least Significant Difference test).

Table 7 The effect of the shade on the model recognition

4 Conclusions

In this paper, we have proposed an experimental method for evaluating how the SSD model detects objects in normal and noisy conditions. The model was shown to detect objects under poor conditions, such as reduced illumination with 1/3 or 1/2 of the object darkened. The results showed that detection accuracy decreases when the subject is placed under poorer conditions. The model achieves detection accuracies of 100.0% and 62.0% when the object is fully lit and fully darkened, respectively. The accuracy is also reduced when 1/3 and 1/2 of the object are darkened, to 81.3% and 44.7%, respectively. These results should be of value for applying the SSD model in practice. In future work, we will aim to improve the recognition accuracy of the model when the object is placed under poor conditions.