1 Introduction

The safety helmet, as the basic safety protection equipment for construction workers, is mainly used to protect workers' heads and prevent them from being struck by sharp objects [18]. It is of great significance to the personal safety of construction workers. Therefore, detecting the wearing status of safety helmets in different construction scenarios based on computer vision technology has become a hot topic in artificial intelligence, especially in the pattern recognition community.

The task of object detection is to find the objects of interest in an image and determine their classes and locations. We formulate safety helmet wearing status detection as a two-class object detection problem. In the pattern recognition community, there are two main approaches to detecting safety helmet wearing status: traditional statistical pattern recognition and deep learning.

In traditional object detection, candidate regions are first selected from a given image. Then, feature extraction is carried out on these regions; commonly used features include the Histogram of Oriented Gradients (HOG) [4], Local Binary Patterns (LBP) [12], and the Scale-Invariant Feature Transform (SIFT) [10]. Finally, a pretrained classifier is used for classification. Limei Cai et al. constructed standard images of helmets, extracted four directional features, modeled the distribution of these features using a Gaussian function, and separated local images of frames into helmet and non-helmet classes [2]. Bahaa et al. detected helmets in different scenarios [1]. Park et al. proposed a background difference method for detection [13]: after extracting HOG features from the input image, an SVM classifier is used to detect the person and the safety helmet.

There are two problems in existing safety helmet wearing detection based on traditional methods. One is that the region selection strategy based on sliding windows lacks pertinence, incurring high time complexity and many redundant windows. The other is that hand-designed features have poor robustness to appearance diversity.

In deep learning methods, after multiple convolution and pooling operations, the extracted features are fed into a classifier for detection. There are two distinct lines of deep learning detectors. One draws on semantic segmentation [11, 27] and mainly includes You Only Look Once (YOLO) [14], YOLOv3 [17] and the Single Shot MultiBox Detector (SSD) [19], which predict the classes and locations of different objects with a single convolutional neural network; this kind of method achieves fast detection speed, usually at some cost in accuracy. The other is the R-CNN framework based on candidate regions, which mainly includes R-CNN [8], Fast R-CNN, and Faster R-CNN [15]; within this framework, object candidate boxes are generated for classification and regression. This framework has strong generalization ability and has been widely used. Cheng Rao et al. proposed a helmet wearing detection algorithm based on SAS-YOLOv3-Tiny [3]. Fang Ming et al. proposed a fast detection algorithm based on YOLOv2 [5], adding dense blocks and compressing the network with a lightweight structure to realize rapid detection of safety helmets. Wen P et al. proposed a method based on the YOLOv3 algorithm to detect the wearing of safety helmets [24], using multi-scale training and additional anchor points to enhance the robustness of the network to objects of different sizes.

In general, deep learning can achieve high accuracy; however, a large number of samples, high-end hardware, and a long training time are needed. Moreover, there is no public large-scale safety helmet dataset. As a result, unbiased estimation based on deep learning is inefficient on small samples.

In order to realize accurate detection of helmet wearing status, this paper proposes a new Improved Boosted Random Ferns algorithm (IBRFs). Random ferns are constructed based on HOG features, weak classifiers are then built from them, and the most discriminative ones are selected to form a strong classifier that detects the wearing status of the safety helmet. The main contributions of our work are summarized as follows:

  • Image features are extracted in the HOG domain, which addresses inaccurate feature description.

  • Weak classifiers are constructed based on BRFs, selecting the position points and parameters with strong discriminative ability to improve their performance.

  • An improved ensemble learning algorithm is proposed to enhance the discriminative ability of strong classifiers.

The rest of this paper is composed of four sections. Section 2 briefly describes the theory of random ferns. Section 3 presents the proposed IBRFs. Section 4 reports the experimental validation of IBRFs on GZMU-SHWD. Section 5 concludes the work and points out future directions.

2 Random Ferns Algorithm

The Random Ferns algorithm (RFs) is an ensemble learning method that performs well in classification and regression tasks in machine learning [21, 25]. As shown in Fig. 1, RFs takes a constrained decision tree as the basic meta-model: in a fern, there is only one judgment criterion in each layer. In random ferns, the training time increases linearly with the number of ferns, so computational feasibility is ensured.

Fig. 1 Tree classifier and Fern classifier

2.1 Random ferns cluster based on HOG domain

In order to improve the robustness of the random ferns classifier to illumination and intra-class variation, we extract HOG features from the image, in analogy to the intensity-domain feature representation used in RFs [20]. Given an image window X, the HOG is calculated at the center position u of a sub-window S. Each local gradient in the sub-window S contributes to the construction of the HOG and is assigned to the corresponding bin by weighted voting. In the HOG, two bins are randomly selected for a binary test:

$$f\left(x;u,\theta \right)=I\left( HOG\left(x;u,b\right)> HOG\left(x;u,{b}^{\prime}\right)\right)$$
(1)

Where I(a) is the indicator function: if a is true, then I(a) = 1, otherwise I(a) = 0. θ = {b, b′} is a pair of randomly selected bin indexes in the interval [1, B], where B is the total number of bins in the histogram. A binary feature vector of a fern is synthesized by aggregating M local binary features to represent the sub-window S. Therefore a single random fern f(x; u, θ) based on the HOG domain is represented as follows:

$$f\left(x;u,\theta \right)=\left[f\left(x;u,{\theta}_1\right),\dots, f\left(x;u,{\theta}_M\right)\right]$$
(2)

In Fig. 2, the right part shows random ferns based on HOG. Each output of a random fern is an M-dimensional binary feature vector encoded as an integer Z, Z ∈ {0, …, 2^M − 1}. In the image window X, when M = 3, the output feature of the random fern is Z = (011)₂ = 3, with Z ∈ {0, …, 7}. The output of the random fern is determined by u and θ = {θ1, …, θM}.

Fig. 2 Random Ferns based on HOG
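As a concrete illustration of Eqs. (1)-(2), the following minimal Python sketch computes the fern output Z; the accessor `hog` and the bin-pair list `thetas` are hypothetical names of our own, assuming the HOG histogram of the sub-window has already been computed.

```python
def fern_feature(hog, u, thetas):
    """Fern output Z of Eqs. (1)-(2) for the sub-window centred at u.

    hog    : hypothetical accessor, hog(u, b) -> value of bin b of the
             HOG computed at position u
    thetas : list of M pairs (b, b_prime) of histogram bin indices
    """
    z = 0
    for b, b_prime in thetas:
        bit = 1 if hog(u, b) > hog(u, b_prime) else 0  # binary test of Eq. (1)
        z = (z << 1) | bit                             # aggregate the M bits, Eq. (2)
    return z  # Z in {0, ..., 2**M - 1}; e.g. bits (0, 1, 1) give Z = 3
```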

As shown in Fig. 3, when training on N samples of the same class, a random fern outputs a probability distribution P(F| C). For training samples with multiple classes, the random ferns output a probability distribution P(F| Ck) for each class Ck. When testing a sample of unknown class, a fixed-size rectangular sliding window scans it, its features are computed by the random ferns, and semi-naive Bayes is used to obtain the final classification:

$$Class(f)=\mathit{\arg}{\mathit{\max}}_kP\left({C}_k\right)\prod_{n=1}^Np\left({F}_n|{C}_k\right)$$
(3)
Fig. 3 Random Ferns classifier
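A minimal sketch of the semi-naive Bayes decision in Eq. (3), assuming the per-class fern distributions have already been estimated; the names and the log-space accumulation (used to avoid numerical underflow of the product) are our own.

```python
import numpy as np

def classify(priors, likelihoods, fern_outputs):
    """Semi-naive Bayes decision of Eq. (3).

    priors       : array of K class priors P(C_k)
    likelihoods  : array of shape (n_ferns, K, 2**M), where
                   likelihoods[n, k, z] estimates P(F_n = z | C_k)
    fern_outputs : observed fern values z for the test window
    """
    log_post = np.log(priors)                      # start from log P(C_k)
    for n, z in enumerate(fern_outputs):
        log_post += np.log(likelihoods[n, :, z])   # accumulate log P(F_n | C_k)
    return int(np.argmax(log_post))                # arg max over classes
```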

2.2 Boosted random ferns

In order to improve the performance of the classifier, BRFs [22] adopts the Real AdaBoost strategy [7] to select T weak classifiers ht(x), t = 1, …, T, with strong discriminative ability and build a strong classifier H(x):

$$H(x)=\mathit{\operatorname{sign}}\left(\sum_{t=1}^T{h}_t(x)-\beta \right)$$
(4)

Where β is a threshold that determines the classifier tolerance. Each weak classifier returns a confidence score:

$${h}_t(x)=\frac{1}{2}\mathit{\log}\left(o\left(x;{u}_l,{\theta}_r\right)\right)$$
(5)
$$o\left(x;{u}_l,{\theta}_r\right)=\frac{P\left(z|O\right)+\varepsilon }{P\left(z|B\right)+\varepsilon }$$
(6)

Where ε is a small positive constant that avoids division by zero. A weak classifier ht(x) with a higher score indicates a significant difference between object and background, meaning that the random fern with location u and parameter selection θ yields a weak classifier with strong recognition ability.
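As a sketch, the confidence score of Eqs. (5)-(6) can be computed as follows, assuming P_zO and P_zB are arrays of the two class distributions indexed by the fern output z; the default ε value is our own assumption.

```python
import numpy as np

def confidence(P_zO, P_zB, z, eps=0.1):
    """Weak-classifier confidence of Eqs. (5)-(6) for a fern output z."""
    odds = (P_zO[z] + eps) / (P_zB[z] + eps)  # Eq. (6): smoothed likelihood ratio
    return 0.5 * np.log(odds)                 # Eq. (5): confidence score
```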

In order to find the most suitable u and θ for each weak classifier, N samples S = {(x1, y1), …, (xN, yN)} are used to train the classifier, where yi ∈ {−1, +1}: yi = −1 indicates that the image window xi belongs to the background class, and yi = +1 indicates that xi belongs to the object class. Every training sample also has an associated weight ωt(i), initialized as \({\omega}_1(i)=\frac{1}{N}\). The main steps of BRFs are as follows:

Specifically, we first define a set {u1, …, uL} of all possible 2D pixel coordinates within a window X and a pool {θ1, …, θR} of R different sets of random pairs of histogram bin indices.

Then, a weak classifier is constructed. At each iteration, every possible pair (ul, θr) generates a random fern f(x; ul, θr). Each random fern is evaluated on the whole training set, and the weighted histograms of the object class and the background class are constructed:

$$P\left(z|O\right)=\sum_{i:\,f\left({x}_i;{u}_l,{\theta}_r\right)=z\ \wedge\ {y}_i=+1}{\omega}_t(i)$$
(7)
$$P\left(z|B\right)=\sum_{i:\,f\left({x}_i;{u}_l,{\theta}_r\right)=z\ \wedge\ {y}_i=-1}{\omega}_t(i)$$
(8)

Where P(z| O) stands for P(f(xi; ul, θr)| O), the probability of observing output value z of the random fern with parameters ul and θr given that the image window x belongs to the object. P(z| B) stands for P(f(xi; ul, θr)| B), the corresponding probability given that the image window belongs to the background.

The discriminative ability of each weak classifier ht(x) is determined by the Bhattacharyya coefficient [16] Q between P(z| O) and P(z| B).

$$Q\ \left({u}_l,{\theta}_r\right)=2\sum_{z=0}^{2^M-1}\sqrt{P\left(z|O\right)P\left(z|B\right)}$$
(9)

The lower the value of Q, the stronger the discriminative ability of the weak classifier. A weak classifier ht(x) is constructed by retaining the parameters \(\left({u}_{l^{\ast }},{\theta}_{r^{\ast }}\right)\) that minimize Q:

$$\left({u}_{l^{\ast }},{\theta}_{r^{\ast }}\right)=\mathit{\arg}{\mathit{\min}}_{l,r}Q\ \left({u}_l,{\theta}_r\right)$$
(10)
$${h}_t(x)=\frac{1}{2}\mathit{\log}o\left(x;{u}_{l^{\ast }},{\theta}_{r^{\ast }}\right)$$
(11)

Finally, the sample weights are updated:

$${\omega}_{t+1}(i)=\frac{\omega_t(i)\mathit{\exp}\left(-{y}_i{h}_t\left({x}_i\right)\right)}{\sum_{i=1}^N{\omega}_t(i)\mathit{\exp}\left(-{y}_i{h}_t\left({x}_i\right)\right)}$$
(12)
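The steps above can be summarized in the following hedged Python sketch of the BRFs training loop (Eqs. (7)-(12)); `compute_z` stands for the fern evaluation sketched earlier, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def train_brfs(samples, labels, positions, theta_pool, T, compute_z, M, eps=0.1):
    """Sketch of BRFs training, Eqs. (7)-(12).

    labels must be an array of +/-1; compute_z(x, u, theta) is assumed to
    return the fern output z of window x at position u with bin pairs theta.
    """
    labels = np.asarray(labels)
    N = len(samples)
    w = np.full(N, 1.0 / N)                            # omega_1(i) = 1/N
    classifiers = []
    for t in range(T):
        best = None
        for u in positions:
            for theta in theta_pool:
                z_all = np.array([compute_z(x, u, theta) for x in samples])
                P_O = np.zeros(2 ** M)
                P_B = np.zeros(2 ** M)
                for z, y, wi in zip(z_all, labels, w):
                    if y == +1:
                        P_O[z] += wi                   # Eq. (7)
                    else:
                        P_B[z] += wi                   # Eq. (8)
                Q = 2.0 * np.sum(np.sqrt(P_O * P_B))   # Eq. (9)
                if best is None or Q < best[0]:
                    best = (Q, u, theta, P_O, P_B, z_all)
        _, u_s, theta_s, P_O, P_B, z_all = best        # Eq. (10): arg min of Q
        h = 0.5 * np.log((P_O[z_all] + eps) / (P_B[z_all] + eps))  # Eq. (11)
        classifiers.append((u_s, theta_s, P_O, P_B))
        w = w * np.exp(-labels * h)                    # Eq. (12), then normalise
        w /= w.sum()
    return classifiers
```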

3 Algorithm design

3.1 Improved boosted random ferns algorithm

In order to enhance detection accuracy, the Real AdaBoost method [7] is introduced in BRFs [22]: it selects the weak classifiers ht(x) with strong discriminative ability and obtains the strong classifier H(x) as a cumulative sum.

This method simply combines multiple weak classifiers and pays little attention to weak classifiers with low weight. In order to achieve an optimal combination of classifiers, this paper enhances the Real AdaBoost procedure in BRFs with the help of a new weighting scheme [23]. A random variable Dt is defined as:

$${D}_t=\begin{cases}{h}_t(x), & \text{if } y=+1,\ x\in S\\ -{h}_t(x), & \text{if } y=-1,\ x\in S\end{cases}$$
(13)

Let \(D=\sum_{t=1}^T{\gamma}_t{D}_t\); then a weighted combination is used to construct the strong classifier H(x):

$$H(x)=\mathit{\operatorname{sign}}\left(\sum_{t=1}^T{\gamma}_t{h}_t(x)-\beta \right)$$
(14)
$${\gamma}_t=\frac{\mu_t}{\delta_t^2}$$
(15)

Where γt is the ratio of the mean μt to the variance \({\delta}_t^2\) of the random variable Dt; μt and \({\delta}_t^2\) are expressed as:

$${\mu}_t=\frac{1}{2}\sum_{z=0}^{2^M-1}\left(\left(P\left(z|O\right)-P\left(z|B\right)\right)\mathit{\log}\left(\frac{P\left(z|O\right)}{P\left(z|B\right)}\right)\right)$$
(16)
$${\delta}_t^2=\sum_{z=0}^{2^M-1}\left(P\left(z|O\right){\left(\frac{1}{2}\mathit{\log}\left(\frac{P\left(z|O\right)}{P\left(z|B\right)}\right)-{\mu}_t\right)}^2+P\left(z|B\right){\left(\frac{1}{2}\mathit{\log}\left(\frac{P\left(z|O\right)}{P\left(z|B\right)}\right)+{\mu}_t\right)}^2\right)$$
(17)
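A minimal Python sketch of Eqs. (15)-(17), assuming the histograms P(z|O) and P(z|B) of the selected fern are available as arrays over z; the small ε guarding empty bins is our own addition, mirroring Eq. (6).

```python
import numpy as np

def gamma_weight(P_O, P_B, eps=1e-8):
    """Combination coefficient gamma_t = mu_t / delta_t^2 of Eqs. (15)-(17)."""
    log_ratio = np.log((P_O + eps) / (P_B + eps))
    mu = 0.5 * np.sum((P_O - P_B) * log_ratio)               # Eq. (16)
    var = np.sum(P_O * (0.5 * log_ratio - mu) ** 2
                 + P_B * (0.5 * log_ratio + mu) ** 2)        # Eq. (17)
    return mu / var                                          # Eq. (15)
```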

After introducing the weighting coefficient, the sample weights are updated as follows:

$${\omega}_{t+1}(i)=\frac{\omega_t(i)\mathit{\exp}\left(-{y}_i{\gamma}_t{h}_t\left({x}_i\right)\right)}{\sum_{i=1}^N{\omega}_t(i)\mathit{\exp}\left(-{y}_i{\gamma}_t{h}_t\left({x}_i\right)\right)}$$
(18)

IBRFs is shown as Algorithm 1:

Algorithm 1 IBRFs

3.2 Algorithm implementation

As shown in Fig. 4, the pipeline framework of IBRFs is given as follows.

Fig. 4 Pipeline framework of the IBRFs

At the training stage, for an input image, the features are first extracted by HOG within candidate boxes to form the feature domain space. After that, random binary tests are used to construct the random ferns. Next, weak classifiers are built from the random ferns. At last, the improved Real AdaBoost algorithm is used to select the most discriminative ones to construct the IBRFs. At the testing stage, the HOG of the input image within candidate boxes is obtained and evaluated by the IBRFs to complete the safety helmet wearing status detection, as sketched below.
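For illustration, the test stage of Eq. (14) could look as follows, reusing the classifier tuples from the BRFs training sketch and the γt values from gamma_weight above; the candidate-window generation and the threshold β are left as assumptions.

```python
import numpy as np

def detect(windows, classifiers, gammas, beta, compute_z, eps=0.1):
    """Hedged sketch of the IBRFs test stage, Eq. (14)."""
    detections = []
    for x in windows:                                   # candidate boxes
        score = 0.0
        for (u, theta, P_O, P_B), g in zip(classifiers, gammas):
            z = compute_z(x, u, theta)                  # fern output for this window
            score += g * 0.5 * np.log((P_O[z] + eps) / (P_B[z] + eps))
        if score > beta:                                # sign(sum - beta) = +1
            detections.append(x)                        # object (helmet) detected
    return detections
```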

4 Experimental results and analysis

4.1 Datasets and experimental platform

The experimental environment of this paper is as follows: Windows 10 (64-bit), Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz, and 8 GB RAM.

In this experiment, an enlarged Safety Helmet Wearing Dataset (SHWD), called GZMU-SHWD, is used to evaluate the IBRFs. GZMU-SHWD originates from Internet crawler data (Google-Net crawler) and SHWD. It consists of 7589 images, including 5000 training samples, 2481 test samples and 108 verification samples. Some examples are shown in Fig. 5.

Fig. 5 Some images from GZMU-SHWD

The images come from different scenarios, lighting conditions and viewing angles, and the sample distribution is uneven. Therefore, image preprocessing is performed: the images are augmented by horizontal flipping and cropping, and normalized to 100×120.
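As an illustration only, such preprocessing could look like the OpenCV sketch below; the crop margins and the width-by-height reading of 100×120 are assumptions, not the paper's exact settings.

```python
import cv2

def augment_and_normalize(img):
    """Horizontal flip and crop augmentation, then resizing to 100x120."""
    flipped = cv2.flip(img, 1)                          # horizontal flip
    h, w = img.shape[:2]
    cropped = img[int(0.05 * h):int(0.95 * h),          # assumed 5% crop margin
                  int(0.05 * w):int(0.95 * w)]
    # cv2.resize expects (width, height); 100x120 is read as width x height.
    return [cv2.resize(s, (100, 120)) for s in (img, flipped, cropped)]
```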

4.2 Experimental parameter setting

In the training phase, some parameters of the model are fine-tuned with reference to BRFs [22]. The experimental parameters are shown in Table 1.

Table 1 Parameters

4.3 Experimental results

In order to evaluate the effectiveness of the proposed IBRFs, precision and recall are used to measure the performance of the proposed method for safety helmet wearing status detection. The evaluation formulas are given as:

$$precision=\frac{TP}{TP+ FP}$$
(19)
$$recall=\frac{TP}{TP+ FN}$$
(20)

Where True Positive (TP) counts positive samples predicted as positive, False Positive (FP) counts negative samples predicted as positive, and False Negative (FN) counts positive samples predicted as negative.
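For example, with hypothetical counts TP = 115, FP = 9 and FN = 10, precision is 115/124 ≈ 0.927 and recall is 115/125 = 0.920:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall of Eqs. (19)-(20)."""
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(115, 9, 10))  # hypothetical counts -> (0.927..., 0.92)
```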

4.3.1 Comparative analysis of different ensemble algorithms

The weighting coefficient is used to improve the Real AdaBoost algorithm and obtain the best combination coefficients. In order to verify the superiority of the improved Real AdaBoost algorithm for classification, the algorithms are tested on the Ionosphere and Sonar datasets [23] and compared with other algorithms. The results are shown in Table 2.

Table 2 Comparative analysis of different ensemble algorithms

As shown in Table 2, the detection error rate and variance of Improved Real AdaBoost are the lowest. On the Ionosphere dataset, the detection error rate of Improved Real AdaBoost is 0.0939, which is 0.0129 lower than Gentle AdaBoost and 0.0111 lower than Real AdaBoost. This verifies the effectiveness of the Improved Real AdaBoost algorithm in helping the IBRFs algorithm with the classification problem.

4.3.2 Analysis of BRFs and IBRFs

In order to verify the effectiveness and generalization ability of IBRFs, the BRFs and IBRFs are trained in the same experimental environment on public datasets including UIUC cars [26], INRIA horses [6], and GZMU-SHWD. There are 108 images in the UIUC cars dataset, including 139 detection objects; 35 images in the INRIA horses dataset, including 45 detection objects; and 108 images in GZMU-SHWD, including 124 detection objects. The detection results of the two models are shown in Table 3.

Table 3 Comparative analysis of different datasets

Table 3 compares the performance of BRFs and IBRFs on different datasets. The detection accuracy of IBRFs on the INRIA horses, UIUC cars and GZMU-SHWD datasets is improved compared with BRFs. In particular, on GZMU-SHWD the detection accuracy reaches 92.74%, about 9.5% higher than BRFs. The results show that the IBRFs is effective for detecting the wearing status of safety helmets.

Figure 6 shows the PR (Precision-Recall) curves on GZMU-SHWD. It can be seen that IBRFs is better than BRFs in both detection precision and recall. The results show that the weighting coefficient introduced in this paper improves the detection accuracy of the model.

Fig. 6 Precision and Recall curves

The IBRFs can discriminate and select random ferns during detection, so that the classifier focuses on the object parts most relevant to classification. In Fig. 7, we observe the results of feature selection for the three different object classifiers. The first row depicts the spatial layout of the ferns used to assemble the classifiers. The colored contour boxes indicate the positional density and weight of the individual ferns that give rise to the strong classifier; red contours indicate a higher density of ferns. We can see how the ferns concentrate on semantically relevant regions, such as the wheels of cars or the neck and head of horses. The second row contains the shared fern distribution map for each strong classifier, where the height of the i-th column indicates the number of weak classifiers that use the parameters θi.

Fig. 7 The first row denotes the spatial layout of ferns for different object categories. The second row denotes the distribution of weak classifiers for each class

4.3.3 Analysis of experimental results

In order to verify the superiority of our algorithm, we compare it with other algorithms in the same hardware environment. IBRFs, BRFs, SVM + HOG [9], SSD [19], Faster R-CNN [15] and YOLOv3 [17] are tested on GZMU-SHWD. The detailed detection results are shown in Table 4.

Table 4 Comparison of different algorithms

From Table 4, in the safety helmet wearing status detection task, the detection accuracy of IBRFs is 92.74%, which is higher than SSD, Faster R-CNN and BRFs, and especially HOG + SVM. Compared with BRFs, the detection accuracy is improved by 9.5%, which verifies the effectiveness of IBRFs. In the CPU environment, the detection speed of IBRFs is about 15 times faster than Faster R-CNN, and it is also improved compared with BRFs, which verifies that IBRFs meets the requirements of real-time detection. Overall, in the helmet wearing status detection task, the IBRFs proposed in this paper achieves faster detection speed and higher detection accuracy.

Figure 8 shows the detection results of IBRFs in the safety helmet detection task under different angles, partial occlusion and complex environments. The reason for the success of IBRFs is that it adopts a crowd voting strategy based on prior probability, takes the differences between various features into full consideration, and gains strong adaptability in generalization.

Fig. 8 Detection results of IBRFs. The blue box represents Ground Truth (GT), the green box represents a detected person wearing a helmet, and the red box represents background mistaken for the object

5 Conclusions

A novel object detection method, the Improved Boosted Random Ferns algorithm (IBRFs), is proposed in this work for safety helmet wearing status detection. The IBRFs combines the advantages of the basic BRFs with an improved Real AdaBoost algorithm for voting. The proposed method adopts a crowd voting strategy based on prior probability, takes the differences between various features into full consideration, and gains strong adaptability in generalization for safety helmet wearing status detection. The proposed IBRFs shows more efficient outcomes compared to some deep object detectors. Furthermore, two other public datasets, INRIA horses and UIUC cars, are tested and compared with the basic BRFs. The results also verify the competitiveness of IBRFs.

Due to the effectiveness of the proposed IBRFs, it is suitable for safety helmet wearing status detection under various scenarios. Next, some recent object detection methods, such as FCOS and DETR, will be taken into account to enhance our IBRFs so as to handle more complex and varied applications.