1 Introduction

Helmet recognition systems play a crucial role in production safety. Through real-time supervision of construction sites, such a system sounds a safety alarm for workers, improves their safety awareness, and reduces the occurrence of safety accidents. As the industry continues to develop and demand becomes more segmented, helmet recognition systems will further optimize their functions, making management more convenient for enterprises.

Domestic and foreign scholars have carried out related research. Girshick and Ren [2] proposed Fast R-CNN and Faster R-CNN respectively, which improved accuracy and detection speed, reaching a frame rate of 5 f/s. In 2015, Redmon et al. [3] proposed the YOLO detection algorithm, which can detect video at 45 f/s; based on YOLO, Redmon also proposed the YOLOv2 [4] and YOLOv3 [5] detection algorithms. In April 2019, Zhou et al. proposed CenterNet (Objects as Points) [6], a new anchor-free algorithm based on key points.

Based on the CenterNet algorithm, this paper optimizes it for safety helmet wearing detection by combining it with GIoU to improve the original IoU measure. By introducing the C detection frame (the smallest rectangular box enclosing both the detection box and the ground-truth box), it compensates for the defect that the IoU loss is the same whenever the detection box and the ground-truth box do not overlap. The Res/DLA backbone is also tuned during training, and various parameters are adjusted through experiments. To some extent, this eliminates coincident detections and false detections of non-target objects. Experiments show that the improved CenterNet-based algorithm improves detection accuracy while maintaining the detection rate.

2 CenterNet Principle

CenterNet models the target as the center point of its bounding box. During detection, the center point is located by key point estimation, and other attributes, such as size and position, are regressed from it.

2.1 Frame

The framework is shown in Fig. 1.

Fig. 1. Flow chart of framework.

2.2 Prediction of Key Point

Let \(A\in B^{W\times H\times 3}\) be the input image. The goal is to generate a key point heat map \(E\in \left[ 0,1\right] ^{\frac{W}{B}\times \frac{H}{B}\times C}\), where B is the size scaling factor (output stride) and C is the number of key point classes (output channels). For a Ground Truth key point of class c at position \(\theta \in B^2\), its location on the feature map after downsampling is

$$\begin{aligned} \tilde{\theta }=\left\lfloor \frac{\theta }{B}\right\rfloor \end{aligned}$$
(1)

The Ground Truth key points are spread with the Gaussian kernel

$$\begin{aligned} E_{d,e,c}=\exp \left( -\frac{(d-\tilde{\theta }_d)^2+(e-\tilde{\theta }_e)^2}{2\sigma _{\theta }^{2}}\right) \end{aligned}$$
(2)

onto the heat map \(E\in \left[ 0,1\right] ^{\frac{W}{B}\times \frac{H}{B}\times C}\); the corresponding loss function is

$$\begin{aligned} L_{k}=\frac{-1}{N} \sum _{dec}\left\{ \begin{array}{ll} (1-\hat{E}_{dec})^{\alpha } \log (\hat{E}_{dec}) & \text { if } E_{dec}=1 \\ (1-E_{dec})^{\beta }(\hat{E}_{dec})^{\alpha } \log (1-\hat{E}_{dec}) & \text { otherwise } \end{array}\right. \end{aligned}$$
(3)

\(\alpha \) and \(\beta \) are hyper-parameters set to 2 and 4, and N is the number of key points in the image. Without the penalty term \((1-E_{dec})^{\beta }\), the loss reduces to the focal-loss form

$$\begin{aligned} -(1-Z_t)^{\alpha }\log (Z_t) \end{aligned}$$
(4)
$$\begin{aligned} Z_{t}=\left\{ \begin{array}{ll} \hat{E}_{dec}, & \text { if } E_{dec}=1 \\ 1-\hat{E}_{dec}, & \text { otherwise } \end{array}\right. \end{aligned}$$
(5)

where \(-\log Z_t\) is the standard cross-entropy loss function.

In addition, a local offset \(\hat{T}\in B^{\frac{W}{B}\times \frac{H}{B}\times 2}\) is predicted for each key point to compensate for the discretization error introduced by the output stride; it is trained with the loss

$$\begin{aligned} L_{off}=\frac{1}{N}\sum _{\theta }\left| \hat{T}_{\tilde{\theta }}-(\frac{\theta }{B}-{\tilde{\theta }}) \right| \end{aligned}$$
(6)
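
To make Eqs. (1)-(3) and (6) concrete, the following is a minimal NumPy sketch of how the heat-map target, the penalty-reduced focal loss, and the offset target could be produced. The output stride B = 4, the Gaussian radius sigma, and all array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def splat_keypoint(heatmap, center, sigma):
    """Splat one Ground Truth key point onto a heat map with the Gaussian kernel of Eq. (2)."""
    h, w = heatmap.shape
    cx, cy = center
    xs = np.arange(w)
    ys = np.arange(h)[:, None]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # keep element-wise max when Gaussians of nearby objects overlap
    return heatmap

def focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Penalty-reduced pixel-wise focal loss of Eq. (3); pred and gt have shape (C, H/B, W/B)."""
    pos = gt == 1
    neg = ~pos
    pos_loss = ((1 - pred[pos]) ** alpha) * np.log(pred[pos] + eps)
    neg_loss = ((1 - gt[neg]) ** beta) * (pred[neg] ** alpha) * np.log(1 - pred[neg] + eps)
    num_pos = max(int(pos.sum()), 1)  # N: number of key points in the image
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

# One object centred at theta = (123, 87) in the input image, with output stride B = 4:
B = 4
theta = np.array([123.0, 87.0])
theta_tilde = np.floor(theta / B)          # low-resolution centre, Eq. (1)
offset_target = theta / B - theta_tilde    # local offset regressed by Eq. (6)
heatmap = np.zeros((128, 128))
splat_keypoint(heatmap, theta_tilde, sigma=3.0)  # sigma is object-size adaptive in practice
```

During training, focal_loss would be applied to the predicted heat map of each class and combined with the offset loss of Eq. (6).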

2.3 IoU (Intersection over Union)

IoU is the intersection-over-union ratio of two boxes M and N:

$$\begin{aligned} IoU=\frac{\left| M\cap N\right| }{\left| M\cup N\right| } \end{aligned}$$
(7)
As a metric, IoU has the following advantages:

  1. It reflects how well the predicted detection box matches the ground-truth box;

  2. Scale invariance: in regression tasks, IoU is a direct indicator of the distance between the predicted box and the ground truth, and it satisfies non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.

However, IoU has the following disadvantages when used as a loss function:

  1. If the predicted box and the ground-truth box do not intersect, IoU = 0 regardless of how far apart they are; the loss then provides no gradient to back-propagate, so the network cannot be trained;

  2. As shown in Fig. 2, the three cases have equal IoU, but their degrees of coincidence clearly differ; the IoU value cannot reflect how the two boxes intersect. The quality decreases from left to right.

Fig. 2. Chart of comparison.

As shown in Fig. 2, the three boxes with different relative positions all have the same IoU = 0.33, but their GIoU values differ: 0.33, 0.24, and –0.1. The better the boxes are aligned, the higher the GIoU value.

2.4 Improvement Direction

Optimizing the usual bounding-box regression losses (MSE loss, L1-smooth loss, etc.) in the detection task is not completely equivalent to optimizing IoU, and the Ln norm is sensitive to object scale, while IoU itself cannot be optimized when the boxes do not overlap. Therefore, a generalized form of IoU is used directly as the regression loss; this method is called GIoU Loss (Generalized Intersection over Union) [7].

$$\begin{aligned} GIoU=IoU-\frac{\left| C\right| -\left| M\cup N\right| }{\left| C\right| } \end{aligned}$$
(8)

The formula is computed as follows: first, find the minimum closure region C of the two boxes (intuitively, the smallest box that encloses both the prediction box and the ground-truth box); then compute the IoU; next, compute the proportion of the closure region that is not covered by the two boxes; finally, subtract this proportion from the IoU to obtain the GIoU.
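
As an illustration, a minimal sketch of Eqs. (7) and (8) for two axis-aligned boxes in (x1, y1, x2, y2) form is given below; it is an assumed toy example rather than the paper's code.

```python
def iou_and_giou(box_m, box_n):
    """Return IoU (Eq. 7) and GIoU (Eq. 8) of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    mx1, my1, mx2, my2 = box_m
    nx1, ny1, nx2, ny2 = box_n

    # Intersection area |M ∩ N|
    iw = max(0.0, min(mx2, nx2) - max(mx1, nx1))
    ih = max(0.0, min(my2, ny2) - max(my1, ny1))
    inter = iw * ih

    # Union area |M ∪ N|
    area_m = (mx2 - mx1) * (my2 - my1)
    area_n = (nx2 - nx1) * (ny2 - ny1)
    union = area_m + area_n - inter
    iou = inter / union

    # Minimum closure region C: the smallest box enclosing both M and N
    area_c = (max(mx2, nx2) - min(mx1, nx1)) * (max(my2, ny2) - min(my1, ny1))

    giou = iou - (area_c - union) / area_c
    return iou, giou

# Two disjoint boxes: IoU stays at 0 no matter how far apart they are,
# while GIoU still decreases with distance, so it can drive the regression.
print(iou_and_giou((0, 0, 2, 2), (3, 0, 5, 2)))   # (0.0, -0.2)
```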

GIoU has the following characteristics:

  1. GIoU can be used as a distance measure, \(L_{GIoU}=1-GIoU\), which satisfies non-negativity, identity of indiscernibles, symmetry, and the triangle inequality;

  2. Scale invariance;

  3. When the two boxes fully coincide, IoU = GIoU;

  4. When A and B coincide, GIoU = 1; when A and B do not intersect and are infinitely far apart, GIoU approaches –1;

  5. GIoU takes into account the non-overlapping region that IoU ignores.

In this paper, GIoU is used in place of the original IoU in CenterNet, and the improved CenterNet algorithm is then applied to safety helmet wearing detection.
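
A minimal sketch of how the GIoU-based regression loss \(L_{GIoU}=1-GIoU\) from characteristic (1) could be computed is shown below, reusing the hypothetical `iou_and_giou` helper from Sect. 2.4; how this term is weighted against CenterNet's other losses is a tuning choice not specified here.

```python
def giou_loss(pred_box, gt_box):
    """Bounding-box regression loss L_GIoU = 1 - GIoU (see Sect. 2.4)."""
    _, giou = iou_and_giou(pred_box, gt_box)
    return 1.0 - giou

# The loss lies in [0, 2): it is 0 when the boxes coincide (GIoU = 1) and approaches 2
# when they are disjoint and far apart (GIoU -> -1), so non-overlapping boxes still
# receive a useful gradient, unlike a pure IoU loss.
print(giou_loss((0, 0, 2, 2), (3, 0, 5, 2)))   # 1.2 for the disjoint example above
```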

3 Experimental Data Set Making

Data Collection: The datasets are built from pictures, collected on the Internet, of workers wearing helmets of the same specification but different colors.

Data Filtering: Background images without a subject are removed from the datasets; the experiment requires photos of people wearing safety helmets.

Data Marking: The targets in each image are annotated with labelImg; the interface is shown in Fig. 3. A sketch of how the resulting annotation files could be parsed follows the figure.

Fig. 3. Diagram of annotation.
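
Assuming the annotations are exported in labelImg's default Pascal VOC XML format, the short sketch below shows how each annotation file could be read into (class, box) tuples before being converted to CenterNet training targets; the file name and class names are hypothetical.

```python
import xml.etree.ElementTree as ET

def load_labelimg_voc(xml_path):
    """Read one labelImg annotation file (Pascal VOC XML) into (class_name, x1, y1, x2, y2) tuples."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text          # e.g. one of the sample categories A-E
        box = obj.find("bndbox")
        coords = tuple(int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name,) + coords)
    return objects

# Hypothetical usage: "0001.xml" is one annotation file exported by labelImg.
# print(load_labelimg_voc("0001.xml"))
```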

4 Experimental Analysis

The experimental flow is shown in Fig. 4.

Fig. 4. Chart of experimental flow.

4.1 Experimental Scheme

In this paper, the datasets are divided into five categories, A to E. The classification is shown in Fig. 5.

Fig. 5. Diagram of sample.

4.2 Network Training

The network parameters were adjusted over repeated experiments; the resulting settings are shown in Table 1.

Table 1. Description of network parameters.

4.3 Experimental Platform and Network Training

In this paper, the improved CenterNet algorithm is evaluated. The improvement is that the IoU in the original CenterNet algorithm is replaced with GIoU, which takes the minimum closure region into account. For comparison, Faster R-CNN from reference [1], RetinaNet from [6], YOLOv3 from reference [5], and CenterNet from reference [7] are also tested; the results are shown in Table 2.

Table 2. Comparison of experimental results.

4.4 Result Analysis

The experimental results show that although the accuracy of the improved CenterNet is lower than that of Faster R-CNN, its detection rate is several times higher, so the improved CenterNet performs better overall. The remaining methods are inferior to the improved CenterNet in both respects. The improved CenterNet algorithm balances detection accuracy and detection rate, and can therefore better accomplish the safety helmet wearing detection task.

In addition, to show the detection differences between the algorithms more intuitively, this paper selects some detection images for comparative analysis: Fig. 6 shows the detection results of the original CenterNet algorithm, and Fig. 7 shows those of the improved CenterNet algorithm.

Fig. 6. The detection effect of CenterNet algorithm.

Fig. 7. The detection effect of improved CenterNet algorithm.

Comparing Fig. 6 and Fig. 7 shows that the display of coincident boxes and the false recognition of non-target objects are resolved to a certain extent. Therefore, the improved CenterNet algorithm maintains a high detection rate and meets the real-time requirements of helmet wearing detection and classification.

5 Conclusion

In this paper, an improved helmet wearing detection method based on the CenterNet algorithm is proposed. The safety helmet wearing detection experiment is carried out using video taken by mobile phone at a construction site as the data set. The method maintains high detection accuracy while keeping a fast detection speed, and basically meets the accuracy and real-time requirements of safety helmet wearing detection in work environment monitoring video.