1 Introduction

The billet is the upstream product for the rod and wire produced by Sinosteel. From casting to the finished product, billet processing involves cooling, sand-blasting, rusting, inspection, grinding, heating, and finally rolling into strips. Billets are approximately 145 mm × 145 mm in cross-section and can be supplied to strip and wire factories for rolling into strip steel, wire rods and linear steel. However, inspection is necessary to ensure the quality of the product before it is sent out.

The surface temperature of billets reaches as high as 700 to 900 °C [1] in the production environment. These conditions make defect detection on billets difficult. Traditional billet defect detection methods are divided into visual inspection [2, 3] and magnetic particle inspection [4]. Visual inspection is more cost-efficient and time-efficient, so we focus only on visual inspection in this paper. Since different defects have different causes, the types of defects found can indicate how they were formed and can be used to improve the steelmaking process.

In this paper, we develop a billet defect detection technology based on a convolutional neural network. We propose a hierarchical structure to detect defects with a revised SSD and ResNet50 [5, 6]. The experimental results show the effectiveness of the proposed method.

2 Architecture Overview

2.1 Structure of SSD

With the rise of convolutional neural networks, many models have evolved, such as Faster RCNN [7], Mask RCNN [8], Single Shot Multibox Detector (SSD) [9], and You Only Look Once (YOLO) [10]. All of these models have object detection capabilities. Among them, we chose the SSD300 version as the basic model. The reasons that we selected SSD300 are as follows:

  • Faster RCNN and Mask RCNN are two-stage methods, which means that detection is performed in two steps (region proposal followed by classification of each proposal). In contrast, SSD and YOLO are one-stage methods, which are more efficient.

  • The detection speed is better than that of other models. According to the original SSD paper [9], the detection speed of SSD300 is 59 FPS (frames per second).

  • The architecture of SSD300 is simpler than other models’ architectures and easier to adjust.

  • SSD has multi-scale predictions.

The original SSD300 contains anchor boxes that are a combination of horizontal and vertical rectangles, as shown in Fig. 1(a). In this work, we use only three horizontal rectangles, as shown in Fig. 1(b). When many prediction boxes overlap on a single object, as shown in Fig. 2(a), the non-maximum suppression (NMS) step in SSD keeps the highest-scoring box and discards the boxes that overlap it, as shown in Fig. 2(b).
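The NMS step described above can be illustrated with a minimal sketch of the standard greedy algorithm. It is not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, the sample boxes and the 0.45 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two overlapping boxes on one defect plus a separate box.
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 200, 300, 250)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first almost completely, so only the first and third boxes survive.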

Fig. 1. (a) Original anchor box. (b) Customized anchor box.

Fig. 2. (a) Bounding boxes before NMS. (b) Bounding box after NMS.

In Figs. 3–5, we present our revisions to the SSD architecture, which are based on the characteristics of the collected defect images. Billet defects are mainly small. In Fig. 4, a 75 × 75 feature map is added at convolutional block 3 of the VGG16 layer, and the last two feature maps (3 × 3 and 1 × 1) are removed. In order to compare the advantages and disadvantages of the various SSD structures, both the original SSD module (SSD300) and the modified SSD module (revised-SSD300) are trained. In addition, the revised-SSD300 is extended to a revised-SSD600 with an input size of 600 × 600, as shown in Fig. 5. In total, therefore, three models are trained for comparison.
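To give a rough sense of how these changes affect the number of candidate boxes, the sketch below counts the default boxes per image. The SSD300 feature-map sizes and boxes per location (4, 6, 6, 6, 4, 4) follow the original SSD paper [9]. For the revised-SSD300 we assume, for illustration only, that the 38, 19, 10 and 5 maps are kept and that three anchor boxes are used at every location on every map; the function name is hypothetical.

```python
def count_default_boxes(feature_map_sizes, boxes_per_location):
    """Total default boxes: sum over maps of (size x size x boxes per location)."""
    return sum(s * s * k for s, k in zip(feature_map_sizes, boxes_per_location))


# Original SSD300 configuration, as reported in the SSD paper.
ssd300_boxes = count_default_boxes([38, 19, 10, 5, 3, 1], [4, 6, 6, 6, 4, 4])

# Revised-SSD300 (assumed): 75 x 75 map added, 3 x 3 and 1 x 1 maps removed,
# three horizontal anchor boxes per location.
revised_boxes = count_default_boxes([75, 38, 19, 10, 5], [3, 3, 3, 3, 3])

print(ssd300_boxes, revised_boxes)
```

Under these assumptions, most of the revised model's boxes come from the fine 75 × 75 map, which is consistent with the goal of detecting small defects.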

Fig. 3. SSD300 architecture.

Fig. 4. Revised-SSD300 architecture.

Fig. 5. Revised-SSDSE600 architecture.

Fig. 6. Final hierarchical structure.

Fig. 7. (a) Original image. (b) SSD300 prediction results. (c) Revised-SSD300 prediction results. (d) SSD600 prediction results. (e) Original image. (f) Original image. (g) Revised-SSD300 prediction results. (h) SSD600 prediction results.

Fig. 8. Defect samples produced from SSD directly.

2.2 Introduction of SENet and ResNet

In our main task, we need to detect two defects, called “sponge” and “scar” defects. The task of SSD is to determine whether the defects exist and where they are. We also add a Squeeze-and-Excitation Network (SENet) [11] structure to our model, which improves the results by learning adaptive weights for each feature-map channel. SENet is not a complete network structure, but rather a small module inserted between convolution blocks. When SENet is applied, our method is called Revised-SSDSE, which is shown in Fig. 5.
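The channel re-weighting performed by a squeeze-and-excitation block can be sketched as follows. This is a plain-Python illustration of the generic SE operation (global average pooling, two fully connected layers, sigmoid gating), not the authors' layer. The weights, the omission of biases and the tiny 2-channel input are assumptions made to keep the example readable.

```python
import math


def se_block(feature_maps, w1, w2):
    """Apply squeeze-and-excitation to C channels of size H x W.

    feature_maps: list of C channels, each a list of rows.
    w1: (C/r) x C weight matrix, w2: C x (C/r) weight matrix (biases omitted).
    Returns the re-weighted feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates


# Hypothetical 2-channel, 2 x 2 input with hand-picked weights.
x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_block(x, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
print(gates)
```

With these weights the first channel is emphasized and the second is suppressed; in a trained network the weights are learned so that informative channels receive larger gates.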

Sometimes, another non-defect factor, called the “rusty factor” and shown in Fig. 9(c), is present in the dataset. Rusty factors are not defects, but they have various shapes and features and significantly affect our results. In order to detect rusty factors in the dataset, the 3 × 3 and 1 × 1 layers must be added back to the revised SSD network.

Fig. 9. Extension of defect range. (a) Scar. (b) Sponge. (c) Rust.

After SSD determines the existence and location of defects, ResNet identifies the type of each defect. In this paper, we use ResNet50 [12] to classify the three categories shown in Fig. 9. Each defect region output by SSD is resized to 224 × 224 to fit the input size of ResNet50. The combination of revised-SSDSE600 and ResNet50 forms the complete hierarchical structure shown in Fig. 6.

3 System Requirements

3.1 Hardware and Software

The hardware and software environment used in this paper is given in Table 1. The software part of the system includes Anaconda and GPU environment settings.

Table 1. Hardware and software environment.

3.2 Data Annotation

We use the LabelImg v1.6.0 tool to mark defect locations and non-defect classes in the dataset for the SSD model. LabelImg supports several operating system platforms, such as Windows, Linux and Mac OS X. In this work, we use a Windows environment. After the labeling process is completed, the labeling result is saved in XML format.
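As an illustration of how such annotation files can be read, the sketch below parses an annotation in the Pascal VOC XML layout that LabelImg writes by default. The file name, class labels, coordinates and the `parse_annotation` helper are hypothetical and are not taken from the authors' dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout produced by LabelImg.
SAMPLE_XML = """<annotation>
  <filename>billet_0001.jpg</filename>
  <size><width>600</width><height>600</height><depth>3</depth></size>
  <object>
    <name>scar</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>130</ymax></bndbox>
  </object>
  <object>
    <name>sponge</name>
    <bndbox><xmin>300</xmin><ymin>400</ymin><xmax>380</xmax><ymax>450</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return (filename, [(label, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.findtext(tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return root.findtext("filename"), objects


filename, objects = parse_annotation(SAMPLE_XML)
print(filename, objects)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring` with the rest of the function unchanged.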

4 Experimental Results

The detection results are affected by camera types, illumination, number of defective samples, and other factors. The training process is performed as follows.

  • Collect various defect samples.

  • Mark the defect samples and generate corresponding XML files containing defect information.

  • Train the neural network on the marked defect samples and save the training results.

4.1 Initial Test

In the initial test, we prepared a dataset of 464 Scar and 246 Sponge images, of which 90% were used for training and 10% for validation. The experimental results are shown in Table 2. The results in Fig. 7 show that the performance of the revised-SSD300 is similar to that of the revised-SSD600 and better than that of SSD300. SSD300 produces too many redundant boxes, as presented in Fig. 7(b).
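A 90/10 split of this kind can be sketched as follows. The paper does not state how the split was drawn, so the random shuffle, the fixed seed, the placeholder file names and the `split_dataset` helper are all assumptions.

```python
import random


def split_dataset(items, val_fraction=0.1, seed=0):
    """Shuffle items reproducibly and return (training, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder names standing in for the 464 Scar and 246 Sponge images.
images = [f"scar_{i:03d}.jpg" for i in range(464)]
images += [f"sponge_{i:03d}.jpg" for i in range(246)]

train_set, val_set = split_dataset(images)
print(len(train_set), len(val_set))
```

With 710 images in total, this yields 639 training images and 71 validation images.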

Table 2. Initial results of models.

We tested the daily images provided by the onsite database and used the following quantities as the benchmark for calculating the system performance [13]:

  • True Positive (TP).

  • True Negative (TN).

  • False Positive (FP).

  • False Negative (FN).

  • Precision (P) in Eq. (1).

  • Recall (R) in Eq. (2).

  • F-Measure, the harmonic mean of Precision and Recall, is a comprehensive evaluation index that summarizes both values in a single number, as shown in Eq. (3).

$$ \text{P} = \text{TP}/\left( {\text{TP} + \text{FP}} \right). $$
(1)
$$ \text{R} = \text{TP}/\left( {\text{TP} + \text{FN}} \right). $$
(2)
$$ {\text{F-Measure}} = \left( {2 \times {\text{P}} \times {\text{R}}} \right)/\left( {{\text{P}} + \text{R}} \right). $$
(3)
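Equations (1) to (3) translate directly into code. The sketch below uses hypothetical counts (45 true positives, 5 false positives, 15 false negatives) that are not taken from the experiments; the guards against empty denominators are an added convention, not part of the equations.

```python
def precision_recall_f(tp, fp, fn):
    """Compute Precision, Recall and F-Measure as in Eqs. (1) to (3)."""
    p = tp / (tp + fp) if tp + fp else 0.0      # Eq. (1)
    r = tp / (tp + fn) if tp + fn else 0.0      # Eq. (2)
    f = 2 * p * r / (p + r) if p + r else 0.0   # Eq. (3)
    return p, r, f


# Hypothetical counts for illustration only.
p, r, f = precision_recall_f(tp=45, fp=5, fn=15)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

True negatives do not enter any of the three equations, which is why they are not an argument of the function.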

4.2 Final Test

According to Tables 3 and 4, after a seven-day training period, the highest precision and recall of the revised-SSD300 were 100% and 77.6%, respectively. The revised-SSD600 had a better recall owing to its higher-resolution input images. However, the combination of the revised-SSDSE600 and ResNet50 achieved the highest precision and recall rates.

Table 3. Final test results of revised-SSD300.
Table 4. Final test results of revised-SSD600.

Note that Fig. 10 shows that when we fed the defect bounding boxes from SSD directly to ResNet, as shown in Fig. 8, the training process struggled to converge because the bounding boxes fit the defects too tightly. Therefore, we enlarged the range of the bounding boxes, as shown in Fig. 9. After the bounding boxes were extended, the training process converged, which enabled the performance in Table 5 to be achieved.
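One simple way to enlarge a bounding box is to pad it by a fraction of its width and height and then clip it to the image. The paper does not report how much the boxes were extended, so the 20% margin, the (x1, y1, x2, y2) format and the `expand_box` helper below are assumptions.

```python
def expand_box(box, image_size, margin=0.2):
    """Pad a box by `margin` of its size on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    img_w, img_h = image_size
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    return (max(0, round(x1 - dx)), max(0, round(y1 - dy)),
            min(img_w, round(x2 + dx)), min(img_h, round(y2 + dy)))


# Hypothetical tight detection on a 600 x 600 input image.
expanded = expand_box((120, 80, 260, 130), image_size=(600, 600))
print(expanded)
```

The enlarged crop includes some surrounding billet surface, giving the classifier context around the defect before the crop is resized to the ResNet50 input size.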

Fig. 10. Training results of revised-SSD600 with ResNet50. (a) With first samples of defect. (b) With second samples of defect.

Table 5. Final test results of the combination of revised-SSDSE600 and ResNet50.

5 Conclusions

In this paper, we design a hierarchical model to build a defect detection system for steel billets. We have modified the architecture of SSD by changing the sizes of the feature maps and the anchor boxes to fit the shape of defects. The experimental results demonstrate the effectiveness of the proposed method. In future work, we will collect more rust images, because rust appears in many variations.