Introduction

Intelligent manufacturing is an important indicator of the modernization level of an iron and steel enterprise, and an important condition for minimizing cost and energy consumption in production. Converter steelmaking is a key part of the steel process, and its stability under varying hot metal conditions has an important impact on the quality of steel products. Precise control of the hot metal condition and the smelting operation in the converter helps to obtain a stable end-point carbon content and temperature, thereby promoting narrow window control of the smelting process. Converter production is a multi-factor interaction process in which the hot metal condition, the operating parameters, and the smelting cycle all have an important effect on the end-point temperature and carbon content. Therefore, developing an evaluation model of the converter operation process will help to identify the main factors affecting end-point control, to optimize the relevant process parameters, and to achieve precise narrow window control. The twenty-first century is the information era, and steel companies hold massive amounts of data [1]. At the same time, with the rapid development of cloud platforms and distributed system technology, data storage and computing capability have greatly improved [2,3,4], which makes it possible to collect data from steel companies and to extract valuable production information through machine learning. Bai [5] developed a quality control system and put it into online use in steel production, which benefited Cheng De Steel. Rot [6] collected flame image data from converter smelting and used a convolutional neural network to predict the end-point carbon content, achieving high prediction precision.
Liu [7] established an intelligent quality control system for a sintering production line by collecting actual production data from the sintering plant, which improved the sinter yield. These studies are of great significance for raising the level of intelligent manufacturing in steel mills. However, the production information currently available in a steel plant mixes human experience with machine-sensed information, so it is necessary to use big data to construct models that provide operators with more production information and allow production to be adjusted in time. Based on the chi-square box method and the logistic regression algorithm, this paper uses actual converter production data to bin the data and score the operating parameters, and then evaluates the degree to which each process parameter influences the ideal target interval of the converter according to the weight of evidence (WOE) value. The evaluation model constructed in this paper can give timely feedback on the converter process parameters, thus guiding actual production toward narrow window control of the converter end-point.

Converter Process Parameters Selection

Based on actual production data collected at the mill, about 1 year of converter production data was sorted out. The parameters are listed in Table 1 and mainly comprise the raw material conditions, the process operation parameters, and the end-point targets. The end-point temperature and carbon content are the target variables, while the steel scrap addition, the hot metal condition, the smelting cycle, etc., are the process operation parameters. This paper divides the ideal target interval according to the actual requirements of steel production and, on that basis, uses the chi-square box method and the logistic regression algorithm to construct a converter evaluation model with a nonlinear evaluation function, so as to guide actual converter production and realize narrow window control of the end-point.

Table 1 Relevant parameters of converter production

Figure 1 shows the frequency distribution histograms of the end-point temperature and carbon content. The distributions of both the end-point carbon content and the end-point temperature are approximately normal. If the ideal target interval is too wide, the constructed evaluation model will have lower evaluation efficiency; if it is too narrow, the model will be too harsh. The ideal target interval should therefore be chosen with both aspects in mind.

Fig. 1 Frequency distribution histogram of end-point target

In view of the requirements of the low carbon steel Q235B, the ideal target interval was divided as follows. The ideal target interval of the end-point carbon content is 0.02–0.05%. To satisfy the temperature requirement of the refining process, the ideal target interval of the end-point temperature is 1660–1680 °C. According to these intervals, the actual converter production data are classified by end-point carbon content and temperature: data within the ideal target interval are classified as Class I (ideal data), and data outside it are classified as Class II (non-ideal data).
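The classification rule can be sketched in a few lines of Python. This is only an illustration: the function name `classify_heat`, the sample heats, and the assumption that a heat counts as Class I only when both targets fall inside their intervals are not taken from the paper.

```python
# Ideal target intervals quoted in the text for Q235B.
CARBON_RANGE = (0.02, 0.05)           # end-point carbon content, wt%
TEMPERATURE_RANGE = (1660.0, 1680.0)  # end-point temperature, deg C


def classify_heat(carbon, temperature):
    """Return "I" (ideal) or "II" (non-ideal) for one heat.

    Assumption: a heat is ideal only if BOTH targets are in range.
    """
    carbon_ok = CARBON_RANGE[0] <= carbon <= CARBON_RANGE[1]
    temperature_ok = TEMPERATURE_RANGE[0] <= temperature <= TEMPERATURE_RANGE[1]
    return "I" if carbon_ok and temperature_ok else "II"


# Hypothetical heats: (carbon wt%, temperature deg C).
heats = [(0.03, 1670.0), (0.06, 1670.0), (0.04, 1655.0)]
labels = [classify_heat(c, t) for c, t in heats]
```

If the two targets were instead evaluated separately, the same comparison would simply be applied to each target on its own.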

Evaluation Model Construction

In converter steel production, many process parameters affect the end-point carbon content and temperature, mainly the hot metal conditions and the operating conditions. Figure 2 shows the Pearson correlation coefficients between the process parameters of the steelworks and the end-point targets. None of the process factors shows a significant linear correlation with the end-point temperature or carbon content, which increases the difficulty of controlling the converter operation precisely enough to reach the ideal end-point temperature and carbon content.

Fig. 2 Correlation coefficient between converter process parameters and end-point objectives
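A weak Pearson coefficient only rules out a linear relationship, which is why a nonlinear evaluation model is still worthwhile. The following minimal sketch, using made-up numbers rather than plant data, computes the coefficient by hand and shows that a perfectly deterministic but symmetric relation yields a coefficient of zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one linear and one purely nonlinear dependence on x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * x + 1.0 for x in xs]
parabola = [x ** 2 for x in xs]

r_linear = pearson_r(xs, linear)      # strong linear relation
r_parabola = pearson_r(xs, parabola)  # dependence exists, but not linear
```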

With the rapid development of big data, the mode of analysis has shifted from causal models to correlation models [8]. With the help of a big data model, the factors relevant to converter production can be found. By collecting actual production data from the converter steel plant and applying big data technology, the underlying relationships among the hot metal parameters, the operating parameters, and the smelting cycle can be mined in depth, so that precise narrow window control of the converter end-point temperature and carbon content can be realized.

Establishment of the Chi-Square Box Method Model

The chi-square box method (chi-square binning) is an applied statistical model. Through statistical analysis of existing data, it evaluates the degree of influence of a variable on the target [9, 10]. Here it is used to group the converter production parameters so that the large volume of production data can be summarized and analyzed, which helps in analyzing the relationship between the converter process factors and the end-point target.

The algorithm flow of the chi-square box method is: (i) sort the values of each parameter from low to high; (ii) treat data with the same value as one interval; (iii) use Eqs. (1) and (2) to calculate the chi-square value of each interval; (iv) compare the chi-square values of adjacent intervals and merge intervals with similar values that do not exceed the chi-square threshold; then repeat steps (i)–(iii) until a proper number of bins is reached.

$$ E_{j} = N_{i} \times C_{j} $$
(1)
$$ X^{2} = \sum\limits_{j = 1}^{2} \frac{\left( A_{j} - E_{j} \right)^{2}}{E_{j}} $$
(2)

where \( A_{j} \) is the number of instances of class j in the interval; \( E_{j} \) is the expected frequency of \( A_{j} \); \( N_{i} \) is the total number of samples in interval i; and \( C_{j} \) is the proportion of samples belonging to class j, so that \( E_{j} \) is a count.
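The merging procedure can be sketched as follows. This is not the authors' code. It follows the standard ChiMerge formulation: the term of Eq. (2) is summed over the two adjacent intervals being compared, \( C_{j} \) is taken as the fraction of class j within that pair, and merging stops at a target number of bins. These three choices, the function names, and the sample counts are all assumptions.

```python
def chi_square(interval_a, interval_b):
    """Chi-square statistic for two adjacent intervals.

    Each interval is a (class_I_count, class_II_count) pair.
    """
    total = sum(interval_a) + sum(interval_b)
    chi2 = 0.0
    for interval in (interval_a, interval_b):
        n_i = sum(interval)  # samples in this interval
        for j in range(2):
            c_j = (interval_a[j] + interval_b[j]) / total  # class fraction
            expected = n_i * c_j                           # Eq. (1)
            if expected > 0:
                chi2 += (interval[j] - expected) ** 2 / expected  # Eq. (2)
    return chi2


def merge_bins(bins, max_bins):
    """Merge the most similar adjacent pair until max_bins remain."""
    bins = [tuple(b) for b in bins]
    while len(bins) > max_bins:
        scores = [chi_square(bins[k], bins[k + 1]) for k in range(len(bins) - 1)]
        k = scores.index(min(scores))  # lowest chi-square = most similar pair
        merged = (bins[k][0] + bins[k + 1][0], bins[k][1] + bins[k + 1][1])
        bins[k:k + 2] = [merged]
    return bins


# Hypothetical class counts per sorted interval: (Class I, Class II).
counts = [(8, 2), (7, 3), (2, 8), (1, 9)]
merged = merge_bins(counts, max_bins=2)
```

With these counts the two intervals dominated by Class I are merged first, then the two dominated by Class II, so the surviving boundary separates intervals with clearly different class distributions.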

The purpose of the chi-square box method is to discretize continuous data, which is convenient for further analysis. Through the chi-square box method, each parameter is divided into a number of intervals whose target distributions differ from one another as much as possible, so that each interval represents a specific operation. By analyzing the importance of each interval, we can evaluate the impact of the different converter process parameters on the end-point target.

The importance of each interval is determined by the weight of evidence (WOE) value. The WOE value of each interval represents its effect on the end-point target. A positive WOE value indicates that the interval has an adverse effect on the end-point target, and a negative WOE value indicates a favorable effect. The larger the absolute value of WOE, the greater the impact on the end-point target. The WOE value is calculated as shown in Eq. (3).

$$ WOE = \ln \left( {\frac{{P_{\text{bad}} }}{{P_{\text{good}} }}} \right) $$
(3)

where the subscript good denotes ideal data and the subscript bad denotes non-ideal data. \( P_{\text{good}} \) is the proportion of all good samples that fall in the interval, and \( P_{\text{bad}} \) is the proportion of all bad samples that fall in the interval. It follows from Eq. (3) that a large positive WOE value represents a strong negative influence, while a large negative WOE value represents a strong positive influence.
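As a worked illustration of Eq. (3), the sketch below computes the WOE of each bin from class counts. The counts are invented for the example, and the function assumes every bin contains at least one good and one bad sample, since an empty class would make the logarithm undefined.

```python
import math


def woe_per_bin(good_counts, bad_counts):
    """WOE of each bin following Eq. (3): ln(P_bad / P_good).

    Assumption: no bin has a zero count in either class.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    values = []
    for good, bad in zip(good_counts, bad_counts):
        p_good = good / total_good  # share of all ideal data in this bin
        p_bad = bad / total_bad     # share of all non-ideal data in this bin
        values.append(math.log(p_bad / p_good))
    return values


# Hypothetical counts of ideal (good) and non-ideal (bad) heats in three bins.
good = [30, 50, 20]
bad = [10, 50, 40]
woe = woe_per_bin(good, bad)
```

Here the first bin holds a larger share of the good data than of the bad data and gets a negative WOE, the second bin is neutral, and the third bin gets a positive WOE.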

In addition, the chi-square box method and the WOE calculation make it possible to determine the contribution of each interval of each process parameter to the ideal target interval of the converter end-point. The number of boxes is usually chosen from experience; this paper instead uses an iterative method to find a reasonable number. The information value (IV) represents the amount of information contained in a variable: the higher the IV value, the more information the variable carries about the end-point target. The IV value is calculated as shown in Eq. (4).

$$ IV = \sum\limits_{i = 1}^{n} \left( P_{\text{bad},i} - P_{\text{good},i} \right) \times WOE_{i} $$
(4)

The number of bins is found by iterative calculation: the number of bins at which the IV value of a variable reaches its maximum is taken as optimal, and the iteration terminates when the IV value converges. The final IV values and bin numbers are shown in Fig. 3. As can be seen from the figure, the numbers of bins are: smelting cycle, 10; converter O2 consumption, 16; hot metal addition, 4; steel scrap addition, 30; light roasting addition, 5; hot metal w(C), 48; hot metal w(Si), 25; hot metal w(Mn), 5; hot metal w(P), 22; hot metal temperature, 14; and lime addition, 12. The IV value represents the converter end-point information contained in each converter process parameter. The IV values of hot metal w(C) and hot metal w(Si) are high, indicating that the hot metal composition of the steelmaking plant is the most important factor for the converter end-point.

Fig. 3 IV value of related parameters and the number of bins
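The IV of Eq. (4) and the choice of bin number can be sketched as below. The two candidate binnings are hypothetical groupings of the same 200 samples, and selecting the candidate with the largest IV is a simplified stand-in for the iterative search described in the text, not a reproduction of it.

```python
import math


def information_value(good_counts, bad_counts):
    """IV following Eq. (4): sum over bins of (P_bad - P_good) * WOE.

    Assumption: no bin has a zero count in either class.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        p_good = good / total_good
        p_bad = bad / total_bad
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


# Hypothetical binnings of one parameter: bins -> (good_counts, bad_counts).
candidates = {
    2: ([80, 20], [60, 40]),
    3: ([30, 50, 20], [10, 50, 40]),
}
iv_by_bins = {n: information_value(g, b) for n, (g, b) in candidates.items()}
best_n = max(iv_by_bins, key=iv_by_bins.get)  # bin number with the largest IV
```

Every term of the sum is non-negative, because the difference \( P_{\text{bad}} - P_{\text{good}} \) and the WOE of a bin always share the same sign.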

Process Parameters WOE Value Statistics

Summary statistics of the WOE values for the 2418 production records collected from the steelmaking plant are shown in Table 2. The mean WOE of the converter process parameters is −0.1, indicating that the overall operation of the converter has a good influence on the end-point target; the WOE variance is 0.72, indicating that the converter operation fluctuates greatly; the maximum WOE is 2.1, indicating that some past operations had a strongly negative impact and should be avoided in actual production.

Table 2 WOE statistical result of converter process parameters

Figure 4 compares the mean WOE value of each process parameter for the Class I and Class II data. For every parameter, the mean WOE of the Class I data is below 0 and lower than that of the Class II data, whose mean WOE is above 0. This indicates that the WOE is valid: its sign reflects whether the converter operating process has a favorable or adverse influence on the ideal target interval of the end-point. The WOE results also indicate that the binning in this paper is reasonable. The difference in mean WOE between the two classes is largest for hot metal w(C) and hot metal w(Si), indicating that the values of these two process parameters have a great influence on the ideal target interval of the converter. The WOE values of hot metal w(Mn), light roasting addition, smelting cycle, and hot metal addition are all around 0, indicating that these four process parameters have a relatively stable influence on the ideal target interval of the converter.

Fig. 4 Comparison of WOE mean values between class I and class II parameters

Establishment of Logistic Regression Model

Each WOE value represents the influence of a data box on the end-point target. Adding the WOE values linearly gives the total influence of the relevant parameters; the higher the total influence, the less likely the ideal interval is to be reached. In the logistic regression algorithm of Eqs. (5) and (6), the linear term \( \theta^{T} x \) gives the total influence of the parameters, and the function \( \frac{1}{1 + e^{-x}} \) then maps it nonlinearly.

$$ h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T} x}} $$
(5)
$$ p(y \mid x; \theta) = \left( h_{\theta}(x) \right)^{y} \left( 1 - h_{\theta}(x) \right)^{1 - y} $$
(6)

where \( \theta \) is the vector of regression coefficients, x is the vector of WOE values, and y is the class label (0 or 1).

Equation (5) is the logistic function, and Eq. (6) gives the corresponding probability of the label y. The model uses the built-in logistic regression model of Python sklearn version 0.19. In the calculation results, the mean value for the Class I data is 0.7 and the mean value for the Class II data is 0.8. To identify bad data, the threshold is set to 0.7: if the calculation result exceeds 0.7, the data is judged to be unqualified.
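The scoring step can be illustrated without any library. The sketch below evaluates Eq. (5) on WOE-encoded inputs in plain Python instead of the sklearn model used in the paper, and applies the 0.7 threshold from the text. The coefficients, the optional intercept term, and the two sample heats are hypothetical.

```python
import math

THRESHOLD = 0.7  # decision threshold quoted in the text


def bad_probability(woe_values, theta, intercept=0.0):
    """Eq. (5): h(x) = 1 / (1 + exp(-theta^T x)), with x the WOE values.

    The intercept is an assumption; Eq. (5) does not show one explicitly.
    """
    z = intercept + sum(t * x for t, x in zip(theta, woe_values))
    return 1.0 / (1.0 + math.exp(-z))


def is_unqualified(woe_values, theta, intercept=0.0):
    """Flag a heat as unqualified when its score exceeds the threshold."""
    return bad_probability(woe_values, theta, intercept) > THRESHOLD


# Hypothetical coefficients and WOE-encoded heats (three parameters each).
theta = [1.0, 1.0, 1.0]
favorable_heat = [-0.5, -0.2, -0.3]  # negative WOE: favorable intervals
adverse_heat = [0.9, 0.6, 0.5]       # positive WOE: adverse intervals
```

With these numbers the favorable heat scores about 0.27 and passes, while the adverse heat scores about 0.88 and is flagged.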

As can be seen from Table 3, the total number of records is 2418. According to the model results, the overall accuracy is 79%. For the Class II data, the recall rate is 84% and the accuracy rate (precision) is 88%, indicating that the constructed model identifies unqualified data well.

Table 3 Model effectiveness assessment
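For reference, the three indicators can be computed from actual and predicted class labels as in the following sketch. The ten labels are invented to keep the arithmetic easy to follow and are not the plant data. Class II is treated as the positive class because it is the class the model is meant to detect, and the function assumes at least one actual and one predicted Class II record.

```python
def evaluate(actual, predicted, positive="II"):
    """Overall accuracy plus precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),  # share of all records classified correctly
        "precision": tp / (tp + fp),       # share of flagged records that are truly Class II
        "recall": tp / (tp + fn),          # share of Class II records that were flagged
    }


# Hypothetical labels for ten heats.
actual = ["II", "II", "II", "II", "I", "I", "I", "I", "II", "I"]
predicted = ["II", "II", "II", "I", "I", "I", "II", "I", "II", "I"]
metrics = evaluate(actual, predicted)
```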

Conclusion

Based on actual converter production data, this paper uses the chi-square box method to discretize the data. For the discretized data, the WOE value of each box is calculated, and the impact of each box on the end-point target is then evaluated according to its WOE value. The hot metal w(C) and the hot metal w(Si) are found to have the greatest influence on the end-point target, so to achieve narrow window control, the hot metal w(C) and w(Si) should be stabilized first. As for the discrimination model, the overall accuracy of the converter operation process evaluation model is 79%, and for data that do not meet the ideal target interval, the accuracy rate is 88% and the recall rate is 84%.