Introduction

As an essential class of steel products, plate plays an irreplaceable role in national economic construction, people's livelihood and national security.1,2,3 V–N steel hot-rolled plate has reached mass industrial production. Compared with the conventional route, its production process omits RH vacuum refining, quenching, tempering and other heat treatment steps, dramatically reducing energy consumption, enabling green production and delivering significant economic benefits.4 With the rapid development of computer technology in recent years, the massive data generated during V–N steel hot-rolled plate production are now collected and stored. However, because the mechanical property data of some plates are unqualified in the production process, those products cannot be put into regular use. Establishing a mechanical property prediction model to mine the considerable research value in industrial big data can therefore effectively link materials science theory with big data and reveal, at the macroscopic level, the relationships between the composition, structure, properties and processing of the material. This provides practical guidance for optimizing the chemical composition and production process of products and improves production efficiency.5,6 However, for V–N steel hot-rolled plate products with large data volumes, complex production processes and numerous parameters, a traditional physical metallurgy (PM) model consumes much time and money and is prone to failure. Thus, it is necessary to establish an efficient prediction model for the mechanical properties of the V–N steel hot-rolled plate.

Artificial intelligence has gradually emerged as a powerful tool for large-scale data processing, steel performance prediction, material composition design and process development.7,8,9,10 Mechanical property prediction models based on machine learning (ML) algorithms have achieved remarkable results in big data analysis for the steel industry. Hore et al. established a prediction model for the mechanical properties of hot-rolled TRIP steel by combining an adaptive neural network with a fuzzy inference system, obtained the relationships between coiling temperature and properties such as tensile strength and elongation, and achieved good prediction results.11 Qian et al. established a deep neural network (DNN) model based on composition and process parameters to predict the yield strength (YS), tensile strength, elongation and impact energy of hot-rolled steel plate; it was deployed in the factory's manufacturing execution system to assist routine online production monitoring.12 Wu et al. built tensile strength prediction models for X70 pipeline steel based on stepwise regression (SR), ridge regression (RR), Bayesian regularization neural network (BRNN), radial-basis function neural network (RBFNN), support vector machine (SVM) and random forest (RF), and found that RF had strong nonlinear fitting ability and achieved the best prediction effect (R = 0.95, MSE = 278.7 MPa²).13 Diao et al. constructed five ML models based on data for 97 kinds of carbon steel and successfully predicted tensile strength, fracture stress, impact energy, hardness, fatigue strength and elongation, finding that the SVR and MLP models performed best.14 In addition, to address the low physical interpretability of ML models, Li et al. introduced the Ac1 and Ac3 temperatures and deformation resistance parameters to guide the ML process and successfully established a new industrial big data analysis system combining ML classification and regression models with key PM variables.15 However, most of the above studies predict properties across multiple steel grades, which cannot deeply mine the complex functional relationship between the composition, process parameters and mechanical properties of a single grade or clarify the influence mechanism of individual parameters on performance. So far, there is no report on modeling V–N steel hot-rolled plate with ML algorithms.

Therefore, this study aims to establish a prediction model for the mechanical properties of V–N steel hot-rolled plate based on ML algorithms. The standard YS index is used as an example to provide modeling ideas for other mechanical property prediction models. The carefully preprocessed data, augmented with PM parameters, are used for model training and testing. The efficiency ratio (Er), mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate prediction performance, and the two best models are selected from k-nearest neighbors (KNN), support vector regression (SVR), multilayer perceptron (MLP), gradient boosting regression (GBR), random forest regression (RFR) and extreme gradient boosting (XGBoost) according to their prediction effects. The Bayesian optimization (BO) method is then used to further optimize the selected models' hyperparameters to improve prediction accuracy and generalization ability.

Experimental Method

Data Preprocessing

Data Cleaning and Processing of Missing and Abnormal Values

The V–N steel hot-rolled plate data in this study were obtained from Laiwu Iron and Steel Group Co., Ltd., of Shandong Province, and include chemical composition, rolling process and mechanical property information. The corresponding steel grade is Q550D, governed by the national standards GB/T 1591–2008 and GB/T 16270–2009. First, the data were linked by composition-process-property association and arranged according to the production sequence. Samples and features with many null or zero values were eliminated, and unqualified samples were removed according to the above national standards for Q550D steel. Then, the Pauta criterion was used to detect and eliminate abnormal values.16 When the data are normally distributed, the Pauta criterion has an excellent processing effect,17 as shown in formula (1):

$$ \left| {v_{i} } \right| = \left| {x_{i} - \overline{x} } \right| > 3\sigma $$
(1)

where \(x_{1} ,x_{2} ,...,x_{n}\) are the YS values in the plate data, \(\overline{x}\) is their mean value, \(v_{i} = x_{i} - \overline{x}\left( {i = 1,2,...,n} \right)\) is the residual error, and \(\sigma\) is the standard deviation. If the residual error \(v_{i} \left( {1 \le i \le n} \right)\) of the YS \(x_{i}\) satisfies formula (1), that is, \(x_{i} \notin \left[ {\overline{x} - 3\sigma ,\overline{x} + 3\sigma } \right]\), the corresponding YS value is considered abnormal and eliminated.
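A minimal pandas sketch of this 3σ filter is given below; the column name "YS" and the DataFrame layout are placeholders, not taken from the source data.

```python
import pandas as pd

def pauta_filter(df: pd.DataFrame, column: str = "YS") -> pd.DataFrame:
    """Keep rows whose value in `column` lies within mean +/- 3*sigma (Pauta criterion)."""
    mu, sigma = df[column].mean(), df[column].std()
    return df[(df[column] - mu).abs() <= 3 * sigma]

# Hypothetical usage: drop yield-strength outliers before modeling
# df_clean = pauta_filter(df_raw, column="YS")
```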

In addition, a small number of missing values in the data set adds complexity to the model. Missing values can be handled by filling methods such as manual, mean, EM, hot-deck and regression filling.18,19,20 This study used mean, median, mode, KNN interpolation and RFR filling to fill in the missing values in the dataset.
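The paper does not give implementation details for these filling methods; the sketch below shows one plausible scikit-learn setup, approximating RFR filling with IterativeImputer wrapped around a random-forest estimator.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

def impute_variants(X: pd.DataFrame) -> dict:
    """Return the candidate imputations compared in the study."""
    imputers = {
        "mean": SimpleImputer(strategy="mean"),
        "median": SimpleImputer(strategy="median"),
        "mode": SimpleImputer(strategy="most_frequent"),
        "knn": KNNImputer(n_neighbors=5),
        # RFR filling approximated by iterative imputation with a random-forest
        # estimator; the paper's exact implementation is not specified.
        "rfr": IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=100, random_state=0)),
    }
    return {name: imp.fit_transform(X) for name, imp in imputers.items()}
```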

Feature Scaling

Feature scaling standardizes features with widely differing numerical ranges so that all variables contribute comparably to the model and the training process is accelerated, which positively impacts the model effect.21 Therefore, the study uses the Z-score standardization method, defined as follows:

$$ x^{\prime} = \frac{x - \mu }{\sigma } $$
(2)

where \(x\) is the original feature, \(x^{\prime}\) is the standardized feature, and \(\mu\) and \(\sigma\) are the mean and standard deviation of all samples of the original feature, respectively.
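In scikit-learn this corresponds to StandardScaler; a minimal sketch (with toy data, fitting on the training split only to avoid leakage) is:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.06, 850.0], [0.10, 870.0], [0.13, 860.0]])  # toy features
X_test = np.array([[0.09, 855.0]])

scaler = StandardScaler()                    # implements x' = (x - mu) / sigma
X_train_std = scaler.fit_transform(X_train)  # mu, sigma estimated per feature on train
X_test_std = scaler.transform(X_test)        # reuse training statistics on the test set
```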

Feature Processing and PM Element Introduction

Feature selection is the process of choosing a relevant feature subset for model construction and discarding irrelevant features with minimal information loss. It can effectively simplify the model, make it easier to understand, save storage and computing overhead and reduce the risk of overfitting.22 The study used XGBoost importance evaluation and Pearson correlation analysis to select the input features. In the first step, the XGBoost model evaluates feature importance: split points are chosen according to the gain in the structure score, and the importance of a feature is the number of times it is used as a split point across all trees. The more often a feature is used to build the decision trees, the higher its importance.23 In the second step, the Pearson correlation coefficient is used to measure the degree of linear correlation between two features \(x\) and \(y\) and to reduce the feature set. It is calculated as follows:

$$ \gamma_{xy} = \frac{{\sum\nolimits_{i = 1}^{m} {\left( {x_{i} - \overline{x} } \right)\left( {y_{i} - \overline{y} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{m} {\left( {x_{i} - \overline{x} } \right)^{2} } } \sqrt {\sum\nolimits_{i = 1}^{m} {\left( {y_{i} - \overline{y} } \right)^{2} } } }} $$
(3)

where \(\overline{x}\) and \(\overline{y}\) are the means of the two feature vectors, \(\gamma_{xy}\) is the correlation between the variables, and m is the number of data points. \(\gamma_{xy}\) ranges from -1 to 1, where 1 indicates a completely positive correlation and -1 a completely negative one. When \(\left| {\gamma_{xy} } \right| > 0.9\), the two variables are strongly correlated.24 Two highly correlated features are considered to contain similar information or to have similar effects on the mechanical properties of V–N steel hot-rolled plates. Therefore, we build XGBoost models with each of the two highly correlated features as input in turn and eliminate the feature that yields the weaker model.
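A hedged sketch of this two-step screening follows, assuming the features are held in a pandas DataFrame with named columns. The importance_floor argument mirrors the paper's score cut-off of 90 (see Sect. "Feature Processing Results and Discussion"); for brevity the sketch keeps the higher-importance member of each correlated pair, whereas the study retrains a model for each candidate.

```python
import pandas as pd
from xgboost import XGBRegressor

def select_features(X: pd.DataFrame, y, importance_floor=90, corr_cap=0.9):
    """Two-step screening: drop low-importance features, then prune each
    highly correlated pair down to one feature."""
    # Step 1: split-count ("weight") importance, i.e., how often a feature
    # is used as a split point across all trees.
    model = XGBRegressor(random_state=0).fit(X, y)
    raw = model.get_booster().get_score(importance_type="weight")
    scores = pd.Series(raw).reindex(X.columns).fillna(0.0)
    keep = scores[scores >= importance_floor].index.tolist()

    # Step 2: for every pair with |r| > corr_cap, drop one feature. Here the
    # lower-importance one is dropped; the study instead compares two models
    # and keeps the better-performing feature.
    corr = X[keep].corr().abs()
    drop = set()
    for i, a in enumerate(keep):
        for b in keep[i + 1:]:
            if corr.loc[a, b] > corr_cap:
                drop.add(a if scores[a] < scores[b] else b)
    return [f for f in keep if f not in drop]
```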

According to the characteristics of the V–N steel hot-rolled plate data set, several PM parameters were introduced to guide the ML model. Because the phase transformation temperatures are essential reference factors when formulating the heat treatment process of V–N steel, the Ac1 and Ac3 temperatures, which reflect the microstructure of the plates, were introduced as new input features.25 In addition, the reduction rate affects the microstructure evolution of the plates: controlling it during hot rolling can effectively refine the grain size, disperse and refine the precipitates, and improve the strength and toughness of the steel. Therefore, from the slab, intermediate slab and final rolling thicknesses, the reduction rate in rough rolling (ε1) and the reduction rate in finish rolling (ε2) were calculated as additional input features.
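The paper does not state which formulas it used for Ac1, Ac3 or the reduction rates; the sketch below uses Andrews' classical empirical regressions for Ac1/Ac3 and a standard total-reduction definition, with purely illustrative thicknesses.

```python
import math

def ac1_andrews(si, mn, ni, cr, as_=0.0, w=0.0):
    """Andrews' empirical Ac1 in deg C; element contents in wt.% (an assumption:
    the paper cites Ref. 25 but does not give its Ac1/Ac3 formulas)."""
    return 723 - 10.7 * mn - 16.9 * ni + 29.1 * si + 16.9 * cr + 290 * as_ + 6.38 * w

def ac3_andrews(c, si, ni, v, mo=0.0, w=0.0):
    """Andrews' empirical Ac3 in deg C; element contents in wt.%."""
    return 910 - 203 * math.sqrt(c) - 15.2 * ni + 44.7 * si + 104 * v + 31.5 * mo + 13.1 * w

def reduction_rate(h_in, h_out):
    """Total thickness reduction of a rolling stage: (h_in - h_out) / h_in."""
    return (h_in - h_out) / h_in

# Illustrative thicknesses in mm (not from the data set)
eps1 = reduction_rate(250.0, 60.0)   # rough rolling: slab -> intermediate slab
eps2 = reduction_rate(60.0, 20.0)    # finish rolling: intermediate slab -> plate
```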

Based on the above data processing process, the input features selected for each category include chemical composition, rolling process and PM parameters. The output feature is YS. The descriptive statistical information of the above data is shown in Table I.

Table I Range of input and output parameters in data sets

Machine Learning Model

Without loss of generality, six ML models were established to predict YS, and the most suitable method was identified: KNN, SVR, MLP, RFR, GBR and XGBoost.26,27 KNN finds the k training samples closest to a query point under a specific distance metric and predicts from the information of these k neighbors. SVR aims to obtain a model \(f\left( x \right)\) that fits the training samples as closely as possible: a loss function between the sample labels and the model's predictions is constructed and minimized to determine \(f\left( x \right)\).28 In the study, the RBF kernel is selected for SVR. MLP, also known as an artificial neural network (ANN), is composed of an input layer, hidden layers and an output layer, with full connections between layers. During training, the model continuously adjusts the node weights by backpropagation according to the difference between the measured and predicted values to reduce the loss.29 GBR, XGBoost and RFR are ensemble learning algorithms. Ensemble learning completes the learning task by constructing multiple weak learners and combining them with specific strategies; its main advantage is that when one or more weak learners make erroneous predictions, the others can compensate, so ensemble algorithms often have better fitting ability and generalization performance. Ensemble learning is mainly divided into two categories: bagging and boosting. RFR belongs to the bagging class, while GBR and XGBoost belong to the boosting class.30,31 Compared with GBR, XGBoost adds regularization to the objective function, which helps reduce model variance, simplify the model and prevent overfitting. It has been successfully applied to air pollution and landslide prediction.32
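A minimal sketch of the six baseline regressors follows, instantiated with default parameters as in the first-stage comparison (random_state and max_iter values are our assumptions for reproducibility):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

# First-stage baselines compared under default parameters
models = {
    "KNN": KNeighborsRegressor(),
    "RBF-SVR": SVR(kernel="rbf"),              # RBF kernel, as stated above
    "MLP": MLPRegressor(max_iter=1000, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "RFR": RandomForestRegressor(random_state=0),
    "XGBoost": XGBRegressor(random_state=0),
}
```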

For data set partitioning, the common options are the hold-out, cross-validation and bootstrapping methods. In the study, the data set was randomly divided into a training set and a testing set at a ratio of 8:2 using the hold-out method. To compare the models and select the most appropriate one, three evaluation indexes, Er, MAE and RMSE, were used to measure prediction accuracy.15 Er is the proportion, relative to the total amount of data, of data points whose prediction deviation is < 6% of the average value of the target attribute. The three indexes are defined as follows:

$$ {\text{Er}} = \frac{{N_{{\text{e}}} }}{{N_{{{\text{all}}}} }} \times 100\% $$
(4)
$$ {\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|} $$
(5)
$$ {\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } } $$
(6)

where \(y_{i}\) and \(\hat{y}_{i}\) represent the measured and predicted values of the plates' mechanical properties, respectively; n is the total number of samples, \(N_{{\text{e}}}\) is the number of data points within the specified error range, and \(N_{{{\text{all}}}}\) is the total amount of data. A higher Er value together with lower MAE and RMSE values indicates a better model fit.
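A sketch of the 8:2 hold-out split and the three metrics follows; reading the 6% Er bound against the mean target value is our interpretation of the definition above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def er(y_true, y_pred, tol=0.06):
    """Er (%): share of points whose absolute error is below tol * mean(y_true)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) < tol * y_true.mean())

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# 8:2 hold-out split, as used in the study (X, y are the preprocessed data)
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```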

The models used in the study were implemented in Python with the scikit-learn library (the XGBoost model via the scikit-learn-compatible xgboost package). The experiments were carried out on 64-bit Windows 10 with an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz (4 logical CPUs) and 8 GB of RAM.

Bayesian Hyperparameter Optimization

BO is a global optimization algorithm based on probability distributions. It assumes a black-box function \(f\left( x \right)\) from the hyperparameters to a specific objective (such as a model performance index). BO infers the black-box function from pairs of hyperparameters (independent variable) and the corresponding performance index (dependent variable). The goal is to find the d-dimensional hyperparameter \(x^{ * }\) in the hyperparameter space that optimizes the performance index. Taking minimization as an example, the expression is as follows (7):

$$ x^{ * } = \arg \min f\left( x \right) $$
(7)

where \(x \in X\) and \(X \subseteq R^{d}\) is a d-dimensional hyperparameter space. The BO process relies on Bayes' theorem. Its core components are the probabilistic surrogate model, which approximates the black-box function \(f\left( x \right)\), and the acquisition function, which selects the next evaluation point according to the evaluated history H so as to approach the optimal solution quickly. Bayes' theorem and the evaluated history H are expressed in (8) and (9):

$$ p\left( {\left. f \right|H} \right) = \frac{{p\left( {H\left| f \right.} \right)p\left( f \right)}}{p\left( H \right)} $$
(8)
$$ H = \left\{ {\left( {x_{1} ,f\left( {x_{1} } \right)} \right),...,\left( {x_{n} ,f\left( {x_{n} } \right)} \right)} \right\} $$
(9)

where \(p\left( f \right)\) and \(p\left( {H\left| f \right.} \right)\) are the prior probability distribution and likelihood distribution of \(f\left( x \right)\), respectively. \(p\left( {f\left| H \right.} \right)\) is the conditional probability distribution of \(f\left( x \right)\) given the evaluated domain H, that is, the posterior probability distribution.

In the study, the tree-structured Parzen estimator (TPE) is selected as the probabilistic surrogate model and expected improvement (EI) as the acquisition function. The likelihood \(p\left( {H\left| f \right.} \right)\) in (8) is then defined as follows:

$$ p\left( {x\left| y \right.} \right) = \left\{ {\begin{array}{*{20}c} {l\left( x \right),y < y^{ * } } \\ {g\left( x \right),y \ge y^{ * } } \\ \end{array} } \right. $$
(10)

where \(l\left( x \right)\) is the density formed from the observations \(x^{\left( i \right)}\) whose corresponding loss \(f\left( {x^{\left( i \right)} } \right)\) was less than \(y^{ * }\), and \(g\left( x \right)\) is the density formed from the remaining observations.

The optimization criterion is based on EI:

$$ {\text{EI}}_{{y^{ * } }} \left( x \right) = \int_{ - \infty }^{{y^{ * } }} {\left( {y^{ * } - y} \right)p\left( {y\left| x \right.} \right){\text{d}}y} = \int_{ - \infty }^{{y^{ * } }} {\left( {y^{ * } - y} \right)\frac{{p\left( {x\left| y \right.} \right)p\left( y \right)}}{p\left( x \right)}{\text{d}}y} $$
(11)

By construction, \(\gamma = p\left( {y < y^{ * } } \right)\) and \(p\left( x \right) = \int_{R} {p\left( {x\left| y \right.} \right)p\left( y \right){\text{d}}y} = \gamma l\left( x \right) + \left( {1 - \gamma } \right)g\left( x \right)\). Therefore:

$$ \int_{ - \infty }^{{y^{ * } }} {\left( {y^{ * } - y} \right)p\left( {x\left| y \right.} \right)p\left( y \right){\text{d}}y} = l\left( x \right)\int_{ - \infty }^{{y^{ * } }} {\left( {y^{ * } - y} \right)p\left( y \right){\text{d}}y} = \gamma y^{ * } l\left( x \right) - l\left( x \right)\int_{ - \infty }^{{y^{ * } }} {y\,p\left( y \right){\text{d}}y} $$
(12)

Finally, \({\text{EI}}_{{y^{ * } }} \left( x \right) = \frac{{\gamma y^{ * } l\left( x \right) - l\left( x \right)\int_{ - \infty }^{{y^{ * } }} {y\,p\left( y \right){\text{d}}y} }}{{\gamma l\left( x \right) + \left( {1 - \gamma } \right)g\left( x \right)}} \propto \left( {\gamma + \frac{g\left( x \right)}{{l\left( x \right)}}\left( {1 - \gamma } \right)} \right)^{ - 1}\). This last expression shows that to maximize improvement, we want points \(x\) with high probability under \(l\left( x \right)\) and low probability under \(g\left( x \right)\). The tree-structured form of \(l\) and \(g\) makes it easy to draw many candidates according to \(l\) and evaluate them according to \(\frac{g\left( x \right)}{{l\left( x \right)}}\). On each iteration, the algorithm returns the candidate \(x^{ * }\) with the greatest EI.33,34,35,36
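A toy 1-D illustration of this candidate ranking follows, approximating \(l\left( x \right)\) and \(g\left( x \right)\) with Gaussian kernel density estimates rather than the tree-structured densities used by TPE; the history, threshold fraction and candidate count are all illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy 1-D history H: hyperparameter x and noisy loss f(x) with minimum near x = 3
x_hist = rng.uniform(0.0, 10.0, size=50)
y_hist = (x_hist - 3.0) ** 2 + rng.normal(0.0, 1.0, size=50)

gamma = 0.25                                # fraction of observations treated as "good"
y_star = np.quantile(y_hist, gamma)         # threshold y* splitting the history
l = gaussian_kde(x_hist[y_hist < y_star])   # l(x): density of the good observations
g = gaussian_kde(x_hist[y_hist >= y_star])  # g(x): density of the rest

candidates = l.resample(64).ravel()         # draw candidates from l(x)
ratio = l(candidates) / g(candidates)       # EI grows with l(x)/g(x)
x_next = candidates[np.argmax(ratio)]       # next evaluation point (near 3 here)
print(x_next)
```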

In the study, the BO algorithm is implemented based on the Optuna library in Python, and the optimization process is shown in Fig. 1. The specific implementation steps can be summarized as follows:

(1) Set the optimization range for each hyperparameter to be tuned, and randomly generate an initial hyperparameter combination within that range.

(2) Based on the acquisition function EI, select the next hyperparameter combination sampling point \(x_{i}\) from the TPE probabilistic surrogate model.

(3) Input the new hyperparameter combination into the prediction model for training and obtain the model's prediction results.

(4) If the error of the new hyperparameter combination meets the accuracy requirement, terminate the algorithm and output the optimal hyperparameter combination. Otherwise, feed \(\left( {x_{i} ,f\left( {x_{i} } \right)} \right)\) into the TPE model to update the acquisition function, and repeat steps (2) and (3) until a hyperparameter combination meeting the accuracy requirement is found; a minimal code sketch of this loop follows.
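A minimal Optuna sketch of steps (1)-(4) is given below. The search ranges are illustrative, not the Table II values, and X_train, X_test, y_train and y_test are assumed to come from the 8:2 split described earlier; Optuna's TPESampler implements the TPE surrogate and EI-based candidate selection internally.

```python
import optuna
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

def objective(trial):
    """One BO trial: sample a hyperparameter combination, train, return test RMSE."""
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }
    model = XGBRegressor(random_state=0, **params)
    model.fit(X_train, y_train)             # assumes the 8:2 split defined above
    return mean_squared_error(y_test, model.predict(X_test)) ** 0.5  # RMSE

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=100)     # fixed trial budget instead of an
print(study.best_params, study.best_value)  # explicit accuracy stopping rule
```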

Fig. 1 Process of Bayesian hyperparameter optimization based on TPE.

Results and Discussion

Data Preprocessing Results and Discussions

Results and Discussion of Data Cleaning and Processing of Missing and Abnormal Values

In the study, 5021 samples were obtained through data association, and 3856 samples remained after deleting samples with many null values, redundant features and abnormal values. The effect of removing the abnormal values is shown in Fig. 2a: the processed data are denser and more concentrated. The missing values were then filled; the filling effect is shown in Fig. 2 (based on ten runs of the XGBoost model). As shown in Fig. 2b and c, the KNN and RFR imputation methods achieved better results on the training and testing sets than the other methods, because KNN interpolation fills a missing value with the average of its k nearest neighbors, while RFR combines multiple decision trees to predict missing values more accurately and stably than a single tree. Among them, RFR filling achieved the best effect: the training set Er, MAE and RMSE were 99.99%, 3.63 MPa and 5.18 MPa, and the testing set values were 93.26%, 14.25 MPa and 21.41 MPa, respectively. Therefore, the data after RFR filling were used for subsequent modeling.

Fig. 2 Comparison of abnormal value elimination effect and effects of different missing value filling methods on modeling: (a) abnormal value elimination effect, (b) training set, (c) testing set.

Feature Processing Results and Discussion

The XGBoost model importance evaluation method was used to calculate the importance of input features to YS. To avoid the influence of randomness on the results, the data set was randomly divided ten times, and the obtained feature importance ranking is shown in Fig. 3. To ensure the generalization of the model, the features with an importance score < 90, namely Cu, Nb, Mo, Al, B, Ti and Ni, were removed.

Fig. 3 Feature importance ranking based on the XGBoost model.

The feature pairs with correlation coefficients of absolute value > 0.9 are shown in Fig. 4. For each pair with \(\left| {\gamma_{xy} } \right| > 0.9\), two ML models were established using the two features as input in turn, and the RT and CIT features were removed after comparing the models' prediction accuracy. Based on the XGBoost feature importance analysis and the Pearson correlation analysis, the final input and output parameters were determined as shown in Table I in Sect. "Data Preprocessing".

Fig. 4 Feature correlation analysis based on Pearson correlation.

Figure 3 shows that the input features influence the YS of the Q550D V–N steel hot-rolled plate to different degrees. The feature importance scores of the chemical compositions, from high to low, are C, N, Si, Mn, Cr, P, S and V. C stabilizes austenite and forms an interstitial solid solution. Adding an appropriate amount of C to Q550D V–N steel allows V (C, N), VN and other precipitates to form with the V and N elements.37 These precipitates promote the formation of intragranular ferrite (IF) in the austenite region, providing fine-grain strengthening, and form finely dispersed particles in the ferrite region, providing precipitation strengthening, thereby significantly improving the YS of the steel. However, excessive C reduces the low-temperature toughness and significantly degrades the weldability of the material; through low-carbon composition design, the C content is therefore controlled between 0.058 wt.% and 0.136 wt.%. Si and Mn both improve the stability of austenite and thus the strength and hardness of steel, but excessive Mn is prone to composition segregation, leading to excessive hardness and toughness scatter. Cr is an inexpensive element that provides solid solution strengthening and microstructure refinement, improves hardenability and markedly improves oxidation resistance; however, too much Cr increases the tendency toward temper embrittlement. S and P are harmful elements in steel: S causes severe FeS segregation, resulting in cracking during hot working (hot shortness), while P, introduced mainly by raw materials such as ore and pig iron, has a significant strengthening effect but severely reduces low-temperature toughness (cold shortness). Therefore, the contents of S and P must be strictly controlled to low levels in production. V and N are the main microalloying elements of Q550D V–N steel. Non-metallic compounds of V can form an MnS + VN composite second phase with fine MnS particles in V–N steel, further promoting IF nucleation.38 On the one hand, precipitation strengthening improves the strength of the steel; on the other, acicular ferrite also greatly improves its toughness. V is abundant and inexpensive and has a strong affinity with N; N is harmful when free but beneficial when bound with V as precipitates, improving the overall performance of the steel. Therefore, by designing the V-to-N ratio, excellent comprehensive performance can be guaranteed while greatly reducing production costs.

For the rolling process, the microstructure of the steel changes continuously through heating, rough rolling, finish rolling and cooling. HT, ROT, RFT and ε1 affect the morphology of the original austenite and its recrystallization behavior: as the temperature decreases, the austenite grains are gradually refined, increasing the total austenite grain boundary area, dislocation density and substructure. FOT, FRT and ε2 are essential factors affecting the deformation and phase transformation of V–N steel. As the temperature continues to decrease, plastic deformation elongates and flattens the austenite grains, greatly increasing the grain boundary area and generating many deformation bands and strain-induced precipitates, which provide good nucleation sites for the ferrite transformation and refine the ferrite grains. COT, the feature with the highest importance score in Fig. 3, affects the microstructure refinement of the austenite-to-ferrite transformation.39 Controlling the COT allows the second phase of V–N steel to precipitate fully and produces more acicular ferrite, polygonal ferrite and other microstructures, thereby improving the strength and toughness of the steel.40

A relatively high correlation (darker ellipses in the top right region of Fig. 4) is observed between YS and the processing parameters and among the processing parameters themselves. Some of this is expected, since many processing parameters are inherently coupled (e.g., FRT and CIT). It reflects the critical dependence of material properties on the rolling process through the microstructure.7 Although all process parameters were highly correlated with YS, the most influential were FOT, FRT, CIT and COT; in particular, FOT shows a strong positive correlation with YS. Given how the dataset was constructed, this largely reflects that these processing steps enhance the YS of V–N steel hot-rolled plate. IBT and RT are negatively correlated with YS, indicating that thicker plates tend to have lower YS.

Analysis of Modeling Results

After data partitioning, there were 3084 samples in the training set for training the models and 772 samples in the testing set for evaluating their effect and generalization. Because actual production cares most about generalization performance, the models were evaluated by their prediction results on the testing set. Figure 5a uses a histogram to compare the prediction effects of the six models in Sect. "Machine Learning Model". The evaluation indexes Er, MAE and RMSE of the XGBoost and RFR models were much better than those of the KNN, RBF-SVR and MLP models. The MAE and RMSE of the GBR model were poor, but its Er was close to that of the XGBoost and RFR models. To show the fitting effect, scatter plots of the measured versus predicted values of the six models were drawn; Fig. 5b–g clearly shows the relationship among the measured values, predicted values and best-fitting lines.

Fig. 5 Comparison of prediction results of KNN, RBF-SVR, MLP, GBR, XGBoost and RFR models and scatter plots of predicted and measured value distributions: (a) comparison results, (b)–(g) prediction results (Color figure online).

In Fig. 5, the red line has a slope of 1, and the dashed black lines mark the Er boundary. The more prediction points fall within the boundary and close to the line of slope 1, the better the model's fit. Figure 5b–d shows many prediction points outside the Er boundary; the Er of these models was < 90%, and the MAE and RMSE were > 20 MPa and 30 MPa, respectively, indicating poor generalization of the KNN, RBF-SVR and MLP models. The main reason for the KNN model's poor generalization is that the industrial data used in the study are high dimensional; as the dimensionality increases, distances between points in the KNN model tend to grow. The RBF-SVR model can handle high-dimensional problems, but its poor generalization here is mainly related to parameter selection and the stability of the kernel mapping in high-dimensional space. For the MLP model, the main reason is that suitable neural network parameters, such as hidden_layer_sizes and batch_size, were not selected. The points in Fig. 5e were close to the reference line, and 90.42% of the predictions fell within 6% of the average YS of the testing set. However, in the YS range of 700–782 MPa, the edge data were sparse and appeared infrequently in the training set, making their information difficult for the model to learn; accordingly, the prediction effect deteriorated as the data density decreased. In Fig. 5f and g, > 90% of the predictions of the XGBoost and RFR models fell within the Er boundary, the fitted slopes were closer to 1, and the MAE and RMSE were < 16 MPa and 23 MPa, respectively, indicating an excellent prediction effect. As described in Sect. "Machine Learning Model", ensemble learning algorithms have strong generalization ability and fit high-dimensional complex features well. Therefore, the XGBoost and RFR models were selected for optimization with the BO algorithm.

Hyperparameter Optimization Results Analysis

In the modeling process, the hyperparameters affecting RFR are mainly n_estimators, max_depth, min_samples_split and max_features,41 while those affecting XGBoost are mainly n_estimators, max_depth, min_child_weight and learning_rate.42 The larger n_estimators is, the more information the model learns, but too many estimators increase running time and cause overfitting, reducing prediction accuracy. A deeper max_depth means more feature splits and a more complex model structure, so max_depth should be set with reference to the number of feature attributes in the training data. min_samples_split restricts further sub-tree division in the random forest: if the number of samples at a node is less than min_samples_split, no further split is attempted. max_features controls the maximum number of features considered per split and thereby the decision-tree generation time. For XGBoost, the smaller min_child_weight is, the more easily the model overfits, while learning_rate controls the step size of the weight updates in each iteration; the smaller its value, the slower the training. Based on this understanding of the hyperparameters, the main hyperparameter ranges and the optimal hyperparameters of the RFR and XGBoost models found by the BO method are shown in Table II. Models were rebuilt with the optimized hyperparameters from Table II (other hyperparameters at their defaults). The prediction effects of the BO-RFR and BO-XGBoost models are shown in Fig. 6; the colors of the points in Fig. 6b and c represent the absolute error. Figure 6a compares the performance indexes of the RFR, XGBoost, BO-RFR and BO-XGBoost models: BO increased the Er of the two models by 0.29% and 2.27% and reduced their MAE by 1.17% and 10.54% and their RMSE by 1.27% and 10.54%, respectively. The BO algorithm thus had a stronger optimization effect on the XGBoost model, improving its accuracy, robustness and reliability, because BO builds a probability model from previous evaluations of the objective function, seeks its minimum and avoids becoming trapped in a local optimum. In addition, as shown in Fig. 6b and c, the points of the BO-XGBoost model were more concentrated within the boundary lines, while the BO-RFR model had several points with large absolute prediction errors. Figure 6d and e shows that the relative error of the BO-XGBoost model was concentrated in a narrow interval. Overall, the BO-XGBoost model outperformed the BO-RFR model: its Er was 2.70% higher and its MAE and RMSE were 6.40% and 8.35% lower, respectively. Therefore, the BO-XGBoost model had the best prediction effect and generalization performance for YS modeling of V–N steel. Some of the Q550D V–N hot-rolled plate data have been made public, with a GitHub link provided for testing the BO-XGBoost model: https://github.com/Sstar126/data.git.
As a "black-box function," the model directly provides efficient prediction of the YS of V–N steel hot-rolled plate. Compared with a traditional PM model, it is more convenient, more precise and more broadly applicable. It can provide key guidance for the subsequent design of chemical compositions and rolling process parameters and reduce the production of unqualified products.

Table II Optimized hyperparameters of the RFR and XGBoost models
Fig. 6 Comparison of RFR, BO-RFR, XGBoost and BO-XGBoost models, scatter plots of error distribution and relative error frequency distribution of BO-RFR and BO-XGBoost: (a) comparison results, (b)–(e) prediction results (Color figure online).

Effects of Feature Processing and PM Parameters on Models

In the study, the input features were reduced from 25 to 16 dimensions by XGBoost model feature importance analysis and Pearson correlation analysis, which greatly simplified the data structure. At the same time, four PM parameters, Ac1, Ac3, ε1 and ε2, were introduced to guide the model establishment. In this section, three BO-XGBoost models were established using the original data set, the data set after feature processing and the data set after introducing PM parameters. The influence of feature processing and PM parameters on the model was illustrated by comparing the three models' evaluation indexes and modeling efficiency.

As shown in Fig. 7a, the feature-processed data set performed slightly worse than the original data set on the BO-XGBoost model, because the removed features contained some information helpful for modeling. However, Fig. 7b shows that the modeling efficiency after feature processing improved by 58.07%. This shows that feature processing can simplify the model and improve modeling efficiency at the cost of a slight information loss. Removing part of the chemical composition information reduces the dimensionality and eliminates material variability, which greatly helps model computation and data storage in steel production. Moreover, the elements removed during dimension reduction, such as B, Ti and Ni, have little influence on the mechanical properties of the plate,43 so such data-driven dimension reduction can lower the cost and difficulty of actual material research and development. In addition, combining Fig. 7a and b, the data set with PM parameters increased the modeling time on the BO-XGBoost model by about 500 s relative to the feature-processed data set, but its Er, MAE and RMSE improved by 0.85%, 5.02% and 5.11%, respectively, indicating that introducing PM parameters improved the model's accuracy and increased the interpretability and scalability of the ML model. Ac1 and Ac3 are expected to guide the subsequent heat treatment of failed products, and ε1 and ε2 can assist in reducing the number of rolling passes.

Fig. 7 Effect of feature processing and PM parameters on the model: (a) comparison results on the testing set, (b) comparison of modeling time.

Conclusion

A methodology was presented for training multiple ML models and performing hyperparameter optimization to predict YS from the chemical composition and rolling process data of V–N steel hot-rolled plates. The framework includes industrial big data collection and cleaning, data dimensionality reduction and introduction of PM elements, building of multiple ML models, hyperparameter optimization and model evaluation. The data were obtained from the actual production process of a domestic steel mill; after cleaning and dimensionality reduction, the high-dimensional data were reduced to lower dimensions and passed to the KNN, SVR, MLP, RFR, GBR and XGBoost models. The comparison proceeded in two stages: the six original models were first compared under default parameters to ensure comparability of the prediction results, and Bayesian optimization was then used to search the hyperparameters of the XGBoost and RFR models efficiently, these being ensemble learning models widely used in related fields. Finally, the prediction results of the BO-XGBoost and BO-RFR models were compared, and the effects of data dimensionality reduction and PM parameters on the modeling were discussed. The following key conclusions can be drawn from the above analysis.

(1) Data preprocessing methods such as data cleaning, normalization, the Pauta criterion and RFR filling could significantly improve data quality and highlight the regularity of the data, which strongly influences data-driven modeling. In addition, feature importance analysis based on the XGBoost model and Pearson correlation analysis reduced the dimensionality, and four physical metallurgy parameters, Ac1, Ac3, ε1 and ε2, were introduced, allowing the guiding laws of PM to be fully exploited.

(2) The prediction and generalization of the XGBoost and RFR models were already strong without hyperparameter search. After Bayesian optimization, BO-XGBoost was the best model (Er = 93.52%, MAE = 13.56 MPa, RMSE = 20.19 MPa). One likely reason is that the BO-XGBoost model can effectively handle high-dimensional data sets and learn the correlations among chemical composition, rolling process and yield strength.

(3) Feature processing and PM parameter calculation were introduced into this study; the former significantly improved the modeling efficiency, and the latter increased the model's interpretability and effectiveness. Studying different feature processing methods and PM parameters is therefore an essential direction for modeling steel data.

ML models for the actual production of V–N steel hot-rolled plates provide a powerful and relatively fast way to predict the expected mechanical properties from the chemical composition and rolling process parameters designed by researchers, and more production data will augment the training set and further improve the model's generalization.