INTRODUCTION

In recent years, laser-induced breakdown spectroscopy (LIBS) has made it possible to considerably improve the characteristics of stationary and portable instruments for quantitative and qualitative analysis of samples [1–3]. LIBS allows rapid analysis of the composition of various substances with minimal or no sample preparation, which is a considerable advantage over standard chemical methods. For example, in [1], LIBS was used for time- and space-resolved quantitative measurements of the chemical composition of stainless-steel weld joints directly during welding. The detection of various types of adulteration of powdered milk by LIBS combined with machine learning methods is considered in [2]; such adulteration can cause serious digestive disorders. In [3], LIBS was used to find ways of reducing carbon calibration errors caused by trace contaminants on the surfaces of low-alloy steels.

Steels and iron-based alloys are used in almost all areas of human activity and hold a special place among the objects studied by LIBS. Their physical, chemical, and technological properties are determined by technological impurities and by doping with chromium, manganese, and other elements. Quantitative methods for determining the concentrations of dopants and technological impurities are therefore important for classifying and sorting steels. This analysis is usually performed by mass spectrometry [4] and by optical emission spectroscopy with spark discharge [5], inductively coupled plasma [6], or glow-discharge [7] excitation. The advantages of LIBS are the possibility of rapid multivariate analysis in open air and relatively inexpensive instruments whose accuracy is sufficient for qualitative analysis. Its main disadvantage is the insufficient accuracy of quantitative measurements [8]. Nevertheless, many works deal with quantitative as well as qualitative LIBS applications (see, for example, [9–15]). The measured spectra depend strongly on the experimental conditions (laser wavelength, pulse duration, pulse energy and focusing; spectral range and resolution of the spectrometer; delay time and duration of spectral acquisition; numbers of preliminary and measuring laser pulses; number of accumulations; gas blowing over the sample), which differ between setups, so LIBS is regarded as a semiquantitative method [16]. Building univariate or multivariate quantitative models with different spectral preprocessing also yields results of considerably different accuracy.

Spectrometers used for LIBS in compact portable and mobile systems usually have low resolution. Because the wings of emission lines then overlap strongly, the classical univariate approach, in which the calibration curve is built from the intensity of an isolated analytical line, is hardly suitable for such spectrometers. Instead, multivariate calibration models covering the entire recorded spectrum are widely used [1, 17]. According to [18], calibration is a process used to create a model that relates two types of measured data. In the present work, we calibrate the concentrations of the main elements in low-alloy steels, i.e., we create a mathematical model that relates the concentrations of chemical elements in a set of reference samples of low-alloy steels with known composition to the spectral data obtained by LIBS.

EXPERIMENTAL

We previously solved the calibration problem over the entire range of measured low-resolution emission spectra (190–440 nm, resolution 0.4 nm, spectral step 0.1 nm); the experimental setup and measurement conditions are described in [19]. We studied 44 reference samples of low-alloy steels: UG0d–UG7d and UG9d (Russia) and 51/1–58/1, 72–76, 101–103, and 110–125 (IMZ, Poland). From this set, 31 to 39 samples with distinct (non-coinciding) concentrations were used for calibration of C (up to 0.8%), Mn (up to 2.0%), Si (up to 1.2%), Cr (up to 1.0%), Ni (up to 0.8%), and Cu (up to 0.5%).

METHODS AND RESULTS

The training and test sets, approximately equal in size, were formed using the commonly applied Kennard–Stone algorithm [20]: the first sample selected for the training set is the one whose concentration is closest to the center of the studied range, and each subsequent sample is the one most distant from those already selected. In our case, this algorithm yields more stable models than uniform or random assignment of samples to the training set, owing to the narrower intervals of the concentrations to be estimated relative to the training set. After the spectra were normalized to the intensity at the characteristic iron emission wavelength of 252.0609 nm, calibration models were built by the partial least squares method. The root-mean-square errors of prediction (RMSEP) with respect to the reference values of the test set were 0.06% for C, 0.12% for Mn, 0.09% for Si, 0.13% for Cr, 0.07% for Ni, and 0.08% for Cu.
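The two preprocessing steps in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `kennard_stone_1d` and `normalize_to_line` are hypothetical, the selection works on the 1-D concentration of a single element exactly as the text describes it (the original Kennard–Stone formulation instead starts from the two most distant samples), and normalization by the spectral channel nearest 252.0609 nm is an assumption about how the reference intensity is read off.

```python
# Sketch only: hypothetical helpers, not the authors' implementation.


def kennard_stone_1d(conc, n_train):
    """Split samples by the Kennard-Stone variant described in the text.

    The first training sample is the one whose concentration is closest to
    the centre of the concentration range; each further sample is the one
    whose distance to its nearest already-selected sample is largest.
    Returns (training indices, test indices).
    """
    centre = (min(conc) + max(conc)) / 2
    first = min(range(len(conc)), key=lambda i: abs(conc[i] - centre))
    selected = [first]
    remaining = [i for i in range(len(conc)) if i != first]
    while len(selected) < n_train and remaining:
        # max-min criterion: farthest from the closest selected sample
        best = max(remaining,
                   key=lambda i: min(abs(conc[i] - conc[j]) for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


def normalize_to_line(spectra, wavelengths, ref_nm):
    """Divide every spectrum by its intensity at the channel nearest ref_nm.

    Assumption: the reference intensity is taken from the single nearest
    channel, with no peak fitting or background subtraction.
    """
    k = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - ref_nm))
    return [[value / spectrum[k] for value in spectrum] for spectrum in spectra]
```

Because each new sample maximizes its distance to the nearest selected one, the extremes of the concentration range end up in the training set, and the remaining test samples fall inside the range it spans.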

To increase the calibration accuracy, we partly took into account the recommendations of [18] when forming the training and test sets. According to [18], the number of samples in the training set should be at least six times the number of latent variables used in the multivariate model, and that in the test set at least four times. Since the required minimum total number of samples in the training (24) and test (20) sets was not available, about 60% of the samples were assigned to the training set and the remaining 40% to the test set, i.e., in the 6:4 ratio of these factors. This problem was solved at the first stage of the work. Then, we applied three methods of selecting spectral variables to the full-range multivariate model: ranking of spectral variables (RSV) [21] according to their correlation with the sought parameter, the successive projections algorithm (SPA) [22], and an original modification [23] of searching combination moving window interval partial least squares (scmwiPLS) [24]. Below, we characterize each of these methods and discuss the results obtained.
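The sample-count rule above reduces to simple arithmetic. The following sketch, with hypothetical function names, computes the minimum set sizes 6n and 4n for n latent variables and, when fewer samples are available, splits them in the 6:4 ratio. Rounding the training share to the nearest integer is an assumption of the sketch, not something the text specifies.

```python
# Sketch of the sample-count rule attributed to [18] in the text;
# the function names are hypothetical.

TRAIN_FACTOR = 6  # training samples per latent variable
TEST_FACTOR = 4   # test samples per latent variable


def min_set_sizes(n_latent):
    """Minimum training and test set sizes for a model with n_latent LVs."""
    return TRAIN_FACTOR * n_latent, TEST_FACTOR * n_latent


def split_counts(n_samples):
    """Split the available samples in the 6:4 ratio of the two factors.

    Assumption: the training share is rounded to the nearest integer and
    the test set receives the rest.
    """
    n_train = round(n_samples * TRAIN_FACTOR / (TRAIN_FACTOR + TEST_FACTOR))
    return n_train, n_samples - n_train
```

For example, with 39 usable samples the sketch assigns 23 to training and 16 to testing.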

In the RSV method, all spectral variables are ranked in order of decreasing correlation coefficient with the calibrated concentration. Then, at each step, the variable with the lowest correlation is removed, and a multivariate model is built by the partial least squares method [25] with determination of the optimal number of latent variables. The selected set of spectral variables is the one that minimizes the root-mean-square deviation of the estimated concentrations of the sought element in the test samples from the reference values. If, instead, the variables are selected by a threshold on the correlation coefficient, this modification of RSV is called the significance multivariate correlation (SMC) method [26]. The SMC method introduces arbitrariness because the researcher must choose the threshold; the RSV method is free of this arbitrariness.
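The RSV loop above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' software: PLS is written out as the textbook NIPALS PLS1 algorithm, variables are ranked by the absolute value of the correlation coefficient (the text does not say how negative correlations are treated), and the names `pls1_fit`, `best_rmsep`, and `rsv_select` are hypothetical. As in the text, the number of latent variables and the variable subset are both chosen by the minimum test-set RMSEP.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1; returns (regression vector, X mean, y mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)        # weight vector
        t = Xr @ w                       # scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        c = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - c * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return b, x_mean, y_mean


def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean


def best_rmsep(Xtr, ytr, Xte, yte, max_lv):
    """Lowest test-set RMSEP over 1..max_lv latent variables: (rmsep, n_lv)."""
    max_lv = min(max_lv, Xtr.shape[1], Xtr.shape[0] - 1)
    results = []
    for k in range(1, max_lv + 1):
        pred = pls1_predict(pls1_fit(Xtr, ytr, k), Xte)
        results.append((float(np.sqrt(np.mean((yte - pred) ** 2))), k))
    return min(results)


def rsv_select(Xtr, ytr, Xte, yte, max_lv=10):
    """RSV: drop the least-correlated variable one at a time and keep the
    subset with the minimum test-set RMSEP. Returns (rmsep, n_lv, columns).

    Assumption: ranking uses |r|, the absolute correlation coefficient.
    """
    r = np.array([abs(np.corrcoef(Xtr[:, j], ytr)[0, 1])
                  for j in range(Xtr.shape[1])])
    order = [int(j) for j in np.argsort(-r)]   # most to least correlated
    best = (np.inf, 0, order)
    for n_keep in range(len(order), 0, -1):
        cols = order[:n_keep]
        err, lv = best_rmsep(Xtr[:, cols], ytr, Xte[:, cols], yte, max_lv)
        if err < best[0]:
            best = (err, lv, cols)
    return best
```

For 3630 variables and up to ten latent variables, this brute-force loop fits roughly 36,000 PLS models, which is still cheap for data sets of a few dozen samples.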

The characteristics of the full-range partial least squares (PLS) and PLS + RSV models are compared in Table 1.

Table 1. Characteristics of multivariate PLS and PLS + RSV models for calibration of concentrations of six chemical elements in low-alloy steels by emission spectra

One can see that changing the ratio of training to test set sizes from 1:1 to 3:2 changes the RMSEP only slightly, and only for full-range calibration of Si and Ni (by 0.02% and 0.01%, respectively). This indicates that the multivariate PLS models are stable with respect to the sizes of the sets. Comparison of the RMSEP values of the PLS and PLS + RSV models shows that correlation-based selection of spectral variables is not very effective in the cases considered. The quality of the calibration models for Cr and Cu did not change, because 3629 and 3625, respectively, of the 3630 spectral variables were retained. For the other four elements, the calibration quality improved, but only slightly. Let us illustrate the results of multivariate calibration of the C concentration by the PLS + RSV method. Figure 1 shows RMSEP as a function of the number of spectral variables eliminated from the model, ranked by their correlation coefficient with the C concentration in the samples. The minimum RMSEP is reached when the model includes 757 spectral variables; with four latent variables, the PLS method then gives the calibration dependence shown in Fig. 2. The RMSEP is 0.04%, and the ratio of performance to deviation (RPD) in the test set (the ratio of the standard deviation of the parameter in the set to the root-mean-square deviation of the predictions from the reference values) is 2.7. This is the best of the PLS + RSV models built for the six elements, but it is only semiquantitative (2.5 < RPD < 3) [27]. Note the positions of the selected spectral variables, shown in Fig. 3 on the emission spectrum of reference sample 123.
Only one selected spectral variable (252.06 nm) lies in the region of intense emission lines, which can provide useful information for calibration. The other selected variables lie at the edges of the measured spectrum, which, as expected, carry little information.
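The quality measures used above can be written out explicitly. In this sketch (hypothetical function names), `rmse` gives RMSEC or RMSEP depending on which set is passed, and `rpd` divides the standard deviation of the reference values by that error. Two assumptions are labeled: the sample standard deviation (n − 1 denominator) is used, and only the 2.5–3 "semiquantitative" band comes from the text; the other two class boundaries are illustrative.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root-mean-square deviation of predictions from reference values.

    Applied to the training set this is RMSEC, to the test set RMSEP.
    """
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(reference))


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of references / RMSE.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(reference) / rmse(reference, predicted)


def rpd_class(value):
    # Only the 2.5 < RPD < 3 'semiquantitative' band is stated in the text;
    # the 'quantitative' and 'qualitative' boundaries are assumptions.
    if value >= 3.0:
        return "quantitative"
    if value > 2.5:
        return "semiquantitative"
    return "qualitative"
```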

Fig. 1. Root-mean-square deviation of the estimated C concentration from reference values in the test set of low-alloy steel samples as a function of the number of spectral variables, ranked according to the correlation coefficient, removed from the partial least squares model.

Fig. 2. Relation between the estimated C concentrations and the reference values for the multivariate partial least squares model including the 757 spectral variables best correlated with the calibrated parameter.

Fig. 3. Emission spectrum of reference sample 123 with the spectral variables used for calibration of the carbon concentration by the PLS + RSV method indicated.

The second method we used to select spectral variables for multivariate calibration is the successive projections algorithm (SPA) [22]. At the first SPA stage, an ordered sequence of all the other variables is formed for each of the 3630 available spectral variables. In each sequence, the next variable is the one with the largest projection onto the subspace orthogonal to the variables already in the sequence (for the second variable, orthogonal to the first). The procedure continues until all measured spectral variables are included in the sequence. At the second SPA stage, PLS models with the optimal number of latent variables are built on progressively larger leading subsets of each sequence. With the number of latent variables limited to ten, the number of such models in our case is 3630² × 10 ≈ 1.3 × 10⁸. At the third stage, the best multivariate model, and hence the sought set of spectral variables giving the minimum calibration error, is determined by the minimum RMSEP. Table 2 presents examples of the best multivariate models obtained by the PLS + SPA method.
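The three SPA stages above can be sketched with NumPy. In this minimal version, `spa_chain` builds one projection sequence, and `spa_select` scores every leading subset of every sequence through `rmsep_of`, a hypothetical caller-supplied function that fits a PLS model on the given columns and returns its minimum test-set RMSEP. Two assumptions are labeled: the projections are applied to the spectral matrix as passed, without centering, and the sequence length is capped at the number of samples, because beyond that the projected columns vanish numerically.

```python
import numpy as np


def spa_chain(X, start, max_len):
    """SPA stage 1: ordered sequence of column indices beginning at `start`.

    Each new column is the one with the largest norm after projection onto
    the orthogonal complement of the columns already in the sequence.
    """
    Xp = np.asarray(X, dtype=float).copy()
    chain = [start]
    # Assumption: cap the length at the rank bound min(rows, columns).
    limit = min(max_len, Xp.shape[0], Xp.shape[1])
    while len(chain) < limit:
        xk = Xp[:, chain[-1]].copy()
        nk = xk @ xk
        if nk < 1e-12:                       # nothing left to project out
            break
        Xp -= np.outer(xk, xk @ Xp) / nk     # remove the component along xk
        norms = np.linalg.norm(Xp, axis=0)
        norms[chain] = -1.0                  # never re-select a column
        chain.append(int(np.argmax(norms)))
    return chain


def spa_select(X, rmsep_of, max_len):
    """SPA stages 2-3: score each leading subset of each sequence with the
    hypothetical callback rmsep_of(columns) and keep the best subset."""
    best_err, best_cols = np.inf, []
    for start in range(X.shape[1]):
        chain = spa_chain(X, start, max_len)
        for k in range(1, len(chain) + 1):
            err = rmsep_of(chain[:k])
            if err < best_err:
                best_err, best_cols = err, chain[:k]
    return best_err, best_cols
```

Keeping model fitting behind a callback separates the projection logic from the choice of regression method. It also makes the cost explicit: one sequence per variable, one model per prefix, and, inside the callback, one fit per tested number of latent variables.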

Table 2. Characteristics of the multivariate PLS + SPA models for calibration of concentrations of six chemical elements in low-alloy steels by emission spectra

Comparison of Tables 1 and 2 shows that the quality of the PLS + SPA calibration models is better than that of PLS + RSV for Mn, Si, Cr, and Cu, but worse for C. For Ni, the RMSEP is the same in both cases.

The third method we used for selecting spectral variables was scmwiPLS. This method operates on spectral intervals (windows) of a given width rather than on individual spectral variables. The original modification of scmwiPLS [23] consists of three stages. At the first stage, a full-range multivariate PLS model is built, and the optimal number n of latent variables is determined by the minimum RMSEP. At the second stage, the window width is fixed at n + 1 spectral variables. This choice keeps the window as narrow as possible while still allowing the latent variables to be extracted within a single window. The first window is then shifted by one spectral variable per step, and at each step a multivariate model is built and characterized by its RMSEP. When the window reaches the edge of the measured spectral range, its optimal position is determined by the minimum RMSEP and fixed. The procedure is repeated with the second window, modeling over the spectral variables of the fixed first window and the moving second window. Each subsequent window adds another n + 1 spectral variables to the model, so that all measured variables are eventually taken into account. For correct operation of scmwiPLS, the total number of spectral variables must first be reduced to a multiple of n + 1. For four latent variables, for example, the total number of scmwiPLS models in the case considered is 3630²/10 ≈ 1.3 × 10⁶, two orders of magnitude fewer than for PLS + SPA. At the third stage, the spectral variables are selected that correspond to the combination of windows for which the scmwiPLS model has the minimum RMSEP.
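The second and third stages above can be sketched as a greedy window search in plain Python. This is an illustration, not the modification of [23] itself. `n_latent` is the optimal number found at the first stage, and `rmsep_of(columns)` is a hypothetical callback that fits PLS on the given columns and returns the test-set RMSEP. The sketch makes three assumptions: surplus variables are trimmed from the end of the range, a moving window may not overlap windows that are already fixed, and the search stops when no free position of full width remains. Under these assumptions, gaps narrower than one window can stay uncovered; the text does not describe how [23] handles them.

```python
def scmw_select(n_vars, n_latent, rmsep_of):
    """Greedy searching-combination moving-window selection (sketch).

    Windows are n_latent + 1 variables wide. Each new window slides one
    variable per step over positions that do not overlap windows already
    fixed; the position with the lowest RMSEP (model on all fixed windows
    plus the moving one) is fixed. Returns (RMSEP history, best columns).
    """
    width = n_latent + 1
    n_vars -= n_vars % width             # assumption: trim the tail
    used = [False] * n_vars
    fixed, history = [], []
    while True:
        best = None
        for start in range(n_vars - width + 1):
            if any(used[start:start + width]):
                continue                 # overlaps a fixed window
            cols = [c for s in fixed for c in range(s, s + width)]
            cols += range(start, start + width)
            err = rmsep_of(cols)
            if best is None or err < best[0]:
                best = (err, start)
        if best is None:
            break                        # no free full-width position left
        fixed.append(best[1])
        history.append(best[0])
        for c in range(best[1], best[1] + width):
            used[c] = True
    if not history:
        return [], []
    # Stage 3: keep the number of windows that gave the minimum RMSEP.
    k = min(range(len(history)), key=history.__getitem__)
    cols = [c for s in fixed[:k + 1] for c in range(s, s + width)]
    return history, cols
```

The RMSEP history returned here is the kind of curve plotted against the number of windows; the window count at its minimum defines the selected variables.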

Let us consider in detail the use of scmwiPLS for calibration of the Mn concentration. The full-range PLS model reaches the minimum RMSEP = 0.12% with four latent variables; the corresponding calibration (training) error is RMSEC = 0.14%. The RPD is 4.7 for the training set (RPDC) and 1.8 for the test set (RPDP). This drop in RPD is due to the narrower range of Mn concentrations in the test set, which reduces the standard deviation in the numerator of RPD.

In the scmwiPLS method, spectral variables for calibration with four latent variables are selected using windows five variables wide. The initial number of variables (3630) is divisible by five, so it need not be reduced. Figure 4 shows RMSEP as a function of the number of spectral windows included in the PLS model. The minimum root-mean-square error corresponds to 19 windows, or 95 variables. The positions of the selected spectral variables on the emission spectrum of reference sample 123 are shown in Fig. 5. In contrast to the C calibration, most of the selected variables lie in the region of intense emission lines. The same pattern was observed for the other calibrated elements except C, for which scmwiPLS, like the two previous methods, selects spectral variables outside the region of intense emission lines.

Fig. 4. Root-mean-square error of prediction (RMSEP) of the Mn concentration in the test set of low-alloy steels as a function of the number of spectral windows in the partial least squares model with selection of spectral variables by searching combination moving window.

Fig. 5. Emission spectrum of reference sample 123 with the spectral variables used for calibration of the Mn concentration by the scmwiPLS method indicated.

The calibration of the Mn concentration by the scmwiPLS method is characterized by the following quality parameters: RMSEC = 0.15%, RMSEP = 0.04%, RPDC = 4.4, and RPDP = 5.2. Thus, the developed multivariate model is quantitative for both sets.

Table 3 lists the characteristics of the scmwiPLS models for all six calibrated chemical elements.

Table 3. Characteristics of the multivariate scmwiPLS models for calibration of concentrations of six chemical elements in low-alloy steels by emission spectra

CONCLUSIONS

The use of spectral variable selection methods makes it possible to improve the quality of multivariate models for calibrating the concentrations of the main technological impurities and dopants in low-alloy steels from laser-induced breakdown spectroscopy data. Calibration by the partial least squares method with selection of spectral variables by a searching combination moving window whose width exceeds the number of latent variables by one yields quantitative models for C (0.13–0.43%), Mn (0.47–1.15%), Si (0.15–0.33%), Cr (0.09–0.43%), and Ni (0.05–0.25%); for Cu (0.06–0.26%), the model is only qualitative.