1 Introduction

Potassium is a key element for plant growth. It participates in the activation and synthesis of enzymes for various organic compounds (carbohydrates and proteins) and controls the transport of water and nutrients in plants [1]. The potassium concentration of an agricultural field can be maintained by fertilization, for example with potash. The potash mining industry is therefore required to accurately determine and control the potassium concentration at different production stages to optimize the production process and evaluate the quality of the final product [2]. Existing methods often consist of manual sample collection followed by determination in an onsite laboratory, which provides analysis data at a typical time interval on the order of one hour. In the framework of production process automation, online and automatic characterization of processed materials in the potash production chain is expected to be developed. Among elemental analytical techniques that offer online detection capability, prompt gamma neutron activation analysis (PGNAA) might be considered, but this technique suffers from disadvantages related to radiation safety and low sensitivity [3]. X-ray fluorescence (XRF) may be used in online mode with adequate sensitivity for heavy metals [4], but it becomes less sensitive for light elements such as potassium. Here again, the use of ionizing radiation would limit the application of XRF for industrial online analysis.

Laser-induced breakdown spectroscopy (LIBS) presents unique features that offer an excellent online detection and analysis capability [5]. Recent developments of this technique have led to many industrial applications, especially for the inspection and analysis of materials on conveyor belts [6]. Some university laboratories and companies have proposed and tested the use of LIBS for online characterization of potash. Its feasibility was evaluated in comparison with PGNAA and XRF, using a laboratory-scale conveyor belt system and 51 samples collected from production sites in Russia, Belarus and Israel without any additional pretreatment. The results demonstrated the potential of LIBS for online determination of all elements related to industrial process control in potassium [7]. Although analysis of potash with LIBS does not represent an especially difficult task, an accurate determination of the potassium concentration requires some effort. One of the difficulties comes from the high concentration of potassium in potash, combined with the fact that the potassium lines detected in a typical LIBS spectrum are resonance lines or lines whose lower state has a low energy. These lines are strongly affected by self-absorption [8]. It is therefore necessary to correct for self-absorption to obtain a better analytical performance in the determination of potassium in potash.

The goal of this work is to efficiently correct the self-absorption of the potassium lines detected in a LIBS spectrum of potash in order to improve the analysis accuracy. Note that the method used for self-absorption correction was developed previously and has been reported in various applications [9,10,11,12,13,14]. In this work, we combine self-absorption correction with multivariate regression based on machine learning and show the improved performance of the models trained with self-absorption-corrected spectra. It is thus a step forward from our previous work on the simultaneous determination of potassium and water in potash, where the self-absorption of the potassium lines was not corrected [15]. The self-absorption correction is implemented on the basis of the relationship between the self-absorption coefficient \(SA\), the spectral line intensity and the line width of the lines concerned, by comparing a reference line without self-absorption with the potassium lines whose self-absorption is to be corrected [16]. The self-absorption-corrected spectra were then used for univariate and multivariate regressions, and we compare the analytical performances without and with self-absorption correction. For univariate models, the \({R}^{2}\) of the models based on the K I 404.4 nm and K I 693.9 nm lines increased from 0.4736 and 0.6229 to 0.8971 and 0.9307, respectively, after self-absorption correction. We further developed a multivariate model based on machine learning. We observed an improvement in feature selection, and then developed a regression model based on a back-propagation neural network (BPNN) [17,18,19,20,21]. The root mean squared error of calibration (\(RMSEC\)) of the model decreases from 0.37 wt.% without self-absorption correction to 0.22 wt.% after self-absorption correction. At the same time, the root mean squared error of prediction (\(RMSEP\)) decreases from 1.90 to 0.41 wt.%.

2 Experimental

2.1 Samples

Twenty-seven potash fertilizer samples were collected online from the Qarhan potassium salt deposit. The samples were white powders. The information provided with these samples included the concentrations of potassium, calcium, sodium, and magnesium, determined by laboratory chemical analysis in the potash production factory. In Table 1, we list only the potassium concentrations of the collected samples. During sample transportation, the water initially contained in the samples evaporated completely, leaving dry powders. During the experiment, the moisture of the samples was set by the humidity of the laboratory, which was constant. In the experiment, we focused on the correction of self-absorption for potassium lines to improve the quantitative analysis performance.

Table 1 Potassium concentrations of the samples tested in the experiment

2.2 Experimental setup

The experimental setup is shown in Fig. 1. The ablation source was a Q-switched Nd:YAG laser (Beijing Guoke Laser Technology) operating at 1064 nm with a repetition rate of 6 Hz and delivering pulses of 10 ns duration and 100 mJ energy. The laser pulses were directed towards the potash powder by a high-reflection mirror and then focused by a lens of 400 mm focal length at normal incidence onto the surface of the powder. The powder was placed inside a circular groove delimited by walls on a disk rotating at a constant speed of 1 turn per second. The radius of the circular groove is 25 cm, which allows us to simulate a conveyor belt moving at about 1.57 m/s. During the measurement of a sample, the laser ablation dug a furrow in the powder in the middle of the circular groove, which changed the distance between the sample surface and the focusing lens. A piece of cardboard was therefore fixed above the groove with its bottom edge constantly sweeping the powder surface. By manually supplying the groove with powder upstream of the cardboard sweeping zone with respect to the rotation of the disk, the sample surface was kept almost flat and at a constant height during an experiment. The emission from the induced plasma was collected in the backward direction and collimated by the laser focusing lens, then directed by a dichroic mirror towards the entrance of an optical fiber after passing through a lens of 100 mm focal length. The fiber was connected to a spectrometer (MX2500+, Ocean Optics) with a spectral range from 200 nm to 1000 nm and a resolution of about 0.19 nm. For each sample, 2000 single-shot spectra were taken on the moving powder surface.
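The quoted belt speed follows from the groove radius and the rotation rate. The short sketch below reproduces that arithmetic and derives two quantities that are not stated in the text: the distance travelled by the powder between consecutive laser shots and the time needed to record one sample. Both derived values assume that the laser fires continuously at 6 Hz and that every shot yields one recorded spectrum; the function and variable names are illustrative only.

```python
import math


def rim_speed(radius_m: float, turns_per_s: float) -> float:
    """Linear speed (m/s) of a point moving on a circle of the given radius."""
    return 2.0 * math.pi * radius_m * turns_per_s


LASER_RATE_HZ = 6.0      # repetition rate quoted in the text
SHOTS_PER_SAMPLE = 2000  # single-shot spectra recorded per sample

speed = rim_speed(radius_m=0.25, turns_per_s=1.0)
# Assumption: continuous firing, one spectrum per shot.
shot_spacing_m = speed / LASER_RATE_HZ            # powder displacement between shots
acquisition_s = SHOTS_PER_SAMPLE / LASER_RATE_HZ  # time to record one sample

print(f"simulated belt speed: {speed:.2f} m/s")
print(f"spacing between shots: {shot_spacing_m:.3f} m")
print(f"acquisition time per sample: {acquisition_s:.0f} s")
```

Under these assumptions each shot samples fresh powder, since consecutive ablation sites are separated by roughly a quarter of a metre along the groove.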

Fig. 1 Experimental setup

3 Spectral data treatment method

Figure 2 shows the main steps of the spectrum treatment procedure, which includes spectrum averaging, baseline correction, self-absorption correction, univariate regression and multivariate regression. The subsections below give some details for each step of the data treatment procedure.

Fig. 2 General flowchart of the spectral data treatment procedure

3.1 Spectrum pretreatment: baseline correction, denoising and normalization

In the pretreatment procedure, raw spectra were first averaged in groups of 10, leading to 200 average replicate spectra for each sample. Baseline correction and denoising were then applied to each average replicate spectrum of all the samples. For this operation, wavelet packet decomposition [16] was performed on the average replicate spectra. A spectrum is decomposed into a linear combination of wavelet packets in a given wavelet system, which projects the spectrum onto a space of orthogonal basis functions. Such a wavelet packet transform is equivalent to applying filters with the same bandwidth but different center frequencies to the original spectrum. An algorithm minimizes an error function to determine the decomposition coefficients corresponding to the optimized decomposition of the spectrum. The wavelet packets at low frequencies, which correspond to the baseline of the spectrum, were removed. The principle of wavelet packet denoising is similar to that of baseline correction: once the wavelet packet decomposition coefficients have been determined, a threshold is applied to eliminate the noise part of the spectrum, and the denoised signal is finally reconstructed from the processed coefficients. Figure 3 shows an average spectrum before (Fig. 3a) and after (Fig. 3b) baseline correction and denoising. In Fig. 3a, a continuum appears in the spectrum, and this continuum presents discontinuities because of the junctions between the spectrometers used, which cover different spectral ranges. In addition, the variation of the powder sample surface with respect to the laser focus causes the spectral intensity to fluctuate. After treatment of the full spectrum with the wavelet packet transform, the quality of the spectrum is significantly improved, as shown in Fig. 3b.

Identification of the characteristic lines within the spectral range using the NIST Atomic Spectroscopy Database reveals the main potassium emission lines, including the K I 404.4 nm, K I 404.7 nm, K I 691.1 nm, K I 693.9 nm, K I 766.5 nm and K I 769.9 nm lines. The insets of Fig. 3 show that, due to the high concentration of potassium in potash, there is a pronounced self-absorption effect in the potassium lines. The K I 766.5 nm and K I 769.9 nm lines are even self-reversed. The other two groups of lines (K I 404.4 nm and K I 404.7 nm; K I 691.1 nm and K I 693.9 nm) are clearly self-absorbed. The solution adopted in this work was to first correct the self-absorption of these two groups of K lines before proceeding to spectrochemical analysis with univariate and multivariate regressions. Before the regressions, the spectra were normalized by the total spectral intensity in order to reduce the spectral fluctuations caused by the variation of experimental conditions, which is especially useful when analyzing moving materials as in the present experiment. Together, these operations generated the pretreated spectra.
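Two of the pretreatment steps described above, averaging in groups of 10 and normalization by the total spectral intensity, can be sketched in a few lines. The sketch below is a minimal illustration on synthetic data, not the processing code used in this work; the wavelet packet baseline correction and denoising are deliberately left out, and the function names are hypothetical.

```python
def average_in_groups(spectra, group_size=10):
    """Average consecutive groups of spectra channel by channel.

    Trailing spectra that do not fill a complete group are dropped.
    """
    averaged = []
    for start in range(0, len(spectra) - group_size + 1, group_size):
        group = spectra[start:start + group_size]
        averaged.append([sum(channel) / group_size for channel in zip(*group)])
    return averaged


def normalize_by_total(spectrum):
    """Divide every channel by the summed intensity of the spectrum."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("cannot normalize a spectrum with zero total intensity")
    return [value / total for value in spectrum]


# Synthetic stand-in data: 20 "single-shot" spectra with 3 channels each,
# whose overall intensity fluctuates from shot to shot.
shots = [[1.0 * i, 2.0 * i, 3.0 * i] for i in range(1, 21)]

replicates = average_in_groups(shots, group_size=10)   # 2 average replicates
normalized = [normalize_by_total(s) for s in replicates]
```

With 2000 single-shot spectra and a group size of 10, the same call would return the 200 average replicate spectra per sample mentioned in the text. In the synthetic example the two replicates differ in overall intensity but become identical after normalization, which is the kind of fluctuation the normalization is meant to suppress.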

Fig. 3 a Average LIBS spectrum without baseline correction, and b the same spectrum after baseline correction. Insets show the main emission lines from potassium. These lines are clearly self-absorbed, with self-reversal for the K I 766.5 nm and K I 769.9 nm lines

3.2 Self-absorption correction

The model used in this work for self-absorption correction was initially proposed and described in Ref. [17]. In this model, the actual intensity of a spectral line at a peak wavelength \({\lambda }_{0}\) emitted from an optically thick plasma, \(I\left({\lambda }_{0}\right)\), is related to its intensity from the corresponding optically thin plasma in the absence of self-absorption, \({I}_{0}({\lambda }_{0})\), through the self-absorption coefficient \(SA\) [25]:

$${SA}=\frac{I({\lambda }_{0})}{{I}_{0}({\lambda }_{0})} .$$
(1)

In this work, \(I({\lambda }_{0})\) and \({I}_{0}({\lambda }_{0})\) correspond respectively to the peak intensities with and without self-absorption of the two groups of K I lines, around 404.5 nm and around 691.1 nm and 693.9 nm. The K I 766.49 nm and K I 769.90 nm lines presented self-reversal and were not suitable for treatment with the standard self-absorption correction procedure. The calculations presented in Ref. [17] lead to a practical formula for evaluating \({SA}\):

$${SA}={\left(\frac{\Delta \lambda }{2{w}_{s }}\frac{1}{{n}_{e}}\right)}^{\frac{1}{\alpha }} ,$$
(2)

where \(\Delta \lambda\) is the full width at half maximum (FWHM) of a self-absorbed line, \({w}_{s}\) is the Stark coefficient of the line (available through the Stark-b database) [18], \(\alpha =-0.54\) is an empirical parameter [17], and \({n}_{e}\) is the electron number density. Owing to its sensitivity to the linear Stark effect, the hydrogen Hα line is commonly used to determine \({n}_{e}\) [24]. In our case, however, the Hα line was too weak in the spectra for \({n}_{e}\) to be extracted precisely. We therefore needed to find another line with negligible self-absorption to evaluate the electron number density in the plasma. Due to the purity of the powder analyzed, the spectra presented relatively few lines, and after testing several available lines, we found that the Ca I 422.68 nm line has suitable characteristics for a precise determination of \({n}_{e}\). The low calcium concentration (0.4–2.7 wt.%) ensured negligible self-absorption. For such a line, \({n}_{e}\) is related to the FWHM of the line, \(\Delta {\lambda }_{\mathrm{Ca}}\), by [17, 26]:

$$\Delta {\lambda }_{\mathrm{Ca}}=2{w}_{\mathrm{Ca}}{n}_{e} ,$$
(3)

where \({w}_{\mathrm{Ca}}\) is the Stark coefficient of the Ca I 422.68 nm line. Finally, the self-absorption coefficients for the K I lines were obtained with:

$${SA}={\left(\frac{\Delta {\lambda }_{K}}{{w}_{K }}\frac{{w}_{\mathrm{Ca} }}{\Delta {\lambda }_{\mathrm{Ca}}}\right)}^{\frac{1}{\alpha }} ,$$
(4)

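where \(\Delta {\lambda }_{K}\) and \({w}_{K}\) refer respectively to the FWHM and the Stark coefficient of the K I lines. Equation (4) follows from substituting \({n}_{e}=\Delta {\lambda }_{\mathrm{Ca}}/(2{w}_{\mathrm{Ca}})\), obtained from Eq. (3), into Eq. (2). Once \({SA}\) is obtained, the intensity of a potassium line corrected for self-absorption is calculated using Eq. (1).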
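Equations (1) and (4) translate directly into a short calculation. The sketch below is a minimal illustration under stated assumptions: the line widths and Stark coefficients are made-up numbers chosen only to exercise the formulas (they are not measured or tabulated values), both Stark coefficients are assumed to be quoted at the same reference electron density so that it cancels in Eq. (4), and the function names are hypothetical.

```python
ALPHA = -0.54  # empirical exponent quoted in the text


def self_absorption_coefficient(fwhm_k, w_k, fwhm_ca, w_ca, alpha=ALPHA):
    """Eq. (4): SA from the K line width and the Ca reference line width.

    All widths and Stark coefficients must share the same wavelength unit.
    """
    ratio = (fwhm_k / w_k) * (w_ca / fwhm_ca)
    return ratio ** (1.0 / alpha)


def correct_intensity(observed_intensity, sa):
    """Eq. (1) rearranged: I0 = I / SA."""
    return observed_intensity / sa


# Hypothetical inputs (nm), for illustration only.
sa = self_absorption_coefficient(fwhm_k=0.30, w_k=0.010,
                                 fwhm_ca=0.20, w_ca=0.010)
corrected = correct_intensity(observed_intensity=1000.0, sa=sa)

print(f"SA = {sa:.3f}, corrected intensity = {corrected:.0f}")
```

Because \(\alpha\) is negative, a K line that is broader than its Stark width alone would predict gives \(SA<1\), so the corrected intensity is larger than the observed one. A line whose width matches the Stark prediction gives \(SA=1\) and is left unchanged.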

Figure 4 shows the results of self-absorption correction by comparing a pretreated spectrum without and with correction for the two groups of K I lines, around 404.5 nm and around 691.1 nm and 693.9 nm. We can see in Fig. 4 that the intensities of the lines increase after correction, as expected from the removal of self-absorption. In addition, in Fig. 4b, the intensity ratio of the two lines after correction becomes very close to its theoretical value of 1:2.2, which supports the validity of the self-absorption correction procedure used.

Fig. 4 Detailed views of a pretreated spectrum without (in black) and with (in blue) self-absorption correction for the two groups of K I lines, around 404.5 nm and around 691.1 nm and 693.9 nm

3.3 Spectral data treatment with univariate and multivariate regressions

For the spectral data treatment, we first performed a classical univariate regression. For each sample, the 200 average spectra were further averaged into a global mean spectrum, together with the associated standard deviation. The mean spectral intensities of all the samples were then fitted with a linear regression optimized by least squares. This regression was performed for the spectral intensities before and after self-absorption correction.
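The univariate calibration described above is an ordinary least-squares line through the points (concentration, mean line intensity), assessed by its determination coefficient. A minimal sketch follows; the data are synthetic and the function names are illustrative, not taken from the processing code of this work.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Determination coefficient of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic example: concentrations (wt.%) and mean line intensities (a.u.).
concentration = [80.0, 85.0, 90.0, 95.0]
intensity = [0.41, 0.44, 0.45, 0.49]

slope, intercept = fit_line(concentration, intensity)
r2 = r_squared(concentration, intensity, slope, intercept)
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

Running the same two functions on the line intensities before and after self-absorption correction gives the pair of \({R}^{2}\) values that the comparison relies on.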

A multivariate regression was then implemented according to the flowchart shown in Fig. 5. Five samples (S1, S4, S19, S22 and S26) were selected as test samples and isolated from the rest, which were used as model training samples. The potassium concentrations of the test samples were distributed uniformly over the range covered by all the samples, to ensure that they were statistically represented by the training samples. Pretreated spectra without and with self-absorption correction were used separately for regression model training and testing, allowing us to compare the performances. A further standardization was applied to all the pretreated spectra, without and with self-absorption correction. It consisted of a simple operation which linearly transformed the intensity range of a spectrum into the interval between 0 and 1. A feature selection algorithm was then applied to the standardized pretreated spectra of the training samples. SelectKBest (SKB) was used in combination with a covariance calculation, for a given spectral channel, between the spectral intensities and the potassium concentrations of the corresponding samples, thereby assigning a score to each spectral channel [19]. In this work, the 150 highest-ranked pixels were selected according to their scores. This number of spectral channels corresponds to the criterion that the Pearson's correlation coefficient be greater than 0.75.
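The standardization and channel-scoring steps can be illustrated with a small, library-free sketch. It does not call the SelectKBest implementation named in the text; it only mimics the idea of scoring each spectral channel by its Pearson correlation with the potassium concentration and keeping the k highest-ranked channels. Ranking by the signed coefficient (rather than its absolute value) is an assumption made here, and all names and data are illustrative.

```python
import math


def min_max_scale(spectrum):
    """Linearly map the intensities of one spectrum onto [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        raise ValueError("flat spectrum cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in spectrum]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_channels(spectra, concentrations, k):
    """Return the indices of the k best-correlated channels and all scores."""
    scores = [pearson(channel, concentrations) for channel in zip(*spectra)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Synthetic training set: 4 spectra with 3 channels, and their concentrations.
train_spectra = [[1.0, 5.0, 3.0], [2.0, 5.0, 1.0], [3.0, 5.0, 2.0], [4.0, 5.0, 0.5]]
train_conc = [10.0, 20.0, 30.0, 40.0]

selected, scores = select_channels(train_spectra, train_conc, k=1)
```

In this toy example only the first channel follows the concentration, so it is the one retained. The same channel indices would then be reused to extract the features from the test spectra, as described in the text.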

Fig. 5 Flowchart of the training process of the multivariate regression model

The standardized pretreated spectra of the training samples, restricted to the spectral intensities of the selected channels, were fed into a back-propagation neural network as the input variables for training. The network consisted of three layers: an input layer with a number of neurons corresponding to the selected spectral features, a hidden layer with five neurons and an output layer with a single neuron delivering the output concentration. The training algorithm involved gradient descent optimization and iterative cross-validation, as described in detail in our previous publications [19,20,21,22,23]. The cross-validation process generated parameters assessing the calibration performances of the model, including the determination coefficient \({R}^{2}\), the relative error of calibration (\({REC}\)) and the root mean squared error of calibration (\({RMSEC}\)). The trained model was then tested using the spectra of the test samples, which were not involved in the training process. The spectral channels selected for the training samples were used to identify the spectral features in each standardized pretreated spectrum of all the test samples. The spectral intensities of the identified channels in the standardized pretreated spectra of the test samples were fed into the trained model to assess its prediction performances with spectra from independent samples. This resulted in prediction assessment parameters, including the relative error of prediction (\({REP}\)), the root mean squared error of prediction (\({RMSEP}\)), and the average relative standard deviation of the predicted concentrations (\({RSD}\)). The detailed definitions of the above-mentioned analytical performance assessment parameters can be found in Ref. [19].
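The three-layer architecture described above (selected channels as inputs, five hidden neurons, one output neuron) can be sketched as a small network trained by batch gradient descent with back-propagation. This is a toy illustration rather than the model used in this work: the tanh hidden activation, the linear output, the weight initialization, the learning rate and the number of epochs are all assumptions, the cross-validation loop is omitted, and the data are synthetic with targets scaled to the interval between 0 and 1.

```python
import math
import random


class TinyBPNN:
    """One tanh hidden layer and one linear output neuron (illustrative)."""

    def __init__(self, n_inputs, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def mse(self, X, y):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

    def train(self, X, y, epochs=500, lr=0.05):
        """Batch gradient descent on half the mean squared error."""
        n = len(X)
        for _ in range(epochs):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = 0.0
            for x, t in zip(X, y):
                h = self._hidden(x)
                err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - t
                gb2 += err
                for j, hj in enumerate(h):
                    gw2[j] += err * hj
                    # Back-propagate through the tanh unit: d tanh = 1 - h^2.
                    delta = err * self.w2[j] * (1.0 - hj * hj)
                    gb1[j] += delta
                    for i, xi in enumerate(x):
                        gw1[j][i] += delta * xi
            step = lr / n
            self.b2 -= step * gb2
            for j in range(len(self.w2)):
                self.w2[j] -= step * gw2[j]
                self.b1[j] -= step * gb1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= step * gw1[j][i]


# Synthetic data: 20 "spectra" with 2 selected channels, targets in [0, 1].
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(20)]
y = [0.3 * a + 0.6 * b for a, b in X]

net = TinyBPNN(n_inputs=2, n_hidden=5, seed=0)
loss_before = net.mse(X, y)
net.train(X, y)
loss_after = net.mse(X, y)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In practice the number of inputs would equal the number of selected channels (150 in this work), and the predicted values would be rescaled back to concentrations in wt.% before computing the assessment parameters.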

4 Results and discussions

4.1 Univariate calibration models

The results obtained with univariate regression are shown in Fig. 6: without self-absorption correction in Fig. 6a, b for the K I 404.4 nm and K I 693.9 nm lines respectively, and with self-absorption correction in Fig. 6c, d for the same two lines. The error bars in the figures correspond to the standard deviations (\(\pm {SD}\)) over the 200 average replicate spectra per sample. Without self-absorption correction, the \({R}^{2}\) values of the linear fits for the two lines are very low, 0.4736 and 0.6229, showing the influence of self-absorption. With self-absorption correction, the results are much improved, as shown in Fig. 6c, d, with \({R}^{2}\) values increased to 0.8971 and 0.9307 for the two K I lines respectively.

Fig. 6 Results of univariate regression before self-absorption correction (a and b) for the two K I lines at 404.4 nm and 693.9 nm, and after self-absorption correction (c and d) for the same two lines. Error bars on the intensities are standard deviations (\(\pm {SD}\)) over the 200 average replicate spectra per sample

4.2 Multivariate calibration models

4.2.1 Feature selection

The results of feature selection are shown in Fig. 7. Figure 7a, b show the results without self-absorption correction for the K I 404.4 nm line (a) and the group of K I 691.1 nm and K I 693.9 nm lines (b), while Fig. 7c, d show the results with self-absorption correction for the same lines. In the upper part of each subgraph, the lines concerned are shown with an emphasis on the selected spectral channels. In the lower part of each subgraph, the Pearson’s coefficients calculated for the spectral intensities of a given channel over all the spectra of all the samples are shown, to indicate the degree of correlation between the intensities of the channel and the K concentrations of the corresponding samples. We can see that the spectral channels within the profiles of the K I lines yield high Pearson’s coefficients. A closer inspection reveals details of the distribution of the coefficients. In particular, before self-absorption correction, the Pearson’s coefficient distributions present a dip around the peak of an emission line. This corresponds to a loss of correlation between the spectral intensities and the K concentrations of the corresponding samples, due to self-absorption occurring especially around the peak of an emission line. Such a dip can be observed for the three K I lines concerned, around 404.4 nm, 691.1 nm and 693.9 nm, as shown in Fig. 7a, b. In addition, the dip depth increases with the intensity of the line, showing a heavier self-absorption for a stronger line. After self-absorption correction (Fig. 7c, d), we observe a clear change in the behavior of the Pearson’s coefficient distribution around the peaks of the lines. Coefficient peaks appear (instead of dips) at the positions of the emission line peaks, indicating that the self-absorption correction restores strong correlations between the spectral channels around the emission peaks and the K concentrations of the corresponding samples. In addition, the maximal values of the Pearson’s coefficients also increase with self-absorption correction, showing its effectiveness. The self-absorption correction method used in this work thus improves the performance of feature selection and clearly improves the quality of the spectra.

Fig. 7 Results of feature selection with a comparison between the cases before and after self-absorption correction. In the upper part of each subgraph, the K I lines are shown with an emphasis on the selected spectral channels. In the lower part of each subgraph, the Pearson’s coefficients are presented for the selected spectral channels. a K I 404.4 nm line without self-absorption correction; b group of K I 691.1 nm and K I 693.9 nm lines without self-absorption correction; c and d same lines with self-absorption correction

4.2.2 Calibration and test of the multivariate model

Figure 8 shows the results of the multivariate models with the spectral data before (Fig. 8a) and after (Fig. 8b) self-absorption correction. Each subgraph shows the ground-truth diagonal, the training data with their linear fit, and the test data. The parameters assessing the analytical performances of the models are presented in Table 2.

Fig. 8 Multivariate models trained with spectral data before self-absorption correction (a), and after self-absorption correction (b), with ground truth diagonals in green dashed lines, training data (black circles) with their linear fittings (black lines) and corresponding \({R}^{2}\) values, as well as test data in red crosses. Error bars on the intensities are standard deviations (\(\pm {SD}\)) over the 200 average replicate spectra per sample

Table 2 Parameters showing the performances of the multivariate regression models

We can see first that the multivariate models perform much better than the univariate models, both without and with self-absorption correction. This result agrees well with those reported in our previous work [15], showing the effectiveness of a multivariate model based on machine learning in correcting matrix effects and experimental fluctuations. The improvements brought by self-absorption correction are especially worth noting. As shown in Fig. 8 and especially in Table 2, the calibration performances are improved, with better \({R}^{2}\), \({REC}\) and \({RMSEC}\). The improvements in the prediction performances are even larger, with much reduced \({REP}\), \({RMSEP}\) and \({RSD}\). These parameters indicate a high-performance model, owing to the effectiveness of the self-absorption correction method used in this work. Such performances better fit the requirements of the targeted application of online and real-time analysis of potash on a product transportation belt in a production site.
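The assessment parameters discussed above can be computed from the reference and predicted concentrations. The exact definitions used in this work are given in Ref. [19] and are not reproduced in the text, so the sketch below relies on commonly used forms that should be read as assumptions: the root mean squared error, the mean absolute relative error in percent (for \({REC}\) or \({REP}\)), and the relative standard deviation of the replicate predictions averaged over samples (for \({RSD}\)). Function names and data are illustrative.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean squared error (same unit as the concentrations)."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def mean_relative_error_percent(reference, predicted):
    """Assumed form of REC/REP: mean of |predicted - reference| / reference, in %."""
    n = len(reference)
    return 100.0 * sum(abs(p - r) / r for r, p in zip(reference, predicted)) / n


def mean_rsd_percent(replicate_predictions):
    """Assumed form of RSD: per-sample SD / mean of replicate predictions, averaged, in %."""
    rsds = [statistics.stdev(reps) / statistics.mean(reps)
            for reps in replicate_predictions]
    return 100.0 * sum(rsds) / len(rsds)


# Synthetic example: two test samples, each with three replicate predictions.
reference = [90.0, 95.0]                                  # wt.%
replicates = [[89.6, 90.1, 90.3], [94.8, 95.4, 95.1]]     # wt.%
predicted = [statistics.mean(reps) for reps in replicates]

print(f"RMSEP = {rmse(reference, predicted):.3f} wt.%")
print(f"REP   = {mean_relative_error_percent(reference, predicted):.3f} %")
print(f"RSD   = {mean_rsd_percent(replicates):.3f} %")
```

Applying the first two functions to the training samples gives the calibration counterparts (\({RMSEC}\) and \({REC}\)) under the same assumed definitions.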

5 Conclusion

In this work, with an experimental setup simulating an online measurement of moving potash, we have focused on the effect of self-absorption correction of LIBS spectra on the precise determination of potassium in potash. Such a correction is especially important for the targeted application of online and real-time analysis of potash on a conveyor belt in a production factory, because of the high concentrations of potassium in potash, which reach the level of 95 wt.% in the final stage of production. The correction was performed on three intense potassium lines detected in the spectral range of the detection system and presenting severe self-absorption, according to the classical self-absorption coefficient (SA) method. The effectiveness of the method was first evaluated with a classical univariate regression. Although an improvement was observed, with the \({R}^{2}\) value increasing from 0.4736 to 0.8971 for the K I 404.4 nm line and from 0.6229 to 0.9307 for the K I 693.9 nm line, the performances of the univariate regression models left room for improvement. Multivariate regression was then performed on the basis of machine learning algorithms. The feature selection, necessary for a high-quality multivariate regression, shows a clear influence of self-absorption, with an anomalous behavior of the intensities of the channels around the peak of an emission line, whereas a suitable correction of self-absorption restores the expected correlation between the spectral intensities and the potassium concentrations of the corresponding samples. Features with better correlations with the potassium concentrations allow a regression model with improved calibration and, especially, prediction performances. The assessment parameters obtained for the multivariate regression model, \({REP}\) = 0.12%, \({RMSEP}\) = 0.41 wt.% and \({RSD}\) = 1.26%, represent a satisfactory performance that meets the requirements of the targeted application of online and real-time analysis of potash on a product conveyor belt in a production site, with regard to the specific requirements of the related current national standards.