1 Introduction

In recent years, with the acceleration of industrialization in China, heavy metal pollution of soil, especially by cadmium (Cd), has become increasingly serious [1]. Cd is listed as a priority environmental pollutant by the United States Environmental Protection Agency and is suspected to be a carcinogen [2]. Cd is highly toxic and accumulates in the human body through the food chain, threatening human health; incidents such as “Itai-itai disease” in Japan [3] and cadmium poisoning in Guangxi, China [4] have had a serious impact on human health. It is therefore vital to monitor Cd in soil.

Conventional methods for soil analysis include atomic absorption spectroscopy (AAS) [5] and inductively coupled plasma mass spectrometry (ICP-MS) [7]. Although these methods can accurately measure trace elements in soil, they are time-consuming, the instruments are very expensive to set up, and, more importantly, they can only be used in the laboratory. A new and rapid detection method is therefore urgently needed. Laser-induced breakdown spectroscopy (LIBS) has attracted considerable attention because of its rapid and convenient detection and simple sample preparation [8]. LIBS is an atomic emission spectroscopy technique: a focused laser pulse creates a plasma on the sample, and the emission of that plasma is analyzed. It has developed rapidly in recent years and is widely used in fields such as environmental monitoring [9], biomedical science [10], nuclear industry analysis [11], identification of antiques and artworks [12], and soil testing [14].

LIBS for soil testing has developed rapidly, but soil contains a very wide range of elements, its matrix is complex [16], its physical and chemical properties vary, and Cd is naturally present only at trace levels [17]. Together these factors give LIBS poor accuracy and low repeatability. To address this, Liu built univariate and multivariate quantitative analysis models to study the signal enhancement of Cd in different gas environments [18], and Sirven combined neural network analysis with LIBS, achieving a prediction accuracy of about 5% [19]. However, multivariate quantitative analysis is complex and time-consuming because it requires calculating theoretical values for many variables. A better way to improve the detection of Cd in soil is therefore still needed.

This work applies a concise analysis algorithm, maximum likelihood (ML) estimation based on the Lomakin–Scheibe formula, to LIBS; it improves the analytical capability of LIBS by correcting the self-absorption effect. Compared with the univariate calibration model, the concentrations predicted by ML estimation are more stable. Comparison with the self-absorption calculated from the half-widths of the analytical line [20] shows that the ML method can correct the self-absorption effect.

2 Experimental setup

2.1 Optical detection system

A Nd:YAG laser (Radium Optoelectronics, Dawa-300) with a center wavelength of 1064 nm, a pulse width of < 7 ns, and a repetition rate of 1 Hz was used. The plasma emission was collected with a lens (focal length: 8 mm), coupled into an optical fiber, and analyzed with a spectrometer (Avantes, AvaSpec-ULS2048CL-7-EVO-RM; measurement range: 200–780 nm; resolution: 0.09–0.13 nm). The timing of the laser and the spectrometer was controlled by a DG645 digital delay generator (USA) with picosecond-level precision. The sample was mounted on a rotary stage (Daheng Optoelectronics) driven by a stepper motor. A beam splitter (Wuhan Youguang Technology, PBP 1125-633) directed a portion of the laser beam into an Israeli NAVAII laser power meter detector for real-time energy monitoring (Fig. 1).

Fig. 1 LIBS experimental schematic

2.2 Sample preparation

Samples were prepared by setting the ratio of the mass of Cd to the total sample mass, following the sample preparation procedure in the soil technical analysis specification [22]. The procedure was as follows. The soil, taken from pastoral land around the school, was air-dried naturally and passed through a 100-mesh sieve. Soil particles (9.9999 g) were weighed with a high-precision electronic balance (FA1204B; weighing range: 0–120 g; precision: 0.1 mg) and spiked with 0.1–1.2 ml of a 1000 μg/g Cd standard solution (GBW(E)08612) using a pipette. Then 10 ml of deionized water was added, and the solution and soil particles were mixed thoroughly with a magnetic stirrer (model: C-MAG HS7). The mixture was dried in an oven at 353 K for 4 h until the water had evaporated completely. The dried sample was weighed three times on the balance to confirm that its mass was 10 g. Finally, the soil sample was pressed into a disc-shaped pellet with a tablet press at 2000 MPa. The Cd concentrations of the resulting gradient samples are listed in Table 1; samples #C1–#C9 form the calibration set and samples #VC1–#VC3 form the validation set.
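The nominal spiked concentration follows from simple mass arithmetic. The sketch below assumes that the standard solution has a density of about 1 g/ml (so 1 ml carries roughly 1000 μg of Cd), that the final sample mass is 10 g, and that the native Cd already present in the soil is ignored; the function name and defaults are hypothetical, not part of the original procedure.

```python
def spiked_cd_concentration(volume_ml, stock_ug_per_g=1000.0,
                            solution_density_g_per_ml=1.0, sample_mass_g=10.0):
    """Nominal Cd concentration (ug/g) added to the soil by a spike.

    Assumptions (illustrative): solution density ~1 g/ml, fixed final
    sample mass, and no correction for Cd natively present in the soil.
    """
    solution_mass_g = volume_ml * solution_density_g_per_ml
    cd_mass_ug = solution_mass_g * stock_ug_per_g   # ug of Cd in the aliquot
    return cd_mass_ug / sample_mass_g               # ug of Cd per g of sample


for v in (0.1, 1.2):
    print(f"{v} ml -> {spiked_cd_concentration(v):.1f} ug/g")
```

Under these assumptions, the ends of the 0.1–1.2 ml range correspond to nominal spikes of about 10 μg/g and 120 μg/g.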

Table 1 Reference content of Cd in experimental samples

3 Results and discussion

3.1 Characteristic line selection

To obtain the characteristic peaks of pure Cd, a 1000 μg/g Cd solution was deposited on a glass substrate and baked in an oven at 333 K for 6 h; the resulting spectrum is shown in Fig. 2a. Cd lines appear at Cd II 214.41 nm, Cd II 226.50 nm, and Cd II 228.80 nm, all with high intensity and little interference. In soil samples, however, the Cd II 226.50 nm and Cd II 228.80 nm lines suffer serious interference, and the overlapping characteristic peaks of different elements make them difficult to distinguish (Fig. 2b). This can be attributed to the rich elemental content of soil. Therefore, the Cd II 214.41 nm line was selected for this experiment.

Fig. 2 (a) Pure Cd lines on the glass substrate in the spectral range 200–230 nm. (b) Cd lines in soil samples in the spectral range 200–230 nm

4 Conventional univariate calibration model

In this section, a linear calibration curve is constructed from the intensity of the Cd II 214.41 nm line. The linear regression of line intensity against Cd concentration in soil is shown in Fig. 3, and the evaluation of the model is summarized in Table 2. Four indices are used to evaluate the accuracy and precision of the models: R², the coefficient of determination; RMSEC, the root mean square error of calibration; RMSEP, the root mean square error of prediction; and RSD, the relative standard deviation of the predicted concentrations for the validation set. The limit of detection is calculated as LOD = 3σ/k, where σ is the standard deviation of 20 background (noise-level) points around the characteristic peak and k is the slope of the calibration curve. The LOD obtained by this method is 14.9 μg/g.
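The indices above can be computed as in the following sketch. It assumes that RMSEC and RMSEP divide by the number of samples n (some authors use n − 1 or n − 2), that RSD is taken over repeated predictions of one validation sample, and that σ in the LOD is the sample standard deviation (ddof = 1) of the background points; all function names are hypothetical.

```python
import numpy as np


def fit_linear_calibration(conc, intensity):
    """Least-squares calibration line I = k*C + c0; returns (k, c0)."""
    k, c0 = np.polyfit(np.asarray(conc, float), np.asarray(intensity, float), 1)
    return float(k), float(c0)


def predict_linear(intensity, k, c0):
    """Invert the calibration line to obtain concentration from intensity."""
    return (np.asarray(intensity, float) - c0) / k


def r_squared(y, y_fit):
    """Coefficient of determination of a fit."""
    y, y_fit = np.asarray(y, float), np.asarray(y_fit, float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(c_ref, c_pred):
    """RMSEC on the calibration set or RMSEP on the validation set (divides by n)."""
    c_ref, c_pred = np.asarray(c_ref, float), np.asarray(c_pred, float)
    return float(np.sqrt(np.mean((c_ref - c_pred) ** 2)))


def rsd_percent(replicate_predictions):
    """Relative standard deviation (%) of repeated predictions for one sample."""
    p = np.asarray(replicate_predictions, float)
    return float(100.0 * p.std(ddof=1) / p.mean())


def limit_of_detection(background, k):
    """LOD = 3*sigma/k, with sigma from background points near the peak."""
    return float(3.0 * np.std(np.asarray(background, float), ddof=1) / k)
```

In use, the line is fitted on #C1–#C9, `rmse` is applied once to the calibration predictions (RMSEC) and once to the #VC1–#VC3 predictions (RMSEP), and `limit_of_detection` takes the 20 background readings together with the fitted slope k.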

Fig. 3 Calibration curve of Cd in soil based on the univariate model

Table 2 Main evaluation indices of the ML and univariate models

5 Maximum likelihood model

Because of the many uncontrollable factors, repeated measurements of the same standard sample under identical experimental conditions are scattered. Sample #5 was measured 100 times, and the intensities are normally distributed, as shown in Fig. 4.

Fig. 4 Histogram of the intensity distribution of 100 measurements of sample #5

When the number of measurements is sufficiently large, the measured intensity I follows a normal distribution:

$$ I\sim N(\mu ,\sigma^{2} ) $$
(1)

where μ is the expectation of the random variable I and σ² is its variance.
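For a set of repeated intensity readings, such as the 100 measurements of sample #5, the maximum likelihood estimates of μ and σ² in Eq. (1) are the sample mean and the mean squared deviation (divided by n rather than n − 1). The sketch below, with hypothetical function names, computes them together with the normal log-likelihood, which can be used to check that the estimates are indeed a maximum.

```python
import numpy as np


def normal_ml_estimates(intensities):
    """ML estimates of mu and sigma^2 for I ~ N(mu, sigma^2), Eq. (1)."""
    x = np.asarray(intensities, dtype=float)
    mu_hat = x.mean()
    var_hat = np.mean((x - mu_hat) ** 2)  # ML estimator divides by n, not n - 1
    return float(mu_hat), float(var_hat)


def normal_log_likelihood(intensities, mu, var):
    """Log-likelihood of the readings under N(mu, var)."""
    x = np.asarray(intensities, dtype=float)
    n = x.size
    return float(-0.5 * n * np.log(2.0 * np.pi * var)
                 - np.sum((x - mu) ** 2) / (2.0 * var))
```

Moving either estimate away from its ML value lowers the log-likelihood; the unbiased variance `np.var(x, ddof=1)` differs from the ML value only by the factor n/(n − 1), which is negligible for 100 readings.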

The theoretical calibration curve is usually expressed by the Lomakin–Scheibe formula: under given experimental conditions and within a certain range of element content, the net intensity I of an element's emission line is related to its content C by

$$ I = aC^{b} $$
(2)

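where a is a constant determined by the experimental conditions and b is the self-absorption coefficient (b = 1 when self-absorption is negligible and b < 1 when it is significant). In the absence of systematic error, the expectation of the measured intensity I equals its true value, which by Eq. (2) is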

$$ \mu = aC^{b} $$
(3)

From Eqs. (1) and (3),

$$ I\sim N(aC^{b},\;\sigma^{2}) $$
(4)

Dividing I by the constant σ scales its mean by 1/σ and its variance by 1/σ², which gives

$$ \frac{I}{\sigma }\sim N(\frac{{aC^{b} }}{\sigma },1) $$
(5)

Suppose n standard samples are used in the experiment, with element contents C1, C2, …, Cn, corresponding measured intensities I1, I2, …, In, and corresponding standard deviations σ1, σ2, …, σn. In this paper, the maximum likelihood method is used to estimate the parameters a and b of the theoretical curve from the experimental points (Ci, Ii).

The likelihood function of the sample is

$$ L_{n}=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{I_{i}}{\sigma_{i}}-\frac{aC_{i}^{b}}{\sigma_{i}}\right)^{2}\right]=\frac{1}{(\sqrt{2\pi})^{n}}\exp\left[-\frac{1}{2}Q(a,b)\right] $$
(6)

where

$$ \text{Q(a,b)}=\sum\limits_{i=1}^{n}{\frac{1}{\sigma_{i}^{2}}}{{({{I}_{i}}-aC_{i}^{b})}^{2}} $$
(7)

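By Eq. (6), Ln is largest when Q(a, b) is smallest, so maximizing the likelihood is equivalent to minimizing Q(a, b), a weighted least-squares problem. Finally, Q(a, b) in Eq. (7) is minimized to the required precision using the mean intensity and variance of each of the 12 sample points.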
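Because Ln is largest when Q(a, b) is smallest, a and b can be obtained by weighted nonlinear least squares. The sketch below is one possible implementation, not necessarily the solver used in this work: it starts from an ordinary log-log regression (ln I = ln a + b ln C, which requires positive C and I) and refines the estimate with Gauss–Newton iterations on the weighted residuals (Ii − aCi^b)/σi. The function name and iteration settings are assumptions.

```python
import numpy as np


def fit_lomakin_scheibe(conc, intensity, sigma, n_iter=50, tol=1e-12):
    """Estimate a and b in I = a * C**b by minimizing Q(a, b) of Eq. (7).

    Returns (a, b, Q). Concentrations and intensities must be positive
    because the starting point comes from a log-log regression.
    """
    C = np.asarray(conc, dtype=float)
    I = np.asarray(intensity, dtype=float)
    s = np.asarray(sigma, dtype=float)

    # Starting point: ordinary least squares on ln I = ln a + b ln C
    b0, ln_a0 = np.polyfit(np.log(C), np.log(I), 1)
    params = np.array([np.exp(ln_a0), b0])

    for _ in range(n_iter):
        a, b = params
        model = a * C**b
        r = (I - model) / s  # weighted residuals; Q = sum(r**2)
        # Jacobian of the weighted model with respect to (a, b)
        J = np.column_stack([C**b, a * C**b * np.log(C)]) / s[:, None]
        step, *_ = np.linalg.lstsq(J, r, rcond=None)  # Gauss-Newton update
        params = params + step
        if np.max(np.abs(step)) <= tol * (1.0 + np.max(np.abs(params))):
            break

    a, b = params
    Q = float(np.sum(((I - a * C**b) / s) ** 2))
    return float(a), float(b), Q
```

With noise-free synthetic points generated from a = 50 and b = 0.8, the function recovers those values with Q ≈ 0. A library routine such as `scipy.optimize.curve_fit`, with `sigma` set to the σi, minimizes the same weighted sum of squares and could replace the hand-written loop.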

For ML estimation, each calibration data point was obtained from 50 laser shots. A pulse energy of 200 mJ was used to ensure strong self-absorption of Cd. All other experimental conditions were the same as for the univariate model. The error bars in Fig. 5 are the standard deviations of the intensities from four replicate measurements under the same experimental conditions. The LOD calculated by the method described in the previous section is 7.84 μg/g.
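Once a and b are known, the concentration of a validation sample follows by inverting Eq. (2). The sketch below also shows one way to form calibration points and error bars from replicate measurements; treating the error bar as the sample standard deviation (ddof = 1) of the replicates is an assumption, and the function names are hypothetical.

```python
import numpy as np


def summarize_replicates(replicates):
    """Mean intensity and error bar (sample std, ddof=1) over replicate rows."""
    r = np.asarray(replicates, dtype=float)  # shape (n_replicates, n_samples)
    return r.mean(axis=0), r.std(axis=0, ddof=1)


def predict_concentration(intensity, a, b):
    """Invert the fitted Lomakin-Scheibe curve: C = (I / a) ** (1 / b)."""
    return (np.asarray(intensity, dtype=float) / a) ** (1.0 / b)
```

The mean intensities and their standard deviations from `summarize_replicates` are the (Ii, σi) pairs entering Eq. (7); `predict_concentration` is then applied to the measured intensities of #VC1–#VC3.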

Fig. 5 Maximum likelihood estimation of Cd concentration in soil. The validation data are represented by cyan squares

6 Correction of self-absorption effect

This section studies how the overall self-absorption effect of the 12 samples changes with delay time and laser energy. The delay time starts at 0 ns and increases in steps of 500 ns. The self-absorption coefficient estimated by ML is compared with the theoretically calculated values, as shown in Fig. 6.

The relationship between the self-absorption coefficient SA and the full width at half maximum (FWHM) of the emission line is [21]:

$$ SA = \left( \frac{\Delta \lambda}{\Delta \lambda_{0}} \right)^{1/\alpha} = \left( \frac{\Delta \lambda}{2\omega_{s} n_{e}} \right)^{1/\alpha} $$
(8)
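Eq. (8) can be evaluated directly once the line width is measured. In the sketch below, the interpretation of the symbols is an assumption following the common form of this relation: Δλ is the measured FWHM (already corrected for instrumental broadening), ωs is the Stark (electron-impact) width parameter of the line, ne is the electron density expressed in units of 10^16 cm⁻³, and α defaults to −0.54. SA = 1 means no self-absorption, and SA < 1 indicates self-absorption. The function name is hypothetical.

```python
def self_absorption_coefficient(fwhm_nm, stark_w_nm, ne_1e16, alpha=-0.54):
    """Eq. (8): SA = (fwhm / (2 * w_s * n_e)) ** (1 / alpha).

    Assumes fwhm_nm is corrected for instrumental broadening and
    ne_1e16 is the electron density in units of 1e16 cm^-3.
    """
    if fwhm_nm <= 0 or stark_w_nm <= 0 or ne_1e16 <= 0:
        raise ValueError("widths and electron density must be positive")
    delta_lambda_0 = 2.0 * stark_w_nm * ne_1e16  # width expected without self-absorption
    return (fwhm_nm / delta_lambda_0) ** (1.0 / alpha)
```

For example, a line whose measured width is twice the expected Stark width gives SA = 2^(1/−0.54) ≈ 0.28; averaging such values over the 12 samples yields the theoretical curve that the ML estimate is compared against.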

The ML model is based on the nonlinear relationship between emission intensity and element concentration, and the self-absorption coefficient is obtained as part of the best parameter estimate. Because the ML estimate is an average over the 12 samples, the self-absorption coefficient calculated from the line width with Eq. (8) is also averaged over the 12 samples for comparison.

As shown in Fig. 6, the self-absorption effect gradually increases with increasing energy and delay time. The results of the two methods are consistent: under different delays and energies, the overall deviation is within 3%. These results indicate that the ML method is feasible for correcting the self-absorption effect.

Fig. 6 Self-absorption coefficient versus delay time: theoretical estimate (red line) and maximum likelihood estimation (blue line)

7 Discussion

The comparison between ML estimation and the conventional univariate model is summarized in Table 2. The ML model has a higher R² and lower RMSEC and RMSEP, and the difference between its RMSEC and RMSEP is smaller, which indicates better stability. In addition, the concentrations predicted by ML estimation and by the conventional univariate model are compared in Fig. 7; the ML predictions are closer to the actual concentrations. This is because the ML method uses the Lomakin–Scheibe formula directly to account for the self-absorption effect, which reduces its influence on the characteristic line and the fluctuation of the line intensity, thereby improving the stability of calibration and the accuracy of prediction.

Fig. 7 Comparison of ML-predicted and univariate-predicted concentrations with standard values

7.1 Real soil samples

In this section, the ML model is tested on real soil samples (N1, N2, N3, N4) certified by the Institute for Environmental Reference Materials, Ministry of Environmental Protection, and the results are compared with the reference concentrations. As shown in Table 3, the ML estimates for N1, N2, and N3 are 129.6 μg/g, 23.8 μg/g, and 59.2 μg/g, respectively. All are close to the reference concentrations, with relative errors within 6%. However, Cd could not be detected in N4 because its Cd content is extremely low.
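The relative error used above can be computed as in the following sketch. The 6% threshold mirrors the value quoted in the text; the reference concentrations must be taken from Table 3, and any numbers passed to these hypothetical functions in an example are placeholders, not the measured data.

```python
def relative_error_percent(estimate, reference):
    """Relative error (%) of an estimate with respect to its reference value."""
    if reference == 0:
        raise ValueError("reference concentration must be non-zero")
    return 100.0 * abs(estimate - reference) / abs(reference)


def within_tolerance(estimates, references, tol_percent=6.0):
    """Pair each sample's relative error with a pass/fail flag."""
    errors = (relative_error_percent(e, r) for e, r in zip(estimates, references))
    return [(err, err <= tol_percent) for err in errors]
```

Passing the ML estimates for N1–N3 and their Table 3 reference values returns one (error, flag) pair per sample.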

Table 3 Comparison of reference concentrations and ML estimates

8 Conclusion

The maximum likelihood estimation algorithm was applied to 12 samples with gradient Cd concentrations. The self-absorption coefficients estimated by ML were compared with the theoretical coefficients obtained from the FWHM of the emission line under different energies and delay times; the deviation between the two is within 3%, indicating that the algorithm can correct the self-absorption effect. In quantitative analysis, the fitting R² values of the ML algorithm and the conventional univariate model are 0.990 and 0.965, respectively, indicating that the algorithm improves quantitative accuracy to some extent. Finally, the LOD of Cd was evaluated theoretically: it is as low as 7.84 μg/g with the ML model (compared with 14.9 μg/g for the univariate model), which is comparable to previous reports. These results demonstrate the feasibility of the ML algorithm for LIBS analysis from several perspectives.