1 Introduction

In recent years, machine learning has become common in many fields of science [1,2,3], and several reports have demonstrated its role in physics [4,5,6,7,8,9,10,11]. Regression analysis is a fundamental concept in machine learning. It is a supervised learning technique, in which the algorithm is trained with both input features and output labels, and it helps to establish a relationship among variables by estimating how one variable affects another [12]. Regression in machine learning consists of mathematical methods that allow a continuous outcome (y) to be predicted from the values of one or more predictor variables (x). Linear regression is probably the most popular form of regression analysis because of its ease of use in prediction and forecasting. In this paper, one field of physics, namely X-ray lasers, is investigated using this machine learning model.

Soft X-ray lasers have many applications in industry and medicine, such as microscopy, holography and lithography [13]. They are also used for interferometry and radiography of compressed fuels in inertial confinement fusion (ICF) [14]. Laser plasmas, that is, plasmas produced by high-power optical lasers, are one source of soft X-ray lasers and have been used in many experimental and theoretical studies [15,16,17,18]. The first laboratory demonstration of a soft X-ray laser, in 1985, used Se$^{24+}$ as the active medium [18]. Significant theoretical and experimental progress then followed, using various methods of X-ray laser pumping [14].

Among the various pumping methods, the collisional excitation scheme has been widely developed, and saturated gain at soft X-ray wavelengths has been reported with it in Ne-like zinc [19], Ne-like germanium [20] and Ne-like selenium [21] ions. In this method, an optical beam in the visible region is first focused on the target to create the plasma medium. A population inversion between the transition levels is then produced by collisions with free electrons, which excite bound electrons to quasi-stable levels in Ni-like or Ne-like ions, and soft X-ray lasing occurs in the form of amplified spontaneous emission (ASE). Since the efficiency of producing X-ray lasers from optical lasers was relatively low, a pre-pulse was successfully used to enhance the production efficiency of Ne-like soft X-ray lasers in several experiments [22, 23]. Research shows that an active plasma medium pumped by collisional excitation with transient pumping is the most beneficial approach for producing soft X-rays [15]. From the physical point of view, adding a pre-pulse increases the absorption efficiency of the pump laser radiation, and less energy is spent on plasma expansion. In this case, a sudden increase of the electron temperature in the region where the density of Ne-like ions is high yields a fundamental improvement in the X-ray laser efficiency. This rapid increase of the electron temperature is made possible by heating the plasma with a short (few picoseconds), high-intensity laser pulse. The pulse is so fast that processes such as vaporization, heat conduction, expansion and re-ionization, or reverse processes, are prevented during pumping.

Reference [11] proposed a multilayer perceptron neural network for predicting the X-ray laser gain coefficient, an approach that requires training the network on many data points. In this research, a multiple regression model is proposed instead, which provides an explicit equation for predicting the gain coefficient.

Since the gain coefficient of the soft X-ray laser produced from laser plasmas depends on the characteristics of the pump laser [24,25,26,27,28], this research presents a model for predicting the gain coefficient using a multiple regression equation. Multiple regression analysis is the process of estimating the relationship between a dependent variable (here, the gain coefficient) and several independent variables (the pump pulse characteristics). In simpler words, it means fitting a function from a selected family of functions to the sampled data so as to minimize some error function.

In the next section, the multiple regression model is explained. In Sect. 3, the soft X-ray laser gain obtained from hydrodynamic calculations is discussed and the data used for the regression analysis are described. Then, in Sect. 4, the results of the regression analysis are presented, and it can be seen that the regression equation predicts the results of numerical simulations of the hydrodynamic equations well.

2 The multiple regression model

Regression analysis is one of the most basic machine learning tools for prediction. In regression, a function is fitted to the available data and then used to predict the outcome for future or hold-out data points.

In general, a multiple linear regression model with a random error term is written as follows:

$$ y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{p} x_{p} + \varepsilon \tag{1} $$

Here, y is the value of the dependent variable for given values of the independent variables (x1 to xp), β0 is the value of y when all x are zero, β1 to βp are the regression coefficients, each meaning the expected change in y when the corresponding x increases by one unit while the others are held fixed, and ε is the random error of the regression. Estimating the regression coefficients in this model reveals the linear relationship between the independent and dependent variables and makes it possible to predict the dependent variable.
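As an illustration of how the coefficients in Eq. 1 can be estimated, the following minimal sketch fits the model by ordinary least squares with NumPy. The function name `fit_linear_model` and the synthetic data are assumptions made for illustration only; they are not the data or the software (SPSS) used in this work.

```python
import numpy as np

def fit_linear_model(X, y):
    """Return [beta0, beta1, ..., betap] that minimize the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that beta0 (the intercept) is estimated too.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic, noise-free illustration (hypothetical values, not simulation data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 10.0, size=(30, 2))
y_demo = 2.0 + 1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
beta_demo = fit_linear_model(X_demo, y_demo)
print(beta_demo)  # close to [2.0, 1.5, -0.5] because the data contain no noise
```

With noisy data the recovered coefficients only approximate the underlying ones, and the difference between observed and fitted values plays the role of ε.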

3 Properties of the pump laser

In reference [27], the one-dimensional hydrodynamic simulation code MED103 [29, 30] was used to study the plasma active medium during the laser-plasma interaction. The laser plasma is produced using the transient pumping method, which consists of two pump pulses separated by a delay of several hundred ps. The first pulse, a long pulse with a width of several hundred ps, hits the solid target and produces a plasma with the necessary degree of ionization. The second pulse, a short pulse with a width of a few ps, heats the free electrons up to several hundred eV in a time shorter than the plasma ionization time. As a result, the conditions needed for pumping the laser-active ions by collisional excitation are established in such a way that the gain of the soft X-ray laser is high. In reference [27], a pre-pulse with intensities on the order of $10^{13}$ W/cm$^{2}$, widths of several hundred ps (and therefore different energies) and a wavelength of 800 nm first irradiates the surface of a germanium target with a thickness of 25 μm. The main pulse then arrives with intensities on the order of $10^{15}$ W/cm$^{2}$, widths of several ps, the same wavelength, and a delay of 100 to 200 ps between the maxima of the two pulses. The maximum gain of the soft X-ray laser was investigated at the wavelength of 19.6 nm. The factors that determine the maximum gain of the soft X-ray laser are therefore the intensity and width of the pre-pulse, the intensity and width of the main pulse, and the delay time between the two pulses. These parameters are the independent variables of the regression model, and the gain coefficient of the soft X-ray laser is the dependent variable. Once the regression relationship is established, the soft X-ray laser gain can be predicted for any set of these input parameters without solving the complex hydrodynamic equations.

4 Results and discussion

As stated in the previous section, the data of reference [27] are used in this research to create the regression relationship. The statistical analysis of these data was performed with IBM SPSS Statistics version 26. Before the regression analysis, it is advisable to check whether a linear relationship exists between each independent variable and the dependent variable by inspecting the scatter diagram and calculating the correlation coefficient.

As the scatter diagram (Fig. 1) shows, there appears to be no linear relationship between the gain coefficient and the intensity or width of the pre-pulse, whereas there is a very good linear relationship between the gain coefficient and the intensity of the main pulse, and a relatively good one between the gain coefficient and the delay time. Since estimating linear regression parameters relies on assumptions about the residuals, such as normality and independence, the Durbin-Watson test is used to test the randomness and independence of the residuals. Then Pearson's correlation coefficient (R) and the coefficient of determination (R Square) were examined; they are found to be 0.652 and 0.425, respectively. These coefficients measure the linear correlation between the observed values of the dependent variable and the values predicted by the model. The closer they are to 1, the better the model predicts the dependent variable.
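The quantities discussed above can be computed directly from observed and predicted values. The sketch below is a minimal illustration, with the hypothetical function name `fit_diagnostics` and made-up numbers; it is not the SPSS procedure used here. For a least-squares fit with an intercept, R Square equals the square of R, which is consistent with the reported values (0.652 squared is approximately 0.425). The Durbin-Watson statistic lies between 0 and 4, and values near 2 indicate no first-order autocorrelation of the residuals.

```python
import numpy as np

def fit_diagnostics(y_observed, y_predicted):
    """Return (R, R Square, Durbin-Watson statistic) for a set of predictions."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    residuals = y_observed - y_predicted
    # Pearson correlation between observed and predicted values.
    r = np.corrcoef(y_observed, y_predicted)[0, 1]
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_observed - y_observed.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    # Durbin-Watson: squared successive residual differences over SS_res.
    durbin_watson = np.sum(np.diff(residuals) ** 2) / ss_res
    return r, r_squared, durbin_watson

# Hypothetical observed and predicted values, for illustration only.
r_demo, r2_demo, dw_demo = fit_diagnostics(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.2, 3.8, 5.0],
)
```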

Fig. 1 Scatter diagram matrix of the independent and dependent variables (W1 and I1 are the pre-pulse width and intensity; W2 and I2 are the main pulse width and intensity; dt is the delay time between the two pump pulses; gain is the gain coefficient of the soft X-ray laser)

Table 1 lists the estimated coefficients and the statistics of their significance tests. The B column gives the unstandardized coefficients of the variables and the constant term of the model. The regression equation is therefore:

$$ \mathrm{gain} = 300.521 - 9.206\,I_{1} + 15.266\,I_{2} - 0.158\,W_{1} + 1.560\,W_{2} - 0.341\,dt \tag{2} $$

Table 1 Regression coefficients, standard errors, T-values and significance levels (Sig.) for the 5 independent variables

Here, I1 and I2 are the intensities of the pre-pulse and the main pulse of the pump laser, W1 and W2 are the widths of the pre-pulse and the main pulse, dt is the delay time between the two pulses, and gain is the gain coefficient of the output soft X-ray laser. Comparing Eqs. 1 and 2 shows that the error term is omitted in Eq. 2. The error term accounts for the difference between the actual and predicted results and is not part of the prediction itself; it can only be evaluated once the true outcomes are known.
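For practical use, Eq. 2 can be wrapped in a small function. The sketch below uses the coefficients exactly as given in Eq. 2; the function name `predict_gain` is hypothetical. The units of the inputs are not restated in this section, so it is assumed that each argument is expressed in the same units as the data used to fit the model (reference [27]).

```python
# Coefficients copied from Eq. 2 (B column of Table 1).
INTERCEPT = 300.521
COEFFICIENTS = {
    "I1": -9.206,   # pre-pulse intensity
    "I2": 15.266,   # main pulse intensity
    "W1": -0.158,   # pre-pulse width
    "W2": 1.560,    # main pulse width
    "dt": -0.341,   # delay time between the two pulses
}

def predict_gain(I1, I2, W1, W2, dt):
    """Evaluate Eq. 2. Inputs are assumed to be in the units of the fitted data."""
    inputs = {"I1": I1, "I2": I2, "W1": W1, "W2": W2, "dt": dt}
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())
```

Because the model is linear, increasing I2 by one unit raises the predicted gain by 15.266 regardless of the other inputs. Predictions for inputs outside the range of the fitted data are extrapolations and should be treated with caution.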

The unstandardized coefficients in the B column depend on the measurement unit of each variable. Therefore, the importance of a variable in the regression model cannot be judged from the magnitude of its coefficient. To determine the importance of each variable and its role in the regression model, one should look at the column of standardized coefficients, which are calculated from standardized data: the mean is subtracted from each value of the dependent and independent variables, and the result is divided by the standard deviation. The regression model is then fitted and the coefficients are calculated. Since the standardized variables are dimensionless, the magnitude of these coefficients does not depend on the measurement units.
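The standardization procedure described above can be sketched as follows. The function name `standardized_betas` and the synthetic data are assumptions for illustration. In the example, the two raw coefficients differ by a factor of 100 only because the second predictor is measured on a scale 100 times larger, while the standardized coefficients are unaffected by such rescaling.

```python
import numpy as np

def standardized_betas(X, y):
    """Fit the regression on z-scored variables and return the Beta coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Subtract the mean and divide by the sample standard deviation.
    X_z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y_z = (y - y.mean()) / y.std(ddof=1)
    # All variables are centered, so no intercept column is needed.
    betas, *_ = np.linalg.lstsq(X_z, y_z, rcond=None)
    return betas

# Synthetic illustration (hypothetical values, not simulation data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 2)) * np.array([1.0, 100.0])
y_demo = 5.0 + 2.0 * X_demo[:, 0] + 0.02 * X_demo[:, 1] + rng.normal(scale=0.1, size=40)
betas_demo = standardized_betas(X_demo, y_demo)

# Changing the measurement units of the predictors leaves the Betas unchanged.
betas_rescaled = standardized_betas(X_demo * np.array([1000.0, 0.001]), y_demo)
```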

We therefore use the Standardized Coefficients (Beta) column to determine the importance of each predictor. A variable with a larger absolute Beta is more important in the regression model, so the variables can be compared directly through their Beta values. In this way, it is clear that I2, with Beta = 0.572, is the best variable for predicting the dependent variable in the regression model.

Columns T and Sig relate to the hypothesis test on each coefficient. The larger the absolute value of T, the weaker the support for the hypothesis that the coefficient is zero, and therefore the greater the role of that variable in the model. Table 1 shows large T-values for the main pulse intensity and the delay time, so these two variables play the greatest role in the gain coefficient. The same conclusion follows from the value of Sig. At a significance level of 0.05 (the type I error rate), a Sig smaller than 0.05 means that the null hypothesis, which states that the variable has no effect in the model, is rejected. Therefore, a variable with Sig < 0.05 is suitable for the regression model. According to Table 1, the coefficients of the main pulse intensity and the delay time between the two pulses have Sig smaller than 0.05, so they are statistically significant and should be included in the model. The intensity and width of the pre-pulse and the width of the main pulse have Sig greater than 0.05, so they are removed from the model, and the regression calculations should be repeated to obtain the correct coefficients for the significant variables.
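The T column of such a table can be reproduced from the data with the standard least-squares formulas: each T-value is the estimated coefficient divided by its standard error. The sketch below is a minimal illustration under that assumption, with the hypothetical function name `coefficient_t_values` and synthetic data in which only the first predictor actually influences the response.

```python
import numpy as np

def coefficient_t_values(X, y):
    """Return (coefficients, standard errors, T-values), intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    # Unbiased estimate of the error variance with n - k degrees of freedom.
    sigma2 = residuals @ residuals / (n - k)
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    std_err = np.sqrt(np.diag(covariance))
    return beta, std_err, beta / std_err

# Synthetic illustration: y depends on the first predictor only.
rng = np.random.default_rng(2)
X_demo = rng.uniform(0.0, 10.0, size=(50, 2))
y_demo = 1.0 + 3.0 * X_demo[:, 0] + rng.normal(scale=1.0, size=50)
beta_demo, se_demo, t_demo = coefficient_t_values(X_demo, y_demo)
```

The Sig value is the two-sided p-value of each T-value under a Student t distribution with n - k degrees of freedom; if SciPy is available, it could be obtained as `2 * scipy.stats.t.sf(abs(t), n - k)`.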

Table 2 shows the coefficients of the regression model with only two independent variables, the intensity of the main pulse and the delay time between the two pulses. In this case, more accurate coefficients are obtained for these two variables. However, since these coefficients differ only slightly from the previous ones, it can be concluded that the regression model of Eq. 2 is a successful model for predicting the gain coefficient from the pump pulse characteristics.

Table 2 Regression coefficients, standard errors, T-values and significance levels (Sig.) for the 2 independent variables (I2 and dt)

Figure 2 shows the gain coefficient versus the main pulse intensity (I2) and the delay time between the two pulses (dt). The circles are the values obtained from the hydrodynamic code and the red line is the regression line. Comparing the two graphs in Fig. 2 shows that the regression line describes the data better in Fig. 2a. Since a previous study using a neural network revealed the significant role of the main pulse intensity in the gain coefficient [11], determining a linear relationship between efficiency and intensity can effectively contribute to predicting the gain coefficient. In this way, instead of solving complex hydrodynamic equations, this simple linear relationship can be employed in practical applications. However, given the scatter of the data about the regression line in Fig. 2, it should be noted that this regression line can be used only within a narrow range of the parameters. For a more accurate match, corrections to the model, at least of second order, are necessary.
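The effect of a second-order correction can be illustrated with a one-variable polynomial fit. The sketch below is not a fit to the simulation data; it uses synthetic values with a known quadratic dependence, and the function name `sum_squared_residuals` is hypothetical. It compares the residual sum of squares of a straight line with that of a second-order polynomial.

```python
import numpy as np

def sum_squared_residuals(x, y, degree):
    """Fit a polynomial of the given degree and return its residual sum of squares."""
    coefficients = np.polyfit(x, y, degree)
    fitted = np.polyval(coefficients, x)
    return float(np.sum((y - fitted) ** 2))

# Synthetic data with a genuine second-order term (hypothetical values).
x_demo = np.linspace(0.0, 4.0, 21)
y_demo = 1.0 + 2.0 * x_demo + 0.5 * x_demo ** 2

sse_linear = sum_squared_residuals(x_demo, y_demo, degree=1)
sse_quadratic = sum_squared_residuals(x_demo, y_demo, degree=2)
```

When the underlying dependence is curved, the quadratic fit removes the systematic residual that the straight line leaves behind. On real data, the added term should still be checked with the same T and Sig criteria used for the linear coefficients.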

Fig. 2 Gain coefficient versus (a) the intensity of the main pulse and (b) the delay time between the two pulses. Circles (⦁) are the observed values and the straight line is the linear regression function with one independent variable

5 Conclusion

Soft X-ray lasers are coherent light sources that operate in the soft X-ray region of the electromagnetic spectrum, typically at wavelengths from about 10 to 100 nm. They are known for their high brightness, short wavelength and ability to produce ultrafast pulses, which makes them invaluable tools for a wide range of scientific and technological applications. In a soft X-ray laser, the intensity of the emitted X-ray light is amplified through stimulated emission. This is typically achieved with a high-energy pump source, such as a high-power optical laser, which excites the medium and creates a population inversion. The amplified X-ray beam is highly coherent and intense, making it suitable for applications that require high-resolution imaging and ultrafast time-resolved measurements. It is therefore important to optimize the conditions of the pump laser to achieve an enhanced X-ray laser.

With machine learning expanding into various sciences, including physics, this research presents a model that predicts the gain coefficient of the soft X-ray laser output from laser plasmas using multiple regression analysis. Given the characteristics of a two-pulse pump laser, namely the intensity and width of the pre-pulse, the intensity and width of the main pulse, and the delay time between the two pulses, the model predicts the gain of the soft X-ray laser directly, without solving the hydrodynamic equations numerically. For this purpose, values obtained from the hydrodynamic code were used for training, and a linear relationship was derived between the X-ray gain coefficient as the dependent variable and the characteristics of the pump laser as the independent variables. From the Sig and T values, it was concluded that the model yields a satisfactory linear relationship between the gain coefficient and two of the independent variables, the main pulse intensity and the delay time. Although the presented model performs well within a specific range of parameters, further improvements and corrections are necessary for broader applications or higher accuracy. One possible way to enhance the accuracy of the model is to add second-order or higher-order corrections. These modifications can improve the performance of the model across a wider range of parameters and increase its agreement with real-world data.