1 Introduction

Tunable diode laser absorption spectroscopy (TDLAS) has been under development for several decades since the first demonstration of high-resolution spectroscopy with a lead–tin telluride diode laser by Hinkley et al. [1]. Advantages of TDLAS include high sensitivity and selectivity, rapid response, and nondestructive detection. Sensors based on this technique can be tailored to determine parameters such as temperature, pressure, species concentration, or velocity. TDLAS has been extensively used in atmospheric environmental monitoring, industrial process control, medical diagnosis, military, and public safety fields [2–4]. However, the spectral data always contain noise and disturbance signals (i.e., overlap effect), so data preprocessing is important for obtaining precise and accurate results. As a result, many experimental schemes and techniques have been proposed for sensitivity improvement and resolution enhancement [5–8]; they are generally classified into two categories: software- and hardware-based techniques.

Software-based digital filtering techniques, used for online noise reduction or off-line processing of recorded spectra, are a better choice when temporal resolution and lower system cost are priorities. The key point when applying a given digital signal processing technique is the choice and optimization of its input parameters. Multi-signal averaging is a relatively simple and widely adopted method for noise suppression; however, it is time consuming and only effective against white noise [9]. Derivative calculation is generally used as a resolution enhancement technique to facilitate the detection and location of poorly resolved components in complicated spectra; however, numerical computation of higher-order derivatives also carries a computational (time) cost. Among the various filter techniques, the wavelet transform (WT) is a powerful signal de-noising technique [14], but it depends on more parameters, for example, mother wavelet type, thresholding method, threshold estimation, and decomposition level. Recently, the Savitzky–Golay (S–G) smoothing filter has been shown to be especially attractive, since both the smoothed signal and its derivatives can be calculated in a single step [11–14] and only two parameters must be set, i.e., the width of the smoothing window and the degree of the smoothing polynomial.

As with other digital signal processing techniques, the effectiveness of the S–G filter depends strongly on the window size. Selecting an appropriate window size is essential for achieving the correct trade-off between reducing noise and avoiding bias [15]. For example, Edwards and Willson [16] found that the optimum width of the smoothing array is 0.7 times the full width at half maximum (FWHM) of the narrowest Gaussian line of their spectra. In a similar study considering Lorentzian- as well as Gaussian-shaped lines, Enke and Nieman [17] concluded that the best signal-to-noise ratio (SNR) enhancement from a single-pass (quadratic–cubic) smoothing occurs for a smoothing array that is twice as wide as the FWHM of the peak being smoothed. Madden [18] found that the optimum width can sometimes be greater than 25 points for spectral data with 512 sampling points. The optimum filter window width therefore depends on the signal features and on the criteria set by the user. Moreover, an approach based on comparing the fitting residuals with the instrument noise has been reported for selecting the optimal window size of the S–G algorithm [19]. However, for non-stationary signals, the optimal window size varies with the dynamics of the signal. To address this issue, Browne et al. [20] proposed an S–G filter with a varying window size based on evaluating the residuals of the smoothed data (with Gaussian lineshape) in the local region. This strategy was shown to be superior to fixed-window S–G smoothing for noise removal on a test signal at various SNRs. In trace gas detection using TDLAS, the peak height or the integrated absorbance area of the spectral signal is directly proportional to the concentration of the targeted species. Signal preservation is therefore an important quality indicator in signal preprocessing, yet it is often overlooked.
In this work, we study simulated and measured TDLAS data (with Gaussian, Lorentzian, and Voigt profiles) using an adaptive S–G filter with varying window size, in order to guide TDLAS signal preprocessing.

2 Savitzky–Golay smoothing filter

The S–G filtering technique is well known for smoothing data, so it will not be described in detail. Only some terminology and two key points considered in this work are discussed. The main idea is similar to a moving average, but instead of simply averaging the sampling points, it performs a least-squares-fit convolution procedure. The basic S–G algorithm comprises the following steps: (i) a data interval is selected (i.e., the window size), (ii) a low-order polynomial function is fitted to the selected data interval, and (iii) the smoothed data point at the center of the selected interval is calculated from the polynomial coefficients. This smoothing process is repeated after shifting the analysis interval to the right by one sampling interval, as depicted in Fig. 1. More detailed discussions of least-squares-fit smoothing can be found in the original paper by Savitzky and Golay [21] and the corrected versions [17, 22], as well as in a review paper by Willson and Edwards [23].
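Steps (i) to (iii) above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the function name `savgol_smooth` and the edge handling (repeating the end samples) are assumptions made here for the example.

```python
import numpy as np

def savgol_smooth(y, window_size, poly_order):
    """Least-squares polynomial smoothing evaluated at the centre of a sliding window."""
    y = np.asarray(y, dtype=float)
    if window_size % 2 != 1 or window_size < 3:
        raise ValueError("window_size must be an odd integer >= 3")
    if poly_order >= window_size:
        raise ValueError("poly_order must be less than window_size")
    half = window_size // 2
    # (i) local abscissa of the window, centred on zero and scaled to [-1, 1]
    #     (scaling only improves conditioning; the centre value is unchanged)
    x = np.arange(-half, half + 1) / half
    # (ii) least-squares fit: columns of A are x**0, x**1, ..., x**poly_order
    A = np.vander(x, poly_order + 1, increasing=True)
    # (iii) the fitted value at the centre (x = 0) is the constant term, so the
    #       first row of the pseudo-inverse holds the convolution weights
    weights = np.linalg.pinv(A)[0]
    # Assumption of this sketch: pad the ends by repeating the edge samples
    padded = np.pad(y, half, mode="edge")
    # np.convolve flips its kernel, so reverse the weights to get a correlation
    return np.convolve(padded, weights[::-1], mode="valid")
```

For a five-point window and a quadratic polynomial, the weights produced this way are the classical (−3, 12, 17, 12, −3)/35 coefficients, which is a convenient sanity check. Because the weights depend only on the window size and polynomial order, they are computed once and then applied as a convolution.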

Fig. 1 Illustration of least-squares smoothing by locally fitting a low-order polynomial (solid line) to five input samples: dot denotes the raw input samples, circle denotes the least-squares smoothed samples, and x denotes the effective impulse response samples. The dotted line denotes the polynomial approximation to centered unit impulse

Generally, the criterion to quantitatively illustrate the effectiveness of the de-noising operation is the SNR improvement, defined as follows:

$$\text{SNR}(\text{dB}) = 10\log_{10}\left(\frac{\text{std}\left(\text{Signal}_{\text{noise-free}}\right)}{\text{std}\left(\text{Signal}_{\text{noise-free}} - \text{Signal}_{\text{SG-denoised}}\right)}\right)$$
(1)

where std refers to the standard deviation, and Signal_noise-free and Signal_SG-denoised are the ideal simulated spectral signal and the S–G-filter-de-noised spectral signal, respectively. However, for real-world applications, the optimal filtering parameters cannot be determined directly from the SNR definition, since the real signal (i.e., the noise-free signal) and the noise source are unknown. To address this challenge, we propose a varying-window S–G filter that integrates two additional criteria for TDLAS signal processing. The first criterion introduces a surrogate for the “real” or “noise-free” signal, referred to as “PolyFit,” which is generated by fitting a polynomial function (initialization: polynomial order = 5, window size = 7) to a small segment (typically 50 sampling points) near the absorption peak of the raw signal, as shown in Fig. 2. For experimental data, multiple linear regression analysis is used to calculate the correlation coefficient R between the “PolyFit” and the same segment of the S–G-filter-smoothed data, and R replaces the SNR for assessing the optimal filtering parameters. This criterion is a valid indicator of noise reduction, but it is not a reliable indicator of signal preservation. The second criterion therefore employs a threshold “Th,” defined as the difference of peak heights between the “PolyFit” and the S–G-filter-smoothed data, in order to optimize the filtering parameters without excessive signal distortion. The flowchart of the adaptive S–G filter algorithm is shown in Fig. 3. Note that the window size must be an odd integer, and the polynomial order must be less than the window size.
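For simulated data, where the noise-free signal is known, Eq. (1) can be evaluated directly. The following is a small sketch; the function name `snr_db` is illustrative and not taken from the original program.

```python
import numpy as np

def snr_db(signal_noise_free, signal_denoised):
    """SNR in dB as defined in Eq. (1): std of the ideal signal over std of the residual."""
    reference = np.asarray(signal_noise_free, dtype=float)
    residual = reference - np.asarray(signal_denoised, dtype=float)
    return 10.0 * np.log10(np.std(reference) / np.std(residual))
```

Passing the raw noisy signal as the second argument gives the SNR before filtering, so the improvement due to filtering is the difference between the two values.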

Fig. 2 Typical example of experimentally determined CO2 absorption spectrum and corresponding results by applying S–G filter and wavelet de-noising techniques (for details see text)

Fig. 3 Flowchart of the newly developed adaptive S–G filter algorithm (for details see text)
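The two criteria described in the text can be combined into a simple window search, sketched below. This is one plausible reading of the procedure and not a transcription of the flowchart: the scan over all odd window sizes, the rule "highest R among windows whose peak-height difference stays within Th," the choice of the raw maximum as the peak position, and all names (`sg_smooth`, `select_window`, `max_window`) are assumptions of this example.

```python
import numpy as np

def sg_smooth(y, window_size, poly_order):
    """Compact S-G smoother (edge samples repeated for padding; an assumption)."""
    half = window_size // 2
    x = np.arange(-half, half + 1) / half
    weights = np.linalg.pinv(np.vander(x, poly_order + 1, increasing=True))[0]
    return np.convolve(np.pad(y, half, mode="edge"), weights[::-1], mode="valid")

def select_window(raw, poly_order=5, segment=50, threshold=0.01,
                  max_window=201, polyfit_order=5):
    """Return (window, R, peak difference) for the best admissible window, or None."""
    raw = np.asarray(raw, dtype=float)
    # Segment of `segment` points around the absorption peak (taken as the raw maximum)
    peak = int(np.argmax(raw))
    lo = max(0, peak - segment // 2)
    hi = min(raw.size, lo + segment)
    idx = np.arange(lo, hi)
    # "PolyFit": polynomial fitted to the raw segment, on a scaled abscissa
    t = (idx - idx.mean()) / (idx.max() - idx.min())
    polyfit = np.polyval(np.polyfit(t, raw[idx], polyfit_order), t)
    # Smallest odd window size that is larger than the polynomial order
    start = poly_order + 1 if poly_order % 2 == 0 else poly_order + 2
    best = None
    for window in range(start, max_window + 1, 2):
        smoothed = sg_smooth(raw, window, poly_order)[idx]
        # Criterion 1: correlation coefficient R between PolyFit and smoothed segment
        r = np.corrcoef(polyfit, smoothed)[0, 1]
        # Criterion 2: peak-height difference must not exceed the threshold Th
        peak_diff = abs(polyfit.max() - smoothed.max())
        if peak_diff > threshold:
            continue
        if best is None or r > best[1]:
            best = (window, r, peak_diff)
    return best
```

The threshold is expressed in the same units as the signal, so the default of 0.01 only makes sense for data scaled like the spectra discussed here. A stricter threshold trades noise reduction for better preservation of the peak height.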

3 Parameter optimization by simulation

In order to understand the dependence of the S–G filter on its input parameters (i.e., window size and polynomial order), as well as other effects such as the number of sampling points and the signal profile, we performed a large number of spectral simulations. The simulated datasets were modeled by considering a range of undesired spectral anomalies and variations that often occur in measured spectra, such as baseline variations, noise, and pressure effects. We first evaluated the S–G filter on synthetic spectral signals with varying magnitudes of random noise and numbers of sampling points. A computer program was written in the scripting language Python for the computations and signal simulations. The CO2 spectroscopic parameters used for the simulations were extracted from the HITRAN database [24] and are compiled in Table 1. A set of given experimental conditions, such as temperature, pressure, gas concentration, and optical path length, was considered.
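A simplified version of such a simulation is sketched below. It does not use the HITRAN line parameters or the experimental conditions of this work: the line position, width, peak height, noise level, and linear baseline are arbitrary illustrative values, and only Gaussian and Lorentzian profiles are included (a Voigt profile could be added with a dedicated routine such as `scipy.special.voigt_profile`).

```python
import numpy as np

def gaussian(x, x0, fwhm):
    """Gaussian profile with unit peak height."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - x0) / sigma) ** 2)

def lorentzian(x, x0, fwhm):
    """Lorentzian profile with unit peak height."""
    hwhm = fwhm / 2.0
    return hwhm ** 2 / ((x - x0) ** 2 + hwhm ** 2)

def simulate_spectrum(n_points=1024, fwhm=0.1, peak=1.0, noise_std=0.05,
                      baseline_slope=0.0, shape="lorentzian", seed=0):
    """Return (x, clean, noisy) for a single line on an arbitrary axis from -1 to 1."""
    profiles = {"gaussian": gaussian, "lorentzian": lorentzian}
    x = np.linspace(-1.0, 1.0, n_points)
    # Noise-free absorption line plus an optional linear baseline drift
    clean = peak * profiles[shape](x, 0.0, fwhm) + baseline_slope * x
    # Additive white Gaussian noise with a fixed seed for reproducibility
    rng = np.random.default_rng(seed)
    noisy = clean + rng.normal(0.0, noise_std, n_points)
    return x, clean, noisy
```

Varying `noise_std` and `n_points` reproduces the two kinds of test series described above, i.e., different noise magnitudes and different numbers of sampling points.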

Table 1 Summary of spectroscopic parameters of the CO2 line pair studied in this work; data are taken from the HITRAN2012 database [24]

First, various spectral absorption signals with 1024 sampling points and different SNRs were simulated; partial signals and the corresponding S–G-filter-smoothed results are presented in Fig. 4. It can be seen that the window size must be chosen appropriately in order to preserve the peak height. For a given polynomial degree, small window sizes do not give the best SNR, whereas larger window sizes produce a smoother result but can bias the signal, which in turn induces measurement errors in the gas concentration. The SNR enhancement factor and the best window size as a function of polynomial order are shown in Fig. 5. This figure illustrates that the higher the polynomial order used in the S–G filter, the larger the window size needed to achieve the best SNR. On the other hand, the SNR enhancement factors are almost the same for polynomial orders between 2 and 8. Furthermore, we evaluated the S–G filter by applying it to simulated signals with different numbers of sampling points, as presented in Fig. 6. We found that the larger the total number of sampling points, the higher the SNR enhancement factor achieved for the same noise level. The noise level can therefore be reduced more effectively by increasing the number of sampling points to which the S–G filter is applied. However, one has to compromise between noise reduction and temporal resolution in real-world applications. Moreover, we found that the proposed algorithm can also construct an optimal calibration model for TDLAS spectra with different background structural characteristics (linear or nonlinear baseline drift) [25].
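The trade-off between noise reduction and signal bias can be explored with a small window sweep on a synthetic line, sketched below. The Lorentzian width, noise level, polynomial order, and window range are arbitrary choices for illustration, and the helper names are hypothetical; the numbers obtained this way are not those of Figs. 4 to 6.

```python
import numpy as np

def sg_smooth(y, window_size, poly_order):
    """Compact S-G smoother (edge samples repeated for padding; an assumption)."""
    half = window_size // 2
    x = np.arange(-half, half + 1) / half
    weights = np.linalg.pinv(np.vander(x, poly_order + 1, increasing=True))[0]
    return np.convolve(np.pad(y, half, mode="edge"), weights[::-1], mode="valid")

def snr_db(clean, estimate):
    """SNR in dB following Eq. (1)."""
    return 10.0 * np.log10(np.std(clean) / np.std(clean - estimate))

def best_window(clean, noisy, poly_order, windows):
    """Score every window size by SNR and return the best one with all scores."""
    scores = {w: snr_db(clean, sg_smooth(noisy, w, poly_order)) for w in windows}
    return max(scores, key=scores.get), scores

# Synthetic Lorentzian line (FWHM = 60 points) with white noise, 1024 samples
rng = np.random.default_rng(1)
x = np.arange(1024)
clean = 1.0 / (1.0 + ((x - 512) / 30.0) ** 2)
noisy = clean + rng.normal(0.0, 0.05, x.size)

w_best, scores = best_window(clean, noisy, poly_order=2, windows=range(3, 202, 2))
print("best window:", w_best, "SNR (dB):", round(scores[w_best], 2))
print("SNR (dB) at smallest window:", round(scores[3], 2))
```

With a quadratic polynomial, a three-point window reproduces the input exactly, so its score equals the raw SNR. Repeating the sweep for several values of `poly_order` gives curves of the kind summarized in Fig. 5.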

Fig. 4 Simulated spectra with different SNR (sampling points = 1024) and the corresponding S–G-filter-smoothed results

Fig. 5 SNR enhancement factor and best window size as a function of polynomial order for the S–G filter applied to the simulated signals with different SNR (number of sampling points = 1024)

Fig. 6 SNR enhancement factor and best window size as a function of polynomial order for the S–G filter applied to the simulated signals with different sampling points (raw SNR = 6.69)

4 Experimental application

From the simulations discussed above, it is recommended to set the polynomial order of the S–G filter in the range of 2–8, while the window size is the primary factor that strongly limits the filtering efficiency. In order to verify this conclusion and to apply the algorithm to real measured signals, various spectral data were recorded with our TDLAS system. To create the “PolyFit,” a polynomial function (order = 5, used throughout this section) was fitted to a small segment (50 sampling points) of the original signal (4096 sampling points) near the absorption peak. The linear correlation coefficient R between the “PolyFit” and the same segment of the S–G-filter-smoothed result replaces the SNR for assessing the optimal filtering window size. In theory, the higher the R value, the smoother the S–G-filtered result. For the second criterion, a threshold of 0.01 is typically selected. A comparison with the powerful wavelet-based de-noising technique was also conducted. As demonstrated in Fig. 2, the best S–G-filter-smoothed result is comparable to that obtained from the best wavelet filtering (where the Stein thresholding policy, wavelet db10, and decomposition level 6 are used). Finally, Fig. 7 presents the values of R² and the difference of absorption peak heights as a function of window size for polynomial orders between 1 and 8. Clearly, R² and the difference of line peak heights show inverse trends with window size. When the difference of line peak heights exceeds the threshold, R² declines or changes abruptly. Figure 8 shows the parameters determined by the developed S–G filter algorithm as a function of polynomial order. Here, the SNR enhancement factor was calculated directly as the ratio of the standard deviations of the segments containing no absorption (baseline) in the unfiltered and filtered signals.
The S–G-filtered results show that the highest R² and the best SNR enhancement factor occur at polynomial orders between 2 and 7, while the differences of absorption peak heights lie between −0.001 and 0.0035 for each optimal window size, which is much less than the selected threshold of 0.01. These results confirm that the developed algorithm is reliable for processing our experimentally measured TDLAS signals.
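The experimental SNR enhancement factor described above, i.e., the ratio of baseline standard deviations before and after filtering, can be written as a short helper. The function name and the use of a slice to mark the absorption-free segment are assumptions of this sketch; it also assumes the chosen segment has no residual slope, which would otherwise inflate both standard deviations.

```python
import numpy as np

def snr_enhancement(raw, filtered, baseline):
    """Ratio of baseline noise (std) in the unfiltered and filtered signals.

    `baseline` is a slice (or index array) selecting a segment without absorption.
    """
    raw = np.asarray(raw, dtype=float)
    filtered = np.asarray(filtered, dtype=float)
    return np.std(raw[baseline]) / np.std(filtered[baseline])
```

For example, `snr_enhancement(raw, smoothed, slice(0, 200))` would use the first 200 samples as the baseline segment, provided they contain no absorption feature.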

Fig. 7 Correlation coefficient R² and difference of line peak heights as a function of window size

Fig. 8 Parameters determined in the developed S–G filter as a function of polynomial order. SNR is defined as the ratio of standard deviation of the segments containing no absorption in unfiltered and filtered signal

In order to further evaluate the suitability of the developed adaptive algorithm for absorption spectra with different lineshapes, a series of experimental spectra (CO2 concentration around 1.5 %) was recorded at different pressures (between a few mbar and 1 bar). We again use the standard deviation of the segments containing no absorption (baseline) in the unfiltered and filtered signals to denote the noise level. The results are shown and compared in Fig. 9. The statistical mean values are also provided as insets in the figure. Overall, SNR enhancement factors of 5.5 and 4.7 are obtained for the wavelet filter and the S–G filter, respectively. Following the study of Chen et al. [11], the developed algorithms were finally applied to a time series of CO2 concentration data. As can be seen from Fig. 10 (upper panel), the measurement precision is significantly improved, with standard deviations of 1.01, 0.18, and 0.16 for the raw measurements, the output of the S–G filter, and the output of the wavelet filter, respectively. The Allan variance in the lower panel shows an optimal averaging time of about 200 s for the present system. The precision improvement achieved by the developed S–G filter and the wavelet filter corresponds to a precision level that can be obtained by conventional 40-s averaging. Overall, the wavelet filter demonstrated a higher ability to remove noise, but it requires more parameters to be specified, for example, mother wavelet type, thresholding policy, threshold estimation, and decomposition level. On the other hand, the S–G filter shows great flexibility and has great potential for time series datasets requiring fast response, which is particularly attractive for TDLAS and other laser spectroscopy applications.
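The Allan variance used to find the optimal averaging time can be estimated from a concentration time series as sketched below. This is the standard non-overlapping estimator, given here as an illustration; it is not necessarily the exact variant used to produce Fig. 10, and the function name is hypothetical.

```python
import numpy as np

def allan_variance(y, m):
    """Non-overlapping Allan variance for bins of m consecutive samples.

    For a series sampled every dt seconds, the averaging time is tau = m * dt.
    """
    y = np.asarray(y, dtype=float)
    n_bins = y.size // m
    if n_bins < 2:
        raise ValueError("need at least two complete bins of length m")
    # Average the series over consecutive, non-overlapping bins of m samples
    bin_means = y[: n_bins * m].reshape(n_bins, m).mean(axis=1)
    # Half the mean squared difference between adjacent bin averages
    return 0.5 * np.mean(np.diff(bin_means) ** 2)
```

Evaluating this function for a range of `m` values and plotting the result against `tau` on log-log axes gives an Allan plot: the variance falls with averaging time while white noise dominates and rises again once drift takes over, and the minimum marks the optimal averaging time.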

Fig. 9 Comparison of noise level (see definition at upper panel) before and after applying the S–G filter and wavelet de-noising technique to experimental spectra recorded under different pressures

Fig. 10 Upper panel shows raw measurements of time series CO2 concentrations with 1-Hz sampling rate and the corresponding S–G filter and wavelet filter output. Lower panel presents the Allan variance plot of raw data

5 Conclusion

In this paper, we have presented a simple but robust method based on the S–G filter to smooth out the noise present in our TDLAS system without distorting the signals. By applying the newly developed method to both simulated and experimental spectral signals, we found that the window size is the primary factor that limits the smoothing efficiency. Compared with the powerful wavelet transform-based filter, the developed adaptive S–G filter shows the following four advantages: (i) it can reconstruct a high-quality TDLAS signal by setting only two parameters in the S–G filter. Our results suggest that the optimal polynomial order is between 2 and 8, which is robust in most cases, while the best window size depends on the chosen polynomial order and the dynamics of the signal and noise; (ii) it is very simple in theory and easy to implement, because most commercial software such as ORIGIN and MATLAB includes the S–G filter in its function library; (iii) it can be applied to spectral signals with any lineshape (e.g., Gaussian and Lorentzian), and there are no restrictions on the scaling of TDLAS datasets; (iv) the time cost of searching for the optimal window size and outputting the best S–G-filtered result is lower than that of the wavelet filtering technique. For these reasons, we anticipate that the developed method can be further applied to smooth TDLAS spectral signals and time series concentration datasets in real time for a wide variety of applications, including atmospheric environmental monitoring and industrial process control.