1 Introduction

The nondistortional removal of noise from spectra has been an ongoing challenge in all forms of spectroscopy, ellipsometric and otherwise. The classic approach to noise reduction is direct- (spectral-) space (DS) convolution, as exemplified by the extensive tables of Savitzky and Golay (SG), published in 1964 [1]. Results are historically assessed by inspection, although concerns have been raised. In 1997 Kaiser and Reed [2] noted that noise-reduction methods tend to be applied blindly, with results simply inspected to see if they “look good.” In 1995 Barak [3] pointed out that too many approaches were “hit and miss.” While noise reduction is a worthwhile goal for cosmetic reasons, additional advantages include minimizing uncertainties in parameters determined by lineshape analysis and in interpolation, whether simply to change the scale or more generally to convert spectra obtained linear in wavelength to spectra linear in energy [4].

The challenge is not simply to remove noise but to do it in ways that leave the underlying spectrum—the information—unchanged. Commonly known as filtering, noise reduction is typically done by linear methods, either by DS convolution, as mentioned above [1], or by reciprocal- (Fourier-) space (RS) processing [4,5,6,7]. Both take advantage of the separation of information and noise into low- and high-index Fourier coefficients, respectively, because structure follows from point-to-point correlations whereas noise is due to point-to-point fluctuations. This separation opens the path to optimizing filtering either directly in RS, through attenuation or replacement of unwanted coefficients, or indirectly in DS, by convolving the data with a set of coefficients. Filtering in RS can be assessed qualitatively by comparing the Fourier transform of the data with the transfer function of the filter [7]. In DS similar assessments require first calculating the Fourier coefficients of the data and convolving function, then comparing them in RS.
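The equivalence of the two routes can be made concrete with a short numerical sketch (Python here, though any environment serves; the data and kernel are invented for illustration): a 3-point running average applied in DS by wrapped convolution gives exactly the same result as attenuating the Fourier coefficients in RS by the kernel's transfer function.

```python
import numpy as np

# Illustrative sketch: DS convolution vs. RS attenuation for a 3-point
# running average on a synthetic noisy spectrum (not the paper's data).
rng = np.random.default_rng(0)
N = 201
f = np.exp(-np.linspace(-3, 3, N) ** 2) + 0.05 * rng.standard_normal(N)

# DS route: periodic (wrapped) 3-point running average.
f_ds = (np.roll(f, 1) + f + np.roll(f, -1)) / 3.0

# RS route: attenuate each Fourier coefficient by the kernel's
# transfer function B_n = (1 + 2 cos(2*pi*n/N)) / 3.
n = np.fft.fftfreq(N) * N                      # harmonic index of each bin
B = (1.0 + 2.0 * np.cos(2.0 * np.pi * n / N)) / 3.0
f_rs = np.fft.ifft(np.fft.fft(f) * B).real

assert np.allclose(f_ds, f_rs)                 # convolution theorem in action
```

The RS view also makes the qualitative assessment mentioned above immediate: one simply compares |B| with the Fourier spectrum of the data.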

In principle, the ideal filter is that which leaves the information-containing coefficients unchanged while suppressing the noise-dominated coefficients completely. However, results obtained by applying this “brick-wall” (BW) filter are generally unacceptable because its abrupt cutoff generates Gibbs oscillations (ringing) in reconstructed spectra [8]. Because all linear filters incorporate apodization (cutoff), all require a compromise to be made among distortion of information, ringing, and leakage of noise.

In the absence of a quantitative RS measure to assess performance, a wide variety of transfer functions have been developed, largely empirically. However, Le et al. recently capitalized on Parseval’s Theorem to develop an informative way of calculating mean-square deviations in RS. This expression allows the major sources of filtering errors to be recognized directly and assessed quantitatively. It led to the demonstration that the Gauss-Hermite (GH) filter is the best previously available linear filter, and to the development of an improved version, the cosine-terminated brick-wall (CT) filter [7].

The need for apodization—along with its associated errors—was recently eliminated with the development of the corrected maximum-entropy (CME) approach [9]. Based on calculations of Burg [10], this method projects, in a model-independent way, trends established by the low-order coefficients into the white-noise region. By replacing high-order coefficients with analytically obtained values, CME makes possible operations such as constructing essentially noise-free second-derivative spectra, as seen in Fig. 1. Here, the second derivatives of \(\varepsilon_{2}\) data obtained by spectroscopic ellipsometry (SE) on a monolayer WS2 sample at 41 K [11] are calculated by multiplying the data or filtered RS coefficients \(C_{n}\) by \(n^{2}\), except that the 5-point second-derivative SG DS convolution was used directly. While the SG result is already a significant improvement relative to differentiating the data directly, the GH result is better. The best result is obtained with the CME filter. Further details are provided below; the figure is presented here to introduce capabilities.

Fig. 1

Second derivatives of \(\varepsilon_{2}\) of WS2 at 41 K obtained by multiplying the original, GH-filtered, and CME-filtered Fourier coefficients \(C_{n}\) by \(n^{2}\) as described below. The SG result is obtained by using the 5-point second-derivative DS convolution coefficients directly

2 Theory

2.1 Fourier analysis

CME is based on the coefficients of complex-exponential expansions rather than those of cosines and sines [9]. As these are less common, we provide the basic equations here. We assume that a spectrum consists of \((2N + 1)\) real, positive-definite data \(\{ f_{j} \}\), where \(- N \le j \le N\). Although Fourier analysis does not require the number of data to be odd, CME does. This restriction also simplifies the mathematics.

Define the Fourier coefficients \(R_{n}\) according to

$$ f_{j} = \sum\limits_{n = - N}^{N} {R_{n} e^{{in\theta_{j} }} } , $$
(1a)

where

$$ \theta_{j} = \frac{2\pi }{{2N + 1}}j. $$
(1b)

Because the \(f_{j}\) are real, the \(R_{n}\) satisfy the reality condition \(R_{n}^{*} = R_{ - n}\). From Eq. (1b) it follows that \(j = - (N + 1/2)\) corresponds to \(\theta = - \pi\) and \(j = (N + 1/2)\) to \(\theta = \pi\), so all \(f_{j}\), \(j = - N\) to \(N\), are interior points, an advantage. The inverse transformation is

$$ R_{n} = \frac{1}{2N + 1}\sum\limits_{j = - N}^{N} {f_{j} } e^{{ - in\theta_{j} }} . $$
(2)

We assign the normalization factor \((2N + 1)^{ - 1}\) to Eq. (2) rather than Eq. (1a) for reasons described below. The \(R_{n}\) are related to the more familiar cosine and sine coefficients \(A_{n}\) and \(B_{n}\) as

$$ A_{0} = R_{0} = C_{0} ; $$
(3a)
$$ A_{n} - iB_{n} = 2R_{n} ,\quad C_{n} = \sqrt {A_{n}^{2} + B_{n}^{2} } = 2\left| {R_{n} } \right|,\quad n \ge 1. $$
(3b)
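A short numerical check of Eqs. (1)–(3) may be helpful. The sketch below (Python; the synthetic data are invented for illustration) computes the \(R_{n}\) from Eq. (2) and verifies the reality condition and the relation to the \(C_{n}\).

```python
import numpy as np

# Illustrative sketch of Eqs. (1)-(3): complex-exponential coefficients
# R_n for real data f_j, j = -N..N, with the (2N+1)^(-1) normalization
# assigned to the inverse transform, Eq. (2). Synthetic data only.
N = 50
j = np.arange(-N, N + 1)
theta = 2.0 * np.pi * j / (2 * N + 1)                  # Eq. (1b)
f = 1.0 + 0.5 * np.cos(3 * theta) + 0.2 * np.sin(7 * theta)

n = np.arange(-N, N + 1)
# Eq. (2): R_n = (2N+1)^(-1) sum_j f_j exp(-i n theta_j)
R = np.array([(f * np.exp(-1j * m * theta)).sum() for m in n]) / (2 * N + 1)
Rn = dict(zip(n, R))

# Reality condition for real data: R_n* = R_{-n}
assert all(np.isclose(np.conj(Rn[m]), Rn[-m]) for m in range(N + 1))
# Eq. (3): C_n = 2|R_n|; the cos amplitude 0.5 sits at n = 3,
# the sin amplitude 0.2 at n = 7, and A_0 = R_0 = 1.
assert np.isclose(2 * abs(Rn[3]), 0.5) and np.isclose(2 * abs(Rn[7]), 0.2)
assert np.isclose(abs(Rn[0]), 1.0)
# Eq. (1a) reconstructs the data exactly.
f_rec = np.array([(R * np.exp(1j * n * th)).sum() for th in theta]).real
assert np.allclose(f_rec, f)
```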

Basing all calculations on a common range (\(- \pi \le \theta \le \pi\)) is also an advantage, but data are never obtained as a function of \(\theta\). Hence projection is necessary. Using energy as an example, let \(E_{i}\) correspond to the first point \(j = - N\) and \(E_{f}\) to the last point \(j = + N\). Then

$$ \theta = \frac{4\pi }{{(2N + 1)(E_{f} - E_{i} )}}\left( {E - \frac{1}{2}(E_{i} + E_{f} )} \right). $$
(4a)

The inverse transformation is

$$ E = \frac{1}{2}\left( {E_{i} + E_{f} } \right) + \frac{{(2N + 1)(E_{f} - E_{i} )}}{4\pi }\theta . $$
(4b)
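As a sanity check, Eq. (4b) and its algebraic inverse can be coded directly; the values of \(E_{i}\), \(E_{f}\), and N below are arbitrary placeholders, not tied to any data set in this work.

```python
import numpy as np

# Illustrative sketch of the E <-> theta projection, written directly
# from Eq. (4b) and its algebraic inverse. Ei, Ef, N are placeholders.
def E_from_theta(theta, Ei, Ef, N):
    # Eq. (4b)
    return 0.5 * (Ei + Ef) + (2 * N + 1) * (Ef - Ei) / (4.0 * np.pi) * theta

def theta_from_E(E, Ei, Ef, N):
    # Algebraic inverse of Eq. (4b)
    return 4.0 * np.pi * (E - 0.5 * (Ei + Ef)) / ((2 * N + 1) * (Ef - Ei))

Ei, Ef, N = 1.5, 6.0, 124
E = np.linspace(Ei, Ef, 2 * N + 1)
# The mapping and its inverse round-trip exactly.
assert np.allclose(E_from_theta(theta_from_E(E, Ei, Ef, N), Ei, Ef, N), E)
```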

2.2 Linear filtering

Linear filtering is defined most conveniently as a convolution in DS:

$$ \overline{f}_{j} = \sum\limits_{\kappa = - N}^{N} {f_{j - \kappa } b_{\kappa } } , $$
(5)

where the \(\{ \overline{f}_{j} \}\) are the filtered data and \(\{ b_{\kappa } \}\) are the filter coefficients. In Eq. (5), values of \((j - \kappa )\) that lie outside the \(( - N,N)\) range are wrapped mod \((2N + 1)\). Because the normalization factor \((2N + 1)^{ - 1}\) has been assigned to the inverse transform, Eq. (2), the transfer function associated with the \(b_{\kappa }\) carries no prefactor:

$$ B_{n} = \sum\limits_{\kappa = - N}^{N} {b_{\kappa } e^{{ - in\theta_{\kappa } }} } . $$
(6)

The reason for this normalization choice is now clear: because the \(b_{\kappa }\) sum to unity, \(B_{0} = 1\), as appropriate for a low-pass filter. The classic examples are the sets of coefficients published in 1964 by Savitzky and Golay [1]. Typically only a few \(b_{\kappa }\) are nonzero; any \(b_{\kappa }\) not specified are set equal to zero in Eqs. (5) and (6). For example, for the 3-point running-average filter, \(b_{\kappa } = 1/3\) for \(\kappa = - 1,\,\,0,\,\,1\) and zero otherwise.
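As an illustration, the sketch below builds the transfer function of a normalized kernel, using the well-known 5-point SG quadratic smoothing weights \((-3, 12, 17, 12, -3)/35\); since the weights sum to unity, \(B_{0} = 1\) when no prefactor is attached.

```python
import numpy as np

# Illustrative sketch: transfer function B_n of a normalized DS kernel,
# with the prefactor-free convention that gives B_0 = 1 for a low-pass
# filter. Kernel: 5-point Savitzky-Golay quadratic smoothing weights.
N = 100
kappa = np.arange(-N, N + 1)
theta = 2.0 * np.pi * kappa / (2 * N + 1)
b = np.zeros(2 * N + 1)
b[np.isin(kappa, [-2, -1, 0, 1, 2])] = np.array([-3, 12, 17, 12, -3]) / 35.0

n = np.arange(-N, N + 1)
B = np.array([(b * np.exp(-1j * m * theta)).sum() for m in n])

assert np.isclose(B[n == 0][0].real, 1.0)      # B_0 = sum of b_kappa = 1
assert np.allclose(B.imag, 0.0, atol=1e-12)    # symmetric kernel -> real B_n
```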

As noted in the Introduction, the effect of convolution in DS is represented in RS by the convolution theorem

$$ \overline{F}_{n} = F_{n} B_{n} , $$
(7)

where \(\overline{F}_{n}\), \(F_{n}\), and \(B_{n}\) are the Fourier coefficients of \(\overline{f}_{j}\), \(f_{j}\), and \(b_{\kappa }\), respectively. The above equation shows explicitly that the operation of the linear filter defined by the \(b_{\kappa }\) is independent of the data being processed. Capitalizing on Parseval’s Theorem, Le et al. [7] recently showed that the mean-square deviation \(\delta_{MSE}^{2}\) between \(\{ f_{j} \}\) and \(\{ \overline{f}_{j} \}\) is given in RS by

$$ \delta_{MSE}^{2} = \sum\limits_{j = - N}^{N} {|\overline{f}_{j} - f_{j} |^{2} } ; $$
(8a)
$$ = \sum\limits_{n = - N}^{N} {|F_{n} |^{2} |1 - B_{n} |^{2} } . $$
(8b)

With \(B_{0} = 1\) for a low-pass filter, Eq. (8b) provides clear justification for the Butterworth criterion: treating the \(B_{n}\) as a continuous function \(B(n)\), eliminate as many low-order derivatives as possible in a Taylor-series expansion of \(B(n)\) about \(n = 0\).
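Equations (8a) and (8b) are easy to verify numerically. The sketch below uses synthetic data and an arbitrary Gaussian transfer function chosen only for illustration; note that with the \((2N + 1)^{ - 1}\) normalization assigned to Eq. (2), the Parseval relation attaches an overall factor \((2N + 1)\) to the RS sum.

```python
import numpy as np

# Illustrative sketch of Eq. (8): the DS mean-square deviation between
# original and filtered data equals the RS sum over |F_n|^2 |1 - B_n|^2,
# up to the (2N+1) factor implied by the normalization of Eq. (2).
rng = np.random.default_rng(1)
N = 60
size = 2 * N + 1
theta = 2.0 * np.pi * np.arange(-N, N + 1) / size
f = np.exp(-(theta / 0.5) ** 2) + 0.02 * rng.standard_normal(size)

n = np.arange(-N, N + 1)
F = np.array([(f * np.exp(-1j * m * theta)).sum() for m in n]) / size  # Eq. (2)
B = np.exp(-n**2 / 15.0**2)                    # any transfer fn with B_0 = 1

f_bar = np.array([(F * B * np.exp(1j * th * n)).sum() for th in theta]).real
ds = ((f_bar - f) ** 2).sum()                  # Eq. (8a), sum over j
rs = size * (np.abs(F) ** 2 * np.abs(1 - B) ** 2).sum()   # Eq. (8b) in RS
assert np.isclose(ds, rs)                      # Parseval's Theorem
```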

Of the various linear filters that have been proposed, Eq. (8b) reduces the list to two: the Gauss-Hermite filter mentioned previously [4, 12], and a more recent development, the cosine-terminated brick-wall filter [7]. The Gauss-Hermite filter can be written either as a sum of Hermite polynomials of increasing even order multiplying a Gaussian \(e^{{ - {{n^{2} } \mathord{\left/ {\vphantom {{n^{2} } {\Delta n^{2} }}} \right. \kern-\nulldelimiterspace} {\Delta n^{2} }}}}\), as originally done by Hoffman et al., or as the product of the Gaussian and an Mth partial sum of the series for \(e^{{ + {{n^{2} } \mathord{\left/ {\vphantom {{n^{2} } {\Delta n^{2} }}} \right. \kern-\nulldelimiterspace} {\Delta n^{2} }}}}\), which more directly illustrates the removal of low-order derivatives in the Taylor-series expansion. In the latter case

$$ B(n) = \left( {1 + \zeta + \frac{1}{2}\zeta^{2} + ... + \frac{1}{M!}\zeta^{M} } \right)e^{ - \zeta } , $$
(9a)

where

$$ \zeta = \frac{{n^{2} }}{{\Delta n^{2} }}. $$
(9b)

It is straightforward to show that the first nonvanishing term in a full Taylor-series expansion of Eq. (9a) is proportional to \(\zeta^{M + 1}\), illustrating that the GH filter satisfies the Butterworth criterion explicitly.
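Equation (9a) is simple to evaluate; the sketch below (with an arbitrary \(\Delta n\), chosen only for illustration) also checks the flatness property numerically, using the leading deviation \(1 - B \approx \zeta^{M + 1} /(M + 1)!\) for small \(\zeta\).

```python
import numpy as np
from math import factorial

# Illustrative sketch of Eq. (9): the GH transfer function as the
# Gaussian times the Mth partial sum of exp(+zeta); its deviation from 1
# begins only at order zeta^(M+1), the Butterworth criterion.
def gh_transfer(n, dn, M):
    zeta = (np.asarray(n, dtype=float) / dn) ** 2
    partial = sum(zeta**k / factorial(k) for k in range(M + 1))
    return partial * np.exp(-zeta)

M, dn = 5, 5.0
n = np.arange(0, 101)
B = gh_transfer(n, dn, M)
assert np.isclose(B[0], 1.0)                   # B(0) = 1, low-pass
assert np.all(np.diff(B) < 0)                  # monotone decay with n

# Leading deviation: 1 - B(n) ~ zeta^(M+1)/(M+1)! for small zeta.
zeta1 = (1.0 / dn) ** 2
assert np.isclose(1.0 - B[1], zeta1 ** (M + 1) / factorial(M + 1), rtol=0.05)
```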

The cosine-terminated brick-wall filter is defined by

$$ B(n) = 1;\quad {\text{for}}\;0 \le n \le n_{1} ; $$
(10a)
$$ = a\cos \left( {(n - n_{1} )/\Delta n} \right)\quad {\text{for}}\;n_{1} \le n \le n_{2} ; $$
(10b)
$$ = 0\quad {\text{for}}\;n > n_{2} , $$
(10c)

where a, \(n_{1}\), and \(n_{2}\) are parameters. In applications to discrete data, the variable n in Eqs. (9a, 9b) and (10a, 10b, 10c) is an integer.

GH filters are shown in Fig. 2 for a range of M and compared to the CT filter for high M in Fig. 3. Figure 2 shows apodizations for orders ranging from zero (a pure Gaussian) to M = 100. As can be appreciated from the figure, GH filters are most effective at relatively high orders, for example \(M\sim 50\) and above, which makes them computationally intensive.

Fig. 2

Transfer functions of the GH filter for M = 0, 1, 5, 20, 50, and 100

Fig. 3

Transfer functions of the CT filter for \(x_{0} = 1\), \(a = 5\), and \(\Delta n = 0.2,\,\,0.5\), and 1. The transfer functions of the BW and GH, M = 100 filters are shown for comparison

Figure 3 compares the M = 100 GH filter to several CT filters for a = 5 to illustrate its apodizations. As can be seen, the main difference between the GH filter and the CT filter for \(\Delta n = 0.5\) occurs at the upper cutoff. The distortions caused by the abrupt cutoff of the CT filter at the high end are more than compensated by its lack of distortion at low indices.

2.3 Nonlinear filters

The history of the corrected maximum-entropy filter is complicated, involving investigations of phase noise by Walker and Yule [13, 14], extraction of harmonics buried in noise in stationary-time-sequence data by many workers, and the forward-prediction theory of Kolmogorov and Wiener [15, 16]. Burg used a maximum-entropy approach [10] to develop a deconvolution (sharpening or “whitening”) procedure to better define the frequencies of weak signals buried in noise. The CME filter was identified by Le et al. [9] as an alternative solution in the analysis. We summarize the CME filter below.

Let \(P(\theta )\) be a continuous function with the Fourier representation

$$ P(\theta ) = \sum\limits_{n = - \infty }^{\infty } {R_{n} e^{in\theta } } , $$
(11)

where the Fourier coefficients \(R_{n}\) are available for \(- N \le n \le N\). However, because of noise only the coefficients for \(- M \le n \le M\) are useful; obviously, M is less, possibly much less, than N. We can nevertheless recover the most-probable missing \(R_{n}\) by maximizing the entropy rate

$$ S = \int_{ - \pi }^{\pi } {\ln \left( {P(\theta )} \right)\,d\theta } = \int_{ - \pi }^{\pi } {\ln \left( {\sum\limits_{n = - \infty }^{\infty } {R_{n} e^{in\theta } } } \right)\,d\theta } $$
(12)

with respect to the \(R_{n}\) that are unavailable. After numerous steps the result is found to be given by

$$ P(\theta ) = \frac{1}{{|\sum\nolimits_{n = 0}^{M} {a_{n} e^{in\theta } |^{2} } }}, $$
(13)

where the coefficients \(a_{n}\) are given by

$$ \left( {\begin{array}{*{20}c} {R_{0} } & {R_{1} } & {...} & {R_{M} } \\ {R_{1}^{*} } & {R_{0} } & {...} & {R_{M - 1} } \\ {...} & {...} & {...} & {...} \\ {R_{M}^{*} } & {R_{M - 1}^{*} } & {...} & {R_{0} } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {a_{0}^{*} } \\ {a_{1}^{*} } \\ {...} \\ {a_{M}^{*} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {{1 \mathord{\left/ {\vphantom {1 a}} \right. \kern-\nulldelimiterspace} a}_{0} } \\ 0 \\ {...} \\ 0 \\ \end{array} } \right). $$
(14)

Several points can be mentioned. First, because \(P(\theta )\) appears as the argument of a logarithmic function, it must be positive definite. Second, the infinite series of Eq. (11) has been replaced by the reciprocal of a finite series, Eq. (13). Third, it is easily shown that the features generated by Eq. (13) are pseudo-Lorentzians, being periodic in \(\theta\) rather than decreasing monotonically to zero, as true Lorentzians do. Consequently, being a modified spectral representation, the CME filter is particularly appropriate for Lorentzian lines generated by first-order decay processes. Finally, Eqs. (12) and (13) show that the CME procedure, or maximum entropy more generally, intrinsically performs an inverse Fourier transformation.
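The construction of Eqs. (13) and (14) can be sketched in a few lines. The example below is an illustration only, not the full CME pipeline of Ref. [9]: the \(R_{n}\) are taken from a known positive \(P(\theta )\) so the Toeplitz system is guaranteed to be well posed, and the classical maximum-entropy property—that the reconstruction reproduces the supplied coefficients—is checked numerically.

```python
import numpy as np

# Illustrative maximum-entropy sketch of Eqs. (13)-(14). The R_n come
# from a known positive, Lorentzian-like P(theta); synthetic data only.
M = 8
theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
P_true = 1.0 / (1.05 - np.cos(theta))
R = np.array([(P_true * np.exp(-1j * k * theta)).mean() for k in range(M + 1)])

# Hermitian Toeplitz matrix of Eq. (14).
T = np.empty((M + 1, M + 1), dtype=complex)
for r in range(M + 1):
    for c in range(M + 1):
        T[r, c] = R[c - r] if c >= r else np.conj(R[r - c])

# Solve T x = e_0, then rescale so that T a* = (1/a_0) e_0 as in Eq. (14).
x = np.linalg.solve(T, np.eye(M + 1)[:, 0])
a = np.conj(x) / np.sqrt(x[0].real)

# Eq. (13): reconstructed spectrum, positive definite by construction.
A = sum(a[k] * np.exp(1j * k * theta) for k in range(M + 1))
P_me = 1.0 / np.abs(A) ** 2
assert np.all(P_me > 0)

# Maximum-entropy property: the supplied R_n, n = 0..M, are reproduced.
R_rec = np.array([(P_me * np.exp(-1j * k * theta)).mean() for k in range(M + 1)])
assert np.allclose(R_rec, R, atol=1e-8)
```

In practice a Levinson-type recursion solves Eq. (14) in \(O(M^{2})\) operations; the dense solve above is used only for clarity.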

3 Application

As an example, Fig. 4 presents the WS2 data used to generate Fig. 1. The sample temperature was 41 K. Data were obtained on a J.A. Woollam RC2 spectroscopic ellipsometer. The data, obtained linear in wavelength, were converted to linear in energy by linear interpolation between points. We are interested in the structure of the fundamental absorption edge giving rise to the features at approximately 2.1 and 2.5 eV. At issue is whether these are simple or composite features including one or more singularities. As indicated by the dashed line in the figure, the data of interest were augmented by the extended removal of endpoint discontinuity (ERED) process [17] to eliminate endpoint-discontinuity artifacts and, as a byproduct, double the number of Fourier coefficients available for analysis.

Fig. 4

Real \((\varepsilon_{1} )\) and imaginary \((\varepsilon_{2} )\) parts of the dielectric function of monolayer WS2 at 41 K [11]. The dashed curve shows the ERED extension in the excitonic region, as described in the text

The Fourier coefficients of the segment of interest in Fig. 4 were obtained in double precision using MATLAB. These are displayed in Fig. 5 as \(\ln (C_{n} )\) before the second-derivative processing. The data exhibit interference, indicating the presence of multiple features with amplitudes decreasing approximately linearly on a log scale. The white-noise onset occurs near \(n\sim 60\). To assess filtering, the red and blue traces show the results of GH and CME processing before multiplying by \(n^{2}\). For SG processing the order was set at 5 because higher orders noticeably attenuate the structure. For the GH calculation, filtering was done with \(M = 4\) and \(n_{c} = 63\) to best match the white-noise onset. Because the SG DS convolution already multiplies the RS data by \(n^{2}\), its filtering equivalent is obtained by dividing the resulting \(C_{n}\) by \(n^{2}\). Consistent with Fig. 1, Fig. 5 shows that the SG filter is ineffective at suppressing noise except near its node at \(n = 150\). The GH filter is significantly better but still allows a substantial amount of noise to leak through.
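The \(n^{2}\) multiplication used throughout can be illustrated on synthetic data (the processing above was done in MATLAB; the sketch below uses Python and invented band-limited data): since \(d^{2} /d\theta^{2}\) of \(e^{in\theta }\) is \(- n^{2} e^{in\theta }\), multiplying each \(R_{n}\) by \(- n^{2}\) (by \(n^{2}\) in magnitude, as for the \(C_{n}\)) yields an exact second derivative.

```python
import numpy as np

# Illustrative sketch: second differentiation in RS by multiplying the
# coefficients R_n by -n^2. Synthetic band-limited data, so the result
# can be compared against the exact analytic second derivative.
N = 200
theta = 2.0 * np.pi * np.arange(-N, N + 1) / (2 * N + 1)
f = np.cos(5 * theta) + 0.3 * np.sin(9 * theta)

n = np.arange(-N, N + 1)
R = np.array([(f * np.exp(-1j * m * theta)).sum() for m in n]) / (2 * N + 1)
d2f = np.array([((-n**2) * R * np.exp(1j * n * th)).sum()
                for th in theta]).real

d2f_exact = -25.0 * np.cos(5 * theta) - 0.3 * 81.0 * np.sin(9 * theta)
assert np.allclose(d2f, d2f_exact)
```

For noisy data the same operation amplifies high-index noise as \(n^{2}\), which is precisely why the filtering discussed above must precede it.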

Fig. 5

(Black) Values of ln(Cn) of the highlighted data in Fig. 4. (Blue) same data after passing through a GH filter of order \(M = 4\) set with a cutoff index \(n_{c} = 63\). (Red) same data after passing through a CME filter of order 47. (Green) same data after passing through the SG 5-point second-derivative convolution converted to a filter by multiplying the \(C_{n}\) by \(n^{2}\)

More striking is the CME replacement of the coefficients \(C_{n}\) in the white-noise region. Its extrapolation of the trend established in the low-index range is obviously appropriate. With 47 coefficients retained for the calculation, the extrapolation is solidly based. Because it is also based on analytic functions, the calculation of the second derivative generates no noise beyond the noise cutoff. Thus noise in the CME reconstruction is essentially nonexistent. It can be noted that the CME shows unambiguously that the higher-energy peak is a doublet.

As a second example, we consider archival room-temperature \(\varepsilon_{2}\) data for GaAs obtained by rotating-analyzer ellipsometry as 250 data points equally spaced in energy from 1.5 to 6.0 eV [18]. Integration time per point was 1 s. The data are shown in Fig. 6. The question here is whether the four \(E^{\prime}_{0}\) and \(E_{2}\) features reported by Lautenschlager et al. [19] at a sample temperature of 22 K can be resolved at room temperature, given that these appear to collapse into two at 80 K. The second derivatives with respect to energy in the \(E^{\prime}_{0} - E_{2}\) spectral region, along with the second derivatives calculated after GH and CME processing, are shown in Fig. 7. The SG computation exhibits noticeable noise, but in this case there is little difference between the GH and CME results. Both the GH and CME calculations show the presence of the \(E^{\prime}_{0}\), \(E^{\prime}_{0} + \Delta^{\prime}_{0}\), \(E_{2} ({\text{X}})\), and \(E_{2} (\Gamma )\) critical points in the room-temperature data. The \(E_{2} ({\text{X}})\) structure in the SG result is marginal. Thus progress has been made since the data were obtained.

Fig. 6

Room-temperature \(\varepsilon_{2}\) data for GaAs obtained spectroscopically as 250 points equally spaced in energy from 1.5 to 6.0 eV [18]. Our interest is in the \(E_{0} ^{\prime} - E_{2}\) spectral range indicated by the arrows

Fig. 7

Second energy derivatives of the \(\varepsilon_{2}\) spectrum of Fig. 6 in the \(E_{0} ^{\prime},\,\,E_{2}\) spectral range for the data (black), after SG DS 5-point convolution (green), after GH filtering (blue), and after CME filtering (red). The locations of critical points giving rise to features in these spectra are indicated by arrows

4 Discussion

While linear filtering is well known, the CME is relatively new, and hence comments on its use are in order. The requirement that the original spectrum be positive definite follows from Eq. (12): the procedure is based on maximizing the entropy rate, which is expressed as a logarithm. For spectra that have negative excursions, for example the real part of the dielectric function, this requires the use of an additive constant. It can be shown that to lowest order the effect of modifying \(R_{0}\) is to change the rate of decrease (the effective broadening parameter) of the projection of \(\ln (C_{n} )\) into the white-noise region, which for cosmetic purposes has little effect.

It can also be shown that Eq. (13) is a form of spectral representation, and hence the CME is most efficient when used with Lorentzian lineshapes. The characteristic of the spectral representation is its linear decrease of \(\ln (C_{n} )\) with n. This is inconsistent with Gaussian lineshapes, where the decrease of \(\ln (C_{n} )\) with n is quadratic. Nevertheless, model calculations show that when the filter is used for its intended purpose, neither the positive-definite requirement nor the linear-decrease characteristic is a significant limitation.

5 Conclusion

In this work, we discuss different methods of minimizing noise in ellipsometric and other spectra by linear and nonlinear methods. The field has advanced rapidly in the last several years, and our objective is to provide an overview of current best practices, along with some examples of their use.