1 Bayesian Statistics

Let us consider a given physical problem, e.g., the fitting of an observational dataset with a predictive model. We term the dataset D and the fitting model \(\mathcal{M}_{k}\), the latter having k free parameters that we collect in the k-dimensional parameter vector \(\boldsymbol{\theta }= (\theta _{1},\theta _{2},\ldots,\theta _{k})\). The number of free parameters sets the dimensionality of the problem, to which a k-dimensional parameter space \(\varOmega _{\mathcal{M}_{k}}\) is associated, representing the space of the solutions. Our aim is to obtain optimal estimates of each free parameter and a corresponding statistical weight of the model \(\mathcal{M}_{k}\) that takes into account both the number of dimensions and the quality of the fit. This statistical inference can properly be addressed by means of Bayesian statistics (Jeffreys 1961; Sivia and Skilling 2006; Trotta 2008; Bolstad 2013; Corsaro et al. 2013; Corsaro and De Ridder 2014). In particular, the core of the statistical representation is given by Bayes’ theorem:

$$\displaystyle{ p(\boldsymbol{\theta }\mid D,\mathcal{M}_{k}) = \frac{\mathcal{L}(\boldsymbol{\theta }\mid D,\mathcal{M}_{k})\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})} {p(D\mid \mathcal{M}_{k})} \,, }$$
(1)

where \(\mathcal{L}(\boldsymbol{\theta }\mid D,\mathcal{M}_{k})\) (hereafter, \(\mathcal{L}(\boldsymbol{\theta })\) for simplicity) is the likelihood function, which represents the way we sample the data, while \(\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})\) is the prior probability density function (PDF), which reflects our knowledge about the model parameters. The left-hand side of Eq. (1) is the posterior PDF, which plays a key role in the parameter estimation problem. By marginalizing the posterior PDF, namely integrating it over the uninteresting free parameters, we estimate each free parameter of the model. Among the different estimators available for each parameter, in Bayesian statistics the median is usually preferred because it is the most resistant estimator, i.e., the least sensitive to possible outliers, and because it is invariant under a change of variable.

The denominator on the right-hand side of Eq. (1) is instead a normalization factor, generally known as the Bayesian evidence (or marginal likelihood), which is defined as

$$\displaystyle{ \mathcal{E} \equiv p(D\mid \mathcal{M}_{k}) =\int _{\varOmega _{\mathcal{M}_{ k}}}\mathcal{L}(\boldsymbol{\theta }\mid D,\mathcal{M}_{k})\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})d\boldsymbol{\theta }\,. }$$
(2)

The Bayesian evidence is used as a statistical weight for model comparison because it embodies the principle of Occam’s razor: models are favored if they provide a better fit to the data, but are penalized if their number of free parameters is larger than that of a competitor model. For our study, the model comparison is performed by computing the Bayes’ factor \(\mathcal{B}_{ij} =\mathcal{ E}_{i}/\mathcal{E}_{j}\) (see also Sect. 5), where the model with the larger Bayesian evidence is the statistically more likely one (Jeffreys 1961; Trotta 2008; Corsaro et al. 2013; Corsaro and De Ridder 2014).
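
To make Eqs. (1)–(2) concrete, the following minimal Python sketch (not part of Diamonds; the Gaussian toy model, dataset, and uniform prior are purely illustrative assumptions) evaluates the Bayesian evidence of a one-parameter model by direct numerical integration and compares it with a zero-parameter alternative through the Bayes’ factor.

```python
import numpy as np

# Hypothetical toy dataset: noisy measurements of a constant offset.
rng = np.random.default_rng(1)
data = 0.5 + 0.5 * rng.standard_normal(50)
sigma = 0.5  # assumed known measurement uncertainty

def log_likelihood(mu):
    """Gaussian log-likelihood of the data for a constant model of level mu."""
    return (-0.5 * np.sum((data - mu) ** 2) / sigma**2
            - 0.5 * data.size * np.log(2.0 * np.pi * sigma**2))

# Model M1: one free parameter mu, with a uniform prior over [-5, 5].
mu_grid = np.linspace(-5.0, 5.0, 4001)
prior = np.full_like(mu_grid, 1.0 / 10.0)                  # normalized uniform prior
likelihood = np.exp([log_likelihood(mu) for mu in mu_grid])
evidence_1 = np.trapz(likelihood * prior, mu_grid)         # Eq. (2) in one dimension

# Model M2: no free parameters, offset fixed to zero (evidence = likelihood).
evidence_2 = np.exp(log_likelihood(0.0))

ln_B12 = np.log(evidence_1) - np.log(evidence_2)
print(f"ln B_12 = {ln_B12:.2f}")   # positive values favour M1 (see also Sect. 5)
```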

2 Nested Sampling Monte Carlo

Since Eq. (2) is a multi-dimensional integral, its evaluation quickly becomes intractable, both analytically and by numerical approximation, as the number of dimensions increases. To overcome this problem, the nested sampling Monte Carlo (NSMC) algorithm was developed (Skilling 2004). This algorithm allows for an efficient evaluation of the Bayesian evidence for any number of dimensions and provides the sampling of the posterior PDF for parameter estimation as a straightforward byproduct. Detailed descriptions of the algorithm can be found in Skilling (2004), Sivia and Skilling (2006), Feroz and Hobson (2008), Feroz et al. (2009), and Corsaro and De Ridder (2014).

In short, a prior mass X is defined such that

$$\displaystyle{ X(\mathcal{L}^{{\ast}}) =\int _{\mathcal{ L}(\boldsymbol{\theta })>\mathcal{L}^{{\ast}}}\pi (\boldsymbol{\theta }\mid \mathcal{M})d\boldsymbol{\theta }\,, }$$
(3)

with \(\mathcal{L}^{{\ast}}\) some fixed value of the likelihood function. As a consequence, 0 ≤ X ≤ 1 because \(\pi (\boldsymbol{\theta }\mid \mathcal{M})\) is a PDF. Equation (3) is therefore the fraction of volume under the prior PDF that is contained within the hard constraint \(\mathcal{L}(\boldsymbol{\theta })>\mathcal{ L}^{{\ast}}\): the higher the constraining value \(\mathcal{L}^{{\ast}}\), the smaller the prior mass considered. This is equivalent to considering the portion of parameter space delimited by the iso-likelihood contour \(\mathcal{L}(\boldsymbol{\theta }) =\mathcal{ L}^{{\ast}}\), which also contains the maximum value \(\mathcal{L}_{\mathrm{max}}\).

In the NSMC, the sampling of the posterior PDF starts from a prior mass X = 1 (thus considering the entire parameter space) and an initial set of N live points that are distributed according to the prior, hence drawn from the prior PDF itself. At each new iteration, a new sampling point is drawn from the prior PDF with a likelihood value that satisfies the hard constraint \(\mathcal{L}>\mathcal{ L}^{{\ast}}\), with \(\mathcal{L}^{{\ast}}\) the worst likelihood value of the previous iteration. The point associated with the worst likelihood value is then removed from the sample and a new iteration starts. As the iterations proceed, the prior mass shrinks toward X = 0 and the sampling terminates in a region located around the maximum (or the maxima) of the likelihood function.
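
The iteration scheme described above can be summarized in a few lines of Python. The sketch below is a toy, one-dimensional implementation under simplifying assumptions (a Gaussian likelihood, a uniform prior, a naive rejection step for drawing under the hard constraint, and an illustrative tolerance for the remaining evidence); it is not the Diamonds implementation, which replaces the rejection step with ellipsoidal sampling (Sect. 2.1).

```python
import numpy as np

rng = np.random.default_rng(42)

def log_likelihood(theta):
    """Toy Gaussian likelihood centred on theta = 1 with width 0.1."""
    return -0.5 * ((theta - 1.0) / 0.1) ** 2

def draw_prior(size=None):
    """Uniform prior over [0, 2]; X = 1 corresponds to this full interval."""
    return rng.uniform(0.0, 2.0, size)

n_live = 400                                      # number of live points N
live_theta = draw_prior(n_live)
live_logl = log_likelihood(live_theta)

log_evidence = -np.inf
for i in range(100000):
    # Width of the i-th shell of prior mass, X_i ~ exp(-i / N).
    log_width = -i / n_live + np.log(1.0 - np.exp(-1.0 / n_live))
    worst = np.argmin(live_logl)                  # live point with the worst likelihood
    logl_star = live_logl[worst]
    # Accumulate the evidence contribution L* dX of the discarded point.
    log_evidence = np.logaddexp(log_evidence, logl_star + log_width)
    # Termination: stop when the live points can no longer contribute
    # significant evidence (a simplified remaining-evidence criterion).
    if live_logl.max() - (i + 1) / n_live < log_evidence + np.log(1e-2):
        break
    # Replace the worst point with a new prior draw obeying the hard
    # constraint L > L* (naive rejection; Diamonds uses ellipsoidal sampling).
    theta_new = draw_prior()
    while log_likelihood(theta_new) <= logl_star:
        theta_new = draw_prior()
    live_theta[worst] = theta_new
    live_logl[worst] = log_likelihood(theta_new)

# The analytic value for this toy problem is ln(0.1 * sqrt(2 pi) / 2) ~ -2.08.
print(f"ln(evidence) ~ {log_evidence:.2f} after {i + 1} iterations")
```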

2.1 The Diamonds Code

The high-DImensional And multi-MOdal NesteD Sampling (Diamonds) codeFootnote 1 is a C++11 software package for Bayesian parameter estimation and model comparison that uses a version of the NSMC algorithm. A major difficulty in implementing the NSMC algorithm is drawing new points from the prior PDF that satisfy the hard constraint on the likelihood value. Following the developments made for other existing codes that implement NSMC (see, e.g., Shaw et al. 2007; Feroz and Hobson 2008; Feroz et al. 2009), Diamonds overcomes this problem by adopting a simultaneous ellipsoidal sampling algorithm (Corsaro and De Ridder 2014). This means that the posterior PDF is actually sampled by means of multi-dimensional ellipsoids, which decompose the parameter space \(\varOmega _{\mathcal{M}_{k}}\) into small hyper-volumes, as shown in the left panel of Fig. 1. New points can then easily be drawn from each ellipsoid, whose volume is reduced as the nested iterations proceed toward a termination condition. In particular, one crucial parameter controlling the behavior of the ellipsoids is the initial enlargement fraction, \(f_{0}\), which is used to enlarge the ellipsoid axes along each of the directions set by the free parameters. This parameter, whose effect is depicted in the right panel of Fig. 1, tunes the efficiency of the sampling throughout the nested iterations and therefore requires a careful calibration, which I show in Fig. 2 as a function of the number of dimensions, k. A calibrated relation, already implemented in Diamonds, reads

$$\displaystyle{ f_{0} = (0.267 \pm 0.014)\,k^{0.643\pm 0.017} }$$
(4)

and allows Diamonds to be used for a wide range of applications without the need to adjust \(f_{0}\) every time a new model or a different number of free parameters is involved in the analysis.
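
For convenience, the central values of the coefficients in Eq. (4) can be wrapped in a small helper function (the function name below is arbitrary and not part of Diamonds):

```python
def initial_enlargement_fraction(k):
    """Central value of the calibrated relation f0 = 0.267 * k**0.643, Eq. (4)."""
    return 0.267 * k ** 0.643

# Example: a background fit with ten free parameters (see Sect. 3).
print(round(initial_enlargement_fraction(10), 2))   # -> about 1.17
```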

Fig. 1
figure 1

Left panel: Three-dimensional ellipsoids containing two different clusters of sampling points in the parameter space. Right panel: The enlargement of an ellipsoid used to optimize the sampling efficiency throughout the nesting process

Fig. 2
figure 2

The initial enlargement fraction \(f_{0}\) as a function of the number of dimensions k involved in the inference problem. The 152 independent computations provided by Corsaro et al. (2015) used 4 clusters each to sample the parameter space. The size of the circles is proportional to the number of processes for which the same \(f_{0}\) was used. The colored band shows the 68.3% confidence region for the power-law fit (thick red line)

Diamonds includes a library of likelihood functions and prior PDFs that can be used for a wide range of applications. As for any inference problem, the code requires an input dataset, a model to be fit to the observations, and the adoption of a given likelihood function and of prior PDFs for each free parameter of the model. The termination condition that allows the code to finalize its computations is based on the remaining Bayesian evidence, as described by Keeton (2011) (see also Corsaro and De Ridder 2014 for additional details). Instructions on how to configure the code and a description of its different parts can be found in the online user guide.Footnote 2 In the following examples, Diamonds is set up in different ways depending on the specific inference problem that is considered.

3 Fitting the Background Signal

The first step in the asteroseismic analysis process is to estimate the background signal in the power spectrum of a star.Footnote 3 This is an important phase of the analysis because, if not properly performed, it can introduce significant systematics in the asteroseismic parameters that characterize the individual oscillation modes (Corsaro and De Ridder 2014). The first part of the tutorial is therefore focused on the estimation of the background signal in the red giant KIC 12008916, observed by NASA’s Kepler mission (Borucki et al. 2010; Koch et al. 2010) for more than 4 years. The dataset has been prepared following García et al. (2011, 2014), and is thus optimized for asteroseismic analysis.

In order to run the tutorial, one needs to have the Diamonds code installed on a local machine. This can be accomplished by following the instructions provided in the installation guide section of the code website.Footnote 4 Subsequently, it is required to download the code extension for background fitting,Footnote 5 which contains the specific fitting model, priors, and dataset to be used in the tutorial. The extension also contains a library of Python routines that can be used to plot the results obtained with Diamonds. We note that throughout this tutorial we adopt an exponential likelihood function, as appropriate for datasets deriving from the Fourier transform of a time series (Duvall and Harvey 1986; Corsaro and De Ridder 2014).
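
For reference, the exponential likelihood corresponds, in logarithmic form, to \(\ln \mathcal{L}(\boldsymbol{\theta }) = -\sum _{i}\left [\ln M_{i}(\boldsymbol{\theta }) + O_{i}/M_{i}(\boldsymbol{\theta })\right ]\), with \(O_{i}\) the observed PSD and \(M_{i}(\boldsymbol{\theta })\) the model prediction at the i-th frequency bin. A minimal Python sketch (array names are arbitrary) is:

```python
import numpy as np

def exponential_log_likelihood(observed_psd, model_psd):
    """Log-likelihood for a PSD whose bins follow an exponential distribution
    with expectation value given by the model (Duvall and Harvey 1986)."""
    observed_psd = np.asarray(observed_psd, dtype=float)
    model_psd = np.asarray(model_psd, dtype=float)
    return -np.sum(np.log(model_psd) + observed_psd / model_psd)
```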

The background model, considered as a function of the cyclic frequency ν in the power spectral density (PSD) of the star, reads

$$\displaystyle{ P_{\mathrm{bkg}}\left (\nu \right ) = W + R\left (\nu \right )\left [B\left (\nu \right ) + G\left (\nu \right )\right ]\,, }$$
(5)

where W is a flat noise level and \(R\left (\nu \right )\) is the response function that accounts for the finite sampling rate of the Kepler observations,

$$\displaystyle{ R\left (\nu \right ) = \text{sinc}^{2}\left ( \frac{\pi \nu } {2\nu _{\mathrm{Nyq}}}\right )\,, }$$
(6)

with \(\nu _{\mathrm{Nyq}} = 283.212\) μHz the Nyquist frequency in the case of long-cadence data (Jenkins et al. 2010). We fit three Harvey-like profiles (Harvey 1985) given by

$$\displaystyle{ B\left (\nu \right ) =\sum _{ i=1}^{3} \frac{\zeta a_{i}^{2}/b_{ i}} {1 + \left (\nu /b_{i}\right )^{4}}\,, }$$
(7)

with \(a_{i}\) the amplitude in ppm, \(b_{i}\) the characteristic frequency in μHz, and \(\zeta = 2\sqrt{2}/\pi\) the normalization constant (Kallinger et al. 2014). The power excess containing the oscillations is described as

$$\displaystyle{ G\left (\nu \right ) = H_{\mathrm{osc}}\exp \left [-\frac{\left (\nu -\nu _{\mathrm{max}}\right )^{2}} {2\sigma _{\mathrm{env}}^{2}} \right ] }$$
(8)

and is only considered when fitting the background model to the overall PSD of the star. The global model given by Eq. (5) therefore accounts for ten free parameters. The resulting fit obtained with Diamonds is shown in Fig. 3.
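
A direct transcription of Eqs. (5)–(8) into Python is sketched below; the function signature and parameter ordering are illustrative choices, not the interface of the Diamonds Background extension.

```python
import numpy as np

NU_NYQ = 283.212  # Nyquist frequency of Kepler long-cadence data (muHz)

def background_model(nu, white_noise, harvey_params, h_osc, nu_max, sigma_env):
    """Background model of Eq. (5): flat noise plus response-corrected sum of
    three Harvey-like profiles, Eq. (7), and a Gaussian envelope, Eq. (8).

    harvey_params: sequence of three (a_i, b_i) pairs, in ppm and muHz.
    """
    nu = np.asarray(nu, dtype=float)
    response = np.sinc(nu / (2.0 * NU_NYQ)) ** 2        # Eq. (6); np.sinc includes pi
    zeta = 2.0 * np.sqrt(2.0) / np.pi                   # normalization (Kallinger et al. 2014)
    harvey = sum(zeta * a**2 / b / (1.0 + (nu / b) ** 4) for a, b in harvey_params)
    gaussian = h_osc * np.exp(-((nu - nu_max) ** 2) / (2.0 * sigma_env**2))
    return white_noise + response * (harvey + gaussian)
```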

Fig. 3
figure 3

Background fit of the star KIC 12008916 by means of Diamonds. The original PSD is shown in gray. The red thick line represents the background model without the Gaussian envelope, while the cyan dotted line includes the additional Gaussian component. The individual components of the background model given by Eq. (5) are shown by blue dot-dashed lines

Questions & Problems:

  • For any of the estimated free parameters, which Bayesian parameter estimator should be preferred among the mode, the median and the mean? And why?

  • What is the value of \(\nu _{\mathrm{max}}\) for this star?

  • Could you guess the evolutionary stage of this red-giant star from its \(\nu _{\mathrm{max}}\) value?

  • Using your fitted \(\nu _{\mathrm{max}}\), and assuming \(\Delta \nu = 12.9\) μHz as the large frequency separation (Ulrich 1986), \(T_{\mathrm{eff}} = 5100\) K, and the solar reference values \(\nu _{\mathrm{max},\odot } = 3100\) μHz, \(\Delta \nu _{\odot } = 134.9\) μHz, and \(T_{\mathrm{eff},\odot } = 5777\) K, estimate the mass and radius of the star through scaling relations (a minimal sketch of these relations is given after this list).
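
The mass and radius asked for in the last problem follow from the standard asteroseismic scaling relations, \(M/M_{\odot } = (\nu _{\mathrm{max}}/\nu _{\mathrm{max},\odot })^{3}(\Delta \nu /\Delta \nu _{\odot })^{-4}(T_{\mathrm{eff}}/T_{\mathrm{eff},\odot })^{3/2}\) and \(R/R_{\odot } = (\nu _{\mathrm{max}}/\nu _{\mathrm{max},\odot })(\Delta \nu /\Delta \nu _{\odot })^{-2}(T_{\mathrm{eff}}/T_{\mathrm{eff},\odot })^{1/2}\). A minimal Python sketch of this computation (the fitted value of \(\nu _{\mathrm{max}}\) is deliberately left to the reader) is:

```python
def scaling_mass_radius(nu_max, delta_nu, t_eff,
                        nu_max_sun=3100.0, delta_nu_sun=134.9, t_eff_sun=5777.0):
    """Stellar mass and radius (solar units) from the standard scaling relations."""
    mass = ((nu_max / nu_max_sun) ** 3 * (delta_nu / delta_nu_sun) ** -4
            * (t_eff / t_eff_sun) ** 1.5)
    radius = ((nu_max / nu_max_sun) * (delta_nu / delta_nu_sun) ** -2
              * (t_eff / t_eff_sun) ** 0.5)
    return mass, radius

# Example call: replace nu_max with the value fitted in this tutorial.
# mass, radius = scaling_mass_radius(nu_max=..., delta_nu=12.9, t_eff=5100.0)
```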

4 Fitting the Oscillation Modes

The second part of the tutorial is devoted to the fitting of the oscillation modes. For this purpose it is necessary to download and install the extension of Diamonds dedicated to the peak-bagging analysis,Footnote 6 similarly to what was done for the background.

The model that is taken into account is the one presented by Corsaro et al. (2015) and includes a mixture of resolved and unresolved oscillation mode profiles. For resolved modes, i.e., modes with lifetimes much shorter than the total observing time, the typical profile is a Lorentzian expressed as

$$\displaystyle{ \mathcal{P}_{\mathrm{res},0}\left (\nu \right ) = \frac{A_{0}^{2}/\left (\pi \varGamma _{0}\right )} {1 + 4\left (\frac{\nu -\nu _{0}} {\varGamma _{0}} \right )^{2}}\,, }$$
(9)

where \(A_{0}\), \(\varGamma _{0}\), and \(\nu _{0}\) are the amplitude in ppm, the linewidth in μHz, and the centroid frequency in μHz, respectively, and represent the three free parameters to be estimated during the fitting process. For the unresolved modes, i.e., modes with a lifetime comparable to or even longer than the total observing time, we consider the profile

$$\displaystyle{ \mathcal{P}_{\mathrm{unres},0}\left (\nu \right ) = H_{0}\,\text{sinc}^{2}\left [\frac{\pi \left (\nu -\nu _{0}\right )} {\delta \nu _{\mathrm{bin}}} \right ]\,, }$$
(10)

where \(H_{0}\) and \(\nu _{0}\) are the height in PSD units and the centroid frequency in μHz of the oscillation peak, respectively, and must be estimated during the fitting process, while \(\delta \nu _{\mathrm{bin}}\) is fixed as the frequency resolution of the dataset, here corresponding to 0.008 μHz.

Following Corsaro and De Ridder (2014) and Corsaro et al. (2015), we fix the background parameters corresponding to the white noise, \(W = \overline{W}\), and to the Harvey-like profiles, \(B\left (\nu \right ) = \overline{B}\left (\nu \right )\), to the median values estimated in the tutorial of Sect. 3. The final peak-bagging model can then be represented as

$$\displaystyle{ P\left (\nu \right ) = \overline{W} + R\left (\nu \right )\left [\overline{B}\left (\nu \right ) + P_{\mathrm{osc}}\left (\nu \right )\right ]\,, }$$
(11)

where

$$\displaystyle{ P_{\mathrm{osc}}\left (\nu \right ) =\sum _{ i=1}^{N_{\mathrm{res}} }\mathcal{P}_{\mathrm{res},i}\left (\nu \right ) +\sum _{ j=1}^{N_{\mathrm{unres}} }\mathcal{P}_{\mathrm{unres},\,j}\left (\nu \right )\,, }$$
(12)

with \(N_{\mathrm{res}}\) and \(N_{\mathrm{unres}}\) the number of resolved and unresolved peaks to be fitted, respectively. Clearly, any inference problem that adopts this peak-bagging model involves a total of \(3N_{\mathrm{res}} + 2N_{\mathrm{unres}}\) free parameters. The result of the fit for KIC 12008916 obtained with Diamonds is shown in Fig. 4.
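
The oscillation term of Eq. (12), combining the Lorentzian and sinc² profiles of Eqs. (9)–(10), can be sketched in Python as follows (function and argument names are arbitrary; in the actual PeakBagging extension the model is implemented within the C++ code).

```python
import numpy as np

def oscillation_model(nu, resolved, unresolved, delta_nu_bin=0.008):
    """P_osc of Eq. (12).

    resolved:     sequence of (A0, Gamma0, nu0) triples for Lorentzians, Eq. (9)
    unresolved:   sequence of (H0, nu0) pairs for sinc^2 profiles, Eq. (10)
    delta_nu_bin: frequency resolution of the dataset (muHz)
    """
    nu = np.asarray(nu, dtype=float)
    p_osc = np.zeros_like(nu)
    for amplitude, linewidth, centroid in resolved:
        p_osc += (amplitude**2 / (np.pi * linewidth)
                  / (1.0 + 4.0 * ((nu - centroid) / linewidth) ** 2))
    for height, centroid in unresolved:
        # np.sinc(x) = sin(pi x) / (pi x), matching Eq. (10).
        p_osc += height * np.sinc((nu - centroid) / delta_nu_bin) ** 2
    return p_osc
```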

Fig. 4
figure 4

Peak-bagging fit of the star KIC 12008916 by means of Diamonds. The original PSD is shown in gray. The red thick line represents the estimated peak-bagging model [cf. Eq. (11)], while the blue dashed lines mark the background signal and a scaled (by a factor of eight) version of it

Questions & Problems:

  • In Fig. 4 spot the positions of the radial (\(\ell = 0\)), quadrupole (\(\ell = 2\)) and octupole (\(\ell = 3\)) modes, as follows from the asymptotic relation of the acoustic modes (Tassoul 1980).

  • Which oscillation modes are the most p-dominated mixed modes?

  • Compute the spacing (expressed in seconds) between the frequency \(\nu _{\ell =1,m=0} = 165.178\) μHz and another frequency that has to be computed as the average between the two frequency centroids of the unresolved profiles having the largest frequency (in the range 166–168 μHz). The frequency centroids of the unresolved profiles must be those from the fitting results obtained with Diamonds.

  • Compare the derived period spacing with the ΔP–Δν diagram shown in Fig. 8 of Corsaro et al. (2012) and determine the evolutionary stage of the star, assuming Δν = 12.9 μHz (a sketch of the frequency-to-period conversion is given after this list).
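
For the conversion required above, recall that a frequency of ν μHz corresponds to a period \(P = 10^{6}/\nu\) s. A minimal sketch (the second frequency is a placeholder to be replaced by the average centroid obtained from the fit) is:

```python
def period_spacing_seconds(nu_1, nu_2):
    """Period spacing |1/nu_2 - 1/nu_1| in seconds for frequencies given in muHz."""
    return abs(1.0e6 / nu_2 - 1.0e6 / nu_1)

# Example: nu_1 = 165.178 muHz; nu_2 = average centroid frequency of the two
# unresolved profiles fitted in the 166-168 muHz range (reader's own result).
# print(period_spacing_seconds(165.178, nu_2))
```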

5 Peak Significance Test

As shown by Corsaro and De Ridder (2014) and later applied by Corsaro et al. (2015) to red-giant stars, the Bayesian evidence makes it possible to perform a direct model comparison aimed at assessing the significance of a given oscillation peak. The final part of the tutorial with Diamonds involves the computation of the peak significance test for one oscillation mode fitted during the peak-bagging analysis. To achieve this, the peak-bagging presented in Sect. 4 has to be performed with two different models. Once a specific oscillation peak to be tested is selected, the competing models to be fitted to the PSD of the star are defined as follows: (1) the first model, \(\mathcal{M}_{1}\), must contain the entire set of oscillation peaks to be fitted, including the peak that we intend to test; (2) the second model, \(\mathcal{M}_{2}\), must contain the entire set of peaks to be fitted, except the peak that we intend to test. This implies that the parameters configuring the prior PDFs of the models \(\mathcal{M}_{1}\) and \(\mathcal{M}_{2}\) should be identical, except for those of the peak to test. Using the setup of the PeakBagging extension of Diamonds, this can easily be achieved by removing the prior parameters of the corresponding peak when fitting model \(\mathcal{M}_{2}\). The Bayesian evidence is provided among the outputs of Diamonds.Footnote 7 The statistically more likely model can then be identified by computing the Bayes’ factor (see Sect. 1) as \(\ln \mathcal{B}_{1,2} =\ln \mathcal{ E}_{1} -\ln \mathcal{ E}_{2}\). If, for example, \(\ln \mathcal{B}_{1,2}> 5\), according to Jeffreys’ scale of strength for the evidence (Jeffreys 1961; Trotta 2008), we conclude that the peak is significant and should be considered a real oscillation mode.
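
In practice, once the two runs are completed, the test reduces to a difference of the two log-evidences reported by Diamonds. A minimal sketch of this last step (the numerical values in the commented example are hypothetical) is:

```python
def peak_significance(ln_evidence_with, ln_evidence_without, threshold=5.0):
    """Natural log of the Bayes' factor between model M1 (with the tested peak)
    and M2 (without it), and a verdict based on the ln B > 5 criterion for
    strong evidence (Jeffreys 1961; Trotta 2008)."""
    ln_bayes_factor = ln_evidence_with - ln_evidence_without
    return ln_bayes_factor, ln_bayes_factor > threshold

# Example with hypothetical log-evidence values read from the Diamonds output:
# ln_B, significant = peak_significance(-4102.3, -4110.1)
```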

Questions & Problems:

  • Why are two different models needed to test the significance of an individual peak?

  • How many models are required to test the significance of two peaks?

  • Perform the peak significance test for the \(\ell = 3\) mode shown in Fig. 4 by means of Diamonds.

  • Provide the value of the natural logarithm of the Bayes’ factor for the aforementioned oscillation mode and assess the strength of the evidence according to Jeffreys’ scale.