Abstract
Since the advent of space-based photometric missions such as CoRoT and NASA’s Kepler, asteroseismology has acquired a central role in our understanding of stellar physics. The Kepler spacecraft, especially, is still releasing excellent photometric observations that contain a large amount of information not yet investigated. To exploit the full potential of these data, sophisticated and robust analysis tools are now essential, so that stellar structure and evolutionary models can be further constrained. In addition, extracting detailed asteroseismic properties for many stars can yield new insights into their correlations with fundamental stellar properties and dynamics. After a brief introduction to the Bayesian notion of probability, I describe the code Diamonds for Bayesian parameter estimation and model comparison by means of the nested sampling Monte Carlo (NSMC) algorithm. NSMC constitutes an efficient and powerful alternative to standard Markov chain Monte Carlo, well suited to the high-dimensional and multimodal problems that are typical of detailed asteroseismic analyses, such as the fitting and mode identification of individual oscillation modes in stars (known as peak-bagging). Diamonds is able to provide robust results for statistical inferences involving tens of individual oscillation modes, while at the same time preserving considerable computational efficiency in identifying the solution. In the tutorial, I present the fitting of the stellar background signal and the peak-bagging analysis of the oscillation modes in a red-giant star, providing an example of how to use the Bayesian evidence to assess the significance of the fitted oscillation peaks.
1 Bayesian Statistics
Let us consider a given physical problem, e.g., the fitting of an observational dataset by means of a predictive model. We term the dataset D and the fitting model \(\mathcal{M}_{k}\), the latter having k free parameters that we represent with the k-dimensional parameter vector \(\boldsymbol{\theta }= (\theta _{1},\theta _{2},\ldots,\theta _{k})\). The number of free parameters sets the dimensionality of the problem, to which a k-dimensional parameter space \(\varOmega _{\mathcal{M}_{k}}\) is associated, representing the space of the solutions. Our aim is to obtain optimal estimates of each free parameter and a corresponding statistical weight of the model \(\mathcal{M}_{k}\) that takes into account both the number of dimensions and the fit quality. This statistical inference can be properly addressed by means of Bayesian statistics (Jeffreys 1961; Sivia and Skilling 2006; Trotta 2008; Bolstad 2013; Corsaro et al. 2013; Corsaro and De Ridder 2014). In particular, the core of the statistical representation is given by Bayes’ theorem:
\[p(\boldsymbol{\theta }\mid D,\mathcal{M}_{k}) = \frac{\mathcal{L}(\boldsymbol{\theta }\mid D,\mathcal{M}_{k})\,\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})}{p(D\mid \mathcal{M}_{k})}, \tag{1}\]
where \(\mathcal{L}(\boldsymbol{\theta }\mid D,\mathcal{M}_{k})\) (hereafter, \(\mathcal{L}(\boldsymbol{\theta })\) for simplicity) is the likelihood function, which represents the way we sample the data, while \(\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})\) is the prior probability density function (PDF) that reflects our knowledge about the model parameters. The left-hand side of Eq. (1) is the posterior PDF, which has a key role in the parameter estimation problem. Through a marginalization of the posterior PDF, namely an integration over the uninteresting free parameters, we estimate the free parameters of the model. Among the different estimators for each parameter, in Bayesian statistics the median is usually preferred because it is the most resistant estimator, namely the least sensitive to possible outliers, and because it is invariant under a monotonic change of variable.
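The two properties of the median mentioned above can be checked numerically. The sketch below uses a handful of made-up posterior samples (illustrative values only, not taken from any fit) and compares the median and the mean before and after a logarithmic change of variable. An odd number of samples is used so that the median is one of the samples and the invariance holds exactly.

```python
import math
import statistics

# Hypothetical posterior samples of a positive parameter (illustrative values only);
# the last value plays the role of an outlier.
samples = [1.0, 2.0, 4.0, 8.0, 100.0]
log_samples = [math.log(s) for s in samples]

# Median computed directly, and computed in log space then mapped back.
median_direct = statistics.median(samples)
median_via_log = math.exp(statistics.median(log_samples))

# Same comparison for the mean.
mean_direct = statistics.mean(samples)
mean_via_log = math.exp(statistics.mean(log_samples))

print("median:", median_direct, median_via_log)
print("mean:  ", mean_direct, mean_via_log)
```

The two medians agree to machine precision, whereas the two means differ substantially and the direct mean is pulled toward the outlier.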
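The denominator on the right-hand side of Eq. (1) is instead a normalization factor, generally known as the Bayesian evidence (or marginal likelihood), which is defined as
\[\mathcal{E}\equiv p(D\mid \mathcal{M}_{k}) =\int _{\varOmega _{\mathcal{M}_{k}}}\mathcal{L}(\boldsymbol{\theta })\,\pi (\boldsymbol{\theta }\mid \mathcal{M}_{k})\,d\boldsymbol{\theta }. \tag{2}\]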
The Bayesian evidence is used as a statistical weight for model comparison because it embodies the principle of Occam’s razor, meaning that models are favored if they provide a better fit to the data but are penalized if their number of free parameters is larger than that of a competitor model. For our study, the model comparison is performed by computing the Bayes’ factor \(\mathcal{B}_{ij} =\mathcal{ E}_{i}/\mathcal{E}_{j}\) (see also Sect. 5), in which the model corresponding to the larger Bayesian evidence is statistically more likely (Jeffreys 1961; Trotta 2008; Corsaro et al. 2013; Corsaro and De Ridder 2014).
2 Nested Sampling Monte Carlo
Since Eq. (2) is a multi-dimensional integral, its evaluation quickly becomes intractable, both analytically and by standard numerical approximations, as the number of dimensions increases. To overcome this problem, the NSMC algorithm was developed (Skilling 2004). This algorithm allows for an efficient evaluation of the Bayesian evidence for any number of dimensions and provides the sampling of the posterior probability distribution (PPD) for parameter estimation as a straightforward byproduct. Detailed descriptions of the algorithm can be found in Skilling (2004), Sivia and Skilling (2006), Feroz and Hobson (2008), Feroz et al. (2009), and Corsaro and De Ridder (2014).
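In short, a prior mass X is defined such that
\[X(\mathcal{L}^{{\ast}}) =\int _{\mathcal{L}(\boldsymbol{\theta })>\mathcal{L}^{{\ast}}}\pi (\boldsymbol{\theta }\mid \mathcal{M})\,d\boldsymbol{\theta }, \tag{3}\]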
with \(\mathcal{L}^{{\ast}}\) being some fixed value of the likelihood function. As a consequence, 0 ≤ X ≤ 1 because \(\pi (\boldsymbol{\theta }\mid \mathcal{M})\) is a PDF. Equation (3) is therefore the fraction of volume under the prior PDF that is contained within the hard constraint \(\mathcal{L}(\boldsymbol{\theta })>\mathcal{ L}^{{\ast}}\). This means that the higher the constraining value \(\mathcal{L}^{{\ast}}\), the smaller the prior mass considered. This is equivalent to considering a portion of parameter space delimited by the iso-likelihood contour \(\mathcal{L}(\boldsymbol{\theta }) =\mathcal{ L}^{{\ast}}\), which also contains the maximum value \(\mathcal{L}_{\mathrm{max}}\).
In the NSMC, the sampling of the posterior PDF is performed by starting with a prior mass X = 1 (thus considering the entire parameter space, since the constraint \(\mathcal{L}^{{\ast}} = 0\) is satisfied everywhere) and an initial set of N live points that are distributed according to the prior, hence drawn from the prior PDF itself. At each iteration, the live point with the worst (lowest) likelihood value, \(\mathcal{L}^{{\ast}}\), is removed from the sample and replaced by a new point drawn from the prior PDF whose likelihood value satisfies the hard constraint \(\mathcal{L}>\mathcal{ L}^{{\ast}}\). A new iteration then starts, so that the prior mass shrinks at every step. At the end, the prior mass approaches X = 0 and the sampling terminates in a region that is located around the maximum (or the maxima) of the likelihood function.
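The iteration described above can be illustrated with a minimal nested sampling sketch. It is a toy example and not the Diamonds implementation. It assumes a one-dimensional unit Gaussian likelihood, a uniform prior on [-5, 5], plain rejection sampling from the prior in place of ellipsoidal sampling, the standard deterministic approximation \(X_{i} = e^{-i/N}\) for the prior mass (Skilling 2004), and a simplified stopping rule based on the remaining evidence. All names and numerical settings are illustrative.

```python
import math
import random

PRIOR_LO, PRIOR_HI = -5.0, 5.0  # uniform prior bounds (toy choice)

def log_likelihood(theta):
    """Unit-variance Gaussian likelihood centred on zero (toy choice)."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)

def nested_sampling(n_live=100, tolerance=1e-3, seed=1):
    """Return (ln evidence, number of nested iterations) for the toy problem."""
    rng = random.Random(seed)
    live = [rng.uniform(PRIOR_LO, PRIOR_HI) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    evidence = 0.0
    x_prev = 1.0  # prior mass: the whole parameter space at the start
    n_iter = 0
    # Stop once the evidence still enclosed, bounded by L_max * X, is negligible.
    while math.exp(max(live_logl)) * x_prev >= tolerance * evidence:
        n_iter += 1
        worst = min(range(n_live), key=lambda j: live_logl[j])
        logl_star = live_logl[worst]
        x_new = math.exp(-n_iter / n_live)  # expected shrinkage of the prior mass
        evidence += math.exp(logl_star) * (x_prev - x_new)
        x_prev = x_new
        # Replace the worst point by a prior draw obeying L > L* (plain rejection).
        while True:
            theta = rng.uniform(PRIOR_LO, PRIOR_HI)
            logl = log_likelihood(theta)
            if logl > logl_star:
                break
        live[worst], live_logl[worst] = theta, logl
    # Add the contribution of the final set of live points.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return math.log(evidence), n_iter

ln_evidence, n_iterations = nested_sampling()
# Analytic evidence for this toy problem: Gaussian integral over the prior range
# divided by the prior width.
analytic = math.log(math.erf(5.0 / math.sqrt(2.0)) / (PRIOR_HI - PRIOR_LO))
print(ln_evidence, analytic, n_iterations)
```

Rejection sampling becomes inefficient as the prior mass shrinks, because the acceptance rate is of order X. This is the difficulty that dedicated drawing schemes, such as the ellipsoidal sampling described in Sect. 2.1, are designed to address.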
2.1 The Diamonds Code
The high-DImensional And multi-MOdal NesteD Sampling (Diamonds) code (see Note 1) is a C++11 software package for Bayesian parameter estimation and model comparison that uses a version of the NSMC algorithm. A major difficulty in implementing the NSMC algorithm is drawing new points from the prior PDF that satisfy the hard constraint on the likelihood value. Building on the developments made for other existing codes that implement NSMC (see, e.g., Shaw et al. 2007; Feroz and Hobson 2008; Feroz et al. 2009), Diamonds overcomes this problem by adopting a simultaneous ellipsoidal sampling algorithm (Corsaro and De Ridder 2014). This means that the posterior PDF is actually sampled by means of multi-dimensional ellipsoids, which decompose the parameter space \(\varOmega _{\mathcal{M}_{k}}\) into small hyper-volumes, as shown in the left panel of Fig. 1. New points can then be drawn easily from within each ellipsoid, whose volume is reduced as the nested iterations proceed toward a termination condition. In particular, one crucial parameter controlling the behavior of the ellipsoids is the initial enlargement fraction, \(f_{0}\), which is used to enlarge their axes along each direction, for as many dimensions as there are free parameters. This parameter, whose effect is depicted in the right panel of Fig. 1, tunes the efficiency of the sampling throughout the nested iterations and therefore requires a careful calibration, which I show in Fig. 2 as a function of the number of dimensions, k. A calibrated relation, already implemented in Diamonds, reads
and allows Diamonds to be used for a wide range of applications without the need to adjust the parameter \(f_{0}\) every time a new model or a different number of parameters is involved in the analysis.
Diamonds includes a library of likelihood functions and prior PDFs that can be used for a wide range of applications. As for any inference problem, the code requires an input dataset, a model to be fit to the observations, and the adoption of a given likelihood function and of prior PDFs for each free parameter of the model. The termination condition that allows the code to finalize its computations is based on the remaining Bayesian evidence, as described by Keeton (2011) (see also Corsaro and De Ridder 2014 for additional details). Instructions on how to configure the code and a description of its different parts can be found in the online user guide (see Note 2). In the following examples, Diamonds is set up in different ways depending on the specific inference problem that is considered.
3 Fitting the Background Signal
The first step in the asteroseismic analysis process is to estimate the background signal in the power spectrum of a star (see Note 3). This is an important phase of the analysis because, if not properly performed, it can introduce significant systematics in the asteroseismic parameters that characterize individual oscillation modes (Corsaro and De Ridder 2014). The first part of the tutorial is therefore focused on the estimation of the background signal in the red giant KIC 12008916, observed by NASA’s Kepler mission (Borucki et al. 2010; Koch et al. 2010) for more than 4 years. The dataset has been prepared following García et al. (2011, 2014) and is thus optimized for asteroseismic analysis.
In order to run the tutorial, one needs to have the Diamonds code already installed on a local machine. This can be accomplished by following the instructions provided in the installation guide section of the code website (see Note 4). It is then required to download the code extension for background fitting (see Note 5), containing the specific fitting model, priors, and dataset to be used in the tutorial. The extension contains a library of Python routines that can be used to plot the results obtained with Diamonds. We note that throughout this tutorial we will adopt an exponential likelihood function, as appropriate for datasets deriving from a Fourier transform of a time series (Duvall and Harvey 1986; Corsaro and De Ridder 2014).
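For reference, the exponential likelihood for a power spectrum with \(\chi ^{2}\) statistics with two degrees of freedom has the standard log form \(\ln \mathcal{L} = -\sum _{i}\left [\ln M(\nu _{i}) + D_{i}/M(\nu _{i})\right ]\), with \(D_{i}\) the observed PSD and \(M(\nu _{i})\) the model prediction in frequency bin i. The Python sketch below implements this expression as an illustration only. The function name and the input values are hypothetical and do not correspond to the Diamonds API.

```python
import math

def exponential_log_likelihood(observed_psd, model_psd):
    """ln L = -sum_i [ ln M_i + D_i / M_i ] for chi-squared (2 d.o.f.) statistics."""
    return -sum(math.log(m) + d / m for d, m in zip(observed_psd, model_psd))

# Illustrative values only: three PSD bins and three candidate models.
observed = [2.0, 0.5, 1.5]
matching_model = list(observed)                # model equal to the data in every bin
inflated_model = [2.0 * d for d in observed]   # model a factor of two too high
deflated_model = [0.5 * d for d in observed]   # model a factor of two too low

ll_match = exponential_log_likelihood(observed, matching_model)
ll_high = exponential_log_likelihood(observed, inflated_model)
ll_low = exponential_log_likelihood(observed, deflated_model)
print(ll_match, ll_high, ll_low)
```

In each bin the term \(\ln M + D/M\) is minimized at \(M = D\), so the model that matches the data bin by bin has the largest likelihood of the three.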
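The background model, considered as a function of the cyclic frequency in the PSD of the star, reads
\[P_{\mathrm{bkg}}\left (\nu \right ) = W + R\left (\nu \right )\left [B\left (\nu \right ) + G\left (\nu \right )\right ], \tag{5}\]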
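where W is a flat noise level and \(R\left (\nu \right )\) the response function that considers the sampling rate of the observations for Kepler data,
\[R\left (\nu \right ) = \mathrm{sinc}^{2}\left (\frac{\pi \nu }{2\nu _{\mathrm{Nyq}}}\right ), \tag{6}\]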
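with \(\nu _{\mathrm{Nyq}} = 283.212\) μHz the Nyquist frequency in the case of long-cadence data (Jenkins et al. 2010). We fit three Harvey-like profiles (Harvey 1985) given by
\[B\left (\nu \right ) =\sum _{i=1}^{3} \frac{\zeta a_{i}^{2}/b_{i}}{1 + \left (\nu /b_{i}\right )^{4}}, \tag{7}\]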
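with \(a_{i}\) the amplitude in ppm, \(b_{i}\) the characteristic frequency in μHz, and \(\zeta = 2\sqrt{2}/\pi\) the normalization constant (Kallinger et al. 2014). The power excess containing the oscillations is described by a Gaussian envelope of height \(H_{\mathrm{osc}}\), centered at the frequency of maximum power \(\nu _{\mathrm{max}}\), with width \(\sigma _{\mathrm{env}}\),
\[G\left (\nu \right ) = H_{\mathrm{osc}}\exp \left [-\frac{\left (\nu -\nu _{\mathrm{max}}\right )^{2}}{2\sigma _{\mathrm{env}}^{2}}\right ], \tag{8}\]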
and is only considered when fitting the background model to the overall PSD of the star. The global model given by Eq. (5) therefore accounts for ten free parameters. The resulting fit obtained with Diamonds is shown in Fig. 3.
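The structure of the ten-parameter background model can be sketched in Python as follows. This is an illustration and not the Diamonds background extension. It assumes the commonly used forms of the components, namely a flat noise level plus a \(\mathrm{sinc}^{2}\) response function multiplying the sum of three Harvey-like profiles and a Gaussian envelope. All function names and parameter values are hypothetical placeholders, not fitted results.

```python
import math

NU_NYQ = 283.212                           # Nyquist frequency for long-cadence data, muHz
ZETA = 2.0 * math.sqrt(2.0) / math.pi      # normalization constant of the Harvey-like profiles

def response(nu):
    """sinc^2 response function accounting for the sampling of the observations."""
    x = math.pi * nu / (2.0 * NU_NYQ)
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def harvey(nu, amplitude, char_freq):
    """Single Harvey-like profile with amplitude in ppm and characteristic frequency in muHz."""
    return ZETA * amplitude ** 2 / char_freq / (1.0 + (nu / char_freq) ** 4)

def envelope(nu, height, nu_max, sigma_env):
    """Gaussian envelope describing the power excess of the oscillations."""
    return height * math.exp(-((nu - nu_max) ** 2) / (2.0 * sigma_env ** 2))

def background(nu, white_noise, harvey_params, height, nu_max, sigma_env):
    """Flat noise + response * (three Harvey-like profiles + Gaussian envelope)."""
    granulation = sum(harvey(nu, a, b) for a, b in harvey_params)
    return white_noise + response(nu) * (granulation + envelope(nu, height, nu_max, sigma_env))

# Hypothetical parameter values, for illustration only: 1 + 3*2 + 3 = 10 free parameters.
w = 7.0
harvey_params = [(80.0, 10.0), (90.0, 45.0), (85.0, 150.0)]
h_osc, nu_max, sigma_env = 200.0, 160.0, 15.0

for nu in (1.0, 50.0, 160.0, 283.0):
    print(nu, background(nu, w, harvey_params, h_osc, nu_max, sigma_env))
```

Only the flat noise term is left unattenuated by the response function, which is why it sits outside the brackets.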
Questions & Problems:
- For any of the estimated free parameters, which Bayesian parameter estimator should be preferred among the mode, the median and the mean? And why?
- What is the value of \(\nu _{\mathrm{max}}\) for this star?
- Could you guess what the evolutionary stage of this red-giant star is from its \(\nu _{\mathrm{max}}\) value?
- Using your fitted \(\nu _{\mathrm{max}}\), and assuming \(\Delta \nu = 12.9\) μHz as the large frequency separation (Ulrich 1986), \(T_{\mathrm{eff}} = 5100\) K, and solar reference values \(\nu _{\mathrm{max},\odot } = 3100\) μHz, \(\Delta \nu _{\odot } = 134.9\) μHz, and \(T_{\mathrm{eff},\odot } = 5777\) K, estimate the mass and radius of the star through scaling relations.
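As an aid for the last problem, the standard asteroseismic scaling relations, \(M/M_{\odot } = (\nu _{\mathrm{max}}/\nu _{\mathrm{max},\odot })^{3}(\Delta \nu /\Delta \nu _{\odot })^{-4}(T_{\mathrm{eff}}/T_{\mathrm{eff},\odot })^{3/2}\) and \(R/R_{\odot } = (\nu _{\mathrm{max}}/\nu _{\mathrm{max},\odot })(\Delta \nu /\Delta \nu _{\odot })^{-2}(T_{\mathrm{eff}}/T_{\mathrm{eff},\odot })^{1/2}\), can be evaluated with a few lines of Python. The value of \(\nu _{\mathrm{max}}\) used below is a placeholder and not a fitted result. It should be replaced with the median obtained from the background fit.

```python
# Solar reference values quoted in the problem above.
NU_MAX_SUN = 3100.0   # muHz
DELTA_NU_SUN = 134.9  # muHz
TEFF_SUN = 5777.0     # K

def scaling_mass(nu_max, delta_nu, teff):
    """Stellar mass in solar units from the standard scaling relations."""
    return ((nu_max / NU_MAX_SUN) ** 3
            * (delta_nu / DELTA_NU_SUN) ** -4
            * (teff / TEFF_SUN) ** 1.5)

def scaling_radius(nu_max, delta_nu, teff):
    """Stellar radius in solar units from the standard scaling relations."""
    return ((nu_max / NU_MAX_SUN)
            * (delta_nu / DELTA_NU_SUN) ** -2
            * (teff / TEFF_SUN) ** 0.5)

# Placeholder nu_max (NOT a fitted value): replace with your own estimate.
nu_max_placeholder = 160.0
mass = scaling_mass(nu_max_placeholder, 12.9, 5100.0)
radius = scaling_radius(nu_max_placeholder, 12.9, 5100.0)
print(f"M = {mass:.2f} Msun, R = {radius:.2f} Rsun")
```

Because the mass depends on \(\nu _{\mathrm{max}}^{3}\) and \(\Delta \nu ^{-4}\), small uncertainties on the seismic parameters propagate strongly into the mass estimate.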
4 Fitting the Oscillation Modes
The second part of the tutorial is related to the fitting of the oscillation modes. For this purpose it is necessary to download and install the extension of Diamonds related to the peak-bagging analysis (see Note 6), similarly to what was done for the background.
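The model that is taken into account is the one presented by Corsaro et al. (2015) and includes a mixture of resolved and unresolved oscillation mode profiles. For resolved modes, i.e., modes with lifetimes much shorter than the total observing time, the typical profile is a Lorentzian expressed as
\[\mathcal{P}_{\mathrm{res},0}\left (\nu \right ) = \frac{A_{0}^{2}/\left (\pi \varGamma _{0}\right )}{1 + 4\left (\frac{\nu -\nu _{0}}{\varGamma _{0}}\right )^{2}}, \tag{9}\]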
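where \(A_{0}\), \(\varGamma _{0}\), and \(\nu _{0}\) are the amplitude in ppm, the linewidth in μHz, and the centroid frequency in μHz, respectively, and represent the three free parameters to be estimated during the fitting process. For the unresolved modes, i.e., modes with a lifetime comparable to or even longer than the total observing time, we consider the profile
\[\mathcal{P}_{\mathrm{unres},0}\left (\nu \right ) = H_{0}\,\mathrm{sinc}^{2}\left [\frac{\pi \left (\nu -\nu _{0}\right )}{\delta \nu _{\mathrm{bin}}}\right ], \tag{10}\]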
where \(H_{0}\) and \(\nu _{0}\) are the height in PSD units and the centroid frequency in μHz of the oscillation peak, respectively, and must be estimated during the fitting process, while \(\delta \nu _{\mathrm{bin}}\) is fixed as the frequency resolution of the dataset, here corresponding to 0.008 μHz.
Following Corsaro and De Ridder (2014) and Corsaro et al. (2015), we fix the background parameters corresponding to the white noise, \(W = \overline{W}\), and the Harvey-like profiles, \(B\left (\nu \right ) = \overline{B}\left (\nu \right )\), to the median values estimated in the tutorial in Sect. 3. Then, the final peak-bagging model can be represented as
\[P\left (\nu \right ) = \overline{W} + R\left (\nu \right )\left [\overline{B}\left (\nu \right ) + P_{\mathrm{osc}}\left (\nu \right )\right ], \tag{11}\]
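where
\[P_{\mathrm{osc}}\left (\nu \right ) =\sum _{i=1}^{N_{\mathrm{res}}}\mathcal{P}_{\mathrm{res},i}\left (\nu \right ) +\sum _{j=1}^{N_{\mathrm{unres}}}\mathcal{P}_{\mathrm{unres},j}\left (\nu \right ), \tag{12}\]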
with \(N_{\mathrm{res}}\) and \(N_{\mathrm{unres}}\) the number of resolved and unresolved peaks to be fitted, respectively. Clearly, any inference problem that takes into account this peak-bagging model will involve a total of \(3N_{\mathrm{res}} + 2N_{\mathrm{unres}}\) free parameters. The result of the fit for KIC 12008916 done with Diamonds is shown in Fig. 4.
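The two peak profiles and the resulting parameter count can be sketched in Python as follows. This is an illustration and not the PeakBagging extension of Diamonds. It assumes the Lorentzian form \(A_{0}^{2}/(\pi \varGamma _{0})/[1 + 4(\nu -\nu _{0})^{2}/\varGamma _{0}^{2}]\) for resolved modes and the form \(H_{0}\,\mathrm{sinc}^{2}[\pi (\nu -\nu _{0})/\delta \nu _{\mathrm{bin}}]\) for unresolved modes. Function names and the peak parameters are hypothetical.

```python
import math

DELTA_NU_BIN = 0.008  # frequency resolution of the dataset, muHz

def resolved_profile(nu, amplitude, linewidth, nu0):
    """Lorentzian profile of a resolved mode (3 free parameters)."""
    height = amplitude ** 2 / (math.pi * linewidth)
    return height / (1.0 + 4.0 * ((nu - nu0) / linewidth) ** 2)

def unresolved_profile(nu, height, nu0, delta_nu_bin=DELTA_NU_BIN):
    """sinc^2 profile of an unresolved mode (2 free parameters)."""
    x = math.pi * (nu - nu0) / delta_nu_bin
    return height if x == 0.0 else height * (math.sin(x) / x) ** 2

def oscillation_model(nu, resolved_peaks, unresolved_peaks):
    """Sum of all resolved and unresolved peak profiles at frequency nu."""
    total = sum(resolved_profile(nu, a, g, f) for a, g, f in resolved_peaks)
    total += sum(unresolved_profile(nu, h, f) for h, f in unresolved_peaks)
    return total

def n_free_parameters(n_resolved, n_unresolved):
    """Dimensionality of the peak-bagging problem: 3 N_res + 2 N_unres."""
    return 3 * n_resolved + 2 * n_unresolved

# Hypothetical peaks, for illustration only.
resolved_peaks = [(30.0, 0.15, 160.0), (25.0, 0.12, 158.3)]         # (A, Gamma, nu0)
unresolved_peaks = [(5000.0, 165.5), (3000.0, 166.7), (2500.0, 167.4)]  # (H, nu0)

print(oscillation_model(160.0, resolved_peaks, unresolved_peaks))
print(n_free_parameters(len(resolved_peaks), len(unresolved_peaks)))
```

The Lorentzian drops to half of its central height at \(\nu _{0} \pm \varGamma _{0}/2\), so \(\varGamma _{0}\) is the full width at half maximum of the peak.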
Questions & Problems:
- In Fig. 4 spot the positions of the radial (\(\ell = 0\)), quadrupole (\(\ell = 2\)) and octupole (\(\ell = 3\)) modes, as follows from the asymptotic relation of the acoustic modes (Tassoul 1980).
- Which oscillation modes are the most p-dominated mixed modes?
- Compute the spacing (expressed in seconds) between the frequency \(\nu _{\ell =1,m=0} = 165.178\) μHz and another frequency that has to be computed as the average between the two frequency centroids of the unresolved profiles having the largest frequency (in the range 166–168 μHz). The frequency centroids of the unresolved profiles must be those from the fitting results obtained with Diamonds.
- Compare the derived period spacing in the \(\Delta P\)–\(\Delta \nu\) diagram shown in Fig. 8 of Corsaro et al. (2012) and determine the evolutionary stage of the star assuming \(\Delta \nu = 12.9\) μHz.
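For the period spacing problem, the unit conversion is the only subtle step: a frequency \(\nu\) in μHz corresponds to a period \(P = 10^{6}/\nu\) in seconds. The short sketch below shows the arithmetic. The two centroid values are placeholders and not Diamonds results, so they must be replaced with the fitted centroids.

```python
def period_seconds(nu_microhertz):
    """Convert a frequency in muHz to a period in seconds."""
    return 1.0e6 / nu_microhertz

def period_spacing(nu_low, nu_high):
    """Period difference in seconds between two frequencies given in muHz."""
    return period_seconds(nu_low) - period_seconds(nu_high)

# Placeholder centroids of the two unresolved peaks (NOT fitted values).
centroid_a, centroid_b = 166.5, 167.5
nu_average = 0.5 * (centroid_a + centroid_b)

delta_p = period_spacing(165.178, nu_average)
print(f"Period spacing: {delta_p:.1f} s")
```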
5 Peak Significance Test
As shown by Corsaro and De Ridder (2014) and later applied by Corsaro et al. (2015) to red-giant stars, by means of the Bayesian evidence it is possible to perform a direct model comparison aimed at assessing the significance of a given oscillation peak. The final part of the tutorial with Diamonds involves the computation of the peak significance test for one oscillation mode fitted during the peak-bagging analysis. In order to achieve this result, the peak-bagging presented in Sect. 4 must be performed with two different models. Having selected a specific oscillation peak that we want to test, the competing models to be fitted to the PSD of the star are defined as follows: (1) the first model, \(\mathcal{M}_{1}\), must contain the entire set of oscillation peaks to be fitted, including the peak that we intend to test; (2) the second model, \(\mathcal{M}_{2}\), must contain the entire set of peaks to be fitted, except the peak that we intend to test. This implies that the parameters that configure the prior PDFs of the models \(\mathcal{M}_{1}\) and \(\mathcal{M}_{2}\) should be identical, except for the peak to test. Using the setup of the PeakBagging extension of Diamonds, this can easily be achieved by removing the prior parameters of the corresponding peak when fitting model \(\mathcal{M}_{2}\). Among the outputs of Diamonds, there will be the Bayesian evidence (see Note 7). The best, i.e., statistically more likely, model can be identified by computing the Bayes’ factor (see Sect. 1) as \(\ln \mathcal{B}_{1,2} =\ln \mathcal{ E}_{1} -\ln \mathcal{ E}_{2}\). If, for example, \(\ln \mathcal{B}_{1,2}> 5\), according to Jeffreys’ scale of strength for the evidence (Jeffreys 1961; Trotta 2008) we conclude that the peak is significant and that it should be considered as a real oscillation mode.
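The comparison itself reduces to a difference of two log-evidences, as in the Python sketch below. The log-evidence values are hypothetical and stand in for the outputs of two Diamonds runs. The intermediate thresholds (1 and 2.5) follow the empirical version of Jeffreys' scale commonly quoted from Trotta (2008) and are stated here as an assumption, since only the threshold of 5 is mentioned in the text above.

```python
def ln_bayes_factor(ln_evidence_1, ln_evidence_2):
    """ln B_12 = ln E_1 - ln E_2 for two competing models."""
    return ln_evidence_1 - ln_evidence_2

def jeffreys_strength(ln_b):
    """Qualitative strength of the evidence (assumed thresholds: 1, 2.5, 5)."""
    magnitude = abs(ln_b)
    if magnitude < 1.0:
        return "inconclusive"
    if magnitude < 2.5:
        return "weak"
    if magnitude <= 5.0:
        return "moderate"
    return "strong"

# Hypothetical log-evidences (NOT Diamonds outputs): M1 includes the tested peak,
# M2 does not.
ln_e1, ln_e2 = -1000.0, -1010.0
ln_b12 = ln_bayes_factor(ln_e1, ln_e2)
print(ln_b12, jeffreys_strength(ln_b12))
```

A positive \(\ln \mathcal{B}_{1,2}\) favors the model that includes the peak, while a negative value favors the model without it. The same scale applies to the absolute value in both cases.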
Questions & Problems:
- Why are two different models needed to test the significance of an individual peak?
- How many models are required to test the significance of two peaks?
- Perform the peak significance test for the \(\ell = 3\) mode shown in Fig. 4 by means of Diamonds.
- Provide the value of the natural logarithm of the Bayes’ factor for the aforementioned oscillation mode and assess the strength of the evidence according to Jeffreys’ scale.
Notes
1. DIAMONDS is publicly available at https://fys.kuleuven.be/ster/Software/Diamonds/ or through its public GitHub repository at https://github.com/EnricoCorsaro/DIAMONDS.
2. A comprehensive user guide to DIAMONDS can be found at https://fys.kuleuven.be/ster/Software/Diamonds/DIAMONDS_UserGuide.
3. The power spectrum is usually converted into a power spectral density, PSD, to allow for direct comparisons independently of the observing length of the data. Its units are expressed in ppm\(^{2}\) μHz\(^{-1}\).
4. The installation guide of DIAMONDS can be found at https://fys.kuleuven.be/ster/Software/Diamonds/installation-guide.
5. The Background extension of DIAMONDS can be downloaded from https://fys.kuleuven.be/ster/Software/Diamonds/package/AzoresSC16_background_extension.tar.gz. Further information on how to run the tutorial can be found at http://www.iastro.pt/research/conferences/faial2016/files/presentations/TA1.pdf.
6. The PeakBagging extension of DIAMONDS can be downloaded from https://fys.kuleuven.be/ster/Software/Diamonds/package/AzoresSC16_peakbagging_extension.tar.gz. The extension contains a library of Python routines that can be used to plot the results obtained with DIAMONDS. Further information on how to run the tutorial can be found at http://www.iastro.pt/research/conferences/faial2016/files/presentations/TA1.pdf.
7. More details can be found at http://www.iastro.pt/research/conferences/faial2016/files/presentations/TA1.pdf.
References
Bolstad, W.: Introduction to Bayesian Statistics. Wiley, New York (2013)
Borucki, W.J., Koch, D., Basri, G., et al.: Science 327, 977 (2010)
Corsaro, E., De Ridder, J.: Astron. Astrophys. 571, A71 (2014)
Corsaro, E., Stello, D., Huber, D., Bedding, T.R., Bonanno, A., Brogaard, K., Kallinger, T., Benomar, O., White, T.R., Mosser, B., Basu, S., Chaplin, W.J., Christensen-Dalsgaard, J., Elsworth, Y.P., García, R.A., Hekker, S., Kjeldsen, H., Mathur, S., Meibom, S., Hall, J.R., Ibrahim, K.A., Klaus, T.C.: Astrophys. J. 757, 190 (2012). doi:10.1088/0004-637X/757/2/190
Corsaro, E., Fröhlich, H.-E., Bonanno, A., et al.: Mon. Not. R. Astron. Soc. 430, 2313 (2013)
Corsaro, E., De Ridder, J., García, R.A.: Astron. Astrophys. 579, A83 (2015)
Duvall Jr., T.L., Harvey, J.W.: In: Gough, D.O. (ed.) NATO ASIC Proceedings 169: Seismology of the Sun and the Distant Stars, pp. 105–116 (1986)
Feroz, F., Hobson, M.P.: Mon. Not. R. Astron. Soc. 384, 449 (2008)
Feroz, F., Hobson, M.P., Bridges, M.: Mon. Not. R. Astron. Soc. 398, 1601 (2009)
García, R.A., Hekker, S., Stello, D., et al.: Mon. Not. R. Astron. Soc. 414, L6 (2011)
García, R.A., Mathur, S., Pires, S., et al.: Astron. Astrophys. 568, A10 (2014)
Harvey, J.: In: Rolfe, E., Battrick, B. (eds.) Future Missions in Solar, Heliospheric & Space Plasma Physics. ESA Special Publication, vol. 235, pp. 199–208. European Space Agency, Paris (1985)
Jeffreys, H.: Theory of Probability, 3rd edn. Oxford University Press, Oxford (1961)
Jenkins, J.M., Caldwell, D.A., Chandrasekaran, H., et al.: Astrophys. J. 713, L120 (2010)
Kallinger, T., De Ridder, J., Hekker, S., et al.: Astron. Astrophys. 570, A41 (2014)
Keeton, C.R.: Mon. Not. R. Astron. Soc. 414, 1418 (2011)
Koch, D.G., Borucki, W.J., Basri, G., et al.: Astrophys. J. 713, L79 (2010)
Shaw, J.R., Bridges, M., Hobson, M.P.: Mon. Not. R. Astron. Soc. 378, 1365 (2007)
Sivia, D., Skilling, J.: Data Analysis: A Bayesian Tutorial. Oxford Science Publications. Oxford University Press, Oxford (2006)
Skilling, J.: AIP Conf. Proc. 735, 395 (2004)
Tassoul, M.: Astrophys. J. Suppl. Ser. 43, 469 (1980)
Trotta, R.: Contemp. Phys. 49, 71 (2008)
Ulrich, R.K.: Astrophys. J. 306, L37 (1986)
Acknowledgements
This work has been funded by the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 312844 (SPACEINN).
© 2018 Springer International Publishing AG
Cite this paper
Corsaro, E. (2018). Tutorial: Asteroseismic Data Analysis with DIAMONDS. In: Campante, T., Santos, N., Monteiro, M. (eds) Asteroseismology and Exoplanets: Listening to the Stars and Searching for New Worlds. Astrophysics and Space Science Proceedings, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-319-59315-9_7
DOI: https://doi.org/10.1007/978-3-319-59315-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59314-2
Online ISBN: 978-3-319-59315-9
eBook Packages: Physics and Astronomy (R0)