
1 Introduction

Our standard model of cosmology, ΛCDM, is one in which the universe is made up primarily of dark energy, contains a large amount of dark matter, and holds only a small fraction of ordinary matter; it is currently undergoing a period of accelerating expansion. This model appears to describe the contents and evolution of the universe very well, and has been constrained through the analysis of several astrophysical objects and phenomena. One additional dataset with the potential to provide complementary information is the mass structure of clusters of galaxies [1, 2, 22, 24]. These objects contain hundreds to thousands of galaxies, as well as hot gas and dark matter, and they gravitationally lens and distort the images of more distant galaxies.

Strong lensing efficiencies, as characterised by the effective Einstein radii (denoted θ_E), scale well with the mass of clusters at large over-densities [14]. If the clusters in a given sample are, in fact, stronger lenses than predicted by the ΛCDM model, they will have larger θ_E for a given total mass at low over-densities (e.g., M_500). The earliest studies of such galaxy-cluster properties revealed a significant difference between the observations and ΛCDM predictions [1, 15]. Thus began the hunt for solutions to this so-called tension with ΛCDM cosmology.

Previous works in the literature have claimed either ‘tension’ or ‘consistency’ with ΛCDM, or insufficient data [6, 11, 18, 21, 23, 27], but such analyses do not allow one to compare competing cosmological hypotheses. In the present work, we propose a Bayesian approach to this cosmological test, using the strong lensing properties of galaxy clusters.

2 The Bayesian Framework

A Bayesian approach is advocated, in which one determines the relative preference of two hypothetical cosmological models, C_1 and C_2, in light of the data D, by calculating the Bayes factor B:

$$ B = \frac{\mathcal{L}(D \vert C_{1})}{\mathcal{L}(D \vert C_{2})} $$
(10.1)

where \(\mathcal{L}\) denotes the likelihood of the data assuming a cosmology.
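As a minimal illustration (in Python; the numerical values are placeholders, and the function name is ours), the Bayes factor of Eq. (10.1) is a simple ratio of the two likelihoods:

```python
# Minimal sketch of Eq. (10.1). The likelihood values are placeholders; in
# practice each would come from the Weighted ABC procedure of Sect. 10.2.

def bayes_factor(likelihood_c1: float, likelihood_c2: float) -> float:
    """B = L(D | C1) / L(D | C2)."""
    return likelihood_c1 / likelihood_c2

# Hypothetical example: L(D | LCDM) = 0.3 (cf. Sect. 10.3) against a rival
# model with L(D | C2) = 0.01 would give B = 30 in favour of LCDM.
print(bayes_factor(0.3, 0.01))  # ≈ 30
```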

The aim, then, is to calculate, under one chosen hypothesis (ΛCDM), the likelihood of observing the structural properties of a particular sample of galaxy clusters. This sample is detected using well-defined selection criteria, and all relevant properties have been measured [5, 7, 13, 16, 17, 26, 27].

2.1 Weighted ABC

Achieving the aforementioned goal is non-trivial, because a likelihood function for the original observables (θ_E and M_500) is intractable. This is because: (a) computer simulations are deemed necessary to describe the irregular structure of galaxy clusters, which undergo non-linear structure formation; and (b) only a finite (and relatively small) number of clusters can be simulated in a reasonable amount of time, so the full θ_E–M_500 space cannot be sampled. This problem is therefore an ideal case for a variant of approximate Bayesian computation (ABC) [4, 25]. What we propose is not a likelihood-free approach, however: rather than rejecting mock samples that are dissimilar to the real data, we down-weight them. We therefore refer to the novel approach described below as Weighted ABC.

We assume a power-law relation between the strong lensing and mass proxies, and fit the following function in logarithmic space (Footnote 1):

$$ \log \left[ \frac{M_{500}}{9 \times 10^{14}\,M_{\odot}} \right] = \alpha \log \left[ \frac{\theta_{E}}{20''} \sqrt{\frac{D_{d}}{D_{ds}}} \right] + \beta $$
(10.2)

with parameters α and β, where D_d and D_ds denote the angular-diameter distances to the lens and from the lens to the source, respectively, and aim to find the likelihood of observing the scaling relation. α and β act as summary statistics for the dataset. However, rather than calculating precise values for α and β, one determines a probability distribution that reflects the degree of belief in their respective values. The relevant fitting procedure is described in Sect. 10.2.2.
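For concreteness, a short sketch of the two proxies entering Eq. (10.2) is given below; the use of base-10 logarithms and the function names are our own assumptions:

```python
import numpy as np

def log_lensing_proxy(theta_e_arcsec, d_d, d_ds):
    """Abscissa of Eq. (10.2): the Einstein radius scaled by 20 arcsec and by
    the distance ratio sqrt(D_d / D_ds), in log space.

    d_d, d_ds : angular-diameter distances to the lens and from the lens to
                the source, in the same units.
    """
    return np.log10(theta_e_arcsec / 20.0 * np.sqrt(d_d / d_ds))

def log_mass_proxy(m500_msun):
    """Ordinate of Eq. (10.2): M500 scaled by 9e14 solar masses, in log space."""
    return np.log10(m500_msun / 9e14)

def predicted_log_mass(theta_e_arcsec, d_d, d_ds, alpha, beta):
    """Eq. (10.2): the power-law prediction for given summary statistics."""
    return alpha * log_lensing_proxy(theta_e_arcsec, d_d, d_ds) + beta
```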

Next, we outline how to calculate the likelihood of observing α and β. In the following, ι represents background information such as knowledge of the cluster selection criteria, the method of characterising the Einstein radius, and the assumption that there exists a power-law relation between strong lensing and mass.

  1.

    Computer simulations (see [3, 14, 19, 20]) are run within the framework of a chosen cosmological hypothesis, C. In our case, C represents the assumption that ΛCDM (or specific values for cosmological parameters) is the true description of cosmology.

  2.

    Simulated galaxy clusters are selected according to specified criteria, ideally reflecting the criteria used to select the real clusters.

  3.

    Different on-sky projections of these three-dimensional objects produce different apparent measurements of structural properties. Therefore, we construct a large number of mock samples by randomly choosing an orientation angle for each cluster. Equation (10.2) is fit to each mock sample (see Sect. 10.2.2) to determine a posterior over α and β: P_i(α, β | C, ι) denotes the result for the i-th of N mock samples. We combine these to give the probability, \(P(\alpha,\beta \vert C,\iota ) \equiv \frac{1}{N}\sum _{i=1}^{N}P_{i}(\alpha,\beta \vert C,\iota )\), that one would observe the scaling relation {α, β} under the hypothesis C. The result can be interpreted as a likelihood function over the possible data, α and β.

  4.

    Fit Eq. (10.2) to the real data to obtain the posterior probability distribution for α and β, P(α, β | ι). The normalised posterior is then interpreted as a single ‘data-point’: the distribution represents the uncertainty on the measurement of α and β.

  5.

    Calculate the likelihood, \(\mathcal{L}\), of observing the α–β fit that we obtained, by integrating over the product of the two aforementioned posteriors, now re-labelled ‘data-point’ and ‘likelihood function’.

The result of integrating the product of P(α, β | C, ι) and P(α, β | ι) for the dataset is mathematically equivalent to integrating the product for each mock sample separately and then averaging over all mock samples:

$$ \int \Bigl[ \frac{1}{N} \sum_{i=1}^{N} P_{i}(\alpha,\beta \vert C,\iota) \Bigr] P(\alpha,\beta \vert \iota)\,\mathrm{d}\alpha\,\mathrm{d}\beta = \frac{1}{N} \sum_{i=1}^{N} \int P_{i}(\alpha,\beta \vert C,\iota)\, P(\alpha,\beta \vert \iota)\,\mathrm{d}\alpha\,\mathrm{d}\beta $$
(10.3)

Thus, what we have described above is equivalent to weighting each mock sample according to its similarity to the real data, where the similarity metric is the overlap integral of the two (mock and real) posterior distributions over α and β.
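A minimal numerical sketch of this weighting follows, assuming all posteriors have been evaluated on a common (α, β) grid; the array shapes and normalisation convention are our assumptions:

```python
import numpy as np

def weighted_abc_likelihood(mock_posteriors, data_posterior, d_alpha, d_beta):
    """Evaluate Eq. (10.3) on a grid.

    mock_posteriors : (N, n_alpha, n_beta) array of the P_i(alpha, beta | C, iota)
    data_posterior  : (n_alpha, n_beta) array, the 'data-point' P(alpha, beta | iota)
    Each distribution is normalised so that its grid sum times the cell area
    d_alpha * d_beta equals one.
    """
    cell = d_alpha * d_beta
    # Overlap integral of each mock posterior with the data posterior:
    # this is the weight assigned to that mock sample.
    weights = (mock_posteriors * data_posterior).sum(axis=(1, 2)) * cell
    # Averaging the weights equals integrating the averaged posterior against
    # the data distribution: the two sides of Eq. (10.3).
    return weights.mean()
```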

2.2 Summary Statistic Fitting

The summary statistics α and β are parameters of the scaling relation between strong lensing efficiency and total cluster mass [Eq. (10.2)]. The procedure for calculating \(\mathcal{L}\), as described in Sect. 10.2.1, requires one to fit the real or mock data to determine the posterior distribution over α and β. We employ the Bayesian linear-regression method outlined in [10]. Additionally, we acknowledge that intrinsic scatter is likely to be present, and thus introduce a nuisance parameter, V, representing intrinsic Gaussian variance orthogonal to the line.

For this subsection, we change notation to reduce the number of subscripts: the mass of the i-th cluster lens is denoted M_i, and its scaled Einstein radius E_i. Each data-point is denoted by the vector \(\mathbf{Z}_{i} = [\log M_{i},\log E_{i}]\). The corresponding variances (on the logarithms) are denoted \(\sigma _{M}^{2}\) and \(\sigma _{E}^{2}\). Since we assume the uncertainties on Einstein radius and cluster mass are uncorrelated, the covariance matrix, \(\mathbf{S}_{i}\), reduces to:

$$ \mathbf{S}_{i} \equiv \begin{pmatrix} \sigma_{M}^{2} & 0 \\ 0 & \sigma_{E}^{2} \end{pmatrix} $$
(10.4)

In the case of a mock sample of simulated clusters, \(\mathbf{S}_{i} = 0\).

Consider now the following quantities: \(\varphi \equiv \arctan \alpha\), the angle between the line and the x-axis, and \(b_{\perp }\equiv \beta \cos \varphi\), the orthogonal distance of the line from the origin. The orthogonal distance of each data-point to the line is:

$$ \varDelta_{i} = \hat{\mathbf{v}}^{\top} \mathbf{Z}_{i} - \beta \cos\varphi $$
(10.5)

where \(\hat{\mathbf{v}} = [-\sin \varphi,\cos \varphi ]\) is a unit vector orthogonal to the line.

Therefore, the orthogonal variance is

$$ \varSigma_{i}^{2} = \hat{\mathbf{v}}^{\top} \mathbf{S}_{i} \hat{\mathbf{v}}\,. $$
(10.6)

Following [10], we calculate the likelihood over the three-dimensional parameter space \(\varTheta _{1} \equiv \{\alpha,\beta,V \}\):

$$ \ln \mathcal{L} = K - \sum_{i=1}^{N} \frac{1}{2} \ln\left(\varSigma_{i}^{2} + V\right) - \sum_{i=1}^{N} \frac{\varDelta_{i}^{2}}{2\left(\varSigma_{i}^{2} + V\right)} $$
(10.7)

where K is an arbitrary constant, and the summation runs over all clusters in the sample under consideration.
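A minimal sketch of this log-likelihood is given below, implementing Eqs. (10.5), (10.6), and (10.7) literally, with the document's ordering Z_i = [log M_i, log E_i]; the function and argument names are ours, and K is set to zero:

```python
import numpy as np

def ln_likelihood(alpha, beta, v, z, s):
    """Eqs. (10.5)-(10.7) with the arbitrary constant K set to zero.

    alpha, beta : scaling-relation parameters
    v           : intrinsic orthogonal variance V (nuisance parameter)
    z           : (N, 2) array of data points Z_i = [log M_i, log E_i]
    s           : (N, 2, 2) array of covariance matrices S_i (Eq. 10.4);
                  an array of zeros for a mock sample of simulated clusters
    """
    phi = np.arctan(alpha)
    v_hat = np.array([-np.sin(phi), np.cos(phi)])       # unit normal to the line
    delta = z @ v_hat - beta * np.cos(phi)              # Eq. (10.5)
    sigma2 = np.einsum('i,nij,j->n', v_hat, s, v_hat)   # Eq. (10.6)
    total = sigma2 + v
    return -0.5 * np.sum(np.log(total) + delta**2 / total)  # Eq. (10.7)
```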

While we ultimately aim to provide parameter constraints on α and β, flat priors on these parameters tend to unfairly favour large slopes. A more sensible choice is a prior that is flat in the alternative parameters \(\varphi\) and b_⊥. We apply a modified Jeffreys prior to V:

$$\displaystyle{ \pi (V ) \propto \frac{1} {V + V _{t}} }$$
(10.8)

This prior is linearly uniform in V for small values and logarithmically uniform for larger values, with a turnover, V_t, chosen to reflect the typical uncertainties.

Thus, for each Θ_1, we may define an alternative set of parameters \(\varTheta _{2} \equiv \{\varphi,b_{\perp },V \}\), for which the prior is given by:

$$ \pi(\varTheta_{2}) = \pi(\varphi, b_{\perp})\,\pi(V) \propto \pi(V) $$
(10.9)

where π(V) is given by Eq. (10.8). The prior on Θ_1 then depends on the magnitude of the Jacobian of the mapping between the two parameter sets:

$$ \pi(\varTheta_{1}) = \pi(\varTheta_{2}) \left|\det \frac{\partial \varTheta_{2}}{\partial \varTheta_{1}}\right| = \pi(\varTheta_{2})\, \frac{1}{(1+\alpha^{2})^{3/2}} $$
(10.10)
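A sketch of the corresponding (unnormalised) log-prior on Θ_1 follows, combining the modified Jeffreys prior of Eq. (10.8) with the Jacobian factor of Eq. (10.10); the turnover and bound values are illustrative only, and the hard boundaries are those quoted in the next paragraph:

```python
import numpy as np

V_T = 0.01    # turnover V_t of the modified Jeffreys prior, Eq. (10.8); illustrative
V_MAX = 1.0   # upper bound on V, chosen to reflect the overall data scatter; illustrative

def ln_prior(alpha, beta, v):
    """Unnormalised log-prior on Theta_1 = (alpha, beta, V):
    flat in (phi, b_perp), times the modified Jeffreys prior on V."""
    if not (-40.0 <= alpha <= 40.0 and -8.0 <= beta <= 8.0 and 0.0 <= v <= V_MAX):
        return -np.inf
    ln_pi_v = -np.log(v + V_T)                # Eq. (10.8)
    ln_jacobian = -1.5 * np.log1p(alpha**2)   # Jacobian factor of Eq. (10.10)
    return ln_pi_v + ln_jacobian
```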

Boundaries on the priors are sufficiently large (Footnote 2): −8 ≤ β ≤ 8; −40 ≤ α ≤ 40; 0 ≤ V ≤ V_max, where V_max is chosen to reflect the overall scatter in the data. The posterior is calculated following Bayes' theorem:

$$\displaystyle{ P(\varTheta _{1}\vert D) \propto \mathcal{L}(D\vert \varTheta _{1})\,\pi (\varTheta _{1}) }$$
(10.11)

and is normalised. In practice, the posterior distribution is sampled using emcee [8], the Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by [9].

As we are interested in the constraints on α and β, we then marginalise over the nuisance parameter, V.
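Putting the pieces together, a minimal sampling sketch follows, using the ln_likelihood and ln_prior sketches above; the synthetic data, walker configuration, and burn-in length are our own illustrative choices:

```python
import numpy as np
import emcee

def ln_posterior(theta, z, s):
    """Eq. (10.11): log-prior plus log-likelihood."""
    alpha, beta, v = theta
    lp = ln_prior(alpha, beta, v)
    if not np.isfinite(lp):
        return -np.inf
    return lp + ln_likelihood(alpha, beta, v, z, s)

# Synthetic stand-in data: 12 clusters scattered about a line, with the
# diagonal covariance matrices of Eq. (10.4).
rng = np.random.default_rng(0)
x = rng.uniform(-0.3, 0.3, size=12)
z = np.column_stack([x, 0.8 * x - 0.05 + 0.03 * rng.standard_normal(12)])
s = np.tile(np.diag([0.05**2, 0.02**2]), (12, 1, 1))

n_walkers, n_dim = 32, 3
start = np.array([1.0, 0.0, 0.05]) + 1e-3 * rng.standard_normal((n_walkers, n_dim))
sampler = emcee.EnsembleSampler(n_walkers, n_dim, ln_posterior, args=(z, s))
sampler.run_mcmc(start, 5000)

# Marginalising over the nuisance parameter V amounts to discarding the V
# column of the flattened, burned-in chain.
alpha_beta = sampler.get_chain(discard=1000, flat=True)[:, :2]
```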

3 Results

In Fig. 10.1, we show the relation between the Einstein radii and the cluster mass, M_500. The real cluster sample is represented by red circles. For simulated clusters, the situation is more complicated: since different lines of sight yield a large variation in projected mass distribution, each cluster cannot be associated with an individual Einstein radius, nor with a simple Gaussian or log-normal distribution [14]. We therefore measure the Einstein radius along 80 different lines of sight and, for ease of visualisation, describe the distribution of Einstein radii for each simulated cluster by a box-plot.

Fig. 10.1

Strong lensing efficiency, characterised by the scaled Einstein radius, θ_E,eff, plotted as a function of mass. The ranges of Einstein radii for simulated clusters are shown by the blue box-plots; the red circles represent the real clusters. The red line marks the maximum a-posteriori fit to the observational data, while the thin blue lines mark the fits to 20 randomly chosen mock samples from the simulations
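A sketch of this style of visualisation, with synthetic placeholder values standing in for the simulated Einstein radii:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the simulation output: for each cluster, 80 Einstein
# radii measured along different lines of sight (values are illustrative).
rng = np.random.default_rng(1)
m500 = np.sort(rng.uniform(4e14, 2e15, size=10))                # [M_sun]
theta_e = [np.abs(rng.normal(12.0 * m / 9e14, 4.0, size=80))    # [arcsec]
           for m in m500]

fig, ax = plt.subplots()
ax.boxplot(theta_e, positions=m500 / 1e14, widths=0.6)          # one box per cluster
ax.set_xlabel(r'$M_{500}$ [$10^{14}\,M_\odot$]')
ax.set_ylabel(r'$\theta_{E}$ [arcsec]')
plt.show()
```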

We fit the observational data to the lensing-mass relation and, after marginalising out the nuisance parameter V, present the posterior distribution for α and β, denoted by the red contours in the left-hand panel of Fig. 10.2. This fit is reinterpreted as a single ‘data-point’. To estimate the likelihood function over possible data, we employ simulations: many mock samples are individually fit to the lensing-mass relation; the maximum of each posterior is shown as a blue point, and a typical 1-σ error is shown as a blue ellipse. By adding the posteriors for all mock samples and renormalising, we estimate the required likelihood function, shown by the blue contours in the right-hand panel of Fig. 10.2. Multiplying by the ‘data-point’ distribution and integrating over the parameter space, we find \(\mathcal{L}\approx 0.3\).

Fig. 10.2

Left: 1-σ and 2-σ constraints on parameters of the strong lensing–mass relation given the real cluster data (red contours), with the maximum a-posteriori fit marked by a red circle. Overplotted as blue dots are the best fits to 80 mock observations of simulated galaxy clusters; a typical 1-σ error is shown as a blue ellipse. Right: same as the left panel, but the blue circle and curves mark, respectively, the maximum and the 1-σ and 2-σ contours of the likelihood function found by combining all 80 mocks. Ultimately, the likelihood, \(\mathcal{L}\approx 0.3\), is found by integrating the product of the functions marked by the red and blue contours

Note that, in isolation, one cannot say whether this likelihood is large or small. Currently, such simulations are only available for the fiducial ΛCDM cosmological model. However, if the same process is repeated for simulations under a different model, then the Bayes factor can be calculated [see Eq. (10.1)] and, after accounting for priors, may (or may not) reveal a preference for one of the cosmologies in light of these data. Alternative cosmological models may include, for example, those with a different ratio of dark matter to dark energy, interactions between the two dark components, or a different normalisation of the structure power spectrum.

4 Computational Challenge

The approach described above is an exciting new strategy for calculating the likelihood of observing strong lensing galaxy clusters under a chosen cosmological hypothesis. However, we recognise that the calculation involves running computer simulations that can take months. Computationally ‘cheaper’ simulations ignore several astrophysical processes in the formation of galaxy clusters, and it is debatable whether these would be sufficient.

To determine the severity of this problem, we repeat the aforementioned procedure using galaxy-cluster counterparts from such simulations, at varying levels of complexity and realism, and find that the likelihood, \(\mathcal{L}\), can vary by a factor of three or four. If the cheaper simulations are employed, then the selection criteria must also be replaced with an alternative compromise. We test this alternative and find that \(\mathcal{L}\) changes by a factor of two.

Our findings suggest that if a model-comparison study were carried out using a simulation based on an alternative cosmological hypothesis and resulted in a Bayes factor of 20 or more [see Eq. (10.1)], then the cheaper simulations (or toy models based on these) would be sufficient. However, in the event that the Bayes factor B is found to be smaller, the computationally expensive but realistic simulations would be necessary.