1 Introduction

Observational evidence of the acceleration of the Universe marked the beginning of a new era in Cosmology. It is well established that the current expansion of the Universe is accelerating, and the acceleration is commonly explained by introducing a dark energy (DE) term in the Einstein equation. Dark energy is described by its equation of state (EoS) parameter \(w=P^{\prime } / \rho ^{\prime }\), where \(\rho ^{\prime }\) is its energy density and \(P^{\prime }\) is its pressure. It is still unknown whether dark energy is a cosmological constant (Carroll et al. 1992; Turner & White 1997; Carroll 2001; Padmanabhan 2003) or a time-evolving entity (Peebles & Ratra 2003; Copeland et al. 2006). The \(\Lambda \)CDM (cosmological constant and cold dark matter) model corresponds to the dark energy equation of state value \(w=-1\), whereas in the case of time-evolving dark energy, the EoS parameter varies with time and can assume different values of w (Weinberg 1989; Carroll et al. 1992; Coble et al. 1997; Caldwell et al. 1998; Sahni & Starobinsky 2000; Ellis 2003; Padmanabhan 2003; Peebles & Ratra 2003; Albrecht et al. 2006; Frieman et al. 2008; Linder 2008a; Stern et al. 2010; Arjona & Nesseris 2020). Various models based on scalar, canonical, and non-canonical fields have been proposed to overcome different problems of the \(\Lambda \)CDM model (Ratra & Peebles 1988; Copeland et al. 1998; Zlatev et al. 1999; Chevallier & Polarski 2001; Padmanabhan 2002; Bagla et al. 2003; Caldwell & Linder 2005; Linder 2006; Huterer & Peiris 2007; Linder 2008b; Tsujikawa 2013; Rajvanshi & Bagla 2019; Singh et al. 2019). The discrepancies in \(H_0\) measurements and their implications for cosmological model selection are discussed in Banerjee et al. (2021) and Lee et al. (2022). The last two decades have also marked the era of precision Cosmology, with cosmological parameters measured to high precision using newly available data sets (Chevallier & Polarski 2001; Planck Collaboration et al. 2018; Sangwan et al. 2018).

Maximum likelihood estimation (MLE) analysis is the most commonly used technique in cosmological parameter estimation (Nesseris & Perivolaropoulos 2004, 2005, 2007; Jassal 2009; Sangwan et al. 2018; Singh et al. 2019). The increasing availability of observational data sets has tightened the constraints on the parameters of theoretical models (Chevallier & Polarski 2001; Linder 2003a; Jassal et al. 2005; Gong & Wang 2007; Verde et al. 2013; Mukherjee 2016; Di Valentino et al. 2017; Vagnozzi et al. 2018; Bellomo et al. 2020; Bernal et al. 2020). Although determining the theory parameters is crucial, these methods depend on the observational data at their core, and new data sets accept or reject a particular model with quantified precision. Methods like principal component analysis (PCA) enable us to determine the functional form of the observable of a data set in a model-independent, non-parametric manner (Huterer & Starkman 2003; Huterer & Cooray 2005; Crittenden et al. 2009; Clarkson & Zunckel 2010; Ishida & de Souza 2011; Hojjati et al. 2012; Nesseris & García-Bellido 2013; Nair & Jhingan 2013; Zheng & Li 2017; Miranda & Dvorkin 2018; Hart & Chluba 2019; Sharma et al. 2020; Hart & Chluba 2022a). PCA is a multivariate analysis that gives the form of cosmological quantities as a function of redshift (Huterer & Starkman 2003; Huterer & Cooray 2005; Clarkson & Zunckel 2010; Zheng & Li 2017; Sharma et al. 2020). In a previous work (Sharma et al. 2020), we combined PCA with a correlation coefficient calculation to obtain the analytical, functional form of the observable quantity when observational data sets are given as input. The method is efficient in fitting the observable; the caveat, however, is that derived cosmological parameters, like the dark energy EoS parameter, are not determined as efficiently. The problem arises from the non-linear dependence of the dark energy parameters on the observational quantity at hand, for instance, the Hubble parameter and the distance modulus. To circumvent this problem, we combine the Markov Chain Monte Carlo method with PCA reconstruction to derive the EoS parameters of dark energy and other cosmological parameters. The EoS parameter is derived by searching for the model that best describes the functional form of the observable determined by the observational data. For the Monte Carlo method, we utilize the No-U-Turn Sampler, a variant of the Hamiltonian Monte Carlo method. In this analysis, we demonstrate that the constraints on the parameters of the dark energy EoS are in line with those obtained from other methods.

This paper is structured as follows. Section 2 briefly reviews background cosmology, describes the reconstruction algorithm, and the No U-Turn sampling. In Section 3, we describe the results of our algorithm. We describe the distinguishing features of our methodology in Section 4. In Section 5, we summarize this paper’s main results.

2 Reconstruction methodology

In this section, we first discuss the methodology of the principal component analysis reconstruction (Sharma et al. 2020) and the modification to the algorithm.

2.1 Reconstruction of the functional form of Hubble parameter, distance modulus, and angular scale in terms of redshift

For a spatially flat Universe composed of dark energy and non-relativistic matter, the Hubble parameter is given by,

$$\begin{aligned} H(z) \!=\! H_0\left[ \Omega _m (1\!+\!z)^3 \!+\! \Omega _{\textrm{DE}} e^{3 \int ^z_0 \frac{1+w(z')}{1+z'}dz'} \right] ^{1/2}\!\!. \end{aligned}$$
(1)

The dark energy EoS parameter \(w(z) = P^{\prime }/\rho ^{\prime }\) can be written as:

$$\begin{aligned} w(z) = \sum _{i=1}^{m} \alpha _{(i - 1)} {\mathcal {F}}(z)^{(i - 1)}, \quad {\mathcal {F}}(z) = \frac{z}{(1 + z)}, \end{aligned}$$
(2)

where \(H_0\) denotes the present-day value of the Hubble parameter and \(\Omega _m\), \(\Omega _{\textrm{DE}}\) are the density parameters for matter and dark energy, respectively. In Equation (2), \(m=2\) corresponds to the Chevallier–Polarski–Linder (CPL) parameterization (Linder 2003a), given by \(w(z) = w_0 + w' z /(1+z)\), where \(w_0\) and \(w'\) are the present-day values of the equation of state parameter and its derivative, respectively. Since \({\mathcal {F}}(z) = z/(1+z) = 1-a\), Equation (2) is a Taylor series expansion of the dark energy EoS parameter in terms of \((1 - a)\), where a is the scale factor.
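
To make the mapping from the EoS parameters \(\vec {\alpha }\) to the Hubble parameter concrete, the following sketch evaluates Equations (1) and (2) numerically for a spatially flat Universe; the parameter values, redshift, and function names are illustrative assumptions rather than part of our pipeline.

```python
import numpy as np
from scipy.integrate import quad

def w_de(z, alpha):
    """Dark energy EoS of Equation (2): a polynomial in F(z) = z/(1+z)."""
    F = z / (1.0 + z)
    return sum(a * F**i for i, a in enumerate(alpha))

def hubble(z, h0, om, alpha):
    """Hubble parameter of Equation (1) for a spatially flat Universe."""
    # 3 * integral of (1 + w)/(1 + z') from 0 to z, evaluated numerically
    integral, _ = quad(lambda zp: (1.0 + w_de(zp, alpha)) / (1.0 + zp), 0.0, z)
    rho_de = np.exp(3.0 * integral)   # dark energy density relative to today
    return h0 * np.sqrt(om * (1.0 + z)**3 + (1.0 - om) * rho_de)

# m = 2 (CPL) with w0 = -1 and w' = 0 reduces to flat LambdaCDM
print(hubble(0.5, h0=0.685, om=0.3, alpha=[-1.0, 0.0]))
```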

From the functional form of the Hubble parameter, we can reconstruct the dark energy EoS parameter w(z). Differentiating Equation (1) we get,

$$\begin{aligned} w(z) = \frac{3 h^2 - 2 (1 + z) h h^{\prime } }{3 h_0^2 (1 + z)^3 \Omega _m - 3 h^2}, \end{aligned}$$
(3)

where h is the reduced Hubble parameter given by H(z)/100 km \(\hbox {s}^{-1}\) \(\hbox {Mpc}^{-1}\).

The luminosity distance \(d_L(z)\) is given by,

$$\begin{aligned} d_L(z) = \frac{c}{H_0}(1+z)\int _0^z d_H(z')dz', \end{aligned}$$
(4)

where \(d_H\), from Equation (1) is,

$$\begin{aligned} d_H(z) = \left( \Omega _m (1+z)^3 + \Omega _{\textrm{DE}} e^{{3\int _0^z \frac{(1+w(z'))dz'}{(1+z')}}}\right) ^{-1/2} \end{aligned}$$
(5)

and is related to the distance modulus as:

$$\begin{aligned} \mu (z) = 5 \log {\left( \frac{d_L}{1~\textrm{Mpc}}\right) } + 25. \end{aligned}$$
(6)

We use the same expression of Equation (2) for the EoS parameter to express \(\mu (z)\).

Since \(D(z) = (H_0 / c) (1+z)^{-1} d_L(z) \) is the dimensionless comoving distance, the EoS parameter in terms of distance is given by,

$$\begin{aligned} w(z) = \frac{2(1+z)D'' +3D'}{3D'^3 \Omega _m (1+z)^3 -3D'}. \end{aligned}$$
(7)

The Baryon Acoustic Oscillation (BAO) angular scale \(\theta _{b}\) is defined in terms of the angular diameter distance \(D_A\) as,

$$\begin{aligned} \theta _b = \frac{r_{\textrm{drag}}}{(1 + z) D_A}. \end{aligned}$$
(8)

Here, \(r_{\textrm{drag}}\) is the sound horizon at the drag epoch.

Following the reconstruction method of Sharma et al. (2020), we start by calculating the functional forms of the reduced Hubble parameter h(z) and the distance modulus \(\mu (z)\) directly from the data set, using principal component analysis. The observable of the given data set is expressed as a polynomial over an initial basis function, which creates a coefficient space. The dimension of the coefficient space equals the number of terms in the initial basis expansion. We select different patches in the coefficient space and perform a \(\chi ^2\) calculation on each patch, obtaining a minimum value of \(\chi ^2\) for each. We create the PCA data matrix (\({\mathcal {D}}\)) from the coefficients corresponding to the minimum \(\chi ^2\) of each patch. We then calculate the covariance matrix \({\mathcal {C}}\) of \({\mathcal {D}}\), from which the eigenvector matrix \({\mathcal {E}}\) is obtained. \({\mathcal {E}}\) is used to diagonalize \({\mathcal {C}}\) and remove the linear correlations of the data matrix; it also defines a new set of basis functions, in terms of which the observable is finally expressed. With the help of these new basis functions, we create the new data matrix \({\mathcal {D}}'\). To select the value of the final basis number M, we compare the correlation matrices of \({\mathcal {D}}\) and \({\mathcal {D}}'\). This comparison also aids in selecting the best initial basis variable.
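
A minimal numpy sketch of this step is given below, assuming the PCA data matrix has already been assembled from the patch-wise \(\chi ^2\) minimization; the matrix contents and sizes used here are placeholders.

```python
import numpy as np

# Placeholder PCA data matrix D: each row holds the coefficients that
# minimise chi^2 in one patch of the N-dimensional coefficient space.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 4))                  # 50 patches, N = 4 coefficients

C = np.cov(D, rowvar=False)                   # covariance matrix of D
eigvals, E = np.linalg.eigh(C)                # eigenvalues and eigenvector matrix

order = np.argsort(eigvals)[::-1]             # sort PCs by decreasing eigenvalue
eigvals, E = eigvals[order], E[:, order]

# Rotating with E diagonalises C and removes the linear correlations;
# the same rotation applied to the basis, U = G E, gives the new basis functions.
D_rot = (D - D.mean(axis=0)) @ E

M = 2                                         # number of final basis terms kept
D_new = D_rot[:, :M]                          # reduced data matrix D'
print(eigvals, D_new.shape)
```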

If the initial basis function is given by,

$$\begin{aligned} G=(f_1(z), f_2(z), \ldots , f_N(z)), \end{aligned}$$
(9)

with \(f_i(z) = f(z)^{(i-1)}\), the initial expression of the observable \(\xi \) in terms of the independent variable z is given by,

$$\begin{aligned} \xi _{ini}(z) = \sum _{i=1}^{N} b_i f(z)^{(i -1)}. \end{aligned}$$
(10)

The value of N is the number of terms in the polynomial expression of \(\xi _{ini}(z)\); it is also the dimension of the coefficient space \(\vec {b}\). The correlation coefficient calculation determines the value of N (Kendall 1938; Sharma et al. 2020). This value must be large enough that the function can capture most of the features of the observed data set. To select the value of N, we calculate the Pearson, Spearman, and Kendall correlation coefficients for the data matrix \({\mathcal {D}}\) (Kendall 1938; Kreyszig et al. 2011). The Pearson correlation coefficient measures the linear correlation in the data set, whereas the Spearman and Kendall correlation coefficients capture the non-linear correlations. For the Spearman correlation coefficient, we calculate the rank of the data set: we arrange the ranks according to numerical value, giving rank 1 to the highest value in the PCA data set, rank 2 to the second highest, and so on. The Spearman correlation coefficient is the Pearson correlation coefficient of the rank variables of the data set; it indicates whether there is a monotonic relationship between the dependent and independent variables, that is, whether they tend to increase or decrease together. For the Kendall correlation coefficient, we count the concordant and discordant pairs; it gives the ordinal association between the variables (Kendall 1938; Kreyszig et al. 2011).
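
These three coefficients can be computed with standard routines; the short sketch below, with placeholder columns standing in for the PCA data matrix, shows the comparison we perform for each candidate value of N.

```python
import numpy as np
from scipy import stats

# Placeholder columns standing in for two coefficients of the PCA data matrix
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.8 * x + 0.2 * rng.normal(size=200)

pearson, _ = stats.pearsonr(x, y)    # linear correlation
spearman, _ = stats.spearmanr(x, y)  # Pearson coefficient of the rank variables
kendall, _ = stats.kendalltau(x, y)  # ordinal association from concordant/discordant pairs

# N is chosen as the smallest value for which the Pearson coefficient
# exceeds both the Spearman and the Kendall coefficients.
print(pearson, spearman, kendall)
```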

We choose the smallest value of N for which the PCA data matrix gives a higher value of the Pearson correlation coefficient than of the Spearman and Kendall correlation coefficients. If the expression of the observable \(\xi (z)\) in terms of the polynomial were exact, there would be no correlation between the coefficients of the polynomial expression. Our aim is to break the correlations among the coefficients and obtain a polynomial expression of \(\xi _{ini}(z)\) as close as possible to the actual \(\xi (z)\). After discarding the higher-order principal components (PCs), the number of terms in the polynomial of \(\xi _{ini}(z)\) is reduced to M. The final functional form of the observable is,

$$\begin{aligned} \xi _{pca}(z) = \sum _{i=1}^{M} \kappa _i u_{i}(z), \end{aligned}$$

where \((u_1(z), u_2(z),\ldots , u_M(z))\) are the components of the new basis \(U = G {\mathcal {E}}\). After applying PCA, the dimension of the coefficient space \(\vec {\kappa }\) is M.

In earlier work, we showed that a derived approach, in which PCA first reconstructs the observable and the dark energy EoS parameter is then obtained from it, is a more efficient way to reconstruct the dark energy model than attempting to reconstruct the EoS directly. Moreover, while we can reconstruct the Hubble parameter h(z) very well with PCA, the derivative term in Equation (3), which relates the EoS to h(z), amplifies the errors in the reconstruction of w(z).
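
As an illustration of this point, the sketch below recovers w(z) through Equation (3) using a finite-difference derivative of a mildly noisy h(z); the \(\Lambda \)CDM parameters and noise level are illustrative assumptions.

```python
import numpy as np

h0, om = 0.685, 0.3
z = np.linspace(0.0, 2.0, 200)
h_true = h0 * np.sqrt(om * (1 + z)**3 + (1 - om))          # flat LambdaCDM, w = -1

rng = np.random.default_rng(2)
h_noisy = h_true + rng.normal(scale=0.002, size=z.size)    # small reconstruction error

def w_from_h(z, h):
    """Equation (3): EoS from h(z) using a finite-difference derivative."""
    hp = np.gradient(h, z)
    return (3 * h**2 - 2 * (1 + z) * h * hp) / (3 * h0**2 * (1 + z)**3 * om - 3 * h**2)

# A sub-per cent error in h(z) already produces a large scatter in w(z),
# which is why we fit models to the reconstructed curve instead of differentiating it.
print(np.std(w_from_h(z, h_noisy) - w_from_h(z, h_true)))
```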

We address this problem with a modified approach that bypasses the differentiation involved in calculating the EoS from the PCA-reconstructed Hubble parameter, distance modulus, and BAO angular scale. We do this by combining PCA with the maximum likelihood estimation (MLE) technique, using Markov Chain Monte Carlo (MCMC) to search for the dark energy model that best fits the PCA-reconstructed Hubble parameter, distance modulus, and angular scale. We replace the observational part of the MLE calculation with the best-fit curves of h(z), \(\mu (z)\), and \(\theta _b(z)\) as functions of redshift obtained via PCA. This removes the dependence on the number of observational data points. The analysis gives us the machinery to produce the most probable values of the model parameters by constraining the theory with the reconstructed PCA data. The errors are functions built from the covariance matrix of the PCA data matrix and comprise its eigenvalues and eigenfunctions (Huterer & Starkman 2003; Clarkson & Zunckel 2010; Sharma et al. 2020). The eigenvalues of the covariance matrix quantify the error in the reconstruction of the observable \(\xi (z)\). If \(\lambda _i\) are the eigenvalues of the covariance matrix \({\mathcal {C}}\), then the error associated with each component is \(\sigma (\alpha _i) = \lambda _i ^ {1/2}\). For M final terms, the total error is,

$$\begin{aligned} \sigma (\xi (z_a)) = \left[ \sum _{i=1}^{M} \sigma ^2(\alpha _i) e^2_i(z_a) \right] ^ {1/2}. \end{aligned}$$
(11)

Equation (11) gives the error function for a particular reconstructed curve, so that the error is available as a function of redshift (Huterer & Starkman 2003; Clarkson & Zunckel 2010).
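
A short sketch of Equation (11) is given below, assuming the eigenvalues and final basis functions from the PCA step are already in hand; the arrays used here are placeholders.

```python
import numpy as np

def pca_error(z, eigvals, basis_funcs):
    """Equation (11): error on the reconstructed observable at redshifts z.

    eigvals     -- eigenvalues lambda_i of the covariance matrix (M values)
    basis_funcs -- M callables e_i(z), the final basis functions
    """
    sigma2 = np.zeros_like(z)
    for lam, e in zip(eigvals, basis_funcs):
        sigma2 += lam * e(z)**2               # sigma^2(alpha_i) * e_i^2(z_a)
    return np.sqrt(sigma2)

# Placeholder example with M = 2 final basis functions
z = np.linspace(0.0, 2.0, 5)
print(pca_error(z, [0.01, 0.002], [lambda x: np.ones_like(x), lambda x: x / (1 + x)]))
```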

Fig. 1

The plot shows the \(1 \sigma \) and \(2 \sigma \) contours for all the parameters, along with their marginal probability density plots, for the case of the simulated data set. The first plot of every column is the marginal probability density plot, which gives the maximum evidence for the simulated Hubble parameter data set.

Fig. 2

The plot shows the \(1 \sigma \) and \(2 \sigma \) contours for all the parameters, along with their marginal probability density plots, for the case of the real data set (Simon et al. 2005; Moresco et al. 2012; Zhang et al. 2014; Moresco 2015; Ratsimbazafy et al. 2017). As in the previous figure, the first plot of every column is the marginal probability density plot.

Fig. 3

In this figure, we show the 68% confidence level range for the equation of state parameter w(z). The figure on the left is for the simulated Hubble parameter data set, and the right is for the real data set. The number of observational points \(n_d\) and sample points \(M_s\) are 600 and 800000, respectively. The median, mean, and mode of the posterior density function are represented by the black, red, and yellow curves, respectively. The cosmological constant model is consistent with the data and is denoted by the dashed line. For both plots, the mean and median lines overlap.

Fig. 4

In this figure, we show the 68% confidence level range for the dark energy density evolution \({\rho ^{\prime }}/{\rho _0^{\prime }}\). Here, \(\rho ^{\prime }\) and \(\rho _0^{\prime }\) are the dark energy density at redshift z and at the present time, respectively. As in Figure 3, the left panel is for the simulated Hubble parameter data set, and the right is for the real data set. The number of observational points \(n_d\) and sample points \(M_s\) are 600 and 800000, respectively. The black, red, and yellow curves are the median, mean, and mode of the posterior density function. The cosmological constant model is consistent with the data and is denoted by the dashed white line. For both plots, the mean and median lines overlap.

Fig. 5

In this figure we show the \(1 \sigma \) and \(2 \sigma \) contours for all the parameters for the \(w_{\textrm{CDM}}\) (top) as well as the \(w_{\textrm{CPL}}\) model. Plots at the top and right of the main figures are the marginal probability density plots for the SNIa data set. The plots are created for \([n_d, M_s] = [100, 80000]\). Here, we use the Cepheid-calibrated SNIa data set (Deng & Wei 2018; Riess et al. 2021; Scolnic et al. 2022; Uddin et al. 2023).

Fig. 6

The plot shows the \(1 \sigma \) and \(2 \sigma \) contours for all the parameters, along with their marginal probability density plots, for the case of the distance modulus data set (Deng & Wei 2018; Riess et al. 2021; Scolnic et al. 2022; Uddin et al. 2023). For the plot we consider \([n_d, M_s] = [100, 80000]\). The model considered here is detailed in Equations (1) and (2). Plots at the top and right of the main figures are the marginal probability density plots for the SNIa data set.

Fig. 7

The plot shows the \(1 \sigma \) and \(2 \sigma \) contours for all the cosmological parameters along with \(r_\textrm{drag}\). Their marginal probability density plots for the case of the transverse BAO data set (Carvalho et al. 2016; Alcaniz et al. 2017; de Carvalho et al. 2018; Carvalho et al. 2020; Nunes et al. 2020) are shown in the first subplot of each column. For the plot we consider \([n_d, M_s] = [50, 10000]\). The model considered here is detailed in Equations (1) and (2).

Fig. 8

The plot shows the joint likelihood from all the data sets. The \(1 \sigma \) and \(2 \sigma \) contours for the Hubble parameter data are shown in green and blue, respectively, those for the SNIa data in grey, and those for the BAO data in red. The model considered here is detailed in Equations (1) and (2). For all the data sets, we run the chains with \((n_d, M_s)=(50, 10000)\).

2.2 No-U-turn sampler

To implement the MCMC search, we use the No-U-turn sampler (NUTS), which efficiently explores the relevant region of parameter space. NUTS modifies Hamiltonian Monte Carlo (HMC) so that the algorithm intrinsically selects the number of leapfrog steps (Gelman & Rubin 1992; Hoffman & Gelman 2011; Salvatier et al. 2016). The selection of leapfrog steps is crucial in solving the Hamiltonian differential equations of HMC. At every step, NUTS proceeds by creating a binary tree in which two particles, representing progress in the forward and backward directions, are created. If these are represented as \((\mathbf {q_n^+}\), \(\mathbf {p_n^+})\) and \((\mathbf {q_n^-}\), \(\mathbf {p_n^-})\), then the NUTS stopping conditions can be given by,

$$\begin{aligned} ({\textbf{q}}_{\textbf{n}}^+ - {\textbf{q}}_{\textbf{n}}^-)\cdot {\textbf{p}}_{\textbf{n}}^-< 0 ,\\ ({\textbf{q}}_{\textbf{n}}^+ - {\textbf{q}}_{\textbf{n}}^-)\cdot {\textbf{p}}_{\textbf{n}}^+ < 0. \end{aligned}$$

In HMC, we move along an elliptical path in the phase space of \({\textbf{q}}\) and \({\textbf{p}}\) (Gelman & Rubin 1992; Hoffman & Gelman 2011). The momentum variable \({\textbf{p}}\) is introduced to ensure the exploration of a wider area of the parameter space, which is achieved by moving along the elliptical contour obtained by solving the dynamical Hamiltonian equations. In NUTS, once we have traversed half of the elliptical path, the signs of the momentum and position terms change and the trajectory is stopped. This makes NUTS more efficient than HMC, in which there is no way to ascertain whether we are revisiting a region of parameter space that has already been explored.
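
In practice, the conditions above amount to a simple dot-product check on the forward and backward states; the function below is a sketch of that check alone (either condition triggering a stop), not of the full tree-building algorithm.

```python
import numpy as np

def u_turn(q_plus, q_minus, p_plus, p_minus):
    """Return True when the NUTS stopping conditions are met, i.e. when the
    trajectory between the backward state (q_minus, p_minus) and the forward
    state (q_plus, p_plus) starts to double back on itself."""
    dq = np.asarray(q_plus) - np.asarray(q_minus)
    return np.dot(dq, p_minus) < 0 or np.dot(dq, p_plus) < 0

# Forward momentum pointing back towards q_minus triggers a stop
print(u_turn(q_plus=[1.0, 0.0], q_minus=[0.0, 0.0],
             p_plus=[-0.5, 0.1], p_minus=[0.3, 0.2]))
```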

We choose the value of the total number of sample points \(M_s\) by checking the convergence limit using the Gelman–Rubin statistic (Salvatier et al. 2016). The Gelman–Rubin statistic is based on the notion that multiple chains, once converged, should appear similar to one another; if they do not, they have not converged. Running multiple MCMC chains is a standard way to test for convergence, and the scale reduction factor \(\hat{r_o}\) is used to check it. There are two main ways in which sequences of MCMC iterations fail to converge: in one case, the chains settle in different regions with drastically different posterior probability densities of the target distribution; in the other, the chains fail to reach a stationary distribution. We change the value of \(M_s\) until we get \(\hat{r_o} = 1\), which confirms convergence.
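
For reference, a minimal sketch of the scale reduction factor for a single parameter, assuming chains of equal length, is given below.

```python
import numpy as np

def gelman_rubin(chains):
    """Scale reduction factor for one parameter; chains has shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    B = n * chain_means.var(ddof=1)           # between-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)

# Two well-mixed placeholder chains give a value close to 1
rng = np.random.default_rng(3)
print(gelman_rubin(rng.normal(size=(2, 5000))))
```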

3 Results

We carry out the analysis described above for the Hubble parameter data, the Cepheid-calibrated SNIa data, and the BAO data set. We present results for both the simulated and the real data sets for the Hubble parameter. The simulated data set is created using the parameter values fixed by Planck Collaboration et al. (2018); for this simulated \(\Lambda \)CDM data set, we fix the cosmological parameters to \(\Omega _m = 0.3\) and \(h_0 = 0.685\). We test the validity of our method by checking whether the analysis recovers these values. We then apply the method to the real data sets, namely the Cosmic-Chronometer data set (Simon et al. 2005; Moresco et al. 2012, 2020; Moresco 2015; Ratsimbazafy et al. 2017; Jiao et al. 2023; Jimenez et al. 2023) and the SNIa data set (Deng & Wei 2018; Riess et al. 2021; Scolnic et al. 2022; Uddin et al. 2023), and compare with the usual likelihood analysis results. We also use the transverse BAO data set from Carvalho et al. (2016), Alcaniz et al. (2017), de Carvalho et al. (2018), Carvalho et al. (2020), which consists of 15 transverse BAO measurements (Nunes et al. 2020) calculated using the public data releases of the Sloan Digital Sky Survey (SDSS) (York et al. 2000) without assuming a fiducial cosmological model (Sánchez et al. 2011; Carnero et al. 2012).
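
A sketch of how such a simulated flat \(\Lambda \)CDM Hubble data set can be generated is shown below; the redshift coverage and the Gaussian noise level are illustrative assumptions rather than the exact prescription used in our analysis.

```python
import numpy as np

h0, om = 0.685, 0.3                                   # values fixed for the simulation
rng = np.random.default_rng(4)

z = np.sort(rng.uniform(0.05, 2.0, size=40))          # assumed redshift coverage
h_lcdm = h0 * np.sqrt(om * (1 + z)**3 + (1 - om))     # flat LambdaCDM, w(z) = -1

sigma = 0.03 * h_lcdm                                 # assumed 3 per cent errors
h_sim = h_lcdm + rng.normal(scale=sigma)              # simulated reduced Hubble data
```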

To obtain the reconstructed curves of the reduced Hubble parameter, distance modulus, and angular scale, we use \(f(z) = {z}/{(1 + z)}\) as the basis variable for the simulated as well as the observational data sets; this initial basis function gives the best reconstruction, as shown in Sharma et al. (2020). Here, we can choose the value of \(n_d\), the number of data points in the observational part of the MLE. We run the Markov Chain Monte Carlo (MCMC) chain to search for the minimum \(\chi ^2\), which gives us the likelihood of the PCA data set. In the MCMC analysis, for the Hubble parameter data set, we use normal priors \({\mathcal {N}}(0.70, 0.2)\) and \({\mathcal {N}}(0.35, 0.1)\) for the reduced Hubble constant \(h_0\) and \(\Omega _m\), respectively. For the DE parameters \(\vec {\alpha }\) we take \({\mathcal {N}}(0, 3)\). Here, \({\mathcal {N}}(x_{\textrm{mean}}, x_{\textrm{std}})\) represents the normal probability density function with mean \(x_{\textrm{mean}}\) and standard deviation \(x_{\textrm{std}}\). For the Cepheid-calibrated SNIa data, we use the data archive given in Riess et al. (2021), Uddin et al. (2023) and Deng & Wei (2018). In the MLE part, we take a half-normal prior with a standard deviation of 0.4 for \(\Omega _m\), and for \(\vec {\alpha }\) we take \({\mathcal {N}}(-2, 1.5)\). In the case of the BAO data set, we use the same priors as for the Hubble parameter data set.
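
As an illustration of how the MLE step can be set up, the sketch below writes the \(w_{\textrm{CPL}}\) case (\(m=2\) in Equation (2)), for which the dark energy density has an analytic form, as a PyMC model (Salvatier et al. 2016) with the priors quoted above for the Hubble parameter data set; PyMC selects NUTS by default for continuous parameters. The arrays `z_grid`, `h_rec`, and `sigma_rec` are placeholders standing in for the PCA-reconstructed curve and its error function, and the import name assumes a recent PyMC release.

```python
import numpy as np
import pymc as pm
import arviz as az

# Placeholders for the PCA-reconstructed reduced Hubble parameter and its error function
z_grid = np.linspace(0.05, 2.0, 50)
h_rec = 0.685 * np.sqrt(0.3 * (1 + z_grid)**3 + 0.7)
sigma_rec = 0.02 * np.ones_like(z_grid)

with pm.Model():
    h0 = pm.Normal("h0", mu=0.70, sigma=0.2)          # prior N(0.70, 0.2)
    om = pm.Normal("om", mu=0.35, sigma=0.1)          # prior N(0.35, 0.1)
    w0 = pm.Normal("w0", mu=0.0, sigma=3.0)           # DE parameters, prior N(0, 3)
    wa = pm.Normal("wa", mu=0.0, sigma=3.0)

    # Analytic CPL dark energy density: exp(3 * integral of (1+w)/(1+z'))
    rho_de = pm.math.exp(3 * (1 + w0 + wa) * np.log(1 + z_grid)
                         - 3 * wa * z_grid / (1 + z_grid))
    h_model = h0 * pm.math.sqrt(om * (1 + z_grid)**3 + (1 - om) * rho_de)

    pm.Normal("h_obs", mu=h_model, sigma=sigma_rec, observed=h_rec)
    idata = pm.sample(draws=2000, tune=1000, chains=4)   # NUTS sampler

print(az.rhat(idata))      # Gelman-Rubin check
```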

We choose the largest possible value of \(n_d\), which is limited by the available computing power, and then check the results for different values of \(n_d\) and \(M_s\). We also compute the mean, median, and mode of the posterior distribution. For \(m = 3\) in Equation (2), we analyze different values of \(n_d\) and \(M_s\); \(m=3\) corresponds to the CPL parameterization together with the next-order term (Linder 2003a). We vary \(n_d\) in the range 100–800 and \(M_s\) in the range 1000–800000, and find the mean, median, and mode as well as the 1\(\sigma \) and 2\(\sigma \) ranges of \(\Omega _m\), \(h_0\), and \(\vec {\alpha }\).

In Figures 1 and 3, we show results for \(n_d = 600\), with the number of sample points fixed at \(M_s = 800000\). This particular choice of \(n_d\) and \(M_s\) gives us the closest approximation of the model parameters for the simulated Hubble parameter data. Around these values, we also find the smallest variation in the \(1 \sigma \) and \(2 \sigma \) ranges of the model parameters as the two quantities are varied. In particular, between \((n_d, M_s) = (600, 800000)\) and (1000, 50000), the differences in the \(1 \sigma \) and \(2 \sigma \) ranges are of the order of \({\mathcal {O}}(10^{-1})\) for \(\vec {\alpha }\) and \({\mathcal {O}}(10^{-2})\) or less for \(\Omega _m\) and \(h_0\). For the simulated Hubble parameter data set, the means of the posteriors of \(h_0\) and \(\Omega _m\) from the algorithm are \(h_0 = 0.68\) and \(\Omega _m = 0.34\), which are very close to the values assumed to produce the simulated data, \(h_0 = 0.685\) and \(\Omega _m = 0.3\). Table 1 shows the \(1\sigma \) and \(2\sigma \) ranges for the parameters, along with their best-fit values. The means of the posteriors of \(h_0\) and \(\Omega _m\) for the real Hubble parameter data are \(h_0 = 0.71\) and \(\Omega _m = 0.35\), respectively. From the mode of the posterior of the model likelihood in Figures 2 and 4, we can see the difference between the old cosmic chronometer data set (Simon et al. 2005; Moresco et al. 2012; Zhang et al. 2014; Moresco 2015; Ratsimbazafy et al. 2017) and the new cosmic chronometer data set (Moresco et al. 2020; Jimenez et al. 2023; Jiao et al. 2023). For the particular cosmological model of Equation (2), with the NUTS algorithm, the PCA reconstruction brings w(z) closer to \(w(z) = -1\) than the old cosmic chronometer data set does. We also present our results for the SNIa data set (Deng & Wei 2018; Riess et al. 2021; Scolnic et al. 2022; Uddin et al. 2023) as well as the BAO data set (Carvalho et al. 2016; de Carvalho et al. 2018; Alcaniz et al. 2017; Nunes et al. 2020). For the SNIa data set, we show our results for \(w_{\textrm{CDM}}\) and \(w_{\textrm{CPL}}\) in Figure 5, and for the model of Equation (2) with \(m=3\) in Figure 6; these results are obtained with \((n_d, M_s) = (100, 80000)\). Table 1 gives a comparison; this table can be extended to different data sets and models.

In Figure 7, we present results for the BAO data set, using the DE model of Equation (2) with \(m=3\). We see, from both Figure 7 and the combined joint analysis plot in Figure 8, that BAO gives tighter constraints than the Hubble parameter and SNIa data sets. For Figures 7 and 8, we use \((n_d, M_s)=(50, 10000)\), a choice dictated by the available computational power. From Figure 7, we can see that PCA \(+\) MCMC gives a very good reconstruction even with a small number of sample points \(M_s\). The 1\(\sigma \) range of \(r_\textrm{drag}\) from our method is [146.1, 148.2].

It is also evident from Figures 2–7 that \(w(z) = -1 \) is well within the \(1\sigma \) range of the w(z) parameters (\(\vec {\alpha }\)). The plots of w(z) and \(\rho (z)/\rho _0\) are similar for the real and simulated data sets: the differences in the w(z) and \(\rho (z) / \rho _0\) curves between the simulated and real Hubble parameter data are 0.445 and 0.026, respectively. Here, \(\rho (z)\) and \(\rho _0\) are the total energy density at redshift z and at present. The dark energy density plots, \(\rho ^\prime (z) / \rho ^\prime _0\) vs. z, for the simulated and real Hubble parameter data sets are also similar, and the maximum difference between them is 0.31.

We restrict ourselves to the \(m=3\) cut-off in Equation (2) for \((n_d, M_s)=(600, 800000)\), largely due to the computational power available to us at present. In Table 2 of the supplementary information, for \((n_d, M_s)=(100, 100)\), we use the algorithm up to \(m=10\). We find that, for the Hubble parameter data set, obtaining better constraints on the parameter space for \(m \ge 4\) requires \((n_d, M_s) \ge (600, 800000)\). In follow-up work, we will optimize the algorithm to constrain the parameter space for sufficiently large values of m and draw physical conclusions. The parameterizations of w(z) and \(({\rho ^{\prime }_{de}})/({\rho ^{\prime }_{0}})\) have the same physical implications, as shown in Figures 2 and 4. The dark energy density can be derived analytically for these parameterizations, and fixing the dark energy EoS parameter determines the evolution of the energy density as a function of time.

In the MCMC runs for both the real and simulated Hubble parameter data sets, with \((n_d, M_s) = (600, 800000)\), the value of the Gelman–Rubin convergence factor \(\hat{r_o}\) is 1. For the SNIa MCMC run, \((n_d, M_s) = (100, 80000)\) gives \(\hat{r_o} = 1\). To check the convergence, we not only check the \(\hat{r_o}\) factor and eliminate those iterations which do not satisfy the \(\hat{r_o} \approx 1\) criterion, but also inspect the trace plots, rank bar plots, and rank vertical line plots of the posterior sampling for visual confirmation (Gelman & Rubin 1992; Cowles & Carlin 1996; Brooks & Gelman 1998).

The error bars of the parameters in Table 1 are derived with the error function from PCA. Hence, the \(1\sigma \) and \(2\sigma \) ranges are affected by the error functions we introduce in the MLE. For \(w_{\textrm{CPL}}\), with the Pantheon data set, when we consider a half-normal probability distribution for the error part of the MLE, the range of \(h_0\) changes to [0.6005, 0.6584], which is almost three times smaller than the range obtained when the PCA error function is considered. The PCA error function is created solely from the data structure we provide in the first step of PCA. With the improvement of error bars in the original data points, the ranges of the parameters will shrink significantly.

Table 1 This table gives the 1\(\sigma \) and 2\(\sigma \) ranges for the parameters \(h_0\), \(\Omega _m\) and \(\vec {\alpha }\), for all the models and data types for which we run our complete analysis. For the cosmic chronometer data set, we use both the simulated and the real data sets and apply them to the \(w_{\textrm{model}}\). For the SNIa data, we do our analysis for \(w_{\textrm{CDM}}\), \(w_{\textrm{CPL}}\) as well as the dark energy model of Equation (2) with \(m=3\). Here, \(\vec {\alpha }\) are the parameters of the dark energy equation of state. The last column of the table corresponds to the best-fit values of the model parameters given the data set.

We also analyze the classical Metropolis–Hastings (MH) and Hamiltonian Monte Carlo (HMC) samplers. For the comparison with MH and HMC, we do the analysis with \((n_d, M_s) = (600, 800000)\) and \((n_d, M_s) = (100, 80000)\) for the Hubble parameter and Supernovae data, respectively. Our analysis shows that the NUTS and HMC samplers perform better than the MH sampler. For more than six continuous parameters and with the same CPU power, NUTS completes the analysis faster by a factor of 2.4. Details of the time taken by MH and NUTS are given in Table 2 of the supplementary information. The plots for the real Hubble parameter data set, obtained with MH, are shown in Figure 15 of the supplementary information. NUTS also outperforms HMC because, after picking the leapfrog steps, it stops automatically when the NUTS conditions are satisfied; it has been explicitly shown in Hoffman & Gelman (2011) that the NUTS algorithm is more efficient. Convergence plots for the NUTS, HMC, and MH samplers, in the case of the Hubble parameter data set, are shown in Figures 9–18 (see supplementary information).

4 A comparison with other methods

The reconstruction of H(z), \(\mu (z)\), and \(\theta _b(z)\) from the PCA algorithm, described in Section 2 and in Sharma et al. (2020), is qualitatively different from the other PCA techniques employed in the literature (Huterer & Starkman 2003; Huterer & Cooray 2005; Clarkson & Zunckel 2010; Ishida & de Souza 2011; Zheng & Li 2017). The starting assumption is that the functions h(z), \(\mu (z)\), and \(\theta _b(z)\) vary smoothly with redshift z. This is a reasonable choice, as indicated by the current data sets (Simon et al. 2005; Moresco et al. 2012, 2020; Moresco 2015; Carvalho et al. 2016, 2020; Alcaniz et al. 2017; Ratsimbazafy et al. 2017; Deng & Wei 2018; de Carvalho et al. 2018; Riess et al. 2021; Scolnic et al. 2022; Jiao et al. 2023; Jimenez et al. 2023; Uddin et al. 2023). Different variants of PCA techniques have been adopted in Liu et al. (2019) and Liu et al. (2016): Liu et al. (2019) use an error model before creating different sets of simulated Hubble data to construct the covariance matrix, while Liu et al. (2016) combine the weighted least squares method with PCA.

We apply MLE to the observed data sets using the functional forms reconstructed by PCA. A cosmological model enters only in the final part of our methodology, where we use the MLE technique. An MCMC chain with NUTS, run over the model-independent PCA reconstruction, gives unbiased constraints on the model parameters. Fisher matrix computation is one of the main routes to PCA reconstruction (Huterer & Starkman 2003; Huterer & Cooray 2005; Crittenden et al. 2009; Clarkson & Zunckel 2010; Ishida & de Souza 2011; Hojjati et al. 2012; Nair & Jhingan 2013; Nesseris & García-Bellido 2013; Zheng & Li 2017; Miranda & Dvorkin 2018; Hart & Chluba 2019, 2022a). Our methodology instead calculates the covariance matrix, which quantifies the correlations and uncertainties, directly from the PCA data matrix described in Section 2. Reduction of dimension, a distinctive feature of the PCA reconstruction, removes the noise from the PCA data matrix. Therefore, parameter constraints obtained by replacing the observational part with the PCA-reconstructed part are more reliable. An important feature of the (PCA \(+\) MCMC) methodology is that it also works for sparse data sets. In comparison to the classical techniques, it can be easily generalized to a higher-dimensional parameter space at little expense of computational time, as described in Section 2 and shown quantitatively in the Appendix (see supplementary information).

5 Conclusions

This paper combines principal component analysis reconstruction with the Markov Chain Monte Carlo tool to determine cosmological parameters. We assume a Taylor series expansion of the EoS parameter in terms of the scale factor as the parameterization of the dark energy EoS. When the PCA method is combined with the correlation coefficient calculation and the MCMC tool, we can choose the number of points in the observational part of the maximum likelihood calculation. We use the No-U-turn sampler for this analysis.

First, we test the method on simulated data and check whether the values assumed for the cosmological parameters are reconstructed effectively. We find that the predictions for the model parameters are consistent with the assumed values. The parameter estimation does not depend strongly on the prior probability assumption, and the idea can be generalized to other data sets and different sampling techniques. The relation between the Hubble parameter and the EoS of dark energy contains the first derivative of the Hubble parameter, which introduces an unwanted error in the EoS predictions. Similarly, for the SNIa and BAO data sets, the relations of the distance modulus and the angular scale with the EoS of dark energy contain first- and second-order derivatives.

The present method eliminates the error that arises from the first- and higher-order derivatives of the observable when inferring the value and ranges of the EoS of dark energy. In this work, we only use the error function that comes directly from the PCA algorithm; one can use different error functions in the error part of the MLE as well. It is clear from the results that, for the simple model of dark energy we consider, the allowed range of cosmological parameters is consistent with other analyses, and the cosmological constant model is well within the allowed range of models for both the Hubble parameter and distance modulus data sets. This analysis can be extended to other dark energy models. The advantage here is that the complete functional form of the observable of the data set is obtained; in the present work, the Hubble parameter and the distance modulus as functions of redshift are determined. The second step is deriving the dark energy EoS parameter. The method is suitable for different types of data and merits future analysis. It depends only on the model-independent reconstruction of the data set and its associated error; improvement of even a single data point increases the constraining power of the method. With the upcoming improved Hubble parameter, SNIa, and BAO data sets, the application of the method will lead to better constraints and a much easier distinction between different dark energy models. The method discussed here can also be used as a model selection tool for data sets with fewer data points.

6 Supplementary information

Appendices A and B and Figures 9–18 have been given as supplementary information. The online version contains supplementary material available at https://doi.org/10.1007/s12036-024-10009-9 and at https://www.ias.ac.in/listing/articles/joaa/.