1 Introduction

Addressing climate change is regarded as a defining challenge for our generation. However, there remains a lack of consensus regarding the nature, scope, and underlying causes of the problem, as well as the most effective strategies for mitigation (Ardia et al., 2023). Because of these divergent perspectives, certain clients, regulators, and investors prioritize sustainable solutions and investments that address environmental concerns, whereas others do not. In recent years, Climate Change Conferences (e.g., the 26th United Nations Climate Change Conference) have underscored the urgency and scale of the planet's environmental challenges. These summits have served as catalysts for worldwide policy choices and have had a substantial impact on many industries, including finance (Li et al., 2023). The shift towards environmentally sustainable investments, aligned with the demands of socially responsible investors, has emerged as a central theme in capital markets and investment strategies. According to Bouri et al. (2023), central banks and financial institutions are keenly focused on incorporating climate-related and ESG considerations into their policies. Concurrently, investors are expressing their intention to manage climate risks adeptly and to devise strategies that curtail the carbon emissions of their investment portfolios.

In their recent publication, Pástor et al. (2021) present a theoretical model designed to capture the impact of shifts in eco-friendly preferences on asset valuation. In the context of global warming, their model suggests that green stocks tend to unexpectedly outperform brown stocks when concerns about environmental change become more pressing. Building on this theoretical framework, Ardia et al. (2023) constructed a daily Media Climate Change Concerns (MCCC) index using news from major U.S. newspapers and newswires. Their indicator shows that green companies' share prices rise on days when new climate change risks are announced, whereas the prices of brown firms decrease; these effects apply to both transition and physical climate change risks. The index further reveals that a surprising increase in climate change-related apprehension is correlated with a reduction in the discount rate for brown companies. The stock market therefore reacts immediately to news, indicating an increase in consumer and investor preferences for green enterprises when concerns about climate change rise unexpectedly. Given the worldwide concern about climate change, financial markets are also demonstrating increased awareness of and responsiveness to environmental matters. Within this complex interconnection, ESG indicators have become more prominent, functioning as symbolic measures of corporations' dedication to sustainable practices. While significant attention has been devoted to the broader implications of climate change, the integration of specific climate change risks, transition and physical risks in particular, into stock market predictive models remains insufficiently explored. This study fills this gap by examining the relationship between climate-related risks and predictive models in the stock market, emphasizing the role of ESG factors. We provide an in-depth assessment of the impact of the transition risk (TRI) and physical risk (PRI) indicators on the prediction of ESG prices. Our analysis addresses the following question: do TRI and PRI offer valuable information for predicting ESG prices?

Considering the above debate, the goal of this investigation is to evaluate whether climate-related risks, both physical and transitional, have the potential to predict ESG prices. This study offers a new mechanism to explain the financial effects of climate change. Numerous studies have substantiated the fundamental idea that the media serves as a potent instrument for enhancing public awareness of environmental issues (Baker et al., 2016). To our knowledge, this is the first study to use the TRI and PRI proposed by Bua et al. (2024) to forecast ESG stock prices. Our research contributes to the existing literature in three ways. First, it extends earlier studies that generally assess the influence of a single metric of environmental concerns on the returns of green and brown assets (Gong et al., 2023; Guo et al., 2023). Second, we employ an alternative set of climate risk indices, the MCCC index proposed by Ardia et al. (2023), which encompasses both the transition and physical aspects of climate risk, enabling the identification of potential variations in their effects within our methodological framework. Third, we study a set of candidate machine-learning models that meet the aforementioned empirical criteria: linear regression (LR), LASSO regression, Elastic Net regression, Bayesian Ridge regression, XGBoost regression, LightGBM regression, and Extra Trees regression.

The most significant conclusion drawn from our empirical research is that machine learning, taken as a whole, possesses the potential to advance our understanding of asset pricing. Further, we contribute to the economic literature on climate change by providing empirical evidence that explainable artificial intelligence (XAI) is effective for price prediction (Jabeur et al., 2021; Stef et al., 2023). When statistical models are compared with artificial intelligence (AI) models, AI models exhibit superior performance in identifying nonlinear, multi-dimensional correlations and patterns between predictors and predictands. They accomplish this without requiring predetermined assumptions about the statistical distribution of residuals, the functional form of the equation, or the absence of collinearity among predictors. In particular, tree-based AI models are interpretable and demonstrate superior prediction accuracy compared with statistical models (Chakraborty et al., 2021; Goodell et al., 2023). XAI techniques aim to address challenges stemming from the inherent opacity of black-box models, including mistrust, lack of accountability, and susceptibility to errors, while maintaining a commendable level of predictive accuracy (Bauer et al., 2023). This investigation reveals a significant relationship between climate change risks, particularly transition and physical risks, and the precision of stock market predictive models focused on ESG metrics. Analyzing data from January 2006 to July 2022, we find that models integrating the climate risk variables (PRI and TRI) show enhanced accuracy in predicting ESG stock market prices. Furthermore, the application of explainable artificial intelligence, specifically through SHapley Additive exPlanations (SHAP) plots, illuminates the pronounced influence of variables such as inflation, recession, and concerns about pollution levels on the S&P 500 ESG index.

The remainder of this paper is organized as follows. Section 2 presents the literature review, highlighting prior research on climate change risks, ESG metrics, and stock market predictions. Section 3 delves into the data and variables, providing insight into the daily time-series data and the key variables incorporated. Section 4 describes the methodology and details the machine learning models and techniques employed. Section 5 presents the results and discussion. Finally, Section 6 concludes the paper and suggests avenues for future research.

2 Literature review

Extensive studies have investigated the elements that influence stock market value. Classical determinants include interest rates (Blanchard, 1981; Sweeney & Warga, 1986), inflation expectations (Firth, 1979; Pindyck, 1983), corporate earnings (Zarowin, 1989; Pevzner et al., 2015), and economic growth (Levine & Zervos, 1998; Masoud, 2013). Beyond these, public attention to pollution and global warming is an important area of study in the ecological field, as pointed out by El Ouadghiri et al. (2021). Public attention to climate change and pollution may influence the returns on sustainability indices (Li et al., 2022; Ren & Ren, 2024; Liu et al., 2024).

The growing prevalence of natural catastrophes attributed to climate change, along with a heightened focus on climate-related concerns, has prompted a surge in scholarly investigations examining the influence of global warming risks on financial markets (Stroebel, 2021; Hong et al., 2019). The theoretical models of mass media communication by Ball-Rokeach and DeFleur (1976) and Scruggs and Benegal (2012) support the idea that communication transmitted by the media is an efficient mechanism for enhancing public awareness of environmental concerns. Ardia et al. (2023) built a climate change index based on major U.S. newspapers and newswires from 2003 to 2018, showing that the index effectively captures several significant climate change events that would be expected to heighten apprehension about climate change. Engle et al. (2020) developed a pair of complementary indices to assess the level of discourse on climate change in news media. Their findings indicate that hedge portfolios constructed using Sustainalytics exhibit superior in-sample fit, out-of-sample performance, and cross-validation performance.

Faccini et al. (2023) performed a textual and narrative examination of Reuters news on climate change from January 1, 2000, to December 31, 2018. The authors reveal that only the risks stemming from the U.S. climate policy discussion are priced, and that this pricing phenomenon is a recent development, primarily driven by the period from 2012 to 2018. Several studies and models have been developed to predict stock prices, yielding a wide array of methodologies (Gu et al., 2020; Leippold et al., 2022; Akey et al., 2022). For example, Giglio and Xiu (2021) provide a three-pass approach to estimate the risk premium associated with an observable factor, using a principal component analysis of the returns of test assets to identify the underlying factor space. Hao et al. (2020) introduced a new hybrid model utilizing feature selection and a multi-objective optimization algorithm to predict carbon prices. Guo et al. (2023) investigated the impact of two types of climate risk on the price volatility of natural gas; in both in-sample and out-of-sample settings, their empirical findings demonstrate a strong predictive association between natural disaster frequency and natural gas price volatility. Yu et al. (2023) used a decomposed GARCH-MIDAS model to investigate the predictive power of the air quality index for stock market price variability, showing that the air quality index offers useful prediction performance for stock market volatility. Gong et al. (2023) examined the potential impact of climate change attention on future carbon return forecasts; their empirical results show that incorporating climate attention into prediction models enhances performance relative to comparable benchmark models, indicating that climate attention offers valuable predictive insights into future carbon returns. Salisu et al. (2023) assess the predictive capacity of temperature growth uncertainty for stock market tail risk in the United States, revealing that incorporating this uncertainty is beneficial for forecasters who incur higher losses from underestimating tail risk than from overestimating it.

Unfortunately, little research has focused on how climate change risks affect ESG stock pricing. Wang and Li (2023) used a GARCH-MIDAS model to analyze the prediction performance of different climate change indices for CSI 300 ESG and SSEC index volatility. Umar et al. (2022) investigate the nexus between ESG ratings and the accuracy of target prices provided by sell-side analysts, revealing a favorable correlation between the ESG score and target price accuracy. Given the limited existing literature on this topic, our study contributes by introducing a novel framework designed to predict ESG stock prices in the context of climate change risk. This framework not only enhances our understanding of the complex interplay between ESG stock performance and environmental risks but also offers valuable insights for investors and policymakers navigating the challenges posed by a changing climate.

3 Data description

We used daily time-series data from January 2006 to July 2022; the selection of this period was constrained by the availability of climate change index data. The primary variable under investigation is the S&P 500 ESG price, which measures the performance of securities that meet specific sustainability criteria while maintaining sector weights similar to those of the standard S&P 500 index. We sourced these data from the Bloomberg database. Bustos and Pomares-Quimbaya (2020) argued that analyzing price changes over time is the most crucial factor; in this study, we analyzed the market using closing prices (Khalfaoui et al., 2022). Based on the existing literature, nine distinct factors have been identified as potential predictors and incorporated into the analysis to enhance the precision of the predictive models (Campbell & Vuolteenaho, 2004; Bakas & Triantafyllou, 2018): pollution (POL), inflation (INF), recession (REC), geopolitical risk (GPR), economic policy uncertainty (EPU), the news-based sentiment index (NSI), the physical risk index (PRI), the transition risk index (TRI), and media climate change concerns (MCCC). Table 1 provides the definitions and sources of the variables in our dataset. Concurrently, Fig. 1 displays the time series plots, Fig. 2 offers a visual representation of the correlation matrix, and Table 2 presents the descriptive statistics.

Before training the models, we analyzed the correlation between every pair of selected features to remove any redundant features when building the predictive ML models, since superfluous features may introduce additional noise. We therefore computed a correlation matrix using Spearman's rank correlation coefficients to reduce the number of predictor features and to test their suitability for the prediction task (Kathuria et al., 2016; Chatterjee et al., 2000). This statistic lies between -1 and 1 and measures the monotonic association between two variables: a value close to zero indicates the absence of an association between the input and output features, while a larger absolute value indicates a stronger association.
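As a minimal illustration of this screening step, the following Python sketch computes a Spearman matrix on synthetic stand-in data; the column names mirror Table 1 but the values are simulated, so the figures it prints are not those of Fig. 2.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset of Table 1; column names are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 10)),
                  columns=["SPESG500", "INF", "REC", "POL", "GPR",
                           "EPU", "NSI", "PRI", "TRI", "MCCC"])

# Spearman rank correlation matrix used to screen redundant predictors
corr = df.corr(method="spearman")
print(corr.round(2))

# Flag highly collinear predictor pairs (|rho| > 0.9) as removal candidates
redundant = [(a, b) for a in corr.columns for b in corr.columns
             if a < b and abs(corr.loc[a, b]) > 0.9]
print(redundant)
```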

The correlation matrix between the S&P 500 ESG index and the other features is shown in Fig. 2. The strongest correlations are observed for inflation and media climate change concerns: the S&P 500 ESG index correlates positively with inflation (0.45) and with media climate change concerns (0.41), whereas the transition risk index exhibits a negative coefficient of \(-0.13\). This finding suggests that these factors have a substantial influence on the S&P 500 ESG market and sustainability-focused securities, and that retaining these features avoids redundancy. However, the correlation matrix does not provide an exhaustive understanding of the linkages among the variables. Hence, it is necessary to combine optimally trained ML models with an ML explainer based on SHapley Additive exPlanations (SHAP) to explore the connections between the descriptors in a predictive model.

Table 1 Variables and measurement
Table 2 Descriptive statistics
Fig. 1 Time series plots

Fig. 2 Spearman correlation coefficient matrix for all predictive features

4 Methodology

4.1 Quantification of asymmetric dependence

In recent years, there has been significant debate regarding potential biases in symmetric dependency measures, which are commonly employed to discern the structure of intricate systems and to infer causation in bivariate interactions (Griessenberger et al., 2022). To overcome these limitations, we adopt the copula-based quantification of asymmetric dependence (qad) introduced by Junker et al. (2021). The qad method quantifies the extent to which the dependency structure between variables X and Y deviates from independence and, unlike many other methodologies, is able to detect total dependence. It yields two directed values, q(X, Y) and q(Y, X), each lying in the range [0, 1]: q(X, Y) quantifies the influence of X on Y, and q(Y, X) the influence of Y on X.

For a two-dimensional sample of size n, denoted \((x_1, y_1), \ldots , (x_n, y_n)\), from the random vector (X, Y), the qad method involves calculating the empirical copula \((E_n)\). This process includes comparing the conditional distribution functions of the checkerboard copula within the unit interval to the distribution function of the uniform distribution. Consider X and Y as time series variables characterized by the joint distribution function H and copula A. The dependence measure is expressed as follows:

$$\begin{aligned} d(X, Y) = 3 \int _{[0,1] \times [0,1]} |K_A(x, [0, y]) - K_B(x, [0, y])|d\mu (x)d\mu (y) \end{aligned}$$

where \(K_A\) and \(K_B\) denote the Markov kernels of the bivariate copulas A and B (here B is the independence copula, whose conditional distributions are uniform), and \(\mu \) denotes the Lebesgue measure on [0, 1]. The two directed qad values \(q(X,Y), q(Y,X) \in [0,1]\) can be computed using the functions of the R package qad; they quantify the influence of X on Y and of Y on X, respectively. High values indicate strong relationships between the variables, whereas low values suggest weak dependence.

The qad approach is applicable to a wide range of situations, including linear, non-linear, and non-monotonic contexts. It copes well with noisy data and small sample sizes, detects asymmetry in dependence, and exhibits high power in tests for independence. Importantly, qad does not require any assumptions about the underlying distribution of the data and effectively quantifies the information gain, or predictability, of quantity Y given knowledge of quantity X and vice versa (in general, \(q(X,Y) \ne q(Y,X)\)).
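For intuition, the sketch below approximates the directed dependence in Python by rank-transforming both series, slicing the x-axis into vertical strips, and measuring how far the conditional distribution of the y-ranks in each strip deviates from uniformity. This is only a didactic approximation under simplifying assumptions (fixed equal-width strips, equal strip weights); the exact checkerboard estimator is the one implemented in the R package qad.

```python
import numpy as np

def directed_dependence(x, y, n_bins=10, n_grid=50):
    """Rough checkerboard-style estimate of the directed dependence q(X, Y):
    compares the conditional distribution of the y-ranks within vertical
    strips of the x-ranks against the uniform distribution (independence)."""
    n = len(x)
    u = np.argsort(np.argsort(x)) / (n - 1)   # pseudo-observations of X
    v = np.argsort(np.argsort(y)) / (n - 1)   # pseudo-observations of Y
    grid = np.linspace(0.0, 1.0, n_grid)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        mask = (u >= lo) & (u < hi) if b < n_bins - 1 else (u >= lo)
        if not mask.any():
            continue
        # empirical conditional cdf of V given U in this strip, on the grid
        cond_cdf = (v[mask][None, :] <= grid[:, None]).mean(axis=1)
        # distance from independence, whose conditional cdf equals the grid
        total += np.mean(np.abs(cond_cdf - grid)) / n_bins
    return 3.0 * total  # the factor 3 rescales full dependence to roughly 1

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x**2 + 0.1 * rng.normal(size=2000)  # Y is almost a function of X
print(directed_dependence(x, y))        # high: X is highly informative about Y
print(directed_dependence(y, x))        # typically lower: the sign of X is lost
```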

Junker et al. (2021) pointed out that despite the widespread occurrence of asymmetry in bivariate distributions, most standard methods used across various scientific fields to quantify statistical dependence between random variables X and Y largely ignore this critical aspect. Traditional dependence measures, such as Pearson’s r, Spearman’s rank correlation \(\rho \), and Kendall’s \(\tau \), are only applicable in specific situations (linearity or monotonicity/concordance) and are inherently symmetric.

4.2 Linear regression

Linear Regression (LR) is widely used to statistically analyze dependency relationships between variables. The LR method fits a one-dimensional linear equation to the observed values without any hyperparameters (Nystrom et al., 2019). The linear regression model minimizes the sum of the squares of the residuals between the observed targets and the values predicted by the linear approximation. The general form of the LR model is as follows:

$$\begin{aligned} SPESG500= & \beta _0 + \beta _1 \text {INF} + \beta _2 \text {REC} + \beta _3 \text {POL} + \beta _4 \text {PRI} \\ & + \beta _5 \text {TRI} + \beta _6 \text {GPR} + \beta _7 \text {NSI} + \beta _8 \text {EPU} + \epsilon \end{aligned}$$

where \(\epsilon \) represents the error term, assumed to be normally distributed with a mean of zero. This equation models SPESG500 as the dependent variable, predicted through a linear combination of the independent variables \(\text {INF}, \text {REC}, \text {POL}, \text {PRI}, \text {TRI}, \text {GPR}, \text {NSI}\), and \(\text {EPU}\), each multiplied by their respective coefficients \(\beta _1, \beta _2, \ldots , \beta _8\).
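As a minimal sketch of how such a specification can be fitted, assuming a scikit-learn environment and synthetic stand-in data (the actual study data come from Bloomberg and the sources in Table 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the eight predictors of the equation above;
# column order INF, REC, POL, PRI, TRI, GPR, NSI, EPU is illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X @ np.arange(1.0, 9.0) + 0.5 * rng.normal(size=300)

lr = LinearRegression().fit(X, y)   # ordinary least squares
print(lr.intercept_)                # beta_0
print(lr.coef_)                     # beta_1 ... beta_8
```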

4.3 Lasso regression

The least absolute shrinkage and selection operator (LASSO) regression, first proposed by Tibshirani (1996), is a shrinkage technique extensively used to mitigate the complexity of linear models. The core idea of LASSO regularization is to shrink irrelevant coefficients by setting them exactly to zero. Unlike the ridge regression technique, the LASSO selects variables by minimizing the residual sum of squares subject to the sum of the absolute coefficient values being smaller than a fixed value (Lloyd-Jones et al., 2018; Geraldo-Campos et al., 2022). In situations where the number of predictors exceeds the number of observations \((p > n)\), LASSO restricts the number of factors selected in the model. The magnitude of the penalty is controlled through a hyperparameter: the larger the penalty value, the closer the parameter estimates shrink toward zero. The LASSO penalty therefore mitigates complexity, leading to higher prediction accuracy and more easily interpretable coefficients. In the LASSO model, the loss function can be expressed as:

$$\begin{aligned} \hat{\beta }_{lasso}=\underset{\beta \in {\mathbb {R}}^{d}}{argmin} \left\| Y-X\beta \right\| ^{2} \quad \text {subject to} \quad \sum _{j=1}^{d} \left| \beta _{j}\right| \le t,\; t\ge 0 \end{aligned}$$

where \(Y=(y_{1},y_{2},\dots ,y_{N})^{T}\) is the dependent variable and \(X=(x_{1},x_{2},\dots ,x_{d})^{T}\) collects the independent variables. The bound t (equivalently, the Lagrangian penalty parameter \(\lambda \)) acts as a nonnegative regularization parameter. \(\hat{\beta }\) is the shrunken estimate of the regression coefficients in the LASSO, with \(\left\| \beta \right\| _{1}=\sum _{j=1}^{d}\left| \beta _{j}\right| \) being the L1-norm penalty on \(\beta \).
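A brief sketch of the shrinkage behavior, assuming scikit-learn and synthetic data in which only three of eight predictors are informative (an assumption made for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 8 features carry signal; the rest have true coefficients of zero
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)

# alpha plays the role of the penalty parameter: with a sufficiently large
# value, the coefficients of irrelevant predictors are driven exactly to zero
lasso = Lasso(alpha=5.0).fit(X, y)
print(lasso.coef_)
print((lasso.coef_ == 0).sum(), "coefficients set exactly to zero")
```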

4.4 Elastic net regression

The Elastic Net (EN) regression algorithm proposed by Zou and Hastie (2005) is a mixture of the ridge and LASSO regression procedures, using both the L1-norm penalty of LASSO and the L2-norm penalty of ridge regularization during model training. The EN is appropriate for highly correlated variables; it permits the extraction of useful information from large datasets without limiting the number of variables selected (Wang et al., 2020). In contrast to the pure ridge and LASSO algorithms, it can also reduce bias (Hastie & Qian, 2014) and multicollinearity, and withstand the double shrinkage problem. The coefficient \(\beta \) under the elastic penalty is defined as:

$$\begin{aligned} \hat{\beta }_{elastic}=\underset{\beta _{0}\in {\mathbb {R}},\, B \in {\mathbb {R}}^{K}}{\min } \left( \frac{1}{2N}\sum _{i=1}^{N}\left( y_{i}-\beta _{0}-X^{T}_{i}B\right) ^{2} +\lambda \left( \frac{1-\alpha }{2}\left\| B\right\| ^{2}_{\ell _{2}}+\alpha \left\| B\right\| _{\ell _{1}}\right) \right) \end{aligned}$$

where \(\lambda \) is the regularization parameter and \(\alpha \in [0,1]\) is the mixing weight between the two penalties. \(\beta _0 \in {\mathbb {R}}\) represents the intercept term of the regression model, a scalar equal to the expected mean value of the dependent variable when all predictor variables are set to zero. \(B \in {\mathbb {R}}^{K}\) is the vector of regression coefficients \(B = [\beta _1, \dots , \beta _K]\) for the predictor variables X, excluding the intercept; each coefficient \(\beta _i\) quantifies the effect of the corresponding predictor on the dependent variable. The regularization term has two components: the ridge penalty \(\left\| B\right\| ^{2}_{\ell _2}\) and the so-called lasso penalty \(\left\| B\right\| _{\ell _1}\). Both shrink the coefficients toward zero in the presence of highly correlated variables.
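A minimal sketch, assuming scikit-learn, where l1_ratio corresponds to the mixing weight \(\alpha \) in the penalty above (1.0 recovers the LASSO, 0.0 the ridge):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)

# alpha scales the whole penalty (lambda above); l1_ratio mixes L1 and L2
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```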

4.5 Bayesian ridge regression

Ridge regression, originated by Hoerl and Kennard (1970) for high-dimensional data, can be regarded as a special case of the Elastic Net. The model was introduced to address multicollinearity and instability problems. The objective function of ridge regression is expressed as follows:

$$\begin{aligned} \hat{\beta }_{ridge}=\underset{\beta _{1},\dots ,\beta _{K}}{argmin} \left( \frac{1}{T}\sum _{t=1}^{T}\left( \left( y_{t+h}-y_{t}\right) -\sum _{j=1}^{K}\beta _{j}x^{\prime }_{j,t}\right) ^{2}+\lambda \sum _{j=1}^{K}\beta _{j}^{2}\right) \end{aligned}$$

where \(\beta \) is the \(K\times 1\) vector of regression coefficients, \((y_{t+h}-y_{t})\) is the dependent variable, \(\left\{ x_{1,t},\dots ,x_{K,t}\right\} \) is the \(K \times 1\) vector of regressors, and \(\lambda \) is the shrinkage parameter controlling the magnitude of the penalty that trades off bias and variance. Bayesian Ridge Regression (BRR) embeds this shrinkage in Bayesian inference: a prior probability is assigned to the coefficients, and their posterior probability is computed from the data and the prior (Assaf et al., 2019). The log-posterior to be maximized in BRR can be written as:

$$\begin{aligned} \ln p(\omega \mid t)=-\frac{\beta }{2}\sum _{n=1}^{N} \left\{ t_{n}-y(x_{n},\omega )\right\} ^{2}-\frac{\alpha }{2}\omega ^{T}\omega +const \end{aligned}$$

where \(\omega \) follows a Gaussian prior that is conjugate to the likelihood, \(\beta \) is the precision of the observation noise, \(\alpha \) is the precision of the Gaussian prior over \(\omega \), t is the target vector of the data sample, and const is a quantity independent of the parameter \(\omega \).

Writing the model through a set of basis functions, the log of the posterior distribution, given by the sum of the log-likelihood and the log of the prior, can be written as:

$$\begin{aligned} \ln p(\omega \mid t )=\underbrace{-\frac{\beta }{2}\sum _{n=1}^{N} \left\{ t_{n}-\omega ^{T}\phi (x_{n})\right\} ^{2}}_{\text {likelihood}} -\underbrace{\frac{\alpha }{2}\omega ^{T}\omega }_{\text {prior}}+const \end{aligned}$$

where \(\phi (x)\) represents a set of fixed basis functions and \(t = (t_{1}, t_{2}, \dots ,t_{N})\) is the target vector of the data samples; const is a constant independent of \(\omega \). To alleviate the effect of ill-conditioned data, the maximum of the posterior distribution is sought, which amounts to adding an \(\ell _{2}\)-norm penalty through Bayesian inference. Finally, the hyperparameters \(\alpha \) and \(\beta \) are obtained by maximizing the marginal likelihood.
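The following sketch shows the corresponding estimation in scikit-learn, whose BayesianRidge maximizes the marginal likelihood to set the two precision hyperparameters (note that scikit-learn names the noise precision alpha_ and the weight-prior precision lambda_, the reverse of the \(\alpha \)/\(\beta \) convention above):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

brr = BayesianRidge().fit(X, y)
print(brr.alpha_, brr.lambda_)                    # precisions estimated from the data
mean, std = brr.predict(X[:5], return_std=True)   # posterior predictive mean and std
print(mean, std)
```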

4.6 XGBoost regression

We denote a dataset with n instances and m features as \(D =\left\{ (x_{i},y_{i})\right\} \) \((\left| D\right| =n;\; x_{i} \in \Re ^{m},\; y_{i} \in \Re )\). Chen and Guestrin (2016) show that the target value \(\hat{y}_{i}\) predicted by XGBoost can be expressed as:

$$\begin{aligned} \hat{y}_{i}=\sum _{k=1}^{K}f_{k}(\Theta _{i}), f_{k} \in {\mathcal {F}} \end{aligned}$$

where \(\hat{y}_{i}\) is the predicted value of the i-th sample; \({\mathcal {F}}\) denotes the space of all regression trees; K is the total number of trees; \(\Theta _{i}\) refers to the feature vector of the \(i^{th}\) sample; and \(f_{k}(\cdot )\) maps a sample's features to the leaf weight of the \(k^{th}\) tree. To prevent overfitting, XGBoost optimizes an objective function with two parts: the loss function of the gradient boosting algorithm and a regularization term that penalizes tree complexity. For an optimal regression tree structure, the objective is defined as follows:

$$\begin{aligned} T_{obj}= & \sum _{i=1}^{n}l(y_{i},\hat{y}_{i})+\sum _{k=1}^{K}\Omega (f_{k})\\ \Omega (f_{k})= & \gamma T+ \frac{1}{2} \lambda \left\| W\right\| ^{2} \end{aligned}$$

where n is the number of training samples; \(l(y_{i}, \hat{y}_{i})\) is the loss function that measures the difference between the predicted value \(\hat{y}_{i}\) and the actual value \(y_{i}\), describing how well the model fits the training data; \(\Omega (f_{k})\) is the regularization term, used to penalize the complexity of the regression trees; T denotes the number of leaf nodes of a tree; \(\lambda \) and \(\gamma \) are penalty coefficients that control the complexity of the tree; and W represents the vector of leaf weight scores. The goal is to minimize the regularized objective \(T_{obj}\); because the trees are not amenable to traditional optimization methods, XGBoost learns the model additively, adding one function in each iteration.

The objective function at the \(t^{th}\) iteration is expressed as:

$$\begin{aligned} T_{obj}=\sum _{i=1}^{n}l(y_{i},\hat{y}_{i}^{t-1}+f_{t}(\Theta _{i}))+\Omega (f_{t})+C_{0} \end{aligned}$$

where \(C_{0}\) denotes the constant term. Applying a second-order Taylor expansion to the above equation, \(T_{obj}^{t}\) can be approximated by:

$$\begin{aligned} T_{obj}^{t}\approx \sum _{i=1}^{n}\left[ l(y_{i},\hat{y}_{i}^{t-1}) +g_{i}f_{t}(\Theta _{i})+\frac{1}{2}h_{i}f_{t}^{2}(\Theta _{i})\right] +\Omega (f_{t})+C_{0} \end{aligned}$$

where \(g_{i}= \partial {l(y_{i},\hat{y}_{i}^{t-1})/\partial {\hat{y}^{t-1}}}\) and \(h_{i}= \partial ^{2}{l(y_{i},\hat{y}_{i}^{t-1})/\partial {(\hat{y}^{t-1}_{i})}^2}\) denote the first and second-order partial derivatives of the loss function. In general, after discarding the constant term, a new objective function \(T_{obj}^{t}\) is given by:

$$\begin{aligned} T_{obj}^{t}=\sum _{i=1}^{n}[g_{i}f_{t}(\Theta _{i})+\frac{1}{2}h_{i}f_{t}^{2}(\Theta _{i})]+\Omega (f_{t}) \end{aligned}$$

By transforming the sample set into a set of leaf nodes, the optimal model parameters and their corresponding predicted value can be determined by the optimization of the objective function \(T_{obj}^{t}\).
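A minimal sketch of fitting such a model with the xgboost library, where reg_lambda and gamma correspond to the \(\lambda \) and \(\gamma \) terms of \(\Omega (f_{k})\); the hyperparameter values are illustrative, not the tuned settings of this study:

```python
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# reg_lambda penalizes leaf weights and gamma penalizes leaf count, as in Omega(f_k)
model = XGBRegressor(n_estimators=300, learning_rate=0.05,
                     max_depth=4, reg_lambda=1.0, gamma=0.1)
model.fit(X, y)
print(model.predict(X[:5]))
```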

4.7 LightGBM regression

LightGBM (Ke et al., 2017) is an efficient gradient-boosting framework. At iteration t, its loss function over the leaf nodes is expressed as:

$$\begin{aligned} L_{t}=\sum _{j=1}^{J}\left[ G_{tj}w_{tj}+\frac{1}{2}(H_{tj}+\lambda )w_{tj}^{2}\right] +\sigma J \end{aligned}$$

where J is the number of leaf nodes; \(G_{tj}\) and \(H_{tj}\) are the sums of the first and second derivatives of the loss over the samples falling in leaf j at iteration t; \(w_{tj}\) is the corresponding leaf weight, set to its optimal value for each decision tree's \(j^{th}\) leaf node; and \(\sigma \) and \(\lambda \) are user-defined regularization parameters.

The information gain associated with splitting a leaf node is written as:

$$\begin{aligned} Gain =\frac{1}{2}\left[ \frac{G_{L}^{2}}{H_{L}+\lambda }+\frac{G_{R}^{2}}{H_{R}+\lambda } -\frac{(G_{L}+G_{R})^{2}}{H_{L}+H_{R}+\lambda }\right] -\delta \end{aligned}$$

A conventional way to evaluate the information gain is to scan all split points for each feature; however, this is computationally expensive and memory-intensive. LightGBM instead uses Gradient-based One-Side Sampling (GOSS). Data instances are ranked in descending order by the absolute value of their gradients; the top \(a \times 100 \% \) of instances form a subset A, and a further subset B of size \(b \times \left| A^{c}\right| \) is drawn at random from the complement \(A^{c}\). Closely following Ke et al. (2017), the variance gain of feature j at split point d, denoted \({\hat{\nu }}_{j}(d)\), is calculated as:

$$\begin{aligned} {\hat{\nu }}_{j}(d)=\frac{1}{n}\left( \frac{\left( \sum _{x_{i}\in A_{l}}g_{i} +\frac{1-a}{b}\sum _{x_{i}\in B_{l}}g_{i}\right) ^{2}}{n^{j}_{l}(d)} +\frac{\left( \sum _{x_{i}\in A_{r}}g_{i}+\frac{1-a}{b}\sum _{x_{i}\in B_{r}}g_{i}\right) ^2}{n^{j}_{r}(d)}\right) \end{aligned}$$

with \(n_{l}^{j}(d)\) and \(n_{r}^{j}(d)\) denoting the numbers of instances falling to the left and right of split point d, respectively; \((1-a)/b\) is the coefficient that normalizes the gradients of the randomly sampled subset B; and \(A_{l}\), \(A_{r}\), \(B_{l}\), and \(B_{r}\) are the left and right partitions of the subsets A and B at the split point. More detailed descriptions of the feature bundling technique can be found in Fan et al. (2019).
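A short sketch with the lightgbm library; reg_lambda maps to \(\lambda \) and min_split_gain roughly corresponds to the gain threshold \(\delta \) above, while the GOSS sampling rates are governed by the library's boosting options (values illustrative):

```python
from sklearn.datasets import make_regression
from lightgbm import LGBMRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Leaf-wise boosting with the regularization terms of the loss above
model = LGBMRegressor(n_estimators=300, num_leaves=31,
                      learning_rate=0.05, reg_lambda=1.0, min_split_gain=0.0)
model.fit(X, y)
print(model.predict(X[:5]))
```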

4.8 Extra trees regression

Extremely randomized trees (ETR), also known as extra trees, are a typical class of ensemble learning methods that can handle both classification and regression tasks (Geurts et al., 2006; Song & Ying, 2015). To reduce variance and increase the accuracy of the model, the ETR algorithm selects the split thresholds of each node at random rather than searching for the locally optimal cut point.

The basic mathematical principles of the Extra-Trees (ET) algorithm are as follows. Let the observations of the predictors be \( X = X_{1}, X_{2}, \dots , X_{n}\) and, assuming a continuous target variable, denote the corresponding responses by \(Y = Y_{1}, Y_{2}, \dots , Y_{n}\). At a node m, a candidate split \(\alpha = (f, t_{m})\) consists of a feature f and a randomly drawn threshold \(t_{m}\), partitioning the node sample P into:

$$\begin{aligned} P_{l}(\alpha )= & \left\{ (x,y)\,\big |\,x_{f} \le t_{m}\right\} \\ P_{r}(\alpha )= & \left\{ (x,y)\,\big |\,x_{f} > t_{m}\right\} \end{aligned}$$

Without loss of generality, the fraction of the node sample assigned to each child is:

$$\begin{aligned} p_{l}(\alpha )=\frac{\left| P_{l}(\alpha )\right| }{\left| P\right| },\qquad p_{r}(\alpha )=\frac{\left| P_{r}(\alpha )\right| }{\left| P\right| } \end{aligned}$$

With these quantities, candidate splits can be scored with standard regression criteria, such as the squared-error impurity, the absolute-error impurity, or a relative variance-reduction (\(R^2\)-type) score:

$$\begin{aligned} S(X)_{m}= & \frac{1}{N}\sum _{i=1}^{N}(Y_{i}-{\overline{Y}}_{m})^{2}\\ S(X)_{m}= & \frac{1}{N}\sum _{i=1}^{N}\left| Y_{i}-{\overline{Y}}_{m}\right| \\ S(X)_{m}= & 1-\frac{\frac{1}{N}\sum _{i=1}^{N}(Y_{i}-{\hat{Y}}_{m,i})^{2}}{\frac{1}{N}\sum _{i=1}^{N}(Y_{i}-{\overline{Y}}_{m})^{2}} \end{aligned}$$
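A minimal scikit-learn sketch; bootstrap=False reflects that extra trees grow each tree on the full learning sample, and the squared-error criterion corresponds to the first score above:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Thresholds are drawn at random per candidate feature; no bootstrap resampling
etr = ExtraTreesRegressor(n_estimators=300, criterion="squared_error",
                          bootstrap=False, random_state=0)
etr.fit(X, y)
print(etr.predict(X[:5]))
```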

4.9 Performance criteria metrics

To quantitatively measure the forecast accuracy of the established machine learning models, several commonly used performance criteria were employed in this paper: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (\(R^2\)). In the formulas below, \(y_{i}\), \(\hat{y}_{i}\), and \(\overline{y}\) are the actual value, the predicted value, and the mean of the actual values, respectively, and n is the number of observations. Error metrics close to zero (and an \(R^2\) close to one) indicate high predictive ability.

$$\begin{aligned} MAE= & \frac{1}{n}\sum _{i=1}^{n}\left| \hat{y}_{i}-y_{i}\right| \\ MSE= & \frac{1}{n}\sum _{i=1}^{n}(\hat{y}_{i}-y_{i})^2\\ RMSE= & \sqrt{\frac{1}{n}\sum _{i=1}^{n}(\hat{y}_{i}-y_{i})^2}\\ MAPE= & \frac{1}{n}\sum _{i=1}^{n}\frac{\left| \hat{y}_{i}-y_{i}\right| }{\left| y_{i}\right| }\\ R^2= & 1-\frac{\sum _{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}}{\sum _{i=1}^{n}(y_{i}-\overline{y})^{2}} \end{aligned}$$
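These criteria can be computed directly, for example with scikit-learn (a toy sketch with made-up numbers):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

y_true = np.array([100.0, 102.0, 105.0, 103.0])
y_pred = np.array([101.0, 101.5, 104.0, 104.5])

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_true, y_pred)
r2   = r2_score(y_true, y_pred)
print(mae, mse, rmse, mape, r2)
```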

5 Results and discussion

5.1 Copula-based dependence analysis

The statistical parameters denoting the qad values between the S&P 500 ESG index and all independent variables are shown in Table 3. Our findings provide compelling evidence that several indices significantly influence the S&P 500 ESG index. Specifically, it can be predicted by INF \((q = 0.240, p < 0.001)\) and REC \((q = 0.223, p < 0.001)\), and most notably by MCCC, which exerts the strongest influence \((q = 0.299, p < 0.001)\), among others.

To illustrate the relationships between the variables based on qad, Fig. 3 reports the estimated asymmetric dependence. A visual inspection of the distribution confirms that the input variables were appropriately selected. A value closer to 1 indicates a strong positive dependence of one feature on another: values higher than 0.75 indicate strong dependence, while values between 0.25 and 0.75 indicate a medium but significant dependence. As illustrated in Fig. 3, most of the input features exhibit a strong or medium dependence with the dependent variable. Overall, copula-based dependence analysis presents a promising method for resolving the dependence problem when the underlying relationships between the input and output features are governed by complex interplay.

Table 3 The asymmetric dependence q(X, Y)
Fig. 3 Asymmetric dependence

5.2 Model predictive power

In this section, we assess the predictive performance of various models for ESG prices, focusing on the impact of climatic factors. We employ an array of advanced machine-learning techniques, including ET, LightGBM, XGBoost, BRR, EN, LASSO, and LR. Through a meticulous comparative analysis, we juxtapose the outcomes of a traditional specification with an approach that incorporates climate change variables. Our study developed four distinct models, whose construction is sketched in code after the list below.

  • Model 1: A conventional model that relies solely on market features and omits the physical and transition risk factors.

  • Model 2: Augments the market features with the Physical Risk Indicator (PRI).

  • Model 3: Augments the market features with the Transition Risk Indicator (TRI).

  • Model 4: Incorporates both climate change risk variables (PRI and TRI) alongside the other market characteristics.
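A compact sketch of this comparison, assuming synthetic stand-in data and a single LightGBM learner; the actual study evaluates all seven models on the Bloomberg-sourced series with a 20% test split:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in the study these come from the sources in Table 1.
rng = np.random.default_rng(0)
cols = ["INF", "REC", "POL", "GPR", "EPU", "NSI", "PRI", "TRI"]
df = pd.DataFrame(rng.normal(size=(2000, 8)), columns=cols)
y = df @ rng.normal(size=8) + 0.5 * rng.normal(size=2000)

base = ["INF", "REC", "POL", "GPR", "EPU", "NSI"]   # market features only
feature_sets = {
    "Model 1": base,
    "Model 2": base + ["PRI"],
    "Model 3": base + ["TRI"],
    "Model 4": base + ["PRI", "TRI"],
}
for name, feats in feature_sets.items():
    # shuffle=False keeps the temporal ordering of the series intact
    X_tr, X_te, y_tr, y_te = train_test_split(df[feats], y,
                                              test_size=0.2, shuffle=False)
    pred = LGBMRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te)
    print(name, round(mean_absolute_error(y_te, pred), 3))
```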

The primary objective of this comprehensive analysis is to gain valuable insights into the influence of climatic considerations on the predictive accuracy of ESG price forecasting. Table 4 presents the prediction performance of the four models using a 20% test sample. The models were evaluated based on key metrics, including the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared \((R^2)\), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE). In Model 1, in which only market features are employed, predictive performance varies across models: ET exhibits the lowest MAE and RMSE values, indicating its superior accuracy in predicting ESG prices. Model 2 introduces the Physical Risk Indicator (PRI) along with the market features. This addition generally improves predictive accuracy across models, with ET and LightGBM standing out as the top performers. Model 3, which combines the Transition Risk Indicator (TRI) with the market features, shows similar trends of enhanced predictive capability, again led by ET and LightGBM. Model 4, which incorporates both PRI and TRI alongside the market features, yields consistently improved performance; ET and LightGBM remain the strongest performers, reaffirming their effectiveness in capturing the influence of the climate risk variables.

These results underscore the significance of integrating climate change variables into predictive models. Models that incorporate PRI and TRI (Model 4) consistently outperform models that rely solely on market features (Model 1). This finding suggests that climate factors play a substantial role in predicting ESG stock market prices. The presence of both physical and transition risk indicators contributes to more accurate predictions, in line with the growing emphasis on climate-related risks in financial forecasting. Furthermore, the strong performance of ET and LightGBM across all models underscores their robustness in capturing the complex relationships between market dynamics and climate risk; their adaptability to the inclusion of additional variables positions them as valuable tools for ESG price prediction. In conclusion, this analysis demonstrates the pivotal role of climate risk factors in enhancing the predictive power of ESG price prediction models. The integration of such factors, particularly the PRI and TRI, provides valuable insights for investors and stakeholders aiming to navigate the implications of climate change on financial markets.

Table 4 Prediction performance

To deepen our analysis, we implement the predictive performance improvement method proposed by Gong et al. (2023). The percentage differences in the MAE, MSE, RMSE, R2, RMSLE, and MAPE metrics between the different models are shown in Tables 5 and 6. The overall performance of all models improved when the climate change risk variables (PRI and TRI) were integrated into the analysis (Model 4). The results demonstrate the positive impact of incorporating climate change risk factors into ESG price prediction models. The improvement in predictive performance is evident in the reduction of the error metrics (MAE, MSE, and RMSE) and the increase in the \(R^2\) coefficient, collectively indicating enhanced accuracy and a better fit of the models to the data. The most substantial improvement is observed for the LightGBM and XGBoost models, as indicated by their notably negative percentage differences, meaning that these models benefit significantly from the inclusion of climate variables in their predictive frameworks. For example, the MAE decreased by 7.41% and the MSE by 5.91%; the \(R^2\) increased by 6.25% for LightGBM when comparing Model 4 with Model 1, and by 8.06% for XGBoost in the same comparison. In conclusion, our findings suggest that incorporating climate change concerns provides valuable information for predicting ESG prices.
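The improvement measure reduces to a simple relative difference between the metric of a climate-augmented model and that of the baseline (a sketch with made-up numbers; negative values indicate an error reduction):

```python
def pct_change(metric_augmented, metric_baseline):
    """Percentage difference of a metric between two models, following the
    improvement measure of Gong et al. (2023)."""
    return 100.0 * (metric_augmented - metric_baseline) / metric_baseline

# Illustrative values only, not the figures of Tables 5-6:
print(pct_change(0.925, 1.0))   # e.g. MAE of Model 4 vs Model 1 -> -7.5 (%)
```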

Given the use of 10-fold cross-validation, we highlight some characteristics of the learning curves it generates, shown in Fig. 4. As can be seen from the subgraphs, the accuracy of the validation curve starts at 41% and, after training on approximately 4000 samples, reaches 40%. More importantly, the ability of the learning curves to capture the physical and transition risk factors differs between models. For illustration, we highlight further characteristics of these plots. In Fig. 4a, the training curve for Bayesian Ridge starts at an accuracy between 0.40 and 0.41 and, after about 4000 training samples, settles between 0.39 and 0.40. The validation accuracy starts between 0.38 and 0.39 and, after around 4000 samples, Bayesian Ridge reaches its highest accuracy of about 0.39. Similar results were obtained for the LASSO and linear regressions (Fig. 4d-f). A slightly different pattern appears for the elastic net in Fig. 4b, where the training curve's accuracy starts from 29% and, over the same 4000 training samples, reaches between 0.39 and 0.40. The validation accuracy for the LASSO and linear regression is similar over 4000 training samples, at approximately 0.39. One category of models, however, exhibits a different pattern: in Fig. 4c and e, the training curve maintains a high, nearly constant accuracy, while the validation curve starts between 0.65 and 0.70 and, after training, exceeds 0.70.
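Curves of this kind can be reproduced with scikit-learn's learning_curve utility under 10-fold cross-validation (a sketch on synthetic data; "accuracy" here refers to the \(R^2\) score):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=5000, n_features=8, noise=50.0, random_state=0)

# R^2 learning curves under 10-fold cross-validation, as in Fig. 4
sizes, train_scores, val_scores = learning_curve(
    BayesianRidge(), X, y, cv=10, scoring="r2",
    train_sizes=np.linspace(0.1, 1.0, 5))
print(sizes)
print(train_scores.mean(axis=1))   # training curve
print(val_scores.mean(axis=1))     # validation curve
```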

Figure 5 shows the residual plots. We use the \(R^2\) assessment criterion, where the green points show the differences between the actual targets and the predictions on the test data, and the blue points represent the corresponding differences on the training data. As observed in Fig. 5, the residual distributions are similar for the LASSO and linear regressions: in panels (d) and (f), the prediction error on the test and training data ranges over \([0.382, 0.397]\) and \([0.381, 0.396]\) for LASSO and linear regression, respectively. A rather different picture emerges for LightGBM and Extra Trees, where the numbers of green and blue dots increase markedly and the cross-validation \(R^2\) approaches 100%, compared with the LASSO and linear regressions. As portrayed in these figures, the residuals are plotted against the fitted values, and as the number of residual points increases, their distribution approaches normality. This is likely due to the excellent training and testing accuracy of LightGBM and Extra Trees.

Table 5 Differences (percentage points)
Table 6 Differences (percentage points)

5.3 Model interpretability based on SHAP

To provide interpretability and understand the contribution of each input variable, we utilize explainable artificial intelligence (XAI), specifically SHAP, as proposed by Lundberg and Lee (2017). SHAP is a valuable tool for interpretable machine learning based on cooperative game theory, which explains individual predictions by attributing them to the input features (Chen et al., 2022; Karbassiyazdi et al., 2022). Feature importance is used to enhance the interpretability of complex machine learning algorithms through the importance ranking and value distribution of the overall and local features. According to Lundberg and Lee (2017), SHAP values satisfy both the consistency and local accuracy requirements.

Figure 6 illustrates the effect of each feature on each sample. The Y-axis lists the input features, and the X-axis shows the corresponding SHAP values. The dot density represents the dispersion of the sample, and the color encodes the feature value: red and blue represent high and low feature values, respectively. The influence and directionality of each predictor's effect on the dependent variable (here, the S&P 500 ESG index) can therefore be read from the color distribution of each feature. It is evident from Fig. 6 that, for inflation, the red dots (indicating a high search volume for the term "inflation") are predominantly situated to the right. This suggests that, during times of heightened public concern about inflation, there is a positive effect on the S&P 500 ESG index. Conversely, the blue dots (indicating low search volume) lie mostly to the left, meaning that lower concern about inflation is associated with a decrease in the index.
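A summary plot of this kind can be generated with the shap library (a sketch on synthetic data with hypothetical feature names; TreeExplainer computes exact SHAP values for tree ensembles such as LightGBM):

```python
import shap
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
model = LGBMRegressor(random_state=0).fit(X, y)

# Exact SHAP values for the tree ensemble; the summary plot colors each dot
# by the feature's value (red = high, blue = low), as described above
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, feature_names=[f"f{i}" for i in range(8)])
```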

It is worth noting that the mean absolute SHAP value summarizes each feature's overall contribution to the model. The pattern observed for inflation, with high-search-volume (red) dots on the right and low-search-volume (blue) dots on the left, aligns with Antonakaki et al. (2017), who document that inflation is widely acknowledged as a crucial macroeconomic indicator believed to be related to stock values.

The second most important variable is POL. Here, the red points (indicating a high search volume for "pollution") lie predominantly to the left, suggesting that heightened concern about pollution negatively affects the S&P 500 ESG index, whereas the blue points on the right indicate that lower concern about pollution may be associated with an increase in the index. This result is consistent with previous studies by Capelle-Blancard and Petit (2019), Schuster et al. (2023), and Zhou et al. (2023), but contrasts with El Ouadghiri et al. (2021), who found a positive relationship between public attention to environmental issues and US sustainability stock indices. The NSI also exhibits a significant linkage. Shapiro et al. (2022) underscore the importance of news sentiment in assessing economic sentiment, and the impact of this variable on the ESG index highlights the influence of media sentiment on sustainable investment trends.

Additionally, a decrease in the REC value is associated with an increase in ESG stock prices. Furthermore, lower TRI values are associated with higher ESG stock prices. The TRI gauges the financial implications of the transition to a lower-carbon economy, so lower TRI values suggest that the perceived risks associated with this transition are reduced; this can reflect market confidence in companies' ability to adapt to sustainable practices and navigate the challenges of a shifting energy landscape. In the context of the literature, this aligns with Bouri et al. (2023), who find a negative impact of TRI on green bonds. Moreover, our findings indicate a negative relationship between PRI and ESG stock prices. The association between an elevated PRI and a subsequent decrease in ESG stock prices underscores the profound interconnection between sustainable investment and physical climate change risks. This insight is relevant to the current landscape, where shifts in climatic patterns amplify the occurrence of physical hazards (Li et al., 2023). Overall, our findings indicate that environmental issues significantly affect the volatility of different assets (Gong et al., 2023; Bouri et al., 2023), consistent with earlier research highlighting the predictive nature of climate risks for asset volatility (Wang & Li, 2023).

5.4 Robustness check

In this section, we analyze the robustness of our findings along several dimensions. First, we investigate the extent to which climate change risks provide additional predictive information for the daily pricing of environmental, social, and governance (ESG) assets in different subsets of the sample. Next, we examine whether our findings persist when the newly introduced climate change concerns index proposed by Ardia et al. (2023) is included.

To examine the potential of climate risks to enhance prediction accuracy across various datasets, a test sample comprising 30% of the data was used. The prediction performance with 30% test samples for each of the seven machine learning models is shown in Table 7. Interestingly, the inclusion of climate change factors notably decreased both the MAE and MSE values, with ET, LightGBM, and XGBoost emerging as the best-performing models. Notably, their MAE and MSE values are lower than those of LASSO and BRR, further indicating that these three models exhibit superior predictive performance for ESG assets.

Table 8 shows the percentage improvements as defined by Gong et al. (2023). The models incorporating the climate PRI and TRI achieved the best performance, surpassing those that did not consider climate factors. For example, the MAE decreased by 8.07%, the MSE by 7.13%, and the RMSE by 172.0% and 4.76%, while the R2 increased by 4.55%. These results generally indicate that incorporating climatic factors provides complementary information for forecasting ESG prices. Integrating climatic considerations into forecasting models can significantly enhance accuracy, demonstrating that attention to climatic factors offers additional insight into predicting ESG prices.

Table 7 Prediction performance (30% test sample)
Table 8 Robustness Test: Differences (percentage points) between Models 1, 2, 3, and 4

As a further robustness check, Model 5 employs the Media Climate Change Concerns (MCCC) index derived from Ardia et al. (2023), which captures both the transition and physical dimensions of climate risk. The methodology is consistent with the previous prediction process: the sample is divided into two parts, with 80% allocated for training and 20% for testing. As seen in Tables 9 and 10, we employ the same statistical indicators to assess the accuracy of our forecasts. After including climate change concerns, the statistical indicators of the forecasts of all seven models improve. The percentage improvements shown in Table 10 confirm the previous results, reducing MAE, MSE, RMSE, and MAPE while increasing \(R^2\). These findings demonstrate that incorporating climate concerns (MCCC) enhances the prediction accuracy of ESG asset prices.

Table 9 Robustness test: prediction performance using the Media Climate Change Concerns (MCCC) with 20% test samples
Table 10 Robustness test: differences (percentage points) between Models 1 and 5

6 Conclusion

This study explored the role of climate risk in predicting ESG prices. Specifically, we distinguish between two types of climate risk: physical risk, associated with the tangible impacts of climate change, and transition risk, related to the adjustment towards a greener, low-carbon economy. Data analytics were conducted using advanced machine learning and explainable artificial intelligence methods. An analysis of data from January 2006 to July 2022 revealed enhanced accuracy in predicting the values of Environmental, Social, and Governance (ESG)-related stocks when the climate risk indicators, specifically the Physical Risk Indicator (PRI) and Transition Risk Indicator (TRI), are incorporated into the predictive models. Additionally, the application of explainable artificial intelligence, particularly through SHAP plots, reveals the substantial influence of variables such as inflation, recession, and environmental pollution concerns on the S&P 500 ESG Index. Policymakers should consider integrating climate risk indicators, such as the PRI and TRI, into risk assessment models; doing so enhances the accuracy of predictions for stocks associated with ESG factors and thereby contributes to the development of a more resilient and robust financial system. Furthermore, investors should endorse the integration of cutting-edge technologies, such as machine learning and explainable artificial intelligence, when evaluating financial markets. These technologies have the potential to provide deeper insight into the variables that influence ESG-related equities and the overall market.

Given the substantial influence of inflation, recession, and environmental degradation concerns on the S&P 500 ESG Index, policymakers must formulate more robust regulatory frameworks that mitigate the associated risks and foster sustainability and resilience in financial markets. Policymakers must also allocate resources toward enhancing public knowledge and education about climate concerns and their consequent effects on financial markets. Educated and knowledgeable citizens are more likely to adopt investment decisions that prioritize sustainability, contributing to the development of an environmentally conscious economy.

Future research should broaden the analytical framework to include a wider range of environmental, social, and governance variables and their interactions, thereby enhancing the generalizability and practicality of the study's outcomes. Moreover, further investigation is required into contemporary and sophisticated analytical techniques in machine learning and artificial intelligence, aiming to increase the precision and interpretability of predictive models. The continuous development of neural network and deep learning models may offer unprecedented insights.

Fig. 4 Evaluation of learning curves with 10-fold cross-validation

Fig. 5 Evaluation of residuals with 10-fold cross-validation

Fig. 6 The SHAP (SHapley Additive exPlanations) plot provides insights into the impact of each predictor on the S&P 500 ESG index. Red dots represent high predictor values, whereas blue dots represent low predictor values