Introduction

Sustainable investing (SI), originally called socially responsible investment (SRI), is an ethically motivated approach to investing that emerged hundreds of years ago. At that time, SRI screening was adopted by religious groups such as Jews, Muslims, Quakers, and Methodists (Schueth 2003). Nowadays, the essence of SRI is to incorporate non-financial factors such as personal values and social and environmental concerns into investment decision-making. For an investment to qualify as SRI, it has to comply with three principles: preserve the Environment, be Socially responsible (SR) toward people, and establish an environment enabling better corporate Governance. These three standards are known as the ESG criteria and represent the pillars of sustainable investing (Cunha et al. 2020), which is a broader concept as it also includes companies working to address climate change challenges.

In the past decades, with the intensification of issues related to climate change, world poverty, and social movements, ESG concerns at the global level have led to a growing interest in this responsible way of investing (Cunha et al. 2020). As a result, SI is attracting the attention of all economic stakeholders, such as investors, shareholders, firm managers, financial market regulators, policymakers, and even consumers. Internationally, the concept of SRI was developed at the initiative of intergovernmental or regional organizations such as the United Nations (UN) through the Global Compact and the Environment Program, the World Bank through its recommendations to make companies aware of their social responsibility, and the OECD through its guiding principles. It is worth noting that SR commitment carries different levels of obligation, taking the form of recommendations or voluntary standards.

However, the emergence of SI raised a central and controversial question that attracted the attention of the financial community: can investors make profits while being SR and engaging in sustainable development activities? The literature studying the performance of SRI is inconclusive and can be divided into three main strands. The first strand analyzes the corporate financial performance (CFP) of SR companies. Most of these studies find evidence of a positive relationship between CFP and corporate social responsibility (CSR) (Hou et al. 2019). Flammer (2015) shows that close-call CSR proposals of US firms from 1997 to 2012 are related to positive announcement returns and higher accounting performance. However, other studies point toward a negative relationship between CSR and CFP. For instance, Gonenc and Scholtens (2017) show a negative relationship between CFP and environmental performance for fossil fuel firms. Recently, Ben Lahouel et al. (2022) note that the more relevant question is rather to identify the optimal CSR level and propose a model to identify this level. Friede et al. (2015) perform a meta-analysis of the CSR and CFP relationship by combining the results of 2200 studies. They find that the majority of studies report positive relationships, which are stable over time.

The second strand of the literature focuses on SR mutual funds’ financial performance. The results are also controversial. Soler-Domínguez et al. (2021) investigate 3920 SR mutual funds from across the world. They find a positive relationship between sustainability and financial performance. Gil-Bazo et al. (2010) show that SRI funds have better financial performance than conventional funds even after controlling for fees. López-Arceiz et al. (2018) explain the outperformance of SR funds with the cultural environment of the funds. Barnett and Salomon (2006) argue that the relationship between financial performance and the social screen is not linear and varies with the types of social screens used.

Finally, the last strand of the literature studies the financial performance of sustainable indices, which are relatively new compared to SR mutual funds (Fowler and Hope 2007). The first SR index was launched in 1990 by Amy Domini. Composed of 400 ESG-compliant US-listed firms, this index is currently known as the «MSCI KLD 400 Social Index». Nine years later came the launch of the world’s first sustainability benchmark, now known as the «Dow Jones Sustainability World Index». Since then, regional ESG indices have been launched, such as the FTSE4Good regional indices in 2001. Several emerging countries proceeded to release their own SR index to encourage firms to embrace ESG principles and disclosure rules, such as South Africa and Brazil in 2004 which was the first emerging market to release an SR index. It was only in 2019 that S&P DJI enlarged the universe of sustainability indices and launched its S&P ESG index series with a plethora of country and regional indices. Even though some emerging countries have put considerable effort into launching their sustainable indices, the majority of developing countries are still struggling to do so, essentially because of the lack of ESG disclosure by firms. In the Middle East and North Africa (MENA) region, the first ESG index, named «S&P/EGX ESG Index», was launched in Egypt in March 2010. The MENA SR index «S&P/Hawkamah ESG Pan Arab» was launched in 2011 by Hawkamah in collaboration with S&P and IFC. It is composed of the largest and most liquid companies listed on the national stock exchanges of 11 markets: Bahrain, Egypt, Jordan, Lebanon, Kuwait, Morocco, Oman, Qatar, Kingdom of Saudi Arabia, Tunisia, and the United Arab Emirates, subject to a liquidity screen. It was only in late 2018 that Morocco launched its ESG index «Casablanca ESG 10», while the first Gulf Cooperation Council (GCC) country to launch a sustainable index was the UAE in April 2020. The «S&P/Hawkamah ESG UAE Index» includes 20 listed ESG companies.

While sustainable indices and funds first emerged in the US, the European Union (EU) is known for its leading role in promoting CSR principles. Indeed, in the US, disclosing non-financial information and adopting CSR strategies and principles have been a voluntary company decision. However, the EU, through the European Commission (EC), has been aiming to promote an inclusive and sustainable economy through a more formalized and regulated approach to CSR. The history of CSR in Europe began in the 1950s, when the concept was specifically defined for the first time, and has since gone through different stages (Latapí Agudelo et al. 2019). In 2011, the EC adopted a new CSR strategy aiming to encourage companies to adopt CSR principles and to embed SR into their strategies. Since CSR reporting was voluntary and EU countries were not bound to implement a CSR strategy, the disparities among EU countries were noticeable. Only 15 out of 27 EU nations implemented national policy frameworks promoting CSR principles and investing. France was the first EU country to enforce non-financial reporting, in 2001, and to make it subject to verification, in 2010. A key date for sustainable development in Europe was 2014, when the 2014/95/EU directive emerged and made non-financial reporting mandatory for large companies starting from the 2017 fiscal year (Leyens 2018). In April 2021, the EC proposed a new CSR directive aiming to make CSR reporting mandatory from 2023. All these efforts put the European nations at the top of the Respeco classification of countries applying CSR principles.

As the efforts to enhance sustainability in Europe grow, an important question is of much interest: how do European ESG indices perform against their conventional counterpart after all the efforts and measures undertaken by the EU in matters of CSR in Europe? Most of the previous studies, such as Cunha et al. (2020), mainly used static performance measures to analyze the performance of sustainable indices compared to their conventional benchmarks. However, it is well documented that model coefficients are time-varying, and performance may depend on stock market conditions. To overcome these problems, Tripathi and Kaur (2020, 2022) manually divided the full sample period into two subperiods corresponding to the bull and bear market conditions. To our knowledge, none of the previous studies used models that detect and allow for financial performance instability through time. This paper aims to fill this gap by analyzing financial performance using both static and dynamic performance measures. More precisely, it is the first paper that employs in this context the rolling window technique that allows for coefficient variation.

We use daily data on several European indices from their launch dates to 2020. The performance analysis is implemented over the whole sample period and year by year, employing static portfolio performance indicators as well as dynamic performance analysis through the rolling window technique and the Markov-switching (MS) model. The static analysis shows that the sustainable indices perform as well as their conventional counterparts in most cases. The Emerging Market (EM) Europe ESG Leaders index is less risky than the benchmark. However, the dynamic financial performance analysis reveals that the CAPM alpha and beta are time-varying and depend mainly on stock market conditions. Indeed, in a high-volatility market, risk-averse investors would be interested in investing in the ESG index since it reduces market risk. Moreover, when the market is more stable, the sustainable EM Europe ESG index offers better performance. These results are important for index providers, policymakers, and regulators as they give some insights into the construction and usefulness of SR indices in the European case. Besides, the findings could help investors in portfolio management decisions, as an investor can invest in securities of ESG companies depending on the state of the market. Moreover, the study has practical implications for systematic trading, since the rolling window regression parameters can be incorporated into algorithmic trading strategies to optimize portfolio allocation, assess and manage risk, or improve trading signals. The information derived from rolling window regression parameters can also be useful to investors in defining their portfolio diversification strategies in systematic trading. Furthermore, it is possible to develop tools and platforms that offer real-time monitoring of rolling window regression parameters for investors and traders. Finally, this study is also important for academics and researchers since it extends the existing literature on SI performance.

The remainder of this article is organized as follows. Section “Literature review” reviews the literature on the financial performance of SR indices. Section “Data and methodology” describes the data and presents the methodology. Section “Results” describes and discusses the empirical results. Finally, section “Conclusion” concludes.

Literature review

The theoretical literature about SI puts forward three antagonistic doctrines. The first defends the underperformance of SI compared to conventional investments. The second supports the opposite thesis. A third doctrine defends neutrality. The underperformance of SI is based on the portfolio theory of Markowitz (1952), according to which SRI reduces the opportunities for diversification due to the selection and exclusion of securities, which leads to a reduction in portfolio efficiency. The underperformance of SI is also explained by monitoring costs, which are additional costs relating to filtering and control. In fact, selection and exclusion lead to more complicated and therefore more expensive asset management (Rudd 1981). Moreover, the neoliberal theory of Friedman (1970) indicates that SR firms could have a competitive disadvantage relative to rivals who do not engage in social practices: through a screening process, SRI funds restrict their investments to those firms engaged in these costly social practices. On the opposite side, outperformance is explained by the ability of SI to generate value; according to stakeholder theory, the better a firm manages its relationships with its partners, the better its financial performance over time (Freeman 1984; Donaldson and Preston 1995). Social responsibility is also a source of competitive advantage (Porter 1991; Porter and van der Linde 1995). SR firms are willing to allocate reasonable resources to maintain and strengthen a sustainable relationship with important stakeholders (e.g., increased employee satisfaction), which decreases transaction costs (e.g., decreased employee turnover) and leads to financial gain (Barnett 2007). Neutrality is illustrated by the fact that the positive effects of SRI are neutralized by the costs associated with the screening process. Luther et al. (1992) and Bauer et al. (2005) find no evidence of significant differences between SRI and conventional investment performance. Investors can choose SI without being forced to sacrifice performance (Sauer 1997).

The empirical literature reveals a lack of consensus on the performance of SI compared to conventional investments. This could be explained by the diversity of the approaches used as well as the evaluation methods. The type of economy being developed or emerging may also be a culprit.

Empirical studies analyzing the performance of SRI have gone through large development from the early 1990s to the present day. They initially focused on the developed stock markets (Schröder 2007; Consolandi et al. 2009; Belghitar et al. 2014). Other studies focus on regional markets. Ur Rehman et al. (2016) examine the risk and return profiles of ESG indices and conventional composite indices of eight Asian countries from 2002 to 2014. Their results indicate that investors can achieve ESG targets and simultaneously have their portfolio performance not different from that of a conventional investment. Cunha et al. (2020) look at the performance of several sustainable Dow Jones indices (DJSI) including the following regions: Asia Pacific, Europe, emerging markets, and the US which they compared to their benchmarks, over the period 2013–2018. The analysis was carried out based on classical and modern portfolio measures. The results suggest that the performance of sustainable indices is still heterogeneous around the world, but there are opportunities for investors to achieve higher risk-adjusted returns in some regions while incorporating sustainable investment practices.

Several other papers focus on developing markets and also find conflicting results. De la Torre Torres and Enciso (2017) investigate SRI in Mexico over the period 2008–2013 using the Sharpe ratio, Jensen’s alpha, the multifactor market capitalization model, and a Monte Carlo simulation. Their results show that the three indices IPCS, IPCcomp, and IPC have statistically equal average variance performance, suggesting that, in the long term, SRI is a good substitute for conventional investment. Tripathi and Kaur (2020) investigate the performance of SR indices in the BRICS countries by bringing out the contrasts in performance in these countries. They evaluate portfolio alphas and betas and various risk-adjusted measures of financial performance and find that the SR indices beat the market in almost all considered cases. All these studies use static financial performance measures. However, it is well documented that model coefficients are time-varying, and performance may depend on stock market conditions. To overcome this problem, Tripathi and Kaur (2020, 2022) divided the full sample period into two subperiods corresponding to the bull and bear market conditions.

The state-space model is an effective mathematical tool that is useful when the relationship is not stable over time. It is especially helpful when dealing with underlying states that cannot be directly observed (Brockwell and Davis 2009). The concept of two regimes is crucial for identifying and simulating various states or behaviors within a dataset. The MS model, a particular two-regime model, was originally developed by Goldfeld and Quandt (1973) and then employed by Hamilton (1989) to describe the US business cycle. In particular, a two-regime MS model allows for identifying the transition between two market cycles (bull and bear states). Managi et al. (2012) use an MS model to compare the performance distributions of SRI and conventional stock indices in the US, the UK, and Japan. Their results show two distinct regimes in both SRI and conventional indices for the three countries. Ortas et al. (2014) compare the performance of European SRI indices to their conventional benchmark using a state-space model. Their findings indicate that SRI indices are more sensitive to market cycles, since their underperformance in periods of market downturn is more severe than that of their conventional counterpart. Using the same technique, Ortas et al. (2012) show that the risk and return of the SRI Brazilian index depend on the market cycle. Azmi et al. (2019) compare the performance of Dow Jones (DJ) Islamic, DJ sustainable, and DJ Islamic sustainable equity indices to their conventional benchmark. Using an MS model, they analyze the performance of these indices across different market regimes. They find that investors do not have to pay a price for investing in Islamic or sustainable equity indices. Combining Islamic and sustainability investing strategies is more rewarding, particularly during the economic boom, bullish equity markets, and subprime crisis periods.

Another regression technique allowing for time-varying coefficients is the rolling window technique, which is a statistical technique that assesses the parameter stability over time and the forecast accuracy of the model (Zivot and Wang 2003). The time-varying relationships are evaluated for time-series data by creating a moving window over the dataset. This technique captures changing dynamics, structural breaks, or regime shifts. In the context of portfolio optimization, Hwang et al. (2018) propose to explain the outperformance of naïve diversification by using a rolling sample approach. They find that for well-diversified portfolios, naïve diversification is superior, but results in a larger tail risk. Miralles‐Quirós et al. (2019) investigate the benefits of adding ETFs tracking companies interested in contributing to social development goals to stock-bond portfolios. They use a rolling sample approach and show that adding these ETFs can increase portfolio performance.

To our knowledge, our study is the first that performs both static and dynamic performance analyses using the traditional performance measures, the MS model that detects and analyzes different market conditions, and for the first time in this context, the rolling window technique that allows for coefficient variation.

Data and methodology

Data

In this study, we analyze the financial performance of four MSCI regional European ESG indices, namely, MSCI Europe ESG Leaders, MSCI European Economic Monetary Union (EMU) ESG Leaders, MSCI European EM ESG Leaders, and MSCI Europe and Middle East (ME) ESG Leaders. We collect daily data from Datastream for each index and its conventional benchmark, from the date the ESG index becomes available on Datastream to July 2020. The components of all these indices are selected using a best-in-class approach and have the highest governance, social, and environmental ratings among the benchmark index constituents. The MSCI indices have a representative market value and share the same construction methodology. The EM Europe index is composed of companies from European emerging market countries, whereas the European EMU index comprises firms from European developed market economies. Finally, the Europe index is composed of both types of countries.

Table 1 presents a description of the used MSCI ESG indices and their benchmarks. Table 2 exhibits the descriptive statistics of the used indices. This table indicates that the distributions of all return series are not symmetrical and that they are skewed left. They are leptokurtic and exhibit tails thicker than normal. The Jarque–Bera test rejects the normality. The LM test confirms the presence of a statistically significant ARCH effect for all indices. Moreover, the ADF stationarity test proves their stationarity in level. The indices’ returns appear to have close means and dispersions with ESG indices displaying slightly higher average values but slightly lower volatilities. Indeed, the median and volatility equality tests confirm that there is no significant difference between the mean and volatility of each ESG index with its benchmark, except for the MSCI Europe EM ESG index which displays significantly lower volatility than the benchmark.

Table 1 Indices description
Table 2 Descriptive statistics of return series

Static financial performance

To compare the performance of the sustainable indices to their conventional counterparts, we use several techniques. First, we perform a static performance analysis based on a comparison of classic risk and return indicators, as well as portfolio performance measurements. Then, we perform a dynamic performance analysis relying on the rolling window technique and the state-space MS model.

Portfolio risk and return

Using the index return at day t, computed as the first difference of two consecutive log index prices \(R_{t} = \ln \left( {\frac{{I_{t} }}{{I_{t - 1} }}} \right)\), we calculate the risk and return of the indices over the time period T as follows:

$$R_{{{\text{i,}}T}} = \mathop \prod \limits_{T} \left( {1 + R_{{{\text{i,}}t}} } \right) - 1$$
(1)
$$\sigma_{i,T} = \sqrt {\frac{1}{n}\mathop \sum \limits_{t = 1}^{n} \left( {R_{{{\text{i,}}t}} - \overline{R}_{{{\text{i,}}T}} } \right)^{2} }$$
(2)

where \(R_{i,t}\) is the return of index i on day t; \(\overline{R}_{i,T}\) is the average return of index i over the period T; \(\sigma_{i,T}\) is the standard deviation of the index return in period T; and n is the number of daily return observations in period T.
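
As an illustration, Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function names and the price series are hypothetical and are not taken from the study's data.

```python
import math

def log_returns(prices):
    """Daily log returns R_t = ln(I_t / I_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def period_return(returns):
    """Compounded return over the period, following Eq. (1)."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def period_volatility(returns):
    """Standard deviation with a 1/n divisor, following Eq. (2)."""
    n = len(returns)
    mean = sum(returns) / n
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / n)

# Hypothetical index levels, for illustration only.
prices = [100.0, 101.0, 99.5, 102.0, 103.0]
daily = log_returns(prices)
print(period_return(daily), period_volatility(daily))
```

Note that Eq. (2) divides by n rather than n − 1, so the sketch does the same instead of using a sample standard deviation.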

Portfolio performance analysis

To analyze the ESG index performance, we rely on different portfolio performance measures such as Jensen’s alpha (Jensen 1968), the Sharpe ratio (Sharpe 1966), the Treynor ratio (Treynor 1965), the information ratio (Treynor and Black 1973), and the Sortino ratio (Sortino and Price 1994). These metrics are the most used by financial practitioners. Jensen’s alpha measures the excess return of a portfolio over the security market line and focuses on non-diversifiable risk. We estimate this measure through a generalized autoregressive conditional heteroskedasticity (GARCH) capital asset pricing model (CAPM) to capture the volatility clustering of financial time series. The Akaike information criterion (AIC) shows that the Exponential GARCH(1,1) (EGARCH) model of Nelson (1991) is the best-fitting GARCH-type model. The general expression of the model is:

$$R_{{{\text{i}},t}} - R_{f,t} = \alpha_{i} + \beta_{i} \left( {R_{m,t} - R_{f,t} } \right) + \varepsilon_{i,t}$$
(3)
$$\sigma_{i,t}^{2} = {\text{Var}}(\varepsilon_{it} /I_{t - 1} )$$
(4)
$$\log \left( {\sigma_{i,t}^{2} } \right) = \delta_{i,0} + \delta_{i,1} \log \left( {\sigma_{i,t - 1}^{2} } \right) + \delta_{i,2} \left| {\frac{{\varepsilon_{i,t - 1} }}{{\sigma_{i,t - 1} }}} \right| + \delta_{i,3} \frac{{\varepsilon_{i,t - 1} }}{{\sigma_{i,t - 1} }}$$
(5)

where \(R_{i,t} - R_{f,t}\) measures the excess return of the ESG index; \(\alpha_{i}\) represents Jensen’s alpha; \(\beta_{i}\) measures the non-diversifiable systematic risk of the portfolio; \(R_{m,t} - R_{f,t}\) is the excess return of the benchmark; \(\varepsilon_{i,t}\) is the error term; and \(\sigma_{i,t}^{2}\) measures the conditional variance of the error term.
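
To make the variance equation concrete, the recursion in Eq. (5) can be sketched as follows. This is only an illustration of how the conditional variance evolves for given residuals and parameters; it does not estimate the model, and the function name, parameter values, and starting variance are hypothetical.

```python
import math

def egarch_variance(residuals, d0, d1, d2, d3, initial_variance):
    """Conditional variances implied by the EGARCH(1,1) recursion of Eq. (5).

    variances[t] is the variance for observation t, built from the
    standardized shock of observation t - 1.
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        prev_var = variances[-1]
        z = eps / math.sqrt(prev_var)  # standardized shock
        log_var = d0 + d1 * math.log(prev_var) + d2 * abs(z) + d3 * z
        variances.append(math.exp(log_var))
    return variances

# Hypothetical residuals and parameters, for illustration only.
eps = [0.01, -0.02, 0.005, -0.015]
print(egarch_variance(eps, d0=-0.1, d1=0.98, d2=0.1, d3=-0.05,
                      initial_variance=0.0001))
```

Because the recursion is written in logs, the variance stays positive without parameter constraints, and a negative value of the last coefficient lets negative shocks raise volatility more than positive shocks of the same size.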

The Sharpe ratio of index i over the period T (\(S_{i,T}\)) measures the excess return of a portfolio over the risk-free return per unit of portfolio risk (\(\sigma_{i,T}\)):

$$S_{{{\text{i}},T}} = \frac{{R_{i,T} - R_{f,T} }}{{\sigma_{i,T} }}$$
(6)

Like the Sharpe ratio, the Treynor ratio of portfolio i over period T (\({\text{Tr}}_{i,T}\)) is a performance measure that incorporates risk. However, it relies on systematic risk (\(\beta_{i,T}\)) as a measure of risk rather than the standard deviation:

$${\text{Tr}}_{i,T} = \frac{{R_{i,T} - R_{f,T} }}{{\beta_{i,T} }}$$
(7)

The Sortino ratio of portfolio i over period T (\({\text{Sr}}_{i,T}\)) is another risk-adjusted performance index. It assesses the excess return of a portfolio over the risk-free rate per unit of the portfolio’s downside risk (\(D\sigma_{i,T}\)):

$${\text{Sr}}_{i,T} = \frac{{R_{i,T} - R_{f,T} }}{{D\sigma_{i,T} }}$$
(8)

The information ratio of portfolio i over period T (\(I_{i,T}\)) measures a portfolio’s outperformance or underperformance compared to a benchmark per unit of relative risk:

$$I_{i,T} = \frac{{R_{i,T} - R_{B,T} }}{{\sigma_{{\left( {R_{i,t} - R_{B,t} } \right)}} }}$$
(9)

where \(R_{i,T} - R_{B,T}\) is the excess return of portfolio i compared to the benchmark B; and \(\sigma_{{\left( {R_{i,t} - R_{B,t} } \right)}}\) is the standard deviation of the daily excess returns of portfolio i compared to the benchmark (B).
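
The four ratios in Eqs. (6)–(9) can be sketched in Python as below. This is a minimal sketch with hypothetical function names and inputs; in particular, the downside deviation is computed against a target return of zero, which is an assumption since the text does not state the target used.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    """Standard deviation with a 1/n divisor, as in Eq. (2)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def beta(index_excess, bench_excess):
    """OLS slope of index excess returns on benchmark excess returns."""
    mi, mb = mean(index_excess), mean(bench_excess)
    cov = sum((a - mi) * (b - mb) for a, b in zip(index_excess, bench_excess))
    var = sum((b - mb) ** 2 for b in bench_excess)
    return cov / var

def downside_deviation(returns, target=0.0):
    """Root mean square of shortfalls below the target (assumed to be zero)."""
    return math.sqrt(sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns))

def sharpe(r_i, r_f, sigma_i):
    return (r_i - r_f) / sigma_i  # Eq. (6)

def treynor(r_i, r_f, beta_i):
    return (r_i - r_f) / beta_i  # Eq. (7)

def sortino(r_i, r_f, downside_sigma_i):
    return (r_i - r_f) / downside_sigma_i  # Eq. (8)

def information_ratio(r_i, r_b, index_returns, bench_returns):
    diffs = [a - b for a, b in zip(index_returns, bench_returns)]
    return (r_i - r_b) / stdev(diffs)  # Eq. (9)

# Hypothetical daily returns, for illustration only.
idx = [0.010, -0.004, 0.006, -0.002]
bench = [0.012, -0.006, 0.005, -0.003]
print(beta(idx, bench), downside_deviation(idx))
```

The period returns passed to the ratio functions would come from Eq. (1), while the risk terms are computed from the daily series.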

Dynamic financial performance

Time-varying portfolio performance: the rolling window procedure

Used by several authors such as Hwang et al. (2018) and Miralles‐Quirós et al. (2019), the rolling window procedure is an estimation technique that allows checking the stability of the coefficients of a model using a sliding window. The idea is to choose a sample of “n” consecutive observations (window size n). Then, choose the number of increments between the successive rolling windows. Finally, estimate the model on each rolling window. The technique is usually used to back-test a statistical model on historical data: the rolling mean (\(\hat{\mu }_{t}\)) and variance (\(\hat{\sigma }_{t}^{2}\)) estimates at time t are computed using the most recent n observations.

$$\hat{\mu }_{t} \left( n \right) = \frac{1}{n}\mathop \sum \limits_{i = 0}^{n - 1} y_{t - i}$$
(10)
$$\hat{\sigma }_{t}^{2} = \frac{1}{n}\mathop \sum \limits_{i = 0}^{n - 1} (y_{t - i} - \hat{\mu }_{t} \left( n \right))^{2}$$
(11)

We use a rolling sample with a window size of 250 observations. Starting from t=250, we estimate the CAPM EGARCH model (Eqs. (3)–(5)), and then, we repeat this estimation by moving the sample period 1 day forward.
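
The mechanics of the procedure can be sketched as follows. For brevity, the sketch estimates the CAPM of Eq. (3) by ordinary least squares on each window rather than by the EGARCH specification used in the paper, so it illustrates only the windowing logic; the function names and the simulated series are hypothetical.

```python
def ols_alpha_beta(y, x):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rolling_alpha_beta(y, x, window=250, step=1):
    """Re-estimate (alpha, beta) on each window of `window` observations,
    moving the sample `step` observations forward each time."""
    estimates = []
    for end in range(window, len(y) + 1, step):
        estimates.append(ols_alpha_beta(y[end - window:end], x[end - window:end]))
    return estimates

# Hypothetical excess returns: benchmark x and an index with beta 0.9.
x = [0.01 * ((i * 7) % 11 - 5) for i in range(300)]
y = [0.0002 + 0.9 * v for v in x]
paths = rolling_alpha_beta(y, x, window=250)
print(len(paths), paths[0])
```

With T observations and a one-day step, the procedure yields T − 249 pairs of estimates, which is what allows alpha and beta to be plotted as time series.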

Portfolio performance measures at different regimes: state-space model

Financial markets often display periods of high volatility and periods of low volatility and are dependent on business cycles. The dynamics of asset returns are expected to depend on the state of the market, and the parameters of the estimation models could, therefore, change from one state to another. A dynamic model would then be more suitable. Therefore, we propose to estimate the relationship using the nonlinear MS model proposed by Goldfeld and Quandt (1973) and Hamilton (1989). Inspired by the approach of Managi et al. (2012) and Azmi et al. (2019), we consider a dynamic two-state MS model. Starting from the CAPM, we assume that at each point in time t, there are two possible regimes \(S_{t}\) (\(S_{t} = 1\) or \(S_{t} = 2\)) and that the coefficients are state-dependent.

$$R_{i,t} - R_{f,t} = \alpha_{{S_{t} }} + \beta_{{S_{t} }} \left( {R_{m,t} - R_{f,t} } \right) + \varepsilon_{{s_{t} }}$$
(12)

where \(S_{t}\) is an unobservable discrete state variable (\(S_{t} = 1\) or \(S_{t} = 2\)) at time t that follows a first-order Markov chain, with the transition probability from regime i at t − 1 to regime j at t given by \(p(S_{t} = j \mid S_{t-1} = i) = P_{ij}\); \(R_{i,t} - R_{f,t}\) is the return of the ESG index in excess of the risk-free rate; \(\alpha_{{S_{t} }}\) is Jensen’s alpha in state \(S_{t}\); \(\beta_{{S_{t} }}\) is the beta in state \(S_{t}\); \(R_{m,t} - R_{f,t}\) is the excess return of the benchmark over the risk-free rate; and \(\varepsilon_{{s_{t} }}\) is the independent and identically distributed (i.i.d.) normal random error in each state, whose variance \(\sigma_{{s_{t} }}^{2}\) is state-dependent: \(\varepsilon_{{s_{t} }} \sim {\mathcal{N}}\left( {0,\sigma_{{s_{t} }}^{2} } \right)\).
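
To illustrate how the regimes in Eq. (12) are inferred from the data, the sketch below runs the standard filtering recursion for a two-state model with known parameters. It is not the estimation procedure itself (in practice the parameters are obtained by maximizing the log-likelihood this recursion returns); the function names and all parameter values are hypothetical, and regimes are indexed 0 and 1 in the code.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) evaluated at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hamilton_filter(y, x, alphas, betas, sigmas, P, start_probs):
    """Filtered regime probabilities P(S_t = j | data up to t) and log-likelihood.

    P[i][j] is the probability of moving from regime i at t-1 to regime j at t.
    """
    probs = list(start_probs)
    filtered, loglik = [], 0.0
    for yt, xt in zip(y, x):
        # Prediction step: propagate the previous probabilities through the chain.
        predicted = [sum(probs[i] * P[i][j] for i in range(2)) for j in range(2)]
        # Update step: weight each regime by the density of its residual.
        joint = [predicted[j] * normal_pdf(yt - alphas[j] - betas[j] * xt, sigmas[j])
                 for j in range(2)]
        density = sum(joint)
        loglik += math.log(density)
        probs = [v / density for v in joint]
        filtered.append(probs)
    return filtered, loglik

# Hypothetical parameters: a calm regime (index 0) and a volatile one (index 1).
P = [[0.95, 0.05], [0.10, 0.90]]
y = [0.010, -0.010, 0.060, 0.020]   # index excess returns
x = [0.010, -0.010, 0.010, 0.020]   # benchmark excess returns
filtered, loglik = hamilton_filter(y, x, [0.0, 0.0], [1.0, 0.8],
                                   [0.005, 0.02], P, [0.5, 0.5])
print(filtered[-1], loglik)
```

Observations whose residual is large relative to the calm-regime volatility shift probability mass toward the volatile regime, which is how the model separates the two market states without splitting the sample by hand.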

Results

Static financial performance

Table 3 presents the annual portfolio performance measures computed for the studied indices and their benchmarks (Panels A–D). The full sample figures show that Jensen’s alpha is positive but very small and insignificant. This result indicates that over the full sample period, none of the studied ESG indices outperforms its benchmark. The yearly alpha estimates are also positive and insignificant in most cases. This result suggests that the studied MSCI Europe sustainable indices and their benchmarks have almost the same yearly returns. These findings are partially consistent with Ur Rehman et al. (2016), who found no significant difference between the performance of the sustainable and the conventional indices for Asian countries. However, they contradict Statman (2006), who studied US indices, and Collison et al. (2008), according to whom sustainable indices for the global market, Europe, the UK, and the US exhibit better performance than conventional ones.

Table 3 Financial performance measures for MSCI ESG Leaders indices

Regarding risk, panels A, B, and D of Table 3 show that β is close to one in magnitude, but statistically different from one, indicating that the sustainable indices for Europe, EMU, and Europe and ME are almost as risky as their benchmarks. However, panel C for EM Europe shows that the sustainable index is less risky than its benchmark over the full sample period. This result is also confirmed by the annual analysis. As for the risk-adjusted performance measures, we note that the sustainable indices outperform their benchmarks in most years, except for 2012 and 2014 for Europe ESG and Europe EMU ESG indices and later for the other two indices.

Our results contradict some research that studied either developed or emerging markets. Lean and Nguyen (2014) investigate global and three regional DJSI indices relative to developed regions, namely, North America, Asia Pacific, and Europe, over the period 2004–2013. They find underperformance of the sustainable indices except for the European region. Consolandi et al. (2009), who compared the Sharpe ratio of DJSI to DJ Stoxx600 over 1999–2006, find that the sustainable index underperformed its benchmark. In developed countries, Schröder (2007) also found that sustainable indices globally do not exhibit different risk-adjusted returns than conventional benchmarks. The same result is found by Cunha and Samanez (2013), who investigate the sustainable Brazilian index over the period 2005–2010 using Sharpe and Treynor ratios. Clark and Deshmukh (2014) also found that for the European case, conventional and sustainable indices had the same performance. Belghitar et al. (2014) compare the performance of four indices of the FTSE4GOOD family with that of conventional benchmarks. They conclude that in the European context, the SR and conventional indices have the same performance. It is worth noting that the previous research on the European case investigated periods before 2011. However, the documented outperformance in our research is in line with Cunha et al. (2020), who studied several indices, among which the DJSI Europe, over a more recent period (2013–2018). The superiority of SR indices in this period could be explained by the efforts made by the EU to implement its CSR strategy since 2011.

Dynamic financial performance

Time-varying portfolio performance: the rolling window procedure

Figure 1 displays the time-varying alpha and beta regression coefficients obtained with the rolling window CAPM-EGARCH model for the four index pairs. It is noticeable that the coefficients are unstable over the entire study period, but there is no evidence of a common pattern across the different indices. This finding confirms the nonlinearity of the relationship.

Fig. 1 Rolling window regression parameters

Table 4 displays the annual average of the CAPM-GARCH coefficients estimated using the rolling window technique. The positive Jensen’s alpha shown in the table indicates that the sustainable indices are more profitable than the conventional ones in most years, except in 2014 and 2017 for the Europe, Europe EMU, and Europe ME ESG Leaders indices. These dates coincide with the adoption of the EU directive and its application, respectively. The betas are statistically significantly different from 1 for almost all years and all indices. Their magnitude nevertheless stays close to 1, indicating that the ESG indices exhibit almost the same risk as their conventional benchmarks. The only exception is the EM Europe index, whose beta suggests that this ESG index is less risky than its conventional counterpart. Moreover, the highest ESG index performance is registered for EM Europe. Therefore, we can conclude that the EM Europe ESG index offers better performance and lower risk than its benchmark. It is worth noting that, to our knowledge, this is the first study that examines the annual variation of the risk and return measures using the CAPM rolling window technique.

Table 4 Estimation results of CAPM-GARCH rolling window technique
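To make the rolling window procedure and the annual averaging behind Table 4 concrete, the following is a minimal sketch rather than the estimation used in this study. It assumes plain OLS in place of the GARCH-type error specification, returns already expressed in excess of the risk-free rate, a 250-day window, and simulated data. The function names `rolling_capm` and `annual_average` are hypothetical.

```python
import numpy as np


def rolling_capm(esg_ret, bench_ret, window=250):
    """Estimate Jensen's alpha and beta by OLS on each rolling window.

    Entry i of each returned array uses observations i .. i + window - 1.
    Assumption: plain OLS, without the GARCH-type errors of the paper.
    """
    esg_ret = np.asarray(esg_ret, dtype=float)
    bench_ret = np.asarray(bench_ret, dtype=float)
    n_windows = len(esg_ret) - window + 1
    alphas = np.empty(n_windows)
    betas = np.empty(n_windows)
    for i in range(n_windows):
        x = bench_ret[i:i + window]
        y = esg_ret[i:i + window]
        # OLS slope = cov(x, y) / var(x); the intercept follows from the means.
        beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        betas[i] = beta
        alphas[i] = y.mean() - beta * x.mean()
    return alphas, betas


def annual_average(values, years):
    """Average window estimates by the year of each window's last day."""
    years = np.asarray(years)
    return {int(y): float(values[years == y].mean()) for y in np.unique(years)}


# Simulated daily returns (hypothetical): true alpha = 0.0001, true beta = 0.9.
rng = np.random.default_rng(0)
n_days = 1000
bench = rng.normal(0.0003, 0.01, n_days)
esg = 0.0001 + 0.9 * bench + rng.normal(0.0, 0.002, n_days)
# Hypothetical calendar: 250 trading days per year, starting in 2007.
year_of_day = 2007 + np.arange(n_days) // 250

window = 250
alphas, betas = rolling_capm(esg, bench, window)
window_years = year_of_day[window - 1:]  # year of each window's last day
alpha_by_year = annual_average(alphas, window_years)
beta_by_year = annual_average(betas, window_years)
```

Assigning each window to the year of its last observation is one possible convention. Other choices, such as the window midpoint, would shift the annual averages slightly.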

Portfolio performance measures at different regimes: state-space model

Table 5 shows the MS-CAPM estimation results. The sigma, which measures the standard deviation of the residuals, gives an idea of market volatility in the different regimes and makes it possible to distinguish the regimes linked to the high/low volatility market cycle. Table 5 indicates that the sigma estimates are both significant and significantly different across regimes for all indices, meaning that the studied markets are characterized by two distinct volatility regimes. The Wald equality test indicates that the beta coefficients are significantly different across regimes for all indices except Europe ME. By contrast, the Wald equality test for alpha indicates that the performance of the ESG indices relative to their benchmarks is not significantly different across the two regimes. In the bullish market state (regime 1), defined by lower market volatility, the SR indices are significantly more sensitive to their benchmarks, except for the EM Europe index. In the bearish market state (regime 2), the ESG indices are as profitable as their benchmarks but less risky, again except for the EM Europe index. Therefore, we can conclude that in times of high volatility, risk-averse investors would be interested in investing in the ESG indices, since they reduce market risk while delivering almost the same returns. Moreover, when the market is more stable, the sustainable EM Europe index offers better performance. These results contradict Ortas et al. (2014), who study SRI performance and risk in the European context using a state-space model. They find that SRI indices are riskier in periods of market downturn.

Table 5 Results of dynamic regressions: MS-CAPM
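The Wald equality test discussed above can be illustrated with a short sketch for a single restriction, namely that a coefficient is equal in both regimes. The function name `wald_equality` and the numerical inputs are hypothetical and are not taken from Table 5. The sketch also assumes that the regime-specific estimates, their variances, and their covariance are already available from the fitted model.

```python
import math


def wald_equality(coef_1, coef_2, var_1, var_2, cov_12=0.0):
    """Wald statistic and p-value for H0: coef_1 == coef_2 (one restriction).

    var_1 and var_2 are the estimated variances of the two coefficients and
    cov_12 is their estimated covariance (assumed to be zero if omitted).
    """
    diff_var = var_1 + var_2 - 2.0 * cov_12
    if diff_var <= 0:
        raise ValueError("variance of the difference must be positive")
    stat = (coef_1 - coef_2) ** 2 / diff_var
    # Under H0 the statistic is asymptotically chi-square with 1 degree of
    # freedom, whose survival function is P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical regime-specific betas and standard errors (illustration only).
stat, p_value = wald_equality(1.05, 0.95, 0.02 ** 2, 0.03 ** 2)
reject_at_5_percent = p_value < 0.05
```

A small p-value leads to rejecting equality across regimes, which is the sense in which the betas above are described as significantly different between the two volatility states.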

Conclusion

This article studies the financial performance of several MSCI European ESG indices relative to their respective conventional benchmarks. We investigate financial performance both over time and across different market conditions, using static and dynamic performance measures. We use daily data for several MSCI Europe ESG Leaders indices over the period 2007–2020. First, we conduct a static performance analysis by comparing index returns and performance measures yearly and over the full sample period. The results show that the sustainable indices perform as well as their benchmarks. The EM Europe ESG Leaders index is less risky than its conventional counterpart and also offers the highest performance relative to its benchmark. Second, we carry out a dynamic performance analysis. This analysis shows that Jensen’s alpha and beta are time-varying and depend mainly on stock market conditions. Indeed, in high-volatility markets, risk-averse investors would be interested in investing in the ESG indices, since they reduce market risk. Moreover, when the market is more stable, the sustainable EM Europe index offers better performance.

This work has methodological and managerial implications. Methodologically, the use of dynamic techniques to measure performance made it possible to perform an in-depth analysis of financial performance. From a managerial point of view, the results could support portfolio management decisions, provided that investors choose when to invest in securities of SR companies depending on the state of the market. For academics, our study contributes to the literature on sustainable investment and suggests that future research could benefit from assessing sustainable investment in light of financial market cycles and within a dynamic modeling framework. Given the originality of its main idea, this research could be extended in several ways. For instance, it would be interesting to investigate how the rolling window regression parameters could be used in dynamic asset allocation strategies so that investors adapt their strategies to changing market conditions. A further question is how machine learning techniques could be combined with rolling window parameters to improve forecasting accuracy and predictive models for systematic trading.