1 Introduction

Theories and models in Exact Science, as in Social Science, can accurately reflect reality only if their estimated values reflect the true values. However, the true value is often unobserved in practice, and the accuracy of its estimate depends on the measurement instrument used. If the measuring instrument is not exact, the result is a measurement error, defined as the difference between the true value of a quantity and the value obtained by measurement. Measurement error is known to include both random errors (or statistical errors) and systematic errors. While the latter mean that a measured value contains an offset, the former imply that a repeated measurement generates a value different from the previous one (Taylor 1999). Measurement errors can have different sources: (i) the data collection and sampling method used, (ii) selection bias, (iii) the statistical procedures and variables used, (iv) the econometric models and tests, and (v) an ill-suited survey design.

Obviously, measurement error is a serious problem in both Rocket Science and Economics, as it risks biasing analyses and leading to incorrect conclusions. Measurement error can have severe consequences, and it is therefore essential to assess the quality of measurements in order to reduce such errors. While in Exact Science instruments can be improved through experimentation, this is a difficult and expensive process in Social Science. Accordingly, various statistical methods have been introduced to deal with measurement errors. In Economics, economists often rely on econometric tools and proceed through trial and error until they arrive at the best method or model. This requires estimating different specifications, trying different calibrations, using different variables and data, and applying different methods and repeatability tests to validate an instrument.

Furthermore, it is worth noting that measurement uncertainty and measurement error matter more during periods of crisis than in periods of prosperity, as further errors would amplify losses for investors and for the economic system as a whole. For instance, following the subprime crisis (2007) and the global financial crisis (2008–2009), several recent studies pointed to the high level of measurement errors in economic, monetary and financial data, as well as in the methods and models under consideration. They suggest that the macroeconomic models, financial methods and economic policies used have failed to reflect the reality of the economic system over the last few decades. Their application resulted in significant errors that caused the bankruptcy of some well-known banks, major losses for a number of financial markets and investors, and a considerable risk that threatened the stability of the economic system.

Accordingly, a growing body of recent literature has pointed to several major measurement errors and recommended urgent solutions to improve measurement processes. In this context, a new US institution, the Society for Economic Measurement (SEM), was founded in 2013 by Carnegie Mellon University, the Center for Financial Stability and the University of Kansas to promote economic measurement research. Research led by its first President, Professor William A. Barnett (University of Kansas, USA), recommended using recent developments in econometrics, statistics and experimental economics to improve measurement. Indeed, Barnett et al. (2009) compared simple-sum monetary aggregates with the Divisia monetary aggregate indexes to identify measurement errors in monetary aggregates. Using a Markov switching factor approach, the authors found significant differences between these indexes which, interestingly, occur most markedly around the beginnings and ends of economic recessions and during phases of high interest rates.

Barnett and Chauvet (2011) focused on index number theory and showed that its use could help the Fed to identify a financial crisis and to reduce the misperception of systemic risk. Barnett (2012) also showed that obsolete monetary aggregates and data were often used, especially in economic systems with complex financial instruments, leading the Fed to make inappropriate decisions that prevented it from correctly assessing the excessive risk taken by homeowners and from controlling systemic risk. He concluded that increased financial complexity combined with reduced data quality (through the use of simple-sum monetary aggregates) resulted in a toxic mix.

In spite of the large number of papers that have investigated the issue of measurement errors in the monetary market, there are still few studies on measurement errors in stock markets, even though the latter were at the centre of the recent global financial crisis, with several stock markets losing over 50% of their value in 2008. Almeida et al. (2010) pointed to the problem of errors-in-variables that makes standard estimators inconsistent due to an attenuation bias. Drerup et al. (2014) highlighted measurement error in subjective expectations and showed that such errors may affect portfolio choice through the implementation of heterogeneous decision-making rules.

The present paper also analyses measurement errors in stock markets. In particular, we examine the topic with a focus on financial performance to show that the performance evaluation of stocks is also subject to errors. This choice is justified by the fact that performance evaluation is a key issue for investors, as it provides them with a benchmark to decide on their investments and to define the level of risk they are willing to take. A measurement error when evaluating the performance of financial assets can result in incorrect analyses and severe losses, thereby distorting individual decision-making processes. Indeed, measurement error associated with performance evaluation occurs repeatedly, as most performance ratios require strict hypotheses (normality, symmetry, certainty, restrictive risk measures, estimated risk premia, perfect expectations, etc.), while financial data do not usually satisfy these assumptions, which makes the ratios obsolete and their conclusions inappropriate and incorrect. The literature says little about errors in financial performance measures, as few papers have investigated the issue. Eling and Schuhmacher (2007) compared the ranking of 2763 hedge funds provided by the Sharpe ratio with that of twelve other performance ratios and showed that the ranking was unchanged despite significant deviations from normality. The authors concluded that this indicated the high performance of hedge funds and affirmed that the choice of performance measure does not affect their evaluation.

At the same time, in the aftermath of the global financial crisis, hedge funds were considered not just responsible for the underlying problem but also propagators of the crisis, and they have since been sharply criticized. Moreover, several other, even well-rated, financial assets and investments defaulted, resulting in major losses. Our explanation is that the financial performance of the different assets, including subprime credits and housing assets, was measured with errors. Indeed, at least three main criticisms can be raised. First, the computation of performance ratios requires a measure of risk. However, as risk is not observed in practice, it is often proxied, and an incorrect proxy may lead to significant measurement errors. For example, while the Sharpe ratio uses the standard deviation to proxy total risk, the Treynor ratio focuses only on systematic risk, approximated by the market beta. Thus, unsurprisingly, performance ratios can diverge, and the question becomes: what is the best performance ratio? Second, most well-known performance ratios evaluate risk on average and under strong hypotheses (normality, symmetry, etc.). They therefore cannot capture extreme risk and are less reliable when normality does not hold. Third, to our knowledge, no performance ratio takes systemic risk into account, even though the recent global financial crisis clearly showed the high level of such risk. All in all, given these limits, it seems likely that existing performance ratios provide poor information and somewhat unreliable conclusions.

Accordingly, this paper contributes to present knowledge by highlighting the significant measurement errors that arise in stock markets when evaluating the performance of indexes in developed and emerging markets. To this end, we first apply the usual performance ratios to both conventional and Islamic stock indexes and evaluate their performance under normality assumptions. Thereafter, asymmetry and extreme losses are taken into account: we compute performance using more robust ratios and identify conclusions that contradict those of the standard ratios, suggesting further evidence of measurement errors. Finally, we focus on the systemic risk issue and show that the supposedly best-performing indexes are, at the same time, systemically the riskiest.

While checking for measurement errors, we indirectly identify the most appropriate performance ratio, and it appears that taking extreme risk and systemic risk into account is helpful. Interestingly, our findings lead us to advocate the use of extreme-risk-adjusted performance ratios (using the VaR and Conditional VaR as proxies) that are also robust to systemic risk (e.g., using the CoVaR and MES) in order to correctly evaluate financial assets and reduce stock market measurement errors.

This paper includes four sections. Section 2 presents the econometric methodology. The main empirical results are discussed in Sect. 3. Section 4 concludes.

2 Econometric methodology

This section presents both standard and robust performance ratios. It discusses the limitations of the standard ratios and introduces the most recent performance ratios, which take robust measures of systemic risk into account.

2.1 The first generation of performance ratios

The idea behind the first generation of performance ratios is to evaluate the performance of risky financial assets through the relationship between their expected profitability-risk trade-off and the return on a risk-free asset. Among these first ratios, we cite the well-known Sharpe ratio, introduced by Sharpe (1964) and initially used to classify American mutual funds. The simplicity of this ratio made it very popular, as it enables financial performance to be evaluated using only the first two moments. Formally, it measures the return per unit of risk [Eq. (1)], and the higher the Sharpe ratio \((S_i)\) of an asset i, the better the combined performance of risk and return.

$$\begin{aligned} S_i = \frac{E(R_i)-r_f }{\sigma (R_i)} \end{aligned}$$
(1)

where \(E(R_i)\) denotes the expected return of a risky asset i, \(\sigma (R_i)\) denotes the standard deviation considered as a risk proxy, and \(r_f\) is the risk-free return.
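
As a minimal numerical illustration of Eq. (1), the following Python sketch computes the Sharpe ratio on simulated weekly returns; the series and the risk-free rate are purely hypothetical and do not come from the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, 520)  # hypothetical weekly index returns
rf = 0.0005                             # hypothetical weekly risk-free rate

def sharpe_ratio(r, rf):
    """Eq. (1): excess return per unit of total risk (standard deviation)."""
    return (r.mean() - rf) / r.std(ddof=1)

print(sharpe_ratio(returns, rf))
```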

However, this ratio requires stock returns to be normally distributed. With an asymmetric distribution and investors who are sensitive to such asymmetry, the Sharpe ratio is no longer reliable. In addition, when the standard deviation is used to proxy risk, the resulting risk measure includes both diversifiable (idiosyncratic) and non-diversifiable (systematic) risk. The ratio thus not only penalizes both the upside and the downside potential of portfolio returns, but it is also no longer appropriate, since a rational investor is expected to select an optimal portfolio that eliminates idiosyncratic risk.

As an extension, Treynor (1965) corrected this bias by substituting systematic risk for total risk. Systematic risk is given by the market beta, noted \(\beta _i =\frac{\hbox {cov}(R_i , R_M )}{\sigma ^{2}({R_M })}\), which can be obtained from the market model, where \(\hbox {cov}(R_i , R_M )\) measures the covariance between the return of an asset \(i\,(R_i )\) and that of the market \((R_M )\), while \(\sigma ^{2}({R_M })\) measures the total market risk. The Treynor ratio corresponds to:

$$\begin{aligned} { TR}=\frac{E(R_i)-r_f }{\beta _i } \end{aligned}$$
(2)

As for the Sharpe ratio, the higher the TR of an index i, the better its performance. However, while the TR is preferred to \(S_i \), its main limitation is that it assumes the stability of the market beta, an assumption that may not be reliable (Ben Ameur et al. 2015).

Accordingly, Jensen introduced the Jensen's alpha measure, which provides a performance measure more firmly grounded in theory, being based on the CAPM (Capital Asset Pricing Model). Formally, Jensen's alpha of an asset i, noted \(\alpha _i \), corresponds to:

$$\begin{aligned} \alpha _i = E(R_i)-\left[ {r_f +\beta _i({E(R_M )-r_f})} \right] \end{aligned}$$
(3)

where \(E(R_M )\) denotes the expected market return.
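
The market beta, the Treynor ratio [Eq. (2)] and Jensen's alpha [Eq. (3)] can be sketched along the same lines; the simulated market and index returns below are again purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
r_m = rng.normal(0.001, 0.02, 520)                   # hypothetical market returns
r_i = 0.0002 + 1.1 * r_m + rng.normal(0, 0.01, 520)  # hypothetical index returns
rf = 0.0005

beta = np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)  # market beta
treynor = (r_i.mean() - rf) / beta                           # Eq. (2)
alpha = r_i.mean() - (rf + beta * (r_m.mean() - rf))         # Eq. (3)
print(beta, treynor, alpha)
```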

In the same context, Modigliani and Modigliani (1997) proposed a risk-adjusted measure of portfolio performance, noted \(\hbox {M}^{2}\), which has the advantage of adjusting portfolio risk to market risk. Derived from the Sharpe ratio, it expresses performance in units of percent return. It measures risk-adjusted portfolio returns, i.e., portfolio returns adjusted for portfolio risk relative to that of the market. Formally, it corresponds to:

$$\begin{aligned} \hbox {M}^{2} = r_f +\frac{E(R_i)-r_f }{\sigma (R_i)}\,\sigma (R_M) \end{aligned}$$
(4)

As for previous performance measures, the higher the \(\hbox {M}^{2}\), the stronger the portfolio performance.

The last two measures of the first generation of performance ratios were introduced by Graham and Harvey (1994, 1996) and are noted \(GH_1 \) and \(GH_2 \) respectively. These measures help to overcome two drawbacks of the Sharpe ratio: they provide performance measures while allowing the volatility of the risky asset to differ from that of the benchmark, and they allow the risk-free rate to be variable and, at the same time, correlated with the risky asset. These measures are particularly interesting because, in practice, there is a significant correlation between changes in interest rates (risk-free asset) and the short-term profitability of financial markets (risky asset). Formally, the measure \(GH_1 \) is determined by the combination of the benchmark portfolio and the risk-free asset and corresponds to:

$$\begin{aligned} { GH}_1 =E({R_i})-\left[ {R_f +\left( {\frac{\sigma _i }{\sigma _m }} \right) \left( {E( {R_m })-R_f } \right) } \right] \end{aligned}$$
(5)

\(\textit{GH}_2 \) is defined as the difference between the return of a portfolio combining the risky asset and the risk-free asset, scaled to the same volatility as the market, and the market return. It corresponds to:

$$\begin{aligned} \textit{GH}_2 =\left[ {R_f +\left( {\frac{\sigma _m }{\sigma _i }} \right) \left( {E(R_i)-R_f} \right) } \right] -E(R_m) \end{aligned}$$
(6)
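
A brief sketch of Eqs. (5) and (6) on simulated data follows; all inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
r_m = rng.normal(0.001, 0.02, 520)    # hypothetical benchmark returns
r_i = rng.normal(0.0012, 0.025, 520)  # hypothetical index returns
rf = 0.0005

s_i, s_m = r_i.std(ddof=1), r_m.std(ddof=1)
gh1 = r_i.mean() - (rf + (s_i / s_m) * (r_m.mean() - rf))  # Eq. (5)
gh2 = (rf + (s_m / s_i) * (r_i.mean() - rf)) - r_m.mean()  # Eq. (6)
print(gh1, gh2)
```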

2.2 The second generation of performance ratios

Financial markets have experienced several major swings and corrections over the last three decades [the 1987 stock crash, the Internet bubble (2000), the subprime crisis (2007), the global financial crisis (2008–2009), etc.] that resulted in excess volatility and extreme losses. Furthermore, an analysis of the effects of the recent global financial crisis points to the importance of the high systemic risk that drove the propagation of the crisis from the US to several developed and emerging financial markets. However, the previous performance ratios were not designed to take extreme losses, and therefore tail-distribution risk, into account, which led to several errors and incorrect ratings of risky financial assets (e.g. subprime assets). In order to adjust performance ratios for extreme risk and to better capture the increased linkages between international financial markets, several extensions have been proposed, which we classify as a second generation of performance ratios.

Dowd (2000) proposed an extreme-risk-adjusted ratio similar to the Sharpe ratio, but one that proxies risk by the Value-at-Risk (VaR) rather than by total risk [Eq. (8)]. Recall that the VaR describes the possible loss of an investment that is not exceeded with a given probability over a given horizon.

Under the normal distribution hypothesis, the Value-at-Risk of an asset i corresponds to:

$$\begin{aligned} VaR_i(\alpha )=-\left( {R_i^d +z_\alpha \sigma _i } \right) \end{aligned}$$
(7)

where \(z_\alpha \) is the \(\alpha \)-quantile of the standard normal distribution, \(R_i^d\) is the mean return, and \(\sigma _i\) is the volatility parameter.

Regarding the estimation of the VaR, the historical method is often preferred in practice as it requires no distributional assumption, the distribution being determined from the historical data. However, as the normality hypothesis is often rejected for financial data, the Modified Value-at-Risk, based on the Cornish–Fisher expansion, has been developed; it has the advantage of taking asymmetry (excess skewness) and fat tails (excess kurtosis) into account. The VaR-adjusted performance ratio then corresponds to:

$$\begin{aligned} RVaR_i (\alpha )=\frac{E( {R_i })-R_f }{VaR({R_i })} \end{aligned}$$
(8)
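
For illustration, the Gaussian VaR of Eq. (7), its historical counterpart and the ratio of Eq. (8) can be sketched as follows; the fat-tailed Student-t return series is simulated and is not the paper's data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
r = rng.standard_t(df=4, size=520) * 0.02  # hypothetical fat-tailed returns
rf, alpha = 0.0005, 0.05

var_gauss = -(r.mean() + norm.ppf(alpha) * r.std(ddof=1))  # Eq. (7)
var_hist = -np.quantile(r, alpha)                          # historical VaR
rvar = (r.mean() - rf) / var_hist                          # Eq. (8)
print(var_gauss, var_hist, rvar)
```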

While the VaR enables the probability of extreme losses to be taken into consideration, it has been criticized because it is neither convex nor sub-additive. To overcome these limitations, Martin et al. (2003) proposed another extension, the STARR ratio, a performance measure based on the Conditional Value-at-Risk (CVaR). This risk measure is more informative than the VaR, as it evaluates the mean of large losses beyond the VaR level. The CVaR also satisfies the axioms of Artzner et al. (1999) and is considered a coherent risk measure in the literature (Pflug 2000). Formally, the CVaR is defined as:

$$\begin{aligned} CVaR_i(\alpha )= E\left[ {R_i \left| {R_i \le VaR_i (\alpha )} \right. } \right] \end{aligned}$$
(9)

where \(R_i\) denotes the return of a risky asset i.

Hence, the corresponding performance ratio becomes:

$$\begin{aligned} RCVaR_i(\alpha )=\frac{E( {R_i } )-R_f }{CVaR( {R_i })} \end{aligned}$$
(10)
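
A minimal sketch of Eqs. (9) and (10) using the historical method (simulated inputs, hypothetical values) reads:

```python
import numpy as np

rng = np.random.default_rng(4)
r = rng.standard_t(df=4, size=520) * 0.02  # hypothetical fat-tailed returns
rf, alpha = 0.0005, 0.05

var = -np.quantile(r, alpha)     # historical VaR
cvar = -r[r <= -var].mean()      # Eq. (9): mean loss beyond the VaR level
rcvar = (r.mean() - rf) / cvar   # Eq. (10): STARR-type ratio
print(var, cvar, rcvar)
```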

Biglova et al. (2004) proposed a performance measure called the Rachev ratio, based on the ratio between average extreme earnings and average extreme losses. The Rachev ratio is computed using the same idea as the CVaR risk measure, with the particularity that it also focuses on the right-tail reward potential. Formally, the Rachev ratio corresponds to:

$$\begin{aligned} R\text{-}Ratio\left( {\alpha , \beta } \right) =\frac{CVaR_{1-\alpha } \left( {r_{risk\text{-}free\left( {benchmark} \right) } -r_{portfolio\left( {asset} \right) } } \right) }{CVaR_{1-\beta } \left( {r_{portfolio\left( {asset} \right) } -r_{risk\text{-}free\left( {benchmark} \right) } } \right) } \end{aligned}$$
(11)

where \(r_{portfolio\left( {asset} \right) } \) denotes the return of the portfolio or the index, and \(r_{risk\text{-}free\left( {benchmark} \right) } \) the risk-free return or the market return.

The higher the Rachev ratio, the better the performance. However, this ratio focuses on the extremes of the return distribution and ignores most of the returns in the middle.
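
Using a historical CVaR estimator, Eq. (11) can be sketched on simulated portfolio and benchmark returns; all inputs are hypothetical.

```python
import numpy as np

def cvar(x, a):
    """Average of the worst a-tail outcomes of x (historical CVaR)."""
    return -x[x <= np.quantile(x, a)].mean()

rng = np.random.default_rng(5)
r_p = rng.standard_t(df=4, size=520) * 0.02  # hypothetical portfolio returns
r_b = rng.normal(0.0005, 0.015, 520)         # hypothetical benchmark returns
alpha = beta = 0.05

rachev = cvar(r_b - r_p, alpha) / cvar(r_p - r_b, beta)  # Eq. (11)
print(rachev)
```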

In the same context, the L-performance measure of Darolles et al. (2009) is also recommended, as it is robust to outliers and can accommodate thick-tailed distributions. Formally, it is based on the notion of trimmed L-moments, used in the analysis of catastrophic events such as extreme floods or low flows, and corresponds to:

$$\begin{aligned} L_{p, \alpha } = VaR_{R_i , 0.5} \times \left( {VaR_{-R_i , \alpha } -VaR_{R_i , \alpha } } \right) ^{-1} \end{aligned}$$
(12)

where \(R_i\) denotes the return of a risky asset i and \(\alpha \) the quantile.
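
Under one common reading of Eq. (12), the L-performance relates the median return to an \(\alpha \)-interquantile spread; the following sketch adopts this interpretation (our assumption) on simulated data.

```python
import numpy as np

rng = np.random.default_rng(6)
r = rng.standard_t(df=4, size=520) * 0.02  # hypothetical returns
alpha = 0.05

# Eq. (12) read as: median return scaled by the alpha-interquantile
# range, a dispersion measure robust to outliers.
l_perf = np.quantile(r, 0.5) / (np.quantile(r, 1 - alpha) - np.quantile(r, alpha))
print(l_perf)
```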

Finally, to overcome the limitations of the previous ratios, the Omega ratio [Eq. (13)], proposed by Keating and Shadwick (2002), provides a measure that incorporates all the moments of the return distribution, including skewness and kurtosis. As financial asset returns are not normally distributed, a performance analysis based on their mean and variance alone may induce measurement and evaluation errors. Accordingly, the use of performance ratios that take the probability of extreme losses into account helps to improve the analysis of financial performance and the selection of optimal investments.

$$\begin{aligned} \Omega =\frac{\int \limits _r^b \left( 1-F\left( x \right) \right) \mathrm{d}x}{\int \limits _{a}^r F\left( x \right) \mathrm{d}x} \end{aligned}$$
(13)
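
In sample form, Eq. (13) with threshold r reduces to the ratio of average gains above the threshold to average losses below it; a sketch with a hypothetical threshold:

```python
import numpy as np

rng = np.random.default_rng(7)
r = rng.standard_t(df=4, size=520) * 0.02  # hypothetical returns
tau = 0.0005  # hypothetical threshold (e.g., the risk-free rate)

# Empirical Omega: expected gains above tau over expected losses below tau.
omega = np.mean(np.maximum(r - tau, 0)) / np.mean(np.maximum(tau - r, 0))
print(omega)
```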

Overall, the second generation of performance ratios contributes by taking extreme losses into consideration. However, systemic risk is still not included, even though this is badly needed: in the aftermath of the global financial crisis (2008–2009), systemic risk was found to be responsible for major losses even for financial assets, investments and institutions with supposedly superior financial performance (according to the previous performance ratios). Accordingly, besides evaluating financial performance with these ratios, taking systemic risk into account is required to complete the analysis of financial performance and to minimize measurement and evaluation errors.

2.3 The measure of systemic risk contribution

With regard to systemic risk, this sub-section discusses two measures that help to improve the evaluation of financial performance by taking the contribution of systemic risk into account: (i) the CoVaR measure, and (ii) the marginal expected shortfall approach.

2.3.1 The CoVaR measure

Adrian and Brunnermeier (2011) proposed a measure of systemic risk based on the conditional Value-at-Risk, called the CoVaR. If we consider two assets i and j, the CoVaR is defined as the VaR of one asset at some probability quantile, conditional on the other asset being at its own VaR level for the same quantile. Recall that the VaR of asset i at level \(\alpha \) is the threshold such that \(P\left( {R_i \le VaR_i(\alpha )} \right) =\alpha \), where \(R_i \) denotes the return of asset i. The VaR of asset j, conditional on the event \(\left( {R_i =VaR_i(\alpha )} \right) \), is denoted by \(CoVaR_{j\left( i \right) }(\alpha )\), where \(\alpha \) is the quantile. Accordingly, we write:

$$\begin{aligned} P\left( {R_j\le CoVaR_{j\left( i \right) } (\alpha ) \left| {R_i=VaR_i(\alpha )} \right. } \right) =\alpha \end{aligned}$$
(14)

Thus, the contribution of the asset i to the risk of the asset j corresponds to:

$$\begin{aligned} \Delta CoVaR_{j\left( i \right) } (\alpha )= CoVaR_{j\left( {R_i=VaR_i}\right) } (\alpha )-CoVaR_{j\left( {R_i=median_i } \right) } (\alpha ) \end{aligned}$$
(15)

where \(\hbox {CoVaR}_{j\left( {R_i=median_i } \right) } (\alpha )\) corresponds to the CoVaR when asset i is at its median, i.e., when the market is not in distress.

By using the CoVaR and the \(\Delta CoVaR\) measures alongside the Treynor ratio, we take the further dynamic impact of the world market risk on each index into account. Accordingly, our contribution consists of proposing the use of a dynamic measure to evaluate financial performance. Indeed, measuring systemic risk with the CoVaR captures the direct and indirect spillover effects based on the covariation between the distribution tails of specific indices and the benchmark index, while the use of the \(\Delta CoVaR\) enables us to incorporate the new information provided by the sample data at each time t. In practice, the CoVaR measure is estimated with the quantile regression method proposed by Koenker and Bassett (1978), which has the advantage of not requiring any hypothesis regarding the return distribution. Furthermore, the quantile regression method analyzes the impact on the distribution tails rather than on the average, as is the case with the Least Squares method. It also allows us to mitigate the problem of aberrant values and widely dispersed errors, as quantiles are less sensitive to outliers.
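
A minimal sketch of Eqs. (14) and (15) using the QuantReg estimator of the Python statsmodels package is given below; the simulated returns and the 5% quantile are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
r_i = rng.standard_t(df=4, size=520) * 0.02              # hypothetical index returns
r_j = 0.6 * r_i + rng.standard_t(df=4, size=520) * 0.01  # hypothetical system returns
alpha = 0.05

X = sm.add_constant(r_i)
fit = sm.QuantReg(r_j, X).fit(q=alpha)  # quantile regression of j on i

var_i = np.quantile(r_i, alpha)         # index i at its VaR level
med_i = np.quantile(r_i, 0.5)           # index i at its median
covar = fit.params[0] + fit.params[1] * var_i  # Eq. (14)
delta_covar = fit.params[1] * (var_i - med_i)  # Eq. (15)
print(covar, delta_covar)
```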

As for the \(\Delta CoVaR\), we use a sliding-window technique, which enables us to capture the contribution of extreme market movements to the extreme risk of each index. This method allows us to focus exclusively on the distribution tails of each index while reproducing the spillover effect between the world market risk (asset j) and a given asset i. Accordingly, we proceed in two steps. First, we estimate the sensitivity of each specific index (i) relative to the benchmark index (j). Second, we estimate the contribution of each specific index to the systemic risk of the global market index.

2.3.2 The marginal expected shortfall approach

The expected shortfall (ES) risk measure, introduced by Artzner et al. (1999), is related to the VaR but is more appropriate, as it does not suffer from the VaR's lack of subadditivity. The idea consists of measuring the tail dependence in the returns of market indices and considering the tail dependence estimates as a measure of systemic risk. Acharya et al. (2010) define the marginal expected shortfall (MES) as the marginal contribution of firm J to the expected shortfall of the financial system. In this paper, we adapt the MES approach from the risk study of financial firms to the study of international systemic risk. This enables us to analyze the contagion effect related to extreme price variations and to identify the markets with the highest contribution to systemic risk across time.

Formally, the MES for firm J is the expected value of the stock return \(R_J \), conditional on the market portfolio return \(R_i \) being at or below its sample p-percent quantile. Let C be a constant corresponding to the chosen level of market “tail risk”; we can then write:

$$\begin{aligned} MES(R_J , p)=E\left( {R_J \left| {R_i <VaR\left( {R_i , p} \right) =C} \right. } \right) \end{aligned}$$
(16)

The assets that contribute most significantly to the market decline are those with the highest levels of MES. These assets are the most likely candidates to be systemically risky.
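
A simple historical counterpart of Eq. (16), conditioning on the market being at or below its empirical p-quantile, can be sketched as follows; this simplification (not the full estimation strategy used in the paper, which is described next) uses simulated returns.

```python
import numpy as np

rng = np.random.default_rng(9)
r_mkt = rng.standard_t(df=4, size=520) * 0.02               # hypothetical market returns
r_j = 0.8 * r_mkt + rng.standard_t(df=4, size=520) * 0.015  # hypothetical index returns
p = 0.05

c = np.quantile(r_mkt, p)      # market "tail risk" threshold C
mes = r_j[r_mkt <= c].mean()   # Eq. (16): mean return of j in the market tail
print(mes)
```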

In practice, we use the methodology proposed by Brownlees and Engle (2010) to estimate the MES from market index i and market index j. This approach consists of a multi-stage estimation. In the first step, GARCH models are used to generate conditional volatilities and standardized residuals. In the second step, the ADCC specification is used to determine the conditional asymmetric correlations; the tail expectation is then estimated with the nonparametric kernel approach proposed by Scaillet (2005). The main advantage of this approach is that the estimator does not require any hypothesis regarding the distribution of index returns. Furthermore, Brownlees and Engle (2010) showed that this approach yields smooth estimates of the MES, implying that the dynamic MES estimates are not over-sensitive to small changes in the systemic loss threshold.

Formally, let \(r_{i, t} \) and \(r_{j, t} \) be the returns of the two market indices, with:

$$\begin{aligned}&\displaystyle r_{i, t} = \sigma _{i, t} \varepsilon _{i, t} \nonumber \\&\displaystyle r_{j, t} = \sigma _{j, t} \varepsilon _{j, t} \end{aligned}$$
(17)
$$\begin{aligned}&\displaystyle r_{i, t} = \sigma _{i, t} \rho _{i, t} \varepsilon _{j, t} +\sigma _{i, t} \sqrt{1-\rho _{i, t}^2}\, \zeta _{i, t} \nonumber \\&\displaystyle \hbox {with } \left( {\varepsilon _{j, t} , \zeta _{i, t} } \right) \sim F \end{aligned}$$
(18)

where \(\sigma _{i, t} \), \(\sigma _{j, t} \) and \(\rho _{i, t} \) are respectively the volatilities of market indices i and j and the conditional correlation between the two market indices. The disturbances \(\varepsilon _{j, t} \) and \(\zeta _{i, t} \) are assumed to be independently and identically distributed. The joint distribution F allows the random variables to be uncorrelated but not independent.

3 Empirical analysis

3.1 The data and preliminary analysis

The data include conventional and Islamic stock indexes from developed and emerging markets: the Dow Jones Developed Market (DJDM), the Dow Jones Emerging Market (DJEM), the Dow Jones Islamic Developed Market (DJIDM) and the Dow Jones Islamic Emerging Market (DJIEM) indexes, with the Dow Jones Global Market (DJGM) index as the benchmark. By using DJ indexes for both conventional and Islamic indexes, we minimize the risk of heterogeneity in the data under consideration. Furthermore, this sample allows us to conduct a dual comparison of financial performance: developed versus emerging markets and conventional versus Islamic markets. As for the risk-free interest rate \((r_f )\), we use the 90-Day Treasury Bill as a proxy. Data are weekly and cover the period April 2004–March 2014, which allows us to evaluate financial performance in the context of the financial crisis as well as during calm periods (before July 2007). Interestingly, our sample covers several crises and stylized facts linked to the Gulf War, the subprime crisis and the recent global crisis, suggesting further evidence of time-variation in stock markets and investors’ behavior. Stock indexes are closing prices, and all data are obtained from the Bloomberg database.

To get a preliminary idea of the performance of the different stock markets under consideration relative to the benchmark index (the DJGM), we plot each index against the DJGM index in Fig. 1 and note several results. First, all of the indexes seem to have undergone a major correction during the subprime and financial crises, while they appear to have performed better over the last few years. Second, the pegging of stock indexes to the DJGM is more marked for developed indexes than for emerging ones. Third, the spread between the DJGM and the Islamic indexes seems significantly reduced. Fourth, the strong time-variation of these indexes suggests that their performance also evolves over time and that they share a common trend by the end of the period, which may be associated with a further systemic risk effect.

Fig. 1 The dynamics of Dow Jones indexes

Next, we applied three unit root tests (ADF, PP and KPSS) and found that all of the series are I(1). We then focused on the stock return series, computed as the first difference of the logarithm of stock prices. From Table 1, we note that all stock returns are close to zero on average, which possibly reflects the correction induced by the financial crisis. The DJEM shows the highest mean return and standard deviation, which is in line with previous studies on emerging markets. The third and fourth moment statistics point to significant asymmetry and excess kurtosis, while the strong rejection of normality confirms that the distribution tails are fatter than those of the normal distribution. Furthermore, the negative skewness values may suggest further nonlinearity and non-normality in the data, which makes the use of some financial performance ratios inappropriate.
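
For illustration, the construction of log returns and a normality check (e.g., a Jarque–Bera test) can be sketched in Python on a hypothetical price series standing in for a Dow Jones index.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.02, 521)))  # hypothetical prices

r = np.diff(np.log(prices))              # weekly log returns
jb_stat, jb_pval = stats.jarque_bera(r)  # normality test
print(r.mean(), r.std(ddof=1), stats.skew(r), stats.kurtosis(r), jb_pval)
```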

Table 1 Descriptive statistics and normality test
Table 2 First generation performance measures for (a) the whole period, (b) sub-period 1 and (c) sub-period 2

3.2 Measurement errors in financial performance analysis

We computed the different financial performance ratios and analyzed the results. To better apprehend the evolution of the financial performance of the stock indexes, we break the sample down into two subsamples: (i) subsample 1, before the subprime crisis (April 2004–July 2007); and (ii) subsample 2, after the onset of the subprime crisis (August 2007–March 2014), and we recompute the ratios for each. The main results for the first generation performance ratios are reported in Table 2a, b, c.

3.2.1 Financial performance analysis with first generation ratios

Our analysis over the whole period points to a higher financial performance for the DJEM, while the DJIEM seems to show the lowest performance (Table 2a). After breaking the sample down into two subsamples to take the crisis effect into account, we show that before the crisis the DJEM is still the best-performing index according to all the first generation performance ratios, while the DJIDM supplants the other indexes after the crisis.

This conclusion is in line with several previous studies (Jawadi et al. 2014; Arouri et al. 2013; etc.) which show that the crisis had a knock-on effect on Islamic finance, which became a form of hedging and refuge for investors. However, these results should be considered with caution, as the data properties make the application of the first generation performance ratios inappropriate and their conclusions incorrect. Indeed, the data under consideration do not follow a normal and symmetrical distribution, which makes reference to the Sharpe ratio incorrect. Furthermore, the Treynor and Jensen conclusions may not hold, as these measures assume stability of the market beta, while Fig. 1 highlights further evidence of time-variation. Finally, it is important to note that all of these ratios adjust excess returns to risk (total or systematic) measured on average, but they do not take extreme market losses, as observed in 2008, into account. Accordingly, the application of these ratios may result in incorrect conclusions and measurement errors. To remedy this inconsistency, we also applied the second generation ratios for the full period and the two sub-periods.

3.2.2 Financial performance analysis with second generation ratios

We computed the second generation ratios and report them in Table 3a, b, c. Several interesting results emerge. First, over the whole period, the DJEM seems to be the best performer according to the RVaR, the RCVaR and the Omega ratios, while the DJIEM is significantly better according to the Rachev ratio. However, the DJDM performs better than the other indexes according to the L-performance ratio. Second, when we focus on the first subsample, we note that the DJEM is preferred by the RVaR, the RCVaR, the Omega and the L-performance ratios, while the Rachev ratio gives preference to the DJIDM. With regard to the second subsample, the divergence between ratios is more explicit (and arises even within the same ratio for different probability levels). Indeed, the DJDM is preferred only by the L-performance ratio, while the RCVaR prefers the DJEM. As for the RVaR (1%) and the Rachev ratio, the DJIEM supplants the other markets, while the RVaR (5%) and the Omega ratios are higher for the DJIDM. Overall, these findings point to two interesting conclusions. On the one hand, it seems important and necessary to include extreme losses and price variations when measuring financial performance, as this enables the asymmetry and heavy-tailed distribution inherent to the data to be taken into consideration. On the other hand, the analysis based on the second generation ratios improves on the previous analysis by using more robust financial performance ratios. For example, the Rachev ratio takes the specific properties of these indices into account, such as the fact that asset return distributions are fat-tailed and skewed, by relying on a coherent risk measure and focusing on the quantiles of the distribution. The RVaR and RCVaR (at 1 and 5%) focus on the extreme values of the historical distribution. Overall, this yields results that differ from the previous analysis, suggesting further evidence of measurement and evaluation errors. However, the results of the second generation analyses are not unanimously conclusive, as they vary across samples and according to the ratio under consideration.

Table 3 Second generation performance measures for (a) the whole period, (b) subsample 1 and (c) subsample 2

3.2.3 The systemic risk issue

Furthermore, while the ratios in Table 3 are more robust and risk-adjusted, none of them can take the systemic risk issue into account. The latter is all the more important in that it can be transmitted rapidly from one market to another (as in 2008) and can consequently cripple other markets, even those showing high performance. Accordingly, it is essential to take the contribution of systemic risk into account when evaluating financial performance, as recommended in the Basel III agreement. Before moving on to examine the systemic risk contribution, we illustrate the level of market linkages during calm and crisis periods by computing the unconditional correlation matrix reported in Table 4a, b, c. We note high unconditional bilateral correlations between all of the stock markets, as well as with the global benchmark, indicating strong linkages between these markets and suggesting further evidence of systemic risk between these areas. We also note that most correlations (even with Islamic stock markets) increase after the global financial crisis. However, the computation of this matrix also requires normality, as it is based on the Pearson correlation test. The rejection of normality and the excess kurtosis in our data therefore make it less reliable.

Table 4 Correlation matrix for (a) the whole period, (b) subsample 1 and (c) subsample 2

In order to fill this gap, we check for systemic risk by investigating the sensitivity of our indexes to the global market using quantile regression estimation. This framework has a triple advantage. First, it provides a robust measure of the sensitivity of our indexes to the global market by investigating the quantile properties, enabling the tails of the distribution of our data to be taken into account; the quantile regression thus provides a more robust beta measure than that employed by the Treynor index. Second, this robust investigation indirectly enables us to evaluate the level of systemic risk between the markets under consideration and the global market. Third, the quantile regression estimates are used to generate estimates of the \(\Delta CoVaR\) measure. We report the main results of the quantile regression in Table 5.

Table 5 Beta estimation with quantile regression method

From Table 5, we observe that all parameters are statistically significant. The DJEM is more sensitive to global market variations than the other markets, with a beta higher than 1 for the two quantile values \(\alpha =1~\% \) and \(\alpha =5~\% \). This suggests that the DJEM may amplify global market risk, and it also suggests strong evidence of systemic risk transmission between this market and the benchmark market. We can also note that, for the two quantiles \(\alpha =1~\% \) and \(\alpha =5~\% \), the conventional emerging and developed indexes are more sensitive to the global market than the Islamic indexes; for example, the DJIEM presents a lower beta value. Overall, this analysis confirms the hypothesis of systemic risk and points to the risk that financial performance ratio calculations fail to take this systemic risk into account.

3.2.4 The measure of the systemic risk effect

Hereafter, in order to measure the contribution of systemic risk and to identify which market shows the greatest systemic risk, we apply two different non-parametric approaches: (i) the \(\Delta CoVaR\) approach and (ii) the marginal expected shortfall. Both involve dynamic analyses, but while the former only provides a threshold for losses, the latter enables us to measure the average losses for a given probability. In practice, for both approaches, our estimation covers two levels: \(\alpha =1~\% \) and \(\alpha =5~\% \). We report the main results of the \(\Delta CoVaR\) in Fig. 2a, b, while Fig. 3a, b reports those of the marginal expected shortfall.

In practice, we first apply the dynamic \(\Delta CoVaR\) for the two quantiles \(\alpha =1~\% \) and \(\alpha =5~\% \) using a sliding-window approach over the period 04/2004 to 03/2014, such that the estimation interval at each time t is \([t-K, t]\), with the time delay K equal to two years, as in previous studies. This choice gives us a period long enough to provide sufficient information on the distribution tails. Accordingly, we note that both the DJEM and DJIEM indexes show significant sensitivity to the DJGM index, with the DJEM displaying the higher dependency. The latter, considered by the first generation financial performance ratios as the best-performing index, is thus also the most systemically risky one. In other words, this index is the source of the greatest systemic risk, involving major losses. This finding illustrates a form of measurement and evaluation error in the rating of this financial asset provided by the standard performance ratios. It is also important to note that the systemic risk apparent in Fig. 2a, b reached its maximum level in 2008, which implies that measurement errors are higher in crisis periods than in calm periods.

Fig. 2 a Delta CoVaR for \(\alpha =5~\% \). b Delta CoVaR for \(\alpha =1~\% \)

However, the \(\Delta CoVaR\) method, even though based on a dynamic approach and quantile regression, only computes the loss threshold. In order to compute expected losses in the mean, we use the marginal expected shortfall (MES) approach and report the main results in Fig. 3a, b. From these figures, we again note that the DJEM has the highest level of systemic risk, as it is associated with the highest losses. It is also the most closely linked to the DJGM index. This means that, even when evaluating losses in the mean with the MES approach, our findings identify the DJEM index as the most systemically important one. As with the \(\Delta CoVaR\) method, the findings from the MES approach point to further stock market measurement and evaluation errors when the usual performance ratios are considered. These ratios over-rated subprime credits, which, because of the rapid transmission of systemic risk, led to a major global financial crisis in 2008.

Fig. 3 a MES for \(\alpha =5~\%\). b MES for \(\alpha =1~\%\)

4 Conclusion

This paper focuses on a hot topic linked to measurement errors. Tackling it from a financial performance perspective, it shows that the use of first generation performance ratios can lead to incorrect conclusions. Indeed, these ratios are conditioned by strong assumptions (normality, symmetry, etc.), and the frequent violation of these hypotheses makes them inappropriate. Accordingly, the recent financial literature has identified a second generation of financial performance ratios, which have the advantage of not requiring such hypotheses and of being risk-adjusted while taking extreme losses into account. While these robust ratios improve the performance analysis and provide more precise results, they are not suited to taking evidence of systemic risk into consideration. Systemic risk is, however, considerable, especially during crisis periods. From an empirical point of view, the evaluation of the systemic risk contribution with two non-parametric approaches over the period 2004–2014 shows that the DJEM index, which is considered the best-performing financial asset, appears to be the major source of systemic risk. In addition, the DJIEM, despite being considered a hedge, seems to amplify systemic risk. These findings highlight the importance of measurement errors in stock markets due to the application of such financial performance ratios. Such errors not only induce major losses and even bankruptcy for several banks, financial institutions and markets, but can also lead to a global financial crisis and a major economic downturn for several developed and emerging economies. A future extension of this research would be to develop a new financial performance ratio that incorporates the systemic risk dimension.