1 Introduction

According to Tobin (1958); Hanoch and Levy (1969); Arditti and Levy (1975); Leland (1999), and Joro and Na (2006) the mean-variance frontier is only consistent with traditional utility theory if either stock returns are normally distributed or the utility function is quadratic in nature. In fact, since both assumptions are violated if returns are skewed, normality of returns is regarded as the only sufficient condition. Moreover, Ang et al. (2006) further this discussion by estimating the downside risk of equities; equities which are positively skewed should be regarded as more risky thereby providing higher returns. However, they estimate downside risk as market driven rather than asset specific.

Given that stock returns are not necessarily normally distributed (Lau et al. 1990; Turner and Weigel 1992; Campbell and Hentschel 1992; Arditti 1975; Mandelbrot 1963; Fama 1965; Press 1967; Praetz 1972; Blattberg and Gonedes 1974; Simkowitz and Beedles 1980; Smith 1981; Ball and Torous 1983; Kon 1984; So 1987; Gray and French 1990), investors need to be compensated for further moments (Kraus and Litzenberger 1976; Kane 1982; Ho and Cheung 1991). More specifically, the risk-return paradigm needs to be re-evaluated to account for skewness when pricing stocks and mutual funds.

We contribute to the literature by expanding the discussion on downside risk; that is, the extra risk associated with positively skewed returns. Unlike Ang et al. (2006), we develop a measurement for said downside risk as an asset specific measurement which is independent of overall market conditions. In this regard, we look for a measurement that can (1) depend on positive skewness which signals high downside risk, (2) be fund-specific in nature, and (3) be easily quantifiable; Data envelopment analysis meets all this requirements. Using DEA, we estimate an asset specific relative efficiency score which we expect can improve on current empirical asset pricing models. Our contribution works twofold: (1) we include efficiency scores in traditional empirical asset pricing model to test the hypothesis that investors do think about efficiency when investing, and (2) efficiency can help signal differences between investment assets, as it is the case of Socially and Responsible Investing assets.

The use of DEA allows for multiple inputs and outputs to be incorporated into a production framework while still offering a single ex-post performance index, thus allowing easy comparison between funds and even groups of funds. We use this to compare constrained and unconstrained investments. DEA estimates a production possibility frontier which envelops a production possibility set as tightly as possible. One can estimate a performance index based on the distance between a specific fund and the aforementioned frontier. Funds on said frontier are deemed 100% efficient.

Several studies have ranked the performance of mutual funds using DEA scores, but to our knowledge, such studies have been limited to only comparing efficiency scores among different assets [see for example Murthi et al. (1997), McMullen and Strong (1998), Morey and Morey (1999), Choi and Murthi (2001), Basso and Funari (2001, 2003, 2005), Galagedera and Silvapulle (2002), Gregoriou (2003, 2006), Haslem and Scheraga (2003), Chang (2004), Darling et al. (2004), Gregoriou and McCarthy (2005), Gregoriou et al. (2005), Gregoriou and Chen (2006), Eling (2006), among others].

Our study extends these findings by including efficiency scores in factor regressions. We establish an efficiency score as a subjective characteristic of a fund at a given month. The efficiency value is defined as \(0<1-P^{\textit{BCC}}<1,\) where 1 represents a fund which is 100% inefficient. Overall, our results suggest that the use of our efficiency score can help decrease average mispricing errors. We find that including the efficiency score in time-series regressions decreases average alphas from 18 basis points to 13 basis points in absolute values.

Furthermore, this study updates the literature by measuring the performance of smaller asset sub-samples that impose limiting rules on the available asset universe, as is the case of Socially Responsible Investing,Footnote 1 also known as Sustainable and Responsible Investing (SRI Investments are described in more depth in “Appendix”). This set of assets is self-restricted to investing only in specific types of companies, thus effectively influencing portfolio performance based on the mean-variance frontier. Hence, we expect that SRI holds more diversifiable risk. But the performance of these constrained assets has been debated in the literature without consensus (Markowitz 1952, 1959). Yet investment in such funds has increased during the last decades according to the Social Investing Forum.Footnote 2 We expand the literature by not only using the efficiency measurement to rank the funds, but also to measure the level of over- or under-performance.

SRI accounts for a small section of the overall American market, and as such, direct comparison with the asset universe will result in a size bias. To overcome this problem, we compare SRI with unconstrained investments as follows: first, given an efficiency score and a fund type, we rank funds in quintiles. Cross-tabulationsFootnote 3 of efficiency score quintiles for SRI and traditional funds show that there are more ethical funds within the low efficiency quintiles. There are 4.27% more socially responsible mutual funds in Q1 and Q2 compared to traditional investing, while there are 3.75% more orthodox funds in Q4 and Q5 compared to ethical funds.

Second, we include an ethical dummy in a cross-section regression, which includes the efficiency score, to test if efficiency has an effect on such asset sub-samples. We find that ethical funds receive a discount between 7 and 12 basis points in next period returns. Controlled for efficiency, this is taken as evidence that ethical mutual funds underperform traditional investment assets.

In addition, we provide evidence of a significant difference in performance pre- and post- crises. Based on a Chow test, we define 2 structural breaks in the data corresponding to the dot-com bubble and the real state bubble. Our findings show that alphas have increased from pre-crisis levels suggesting that mutual funds have enjoyed an extend period of abnormal returns following the structural breaks. Moreover, we provide evidence that although investors do seem consider efficiency when investing, this consideration was stronger for the pre-crisis period in comparison with the post-crisis periods.

Finally, it should be noted that our findings not only expand on the traditional empirical estimation of asset pricing models, they also provide the means of comparison between different asset classes, as exemplified by the use of SRI investing. The remainder of this paper is structured as follows. Section 3 develops the estimation of an efficiency score and its use under an investment strategy paradigm as well as describing the data. Finally, Sect. 4 provides concluding remarks.

2 Estimating Mutual Funds Efficiency: A DEA Approach

Portfolio performance is categorized by a mean-variance frontier, which results in a benchmark for comparison of the assets’ risk adjusted returns. In this regard, risk is proxied by the portfolio’s standard deviation [see Treynor 1965; Lintner 1965; Sharpe 1966; Jensen 1968; among others]. However, when the distribution is skewed, the risk-return relationship fails to account for the difference between the upside and downside risk. As positively skewed returns are inherently riskier due to the increased likeliness of a loss. Thus we look into an operational model which allows for multiple parameters of risk, allowing for the inclusion of higher moments.

Data envelopment analysis (DEA), as proposed by Charnes et al. (1978, 1979), focuses on an operational research methodology which allows the measurement of the relative efficiency of decision making units (assets) in the presence of multiple inputs and outputs. DEA ranks the efficiency of decision making assets and yields a single value which can be then easily assigned to the unit. In fact, with DEA, we can estimate a time specific score and assign that score as a characteristic of the asset, which we will call efficiency. In terms of investment assets, one can estimate monthly efficiency scores of each mutual fund and then compare them over time.

Several studies have analyzed the performance of mutual funds using DEA [see Murthi et al. (1997), McMullen and Strong (1998), Morey and Morey (1999), Choi and Murthi (2001), Basso and Funari (2001, 2003, 2005), Galagedera and Silvapulle (2002), Gregoriou (2003, 2006), Haslem and Scheraga (2003), Chang (2004), Darling et al. (2004), Gregoriou and McCarthy (2005), Gregoriou et al. (2005), Gregoriou and Chen (2006), Eling (2006), among others]. Our study extends their findings by incorporating the efficiency score into factor regressions.

We expect DEA to give a sound estimation of a performance index by incorporating multiple inputs and outputs simultaneously. DEA assumes a production framework in which one uses inputs to produce outputs—where inputs are defined as bearers of risk while outputs are bearers of wealth. Under normality of returns, only the variance and mean should suffice as measurements for risk and wealth. However, if distributions are non-normal, as it is often the case, then higher moments should be required (Glawischnig and Sommersguter-Reichmann 2010).

It is expected that different assets will have different statistical properties, such as skewed return distributions with fat tails, which makes relevant the inclusion of at least the third moment when characterizing the relationship between risk and return; it is thus paramount to check the normality of returns.

2.1 Data Specifications

We look at the performance of mutual funds listed in the Center of Research in Security Prices (CRSP). The data consists of monthly returns from January 1960 to September 2012. We also differentiate those funds that can be considered Socially Responsible Investments (SRI. We identify as SRI funds those that match the keywords “Ethical”, “Social”, “Socially”, and “SRI” within the fund name. This reveals that the first ethical mutual fund appeared in 1984, and thus we limit our sample to assets post-January 1984.

In order to use DEA, we require fund specific data based on the individual fund’s moments. This implies a restriction on the minimum number of observations per fund. We required funds to have a full trading history and a minimum of 3 years of existence. Finally, following Carhart (1997) we omit sector funds, international funds, balanced funds, and money market funds. The resulting dataset consists of an asset universe of 43,844 mutual funds, of which ethical funds account for only 142 funds.

Table 1 Monthly descriptive statistics of mutual funds

Table 1 shows summary statistics regarding ethical funds and orthodox mutual funds from 1984 to 2012. Simple inspection reveals that the maximum yearly average returns of SRI funds are smaller than orthodox funds, while average minimum returns are usually never as low as that of traditional funds. Intuition will suggest that these types of investments are less risky than traditional investments and thus their returns are less volatile. Table 1 shows that the standard deviation of returns for ethical investment is usually lower than that of traditional investment except for brief periods of time. Finally, average returns seem to be higher for ethical funds than for orthodox funds –not adjusted for risk.

2.2 Normality Testing

In order to measure normality, we use the Jarque Bera (JB) test statistic (Jarque and Bera 1980, 1981, 1987), formally:

$$\begin{aligned} JB_j =\frac{T}{6}\left[ {S_j^2 +\frac{K_j^2 }{4}} \right] \;\sim \;\chi ^{2}\left( k \right) \end{aligned}$$
(1)
$$\begin{aligned}&\text {where}\nonumber \\&S_j =\frac{E_j \left( {r_t -r_d } \right) ^{3}}{\sigma _j^3 },\text { the skewness of monthly returns of fund }j\nonumber \\&K_j =\frac{E_j \left( {r_t -r_d } \right) ^{4}}{\sigma _j^4 }-3, \text { the excess kurtosis of monthly returns of fund }j\nonumber \\&r_t ,\text { monthly return}\nonumber \\&r_d ,\text { average returns}\nonumber \\&\sigma _j ,\text { standard deviation. }\nonumber \end{aligned}$$

Normality is rejected when \(JB_j \) is larger than 5.99 for the 5% significance level and 9.21 for the 1% significance level. We find that at the 5% level, 20839 funds are non-normally distributed. That is, about 69.2% of ethical funds and only about 50.4% of traditional funds can be considered normally distributed. These results are summarized in Table 2 Panel A.

Table 2 JB statistics for the fund universe accounting the sample period from January 1984 to September 2012

It is, however, expected that the results could be bias towards non-normality due to sample interdependence.Footnote 4 Thus we bootstrap the returns to check how many funds will reject true normality under traditional \(\chi ^{2}\) values. We estimate three-year-rolling-period betas, which we then use to create normal returns that mimic fund performance. Given the fund’s specific \(\beta _{i,t} \), we estimate what should be regarded as normally distributed returns given by \(r_{i,t} =\alpha +\beta _i \left( {r_m -r_f } \right) +\varepsilon \) where \(r_m \sim N\left( {\mu ,\sigma _m^2 } \right) \text { and }\varepsilon \sim N\left( {0,\sigma _r^2 } \right) \).

Using standard \(\chi ^{2}\) values, we find that about 10% of the funds would reject normality with a 5% confidence level and 6% of funds would be rejected with a 1% confidence level. This proves the need of sample specific \(\chi ^{2}\) values. These are 9.83 for the 5% level and 24.83 for the 1% level. Under the new \(\chi ^{2}\) values, normality is rejected for a total of 18431 funds under a 5% confidence level, proving the need for moments beyond variance.

2.3 The DEA Model

We estimate fund-specific relative performance scores, which allow for ex-post ranking based on the data envelopment analysis. This value is estimated based on a 3-year monthly rolling window starting on January 1984. This results in the first efficiency score corresponding to December 1986 based on the time window January 1984 – December 1986.

Following Galagedera and Silvapulle (2002) and Glawischnig and Sommersguter-Reichmann (2010), we see the need for a DEA method that is input oriented with variable returns of scale (VRS). The use of this model can be justified because (1) constant returns of scale assume a constant ratio between inputs and outputs which is not consistent with the randomness of stock returns, and consequently mutual fund returns; (2) the assumption of variable returns to scale is further justified by the fact that alternative investment funds might operate in regions of increasing or decreasing return to scale due to, for example, minimum investment requirements or fixed cost digression; (3) input oriented models are preferred because the output-oriented models work under the assumption of how much the outputs would have to increase for the fund to become efficient while maintaining inputs constant, which is not feasible from a mutual fund manager’s stand point; and (4) Glawischnig and Sommersguter-Reichmann argue that the use of the BCC model is advisable whenever ratios are used as inputs or outputs. Evidence under input oriented with CRS can be seen in Lim et al. (2014) for the Korean market.

We, therefore, use the input oriented Banker, Charnes, and Cooper DEA model (BCC) under variable returns of scale. This DEA model estimates a production possibility frontier (PPF) to a production possibility set (PPS) that envelops said PPS as tightly as possible. When comparing the DEA frontier with standard portfolio theory (the Markowitz portfolio theory), the DEA frontier is essentially the same but with a different approach: the DEA frontier is the convex combination of best practicesFootnote 5 followed by industry, given the multiple hyperplanes resulting from the use of multiple inputs and outputs. Given that frontier, one can estimate a performance index based on the distance between a specific fund and the aforementioned frontier.

Formally, given \(j=1,\ldots ,N\) funds where each uses \(x_m\; \forall \;m=1,\,\ldots ,M\) inputs to produce \(y_r \,\,\forall \,r=1,\,\ldots ,\,R\) outputs, then the performance estimate for the \(k^{th}\) fund, \(P^{\textit{BCC}}\), is given by the solution to the linear programming (LP) problem:

$$\begin{aligned} P^{\textit{BCC}}= & {} \min \theta \;\;s.t.\nonumber \\ \sum \limits _{j=1}^N x_{mj} \lambda _j\le & {} \theta \cdot x_{mk} \quad \forall \,\,m=1,\,\ldots ,\,\hbox {M}\nonumber \\ \sum \limits _{j=1}^N y_{rj} \lambda _j\ge & {} y_{rk} \quad \forall \;r=1,\ldots ,\,\hbox {R}\nonumber \\ \sum \limits _{j=1}^N \lambda _j= & {} 1\;with\;\lambda _j \ge 0\quad \forall \,j \end{aligned}$$
(2)

The LP given in 2 represents the percentage of efficiency of each particular fund\(\,k\). Problem 2 minimizes the equiproportionate (radial) contraction \(\theta \) of the inputs produced by unit k. \(P^{\textit{BCC}}\) satisfiies \(0\le P^{\textit{BCC}}\le 1\) where \(P^{\textit{BCC}}=1\) represents a fund which is 100% efficient. That is, given the input-output combination which characterizes the best practice frontier, \(P^{\textit{BCC}}=1\) represents the \(k^{th}\) fund which is on the actual frontier and thus the fund is fully efficient. Any deviation from the PPF, given by \(P^{\textit{BCC}}<1\), will result in inefficiencies. Furthermore, the linear program uses exogenous weights, \(\lambda _j ,\) to fit the best linear combination of all funds given a specific input-output combination; \(\lambda _j \ne 0\) represents the best practice units that delimit the frontier. That is, the process only uses those inputs and outputs delimiting the frontier and disregards everything else.

2.4 Inputs and Outputs Description

The specification of the input-output model is given by the risk-return relationship. In this case, all risk measures are considered as inputs while all return measurements will be considered outputs. Following the literature, initial input candidates are given by the fund’s standard deviation, the lower partial moments (LPM), and maximum drawdown period, while the output candidates are the expected returns, the upper partial moments (UPM) and the maximum consecutive gain.

The partial moments, both upper and lower, are estimated as the mth root of the LPMs and UPMs described by Eq. 3, in percentages. While the LPMs capture the downside or risk of holding a specific investment asset, the UPMs will capture the upside or benefit of the investment. We use \(r_{min} \) as the mean return to differentiate between the downside and the upside of the investment strategy.

$$\begin{aligned}&LPM_{j,m} =\frac{1}{T}\sum \limits _{t=1}^{{\tilde{T}} } \left( {r_{min} -\underline{r}_{t,j} } \right) ^{m}\quad \forall \;m=0,\ldots ,4\nonumber \\&UPM_{j,m} =\frac{1}{T}\sum \limits _{t=1}^{{\hat{T}} } \left( {\overline{r}_{t,j} -{r}_{min} } \right) ^{m}\quad \forall \;m=0,\ldots ,4 \end{aligned}$$
(3)

where

\(r_{min}\), target rate

\(\underline{r}_{t,j}\), monthly return of fund j below target rate

\({\tilde{T}}\), number of returns of fund j below target rate

\(\overline{r}_{t,j}\), monthly return of fund j above target rate

\({\hat{T}}\), number of returns of fund j above target rate.

It is worthwhile to note that LPM0 accounts for the percentage of funds below the target rate and LPM1 accounts for the percentage of funds above it, so using both LPM0 and UPM0 will be redundant.Footnote 6 Thus using only one of them is sufficient. LPM0 is preferred to UPM0 because the risk measurement is regarded as more important.Footnote 7

Finally, the maximum drawdown period (MDP) is estimated as the maximum number of months that fund j has been below historically high net asset value (NAV). And the maximum consecutive gain is the maximum number of months fund j has been above the minimum target rate.

Basso and Funari (2003, 2008) further include an output measurement that takes positive values from 0 ... 3 based on the degree of ethical exposure. However, as this imposes a bonus for ethical investment which could bias the results in their favor, we avoid such a measurement.

Consistent with prior studies, a production framework that provides higher rewards for taking extra levels of risk is chosen, thus requiring inputs and outputs to be positively correlated. The correlation table (not reported) shows that only the inputs LPM0-LPM4 and MDP as well as the outputs UPM1-UPM4 are positively correlated. Further, the fourth moments, LPM4 and UPM4, are disregarded due to the extreme high correlations; LPM3 and LPM4 as well as UPM3 and UPM4 show a correlation greater than 0.999.Footnote 8 Still, using the third moment is enough to apply an extra penalty (bonus) to funds with high negative or positive skewness.

All inputs and outputs are estimated on a 3 year rolling window basis. We impose a restriction that each fund has to have a full history in NAV and returns to estimate its corresponding inputs and outputs. Since the first 36 observations are lost due to the estimation technique, future results are the average based on 309 months. Table 3 summarizes the descriptive statistics of the input-output specifications considered for the production framework used in further tests. Except for MDP, simple inspection shows that all inputs and outputs are usually higher for ethical funds. This is statistically significant only for LPM0LPM2 at the 10% level.Footnote 9

Table 3 Summary statistics of the proposed input–output specification used to estimate the DEA performance score under\(\,P^{\textit{BCC}}\)

It is worth mentioning that only orthodox funds achieve an average 100% on LPM0. This means that every month there are funds that have had only below average returns (\(LPM0=1)\) during the prior 36 months. Furthermore, the opposite is also true; only within traditional investment are there funds having all returns above average during the prior 36 months. Generally speaking, traditional investing seems perhaps more volatile than SRI.

The following subsection provides further analysis of each fund’s monthly relative efficiency, as given by its subjective efficiency value.

2.5 Estimating Efficiency

Since the number of mutual funds increases overtime, finding a feasible solution for all funds might not be possible; the LP might not converge in a solution, and thus values of \(P^{\textit{BCC}}=1\) might be misleading. This is corrected by setting the linear program with two passes, where each pass has a different starting value. \(P^{\textit{BCC}}\) is then recorded only if the solution converges to the same numeric value regardless of its initial seed.

We solve problem 2 based on the inputs and outputs discussed in prior sections. We calculate monthly efficiency values only when a fund has no missing observation during the immediate past 3 years. This results in a panel with monthly efficiency values.

As an application of these efficiency scores, we compare the performance of two different samples: the unconstrained investment universe and the constrained SRI universe. We rank funds based on their \(P^{\textit{BCC}}\) value and sort them into quintiles where Q1 is the lowest efficiency. Every month, we compute a cross-tabulation between the fund’s monthly score with a differentiation based on whether a particular fund is catalogued as a socially responsible investment or not. Every time we run a cross-tabulation we also perform a chi-square test for difference between groups (not reported). We store the number of funds per category and estimate a time series average number of funds per category. Table 4 reports the results.

Table 4 Reports the cross-tabulation of average monthly \(P^{\textit{BCC}}\) values for both ethical and unrestricted mutual funds

As expected, the number of traditional funds far outweighs that of ethical funds. Thus direct comparisons of the percentage number of funds per quintile are irrelevant to conclude difference in efficiency rankings. Instead, we look at the conditional probabilities of each quintile. It seems that traditional investment achieves the highest levels of efficiency. There is an average of 2.42% more traditional funds in the high efficiency quintile, Q5, than ethical funds.Footnote 10 Though traditional investments have a higher amount of funds in the lowest quintile, Q1, when Q1 and Q2 are taken together, socially responsible investment has 4.27% more funds in the lower efficiency categories than orthodox investments (mainly resulting from an average 5.39% more ethical funds in Q2).

Unlike Hamilton et al. (1993), Diltz (1995), Guerard (1997), Sauer (1997), Bauer et al. (2005), Goldreyer and Diltz (1999), who have shown that ethical investing is not different than traditional investing, we find that SRI is more heavily distributed towards the low efficiency scores. These results suggest that traditional mutual funds are more efficient than socially responsible investments.Footnote 11 However, a word of caution is advised given the presence of traditional funds in the very lowest quintile.

Having estimated an ex-post realization of efficiency, it remains unknown if investors take this into consideration when assessing their required rate of return. Though it is expected that highly efficient funds will have higher returns, it is uncertain if the efficiency score can be used as a predictor of future fund performance. The following sections looks at the explanatory power of the estimated efficiency scores and their plausible use as proxies for risk.

3 The Efficiency Score Explanatory Power in Time Series Regressions

Since ex-post estimations can still have an element of luck, we need to test whether knowing a fund’s efficiency score can produce an effective investment strategy. If investors care about fund efficiency, then it will become a valid component of a security’s pricing. Intuition suggests that investors should require higher rates of return for holding underperforming assets. We therefore argue that this will also be true for inefficient assets. We thus test the hypothesis that adding efficiency scores to would improve on the existing empirical asset pricing models; we expect that adding efficiency would decrease mispricing errors in CAPM and Carhart’s four-factor model.

Early attempts to fit the regressions resulted in a market beta well below 1. This would suggest a small correlation between mutual fund returns and the market excess returns. Further inspection revealed that a large number of small market beta funds rely on debt instruments.Footnote 12 Therefore, we include two bond factors to help explain the returns. These factors are constructed based on Fama and French’s (1993) suggested bond factors.

First, \(\textit{TERM}_t \) is expected to capture the term structure of bond yields, defined as the 10-year treasury security minus the 1-year treasury security. Second, \(DEFAULT_t \) is set to capture the default premium of bond yields, measured as the spread between BAA and AAA corporate bonds.

Finally, because \(0< P^{\textit{BCC}}<1\), we could define \(P^{BCC{\prime }}= 1-P^{\textit{BCC}}\) which measures total efficiency deviations rather than efficiency itself. This specification of \(P^{BCC{\prime }}\) guarantees that no penalty will be given for full efficiency, but inefficient funds will receive a 100% penalty. In other words, \(P^{BCC{\prime }}\) can be regarded as an efficiency discount measured as the percentage deviation from full efficiency. The equations are:

$$\begin{aligned} R_{i,t} -r_{f_t }= & {} {\hat{\alpha }} _t +{\hat{\beta }} _{1,t} \left( {R_{m_t } -r_{f_t } } \right) +{\hat{\beta }} _{2,t} TERM_t +{\hat{\beta }} _{3,t} DEFAULT_t \end{aligned}$$
(4)
$$\begin{aligned} R_{i,t} -r_{f_t }= & {} {\hat{\alpha }} _t +{\hat{\beta }} _{1,t} \left( {R_{m_t } -r_{f_t } } \right) +{\hat{\beta }} _{2,t} TERM_t +{\hat{\beta }} _{3,t} DEFAULT_t +{\hat{\beta }} _{4,t}\left( {P^{BCC'}_t } \right) \qquad \end{aligned}$$
(5)
$$\begin{aligned} R_{i,t} -r_{f_t }= & {} {\hat{\alpha }} _t +{\hat{\beta }} _{1,t} \left( {R_{m_t } -r_{f_t } } \right) +{\hat{\beta }} _{2,t} TERM_t +{\hat{\beta }} _{3,t} DEFAULT_t\nonumber \\&+\,{\hat{\beta }} _{5,t} SMB_t +{\hat{\beta }} _{6,t} HML+{\hat{\beta }} _{7,t} MOM_t \end{aligned}$$
(6)
$$\begin{aligned} R_{i,t} -r_{f_t t}= & {} {\hat{\alpha }} _t +{\hat{\beta }} _{1,t} \left( {R_{m_t } -r_{f_t } } \right) +{\hat{\beta }} _{2,t} TERM_t +{\hat{\beta }} _{3,t} DEFAULT_t\nonumber \\&+\,{\hat{\beta }} _{5,t} SMB_t +{\hat{\beta }} _{6,t} HML+{\hat{\beta }} _{7,t} MOM_t +{\hat{\beta }} _{4,t} \left( {P^{BCC{\prime }}_t } \right) \end{aligned}$$
(7)

The betas are estimated based on a rolling window of 36 months, requiring full trading data. After estimating \(P^{\textit{BCC}}\) and the betas, the starting date has moved to December 1989. The monthly market premium, \(\left( {R_{m_t } -r_{f_t } } \right) \), the monthly size factor, \(SMB_t ,\) the monthly book to market factor, HML, the monthly momentum factors, \(MOM_t \), as well as the monthly risk free rate all come from Kenneth French’s website. We estimate monthly excess returns based on said risk free rate. Table 5, Panel A, summarizes the descriptive statistics.

Table 5 Summarizes statistical information on the proposed regressors

As expected, there are funds which are completely efficient and those which are completely inefficient. The median efficiency score is 61.2% which is about 6% higher than the average score. This suggests a negative skewness of efficiency scores. As for the risk factors, TERM and DEFAULT are higher than all others, but also less volatile. All of Carhart’s four factors seem small on average, but highly volatile. Surprisingly, SMB has a negative median which suggests that the returns of big firms are larger than those of small firms. A positive average momentum factor suggests that prior winners continue to outperform prior losers.

Table 5, Panel B, shows the cross-correlations of the proposed factors. Since \(P^{\textit{BCC}}\) is fund specific, its correlation with all factors is estimated based on the pooled sample. One can immediately notice the moderate negative correlation between \(P^{\textit{BCC}}\,\) and the two bond factors, \(\rho _{P^{\textit{BCC}},{ TERM}} =-0.5513\) and \(\rho _{P^{\textit{BCC}},{ DEFAULT}} =-0.4562\). This suggests that bond yields decrease as \(P^{\textit{BCC}}\) scores increase. This could be due to investors requiring higher rates of return for holding inefficient bonds. This in turn should pass onto mutual funds in the same manner. This is more attenuated for long term debt of efficient assets which require lower compensation than long term debt of the less efficient assets. Moreover, both bond factors are positively correlated as expected.

Table 6 Comparison between time-series CAPM, Carhart’s Four Factor Model, and subsequent models than includes a fund specific characteristic,\(\,P^{\textit{BCC}},\) which measures its relative efficiency from December 1989 to September 2012

Table 6, Panel A, shows the time-series monthly average mutual fund beta, as well as the time-series t-stat, for the period December 1989 to September 2012. The use of \(P^{\textit{BCC}}\) seems to slightly increase the fit of the models, regardless of the original specification. \(R^{2}\) increases in both cases by about 2 percentage points, and even Adjusted \(R^{2}\) (not reported) increases by 1% in all cases. Further analysis between regressions shows that all factors are statistically significant, except for the maturity premium, which is only significant under the four factor model. Consistent with prior research, the use of the four factor model improves CAPM results. On one hand the value of \(R^{2}\) increases from 54.6% to 65.61%, and on the other, the average mispricing error, \(\upalpha \), decreases from \(-0.23\% \) to \(\,-0.19\% \).

After including the efficiency penalty, we find that the beta on \(P^{BCC{\prime }}\) is highly significant even at the 1% level. The results suggest that an average inefficient fund \(\left( {P^{\textit{BCC}}=0.5658} \right) \) will expect to receive about 72 basis points less than a fully efficient fund \((P^{\textit{BCC}}=1)\) under the modified CAPM version (MCAPM hereafter), and about 55 basis points less under the modified four factor model. More importantly, a one standard deviation increase, \({\Delta }{P}^{\textit{BCC}}=0.3068,\) will result in an increase of almost 50 basis points under MCAPM and 38 basis points under the modified four factor model.

Moreover, looking at Jensen’s alphas, the average mispricing errors decrease when \(P^{BCC{\prime }}\) is included to explain mutual fund returns. Under the MCAPM model, alpha decreases from \(-22.7\) to 22.5 basis points. While this difference is statistically significant \((t=12.04)\), it is of no economic significance. However, under the modified four factor model, the decrease is of 5.1 basis points which is of both statistical \((t=11.34)\) and economic significance.

It is worth noting that, as expected given the correlation between \(P^{BCC{\prime }}\) score and the bond factors, including the \(P^{BCC{\prime }}\) score in the regression has a direct effect on the betas for both bond factors. The beta for the term structure decreases from 0.0043 to \(-0.0038\) under CAPM, while increasing from \(-0.032\) to \(-0.005\) under the four factor model; the beta for the default premium increases from 0.19 to 0.27 and from 0.160 to 0.196 for both CAPM and Carhart’s model respectively. These mixed effects are quite puzzling; it appears that including the efficiency score in the CAPM captures the combined effect of size, book to market, and momentum factors.

Having established the appropriate use of the efficiency score as a factor of mutual fund returns, we now check whether the effect is time-specific. We perform a Chow test for structural breaks (not reported) which not surprisingly matches the biggest recessions within our timeframe: the dot-com bubble and the global financial crisis. The Chow test shows that the biggest inflection points are in December 2000 and March 2009. We therefore split the sample in three to account for all different periods. Table 6, Panel B through D, shows the time-series average monthly betas for the subsamples. T-statistics are reported in parenthesis.

All in all, prior results continue to hold true. Regardless of the time period, \(P^{\textit{BCC}}\) does provide statistical and even economic significance. Looking at MCAPM, the strongest effect is before the dot-com where an average efficient fund received a discount of 90 basis points. But during crisis periods, it seems like investors cared less about efficiency; after the dot-com the discount drops to 66 basis points, and it further drops to 23 basis points post 2009. The same holds true under the modified Carhart’s model with discounts dropping from 68 basis points pre dot-com, to 25 basis points post 2009.

It seems, however, that the explanatory power of the efficiency values is time-subjective. Looking at the individual sub-periods, alphas increased rather than decrease regardless of the period or model, except for the Carhart’s specification between 2000 and 2009. The results suggest that using the efficiency score shows a penalty applied to inefficient funds, but at the same time alphas increased in comparison. For example, pre dot-com, alphas increase by 90 basis points when adding \(P^{\textit{BCC}\prime }\) to the CAPM specification, but a penalty between \(\left[ {0,2\% } \right] \) was added at the same time. This suggest that efficient funds would have enjoyed a 90-basis-point premium while inefficient funds would have received a 1.1% penalty. The same holds true under the Carhart specification. But after 2009, alphas actually decrease by 1 basis point with a penalty between \(\left[ {0,54\,basis\,points} \right] \). This is consistent with investors not valuing efficiency in a post crisis period.

So far, we have shown evidence that efficiency seems to be a pricing feature of mutual funds. However, we have not provided any evidence that the efficiency scores have any forecasting power. We answer this question in the next section.

4 The Cross-Section of Mutual Funds Returns: Is \(P^{\textit{BCC}}\) significant for pricing future returns?

An important aspect of asset pricing is whether a specific model can accurately predict future returns. In such a case, the model underlies a feasible investment strategy. We thus estimate a second pass regressions at \(t+1\) based on the factor loadings from the prior 3 years. Given an asset pricing model \(R_{i,t} =r_{f_t } +{\hat{\beta }} _t \mathbf{X}_\mathbf{t} \) where \(\mathbf{X}_\mathbf{t} \) is a vector of risk factors, if the model has explanatory power, the returns at \( t+1\) will be given by \(R_{i,t+1} =r_{f_{t+1} } +\delta _{t+1} {\hat{{\varvec{\beta }}}} _{{\varvec{t}}} \), where \(\delta _{t+1} \) is the risk premium at \(t+1\). Also, the risk premium \(\delta _{t+1} \) should be equal to \(\mathbf{X}_{\mathbf{t}+\mathbf{1}} \). We thus estimate the empirical models:

$$\begin{aligned} R_{i,t+1} -r_{f_{t+1} }= & {} {\hat{\alpha }} _{t+1} +{\hat{\lambda }} _{1,t+1} {\hat{\beta }} _{1,t} +{\hat{\lambda }} _{2,t+1} {\hat{\beta }} _{2,t} +{\hat{\lambda }} _{3,t+1} {\hat{\beta }} _{3,t} \end{aligned}$$
(8)
$$\begin{aligned} R_{i,t+1} -r_{f_{t+1} }= & {} {\hat{\alpha }} _{t+1} +{\hat{\lambda }} _{1,t+1} {\hat{\beta }} _{1,t} +{\hat{\lambda }} _{2,t+1} {\hat{\beta }} _{2,t} +{\hat{\lambda }} _{3,t+1} {\hat{\beta }} _{3,t} +{\hat{\lambda }} _{4,t+1} {\hat{\beta }} _{4,t} \end{aligned}$$
(9)
$$\begin{aligned} R_{i,t+1} -r_{f_{t+1} }= & {} {\hat{\alpha }} _{t+1} +{\hat{\lambda }} _{1,t+1} {\hat{\beta }} _{1,t} +{\hat{\lambda }} _{2,t+1} {\hat{\beta }} _{2,t} +{\hat{\lambda }} _{3,t+1} {\hat{\beta }} _{3,t} +{\hat{\lambda }} _{5,t+1} {\hat{\beta }} _{5,t}\nonumber \\&+\,{\hat{\lambda }} _{6,t+1} {\hat{\beta }} _{6,t} +{\hat{\lambda }} _{7,t+1} {\hat{\beta }} _{7,t} \end{aligned}$$
(10)
$$\begin{aligned} R_{i,t+1} -r_{f_{t+1} }= & {} {\hat{\alpha }} _{t+1} +{\hat{\lambda }} _{1,t+1} {\hat{\beta }} _{1,t} +{\hat{\lambda }} _{2,t+1} {\hat{\beta }} _{2,t} +{\hat{\lambda }} _{3,t+1} {\hat{\beta }} _{3,t} +{\hat{\lambda }} _{5,t+1} {\hat{\beta }} _{5,t}\nonumber \\&+\,{\hat{\lambda }} _{6,t+1} {\hat{\beta }} _{6,t} +{\hat{\lambda }} _{7,t+1} {\hat{\beta }} _{7,t} +{\hat{\lambda }} _{4,t+1} {\hat{\beta }} _{4,t} \end{aligned}$$
(11)

Table 7 shows the time-series averages of the slopes of the month by month Fama-Macbeth type regression. Consistent with prior research, the average mispricing of future returns decreases under the four factor model. However, the use of an efficiency factor does not seem to improve results as \(R^{2}\) values do not seem to improve.

Table 7 Comparison between the cross-section performance at time \(t+1\) of CAPM, Carhart Four Factor Model, and the funds relative efficiency values given by the individual factor loadings estimated at time t given 3 years of prior information

Across models, it can be seen that both bond factors have no predictive power regarding excess returns. This is somewhat expected for the maturity premium since its beta is statistically insignificant. Yet the insignificance of the default premium is surprising. Moreover, average factors are 0.43%, 0.22%, 0.27%, and 0.32% for market premium, HML, SMB, and MOM. The estimation shows that only the factor loadings on the market and HML, with factor loading of 0.38% and 0.22%, are close to the averages.

Regarding efficiency, the estimation shows a magnitude problem, but the dimension seems to be accurate. Average \(P^{BCC{\prime }}=0.4372\) and average\(\,{\hat{\lambda }} _4 =0.0044\), significant at the 5% level, under the MCAPM specification and average\(\,{\hat{\lambda }} _4 =0.0045\), statistically significant at the 1% level, under the modified Carhart’s specification. This suggest that the efficiency score can indeed be a predictor of next period performance. Given an efficiency loading at\(\,T=t\), we can predict that \(r_{T=t+1} \) will decrease by \(0.0044{\hat{\beta }} _{4,t} \) in percentage terms. Finally, it’s worth noting that adding \(P^{{ BCC}'}\) actually increases average mispricing errors at \(T=t+1\).

Further analysis of different time periods provides interesting results. While alphas are small and insignificant for the pre-2000s period, they are large and highly significant after January 2001. This suggests difficulty pricing future returns under the rapid expansionary period following the dot com bubble and even the subsequent 2009 financial crisis. Clearly mutual funds have taken advantage of this long bull market.

Looking at efficiency, once again the there was a heavy discount before the dot-com. Average \({\hat{\lambda }} _4 =-0.0056\) under MCAPM, significant at the 10% level, and \({\hat{\lambda }} _4 =-0.0059\) under the modified Carhart version, significant at the 1% level. But post 2009, this effect drops to average premium of \({\hat{\lambda }} _4 =0.0013\) under MCAPM, insignificant at the traditional levels, and \({\hat{\lambda }} _4 =0.0011\) under the modified Carhart version, also insignificant at traditional levels.

4.1 DEA as an Ex-Post Efficiency Measurement: The Case Of Socially Responsible Investing

In prior sections, we stated that ethical funds do not seem to be as efficient as traditional unconstrained investing. However, it remains unknown if investors use this information to value ethical funds future returns. If they do, it is expected that the returns of ethical funds will be lower, on average, than those of orthodox mutual funds. In other words, there will be a relative underperformance of ethical investing.

To capture the effect of Socially Responsible Investing in future returns, we include a dummy variable to the estimations proposed in Eqs. 811. The dummy takes the value of 1 if a fund can be regarded as ethical and 0 otherwise. We include dummies not only for the intercept but for all factors. This allows to estimate two independent models while at the same time increasing sample size and improving standard errors.

Table 6 reports the intercept dummy for all periods. Looking at the whole sample, SRI funds would face a discount of 8 basis points under the MCAPM model, which decreases to 6 basis points under the modified Carhart version. This is consistent with our previous findings; Socially and Responsible Investments underperform unconstrained mutual funds.

Deeper analysis of the individual sub periods shows that the effect is somehow persistent through time. According to the MCAPM specification, average excess returns for ethical investments are 7 basis points lower for the pre-2000s era (not statistically significant), and its effect attenuates to an average of 9 basis points less after 2009 crisis. According to the four factor model, however, the returns of ethical funds are about\(\,12\) basis points before the year 2000, and zero after 2009. But it should be noted that the ethical dummy is not statistically significant across time and across models. It is only significant under CAPM post 2009, and under Carhart pre 2000s.

5 Concluding Remarks

The traditional Markowitz efficiency frontier describes the relationship between equities’ risk and return under the assumption that assets are normally distributed. That is, we only need the variance to know the appropriate return of a given security. In this paper we have extended this view to account for higher moments, including skewness. In this case, we estimate an efficiency score based on the best practices of decision making units as they adjust the level of risk (inputs) to achieve maximum efficiency. This follows the idea that investors will prefer highly efficient funds which minimize the risk that takes to provide a return. We measure efficiency as a non-parametrical measurement based on data envelopment analysis.

Though the use of an efficiency score might be enough for classification purposes, from an investment perspective, it does not bring any new opportunity to the field. Thus, we expand the literature on empirical asset pricing methods by testing the predictability of future mutual fund returns using efficiency scores. We show that the fund’s efficiency score alone can improve on known asset pricing models given that the absolute value of the average mispricing error is decreased throughout different model specifications. But more importantly, estimating a 3-year factor loading on efficiency scores can help predict future mutual fund’s performance.

Ultimately, following the specification of the efficiency score, we use the technique to differentiate between the performance of an unconstrained universe and that of a constrained subsample. Although intuition will suggest the constrained investment, by its own nature, would underperform unconstrained investments, consistent evidence has not been provided in the literature. This paper sheds light into this debate showing that, by using efficiency, the constrained investment does underperform.