1 Introduction

Data envelopment analysis (DEA) is a nonparametric mathematical programming technique proposed by Charnes et al. [9] to measure the relative efficiency of homogeneous decision-making units (DMUs) under the consideration of multiple inputs and outputs. Traditional DEA models can only handle desirable inputs and outputs. Inputs need to be minimized, and outputs need to be maximized [1, 17, 22, 46]. However, there are frequently circumstances in real-life situations where some inputs need to be increased and some outputs need to be decreased to improve the performance of a DMU. For example, for a coal-fired power plant, \(\hbox {SO}_2\) emissions are an undesirable output that should be decreased, and the cost of reducing \(\hbox {SO}_2\) emissions is the undesirable input that should be increased to improve the eco-efficiency. Subsequently, some scholars transformed the values of the undesirable inputs or outputs based on a decreasing function to make them desirable, e.g., Koopmans [28], Lovell et al. [33], Scheel [42], Zanella [52] and Liu et al. [31]. Moreover, many other researchers also focus on the DEA models with undesirable inputs or outputs. For example, Liu and Sharp [32] regarded undesirable inputs as desirable outputs, or undesirable outputs as desirable inputs. Seiford and Zhu [43] used the undesirable outputs as positive desirable outputs by multiplying them by (\(-1\)) and using a translation vector. Färe and Grosskopf [16] estimated the DMUs’ efficiencies with undesirable factors based on directional distance function. Lozano and Gutierrez [34] handled undesirable outputs in a manner similar to handling the inputs in the DEA model with slacks-based measurements. Sueyoshi and Goto [47] discussed a unified treatment of desirable and undesirable outputs in the nonradial DEA models. Barros et al. [5] applied Russell directional distance function to consider the undesirable output. 
Toloo and Hančlová [49] selected a directional distance function to cope with multivalued undesirable outputs.

It is worth noting that the above research is based on accurate measurement of both the input and output data. However, inputs and outputs in real-world problems are often imprecise. Some researchers have proposed various fuzzy methods for dealing with imprecise data in DEA. Since the original studies by Sengupta [44, 45], there has been continuous interest and increasing development in the fuzzy DEA (FDEA) literature [2, 19, 25,26,27, 29, 37, 41, 50, 51]. In particular, to solve the FDEA models, many researchers have applied different approaches, such as the tolerance approach, the \(\alpha\)-cut approach, the fuzzy ranking approach and the possibility approach. Among these methods, the \(\alpha\)-cut approach is widely used; it converts the fuzzy inputs and outputs into crisp intervals at various \(\alpha\) levels. Hatami-Marbini et al. [20] used the \(\alpha\)-cut approach and presented a four-phase FDEA framework based on the theory of the displaced ideal. Chiang and Che [13] proposed a new weight-restricted FDEA methodology by applying the \(\alpha\)-level approach and the fuzzy analytical hierarchy procedure. Kao and Lin [24] constructed a pair of two-level programming models for calculating the lower and upper bounds of the \(\alpha\)-cuts where the input/output data are fuzzy numbers. Chen et al. [10] incorporated the \(\alpha\)-cut technique into the expanding model of fuzzy slack-based measurement to estimate efficiency scores in the Taiwanese banking sector. Puri and Yadav [40] proposed a fuzzy DEA cross-efficiency model with undesirable outputs, which can be solved using the \(\alpha\)-cut approach. Mu et al. [38] used FDEA to account for multiple indicators simultaneously with the \(\alpha\)-cut approach. Table 1 presents a list of the above FDEA articles.

Table 1 FDEA literature

DEA was not applied to portfolio performance evaluation until Murthi et al. [39] proposed a DEA model for measuring the efficiency of mutual funds. Later, Joro and Na [23] integrated a nonparametric DEA method with mean, variance and skewness to develop a framework for portfolio performance measurement. Branda [8] proposed an efficiency evaluation approach based on traditional DEA models and considered portfolio diversification to identify investment opportunities. Lim et al. [30] presented a DEA-based mean-variance cross-efficiency model for portfolio selection. Basso and Funari [6] constructed a social responsibility index and proposed some DEA models to evaluate the performance of socially responsible investment funds. More recently, Basso and Funari [7] discussed the role of fund size in the performance evaluation of mutual funds via DEA. Gouveia et al. [18] used the value-based DEA method with multiple criteria decision aiding to measure the performance of Portuguese mutual fund portfolios. Zhou et al. [53] presented a segmented DEA approach based on data segment points to evaluate portfolio performance with cardinality constraints. Chen et al. [11] proposed three variants of DEA models based on various risk measures to evaluate the efficiency of the fuzzy portfolio. Table 2 presents an overview of this literature.

Table 2 DEA on portfolio selection literature

1.1 Research Motivation

Uncertainty is a crucial issue for portfolio selection. In financial markets, many uncertain factors affect inputs and outputs, so it is difficult to obtain accurate predictions of the input and output data of the DEA model. Notably, the conventional DEA model is sensitive to perturbations in the input and output data. Therefore, in this paper, we deal with portfolio selection using FDEA cross-efficiency evaluation with coexisting undesirable inputs and outputs. Using FDEA, the present study obtains cross-efficiency scores of all assets based on fuzzy input and output data (coexisting undesirable inputs and outputs). To handle the fuzzy input and output data, we apply the \(\alpha\)-cut approach, which represents the fuzzy input and output data by confidence intervals at different levels. The cross-efficiency scores of the assets are used to obtain the cross-efficiency mean and semivariances, from which the mean-semivariance model for portfolio selection is developed. The proposed portfolio selection model aims at minimizing portfolio risk (semivariance) subject to a desired level of portfolio return. Several additional realistic constraints are considered, including the budget, cardinality, buy-in threshold, and no-short-selling constraints. Since this kind of mixed-integer nonlinear programming problem cannot be solved efficiently by conventional optimization approaches, a genetic algorithm (GA) is applied to solve the proposed model.

1.2 Novelty of the Proposed Approach

Although there have been a variety of studies on DEA-based portfolio performance evaluation, few researchers have proposed an FDEA model with coexisting undesirable inputs and outputs and then applied it to the portfolio selection problem. Thus, in this paper, we discuss the portfolio selection problem based on FDEA cross-efficiency evaluation, in which both undesirable inputs and outputs are considered. The significant contributions of the proposed research work are as follows:

  1. Compared with the literature on DEA with undesirable inputs or outputs [5, 16, 28, 31,32,33,34, 42, 43, 47, 49, 52], the present study extends the framework of performance evaluation. Both undesirable inputs and outputs are considered to better address real-world decision situations. A DEA cross-efficiency model is developed to solve performance evaluation problems with coexisting undesirable input and output data, in which a cross-efficiency technique is used to increase the discrimination power of the proposed model.

  2. Compared with the literature on FDEA [2, 10, 13, 19, 20, 24,25,26, 29, 37, 38, 41, 50, 51], the present study extends the literature by considering both undesirable inputs and outputs. An FDEA cross-efficiency model with coexisting undesirable inputs and outputs is proposed, which helps to handle situations in which some inputs need to be increased and some outputs need to be decreased to improve the performance of a DMU.

  3. Compared with the FDEA model with coexisting undesirable inputs and outputs [27], the present study extends the literature by incorporating cross-efficiency into the evaluation. The consideration of cross-efficiency helps to increase the discrimination power of the proposed model.

  4. Compared with the existing work on portfolio selection with DEA [12, 30, 36], the present study extends the framework of asset evaluation by including uncertainty in the DEA evaluation, i.e., FDEA is used for asset evaluation instead of DEA.

  5. Compared with the existing work on mean-variance portfolio selection with DEA [12, 30, 36], the present study extends the literature by considering the mean and semivariances based on the FDEA cross-efficiency scores of the assets, wherein both undesirable inputs and outputs are considered to better address real-world investment situations.

Additionally, a thorough comparison of the proposed work with similar existing works was performed based on many critical attributes, see Table 3 for details.

Table 3 Comparison with similar works

1.3 Organization of the Paper

The remainder of the paper is organized as follows. In Sect. 2, a DEA cross-efficiency model with coexisting undesirable inputs and outputs is introduced. Section 3 presents the proposed FDEA cross-efficiency model with coexisting undesirable inputs and outputs. Then, the novel mean-semivariance model based on FDEA cross-efficiency is described in Sect. 4. In Sect. 5, the main steps of the genetic algorithm are presented. In Sect. 6, an illustration based on a real-market data set is provided to validate the proposed approach. Finally, some concluding remarks are given in Sect. 7.

2 DEA Cross-Efficiency Model with Coexisting Undesirable Inputs and Outputs

In this section, we develop a DEA cross-efficiency model with coexisting undesirable inputs and outputs. In the proposed model, to improve the performance of DMUs, undesirable outputs should be minimized, and undesirable inputs should be maximized.

Assume that there are n DMUs to be measured. Each DMU consumes m different inputs to produce s different outputs. Let the observed desirable and undesirable input vectors of the jth DMU be \(X^g _j = (x_{1j}^g, x_{2j}^g, \ldots , x_{m_1j}^g)\) \((j=1, 2, \ldots , n)\) and \(X^b_j = (x_{m_1+1,j}^b, x_{m_1+2,j}^b, \ldots , x_{mj}^b)\), respectively. Here, \(x_{ij}^g\) \((i=1, 2, \ldots , m_1)\) represents the amount of the ith desirable input for the jth DMU, and \(x_{ij}^b\) \((i= m_1+1, m_1+2, \ldots , m)\) represents the amount of the ith undesirable input for the jth DMU. Similarly, let the observed desirable and undesirable output vectors of the jth DMU be \(Y^g_j = (y_{1j}^g, y_{2j}^g, \ldots , y_{s_1j}^g)\) and \(Y^b_j = (y_{s_1+1,j}^b, y_{s_1+2,j}^b, \ldots , y_{sj}^b)\), respectively. Here, \(q_{ik}^g\), \(q_{ik}^b\), \(p_{rk}^g\) and \(p_{rk}^b\) are the costs of the ith desirable and undesirable inputs and the prices of the rth desirable and undesirable outputs for the kth DMU, respectively. In addition, \(\varepsilon_k\) represents an infinitesimal positive value. To account for undesirable inputs and outputs, the DEA model in [30] is extended as follows:

$$\begin{aligned}
\max \quad & E_k=\sum _{r=1}^{s_1}p_{rk}^gy_{rk}^g-\sum _{r=s_1+1}^sp_{rk}^b y_{rk}^b- \sum _{i=1}^{m_1}q_{ik}^gx_{ik}^g + \sum _{i=m_1+1}^mq_{ik}^bx_{ik}^b+ \varepsilon _k \\
{\text {s.t.}} \quad & \sum _{r=1}^{s_1}p_{rk}^gy_{rj}^g-\sum _{r=s_1+1}^sp_{rk}^b y_{rj}^b- \sum _{i=1}^{m_1}q_{ik}^gx_{ij}^g + \sum _{i=m_1+1}^mq_{ik}^bx_{ij}^b+ \varepsilon _k\le 0,\quad j=1,2,\ldots ,n, \\
& p_{rk}^g\ge \frac{1}{(m+s) R^{g+}_r }, \quad r=1,2,\ldots ,s_1, \\
& p_{rk}^b\ge \frac{1}{(m+s) R^{b+}_r }, \quad r=s_1+1,\ldots ,s, \\
& q_{ik}^g\ge \frac{1}{(m+s) R^{g-}_i }, \quad i=1,2,\ldots ,m_1, \\
& q_{ik}^b\ge \frac{1}{(m+s) R^{b-}_i }, \quad i=m_1+1,\ldots ,m.
\end{aligned}$$
(1)

The scalars \(R^{g-}_i\), \(R^{b-}_i\), \(R^{g+}_r\) and \(R^{b+}_r\) can be defined as follows:

$$\begin{aligned}
R^{g-}_i&=\max _{j=1,2,\ldots ,n}\{x_{ij}^g\}- \min _{j=1,2, \ldots ,n}\{x_{ij}^g\},\quad i=1,2,\ldots ,m_1, \\
R^{b-}_i&=\max _{j=1,2,\ldots ,n}\{x_{ij}^b\}- \min _{j=1,2, \ldots ,n}\{x_{ij}^b\},\quad i=m_1+1,\ldots ,m, \\
R^{g+}_r&=\max _{j=1,2,\ldots ,n}\{y_{rj}^g\}- \min _{j=1,2,\ldots ,n }\{y_{rj}^g\},\quad r=1,2, \ldots ,s_1, \\
R^{b+}_r&=\max _{j=1,2,\ldots ,n}\{y_{rj}^b\}- \min _{j=1,2,\ldots ,n }\{y_{rj}^b\},\quad r=s_1+1, \ldots ,s.
\end{aligned}$$
(2)
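As a minimal sketch of how the range scalars in Eq. (2) translate into the weight lower bounds of model (1), the following Python fragment computes \(1/((m+s)R)\) for each factor. The function name `weight_lower_bounds` and the toy data are illustrative assumptions, not part of the original study.

```python
def weight_lower_bounds(columns, n_factors):
    """Lower bounds 1 / ((m + s) * R) for the weights in model (1).

    columns   -- one list per input/output factor, holding that factor's
                 observed values over all n DMUs
    n_factors -- m + s, the total number of inputs and outputs
    """
    bounds = []
    for values in columns:
        r = max(values) - min(values)  # range scalar R from Eq. (2)
        if r <= 0:
            raise ValueError("a factor with zero range has no finite bound")
        bounds.append(1.0 / (n_factors * r))
    return bounds


# Hypothetical data: two factors observed over three DMUs, with m + s = 4.
bounds = weight_lower_bounds([[1.0, 3.0, 2.0], [10.0, 20.0, 15.0]], n_factors=4)
```

The bound is inversely proportional to the observed range, so a factor measured on a wide scale receives a small minimum weight.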

Model (1) can handle coexisting desirable and undesirable inputs and outputs. Here, the desirable outputs and undesirable inputs are expanded and the desirable inputs and undesirable outputs are contracted. Let \(^*\) represent the optimal solution of model (1). The efficiency scores of other DMUs are obtained by using the weights of kth chosen DMU (\(q_{ik}^{g*}\), \(q_{ik}^{b*}\), \(p_{rk}^{g*}\) and \(p_{rk}^{b*}\)). The cross-efficiency of DMU j with the weights of DMU k (\(e_{kj}\)) can be evaluated as follows:

$$e_{kj}=\sum _{r=1}^{s_1} p_{rk}^{g*}y_{rj}^{g}- \sum _{r=s_1+1}^s p_{rk}^{b*} y_{rj}^{b}-\sum _{i=1}^{m_1} q_{ik}^{g*} x_{ij}^{g} + \sum_{i=m_1+1}^m q_{ik}^{b*} x_{ij}^{b} + \varepsilon _k.$$
(3)

We can construct the matrix of cross-efficiencies as \(E = (e_{kj} )\) \((k, j=1,2, \ldots ,n)\), where \(e_{kj}\) is the cross-efficiency of DMU j evaluated by DMU k. The cross-efficiency score of DMU j is defined as the average of the jth column:

$${\overline{e}}_j=\frac{1}{n} \sum _{k=1}^n e_{kj}.$$
(4)
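Equations (3) and (4) can be illustrated with a short Python sketch. It assumes that the optimal weights of model (1) have already been obtained by some linear programming solver; the weights, data and dictionary keys below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(u * v for u, v in zip(a, b))


def cross_efficiency_matrix(weights, dmus, eps):
    """Eq. (3): E[k][j] is the score of DMU j under the weights of DMU k."""
    n = len(dmus)
    E = [[0.0] * n for _ in range(n)]
    for k, w in enumerate(weights):
        for j, d in enumerate(dmus):
            E[k][j] = (dot(w["p_g"], d["y_g"]) - dot(w["p_b"], d["y_b"])
                       - dot(w["q_g"], d["x_g"]) + dot(w["q_b"], d["x_b"])
                       + eps[k])
    return E


def average_scores(E):
    """Eq. (4): column averages of the cross-efficiency matrix."""
    n = len(E)
    return [sum(E[k][j] for k in range(n)) / n for j in range(n)]


# Hypothetical weights (NOT optimized) and data for two DMUs, one factor each.
weights = [
    {"p_g": [1.0], "p_b": [0.5], "q_g": [1.0], "q_b": [0.5]},
    {"p_g": [1.0], "p_b": [1.0], "q_g": [2.0], "q_b": [1.0]},
]
dmus = [
    {"y_g": [2.0], "y_b": [1.0], "x_g": [3.0], "x_b": [2.0]},
    {"y_g": [1.0], "y_b": [2.0], "x_g": [2.0], "x_b": [1.0]},
]
E = cross_efficiency_matrix(weights, dmus, eps=[0.0, 0.0])
scores = average_scores(E)
```

Row k of `E` holds the evaluations made with the weights of DMU k, so averaging down a column combines the self-evaluation of a DMU with all of its peer evaluations.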

3 Fuzzy DEA Cross-Efficiency Model with Coexisting Undesirable Inputs and Outputs

Conventional DEA models are based on an unrealistic assumption that real situations can be modeled with crisp input and output data. In many situations, inputs and outputs are often imprecise; hence, there is a need to use FDEA models instead of DEA models.

The basic fuzzy form of model (1) with undesirable fuzzy inputs and outputs can be defined in the following FDEA model. It should be noted that the parameters in model (5) are the same as those in model (1), and the symbol \(\tilde{}\) represents the fuzziness of the associated parameter.

$$\begin{aligned}
\max \quad & E_k=\sum _{r=1}^{s_1} p_{rk}^g {\tilde{y}}_{rk}^g-\sum _{r=s_1+1}^s p_{rk}^b {\tilde{y}}_{rk}^b -\sum _{i=1}^{m_1} q_{ik}^g{\tilde{x}}_{ik}^g + \sum _{i=m_1+1}^m q_{ik}^b {\tilde{x}}_{ik}^b+ \varepsilon _k \\
{\text {s.t.}}\quad & \sum _{r=1}^{s_1} p_{rk}^g {\tilde{y}}_{rj}^g-\sum _{r=s_1+1}^s p_{rk}^b {\tilde{y}}_{rj}^b- \sum _{i=1}^{m_1} q_{ik}^g {\tilde{x}}_{ij}^g + \sum _{i=m_1+1}^m q_{ik}^b {\tilde{x}}_{ij}^b+ \varepsilon _k\le 0,\quad j=1,2,\ldots ,n, \\
& p_{rk}^g\ge \frac{1}{(m+s) {\tilde{R}}^{g+}_r }, \quad r=1,2,\ldots ,s_1, \\
& p_{rk}^b\ge \frac{1}{(m+s) {\tilde{R}}^{b+}_r }, \quad r=s_1+1,\ldots ,s, \\
& q_{ik}^g\ge \frac{1}{(m+s) {\tilde{R}}^{g-}_i }, \quad i=1,2,\ldots ,m_1, \\
& q_{ik}^b\ge \frac{1}{(m+s) {\tilde{R}}^{b-}_i }, \quad i=m_1+1,\ldots ,m.
\end{aligned}$$
(5)

Note that several different patterns, including triangular, trapezoidal, S-curve, exponential and hyperbolic, are used to model the vagueness of the parameters in the existing literature. Among them, the triangular pattern is used most often to represent imprecise data owing to the ease it offers in defining the maximum and minimum limits of deviation of the fuzzy number from its central value, although certain practical applications may prefer other patterns. Moreover, in refs. [38] and [40], the values of each input and output are also regarded as triangular fuzzy numbers. Therefore, in this paper, triangular fuzzy numbers (TFNs) are used to represent the values of each input and output. Let \({\tilde{x}}_{ij}^g = (x_{ij}^{lg},x_{ij}^{mg}, x_{ij}^{ug})\) and \({\tilde{x}}_{ij}^b = (x_{ij}^{lb}, x_{ij}^{mb},x_{ij}^{ub})\) represent the ith desirable and undesirable inputs corresponding to the jth DMU, respectively. Also let \({\tilde{y}}_{rj}^g = (y_{rj}^{lg}, y_{rj}^{mg}, y_{rj}^{ug})\) and \({\tilde{y}}_{rj}^b = (y_{rj}^{lb}, y_{rj}^{mb}, y_{rj}^{ub})\) represent the rth desirable and undesirable outputs of the jth DMU, respectively. We can calculate the lower and upper bounds of the membership functions of these TFNs for each input and output. To that end, we apply the \(\alpha\)-cut approach to convert the fuzzy numbers into crisp intervals. Using \(\alpha\)-cuts, the fuzzy data (uncertainty range) can be represented by confidence intervals at different levels. At a given \(\alpha\)-cut level (\(0\le \alpha \le 1\)), the lower and upper bounds of the inputs and outputs can be quantified as follows:

$$\begin{aligned}
(x_{ij}^{gL})_\alpha &= x_{ij}^{lg}+ \alpha (x_{ij}^{mg}-x_{ij}^{lg}), \quad i=1,2,\ldots ,m_1, \\
(x_{ij}^{gU})_\alpha &= x_{ij}^{ug}- \alpha (x_{ij}^{ug}-x_{ij}^{mg}), \quad i=1,2,\ldots ,m_1, \\
(x_{ij}^{bL})_\alpha &= x_{ij}^{lb}+ \alpha (x_{ij}^{mb}-x_{ij}^{lb}), \quad i=m_1+1,\ldots ,m, \\
(x_{ij}^{bU})_\alpha &= x_{ij}^{ub}- \alpha (x_{ij}^{ub}-x_{ij}^{mb}), \quad i=m_1+1,\ldots ,m, \\
(y_{rj}^{gL})_\alpha &= y_{rj}^{lg}+ \alpha (y_{rj}^{mg}-y_{rj}^{lg}), \quad r=1,2,\ldots ,s_1, \\
(y_{rj}^{gU})_\alpha &= y_{rj}^{ug}- \alpha (y_{rj}^{ug}-y_{rj}^{mg}), \quad r=1,2,\ldots ,s_1, \\
(y_{rj}^{bL})_\alpha &= y_{rj}^{lb}+ \alpha (y_{rj}^{mb}-y_{rj}^{lb}), \quad r=s_1+1,\ldots ,s, \\
(y_{rj}^{bU})_\alpha &= y_{rj}^{ub}- \alpha (y_{rj}^{ub}-y_{rj}^{mb}), \quad r=s_1+1,\ldots ,s.
\end{aligned}$$
(6)
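Every line of Eq. (6) applies the same rule to a triangular fuzzy number \((a^l, a^m, a^u)\). The following Python sketch shows that rule once; the function name `alpha_cut` and the sample TFN are illustrative assumptions.

```python
def alpha_cut(tfn, alpha):
    """Return the crisp interval (lower, upper) of a TFN at level alpha.

    tfn   -- tuple (l, m, u) with l <= m <= u
    alpha -- confidence level in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l, m, u = tfn
    lower = l + alpha * (m - l)  # moves from l towards m as alpha grows
    upper = u - alpha * (u - m)  # moves from u towards m as alpha grows
    return lower, upper


# Hypothetical TFN: the interval shrinks to the modal value as alpha -> 1.
low0, up0 = alpha_cut((0.8, 1.0, 1.2), 0.0)
low5, up5 = alpha_cut((0.8, 1.0, 1.2), 0.5)
low1, up1 = alpha_cut((0.8, 1.0, 1.2), 1.0)
```

At \(\alpha = 0\) the interval is the full support of the TFN, and at \(\alpha = 1\) both bounds coincide with the modal value \(a^m\).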

Additionally, the upper and lower bounds of the \(R^{g-}_i\), \(R^{b-}_i\), \(R^{g+}_r\) and \(R^{b+}_r\) for an arbitrary \(\alpha\)-cut level are defined as follows:

$$\begin{aligned}
(R_{i}^{gL-})_\alpha &= R_{i}^{lg-}+ \alpha (R_{i}^{mg-}-R_{i}^{lg-}), \quad i=1,2,\ldots ,m_1, \\
(R_{i}^{gU-})_\alpha &= R_{i}^{ug-}- \alpha (R_{i}^{ug-}-R_{i}^{mg-}), \quad i=1,2,\ldots ,m_1, \\
(R_{i}^{bL-})_\alpha &= R_{i}^{lb-}+ \alpha (R_{i}^{mb-}-R_{i}^{lb-}), \quad i=m_1+1,\ldots ,m, \\
(R_{i}^{bU-})_\alpha &= R_{i}^{ub-}- \alpha (R_{i}^{ub-}-R_{i}^{mb-}), \quad i=m_1+1,\ldots ,m, \\
(R_{r}^{gL+})_\alpha &= R_{r}^{lg+}+ \alpha (R_{r}^{mg+}-R_{r}^{lg+}), \quad r=1,2,\ldots ,s_1, \\
(R_{r}^{gU+})_\alpha &= R_{r}^{ug+}- \alpha (R_{r}^{ug+}-R_{r}^{mg+}), \quad r=1,2,\ldots ,s_1, \\
(R_{r}^{bL+})_\alpha &= R_{r}^{lb+}+ \alpha (R_{r}^{mb+}-R_{r}^{lb+}), \quad r=s_1+1,\ldots ,s, \\
(R_{r}^{bU+})_\alpha &= R_{r}^{ub+}- \alpha (R_{r}^{ub+}-R_{r}^{mb+}), \quad r=s_1+1,\ldots ,s.
\end{aligned}$$
(7)

Using Eqs. (6) and (7) along with model (5), the following models are proposed to obtain the lower and upper efficiency scores of the assets.

$$\begin{aligned}
\max \quad & E_k^L=\sum _{r=1}^{s_1} p_{rk}^g(y_{rk}^{lg}+ \alpha (y_{rk}^{mg}-y_{rk}^{lg})) - \sum _{r=s_1+1}^s p_{rk}^b (y_{rk}^{ub}-\alpha (y_{rk}^{ub}-y_{rk}^{mb})) \\
& \qquad -\sum _{i=1}^{m_1} q_{ik}^g(x_{ik}^{ug}- \alpha (x_{ik}^{ug}-x_{ik}^{mg}))+\sum _{i=m_1+1}^m q_{ik}^b(x_{ik}^{lb} + \alpha (x_{ik}^{mb}-x_{ik}^{lb}))+ \varepsilon _k \\
{\text {s.t.}}\quad & \sum _{r=1}^{s_1} p_{rk}^g(y_{rj}^{lg}+ \alpha (y_{rj}^{mg} -y_{rj}^{lg}))- \sum _{r=s_1+1}^s p_{rk}^b (y_{rj}^{ub}- \alpha (y_{rj}^{ub}-y_{rj}^{mb})) \\
& \qquad -\sum _{i=1}^{m_1} q_{ik}^g(x_{ij}^{ug}- \alpha (x_{ij}^{ug}-x_{ij}^{mg}))+ \sum _{i=m_1+1}^m q_{ik}^b(x_{ij}^{lb} + \alpha (x_{ij}^{mb}-x_{ij}^{lb}))+ \varepsilon _k\le 0,\quad j=1,2,\ldots ,n, \\
& p_{rk}^g\ge \frac{1}{(m+s) (R_{r}^{lg+}+ \alpha (R_{r}^{mg+}-R_{r}^{lg+}))} , \quad r=1,2,\ldots ,s_1, \\
& p_{rk}^b\ge \frac{1}{(m+s) (R_{r}^{ub+}-\alpha (R_{r}^{ub+} -R_{r}^{mb+}))}, \quad r=s_1+1,\ldots ,s, \\
& q_{ik}^g\ge \frac{1}{(m+s) (R_{i}^{ug-} - \alpha (R_{i}^{ug-}-R_{i}^{mg-})) }, \quad i=1,2,\ldots ,m_1, \\
& q_{ik}^b\ge \frac{1}{(m+s) (R_{i}^{lb-} + \alpha (R_{i}^{mb-}-R_{i}^{lb-})) }, \quad i=m_1+1,\ldots ,m,
\end{aligned}$$
(8)

and

$$\begin{aligned}
\max \quad & E_k^U=\sum _{r=1}^{s_1} p_{rk}^g(y_{rk}^{ug}-\alpha (y_{rk}^{ug}-y_{rk}^{mg})) -\sum _{r=s_1+1}^s p_{rk}^b (y_{rk}^{lb}+ \alpha (y_{rk}^{mb}-y_{rk}^{lb})) \\
& \qquad -\sum _{i=1}^{m_1} q_{ik}^g(x_{ik}^{lg}+ \alpha (x_{ik}^{mg}-x_{ik}^{lg})) + \sum _{i=m_1+1}^m q_{ik}^b(x_{ik}^{ub}- \alpha (x_{ik}^{ub}-x_{ik}^{mb}))+ \varepsilon _k \\
{\text {s.t.}}\quad & \sum _{r=1}^{s_1} p_{rk}^g(y_{rj}^{ug}- \alpha (y_{rj}^{ug}-y_{rj}^{mg})) -\sum _{r=s_1+1}^s p_{rk}^b (y_{rj}^{lb}+ \alpha (y_{rj}^{mb}-y_{rj}^{lb})) \\
& \qquad -\sum _{i=1}^{m_1} q_{ik}^g(x_{ij}^{lg}+ \alpha (x_{ij}^{mg}-x_{ij}^{lg})) + \sum _{i=m_1+1}^m q_{ik}^b(x_{ij}^{ub}- \alpha (x_{ij}^{ub}-x_{ij}^{mb})) + \varepsilon _k\le 0,\quad j=1,2,\ldots ,n, \\
& p_{rk}^g\ge \frac{1}{(m+s) (R_{r}^{ug+}-\alpha (R_{r}^{ug+}-R_{r}^{mg+})) }, \quad r=1,2,\ldots ,s_1, \\
& p_{rk}^b\ge \frac{1}{(m+s) (R_{r}^{lb+} + \alpha (R_{r}^{mb+}-R_{r}^{lb+})) }, \quad r=s_1+1,\ldots ,s, \\
& q_{ik}^g\ge \frac{1}{(m+s) (R_{i}^{lg-}+ \alpha (R_{i}^{mg-}-R_{i}^{lg-})) }, \quad i=1,2,\ldots ,m_1, \\
& q_{ik}^b\ge \frac{1}{(m+s) (R_{i}^{ub-}- \alpha (R_{i}^{ub-}-R_{i}^{mb-})) }, \quad i=m_1+1,\ldots ,m.
\end{aligned}$$
(9)
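Models (8) and (9) differ only in which end of each \(\alpha\)-cut interval they use. The Python sketch below makes that selection rule explicit: the lower-bound model takes the least favourable end of every interval and the upper-bound model the most favourable one. The function names, dictionary keys and sample data are illustrative assumptions; solving the resulting linear programs is not shown.

```python
def alpha_cut(tfn, alpha):
    """Crisp interval (lower, upper) of a TFN (l, m, u) at level alpha."""
    l, m, u = tfn
    return l + alpha * (m - l), u - alpha * (u - m)


def crisp_data(dmu, alpha, bound):
    """Crisp data of one DMU for model (8) (bound='L') or model (9) (bound='U').

    Desirable outputs and undesirable inputs raise the score, so the
    lower-bound model uses their lower ends; undesirable outputs and
    desirable inputs reduce the score, so it uses their upper ends.
    The upper-bound model swaps the two choices.
    """
    if bound not in ("L", "U"):
        raise ValueError("bound must be 'L' or 'U'")
    lo = lambda t: alpha_cut(t, alpha)[0]
    hi = lambda t: alpha_cut(t, alpha)[1]
    gain_side, cost_side = (lo, hi) if bound == "L" else (hi, lo)
    return {
        "y_g": [gain_side(t) for t in dmu["y_g"]],
        "x_b": [gain_side(t) for t in dmu["x_b"]],
        "y_b": [cost_side(t) for t in dmu["y_b"]],
        "x_g": [cost_side(t) for t in dmu["x_g"]],
    }


# Hypothetical DMU with one factor of each type, given as TFNs.
dmu = {
    "y_g": [(0.8, 1.0, 1.2)], "y_b": [(0.4, 0.5, 0.6)],
    "x_g": [(1.6, 2.0, 2.4)], "x_b": [(0.8, 1.0, 1.2)],
}
pessimistic = crisp_data(dmu, 0.5, "L")
optimistic = crisp_data(dmu, 0.5, "U")
```

The same selection applies to the range scalars in the weight constraints and to the cross-efficiencies evaluated with the optimal weights.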

In this paper, we use cross-efficiency to evaluate the performance of DMUs in fuzzy environment. Let \(^*\) represent the optimal solution of models (8) and (9). We can calculate the set of weights (\(q_{ik}^{g*}\), \(q_{ik}^{b*}\), \(p_{rk}^{g*}\) and \(p_{rk}^{b*}\)) by solving models (8) and (9) at a given level of \(\alpha\), which maximizes the efficiency score of kth DMU. Then, the cross-efficiency of DMU j with the weights of DMU k (\(e_{kj}\)) at the given \(\alpha\) can be expressed as follows:

$$\begin{aligned}
(e_{kj})^L_\alpha &=\sum _{r=1}^{s_1} p_{rk}^{g*}(y_{rj}^{lg}+ \alpha (y_{rj}^{mg}-y_{rj}^{lg})) - \sum _{r=s_1+1}^s p_{rk}^{b*} (y_{rj}^{ub}- \alpha (y_{rj}^{ub}-y_{rj}^{mb})) \\
& \quad -\sum _{i=1}^{m_1} q_{ik}^{g*}(x_{ij}^{ug}- \alpha (x_{ij}^{ug}-x_{ij}^{mg})) +\sum _{i=m_1+1}^m q_{ik}^{b*}(x_{ij}^{lb}+ \alpha (x_{ij}^{mb}-x_{ij}^{lb})) + \varepsilon _k, \\
(e_{kj})^U_\alpha &=\sum _{r=1}^{s_1} p_{rk}^{g*}(y_{rj}^{ug}- \alpha (y_{rj}^{ug}-y_{rj}^{mg})) -\sum _{r=s_1+1}^s p_{rk}^{b*} (y_{rj}^{lb}+ \alpha (y_{rj}^{mb}-y_{rj}^{lb})) \\
& \quad -\sum _{i=1}^{m_1} q_{ik}^{g*}(x_{ij}^{lg}+ \alpha (x_{ij}^{mg}-x_{ij}^{lg})) +\sum _{i=m_1+1}^m q_{ik}^{b*}(x_{ij}^{ub}- \alpha (x_{ij}^{ub}-x_{ij}^{mb}))+ \varepsilon _k.
\end{aligned}$$
(10)

Once the cross-efficiencies have been obtained, we can construct the cross-efficiency matrix. The cross-efficiency score of DMU j is then calculated as the average of the jth column:

$$\begin{aligned}
{({\bar{e}}_j)}_{\alpha}^{L} &= \frac{1}{n}\sum\limits_{k = 1}^{n} {(e_{kj})}_{\alpha}^{L}, \\
{({\bar{e}}_{j})}_{\alpha}^{U} &= \frac{1}{n}\sum\limits_{k = 1}^{n} {(e_{kj})}_{\alpha}^{U}.
\end{aligned}$$
(11)

Since the cross-efficiency scores are fuzzy numbers, they cannot be ranked directly. To solve this problem, we use the ranking index (RI), a method described in [54], to rank the fuzzy efficiency scores:

$$RI({\tilde{e}}_j)=\frac{\sum _{k=1}^{\rho } k{\hat{\eta }}_{jk}}{\sum _{k=1}^{\rho }k},$$
(12)

where \(k \in \{0,1,\ldots ,\rho \}\), \(\rho\) is the number of \(\alpha _k\), \({\hat{\eta }}_{jk}=\frac{m_{jk}-L}{\delta _{jk}+U-L+1}\), \(m_{jk}=\frac{({\overline{e}}_j)^U_{\alpha _k}+({\overline{e}}_j)^L_{\alpha _k}}{2}\), \(\alpha _k=\frac{k}{\rho }\), \(\delta _{jk}=({\overline{e}}_j)^U_{\alpha _k}-({\overline{e}}_j)^L_{\alpha _k}\), \(U=\max _{j,k}\{({\overline{e}}_j)^U_{\alpha _k}\}\), and \(L=\min _{j,k}\{({\overline{e}}_j)^L_{\alpha _k}\}\).
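The ranking index of Eq. (12) can be sketched in a few lines of Python. The sketch assumes that the bounds are stored for the levels \(\alpha _k = k/\rho\), \(k = 0, 1, \ldots , \rho\), so each DMU has \(\rho + 1\) intervals; this indexing convention, the function name `ranking_index` and the sample bounds are assumptions made for illustration.

```python
def ranking_index(lower, upper):
    """Ranking index RI of Eq. (12) for every DMU.

    lower[j][k], upper[j][k] -- bounds of the cross-efficiency score of
    DMU j at level alpha_k = k / rho, for k = 0, 1, ..., rho.
    """
    rho = len(lower[0]) - 1
    U = max(max(row) for row in upper)  # largest upper bound overall
    L = min(min(row) for row in lower)  # smallest lower bound overall
    weight_sum = sum(range(1, rho + 1))
    scores = []
    for lo_j, up_j in zip(lower, upper):
        total = 0.0
        for k in range(1, rho + 1):
            mid = (up_j[k] + lo_j[k]) / 2.0   # m_jk
            spread = up_j[k] - lo_j[k]        # delta_jk
            eta = (mid - L) / (spread + U - L + 1.0)
            total += k * eta                  # higher alpha, larger weight
        scores.append(total / weight_sum)
    return scores


# Hypothetical bounds for two DMUs at alpha = 0, 0.5 and 1 (rho = 2).
lower = [[-0.60, -0.55, -0.50], [-0.90, -0.85, -0.80]]
upper = [[-0.40, -0.45, -0.50], [-0.70, -0.75, -0.80]]
ri = ranking_index(lower, upper)
```

Because the term for level k is weighted by k, intervals at high \(\alpha\) levels, where the data are closest to their modal values, contribute most to the index.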

4 A Mean-Semivariance Model of Portfolio Selection Based on FDEA Cross-Efficiency Evaluation

The traditional use of DEA cross-efficiency evaluation in the portfolio selection problem involves ranking DMUs in decreasing order of cross-efficiency scores and selecting the several top DMUs as the desired portfolio. However, there are two shortcomings for the simple use of DEA cross-efficiency: (i) no consideration of diversification for portfolio selection and (ii) the ‘ganging-together’ phenomenon [48]. In DEA cross-efficiency evaluation, the DMUs with similar factor levels may have higher cross-efficiency scores simply because they effectively give “high votes” to each other. It leads to selection of a specialized portfolio which consists of relatively similar DMUs and in turn lacks diversification. To eliminate the problems, Lim et al. [30] proposed a DEA MV cross-efficiency model for portfolio selection, which can select a portfolio whose performance is well diversified in terms of its performance on multiple evaluation criteria (for more detail see [30]). Then, Mashayekhi and Omrani [36] proposed an integrated fuzzy multi-objective Markowitz-DEA cross-efficiency model for portfolio selection. Chen et al. [12] used fuzzy mean-semivariance and SR-based DEA cross-efficiency models to develop a comprehensive fuzzy portfolio selection model. Therefore, similar to the above studies, in this paper, we incorporate the FDEA cross-efficiency into Markowitz mean-semivariance model and apply the novel mean-semivariance model to portfolio selection problem.

For a DMU i, Lim et al. [30] defined the return and risk characteristics as its cross-efficiency score and the variance of its cross-efficiencies, respectively. The DEA MV cross-efficiency model is as follows:

$$\begin{aligned}
\min \quad & V_\Omega =\sum _{i=1}^n\sum _{j=1}^n w_iw_j\,\mathrm{cov}(e_i, e_j) \\
{\text {s.t.}} \quad & E_\Omega =\sum _{i=1}^n w_i{\overline{e}}_i\ge (1-\gamma )E_\Omega ^b, \\
& \sum _{i=1}^n w_i=1, \\
& w_i \ge 0, \quad i=1,2, \ldots ,n,
\end{aligned}$$
(13)

where \(\gamma\) \((0 \le \gamma \le 1)\) is the return-risk trade-off parameter, \({\overline{e}}_i\) is the cross-efficiency score of DMU i, \(cov(e_i, e_j)\) is the covariance between DMU i’s cross-efficiencies (\(e_i\)) and DMU j’s cross-efficiencies (\(e_j\)), \(E_\Omega ^b\) is the maximum portfolio return achievable, and \(w_i\) is the proportion of asset i (\(i=1,2,\ldots ,n\)).

Note that if the return distributions of assets are not symmetric, the use of variance as a risk measure is not advisable because it leads to predictions of portfolio behavior that diverge significantly from realistic situations. To handle such situations, many scholars have used semivariance as an alternative measure to quantify risk; see, for instance, Markowitz [35] and Ballestero [4]. Motivated by this, we use semivariance to quantify the risk of cross-efficiency. Moreover, to make good investment decisions in a complicated financial market, we consider several decision criteria, such as risk, return, budget constraint, cardinality constraint, buy-in thresholds and no short selling. In the following, we formulate an FDEA mean-semivariance cross-efficiency model.

4.1 Objective Function

  • Risk: We use the semivariance of the RI to measure risk. The semivariance of portfolio \(w=(w_1,w_2,\ldots ,w_n)\) can be obtained as

    $$S_\Omega =\sum _{i=1}^n w_i[\min (0,RI({\tilde{e}}_i)-E_\Omega )]^2.$$
    (14)

4.2 Constraints

  • Expected return: The return of the portfolio \(w=(w_1,w_2,\ldots ,w_n)\) is

    $$E_\Omega =\sum _{i=1}^n w_iRI({\tilde{e}}_i)\ge (1-\gamma )E_\Omega ^b,$$
    (15)

    where \(E_\Omega ^b\) is the maximum portfolio return achievable, which can be obtained by maximizing \(E_\Omega\) under constraints.

  • Budget constraint: Budget constraint represents the full utilization of the available money, i.e.,

    $$\sum _{i=1}^n w_i=1.$$
    (16)
  • Cardinality constraint: Cardinality constraint is used to control the number of assets held in the portfolio. The cardinality constraint is described as

    $$\sum _{i=1}^n z_i=d,$$
    (17)

    where \(z_i\in \{0,1\}\), if any of asset i is held, \(z_i=1\); otherwise, \(z_i=0\).

  • Buy-in thresholds: If asset i is held (\(z_i=1\)), its proportion \(w_i\) must be no less than \(\varepsilon _i\) and no more than \(\delta _i\); if asset i is not held (\(z_i=0\)), its proportion \(w_i\) is zero. Thus, in the presence of the cardinality constraint (Eq. 17), buy-in thresholds are represented by

    $$\varepsilon _iz_i\le w_i \le \delta _iz_i,\ i=1,2,\ldots ,n.$$
    (18)
  • No short selling: This constraint ensures that short selling is prohibited, and it is expressed as

    $$w_i \ge 0,\ i=1,2,\ldots ,n.$$
    (19)

4.3 Model Formulation

Based on the above discussions, the FDEA mean-semivariance cross-efficiency model can be described as follows:

$$\begin{aligned}
\min \quad & S_\Omega =\sum _{i=1}^n w_i[\min (0,RI({\tilde{e}}_i)-E_\Omega )]^2 \\
{\text {s.t.}} \quad & E_\Omega =\sum _{i=1}^n w_iRI({\tilde{e}}_i)\ge (1-\gamma )E_\Omega ^b, \\
& \sum _{i=1}^n w_i=1, \\
& \sum _{i=1}^n z_i=d, \\
& \varepsilon _iz_i\le w_i \le \delta _iz_i,\quad i=1,2,\ldots ,n, \\
& z_i\in \{0,1\}, \quad i=1,2,\ldots ,n, \\
& w_i \ge 0,\quad i=1,2,\ldots ,n.
\end{aligned}$$
(20)
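To make model (20) concrete, the following Python sketch evaluates the objective (14) and checks the constraints (15)-(19) for a candidate portfolio. It is only an evaluation routine that a search method could call; the function names and the toy RI values are assumptions made for illustration.

```python
def portfolio_return(w, ri):
    """E_Omega in Eq. (15): weighted sum of the ranking indices."""
    return sum(wi * r for wi, r in zip(w, ri))


def semivariance(w, ri):
    """S_Omega in Eq. (14): only shortfalls below E_Omega are penalized."""
    e = portfolio_return(w, ri)
    return sum(wi * min(0.0, r - e) ** 2 for wi, r in zip(w, ri))


def is_feasible(w, z, ri, d, eps, delta, gamma, e_best, tol=1e-9):
    """Check the constraints of model (20) for weights w and indicators z."""
    if abs(sum(w) - 1.0) > tol:            # budget constraint (16)
        return False
    if sum(z) != d:                        # cardinality constraint (17)
        return False
    for wi, zi, lo, hi in zip(w, z, eps, delta):
        if zi not in (0, 1):
            return False
        # buy-in thresholds (18); with eps >= 0 they also imply (19)
        if wi < lo * zi - tol or wi > hi * zi + tol:
            return False
    # return constraint (15)
    return portfolio_return(w, ri) >= (1.0 - gamma) * e_best - tol


# Toy example with three assets and hypothetical RI values.
ri = [0.3, 0.2, 0.1]
w, z = [0.5, 0.5, 0.0], [1, 1, 0]
# For this toy data with d = 2 and bounds [0.1, 0.6], the largest feasible
# return is 0.6 * 0.3 + 0.4 * 0.2 = 0.26.
ok = is_feasible(w, z, ri, d=2, eps=[0.1] * 3, delta=[0.6] * 3,
                 gamma=0.2, e_best=0.26)
risk = semivariance(w, ri)
```

Here the portfolio return is 0.25, only the second asset falls below it, and the semivariance is \(0.5 \times 0.05^2 = 0.00125\).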

5 Genetic Algorithm

Genetic algorithm (GA), which was originally proposed by Holland [21], is a classical practical algorithm based on the mechanism of genetics and natural selection. This paper employs GA to solve the proposed model (20).

5.1 Initialization

At the initialization step, following Bacanin and Tuba [3], GA generates a population of SN random individuals using

$$w_{i,j}=\varepsilon _j+\text {rand}(0,1)(\delta _j-\varepsilon _j),$$
(21)

where \(\text {rand}(0,1)\) is a random number uniformly distributed in [0, 1].

5.2 Constraint Handling

  1. Boundary constraint: If the initially generated value of the jth parameter of the ith individual does not lie in the interval [\(\varepsilon _j\), \(\delta _j\)], it is modified as follows:

    $$\begin{aligned}
    & \text {if} \quad w_{i,j}>\delta _j,\quad \text {then} \quad w_{i,j}=\delta _j, \\
    & \text {if} \quad w_{i,j}<\varepsilon _j, \quad \text {then} \quad w_{i,j}= \varepsilon _j.
    \end{aligned}$$
    (22)

  2. Cardinality constraint: Decision variables \(z_{i,j}\ (i=1, 2, \ldots , SN,\ j=1, 2, \ldots ,n)\) are generated randomly by applying

    $$z_{i,j}=\left\{ \begin{array}{ll} 1, & \text{ if } \phi <0.5,\\ 0, & \text{ if } \phi \ge 0.5, \end{array} \right.$$
    (23)

    where \(\phi\) is a random real number between 0 and 1.

  3. Budget constraint: To satisfy the constraint \(\sum _{j=1}^n w_{i,j}=1\) for the ith individual, we set \(\psi _i=\sum _{j=1}^n w_{i,j}\) and put \(w_{i,j}= w_{i,j}/\psi _i\) for all assets \(j=1, 2, \ldots ,n\). The same approach for satisfying this constraint was used in [14].
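The initialization in Eq. (21) and the three repair rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed and the bounds are hypothetical, and the sketch does not show how the rules are combined inside the full GA loop or how the cardinality \(d\) is enforced.

```python
import random


def init_individual(eps, delta, rng):
    """Eq. (21): draw each weight uniformly between its two thresholds."""
    return [e + rng.random() * (d - e) for e, d in zip(eps, delta)]


def clip(w, eps, delta):
    """Eq. (22): push out-of-range weights back to the nearest bound."""
    return [min(max(x, e), d) for x, e, d in zip(w, eps, delta)]


def random_selection(n, rng):
    """Eq. (23): z_j = 1 if a uniform draw is below 0.5, else 0."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]


def normalize(w):
    """Budget repair: divide every weight by the sum of the weights."""
    psi = sum(w)
    return [x / psi for x in w]


rng = random.Random(0)              # fixed seed only for reproducibility
eps, delta = [0.05] * 4, [0.4] * 4  # hypothetical buy-in thresholds
population = []
for _ in range(5):                  # SN = 5 individuals in this toy run
    w = clip(init_individual(eps, delta, rng), eps, delta)
    population.append(normalize(w))
z = random_selection(4, rng)
```

Note that normalization can move a weight outside its thresholds again, so a practical implementation would re-check the bounds after this step.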

The implementation procedure of GA is described as Algorithm 1.

Algorithm 1 Implementation procedure of GA

6 Numerical Experiments

6.1 Data Preprocessing

6.1.1 Selection of Input and Output Parameters

In this example, we select 50 firms from the CSI 300 index, which is an index of 300 large-capitalization Shanghai- and Shenzhen-listed Class A shares. We rely on the financial statements for the year 2017 to obtain the required data for inputs and outputs of FDEA. To consider the input and output parameters, we rely on a thorough literature survey [15, 30] along with financial experts’ opinions. The performance of each firm is evaluated in terms of two desirable inputs, five undesirable inputs, seven desirable outputs and two undesirable outputs. The selected 16 financial input/output parameters are given in Table 4.

Leverage ratios are inputs that should be minimized to decrease financial risk and prevent insolvency or even bankruptcy; consistent with DEA nomenclature, we define them as desirable inputs. In addition, firms require greater asset utilization and liquidity to operate efficiently, so these inputs should be increased and are defined as undesirable inputs. Finally, firms expect a consistent flow of profits from their activity (desirable outputs) while improving performance by decreasing the expense ratios incurred during sales (undesirable outputs).

Table 4 Inputs and outputs

6.1.2 Fuzzification of Input and Output Data

Tables 5 and 6 present the 16 financial input/output data of the 50 firms. As Tables 5 and 6 show, the input–output data of each firm are available in crisp form. To eliminate the scale effect, the data given in Tables 5 and 6 are normalized, and the normalized data are given in Tables 7 and 8. To incorporate the uncertainty that exists in the real market, we fuzzify the normalized data as TFNs. Each normalized value is denoted by \(a^m\), and the corresponding TFN is denoted by (\(a^l, a^m, a^u\)). As noted in [38], \(a^l\) and \(a^u\) are obtained by subtracting \(20\%\) of \(a^m\) from \(a^m\) and adding \(20\%\) of \(a^m\) to \(a^m\), respectively. Given space constraints, the corresponding results are omitted here. One of the 50 companies (stock code 000100) has been chosen to illustrate the numerical procedure step by step (up to the fuzzification) in the Appendix.

Remark

Note that for a negative crisp value, the roles are reversed: \(a^l\) is obtained by adding 20% of \(a^m\) to \(a^m\), and \(a^u\) is obtained by subtracting 20% of \(a^m\) from \(a^m\), as per the fuzzy algebra rules, so that \(a^l \le a^m \le a^u\) still holds.
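The fuzzification rule, including the sign convention of the Remark, can be sketched in a few lines of Python. This is only an illustration: the function name `fuzzify` and the sample values are hypothetical and are not taken from Tables 7 and 8. Using the absolute value of \(a^m\) for the spread covers both the positive and the negative case with one expression.

```python
from typing import Tuple


def fuzzify(a_m: float, spread: float = 0.20) -> Tuple[float, float, float]:
    """Build the TFN (a_l, a_m, a_u) around a crisp normalized value a_m.

    The half-width is `spread` times the magnitude of a_m, so that
    a_l <= a_m <= a_u holds for positive and negative a_m alike.
    """
    half_width = spread * abs(a_m)
    return (a_m - half_width, a_m, a_m + half_width)


if __name__ == "__main__":
    # Hypothetical normalized values, one positive and one negative.
    for value in (0.5, -0.5):
        print(value, "->", fuzzify(value))
```

For a positive value the lower bound equals \(a^m - 0.2a^m\); for a negative value it equals \(a^m + 0.2a^m\), which matches the Remark.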

Table 5 Input data
Table 6 Output data
Table 7 Standardized input data
Table 8 Standardized output data

6.2 Performance Assessment of Firms

Models (8) and (9) are solved in MATLAB. By solving models (8) and (9), we derive lower and upper bounds of the efficiency scores at different \(\alpha\)-cut levels (\(\alpha =0, 0.1, 0.2, \ldots ,0.9,1\)). The results are presented in Table 9. Moreover, the values of RI for the various assets, computed using Eq. (12), are also given in Table 9. Note that the cross-efficiency scores in the case of \(\alpha =1\) are the same as the results based on the crisp input–output data. As the satisfaction level \(\alpha\) varies, the lower and upper bounds of the efficiency scores differ for almost every DMU. Taking the firm with stock code 600018 as an example, its efficiency score is -0.5810 under crisp data, while its score under fuzzy data fluctuates between -0.5853 and -0.5750.
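To make the role of the \(\alpha\)-cut levels concrete, the following Python sketch computes the standard \(\alpha\)-cut interval of a TFN at the 11 levels listed above. It does not reproduce models (8) and (9); it only shows how a TFN reduces to an interval at each level. The function name `alpha_cut` and the sample TFN are assumptions made for illustration.

```python
from typing import List, Tuple


def alpha_cut(tfn: Tuple[float, float, float], alpha: float) -> Tuple[float, float]:
    """Return the alpha-cut [lower, upper] of a TFN (a_l, a_m, a_u)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a_l, a_m, a_u = tfn
    lower = a_l + alpha * (a_m - a_l)  # moves from a_l up to a_m
    upper = a_u - alpha * (a_u - a_m)  # moves from a_u down to a_m
    return lower, upper


# The 11 levels alpha = 0, 0.1, ..., 1 used in the text.
levels: List[float] = [k / 10 for k in range(11)]

if __name__ == "__main__":
    sample_tfn = (0.4, 0.5, 0.6)  # hypothetical TFN
    for alpha in levels:
        print(alpha, alpha_cut(sample_tfn, alpha))
```

At \(\alpha =1\) the interval collapses to the single point \(a^m\), which is why the scores at \(\alpha =1\) coincide with the crisp results.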

Table 9 The lower, upper bounds at 11 levels and rankings

6.3 Portfolio Selection

In this section, we apply the proposed model to real data for portfolio selection to demonstrate its utility in real-world investment situations. For this purpose, we present two cases. In Case I, portfolio selection is performed with equal weights, i.e., the investor wishes to invest an equal proportion in each asset of the selected portfolio. In Case II, portfolio selection is performed with unequal weights, i.e., the investor wishes to invest specific proportions in the assets of the selected portfolio. In both cases, different portfolios are generated by varying the values of \(d, \gamma , \varepsilon _i, \delta _i\).

6.3.1 Portfolio Selection with Equal Weights

We apply the proposed model (20) to the DMUs (assets) with the RI data given in Table 9. To present the advantages of the proposed portfolio selection model, we first obtain portfolios using the traditional method for different cardinality constraints, i.e., \(d=\)15, 10, and 5, wherein assets are chosen with equal weights in decreasing order of the RI data. The resulting portfolio strategies are given in Table 10. Further, we employ GA to solve the proposed model (20) with the trade-off parameter (\(\gamma\)) set to 0.1, 0.2, and 0.3. For all three values of \(\gamma\), we consider the following three cases: (i) \(d=15\), \(\varepsilon _i=0.06667\) and \(\delta _i=0.06667\); (ii) \(d=10\), \(\varepsilon _i=0.1\) and \(\delta _i=0.1\); (iii) \(d=5\), \(\varepsilon _i=0.2\) and \(\delta _i=0.2\). In each case \(\varepsilon _i=\delta _i=1/d\), so the selected assets receive equal weights, which makes the portfolios comparable with those obtained through the traditional method. The parameters of GA are set as follows: the population size is 50, the crossover probability is 0.9, the mutation probability is 0.1, and the maximum number of generations is 100. Finally, \(E_\Omega ^b\) is obtained by maximizing \(E_\Omega\) under constraints (16)-(19). The portfolio strategies obtained under different cardinality constraints (i.e., d) are presented in Table 11. To present the advantages of the proposed model, we compare the portfolios given in Table 11 with those obtained through the traditional approach given in Table 10. First, we compare the mean and semivariance of the portfolios; the portfolios obtained using the proposed model show a remarkable reduction in semivariance and only a slight reduction in mean. 
For example, the portfolio in Table 11 corresponding to \(d=15\) and \(\gamma =0.1\) exhibits a 9.26% decrease in the mean and a 41.48% reduction in semivariance in comparison with the portfolio given in Table 10 (obtained using the traditional approach). The same conclusion holds in the cases of \(d=10\) and \(d=5\). In addition, the results given in Table 11 clearly show that a relatively larger decrease in semivariance can be achieved with a larger value of the return-risk trade-off parameter \(\gamma\). For example, given \(d=10\) and comparing the portfolios corresponding to \(\gamma =0.1\) and \(\gamma =0.2\), the latter has a significantly smaller semivariance (0.00036 \(\rightarrow\) 0.00020, 44.44\(\%\) reduction) at the cost of a much smaller relative decrease in return (0.44138 \(\rightarrow\) 0.40073, 9.21\(\%\) reduction).
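The traditional baseline and the percentage comparisons quoted above can be sketched as follows. The function names `top_d_equal_weights` and `pct_reduction` and the toy RI values are hypothetical; the actual RI data are in Table 9 and are not reproduced here. The sketch ranks assets by RI in decreasing order, keeps the top \(d\), assigns each a weight of \(1/d\), and checks the two relative reductions reported for \(d=10\).

```python
from typing import Dict


def top_d_equal_weights(ri: Dict[str, float], d: int) -> Dict[str, float]:
    """Pick the d assets with the largest RI and weight each by 1/d."""
    if not 1 <= d <= len(ri):
        raise ValueError("d must be between 1 and the number of assets")
    ranked = sorted(ri, key=ri.get, reverse=True)  # decreasing order of RI
    return {code: 1.0 / d for code in ranked[:d]}


def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    toy_ri = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1}  # hypothetical RI values
    print(top_d_equal_weights(toy_ri, 2))  # assets A and C, weight 0.5 each

    # Figures quoted in the text for d = 10, gamma = 0.1 versus gamma = 0.2.
    print(round(pct_reduction(0.00036, 0.00020), 2))  # 44.44
    print(round(pct_reduction(0.44138, 0.40073), 2))  # 9.21
```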

Table 10 The portfolio strategies with different values of d using traditional method
Table 11 The optimal portfolio strategies with different values of d using the proposed model

6.3.2 Portfolio Selection with Unequal Weights

To generate portfolios with unequal weights, we fix \(\varepsilon _i\) = 0 and set \(\delta _i\) = 0.3, 0.5 and 0.7, respectively. The values of d and \(\gamma\) and the GA parameters are the same as in the equal-weight case. Using the GA, we solve the proposed model (20) under different d. The optimal portfolio strategies are given in Tables 12, 13, and 14. Comparing the portfolios given in Tables 12, 13, and 14 with those given in Table 10 leads to two conclusions. First, the portfolio risk decreases substantially, while the return decreases by a much smaller proportion. For example, for \(d=15\), \(\gamma =0.1\) and \(\delta _i=0.3\), the portfolio semivariance decreases from 0.00087 to 0.00046, i.e., a 47.18\(\%\) reduction, and the portfolio return decreases from 0.44608 to 0.41086, i.e., a \(7.90\%\) reduction. Second, a larger decrease in semivariance can be realized by a larger \(\gamma\). For example, in the case of \(d=5\) and \(\delta _i=0.7\), the portfolio in Table 14 exhibits a \(98.29\%\) reduction in semivariance, while the return decreases by \(20.84\%\), with the value of \(\gamma\) set to 0.3. These conclusions are consistent with the results obtained for the case of equal weights.
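The interplay of \(d\), \(\varepsilon _i\) and \(\delta _i\) can be illustrated with a small feasibility check. Constraints (16)-(19) are not reproduced in this section, so the sketch assumes the common cardinality-constrained form: weights are nonnegative and sum to one, exactly \(d\) assets are held, and each held weight lies in \([\varepsilon _i, \delta _i]\) with the same bounds for all assets. The function name `is_feasible` and the sample weight vectors are hypothetical.

```python
from typing import Sequence


def is_feasible(weights: Sequence[float], d: int, eps: float, delta: float,
                tol: float = 1e-9) -> bool:
    """Check a weight vector against the ASSUMED constraint form:
    nonnegative weights summing to 1, exactly d held assets,
    and eps <= w_i <= delta for every held asset."""
    if any(w < -tol for w in weights):
        return False
    held = [w for w in weights if w > tol]  # assets with a positive weight
    if len(held) != d:
        return False
    if abs(sum(weights) - 1.0) > tol:
        return False
    return all(eps - tol <= w <= delta + tol for w in held)


if __name__ == "__main__":
    # Equal weights: d = 5 with eps = delta = 0.2 forces every weight to 0.2.
    print(is_feasible([0.2] * 5, d=5, eps=0.2, delta=0.2))
    # Unequal weights: eps = 0 and delta = 0.7 allow a concentrated holding.
    print(is_feasible([0.7, 0.1, 0.1, 0.05, 0.05], d=5, eps=0.0, delta=0.7))
    # The same vector violates the tighter upper bound delta = 0.5.
    print(is_feasible([0.7, 0.1, 0.1, 0.05, 0.05], d=5, eps=0.0, delta=0.5))
```

Under this assumed form, raising \(\delta _i\) from 0.3 to 0.7 enlarges the feasible set, which gives the GA more room to trade return for lower semivariance.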

Table 12 The optimal portfolio strategies with unequal weights in the case of \(d=15\)
Table 13 The optimal portfolio strategies with unequal weights in the case of \(d=10\)
Table 14 The optimal portfolio strategies with unequal weights in the case of \(d=5\)

6.4 Comparative Analysis

6.4.1 Comparison with Mashayekhi and Omrani [36]

To substantiate the proposed approach, we compare it with the approach in [36]. Although most of the inputs and outputs in the two approaches are similar, for the sake of a coherent comparison, the inputs “debt to equity ratio” and “leverage ratio” are dropped from [36], and the outputs “expense ratio during sales” (\({y_8^b}\)) and “income tax / total profit” (\({y_9^b}\)) are dropped from the proposed approach. Next, we apply our input–output data to Mashayekhi and Omrani’s approach and redo the numerical illustration with the common set of inputs and outputs. The portfolio selection model (Model 9) in [36] is a multi-objective model with four objective functions. To keep the comparison coherent, we drop the two additional objective functions of the return and risk (variance) of the portfolio since, in our proposed approach, the cross-efficiencies and the semivariance of the cross-efficiencies of the DMUs serve as the expected return and risk of the portfolio, respectively. The remaining two objective functions are aggregated using the weighted sum approach with equal weights. The portfolio selection parameters used in [36] are used for both approaches (viz., cardinality = 10, lower bound = 0.05, and upper bound = 0.2), and the values of \(\alpha\) and \(\gamma\) are set to 1 and 0.1, respectively, for the proposed approach. Since crisp values of the inputs and outputs are used in [36], \(\alpha\) is set to 1 for the proposed approach so that both approaches are compared on the same footing. Subsequently, portfolio selection is performed with both equal and unequal weights for the proposed approach. The cross-efficiencies and the portfolios obtained from both approaches are presented in Table 15.

From Table 15, we infer that the portfolios obtained with both equal and unequal weights using the proposed approach are more diverse than the portfolio obtained from Mashayekhi and Omrani’s approach. Moreover, the cross-efficiency mean (risk) is also significantly greater (lower) for both equal and unequal weights compared to [36]. In addition, the proposed approach is more robust and dynamic because it employs the concept of desirable and undesirable inputs and outputs, which reflect the true performance of the DMUs. The proposed approach can also handle fuzzy inputs and outputs using the concept of \(\alpha\)-cuts. The above facts collectively substantiate the proposed approach.

6.4.2 Comparison with Chen et al. [12]

For a stronger validation, we further compare the proposed approach with Chen et al. [12]. Chen et al. use the same set of inputs and outputs as used in [36]. Therefore, along the lines of the previous comparison with [36], we use only the common set of inputs and outputs for a coherent comparison with the proposed approach. The portfolio selection model (Model 10) in [12] is a multi-objective portfolio selection model. Therefore, we drop the additional objective functions of the expected return and semivariance of the portfolio from Model 10. Since both Chen et al. [12] and Mashayekhi and Omrani [36] use the same DEA model for calculating the cross-efficiencies of the DMUs, the cross-efficiencies obtained with our input–output data are similar to those obtained in the previous comparison with [36] (see Table 15). The values of the cardinality and the lower and upper bounds are those given in [12], viz., 8, 0.05 and 0.2, respectively. The portfolios obtained from Chen et al.’s approach and from the proposed approach with both equal and unequal weights (for \(\gamma = 0.1\)) are presented in Table 15.

Table 15 Comparison with Mashayekhi and Omrani [36] and Chen et al. [12]

From the results in Table 15, we infer that the return (risk) of the portfolios obtained using the proposed approach with both equal and unequal weights is significantly greater (lower) than that obtained using Chen et al.’s approach. Note that Chen et al. [12] use the variance of the cross-efficiency as the risk measure. For reasons similar to those mentioned in the previous comparison, the proposed approach is superior to [12].

7 Conclusion

This paper discussed an FDEA model for asset evaluation in which both undesirable inputs and undesirable outputs can be considered simultaneously. Furthermore, we developed a novel FDEA cross-efficiency evaluation-based mean-semivariance model for portfolio selection. To illustrate the proposed approach, a case study based on 50 firms was considered. The numerical results showed that, compared with the portfolios obtained through the traditional method, the portfolios obtained using the proposed model exhibit a remarkable reduction in semivariance and only a slight reduction in mean. Also, a more significant decrease in semivariance can be realized by a larger \(\gamma\). The obtained portfolio selection strategies empirically support the effectiveness of the proposed approach for stock portfolio selection. The proposed model is capable of generating portfolios according to the preferences of the investor, expressed through \(d, \gamma , \varepsilon _i,\) and \(\delta _i\). If the investor is not satisfied with the obtained portfolios, more portfolios can be generated by varying these parameters. The obtained portfolios are not only efficient but also in line with the preferences of the investor. In future research, we would like to consider nonfinancial indicators in the FDEA model to reflect more realistic performance evaluation settings. In addition, the inputs and outputs could be modeled as trapezoidal fuzzy variables, and the proposed mean-semivariance model could be extended to a multiperiod setting.