1 Introduction

Regression models are widely used in the social sciences, economics, and the life sciences to describe how an output (response) variable depends on input (explanatory) variables (Fahrmeir et al. [11]). The simplest regression model is the linear regression given by

$$y=\beta _1x_{1}+\cdots +\beta _kx_{k}+\epsilon,$$

where y is the output variable, \(x_{j}, j=1,\cdots ,k,\) are the input variables, \(\varvec{\beta }=(\beta _1,\cdots ,\beta _k)\) is the unknown vector of regression coefficients, and \(\epsilon\) is the random error. Estimating the unknown vector \(\varvec{\beta }\) is an important topic in regression analysis. In recent years, several shrinkage methods have been developed for this problem, and the lasso (least absolute shrinkage and selection operator) proposed by Tibshirani [28] is the most commonly used one. Following [28], the lasso estimate of \(\varvec{\beta }\) is obtained by imposing an \(L_1\) penalty on the least-squares error, that is, by solving the following optimization problem:

$$\mathop {\arg \min }_{\varvec{\beta }}\sum _{i=1}^n\Big (y_i-\sum _{j=1}^k\beta _jx_{ij}\Big )^2 +\lambda \sum _{j=1}^k{|\beta _j|},$$

where \((x_{i1},\cdots ,x_{ik};y_i)\) and \(\lambda >0\) denote the observed data and the tuning parameter, respectively. The lasso estimate reduces to the ordinary least-squares estimate as \(\lambda\) goes to zero, and all coefficients shrink to zero as \(\lambda\) goes to infinity. Moreover, the lasso estimate usually has smaller variance and higher prediction accuracy than the ordinary least-squares estimate when k is large. However, the lasso penalizes all coefficients equally, which is unfair to the important covariates. To remedy this, Zou [37] proposed the adaptive lasso, which assigns different weights to different coefficients. Following [37], the adaptive lasso estimate of \(\varvec{\beta }\) solves the following minimization problem:

$$\mathop {\arg \min }_{\varvec{\beta }}\sum _{i=1}^n\Big (y_i-\sum _{j=1}^k\beta _jx_{ij}\Big )^2+\lambda \sum _{j=1}^k{\omega _j|\beta _j|},$$

where the \(\omega _j\)s are positive, data-dependent weights. Compared with the lasso, the adaptive lasso has two advantages: near-minimax optimality and the oracle property [13, 37]. For more discussions of the lasso, the adaptive lasso, and other shrinkage methods, we refer the reader to Knight and Fu [20], Fan and Li [13], Zou and Hastie [38], Wang et al. [32], Hastie et al. [14], and the references therein.
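To fix ideas in the crisp setting, the following R sketch fits the adaptive lasso on simulated data using the glmnet package, whose penalty.factor argument implements coefficient-specific weights. The simulated data, the choice \(\tau =1\), and the OLS initial estimate are illustrative assumptions, not part of any formulation above.

```r
# Classical (crisp) adaptive lasso via glmnet's penalty.factor argument;
# a minimal sketch on simulated data, with tau = 1 (one of Zou's suggested values).
library(glmnet)

set.seed(1)
n <- 100; k <- 8
x <- matrix(rnorm(n * k), n, k)
beta <- c(3, 0, 2, rep(0, 5))            # a few large effects
y <- as.vector(x %*% beta + rnorm(n))

ols <- coef(lm(y ~ x))[-1]               # initial OLS estimate (intercept dropped)
tau <- 1
w <- 1 / abs(ols)^tau                    # data-dependent weights omega_j

cv  <- cv.glmnet(x, y, alpha = 1, penalty.factor = w)   # lambda by cross-validation
fit <- glmnet(x, y, alpha = 1, penalty.factor = w, lambda = cv$lambda.min)
coef(fit)                                # unimportant coefficients are exactly zero
```

Coefficients with large initial estimates receive small weights and are barely penalized, while coefficients near zero receive large weights and are shrunk to exactly zero.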

In real-life applications, we often encounter vague or imprecise data, such as “young,” “low,” and “about 5 inches.” To handle such data, Tanaka et al. [26] extended traditional linear regression to the fuzzy environment. Since then, a great number of studies on fuzzy regression analysis have appeared [2,3,4,5,6, 17, 18, 36]. Two approaches are mainly used to handle fuzzy regression problems: the possibilistic approach [26, 27] and the least-squares approach [2, 6, 9, 33]. However, only a few works apply shrinkage techniques to fuzzy regression analysis. Farnoosh et al. [12] proposed a ridge-type estimate for the fuzzy linear regression model with fuzzy inputs and fuzzy outputs. Recently, Hesamian and Akbari [15] extended the lasso technique to fuzzy linear regression and obtained a fuzzy lasso estimate of the unknown coefficients. The fuzzy lasso estimate performs well in prediction (estimation) and variable selection [15]. However, it also has some drawbacks: (a) it assigns the same weight to every coefficient, which produces biased estimates for the significant ones, and (b) the oracle properties do not hold for the lasso estimate [13, 37]. In this paper, we propose a fuzzy adaptive lasso estimate for fuzzy linear regression with crisp inputs and fuzzy outputs. The new estimate generalizes the fuzzy lasso estimate [15]. It penalizes each covariate according to its importance, that is, it places larger penalties on unimportant covariates and smaller penalties on important ones. Moreover, it achieves parameter estimation and variable selection simultaneously. Numerical studies show that the proposed fuzzy adaptive lasso estimate outperforms the fuzzy lasso estimate and several popular methods in prediction and variable selection.

The rest of this paper is organized as follows. We review some basic concepts and results on fuzzy numbers in Sect. 2. In Sect. 3, we introduce the fuzzy linear regression model and state three commonly used estimates. In Sect. 4, we propose the fuzzy adaptive lasso estimate for the fuzzy linear regression model and present the corresponding algorithm. To evaluate the performance of the proposed estimator, numerical experiments are reported in Sect. 5. Finally, concluding remarks are provided in Sect. 6.

2 Fuzzy Number, Fuzzy Arithmetic and Fuzzy Distance

In this section, we introduce some fundamental concepts of fuzzy numbers that will be used later. Moreover, some arithmetic operations and a distance metric between fuzzy numbers are presented. For more details, see [1, 8, 19].

Let \(\tilde{A}\) be a fuzzy set on \(\mathbb {R}\) (the real line) with membership function \(\mu _{\tilde{A}}\) from \(\mathbb {R}\) to the interval [0, 1]. The fuzzy set \(\tilde{A}\) is said to be a fuzzy number if it satisfies the properties of normality, convexity, and boundedness [1, 8]. The fuzzy number \(\tilde{A}\) is said to be an LR fuzzy number if its membership function has the form:

$$\begin{aligned} \mu _{\tilde{A}}(x)= {\left\{ \begin{array}{ll} L\left( \frac{a^C-x}{a^L}\right) &{} x\le a^C,\\ R\left( \frac{x-a^C}{a^U}\right) &{} x> a^C, \end{array}\right. } \end{aligned}$$

where L and R are two strictly decreasing, continuous functions from [0, 1] to [0, 1], and \(a^C\), \(a^L>0\), and \(a^U>0\) denote the center, left spread, and right spread of \(\tilde{A}\), respectively. For simplicity, the LR fuzzy number \(\tilde{A}\) defined above is denoted by \((a^L,a^C,a^U)_{LR}\). In particular, when \(L(x)=R(x)=1-x, \,0\le x\le 1\), \(\tilde{A}\) reduces to the triangular fuzzy number \(\tilde{A}=(a^L,a^C,a^U)_T\), whose membership function takes the form:

$$\begin{aligned} \mu _{\tilde{A}}(x)= {\left\{ \begin{array}{ll} \frac{x-(a^C-a^L)}{a^L}&{} a^C-a^L\le x\le a^C,\\ \frac{a^C+a^U-x}{a^U}&{} a^C\le x\le a^C+a^U,\\ 0&{} \text{ else }. \end{array}\right. } \end{aligned}$$

If \(a^L=a^U\), \(\tilde{A}\) is a symmetric triangular fuzzy number denoted by \((a^C,a^L)_{T}\). Moreover, if \(a^L=a^U=0\), it reduces to a crisp real number \(a^C\).

In the following, we state two algebraic operations on triangular fuzzy numbers [15, 21], namely addition \((\oplus )\) and scalar multiplication \((\otimes )\). Let \(\tilde{A}=(a^L,a^C,a^U)_{T}\) and \(\tilde{B}=(b^L,b^C,b^U)_{T}\) be two triangular fuzzy numbers and \(\rho\) a crisp number. Then, we have

$$\begin{aligned}&\tilde{A}\oplus \tilde{B}=(a^L+b^L,\,a^C+b^C,\,a^U+b^U)_T,\\&\rho \otimes \tilde{A}= {\left\{ \begin{array}{ll} (\rho a^L,\,\rho a^C,\,\rho a^U)_T &{} \text{ if } \rho \ge 0,\\ (-\rho a^U,\,\rho a^C,\,-\rho a^L)_T &{} \text{ if } \rho <0. \end{array}\right. } \end{aligned}$$
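As a small illustration, the following R sketch (our own helper functions, not taken from any reference) encodes a triangular fuzzy number as a vector of left spread, center, and right spread and implements the two operations above.

```r
# A triangular fuzzy number stored as (left spread, center, right spread);
# illustrative helpers for the two operations above.
tfn <- function(l, ctr, u) c(L = unname(l), C = unname(ctr), U = unname(u))

# Addition: left spreads, centers, and right spreads add componentwise.
tfn_add <- function(A, B) tfn(A["L"] + B["L"], A["C"] + B["C"], A["U"] + B["U"])

# Scalar multiplication: a negative scalar swaps the left and right spreads.
tfn_scale <- function(rho, A) {
  if (rho >= 0) tfn(rho * A["L"], rho * A["C"], rho * A["U"])
  else          tfn(-rho * A["U"], rho * A["C"], -rho * A["L"])
}

A <- tfn(0.5, 3, 0.4); B <- tfn(0.3, 2, 0.3)
tfn_add(A, B)       # (0.8, 5, 0.7)_T
tfn_scale(-2, A)    # (0.8, -6, 1.0)_T: spreads swapped and kept positive
```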

Yang and Ko [34] defined a distance between LR fuzzy numbers that takes into account the shape of the membership function. In Sect. 5, we use it to evaluate the closeness between the observed and estimated outputs. For two LR fuzzy numbers \(\tilde{A}=(a^L,a^C,a^U)_{LR}\) and \(\tilde{B}=(b^L,b^C,b^U)_{LR}\), the distance between \(\tilde{A}\) and \(\tilde{B}\) is defined as follows:

$$D(\tilde{A},\tilde{B})=\Big \{(a^C-b^C)^2+[(a^C-\eta _1a^L)-(b^C-\eta _1b^L)]^2+[(a^C+\eta _2a^U)-(b^C+\eta _2b^U)]^2\Big \}^{1/2},$$

where \(\eta _1=\int _{0}^1L^{-1}(x)dx\) and \(\eta _2=\int _{0}^1R^{-1}(x)dx\) [34]. When \(\tilde{A}\) and \(\tilde{B}\) are two triangular fuzzy numbers, the distance defined above reduces to

$$D(\tilde{A},\tilde{B})=\Big \{(a^C-b^C)^2+[(a^C-0.5a^L)-(b^C-0.5b^L)]^2+[(a^C+0.5a^U)-(b^C+0.5b^U)]^2\Big \}^{1/2}.$$
(2.1)

For more properties of the distance defined above, we refer the reader to [9, 10, 15, 34].
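For later use, the distance (2.1) is straightforward to code. The following R sketch (an illustrative helper of ours) takes each triangular fuzzy number as a vector (left spread, center, right spread).

```r
# Yang-Ko distance (Eq. 2.1) between two triangular fuzzy numbers,
# each given as a vector (left spread, center, right spread).
D_tfn <- function(A, B) {
  sqrt((A[2] - B[2])^2 +
       ((A[2] - 0.5 * A[1]) - (B[2] - 0.5 * B[1]))^2 +
       ((A[2] + 0.5 * A[3]) - (B[2] + 0.5 * B[3]))^2)
}
D_tfn(c(0.5, 3, 0.4), c(0.3, 2, 0.3))   # distance between (0.5,3,0.4)_T and (0.3,2,0.3)_T
```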

3 Fuzzy Linear Regression Model and Some Existing Estimates

Consider the following fuzzy linear regression model:

$$\tilde{y}_i=\oplus _{j=1}^k(x_{ij}\otimes {\tilde{\beta }}_j)\oplus {\tilde{\epsilon }}_i, \,i=1,\cdots ,n,$$
(3.1)

where \(\tilde{y}_i=(y_i^L,y_i^C,y_i^U)_T\) is the fuzzy response variable, \(x_{i1}, \cdots , x_{ik}\) are crisp explanatory variables, \({\tilde{\beta }}_j=(\beta _j^L,\beta _j^C,\beta _j^U)_T\) are the unknown fuzzy coefficients, and \({\tilde{\epsilon }}_i\) denotes the fuzzy error term. Let \(\tilde{y}_i^*=(y_i^{*L},y_i^{*C},y_i^{*U})_T=\oplus _{j=1}^k(x_{ij}\otimes {\tilde{\beta }}_j)\). From the arithmetic of triangular fuzzy numbers, we have, for \(1\le i\le n\),

$$\begin{aligned}&y_i^{*C}=\sum _{j=1}^k\beta _j^Cx_{ij},\\&y_i^{*L}=\sum _{j=1}^k[\gamma _{ij}\beta _j^Lx_{ij}-(1-\gamma _{ij})\beta _j^Ux_{ij}],\\&y_i^{*U}=\sum _{j=1}^k[\gamma _{ij}\beta _j^Ux_{ij}-(1-\gamma _{ij})\beta _j^Lx_{ij}], \end{aligned}$$

where \(\gamma _{ij}=I(x_{ij}>0)\) and \(I(\cdot )\) denotes the indicator function.
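In code, the decomposition above amounts to splitting each input into its positive and negative parts. The following R sketch (an illustrative helper reused in later sketches) computes \(\tilde{y}_i^*\) for all i at once.

```r
# Predicted fuzzy output y_i^* = (+)_j x_ij (x) beta_j for a crisp input matrix X
# and triangular coefficients; betaL, betaC, betaU are length-k vectors of
# left spreads, centers, and right spreads.
fuzzy_predict <- function(X, betaL, betaC, betaU) {
  xpos <- pmax(X, 0)                          # gamma_ij * x_ij
  xneg <- pmin(X, 0)                          # (1 - gamma_ij) * x_ij
  list(L = xpos %*% betaL - xneg %*% betaU,   # y*L: left spreads
       C = X    %*% betaC,                    # y*C: centers
       U = xpos %*% betaU - xneg %*% betaL)   # y*U: right spreads
}
```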

In the following, we review three popular estimates for the unknown coefficients in model (3.1).

3.1 Fuzzy Least-Squares Estimate (FLSE)

The most widely used method in linear regression analysis is the ordinary least-squares method. Diamond [7] generalized it to the fuzzy environment and obtained the fuzzy least-squares estimate (FLSE) of \(\tilde{\varvec{\beta }}=({\tilde{\beta }}_1,\cdots ,{\tilde{\beta }}_k)\) in model (3.1) as follows:

$$\begin{aligned} \hat{\tilde{\varvec{\beta }}}^{FLSE}=\mathop {\arg \min }_{\tilde{\varvec{\beta }}}\sum _{i=1}^n d_1^2(\tilde{y}_i,\tilde{y}_i^*), \end{aligned}$$
(3.2)

where

$$\begin{aligned} d_1(\tilde{y}_i,\tilde{y}_i^*)=\Big \{(y_i^C-y_i^{*C})^2+[(y_i^C-y_i^L)-(y_i^{*C}-y_i^{*L})]^2 +[(y_i^C+y_i^U)-(y_i^{*C}+y_i^{*U})]^2\Big \}^{1/2}. \end{aligned}$$

Although the FLSE is accurate and simple to compute, it performs poorly when outliers exist; even a single unusual value can greatly influence the estimate. For other fuzzy least-squares estimates for model (3.1), we refer the reader to Chachi [2], Coppi et al. [6], D’Urso [9], and Xu and Li [33].

3.2 Fuzzy Least Absolute Estimate (FLAE)

Since the least-squares estimate is sensitive to outliers, the least absolute method was developed to overcome this problem. Choi and Buckley [3] showed that the least absolute estimate is more efficient than the least-squares estimate when unusual values exist. Later, based on the distance

$$\begin{aligned} d_2(\tilde{y}_i,\tilde{y}_i^*)=|y_i^C-y_i^{*C}|+|y_i^L-y_i^{*L}| +|y_i^U-y_i^{*U}|, \end{aligned}$$
(3.3)

Zeng et al. [36] established a fuzzy least absolute estimate (FLAE) for \(\tilde{\varvec{\beta }}=({\tilde{\beta }}_1,\cdots ,\tilde{\beta }_k)\) as follows:

$$\begin{aligned} \hat{\tilde{\varvec{\beta }}}^{FLAE}=\mathop {\arg \min }_{\tilde{\varvec{\beta }}} \sum _{i=1}^nd_2(\tilde{y}_i,\tilde{y}_i^*). \end{aligned}$$

For other fuzzy least absolute estimates, see Choi and Buckley [3] and Taheri and Kelkinnama [25].

3.3 Fuzzy Lasso Estimate

The lasso is a shrinkage method that imposes an \(L_1\)-penalty to shrink coefficients toward zero; moreover, it performs variable selection simultaneously [28]. Recently, Hesamian and Akbari [15] applied the lasso technique to fuzzy linear regression and obtained the following estimate of \(\tilde{\varvec{\beta }}=(\tilde{\beta }_1,\cdots ,\tilde{\beta }_k)\):

$$\begin{aligned}&\hat{\tilde{\varvec{\beta }}}^{lasso}=\mathop {\arg \min }_{\tilde{\varvec{\beta }}}\sum _{i=1}^n D^2(\tilde{y}_i,\tilde{y}_i^*),\nonumber \\&s.t.\quad {\left\{ \begin{array}{ll} \sum _{j=1}^k |\beta _j^C| \le \lambda _1, &{} \\ \sum _{j=1}^k \beta _j^L \le \lambda _2, &{} \\ \sum _{j=1}^k \beta _j^U \le \lambda _3, &{} \\ \end{array}\right. } \end{aligned}$$
(3.4)

where the distance \(D(\cdot ,\cdot )\) is defined by Eq. (2.1) and \(\lambda _1, \lambda _2, \lambda _3\) are three positive tuning parameters determined by the cross-validation criterion [23]. For more properties of the lasso method, see Tibshirani [28] and Hastie et al. [14].

4 Fuzzy Adaptive Lasso Estimate

As stated in Sect. 1, the lasso technique is unfair to significant covariates because it penalizes all coefficients equally in the \(L_1\)-penalty. Hence, in the following, we extend the adaptive lasso technique (Zou [37]) to estimate \(\tilde{\varvec{\beta }}=(\tilde{\beta }_1,\cdots ,\tilde{\beta }_k)\), assigning larger penalties to unimportant covariates and smaller penalties to important ones.

Let \(\tau\) be a crisp positive number. Suppose that \(\hat{\tilde{\varvec{\beta }}}=(\hat{\tilde{\beta }}_1,\cdots ,\hat{\tilde{\beta }}_k)\) denotes a fuzzy least-squares estimate of \(\tilde{\varvec{\beta }}=(\tilde{\beta }_1,\cdots ,\tilde{\beta }_k)\) with \(\hat{\tilde{\beta }}_j=(\hat{\beta }_j^L,\hat{\beta }_j^C,\hat{\beta }_j^U)_T\). For example, we can use Diamond’s estimate \(\hat{\tilde{\varvec{\beta }}}^{FLSE}\) given in Eq. (3.2). The fuzzy adaptive lasso estimate \(\hat{\tilde{\varvec{\beta }}}^{alasso}\) is defined by

$$\begin{aligned} \hat{\tilde{\varvec{\beta }}}^{alasso}=\mathop {\arg \min }_{\tilde{\varvec{\beta }}}\sum _{i=1}^n D^2(\tilde{y}_i,\tilde{y}_i^*) +\lambda \sum _{j=1}^k\Big (w_{j1}|\beta _j^C| +w_{j2}\beta _j^L +w_{j3}\beta _j^U\Big ), \end{aligned}$$
(4.1)

where

$$\begin{aligned} w_{j1}=(1/|\hat{\beta }_{j}^C|)^{\tau }, \, w_{j2}= (1/\hat{\beta }_{j}^L)^{\tau }, \, w_{j3}=(1/\hat{\beta }_{j}^U)^{\tau },\,1\le j\le k, \end{aligned}$$

and \(\lambda\) is a tuning parameter. The two parameters \(\tau\) and \(\lambda\) will be determined later.
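As a sketch of how the objective in Eq. (4.1) can be evaluated in practice, the following R function (our illustrative code, reusing the fuzzy_predict helper sketched in Sect. 3) writes out the squared distances (2.1) plus the weighted penalty. Clamping the spread coordinates at zero is our implementation choice, since spreads must be nonnegative.

```r
# Penalized objective of Eq. (4.1); a minimal sketch. `theta` stacks
# (betaL, betaC, betaU); yL, yC, yU hold the observed spreads and centers;
# w1, w2, w3 are the weights of Eq. (4.1), computed from an initial FLSE.
alasso_objective <- function(theta, X, yL, yC, yU, lambda, w1, w2, w3) {
  k <- ncol(X)
  bL <- pmax(theta[1:k], 0)                 # spreads kept nonnegative
  bC <- theta[(k + 1):(2 * k)]
  bU <- pmax(theta[(2 * k + 1):(3 * k)], 0)
  p  <- fuzzy_predict(X, bL, bC, bU)
  # squared Yang-Ko distances (Eq. 2.1), written out componentwise
  sse <- sum((yC - p$C)^2 +
             ((yC - 0.5 * yL) - (p$C - 0.5 * p$L))^2 +
             ((yC + 0.5 * yU) - (p$C + 0.5 * p$U))^2)
  sse + lambda * sum(w1 * abs(bC) + w2 * bL + w3 * bU)
}
# One possible minimizer: a derivative-free call to optim(), e.g.
# optim(theta0, alasso_objective, X = X, yL = yL, yC = yC, yU = yU,
#       lambda = lambda, w1 = w1, w2 = w2, w3 = w3, method = "Nelder-Mead")
```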

Remark 4.1

Although we recommend applying Diamond’s estimate \(\hat{\tilde{\varvec{\beta }}}^{FLSE}\) given in Eq. (3.2), other least-squares estimates may be used [6, 9, 33]. Moreover, if the input variables are highly collinear, the ridge estimate (Hong and Hwang [16]) could be used because of its stability.

Remark 4.2

Determining the tuning parameters \(\tau\) and \(\lambda\) is a crucial problem in the adaptive lasso procedure [37]. In the numerical studies, we use a two-dimensional cross-validation procedure [23] to choose their optimal values. In addition, the parameter \(\tau\) is selected from the set \(\{0.5,1,2\}\), as suggested by Zou [37].

4.1 Goodness-of-Fit Measures

Suppose that \(\hat{\tilde{y}}_i\) denotes the estimated value of \(\tilde{y}_i\). To assess the prediction accuracy of the fuzzy adaptive lasso estimate, we use the following three widely used measures [15, 36].

  • Mean-Square Error (MSE):

    $$\begin{aligned} MSE=\frac{1}{n}\sum _{i=1}^n D^2(\tilde{y}_i,\hat{\tilde{y}}_i), \end{aligned}$$

    where the distance \(D(\cdot ,\cdot )\) is defined by Eq. (2.1).

  • Mean Absolute Error (MAE):

    $$\begin{aligned} MAE=\frac{1}{n}\sum _{i=1}^n d_2(\tilde{y}_i,\hat{\tilde{y}}_i), \end{aligned}$$

    where the distance \(d_2(\cdot ,\cdot )\) is defined by Eq. (3.3).

  • Mean Similarity Measure (MSM):

    $$\begin{aligned} MSM=\frac{1}{n}\sum _{i=1}^n S(\tilde{y}_i,\hat{\tilde{y}}_i), \end{aligned}$$
    (4.2)

where the similarity measure \(S(\cdot ,\cdot )\) is defined as follows:

$$\begin{aligned} S(\tilde{A},\tilde{B})=1-\frac{d_2(\tilde{A},\tilde{B})}{\max (a^C+a^U,b^C+b^U)-\min (a^C-a^L,b^C-b^L)} \end{aligned}$$

with \(\tilde{A}=(a^L,a^C,a^U)_T\), \(\tilde{B}=(b^L,b^C,b^U)_T\). This measure is also used by Zeng et al. [36]. In addition, it satisfies that (i) \(S(\tilde{A},\tilde{B})=S(\tilde{B},\tilde{A})\), (ii) \(0\le S(\tilde{A},\tilde{B})\le 1\) and \(S(\tilde{A},\tilde{B})=1\) if and only if \(\tilde{A}=\tilde{B}\) [36].

Note that smaller values of MSE and MAE indicate smaller total estimation errors and better prediction performance. Conversely, a larger value of MSM indicates better prediction performance.
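For completeness, the three measures can be computed jointly; the following R sketch (an illustrative helper) takes the observed and fitted spreads and centers as vectors.

```r
# MSE, MAE, and MSM for observed (yL, yC, yU) and fitted (hL, hC, hU)
# triangular outputs; a minimal sketch using Eqs. (2.1) and (3.3).
fit_measures <- function(yL, yC, yU, hL, hC, hU) {
  D2 <- (yC - hC)^2 +
        ((yC - 0.5 * yL) - (hC - 0.5 * hL))^2 +
        ((yC + 0.5 * yU) - (hC + 0.5 * hU))^2        # squared distance (2.1)
  d2 <- abs(yC - hC) + abs(yL - hL) + abs(yU - hU)   # distance (3.3)
  S  <- 1 - d2 / (pmax(yC + yU, hC + hU) - pmin(yC - yL, hC - hL))  # similarity
  c(MSE = mean(D2), MAE = mean(d2), MSM = mean(S))
}
```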

4.2 Algorithm for the Estimate \(\hat{\tilde{\varvec{\beta }}}^{alasso}\)

In the following, we state the algorithm of the fuzzy adaptive lasso method, in which the optimal values of \(\lambda\) and \(\tau\) are also determined. The flowchart of the proposed method is given in Fig. 1.

Fig. 1 Flowchart for fuzzy adaptive lasso estimate
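Since the algorithm appears only as a flowchart, the following R sketch reconstructs the tuning loop of Remark 4.2 under stated assumptions: fit_alasso and predict_alasso are hypothetical wrappers around a minimizer of Eq. (4.1) (e.g., optim on the objective sketched above), and the \(\lambda\) grid and fold number \(K=5\) are our choices, not prescribed by the method.

```r
# A hedged reconstruction of the tuning loop in Fig. 1 (not the original code):
# two-dimensional cross-validation over (tau, lambda), as in Remark 4.2.
# fit_alasso() and predict_alasso() are assumed, hypothetical wrappers that
# minimize Eq. (4.1) and return fitted spreads/centers (list with $L, $C, $U).
cv_tune <- function(X, yL, yC, yU, taus = c(0.5, 1, 2),
                    lambdas = 10^seq(-3, 1, length.out = 20), K = 5) {
  n <- nrow(X)
  fold <- sample(rep(1:K, length.out = n))   # random K-fold assignment
  best <- list(err = Inf)
  for (tau in taus) for (lambda in lambdas) {
    err <- 0
    for (kk in 1:K) {
      tr <- fold != kk; te <- !tr
      fit <- fit_alasso(X[tr, ], yL[tr], yC[tr], yU[tr], lambda, tau)
      p   <- predict_alasso(fit, X[te, ])
      err <- err + sum((yC[te] - p$C)^2 +    # held-out squared distance (2.1)
                       ((yC[te] - 0.5 * yL[te]) - (p$C - 0.5 * p$L))^2 +
                       ((yC[te] + 0.5 * yU[te]) - (p$C + 0.5 * p$U))^2)
    }
    if (err < best$err) best <- list(err = err, tau = tau, lambda = lambda)
  }
  best   # the (tau, lambda) pair with the smallest cross-validated error
}
```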

5 Numerical Examples

In this section, we evaluate the performance of the fuzzy adaptive lasso estimate. In Subsect. 5.1, we compare the prediction accuracy of the new estimate with several commonly used methods, including Diamond [7] (FLSE), Coppi et al. [6], Taheri and Kelkinnama [25], Zeng et al. [36] (FLAE), and Hesamian and Akbari [15] (fuzzy lasso). The three goodness-of-fit measures (MSE, MAE, MSM) introduced in Subsect. 4.1 are calculated to assess prediction accuracy. In Subsect. 5.2, we illustrate the performance of the fuzzy adaptive lasso estimate in variable selection. All experiments were implemented in the R language [22].

5.1 Prediction Accuracy of Fuzzy Adaptive Lasso Estimate

Example 5.1

(A few large effects) Consider the following fuzzy linear regression model:

$$\begin{aligned} \tilde{y}_i=\oplus _{j=1}^{8}(x_{ij}\otimes \tilde{\beta }_j)\oplus \tilde{\epsilon }_i, \, i=1,\cdots ,n, \end{aligned}$$

where the true regression coefficients are \(\tilde{\beta }_1=(0.5,3,0.4)_T, \, \tilde{\beta }_3=(0.3,2,0.3)_T\), \(\tilde{\beta }_2=\tilde{\beta }_4=\tilde{\beta }_7=\tilde{\beta }_8=(0,0,0)_T\), \(\tilde{\beta }_5=(0.001,0,0)_T\), and \(\tilde{\beta }_6=(0,0,0.001)_T\). The crisp inputs \(\varvec{x}_i=(x_{i1},\cdots ,x_{i8}), i=1,\cdots ,n\), are iid normal vectors with mean zero and covariance matrix \(\Sigma =(\sigma _{ij})_{8\times 8}\), \(\sigma _{ij}=0.5^{|i-j|}\). In addition, the fuzzy error term is \(\tilde{\epsilon }_i=(\epsilon _i^L,\epsilon _i^C,\epsilon _i^U)_T\) with \(\epsilon _i^L\sim N(0,\sigma _1^2)\), \(\epsilon _i^C\sim N(0,\sigma _2^2),\) and \(\epsilon _i^U\sim N(0,\sigma _3^2)\). We set \(\varvec{\sigma }=(\sigma _1,\sigma _2,\sigma _3)\) to (0.5, 1, 0.5), (0.2, 0.4, 0.2), and (0.1, 0.2, 0.1).
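The data-generating step can be sketched in R as follows. MASS::mvrnorm draws the AR(0.5)-correlated inputs; taking absolute values of the normal draws for the spread errors is our reading, since left and right spreads must stay nonnegative.

```r
# Data generation for Example 5.1; a sketch reusing the fuzzy_predict helper
# from Sect. 3. The absolute values on the spread errors are our assumption.
library(MASS)

set.seed(123)
n <- 40; k <- 8
Sigma <- 0.5^abs(outer(1:k, 1:k, "-"))        # sigma_ij = 0.5^|i-j|
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)

betaC <- c(3, 0, 2, 0, 0, 0, 0, 0)            # centers of the true coefficients
betaL <- c(0.5, 0, 0.3, 0, 0.001, 0, 0, 0)    # left spreads
betaU <- c(0.4, 0, 0.3, 0, 0, 0.001, 0, 0)    # right spreads

sig <- c(0.5, 1, 0.5)                         # (sigma_1, sigma_2, sigma_3)
p  <- fuzzy_predict(X, betaL, betaC, betaU)
yL <- p$L + abs(rnorm(n, 0, sig[1]))          # observed left spreads
yC <- p$C + rnorm(n, 0, sig[2])               # observed centers
yU <- p$U + abs(rnorm(n, 0, sig[3]))          # observed right spreads
```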

In this example, we generated \(N=100\) simulated datasets with \(n=20\) and 40. We calculated the mean MSE, MAE, and MSM of the fuzzy adaptive lasso estimate and the five other popular methods. The results are summarized in Table 1, from which we make the following observations. First, in all cases, the MSE and MAE values of the fuzzy adaptive lasso estimate are smaller than those of the other methods, so the proposed method performs best in terms of the MSE and MAE measures. Second, according to the MSM measure, Zeng et al.’s method outperforms the other five estimates. Finally, all methods perform better as the variances decrease, which is consistent with the traditional statistical case [37].

Besides the criteria introduced in Subsect. 4.1, we also consider the general defuzzification method introduced by Sugeno [24], called “the center of gravity.” This defuzzification method transforms fuzzy quantities into crisp ones. From [24], the center of gravity of a triangular fuzzy number \(\tilde{A}=(a^L,a^C,a^U)_T\) is

$$G_{\tilde{A}}=a^C+\frac{1}{3}(a^U-a^L).$$
(5.1)
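In R, Eq. (5.1) is a one-liner (an illustrative helper):

```r
# Center of gravity (Eq. 5.1) of a triangular fuzzy number (aL, aC, aU)_T.
cog <- function(aL, aC, aU) aC + (aU - aL) / 3
cog(0.5, 3, 0.4)   # defuzzified value of (0.5, 3, 0.4)_T
```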

The centers of gravity of the observed and estimated outputs for Example 5.1 with \(n=20\) and 40 are plotted in Fig. 2. The figure suggests that the estimated outputs agree with the observed outputs very closely.

Table 1 Comparison of the performances of fuzzy adaptive lasso estimate and five popular methods for Example 5.1 with \(n=20, 40\)
Fig. 2 The centers of gravity for the observed outputs and the estimated ones of the 1st simulated dataset in Example 5.1 with \(n=20\) (left) and \(n=40\) (right)

Example 5.2

(Many small effects) In this example, we also consider fuzzy linear regression model

$$\begin{aligned} \tilde{y}_i=\oplus _{j=1}^{8}(x_{ij}\otimes \tilde{\beta }_j)\oplus \tilde{\epsilon }_i, \, i=1,\cdots ,n, \end{aligned}$$

where \(\varvec{x}_i=(x_{i1},\cdots ,x_{i8})\) and \(\tilde{\epsilon }_i=(\epsilon _i^L,\epsilon _i^C,\epsilon _i^U)_T\) are the same as in Example 5.1. We set the fuzzy coefficients to \(\tilde{\beta }_1=\tilde{\beta }_2=\cdots =\tilde{\beta }_8=(0.2,0.85,0.2)_T\). As in Example 5.1, we generated \(N=100\) datasets with \(n=20\) and 40. The three measures for the six methods are reported in Table 2. Furthermore, the centers of gravity of the observed and estimated outputs for this example are given in Fig. 3. From Table 2 and Fig. 3, we reach conclusions similar to those of Example 5.1.

Table 2 Comparison of the performances of fuzzy adaptive lasso estimate and five popular methods for Example 5.2 with \(n=20, 40\)
Fig. 3 The centers of gravity for the observed outputs and the estimated ones of the 1st simulated dataset in Example 5.2 with \(n=20\) (left) and \(n=40\) (right)

Example 5.3

To illustrate the application of the fuzzy adaptive lasso estimate, we consider an example from [33], which analyzed the effect of the composition of Portland cement on the heat evolved during hardening. The dataset is presented in Table 3. In this dataset, the symmetric triangular fuzzy output \(\tilde{y}_i=(y_i,e_i)_T\) denotes the heat evolved in calories per gram of cement, and the crisp inputs \(x_1, x_2,\) and \(x_3\) denote the amounts of tricalcium aluminate, tricalcium silicate, and tetracalcium aluminoferrite, respectively.

Table 3 Dataset in Example 5.3
Table 4 The estimated coefficients and three measures of four methods for Example 5.3

In this example, we employ the fuzzy adaptive lasso method, Hesamian and Akbari [15] (fuzzy lasso), Coppi et al. [6], and Zeng et al. [36] to estimate the unknown fuzzy coefficients. The estimated coefficients and the three measures (MSE, MAE, MSM) for these methods are reported in Table 4. From Table 4, no method universally dominates the others. Specifically, the MSE and MAE values of the fuzzy adaptive lasso estimate are 48.4402 and 5.9162, respectively, which are smaller than those of the other methods. In terms of the MSM measure, Zeng et al. [36] performs best.

Example 5.4

In this example, we consider the house price data previously studied by Tanaka et al. [26] and Choi [4]. The dataset is presented in Table 5. The symmetric triangular fuzzy output \(\tilde{y}_i=(y_i,e_i)_T\) denotes the house price (1000 yen), and the crisp inputs \(x_1, x_2,\) and \(x_3\) denote the first-floor space (m\(^2\)), the second-floor space (m\(^2\)), and the number of rooms, respectively. We again use the fuzzy adaptive lasso method, Hesamian and Akbari [15] (fuzzy lasso), Coppi et al. [6], and Zeng et al. [36] to estimate the unknown fuzzy coefficients. The corresponding results are presented in Table 6. In terms of MSE and MSM, the fuzzy adaptive lasso estimate performs best. According to the MAE measure, Zeng et al. [36] outperforms the other methods.

Table 5 Dataset in Example 5.4
Table 6 The estimated coefficients and three measures of four methods for Example 5.4

5.2 Variable Selection of Fuzzy Adaptive Lasso Estimate

Hesamian and Akbari [15] called a fuzzy coefficient “about 0” if it equals \((0,0,0)_T\), \((l,0,0)_T,\) or \((0,0,r)_T\). Moreover, the corresponding explanatory variable is said to be completely noninformative if its fuzzy coefficient is \((0,0,0)_T\), and strongly noninformative if its fuzzy coefficient is \((l,0,0)_T\) or \((0,0,r)_T\). According to this criterion, the explanatory variables \(x_2, x_4, x_7, x_8\) in Example 5.1 are completely noninformative, and \(x_5, x_6\) are strongly noninformative. In this subsection, we use this criterion to study the variable selection performance of the fuzzy adaptive lasso method.

Table 7 presents the variable selection results of the fuzzy adaptive lasso and fuzzy lasso estimates for Example 5.1 with three different variances. In this table, the column labeled “C” gives the average number of selected nonzero components, and the column labeled “I” gives the average number of “about zero” components incorrectly selected into the final model. From column “C,” both methods correctly select the two informative variables (\(x_1\) and \(x_3\)) in all cases. From column “I,” the fuzzy adaptive lasso estimate tends to select fewer noninformative variables when the variances are low, whereas the fuzzy lasso estimate tends to outperform the fuzzy adaptive lasso estimate when the variances are high. We also studied the case \(n=20\) and reached the same conclusion.

Table 7 Average number of selected variables of fuzzy adaptive lasso, fuzzy lasso for Example 5.1 with \(n=40\)

6 Concluding Remarks

In this paper, we have proposed a fuzzy adaptive lasso estimate for fuzzy linear regression with crisp inputs and fuzzy outputs. The new estimate is a penalized one that imposes a weighted \(L_1\) penalty on the least-squares error. Compared with the fuzzy lasso estimate of Hesamian and Akbari [15], it assigns different weights to different coefficients, which is fairer to the important ones. The tuning parameters are determined by a two-dimensional cross-validation procedure. We compared the fuzzy adaptive lasso estimate with five commonly used estimates through several numerical experiments. The experimental results show that the proposed estimate is an effective tool for estimation and variable selection.

The proposed method has, of course, some limitations. Note that, in many real applications, the input variables can also be fuzzy [5, 9]. Moreover, compared with fuzzy sets, Pythagorean fuzzy sets are an effective tool for dealing with fuzziness and imprecision, especially in multiple attribute decision making problems [30, 31, 35]. Hence, extending the proposed method to the situation where both input and output variables are fuzzy or Pythagorean fuzzy is one direction for our future research. Moreover, as stated in Vidaurre et al. [29], the adaptive lasso is sensitive to collinearity. Thus, generalizing robust penalized techniques such as the elastic net (Zou and Hastie [38]) and the LAD-lasso (Wang et al. [32]) to the fuzzy environment will be another research topic in the future.