1 Introduction

The linear regression model is a widely used statistical tool for evaluating the linear relationship between a quantitative dependent variable (output, or response variable) and one or more explanatory variables (inputs).

In linear regression modeling, two main issues have to be dealt with in practical problems:

  1.

    imprecision or vagueness in the definition and/or observation of the output and/or of the inputs (see [10] for a more detailed discussion about the overall sources of uncertainty which may affect regression analysis);

  2.

    presence of outliers, which could cause the estimates of the regression coefficients to be biased.

As for the first issue, data imprecision may be due to several causes: (i) imprecision in measuring the empirical phenomena observed; (ii) vagueness of the variables of interest (inputs and/or outputs) when they are expressed in linguistic terms; (iii) partial or total ignorance about the variables’ values on specific instances; (iv) granularity (categorization) of the variables of interest. When dealing with one or more of these situations, a fuzzification of the inputs and/or the output could suitably exploit the available information. Converting imprecise data into fuzzy data could be more effective than replacing them with a single value.

In this paper, imprecise data are then represented by fuzzy statistical variables.

The second issue regards the robustness of the estimates in a noisy environment. The Least Squares (LS) approach is one of the most popular methods for estimating linear regression coefficients, due to its theoretical and practical advantages. However, the LS approach is not robust to the presence of outliers. This shortcoming undermines its application even in the presence of a small percentage of anomalous observations. In this paper, we cope with this issue by considering a robust estimation method, which is effective in reducing the distorting effect of outliers.

Following an iterative Weighted Least Squares (WLS) estimation approach, we propose a linear regression model for studying the dependence of a general class of fuzzy linear variables on a set of crisp or fuzzy explanatory variables (see Sect. 2). The proposed model represents a generalization of the fuzzy regression model suggested by Coppi et al. [10].

In order to investigate the goodness of fit of the regression model, some theoretical properties and a suitable generalization of the determination coefficient are described (Sect. 2).

Furthermore, we illustrate an assessment of the imprecision associated with the estimates of the regression coefficients of the proposed regression model (Sect. 3).

We then suggest a robust version of the fuzzy regression model based on the Least Median Squares (LMS) estimation approach, which is able to neutralize and/or smooth the disruptive effects of possible crisp or fuzzy outliers in the estimation process (Sect. 4). The proposed robust model is a generalization of the robust model proposed by D’Urso et al. [20].

In order to illustrate the good performance of our model, a simulation study and two empirical applications are presented (Sects. 5 and 6).

Some final remarks conclude the paper.

2 The linear regression model for \(\text {LR}_2\) fuzzy inputs and output

Consider a fuzzy regression model with fuzzy/crisp output and fuzzy/crisp input.

Based on the traditional inferential approach, the expected value of the fuzzy/crisp output should be reparameterized in terms of a linear model involving the “regression effects” of \(p\) fuzzy/crisp explanatory variables. In the literature, different theoretical contributions have been proposed for fuzzy regression analysis based on a conjoint (inferential and fuzzy) approach (see, e.g., [2, 3, 22, 23, 25, 26, 29, 32, 36]).

The methodological approach considered in our paper is to select, among a class of possible linear models (expressing the relationship between the fuzzy/crisp output and the fuzzy/crisp inputs), the “best” linear model according to some specific criteria.

Following this approach, two main lines of research have been pursued in the literature: the possibilistic approach, first introduced by Tanaka et al. [34], and the Least Squares (LS) approach, based on suitable extensions of the well-known least squares criterion to the fuzzy setting (see, among others, [4, 6, 9–11, 14–16, 18, 20, 28, 35]).

In the possibilistic framework, the fuzzy regression coefficients of a regression model are estimated by minimizing the fuzziness of the estimated response variable, conditionally on obtaining fuzzy response values which contain (to a certain possibility degree \(0\le h \le 1\)) the observed fuzzy responses (see, for instance, [33]; and, in a comparative perspective, [5, 12, 24]).

In the LS approach the objective is to find the linear model which “best approximates” the observed data in a given metric space. The LS criterion is then conditional on the chosen metric.

Two main features characterize the adopted line of research, i.e., (a) the definition of the linear regression model, and (b) the specific metric space introduced for applying the LS criterion.

As for the first aspect, we will extend the linear regression models proposed by Coppi et al. [10] and D’Urso et al. [20] to the case when both the output and the inputs are fuzzy, in particular, fuzzy \(LR_2\) variables (see the following Sect. 2.1), by setting up a procedure for estimating the two centers and the spreads of the regression coefficients. As for the second aspect, we will extend the distance function introduced by Coppi et al. [10] in our framework. Notice that our fuzzy regression models, which are explained in the following sections, are based on an exploratory approach.

2.1 Fuzzy data

We formalize imprecise data as \(LR_2\) fuzzy data. In particular, \(LR_2\) fuzzy data can be represented as \({\tilde{y}}\equiv (m_1,m_2,l,r)_{LR_2}\), where \(m_1,\,m_2 \in {\mathbb {R}}\,(m_1\le m_2)\) denote the centers, or the “modes”, of the fuzzy data, while \(l,\,r \in {\mathbb {R}}^{+}\) are the left and right spreads, respectively, with the following membership function [13, 37]:

$$\begin{aligned} \mu _{{\tilde{y}}}(\omega )= {\left\{ \begin{array}{ll} L\left( \frac{m_1-\omega }{l}\right) &{} \omega \le m_1\quad (l>0)\\ 1 \qquad &{} m_1\le \omega \le m_2\\ R\left( \frac{\omega -m_2}{r}\right) &{}\omega \ge m_2\quad (r>0) \end{array}\right. } \end{aligned}$$
(1)

where \(L\) (and \(R\)) is a decreasing “shape” function from \({\mathbb {R}}^+\) to \([0,1]\) with \(L(0)=1\); \(L(\omega )<1\) for all \(\omega >0\); \(L(\omega )>0\) for all \(\omega <1\); \(L(1)=0\) (or \(L(\omega )>0\) for all \(\omega \) and \(L(+\infty )=0\)).

If \(l=r\), we obtain symmetrical \(LR_2\) fuzzy data.

When \(m_1=m_2=m\), we obtain \(LR_1\) fuzzy data, with the following membership function:

$$\begin{aligned} \mu _{{\tilde{y}}}(\omega )= {\left\{ \begin{array}{ll} L\left( \frac{m-\omega }{l}\right) &{} \omega \le m\quad (l>0)\\ R\left( \frac{\omega -m}{r}\right) &{}\omega \ge m\quad (r>0) \end{array}\right. } \end{aligned}$$
(2)

A particular case of \(LR_2\) fuzzy data is the trapezoidal fuzzy data, whose membership function is:

$$\begin{aligned} \mu _{{\tilde{y}}}(\omega )= {\left\{ \begin{array}{ll} 1-\frac{m_1-\omega }{l} &{}\quad m_1-l \le \omega \le m_1\quad (l>0)\\ 1 &{}\quad m_1\le \omega \le m_2\\ 1-\frac{\omega -m_2}{r} &{}\quad m_2 \le \omega \le m_2+r\quad (r>0)\\ 0 &{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$
(3)

Figure 1 shows the membership function of a trapezoidal fuzzy datum.

Fig. 1 Geometric representation of the trapezoidal membership function

A particular case of \(LR_1\) fuzzy data is the triangular fuzzy data, with the following membership function:

$$\begin{aligned} \mu _{{\tilde{y}}}(\omega )= {\left\{ \begin{array}{ll} 1-\frac{m-\omega }{l} &{}\quad m-l \le \omega \le m\quad (l>0)\\ 1-\frac{\omega -m}{r} &{}\quad m \le \omega \le m+r\quad (r>0)\\ 0 &{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$
(4)
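As an illustration, the trapezoidal and triangular membership functions (3)–(4) can be evaluated as in the following minimal Python sketch (the function names are ours and are not part of the paper):

```python
import numpy as np

def trapezoidal_membership(omega, m1, m2, l, r):
    """Membership function (3) of a trapezoidal LR2 fuzzy datum (m1, m2, l, r)."""
    omega = np.asarray(omega, dtype=float)
    mu = np.zeros_like(omega)
    # left branch: linearly increasing from m1 - l to m1
    left = (omega >= m1 - l) & (omega < m1)
    mu[left] = 1.0 - (m1 - omega[left]) / l
    # plateau between the two centers
    core = (omega >= m1) & (omega <= m2)
    mu[core] = 1.0
    # right branch: linearly decreasing from m2 to m2 + r
    right = (omega > m2) & (omega <= m2 + r)
    mu[right] = 1.0 - (omega[right] - m2) / r
    return mu

def triangular_membership(omega, m, l, r):
    """Membership function (4) of a triangular LR1 fuzzy datum (m, l, r)."""
    return trapezoidal_membership(omega, m, m, l, r)

# example: trapezoidal fuzzy datum with centers 2 and 3, spreads 1 and 0.5
print(trapezoidal_membership([1.5, 2.5, 3.25], m1=2.0, m2=3.0, l=1.0, r=0.5))  # [0.5 1. 0.5]
```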

2.2 Model definition and estimation

Consider the linear dependence relationship between a \(LR_2\) fuzzy output (or response variable) \({\widetilde{Y}}\equiv (m_1,m_2,l,r)\) and a set of \(p\) \(LR_2\) fuzzy inputs \(\{{\widetilde{X}}_j\equiv ({}_xm_{1j},{}_xm_{2j},{}_xl_{j},{}_xr_{j}):\, j =1,\ldots ,p\}\).

The proposed fuzzy linear regression model consists of modeling simultaneously the two centers of the \(LR_2\) response variable by means of a multiple regression model on the \(LR_2\) explanatory variables, and the left and right spreads of the response through two multiple linear regressions on the estimated centers.

Hence, the linear regression model with fuzzy response variable \({\widetilde{Y}}\) and fuzzy explanatory variables \({\widetilde{X}}_j,\,j=1,\ldots ,p,\) can be formalized as follows, using a matrix notation:

$$\begin{aligned}&\mathbf {m}_{1}= \mathbf {m}^*_{1}+{\varvec{\varepsilon }}_{\mathbf {m}_{1}} \quad \mathbf {m}^*_{1}= \mathbf {M}_1{\varvec{\alpha }}_1+\mathbf {M}_2{\varvec{\alpha }}_2 +\mathbf {L}{\varvec{\alpha }}_l + \mathbf {R}{\varvec{\alpha }}_r \end{aligned}$$
(5a)
$$\begin{aligned}&\mathbf {m}_{2}= \mathbf {m}^*_{2}+{\varvec{\varepsilon }}_{\mathbf {m}_{2}} \quad \mathbf {m}^*_{2}= \mathbf {M}_1{\varvec{\beta }}_1+\mathbf {M}_2{\varvec{\beta }}_2 +\mathbf {L}{\varvec{\beta }}_l + \mathbf {R}{\varvec{\beta }}_r\end{aligned}$$
(5b)
$$\begin{aligned}&\mathbf {l}= \mathbf {l}^*+{\varvec{\varepsilon }}_{\mathbf {l}} \qquad \qquad ~\mathbf {l}^*= \mathbf {1}\gamma _{0}+\mathbf {m}^*_1\gamma _1+\mathbf {m}^*_2\gamma _2=\mathbf {M}^*{\varvec{\gamma }} \end{aligned}$$
(5c)
$$\begin{aligned}&\mathbf {r}= \mathbf {r}^*+{\varvec{\varepsilon }}_{\mathbf {r}} \qquad \qquad \mathbf {r}^*= \mathbf {1}\delta _{0}+\mathbf {m}^*_1\delta _1+\mathbf {m}^*_2\delta _2=\mathbf {M}^*{\varvec{\delta }} \end{aligned}$$
(5d)

where:

  • \(\mathbf {m}_1,\,\mathbf {m}_2\) are the \(n\)-vectors of the left and right centers of the response fuzzy variable, \(m_{1},\,m_{2}\);

  • \(\mathbf {l},\,\mathbf {r}\) are the \(n\)-vectors of the left and right spreads of the response fuzzy variable, \(l,\,r\);

  • \(\mathbf {M}_1,\,\mathbf {M}_2\) are the \((n\times (p+1))\)-matrices of the left and right centers of the input fuzzy variables (design matrices), \({}_xm_{1j},\,{}_xm_{2j}\,(j=1,\ldots ,p)\), respectively;

  • \(\mathbf {L},\,\mathbf {R}\) are the \((n\times (p+1))\)-matrices of the left and right spreads of the input fuzzy variables, \({}_xl_{j},\,{}_xr_{j}\,(j=1,\ldots ,p)\), respectively (see footnote 1);

  • \({\varvec{\alpha }}_1,\,{\varvec{\alpha }}_2,\,{\varvec{\alpha }}_l,\,{\varvec{\alpha }}_r\) are the \((p+1)\)-vectors of coefficients of the model on the left centers \(\mathbf {m}_1\);

  • \({\varvec{\beta }}_1,\,{\varvec{\beta }}_2,\,{\varvec{\beta }}_l,\,{\varvec{\beta }}_r\) are the \((p+1)\)-vectors of coefficients of the model on the right centers \(\mathbf {m}_2\);

  • \(\gamma _0,\,\gamma _1\) and \(\gamma _2\) are the coefficients of the model on the left spreads, \(\mathbf {l}\);

  • \(\delta _0,\,\delta _1\) and \(\delta _2\) are their counterparts for the model on the right spreads, \(\mathbf {r}\);

  • \({\varvec{\varepsilon }}_{\mathbf {m}_{1}},\,{\varvec{\varepsilon }}_{\mathbf {m}_{2}}\) are the \(n\)-vectors of the error terms of the models on the left and right centers, respectively;

  • \({\varvec{\varepsilon }}_{\mathbf {l}},\,{\varvec{\varepsilon }}_{\mathbf {r}}\) are the \(n\)-vectors of the error terms of the models on the left and right spreads, respectively;

  • \(\mathbf {1}\) is the \(n\)-vector of ones;

  • \(\mathbf {M}^*\) is the \((n\times 3)\) matrix whose columns are the vector of ones and the vectors of the theoretical values of the left and right centers of the response variable;

  • \({\varvec{\gamma }}\) and \({\varvec{\delta }}\) are the \((3 \times 1)\) vectors of the coefficients of the models on the left and right spreads, respectively.

The theoretical values of the centers and of the spreads are marked with an asterisk (*).

Note that in the model (5a)–(5d) we assume that the estimates of both spreads depend on the estimates of both centers (see Eqs. (5c) and (5d)). We can interpret the left and right center of the dependent variable as the lower and the upper bound, respectively, of interval-valued data, and the spreads as the degree of imprecision of these interval-valued data. Hence, the assumption of linear dependency between spreads and centers is reasonable since in many instances the magnitude of the error depends on the size of the interval estimates.

We use the Weighted Least Squares (WLS) procedure to estimate the coefficients of the model [20]. In what follows, we refer to the model (5a)–(5d) as the WLS-based fuzzy regression model. Depending on the nature of the weighting matrix \(\mathbf {W}\), we have different linear regression models. In particular, when \(\mathbf {W}=\mathbf {I}\), we obtain the Least Squares (LS) based fuzzy regression model.

The objective function to be minimized is the weighted squared Euclidean distance between the observed fuzzy variables and their estimates, \({\widetilde{\varDelta }}_{\mathbf {W}}^2\) [9].

Let \(\Vert \mathbf {x}\Vert _{\mathbf {W}}=({\mathbf {x}}^{\prime }\mathbf {W}\mathbf {x})^{\frac{1}{2}}\) be the weighted norm of the generic vector \(\mathbf {x}\), where \(\mathbf {W}\) is a diagonal matrix, whose elements are the weights attached to each observation. Then, the weighted squared Euclidean distance \({\widetilde{\varDelta }}_{\mathbf {W}}^2\) can be written as:

$$\begin{aligned} {\widetilde{\varDelta }}_{\mathbf {W}}^2&= \Vert \mathbf {m}_1-\mathbf {m}^*_1\Vert _{\mathbf {W}}^2+\Vert \mathbf {m}_2-\mathbf {m}^*_2\Vert _{\mathbf {W}}^2 \nonumber \\&+\Vert (\mathbf {m}_1-\lambda \mathbf {l}) - (\mathbf {m}^*_1-\lambda \mathbf {l}^*)\Vert _{\mathbf {W}}^2+ \Vert (\mathbf {m}_2+\rho \mathbf {r}) - (\mathbf {m}^*_2+\rho \mathbf {r}^*)\Vert _{\mathbf {W}}^2 \end{aligned}$$
(6)

where \(\lambda =\int _{0}^{1}L^{-1}(\omega )~d\omega \) and \(\rho =\int _{0}^{1}R^{-1}(\omega )~d\omega \) are parameters which account for the shape of the membership function. In particular, if the membership function is trapezoidal, then \(\lambda =\rho =1/2\) [9].
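A direct transcription of (6) is straightforward. The following minimal sketch (our own illustration; the names are not from the paper) computes \({\widetilde{\varDelta }}_{\mathbf {W}}^2\) for trapezoidal data, for which \(\lambda =\rho =1/2\):

```python
import numpy as np

def weighted_sq_distance(m1, m2, l, r, m1s, m2s, ls, rs, w, lam=0.5, rho=0.5):
    """Weighted squared distance (6) between observed LR2 data (m1, m2, l, r)
    and their estimates (m1s, m2s, ls, rs). w is the diagonal of the weight
    matrix W; lam and rho encode the shape of the membership function
    (1/2 for trapezoidal data)."""
    def wnorm2(x):
        return np.sum(w * x ** 2)      # ||x||_W^2 with W = diag(w)
    return (wnorm2(m1 - m1s)
            + wnorm2(m2 - m2s)
            + wnorm2((m1 - lam * l) - (m1s - lam * ls))
            + wnorm2((m2 + rho * r) - (m2s + rho * rs)))
```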

Equation (6) can be expanded as follows:

$$\begin{aligned} {\widetilde{\varDelta }}_{\mathbf {W}}^2&= {(\mathbf {m}_1-\mathbf {m}^*_1)}^{\prime }\mathbf {W}(\mathbf {m}_1- \mathbf {m}^*_1)+{(\mathbf {m}_2-\mathbf {m}^*_2)}^{\prime } \mathbf {W}(\mathbf {m}_2-\mathbf {m}^*_2)\nonumber \\&+{[(\mathbf {m}_1 - \lambda \mathbf {l})-(\mathbf {m}^*_1 - \lambda \mathbf {l}^*)]}^{\prime }\mathbf {W}[(\mathbf {m}_1 - \lambda \mathbf {l}) -(\mathbf {m}^*_1 - \lambda \mathbf {l}^*)]\nonumber \\&+{[(\mathbf {m}_2 + \rho \mathbf {r})-(\mathbf {m}^*_2 + \rho \mathbf {r}^*)]}^{\prime }\mathbf {W}[(\mathbf {m}_2 + \rho \mathbf {r}) -(\mathbf {m}^*_2 + \rho \mathbf {r}^*)]\nonumber \\&= {(\mathbf {m}_1-\mathbf {m}^*_1)}^{\prime }\mathbf {W}(\mathbf {m}_1- \mathbf {m}^*_1)+{(\mathbf {m}_2-\mathbf {m}^*_2)}^{\prime }\mathbf {W} (\mathbf {m}_2-\mathbf {m}^*_2)\nonumber \\&+{[(\mathbf {m}_1-\mathbf {m}^*_1)-\lambda (\mathbf {l}- \mathbf {l}^*)]}^{\prime }\mathbf {W}[(\mathbf {m}_1-\mathbf {m}^*_1)- \lambda (\mathbf {l}-\mathbf {l}^*)]\nonumber \\&+{[(\mathbf {m}_2-\mathbf {m}^*_2)+\rho (\mathbf {r}- \mathbf {r}^*)]}^{\prime }\mathbf {W}[(\mathbf {m}_2-\mathbf {m}^*_2)+\rho (\mathbf {r}-\mathbf {r}^*)]\nonumber \\&= 2\,({\mathbf {m}}^{\prime }_1\mathbf {W}\mathbf {m}_1-2\,{\mathbf {m}}^{\prime }_1\mathbf {W}\mathbf {m}^*_1+ {\mathbf {m}^*_1}^{\prime }\mathbf {W}\mathbf {m}^*_1+ {\mathbf {m}}^{\prime }_2\mathbf {W}\mathbf {m}_2-2\,{\mathbf {m}}^{\prime }_2\mathbf {W} \mathbf {m}^*_2+{\mathbf {m}^*_2}^{\prime }\mathbf {W}\mathbf {m}^*_2)\nonumber \\&-2\,\lambda ({\mathbf {m}}^{\prime }_1\mathbf {W}\mathbf {l}-{\mathbf {m}}^{\prime }_1\mathbf {W} \mathbf {l}^*-{\mathbf {m}^*_1}^{\prime }\mathbf {W}\mathbf {l}+{\mathbf {m}^*_1}^{\prime }\mathbf {W}\mathbf {l}^*) +\lambda ^2({\mathbf {l}}^{\prime }\mathbf {W}\mathbf {l}-2\,{\mathbf {l}}^{\prime }\mathbf {W} \mathbf {l}^*+{\mathbf {l}^*}^{\prime }\mathbf {W}\mathbf {l}^*)\nonumber \\&+2\,\rho ({\mathbf {m}}^{\prime }_2\mathbf {W}\mathbf {r}-{\mathbf {m}}^{\prime }_2\mathbf {W} \mathbf {r}^*-{\mathbf {m}^*_2}^{\prime }\mathbf {W}\mathbf {r}+{\mathbf {m}^*_2}^{\prime }\mathbf {W}\mathbf {r}^*) +\rho ^2({\mathbf {r}}^{\prime }\mathbf {W}\mathbf {r}-2\,{\mathbf {r}}^{\prime }\mathbf {W} \mathbf {r}^*+{\mathbf {r}^*}^{\prime }\mathbf {W}\mathbf {r}^*). \nonumber \\ \end{aligned}$$
(7)

By minimizing (7), we obtain the iterative solutions of the model (5a)–(5d), which are reported in Appendix.
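The closed-form iterative solutions are given in the Appendix and are not reproduced here. As a self-contained alternative for readers who want to experiment, the same estimates can be approximated by minimizing (7) numerically with a general-purpose optimizer; the sketch below is our own illustration, not the Appendix formulae:

```python
import numpy as np
from scipy.optimize import minimize

def fit_wls_fuzzy_regression(M1, M2, L, R, m1, m2, l, r, w, lam=0.5, rho=0.5):
    """Approximate the WLS-based fuzzy regression model (5a)-(5d) by direct
    numerical minimization of the objective (7). M1, M2, L, R are the
    (n x (p+1)) design matrices (first column of ones); m1, m2, l, r are the
    observed components of the response; w is the diagonal of W."""
    n, q = M1.shape                                    # q = p + 1

    def unpack(theta):
        a1, a2, al, ar, b1, b2, bl, br = np.split(theta[:8 * q], 8)
        gamma, delta = theta[8 * q:8 * q + 3], theta[8 * q + 3:]
        return a1, a2, al, ar, b1, b2, bl, br, gamma, delta

    def objective(theta):
        a1, a2, al, ar, b1, b2, bl, br, gamma, delta = unpack(theta)
        m1s = M1 @ a1 + M2 @ a2 + L @ al + R @ ar      # eq. (5a)
        m2s = M1 @ b1 + M2 @ b2 + L @ bl + R @ br      # eq. (5b)
        Ms = np.column_stack([np.ones(n), m1s, m2s])   # M* = (1, m1*, m2*)
        ls, rs = Ms @ gamma, Ms @ delta                # eqs. (5c)-(5d)
        def wnorm2(x):
            return np.sum(w * x ** 2)
        return (wnorm2(m1 - m1s) + wnorm2(m2 - m2s)
                + wnorm2((m1 - lam * l) - (m1s - lam * ls))
                + wnorm2((m2 + rho * r) - (m2s + rho * rs)))

    theta0 = np.zeros(8 * q + 6)                       # p_bar = 8(p+1)+6 parameters
    res = minimize(objective, theta0, method="BFGS")
    return unpack(res.x), res.fun
```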

2.3 Properties of the model

In this section, we illustrate some properties of the WLS-based fuzzy regression model (5a)–(5d), which will be useful in the following.

Proposition 1

The weighted sums of the residuals of the left and right centers and of the left and right spreads are equal to 0:

$$\begin{aligned} \mathbf {1}^{\prime }\mathbf {W}(\mathbf {m}_1-{\hat{\mathbf {m}}}_1)&= 0\\ \mathbf {1}^{\prime }\mathbf {W}(\mathbf {m}_2-{\hat{\mathbf {m}}}_2)&= 0\\ \mathbf {1}^{\prime }\mathbf {W}(\mathbf {l}-{\hat{\mathbf {l}}})&= 0\\ \mathbf {1}^{\prime }\mathbf {W}(\mathbf {r}-{\hat{\mathbf {r}}})&= 0 \end{aligned}$$

where \({\hat{\mathbf {m}}}_1,{\hat{\mathbf {m}}}_2,{\hat{\mathbf {l}}}\) and \({\hat{\mathbf {r}}}\) are the estimates of the left and right centers, and of the left and right spreads, of the response variable.

From this proposition it also follows that the weighted means of the residuals are equal to 0.

Proposition 2

The residuals of the left and right centers are uncorrelated with the estimates of the left and right centers, respectively:

$$\begin{aligned} (\mathbf {m}_1-{\hat{\mathbf {m}}}_1)^{\prime }\mathbf {W}{\hat{\mathbf {m}}}_1&= 0\\ (\mathbf {m}_2-{\hat{\mathbf {m}}}_2)^{\prime }\mathbf {W}{\hat{\mathbf {m}}}_2&= 0 \end{aligned}$$

Similarly, the residuals of the left and right spreads are uncorrelated with the estimates of the left and right spreads, respectively:

$$\begin{aligned} (\mathbf {l}-{\hat{\mathbf {l}}})^{\prime }\mathbf {W}{\hat{\mathbf {l}}}&= 0\\ (\mathbf {r}-{\hat{\mathbf {r}}})^{\prime }\mathbf {W}{\hat{\mathbf {r}}}&= 0 \end{aligned}$$

Note that, given the relationship between the sub-models in (5a)–(5d), it follows that:

$$\begin{aligned} (\mathbf {m}_1-{\hat{\mathbf {m}}}_1)^{\prime }\mathbf {W}{\hat{\mathbf {l}}}&= 0\\ (\mathbf {m}_2-{\hat{\mathbf {m}}}_2)^{\prime }\mathbf {W}{\hat{\mathbf {r}}}&= 0\\ (\mathbf {l}-{\hat{\mathbf {l}}})^{\prime }\mathbf {W}{\hat{\mathbf {m}}}_1&= 0\\ (\mathbf {r}-{\hat{\mathbf {r}}})^{\prime }\mathbf {W}{\hat{\mathbf {m}}}_2&= 0 \end{aligned}$$

Proofs for Propositions 1–2 can be easily derived from the LS properties proved in Coppi et al. [10].

2.4 Goodness of fit

To evaluate the goodness of fit of the model (5a)–(5d) to the data, we propose a generalization of the determination coefficient \(R^2\) for fuzzy regression models suggested by Coppi et al. [10].

First, define the following quantities:

  • the total weighted sum of squares:

    $$\begin{aligned} SST_{\mathbf {W}}&= \Vert \mathbf {m}_1-\mathbf {1}{\bar{m}}_1\Vert _{\mathbf {W}}^2+ \Vert \mathbf {m}_2-\mathbf {1}{\bar{m}}_2\Vert _{\mathbf {W}}^2\nonumber \\&+\Vert (\mathbf {m}_1-\lambda \mathbf {l}) - (\mathbf {1} {\bar{m}}_1-\lambda \mathbf {1}{\bar{l}})\Vert _{\mathbf {W}}^2+ \Vert (\mathbf {m}_2+\rho \mathbf {r}) - (\mathbf {1}{\bar{m}}_2+\rho \mathbf {1}{\bar{r}})\Vert _{\mathbf {W}}^2, \end{aligned}$$
    (8)
  • the weighted explained sum of squares:

    $$\begin{aligned} SSE_{\mathbf {W}}&= \Vert {\hat{\mathbf {m}}}_1-\mathbf {1}{\bar{m}}_1 \Vert _{\mathbf {W}}^2+\Vert {\hat{\mathbf {m}}}_2-\mathbf {1}{\bar{m}}_2\Vert _{\mathbf {W}}^2 \nonumber \\&+\Vert ({\hat{\mathbf {m}}}_1-\lambda {\hat{\mathbf {l}}}) - (\mathbf {1}{\bar{m}}_1-\lambda \mathbf {1}{\bar{l}}) \Vert _{\mathbf {W}}^2+ \Vert ({\hat{\mathbf {m}}}_2+\rho {\hat{\mathbf {r}}}) - (\mathbf {1}{\bar{m}}_2+\rho \mathbf {1}{\bar{r}})\Vert _{\mathbf {W}}^2,\qquad \end{aligned}$$
    (9)
  • the weighted residual sum of squares:

    $$\begin{aligned} SSR_{\mathbf {W}}&= \Vert \mathbf {m}_1-{\hat{\mathbf {m}}}_1\Vert _{\mathbf {W}}^2+ \Vert \mathbf {m}_2-{\hat{\mathbf {m}}}_2\Vert _{\mathbf {W}}^2\nonumber \\&+\Vert (\mathbf {m}_1-\lambda \mathbf {l}) - ({\hat{\mathbf {m}}}_1- \lambda {\hat{\mathbf {l}}})\Vert _{\mathbf {W}}^2+ \Vert (\mathbf {m}_2+ \rho \mathbf {r}) - ({\hat{\mathbf {m}}}_2+\rho {\hat{\mathbf {r}}})\Vert _{\mathbf {W}}^2, \end{aligned}$$
    (10)

where \({\bar{m}}_1,{\bar{m}}_2,{\bar{l}}\) and \({\bar{r}}\) are the sample means of the left and right centers and of the left and right spreads, respectively.

Based on the properties illustrated in Sect. 2.3, it can be shown that:

$$\begin{aligned} SST_{\mathbf {W}}=SSE_{\mathbf {W}}+SSR_{\mathbf {W}}. \end{aligned}$$
(11)

Then, the determination coefficient for the weighted fuzzy linear regression model is defined as:

$$\begin{aligned} R^2_{\mathbf {W}}=\frac{SSE_{\mathbf {W}}}{SST_{\mathbf {W}}}=1- \frac{SSR_{\mathbf {W}}}{SST_{\mathbf {W}}}, \quad 0 \le R^2_{\mathbf {W}} \le 1. \end{aligned}$$
(12)

As in the standard linear regression framework, the closer \(R^2_{\mathbf {W}}\) approaches 1, the better the fit of the model to the data.

The analysis of the goodness of fit of a model is useful when one wants to select, within a class of parametric models, the model which provides the best fit to the data.

However, it can be shown that \(R^2_{\mathbf {W}}\) does not decrease as the number of inputs in the model increases. For this reason, if the objective is to select the “best” model within a class of models, then \(R^2_{\mathbf {W}}\) could be ineffective.

A better solution is to adopt the adjusted determination coefficient \({\bar{R}}^2_{\mathbf {W}}\), which adds a penalization term that takes into account the number of inputs. We denote the number of parameters of the fuzzy regression model by \({\bar{p}}\); in particular, when both the inputs and the output are \(LR_2\) fuzzy variables, \({\bar{p}}=[8\cdot (p + 1)+6]\). Then, the adjusted determination coefficient is:

$$\begin{aligned} {\bar{R}}^2_{\mathbf {W}}=1-(1-R^2_{\mathbf {W}})\frac{n-1}{n-{\bar{p}}}. \end{aligned}$$
(13)

\({\bar{R}}^2_{\mathbf {W}}\) increases only if the inclusion of a new input improves \(R^2_{\mathbf {W}}\) more than would be expected by chance. The adjusted determination coefficient can be used to select the optimal number of inputs to be included in the model.

As observed by Coppi et al. [10], the denominator of the adjusting factor \((n-{\bar{p}})\) in (13) decreases more than proportionally as \(p\) increases, thus penalizing the model with \(p+1\) variables more severely than the traditional (crisp) version of the adjusted determination coefficient. Therefore, it would be better to use an adjusting factor which considers only the number of parameters of one of the center sub-models, \(p\).
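The quantities (8)–(13) translate directly into code. The sketch below (our own illustration, with \(\lambda =\rho =1/2\)) computes \(R^2_{\mathbf {W}}\) via (8), (10) and (12), together with an adjusted version using the milder adjusting factor \((n-1)/(n-p)\) discussed above:

```python
import numpy as np

def r2_weighted(m1, m2, l, r, m1h, m2h, lh, rh, w, p, lam=0.5, rho=0.5):
    """Determination coefficient (12) of the WLS-based fuzzy regression model
    and its adjusted version. Arguments with an 'h' suffix are the fitted
    components of the response; w is the diagonal of W; p is the number of inputs."""
    n = len(m1)
    def wnorm2(x):
        return np.sum(w * x ** 2)                  # ||x||_W^2 with W = diag(w)
    m1b, m2b, lb, rb = np.mean(m1), np.mean(m2), np.mean(l), np.mean(r)
    sst = (wnorm2(m1 - m1b) + wnorm2(m2 - m2b)     # total sum of squares, eq. (8)
           + wnorm2((m1 - lam * l) - (m1b - lam * lb))
           + wnorm2((m2 + rho * r) - (m2b + rho * rb)))
    ssr = (wnorm2(m1 - m1h) + wnorm2(m2 - m2h)     # residual sum of squares, eq. (10)
           + wnorm2((m1 - lam * l) - (m1h - lam * lh))
           + wnorm2((m2 + rho * r) - (m2h + rho * rh)))
    r2 = 1.0 - ssr / sst                           # eq. (12)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)  # milder adjustment factor
    return r2, r2_adj
```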

2.5 Some remarks

Remark 1

(Generalization of the design matrices) The design matrices \(\mathbf {M}_1,\mathbf {M}_2,\mathbf {L}\) and \(\mathbf {R}\) in the model (5a)–(5d) can be generalized by considering appropriate functions of the components of the fuzzy explanatory variables \({\tilde{\mathbf {X}}}\).

Let \(\mathbf {F}_1,\mathbf {F}_2,\mathbf {F}_l\) and \(\mathbf {F}_r\) be the “transformed” design matrices, where:

$$\begin{aligned}&\mathbf {f}^{\prime }_{1i}=[f_1({}_x\mathbf {m}_{1i}),\ldots ,f_p({}_x\mathbf {m}_{1i})]\\&\mathbf {f}^{\prime }_{2i}=[f_1({}_x\mathbf {m}_{2i}),\ldots ,f_p({}_x\mathbf {m}_{2i})]\\&\mathbf {f}^{\prime }_{li}=[f_1({}_x\mathbf {l}_{i}),\ldots ,f_p({}_x\mathbf {l}_{i})]\\&\mathbf {f}^{\prime }_{ri}=[f_1({}_x\mathbf {r}_{i}),\ldots ,f_p({}_x\mathbf {r}_{i})] \end{aligned}$$

are the generic rows of the transformed design matrices. Each row represents the regression “profile” of observation \(i\) in terms of suitably chosen functions of the observed vectors of the fuzzy explanatory variables. In this way, the model also allows for transformations of the original fuzzy variables, such as polynomial or logarithmic transformations. By substituting \(\mathbf {F}_1,\mathbf {F}_2,\mathbf {F}_l\) and \(\mathbf {F}_r\) in (5a)–(5d), the properties of the proposed model can be easily extended to this more general case.

Remark 2

(Local optima issues) As with other iterative estimation algorithms, the solutions of the model (5a)–(5d) (see Appendix) do not guarantee the attainment of the global minimum. For this reason, we initialize the iterative algorithm from several different starting points in order to check the stability of the solution.

Remark 3

(Negative spreads) The iterative solutions of the model (5a)–(5d) (see Appendix) do not automatically guarantee the non-negativity of the estimated spreads \(\mathbf {l}^*\) and \(\mathbf {r}^*\). To cope with this issue, one can adopt the approaches proposed by D’Urso [14]. In particular, among the approaches for guaranteeing the non-negativity of the estimated spreads proposed by D’Urso [14] there is the so-called “unconstrained approach”, in which a logarithmic transformation of the spreads is suggested (for more details, see [14]). In the literature, this approach has been particularly successful and has subsequently been used by various authors in fuzzy-exploratory and fuzzy-inferential frameworks. For instance, Ferraro et al. [21], following the idea of a modeling structure based on three sub-models proposed by D’Urso and Gastaldi [15] and D’Urso [14], and using the least-squares approach as in Coppi and D’Urso [9] and Coppi et al. [10], formalized a linear regression model in a fuzzy-inferential framework using the logarithmic transformation of the spreads of the response suggested by D’Urso [14] within an exploratory framework.
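As a minimal sketch of the “unconstrained approach” mentioned above (our own simplified illustration, not the exact formulation of D’Urso [14]): the spread sub-models are fitted on the logarithmic scale, so that the back-transformed estimated spreads are non-negative by construction.

```python
import numpy as np

def fit_log_spreads(m1s, m2s, l, r):
    """Fit the spread sub-models on log(l) and log(r); exponentiating the fitted
    values yields non-negative estimated spreads. m1s, m2s are the estimated
    centers, l and r the observed (positive) spreads."""
    Ms = np.column_stack([np.ones(len(m1s)), m1s, m2s])    # M* = (1, m1*, m2*)
    gamma, *_ = np.linalg.lstsq(Ms, np.log(l), rcond=None)
    delta, *_ = np.linalg.lstsq(Ms, np.log(r), rcond=None)
    return np.exp(Ms @ gamma), np.exp(Ms @ delta)          # non-negative by construction
```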

Remark 4

(Particular cases of the model (5a)–(5d)) The model (5a)–(5d) can be considered as the most general fuzzy regression model, with \(LR_2\) fuzzy inputs and output. By combining different types of membership functions for the fuzzy/crisp inputs/output, we obtain different (fuzzy) regression models, outlined in Table 1.

Table 1 Regression models with fuzzy/crisp output/inputs and mixed membership functions

For instance, by putting \(\mathbf {M}_1=\mathbf {M}_2=\mathbf {X}\), \(\mathbf {L}=\mathbf {R}=\mathbf {0}\), \({\varvec{\alpha }}_1={\varvec{\alpha }}_2={\varvec{\alpha }}\) and \({\varvec{\beta }}_1=\varvec{\beta }_2={\varvec{\beta }}\) one can easily obtain from (5a)–(5d) the fuzzy regression model with crisp inputs and \(LR_2\) fuzzy output, and from the iterative solutions, reported in Appendix, the corresponding coefficients’ estimates.

The models in Table 1 can be also generalized to the case in which the fuzzy output and/or the fuzzy inputs are symmetrical.

3 Assessment of imprecision of the regression function

As observed by Coppi et al. [10] in the case of crisp inputs and \(LR_1\) fuzzy output, the estimation procedure of the fuzzy linear regression model provides a crisp evaluation of the regression coefficients. Since the response variable is fuzzy, the fuzzy linear regression model implicitly involves a fuzzy regression model expressed in terms of fuzzy regression coefficients. Thus, the crisp estimates of the fuzzy regression model involve a certain degree of imprecision. This observation can be extended also to all the models reported in Table 1 with fuzzy output, and in particular to our proposed model (5a)–(5d).

To evaluate the imprecision due to the crisp estimates of the fuzzy regression model, we exploit the “implicit” fuzzy model with fuzzy regression coefficients.

Following a similar line of reasoning as in Coppi et al. [10], we refer to the case with \(LR_2\) response variable and crisp inputs, but our conclusions can be extended to more complex models.

The fuzzy regression model with \(LR_2\) fuzzy response variable and crisp explanatory variables is:

$$\begin{aligned} \begin{aligned} \mathbf {m}^*_1&=\mathbf {X}{\varvec{\alpha }}\\ \mathbf {m}^*_2&=\mathbf {X}{\varvec{\beta }}\\ \mathbf {l}^*&=\mathbf {M}^*{\varvec{\gamma }}\\ \mathbf {r}^*&=\mathbf {M}^*{\varvec{\delta }} \end{aligned} \end{aligned}$$
(14)

where \(\mathbf {M}^*=(\mathbf {1},\mathbf {m}^*_1,\mathbf {m}^*_2)\), \({\varvec{\gamma }}=(\gamma _0,\gamma _1,\gamma _2)^{\prime }\) and \({\varvec{\delta }}=(\delta _0,\delta _1,\delta _2)^{\prime }\).

The implicit fuzzy model can be expressed as:

$$\begin{aligned} {\tilde{y}}^*_i={\tilde{\beta }}_0\oplus {\tilde{\beta }}_1 x_{i1}\oplus \ldots \oplus {\tilde{\beta }}_p x_{ip},\quad i=1,\ldots ,n \end{aligned}$$
(15)

where: \({\tilde{y}}^*_i=(m^*_{1i},m^*_{2i},l^*_{i},r^*_{i})\) is the theoretical value of the \(LR_2\) fuzzy response variable for the \(i\)-th unit; the coefficient \({\tilde{\beta }}_k=(\beta _{1k},\beta _{2k},\beta _{lk},\beta _{rk}),\,k=0,1,\ldots ,p\), is a \(LR_2\) fuzzy number, whose four components are the left and right centers, and the left and right spreads, of the \(k\)-th coefficient, respectively; \(\oplus \) denotes the addition of fuzzy numbers.

We express the model (15) in the following way:

$$\begin{aligned} \begin{aligned} m^*_{1i}&=\beta _{10}+\beta _{11}x_{i1}+\cdots + \beta _{1p}x_{ip}\quad \mathbf {m}^*_1= \mathbf {X}{\varvec{\beta }}_1 \\ m^*_{2i}&=\beta _{20}+\beta _{21}x_{i1}+\cdots + \beta _{2p}x_{ip}\quad \mathbf {m}^*_2=\mathbf {X}{\varvec{\beta }}_2 \\ l^*_{i}&=\beta _{l0}+\beta _{l1}|x_{i1}|+\cdots + \beta _{lp}|x_{ip}|\quad \mathbf {l}^*=|\mathbf {X}|{\varvec{\beta }}_l\\ r^*_{i}&=\beta _{r0}+\beta _{r1}|x_{i1}|+\cdots + \beta _{rp}|x_{ip}|\quad \mathbf {r}^*=|\mathbf {X}|{\varvec{\beta }}_r \end{aligned} \end{aligned}$$
(16)

where \(|\mathbf {X}|\) is the matrix of the absolute values of the inputs, and \({\varvec{\beta }}_1,\,{\varvec{\beta }}_2,\,{\varvec{\beta }}_l\) and \({\varvec{\beta }}_r\) are the \((p+1)\)-vectors of the components of the fuzzy coefficients \({\tilde{\beta }}_k\).

Coppi et al. [10] observed that from (16) we obtain estimates of \({\varvec{\beta }}_1,\,{\varvec{\beta }}_2,\,{\varvec{\beta }}_l\) and \({\varvec{\beta }}_r\) which are compatible with \({\varvec{\alpha }},\,{\varvec{\beta }},\,{\varvec{\gamma }}\) and \({\varvec{\delta }}\), the coefficients of the model (14).

Let us assume that the fuzzy arithmetic relationships represented by the equations in (16) can be approximated as follows:

$$\begin{aligned} \begin{aligned}&\mathbf {m}^{*p}_1=\mathbf {X}{\varvec{\beta }}_1+\mathbf {u}_1 \\&\mathbf {m}^{*p}_2=\mathbf {X}{\varvec{\beta }}_2+\mathbf {u}_2 \\&\mathbf {l}^{*p}=|\mathbf {X}|{\varvec{\beta }}_l+\mathbf {u}_l\\&\mathbf {r}^{*p}=|\mathbf {X}|{\varvec{\beta }}_r+\mathbf {u}_r \end{aligned} \end{aligned}$$
(17)

where the superscript \(p\) indicates that these relationships are proxies of the real relationships, and where \(\mathbf {u}_1,\,\mathbf {u}_2,\,\mathbf {u}_l\) and \(\mathbf {u}_r\) are vectors of residuals.

By means of Ordinary Least Squares (OLS), we obtain compatible estimates of the coefficients of the model (16). For instance, the OLS estimate of \({\varvec{\beta }}_1\) is:

$$\begin{aligned} {\hat{\varvec{\beta }}}_1=(\mathbf {X}^\prime \mathbf {X})^{-1} \mathbf {X}^\prime \mathbf {m}_1^{*p}=(\mathbf {X}^\prime \mathbf {X})^{-1} \mathbf {X}^\prime \mathbf {X}{\hat{\varvec{\alpha }}}= {\hat{\varvec{\alpha }}} \end{aligned}$$

where \({\hat{\varvec{\alpha }}}\) is the LS estimate of \({\varvec{\alpha }}\) from the model (14). In a similar way, we obtain a compatible estimate of \({\varvec{\beta }}_2\).

As for the estimates of the spreads \({\varvec{\beta }}_l\) and \({\varvec{\beta }}_r\), one can adopt the non-negative Least Squares (NNLS) algorithm [27], to avoid negative estimates.
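A minimal sketch of the NNLS step (our own illustration, using SciPy's `nnls` routine; `l_star` and `r_star` stand for the fitted spreads obtained from the model (14)):

```python
import numpy as np
from scipy.optimize import nnls

def spread_coefficients(X_abs, l_star, r_star):
    """Non-negative LS estimates of beta_l and beta_r in (17). X_abs is the
    (n x (p+1)) matrix |X| with a leading column of ones; l_star and r_star
    are the fitted spreads from the model (14)."""
    beta_l, _ = nnls(X_abs, l_star)
    beta_r, _ = nnls(X_abs, r_star)
    return beta_l, beta_r
```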

In conclusion, the model (14) provides a good approximation of both the centers of the fuzzy regression coefficients \({\tilde{\mathbf {\beta }}}\) and the fuzzy values of the fuzzy response variable \({\tilde{y}}\). Moreover, by means of the LS approximation of the spreads in (17) we obtain reasonable estimates of the spreads of \({\tilde{\mathbf {\beta }}}\).

Finally, note that another source of uncertainty in our framework is related to the data generation process [10]. One could take this type of uncertainty into account by means of a bootstrap procedure to evaluate the standard errors of the regression coefficient estimates.
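A sketch of the bootstrap assessment (our own illustration; `fit_model` is a hypothetical placeholder for any of the estimation procedures described above):

```python
import numpy as np

def bootstrap_se(data, fit_model, B=100, seed=0):
    """Bootstrap standard errors of the regression coefficient estimates.
    data is an (n x k) array whose rows are the observed units (all centers and
    spreads of output and inputs); fit_model(sample) is a placeholder returning
    the vector of estimated coefficients for a given resampled dataset."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    boot = np.array([fit_model(data[rng.integers(0, n, size=n)]) for _ in range(B)])
    return boot.std(axis=0, ddof=1)       # standard error of each coefficient
```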

Results illustrated in this section could also be extended to the other fuzzy regression models shown in Table 1.

4 Robust fuzzy regression

It can be shown that the WLS-based fuzzy regression model (5a)–(5d) is generally not robust to the presence of outlier data.

For instance, when \(\mathbf {W}=\mathbf {I}\), the WLS-based fuzzy regression model reduces to the LS-based fuzzy regression model, which is extremely sensitive to the presence of outliers, yielding a distortion in the parameter estimates [20].

In fuzzy regression, different types of outliers may occur in the dataset, with respect to: one or more crisp explanatory variables; the centers of one or more fuzzy explanatory variables; the spreads of one or more fuzzy explanatory variables; the centers of the fuzzy dependent variable; the spreads of the fuzzy dependent variable; the fuzzy regression lines; or several of these aspects simultaneously.

We denote the data observed on a generic unit, where both the output and the \(p\) inputs are fuzzy \(LR_2\) variables, with \(({\tilde{y}}_i,{\tilde{\mathbf {x}}}_i)\). Let us also suppose that the theoretical relationship between the fuzzy output \({\tilde{y}}_i\) and the fuzzy inputs \({\tilde{\mathbf {x}}}_i\) can be described by the model (5a)–(5d). A sample of such observations is depicted in Fig. 2a, in the case of a single fuzzy input \((p=1)\). Solid lines represent the interval-valued data given by the left and right centers of \({\tilde{y}}\) and \({\tilde{x}}\), while dashed lines represent the spreads, i.e., the uncertainty around the centers.

Fig. 2 Example of outliers for the model (5a)–(5d). a No outlier. b Outlier with respect to the relationship between \({\tilde{y}}\) and \({\tilde{x}}\), but not with respect to the two variables. c Outlier with respect to the relationship between \({\tilde{y}}\) and \({\tilde{x}}\), and with respect to the centers of \({\tilde{x}}\). d Outlier with respect to the spreads of \({\tilde{x}}\). e Outlier with respect to the centers of \({\tilde{x}}\), to the spreads of \({\tilde{y}}\), and to the relationship between \({\tilde{y}}\) and \({\tilde{x}}\). f Outlier with respect to the centers of \({\tilde{y}}\) and \({\tilde{x}}\), but not with respect to the relationship between \({\tilde{y}}\) and \({\tilde{x}}\)

As observed above, different types of outliers could occur in the dataset. Consider, for instance, Fig. 2b, where there is an outlier (depicted with a bolder line) with respect to the relationship between \({\tilde{y}}\) and \({\tilde{x}}\). A closer inspection reveals that the unit is not an outlier with respect to the two fuzzy variables. Indeed, the values of the left and right centers (and of the left and right spreads) are in the range of the values observed for the other units.

In Fig. 2c we have a different case, since the anomalous unit is an outlier with respect to both the relationship between \({\tilde{y}}\) and \({\tilde{x}}\), and to the fuzzy input. As can be seen, the values of the left and right centers of \({\tilde{x}}\) lie outside the range of the left and right centers of the fuzzy input for the remaining units.

In Fig. 2d the outlier is with respect to the spreads of \({\tilde{x}}\). This outlier also partially undermines the relationship between \({\tilde{y}}\) and \({\tilde{x}}\), at least for the part of the model (5a)–(5d) devoted to the spreads.

Figure 2e shows a more general case in which the unit is an outlier with respect to the centers of \({\tilde{x}}\), the spreads of \({\tilde{y}}\) and the relationship between the fuzzy variables.

Finally, in Fig. 2f we show a situation in which the observation is an outlier with respect to both \({\tilde{y}}\) and \({\tilde{x}}\), but not with respect to their relationship.

In this section we propose a robust version of the fuzzy regression model (5a)–(5d). The proposed model is based on the Least Median Squares (LMS) estimation method [30], which relies on the minimization of the median of squared residuals:

$$\begin{aligned} {\widetilde{\varDelta }}^2_{med}&= \underset{i}{median}\left\{ (m_{1i}-m^*_{1i})^2+ (m_{2i}-m^*_{2i})^2 \right. \nonumber \\&\left. + [(m_{1i}-\lambda l_i)-(m^*_{1i}-\lambda l^*_i)]^2\ + [(m_{2i}+\rho r_i)-(m^*_{2i}+\rho r^*_i)]^2\right\} \end{aligned}$$
(18)

The two-step estimation procedure can be illustrated as follows [20].

In the first step we apply a random re-sampling procedure [31], in which we consider several subsets of \({\bar{p}}=[8\cdot (p + 1)+6]\) observations, where \({\bar{p}}\) is the number of unknown parameters of the model (5a)–(5d). As the number of subsets increases, the probability of extracting at least one subset without outliers increases.
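Although not stated explicitly in the text, a standard back-of-the-envelope calculation can guide the choice of the number of subsets: if a fraction \(\varepsilon \) of the observations are outliers, the probability that at least one of \(N\) randomly drawn subsets of size \({\bar{p}}\) contains no outliers is approximately

$$\begin{aligned} P \approx 1-\left[ 1-(1-\varepsilon )^{{\bar{p}}}\right] ^{N}, \end{aligned}$$

so \(N\) can be chosen so that \(P\) is close to 1 for the contamination level one wishes to guard against.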

Let \(\mathbf {M}_{1s},\,\mathbf {M}_{2s},\,\mathbf {L}_s\) and \(\mathbf {R}_s\) be the \([{\bar{p}} \times (p + 1)]\) matrices extracted from the matrices \(\mathbf {M}_1,\,\mathbf {M}_2,\,\mathbf {L}\) and \(\mathbf {R}\) defined in Sect. 2.2, whose rows correspond to the randomly selected observations. Let also \(\mathbf {m}_{1s},\,\mathbf {m}_{2s},\,\mathbf {l}_s\) and \(\mathbf {r}_s\) be the corresponding \(({\bar{p}} \times 1)\) sub-vectors of \(\mathbf {m}_1,\,\mathbf {m}_2,\,\mathbf {l}\) and \(\mathbf {r}\), respectively.

For each subset, the regression coefficients are estimated using the iterative solutions illustrated in Appendix, by putting \(\mathbf {W}=\mathbf {I}_{{\bar{p}}}\), thus obtaining the estimated values \({\hat{\mathbf {m}}}_{1s},\,{\hat{\mathbf {m}}}_{2s},\,{\hat{\mathbf {l}}}_s\) and \({\hat{\mathbf {r}}}_s\). These estimates are employed to compute the median of squared residuals (18).

Since the optimal LMS solution employs only a subset of the observations, many of the remaining observations are likely not to be outliers. Hence, in the second step of the procedure we improve the estimates by considering all the observations, assigning low weights to the data identified as outliers. The identification of these observations is based on the robust residuals from LMS (see [20]).

In particular, we adopt the following weights for our analysis:

$$\begin{aligned} w_i={\left\{ \begin{array}{ll} 1,&{} |r_i/{\hat{\sigma }}|\le c_1\\ 0.5,&{} c_1< |r_i/{\hat{\sigma }}|\le c_2\\ 0,&{} |r_i/{\hat{\sigma }}|> c_2, \end{array}\right. } \end{aligned}$$
(19)

where \(r_i\) is the square root of the \(i\)-th squared residual from LMS:

$$\begin{aligned} r_i^2&= (m_{1i}-m^*_{1i})^2+ (m_{2i}-m^*_{2i})^2 \\&+ [(m_{1i}-\lambda l_i)-(m^*_{1i}-\lambda l^*_i)]^2 + [(m_{2i}+\rho r_i)-(m^*_{2i}+\rho r^*_i)]^2, \end{aligned}$$

\({\hat{\sigma }}\) is the robust estimate of the scale of the residuals, \({\hat{\sigma }}=\sqrt{median(r_i^2)}\), \(r_i/{\hat{\sigma }},\,i=1,\ldots ,n,\) are the standardized residuals, and \(c_1\) and \(c_2\) are constants. For our applications we set \(c_1=2.5\) and \(c_2=3.5\).

The final estimates of the coefficients of the WLS-based fuzzy regression model (5a)–(5d) are derived by applying formulae (20a)–(20n) (see Appendix) to the whole sample, by setting \(\mathbf {W}=diag(w_i)\).
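The second step thus reduces to computing the weights (19) from the LMS residuals and re-running the WLS estimation with \(\mathbf {W}=diag(w_i)\). A minimal sketch (our own illustration; `r2` denotes the vector of squared LMS residuals):

```python
import numpy as np

def lms_weights(r2, c1=2.5, c2=3.5):
    """Weights (19) computed from the squared LMS residuals r2 (one entry per unit)."""
    sigma_hat = np.sqrt(np.median(r2))       # robust scale of the residuals
    z = np.sqrt(r2) / sigma_hat              # standardized residuals |r_i| / sigma_hat
    return np.where(z <= c1, 1.0, np.where(z <= c2, 0.5, 0.0))

# final robust fit: plug W = np.diag(lms_weights(r2)) into the WLS estimation of (5a)-(5d)
```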

In what follows, we refer to this robust model as the LMS–WLS-based fuzzy regression model.

The weights (19) suitably tune the effect of outliers, removing the units with weight equal to 0 from the optimization process, and reducing the impact of those units with weights 0.5, which deviate less than the former from the bulk of data.

Finally, note that the presence of outliers entails, ceteris paribus, an increase in \(R^2_{\mathbf {W}}\), since outliers receive weights equal to 0, or at most 0.5, thus yielding a decrease of \(SSR_{\mathbf {W}}\). Hence, we expect the robust LMS–WLS-based fuzzy regression model to perform better, in terms of goodness of fit to the data, than the LS-based fuzzy regression model.

5 A simulation study

5.1 Fuzzy simple linear regression model: crisp input, \(LR_2\) fuzzy output

To illustrate the main features of the LS (\(\mathbf{W}=\mathbf{I}\)) and of the LMS–WLS-based fuzzy regression models proposed, we first consider a simulated dataset with a \(LR_2\) fuzzy response variable and one crisp explanatory variable (\(p=1\)). As observed above, the LS and the LMS–WLS-based fuzzy regression models could be easily derived from the model (5a)–(5d) (see Remark 4).

We generated 40 observations on the crisp variable from \(U[1,2]\). Then we generated the \(LR_2\) fuzzy output as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} m_{1i}=1.61+3.50x_i+N(0,1)=m^*_{1i}+N(0,1)\\ m_{2i}=1.99+4.18x_i+N(0,1)=m^*_{2i}+N(0,1)\\ l_i=0.24+0.01 m^*_{1i}+0.04 m^*_{2i}+N(0,1)=l^*_i+N(0,1)\\ r_i=0.16+0.04 m^*_{1i}+0.06 m^*_{2i}+N(0,1)=r^*_i+N(0,1) \end{array}\right. } \end{aligned}$$
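A transcription of this data-generating scheme (our own sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(123)          # arbitrary seed, for reproducibility
n = 40
x = rng.uniform(1, 2, size=n)             # crisp input from U[1, 2]

m1_star = 1.61 + 3.50 * x                 # theoretical left centers
m2_star = 1.99 + 4.18 * x                 # theoretical right centers
l_star = 0.24 + 0.01 * m1_star + 0.04 * m2_star
r_star = 0.16 + 0.04 * m1_star + 0.06 * m2_star

m1 = m1_star + rng.standard_normal(n)     # observed left centers
m2 = m2_star + rng.standard_normal(n)     # observed right centers
l = l_star + rng.standard_normal(n)       # observed left spreads
r = r_star + rng.standard_normal(n)       # observed right spreads
# note: with N(0,1) noise some generated spreads may turn out negative and would
# need to be truncated or regenerated in practice
```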

Figures 3a–b show the results of the two models fitted to the generated dataset. The value of the determination coefficient is reported in the top left of each figure. Each \(LR_2\) fuzzy output value is represented by a solid line (centers) and two dashed lines (spreads). The fitted model is represented by two solid lines for the models on the left and right centers, and two dotted lines for the models on the left and right spreads.

Fig. 3 Fitting of the LS and LMS–WLS-based fuzzy regression models to the simulated dataset: no contamination or contamination in one element. a LS: no outlier. b LMS–WLS: no outlier. c LS: one outlier in the input. d LMS–WLS: one outlier in the input. e LS: one outlier in both centers. f LMS–WLS: one outlier in both centers. g LS: one outlier in both spreads. h LMS–WLS: one outlier in both spreads

The two models provide similar results, as can be seen also by the value of \(R^2_{\mathbf {W}}\).

To evaluate the different behaviour of the LS and of the LMS–WLS-based fuzzy regression models in the presence of anomalous data, we have also contaminated the simulated dataset with three different kinds of outliers: one outlier in the input; one outlier in both centers; one outlier in both spreads.

Figures 3c–h refer to the cases in which each kind of outlier is considered, one at a time. The outlier is highlighted with a thicker line. In Figs. 4a–h different combinations of the three types of outliers are considered.

Fig. 4 Fitting of the LS and LMS–WLS-based fuzzy regression models to the simulated dataset: contamination in two or more elements. a LS: one outlier in the input and in the centers. b LMS–WLS: one outlier in the input and in the centers. c LS: one outlier in the input and in the spreads. d LMS–WLS: one outlier in the input and in the spreads. e LS: one outlier in the centers and in the spreads. f LMS–WLS: one outlier in the centers and in the spreads. g LS: one outlier in all elements. h LMS–WLS: one outlier in all elements

Some considerations follow:

  1.

    the LS-based fuzzy regression model is heavily affected by the presence of a single outlier in the input or in the centers of the output, especially when the two contaminations are combined;

  2.

    the effect of anomalous values in the spreads is mitigated by the weights \(\lambda \) and \(\rho \) of the spreads in the objective function (6);

  3.

    the performance of the LMS–WLS-based fuzzy regression model is not affected by the presence of a single outlier, irrespective of which element (centers and/or spreads of the response variable, and/or explanatory variable) of the generated dataset is contaminated.

5.2 Fuzzy simple linear regression model: \(LR_2\) fuzzy input/output

We now consider a simulation carried out on a fuzzy linear regression model with a fuzzy \(LR_2\) response variable and one fuzzy \(LR_2\) explanatory variable (\(p=1\)). Fuzzy data for the simulation study are generated with the following scheme:

$$\begin{aligned} \mathbf {m}_{1}&= (\mathbf {1},\,U[1,2])(1.42,\,1.53)^{\prime } +(\mathbf {1},\,U[2.5,3.5])(0.16,\,0.88)^{\prime }\\&+(\mathbf {1},\,U[0.1,0.2])(0.20,\,0.58)^{\prime } + (\mathbf {1},\,U[0.15,0.25])(0.80,\,0.26)^{\prime } +N(0,1)\\&= \mathbf {m}^*_1+N(0,1)\\ \mathbf {m}_{2}&= (\mathbf {1},\,U[1,2])(1.00,\,2.25)^{\prime }+(\mathbf {1},\,U[2.5,3.5])(2.61,\,1.38)^{\prime } \\&+(\mathbf {1},\,U[0.1,0.2])(0.24,\,0.48)^{\prime } + (\mathbf {1},\,U[0.15,0.25])(0.01,\,0.48)^{\prime } +N(0,1)\\&= \mathbf {m}^*_2+N(0,1)\\ \mathbf {l}&= 1.1751+\mathbf {m}^*_1\cdot 0.044+\mathbf {m}^*_2\cdot 0.014+N(0,1)=\mathbf {l}^*+N(0,1)\\ \mathbf {r}&= 1.093+\mathbf {m}^*_1\cdot 0.022+\mathbf {m}^*_2\cdot 0.026+N(0,1)=\mathbf {r}^*+N(0,1) \end{aligned}$$

We generated 100 datasets of 200 observations. In each dataset the regression coefficients were held constant, while the values of the fuzzy inputs were randomly generated. We fitted both the LS and the LMS–WLS-based fuzzy regression models to each generated dataset. Finally, we computed the mean and the median of \(R^2_{\mathbf {W}}\) to evaluate the average and the median fitting performance of the two models over the simulation cycle. Results are reported in the fourth column of Table 3. As expected, both models provide a good fitting performance.

Then we contaminated each dataset by adding an increasing percentage of outliers (from 5 to 30 %, in steps of 5 %) in the input or in the output centers, following different contamination schemes, summarized in Table 2.

Table 2 Outlier generation schemes

In Table 3 (fifth to tenth columns) we also report the mean and the median of \(R^2_\mathbf {W}\) computed over the 100 datasets generated for each outlier generation scheme.

Table 3 Simulation results: mean and median of \(R^2_{\mathbf {W}}\) computed over 100 simulated datasets

Some considerations follow:

  1.

    the LS-based fuzzy regression model is heavily affected by the presence of outliers in the centers of the input variable, even when only 5 % of the observations are outliers;

  2.

    the LMS–WLS-based fuzzy regression model is not affected by the presence of outliers in the centers of the input variable, irrespective of the percentage of outliers;

  3.

    similar conclusions can be drawn when there are outliers in the centers of the output variable.

6 Applications

6.1 Daily variation of pollutant concentration

In this application we examine the dependence relationship between the atmospheric concentration of carbon monoxide (CO) and other pollutants, namely mono-nitrogen oxides (\({\text {NO}}_{\text {x}}\)), which, in atmospheric chemistry, correspond to the total concentration of nitric oxide (NO) and nitrogen dioxide (\({\text {NO}}_2\)), and ozone (\({\text {O}}_3\)).

The original data were collected in Rome during 1999 and provide hourly values of all the variables considered. Prior to the analysis, we standardized all variables.

We are interested in detecting the effect that the daily variation of the inputs (\({\text {NO}}_{\text {x}}\) and \({\text {O}}_3\)) exerts on the daily variation of the concentration of CO. Missing data prevent us from contrasting daily variation of CO with daily variation of \({\text {NO}}_{\text {x}}\) and \({\text {O}}_3\). Therefore, we compute the weekly averages of the daily minimum and maximum of each variable.

To cope with the loss of information due to summarizing the data, we consider the variables as \(LR_2\) fuzzy variables. Then, the two centers of each \(LR_2\) fuzzy variable are given by the mean values of the minimum and maximum value recorded each day of the week; the left (right) spreads are the mean deviations from the average minimum (maximum) values, of those values which are lower (higher) than the average minimum (maximum) values. We further assume that the shape of the \(LR_2\) membership function is trapezoidal, which implies \(\lambda =\rho =1/2\). See Coppi et al. [10] for a similar fuzzy formalization of the data.

Having considered weekly data, we end up with 53 observations. The obtained fuzzy data matrix is reported in Table 4.

Table 4 Standardized pollution data: left (right) centers are the mean values of the minimum (maximum) value recorded each day of the week; left (right) spreads are the mean deviations from the average minimum (maximum) values, considering only values lower (higher) than the average minimum (maximum)

The determination coefficient computed for the LS-based fuzzy regression model is equal to 0.879, while that of the LMS–WLS-based fuzzy regression model is 0.880. Both models provide a good fit to data, even if the LMS–WLS-based fuzzy regression model slightly outperforms the non-robust model due to the presence of three outliers.

The estimates of the coefficients are reported in Tables 5 (models on the centers) and 6 (models on the spreads). As can be seen, the estimates are similar between the two models.

Table 5 Coefficients’ estimates for the LS and the LMS–WLS-based fuzzy regression models: Models on the centers (pollution data)
Table 6 Coefficients’ estimates for the LS and the LMS–WLS-based fuzzy regression models: Models on the spreads (pollution data)

With the proposed model, it is possible to highlight which component of each fuzzy explanatory variable mainly affects each component of the response variable.

Consider, for instance, the influence of the left centers of the explanatory variables on the left and right centers of the response, given by \(\mathbf {\alpha }_1\) and \(\mathbf {\beta }_1\), respectively. In both cases, the main effect is exerted by the weekly average minimum value of the concentration of \({\text {NO}}_{\text {x}}\). The greater this value, the greater both the weekly average minimum and maximum values of CO. Moreover, since the effect on the right center is larger, as the minimum value of the concentration of mono-nitrogen oxides rises, the daily variation of carbon monoxide increases.

Similar evidence can be drawn from the effect of the right centers of the inputs, in particular of \({\text {NO}}_{\text {x}}\). Given the almost nil effect on the left center of the output and the strong influence on the right center, we can derive a positive influence of the maximum value of the concentration of mono-nitrogen oxides on the daily variation of CO.

Overall, we observe a direct relationship between the daily variation of \({\text {NO}}_{\text {x}}\) and that of carbon monoxide.

6.2 Attitude towards traditional vs. “creative” advertising

The aim of this application is to illustrate how to cope with various sources of uncertainty that may affect regression analysis: fuzziness of the response and of the explanatory variables; uncertainty about the values of the regression coefficients; and uncertainty about the choice of a specific model within a class of parametric models.

Data for our analysis are drawn from a survey of a sample of 103 students from Sapienza University and LUISS University, in Rome, interviewed about their opinions on traditional and new media. A section of the survey was devoted to the respondents’ opinions towards traditional vs. innovative advertising campaigns. Respondents were asked to report their degree of agreement with the following seven statements (variable names are reported in brackets):

  • I am sensitive to traditional advertising campaigns, i.e. campaigns broadcast by TV and/or radio, published on newspaper or magazines, etc. (sens-tr; response variable).

  • I am tired of traditional advertising campaigns (tired-tr).

  • I do not pay attention to traditional advertising campaigns (not-tr).

  • I try to avoid traditional advertising campaigns (avoid-tr).

  • I am impressed by “creative” advertising campaigns, e.g., via blog and/or social networks, sponsorship of public events, etc. (impr-cre).

  • Creative advertising campaigns are more effective in capturing my attention (eff-cre).

  • I better remember a creative advertising campaign than a more traditional one (rem-cre).

The degree of agreement was reported on a 4-item scale, from “I totally disagree” (1) to “I totally agree” (4).

The complete dataset is reported in Table 7.

Table 7 Student data

Coppi and D’Urso [8] observed that the subjective evaluation of a qualitative scale could be better represented in a fuzzy framework, which takes into account the uncertainty and the heterogeneity in individual evaluation.

Hence, we adopted a fuzzy coding for describing the subjective judgements reported in the survey. In particular, we recoded the qualitative variables as \(LR_1\) fuzzy variables. The \(LR_1\) fuzzy recoding of these linguistic variables is reported in Table 8 [1], and represented in Fig. 5, which also shows the membership function of each fuzzy value.

Table 8 Linguistic variables and corresponding fuzzy values (center, left spread, right spread)

Then, we analysed the relationship between the variable sens-tr and the remaining variables by means of a fuzzy linear regression model with \(LR_1\) fuzzy output and \(LR_1\) fuzzy inputs. As observed in Remark 4, this model is a particular case of the more general model (5a)–(5d).

To select the optimal model we employed a procedure based on the maximization of the value of the adjusted determination coefficient, \({\bar{R}}^2_{\mathbf {W}}\). Notice that in this case each added variable involves the estimation of 7 additional coefficients (\({\bar{p}}=[3 * (p+1) + 4]\)). Then the penalization factor increases more than proportionally for each added variable, as observed in Sect. 2.4. For this reason we considered the following expression for the adjusted determination coefficient:

$$\begin{aligned} {\bar{R}}^2_{\mathbf {W}}=1-(1-R^2_{\mathbf {W}})\frac{n-1}{n-p} \end{aligned}$$

The adopted selection procedure is of the backward type and can be illustrated as follows.

For a model with \(k\) fuzzy inputs we compute \({\bar{R}}^2_{\mathbf {W}}(k)\). Then, we compute \({\bar{R}}^2_{\mathbf {W},j}(k-1)\) for all the \(k\) models derived from the first model by dropping one variable at a time (\(j=1,\ldots ,k\)). If \(max_{j}{\bar{R}}^2_{\mathbf {W},j}(k-1) > {\bar{R}}^2_{\mathbf {W}}(k)\), we consider the model \(j'\) with \(k-1\) inputs such that \(j'=\hbox {argmax}_{j} {\bar{R}}^2_{\mathbf {W},j}(k-1)\) and we continue the procedure. Otherwise, we select the model with \(k\) fuzzy inputs.
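A sketch of this backward procedure (our own illustration; `adjusted_r2` is a hypothetical placeholder that fits the chosen fuzzy regression model on a subset of inputs and returns its \({\bar{R}}^2_{\mathbf {W}}\)):

```python
def backward_selection(inputs, adjusted_r2):
    """Backward selection maximizing the adjusted determination coefficient.
    inputs is the list of candidate fuzzy inputs; adjusted_r2(subset) is a
    placeholder that fits the model on `subset` and returns its adjusted R^2."""
    current = list(inputs)
    best = adjusted_r2(current)
    while len(current) > 1:
        # score every model obtained by dropping one input at a time
        candidates = [(adjusted_r2([v for v in current if v != drop]), drop)
                      for drop in current]
        score, drop = max(candidates, key=lambda c: c[0])
        if score > best:             # dropping `drop` improves the adjusted R^2
            best = score
            current.remove(drop)
        else:                        # no single deletion helps: stop
            break
    return current, best
```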

Fig. 5 Fuzzy recoding of the 4-item scale, with membership function

The model selection procedure is summarized in Table 9.

Table 9 Model selection

As can be seen, the best model is the LMS–WLS-based fuzzy regression model with six fuzzy inputs. Note also that all the LS-based fuzzy regression models are severely affected by the presence of outliers, as can be seen by the low values of \({\bar{R}}^2_{\mathbf {W}}\).

As for the estimates of the coefficients of the fuzzy regression model, as observed in Sect. 3, one has to take into account the imprecision due to the ignorance about the data generation process. Hence we generated 100 bootstrap samples and fitted both models to them. The standard deviation of the estimates of the regression coefficients provided us with a measure of the accuracy of the estimates obtained with both models.

The estimates of the coefficients are reported in Tables 10 (models on the center) and 11 (models on the spreads). The bootstrap estimates of the standard errors are reported in brackets.

Table 10 Coefficients’ estimates for the LS and the LMS–WLS-based fuzzy regression models: Models on the center (advertising data)

As expected, the presence of outliers produces some bias in the estimates of the LS-based fuzzy regression model. Consider, for instance, the effect that the center of the third explanatory variable exerts on the center of the response. The two models return estimates with opposite signs. However, one would expect that the more the respondent tries to avoid traditional campaigns, the less sensitive she or he is to these types of campaigns, i.e., we expect a negative sign of the coefficient, as in the LMS–WLS-based fuzzy regression model, while the sign for the LS-based fuzzy regression model is positive.

Focusing only on the LMS–WLS-based fuzzy regression model and on the sub-model which relates the centers of the explanatory variables to the center of the output, we notice that the second, the fourth and the sixth variables are significant. Thus, not paying attention to traditional campaigns, being impressed by creative campaigns, and better recalling creative campaigns affect sensitivity to traditional campaigns the most.

7 Final remarks

In this paper, a generalization of the fuzzy regression model proposed by Coppi et al. [10] has been discussed. In particular, by considering an iterative Weighted Least Squares estimation approach, a general linear regression model has been proposed for studying the dependence of a general class of fuzzy response variables, i.e., \(LR_2\) fuzzy variables or trapezoidal fuzzy variables, on a set of crisp or \(LR_2\) fuzzy explanatory variables. Furthermore, some theoretical properties and a suitable generalization of the determination coefficient, used to investigate the goodness of fit of the regression model, have been illustrated. To neutralize and/or smooth the disruptive effects of possible crisp or fuzzy outliers in the estimation process, a robust version of the fuzzy regression model based on the Least Median Squares estimation approach has been suggested. Finally, some theoretical remarks and an assessment of the imprecision of the regression function have been illustrated. The good performance of our models is shown by means of a simulation study and some applications to real cases.

Table 11 Coefficients’ estimates for the LS and the LMS–WLS-based fuzzy regression models: Models on the spreads (advertising data)

In the future, the proposed fuzzy regression model and its robust version might be improved in several directions. In particular, an interesting aspect is related to the modeling of the regression relationship between the spreads of the fuzzy response variable and the respective estimated centers. In the model (5a)–(5d) this relationship is assumed to be linear. A more complex relationship could be considered, in order to cope with observational studies where the simple linear assumption is not suitable.

Another interesting issue is to utilize our models in a clusterwise context [17, 19].

Furthermore, to improve the capability of managing the uncertainty due to the randomness of the data, a further line of research that deserves careful attention consists of making our fuzzy regression models probabilistic, by using the notion of fuzzy random variable (see, e.g., [7]). We will investigate the above lines of research in future works.