1 Introduction

Load forecasting is a fundamental business process and well-established analytical problem in the electric utility industry. It can be roughly categorized into four groups based on the forecasting horizon: very short term (less than a day), short term (1 day to 2 weeks), medium term (2 weeks to 3 years) and long term (3 years to 30 years or more). A typical short term load forecasting (STLF) example is 1 day ahead hourly load forecasting, which usually requires the forecaster to submit the load forecasts for the 24 h of the next day. Such forecasts can be widely used by all of the four sectors of the utility industry, from generation, transmission and distribution to retail. The applications of STLF include operations, maintenance, demand response and energy market activities (Hong 2010; Hong and Wang 2012).

STLF has several characteristics that are attractive to the forecasting community:

  (1) Large volume of time series data. Many utilities today have been storing hourly loads at system level for at least 5 years, or over 40,000 records.

  (2) High quality data. Due to the maturity of today's metering technology, many utilities are comfortable with the data quality at the corporate level. Most outliers are physically explainable, such as system outages, demand response activities, etc.

  (3) Multiple patterns of seasonality. Three seasonal patterns are often investigated in STLF: hours of a day, days of a week and months of a year.

  (4) Load is highly dependent on several explanatory variables. For instance, electricity consumption is highly correlated with temperature due to space heating and cooling needs.

  (5) High accuracy requirement. Mean Absolute Percentage Error (MAPE) in 1 day ahead hourly load forecasting at a corporate level is typically \(<5\,\%\).

  (6) Societal necessity. Modern society is irreversibly dependent on electric power. Improving STLF accuracy can help system reliability and energy efficiency, which means fewer outages, a greener environment, and lower costs.

As a result, dozens of techniques have been applied to short-term load forecasting, such as regression analysis (Hong et al. 2010, 2011; Hong 2010), time series models, artificial neural networks (ANN) (Hippert et al. 2001; Khotanzad et al. 1998), and fuzzy regression (Al-Kandari et al. 2004; Song et al. 2005). Many of these techniques have been progressively improved by the scientific community over the past three decades. Several of the techniques have been adopted for production use in the utilities (Khotanzad et al. 1998; Hong et al. 2010; Hong 2010; Fan et al. 2009; Fan and Hyndman 2012).

Fuzzy regression was introduced to overcome some of the limitations of linear regression, such as a vague relationship between the response variable and the predictor variable(s), an insufficient number of observations, and error distributions that are difficult to verify. The fundamental difference between the assumptions of the two techniques concerns the deviations between the observed values and the estimated values. In linear regression models, these deviations are assumed to be errors in measurement or observation; in fuzzy regression, they are assumed to depend on the indefiniteness of the system structure. The earliest formulation of fuzzy regression analysis, later named the Min problem, minimizes the fuzziness such that the membership of every estimated interval is above a certain threshold (Tanaka et al. 1982). Two other formulations, the Max problem and the Conjunction problem, were then proposed (Tanaka 1987; Tanaka and Watada 1988; Tanaka et al. 1989).

Despite 30 years of theoretical advancement on fuzzy regression and its applications to forecasting (Heshmaty and Kandel 1985; Chen and Wu 2003; Nureize and Watada 2010; Parvathi et al. 2013), fuzzy regression is still not well understood by the utility industry, nor is it properly applied to STLF. There are two representative research branches for the application of fuzzy regression to STLF. Al-Kandari et al. developed two fuzzy regression models for 24-h ahead forecasts in summer and winter respectively (Al-Kandari et al. 2004). The work is mainly at the proof-of-concept level for the following two reasons: (1) the range between the forecasted upper and lower bounds is not tight enough to be useful for utility operations; (2) the forecasted center is not able to capture the salient features of the load curve. Song et al. used fuzzy linear regression to forecast the loads during holidays with promising accuracy (Song et al. 2005). This approach forecasts the load based on previous loads without weather information as input. The case study of the local utility is too special to be generalized, because the load profiles of the holidays in different years remain fairly stable, which limits its applicability to many other utilities.

The fuzzy regression approaches proposed for STLF in the literature focus primarily on the “fuzzy” side, by artificially creating fuzzy inputs or producing fuzzy results. The investigation of the “regression” side was limited (e.g., there was little attention paid to variable selection for fuzzy regression models). In this paper, we propose a fuzzy interaction regression approach to STLF with the models implemented in the earliest possibilistic regression framework (Tanaka et al. 1982). We show that by including interaction effects in the underlying linear model, the forecasting accuracy of fuzzy regression can be significantly enhanced. Additionally, since the variables (load, temperature and calendar variables) in this paper remain as crisp numbers, the proposed approach can be applicable to a wide range of utilities without engaging heuristics to create fuzzy inputs.

The paper is organized as follows: Sect. 2 reviews the theoretical background of fuzzy regression; Sect. 3 introduces three fuzzy regression models including two without interaction effects and one with interaction effects; Sect. 4 explains the experiment’s procedure and shows a comparison of the results. Further comments on a notable paper and tips about applying fuzzy regression to STLF are discussed in Sect. 5. The paper is concluded in Sect. 6 with the discussion of the future research direction.

2 Theoretical background

In fuzzy regression, the deviations between the observed values and the estimated values are assumed to be dependent on the indefiniteness of the system structure. These deviations are regarded as the fuzziness of the parameters of the system rather than the observation errors. In this section, we briefly review Tanaka’s methodology of applying fuzzy regression analysis to crisp data. More detailed discussions of fuzzy regression can be found in Tanaka et al. (1982), Tanaka and Watada (1988). A possibilistic linear function can be defined as:

$$\begin{aligned} Y=A_{1} x_{1} +A_{2} x_{2} +\ldots +A_{n} x_{n} =Ax \end{aligned}$$
(1)

where \(x_{i}\) is non-fuzzy and \(A_{i}\) is a symmetric fuzzy number denoted by \((\alpha _{i}, c_{i})_{L}\), with \(\alpha _{i}\) as the center and \(c_{i}\) as the spread. In this paper, we assume that the reference function is \(L(x) = \max (0, 1 - |x|)\). The fuzzy parameter \(A_{i}\) is therefore a symmetrical triangular fuzzy number:

$$\begin{aligned} \mu _{A_{i}} (a_{i} )=L\left( (a_{i} -\alpha _{i} )/c_{i}\right) \end{aligned}$$
(2)

where \(c_{i} >0\).

The possibilistic linear function \(Y=Ax\) is obtained by the following membership function:

$$\begin{aligned} \mu _{_Y} (y)=\left\{ {\begin{array}{l@{\quad }l} L\left( \left( y-{{\varvec{x}}}^{T}\varvec{\alpha } \right) /{{\varvec{c}}}^{T}\left| {{\varvec{x}}} \right| \right) ,&{}{{\varvec{x}}}\ne \mathbf{0} \\ 1,&{} {{\varvec{x}}}=\mathbf{0},y=0 \\ 0,&{} {{\varvec{x}}}=\mathbf{0},y\ne 0 \\ \end{array}} \right. \end{aligned}$$
(3)

where \(|{{\varvec{x}}}|= ( |x_{1}|, |x_{2}|, {\ldots }, |x_{n}|)^{T}\).
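The definitions in (1)–(3) can be illustrated with a short sketch. It assumes crisp inputs and the triangular reference function \(L(x) = \max (0, 1 - |x|)\) used in this paper; the function names and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def reference_L(x):
    """Reference function L(x) = max(0, 1 - |x|)."""
    return max(0.0, 1.0 - abs(x))


def fuzzy_output(x, alpha, c):
    """Center x^T alpha and spread c^T |x| of the possibilistic output Y = Ax."""
    center = sum(a * xi for a, xi in zip(alpha, x))
    spread = sum(ci * abs(xi) for ci, xi in zip(c, x))
    return center, spread


def membership_Y(y, x, alpha, c):
    """Membership of a crisp value y in Y, following the three cases of (3).

    Assumes every spread c_i is strictly positive, so the spread of Y is
    nonzero whenever x is not the zero vector.
    """
    if all(xi == 0 for xi in x):
        return 1.0 if y == 0 else 0.0
    center, spread = fuzzy_output(x, alpha, c)
    return reference_L((y - center) / spread)
```

For example, with the illustrative values `x = [1.0, 2.0]`, `alpha = [10.0, 3.0]` and `c = [1.0, 0.5]`, the output has center 16 and spread 2, so a crisp value of 17 has membership 0.5 and any value at or beyond 18 has membership 0.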

As discussed in Tanaka and Watada (1988), identification of the parameters of the fuzzy linear regression model can be formulated as a linear programming problem:

$$\begin{aligned} \mathop {Min}\limits _{{\varvec{\alpha }},\mathbf{c}}&\quad J({{\varvec{c}}})=\sum _{i=1}^{N}{{\varvec{c}}}^{{{\varvec{T}}}}\left| {{{\varvec{x}}}_{{\varvec{i}}} } \right| \\ \hbox {s.t.}&\quad y_{i} \le \left| {L^{-1}(h)} \right| {{\varvec{c}}}^{{{\varvec{T}}}}\left| {{{\varvec{x}}}_{{\varvec{i}}} } \right| +{{\varvec{x}}}_{{\varvec{i}}}^{{\varvec{T}}} \varvec{\alpha },\\&\quad -y_{i} \le \left| {L^{-1}(h)} \right| {{\varvec{c}}}^{{{\varvec{T}}}}\left| {{{\varvec{x}}}_{{\varvec{i}}}} \right| -{{\varvec{x}}}_{{\varvec{i}}}^{{\varvec{T}}} \varvec{\alpha },\\&\quad {{\varvec{c}}} \ge \mathbf{0},\\&\quad i=1,\ldots ,N, \end{aligned}$$

where \(h\) is the threshold to control the width of the spread, and \(0 \le h < 1\). In this paper, the linear programming problem is solved in CPLEX 12.1. Since this paper is focused on the underlying regression model, no specific methods or algorithms are developed to solve the linear programming problem for parameter estimation.
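As a minimal sketch of how this linear program can be assembled for a generic solver, the code below stacks the decision variables as \(({\varvec{\alpha }}, {{\varvec{c}}})\) and writes the requirement that each observed \(y_{i}\) lies within the \(h\)-level interval of the estimated output as two "less than or equal" rows per observation. It assumes the triangular reference function, for which \(|L^{-1}(h)| = 1 - h\), and an objective that sums the spread over the \(N\) observations. The function names and toy data are hypothetical; this is not the CPLEX model used in the paper.

```python
def build_min_problem(X, y, h=0.0):
    """Assemble the Min problem as: minimize cost . z  s.t.  A_ub z <= b_ub.

    z = [alpha_1..alpha_n, c_1..c_n]; X is a list of N rows of n crisp inputs.
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("h must satisfy 0 <= h < 1")
    n = len(X[0])
    k = 1.0 - h  # |L^{-1}(h)| for L(x) = max(0, 1 - |x|)
    # Centers do not enter the objective; each spread c_j is weighted by sum_i |x_ij|.
    cost = [0.0] * n + [sum(abs(row[j]) for row in X) for j in range(n)]
    A_ub, b_ub = [], []
    for row, yi in zip(X, y):
        abs_row = [abs(v) for v in row]
        # y_i <= k c^T|x_i| + x_i^T alpha   ->  -x_i^T alpha - k |x_i|^T c <= -y_i
        A_ub.append([-v for v in row] + [-k * v for v in abs_row])
        b_ub.append(-yi)
        # -y_i <= k c^T|x_i| - x_i^T alpha  ->   x_i^T alpha - k |x_i|^T c <=  y_i
        A_ub.append(list(row) + [-k * v for v in abs_row])
        b_ub.append(yi)
    # Centers are free; spreads are non-negative.
    bounds = [(None, None)] * n + [(0.0, None)] * n
    return cost, A_ub, b_ub, bounds


def is_feasible(z, A_ub, b_ub, tol=1e-9):
    """Check whether a candidate z = alpha + c satisfies every inequality row."""
    return all(
        sum(a * v for a, v in zip(row, z)) <= b + tol
        for row, b in zip(A_ub, b_ub)
    )
```

The returned tuple follows the argument layout of common LP interfaces, for example `scipy.optimize.linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)`. With the toy data `X = [[1, 0], [1, 1], [1, 2]]` and `y = [1, 2.5, 3]`, the candidate centers (1, 1) with spreads (0.5, 0) are feasible at \(h = 0\) but not at \(h = 0.5\), where the intercept spread must grow to 1.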

3 Fuzzy regression models for STLF

3.1 Fuzzy regression models without interaction effects

Both the summer and winter fuzzy regression models proposed by Al-Kandari et al. (2004) include the temperatures of the past 3 h and a 3rd order polynomial of the current hour temperature. In addition, the summer model includes humidity factors of the current hour and the past 2 h, while the winter model includes wind cooling factors of the current hour and the past 2 h. In practice, many utilities do not use humidity and wind cooling factors in their load forecasting process for several reasons: (1) these two factors are not as easy to predict as temperature; (2) they may not be a significant driving factor of electricity demand for many utilities. To establish a generic model, denoted as M1, for comparison purposes, we remove the humidity and wind cooling factor terms and use the following model, adapted from Al-Kandari's, for both summer and winter:

$$\begin{aligned} Load(t)&= A_{1} +A_{2} T(t)+A_{3} T^{2}(t)+A_4 T^{3}(t)+A_5 T(t-1)+ A_{6} T(t-2) \nonumber \\&\quad +\, A_7 T(t-3) \end{aligned}$$
(4)

Al-Kandari’s models do not include any calendar variables. However, it is well known in load forecasting practice that electricity consumption varies with human activities, which can be modeled by calendar variables such as hour of the day, day of the week and month of the year. In this paper, we add Hour, Weekday and Month to M1 to obtain the second fuzzy regression model M2 as shown below. This model includes class variables but not interaction effects. Hour, Weekday and Month are class variables with 24, 7 and 12 levels respectively. Consequently, there are a total of 50 fuzzy parameters to be estimated in M2.

$$\begin{aligned} Load(t)&= A_1 +A_2 T(t)+A_3 T^{2}(t)+A_4 T^{3}(t)+A_5 T(t-1)+A_6 T(t-2) \nonumber \\&\quad +\,A_7 T(t-3)+A_{8,Hour} Hour+A_{9,Weekday} Weekday\nonumber \\&\quad +\,A_{10,Month} Month \end{aligned}$$
(5)
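The parameter count of M2 can be reproduced with a small sketch that builds one design row for (5). It assumes that every level of each class variable receives its own indicator column, which is what the stated total of 50 suggests (7 quantitative terms including the intercept, plus 24 + 7 + 12 levels). The helper names `one_hot` and `m2_row` and the input values are hypothetical.

```python
def one_hot(level, n_levels):
    """Indicator row for a class variable whose levels are numbered 1..n_levels."""
    row = [0.0] * n_levels
    row[level - 1] = 1.0
    return row


def m2_row(temps, hour, weekday, month):
    """Design row for M2; temps = (T(t), T(t-1), T(t-2), T(t-3))."""
    t0, t1, t2, t3 = temps
    # Intercept, cubic polynomial of the current temperature, three lagged temperatures.
    quantitative = [1.0, t0, t0 ** 2, t0 ** 3, t1, t2, t3]
    return (
        quantitative
        + one_hot(hour, 24)
        + one_hot(weekday, 7)
        + one_hot(month, 12)
    )


row = m2_row((60.0, 58.0, 57.0, 55.0), hour=18, weekday=2, month=8)
print(len(row))  # number of fuzzy parameters implied by this coding
```

Each column of the row corresponds to one fuzzy parameter \(A\) with its own center and spread, so the row length equals the number of parameters to be estimated.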

To demonstrate the effectiveness of our proposed fuzzy interaction regression approach, we also add a Multiple Linear Regression (MLR) model, denoted as M3, for comparison. This MLR model has the same variables as M2:

$$\begin{aligned} Load(t)&= \beta _{1} +\beta _{2} T(t)+\beta _{3} T^{2}(t)+\beta _{4} T^{3}(t)+\beta _{5} T(t-1)+\beta _{6} T(t-2) \nonumber \\&\quad +\,\beta _{7} T(t-3)+\beta _{8,Hour} Hour+\beta _{9,Weekday} Weekday\nonumber \\&\quad +\,\beta _{10,Month} Month \end{aligned}$$
(6)

where the \({\upbeta }\)’s are crisp parameters. While parameter estimation of the fuzzy regression models is performed by solving the linear programming problems in CPLEX, parameter estimation of this MLR model (M3) is done through the ordinary least squares method in SAS 9.3.

3.2 Fuzzy interaction regression model

In an MLR model, when the independent variables are not independent of each other, interaction effects should be considered. In this paper, we consider the following interactions:

  (1) Temperature and hour of the day: temperature is high during the day and low at night;

  (2) Temperature and month of the year: temperature is high in the summer and low in the winter;

  (3) Hour of the day and day of the week: the hourly electricity consumption of the customers (residential, commercial and industrial) varies due to business schedule.

Therefore, we include the above three interaction effects to obtain the fuzzy interaction regression model (M4) below:

$$\begin{aligned} Load(t)&= A_1 +A_{2,Month} Month+A_{3,HourWeekday} Hour {*} Weekday \nonumber \\&\quad +\,A_{4,Month} T(t) {*} Month+A_{5,Month} T^{2}(t) {*} Month\nonumber \\&\quad +\,A_{6,Month} T^{3}(t) {*} Month +A_{7,Month} T(t-1) {*} Month\nonumber \\&\quad +\,A_{8,Month} T(t-2) {*} Month +\,A_{9,Month} T(t-3) {*} Month \nonumber \\&\quad +\,A_{10,Hour} T(t) {*} Hour+A_{11,Hour} T^{2}(t) {*} Hour\nonumber \\&\quad +\,A_{12,Hour} T^{3}(t) {*} Hour +A_{13,Hour} T(t-1) {*} Hour\nonumber \\&\quad +\,A_{14,Hour} T(t-2) {*} Hour+A_{15,Hour} T(t-3) {*} Hour \end{aligned}$$
(7)

If a quantitative variable, e.g., \(T(t)\), interacts with a class variable, e.g., Month, the quantitative variable does not have to appear as a main effect. If a class variable, e.g., Hour, interacts with another class variable, e.g., Weekday, neither of them needs to appear as a main effect. Therefore, the temperature variables, Hour and Weekday do not appear as main effects in (7).
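The structure of (7) can be sketched by expanding one observation into its design row. The sketch assumes full indicator coding for every class variable and for the 168 Hour-by-Weekday combinations; the paper does not state the exact coding, so the resulting column count is only what this assumption yields, and a coding that drops reference levels would give fewer columns. The helper names and input values are hypothetical.

```python
def one_hot(level, n_levels):
    """Indicator row for a class variable whose levels are numbered 1..n_levels."""
    row = [0.0] * n_levels
    row[level - 1] = 1.0
    return row


def m4_row(temps, hour, weekday, month):
    """Design row for M4; temps = (T(t), T(t-1), T(t-2), T(t-3))."""
    t0, t1, t2, t3 = temps
    # The six temperature terms of (7): cubic polynomial of T(t) and three lags.
    temp_terms = [t0, t0 ** 2, t0 ** 3, t1, t2, t3]
    month_d = one_hot(month, 12)
    hour_d = one_hot(hour, 24)
    # Hour * Weekday: one indicator per hour of the week (24 * 7 = 168 levels).
    hour_weekday_d = one_hot((weekday - 1) * 24 + hour, 168)

    row = [1.0] + month_d + hour_weekday_d
    for term in temp_terms:  # temperature terms interacting with Month
        row += [term * d for d in month_d]
    for term in temp_terms:  # temperature terms interacting with Hour
        row += [term * d for d in hour_d]
    return row


row = m4_row((60.0, 58.0, 57.0, 55.0), hour=18, weekday=2, month=8)
print(len(row), sum(1 for v in row if v != 0.0))
```

Under this coding the row has 1 + 12 + 168 + 6 × 12 + 6 × 24 = 397 columns, of which 15 are active for any single observation with nonzero temperatures, one for each term \(A_{1}, \ldots , A_{15}\) in (7).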

4 Experiment

4.1 Data

In this paper, we use 3 years (2005–2007) of hourly load and temperature data from ISO New England to conduct the experiment. Figure 1 shows the line plot of the hourly loads, where we can observe that the demand is high in winter and summer but low in spring and fall. This is primarily due to the usage of HVAC (heating, ventilation, and air conditioning) systems. The annual peaks of this utility in these 3 years are over 25 GW, while the minimum load level is around 10 GW.

Fig. 1 Three years of hourly loads (2005–2007)

To further investigate the load series at hourly intervals, we number the 168 h of a week from 1 to 168. We then use the loads of 2005 to create a group of box plots (Fig. 2), which shows the weekly load profile starting from Sunday on the left to Saturday on the right. Each box plot shows the high and low extremes and the 1st, 2nd and 3rd quartiles. As shown in Fig. 2, the daily load profiles differ from one another to varying degrees, and the weekend profiles are lower than the weekday profiles.

Fig. 2 Weekly load profile (2005)

The scatter plot in Fig. 3 shows the relationship between load and temperature, which has an asymmetric U shape. The comfortable zone is around 50\(^{\circ }\)–65\(^{\circ }\), where the load is at its lowest level. As the temperature moves towards either extreme, the load increases.

Fig. 3 Relationship between load and temperature (2005)

4.2 Forecasting process

Typically, a short term load forecaster needs to submit the hourly load forecast of the next day before a certain time of the current day. If the deadline for forecast submission is 8 am, the forecaster may only have access to actual loads and temperatures up to Hour Ending (HE) 7. To evaluate the forecasting accuracy of the fuzzy regression models, this paper emulates the 1 day ahead load forecasting process for the year 2007. The models with the variables described in Sect. 3 are run on a rolling basis 365 times to forecast the hourly loads for each day of 2007. The parameters are updated on a daily basis using the load and temperature data available by HE 7 of each day. A moving window of fixed length (2 years) is used to estimate the model parameters. Actual temperatures are used for the forecasted days. The experiment is conducted on a PC with a 1.66 GHz CPU and 3 GB of RAM. The entire experiment, running 4 models 365 times each, takes about 5 h. In other words, running each model once takes less than a minute on average. The run time is short enough to be negligible in day ahead forecasting operations.
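The rolling procedure can be sketched as a schedule of training windows, one per forecasted day. The sketch makes several assumptions that the text does not spell out: HE 7 is taken as 07:00 on the day before the forecasted day, the 2-year window is taken as 730 days, and the window is clipped to the start of the available data (1 January 2005). The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def rolling_schedule(year=2007, window_days=730, cutoff_hour=7,
                     data_start=datetime(2005, 1, 1)):
    """Return (forecast_day, train_start, train_end) for each day of `year`."""
    schedule = []
    day = datetime(year, 1, 1)
    while day.year == year:
        # Data are available up to hour ending `cutoff_hour` of the previous day.
        train_end = day - timedelta(days=1) + timedelta(hours=cutoff_hour)
        # Fixed-length moving window, clipped to the first available record.
        train_start = max(train_end - timedelta(days=window_days), data_start)
        schedule.append((day.date(), train_start, train_end))
        day += timedelta(days=1)
    return schedule


schedule = rolling_schedule()
print(len(schedule), schedule[0], schedule[-1])
```

Each entry would drive one parameter estimation followed by a 24 h forecast. With 4 models and 365 runs each, the reported total of about 5 h corresponds to roughly 12 s per run.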

4.3 Results and comparison

Table 1 shows the parameter estimation results using M1 for both winter and summer peak days. Due to usage of dummy variables, both M2 and M3 have over 30 parameters. With interaction effects, M4 has over 250 parameters to be estimated. Instead of listing their parameter estimation results, we will show the accuracy statistics of the forecasts and line plots of the forecasted loads during these two peak days.

Table 1 Parameter estimation results of M1

Mean Absolute Percentage Error (MAPE) is one of the most commonly used error measures to evaluate forecasting accuracy in the utility industry. We first compare the accuracy of the three fuzzy regression models using their forecasted centers. In addition to the MAPE of the 8760 hourly loads in 2007, we also calculate the MAPE of the daily peak, the daily energy, the annual peak day (which falls in the summer), and the winter peak day. As shown in Table 2, for each of the five measures, M4 is more accurate than M2, which is more accurate than M1. M4 outperforms M3 in every measure except the MAPE of the winter peak day. Figures 4 and 5 show the actual and forecasted loads for the annual and winter peak days respectively. The conclusion based on these figures is consistent with the one based on the MAPE values in Table 2. In addition, although M4 does not achieve as low a MAPE as M3 on the winter peak day, it does better than the other three models in capturing the actual loads during the peak period, as shown in Fig. 5 for HE 18–20. Overall, the fuzzy interaction regression model (M4) shows superior accuracy over the other three models.
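The error measures above can be sketched as follows. The hourly MAPE follows the standard definition; the daily peak and daily energy variants assume that the peak is the maximum hourly load of a day and the energy is the sum of its hourly loads, with actual and forecasted values compared day by day. The paper does not give these definitions explicitly, so the helper names and aggregation choices are assumptions.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def daily_peak_mape(actual_days, forecast_days):
    """MAPE of daily peaks; each argument is a list of per-day hourly load lists."""
    return mape([max(d) for d in actual_days], [max(d) for d in forecast_days])


def daily_energy_mape(actual_days, forecast_days):
    """MAPE of daily energy, taken as the sum of the hourly loads of each day."""
    return mape([sum(d) for d in actual_days], [sum(d) for d in forecast_days])
```

For a fuzzy regression model, `forecast` would hold the forecasted centers, since the comparison in this section is based on centers rather than on the upper and lower bounds.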

Table 2 MAPE values of the four models

Fig. 4 Model comparison for the annual peak of 2007 (Friday, Aug 3rd, Peak at Hour Ending 15, 25785 MW)

It should be noted that the behavior of M1 on the winter peak day is similar to that shown in Al-Kandari et al. (2004): the forecasted values are close to the mean of the actual loads. There are two possible reasons: (1) the model does not include the effects needed to capture the salient features of the load profile; (2) the slope of the load-temperature scatter plot below the comfortable temperature zone is less steep than the one above the comfortable zone (Fig. 3).

Fig. 5 Model comparison for the winter peak of 2007 (Monday, Feb 5th, Peak at Hour Ending 19, 21321 MW)

In addition to comparing the estimated centers, we also compare the spreads generated by the three fuzzy regression models for the annual and winter peak days. Table 3 shows the average spread and the average percentage spread, which is equal to the spread divided by the center, for each model on each of the two selected days. We can observe that the more accurate the model is, the narrower the spread is. In other words, the salient features that are not captured by M1 are treated as the fuzziness of the system, which is represented through a wide spread. As the salient features are modeled by a more advanced model, with calendar variables and their interactions with temperature variables, the estimated spread becomes narrower.

Table 3 Spreads generated by the three fuzzy regression models for annual and winter peak days

5 Discussion

5.1 Comments on the approach proposed in (Al-Kandari et al. 2004)

There are three major defects that make Al-Kandari et al. (2004) misleading to practitioners:

  (1) Questionable parameter estimation. In Table 1 of Al-Kandari et al. (2004), all the centers and spreads of the parameters of the temperature variables are estimated to be zero. This means that the load has no relationship to the temperature, which can hardly be reasonable for the Nova Scotia Power case study presented in that paper. The result is very likely due to incorrect parameter estimation. As shown in Table 1 of this paper, the nonzero parameter estimates for the temperature variables help capture the relationship between the load and temperature.

  (2) Flat centers and wide spreads. Taking Fig. 1 of Al-Kandari et al. (2004) as an example, the forecasted centers are around 740 \(\pm \) 50 MW, while all of the spreads are 335.7 MW. Each spread is over 40 % of the center. Although the upper and lower bounds cover the actual loads, the flat centers and wide spreads can hardly assist the decision making process in daily utility operations.

  (3) Questionable conclusion. It is concluded in Al-Kandari et al. (2004) that three parameters are adequate to represent the load for the crisp case and ten parameters are enough to model the type of load. This is a false statement and can be quite misleading to practitioners in the field. In contrast, the comparison among M1, M2 and M4 in this paper shows that the seven parameters of M1 are far from adequate to capture the salient features of the load profile. It is crucial to improve the underlying linear model in order to improve the forecasting accuracy of the fuzzy regression model.

5.2 Tips for fuzzy regression based load forecasting

This paper offers the following four tips for fuzzy regression based forecasting:

  (1) Select variables. Practitioners can follow a variable selection approach similar to that of MLR analysis to select the variables for fuzzy regression models. As the underlying linear models are improved, the fuzzy regression models can usually be improved as well.

  (2) Determine threshold \(h\). As long as the threshold \(h\) is \(<1\) and greater than or equal to zero, the estimated center based on the same set of data should stay the same.

  (3) Interpret the resulting spread. When \(h\) is equal to 0, it is assumed that the range of system output based on the historical observations is the largest possible range in the history. When \(h\) is \(>0\), it is assumed that the range of system output based on the historical observations is the largest possible range in the history multiplied by 1–\(h\). It is not guaranteed that all of the future loads will be covered by the forecasted upper and lower bounds. The higher \(h\) is, the wider the spread will be, and the higher the possibility that the actual loads in the future will be within the bounds. For instance, Figs. 6 and 7 show the forecasting results of M4 on the annual (summer) and winter peak days respectively. Actual loads of the annual peak day are all within the forecasted upper and lower bounds in Fig. 6, while HE 20 of the winter peak day is outside the upper bound in Fig. 7.

  (4) Fuzzify the crisp inputs. Unless there are physical implications that the metered loads or recorded temperatures are not enough to represent the actual situation, we do not recommend fuzzifying the crisp inputs at the first stage. Arbitrarily fuzzifying the inputs as shown in Al-Kandari et al. (2004) is subjective and lacks defensibility. Instead, practitioners should first try to improve the underlying linear model. This does not imply that the inputs should never be fuzzified. If the utility has been practicing load control without proper records of load control activities, fuzzifying the inputs can be considered.
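Tips (2) and (3) can be checked against the linear program of Sect. 2. The following derivation is a sketch, assuming the reference function \(L(x) = \max (0, 1 - |x|)\) used in this paper, an objective that sums the spread over the \(N\) observations, and, where the linear program has several optimal solutions, that the same one is selected for every \(h\). The scaled spread \({{\varvec{s}}}\) is introduced here only for illustration.

$$\begin{aligned} \left| L^{-1}(h) \right|&= 1-h, \qquad {{\varvec{s}}} = (1-h)\,{{\varvec{c}}}, \\ J({{\varvec{c}}})&= \frac{1}{1-h}\sum _{i=1}^{N} {{\varvec{s}}}^{T}\left| {{\varvec{x}}}_{i} \right| , \\ -{{\varvec{s}}}^{T}\left| {{\varvec{x}}}_{i} \right|&\le y_{i} -{{\varvec{x}}}_{i}^{T}\varvec{\alpha } \le {{\varvec{s}}}^{T}\left| {{\varvec{x}}}_{i} \right| , \qquad {{\varvec{s}}} \ge \mathbf{0}, \quad i=1,\ldots ,N. \end{aligned}$$

Written in terms of \((\varvec{\alpha }, {{\varvec{s}}})\), the constraints no longer involve \(h\), and the objective is only multiplied by the positive constant \(1/(1-h)\). The optimal centers are therefore the same for every \(0 \le h < 1\), while the optimal spreads satisfy \({{\varvec{c}}}(h) = {{\varvec{c}}}(0)/(1-h)\). For example, \(h = 0.1\) widens every spread by a factor of \(1/0.9 \approx 1.11\) relative to \(h = 0\).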

Fig. 6 Forecasting results of M4 during annual peak day of 2007 (\(h = 0.1\))

Fig. 7 Forecasting results of M4 during winter peak day of 2007 (\(h = 0.1\))

6 Conclusion

Although the theoretical framework of fuzzy regression has been studied for decades, the technique is not yet understood well enough by the utility industry to be properly applied to STLF. This paper points out the lack of effort in improving the underlying linear models for fuzzy regression. We propose a fuzzy interaction regression approach to STLF. Through comparisons with two fuzzy regression models and one MLR model, the proposed approach shows significant improvement over its counterparts. This paper also offers three critical comments on a notable but questionable paper, concerning its parameter estimation, forecasting results and conclusion. Finally, four tips on practicing fuzzy regression for forecasting are discussed. Future work on this topic includes: (1) modeling special effects using fuzzy regression models, such as weekend and holiday effects; (2) a comprehensive comparison between fuzzy regression models and MLR models for STLF.