1 Introduction

Energy consumption has become an important research topic in recent decades. Energy problems are vital for the security and well-being of societies. According to economic theory, energy is one of the most important resources for industrial production, and forecasting energy consumption is an important phase in macro-planning for the industry and energy sectors [1]. Long-term planning of energy supply and demand must satisfy the requirements of sustainable development. Accurate forecasts help decision makers understand the volume and trend of future energy consumption so that they can better schedule and plan the operations of the supply system.

Energy consumption has increased in recent decades as a result of population growth and of industrial and economic development in many underdeveloped countries. As an example, the countries of South East Asia (ASEAN) have an annual growth rate of 3.8 % in final energy consumption [2]. From 1990 to 2005, petroleum consumption in the USA grew by 20.4 %, and natural gas consumption increased by 16.32 % [3]. Nowadays, energy is an important factor in the economic and even socioeconomic development of countries [4]. Worldwide dependence on energy is increasing day by day and is seen in many (if not all) aspects of human life. Because some types of energy (such as electricity) are very difficult to store, it is very important to distribute energy with the least cost and waste.

Efficient planning of energy distribution needs accurate forecasts of future demand in order to balance the supply of and demand for energy [5]. Forecasting errors lead to unbalanced supply and demand, which negatively affects the operational cost, network safety, and service quality of the supply network [6]. Underestimating energy consumption can lead to power outages, which can harm both the economy and the daily life of society. On the other hand, overestimating energy demand may create unused capacity, which amounts to wasting resources, mostly financial ones. Therefore, using models that accurately forecast future energy consumption trends, specifically with nonlinear data, is an important issue for power production and distribution systems [7].

Basic operations of power systems such as economic dispatch, unit maintenance, fuel scheduling, and unit commitment can be performed more efficiently with more accurate forecasts [9]. Forecasting studies are important from the pricing perspective as well, because energy price variation is a function of the supply-demand balance. Energy price variations, climate change, increasing global energy demand, dependence on fossil fuels, and the slow development of new energies threaten the security of energy supply [10]. Environmental issues such as global warming and emissions are other important aspects of energy consumption forecasting. Pachauri et al. [8] noted that more than 75 % of human-made greenhouse gas is the result of burning fossil fuels. Burning fossil fuels is the most common way to produce electricity. Therefore, producing electricity both depletes natural resources and causes environmental pollution.

All in all, energy demand management has become more vital in recent decades as resources dwindle, emissions grow, and renewable and clean energies are not yet applied globally. As demand forecasting is the most important topic in the demand management area [11], research on forecasting is gaining more and more attention. Because both “forecasting” and “prediction” are used in the literature, the authors of the current work used the Thomson Reuters Web of Science search tool to find out which word is more popular in the topics of published articles. The results showed no meaningful difference between the number of articles using “forecasting” and the number using “prediction” as their topic. However, the increasing trend of articles related to energy topics is obvious. Figure 1 compares the number of articles published from 1995 to 2014 in the field of forecasting energy consumption. As seen in Fig. 1, publications in the field have increased in recent years. The number of publications using the words “Power” and “Gas, Fuel, Oil” has also increased over the years, but the number of publications with these words in the title is not as high as that of publications with the words “Energy” and “Electricity, Load” in the title.

Fig. 1 Published papers in the field of energy related forecasting/prediction between 1995 and 2014

Many methods and models have been developed to forecast demand in many industries and sciences, the most noteworthy of which are reviewed in this paper. Section 2 explains the different applications of demand forecasting. Section 3 reviews the 10 most popular energy demand-forecasting models of the last decade (2005–2015) according to the number of published studies, and Sect. 4 presents some conclusions.

2 Energy demand forecasting

Forecasting the consumption load is an important part of planning economic and safe operations in power distribution systems. Forecasting, estimating, and predicting are the terms used in markets for the concept of an expected value of future demand. Energy consumption studies in the literature mostly fall into three categories according to the forecasting horizon. Long-term forecasting (5–20 years) is mostly applied to resource management and development investments. Mid-term forecasting (a month to 5 years) is mostly applied to planning power production resources and tariffs, and short-term forecasting (an hour to a week) is mostly used for scheduling and analyzing the distribution network. As energy demand changes with time, climate variables, and socioeconomic and demographic parameters, accurately forecasting the consumption value is both an important and a difficult task [12].

Most forecasting methods can be divided into two major categories: causal methods and methods based on historical data. Causal methods consider a cause-and-effect relation between energy consumption as the output and input variables such as economic, social, and climate factors. Artificial neural networks (ANNs) and regression models are the causal methods most frequently used to predict energy demand. On the other hand, methods based on historical data use the previous values of a variable to forecast its future values. Time series, Grey prediction, and autoregressive models are among these methods [13]. From the forecasting horizon point of view, most forecasts are hourly, daily, weekly, monthly, or annual, and they forecast the system demand and the peak demand value for the horizon under study [14].

As energy consumption data accumulates over time, it can be assumed to form a time series in which previous values can be used to predict future values. The shorter the forecasting horizon, the more accurately the input data must capture variations [15]. This may be the reason that inputs such as temperature and humidity are mostly used in short-term forecasting, since the values of these variables can differ from hour to hour. On the other hand, most socioeconomic input variables, such as Gross National Product (GNP) and the population of an area, are considered in long-term forecasting, since these factors are measured annually.

The key requirement for production capacity planning and scheduling is an accurate forecast of future electricity demand. This forecasting is difficult because the demand data includes unpredicted trends and high levels of noise, and is affected by many unknown external variables [16].

3 Forecasting methods

As energy production, distribution, and consumption have been an important area of research for decades, there are many valuable works in the field of forecasting energy consumption [17–19]. Researchers have used many different approaches to forecast energy demand in different applications, such as data mining techniques to forecast building energy consumption [20], steady-state simulation to forecast the electricity consumption of household freezers and refrigerators [21], Bayesian networks to forecast house energy consumption [22], artificial bee colony to forecast the energy consumption of Turkey [23], and gene expression programming to forecast the electricity demand of Thailand [24].

The most useful methods for forecasting energy demand are reviewed by Suganthi and Samuel [25], and are categorized according to the techniques used in each study. Because of the importance of energy demand management when it comes to economic prosperity and environmental security, they studied many models, including traditional methods such as regression, econometrics, and time series, as well as soft computing techniques such as genetic algorithms, fuzzy logic, and ANNs. Demand forecasting models for short-term electric loads were reviewed by Abu-El-Magd and Sinha [26], who studied both off-line and on-line methods. Ghods and Kalantar [27] reviewed the models for forecasting long-term electric loads, including some traditional methods, genetic algorithms, fuzzy logic, neural networks, wavelet networks, support vector machines, and expert systems. Other studies in this field are reviewed by Bajay [28], Hyndman [29], and Mukherjee [30].

Srinivasan [31] proposed many models for forecasting the electricity consumption of different sectors: industrial, residential, non-industrial, commercial, public lighting, and entertainment. She showed that ANNs are more accurate than some traditional methods such as time series and regression. Baker and Rylatt [32] combined questionnaire surveys with both traditional and newer methods to forecast gas and electricity consumption in UK domestic sectors. Another mixed energy distribution study was done by Pedersen et al. [33], who statistically analyzed hourly electricity and heat consumption to obtain a prediction for planning the distribution system. Kolokotroni et al. [34] studied the forecasting of heating and cooling energy demand in London. They tried to determine, according to the hourly temperature, how much energy the city uses to heat or cool its buildings. A novel study using weather forecasting information to predict the daily energy demand of buildings was performed by Kwak et al. [35].

The review articles on demand forecasting show agreement among all authors that, according to the forecasting accuracy measures, none of the methods outperforms the others in all situations. For this reason, there is a need to review the recently used methods of demand forecasting with regard to the different aspects of industries and social services. The current study reviews the ten most-applied energy demand forecasting models between the years 2005 and 2015. It is noteworthy that the methods are reviewed in order of the number of articles that used each method in the most recent decade. Moreover, the most cited and novel articles of each category are summarized in a comparative table at the end of the subsection explaining that category.

3.1 Artificial neural network models

The ANN is categorized as a data-driven approach. An ANN uses the data to capture the relation between input and output variables and to forecast the output values. It is composed of a network of processing nodes (or neurons), which perform numerical manipulations and are interconnected in a specific order. ANNs can use historical data to predict the future values of noisy multivariate time series [36]. Each node uses a summation method and a transform function to process the data. The actual output values are compared with the values generated by the ANN to measure its accuracy, and the network is adjusted until an acceptable output is obtained [37]. According to Chen et al. [38], ANNs are best known for their suitability for forecasting the outputs of nonlinear datasets, for parallel processing that efficiently performs different simultaneous tasks, and for their adaptability to different environmental conditions, which results from their learning features.
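The summation and transform steps described above can be illustrated with a minimal sketch. It assumes a logistic (sigmoid) transfer function, one hidden layer, and hand-picked weights; none of these choices come from the reviewed studies, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic transfer function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Weighted summation of the inputs followed by the transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output node.

    hidden_layer is a list of (weights, bias) pairs, one per hidden node;
    output_neuron is a single (weights, bias) pair.
    """
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron_output(hidden, out_weights, out_bias)


# Hypothetical, hand-picked weights; a real network learns them from data.
hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.0, -1.0], 0.0)
scaled_inputs = [0.6, 0.4]  # e.g. scaled temperature and previous load
prediction = forward(scaled_inputs, hidden_layer, output_neuron)
```

In training, the weights and biases would be adjusted repeatedly (for example by back-propagation) until the difference between `prediction` and the observed value is acceptably small.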

3.1.1 Large locality electricity demand

Yang and Li [40] used a neural network optimized by a genetic algorithm to forecast power demand. The authors believed their proposed approach avoids slow convergence to the optimal solution and avoids being trapped in a local optimum. ANNs have also been used with economic indicators to predict the energy consumption in Turkey [41]. Data on Gross Domestic Product (GDP), GNP, and country population from 1968 to 2005 were used to train the models. The Mean Absolute Percentage Error (MAPE) and R-squared values showed the accuracy of the ANN in predicting the net energy consumption of the country. In another electricity-related work, Gonzalez-Romera et al. [15] noted that electricity demand variations are composed of a rising tendency, due to the influence of economic and technological evolution, and fluctuations caused by the difference in demand from month to month. They applied neural networks to obtain an accurate forecast of the time series of monthly electric energy demands. The authors reported that their proposed network obtained a MAPE value of about 2 %.
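Since MAPE is the accuracy measure reported by many of the studies reviewed here, a short sketch of its usual definition may be helpful: the mean of the absolute errors relative to the actual values, expressed as a percentage. The function name and the sample numbers are illustrative only.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumes both sequences have the same length and that no actual
    value is zero (the measure is undefined in that case).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    relative_errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical monthly demand values and forecasts (arbitrary units).
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
error_percent = mape(observed, predicted)  # each forecast is off by 10 %
```

Because the errors are scaled by the actual values, MAPE allows comparisons across series with different magnitudes, which is one reason it appears so often in the comparisons above.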

Forecasting the Spanish monthly electric demand was studied by Gonzalez-Romera et al. [42]. The authors assumed that consumption behaves periodically and used a Fourier series to predict this periodic behavior; the trend of electricity demand was predicted with a neural network. The MAPE values show that the proposed model outperforms ARIMA and neural networks that do not use a Fourier series. Also, Zhang and Wang [43] proposed forecasting models based upon back-propagation neural networks and particle swarm optimization, in order to predict the annual electricity demand in one of the provinces of China. Ekonomou and Oikonomou [44] tried different ANNs to forecast the daily electricity demand in Greece. Wang and Liang [45] chose a back-propagation neural network to forecast the energy demand, based upon the assumption that traditional methods cannot effectively study the information supplied with just a date. Moreover, a back-propagation ANN was used to forecast the electricity demand of Turkey based on different socioeconomic indicators [46]. The percentage and absolute mean square errors were used to show the efficiency of the proposed models.

Forecasting the long-term energy demand of Greece was studied by Ekonomou [4]. He applied multilayer perceptron ANNs to predict the energy consumption and compared the results with those of a support vector machine (SVM) model, a linear regression model, and the actual data, which showed the high accuracy of the developed model. Cunkas and Altun [47] used artificial neural networks to forecast the long-term demand of electricity in Turkey. Additionally, Radial Basis Function (RBF) neural networks were used by Ghods and Kalantar [48] to forecast the long-term peak load of electricity in Iran. A comparative study among autoregressive integrated moving average (ARIMA), artificial neural network (ANN), and multiple linear regression (MLR) models was performed by Kandananond [49] to formulate prediction models of the electricity demand in Thailand. He applied these models to historical electricity demand data from 1986 to 2010. The ANN produced the better forecasts according to the mean absolute percentage error. A comparison between mathematical models and neural networks was done by Filik et al. [50] to forecast the long-term electricity demand in Turkey. In another study, Ahmad et al. [51] proposed four model selection strategies to find the best multilayer feed-forward neural network for forecasting the load demand in a Malaysian electric power supply system. Also, wavelet-transformed data and a neural network were applied by Bunnoon et al. [52] for load prediction in the planning of a power system. Saravanan et al. [53] used an ANN to predict the annual electricity consumption in India from 2011 to 2020.

The combination of a multi-output Feed Forward Neural Network (FFNN) and a signal filtering/seasonal adjustment based on Empirical Mode Decomposition (EMD) was used by An et al. [5] to accurately forecast the electricity demand. Applying the proposed model to the electricity demand of New South Wales in Australia, the authors reported that their model improves the forecasting accuracy compared to other useful models. Hassan et al. [55] used aggregation algorithms to obtain more accurate neural network forecasts of Australian electricity demand data. In the most recent of these studies, Hassan et al. [58] compared different aggregation algorithms with regard to the forecasts of electricity demand obtained by individual neural networks.

3.1.2 Large locality gas demand

Zhaozheng et al. [13] used the combined grey forecasting model and ANNs to predict the Chinese consumption of oil products in 2020. Moreover, using the back-propagation neural network model to predict the Blast Furnace Gas (BFG) consumption in Iron and Steel Works was done by Zhang et al. [54].

3.1.3 Large locality heat demand

An evolutionary designed neural network to predict the heat demand in Komořany, Czech Republic, was studied by Chramcov and Varacha [56]. Voronin and Partanen [57] applied a hybrid approach to predict the electricity price and demand in the Finland energy market.

3.1.4 Building energy demand

Forecasting the hourly energy demand of buildings was studied by Gonzalez and Zamarreno [39], who used both current and forecasted temperature values, the load, and the time as the inputs of a feedback neural network. The precision of the proposed model was confirmed by comparing the results with the real data and with the results of models proposed by other researchers (Table 1).

Table 1 Most important and quality articles using ANNs for Energy demand forecasting

3.2 Fuzzy logic

Many imprecise and qualitative energy consumption data are modeled by fuzzy sets. Fuzzy sets are applied in making decisions based on imprecise or vague data, dealing with human reasoning processes, and analyzing uncertainty at various process stages [59, 60]. Fuzzy set theory enables systems to express their rules in “if-then” form, which removes the need for a mathematical analysis for modeling [61]. Fuzzy forecasting methods need fewer observations than many other methods, and they can use incomplete data to produce the forecasted values, although the output of fuzzy methods is not always acceptable [62]. Fuzzy sets enable modelers to reduce a large amount of data into a smaller number of variable rules, which are then applied by fuzzy forecasting models [63].
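As a rough illustration of “if-then” rules acting on vague inputs, the sketch below maps a temperature to a demand value through three rules. The triangular membership functions, the rule base, the demand figures, and the weighted-average combination of rule outputs (in the style of a zero-order Takagi-Sugeno system) are all assumptions chosen for illustration and are not taken from any reviewed study.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set.

    The set has feet at a and c and its peak (membership 1) at b.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for daily mean temperature (degrees Celsius).
TEMPERATURE_SETS = {
    "cold": (-10.0, 0.0, 15.0),
    "mild": (5.0, 17.5, 30.0),
    "hot": (20.0, 35.0, 50.0),
}

# Hypothetical rule base: IF temperature is <set> THEN demand is <value> MW.
RULES = {"cold": 900.0, "mild": 600.0, "hot": 850.0}


def forecast_demand(temperature):
    """Combine the rule outputs, weighted by how strongly each rule fires."""
    firing = {
        name: triangular(temperature, *params)
        for name, params in TEMPERATURE_SETS.items()
    }
    total = sum(firing.values())
    if total == 0.0:
        raise ValueError("temperature lies outside every fuzzy set")
    return sum(firing[name] * RULES[name] for name in RULES) / total
```

A temperature of 10 degrees belongs partly to “cold” and partly to “mild”, so `forecast_demand(10.0)` falls between the two corresponding rule outputs rather than jumping from one to the other.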

3.2.1 Large locality electricity demand

The future trend of electricity consumption in Northern Cyprus was studied by Abiyev et al. [64], who used a neural network based on a fuzzy inference system. Comparing the actual data with the simulation results of the developed system showed the efficiency of the proposed method. Forecasting the monthly electricity demand by a neuro-fuzzy approach was studied by Ucenic and George [65]. The Adaptive Neural Fuzzy System was used to forecast the load demand, and the results were more satisfactory than those of the common autoregressive and autoregressive moving average models. The short-term gross annual electricity demand in Turkey was forecasted by using fuzzy logic as well [10]. The authors believed the model captured the system’s dynamic behavior and had an acceptable forecasting error. The results show that comparing the country’s short-term gross electricity demand with the economic performance presented more reliable projections.

A weighted evolving fuzzy neural network was applied by Chang et al. [16] to forecast the monthly electricity demand in Taiwan. The authors adopted a weighted factor to find the most important factors among the different rules, and a different rule clustering method was developed as well. Judging by the MAPE value, the proposed approach had a better performance in comparison to the other methods. Predicting the demand of a power engineering company in Bangladesh was also studied by Kabir and Sumi [66], where they used an integrated fuzzy Delphi method with ANN.

Adika and Wang [67] used the fuzzy inference approach in a particle swarm optimization model to predict short-term energy consumption. The results showed that the developed model forecasts more accurately than classic fuzzy models whose membership functions are defined by heuristics. Kazemi et al. [1] used a Multi-Level Fuzzy Linear Regression Model to forecast the energy demand of the industry sector in Iran. The fuzzy linear regression was applied based upon socio-economic indicators. A fuzzy wavelet ANN was developed by Amina et al. [68] to predict the hourly electricity demand of the island of Crete, Greece. The main contribution of their model is that it provides more accurate results than classic neural networks, as confirmed on the real data of the case study. Moreover, Sari [69] used a fuzzy seasonal time series to forecast the monthly energy demand of Hatay in Turkey. Fuzzy forecasting methods were used by Avila et al. [70] to predict the demand of the renewable-energy based micro-grid in Huatacondo, Chile, as a control tool. They compared the stable Takagi and Sugeno fuzzy model with an adaptive ANN, and the results show that the fuzzy model had the better predictive ability. Forecasting the demand in power distribution systems with fuzzy methodology was studied by Moraes et al. [71]. The future demand was forecasted, based on the historical data, utilizing a fuzzy system which obtained the highest correlation as compared to previous forecasting errors.

Recently, Hassani et al. [72] proposed a self-similar neuro-fuzzy model to forecast the short-term electricity consumption for New England. The proposed model was composed of some local linear neuro-fuzzy models. Moreover, a linguistic out-sample approach of a fuzzy time series was proposed by Efendi et al. [74] to predict the electricity demand in Malaysia. The numerical in-out samples forecasted in the fuzzy time series were determined by fuzzy logical relationships and the midpoints of intervals. The weights of the fuzzy logical relationships were meant to compensate for the presence of bias in the forecasting. In the most recent study, Vieira et al. [75] used the characteristics of households to predict electricity consumption. Fuzzy clustering was used to extract the useful information from all the available data, and the daily electricity consumption was forecasted based on the extracted information.

3.2.2 Large locality gas demand

Iranmanesh et al. [3] used the hybrid neuro-fuzzy models to forecast the long-term energy demand. This proposed approach used the local linear neuro-fuzzy model for forecasting and the Hodrick–Prescott filter for extraction of the trend and cyclic components of the energy demand series. The performance of the proposed model was shown by using the data for the monthly demand of crude oil, gasoline, and natural gas for the United States from 2008 to 2010. The problem of forecasting the natural gas demand is studied by Rodger [73], where he developed a fuzzy nearest neighbor neural network (Table 2).

Table 2 Most important and quality articles using Fuzzy logic for energy demand forecasting

3.3 Time series models

Recording the ordered sequence of the values of a variable at fixed time intervals creates a time series. Time series forecasting predicts the future values of a variable based on its previously observed values [76]. Time series models are often categorized as top-down models, and represent the relationship between the variable’s values and time. A typical example of a time series is the price of gold per ounce recorded at fixed time intervals, which is used to predict the future price. The results of simple regression time series are often promising, and have led to the development of univariate and multivariate models. Campo and Ruiz [77] believe that some of the developed time series models, such as ARIMA models and state space models, are among the most useful short-term forecasting models.

3.3.1 Large locality electricity demand

The problem of forecasting the monthly peak demand of electricity in north India was studied by Ghosh [78], where he used two different time series methods–that of multiplicative seasonal autoregressive integrated moving average and Holt-Winters multiplicative exponential smoothing. The seasonal ARIMA outperformed the Holt-Winters method in terms of square error and absolute error values. Mati et al. [79] used the time series to forecast the electricity demand in Nigeria. A multiple regression time series was applied, and electricity consumption and percentage connectivity to the national grid were considered the independent variables of the model.

Garcia-Ascanio and Mate [80] used the interval time series to forecast the monthly electricity consumption per hour in Spain. They compared the multi-layer perceptron model modified for interval data to a forecasting approach which took into consideration vector autoregressive models adapted to interval time series. The results of this study show that the interval time series’ forecasting methods will reduce the risk of operational decisions and power system planning. Moreover, a chaotic time series method was used by Wang et al. [81] to forecast the electricity demand. The authors believed that the electricity demand series had the chaotic characteristics, and that the proposed method could effectively predict the demand with a mean absolute relative error of 2.48 %. It is notable that the seasonal effects were considered in the method by using a trend adjustment technique; additionally, a data set from the network of New South Wales in Australia was used to simulate the needed data.

A time series approach was applied by Shang [82] to forecast very short-term electricity demand in South Australia. The approach slices the seasonal univariate time series into a time series of curves, and applies functional principal component analysis prior to using regression techniques and a univariate time series forecasting method. The author believed that this approach is able to improve the accuracy of both point and interval forecasts. Simmhan and Noor [83] applied incremental clustering of time series to forecast energy consumption. The authors’ main contribution is applying the method to big data: they used 700,000 input data points to show the efficacy of the developed model in terms of both accuracy and prediction time. Recently, Rana et al. [84] studied a feature selection problem by combining time series and a neural network approach in order to forecast the electricity demand in Australia and the United Kingdom. They used half-hourly electricity demand data and showed that their approach constructed valid prediction intervals in most cases (Table 3).

Table 3 Most important and quality articles using Time Series for Energy demand forecasting

3.4 Grey (gray) prediction

Grey Theory deals with systems having small samples or poor data information. The term grey (gray in some studies) is used for this theory because of the presence of vague data. More precisely, as white systems have completely known information and black systems have completely unknown information, grey systems are defined as systems with partially known and partially unknown information. The GM(1,1) notation denotes the first-order grey model with just one variable, which is the most frequently used among the family of Grey Prediction models [85]. Grey prediction is widely used in energy demand forecasting studies because of its high forecasting accuracy in comparison to other methods and because it needs relatively few data items to construct a forecasting model. Grey prediction consists of three basic operations: accumulated generating operators, inverse accumulating operators, and grey models [86].
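The three operations can be sketched for the GM(1,1) case as follows. This is a minimal illustration of the commonly presented textbook formulation (accumulation, least-squares estimation of the developing coefficient a and the grey input b from mean background values, and inverse accumulation of the fitted response); it is not the implementation of any reviewed study, and the function names and sample data are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from the raw series x0."""
    if len(x0) < 4:
        raise ValueError("this sketch expects at least four observations")
    # Accumulated generating operation (AGO): running sums of the raw series.
    x1 = []
    running = 0.0
    for value in x0:
        running += value
        x1.append(running)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Least-squares solution of the grey equation x0[k] = -a * z[k] + b.
    m = len(z)
    sum_z = sum(z)
    sum_y = sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(zv * yv for zv, yv in zip(z, y))
    det = m * sum_zz - sum_z ** 2
    a = (sum_z * sum_y - m * sum_zy) / det
    b = (sum_zz * sum_y - sum_z * sum_zy) / det
    return a, b


def gm11_predict(x0, steps):
    """Forecast the next `steps` values of the raw series."""
    a, b = gm11_fit(x0)
    if a == 0.0:
        raise ValueError("a = 0: the exponential response is undefined")
    n = len(x0)

    def x1_hat(k):
        # Time response of the accumulated series (k = 0 is the first point).
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: differences of the fitted accumulated series.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical annual demand growing by 10 % per year (arbitrary units).
demand = [100.0, 110.0, 121.0, 133.1, 146.41]
next_year = gm11_predict(demand, 1)[0]
```

With only five observations, the sketch extrapolates the roughly exponential pattern of the sample series, which reflects the small-sample setting for which grey models are designed.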

3.4.1 Large locality electricity demand

Zhou et al. [87] proposed a grey prediction model to forecast the electricity demand in China. A trigonometric grey prediction approach was presented by combining the classical grey model GM(1,1) with the trigonometric residual modification technique. Forecasting the middle-term electricity demand using a grey model was performed by Niu et al. [88], as well. The authors applied the vector \(\theta \) into the calculation of the background value array and generalized the GM(1,1,\(\theta )\) model, where the \(\theta \) value was optimized by particle swarm optimization. The proposed model had a higher precision for middle-term forecasting compared to the GM(1,1) model.

Akay and Atak [86] used the grey prediction model to forecast the electricity demand in Turkey. The authors believed that the country’s electricity consumption was chaotic, due to the uncertain economic structure of the country. They presented a grey prediction model with a rolling mechanism, and believed that this method could achieve high prediction accuracy with limited data and little computation. A grey-RBF neural network was proposed by Liu et al. [90] to predict the electricity demand in China. A grey-based prediction algorithm was presented for electricity demand-control purposes. Since the general GM(1,1) prediction had the problems of overshoots and dissipation, the RBF neural network was applied to correct the prediction. Moreover, the grey-Markov model was applied to forecast the electricity demand [91]. The presented model widened the application of grey prediction, and also overcame the effect of random fluctuation data on accurate forecasting.

Forecasting energy consumption in Zhejiang, China was studied by Wang et al. [92], who combined Grey and multiple regression models. The case study results show that the combined model is more accurate than the individual Grey or multiple regression models. You and Wang [93] used grey models to forecast energy demand. They used the correlation coefficient method, the regression model, and the Granger causal relation to analyze the relationships between the economic variables and the demand for energy. Forecasting the electricity consumption in Wenzhou, China was studied by Wang [94], who combined the Grey method and multivariate statistical techniques. He believed that MAPE is the best criterion for showing the accuracy of Grey models, and used it to show the accuracy of the developed model in predicting short-term electricity consumption.

In another energy consumption study, Yanjun and Yuliang [95] used the GM(1,1) model to predict the energy demand for the Henan province in China. They combined the grey model with Granger causal relation to analyze the relationship between the economic variables and energy demand. In a similar study, Pi et al. [96] used a grey forecasting model to predict the energy demand in China. Three different methodologies for the three-point-average technology and the original residual modification were used to improve the GM(1,1) model. The general trend series and random fluctuations about this trend were considered in this method.

Feng et al. [97] studied the prediction of China’s energy demand using the grey prediction method. Data from 1998 to 2006 were used to predict the total energy, clean energy, and coal energy consumption of China with the developed model. The results show that the consumption trend is upward for all energy types, with clean energy consumption increasing at the steepest rate.

3.4.2 Large locality gas demand

Grey theory was used to forecast the natural gas demand by Zhou and Shuying [89]. The accuracy of the proposed GM(1,1) model was judged by the residual error (Table 4).

Table 4 Most important and quality articles using Grey prediction for energy demand forecasting

3.5 ARMA, ARIMA, SARIMA

Although moving average and autoregressive methods are widely applied in forecasting problems, neither method alone can model certain stationary random processes, because those processes have the qualities of both moving average and autoregressive processes. In this case, a general class of time series models known as autoregressive moving average (ARMA) models can be formed by combining the autoregressive and moving average methods [98]. The behavior of stationary time series can be forecast by ARMA(p, q) models. These models are a specific class of regression models, but because of their importance, a separate subsection of the current review is assigned to research on ARMA, ARIMA, and SARIMA models.
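For reference, the standard textbook form of an ARMA(p, q) process can be written as follows. The symbols are assumptions of this sketch rather than notation taken from the reviewed papers: \(X_t\) is the series, \(c\) a constant, \(\phi_i\) the autoregressive coefficients, \(\theta_j\) the moving average coefficients, and \(\varepsilon_t\) a white-noise error term.

```latex
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

Setting \(q = 0\) leaves a purely autoregressive model and setting \(p = 0\) leaves a pure moving average model, which shows how the ARMA class contains both.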

Moreover, the underlying stochastic processes of some non-stationary time series change over time. These processes contain trends, seasonality, or both, which motivates autoregressive integrated moving average (ARIMA) models. In an ARIMA(p,d,q) model, d is the number of differencing operations performed to achieve a stationary process, with p autoregressive terms and q moving average terms [98]. In some cases, seasonality dominates the variations of the original time series. The multiplicative model SARIMA(p,d,q)(P,D,Q)\(_S\) incorporates both seasonal and non-seasonal factors. Seasonal time series can be made stationary by differencing between one value and another at a lag of S or a multiple of S, which is called seasonal differencing [99].
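The effect of ordinary and seasonal differencing can be illustrated with a short sketch. The toy series below (a linear trend plus a repeating pattern of period S = 4) is hypothetical and chosen only so that the result is easy to check by hand; the helper name `difference` is likewise an assumption.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical toy data: linear trend plus a repeating pattern of period S = 4.
S = 4
pattern = [5.0, -2.0, 1.0, -4.0]
series = [2.0 * t + pattern[t % S] for t in range(16)]

# Seasonal differencing (D = 1) cancels the period-S pattern and leaves a constant.
seasonal = difference(series, lag=S)
# One further ordinary difference (d = 1) removes that remaining constant level.
stationary = difference(seasonal, lag=1)
```

In this example the seasonally differenced series is constant, so a single additional difference leaves only zeros; with real demand data the differenced series would instead be modeled by the ARMA part of the model.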

3.5.1 Large locality electricity demand

Kareem and Majeed [100] used the SARIMA model to forecast the monthly peak load demand for the Sulaimany Governorate in Iraq. They proposed a SARIMA(1,1,0)(0,2,1)\(_{12}\) model and evaluated it by measuring the MAPE on data from the year 2005. The ARIMA and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models were used by Hor et al. [101] to forecast the daily electricity consumption. The main focus of the work was the impact of climate change on the electricity supply network. The GARCH model was used to model the residuals with a Student’s t-distribution and to estimate the maximum load demand. In another study, a Bayesian spatial ARMA model was proposed by Ohtsuka et al. [102] to predict the electricity consumption in Japan. A spatial ARMA(1,1) model was proposed, and a Markov chain Monte Carlo strategy was used to find the parameters of the model. The numerical illustration shows that the proposed model outperforms a univariate ARMA model.

The problem of forecasting the electricity demand of the Hellenic power system using an ARMA model was studied by Pappas et al. [9]. Multi-model partitioning theory was used to propose a model for forecasting the electricity load demand from January 1st 2006 to December 31st 2006, and this model was compared to three techniques: Akaike’s Information Criterion, Schwarz’s Bayesian Information Criterion, and the Corrected Akaike Information Criterion. Moreover, the problem of predicting the electricity consumption in Malaysia using the double seasonal ARIMA was studied by Mohamed et al. [103]. The Statistical Analysis System and the mean absolute percentage error were used to analyze the data and measure the forecasting accuracy, respectively. The authors proposed the ARIMA(0, 1, 1)(0, 1, 1)\(^{48}\)(0, 1, 1)\(^{336}\) model, with one-step-ahead forecasts being the best-fitting. Sigauke and Chikobvu [104] used different combinations of regression, SARIMA, and GARCH models to forecast the daily peak electricity demand in South Africa. The developed models were compared with a piecewise linear regression model, and the results confirmed that the model combining regression, SARIMA, and GARCH components was the most accurate among all individual and combined methods.

Asad [14] fitted an ARIMA model to forecast the daily peak electricity demand for New South Wales, Australia. The author attempted to find how much of the historic data should be used in constructing the forecasting model in order to obtain the most accurate results based upon RMSE and MAPE measures.
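Since RMSE and MAPE recur as accuracy measures throughout this review, a minimal sketch of both is given here under their usual definitions. The function names are assumptions, and the MAPE sketch assumes that no actual value is zero.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
```

MAPE is scale-free, which makes it convenient for comparing studies on different grids, whereas RMSE keeps the units of the load data and penalizes large errors more heavily.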

Wang et al. [105] studied demand forecasting for the northwest electricity grid of China by applying a residual modification approach to SARIMA. The residual modification models were used to improve the precision of electricity demand forecasting, and their forecasts appeared to be more workable than those of the single seasonal ARIMA. Three residual modification models were applied, and all of them produced more accurate results than the single SARIMA; additionally, the combined model outperformed the other three models. De Felice et al. [6] studied the forecasting of the daily load in Italy with ARIMA models. They used numerical weather predictions as the input of their statistical model and showed that incorporating the weather forecasts improves the accuracy of load predictions (Table 5).

Table 5 Most important and quality articles using moving average methods for energy demand forecasting

3.6 Regression models

Regression models determine a forecasting function that calculates the value of a dependent variable based on one or more independent variables. The terms “response variable” and “predictor variable” are used for the dependent variable and the independent variable, respectively [107]. A linear regression model estimates a linear relationship between the variables, whereas a nonlinear regression model represents the dependent variable as a nonlinear combination of the independent variables.
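As an illustration of the linear case, the sketch below fits a multiple linear regression by solving the normal equations with a small Gaussian elimination routine. It is a generic least-squares example rather than the method of any cited study; the helper names `solve` and `fit_ols` are assumptions, and the sketch assumes that the predictors are not perfectly collinear.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coef, x):
    """Evaluate the fitted function for one row of predictor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
```

Here each row of `X` holds the predictor values for one observation (for example temperature and population) and `y` holds the corresponding response (for example consumption).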

3.6.1 Large locality electricity demand

Ismail et al. [108] used a time series regression model to forecast the electricity consumption in Malaysia. They developed a multiple linear regression model to investigate the impact of the explanatory variables on the daily and monthly consumption rates. Linear regression using spline estimators was applied by Antoch et al. [109] to predict the electricity consumption in Sardinia, Italy. Aranda et al. [110] used multiple regression models to predict the energy consumption in the banking sector of Spain. They presented three models to predict the energy consumption of the total banking sector, of branches in areas with lower winter temperatures, and of branches in areas with higher winter temperatures.

Al-Qahtani and Crone [112] used multivariate k-nearest neighbor regression to forecast UK electricity demand. The proposed method categorized the forecasted day as a working day or a non-working day by using binary dummy variables as a second feature. Moreover, quantile regression was used by Gibbons and Faruqui [113] to forecast the peak electricity demand. The authors developed a method that used quantile regression to model the daily peak demand, and subsequently used a loss function to estimate a quantile for annual peak prediction. They demonstrated that the annual peak days of demand were affected by extreme values of the influential variables and by large unpredictable shocks to demand.
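Quantile regression is usually built on the so-called pinball (quantile) loss, and the sketch below shows how minimizing that loss over a constant prediction recovers a sample quantile. This is the generic loss from the quantile regression literature and is not claimed to be the specific loss function used in [113]; the peak values and the level `tau = 0.85` are hypothetical.

```python
def pinball_loss(actual, predicted, tau):
    """Average pinball loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(actual)

# Hypothetical daily peak loads (arbitrary units).
peaks = [50.0, 52.0, 55.0, 61.0, 70.0, 48.0, 53.0, 58.0, 64.0, 90.0]
tau = 0.85

# The constant forecast with the smallest pinball loss is a sample quantile.
best = min(peaks, key=lambda q: pinball_loss(peaks, [q] * len(peaks), tau))
```

Choosing a high quantile level makes under-prediction much more costly than over-prediction, which is the asymmetry a planner typically wants when sizing capacity for peak demand.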

Halepoto et al. [115] forecast the short-term load demand using multiple regression techniques. The authors applied linear, multiple linear, quadratic, and exponential regression models to hour-by-hour load data for a specific day, and considered temperature as the varying parameter. The quadratic regression model outperformed the others according to mean square error measurements. A regression model based on climatic variables was used by Vu et al. [116] to forecast the monthly electricity demand of New South Wales, Australia. Multicollinearity analysis and backward elimination were used to remove the variables with a low level of significance. Temperature, number of rainy days, and humidity were considered the most influential variables for electricity demand.

3.6.2 Large locality gas demand

Braun et al. [114] studied the forecasting of both gas and electricity consumption for supermarkets. Relative humidity in combination with the actual dry-bulb temperature, and the humidity ratio in combination with the dry-bulb temperature, were used as the explanatory factors for creating the regression equations. The developed model was applied to the case of a supermarket in the north of England and produced a future trend for the consumption of both gas and electricity.

3.6.3 Building energy demand

Catalina et al. [111] studied the fast forecasting of heating energy demand based on the factors that affect building heat demand. A dynamic simulation model was proposed and the prediction model was developed using multiple regression; the proposed model achieved a correlation coefficient of 98.7 % (Table 6).

Table 6 Most important and quality articles using Regression for energy demand forecasting

3.7 Support vector machines

The Support Vector Machine (SVM) is a machine learning method developed by Boser, Guyon, and Vapnik [117] that is capable of both classification and regression. An SVM framework uses two datasets, one for training and one for testing. A learning algorithm is used to recognize subtle patterns in complex data sets. In an SVM, non-linear trends in the input space are mapped to linear trends in a higher-dimensional feature space. The support vectors are sufficient for the generalization of an SVM, which therefore does not depend on the complete training data. An SVM seeks to minimize the upper bound of the generalization error, unlike neural networks, which minimize the empirical error [118].

Support vector classification and support vector regression are the main categories of SVM. Support vector regression (SVR), which is the main focus of this section, tries to achieve generalized performance by minimizing the generalization error bound. Notably, the model produced by SVR depends only on a subset of the training data [119].
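One way to see why an SVR model depends only on a subset of the training data is the \(\varepsilon\)-insensitive loss used in \(\varepsilon\)-SVR: points whose error is within \(\varepsilon\) contribute nothing to the loss. The sketch below computes that loss only and is not a full SVR solver; the function name and the numbers are illustrative assumptions.

```python
def eps_insensitive_loss(actual, predicted, eps):
    """Per-point epsilon-insensitive loss: zero inside the tube, linear outside."""
    return [max(0.0, abs(a - p) - eps) for a, p in zip(actual, predicted)]

# Hypothetical observations against a flat prediction of 10.0, with eps = 0.5.
actual = [10.0, 10.3, 12.0]
predicted = [10.0, 10.0, 10.0]
losses = eps_insensitive_loss(actual, predicted, eps=0.5)

# Only points outside the eps-tube carry a non-zero loss.
outside_tube = [i for i, loss in enumerate(losses) if loss > 0.0]
```

In this example only the third point lies outside the tube, so it is the only one that would pull on the fitted function.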

3.7.1 Large locality electricity demand

Wang et al. [120] proposed an SVR model to forecast the electricity demand for northeast China. They produced a smooth data series without seasonal variations by applying one-order moving averages. The smoothed data were then fed into the \(\varepsilon \)-SVR model, and the removed seasonal variation was subsequently accounted for. SVR was used by Setiawan et al. [121] to predict the very short-term electricity demand for an Australian electricity operator. The results showed that the proposed model outperformed the BP neural network.

The least squares SVM regression model was used by Yi and Ying [122] to predict the energy consumption in China. They considered economic factors such as GDP, industrial structure, population, imports, and exports as the factors affecting energy consumption and developed their model on this basis. The reliability of the developed model was also shown in a case study.

Moreover, an SVM model based on rough set data preprocessing was developed by Yang et al. [124] for the problem of power demand forecasting. Rough set processing reduced the condition attributes and eliminated redundant attributes, and the remaining important attribute data were used as the training sample for the SVM. Kavaklioglu [46] applied support vector regression to forecast the electricity demand of Turkey. GNP, population, imports, and exports were used as the inputs of the \({\varepsilon }\)-SVR model. The applicability of the model was shown using data from 1975 to 2006.

The problem of forecasting the seasonal electricity consumption was analyzed using a hybrid SVR method [125]. The proposed model combined a chaotic immune algorithm and a seasonal adjustment mechanism with support vector regression. The chaotic immune algorithm overcame premature convergence to local optima when determining the parameters of the SVR model.

In more recent research, SVR was used by Fattaheian-Dehkordi et al. [127] to forecast the hour-ahead demand of electricity in a smart grid. They applied the proposed method to the greater Tehran area of Iran. A grid optimization process and an investigation of different kernel functions were used to determine the SVR parameters. Moreover, Xiong et al. [128] proposed an empirical mode decomposition-based SVR to forecast interval-valued electricity demand. The developed model integrated bivariate empirical mode decomposition and SVR, which allows the upper- and lower-bound time series to be decomposed simultaneously. The proposed approach was applied to monthly interval-valued electricity demand data per hour in the Pennsylvania-New Jersey-Maryland (PJM) interconnection.

3.7.2 Building energy demand

The SVM model was used by Dong et al. [17] to predict the load demand of buildings in tropical regions. Three weather features (temperature, humidity, and solar radiation) were considered as the input variables. The developed model was applied to four buildings in Singapore, and the coefficient of variation and percentage error measures showed the accuracy of the model. The concept of parallel SVMs was introduced by Zhao and Magoules [123] to forecast building energy demand, where the parallelization speeds up the training of the model. Simulation was used to create the required data for multiple buildings, and the model was developed by applying SVMs with a Gaussian kernel.

Solomon et al. [126] used SVM regression to forecast the energy demand in large commercial buildings. The proposed model needed only the historical energy consumption data of the buildings and required no knowledge of the buildings’ physical properties. The feature vectors of the SVM were created from time-delay coordinates (Table 7).
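A time-delay embedding turns a single consumption series into feature vectors made of lagged values, so that a regression model can be trained without any other inputs. The sketch below shows one common construction; the function name `delay_embed` and the choice of predicting the next value are assumptions and do not reproduce the exact setup of [126].

```python
def delay_embed(series, dim, delay=1):
    """Build (feature_vector, target) pairs from time-delay coordinates.

    Each feature vector holds `dim` past values spaced `delay` steps apart,
    ordered oldest to newest, and the target is the value one step later.
    """
    span = (dim - 1) * delay
    pairs = []
    for t in range(span, len(series) - 1):
        features = [series[t - i * delay] for i in range(dim - 1, -1, -1)]
        pairs.append((features, series[t + 1]))
    return pairs
```

For example, `delay_embed([1, 2, 3, 4, 5, 6], dim=3)` yields the feature vectors `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]` with targets 4, 5, and 6.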

Table 7 Most important and quality articles using Support Vector Machines for energy demand forecasting

3.8 Genetic Algorithm

The work of John Holland [129] made the Genetic Algorithm (GA) one of the best-known evolutionary algorithms. The search process of a GA starts with a population of random solutions and continues iteratively until a stopping criterion is met or a specific number of iterations is reached. The operators of reproduction, mutation, and crossover are used to update the population until the stopping criterion is satisfied. A generation is defined as a new population created by these operators. Generally, three stages are performed to search the solution space with a GA: (1) the population points are assessed according to an objective function; (2) some points are selected as candidate solutions of the problem, according to the results of the first step; (3) the next generation is created by applying the genetic operators to the points selected in step 2. These three steps are repeated until the stopping criterion is met [130, 131].
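The three stages described above can be sketched as a short loop over a population of real numbers. This is a deliberately simple, hypothetical variant (truncation selection, arithmetic crossover, Gaussian mutation, and one elite individual copied unchanged as the reproduction step); the operator choices, parameter values, and the name `genetic_search` are assumptions and not those of the cited studies.

```python
import random

def genetic_search(fitness, bounds, pop_size=30, generations=60, seed=0):
    """Maximize `fitness` over a one-dimensional interval with a simple GA."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    history = []                                   # best fitness per generation
    for _ in range(generations):
        # (1) Assess every point with the objective function.
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        # (2) Select the better half of the population as parents.
        parents = scored[: pop_size // 2]
        # (3) Build the next generation with the genetic operators.
        children = [scored[0]]                     # reproduction: keep the best point
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2        # arithmetic crossover
            if rng.random() < 0.2:
                child += rng.gauss(0.0, 0.1 * (hi - lo))   # mutation
            children.append(min(hi, max(lo, child)))       # stay inside the bounds
        population = children
    return max(population, key=fitness), history
```

A call such as `genetic_search(lambda x: -(x - 3.0) ** 2, (-10.0, 10.0))` returns the best point found and the best fitness recorded in each generation; because the best point is always carried over, that recorded fitness never decreases.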

3.8.1 Large locality electricity demand

Ozturk and Ceylan [132] used the genetic algorithm to forecast the total and industrial sector electricity demand in Turkey. GDP, population, and import and export volumes were used to predict the total electricity consumption, and GNP and import and export volumes were used to forecast the industrial sector electricity demand. Azadeh et al. [18] studied the forecasting of electricity consumption using stochastic procedures, combining a genetic algorithm and an ANN to develop their model. The model was applied to data on the electricity demand of the agriculture sector in Iran from 1981 to 2005, and the mean absolute percentage error of the case study showed that the developed model performed better than time series models. Zhao et al. [133] used a genetic algorithm-based wavelet neural network to develop a framework for forecasting energy consumption. Compared with commonly used mathematical models, the developed model was effective for forecasting energy demand in multi-factor quantitative problems.

The electricity demand of Nanchang in China was also forecast using the genetic algorithm [134]. The proposed model is based on the combination of a BP neural network and a genetic algorithm. The author argued that the proposed model outperformed the other methods in terms of both training and operation times. A cooperative approach of ant colony optimization and genetic algorithms was proposed by Ghanbari et al. [135] to construct an expert system for energy demand forecasting. The proposed approach used the GA to create the database of the expert system, and ant colony optimization was applied to learn linguistic fuzzy rules so as to increase the degree of cooperation between the rule base and the database. The results show that the proposed method outperforms adaptive neuro-fuzzy inference systems and ANNs.

Moreover, Wang [137] accurately predicted the smart grid power demand using the genetic algorithm. The author integrated a neural network, K-means, and artificial chromosomes embedded in a GA to propose a hybrid model. The experimental results show that the proposed model outperforms the classical K-means and ANN models.

3.8.2 Building energy demand

Recently, Nazari et al. [136] compared particle swarm optimization (PSO) and genetic algorithms for forecasting the energy demand in the residential and commercial sectors of Iran. Energy demand was estimated with both linear and exponential models. The case study results showed that the PSO exponential model was the best model according to the MAPE measures (Table 8).

Table 8 Most important and quality articles using Genetic Algorithm for energy demand forecasting

3.9 Econometric models

Econometrics is the study of socio-economic data using economic models and mathematical statistics. It identifies the cause-and-effect relationships between the dependent and independent variables. The main objective of the method is to determine the explanatory variables and the appropriate mathematical expression of the relationship between the influential factors and the dependent variable. The causal relationships among the variables are addressed by regression analysis, including linear and non-linear associations, and by the lagged effects of explanatory variables over longer periods.

An econometric model is built through the following steps: developing an economic hypothesis; formulating a mathematical model of the hypothesis; formulating an econometric model of the hypothesis; estimating the econometric model; testing the hypothesis; and forecasting [138].

3.9.1 Large locality electricity demand

Utama et al. [2] used the econometric approach to predict the energy demand in the Southeast Asian region. The land-to-population ratio, total land area, population trends, demographic characteristics, and landscape were considered the factors influencing energy demand. Mtembo et al. [141] proposed an econometric model to predict the electricity demand in Zimbabwe. A multiple linear regression econometric model was proposed, and GDP, Consumer Price Index (CPI), temperature, and population were considered the factors influencing peak demand. Econometric modeling was also used by Roming and Leimbach [142] to forecast the energy demand. The authors used both in-sample and out-of-sample selection criteria in order to create robust short-term and medium-term forecasts. GDP per capita, population density, and urbanization were considered the influential factors.

3.9.2 Large locality gas demand

Dey et al. [139] forecast the natural gas demand for the power sector of Bangladesh using econometrics. Economic variables, such as gas price and per-capita GDP, were considered the influential factors. Various statistical tests showed that price has no significant impact on gas demand.

3.9.3 Large locality heat demand

The demand for fuelwood in Turkey was also accurately forecast using econometrics [140]. The proposed econometric model was converted into a log-linear form so that the parameter estimates could be interpreted directly as elasticities.
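The link between a log-linear form and elasticities can be shown with a small sketch: if demand follows \(Q = A X^{\beta}\), then \(\ln Q = \ln A + \beta \ln X\), and the slope of a regression in logarithms is the elasticity \(\beta\). The data below are synthetic, generated with an assumed elasticity of 0.6, and the variable and function names are hypothetical; this is not the model of [140].

```python
import math

def loglinear_fit(x, y):
    """Fit ln(y) = intercept + slope * ln(x) by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    intercept = my - slope * mx
    return intercept, slope

# Synthetic data: demand = 50 * income ** 0.6 (assumed, for illustration only).
income = [1000.0, 1500.0, 2200.0, 3100.0, 4000.0]
demand = [50.0 * v ** 0.6 for v in income]

log_a, elasticity = loglinear_fit(income, demand)
```

The fitted slope recovers the assumed elasticity, so a one percent rise in the explanatory variable corresponds to a change of about `elasticity` percent in demand.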

3.10 System dynamics models

Professor Forrester [143] developed the System Dynamics technique at the Massachusetts Institute of Technology. System Dynamics (SD) is a computer-oriented mathematical modelling approach that uses the inter-relation of variables in a complex setting. The main characteristics of SD include the existence of a complex system, variations of the system behavior over time, and the existence of feedback in a closed loop, which supplies new information about the system condition at each point in time.

The causal and feedback relationships are used iteratively to build an SD model. These relationships are built on differential equations, variables, and parameters. More explicitly, the SD model is a system of stocks and flows connected through auxiliaries.
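A stock-and-flow structure can be sketched with a simple Euler integration. In the hypothetical example below, population is the stock, population growth is the flow that depends on the stock (the feedback loop), and energy demand is an auxiliary computed from the stock and a per-capita consumption parameter. All names and numbers are assumptions for illustration and do not come from any of the reviewed models.

```python
def simulate_demand(population, growth_rate, per_capita_kwh, years, dt=1.0):
    """Simulate yearly energy demand from a single population stock."""
    demand = []
    for _ in range(years):
        # Auxiliary: demand is derived from the current level of the stock.
        demand.append(per_capita_kwh * population)
        # Flow: growth depends on the stock itself, closing the feedback loop.
        growth = growth_rate * population
        # Euler step: the stock accumulates its net flow over one time step.
        population += dt * growth
    return demand
```

For example, `simulate_demand(1000.0, 0.02, 5.0, 3)` gives demand values of about 5000, 5100, and 5202, reflecting the compounding growth of the stock.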

3.10.1 Large locality electricity demand

Fong et al. [144] studied the problem of forecasting energy consumption trends in urban areas under different scenarios of future urban growth. A system dynamics model was used, divided into residential, industrial, commercial, and transportation sub-models. The authors argued that lifestyle is an important factor affecting energy consumption trends in urban areas. Vaudreuil [145] used SD simulation to predict the energy demands in the Montachusett region in Massachusetts. The author considered a variety of scenarios and simulations, taking into consideration the effect of different variables, such as regional services, population, land occupied, and regional attractiveness, on the energy demand of the region. System dynamics was also used by Akhwanzada and Tahar [146] to forecast the electricity demand in Malaysia. The authors proposed an SD-based simulation and used the per-capita consumption of electricity and the population as model variables. Wu and Xu [147] integrated system dynamics with fuzzy multi-objective programming to predict the energy demand in a world heritage zone in China. Moreover, given the importance of air pollution in heritage areas, the presented model was also developed to forecast the volume of CO\(_2\) emissions.

4 Conclusions and discussion

4.1 Conclusions

This paper has reviewed the most cited, high-quality articles on energy demand forecasting methods from 2005 to 2015. The aim is to provide an integrated view of recently presented models for practitioners who need to make decisions that depend on future demand levels, and for researchers who want to contribute to the field. The reviewed papers are categorized according to the main forecasting method applied.

Besides the increasing demand for energy in the current century, the imbalance between energy supply and demand across time periods and across geographical, climatic, and even political conditions increases the importance of accurate forecasts. Moreover, the effect of seasonality is evident in most of the studies, and seasonal fluctuations are shown to be more intense in underdeveloped countries, where demand management strategies are not well defined.

Demand forecasting is one of the most important factors in operating and planning energy production and distribution systems. If the energy consumption is overestimated, idle production capacity is planned and the consumption cost for all subscribers increases without a valid reason. On the other hand, underestimating the energy consumption has its own negative economic and social impacts, such as blackouts and brownouts. Forecasting models must be reliable enough to maintain a high level of accuracy across different time periods and in response to various values of the input variables. Some researchers believe that accurate forecasting of energy consumption affects air quality, supply network security, capital investment, revenue analysis, and market management.

The factors affecting short-term and long-term energy variations can differ. Economic factors, seasonal variations, and climate conditions significantly affect the daily electricity consumption. Although historic time series data are used in many studies to predict energy consumption, many others analyze the effect of variables such as GDP, CPI, humidity, temperature, population, energy price, daylight time, and number of rainy days. The proposed models can be applied by producers, suppliers, and regulatory authorities who want to supply electricity securely at a reasonable cost.

Some of the models show that the general trend of energy consumption is a function of variations in economic variables. In some other studies, however, energy consumption is not a function of price, which suggests that energy is treated as the highest-priority portion of both household and industrial expenditures.

From the viewpoint of available historic data, some of the studies showed that having data over a longer time period may not increase the forecasting accuracy. For example, Asad [14] showed that for one-day-ahead forecasting, using the input data of the last six months generates the most accurate forecasts, whereas for forecasting two or more days ahead, using the input data of the last three months generates the most accurate forecasts. Other studies, such as Filik et al. [50], state that data covering longer time periods can increase the accuracy of forecasts.

Various methods are applied to predict energy consumption because, when the accuracy and computing time of several methods are the same, simpler methods are preferred to more complex ones. In conclusion, developing simple methods with acceptable accuracy is an attractive area for future studies.

4.2 Discussion

Besides the regular energy consumption forecasting methods reviewed here, forecasting the peak load demand is a critical issue. Accurate peak load demand forecasting is important for secure and reliable scheduling and supply of energy in peak periods. Accurate peak time forecasting enables efficient load shifting among the transmission substations, secure energy supply in the distribution network, energy flow analysis, and scheduling of startup times. Therefore, forecasting the arrival time of peak periods in cold and hot weather conditions is a needed area of future work, in order to develop more accurate forecasting models.

Most of the studies have shown that energy consumption has increased in recent years and expect it to keep increasing in the coming years, but this increasing trend cannot continue forever. Studying factors such as industry saturation, economic recessions, climate change, and environmental concerns that can initiate a decreasing trend in energy consumption is an interesting field of study. Such work could discover when energy consumption starts to decrease or what changes may occur in the consumption of other types of energy. Another attractive research path is finding, for each influential factor, the optimal lag of its effect on energy consumption, since a change in such a factor may affect energy consumption only after one or more periods.

From the modeling point of view, the most cited studies applied ANNs to forecast energy consumption and confirmed the dominant performance of ANN models, but the computation time of an ANN is much greater than that of many other methods because of its sophisticated structure. On the other hand, when only a few training data points are available, the forecasting accuracy of an ANN can be affected by overfitting. Dong et al. [17] showed that optimal subsets of model inputs, selected by abductive modeling, can reduce the dimensionality of the data and improve the forecasting accuracy of ANN models. In most mid- and long-term energy demand forecasting problems, very few data points are available for the training set. Therefore, reducing the dimensionality could improve generalization when only a few training data points are available. It is also notable that, in a considerable number of the studies, SVM could be used as the forecasting method while maintaining accurate results when an insufficient number of data points was available.

As a future research path, the effect of environmental parameters other than temperature, such as wind, cloud cover, and humidity, on energy consumption and on the reliability of forecasting variables can be studied. Another field of future research is the development of hybrid methods, since the literature shows that the classical methods no longer yield the best results. A further area of future research is new measures to evaluate the efficiency of forecasting methods. Energy forecasting methods apply the same error-based evaluation criteria that are used for other forecasting problems, such as water consumption prediction, travel demand prediction, and telecommunication device demand prediction. It would therefore be interesting to define non-error-based evaluation criteria for assessing the efficiency of energy consumption forecasting methods.