1 Introduction

The loads from the nodes of electricity distribution systems (represented by the Medium Voltage/Low Voltage (MV/LV) electric substations) vary in time and have particular characteristics in each consumption point. Therefore, to solve the problems regarding the optimal network planning and operation, the demand management and the correct billing of consumers, the Distribution Network Operators (DNOs) need to know the dynamic behaviour of the loads in their networks [1,2,3]. On the other hand, the load variations are influenced by several factors, such as consumer type, time factor, climatic factors, other electrical loads correlated with the analysed load, historical values, and consumption profile [4,5,6].

Electricity consumption is modelled using database records which describe the evolution of individual and aggregated loads. These data are recorded and processed systematically using appropriate methods. The following input information is frequently used in an analysis: the daily maximum value of the load, the hourly power consumption, and the daily/weekly electricity amount. For better accuracy in the modelling process, a large database should be used, including the electricity consumption over a long time interval and, if possible, the evolution of demographic, climatic and economic activity indices for the geographical area and time interval of interest [5, 6].

In addition, the Decision Makers must consider the following restrictions in their analyses [7, 8]:

  • The power flows must satisfy the fundamental laws of electrical engineering (Kirchhoff's laws).

  • The loads obtained in the estimation process must balance the measured values.

  • The load does not depend on the structure of the network.

The working sample, randomly selected from the database, must be analysed in detail to identify the outliers. A correlation analysis then follows, to find the relationships between the variables represented by the power/energy consumption and the climatic and weather factors [9, 10].

In this chapter, various approaches to modelling the loads in the nodes of electric distribution networks, based on correlation and regression analysis, are proposed. The approaches rely on the statistical processing of the load profiles of the MV/LV electric substations or LV consumers, recorded with smart meters. The chapter is divided into two parts: the first gives a short review of correlation and regression analysis, and the second presents regression-based approaches for estimating the powers in the MV/LV electric substations (at the hour when the maximum value of the system load is recorded) and the demands of the residential consumers.

2 Correlation and Regression Analysis

To understand the operation of electric distribution systems, it is necessary to study the relationships between the state variables that characterize them (voltages, currents, powers, etc.). For these variables, the relationships can be analysed using regression and correlation methods.

The regression methods allow the measurement and study of the relation between two or more variables, as well as the discovery of the laws connecting them. A mathematical expression can be obtained to estimate the values of the dependent variable according to the values of the other (independent) variables [11, 12].

Correlation analysis measures the intensity of the relationship between two or more variables. Depending on the regression model, the correlation can be treated as a simple or multiple correlation [13, 14].

The following issues must be solved in a study which is based on the regression and correlation analysis [12]:

  • Identifying the existence of the relationship between variables. Solution: A logical analysis of the possibility of a relationship between the variables can be applied.

  • Establishing the direction and form of the relationship. Solution: Regression analysis methods can be used.

  • Determining the intensity of the relationship. Solution: Correlation analysis methods can be used.

2.1 Correlation Methods

2.1.1 Interdependent Parallel Statistical Series-Based Method

The analysis of statistical relationships involves estimating a regression model and measuring the intensity of the relationship between variables. The analysis compares the terms of two interdependent parallel series x (independent variable) and y (dependent variable). For example, when two time series are compared, their elements are sorted chronologically, such that the existence and direction of the relationship can be easily identified. Thus, if both variables vary in the same direction, there is a direct relationship. If they vary in opposite directions, an inverse correlation is obtained. If the two time series vary independently, or one varies and the other remains constant, there is no relationship [8].

The method can be used for the time series with few variables, when there is a relationship between the pairs of variables (xi, yi, i = 1, …, N).

2.1.2 Cross-Correlation Matrix Based Method

The method is based on grouping the elements of a data set using both correlated variables (x and y) simultaneously. Equal intervals and an identical number of groups for both variables are recommended. Thus, in the matrix, the existence, direction and intensity of the relationship can be assessed using the distribution of the frequencies nij, as can be seen in Table 1.

Table 1 Cross-correlation matrix

If the frequencies nij are scattered relatively uniformly inside the matrix, there is no relationship between the variables considered. But, if they are concentrated around the diagonals, a stronger correlation can be identified between the variables x and y.
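The grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `cross_correlation_matrix`, the choice of four groups and the sample data are assumptions, not part of the original method description.

```python
def cross_correlation_matrix(x, y, groups=4):
    """Count the pairs (x_i, y_i) falling in each cell of a groups x groups grid.

    Rows correspond to the groups of y, columns to the groups of x.
    Equal-width intervals are used for both variables.
    """
    def group_index(value, low, high):
        if high == low:
            return 0
        k = int((value - low) / (high - low) * groups)
        return min(k, groups - 1)  # the maximum value goes in the last group

    x_low, x_high = min(x), max(x)
    y_low, y_high = min(y), max(y)
    n = [[0] * groups for _ in range(groups)]
    for xi, yi in zip(x, y):
        row = group_index(yi, y_low, y_high)
        col = group_index(xi, x_low, x_high)
        n[row][col] += 1
    return n

# Hypothetical data with a perfect direct relationship (y = 2x)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, 12, 14, 16]
n = cross_correlation_matrix(x, y, groups=4)
for row in n:
    print(row)
```

For this sample all frequencies lie on one diagonal, which is the pattern associated with a strong correlation.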

2.1.3 Graphical Method

The method involves the graphical representation of the pairs of values corresponding to the variables in a coordinate system, such that the existence, direction, form and intensity of the correlation can be easily identified. The graph corresponds to the case where a relationship is defined in accordance with the interdependent parallel statistical series-based method.

2.1.4 Analytical Methods

The analytical models allow determination of the mathematical relations and the numerical measurement of the intensity between variables. The regression models aim to represent the distribution type of correlated variables. The regression curves indicate the correspondence between the pairs (xi, yi). The following steps should be performed to establish and analyse a regression model:

  • Building the correlation graph.

  • Establishing the theoretical regression model of the relationship (based on the correlation graph adjustment) and identification of the equation corresponding to the chosen regression model.

  • Determining the coefficients of the regression equation (with the least squares method) and interpreting the regression according to their sign and value.

2.1.5 Regression Models with Two Variables

The relationship between two variables x and y can be expressed by a regression equation:

$$y_{x} = f(x) + e$$
(1)

where f (x) represents a function which is dependent on the variable x, and e is the approximation error.

As the size of the database grows, the approximation error e tends to decrease. Thus, a higher number of observations can lead to a more reliable estimate of the relationship. The function f(x) can take different forms depending on the scatter of the data.

Linear regression (LR) model

The LR model is the most widely used in practice. The relationship can be expressed using the following equation:

$$y_{x} = a + bx + e$$
(2)

Eq. (2) can be plotted as a straight line. The variable e represents a random error given by:

$$e = y_{i} - y_{{x_{i} }} \,;\quad i = 1, \ldots ,N$$
(3)

where a and b are unknown coefficients, their values being determined using the least squares method.

The coefficient b from expression (2) can have different signs, which characterize the direction of the relationship between variables: “+”, positive relationship; “null”, no relationship; and “−”, negative relationship.

The value of the coefficient b shows the degree of dependence between variables, namely how much the variable y increases or decreases when the variable x increases or decreases by one unit.
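As an illustration, the coefficients a and b of the LR model (2) can be computed with the closed-form least squares solution. The sketch below uses hypothetical data and a hypothetical helper name `fit_line`.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for the model y = a + b*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx            # slope: change of y per unit change of x
    a = y_mean - b * x_mean    # intercept
    return a, b

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(x, y)

# Random errors e_i from relation (3)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"a = {a:.2f}, b = {b:.2f}")
```

Here b is positive, so the relationship is direct; with an intercept in the model, the least squares residuals sum to zero.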

Parabolic regression (PR) model

In order to express this model, the second degree polynomial is usually used:

$$y_{x} = a + bx + cx^{2} + e$$
(4)

where coefficients a, b and c are determined using the least squares method.

Hyperbolic regression (HR) model

$$y_{x} = a + \frac{b}{x} + e$$
(5)

Exponential regression (ER) model

In order to express this model, the following equation is used:

$$y_{x} = ab^{x} + e$$
(6)

For each sample, relation (6) can be linearized by taking logarithms (neglecting the error term):

$$\log y_{x} = \log a + x\log b$$
(7)
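The linearization (7) reduces the ER model to a linear fit on the pairs (x, log y). A minimal sketch follows, assuming base-10 logarithms, strictly positive y values and hypothetical data generated from a = 2 and b = 1.5; the helper name `fit_exponential` is illustrative.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * b**x by least squares on log10(y) = log10(a) + x*log10(b)."""
    n = len(x)
    log_y = [math.log10(v) for v in y]   # requires y > 0
    x_mean = sum(x) / n
    ly_mean = sum(log_y) / n
    slope = (sum((xi - x_mean) * (li - ly_mean) for xi, li in zip(x, log_y))
             / sum((xi - x_mean) ** 2 for xi in x))
    intercept = ly_mean - slope * x_mean
    # Back-transform: intercept = log10(a), slope = log10(b)
    return 10 ** intercept, 10 ** slope

# Hypothetical noise-free data generated with a = 2, b = 1.5
x = [0, 1, 2, 3, 4]
y = [2.0 * 1.5 ** xi for xi in x]
a, b = fit_exponential(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Note that minimizing the squared error of log y is not identical to minimizing the squared error of y itself; with noisy data the two fits can differ slightly.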

2.2 Intensity of the Relationship Between Two Variables

The intensity of the relationship between two variables (x, y), if one exists, indicates the degree of concentration or scattering of the values y around the regression model yx. The intensity of the relationship can be measured using the correlation coefficient and the correlation ratio.

2.2.1 Correlation Coefficient

The correlation coefficient is used to assess the intensity of the relationship between the analysed variables. This coefficient can be calculated using the relation:

$$\rho (x,y) = \frac{C(x,y)}{{\sigma_{x} \cdot \sigma_{y} }} = \frac{{\sum\limits_{i = 1}^{N} {(x_{i} - x_{m} )(y_{i} - y_{m} )} }}{{N \cdot \sigma_{x} \cdot \sigma_{y} }}$$
(8)

where: C(x, y) is the covariance between the analysed variables; xm, ym are the mean values of the variables; N is the number of pairs of values; σx and σy are the standard deviations of the variables x and y.

Between the regression coefficient b from relation (2) and the correlation coefficient, ρ (x, y), there is the following relationship:

$$\rho = b \cdot \frac{{\sigma_{x} }}{{\sigma_{y} }}$$
(9)

Relation (9) shows that the sign of the correlation coefficient is identical to the sign of the regression coefficient, because σx and σy are positive. The value of the correlation coefficient lies in the range [−1, 1]. The two extreme values represent a perfect linear relationship between the two variables (“positive” or “negative”). A value ρ = 0 indicates that there is no linear relationship between the two variables.
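Relations (8) and (9) can be checked numerically. The sketch below uses hypothetical data and illustrative helper names; the standard deviations are computed with division by N, as in (8).

```python
import math

def correlation_coefficient(x, y):
    """Return (rho, sigma_x, sigma_y) following relation (8)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    cov = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - x_m) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_m) ** 2 for yi in y) / n)
    return cov / (sigma_x * sigma_y), sigma_x, sigma_y

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
rho, sigma_x, sigma_y = correlation_coefficient(x, y)

# Regression coefficient b of the LR model (2), from least squares
x_m = sum(x) / len(x)
y_m = sum(y) / len(y)
b = (sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
     / sum((xi - x_m) ** 2 for xi in x))

print(f"rho = {rho:.4f}")
print(f"b * sigma_x / sigma_y = {b * sigma_x / sigma_y:.4f}")  # relation (9)
```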

2.2.2 Correlation Ratio

The correlation ratio η is defined by the relation:

$$\eta = \sqrt {\frac{{\sigma_{{y_{x} }}^{2} }}{{\sigma_{y}^{2} }}}$$
(10)

where

$$\sigma_{y}^{2} = \frac{{\sum\limits_{i = 1}^{N} {\left( {y_{i} - \bar{y}} \right)^{2} } }}{N};\,\sigma_{{y_{x} }}^{2} = \frac{{\sum\limits_{i = 1}^{N} {\left( {y_{{x_{i} }} - \bar{y}} \right)^{2} } }}{N}$$
(11)

The correlation ratio takes values in the range [0, 1]. The value 1 indicates a complete (functional) relationship, namely the variation of the variable y depends only on the variation of the variable x.
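The difference between the correlation coefficient and the correlation ratio can be seen on a small hypothetical example with a purely parabolic dependence, y = x², sampled symmetrically around zero. The regression values yx are assumed here to come from the exact parabolic model; the helper name `correlation_ratio` is illustrative.

```python
import math

def correlation_ratio(y, y_model):
    """Correlation ratio (10): spread of the model values around the mean of y,
    relative to the spread of the observed values."""
    n = len(y)
    y_mean = sum(y) / n
    var_model = sum((v - y_mean) ** 2 for v in y_model) / n
    var_y = sum((v - y_mean) ** 2 for v in y) / n
    return math.sqrt(var_model / var_y)

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]          # observed values
y_model = [xi ** 2 for xi in x]    # values given by the parabolic model

eta = correlation_ratio(y, y_model)

# Covariance between x and y (numerator of the correlation coefficient)
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / len(x)

print(f"eta = {eta}, covariance = {cov}")
```

In this example the covariance, and therefore the correlation coefficient, is zero although y depends entirely on x, while the correlation ratio equals 1.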

3 Case Studies in the Electric Distribution Networks

3.1 Power Correlation Problem

The quality and efficiency of solving complex problems regarding the optimal operation and planning of the electric distribution networks are largely determined by the accuracy of the load estimation methods. The estimation of the power demand and the electricity consumption starts from historical consumption data, which are recorded systematically and processed by appropriate methods. The main quantities which can be taken into account are: the daily peak load, the hourly electricity consumption, and the daily or weekly electricity consumption [15, 16]. To obtain the most accurate estimation, a large database should be used, including the hourly electricity consumption for a sufficiently long period (minimum 1 year), the evolution of demographic and climatic factors, and the economic indices of the analysed areas [4,5,6]. This information must be subjected to a pre-processing stage to eliminate systematic, gross, and random errors. Then, if possible, a relationship is sought between the variables represented by the electricity consumption and the climatic and weather factors [7, 8, 14, 17].

Practical applications have shown that the success of an estimation method depends on several conditions, such as an accurate selection of the estimation period, the applied method, the reliability of the initial data, flexibility, and consideration of the climatic and weather factors. Many mathematical methods for load estimation (including the peak load) have been developed in the literature. Most of the proposed approaches use the dependence between the maximum value of the load (peak load) and the annual/monthly/daily electricity consumption [7, 8].

Today, most Distribution Network Operators (DNOs) in the European countries are in the process of implementing smart metering systems in the MV/LV electric substations and at the end consumers. This process is slow, however, and for many electric substations the DNOs do not yet have the loading and peak load information needed to estimate the operating regime of the electric network. In this case, the loads in general, and the peak load in particular, can be estimated based on correlation studies, as shown in the following [6,7,8, 18].

If a simple linear regression model is used for the relationship between the mean values of the variables P and Q, then the following relations can be accepted (see Fig. 1):

Fig. 1 The correlation between P and Q (direct variation)

$$Q = \rho_{PQ} \cdot \frac{{\sigma_{Q} }}{{\sigma_{P} }} \cdot P + k_{PQ}$$
(12)
$$\rho_{PQ} = \frac{{C_{PQ} }}{{\sigma_{P} \cdot \sigma_{Q} }}$$
(13)
$$C_{PQ} = \overline{P \cdot Q} - \overline{P} \cdot \overline{Q}$$
(14)
$$\sigma_{P}^{2} = \overline{{P^{2} }} - \overline{P}^{2}$$
(15)
$$\sigma_{Q}^{2} = \overline{{Q^{2} }} - \overline{Q}^{2}$$
(16)

where: P is the active power [kW]; Q is the reactive power [kVAr]; ρPQ is the correlation coefficient between P and Q; CPQ is the covariance between P and Q; σP, σQ are the standard deviations of P and Q.

The overline indicates the mean value, and the coefficient kPQ is determined for each particular case, based on the correlation studies [7].
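Relations (12)–(16) can be evaluated directly from the mean values of P, Q, P·Q, P² and Q². The sketch below uses hypothetical measurements; in addition, it assumes that the coefficient kPQ is taken as the ordinary least squares intercept, which is only one possible choice, since the text states that kPQ is determined case by case.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Hypothetical hourly measurements: active power [kW] and reactive power [kVAr]
P = [100.0, 120.0, 140.0, 160.0, 180.0]
Q = [40.0, 46.0, 55.0, 61.0, 68.0]

P_mean, Q_mean = mean(P), mean(Q)
C_PQ = mean([p * q for p, q in zip(P, Q)]) - P_mean * Q_mean   # (14)
var_P = mean([p * p for p in P]) - P_mean ** 2                 # (15)
var_Q = mean([q * q for q in Q]) - Q_mean ** 2                 # (16)
sigma_P, sigma_Q = math.sqrt(var_P), math.sqrt(var_Q)

rho_PQ = C_PQ / (sigma_P * sigma_Q)                            # (13)
slope = rho_PQ * sigma_Q / sigma_P                             # factor of P in (12)

# Assumption: k_PQ chosen as the least squares intercept
k_PQ = Q_mean - slope * P_mean

Q_estimated = [slope * p + k_PQ for p in P]                    # (12)
print(f"rho_PQ = {rho_PQ:.4f}, slope = {slope:.4f}, k_PQ = {k_PQ:.2f}")
```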

However, there are cases where the powers P and Q vary in opposite directions. In these cases, a “variation belt” should be introduced (see Fig. 2).

Fig. 2 The correlation between P and Q (opposite variation)

3.2 Peak Load Estimation Using Power Correlation

3.2.1 Solution Description

In this section, the loads of the MV/LV electric substations at the hour when the maximum value (peak load) of the electric distribution system was recorded are estimated using a power correlation-based method.

In the initial step, a statistical analysis is performed on the active power load profiles, taken from a DNO database, of the MV/LV electric substations without installed smart meters. Different time frames can be used in this analysis, depending on the technical and load characteristics of the network. The time frames (Lh with h = 7 or 24) can be chosen from the following: the L24 frame, and the L7 frames (hPL ± 3 h), (hPL − 4 h; hPL + 2 h) and (hPL − 5 h; hPL + 1 h), where hPL is the hour when the maximum value of the load (peak load) of the system was recorded.

Using the LR model, the steps of the estimation method are the following:

  1. Choose a main variable in relation to which the correlation analysis will be performed. The HV/MV electric substation can be chosen as the main (reference) variable, because its hourly powers P and Q are recorded continuously using smart meters.

  2. Determine the peak load of the reference electric substation and the hour when it is recorded.

  3. Calculate the correlation coefficients between the profiles of the powers P and Q, recorded in each MV/LV electric substation, and the profile of the power P, recorded in the HV/MV electric substation chosen as reference. Also, calculate the standard deviations of the powers P and Q recorded in each MV/LV electric substation.

  4. Determine the values of the coefficients \(b_{{P_{r} }}^{{P_{i} }}\), \(b_{{P_{r} }}^{{Q_{i} }}\), \(a^{{P_{i} }}\), and \(a^{{Q_{i} }}\) with the relations:

$$b_{{P_{r} }}^{{P_{i} }} = \rho_{{P_{r} P_{i} }} \cdot \frac{{\sigma_{{P_{i} }} }}{{\sigma_{{P_{r} }} }};\quad i = 1, \ldots ,N$$
(17)
$$b_{{P_{r} }}^{{Q_{i} }} = \rho_{{P_{r} Q_{i} }} \cdot \frac{{\sigma_{{Q_{i} }} }}{{\sigma_{{P_{r} }} }};\quad i = 1, \ldots ,N$$
(18)
$$a^{{P_{i} }} = \sum\limits_{j = 1}^{h} {(P_{ij} - b_{{P_{r} }}^{{P_{i} }} \cdot P_{rj} )/L_{h} } ;\quad i = 1, \ldots ,N\quad$$
(19)
$$a^{{Q_{i} }} = \sum\limits_{j = 1}^{h} {(Q_{ij} - b_{{P_{r} }}^{{Q_{i} }} \cdot P_{rj} )/L_{h} } ;\quad i = 1, \ldots ,N\quad$$
(20)

where: Pr—the active power corresponding to the HV/MV electric substation chosen as reference; Pi, Qi—the active and reactive powers from the MV/LV electric substation i; N—the number of MV/LV electric substations from the analysed network; Lh—the length of time frame (h = 7 or 24).

  5. Estimate the powers P and Q of the MV/LV electric substations at the hour when the maximum value of the system load was recorded, using the following LR models:

$$P_{i} = b_{{P_{r} }}^{{P_{i} }} \cdot P_{r\,\text{max} } + a^{{P_{i} }}$$
(21)
$$Q_{i} = b_{{P_{r} }}^{{Q_{i} }} \cdot P_{r\,\text{max} } + a^{{Q_{i} }}$$
(22)

where: Pr max—the peak load corresponding to the reference; Pi, Qi—the estimated powers in the MV/LV electric substation i = 1, …, N.
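Steps 2 to 5 can be sketched for the active power of a single MV/LV substation as follows. The profiles are hypothetical (the substation profile is constructed as an exact linear function of the reference profile so that the result is easy to check), and the helper names `time_frame` and `lr_coefficients` are illustrative only.

```python
import math

def time_frame(profile, h_peak, before=3, after=3):
    """Extract the hours (h_peak - before ... h_peak + after); hour 1 is profile[0]."""
    start = max(0, h_peak - 1 - before)
    return profile[start:h_peak + after]

def lr_coefficients(ref, load):
    """Return (a, b) of load = b * ref + a, following relations (17) and (19)."""
    L = len(ref)
    ref_mean = sum(ref) / L
    load_mean = sum(load) / L
    sigma_ref = math.sqrt(sum((r - ref_mean) ** 2 for r in ref) / L)
    sigma_load = math.sqrt(sum((p - load_mean) ** 2 for p in load) / L)
    cov = sum((r - ref_mean) * (p - load_mean) for r, p in zip(ref, load)) / L
    rho = cov / (sigma_ref * sigma_load)          # correlation coefficient
    b = rho * sigma_load / sigma_ref              # (17)
    a = sum(p - b * r for r, p in zip(ref, load)) / L   # (19)
    return a, b

# Hypothetical 24-hour profile of the reference HV/MV substation [kW]
P_ref = [5200, 5000, 4900, 4850, 4900, 5100, 5600, 6200, 6800, 7200, 7500, 7700,
         7900, 8100, 8300, 8200, 8000, 7800, 7600, 7400, 7000, 6500, 5900, 5500]
# Hypothetical profile of one MV/LV substation [kW]
P_i = [0.05 * p - 100 for p in P_ref]

P_ref_max = max(P_ref)                            # step 2: peak load
h_peak = P_ref.index(P_ref_max) + 1               # step 2: hour of the peak

# Steps 3 and 4 on the L7 frame (h_PL +/- 3 h)
a, b = lr_coefficients(time_frame(P_ref, h_peak), time_frame(P_i, h_peak))

P_i_estimated = b * P_ref_max + a                 # step 5, relation (21)
print(f"peak hour = {h_peak}, estimated P_i = {P_i_estimated:.1f} kW")
```

The reactive power is handled in the same way, with the Q profile of the substation in place of its P profile, following (18), (20) and (22).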

3.2.2 Testing the Solution

This section presents the testing of the proposed method on a database belonging to an MV electric distribution system (20 kV) with 34 MV/LV electric substations. The peak load in this system is recorded at hour 15. Following the steps of the method, the LR models for different time frames were used in the analysis. The values of the active powers at the hour when the peak load was recorded in the analysed system, for the time frames L24 and L7, are presented in Table 2.

Table 2 The estimated active powers in the MV/LV electric substations

The LR models obtained for all considered time frames in the case of one MV/LV electric substation (no. 28) from the analysed system are represented in Figs. 3, 4, 5 and 6, to show the estimation accuracy for each time frame.

Fig. 3 Linear regression model P28 = 0.0556 · Pr − 175.5 (Time Frame L24)

Fig. 4 Linear regression model P28 = 0.0543 · Pr − 146.1 (Time Frame L7 (hPL ± 3 h))

Fig. 5 Linear regression model P28 = 0.0609 · Pr − 251.7 (Time Frame L7 (hPL − 4 h; hPL + 2 h))

Fig. 6 Linear regression model P28 = 0.0602 · Pr − 244.03 (Time Frame L7 (hPL − 5 h; hPL + 1 h))

The errors were calculated with the relation:

$$Er_{p} = \frac{{P_{e} - P_{m} }}{{P_{m} }}100\quad [\% ]$$
(23)

where: Pe—estimated active power; Pm—measured active power.

It can be observed that the errors are smaller for the 7 h time frame (hPL − 5 h; hPL + 1 h) than for the other frames, the average error being 1.48%.

3.3 Residential Load Estimation Using a Regression-Correlation-Based Method

3.3.1 Solution Description

In recent decades, load estimation has grown in importance, in complexity and in the accuracy required. Before 1970, the electricity demand was relatively predictable, and a good forecast required only simple mathematical models, limited to trend extrapolation. The “7% rule” was also used, which stated that the electricity demand doubles every 10 years [19]. The load estimation studies are influenced by several factors, which can be grouped as follows [1]:

  • Economic: for long and medium time periods. These factors are not responsible for hourly load variations and are not considered in short-term forecasts.

  • Temporal: seasons, daily and weekly cycles, holidays, daylight intervals.

  • Weather: temperature, humidity, wind speed and direction, clouds, rain.

  • Casual: holidays, worker strikes, public events.

Practical studies have shown that the demand variation in time or according to other considered parameters has four main components [17]: season S(t); cyclic C(t); trend T(t), and random R(t). The demand can be written as the sum of the four factors, using the following equation:

$$W\left( t \right) = S\left( t \right) + C\left( t \right) + T\left( t \right) + R\left( t \right)$$
(24)

The mathematical function used in the estimation process is determined in successive steps, taking into account the consumption history and a qualitative and quantitative analysis of the technical and economic factors which influence the consumer demand over time. Obtaining a model for the demand of a consumer group or a geographical area requires testing several approximation approaches. For electrical load estimation, the optimal approximation functions are obtained using specialized software tools, which choose the best variant among a wide range of options.

The accuracy of the selected estimation model is assessed by computing indices which give the spread of the initial data (earlier demand values) with respect to the considered trend. Usually, a low spread indicates a good approximation, which can be expressed by the quality indices (Ik). The mathematical expressions of these indices are given below:

  • the mean absolute values of deviations:

$$I_{1} = \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {\left| {\hat{y}_{i} - y_{i} } \right|}$$
(25)
  • the mean absolute percentage values of deviations:

$$I_{2} = \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {\left| {\frac{{\hat{y}_{i} - y_{i} }}{{y_{i} }}} \right|} \cdot 100$$
(26)
  • the mean absolute deviation:

$$I_{3} = \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {\left| {\hat{y}_{i} - \bar{y}} \right|}$$
(27)
  • the dispersion:

$$I_{4} = \sigma^{2} = \frac{1}{n - m - 1} \cdot \sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } .$$
(28)

This value differs from the total variance of y:

$$\sigma_{t}^{2} = \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {\left( {y_{i} - \bar{y}} \right)^{2} }$$
(29)
  • the mean square deviation of the selection

$$I_{5} = \sigma$$
(30)
  • the variation coefficient:

$$I_{6} = v = \frac{\sigma }{x}$$
(31)
  • the correlation coefficient from (8) in a particular form:

$$I_{7} = \rho = \frac{{\sum\limits_{i = 1}^{n} {\left( {x_{i} - \bar{x}} \right) \cdot \left( {y_{i} - \bar{y}} \right)} }}{{ \pm \sqrt {\sum\limits_{i = 1}^{n} {\left( {x_{i} - \bar{x}} \right)^{2} } \cdot \sum\limits_{i = 1}^{n} {\left( {y_{i} - \bar{y}} \right)^{2} } } }}$$
(32)
  • the particular form of (10) of correlation ratio will be:

$$I_{8} = \eta = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {\left( {\hat{y}_{i} - \bar{y}} \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \bar{y}} \right)^{2} } }}}$$
(33)

where \(\hat{y}_{i}\)—the estimated value, \(y_{i}\)—the real demand, \(\bar{y}\)—the mean value of the historical consumption, m—the degree of the polynomial used for trend approximation.
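Several of the quality indices can be computed in a few lines. The sketch below covers I1 (25), I2 (26), I4 (28), I5 (30) and I8 (33) for hypothetical demand values; the function name `quality_indices` and the data are illustrative assumptions.

```python
import math

def quality_indices(y, y_hat, m):
    """Quality indices for real values y, estimated values y_hat and a trend
    polynomial of degree m."""
    n = len(y)
    y_mean = sum(y) / n
    i1 = sum(abs(e - r) for e, r in zip(y_hat, y)) / n                  # (25)
    i2 = sum(abs((e - r) / r) for e, r in zip(y_hat, y)) / n * 100      # (26)
    i4 = sum((r - e) ** 2 for e, r in zip(y_hat, y)) / (n - m - 1)      # (28)
    i5 = math.sqrt(i4)                                                  # (30)
    i8 = math.sqrt(sum((e - y_mean) ** 2 for e in y_hat)
                   / sum((r - y_mean) ** 2 for r in y))                 # (33)
    return {"I1": i1, "I2": i2, "I4": i4, "I5": i5, "I8": i8}

# Hypothetical real and estimated demand values, linear trend (m = 1)
y = [10.0, 12.0, 14.0, 16.0]
y_hat = [11.0, 11.0, 15.0, 15.0]
indices = quality_indices(y, y_hat, m=1)
for name, value in indices.items():
    print(f"{name} = {value:.4f}")
```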

To compute the trend, as recommended in the literature, continuous functions were used, which can be represented as continuous growth curves and limited growth curves. Their coefficients were determined using time series regression, with normal and modified methods using the sum of squared errors criterion. This approach is frequently used for residential load estimation.

The load estimation in the MV/LV electric substations is more difficult, because of the lack of historical demand data from consumers. By contrast, load estimation at the level of each DNO is possible with much better accuracy, using load data recorded through continuous monitoring in the HV/MV electric substations and applying global estimation methods [17].

Thus, for a year j from the estimation interval, the load Pn+j can be estimated based on a mathematical model which uses historical load data:

$$P_{n + j} = \frac{{\sum\limits_{k = 0}^{m - 1} {\sum\limits_{i = 1}^{n - j - k} {\frac{{P_{i + j + k} }}{{P_{i} }}} } }}{{\sum\limits_{k = 0}^{m - 1} {(n - j - k)} }}$$
(34)

where: n—the previous years for which recordings exist; m—the previous years used as forecast base; j—forecast year; k—base year.

Previous studies have shown that ambient temperature has a significant influence on demand [4]. The load corrected for temperature (computed for several consecutive years) can be written as:

$$P_{pr} = \frac{{P_{r} }}{{1 + \frac{a}{b}\Delta \theta }}$$
(35)

where Pr—the real load, measured in a given year; Δθ—the difference between the real and average temperature recorded for several years, over a given time interval; a—regression coefficient with the temperature θ; b—the average load ratio for years j and j − 1.

The load-temperature correlation differs from month to month, and is sometimes stronger during the night hours than during the day hours. Taking this correlation into account, if temperature forecasts are known for the next year, the load for the next year can be estimated with:

$$P_{(n + 1),\theta } = P_{{n,\theta_{n} }} (b + a \cdot \Delta \theta )$$
(36)

where Δθ is the difference between the temperature forecast for the next year and the multi-year average temperature; \(P_{{n,\theta_{n} }}\) is the load from the last year; a is the regression coefficient with the temperature θ; b is the average load ratio for years j and j − 1.

Using statistical methods [2, 3, 20], the peak load level for individual residential consumers can be computed with:

$$S_{\text{max} } = \, \overline{S}_{\text{max} } + \lambda \cdot \sigma$$
(37)

where \(\overline{S}_{\text{max} }\) is the mean value of the peak load of the residential consumers:

$$\overline{S}_{\text{max} } = \frac{1}{n}\sum\limits_{i = 1}^{n} {S_{{\max_{i} }} }$$
(38)

\(\sigma\) is the mean square deviation, computed as a particular form:

$$\sigma = \sqrt {\sigma^{2} } = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {S_{{\max_{i} }} - \overline{S}_{\text{max} } } \right)^{2} } }$$
(39)

n is the number of residential consumers with available measurements;

\(\lambda\) is the standardized deviation of the normal distribution.
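Relation (37) can be illustrated as follows, taking the mean as the arithmetic mean of the measured peak loads and σ as their standard deviation with division by n. The measured values and the 95% confidence level used to obtain λ are assumptions made for this example only; the source does not prescribe a particular level.

```python
import math
from statistics import NormalDist

def peak_load_level(peaks, confidence=0.95):
    """Peak load level S_max = mean + lambda * sigma, relation (37)."""
    n = len(peaks)
    mean_peak = sum(peaks) / n
    sigma = math.sqrt(sum((s - mean_peak) ** 2 for s in peaks) / n)
    # Assumption: lambda is the one-sided quantile of the standard normal
    # distribution for the chosen confidence level
    lam = NormalDist().inv_cdf(confidence)
    return mean_peak + lam * sigma, mean_peak, sigma, lam

# Hypothetical measured peak loads of four residential consumers [kVA]
peaks = [4.0, 5.0, 6.0, 5.0]
s_max, mean_peak, sigma, lam = peak_load_level(peaks)
print(f"mean = {mean_peak:.2f}, sigma = {sigma:.3f}, "
      f"lambda = {lam:.3f}, S_max = {s_max:.2f} kVA")
```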

For the estimation of the monthly load, the profile of the cold season (December, month 12) and the profile of the warm season (June, month 6) can be used for any month l:

$$P_{t,l} = \frac{{P_{t,12} + P_{t,6} }}{2} + \frac{{P_{t,12} - P_{t,6} }}{2}\cos \frac{\pi \cdot l}{6}$$
(40)

where: Pt,l is the active power at hour t = 1, …, 24, in month l; Pt,6, Pt,12 are the active powers at hour t = 1, …, 24, in month 6 (June) and month 12 (December).

If the yearly load growth is considered, (40) can be rewritten as:

$$P_{t,l} = \frac{{\alpha \cdot P_{t,12} + \frac{1 + \alpha }{2} \cdot P_{t,6} }}{2} + \frac{{\alpha \cdot P_{t,12} - \frac{1 + \alpha }{2} \cdot P_{t,6} }}{2} \cdot \cos \frac{\pi \cdot l}{6}$$
(41)

where α is the yearly load growth coefficient.
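The seasonal interpolation can be sketched as below. The sketch assumes that the cosine has an annual period, that is, an argument of π·l/6, so that l = 12 reproduces the December profile and l = 6 the June profile. The profiles (shortened to four hours for brevity) and the function name `monthly_profile` are hypothetical.

```python
import math

def monthly_profile(p_dec, p_jun, month, alpha=1.0):
    """Hourly profile for a given month (1..12), interpolated between the
    December and June profiles; alpha is the yearly load growth coefficient
    (alpha = 1 means no growth)."""
    seasonal = math.cos(math.pi * month / 6)   # assumed annual period
    result = []
    for p12, p6 in zip(p_dec, p_jun):
        winter = alpha * p12
        summer = (1 + alpha) / 2 * p6
        result.append((winter + summer) / 2 + (winter - summer) / 2 * seasonal)
    return result

# Hypothetical profiles [kW] for four selected hours
p_dec = [1.2, 0.8, 1.5, 2.4]
p_jun = [0.6, 0.5, 0.9, 1.2]

december = monthly_profile(p_dec, p_jun, month=12)
june = monthly_profile(p_dec, p_jun, month=6)
march = monthly_profile(p_dec, p_jun, month=3)
print("March:", [round(p, 3) for p in march])
```

Under this assumption, the intermediate months 3 and 9 give the average of the two seasonal profiles.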

The estimation model or function is chosen according to the least squares criterion, which seeks to minimize the sum S of the squared differences between the computed and the real energy consumption values, written as:

$$S = \sum\limits_{k = 1}^{n} {d_{k}^{2} } = \sum\limits_{k = 1}^{n} {\left[ {y_{k} - f\left( {x_{k} ,a_{0} ,a_{1} , \ldots ,a_{n} } \right)} \right]^{\;2} }$$
(42)

If the measured values have different variances, for example because they were obtained with measurement devices of different precision classes, (42) can be rewritten as:

$$S = \sum\limits_{k = 1}^{n} {d_{k}^{2} } = \sum\limits_{k = 1}^{n} {\left\{ {{\kern 1pt} \,\left[ {y_{k} - f\left( {x_{k} ,a_{0} ,a_{1} , \ldots ,a_{n} } \right)} \right]^{\;2} \cdot \omega_{k} } \right\}}$$
(43)

where \(\omega_{k}\) are weights inversely proportional to the variance of the measured values, respectively:

$$\omega_{1} = \frac{1}{{\sigma_{1}^{2} }};\quad \omega_{2} = \frac{1}{{\sigma_{2}^{2} }};\quad \quad \ldots \quad \quad \omega_{n} = \frac{1}{{\sigma_{n}^{2} }}$$
(44)

The values a0, a1, …, an, are obtained by minimizing S (a0, a1, …, an):

$$\frac{\partial S}{{\partial a_{0}^{{}} }} = 0;\quad\frac{\partial S}{{\partial a_{1}^{{}} }} = 0;\quad\ldots \quad\frac{\partial S}{{\partial a_{n}^{{}} }} = 0\,$$
(45)
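For a linear function f(x) = a0 + a1·x, setting the derivatives of the weighted sum (43) to zero leads to a closed-form solution. The sketch below illustrates this particular case with hypothetical data and hypothetical standard deviations; the function name `weighted_line_fit` is illustrative.

```python
def weighted_line_fit(x, y, sigmas):
    """Minimize sum(w_k * (y_k - a0 - a1*x_k)**2) with w_k = 1 / sigma_k**2."""
    w = [1.0 / s ** 2 for s in sigmas]                      # weights (44)
    s_w = sum(w)
    s_wx = sum(wk * xk for wk, xk in zip(w, x))
    s_wy = sum(wk * yk for wk, yk in zip(w, y))
    s_wxx = sum(wk * xk * xk for wk, xk in zip(w, x))
    s_wxy = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))
    # Solution of the two normal equations dS/da0 = 0 and dS/da1 = 0
    a1 = (s_w * s_wxy - s_wx * s_wy) / (s_w * s_wxx - s_wx ** 2)
    a0 = (s_wy - a1 * s_wx) / s_w
    return a0, a1

# Hypothetical measurements lying exactly on y = 2 + 3x, taken with
# devices of different precision (standard deviations sigma_k)
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
sigmas = [0.1, 0.5, 0.2, 1.0]
a0, a1 = weighted_line_fit(x, y, sigmas)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```

With equal standard deviations the weights cancel and the result coincides with the ordinary least squares fit.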

By solving (45), the best regression coefficients are determined for a function family y = f(x). The direct extrapolation procedure used to determine the best regression coefficients for the load estimation is illustrated in the following for the logistic and power functions. The logistic function used for the estimation of the trend term in time series has the following expression:

$$y = \frac{a}{{1 + b \cdot e^{ - c \cdot x} }}$$
(46)

where a is the limit value of y in time, and can be frequently assessed with non-statistical means.

In order to find a, b and c in (46), a possible approach is to empirically choose three values (y1, y2, y3) which correspond to the (x1, x2, x3) equidistant points illustrated in Fig. 7. For simplifying the computation effort, the following notations can be used:

Fig. 7 Representing Y values using equidistant X values

$$x_{1} = 0;\;\;x_{2} = \theta ;\;\;x_{3} = 2\theta$$
(47)

Thus, the logistic function (46) can be written:

$$\frac{a - y}{y} = b \cdot e^{ - c \cdot x}$$
(48)

If x = x1 =  0, then b can be computed with:

$$b = \frac{{a - y_{1} }}{{y_{1} }}$$
(49)

Taking the natural logarithm of (48) at x = x1 = 0 gives

$$\ln b - c \cdot x_{1} = \ln \left( {\frac{{a - y_{1} }}{{y_{1} }}} \right)$$
(50)

Similarly, if x = x2 = \(\theta\) and x = x3 = 2 \(\theta\),

$$\ln b - c\theta = \ln \left( {\frac{{a - y_{2} }}{{y_{2} }}} \right);\;\ln b - 2c\theta = \ln \left( {\frac{{a - y_{3} }}{{y_{3} }}} \right)$$
(51)

By using (50), multiplying the first equation from (51) by (−2) and adding it to the second equation from (51), we obtain

$$\frac{{a - y_{1} }}{{y_{1} }} = \left( {\frac{{a - y_{2} }}{{y_{2} }}} \right)^{2} \cdot \frac{{y_{3} }}{{a - y_{3} }}$$
(52)

Using (52), a can be written as:

$$a = \frac{{2y_{1} \cdot y_{2} \cdot y_{3} - y_{2}^{2} \left( {y_{1} + y_{3} } \right)}}{{y_{1} \cdot y_{3} - y_{2}^{2} }}$$
(53)

Once a from the logistic function (46) is computed using (53), b can be determined with (49), and c follows from the first equation in (51):

$$c\theta = \ln b - \ln \left( {\frac{{a - y_{2} }}{{y_{2} }}} \right) = \ln \left( {\frac{{a - y_{1} }}{{y_{1} }}} \right) - \ln \left( {\frac{{a - y_{2} }}{{y_{2} }}} \right)$$
(54)

or

$$c\theta = \ln \frac{{y_{2} \left( {a - y_{1} } \right)}}{{y_{1} \left( {a - y_{2} } \right)}};\quad c = \frac{1}{\theta } \cdot 2.3026 \cdot \log \frac{{y_{2} \left( {a - y_{1} } \right)}}{{y_{1} \left( {a - y_{2} } \right)}}$$
(55)

Knowing a, b and c, the logistic function can be computed for any value of the variable x.
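The three-point procedure can be verified numerically: values generated from a known logistic curve at x = 0, θ and 2θ should return the original coefficients. The sketch below uses relations (53) and (49), and computes c from the ratio y2(a − y1)/(y1(a − y2)) with the natural logarithm. The numerical values and the function names are hypothetical.

```python
import math

def logistic(x, a, b, c):
    """Logistic function (46)."""
    return a / (1 + b * math.exp(-c * x))

def logistic_from_three_points(y1, y2, y3, theta):
    """Coefficients (a, b, c) from the values at x = 0, theta and 2*theta."""
    a = (2 * y1 * y2 * y3 - y2 ** 2 * (y1 + y3)) / (y1 * y3 - y2 ** 2)   # (53)
    b = (a - y1) / y1                                                    # (49)
    c = math.log(y2 * (a - y1) / (y1 * (a - y2))) / theta
    return a, b, c

# Hypothetical curve used to generate the three points
a_true, b_true, c_true, theta = 100.0, 9.0, 0.5, 2.0
y1 = logistic(0.0, a_true, b_true, c_true)
y2 = logistic(theta, a_true, b_true, c_true)
y3 = logistic(2 * theta, a_true, b_true, c_true)

a, b, c = logistic_from_three_points(y1, y2, y3, theta)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

Because the method passes the curve exactly through three chosen points, the result is sensitive to which points are selected when the data are noisy.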

As presented in the literature [21,22,23], the logistic function can be used for yearly estimations only for longer intervals (8–10 years), especially for consumer categories with similar appliances and demand profiles. As for the use of the power function in load extrapolation, its initial expression is

$$y = A \cdot x^{b}$$
(56)

Taking the natural logarithm of both sides, we get

$$\ln y = \ln A + b \cdot \ln x$$
(57)

and by substituting Y = ln y; a = ln A; X = ln x, a linear function is obtained:

$$Y = b \cdot X + a$$
(58)

The best regression curve fulfils the least mean square criterion:

$$S = \sum\limits_{k = 1}^{l} {\left( {Y_{k} - b \cdot X_{k} - a} \right)^{2} } \to \text{min}$$
(59)

To find the minimum of S, its first-order partial derivatives with respect to a and b must be set to zero (\(\partial S/\partial a = 0\;;\;\partial S/\partial b = 0\)), which gives the following system of equations, where m = l is the number of data points:

$$\left\{ {\begin{array}{*{20}l} {b \cdot \sum\limits_{i = 1}^{l} {\ln x_{i} } + m \cdot a = \sum\limits_{i = 1}^{l} {\ln y_{i} } } \hfill \\ {b \cdot \sum\limits_{i = 1}^{l} {\left( {\ln x_{i} } \right)^{2} } + a \cdot \sum\limits_{i = 1}^{l} {\ln x_{i} } = \sum\limits_{i = 1}^{l} {\ln x_{i} \cdot \ln y_{i} } } \hfill \\ \end{array} } \right.$$
(60)

By solving the linear system (60), the coefficients of the linearized power function are obtained (the original coefficients follow as A = e^a and B = b):

$$a = \frac{{\sum\limits_{i = 1}^{l} {\ln y_{i} } \cdot \sum\limits_{i = 1}^{l} {\left( {\ln x_{i} } \right)^{2} } - \sum\limits_{i = 1}^{l} {\ln x_{i} } \cdot \sum\limits_{i = 1}^{l} {\ln x_{i} \cdot \ln y_{i} } }}{{m \cdot \sum\limits_{i = 1}^{l} {\left( {\ln x_{i} } \right)^{2} } - \left( {\sum\limits_{i = 1}^{l} {\ln x_{i} } } \right)^{2} }}$$
(61)
$$b = \frac{{m \cdot \sum\limits_{i = 1}^{l} {\ln x_{i} \cdot \ln y_{i} } - \sum\limits_{i = 1}^{l} {\ln x_{i} } \cdot \sum\limits_{i = 1}^{l} {\ln y_{i} } }}{{m \cdot \sum\limits_{i = 1}^{l} {\left( {\ln x_{i} } \right)^{2} } - \left( {\sum\limits_{i = 1}^{l} {\ln x_{i} } } \right)^{2} }}$$
(62)
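
Relations (61) and (62) translate directly into code. The following is a minimal sketch, not a reference implementation: the function name `fit_power_function` is hypothetical, and it assumes that all x and y values are strictly positive so that their logarithms exist.

```python
import math


def fit_power_function(xs, ys):
    """Least-squares fit of y = A * x**B through the log-log line Y = b*X + a."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    X = [math.log(x) for x in xs]  # requires x > 0
    Y = [math.log(y) for y in ys]  # requires y > 0
    m = len(X)                     # number of data points
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    sxy = sum(u * v for u, v in zip(X, Y))
    det = m * sxx - sx ** 2        # common denominator of (61) and (62)
    if det == 0:
        raise ValueError("all x values are identical")
    a = (sy * sxx - sx * sxy) / det  # relation (61)
    b = (m * sxy - sx * sy) / det    # relation (62)
    return math.exp(a), b            # A = exp(a), B = b
```

Because the fit is carried out on the logarithms, it minimizes the squared error of ln y rather than of y itself; for data with small relative scatter the two criteria give similar coefficients.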

The Romanian standards recommend the use of a power function for residential load estimation:

$$P(t) = A \cdot t^{b} = P(1) \cdot t^{b}$$
(63)

For the 2000–2030 interval, the terms in (63) have the following meaning: P(t) is the estimated load for year t; t is the index of the year within the considered interval (t = 1 for year 2000); A = P(1) is the demand in the first year (2000), used as base value; b is the regression coefficient, based on historical data, whose value differs for each consumer category.

The estimation functions for the demand evolution in urban areas, expressed as the power required at the MV/LV electric substations (maximum and minimum values), are given in [24] for the 2000–2035 interval. It should also be noted that, when estimating the demand of apartments located in crowded areas or in individual buildings with more than 4 levels, the following supplemental values should be added: staircase lighting, 0.2 kW/storey (4/6 apartments); elevators, 10 kW/drive; fire hose enclosure lighting, 2 kW/entrance.

The choice between the maximum and the minimum value should be made in the design stage, taking into account the geographical area, the economic environment, consumer density etc. [5, 6].

3.3.2 Testing the Solution

Using the capabilities of the Smart Meters, which can record consumption values, data were recorded for seven consecutive years (2012–2018) on the LV side of four MV/LV electric substations located in an electric distribution network belonging to a DNO from Romania. The monitored substations supply 390 apartments with 2 and 3 rooms.

The first category (Group I) contains 205 apartments which use natural gas for cooking and receive hot water and heating from the central thermal power plant. The second category (Group II) contains 185 apartments which use natural gas for cooking and individual thermal plants for hot water and heating. Table 3 and Fig. 8 show the evolution of the electricity demand in the four monitored MV/LV substations, as measured by the smart meters.

Table 3 The demand evolution for each apartment category, [kW/ap]
Fig. 8 The demand evolution measured at the LV side of the electric substations, for the considered apartment categories and years 2012–2018

Initially, in order to identify the most representative mathematical model for the load estimation, as described in the previous sections, three families of functions were used: continuous growth functions (linear, parabolic, polynomial, exponential), limited growth functions (power, logarithmic, modified exponential, logistic), and modified combinations of these functions.

In the second stage, the regression coefficients were determined for each function and apartment category, using time as the variable of interest and the least squares criterion. The power function yielded the smallest sum of squared errors (SSE), which confirms the validity of the estimation function type recommended in the standards. However, the regression coefficients differ slightly from the recommended ones:

$${\text{Group}}\,{\text{I}}:\,W(t) = 0.420\, \cdot t^{0.201}$$
(64)
$${\text{Group}}\,{\text{II}}:\,W(t) = 0.467 \cdot t^{0.219}$$
(65)
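
The model selection step described above, where each candidate function is fitted and the one with the smallest SSE is retained, can be sketched as follows. This is only an illustration under stated assumptions: the demand values are hypothetical placeholders (they are not the measurements from Table 3), only two of the candidate functions are included, and all names are chosen for this example.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = p*x + q; returns (p, q)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx ** 2
    return (m * sxy - sx * sy) / det, (sy * sxx - sx * sxy) / det


def sse(model, xs, ys):
    """Sum of squared errors of a fitted model, in the original units."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical time index and demand values [kW/ap]; NOT the data from Table 3.
t = [1, 2, 3, 4, 5, 6, 7]
w = [0.70, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77]

# Linear model fitted directly; power model fitted on the log-log scale.
p, q = linear_fit(t, w)
B, ln_A = linear_fit([math.log(v) for v in t], [math.log(v) for v in w])
A = math.exp(ln_A)

candidates = {
    "linear": lambda x: p * x + q,
    "power": lambda x: A * x ** B,
}
scores = {name: sse(f, t, w) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> smallest SSE:", best)
```

Further candidates (parabolic, exponential, logistic and so on) would be added to the `candidates` dictionary in the same way, each with its own fitting routine. Note that the SSE is evaluated in the original units even when the coefficients are obtained on a transformed scale, so that the scores remain comparable between models.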

It should be noted that in the Romanian standard, the power function coefficients have different values according to the number of rooms in the apartment and the heating/cooking type, as described earlier.

Table 4 The demand evolution on the LV side of the electric substations, for the considered apartment categories, using the new and the recommended coefficients

For the apartment types considered in the case study, the standard provides two coefficient sets:

$${\text{Minimal}}\,W(t) = 0.305 \cdot t^{0.35}$$
(66)
$${\text{Maximal}}\,W(t) = 0.357 \cdot t^{0.35}$$
(67)

For a comparative analysis of the estimation functions obtained in the case study, relations (64) and (65), and those given in the standards, relations (66) and (67), Table 4 and Fig. 9 present the demand evolution on the LV side of the electric substations between 2012 and 2030 (estimated).
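
The comparison between relations (64)–(67) can be reproduced numerically with a short sketch such as the one below. It assumes that all four functions share the time origin stated for relation (63), that is, t = 1 for year 2000; if the coefficients in (64) and (65) were fitted with a different origin, `BASE_YEAR` has to be changed accordingly. The names `power_demand`, `models` and `demand_table` are hypothetical.

```python
def power_demand(A, b):
    """Return the function W(t) = A * t**b for the given coefficients."""
    return lambda t: A * t ** b


models = {
    "Group I (64)": power_demand(0.420, 0.201),
    "Group II (65)": power_demand(0.467, 0.219),
    "Minimal (66)": power_demand(0.305, 0.35),
    "Maximal (67)": power_demand(0.357, 0.35),
}

BASE_YEAR = 2000  # assumption: t = 1 corresponds to year 2000


def demand_table(first_year, last_year):
    """Evaluate every model for each calendar year in the given range."""
    rows = []
    for year in range(first_year, last_year + 1):
        t = year - BASE_YEAR + 1
        rows.append((year, {name: f(t) for name, f in models.items()}))
    return rows


for year, values in demand_table(2012, 2030):
    print(year, "  ".join(f"{name}: {v:.3f}" for name, v in values.items()))
```

Because the exponents in (64) and (65) are smaller than the exponent 0.35 used in (66) and (67), the fitted curves grow more slowly than the recommended ones, so their position relative to the minimal and maximal curves changes over the estimation horizon.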

Fig. 9 The demand evolution for the considered apartment categories

The following conclusions can be highlighted:

  • The coefficients of the power function computed in the case study differ from those given in the standard.

  • The new estimated values are inside the range given in [24] for the apartments using individual thermal substations.

4 Conclusions

The estimation of the loads in different parts of the distribution system is a main function of the DNOs. Electricity cannot be efficiently stored on a large scale (relative to the amount produced), which makes load estimation indispensable for the DNOs in the distribution process. Regression models are among the most commonly used statistical techniques. For the estimation of electricity/power consumption, such approaches are used to model the relationship between consumption and other factors such as weather, type of day, nature of consumption, etc. Usually, the RL model is used, in most cases with temperature as the input variable. The advantages of this model are its relatively simple implementation, the easy understanding of the relationship between the input and output variables, and the easy estimation of the performance of the forecasting model. However, due to the complex dependence between electricity consumption and the influence factors, inherent problems arise in identifying the correct model.

To address this problem, the chapter treated regression analysis-based approaches for modelling the loads at the nodes of electricity distribution networks. The approaches refer to the estimation of the required powers in supply points with a mixed load structure (i.e. residential, commercial, and industrial) at the hour when the maximum value of the load is recorded, and to the demand of residential consumers, which represent the highest percentage of the load structure fed from the MV/LV electric substations. The proposed approaches were tested in real operation conditions of MV distribution networks from Romania. Thus, the estimation of the loads at the MV/LV electric substations of a test network, at the hour when the maximum value (peak load) was recorded, using the proposed method based on the power correlation, led to an average error below 1.48% for the 7 h time frame (hPL − 5 h; hPL + 1 h), lower than for the other frames, namely the L24 frame and the L7 frames (hPL ± 3 h) and (hPL − 4 h; hPL + 2 h).

Regarding the estimation of the demand of residential consumers, the comparative analysis of the coefficients of the estimation function obtained in the case study and those given in the Romanian standard highlighted that the estimated values are lower than the minimum recommended values for the apartments which use natural gas for cooking and individual thermal plants for hot water and heating. This could be the result of a change in customer behaviour or of the database used, which belongs to a characteristic electric substation in the analysed area, whereas the data used in the Romanian standard were collected from the whole country.