
1 Introduction

Energy demand is perhaps the market’s most important pillar: all institutions, agents, and processes—from planning and operation to marketing and management—are essentially organized to serve it. However, although projecting the future evolution of the load is crucial for an economical and secure supply, it remains one of our major challenges. Consumer behavior changes continuously, with unpredictable reactions to various stimuli, such as prices, economic indicators, expectations, and perceptions that are not always grounded in reality.

Brazilian load offers an interesting case study. The year 2018 saw an anomalous increase in consumption throughout Brazil, almost always without connection to any of the classical explanatory triggers: GDP fell sharply, as did income and all indicators of economic activity. We currently face a major challenge: consumer behavior has changed, old dynamics no longer represent the present, and we must predict the future without any past basis. In fact, in this context, the longer the history, the worse the prediction.

This behavior broadly fits the concepts proposed by [1, 2], where a rise in income produces a noticeable change in behavior, breaking the previous classical correlations between consumption and economic indicators.

The Brazilian case, however, goes a step further: even without a significant rise in income, popular expectations led to the purchase of new appliances (especially air conditioning) and thus to an increase in consumption. Correlations are broken, and only behavioral economics can explain this anomaly.

It is necessary to develop mathematical models and computational tools as agile as the consumer, able to understand, follow, and perhaps anticipate this behavior with the speed of our new times.

2 Objective

This paper describes a model able to accommodate more than just a lack of data: we deal with extreme scarcity, where the forecast must be produced from very few observations—for example, one year (twelve months). In this case, the historical records are not even sufficient for a backtracking test (identification/prediction): it is necessary to start from scratch.

It is necessary to “populate” the load history with valid information—and it is important to distinguish information from numbers: synthetic samples could be created from the available data, but they would carry the same poor information—anything else could even lead to distorted results.

However, although it is not possible to extract more information from a history than it contains, it is feasible to combine similar experiences: observations from different agents that exhibit similar behaviors. For example, distributors in neighboring regions may share the same consumption dynamics. In this case, it might be interesting to “blend the knowledge” of each company into a single richer, more complete history.

This is the proposal of collaborative, or multi-task, learning (MTL) [3,4,5]. By joining forces, information is shared without losing individuality. The model should select the common dynamics and point out specificities, leading to a more consistent and reliable projection.

The advantages of the proposed model are highlighted through a comparison between the new model and a Hilbert space approach, previously used in many Brazilian companies and also designed for forecasting problems with scarce data.

3 Multi-Task Learning Approach

Considering space limitations, this article summarizes the applied collaborative learning model. More details, including alternative implementations, may be found in [3].

The proposed approach establishes a set of outputs or tasks t (in our case, the target variables: loads or consumption). Each task is associated with a set of explanatory variables (inputs) x (in our case, economic, climatic, behavioral, etc.). A successful collaborative learning model requires that the outputs t react similarly to the inputs x.

The function that “maps” the input x to the output t is written as

$$f_{t} \left( \boldsymbol{x} \right) = \sum\nolimits_{i = 1}^{d} a_{it}\, u_{i} \left( \boldsymbol{x} \right), \quad \forall t \in T;\; a_{it} \in \mathbb{R};\; \boldsymbol{x} \in \mathbb{R}^{d}$$
(1)

where

x is the vector of input variables;

\(f_{t}(\boldsymbol{x})\) is the output associated with task t;

the functions \(u_{i}(\boldsymbol{x})\) express the responses shared by all inputs x across the different tasks t;

the coefficients \(a_{it}\) measure the “coupling” between the different tasks.

For the sake of simplicity, this work assumes linear functions (non-linear extensions are possible and relatively straightforward). In this case, each task output reduces to an inner product: defining the task weight vector

$$\boldsymbol{w}_{t} = \sum\nolimits_{i = 1}^{d} a_{it}\, \boldsymbol{u}_{i}$$
(2)

and therefore

$$f_{t} \left( \boldsymbol{x} \right) = \left\langle \boldsymbol{w}_{t}, \boldsymbol{x} \right\rangle, \quad \forall t \in T;\; \boldsymbol{x} \in \mathbb{R}^{d}$$
(3)

where \(\boldsymbol{w}_{t}\) combines the individual task coefficients \(a_{it}\) with the shared vectors \(\boldsymbol{u}_{i}\).

Finally, in compact matrix notation,

$$\boldsymbol{W} = \boldsymbol{U}\boldsymbol{A}, \quad \boldsymbol{W} \in \mathbb{R}^{d \times T}$$
(4)
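The structure of Eqs. (1)–(4) can be illustrated with a minimal numerical sketch. The snippet below, in Python with NumPy, is not the authors’ code: the array sizes are arbitrary, and a reduced number k of shared factors is used for readability (the paper sums over all d inputs).

```python
import numpy as np

d, T, k = 5, 3, 2                 # explanatory variables, tasks (agents), shared factors
rng = np.random.default_rng(0)

U = rng.normal(size=(d, k))       # columns play the role of the shared functions u_i
A = rng.normal(size=(k, T))       # task-specific couplings a_it
W = U @ A                         # Eq. (4): W = U A, one column w_t per task, W in R^{d x T}

x = rng.normal(size=d)            # one observation of the input vector
forecasts = W.T @ x               # Eq. (3): f_t(x) = <w_t, x>, one value per task
```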

These coefficients are obtained from the historical observations of all agents (even if scarce). Among other methods, the most intuitive is the well-known technique of fitting the function to the available history

$$\min_{\boldsymbol{a}_{t} \in \mathbb{R}^{d}} \left\{ \sum\nolimits_{i = 1}^{m} L\left( y_{ti}, \left\langle \boldsymbol{a}_{t}, \boldsymbol{U}^{T} \boldsymbol{x}_{ti} \right\rangle \right) \right\}$$
(5)

where L(.,.) measures the empirical deviation between the model outputs and the available data.
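One simple way to estimate U and the a_t from the pooled histories, assuming a squared loss in Eq. (5), is alternating least squares: fix U and fit each task’s couplings, then refit the shared basis jointly over all tasks. The sketch below is one such estimator among the alternatives discussed in [3], not necessarily the implementation used here; the names and the choice of k are illustrative.

```python
import numpy as np

def fit_tasks(U, X_list, y_list):
    # Eq. (5) with squared loss: for each task t, a_t = argmin ||y_t - (X_t U) a_t||^2
    return [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in zip(X_list, y_list)]

def refit_shared(A_list, X_list, y_list, d, k):
    # Refit the shared basis U jointly over all tasks:
    # y_i = x_i^T U a_t, rewritten as a linear system in vec(U)
    rows, targets = [], []
    for a, X, y in zip(A_list, X_list, y_list):
        rows.append(np.einsum('ij,l->ijl', X, a).reshape(len(y), d * k))
        targets.append(y)
    vec_u = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)[0]
    return vec_u.reshape(d, k)

def fit_mtl(X_list, y_list, k, n_iter=50, seed=0):
    # Alternating least squares between the shared basis U and the couplings a_t
    d = X_list[0].shape[1]
    U = np.random.default_rng(seed).normal(size=(d, k))
    for _ in range(n_iter):
        A_list = fit_tasks(U, X_list, y_list)
        U = refit_shared(A_list, X_list, y_list, d, k)
    A = np.column_stack(fit_tasks(U, X_list, y_list))   # A in R^{k x T}
    return U, A                                          # W = U @ A as in Eq. (4)
```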

4 Architecture Differences

Figures 1 and 2 illustrate the conceptual difference between the classical and collaborative approaches. While the classical approach uses each set of observations independently, the collaborative approach combines all observations, creating a common pattern without losing each agent’s uniqueness.

Fig. 1. Classic, individual approach

Fig. 2. Collaborative approach

5 The Classical Hilbert Space Approach

The classical Hilbert approach, previously designed to handle the lack of data and to adapt to the ever-changing behavior of the Brazilian consumer, is described in [6, 7] and is summarized here.

5.1 Projection Theorem

Functional analysis has been extensively applied to optimization processes [8]. It may be used on a statistical basis, as is often found in communications, or from a deterministic point of view, the latter usually associated with Hilbert spaces.

Hilbert space elements may be seen as vectors or, in our computerized world, as data sequences representing loads, temperatures, economic indices, etc. A Hilbert space is a complete inner-product space [9], able to approximate any given vector while always satisfying the Projection Theorem and the Orthogonality Condition [10].

This is shown in Fig. 3, where a given load vector is approximated by the vector sum of three “explaining variable” vectors, Ve1, Ve2, and Ve3 (for instance, GDP, income, and temperature).

Fig. 3. Hilbert space decomposition

Figure 4 illustrates the decomposition process for just one “explaining variable”. The original vector is projected (using the Projection Theorem) over the “explaining variable” (say, Ve1), yielding the “explained component”. The remaining orthogonal vector corresponds to the unexplained component, or the error vector.

Fig. 4. Original vector decomposition over a first “explaining vector”

The unexplained component (error) is then projected over the second explaining vector (say, Ve2), and the process continues until the final error is considered negligible.
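The sequential process of Figs. 3 and 4 amounts to repeatedly projecting the current residual onto the next explaining vector. A minimal sketch, with illustrative names only and no claim to match the production implementation, could read:

```python
import numpy as np

def sequential_projection(load, explaining_vectors):
    """Project the residual onto each explaining vector in turn (Figs. 3-4)."""
    residual = np.asarray(load, dtype=float).copy()
    coefficients = []
    for v in explaining_vectors:              # e.g. Ve1 = GDP, Ve2 = income, Ve3 = temperature
        beta = (residual @ v) / (v @ v)       # projection of the current residual onto v
        coefficients.append(beta)
        residual = residual - beta * v        # orthogonal, still unexplained component
    return np.array(coefficients), residual   # stop when the residual norm is negligible
```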

5.2 Parallel Processing Implementation

Let \(\underline{C}\) be the desired vector to be decomposed by the set of “explaining vectors” \(\underline{S_{1}}, \underline{S_{2}}, \ldots, \underline{S_{N}}\). One should therefore look for the optimum combination of these “basis” vectors

$$\underline{C} \cong \underline{\underline{S}} \; \underline{\alpha } = \left[ {\underline{{S_{1} }} ,\underline{{S_{2} }} , \ldots ,\underline{{S_{N} }} } \right] \underline{\alpha }$$
(6)

such as to minimize the error norm

$$\mathop{\min}\limits_{\underline{\alpha}}\; \underbrace{\left\| \underline{C} - \underline{\underline{S}}\,\underline{\alpha} \right\|}_{\left\| \underline{\varepsilon} \right\|}$$
(7)

The Projection Theorem states that the optimum approximation error is orthogonal to the space of “explaining vectors” and, therefore, to any of its elements:

$$\begin{array}{*{20}c} {\underline{\varepsilon }^{t} \underline{{S_{i} }} = \underline{C}^{t} \underline{{S_{i} }} - \underline{\alpha }^{t} \underline{\underline{S}}^{t} \underline{{S_{i} }} = 0} & {\text{for}} & {i = 1,2, \ldots ,N} \\ \end{array}$$
(8)

or, for all “explaining vectors”

$$\underline{C}^{t} \underbrace {{\left[ {\underline{{S_{1} }} ,\underline{{S_{2} }} , \ldots ,\underline{{S_{N} }} } \right]}}_{{\underline{\underline{S}} }} = \underline{\alpha }^{t} \left[ {\underline{\underline{S}}^{t} } \right]\underbrace {{\left[ {\underline{{S_{1} }} ,\underline{{S_{2} }} , \ldots ,\underline{{S_{N} }} } \right]}}_{{\underline{\underline{S}} }}$$
(9)

leading finally to the unique [9] optimum set of coefficients

$$\underline{\alpha} = \left( \underline{\underline{S}}^{t}\, \underline{\underline{S}} \right)^{-1} \underline{\underline{S}}^{t}\, \underline{C}$$
(10)
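Equation (10) is a standard least-squares solution and can therefore be computed with any linear-algebra library. The sketch below is illustrative only; it uses a least-squares solver instead of the explicit inverse, which yields the same α but is numerically safer when the explaining vectors are strongly correlated.

```python
import numpy as np

def hilbert_coefficients(C, S):
    """Optimum coefficients of Eq. (10): alpha = (S^t S)^{-1} S^t C."""
    alpha, *_ = np.linalg.lstsq(S, C, rcond=None)
    return alpha

# Example: S stacks the explaining vectors (e.g. GDP, income, temperature) as columns,
# C is the observed load history; the fitted load is S @ alpha and the error is C - S @ alpha.
```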

The method is thus able to work with large sets of “explaining vectors” in a very efficient way. Moreover, it solves the “co-integration” problem, automatically accommodating inter-correlated explaining variables and finding the best fit while eliminating possible “double counting” effects due to the interdependencies.

Finally, the Hilbert decomposition does not require a long historical period. Although more reliable information of course yields a more precise result, the method works at its best within a constrained history and is therefore suited to a lack-of-data framework. It has been successfully used in many Brazilian companies and was able—until now—to yield reliable forecasts based on a mere 5-year history (60 monthly observations).

6 Case Study

6.1 The Challenge

The necessity of a new model, able to deal with lack of data, is shown in Fig. 5. After three years of stagnation, the load finally experienced a steep—and unexpected—rise.

Fig. 5. Bahia (COELBA) load growth

The explanation for this phenomenon, however, was unclear. Figures 6, 7, and 8 show the classical model forecast results for a backtracking process (identification and projection) applied to three neighboring distributors (COELBA, CELPE, COSERN), based on the usual explanatory variables (GDP, income, temperature). There is a noticeable, abnormal step associated with the 2019 summer in all companies (in fact, all Brazilian distributors exhibited the same behavior, and many different statistical models led to similar results). No available model was able to predict, or even to explain, this response.

Fig. 6. Bahia (COELBA) load dynamics

Fig. 7. Pernambuco (CELPE) load dynamics

Fig. 8. Rio Grande do Norte (COSERN) load dynamics

Beyond absorbing the deviations, the main question is whether that step is an anomaly or a change in consumer behavior—in other words, is this a new permanent pattern? This question is, of course, related to the consumers’ reactions, and the answer requires a deeper—non-statistical—understanding.

Extensive field research [11], based on behavioral economics [12, 13], uncovered an interesting fact: a disputed election restored the consumers’ belief in a stronger economy and a change for the better. This faith in the future, associated with an unusually warm summer, led to the highest level of refrigeration equipment purchases observed in a decade.

It must be noted that no economic or income growth backed this trend: it was a matter of hope and belief. Therefore, no model based on past correlations would be able to account for this change.

As a consequence, consumers now possess a new base of installed demand and will use it from now on. There is indeed a new standard, which will induce a new response that must be predicted from only a few observations.

6.2 The Proposed Solution

The anomalous behavior was detected from May 2018. It would be very difficult, if not impossible, to apply existing models to as few as 12–18 months of data for model identification/validation.

We then tried the collaborative learning technique. As our goal was to predict the 2019 summer, we based the identification phase on the period from October 2017 to May 2018—while the new behavior was still taking shape. Of course, more observations will improve the results and will be used as they become available.
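As an illustration of how the individual and collaborative fits could be set up over this identification window, the sketch below stacks the three distributors’ short monthly histories and reuses the fit_mtl and hilbert_coefficients sketches given earlier. The file names, column names, and choice of explanatory variables are assumptions made for the example, not a description of the actual data pipeline.

```python
import numpy as np
import pandas as pd

companies = ["COELBA", "CELPE", "COSERN"]

X_list, y_list = [], []
for name in companies:
    # Hypothetical files: one row per month, with the load and the explanatory variables.
    df = pd.read_csv(f"{name}_monthly.csv", index_col="month", parse_dates=True)
    df = df.loc["2017-10":"2018-05"]                  # identification window used in the paper
    X_list.append(df[["gdp", "income", "temperature"]].to_numpy())
    y_list.append(df["load"].to_numpy())

# Individual learning: one least-squares (Hilbert) fit per company, Eq. (10).
individual = [hilbert_coefficients(y, X) for X, y in zip(X_list, y_list)]

# Collaborative learning: a single joint fit sharing the common dynamics (Sect. 3).
U, A = fit_mtl(X_list, y_list, k=2)
collaborative = [X @ (U @ A[:, t]) for t, X in enumerate(X_list)]  # in-sample fits per company
```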

Figures 9, 10, and 11 compare the results obtained from our best classical Hilbert space model (individual learning) and from the collaborative learning. It is interesting to note that, as expected, the results show slightly higher errors during springtime (as consumers were still adapting, making decisions, buying equipment). However, the projection for the summer months is much better.

Fig. 9. Individual vs. collaborative learning, Bahia (COELBA)

Fig. 10. Individual vs. collaborative learning, Pernambuco (CELPE)

Fig. 11. Individual vs. collaborative learning, Rio G. do Norte (COSERN)

In any case, the proposed approach offered a clear enhancement of the overall forecast quality. All deviations are significantly lower, despite the almost non-existent information. Moreover, the “deviation trend” is broken, offering a more stable and reliable insight into the future.

7 Conclusions

We live in a changing world, and consumption dynamics are no exception. Preparedness for the future requires forecasting the unknown. It is crucial to build models that are able to quickly detect modifications—and to distinguish them from anomalies. It will be necessary to adapt, adjust, and absorb novelties.

In this context, classical models that try to repeat the past will not be able to foresee the future. The ability to collect and store a huge history does not ensure the quality of the information, and the number of observations will not necessarily yield precision.

We propose a model designed for this new reality: a collaborative learning technique able to combine information from different agents, identify common and individual characteristics, and build a rich history without traveling back to a distant past.

The described approach was applied to a hard challenge: the projection of the summer load for three Brazilian distributors, a load that broke every known record. A mere eight months of observed data was able to provide much better results for all companies, paving the way to explain the (previously) unexplainable behavior.

These promising results indicate an interesting path, which will be pursued and reported in the near future.