Each appliance is modelled as an n-variate HMM, i.e., an HMM whose emitted symbols are represented by n values. More specifically, each HMM is described by the following parameters [76]:

  • the number of states \( m \in {{\mathbb {Z}}}_{+}\);

  • the hidden states x ∈ {S 1, S 2, …, S m};

  • the emitted symbols \(\boldsymbol {\mu }_j \in {\mathbb {R}}^n\), where j = 1, …, s;

  • the symbol emission probability matrix M ∈ [0, 1]s×m;

  • the state transition probability matrix P ∈ [0, 1]m×m;

  • the starting state probability vector ϕ ∈ [0, 1]m.

In this book, it is assumed that each state of the HMM corresponds to a working state of the appliance, i.e., x ∈{ON1, ON2, …, OFF}, so that the number of states m is equal to the number of symbols s and M  ≡ I m×m (degenerate HMM). Furthermore, the values composing the emitted symbols represent the power consumption of the appliance: since the components are defined in an orthogonal space, the power quantities that best fit this constraint are the active and reactive power. Therefore, n = 2 and, for the sake of clarity, it will be omitted in the remainder of this section, since the individual active and reactive power components will be made explicit. For example, each symbol is defined as μ j  =  [μ a,j μ r,j]T, where the subscripts a and r distinguish the active and reactive components. The analysis in the unidimensional space, with n = 1, exploiting only one of the two components, will also be treated in the remainder of this section. For each appliance, the quantities to be estimated are the number of states m, the values of μ j for each state, the state transition probability matrix P and the starting state probability vector ϕ. The estimation of m and of μ j is addressed in Sect. 4.1.1.
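As a concrete illustration, the parameters of such a degenerate HMM can be collected in a small container; the following Python sketch (class and field names are my own, not from the book) stores the per-state mean power pairs μ j, the transition matrix P, and the starting vector ϕ for a toy two-state appliance:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class ApplianceHMM:
    """Degenerate HMM of one appliance: each state emits a fixed
    (active, reactive) power pair, so the emission matrix is the
    identity and only mu, P, and phi need to be stored."""
    mu: np.ndarray   # shape (m, 2): [active, reactive] mean power per state
    P: np.ndarray    # shape (m, m): state transition probabilities
    phi: np.ndarray  # shape (m,): starting state distribution

    @property
    def m(self) -> int:
        return self.mu.shape[0]

# Toy two-state appliance: ON at (1200 W, 150 VAR), OFF at (0, 0);
# the footprint starts in the OFF state, hence phi = [0, 1].
toy = ApplianceHMM(
    mu=np.array([[1200.0, 150.0], [0.0, 0.0]]),
    P=np.array([[0.9, 0.1], [0.02, 0.98]]),
    phi=np.array([0.0, 1.0]),
)
```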

Regarding the state transition probability matrix P, each entry P ij represents the probability of transitioning from state i to state j. Thus, P ij can be estimated with a Maximum Likelihood criterion by calculating the number of times state i transitions to state j and normalizing by the total number of transitions from state i. Formally:

$$\displaystyle \begin{aligned} P_{ij} = \frac{C_{ij}}{\sum_{j'=1}^m C_{ij'}}, \end{aligned} $$
(4.1)

where C ij is the number of transitions from state i to state j. Typically, the largest values in the matrix are located on the diagonal, meaning that the probability of remaining in the same state is higher than the probability of transitioning to another state. Table 4.1 shows a typical transition probability matrix for an appliance with four working states. The diagonal entries, representing the probability of remaining in the same state, are the largest ones: indeed, for states with a short permanence time this value is lower than for states with a longer permanence time. The highest value is the one related to the OFF state, because the appliance is activated only after a long period in which it is turned off.
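Equation (4.1) can be sketched in a few lines of Python (the function name and the toy state sequence are illustrative):

```python
import numpy as np

def estimate_transition_matrix(states, m):
    """Maximum-likelihood estimate of P (Eq. 4.1): count the transitions
    C[i, j] from state i to state j and normalize each row by the total
    number of transitions leaving state i."""
    C = np.zeros((m, m))
    for i, j in zip(states[:-1], states[1:]):
        C[i, j] += 1
    rows = C.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0  # states never left in the data keep a zero row
    return C / rows

# Toy state sequence: state 0 is left twice out of five transitions.
P = estimate_transition_matrix([0, 0, 0, 1, 1, 0, 0, 1], m=2)
```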

Table 4.1 An example of the HMM transition probability matrix

In addition, the OFF state corresponds to the initial state, since the footprint starts just before the turn-on instant, thus ϕ = [0 0⋯0 1]T.

An example of a four-state appliance model is shown in Fig. 4.1, where the arc between two states represents the transition probability P ij, while the self-loop starting and closing on the same state represents the probability P ii of remaining in that state.

Fig. 4.1
figure 1

An example of a four-state HMM

A probability value close to zero denotes that the transition is unlikely. In practice, zero probability values should be avoided, because the AFAMAP algorithm evaluates probabilities in \(\log \) scale, where a zero value diverges to minus infinity. It is therefore recommended to replace zeros with a small quantity, e.g., ≃ 10−5.

4.1 Additive Factorial Approximate Maximum A Posteriori (AFAMAP)

FHMMs have been introduced in [77] as an extension of HMMs to model time series that depend on multiple hidden processes. Starting from the work of Kim and colleagues [17], FHMMs have been largely employed for NILM and several approaches have been proposed in the literature [18, 22, 24, 27, 85, 86]. Among them, AFAMAP [21] represents an effective algorithm able to achieve high performance with a reasonable computational cost.

AFAMAP has been proposed in [21] as an efficient disaggregation algorithm based on FHMMs. In this algorithm, an additional model which relies on the same HMMs composing the Additive FHMM (AFHMM) is introduced. It is based on a differential version of the aggregated signal, resulting in a Differential FHMM (DFHMM). The inference on the set of states of multiple HMMs can be computed through the Maximum A Posteriori (MAP) algorithm; a relaxation towards real values is adopted, leading to a convex Quadratic Programming (QP) optimization problem. The disaggregation process is performed by analysing the aggregated power divided into non-overlapping frames.

The reference work [21] describes an unsupervised approach to data disaggregation: an unsupervised procedure aimed at the extraction of the appliance load signature is paired with the disaggregation algorithm, referred to as AFAMAP (Additive Factorial Approximate Maximum a Posteriori). In this work, the aim is to investigate and improve the disaggregation algorithm. Differently from the reference work, however, a supervised approach is used to create the HMMs, based on the circuit-level power consumption signature. This signal can clearly be obtained from the aggregated data under the condition that the appliances run one at a time [15].

The theoretical approach towards disaggregation is based on the work of Kolter and Jaakkola [21]. In this work, the system is modelled with an Additive Factorial Hidden Markov Model (AFHMM), in which the value of each aggregated power sample corresponds to a combination of working states of the appliances in the system.

In this approach, it is also assumed that at most one HMM may change its state at any given time, an assumption that holds if the sampling time is reasonably short. In this case, the transition in the aggregated power, when moving from one sample to the next, corresponds to the state change of a single HMM.

Because of that, the differential signal, built from the aggregated power, can be modelled as the result of a Differential Factorial Hidden Markov Model (DFHMM), which relies on the same HMM models comprising the AFHMM.

By combining the two models, the inference on the set of states of multiple HMMs can be computed through the Maximum A Posteriori (MAP) technique, which takes the form of a Mixed Integer Quadratic Programming (MIQP) optimization problem.

One of the shortcomings of this approach is the non-convex nature of the problem, because of the integer nature of the variables: in this case, a relaxation towards real values is taken into account, leading to a convex Quadratic Programming (QP) optimization problem. Thus, the Additive Factorial Approximate MAP (AFAMAP) approach is obtained.

In a real case scenario, the modelled output may not match the observed aggregated signal, due to electrical noise, very small loads, or leakages. This issue is addressed by defining a robust mixture component in both the AFHMM and the DFHMM, denoted as z(t) and Δz(t), respectively.

When a denoised scenario [87] is considered, i.e., all the contributions to the aggregated energy demand are known, the robust mixture component is not needed. When a noised scenario is considered, the robust mixture component is likewise not used: all the unknown contributions are modelled as an additional appliance represented by the RoW model, which will be introduced in Sect. 4.1.2. This approach provides further advantages, since appliances with lower power consumption risk being modelled with working states associated to similar consumption values, which can lead the algorithm to erroneously assign the disaggregation output between similar models. Furthermore, the authors in [21] demonstrated that the disaggregation performance degrades as the number of appliances increases. Thus, representing several appliances with a single model eases the disaggregation task.

In the reference work [21], the parameter n defines the problem dimensionality: in the original presentation, n = 1 is assumed, because the algorithm uses only the active power data to characterize the observed aggregated signal.

Specifically, the parameters of the problem are the following:

  • \( N \in {{\mathbb {Z}}}_{+}\) is the number of HMMs in the system;

  • \(\overline {y}(\tau ) \in {\mathbb {R}}\) is the observed aggregated output (in denoised environments \( \overline {y}(\tau ) = \sum _{i=1}^{N} {y}^{(i)}(\tau ) \), where y (i)(τ) corresponds to the true appliance output);

  • \(\sigma ^{2}_{1}, \sigma ^{2}_{2} \in {\mathbb {R}}\) are the observation variances of the additive and differential models, respectively.

The backward differential signal is defined as \( {\varDelta \overline {y}}_{b}(\tau ) = \overline {y}(\tau ) - \overline {y}(\tau -1) \).

For the i-th HMM the parameters are:

  • \( m_{i} \in {{\mathbb {Z}}}_{+}\) is the number of states;

  • \(x^{(i)}(\tau ) \in \left \{ S_1,\dots , S_{m_{i}} \right \}\) is the HMM state at time instant τ (\(x^{(i)}(\tau ) \equiv S_{m_{i}} \) corresponds to the OFF state);

  • \(\mu _{j}^{(i)} \in {\mathbb {R}}\) is the j-th state mean value;

  • \(\boldsymbol {\phi }_{b}^{(i)} \in [0,1]^{m_{i}}\) is the initial states distribution;

  • \(\boldsymbol {P}_{b}^{(i)} \in [0,1]^{m_{i} \times m_{i}}\) is the transition matrix.

The aggregated signal \( \overline {y}(\tau ) \) is analysed using a windowing technique, where τ ∈ w f = [(f − 1)T + 1, …, fT] for f = 1, 2, …, F. The window w f is the timebase for the algorithm and, for convenience, a new temporal variable is introduced by defining the relation t = τ − (f − 1)T, for t = 1, 2, …, T, with \(T \in {{\mathbb {Z}}}_{+}\). After the analysis of all the F windows, the disaggregated signals \( \hat {y}^{(i)}(t) \) are recomposed using the inverse relation τ = t + (f − 1)T.
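The windowing and recomposition described above can be sketched as follows (function names are my own; samples beyond the last complete frame are simply dropped in this sketch):

```python
import numpy as np

def split_into_frames(y, T):
    """Split the aggregated signal into F non-overlapping frames of length T
    (samples beyond the last complete frame are dropped in this sketch)."""
    F = len(y) // T
    return [y[(f - 1) * T : f * T] for f in range(1, F + 1)]

def recompose(frames):
    """Concatenate per-frame outputs back onto the original timebase,
    i.e., the inverse relation tau = t + (f - 1) * T."""
    return np.concatenate(frames)

y = np.arange(10.0)
frames = split_into_frames(y, T=5)
```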

In the optimization problem, the variables are defined as:

$$\displaystyle \begin{aligned} \mathcal{Q} = \Big\{ \boldsymbol{Q}(x^{(i)}(t)) \in {\mathbb{R}}^{m_i}, \boldsymbol{Q}(x^{(i)}(t-1),x^{(i)}(t)) \in {\mathbb{R}}^{m_{i} \times m_{i}}\Big\}, \end{aligned} $$

for which the Q(x (i)(t))j variable indicates the state assumed at time instant t, while the Q(x (i)(t − 1), x (i)(t))j,k variable indicates the state transition from the previous to the current time instant, for the i-th HMM.

The AFAMAP algorithm is shown in Fig. 4.2.

Fig. 4.2
figure 2

The AFAMAP algorithm

In (4.2) the error terms are defined as:

$$\displaystyle \begin{aligned} E'(t) = \Big( \overline{y}(t) - \sum_{i=1}^{N} \sum_{j=1}^{m_i} \Big\{ \mu_j^{(i)} Q(x^{(i)}(t))_j\Big\} \Big)^{2},\end{aligned} $$
(4.4)
$$\displaystyle \begin{aligned} E''(t) = \sum_{i=1}^{N} \sum_{\substack{j = 1\\ k=1 \\ k \neq j}}^{m_i} \Big \{ \left( {\varDelta \overline{y}}_{b}(t) - \varDelta\mu_{k,j}^{(i)} \right)^{2} Q(x^{(i)}(t-1),x^{(i)}(t))_{j,k} \Big\},\end{aligned} $$
(4.5)
$$\displaystyle \begin{aligned} E'''(t) = D\left( \frac{ {\varDelta \overline{y}}_{b}(t)}{\sigma_2}, \lambda\right) \Big(1 - \sum_{i=1}^{N} \sum_{\substack{j=1 \\k=1 \\ k \neq j}}^{m_i} Q(x^{(i)}(t-1),x^{(i)}(t))_{j,k}\Big).\end{aligned} $$
(4.6)

The QP optimization problem is defined in the form:

Minimize

(4.7)

subject to the constraint:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{A}_{eq} \boldsymbol{v} = \boldsymbol{b}_{eq}, \end{array} \end{aligned} $$
(4.8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{lb} \leq \boldsymbol{v} \leq \boldsymbol{ub}. \end{array} \end{aligned} $$
(4.9)

The variables of the problem are represented by the vector v, which is composed of several subsets, based on the time instant t and the appliance index (i):

where the variables for the state are represented in ξ (i)(t), and the variables for the backward transition in β (i)(t).

The parameters of the problem fill up the elements of H and f, according to the structure of the v vector, whereas A eq and b eq represent the consistency constraints between the state and the transition variables. The vectors lb and ub define the lower and upper boundaries of the solution: because of the nature of the variables [21], the lower boundary is equal to 0 and the upper boundary is equal to 1, for all the elements in v (Fig. 4.3).

Fig. 4.3
figure 3

Additive FHMM model

In A eq the constraint about Q(x (i)(t − 1), x (i)(t)) with t = 1 has to be removed since there is no information about Q(x (i)(t)) at the previous time instant, thus falling back to the constraint 0 ⋅ Q(x (i)(t − 1), x (i)(t)) = 0 (Fig. 4.4).

Fig. 4.4
figure 4

Differential FHMM model

4.1.1 Appliance Modelling

The working state power level estimation consists of obtaining representative power level distributions for each appliance state, i.e., the values of the emitted symbols μ j. In a realistic scenario, this is obtained by using a set of examples of an appliance's typical consumption cycle. This information can be extracted by observing the aggregate power signal, under the assumption that only one appliance at a time is operating [15].

In particular, this stage involves the extraction of a footprint of the appliance, i.e., the active and reactive power signals comprised between the power-on (transition from the OFF state to an ON state) and the power-off (transition from an ON state to the OFF state). These instants are identified by means of an Appliance Activity Detector (AAD), which detects whether the active power signal exceeds a certain threshold (typical values are in the order of 20 W). Isolated occurrences of power levels below the threshold are managed by employing a hangover technique: a counter is started, which decreases for each sample the signal remains below the threshold. If the signal returns above the threshold before the counter expires, the footprint is considered to continue. The typical hangover duration is 5–10 min. The diagram of the footprint extraction stage is shown in Fig. 4.5a.

Fig. 4.5
figure 5

Diagram of the footprint extraction procedure (a) and of the training phase of the appliance models (b)
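The AAD with hangover can be sketched as follows (a minimal illustration, with my own function name and the hangover expressed in samples rather than minutes):

```python
def extract_footprints(active_power, threshold=20.0, hangover=300):
    """Appliance Activity Detector: a footprint starts when the active
    power exceeds `threshold` (W) and ends only after the signal has
    stayed below it for `hangover` consecutive samples, so short dips
    do not split a footprint. Returns (start, end) pairs, end exclusive."""
    footprints = []
    start, below = None, 0
    for t, p in enumerate(active_power):
        if p > threshold:
            if start is None:
                start = t
            below = 0
        elif start is not None:
            below += 1
            if below >= hangover:
                footprints.append((start, t - below + 1))
                start, below = None, 0
    if start is not None:  # footprint still open at the end of the signal
        footprints.append((start, len(active_power) - below))
    return footprints

# The short dip at t = 4 is bridged; the run ends after 3 samples below 20 W.
fp = extract_footprints([0, 0, 50, 60, 0, 70, 0, 0, 0, 0], hangover=3)
```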

The power value and the temporal information of the OFF state cannot be obtained by analysing the signal extracted with the AAD. The value is reasonably assumed to be 0 W and 0 VAR for the active and reactive power signals, respectively. The temporal information, i.e., the typical interval between the OFF state and an ON state, has to be specified a priori for each appliance based on typical usage (e.g., once an hour, three times a day, etc.).

Different uses of the appliance during its life cycle require the model to represent every combination of usage, under the assumption that the working states of the appliance are predetermined and do not vary between usages: reasonably, the working cycle of a washing machine is always composed of the same stages (e.g., pre-washing, water heating, washing, rinsing and spinning), regardless of their order of execution, thus the number of working states is predetermined for each appliance.

Complex appliances (e.g., washing machines, dishwashers) are characterized by several working cycles and the extraction of a single footprint might not be completely representative of its operation. This motivates the need to acquire several footprints for each appliance. Furthermore, even though only one footprint is sufficient to explore all the working states of an appliance, multiple footprints allow to employ more data for the power level extraction phase, particularly useful for those power levels characterized by a short duration.

The estimation of the power level associated to a state of the HMM relies on the appliance consumption data, which is not composed of discrete consumption values but presents a continuous variability. In order to find the average value of the signal within each period of permanence in the same working state, a clustering procedure is adopted, and k-means [88] has been selected as the algorithm.

Since the OFF state information is not present in the data, the number of clusters is set to (m − 1). After identifying the clusters, the power levels associated to each HMM state are represented by their centroids.

The clustering operation is not performed directly on the footprints extracted with the AAD. After extracting the footprint, a bivariate histogram with 100 bins per kW and per kVAR is used to analyse the probability distribution of the active and reactive power signals. The number of bins was chosen empirically, after analysing some footprints of the training set, so as to obtain a sufficiently detailed histogram providing a good trade-off between variance and bias of the density estimate. Additionally, power levels with a low number of occurrences are excluded from further processing: bins whose number of occurrences lies below a threshold are considered of lower relevance, and the related observations are discarded. This technique also yields the number of working states m, which is determined by counting the clusters in the final bivariate histogram. An example is shown in Fig. 4.6, where the histogram before and after the thresholding operation is shown for the dishwasher in the AMPds dataset. Thresholding also mitigates a limitation of the clustering algorithm: k-means does not exploit the distribution of the samples within a cluster, since it selects the centroids that satisfy the convergence rule over all data. Discarding bins with few occurrences forces k-means to select the centroids with higher probability and to discard low-probability local clusters that could result in erroneous centroids. Furthermore, it allows close clusters, which could otherwise be confused as a single one, to be discriminated: indeed, transients between nearby clusters produce samples lying between them which, if retained, would merge the two clusters into a single one.

Fig. 4.6
figure 6

An example of a two-dimensional histogram of the active and reactive power signals related to the dishwasher in the dataset AMPds
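A minimal sketch of the histogram thresholding followed by k-means (function names, the bin width, and the relevance threshold are illustrative assumptions; a plain Lloyd's iteration stands in for the k-means implementation):

```python
import numpy as np

def prune_rare_bins(pa, pr, bin_w=10.0, min_count=5):
    """Discard samples falling in sparsely populated bins of the bivariate
    histogram; bin_w = 10 W (VAR) reproduces 100 bins per kW (kVAR)."""
    keys = list(zip((pa // bin_w).astype(int), (pr // bin_w).astype(int)))
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    keep = np.array([counts[key] >= min_count for key in keys])
    return pa[keep], pr[keep]

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; the centroids give the (m - 1) state levels.
    Initial centroids are spread along the active power axis for determinism."""
    order = np.argsort(points[:, 0])
    centroids = points[order[np.linspace(0, len(points) - 1, k).astype(int)]].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(points[:, None] - centroids[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids

# Two dense clusters plus two isolated transient samples to be pruned.
pa = np.array([300.0] * 20 + [2000.0] * 20 + [1000.0, 1005.0])
pr = np.array([100.0] * 20 + [400.0] * 20 + [50.0, 60.0])
pa2, pr2 = prune_rare_bins(pa, pr)
levels = kmeans(np.column_stack([pa2, pr2]), k=2)
```

The two transient samples fall in bins with a single occurrence and are discarded, so the centroids settle on the two dense power levels.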

Figure 4.7 represents the relationship between the clusters obtained from the consumption values in the footprints and the footprint itself. It is related to the washing machine of household 1 in the ECO dataset. In this case, the univariate modelling of the active power consumption is considered. The histogram depicted in Fig. 4.7a represents the probability density function of the samples belonging to each state, which is related to the mean consumption value of that state and its variability, as shown in Fig. 4.7b.

Fig. 4.7
figure 7

Washing machine in ECO, household 1. (a) Histogram of the power consumption values. (b) Footprint and clusters associated to the working states

The diagram of the clustering and of the model training stage is shown in Fig. 4.5b.

In general, clusters present different characteristics depending on the magnitude of their centroid. Typically, the ones characterized by high values (e.g., 3000 W) are highly variable, since they depend on the appliance usage by the user, e.g., the water temperature chosen in the washing machine or the rinsing cycle of the dishwasher affects the maximum power consumption. On the other hand, clusters characterized by low power value (e.g., 300 W) have lower variability, since deviation from the centroid is mainly caused by intermediate working stages of the appliance, and they do not depend on the usage.

Figure 4.8 shows an example of the inference procedure conducted on the active power signal only, denoted as P a, and on the joint active–reactive power signals, denoted as (P a, P r). The signals are related to the washing machine in the AMPds dataset. In particular, Fig. 4.8a shows the active power signal and the cluster membership of each sample when k-means operates on the P a signal only. Figure 4.8b, c show, respectively, the same active power signal and the reactive power signal, but with the cluster membership resulting from k-means operating on the joint (P a, P r). Figure 4.8d shows at the bottom the 1-D P a line with the clusters obtained when k-means operates on the P a signal only, and at the top the (P a, P r) plane with the clusters obtained when k-means operates on the joint (P a, P r) signals. In the figure, each cluster is depicted as an interval or as an ellipse centred at its centroid, whose size is twice the standard deviation of the cluster. The number of clusters differs between the two cases: with the active power only, 4 clusters can be identified, whereas the addition of the reactive power allows 5 clusters to be distinguished. As shown in the figure, 2 clusters share the same active power value but differ in the reactive component. Using the reactive power thus provides a better representation of the working states of the appliance, reducing the admissible combinations of working states in the aggregated data.

Fig. 4.8
figure 8figure 8

Washing machine footprint and clusters in the dataset AMPds. (a) Footprint (P a) and cluster membership of each sample with k-means operating on P a. (b) Footprint (P a) and related clusters with k-means operating on (P a, P r). (c) Footprint (P r) and related clusters with k-means operating on (P a, P r). (d) Clusters in the (P a, P r) plane (above) and the P a line (below)

Since the pause interval between two footprints is not recorded, the user has to establish the time interval between two appliance activations, e.g., the typical time of use during the day or the number of activations per day, in order to calculate the OFF interval and use this value for the calculation of the transition probability related to the OFF state.
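Assuming the HMM's implicit geometric dwell-time model, the OFF-state self-transition probability can be derived from the user-supplied mean OFF interval (a sketch with my own function name and an assumed sampling period):

```python
def off_state_self_transition(mean_off_minutes, sample_period_s=60):
    """Under the HMM's geometric dwell-time model, a state with
    self-transition probability p is occupied for 1 / (1 - p) samples on
    average, so p = 1 - 1/D for an expected OFF interval of D samples.
    The mean OFF interval is the user-supplied usage statistic."""
    D = mean_off_minutes * 60.0 / sample_period_s
    return 1.0 - 1.0 / D

# One activation per hour at a 1-minute sampling period: D = 60 samples.
p_off = off_state_self_transition(mean_off_minutes=60, sample_period_s=60)
```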

4.1.2 Rest-of-the-World Model

In a real case scenario, a noise contribution can be observed on the aggregated signal, due to electrical noise in the system, very small loads, or leakages. This contribution can be considered as a source of power consumption in addition to the appliances which the system tries to disaggregate; therefore, it can be modelled with an HMM, as described in Sect. 4.1.1, leading to a noise model or Rest-of-the-World (RoW) model. The number of working states is a parameter which depends on the application scenario, and therefore has to be explored in the experimental phase; nevertheless, it should be greater than the number of states defined for the appliances, since the model represents a set of multiple loads working at the same time. The data used for training this model can be extracted by observing the aggregate power signal when all the appliances of interest are switched off and all the remaining equipment in the house is working.

Referring to Eq. (2.1), the training signal used to create the RoW model is the residual power consumption, obtained by subtracting the appliances' power consumption from the aggregated data:

$$\displaystyle \begin{aligned} e(t) = \overline{y}(t) - \sum_{i=1}^{N} y^{(i)}(t). \end{aligned} $$
(4.10)
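Equation (4.10) amounts to a one-line subtraction (illustrative function name):

```python
import numpy as np

def row_training_signal(aggregate, appliance_signals):
    """Residual of Eq. (4.10): the aggregated power minus the sum of the
    N known appliance consumption signals."""
    return np.asarray(aggregate, dtype=float) - np.sum(appliance_signals, axis=0)

e = row_training_signal([10.0, 12.0], [[3.0, 4.0], [2.0, 2.0]])
```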

When the dataset comprises always-on appliances, no operating cycle or footprint can be defined; in this case, the RoW model does not include the OFF working state, as shown in Fig. 4.9.

Fig. 4.9
figure 9

The denoised aggregated power and the RoW signal, compared to the main aggregated power, in the AMPds. (a) Noised aggregated power vs denoised aggregated power. (b) Noised aggregated power vs RoW signal

The consumption values of the RoW model working states are extracted algorithmically using k-means, even though no evident consumption clusters, determined by actual working states, are present.

4.2 Algorithm Improvements

In the reference approach, the DFHMM is obtained as the difference, in terms of power consumption, between the current and the previous sample (referred to as backward transition), so that a change in the state of an HMM can be evaluated against the change in the aggregated power consumption. Similarly, an additional evaluation, based on the next sample against the current one (referred to as forward transition), is carried out. Furthermore, a smarter employment of the solver boundaries is evaluated, starting from a more accurate analysis of the aggregated power or using heterogeneous information, such as the reactive power consumption of the electrical system.

Since the AFAMAP algorithm operates offline, it is possible to further extend the model by taking into account the transition from the current to the next state. The original DFHMM [21] is computed by looking backward from the current sample to the previous one, and thus can be referred to as Backward DFHMM. The new differential FHMM is computed by looking forward, as shown in Fig. 4.10, and is thus referred to as Forward DFHMM.

Fig. 4.10
figure 10

The forward differential FHMM

The formulation of the new model differs from the original one only in the index order. The new problem variables are defined as follows:

$$\displaystyle \begin{aligned} \mathcal{Q} = \Big\{ \boldsymbol{Q}(x^{(i)}(t)) \in {\mathbb{R}}^{m_i}, \boldsymbol{Q}(x^{(i)}(t+1),x^{(i)}(t)) \in {\mathbb{R}}^{m_{i} \times m_{i}}\Big\}, \end{aligned}$$

where the variables are indicators of the transition from the next to the current state: Q(x (i)(t))j = 1 ⇔ x (i)(t) = S j, and Q(x (i)(t + 1), x (i)(t))j,k = 1 ⇔ x (i)(t + 1) = S j, x (i)(t) = S k. The consistency constraints between the state variables and the transition variables need to be satisfied:

(4.11)

Therefore, the new cost function is derived from the Forward DFHMM, based on the forward differential aggregated signal \( {\varDelta \overline {y}}_{f}(t) = \overline {y}(t) - \overline {y}(t+1) \), as follows:

$$\displaystyle \begin{aligned} \begin{aligned} & \frac{1}{2 {\sigma_3}^{2}} \sum_{t=1}^{T-1} E^{''''}(t) + \frac{1}{2} \sum_{t=1}^{T-1} E^{'''''}(t) \\ &\quad + \sum_{t=1}^{T-1}\sum_{i=1}^{N}\sum_{\substack{j = 1\\ k=1}}^{m_i} \Big\{ Q(x^{(i)}(t+1),x^{(i)}(t))_{j,k} \left(-\log {P_{f}}_{k,j}^{(i)}\right)\Big\} \\ &\quad + \sum_{i=1}^{N}\sum_{j=1}^{m_i} \Big\{Q(x^{(i)}(T))_{j} (-\log {\phi_{f}}_{j}^{(i)})\Big\}, \end{aligned} {} \end{aligned} $$
(4.12)

where the error terms in (4.12) are defined as:

$$\displaystyle \begin{aligned} E^{''''}(t) = \sum_{i=1}^{N} \sum_{\substack{j = 1\\ k=1 \\ k \neq j}}^{m_i} \Big \{ \left( {{\varDelta \overline{y}}_{f}}(t) - \varDelta\mu_{k,j}^{(i)} \right)^{2} Q(x^{(i)}(t+1),x^{(i)}(t))_{j,k} \Big\}, \end{aligned} $$
(4.13)
$$\displaystyle \begin{aligned} E^{'''''}(t) = D\left( \frac{ {{\varDelta \overline{y}}_{f}}(t)}{\sigma_3}, \lambda\right) \Big(1 - \sum_{i=1}^{N} \sum_{\substack{j=1 \\k=1 \\ k \neq j}}^{m_i} Q(x^{(i)}(t+1),x^{(i)}(t))_{j,k}\Big). \end{aligned} $$
(4.14)

The transition matrix \( \boldsymbol {P}_{f}^{(i)} \) represents the probability of state change from the next to the current time instant: this parameter is equivalent to the typical representation of the transition matrix (i.e., the probability of state change from the previous time instant to the current one) evaluated after flipping the signal, thus it can be derived by using the available algorithms for HMM training. The parameter \( \boldsymbol {\phi }_{f}^{(i)} \) represents the final state distribution, that is, the initial state distribution starting from the end of the signal.
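The flipped-signal equivalence suggests a simple way to estimate \( \boldsymbol {P}_{f}^{(i)} \) at the state-sequence level: count transitions on the time-reversed sequence (a sketch with my own function name):

```python
import numpy as np

def forward_transition_matrix(states, m):
    """Estimate P_f by counting transitions on the time-reversed state
    sequence, exploiting the flipped-signal equivalence described above."""
    rev = list(states)[::-1]
    C = np.zeros((m, m))
    for i, j in zip(rev[:-1], rev[1:]):
        C[i, j] += 1
    rows = C.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0
    return C / rows

Pf = forward_transition_matrix([0, 0, 1, 1], m=2)
```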

Due to the duality of the forward and backward representations of the AFHMM (i.e., they are derived from the same observed signal, but in opposite directions), defining the problem using only one of the two versions of the DFHMM leads to the already known performance. Considering both versions of the DFHMM simultaneously, instead, may lead to performance improvements: for this reason, the forward differential cost function (4.12) is added to the reference formulation (4.25), leading to a new optimization problem.

The variable vector v in the QP problem accounts for the new terms, following the same structure introduced in Sect. 4.1:

where the new term ϕ (i)(t) represents the variables for the forward transition.

The introduction of the new variables leads to an alteration of the problem constraints, represented by the parameters A eq and b eq, and the variable boundaries lb and ub. In A eq the constraint about Q(x (i)(t + 1), x (i)(t)) with t = T has to be removed since there is no information about Q(x (i)(t)) at the following time instant, thus falling back to the constraint 0 ⋅ Q(x (i)(t + 1), x (i)(t)) = 0.

In order to solve the optimization problem, the solver needs to evaluate different feasible solutions before finding the optimal one. The values of v that are not compatible with the observed samples can therefore be discarded in advance, restricting the search domain and improving the search efficiency.

To this end, the lower and upper boundaries of the variable v are selected beforehand, in order to prevent the solver from investigating those combinations of states that do not match the value of the aggregated power consumption. The selection method is similar to the one proposed in [89].

If several runs of a single appliance are evaluated, although the same working states are identified, the signature tends to differ from one run to another. For this reason, the appliance power consumption can be modelled as a stochastic process: the output value y (i)(t), relative to a working state x (i)(t) of an appliance, is modelled as a Gaussian variable, described by a mean value and a variance:

(4.15)

Accordingly, the power signal is replaced by a simplified model presenting a constant power consumption, corresponding to the mean value of the working state, with a superimposed noise contribution described by the variance of that working state.

Since the aggregated data \(\overline {y}(t)\) is assumed to correspond to the sum of the power consumption of each appliance, it can be modelled as a Gaussian variable, with mean and variance equal to the sum of the corresponding values of each appliance, under the assumption of statistical independence between the appliances:

(4.16)

This simplified model results in a number of admissible combinations of working states equal to \( \prod _{i=1}^{N}{m_{i}} \). It allows evaluating which combinations of working states fit the power value of each sample of the aggregated data, thus discarding the incompatible ones. The validity interval of each combination is centred at its mean value, and its width is twice the standard deviation. For combinations with similar mean values or large variances, the validity intervals overlap: in those cases, if the power value falls in the overlapping region, both combinations are considered valid.

Based on this observation, it is possible to manipulate the boundaries of the QP problem domain. For instance, if two HMMs are considered, M1 and M2, whose power levels are M1  =  {70, 0} and M2  =  {100, 20, 0}, respectively, the combined power levels are {0, 20, 70, 90, 100, 170}, each one with its own variance. This example is represented in Fig. 4.11. Considering a few different values of aggregated power, e.g., \( \overline {y}(t) = \{20, 95, 140\} \), it can be observed that \( \overline {y}(t) = 20 \) is obtained only by the combination (x (1)(t)  =  S 2, x (2)(t)  =  S 2); therefore, the allowed constraints are defined as:

Fig. 4.11
figure 11

A sketch of the different probability density functions (PDF) for each aggregated power value produced by the combination of all appliances states power levels

If \( \overline {y}(t) = 95 \), the value falls in an overlapping interval belonging to the combinations (x (1)(t)  =  S 2, x (2)(t)  =  S 1) and (x (1)(t)  =  S 1, x (2)(t)  =  S 2); thus, the allowed constraints are defined as:

whereas if \( \overline {y}(t) = 140\), no combination corresponds to the value, thus the boundaries remain at their default values.
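The combination-and-interval check of this example can be sketched as follows (a minimal Python illustration; the state variance values are assumptions for the sake of the example, since the text only specifies the mean power levels, and states are indexed from 0, i.e., index 0 corresponds to S 1):

```python
import itertools

# HMM state power levels of the example: M1 = {70, 0}, M2 = {100, 20, 0}.
# Each state is a (mean, variance) pair; the variances are illustrative.
M1 = [(70.0, 25.0), (0.0, 1.0)]
M2 = [(100.0, 25.0), (20.0, 16.0), (0.0, 1.0)]

def combination_pdfs(models):
    """Enumerate all working-state combinations: under independence, the
    combined mean and variance are the sums over the appliances."""
    combos = []
    for states in itertools.product(*[range(len(m)) for m in models]):
        mean = sum(models[i][s][0] for i, s in enumerate(states))
        var = sum(models[i][s][1] for i, s in enumerate(states))
        combos.append((states, mean, var))
    return combos

def valid_combinations(y, combos):
    """Keep the combinations whose effectiveness interval contains y:
    centred on the mean, total width twice the standard deviation."""
    return [states for states, mean, var in combos
            if abs(y - mean) <= var ** 0.5]

combos = combination_pdfs([M1, M2])
# Combined power levels {0, 20, 70, 90, 100, 170}, as in the text.
print(sorted({mean for _, mean, _ in combos}))
print(valid_combinations(20.0, combos))   # only (S2, S2) -> [(1, 1)]
print(valid_combinations(95.0, combos))   # overlap: both combinations kept
print(valid_combinations(140.0, combos))  # no match -> default boundaries
```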

Clearly, the same process can be applied to bound β (i)(t) and ϕ (i)(t). However, since transitions are related to the steady states, evaluating the steady states is enough to bound both kinds of variables.

Even though disaggregation targets the aggregated power consumption, in most cases the focus is on the active power alone. Nonetheless, given the generality of the AFAMAP algorithm, targeting the aggregated reactive power is also possible. Accordingly, in the present work the application of the AFAMAP algorithm to the aggregated reactive power has been investigated as well, based on the fact that reactive power is a common trait of the power signature of a subset of residential appliances.

In this scenario, the reactive power samples are disaggregated in order to collect additional information about the activity states of the appliances. This information, in turn, is used to further constrain the lower and upper boundaries of the states in the active power disaggregation. Similarly to the active power case, the HMMs are modelled for each appliance starting from its reactive power signature, and the AFAMAP algorithm is run using the aggregated reactive power signal as input.

From basic circuit theory, an electrical load with a reactive component (i.e., an appliance) that has a reactive power consumption greater than 0 is necessarily turned on; therefore, the boundaries of the active power disaggregation problem are assigned as follows:

However, when the reactive power consumption is 0, the active component can be either null or greater than 0, depending on whether the appliance is turned off or only its passive component is working. Therefore, the boundaries of the active power disaggregation problem are set to their default values.
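The boundary assignment described in the last two paragraphs can be sketched as follows (a hypothetical Python illustration; the function name and the per-sample bound representation are assumptions, not the book's implementation):

```python
def active_bounds_from_reactive(reactive_power, num_states):
    """Per-sample bounds on the state variables of one appliance in the
    active power problem, derived from its disaggregated reactive power.

    When the reactive consumption is greater than 0 the appliance must
    be ON, so the OFF state (last index, following the book's
    convention) is excluded by forcing its variable to 0; when it is 0,
    the default [0, 1] bounds are kept, since the appliance may be off
    or only its passive component may be working at that sample."""
    lb, ub = [], []
    for q in reactive_power:
        low, high = [0.0] * num_states, [1.0] * num_states
        if q > 0:
            high[-1] = 0.0  # OFF state not allowed at this sample
        lb.append(low)
        ub.append(high)
    return lb, ub

# Reactive power detected only at the second sample.
lb, ub = active_bounds_from_reactive([0.0, 35.0, 0.0], num_states=3)
print(ub)  # OFF state excluded only at the middle sample
```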

4.2.1 Experimental Setup

The dataset used for the experiments is the Almanac of Minutely Power dataset (AMPds) [58]: it contains recordings of consumption profiles belonging to a single home in Canada over a period of 2 years at 1 min sampling rate. It provides active and reactive power at the appliance level, unlike most datasets, in which only the active power is provided at the appliance level, as described in Sect. 2.3: this information is crucial to test the new approach based on reactive power disaggregation as a constraint. Analysing the contents of the dataset, it can be noticed that the usage of the appliances is homogeneous throughout the entire period; therefore, the experiments are evaluated on 6 months of data, which can be considered representative of the entire dataset.

To create the HMM models of the appliances, the training requires at least one signature per appliance, although multiple signatures lead to a more general model. In the proposed work, a subset of the data, spanning 14 days, has been deemed sufficient to collect all the signatures required to train all the HMMs. The HMMs are trained with the Baum-Welch algorithm, after determining the ground truth states over time: those are obtained through a clustering procedure, in which every cluster represents a power consumption level of the appliance, thus a state of the HMM. This is achieved using the k-means algorithm, in which the number of clusters is imposed in a supervised manner, starting from the knowledge of the operating states of the appliance. The power level mean and variance values are obtained by fitting a Gaussian variable to the samples belonging to each cluster. To satisfy the denoised system condition, the aggregated data is synthetically composed by summing the appliance level power signals.
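The clustering and fitting step described above can be sketched as follows (a minimal 1-D Python illustration using Lloyd's k-means; the book's experiments use Matlab, and the signature values here are toy data):

```python
import statistics

def fit_state_models(signature, centroids, iters=20):
    """Estimate per-state (mean, variance) pairs from an appliance
    signature: 1-D Lloyd's k-means seeded with one centroid per known
    operating state (supervised cluster count), followed by a Gaussian
    fit on each cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in signature:
            j = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            clusters[j].append(x)
        centroids = [statistics.fmean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # Gaussian fit per cluster (assumes every cluster is non-empty).
    return [(statistics.fmean(c), statistics.pvariance(c)) for c in clusters]

# Toy signature of a 2-state appliance: OFF around 0 W, ON around 70 W.
signature = [0, 1, 0, 68, 71, 70, 69, 0, 1]
states = fit_state_models(signature, centroids=[0.0, 70.0])
print(states)  # approx. (0.4, 0.24) for OFF and (69.5, 1.25) for ON
```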
The experiments are conducted using the appliances with the highest contribution; therefore, 6 appliances have been chosen: dryer, washing machine, dishwasher, fridge, oven and heat pump. The simulations are conducted in the Matlab environment and the CPLEX solver is used to solve the QP problem. The starting probability \( \boldsymbol {\phi }^{(i)}_{b} \) of the i-th HMM is imposed as the certainty of the OFF state for f = 1, whereas for the consecutive windows, 1 < f ≤ F, it is imposed as the value of the last sample ξ (i)(T) of the previous window, in order to ensure the contiguity of the solution at the window border. The ending probability \( \boldsymbol {\phi }^{(i)}_{f} \), instead, is imposed uniformly over the states, since no information from the consecutive window is available. Different experiments are conducted varying the window size between the values T ∈{10, 30, 60, 90, 120} min, and the effectiveness of each innovative aspect is evaluated: the introduction of the forward term in the cost function, and the selection of the boundaries related to the aggregated power level and to the disaggregation output of the reactive power. The variance parameters are set to σ 1 2  =  σ 2 2  =  σ 3 2  =  0.01, according to the variance of the experimental data, and the regularization parameter to λ = 1.

4.2.2 Results

The results of the experiments, based on the scenario described in Sect. 4.2.1, are presented in the current section.

In Fig. 4.12, the AFAMAP disaggregated power consumption profiles of the appliances are compared against the corresponding true outputs, provided by the dataset: in the figure a time span of 10 h, corresponding to 600 samples, is considered. At the bottom, the energy distribution over the same period, expressed among the appliances in percent terms, is compared between the reconstructed and the true appliance consumption.

Fig. 4.12
figure 12

Appliances consumption: estimated AFAMAP disaggregation output against original signals

The signals reveal that appliances with a high steady power consumption are easily recognized, whereas appliances with complex working cycles, or with several power levels, are more difficult to detect. Indeed, whenever several appliances present similar consumption levels, many combinations may satisfy the problem constraints, thus additional information is required to identify the active appliances. For instance, in Fig. 4.12, the oven and the fridge are seldom recognized, whereas the detection of the dryer and the washing machine is partially more successful.

The algorithm performance is evaluated by means of the metrics proposed in Sect. 2.4. Although the focus of the present work is on the AFAMAP algorithm, the dataset used and the proposed training method differ from those in [21]; therefore, a direct comparison against the results proposed in the reference work is not possible. To overcome this shortcoming, the baseline has been created anew, by means of the AFAMAP algorithm, the AMPds dataset and the proposed training method.

The disaggregation results computed by means of the metrics are reported in Fig. 4.13: in Fig. 4.13a the state based metric is presented, whereas the energy based metric is shown in Fig. 4.13b. The results are shown for different values of the time window length. Since all the results exceed 0.5, the plots are drawn from 0.5 onwards.

Fig. 4.13
figure 13

Disaggregation performance on AMPds dataset using 6 appliances, with different algorithm configuration. (a) State based metric: \( F_1^{(S)} \). (b) Energy based metric: \( F_1^{(E)} \)

Both plots show that the best results are achieved using the shortest time window. However, not every configuration improves in the same way.

Focusing on the state based metric, it can be observed that the AFAMAP baseline shows a significant performance improvement as the window length decreases, except when passing from the 30 to the 10 min window size. On the contrary, the forward differential model shows an improvement at the shortest window size, resulting in the best performance among the unbounded problem solutions, with an \( F_1^{(S)} \) of 0.738 and an improvement of 1% with respect to the baseline.

Fixing the boundaries of the problem benefits the disaggregation results in every simulation case: the profile based method gives a considerable performance improvement at every window size, but the highest relative improvement is observed at the smallest size, resulting in an \( F_1^{(S)} \) of 0.863 and a relative improvement of 18%.

Alternatively, the boundaries can be set based on the reactive power disaggregation feedback: the results, shown in Table 4.2, demonstrate that the reactive power reaches high disaggregation performance. This is due to the large differences in the reactive components of the appliances, which lead to strongly distinct HMMs and therefore to a highly reliable disaggregation. Using this information results in a performance improvement for every window size, most considerable at the smallest one: in general, the reactive power feedback benefits the disaggregation, with an \( F_1^{(S)} \) of 0.802 and a relative improvement of 9.7%, therefore less than with the profile based constraints.

Table 4.2 Disaggregation results on reactive power

The same trends presented for the state based metric still hold true when evaluating with the energy based metric. The most notable difference between the two plots is the rate of improvement of the algorithms as the time window length decreases: the introduction of the forward differential model results in an \( F_1^{(E)} \) of 0.771 and a relative improvement of 1.2% with respect to the baseline, whereas the profile based setting of the boundaries results in an \( F_1^{(E)} \) of 0.878 with a relative improvement of 15.2%, and the reactive power based method in an \( F_1^{(E)} \) of 0.832 with an improvement of 9.2%.

The forward differential model seems to be beneficial only with the shortest time window: this may be a direct consequence of the alteration of the problem formulation. Indeed, the introduction of additional variables increases the size of the problem and therefore the computational burden, on which the solver demonstrates worse performance, as happens for the baseline approach with larger window sizes.

Moreover, the improvements achievable by adding the differential forward information to the model are restricted by the application scenario: since the algorithm operates on a per-sample basis, two state changes of an appliance are unlikely to happen across three contiguous samples, thus the forward difference cannot provide substantial support to the inference of the actual working state.

The errors in the disaggregation phase are caused by the multiplicity of state combinations that can correspond to the same value of the aggregated data: for this reason, the use of boundaries allows excluding some non-eligible solutions, thus helping the solver find the exact solution to the problem. Nevertheless, the variation over time of the power consumption associated with a specific appliance working state causes an unwanted variability, i.e., a noise component, in the achieved solution.

4.3 Exploitation of the Reactive Power

In this section, a disaggregation algorithm based on FHMMs and on active and reactive power measured at low sampling rates is proposed. The HMM models of the appliances and the proposed solution for obtaining their parameters from a training dataset are described. Load disaggregation is performed by proposing a reformulated version of the Additive Factorial Approximate Maximum a Posteriori (AFAMAP) algorithm [21] that allows a straightforward extension to the bivariate case. The experimental evaluation has been conducted on the Almanac of Minutely Power dataset (AMPds) [58] in noised and denoised scenarios, and the proposed solution has been compared to AFAMAP based on the active power only and to two variants of Hart’s algorithm [15], both based on active and reactive power. The results show that, in terms of energy based F 1-Measure (\( F_1^{(E)} \)), the proposed approach provides a significant performance improvement with respect to the comparative methods.

Apart from [20], the aforementioned approaches employ the active power as the sole electrical parameter for NILM, although some algorithmic frameworks have been formulated to operate on multidimensional feature vectors [21]. The reactive power has been employed since the very first work by Hart [15] and in more recent works based on the same principles [26,28,29,30,94] or on transient-state analysis [13,31,32,33,34,50]. However, to the best of the authors’ knowledge, the only work that employs both the active and reactive power in the FHMM framework is the one by Zoha and colleagues [20].

Following a similar philosophy, a disaggregation algorithm based on FHMMs that uses both the active and reactive power is proposed. However, differently from [20], where the disaggregation algorithm is based on the structural variational approximation method and on the Viterbi algorithm, in the proposed approach the active power is disaggregated by reformulating the AFAMAP algorithm for the bivariate case. As demonstrated in [21], this allows the introduction of a Differential FHMM (DFHMM) that improves the performance and reduces the computational cost. Thus, differently from [20], here the reactive power component is introduced also in the DFHMM. In more detail, the proposed solution belongs to the family of supervised approaches based on steady-state signals acquired from low frequency measurements. The reactive power is introduced in the FHMM framework by employing bivariate hidden Markov appliance models whose emitted symbols are represented by active and reactive power pairs. Differently from [20], the entire procedure for obtaining the bivariate HMM appliance models is described. The parameters are estimated by clustering the appliance disaggregate signals, and the bivariate optimization problem is solved by proposing an alternative formulation of AFAMAP [21] for disaggregating appliance consumption profiles. The proposed approach differs from the one presented in Sect. 4.2, where the reactive power was employed alone in an initial disaggregation stage whose output served as a constraint for the subsequent disaggregation of the active power only. The proposed approach has been compared to the original AFAMAP algorithm [21], which employs the active power only, and to Hart’s algorithm [15], which employs both the active and reactive power. In order to deal with the occurrence of multiple appliance combinations, two implementations of Hart’s algorithm have been developed: in the first, the final combination is selected randomly.
In the second, it is selected by choosing the most probable combination, calculated on a training set. The experiments have been conducted on the Almanac of Minutely Power dataset (AMPds) [58], containing recordings of consumption profiles belonging to a single home for a period of 2 years at 1 min sampling rate. Both the “noised” and the “denoised” scenarios have been addressed, and the results show that the proposed approach outperforms both AFAMAP and Hart’s algorithm.

Finally, in [20] the experiments are conducted on low-power appliances only, in a “denoised” scenario, while here the “noised” one is also considered.

In the following, the superscript (i) denotes terms related to HMM i, while subscripts a or r denote terms related to the active and reactive power components, respectively. The subscript c ∈{a, r} denotes a term related to the active or to the reactive power component. The parameters of the problem are the following:

  • \( N \in {{\mathbb {Z}}}_{+}\) is the number of HMMs in the system;

  • \( \overline {\boldsymbol {y}}(\tau ) \in {\mathbb {R}}^{n}\) is the observed aggregate output, where τ = 1, 2, …,  Υ is the sample index and Υ is the total number of samples;

  • \(\boldsymbol {\varSigma }_1 \in {\mathbb {R}}^{n \times n}\) is the observation covariance matrix related to the AFHMM;

  • \(\boldsymbol {\varSigma }_2 \in {\mathbb {R}}^{n \times n}\) is the observation covariance matrix related to the DFHMM;

  • \( \varDelta \boldsymbol {\overline {y}}(\tau ) = \boldsymbol {\overline {y}}(\tau ) - \boldsymbol {\overline {y}}(\tau -1) \) is the differential signal.

As aforementioned, all the contributions to the aggregated power are considered, thus:

$$\displaystyle \begin{aligned} \boldsymbol{\overline{y}}(\tau) = \sum_{i=1}^{N} \boldsymbol{{y}}^{(i)}(\tau), \end{aligned} $$
(4.17)

where y (i)(τ) corresponds to the ground truth consumption of the appliances and the noise. Recalling the notation of Chap. 4, the parameters of the i-th HMM at the sample index τ are:

  • \( m_{i} \in {{\mathbb {Z}}}_{+}\) is the number of states;

  • \(x^{(i)}(\tau ) \in \left \{ S_1,\dots , S_{m_{i}} \right \}\) is the HMM state at time instant τ, where \(S_{m_{i}}\) corresponds to the OFF state (if present);

  • \(\boldsymbol {\mu }_j^{(i)}\) is the emitted symbol in the j-th state, where j = 1, 2, …, m i;

  • \(\boldsymbol {\phi }^{(i)} \in [0,1]^{m_{i}}\) is the initial states probability distribution;

  • \(\boldsymbol {P}^{(i)} \in [0,1]^{m_{i} \times m_{i}}\) is the state transition probability matrix.

The aggregate signal \(\boldsymbol {\overline {y}}(\tau ) \) is analysed using non-overlapping frames of length T. Each frame \(\overline {\boldsymbol {y}}_f(\tau )\), where f = 1, 2, …, F, is defined as

$$\displaystyle \begin{aligned} \overline{\boldsymbol{y}}_f(\tau) = \begin{cases} \boldsymbol{\overline{y}}(\tau) & \text{if } \tau = (f-1) T + 1, \ldots, f T, \\ 0 & \text{otherwise}. \end{cases} \end{aligned} $$
(4.18)

After the analysis of all the F =  Υ∕T frames, the disaggregated signals \( \hat {\boldsymbol {y}}^{(i)}(\tau ) \) are reconstructed as follows:

$$\displaystyle \begin{aligned} \hat{\boldsymbol{y}}^{(i)}(\tau) = \sum_{f=1}^F \hat{\boldsymbol{y}}^{(i)}_f(\tau). \end{aligned} $$
(4.19)

In the following, the algorithm is formulated for a single frame of the signal and, for convenience, a new temporal variable t is defined through the relation t = τ − (f − 1)T, for t = 1, 2, …, T, with \(T \in {{\mathbb {Z}}}_{+}\).
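Equations (4.18) and (4.19) amount to splitting the signal into non-overlapping frames and concatenating the per-frame outputs; a minimal sketch:

```python
def split_frames(y, T):
    """Non-overlapping frames of length T, as in Eq. (4.18); assumes
    len(y) is a multiple of T, so that F = len(y) // T."""
    return [y[f * T:(f + 1) * T] for f in range(len(y) // T)]

def reconstruct(frames):
    """Concatenate the per-frame outputs back into one signal, the
    non-overlapping counterpart of the sum in Eq. (4.19)."""
    out = []
    for frame in frames:
        out.extend(frame)
    return out

y = list(range(12))
frames = split_frames(y, T=4)
print(len(frames))               # 3 frames
print(reconstruct(frames) == y)  # True: the framing is lossless
```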

In [21], the parameter n defines the problem dimensionality: the authors use only the active power data to characterize the observed aggregated signal \(\overline {y}_a(t)\), therefore they assume n = 1. In this work, both the active and the reactive power are used for disaggregation, therefore n = 2 and the problem variables are decomposed into two components:

(4.20)
(4.21)

Since statistical independence between the active and reactive power components is assumed, the covariance terms σ a,r and σ r,a are zero in both Σ 1 and Σ 2, and the same problem formalization as in the n = 1 case can be used, introducing additional variables and constraining them to each other. For the generic power component c, the variables in the optimization problem are defined as follows:

$$\displaystyle \begin{aligned} \mathcal{Q}_c = \left\{ \boldsymbol{Q}_c(x^{(i)}(t)) \in {\mathbb{R}}^{m_i}, \boldsymbol{Q}_c(x^{(i)}(t-1),x^{(i)}(t)) \in {\mathbb{R}}^{m_{i} \times m_{i}}\right\}. \end{aligned} $$
(4.22)

In the vector Q c(x (i)(t)), the element Q c(x (i)(t))j indicates the state assumed at time instant t, while in the matrix Q c(x (i)(t − 1), x (i)(t)) the element Q c(x (i)(t − 1), x (i)(t))jk indicates the state transition from the previous to the current time instant.

This problem statement is a reformulated version of the algorithm proposed in [21]: since the original algorithm can operate with a multivariate dimension, the variables associated with the state represent all the components, and when only one dimension is considered the variable set \(\mathcal {Q}_a\) is associated only with the active power level consumption. The present problem statement, instead, starts from the univariate formulation, and the algorithm is extended to n = 2 by duplicating the optimization variables, thus introducing the \(\mathcal {Q}_r\) variable set and an additional minimization function. Moreover, the supplementary variables need to be constrained to the original ones in order to assume the same values during the optimization process, representing the bivariate problem with a univariate problem formalization:

$$\displaystyle \begin{aligned} \begin{cases} Q_a(x^{(i)}(t))_j = Q_r(x^{(i)}(t))_j, \\ Q_a(x^{(i)}(t-1),x^{(i)}(t))_{jk} = Q_r(x^{(i)}(t-1),x^{(i)}(t))_{jk}. \end{cases} \end{aligned} $$
(4.23)

A numerically safer definition of the constraints can be obtained using a tolerance α and inequalities:

$$\displaystyle \begin{aligned} \begin{cases} -\alpha \leq Q_a(x^{(i)}(t))_j - Q_r(x^{(i)}(t))_j \leq \alpha, \\ -\alpha \leq Q_a(x^{(i)}(t-1),x^{(i)}(t))_{jk} - Q_r(x^{(i)}(t-1),x^{(i)}(t))_{jk} \leq \alpha, \end{cases} \end{aligned} $$
(4.24)

where j, k = 1, …, m i.

Algorithm 1 The proposed disaggregation algorithm

The final algorithm is shown in Algorithm 1. In Eq. (4.25), the error terms are defined as:

$$\displaystyle \begin{aligned} E^{\prime}_c(t) &= \left( \overline{y}_{c,f}(t) - \sum_{i=1}^{N} \sum_{j=1}^{m_i} \mu_{c,j}^{(i)} Q_c(x^{(i)}(t))_j \right)^{2}, \end{aligned} $$
(4.27)
$$\displaystyle \begin{aligned} E^{\prime\prime}_c(t) &= \sum_{i=1}^{N} \sum_{\substack{j = 1\\ k=1 \\ k \neq j}}^{m_i} \bigg \{ \left( {{\varDelta \overline{y}}}_{c,f}(t) - {\varDelta\mu}_{c,kj}^{(i)} \right)^{2} Q_{c}(x^{(i)}(t-1),x^{(i)}(t))_{jk} \bigg \}, \end{aligned} $$
(4.28)
$$\displaystyle \begin{aligned} E^{\prime\prime\prime}_{c}(t) &= D\left( \frac{ {{\varDelta \overline{y}}}_{c,f}(t) }{ {\sigma_{c,2}} }, \lambda\right) \left(1 - \sum_{i=1}^{N} \sum_{\substack{j=1 \\k=1 \\ k \neq j}}^{m_i} Q_{c}(x^{(i)}(t-1),x^{(i)}(t))_{jk}\right).\end{aligned} $$
(4.29)
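As an illustration of Eq. (4.27), the first error term can be computed as follows (a minimal sketch for one power component and one time instant; the nested-list representation of the symbols and state variables is an assumption):

```python
def E_prime(y_bar, mu, Q):
    """First error term of Eq. (4.27) for one power component c and one
    time instant: squared residual between the aggregate sample and the
    power explained by the state variables.  mu[i][j] and Q[i][j] are
    the emitted symbol and the state variable of state j of HMM i."""
    explained = sum(mu_i[j] * Q_i[j]
                    for mu_i, Q_i in zip(mu, Q)
                    for j in range(len(mu_i)))
    return (y_bar - explained) ** 2

# One 2-state HMM fully ON at 70 W against an aggregate sample of 75 W.
print(E_prime(75.0, [[70.0, 0.0]], [[1.0, 0.0]]))  # 25.0
```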

The QP optimization problem is defined as follows:

Minimize

$$\displaystyle \begin{aligned} \frac{1}{2} \boldsymbol{v} ^{T} \boldsymbol{H } \boldsymbol{v} + \boldsymbol{f} ^{T} \boldsymbol{v} ,\end{aligned} $$
(4.30)

subject to the constraints:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{A}_{eq} \, \boldsymbol{v} = \boldsymbol{b}_{eq}, \end{array} \end{aligned} $$
(4.31)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{lb} \leq \boldsymbol{v} \leq \boldsymbol{ub}. \end{array} \end{aligned} $$
(4.32)
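A minimal sketch of this QP structure, restricted to the equality constraints of Eq. (4.31) and solved through the KKT linear system (the book uses the CPLEX solver, which also handles the bounds and inequality constraints):

```python
import numpy as np

def solve_eq_qp(H, f, A_eq, b_eq):
    """Solve min 1/2 v^T H v + f^T v subject to A_eq v = b_eq by solving
    the KKT linear system; the bounds of Eq. (4.32) are omitted in this
    sketch."""
    n, m = H.shape[0], A_eq.shape[0]
    kkt = np.block([[H, A_eq.T], [A_eq, np.zeros((m, m))]])
    rhs = np.concatenate([-f, b_eq])
    return np.linalg.solve(kkt, rhs)[:n]

# Toy problem: min 1/2 (v1^2 + v2^2) s.t. v1 + v2 = 1  ->  v = (0.5, 0.5).
v = solve_eq_qp(np.eye(2), np.zeros(2), np.array([[1.0, 1.0]]), np.array([1.0]))
print(np.round(v, 3))  # [0.5 0.5]
```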

The variables of the problem are represented by the vector v = [v a v r]T whose components are defined as follows:

(4.33)
(4.34)

where the variables for the state are represented in ξ (i)(t), and the variables for the transition in β (i)(t).

The parameters of the problem, i.e., the HMM parameters and the aggregated power signal, determine the elements of H and f, according to the structure of the vector v. In a QP problem, the coefficients of the quadratic terms in the cost function are defined in H, a symmetric matrix. In the proposed approach, since independence between the active and reactive power is assumed, there are no joint quadratic terms; therefore, H is structured as follows:

$$\displaystyle \begin{aligned} \boldsymbol{H} = \begin{bmatrix} \boldsymbol{H}_a & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{H}_r \end{bmatrix}. \end{aligned} $$
(4.35)

In contrast, the coefficients of the linear terms are expressed in f = [f a f r]T, whereas A eq and b eq represent the consistency constraints between the state and the transition variables. The vectors lb and ub define the lower and upper boundaries of the solution: because of the nature of the variables [21], the lower boundary is equal to 0 and the upper boundary to 1, for all the elements in v.

Additional constraints to the QP problem need to be considered, in order to impose the inequality constraints between the optimization variables. Duplicating the constraints of Eq. (4.24):

$$\displaystyle \begin{aligned} \begin{cases} -\alpha \leq Q_a(x^{(i)}(t))_j - Q_r(x^{(i)}(t))_j, \\ Q_a(x^{(i)}(t))_j - Q_r(x^{(i)}(t))_j \leq \alpha, \end{cases} \end{aligned} $$
(4.36)
$$\displaystyle \begin{aligned} \begin{cases} -\alpha \leq Q_a(x^{(i)}(t-1),x^{(i)}(t))_{jk} - Q_r(x^{(i)}(t-1),x^{(i)}(t))_{jk}, \\ Q_a(x^{(i)}(t-1),x^{(i)}(t))_{jk} - Q_r(x^{(i)}(t-1),x^{(i)}(t))_{jk} \leq \alpha, \end{cases} \end{aligned} $$
(4.37)

results in the following optimization constraint:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{A}_{ineq} \, \boldsymbol{v} \leq \boldsymbol{b}_{ineq}. \end{array} \end{aligned} $$
(4.38)

This is needed only for the joint active–reactive problem since, when solving only for the active power, the single related variable is not constrained to other variables. Indeed, in Eq. (4.25), only the active power terms need to be considered. Further details on the terms H, f, A eq, b eq, lb, ub, A ineq and b ineq are provided in Sect. 4.3.1.
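The inequality constraints of Eqs. (4.36)-(4.37) can be assembled into A ineq and b ineq as follows (a minimal sketch; the block ordering of the rows is an assumption):

```python
import numpy as np

def tolerance_constraints(l, alpha):
    """Assemble A_ineq and b_ineq encoding -alpha <= v_a - v_r <= alpha
    elementwise for v = [v_a; v_r], each component of dimension l, as
    two one-sided rows per variable pair; the resulting matrix is
    [2l x 2l], consistent with the dimensions listed in Sect. 4.3.1."""
    I = np.eye(l)
    A_ineq = np.vstack([np.hstack([I, -I]),    #  (v_a - v_r) <= alpha
                        np.hstack([-I, I])])   # -(v_a - v_r) <= alpha
    b_ineq = np.full(2 * l, alpha)
    return A_ineq, b_ineq

A, b = tolerance_constraints(l=2, alpha=0.01)
v = np.array([0.3, 0.7, 0.3, 0.7])   # v_a == v_r -> all rows satisfied
print(bool(np.all(A @ v <= b)))      # True
```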

As aforementioned, the aggregate signal is analysed in frames of length T. In the first frame, the starting probability vector is ϕ (i) = [0 0 ⋯ 0 1], i.e., the appliance is initially assumed to be in the OFF state. In the subsequent frames, the value of ϕ (i) depends on the last state assumed in the previous frame, in order to ensure the contiguity of the solution at the border: if the last state assumed in the previous frame is j, the corresponding element of ϕ (i) is set to 1, while the others are set to 0. This information is represented by the value of the solution ξ (i)(t) in the last sample t = T.
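The frame-border handling described above can be sketched as follows (a minimal illustration; taking the most likely state of the possibly fractional relaxed solution is an assumption of this sketch):

```python
def next_frame_phi(xi_last):
    """Starting-state probabilities for the next frame: a one-hot vector
    on the state assumed in the last sample of the previous frame.  In
    the relaxed QP solution xi_last may be fractional, so the most
    likely state is taken."""
    j = max(range(len(xi_last)), key=lambda k: xi_last[k])
    return [1.0 if k == j else 0.0 for k in range(len(xi_last))]

# First frame: certainty of the OFF state (last position, as in the text).
phi = [0.0, 0.0, 1.0]
# Subsequent frame: the previous solution ended mostly in the second state.
print(next_frame_phi([0.1, 0.8, 0.1]))  # [0.0, 1.0, 0.0]
```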

4.3.1 AFAMAP Formulation

This subsection provides further details on the algorithm formulation presented in Sect. 4.3. In particular, the following terms of the QP problem are described: H, f, A eq, b eq, lb, ub, A ineq and b ineq. The matrix H is structured as follows:

(4.39)

where H c ∈{H a, H r} is given by:

(4.40)

and

(4.41)

Regarding the vector f, in Sect. 4.3 it has been defined as follows:

$$\displaystyle \begin{aligned} \boldsymbol{f} = \begin{bmatrix} \boldsymbol{f}_a \\ \boldsymbol{f}_r \end{bmatrix}, \end{aligned} $$
(4.42)

where f c ∈{f a, f r} is given by the sum of five terms:

$$\displaystyle \begin{aligned} \boldsymbol{f}_c = - \boldsymbol{f}_{c,1} - \frac{1}{\sigma^2_{c,1}} \boldsymbol{f}_{c,2} - \boldsymbol{f}_{c,3} + \frac{1}{2} \frac{1}{\sigma^2_{c,2}} \boldsymbol{f}_{c,4} - \frac{1}{2} \boldsymbol{f}_{c,5}, \end{aligned} $$
(4.43)

where

(4.44)
(4.45)
(4.46)
(4.47)
(4.48)
(4.49)
(4.50)
(4.51)

where:

$$\displaystyle \begin{aligned} D(y,\lambda) = \min \left\{\frac{1}{2} y^{2}, \max\left\{ \lambda |y| - \frac{\lambda^{2}}{2}, \frac{\lambda^{2}}{2} \right\} \right\}. \end{aligned} $$
(4.52)
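Equation (4.52) translates directly into code; for λ = 1 the function is quadratic near zero and grows linearly for large |y|:

```python
def D(y, lam):
    """The penalty of Eq. (4.52):
    min{ y^2/2, max{ lam*|y| - lam^2/2, lam^2/2 } }."""
    return min(0.5 * y * y,
               max(lam * abs(y) - 0.5 * lam * lam, 0.5 * lam * lam))

# With lam = 1 (the value used in the experiments of Sect. 4.2.1):
print(D(0.0, 1.0))  # 0.0    quadratic branch
print(D(0.5, 1.0))  # 0.125  quadratic branch
print(D(3.0, 1.0))  # 2.5    linear branch: 3 - 0.5
```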

The matrix A eq is defined as follows:

(4.53)
(4.54)
(4.55)
(4.56)
(4.57)
(4.58)
(4.59)

The vector b eq has the following form:

(4.60)
(4.61)
(4.62)
(4.63)

The matrix A ineq is given by:

(4.64)
(4.65)
(4.66)

Finally, the vector b ineq is given by:

(4.67)

As described in Sect. 4.3, the dimensionality of the variables vector and, accordingly, of each element of the QP problem is defined as follows:

  • v c: l-dimensional vector;

  • H c: [l × l] symmetric matrix;

  • f c: l-dimensional vector;

  • A eq,c: [m × l] matrix;

  • b eq,c: m-dimensional vector;

  • lb, ub: 2l-dimensional vectors;

  • A ineq: [2 l × 2 l] matrix;

  • b ineq: 2 l-dimensional vector;

where \( l = T \cdot \sum _{i=1}^{N}(m_{i} + m_{i}^{2}) \) and \( m = T \cdot \sum _{i=1}^{N}(1 + 2 m_{i}) \).
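These dimension formulas can be checked numerically (a minimal sketch with assumed state counts and window length, not values from the experiments):

```python
def problem_dimensions(state_counts, T):
    """Per-component variable count l and equality-constraint count m of
    the QP, from the formulas at the end of Sect. 4.3.1:
    l = T * sum(m_i + m_i^2),  m = T * sum(1 + 2 m_i)."""
    l = T * sum(m_i + m_i ** 2 for m_i in state_counts)
    m = T * sum(1 + 2 * m_i for m_i in state_counts)
    return l, m

# Two appliances with 2 and 3 states, frames of T = 60 samples.
l, m = problem_dimensions([2, 3], T=60)
print(l, m)  # 1080 720
```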

4.3.2 Experimental Setup

The proposed approach has been compared with the algorithm presented by Hart in [15], since it models the appliance working behaviour using both the active and the reactive power and employs those electrical parameters for disaggregation. This section provides an overview of its basic operating principles as well as additional details on its implementation. In addition, the algorithm originally presented in [15] has been improved to handle the occurrence of multiple solutions by means of a MAP technique.

Hart’s algorithm models each appliance as a Finite State Machine (FSM). Each FSM is represented by the following parameters:

  • the number of states \( m \in {{\mathbb {Z}}}_{+}\);

  • the finite states x ∈{S 1, S 2, …, S m};

  • the symbols emitted \(\boldsymbol {\mu }_j \in {\mathbb {R}}^n\), where j = 1, …, m;

  • the state transition matrix T ∈{0, 1}m×m.

As in the proposed approach, each state of the FSM corresponds to a working state of the appliance and n = 2, i.e., the symbol emitted in the j-th state is defined as μ j = [μ a,j μ r,j]T. A tolerance parameter β j = [β a,j β r,j]T is associated with the emitted symbol in the j-th state, in order to define its effectiveness interval. The interval width is 2 β j and it is centred on μ j. For each appliance, the quantities to be estimated are the number of states m, the values of μ j and β j for each state, and the state transition matrix T.

In order to model the power consumption of an appliance as a stochastic process, under the assumption of multiple independent causes of the circuit power dissipation, the central limit theorem can be invoked. Therefore, the power consumption y (i)(t) of the i-th appliance at time instant t, related to the working state x (i)(t), can be modelled as a bivariate Gaussian variable, described by a mean vector \(\boldsymbol {\mu }_{x^{(i)}(t)} \) and a covariance matrix \(\boldsymbol {\varSigma }_{x^{(i)}(t)}\):

$$\displaystyle \begin{aligned} \boldsymbol{y}^{(i)}(t) \sim \mathcal{N}\left( \boldsymbol{\mu}_{x^{(i)}(t)}, \boldsymbol{\varSigma}_{x^{(i)}(t)} \right). \end{aligned} $$
(4.68)

Following this approach, the consumption signal is replaced by a simplified model with a constant power consumption, corresponding to the mean power value of the working state, plus a superimposed noise contribution described by the variance of that state. Under the assumption of statistical independence between the active and reactive power components, the covariance matrix \(\boldsymbol {\varSigma }_{x^{(i)}(t)}\) is diagonal:

$$\displaystyle \begin{aligned} \boldsymbol{\varSigma}_{x^{(i)}(t)} = \begin{bmatrix} \sigma^{2}_{a,x^{(i)}(t)} & 0 \\ 0 & \sigma^{2}_{r,x^{(i)}(t)} \end{bmatrix}, \end{aligned} $$
(4.69)

where \(\sigma ^{2}_{a,x^{(i)}(t)}\) and \(\sigma ^{2}_{r,x^{(i)}(t)}\) represent the variance of the active and reactive power in the cluster, respectively. The inference procedure is carried out independently for the two components. Therefore, at each state,

$$\displaystyle \begin{aligned} y_c^{(i)}(t) \sim \mathcal{N}\left( \mu_{c,x^{(i)}(t)}, \sigma^{2}_{c,x^{(i)}(t)} \right), \quad c \in \{a, r\}. \end{aligned} $$
(4.70)

The number of states m i is defined in the clustering phase, described in Sect. 4.1.1, assuming that each cluster corresponds to a state of the FSM model: the estimation of the mean and variance values of each component is performed with the Maximum Likelihood criterion on the cluster data. Each component of the tolerance parameter β c,j, associated with the respective component of the emitted symbol μ c,j, is set equal to the standard deviation σ c,j of the Gaussian distribution.

Regarding the state transition matrix T, each entry T ij represents the admissibility of the transition from state i to state j: T ij = 1 if the transition is allowed and T ij = 0 otherwise. These values are inferred from the ground truth state evolution of each appliance consumption. Since this model does not represent the evolution of a signal in time, the permanence in a state is not explicitly modelled; therefore, the entries T ii are set to 1. The diagram of the clustering and model training stages is shown in Fig. 4.14.
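The construction of T from a ground-truth state sequence can be sketched as follows (a minimal illustration; representing states as 0-based indices is an assumption):

```python
def transition_matrix(state_sequence, m):
    """Admissibility matrix of the FSM: entry [i][j] is 1 if the
    transition i -> j is observed in the ground-truth state sequence, or
    if i == j (permanence in a state is always allowed), else 0."""
    T = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    for prev, cur in zip(state_sequence, state_sequence[1:]):
        T[prev][cur] = 1
    return T

# Observed evolution 0 -> 1 -> 1 -> 2 -> 0; the jump 0 -> 2 never occurs.
print(transition_matrix([0, 1, 1, 2, 0], m=3))
```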

Fig. 4.14
figure 14

Block diagram of the clustering and of the model training stages of Hart’s algorithm

Since the aggregated data \({\overline {y}_c}(t)\) is assumed to correspond to the sum of the power consumption of each appliance, it can be modelled as a Gaussian variable whose mean and variance are the sums of the corresponding values of each appliance, under the assumption of statistical independence among the appliances:

$$\displaystyle \begin{aligned} \overline{y}_c(t) \sim \mathcal{N}\left( \sum_{i=1}^{N} \mu_{c,x^{(i)}(t)},\; \sum_{i=1}^{N} \sigma^{2}_{c,x^{(i)}(t)} \right). \end{aligned} $$
(4.71)

This variable represents the Probability Density Function (PDF) of the working state combinations and allows evaluating which combination of working states fits the power value for each sample of the aggregated data. The number of admissible combinations of working states is equal to \( \prod _{i=1}^{N}{m_{i}} \).

Following the same rule defined for each appliance symbol, the effectiveness interval for each combination is centred on the mean value, and its width is twice the standard deviation. The effectiveness intervals of combinations with similar mean values or large variances overlap: in those cases, if the power value falls in the overlapping region, both combinations are considered valid.

The aggregate power data is analysed sample by sample: for each value, the effectiveness intervals in which the sample falls are selected. The related state combination may or may not be admissible, depending on the previously selected state combination. Therefore, for each FSM, given the previously selected state, the admissibility of the transition is evaluated through the transition matrix T ij: the FSMs whose state does not change with respect to the previous combination are not evaluated; then, if the transition is not admissible for at least one FSM, the candidate combination is discarded. The starting combination is evaluated on the first sample, without evaluating the transition from any previous state. If no combination is admissible, or if the aggregated data sample does not fall within any combination interval, the previous state is maintained for each FSM.
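The sample-by-sample tracking can be sketched as a small loop; this is a simplified, single-component reading of the procedure with illustrative names, assuming each combination is stored as (mean, std, per-FSM states) and one admissibility matrix per FSM. Ties between overlapping candidates are resolved here by taking the first admissible one, which is a simplification of the disambiguation strategies discussed below.

```python
def track_states(samples, combos, start_states, T_list):
    """Sample-by-sample state tracking (Hart-style matching, sketch).
    combos: list of (mean, std, states) where states is a tuple with the
    state index of each FSM; T_list: one admissibility matrix per FSM.
    If no candidate is admissible, the previous states are maintained."""
    current = list(start_states)
    history = []
    for t, y in enumerate(samples):
        # combinations whose effectiveness interval contains the sample
        candidates = [c for c in combos if abs(y - c[0]) <= c[1]]
        chosen = None
        for mean, std, states in candidates:
            # first sample: no transition check; otherwise every FSM must
            # perform an admissible transition (T[i][i] = 1 covers no change)
            if t == 0 or all(T[p][n] for T, p, n in zip(T_list, current, states)):
                chosen = states
                break
        if chosen is not None:
            current = list(chosen)
        history.append(tuple(current))
    return history

# Hypothetical single two-state FSM: OFF at ~0 W, ON at ~120 W
combos = [(0.0, 5.0, (0,)), (120.0, 10.0, (1,))]
T_fridge = [[1, 1], [1, 1]]
history = track_states([0.0, 121.0, 0.0], combos, (0,), [T_fridge])
```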

In this way, the time series of the state evolution is reconstructed for each FSM. The disaggregation then consists in assigning to each state of the FSM the related power consumption level, thus reconstructing the power consumption profile of each appliance. The general scheme of the disaggregation phase is shown in Fig. 4.15.

Fig. 4.15

Diagram of the load disaggregation phase

In order to deal with the presence of noise in the aggregated data, an FSM version of the noise model defined in Sect. 4.1.2 is considered, in addition to the FSM models representing the appliances.

In order to make a fair comparison between the algorithms, both kinds of model representing an appliance have the same number of states, power consumption values and standard deviation of the Gaussian variable. The values are summarized in Table 4.3.

Table 4.3 Number of states m i related to each class of appliance

In [15], the author did not describe the technique adopted for dealing with the occurrence of multiple solutions during the disaggregation phase. Two different approaches for dealing with this problem are adopted here. The first consists in assuming that each combination of appliances is equally probable, so the ambiguity is solved by choosing a random combination sampled from a uniform distribution. This algorithm will be denoted as “Hart” in the remainder of this book.

The second approach consists in adopting a MAP technique [58]: the posterior probability of each combination is calculated from the training data and multiplied by the Gaussian PDF, resulting in the posterior PDF. The value of the posterior PDF at the aggregate data sample is denoted as the posterior likelihood. The combination with the highest posterior likelihood is then chosen as the most probable combination. This variant of Hart’s algorithm will be denoted as “Hart w/ MAP” in the remainder of this book.
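The MAP disambiguation can be sketched as follows, under the simplifying assumption of a scalar (single-component) Gaussian per combination; the candidate structure and names are illustrative.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian PDF evaluated at the aggregate sample x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_choice(sample, candidates, priors):
    """Resolve ambiguity among valid combinations: the prior probability of
    each combination (estimated on the training data) is multiplied by the
    Gaussian PDF at the sample; the combination with the highest posterior
    likelihood is chosen ('Hart w/ MAP')."""
    best_id, best_post = None, -1.0
    for c in candidates:
        post = priors[c["id"]] * gaussian_pdf(sample, c["mean"], c["var"])
        if post > best_post:
            best_id, best_post = c["id"], post
    return best_id

# Two hypothetical candidates equidistant from the sample: the prior decides
candidates = [{"id": 0, "mean": 100.0, "var": 25.0},
              {"id": 1, "mean": 104.0, "var": 25.0}]
priors = {0: 0.9, 1: 0.1}
chosen = map_choice(102.0, candidates, priors)
```

In the plain “Hart” variant, the same ambiguity would instead be resolved with `random.choice(candidates)`.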

The general scheme of the disaggregation phase is shown in Fig. 4.16. The algorithm is based on the work proposed by Kolter and Jaakkola [21], where the problem is modelled in the Additive Factorial Hidden Markov Model (AFHMM) framework.

Fig. 4.16

Diagram of the load disaggregation phase

Basically, this consists in modelling the value of each aggregated power sample as a combination of the working states of the appliances. In [21], the assumption is made that at most one HMM changes its state at any given time, which holds true if the sampling time is reasonably short. In this case, a transition in the aggregate power, when moving from one sample to the next, corresponds to a state change of a particular HMM. As a consequence, the differential signal can be modelled as the result of a Differential Factorial Hidden Markov Model (DFHMM), which relies on the same HMMs comprising the AFHMM. The DFHMM models the observed output as the difference between the state combinations of the HMMs in two consecutive time instants. By combining the additive and differential models, the inference on the set of states of multiple HMMs can be computed through the Maximum A Posteriori (MAP) algorithm, which takes the form of a Mixed Integer Quadratic Programming (MIQP) optimization problem. One of the shortcomings of this approach is the non-convexity of the problem, due to the integer nature of the variables: therefore, a relaxation towards real values is adopted, which allows the solution variables to assume any value in the range [0, 1] instead of binary values, leading to a convex Quadratic Programming (QP) optimization problem.

In a real case scenario, the modelled output may not match the observed aggregated signal, due to electrical noise, very small loads or leakages. This issue is addressed by defining a robust mixture component both in the AFHMM and in the DFHMM. This component is not used in this book, since all the contributions to the aggregated power are modelled: indeed, each appliance, as well as the noise, is represented by its own HMM.

The dataset used for the experiments is the Almanac of Minutely Power dataset (AMPds) [58]: it contains recordings of consumption profiles belonging to a single home in Canada over a period of 2 years, at a 1 min sampling rate. In addition to the aggregated power consumption, it provides active and reactive power at the appliance level, unlike most datasets, in which the appliance consumption is described by the active power only, as shown in Sect. 2.3: this information is crucial in order to create the appliance models and test the new approach.

The experiments are conducted by using the six appliances which contribute the most to the power consumption: dryer, washing machine, dishwasher, fridge, electric oven and heat pump. Regarding the significance of the reactive components of the appliances taken into consideration, the following values have been extracted from the dataset: (128.25 W, 7.96 VAR) for the fridge; (4545.91 W, 413.75 VAR) and (248.11 W, 408.94 VAR) for the dryer; (909.11 W, 203.44 VAR), (531.10 W, 14.37 VAR), (146.80 W, 3.60 VAR) and (137.54 W, 96.47 VAR) for the washing machine; (753.07 W, 33.31 VAR), (137.96 W, 35.86 VAR) and (14.42 W, 52.55 VAR) for the dishwasher; (3187.67 W, 136.63 VAR), (125.68 W, 121.67 VAR) and (89.54 W, 50.62 VAR) for the electric oven; (1798.83 W, 320.95 VAR) and (37.23 W, 17.03 VAR) for the heat pump. As shown by these values, the appliances evaluated in the experiments have a significant reactive power contribution, which makes them suitable for evaluating the performance of the proposed approach. Analysing the contents of the dataset, the usage of the appliances proves to be homogeneous throughout the entire period; therefore, the experiments are evaluated on 6 months of data, which can be considered representative of the entire dataset. A subset of the data, spanning 14 days, has been considered sufficient to collect all the signatures required to train all the HMMs. This represents the training set in Fig. 4.5b.

Two different scenarios are defined in this work, according to [87]. The noised scenario employs the aggregated power consumption in the dataset as the aggregated signal, and therefore includes the noise term. In this case, the training data used to create the noise model are obtained by subtracting the ground truth consumption signals of the appliances of interest from the aggregated power. In the denoised scenario, instead, the aggregated data are synthetically composed by summing the ground truth appliance power signals in the dataset, resulting in the absence of the noise term.

The proposed approach and Hart’s algorithm are able to disaggregate both the active and the reactive power; however, the performance metrics have been calculated on the active power only, in order to allow a comparison with the univariate formulation of AFAMAP. Furthermore, the active power is the physical quantity directly related to the cost in the bill, and therefore the most relevant component to be analysed.

The frame size is set to T = 60 min, an interval sufficiently large to include a complete activation for most of the appliances under study. This value is considered within the windowing operation in Fig. 4.16, where the f-th frame is considered in the disaggregation. For the appliances with a longer activation, this value allows the inclusion of a complete operating subcycle, for which the HMM is still representative. The variance parameters are set to \(\sigma _{c,1}^2=\sigma _{c,2}^2=0.01 \) according to the variance of the experimental data, and the regularization parameter is set to λ = 1.

The algorithm has been implemented in Matlab and the CPLEXFootnote 1 solver has been used to solve the QP problem. The amount of time required to disaggregate a frame of 60 min on a personal computer equipped with an Intel i7 CPU running at 3.3 GHz and 32 GB of RAM is about 30 s. The performance is compared to the univariate formulation of AFAMAP and to Hart’s algorithm presented in Sect. 4.3.2. The tolerance parameter is set to α = 10−6.

Table 4.3 presents the number of states, defined a priori for each class of appliance. Appliances with similar active power consumption values are associated with different reactive power values: this reduces the number of state combinations in the aggregate power when passing from the univariate to the bivariate approach, improving the disaggregation performance.

The number of states in the noise model has been varied in the range {4, 6, 8, 10}, both in the univariate and bivariate approaches, in order to find the best performing model.

4.3.3 Results

In this section, the results of the experiments related to the denoised scenario will be shown. Since the aggregated power signal depends on which and how many appliances are considered, the experiments have been conducted by varying the number of appliances, in order to evaluate the disaggregation performance for different problem complexities. In particular, different test sets have been created, each composed of every combination of N appliances. For each test set, the total number of experiments is \({6}\choose {N}\), with N = 2, …, 5, and the final metrics are calculated by averaging over the individual experiments. Before calculating the final energy based F 1-Measure (\( F_1^{(E)} \)), the Precision (P (E)) and Recall (R (E)) are averaged over the experiments. The final NDE, instead, is the average of the single-experiment values.
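The two averaging rules can be made explicit in a short sketch: Precision and Recall are averaged across the experiments before computing the F1-Measure, whereas the NDE is averaged directly. The dictionary layout and numbers are illustrative.

```python
def overall_metrics(per_experiment):
    """Final metrics over a test set: Precision and Recall are averaged over
    the experiments BEFORE computing F1, whereas NDE is averaged directly."""
    n = len(per_experiment)
    P = sum(e["P"] for e in per_experiment) / n
    R = sum(e["R"] for e in per_experiment) / n
    f1 = 2 * P * R / (P + R)           # F1 of the averaged P and R
    nde = sum(e["NDE"] for e in per_experiment) / n
    return f1, nde

# Two hypothetical experiments of a test set
experiments = [{"P": 0.9, "R": 0.8, "NDE": 0.2},
               {"P": 0.7, "R": 0.6, "NDE": 0.4}]
f1, nde = overall_metrics(experiments)
```

Note that this differs from averaging the per-experiment F1 values, since F1 is not linear in P and R.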

In Fig. 4.17, the disaggregated appliance active power profiles (D) are compared to the corresponding ground truth (GT): in the figure, for each appliance, an adequate time span is considered, in order to evaluate the performance on single or multiple activations. The bottom of the figure compares the appliance contributions to the total energy in the aggregated signal between the disaggregation outputs and the ground truth consumptions. The left side of the figure shows the disaggregation profiles resulting from the univariate formulation of the AFAMAP algorithm, the central part shows the active power component resulting from Hart’s algorithm, and the right side shows the profiles related to the proposed approach (Table 4.4).

Fig. 4.17

Algorithms comparison: AFAMAP vs Hart vs proposed approach. For each algorithm, the disaggregation output (D) is compared against the ground truth (GT) signals

Table 4.4 Performance improvement in the “6 appliances” case study (denoised scenario)

The overall disaggregation results are reported in Fig. 4.18, where the \( F_1^{(E)} \) is reported in Fig. 4.18a and the NDE in Fig. 4.18b. The values are related to Table 4.5, where the absolute improvements of the proposed approach with respect to AFAMAP and Hart’s algorithm are shown. The proposed approach reaches the best performance in each case study, with an \( F_1^{(E)} \) of 87.0 and an NDE equal to 0.209 in the 2 appliances case, and with an \( F_1^{(E)} \) of 69.4 and an NDE equal to 0.347 in the 6 appliances case.

Table 4.5 Comparison of the disaggregation performance for different number of appliances (denoised scenario)
Fig. 4.18

Disaggregation performance on AMPds dataset for all the addressed algorithms. (a) Comparison of the disaggregation performance in terms of \( F_1^{(E)} \) for different number of appliances. (b) Comparison of the disaggregation performance in terms of NDE for different number of appliances

The radar chart in Fig. 4.19 shows the \(F_1^{(E)}\) for each appliance. It refers to the experiment including all 6 appliances, where the area enclosed by each coloured line is proportional to the \(F_1^{(E)}\) of the related algorithm, averaged across the appliances. The values are related to Table 4.4, where the absolute improvements of the proposed approach with respect to AFAMAP and Hart’s algorithm are shown.

Fig. 4.19

Performance in terms of \( F_1^{(E)} \) (%) for the different appliances in the “6 appliances” case study: (a) denoised scenario, (b) noised scenario

As shown in the plots, the appliances presenting a high steady power consumption are easily recognized, whereas the appliances with complex working cycles, or with several power levels, are more difficult to detect. For instance, the dryer, the electric oven and the heat pump are successfully reconstructed, whereas the washing machine, the dishwasher and the fridge are partially misreconstructed. Indeed, in the univariate formulation, whenever several appliances present similar consumption levels, many combinations may satisfy the problem constraints and the algorithm may choose an erroneous solution for the disaggregation. With the proposed bivariate approach, the number of alternative solutions is reduced, due to the additional component constraint to be satisfied, which leads to the correct solution and, consequently, to a better disaggregation of the active power profile. For instance, while the appliances with higher power levels maintain a successful disaggregation, the fridge and the dishwasher improve their correspondence with the ground truth signals. The washing machine partially improves the disaggregation performance in the activation period, but introduces some false energy assignments. The disaggregated profiles of Hart’s method show that the FSM is a modelling technique which allows a better representation of appliances with sharply defined steady states, e.g., the fridge and the heat pump, but a worse representation of appliances with highly variable activity, e.g., the electric oven.

The closer the disaggregated profiles are to the ground truth signals, the better the estimation of the distribution of the energy consumption percentage among the appliances: indeed, for the proposed approach, the consumption distribution has a better correspondence with the ground truth than for the AFAMAP algorithm. For instance, the disaggregated profile of the fridge turns out to be more accurate, which is reflected in an increase of the assigned energy, whereas the dishwasher and the electric oven show false energy assignments during the OFF periods, corresponding to a decrease of the related energy contributions. Regarding the washing machine, some errors are introduced, so the energy assignment is erroneously increased. For the dryer and the heat pump, the energy contributions are maintained, because of the correspondence between the disaggregation performance of the algorithms. In Hart’s method, the improvements on the heat pump and the fridge are reflected in a better correspondence of the energy contributions, but the absence of a constraint between the aggregate power and the sum of the disaggregated profiles leads to an unassigned percentage of the total energy (represented as the grey portion).

Regarding the performance on the individual appliances, the major improvements with respect to AFAMAP are observed for the electric oven, the fridge and the dishwasher, with a relative increase of the \(F_1^{(E)}\) of + 88.2%, + 65.9% and + 28.6%, and a variation in the NDE of − 0.192, − 0.143 and − 0.239, respectively. This is due to a more accurate correspondence between the disaggregated output and the ground truth, as already shown in the disaggregation output plots. On the contrary, the performance is almost unchanged for the washing machine, the dryer and the heat pump. With respect to Hart’s algorithm, the proposed approach additionally shows a high improvement for the dryer, with an increase of \(F_1^{(E)}\) equal to + 64.3% and a variation in the NDE of − 0.569, whereas it shows a substantial loss for the fridge, with a decrease of \(F_1^{(E)}\) equal to − 22.9% and a variation in the NDE of + 0.065. This demonstrates that the HMM modelling proves more effective with a higher number of states. Since moving from the univariate to the bivariate model leads to a greater number of states, this also demonstrates the effectiveness of the proposed approach. Compared to Hart’s algorithm with the MAP stage, the gains on the individual appliances are reduced, particularly for the dishwasher and the dryer, with a variation of \(F_1^{(E)}\) equal to − 0.7% and + 3.9% and a variation in the NDE of − 0.322 and − 0.252, up to the heat pump, where a loss of performance is shown, with a variation in the \(F_1^{(E)}\) of − 15.8% and a variation in the NDE of − 0.034. The washing machine remains the appliance with the worst disaggregation performance: the reason is the model complexity, since it is the appliance with the highest number of states, both in the univariate and bivariate representations.
Observing the radar chart, the area enclosed by the curve related to the proposed approach is increased with respect to AFAMAP and Hart’s algorithm, resulting in an average performance improvement, and it is slightly larger than that of the Hart’s algorithm version with the MAP stage. The average performance of the system increases, resulting in a relative improvement of \(F_1^{(E)}\) equal to + 14.9%, + 21.8% and + 2.5%, and a variation in the NDE of − 0.024, − 0.552 and − 0.194 with respect to AFAMAP, Hart’s algorithm and the version with the MAP stage, respectively.
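Since the comparisons above mix two conventions, a short sketch may help keep them apart: F1 improvements are reported as relative percentages, while NDE changes are absolute differences (lower NDE is better). The numbers below are illustrative, not taken from the tables.

```python
def f1_relative_improvement(f1_new, f1_old):
    """Relative F1 improvement, in percent (e.g. 50 -> 60 gives +20%)."""
    return (f1_new - f1_old) / f1_old * 100.0

def nde_variation(nde_new, nde_old):
    """NDE variation, reported as an absolute difference (negative = better)."""
    return nde_new - nde_old

# Hypothetical values: F1 rising from 50.0 to 60.0, NDE dropping 0.50 -> 0.35
rel_f1 = f1_relative_improvement(60.0, 50.0)   # +20.0 %
d_nde = nde_variation(0.35, 0.50)              # -0.15
```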

Concerning the experiments with different numbers of appliances, the results show that, lowering the number of appliances, the performance improves for the FHMM-based algorithms, while for Hart’s algorithm it reaches a peak with 4 appliances, after which it decreases. Regarding the version of Hart’s algorithm with the MAP stage, the performance decreases gradually with a lower number of appliances.

Compared to AFAMAP and to Hart’s algorithm, the proposed approach provides a significant performance improvement even when the problem complexity is minimal, i.e., when the number of appliances is 2. The highest absolute increase over AFAMAP occurs with 6 appliances, and it decreases as the complexity of the problem is lowered: this demonstrates that the proposed approach resolves more ambiguities in the NILM solution when the number of combinations of working states is higher.

Regardless of the number of appliances, the performance of Hart’s algorithm is lower compared to the proposed approach, because of the lower descriptive capability of the FSM appliance model with respect to the HMM one. The comparative evaluation with the Hart’s version with the MAP stage proves that, even if this approach exploits information on the most probable solution in case of ambiguity, which is an ideal condition, the proposed approach reaches better performance. Furthermore, the proposed algorithm provides an optimal solution over a frame of T samples, which takes into account both the short-term and long-term dependencies of the signal. This differs from Hart’s algorithm, which finds the solution by processing the aggregate signal sample by sample. For this method, the performance decreases as the number of appliances is reduced: a possible motivation is that the MAP stage of Hart’s algorithm chooses the solution with the highest probability, which however turns out to be incorrect in the majority of the experiments, especially with few combinations.

In this section, the results of the experiments related to the noised scenario will be shown. Differently from the denoised scenario, the aggregated power signal does not vary with the appliances considered; therefore, only the results with all the appliances will be shown. Regarding the number of states of the noise model, the experiments demonstrated that, for each approach, the best value is 4, except for Hart’s algorithm with the MAP stage, for which the best results are reached with 10 states. For the sake of conciseness, only the results for the best configuration will be reported in this section.

The overall disaggregation results are reported in the last column of Fig. 4.18, in order to make a comparative evaluation with the denoised scenario. The values are related to the Overall column of Table 4.6, where the absolute improvements of the proposed approach with respect to AFAMAP and Hart’s algorithm are shown. The proposed approach reaches the best overall performance, with an \(F_1^{(E)}\) of 54.1 and an NDE equal to 0.504, despite the Hart’s algorithm version with the MAP stage showing a lower NDE value. This discordance will be motivated in the analysis. The radar chart in Fig. 4.19b shows the \(F_1^{(E)}\) for each appliance. The values are related to Table 4.6.

Table 4.6 Appliances performance improvement in the “6 appliances” case study (noised scenario)

Differently from the denoised scenario, the major improvement with respect to AFAMAP is observed for the dryer, with an \(F_1^{(E)}\) relative improvement of + 35.9% and a variation in the NDE of − 0.033, whereas the improvements are reduced for the remaining appliances. This proves the effectiveness of the transition from the univariate to the bivariate formulation of the problem, even in the presence of noise.

With respect to Hart’s algorithm, the proposed approach shows a higher improvement for the dryer, the dishwasher and the heat pump, with an improvement of + 131.7%, + 228.2% and + 71.7%, and a variation in the NDE of − 0.610, − 0.884 and − 0.462. Differently, Hart’s algorithm with the MAP stage achieves a higher \(F_1^{(E)}\): the relative difference of \(F_1^{(E)}\) for the heat pump, the electric oven and the dryer is − 19.4%, − 12.2% and − 6.8%, while in terms of NDE the difference is + 0.036, − 0.044 and + 0.019. This demonstrates that the HMM modelling leads to performance improvements with respect to the FSM modelling even in the presence of noise, but considering the MAP stage this improvement is substantially reduced. The washing machine is still the appliance with the worst disaggregation performance, following the trend of the denoised scenario. Observing the radar chart, the area enclosed by the curve related to the proposed approach is increased with respect to AFAMAP and Hart’s algorithm, resulting in an average performance improvement, whereas it is comparable with that of the Hart’s algorithm version with the MAP stage, due to the imbalance among the appliances.

The average performance of the system increases, resulting in an \(F_1^{(E)}\) absolute improvement of + 25.5%, + 51.1% and + 6.7%, and a variation in the NDE of − 0.155, − 0.533 and + 0.040 with respect to AFAMAP, Hart’s algorithm and the version with the MAP stage, respectively.

Comparing these results to those of the denoised scenario, the overall performance is lower, due to the introduction of the noise contribution in the aggregated power, except for Hart’s algorithm with the MAP stage: although the \(F_1^{(E)}\) shows a degradation of performance, the NDE decreases, meaning that this version of the algorithm maintains the trend shown as the number of appliances increases. In fact, the noised scenario can be seen as the denoised scenario with the noise model added to the appliance models; the MAP stage therefore introduces additional advantages, leading to a performance improvement. The MAP stage exploits additional information which is not available within the AFHMM, and it represents an almost ideal FSM based case study.

4.4 Footprint Extraction Procedure

Among the different NILM approaches, the supervised ones reach better performance [52, 55], that is, the resulting disaggregated signals have a better correspondence with the true appliance energy consumption. Therefore, those methods prove to be more reliable for the final user.

The supervised section of a NILM algorithm corresponds to the appliance modelling stage, as shown in Fig. 4.20b, where the training phase is carried out. A model is created starting from the appliance level consumption (i.e., the training set), in order to represent each appliance in a parametric way, and its parameters are used in the NILM algorithm to disaggregate the portion of the aggregated power consumption related to each appliance, as represented in Fig. 4.20c.

Fig. 4.20

The supervised NILM chain. (a) The footprint extraction stage. (b) The appliance modelling stage. (c) The disaggregation algorithm stage

The power consumption profile of an appliance can be depicted as the repetition of a working cycle, alternated with time intervals when the appliance is turned off. The repetition rate, related to the length of the off-intervals, depends on the user’s consumption habits.

Therefore, in order to analyse the consumption features of an appliance, it is sufficient to extract the working cycle from the appliance level consumption, defined as the footprint, and to exploit it as the training set in the appliance modelling stage.

This stage of the supervised NILM chain is named footprint extraction, as shown in Fig. 4.20a.

In the literature, different approaches have been proposed to extract the appliance working cycle features from the aggregated data. An unsupervised method, based on spectral clustering, is proposed in [21]: the most distinct activation occurrences that can be detected in the aggregated power are saved; then, the most similar ones are grouped using the clustering technique. A Bayesian approach is used in [18, 19]: a generic Bayesian model for the appliance category is defined; then, it is fitted on the activations within the aggregated power, using a threshold schema on the likelihood function. Most of those approaches have limitations when appliance activations overlap in the aggregated power, which hinders the extraction phase.

To overcome this, in a real scenario, the user interaction with the system can be considered, in order to improve the reliability of the footprint extraction: in those cases, the user needs a facilitated procedure to determine the appliance activation instants and an easy way to interact with the energy monitoring system. Therefore, in this work a user-aided footprint extraction procedure is proposed.

The easiest way to extract the footprint from the aggregated power is to use the appliance alone, turning off all the other devices in the electrical network, as described in [15]. This approach proves to be the most reliable for the user; thus, it is adopted in the presented work.

The appliance modelling stage employs the footprint in order to represent the appliance consumption behaviour: although several works deal with models for classification, such as SVM, k-NN [36] or deep neural networks [31], the Hidden Markov Model (HMM) is a widespread modelling technique [17, 22, 28], since it is able to represent the behaviour of the appliance through working states and to regulate the transitions with probability values. This representation is close to the real appliance mode of operation, where each working state corresponds to a power consumption value.

In this work, the disaggregation algorithm is based on HMM, in particular the AFAMAP (Additive Factorial Approximate Maximum a Posteriori) algorithm [21] is used.

The unavailability of the appliance level consumption for extracting the footprint represents one of the main issues in the supervised NILM approach. In real scenarios, only the aggregated power consumption is available to the user. Therefore, the footprint extraction stage aims to extract the appliance footprint from the aggregated power: this work investigates the performance of a footprint extraction procedure based on the HMM and the AFAMAP algorithm.

A working cycle of an appliance is the interval between the power on and the power off by the user. In this time interval, the appliance power consumption signal is defined as the footprint. Some examples of footprints taken from the ECO dataset [57] are shown in Fig. 4.21, which reports the power consumption traces recorded from appliances located inside different Swiss households.

Fig. 4.21

Alike and different footprints for the same appliance, in ECO. (a) Dryer, household 1. (b) Dishwasher, household 2

The usage of an appliance differs every time, especially in the case of equipment with different usage modes: e.g., the operating cycles of a washing machine can be set differently each time, or the operation of the dishwasher may vary according to the selected rinsing cycle. The different usage modes of the same appliance result in different footprints, as shown in Fig. 4.21b: the power levels in the two footprints of the dishwasher are the same, but they appear in different orders, which demonstrates that the working states comprising the appliance working cycle are unique, but they are traversed in different orders, depending on the user habits. Therefore, it is necessary to record different occurrences of the appliance footprint, in order to capture the different user habits in the appliance usage.

On the other hand, this aspect is not significant for appliances with a simpler working principle and a less complex circuit composition. In this case, the usage pattern of the appliance does not vary over time, so the footprint appears similar in each occurrence, as shown in Fig. 4.21a: the footprint of the dryer follows the same trend in time, which demonstrates the unique working cycle of the appliance and the unique way the user operates it.

The footprint extraction is a necessary step in supervised NILM algorithms. In this context, the user exploits the aggregated power sensing system. An easy method to record the appliance footprint is to switch off all the appliances in the household and to turn on only the appliance of interest [15]. In this way, the aggregated power consumption corresponds to the appliance one.

The appliance switch on and off events are detected by using a threshold schema on the active power consumption: when the value exceeds a threshold, current is flowing in the circuit and the appliance is turned on, whereas when the value is below it, the appliance is turned off. A threshold of 50 W is a good choice for most datasets; nevertheless, this value depends on the type of appliance and its activation power consumption. The samples between those two events are saved as the power consumption data related to the footprint. Multiple usages of the same appliance define different occurrences of the footprint.
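The threshold schema can be sketched as a simple segmentation routine: consecutive samples above the threshold form one footprint occurrence. The trace values below are illustrative.

```python
def extract_footprints(power, threshold=50.0):
    """Segment an appliance-level power trace into footprints using a
    threshold schema: samples above the threshold mark an ON interval,
    and each ON interval is saved as one footprint occurrence."""
    footprints, current = [], []
    for p in power:
        if p > threshold:
            current.append(p)
        elif current:
            footprints.append(current)   # falling edge: close the occurrence
            current = []
    if current:                          # trace ended while still ON
        footprints.append(current)
    return footprints

# Hypothetical trace with two activations separated by an OFF interval
trace = [0, 2, 80, 120, 110, 3, 0, 95, 90, 1]
fps = extract_footprints(trace)
```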

In a household, not all appliances can be turned off: e.g., the fridge and the freezer have to be continuously powered in order to keep the food inside in safe conditions. As shown in Fig. 4.22a, b, their power consumption is continuous in time, with a periodic working cycle. In this scenario, the aggregated consumption presents a continuous component, resulting from the sum of the fridge and freezer consumptions, as shown in Fig. 4.22c. This signal can be modelled as the consumption of a single model, representing the fridge-freezer combination as a compound appliance.

Fig. 4.22
figure 22

Power consumption of continuously turned on appliances, in ECO. (a) Fridge, household 1. (b) Freezer, household 1. (c) Fridge-freezer combination, household 1

The presence of this component in the aggregated power prevents the acquisition of a clean footprint of the appliance of interest, since all the appliance power signals are summed in the aggregated power. The footprint therefore results corrupted, and a procedure to clean it, i.e., to separate the fridge-freezer consumption from the appliance footprint, is needed.

The fridge-freezer contribution can be recorded from the aggregated power by turning off all the other appliances in the household: in this way, the characterization of the fridge-freezer combination is not affected by noise or by the consumption of other appliances, so the extracted model is highly reliable and accurate.

The steps to be followed are:

  1.

    the consumption of the fridge-freezer combination is recorded, over a span of time long enough to collect sufficient data for the modelling;

  2.

    a corrupted version of the footprint of the appliance of interest is acquired;

  3.

    the extraction procedure is applied to the recorded footprint, using the a priori knowledge of the fridge-freezer model and a generic model of the appliance.

The signal separation process can be interpreted as a disaggregation problem with 2 sources: therefore, the same NILM algorithm, which is normally executed after the footprint extraction and appliance modelling steps, can be exploited for the footprint extraction step as well. In order to obtain the disaggregated traces, the NILM algorithm requires both the model of the fridge-freezer combination and that of the appliance of interest. The first is available, whereas the appliance model is not, because the footprint extraction step precedes the appliance modelling step. Therefore, it is necessary to provide a generic model, which represents the class the appliance of interest belongs to, and which is suitably fitted to the specific appliance features, e.g., using a priori knowledge of the maximum power consumption, in order to represent it as well as possible. This procedure introduces an uncertainty in the appliance modelling stage, which might be a cause of error in the footprint extraction stage.

In this work, the NILM algorithm chosen for the disaggregation step is AFAMAP, proposed by Kolter and Jaakkola [21]: the algorithm requires the HMM of each appliance contributing to the aggregated power signal.
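As a toy illustration of the 2-source interpretation, the following sketch assigns each aggregated sample to the pair of states (one per model) whose summed power level best matches it. This memoryless search ignores the transition probabilities that AFAMAP exploits, so it is only a simplified stand-in for the actual algorithm, with illustrative power levels:

```python
import numpy as np

def separate_two_sources(aggregate, mu_a, mu_b):
    """Memoryless sketch of the 2-source separation step.

    For each aggregated sample, pick the pair of states whose summed
    power level is closest to the observation.  Unlike AFAMAP, no
    temporal (transition) information is used here.
    """
    combos = np.add.outer(np.asarray(mu_a), np.asarray(mu_b))
    out_a, out_b = [], []
    for y in aggregate:
        i, j = np.unravel_index(np.abs(combos - y).argmin(), combos.shape)
        out_a.append(mu_a[i])
        out_b.append(mu_b[j])
    return out_a, out_b

# Fridge-freezer levels vs. a 2-state appliance at 0/2000 W.
agg = [120.0, 2120.0, 90.0]
ff, app = separate_two_sources(agg, [0.0, 90.0, 120.0, 210.0],
                               [0.0, 2000.0])
# ff -> [120.0, 120.0, 90.0], app -> [0.0, 2000.0, 0.0]
```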

From the analysis carried out in Sect. 4.4, the HMMs of both the fridge-freezer combination and the appliance of interest are necessary. The first is obtained from the corresponding recorded consumption, thus it is a highly reliable model: as shown in Fig. 4.22c, it has 4 working states, derived from the composition of the 2 working states of the fridge and of the freezer. The model of the appliance of interest, on the contrary, is not available, since it is derived only after the footprint extraction step. Therefore, a generic HMM is exploited: it is obtained from a reference dataset, under the assumption that all the appliances of the same category behave in the same way when passing from one working state to another, so that the transition probability matrix is the same for each appliance in the category. Furthermore, it is assumed that the number of working states is the same for all the appliances of the same category, since the working cycle of the appliance type is observable in the footprint: therefore, the number of states is defined a priori for the appliance type, as described in Table 4.3. In this approach, the univariate modelling of the appliance consumption behaviour is considered.

For appliances with more than 2 working states, it is assumed that the consumption values are proportional to each other: therefore, the consumption values in the model are scaled based on the nominal (maximum) value, which is given a priori to the algorithm.
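The scaling of a generic model onto a specific appliance can be sketched as follows; the reference levels are illustrative placeholders, not values from the AMPds models:

```python
import numpy as np

def scale_generic_model(reference_levels, nominal_power):
    """Adapt the power levels of a generic appliance model.

    The state power levels of the reference model are rescaled so that
    their maximum matches the a priori nominal power of the target
    appliance, keeping the levels proportional to each other.
    """
    reference_levels = np.asarray(reference_levels, dtype=float)
    return reference_levels * (nominal_power / reference_levels.max())

# Generic 3-state model with levels 0, 500 and 2000 W, adapted to an
# appliance whose nominal consumption is 2400 W.
scaled = scale_generic_model([0.0, 500.0, 2000.0], 2400.0)
# scaled -> [0.0, 600.0, 2400.0]
```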

In this way, the HMM represents the appliance as well as possible, apart from the approximation on the consumption values of the intermediate working states and the approximation on the transition probability matrix.

After the AFAMAP algorithm execution, two disaggregated consumption profiles are obtained: the appliance profile corresponds to the extracted footprint. Starting from this, the HMM representing the appliance is created, which is then used in the disaggregation algorithm to solve the NILM problem.
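Starting from the extracted footprint, a degenerate HMM can be built by quantizing the samples to the state power levels and estimating the transition matrix with the Maximum Likelihood criterion described earlier (counting i-to-j transitions and normalizing by the transitions leaving state i). A minimal sketch with illustrative values:

```python
import numpy as np

def fit_degenerate_hmm(footprint, levels):
    """Build a degenerate HMM from an extracted footprint.

    Each sample is assigned to the nearest state power level; the
    transition matrix is the Maximum Likelihood estimate obtained by
    normalizing transition counts row by row.
    """
    levels = np.asarray(levels, dtype=float)
    # Nearest-level quantization of each sample into a state index.
    states = np.abs(np.subtract.outer(footprint, levels)).argmin(axis=1)
    m = len(levels)
    P = np.zeros((m, m))
    for i, j in zip(states[:-1], states[1:]):
        P[i, j] += 1
    row_sums = P.sum(axis=1, keepdims=True)
    P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
    return states, P

# Footprint samples quantized against a 2-state model (0 W / 100 W).
fp = np.array([0.0, 0.0, 95.0, 102.0, 98.0, 3.0])
states, P = fit_degenerate_hmm(fp, [0.0, 100.0])
# states -> [0, 0, 1, 1, 1, 0]
```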

In order to reach a good generalization in the HMM creation, different occurrences of the appliance footprint are necessary, as described in Sect. 4.4: this mitigates the errors introduced in the footprint extraction phase. A suggested number of occurrences to record is in the order of 10.

In Fig. 4.23 the flowchart of the footprint extraction algorithm is depicted. The diagram is composed of two sections: in the left one, the contribution of the fridge-freezer combination is recorded, from which its HMM is obtained; in the right one, the appliance activations are recorded, to obtain the footprint and the related HMM. This procedure is repeated for each recorded appliance footprint that needs to be extracted.

Fig. 4.23
figure 23

Footprint extraction algorithm flowchart

4.4.1 Experimental Setup

The experiments have been conducted using different datasets: the first one for the generic model extraction, and the second one for testing the footprint extraction algorithm. The disaggregation experiments have been conducted on the latter dataset, to evaluate the effectiveness of the footprint extraction algorithm compared to the use of the true appliance-level consumption for creating the appliance model.

The generic model has been extracted using the AMPds dataset [58]. The experiments on footprint extraction and disaggregation are conducted on the ECO dataset [57], considering households 1 and 2, whose appliances are:

  • household 1: dryer, washing machine;

  • household 2: dishwasher, oven.

The experiments include the fridge-freezer combination, present in each household.

4.4.2 Results

Figure 4.24 shows two examples of extracted footprints, compared to the original ones. In both cases, a good correspondence between the temporal trends can be noticed, which denotes that the model representing the fridge-freezer combination is highly reliable and allows the appliance footprint contribution to be extracted suitably. However, for several portions of the footprint, the correspondence with the power level is not correct: this might be due to the incorrect power levels of the generic model, which are obtained from a scaling operation with respect to the nominal consumption value. Indeed, the error is introduced in the intermediate power levels, while for the maximum power level the correspondence is exact. In the entire process, the uncertainty introduced by the disaggregation algorithm, used to separate the footprint from the consumption of the fridge-freezer combination, needs to be considered as well.

Fig. 4.24
figure 24

Comparison between the true and the extracted footprint for some appliances. (a) Washing machine in ECO, household 1. (b) Dishwasher in ECO, household 2

The experiments have been conducted on a portion of 30 days of the ECO dataset. To evaluate the effectiveness of the footprint extraction procedure, the disaggregation results have been evaluated using:

  • the models created by using the appliance level consumption, available in the dataset (true footprint);

  • the models created by using the extracted footprint, following the procedure described in Sect. 4.4.

The disaggregation results have been evaluated using the Precision (P) and Recall (R) metrics, defined in Sect. 2.4 in the state based and energy based sense. To compare the performance of the entire disaggregation system, the F-score (F 1) metric averaged across the appliances (Overall) has been used.
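For reference, the F-score is the harmonic mean of Precision and Recall; the values in the example below are illustrative, not taken from the tables:

```python
def f1_score(precision, recall):
    """Harmonic mean of Precision and Recall, as used for the
    Overall comparison of the disaggregation system."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# e.g., P = 0.92 and R = 0.88 give F1 ~ 0.8996
example = f1_score(0.92, 0.88)
```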

The parameters used in the AFAMAP algorithm were the same employed in Sect. 4.2. The disaggregation window has been set to T = 60 min.

The disaggregation results are shown in Tables 4.7 and 4.8. For both metrics, the algorithms achieve good performance: the best results are reached in the household 2 experiment, with an \(F_1^{(S)}\) of 0.898 and an \(F_1^{(E)}\) of 0.956. This is due to the relative simplicity of the problem studied in those cases: a disaggregation problem with only 3 appliances, with highly distinguishable power consumption values, proves to be solvable with high accuracy. The experiments in Table 4.8 show a better performance with respect to Table 4.7: the reason lies in the appliance footprints and the resulting HMM composition. Indeed, the second problem is composed of models with a lower number of states (3 states for the dishwasher and 3 states for the oven, with respect to 3 states for the dryer and 4 states for the washing machine), thus the disaggregation problem is simpler to solve, and the overall performance reaches higher values. This trend was already pointed out by the authors of the disaggregation algorithm [21], who show that the higher the number of states of the HMMs, the higher the complexity of the problem definition, and the lower the disaggregation performance due to the more difficult resolution. Regarding the first problem, the fridge-freezer combination has consumption values close to the dryer ones, which leads to an ambiguity during the problem resolution and a lower performance for the whole problem. In general, the appliance with the best performance is the one with the highest power consumption value: the washing machine for the first problem, the oven for the second one.

Table 4.7 Disaggregation performance in ECO, household 1
Table 4.8 Disaggregation performance in ECO, household 2

In both experiments, the results corresponding to the true footprint show higher performance with respect to the extracted footprint ones: this means that the footprint extraction procedure introduces an error in the appliance modelling stage, which results in an error during the disaggregation algorithm resolution. Nevertheless, the results of the extracted footprint experiments show an admissible relative loss in performance: for the household 1 experiment, the relative loss is 3.83% in the state based sense and 2.87% in the energy based sense, while for the household 2 experiment, it is 1.89% in the state based sense and 1.05% in the energy based sense.

In conclusion, the models obtained after the footprint extraction procedure show a good correspondence with the original ones, which means that the footprint extraction is sufficiently reliable. Therefore, the footprint extraction algorithm introduced in this work provides the user with a convenient procedure for modelling the appliance, at the cost of an acceptable loss in disaggregation performance.