8.1 Introduction

In a modern smart building, temperature measurement is a key step in smart temperature management implemented by a cyber-physical system (CPS) [1, 2]. A CPS is a complex, heterogeneous distributed system in which cyber components (e.g., sensors, sink nodes, control centers, and actuators) are seamlessly integrated with and closely interact with physical processes (e.g., temperature) [3]. As shown in Fig. 8.1, the physical world is sensed by the corresponding sensors, and the acquired data are sent to a sink node or control center. After the data are analyzed, the sink node or control center sends instructions to actuators to control the physical world. In a smart building, the in-building temperatures are monitored by several spatially distributed, immovable temperature sensors.

Fig. 8.1
figure 1

CPS in modern smart building

Although semiconductor and micro-electromechanical system technologies have advanced in recent years, in practice sensor outputs contain errors, which are one of the major barriers to the use of sensor networks. There are three main types of error: gain, drift, and noise [4]. Compared with gain and noise, sensor drift is considered to be of vital importance since it has a significantly negative effect on measurement accuracy [5]. Although high-accuracy sensors can be deployed, such sensors are always expensive. As shown in Fig. 8.2a, the temperature sensor AD590JH with ±0.5 °C accuracy is sold at more than ten times the price of the TMP100 with ±2 °C accuracy.

Fig. 8.2
figure 2

(a) Comparison of different temperature sensors; (b) Sensor MCP9509; (c) Sensor LM335A

Sensor drift calibration has been studied extensively in the literature. Without further assumptions, calibration cannot be performed. In [7,8,9], at most one sensor is assumed to have an unknown drift, which is estimated by a Kalman filter. In practice, this assumption is hard to satisfy. Therefore, the calibration problem has naturally been studied as a sparse reconstruction problem, where a sparse set of sensors is assumed to have significant drifts. These drift calibration works mainly depend on a subspace prior, first proposed by Balzano and Nowak to perform calibration when variational sources are over-sampled by sensors [10]. The projection matrix is obtained by singular value decomposition (SVD) [10, 11]. In [11], Wang et al. adopt temporal sparse Bayesian learning (TSBL) [12] to calibrate time-variant and incremental drifts for the sparse set of sensors. However, due to the sparsity assumption, not all sensors can be calibrated. In addition, since the observation matrix is directly determined by drift-free measurements, the method cannot calibrate drifts if signals lie in a time-variant subspace.

Very recently, in order to calibrate all sensors, Ling and Strohmer presented three models, formulated as bilinear inverse problems [13]. However, these models rely heavily on partial information about the sensing matrix. For temperature sensor calibration in a smart building, the sensing matrix depends on the weather, the positions of the sensors, and parameters of the building, e.g., material characteristics, geometry, and equipment power per area [1, 2, 14]. In practice, it is hard to obtain this complex and tedious information. As a result, these models cannot be directly used to calibrate temperature sensors in a smart building.

In this paper, we focus on temperature sensor drift calibration. Several low-cost, low-accuracy sensors are deployed to sense in-building temperatures (see Fig. 8.2b, c). Unlike prior art, we build a sensor spatial correlation model whose coefficients depend only on measurements, and we assume that all sensors have drifts. Our model coefficients are optimally determined by statistically extracting prior information from the drift-free measurement model coefficients and applying maximum-a-posteriori (MAP) estimation. As a result, our proposed sensor drift calibration framework allows the signals to lie in a time-variant subspace. MAP estimation is formulated as a non-convex problem with three hyper-parameters. We propose an alternating-based optimization algorithm to handle the non-convex formulation. Cross-validation and expectation-maximization (EM) with Gibbs sampling are adopted as two alternative methods to determine the hyper-parameters.

Experimental results on benchmarks simulated with EnergyPlus show that, compared with the state-of-the-art method, the proposed framework with EM achieves a better trade-off between accuracy and runtime.

The rest of this paper is organized as follows. In Sect. 8.2, we formulate the sensor drift calibration problem and outline our proposed overall flow. In Sect. 8.3, we build a drift calibration model based on sensor spatial correlation and derive a mathematical formulation with three hyper-parameters. In Sect. 8.4, we propose an efficient method to handle the mathematical formulation. In Sect. 8.5, the three hyper-parameters are determined by cross-validation and by EM with Gibbs sampling, respectively. Section 8.6 presents experimental results with comparison and discussion, followed by the conclusion in Sect. 8.7.

8.2 Preliminary

8.2.1 Problem Formulation

Several low-cost sensors are deployed to sense in-building temperatures. Due to a slow-aging effect, all sensors have unknown time-invariant drifts. As shown in Fig. 8.3, unlike in communication channels [12], the output signal of a sensor, e.g., a current, is contaminated by a time-invariant drift. In order to achieve highly accurate measurements, drifts need to be estimated and calibrated. Specifically, the mean absolute percentage error (MAPE) is used to evaluate drift calibration accuracy.

Fig. 8.3
figure 3

Drift vs. temperature [15]

Based on the above description, we define the sensor drift calibration problem as follows.

Problem 1 (Sensor Drift Calibration)

Given the measurement values sensed by all sensors over several time-instants, accurately estimate and calibrate the drifts.

8.2.2 Overall Flow

The overall flow of our proposed sensor drift calibration is shown in Fig. 8.4, which consists of three parts: model optimization, cross-validation, and EM with Gibbs sampling.

Fig. 8.4
figure 4

The proposed sensor drift calibration flow

Given the drift-free measurement model coefficients and several temperature measurements with drifts as inputs, an alternating-based optimization algorithm is proposed to handle the sensor drift calibration formulation in the model optimization stage. In addition, cross-validation and EM with Gibbs sampling are adopted as two alternatives to induce the hyper-parameters. The proposed sensor drift calibration flow is expected to calibrate sensor drifts accurately.

8.3 Mathematical Formulation

We assume that n sensors are deployed to sense in-building temperatures. For a short time after new sensors are deployed, the drift is assumed to be insignificant. Furthermore, as in [11], we assume all sensors are drift-free during the first \(m_{0}\) time-instants. Due to over-sampling, as illustrated in [10, 11], signals measured by sensors lie in a low-dimensional subspace. Furthermore, in a smart building, the actual temperatures measured by all sensors are highly correlated, e.g., because of the dense deployment of sensors. Therefore, we build a linear model among all actual temperatures as follows:

$$\displaystyle \begin{aligned} x_{i}^{(k)}\approx\sum_{j=1,j\neq i}^{n}a_{i,j}x_{j}^{(k)}+a_{i,0}, \qquad k=1,2,\ldots,m_{0}, {} \end{aligned} $$
(8.1)

where \(x_{i}^{(k)}\) is the ground-truth temperature sensed by the ith sensor at the kth time-instant, and \(a_{i,j}\) is the drift-free model coefficient. We define \(\mathbf{x}=[x_{1}^{(1)},x_{2}^{(1)},\ldots,x_{n}^{(1)},\ldots,x_{n}^{(m_{0})}]^{\top}\), \(\mathbf{a}_{i}=[a_{i,0},\ldots,a_{i,i-1},a_{i,i+1},\ldots,a_{i,n}]^{\top}\in\mathbb{R}^{n}\), and \(\mathbf{a}=[\mathbf{a}_{1}^{\top},\mathbf{a}_{2}^{\top},\ldots,\mathbf{a}_{n}^{\top}]^{\top}\in\mathbb{R}^{n^2}\).
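As a concrete illustration, the drift-free coefficients \(\mathbf{a}_{i}\) of Eq. (8.1) can be fitted by ordinary least squares over the \(m_{0}\) drift-free time-instants. The sketch below is our own (the chapter does not prescribe a particular solver, and all names are illustrative): it forms the normal equations for one sensor and solves them by Gaussian elimination.

```python
def solve_linear(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def fit_drift_free_coeffs(x, i):
    """Least-squares fit of [a_{i,0}, a_{i,j} (j != i)] from Eq. (8.1).
    x[k][j] is the drift-free temperature of sensor j at time-instant k."""
    m0, n = len(x), len(x[0])
    # Design matrix: intercept column followed by all sensors except i.
    rows = [[1.0] + [x[k][j] for j in range(n) if j != i] for k in range(m0)]
    target = [x[k][i] for k in range(m0)]
    p = len(rows[0])
    # Normal equations (R^T R) a = R^T t.
    G = [[sum(rows[k][u] * rows[k][v] for k in range(m0)) for v in range(p)]
         for u in range(p)]
    h = [sum(rows[k][u] * target[k] for k in range(m0)) for u in range(p)]
    return solve_linear(G, h)
```

With exactly linear data, the recovered intercept and coefficients match the generating ones up to floating-point error.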

Due to a slow-aging effect, all sensors have unknown time-invariant drifts. During m time-instants, Eq. (8.1) is naturally extended as

$$\displaystyle \begin{aligned} \hat{x}_{i}^{(k)}+\epsilon_{i}\approx\sum_{j=1,j\neq i}^{n}\hat{a}_{i,j}\left(\hat{x}_{j}^{(k)}+\epsilon_{j}\right)+\hat{a}_{i,0}, \qquad k=1,2,\ldots,m, {} \end{aligned} $$
(8.2)

where \(\hat{x}_{i}^{(k)}\) is the measurement value sensed by the ith sensor at the kth time-instant. In particular, in order to obtain enough information, we assume \(m_{0}, m > n\). For the ith sensor, \(\epsilon_{i}\) is a time-invariant drift calibration, which is independent of the time-instant k. \(\hat{a}_{i,j}\) is the model coefficient when all sensors have unknown time-invariant drifts. We vectorize these variables as \(\hat{\mathbf{x}}=[\hat{x}_{1}^{(1)},\hat{x}_{2}^{(1)},\ldots,\hat{x}_{n}^{(m)}]^{\top}\), \(\hat{\mathbf{a}}_{i}=[\hat{a}_{i,0},\ldots,\hat{a}_{i,i-1},\hat{a}_{i,i+1},\ldots,\hat{a}_{i,n}]^{\top}\in\mathbb{R}^{n}\), \(\hat{\mathbf{a}}=[\hat{\mathbf{a}}_{1}^{\top},\hat{\mathbf{a}}_{2}^{\top},\ldots,\hat{\mathbf{a}}_{n}^{\top}]^{\top}\in\mathbb{R}^{n^2}\), and \(\boldsymbol{\epsilon}=[\epsilon_{1},\epsilon_{2},\ldots,\epsilon_{n}]^{\top}\in\mathbb{R}^{n}\).

Note that Eq. (8.2) is essential in our proposed sensor spatial correlation model. Furthermore, the model error in Eq. (8.2) is assumed to follow an independent and identically distributed (i.i.d.) zero-mean Gaussian distribution with unknown precision (inverse variance) \(\delta_{0}\). Therefore, the likelihood function \(\mathcal{P}(\hat{\mathbf{x}}|\hat{\mathbf{a}},\boldsymbol{\epsilon})\) is defined as follows:

$$\displaystyle \begin{aligned} \mathcal{P}\left(\hat{\mathbf{x}}|\hat{\mathbf{a}},\boldsymbol{\epsilon}\right) \propto \mathrm{exp}\left(-\frac{\delta_{0}}{2}\sum_{i=1}^{n}\sum_{k=1}^{m}\left[\hat{x}_{i}^{(k)}+\epsilon_{i} -\sum_{j=1,j\neq i}^{n}\hat{a}_{i,j}\left(\hat{x}_{j}^{(k)}+\epsilon_{j}\right)-\hat{a}_{i,0}\right]^{2}\right). {} \end{aligned} $$
(8.3)

However, the likelihood function \(\mathcal{P}(\hat{\mathbf{x}}|\hat{\mathbf{a}},\boldsymbol{\epsilon})\) cannot be directly used to calibrate drifts by maximum-likelihood estimation (MLE), since it does not carry enough information. Therefore, we introduce two priors.

For all sensors, the drifts are assumed to follow an i.i.d. zero-mean Gaussian distribution with unknown precision \(\delta_{\boldsymbol{\epsilon}}\):

$$\displaystyle \begin{aligned} \mathcal{P}(\boldsymbol{\epsilon}) \propto \mathrm{exp}\left(-\frac{\delta_{\boldsymbol{\epsilon}}}{2}\sum_{i=1}^{n}\epsilon_{i}^{2}\right). {} \end{aligned} $$
(8.4)

In addition, we assume that each model coefficient \(\hat{a}_{i,j}\) follows an independent Gaussian distribution. Intuitively, \(\hat{a}_{i,j}\) depends strongly on \(a_{i,j}\) in a statistical sense. Furthermore, the probability density function of \(\hat{a}_{i,j}\) is assumed to take its maximum at \(a_{i,j}\); therefore, the prior mean of \(\hat{a}_{i,j}\) is \(a_{i,j}\). In addition, so that each model coefficient \(\hat{a}_{i,j}\) has a relatively equal probability of deviating from the corresponding drift-free model coefficient \(a_{i,j}\), the precision of \(\hat{a}_{i,j}\) is defined to be \(\lambda a_{i,j}^{-2}\), where λ is a nonnegative hyper-parameter that controls the precision. Therefore, each model coefficient \(\hat{a}_{i,j}\) follows an independent Gaussian distribution with mean \(a_{i,j}\) and precision \(\lambda a_{i,j}^{-2}\) [16,17,18]. For all model coefficients, we have

$$\displaystyle \begin{aligned} \begin{aligned} \mathcal{P}(\hat{\mathbf{a}}) \propto\mathrm{exp}\left(-\sum_{i=1}^{n}\sum_{j=0,j\neq i}^{n}\frac{\lambda}{2a_{i,j}^{2}}\left(\hat{a}_{i,j}-a_{i,j}\right)^{2}\right). {} \end{aligned} \end{aligned} $$
(8.5)

In order to calibrate the drifts of all sensors, the posterior \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}})\) needs to be maximized in the MAP estimation manner. According to Bayes' rule, the posterior \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}})\) can be expressed in terms of the two priors and the likelihood function as follows:

$$\displaystyle \begin{aligned} \begin{aligned} \mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}}) \propto\mathcal{P}(\hat{\mathbf{x}}|\hat{\mathbf{a}},\boldsymbol{\epsilon})\cdot\mathcal{P}(\hat{\mathbf{a}})\cdot\mathcal{P}(\boldsymbol{\epsilon}). {} \end{aligned} \end{aligned} $$
(8.6)

Taking the negative logarithm, the MAP estimation can be transformed into the following equivalent minimization:

$$\displaystyle \begin{aligned} \mathop{\mbox{min}}\limits_{\hat{\mathbf{a}},\boldsymbol{\epsilon}} \quad &\delta_{0}\sum_{i=1}^{n}\sum_{k=1}^{m}\left[\hat{x}^{(k)}_{i}+\epsilon_{i}-\sum_{j=1,j\neq i}^{n}\hat{a}_{i,j}\left(\hat{x}^{(k)}_{j}+\epsilon_{j}\right)-\hat{a}_{i,0}\right]^{2} \\ &\quad + \lambda\sum_{i=1}^{n}\sum_{j=0,j\neq i}^{n}\frac{1}{a_{i,j}^{2}}\left(\hat{a}_{i,j}-a_{i,j}\right)^{2}+\delta_{\boldsymbol{\epsilon}}\sum_{i=1}^{n}\epsilon_{i}^{2}. \end{aligned} $$
(8.7)

There are two challenges with Formulation (8.7): how to solve it and how to induce the hyper-parameters λ, \(\delta_{0}\), and \(\delta_{\boldsymbol{\epsilon}}\).
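Formulation (8.7) can be transcribed directly into code, which is useful for checking candidate solutions and monitoring the optimization discussed next. The representation below is our own sketch (hypothetical names): `W[i][j]` plays the role of \(\hat{a}_{i,j}\) with the diagonal ignored, `b[i]` of the intercept \(\hat{a}_{i,0}\), and `W0`, `b0` hold the drift-free coefficients.

```python
def map_objective(xh, eps, W, b, W0, b0, delta0, lam, deltae):
    """Value of Formulation (8.7). xh[k][i] is the measurement of sensor i
    at time-instant k and eps[i] its drift estimate; W[i][i] is unused."""
    m, n = len(xh), len(xh[0])
    fit = 0.0
    for i in range(n):
        for k in range(m):
            pred = b[i] + sum(W[i][j] * (xh[k][j] + eps[j])
                              for j in range(n) if j != i)
            fit += (xh[k][i] + eps[i] - pred) ** 2
    # Prior on coefficients, weighted by the drift-free values (Eq. (8.5)).
    prior_a = sum((W[i][j] - W0[i][j]) ** 2 / W0[i][j] ** 2
                  for i in range(n) for j in range(n) if j != i)
    prior_a += sum((b[i] - b0[i]) ** 2 / b0[i] ** 2 for i in range(n))
    # Prior on drifts (Eq. (8.4)).
    prior_e = sum(e * e for e in eps)
    return delta0 * fit + lam * prior_a + deltae * prior_e
```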

8.4 Alternating-Based Optimization

Formulation (8.7) is a non-convex problem; thus, it is difficult to obtain an optimal solution. In this section, we propose a fast and efficient alternating-based optimization methodology that handles Formulation (8.7) by alternately updating the variables in each iteration.

According to the alternating-based methodology, at each iteration, the values of \(\hat{\mathbf{a}}\) and \(\boldsymbol{\epsilon}\) are updated in turn by optimizing Formulation (8.7) w.r.t. \(\hat{\mathbf{a}}\) and \(\boldsymbol{\epsilon}\), respectively. Note that with the drift calibration variable \(\boldsymbol{\epsilon}\) fixed, Formulation (8.7) w.r.t. \(\hat{\mathbf{a}}\) is a convex unconstrained quadratic programming (QP) problem. In addition, Formulation (8.7) w.r.t. \(\hat{\mathbf{a}}\) can be decomposed into n independent sub-formulations w.r.t. \(\hat{\mathbf{a}}_{i}\) as follows:

$$\displaystyle \begin{aligned} \mathop{\mbox{min}}\limits_{\hat{\mathbf{a}}_{i}} \quad &\delta_{0}\sum_{k=1}^{m}\left[\hat{x}^{(k)}_{i}+\epsilon_{i}-\sum_{j=1,j\neq i}^{n}\hat{a}_{i,j}\left(\hat{x}^{(k)}_{j} +\epsilon_{j}\right)-\hat{a}_{i,0}\right]^{2} \\ &\quad + \lambda\sum_{j=0,j\neq i}^{n}\frac{1}{a_{i,j}^{2}}\left(\hat{a}_{i,j}-a_{i,j}\right)^{2}, \end{aligned} $$
(8.8)

with the first-order optimality condition:

$$\displaystyle \begin{aligned} \delta_{0}\sum_{k=1}^{m}\left(\hat{x}_{t}^{(k)}+\epsilon_{t}\right)\left[\sum_{j=1}^{n}\hat{a}_{i,j}\left(\hat{x}_{j}^{(k)}+\epsilon_{j}\right)+\hat{a}_{i,0}\right]+\lambda\frac{\left(\hat{a}_{i,t}-a_{i,t}\right)}{a_{i,t}^{2}}=0, {} \end{aligned} $$
(8.9)

where t = 0, 1, …, i − 1, i + 1, …, n. In particular, we define \(\hat {a}_{i,i}\triangleq -1\) and \(\hat {x}_{0}^{(k)}+\epsilon _{0}\triangleq 1\). The system of linear equations (8.9) can be solved by Gaussian elimination [19].

In the same manner, with the model coefficients \(\hat{\mathbf{a}}\) fixed, Formulation (8.7) w.r.t. the drift calibration \(\boldsymbol{\epsilon}\) can also be regarded as a convex unconstrained QP problem:

$$\displaystyle \begin{aligned} \mathop{\mbox{min}}\limits_{\boldsymbol{\epsilon}} \quad &\delta_{0}\sum_{i=1}^{n}\sum_{k=1}^{m}\left[\hat{x}^{(k)}_{i}+\epsilon_{i}-\sum_{j=1,j\neq i}^{n}\hat{a}_{i,j}\left(\hat{x}^{(k)}_{j}+\epsilon_{j}\right)-\hat{a}_{i,0}\right]^{2} +\delta_{\boldsymbol{\epsilon}}\sum_{i=1}^{n}\epsilon_{i}^{2}, \end{aligned} $$
(8.10)

with the corresponding first-order optimality condition:

$$\displaystyle \begin{aligned} \delta_{0}\sum_{i=1}^{n}\sum_{k=1}^{m}\left[\hat{a}_{i,t}\left(\sum_{j=1}^{n}\hat{a}_{i,j}\left(\hat{x}_{j}^{(k)}+\epsilon_{j}\right)+\hat{a}_{i,0}\right)\right]+\delta_{\boldsymbol{\epsilon}}\epsilon_{t}=0, {} \end{aligned} $$
(8.11)

where t = 1, 2, …, n.
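To make the \(\boldsymbol{\epsilon}\)-update concrete: the coefficient matrix of the linear system (8.11) is symmetric positive definite, so instead of assembling and factorizing it, one can also sweep Gauss-Seidel updates over the scalar optimality conditions. The sketch below is our own variant under that assumption (names are illustrative; `W[i][j]` stores \(\hat{a}_{i,j}\) with the convention \(\hat{a}_{i,i}=-1\), and `b[i]` stores \(\hat{a}_{i,0}\)).

```python
def update_drift(xh, W, b, delta0, deltae, sweeps=200):
    """Solve the first-order condition (8.11) for eps by Gauss-Seidel sweeps.
    xh[k][j] is the measurement of sensor j at time-instant k."""
    m, n = len(xh), len(xh[0])
    eps = [0.0] * n
    for _ in range(sweeps):
        for t in range(n):
            num = den = 0.0
            for i in range(n):
                c = W[i][t]
                if c == 0.0:
                    continue
                for k in range(m):
                    # Row (i, k) residual with the eps[t] contribution removed.
                    r = b[i] + c * xh[k][t] + sum(
                        W[i][j] * (xh[k][j] + eps[j])
                        for j in range(n) if j != t)
                    num += c * r
                    den += c * c
            # Closed-form scalar solve of (8.11) in eps[t].
            eps[t] = -delta0 * num / (delta0 * den + deltae)
    return eps
```

Because the system matrix is positive definite, the sweeps converge to the same fixed point as a direct solve; a direct Gaussian-elimination solve of (8.11), as in the chapter, is an equally valid choice.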

Algorithm 1 Alternating-based method

A local optimum can be obtained by the proposed alternating-based method, although the convergence speed and solution quality depend on the initialization of the variables. In our proposed framework, two priors are given for the model coefficients \(\hat{\mathbf{a}}\) and the drift calibration \(\boldsymbol{\epsilon}\). Therefore, in order to achieve better convergence speed and solution quality, the prior means \(\mathbf{a}\) and \(\mathbf{0}\) are used to initialize \(\hat{\mathbf{a}}\) and \(\boldsymbol{\epsilon}\). We then update \(\hat{\mathbf{a}}\) and \(\boldsymbol{\epsilon}\) until convergence, i.e., until the relative difference of the drift calibration \(\boldsymbol{\epsilon}\) between the current and previous iterations is less than a threshold. In summary, our proposed alternating-based method is shown in Algorithm 1.

8.5 Estimation of Hyper-Parameters

It is important to determine the aforementioned three hyper-parameters so that drifts can be accurately calibrated while over-fitting is avoided. In this section, cross-validation and EM with Gibbs sampling are presented as two alternative ways to induce the hyper-parameters.

8.5.1 Unsupervised Cross-Validation

Cross-validation is a simple method for selecting hyper-parameters. Although Formulation (8.7) has three hyper-parameters λ, \(\delta_{0}\), and \(\delta_{\boldsymbol{\epsilon}}\), only the two ratios \(\lambda/\delta_{0}\) and \(\delta_{\boldsymbol{\epsilon}}/\delta_{0}\) need to be determined by cross-validation rather than the individual hyper-parameters. We partition the temperature measurements during the m time-instants into s non-overlapping parts. Given each combination of candidate ratios \(\lambda/\delta_{0}\) and \(\delta_{\boldsymbol{\epsilon}}/\delta_{0}\), in each run, one of the s parts is used to estimate the model error and the other s − 1 parts are used to calculate the model coefficients and drift calibration. Each run thus gives a model error \(e_{r}\) (r = 1, 2, …, s) estimated from one part of the temperature measurements. The final model error is computed as the average \(\bar{e}=(e_{1}+e_{2}+\cdots+e_{s})/s\). Then the two ratios corresponding to the minimum average model error are chosen.
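The ratio search described above can be sketched as a generic s-fold loop. Here `calibrate` and `model_error` are placeholders for Algorithm 1 and the residual of Eq. (8.2), respectively; any callables with these signatures will do, and the interleaved fold split is our own simplification.

```python
from itertools import product

def select_ratios(measurements, candidates, calibrate, model_error, s=5):
    """Unsupervised s-fold selection of (lambda/delta0, deltae/delta0).
    `calibrate(train, r1, r2)` returns a fitted model; `model_error(model,
    test)` returns its error on the held-out fold."""
    folds = [measurements[r::s] for r in range(s)]  # s non-overlapping parts
    best, best_err = None, float("inf")
    for r1, r2 in product(candidates, candidates):
        errs = []
        for h in range(s):
            train = [x for g in range(s) if g != h for x in folds[g]]
            model = calibrate(train, r1, r2)
            errs.append(model_error(model, folds[h]))
        avg = sum(errs) / s
        if avg < best_err:
            best, best_err = (r1, r2), avg
    return best
```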

Note that, unlike conventional cross-validation [1, 2, 14, 16,17,18], no golden value of the drift calibration is used in the metric for choosing hyper-parameters during the model fitting stage. Therefore, in our proposed framework, cross-validation is adopted in an unsupervised-learning-like fashion.

Cross-validation is time-consuming since Algorithm 1 has to be performed multiple times. Thus, we propose a faster and more efficient EM algorithm to determine the hyper-parameters in a statistical model.

8.5.2 Monte Carlo Expectation Maximization

In this section, MLE is used to determine the individual hyper-parameters \(\delta_{0}\), λ, and \(\delta_{\boldsymbol{\epsilon}}\). The MLE of the hyper-parameters is formulated as follows:

$$\displaystyle \begin{aligned} \mathop{\mbox{max}}\limits_{\delta_{\boldsymbol{\epsilon}},\delta_{0},\lambda} \quad \mathcal{P}(\hat{\mathbf{x}};\delta_{0},\lambda,\delta_{\boldsymbol{\epsilon}}). {} \end{aligned} $$
(8.12)

However, the likelihood function \(\mathcal{P}(\hat{\mathbf{x}};\delta_{0},\lambda,\delta_{\boldsymbol{\epsilon}})\) is intractable. The EM algorithm is leveraged to efficiently find a solution to Formulation (8.12). According to the EM algorithm, by taking the logarithm, Formulation (8.12) can be transformed into an auxiliary lower-bound function [20]. Then, the auxiliary lower-bound function is optimized by iterating the E-step and the M-step after the term independent of the hyper-parameters is omitted. The detailed derivation can be found in [21]. For convenience, all hyper-parameters are collected as a set Ω.

8.5.2.1 Expectation Step with Gibbs Sampling

In the E-step, the auxiliary lower-bound function can be simplified to a quantity defined as follows:

$$\displaystyle \begin{aligned} Q\left(\varOmega|\varOmega^{\mathrm{old}}\right) =\int\int\mathcal{P}\left(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}}\right)\ln\mathcal{P}(\hat{\mathbf{x}},\hat{\mathbf{a}},\boldsymbol{\epsilon};\varOmega)d\hat{\mathbf{a}}d\boldsymbol{\epsilon}, \end{aligned} $$
(8.13)

where Ω old denotes estimated hyper-parameters in the previous iteration.

However, the posterior \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}})\) is intractable. There are two main methods to approximate the posterior \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega)\): variational inference and Markov chain Monte Carlo (MCMC). Compared with variational inference, MCMC has the advantage of being non-parametric and asymptotically exact [22]. Therefore, a Monte Carlo method is utilized to approximate the quantity as follows:

$$\displaystyle \begin{aligned} Q\left(\varOmega|\varOmega^{\mathrm{old}}\right)\approx \frac{1}{L}\sum_{l=1}^{L}\ln\mathcal{P}\left(\hat{\mathbf{x}},\hat{\mathbf{a}}^{(l)},\boldsymbol{\epsilon}^{(l)};\varOmega\right), {} \end{aligned} $$
(8.14)

where the samples \(\hat{\mathbf{a}}^{(l)}\) and \(\boldsymbol{\epsilon}^{(l)}\) are drawn from the distribution \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}})\), and L is the total number of samples. In MCMC, there are two main algorithms for obtaining samples from the desired distribution \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}})\): the Metropolis-Hastings algorithm and Gibbs sampling. Since the rejection rate is high for complex problems, the Metropolis-Hastings algorithm converges very slowly [21]. Therefore, Gibbs sampling is used to obtain the samples \(\hat{\mathbf{a}}^{(l)}\) and \(\boldsymbol{\epsilon}^{(l)}\).

In Gibbs sampling, single variables or batches of variables are cyclically and repeatedly drawn from their conditional distributions in a particular order. The sampling order is arranged as \(\hat{a}_{1,0}^{(l)},\ldots,\hat{a}_{1,n}^{(l)},\hat{a}_{2,0}^{(l)},\ldots,\hat{a}_{n,n-1}^{(l)},\epsilon_{1}^{(l)},\ldots,\epsilon_{n}^{(l)}\). One of the key points of Gibbs sampling is the derivation of the conditional distribution of each variable. Note that, according to Formulation (8.7), the log conditional distribution w.r.t. each individual variable is quadratic. Therefore, the conditional distribution of each variable is Gaussian:

$$\displaystyle \begin{aligned} \begin{aligned} \hat{a}_{p,q} &\sim\mathcal{P}\left(\hat{a}_{p,q}|\boldsymbol{\epsilon},\hat{\mathbf{a}}_{/\hat{a}_{p,q}},\hat{\mathbf{x}};\delta_{\epsilon},\delta,\lambda\right) =\mathcal{N}\left(\mu_{\hat{a}_{p,q}},\sigma_{\hat{a}_{p,q}}^{-1}\right), \\ \epsilon_{t} &\sim\mathcal{P}\left(\epsilon_{t}|\boldsymbol{\epsilon}_{/\epsilon_{t}},\hat{\mathbf{a}},\hat{\mathbf{x}};\delta_{\epsilon},\delta,\lambda\right) =\mathcal{N}\left(\mu_{\epsilon_{t}},\sigma_{\epsilon_{t}}^{-1}\right), \end{aligned} {} \end{aligned} $$
(8.15)

in agreement with (8.4) and (8.5), where μ denotes the mean and σ the precision. \(\hat{\mathbf{a}}_{/\hat{a}_{p,q}}\) and \(\boldsymbol{\epsilon}_{/\epsilon_{t}}\) denote \(\hat{\mathbf{a}}\) with \(\hat{a}_{p,q}\) omitted and \(\boldsymbol{\epsilon}\) with \(\epsilon_{t}\) omitted, respectively.
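As an illustration of one Gibbs step, the conditional of \(\epsilon_{t}\) in (8.15) can be obtained by completing the square of Formulation (8.7) in \(\epsilon_{t}\). The sketch below is our own (hypothetical names; `W[i][j]` stores \(\hat{a}_{i,j}\) with \(\hat{a}_{i,i}=-1\), `b[i]` stores \(\hat{a}_{i,0}\)).

```python
import random

def eps_conditional(t, eps, xh, W, b, delta0, deltae):
    """Mean and precision of the Gaussian conditional of eps[t] in Eq. (8.15),
    obtained by completing the square of Formulation (8.7) in eps[t]."""
    m, n = len(xh), len(xh[0])
    num = quad = 0.0
    for i in range(n):
        c = W[i][t]
        if c == 0.0:
            continue
        for k in range(m):
            # Row (i, k) residual with the eps[t] contribution removed.
            r = b[i] + c * xh[k][t] + sum(
                W[i][j] * (xh[k][j] + eps[j]) for j in range(n) if j != t)
            num += c * r
            quad += c * c
    precision = delta0 * quad + deltae
    return -delta0 * num / precision, precision

def sample_eps_t(t, eps, xh, W, b, delta0, deltae, rng):
    mean, precision = eps_conditional(t, eps, xh, W, b, delta0, deltae)
    return rng.gauss(mean, precision ** -0.5)  # std. dev. = precision^{-1/2}
```

The conditionals of the coefficients \(\hat{a}_{p,q}\) follow the same pattern, with the prior term \(\lambda/a_{p,q}^{2}\) entering the precision.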

Before Gibbs sampling, a warm-start phase has to be performed in order to converge to the desired posterior if there is no reasonable initialization for the samples. Furthermore, it is very hard to judge whether the warm-start is sufficient [21]. In order to waive the warm-start, a reasonable initialization for the samples is adopted in Gibbs sampling. Note that Gibbs sampling is used to obtain samples from the desired posterior \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}})\) in (8.6). As discussed in Sect. 8.3, Formulation (8.7) is equivalent to MAP estimation of \(\hat{\mathbf{a}}\) and \(\boldsymbol{\epsilon}\). Thus, given the hyper-parameters \(\varOmega^{\mathrm{old}}\) and measurement values \(\hat{\mathbf{x}}\), Gibbs sampling can be initialized by solving Formulation (8.7) to obtain initial samples \(\hat{\mathbf{a}}^{(0)}\) and \(\boldsymbol{\epsilon}^{(0)}\) at the mode of the distribution \(\mathcal{P}(\hat{\mathbf{a}},\boldsymbol{\epsilon}|\hat{\mathbf{x}};\varOmega^{\mathrm{old}})\). As a result, the warm-start can be waived entirely.

8.5.2.2 Maximization Step

After L samples are obtained by Gibbs sampling, in the M-step we maximize the approximated quantity:

$$\displaystyle \begin{aligned} \mathop{\mbox{max}}\limits_{\varOmega} \quad \frac{1}{L}\sum_{l=1}^{L}\mathrm{ln}\mathcal{P}\left(\hat{\mathbf{x}},\hat{\mathbf{a}}^{(l)},\boldsymbol{\epsilon}^{(l)};\varOmega\right). \end{aligned} $$
(8.16)

With the first-order optimality condition, i.e., ∇Q = 0, the hyper-parameters λ, \(\delta_{0}\), and \(\delta_{\boldsymbol{\epsilon}}\) can be updated in closed form as follows:

$$\displaystyle \begin{aligned} \lambda=\frac{n^{2}L}{\sum_{i=1}^{n}\sum_{j=0,j\neq i}^{n}\sum_{l=1}^{L}\frac{\left(\hat{a}_{i,j}^{(l)}-a_{i,j}\right)^{2}}{a_{i,j}^{2}}}, \end{aligned} $$
(8.17)
$$\displaystyle \begin{aligned} \delta_{0} = \frac{Lmn}{\sum_{l=1}^{L}\sum_{i=1}^{n}\sum_{k=1}^{m}\left[\sum_{j=1}^{n}\hat{a}_{i,j}^{(l)}\left(\hat{x}^{(k)}_{j}+\epsilon_{j}^{(l)}\right)+\hat{a}_{i,0}^{(l)}\right]^{2}}, \end{aligned} $$
(8.18)
$$\displaystyle \begin{aligned} \delta_{\boldsymbol{\epsilon}}=\frac{nL}{\sum_{l=1}^{L}\sum_{i=1}^{n}\epsilon_{i}^{(l)2}}. \end{aligned} $$
(8.19)

Here, \(\hat {a}_{i,i}^{(l)}\triangleq -1\) and \(\hat {x}_{0}^{(k)}+\epsilon _{0}^{(l)}\triangleq 1\). We alternate between the E-step and the M-step until convergence, i.e., until the relative difference of the three hyper-parameters between the current and previous iterations is less than a threshold. The hyper-parameters λ, \(\delta_{0}\), and \(\delta_{\boldsymbol{\epsilon}}\) are then determined.
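Under the same conventions (\(\hat{a}_{i,i}^{(l)}=-1\) stored on the diagonal of `W`, intercepts in `b`; all names are our own), the closed-form updates (8.17)-(8.19) can be sketched as follows.

```python
def m_step(samples, xh, W0, b0):
    """Closed-form hyper-parameter updates (8.17)-(8.19). `samples` is a list
    of (W, b, eps) Gibbs draws; W0/b0 hold the drift-free coefficients
    a_{i,j} (off-diagonal) and intercepts a_{i,0}."""
    L, m, n = len(samples), len(xh), len(xh[0])
    s_a = s_fit = s_e = 0.0
    for W, b, eps in samples:
        for i in range(n):
            # Coefficient prior term of (8.17), intercept included (j = 0).
            s_a += (b[i] - b0[i]) ** 2 / b0[i] ** 2
            for j in range(n):
                if j != i:
                    s_a += (W[i][j] - W0[i][j]) ** 2 / W0[i][j] ** 2
            # Fit term of (8.18); W[i][i] = -1 supplies -(xh + eps_i).
            for k in range(m):
                r = b[i] + sum(W[i][j] * (xh[k][j] + eps[j]) for j in range(n))
                s_fit += r * r
        s_e += sum(e * e for e in eps)  # drift prior term of (8.19)
    lam = n * n * L / s_a
    delta0 = L * m * n / s_fit
    deltae = n * L / s_e
    return lam, delta0, deltae
```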

For convenience, all variables are collected as a set \(\varPsi=\{\psi_{1},\psi_{2},\ldots,\psi_{n^2+n}\}=\{\hat{a}_{1,0},\ldots,\hat{a}_{1,n},\ldots,\hat{a}_{n,n-1},\epsilon_{1},\ldots,\epsilon_{n}\}\). In summary, our proposed EM with Gibbs sampling is shown in Algorithm 2.

Algorithm 2 EM with Gibbs sampling

8.6 Experimental Results

In-building temperature data are used to test our proposed framework, with several sensors whose drifts are to be calibrated. All data are generated directly by EnergyPlus, as shown in Fig. 8.5. As shown in Fig. 8.6, two building benchmarks, Hall [23] with Washington, D.C. weather and Secondary School [24] with Chicago weather, are simulated by EnergyPlus to generate the ground-truth in-building temperatures. The temperature sampling period is set to 1 h.

Fig. 8.5
figure 5

The generated simulation data

Fig. 8.6
figure 6

Benchmark: (a) Hall; (b) Secondary School

In practice, both drift and measurement noise need to be carefully considered and set to be close to real temperature measurements. Because aging is slow, further time effects on sensor performance are not considered in our experiments: drift is set to be time-invariant, while measurement noise is time-variant. According to the sensor performance shown in Fig. 8.2a, two low-cost temperature sensors, the MCP9509 with ±4.5 °C accuracy and the LM335A with ±5 °C accuracy (Fig. 8.2b, c), are chosen to set the drift variances. According to the three-sigma rule, we set the two drift variances to σ² = (4.5∕3)² = 2.25 and σ² = (5∕3)² ≈ 2.78. In addition, according to our survey, the noise variance is set to 0.001. All temperature measurements are generated by adding noise.

The number of time-instants needs to be set reasonably to match practical applications and to calibrate sensor drifts accurately. We assume the temperature measurements are drift-free during the first \(m_{0}=240\) time-instants (the first 10 days). Then, temperature measurements with drifts during m = 60 time-instants (60 h) are used to test our proposed framework.

TSBL [11] and the proposed framework with cross-validation and with EM are used to calibrate sensor drifts. All methods are implemented in Python 2.7 on a 12-core 2.80 GHz Linux machine with 256 GB RAM. In cross-validation, 100 combinations of hyper-parameter ratios and s = 5 folds are used. Since the warm-start is waived in Gibbs sampling, in order to achieve a better trade-off between accuracy and runtime, only L = 10 samples are generated to perform the Monte Carlo approximation (8.14), and the three hyper-parameters λ, \(\delta_{0}\), and \(\delta_{\boldsymbol{\epsilon}}\) are initialized to \(10^{3}\), \(10^{-4}\), and \(10^{-3}\) in EM. The convergence thresholds in Algorithms 1 and 2 are set to \(10^{-8}\) and \(10^{-2}\), respectively.

As mentioned in Sect. 8.2, the drift calibration accuracy is evaluated by using MAPE defined as follows:

$$\displaystyle \begin{aligned} \mathrm{MAPE}=\frac{1}{nm}\sum_{k=1}^{m}\sum_{i=1}^{n}\left|\dfrac{\hat{\epsilon}_{i}^{(k)}-\epsilon_{i}}{\epsilon_{i}}\right|, \end{aligned} $$
(8.20)

where \(\hat {\epsilon }^{(k)}_{i}\) is the estimated calibration. Specifically, in our proposed framework, \(\hat {\epsilon }^{(k)}_{i}=\hat {\epsilon }_{i}\). The drift calibration accuracy and runtime are shown in Figs. 8.7 and 8.8.
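Computing Eq. (8.20) is straightforward; for our framework, where the estimate is time-invariant, the inner sum over k collapses and the metric reduces to a per-sensor average. A minimal sketch:

```python
def mape(eps_hat, eps_true):
    """MAPE of Eq. (8.20) for time-invariant estimates: with eps_hat[i]
    constant over k, averaging over m time-instants equals averaging over
    the n sensors alone."""
    n = len(eps_true)
    return sum(abs((eh - e) / e) for eh, e in zip(eps_hat, eps_true)) / n
```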

Fig. 8.7
figure 7

Drift variance is set to (a,c) 2.25; (b,d) 2.78; Benchmark: (a,b) Hall; (c,d) Secondary school

Fig. 8.8
figure 8

Runtime vs. # sensor on (a) Hall; (b) Secondary school

As shown in Fig. 8.8, TSBL has acceptable computational overhead even though its computational complexity is dominated by multiple matrix inversion operations. However, as shown in Fig. 8.7, TSBL has the worst accuracy and robustness for drift calibration. In fact, temperature signals lie in a time-variant subspace since in-building temperatures are influenced by multiple time-variant factors, e.g., the weather. As a result, TSBL cannot achieve effective drift calibration.

Unlike TSBL, the proposed spatial correlation model can calibrate drifts even if temperature signals lie in a time-variant subspace. Therefore, as shown in Fig. 8.7, the proposed framework with either cross-validation or EM outperforms TSBL in accuracy. Moreover, the proposed drift calibration framework with cross-validation achieves the best accuracy. However, as shown in Fig. 8.8, cross-validation has heavy computational overhead since Algorithm 1 must be run multiple times. Compared with cross-validation and TSBL, EM with Gibbs sampling has lower computational complexity since fewer samples are generated to perform the Monte Carlo approximation and EM converges quickly. However, as shown in Fig. 8.7, the proposed framework with EM cannot achieve the best accuracy since EM with Gibbs sampling is an approximation method.

As shown in Fig. 8.7, because more sensors provide more correlation information, the drift calibration accuracy of our proposed framework improves as the number of sensors grows. In practice, when fewer sensors need to be calibrated, cross-validation can be used to determine the hyper-parameters for better accuracy within a reasonable response time, e.g., 1 min. When more sensors need to be calibrated, EM with Gibbs sampling can be used to determine the hyper-parameters so that sensor measurement accuracy is improved to a tolerable level within acceptable runtime. The proposed calibration framework with EM achieves robust drift calibration and a better trade-off between accuracy and runtime.

8.7 Conclusion

In this paper, a sensor spatial correlation model has been proposed to perform drift calibration. Thanks to spatial correlation, the unknown actual temperature measured by each sensor is linearly expressed in terms of those measured by all other sensors. Priors on the model coefficients and the drift calibration are applied in MAP estimation, which is formulated as a non-convex problem with three hyper-parameters and handled by the proposed alternating-based method. Cross-validation and EM with Gibbs sampling are used as two alternatives to determine the hyper-parameters. Experimental results on benchmarks simulated with EnergyPlus show that the proposed framework with EM achieves robust drift calibration and a better trade-off between accuracy and runtime.