
1 Introduction

Many existing private and public assets, such as civil engineering infrastructures, buildings, or aircraft, require reliable damage detection techniques to be safely used, especially during their inevitable aging. When monitoring a structure over its lifecycle, its deterioration and damages represent a great concern and the early detection of critical decay might prevent failures that can cause sudden shutdowns or even catastrophes with severe life-safety and economic repercussions [1]. To prevent these critical failures from happening, techniques of structural health monitoring (SHM) have been developed in recent studies with applications to civil and aerospace engineering, as well as to the conservation of cultural heritage structures. SHM refers to automated monitoring procedures that seek to provide reliable information on the performance and integrity of a structure in real time. In the context of SHM, the combination of sensor measurements, numerical models simulating the underlying behavior of a structure of interest under different environmental and operational conditions, and machine learning techniques has led to the design of structural digital twins [2].

The focus of this work is on wave propagation approaches to data-driven, predictive SHM, which aim to detect damages by examining the distortions in propagating elastic waves as a result of reflections and amplitude attenuations when intersecting the damage boundary. Featuring a data-driven nature, these approaches, sometimes called simulation-based SHM [3,4,5,6,7], are decomposed into an offline phase and an online phase. In the former, a database of synthetic signals is built to represent the structural behavior under different conditions, while in the latter real experimental time signals, collected from sensors placed on a structure, are compared with those simulated offline using a classifier that discriminates between damaged and undamaged states.

Toward an efficient and robust scheme of data-driven predictive monitoring, reduced-order modeling techniques are integrated into a wave propagation approach together with cutting-edge machine learning tools. The data-driven SHM setting corresponds to a multiquery problem, where one has to solve high-dimensional, time-dependent, parametric equations, which results in great demands for computational resources. To overcome such a computational burden, model order reduction is employed to project the original full-order system onto a reduced space with a significantly lower dimensionality. In this way, a robust dataset of approximated sensor measurements is generated. In SHM, the task of damage detection is typically reduced to a supervised learning process relying on a fully labeled dataset, obtained from both healthy and damaged structures (either generated with computer-aided procedures or collected experimentally). However, gathering an exhaustive collection of configuration classes anticipating all types of damages is typically unrealistic, and the number of different classification labels may grow rapidly. Instead, this work relies on semi-supervised learning techniques, also called one-class classification methods, which learn the common features of the labeled data belonging to the normal class in the training phase. Unlabeled data from both classes are then used in the test phase to identify abnormal data which deviate from the normal behavior. One-class algorithms make it possible to locate the damage and estimate its severity by training a different model for each sensor location. Finally, guided by an appropriate indicator of the damage detection performance, a modified sparse Gaussian process method is applied to the synthetic dataset of healthy configurations to systematically place a fixed number of sensors on a structure of interest.

Following the introduction, a model problem for sensor measurements governed by the acoustic-elastic equation is briefly reviewed in Sect. 2. Basic techniques of reduced-order modeling, which can be used for the multiquery simulations in SHM, are introduced in Sect. 3. A local semi-supervised method for automatic anomaly detection is presented in Sect. 4, and the variational sparse Gaussian process model is utilized for optimal sensor placement in Sect. 5. In Sect. 6, the overall methodology is demonstrated on a numerical example. Finally, conclusions are drawn in Sect. 7.

2 A Model for Sensor Measurements

This section introduces the model problem for sensor measurements, including its governing equations and parametric discrete formulations.

2.1 Governing Equation

Let \(\Omega \subset \mathbb {R}^{d_\Omega }\) be a polygonal physical domain with piece-wise smooth boundary \(\partial \Omega \), where \(d_\Omega =2,3\) is the spatial dimension, and let [0, T], with \(T\in \mathbb {R}_+\), be a suitable time interval. Here, \(\Omega \) represents a structure of interest and [0, T] a suitable time window to observe the response of a structure undergoing a predefined excitation (i.e., the effect of an active or passive source) through sensor measurements. Moreover, let \(\mathcal {P}\subset \mathbb {R}^{d_\mu }\) be a suitable parameter domain, where \(d_\mu \) indicates the number of input parameters required to describe the healthy variations that a structure may undergo during its lifetime, and let \(\boldsymbol{\mu }=(\mu _1,\dots ,\mu _{d_\mu })\) be a parameter vector representing one possible healthy variation of the environmental and operational conditions, i.e., \(\mu _i\) may relate to the material properties, the boundary conditions, the initial conditions, or the source function for \(1\le i\le d_{\mu }\).

Let \(\boldsymbol{u}=\boldsymbol{u}(\boldsymbol{x},t;\boldsymbol{\mu }):\Omega \times [0,T]\times \mathcal {P}\rightarrow \mathbb {R}^{d_\Omega }\) be a vector-valued displacement field, solution of the acoustic-elastic equation equipped with suitable boundary and initial conditions:

$$\begin{aligned} {\left\{ \begin{array}{ll} \rho (\boldsymbol{\ddot{u}}+\eta \boldsymbol{\dot{u}}) - \nabla \cdot \boldsymbol{\sigma } = h(t;\boldsymbol{\mu }){\boldsymbol{b}}(\boldsymbol{x};\boldsymbol{\mu }) &{} \text {in } \Omega \\ \boldsymbol{u} \cdot \boldsymbol{n} = \boldsymbol{0}, \quad (\boldsymbol{\sigma } \cdot \boldsymbol{n} )\cdot \boldsymbol{\tau } = \boldsymbol{t}_N &{} \text {on } \partial \Omega \\ \boldsymbol{u}|_{t=0} = \boldsymbol{u}_0, \quad \boldsymbol{\dot{u}}|_{t=0} = \boldsymbol{v}_0 &{} \text {in } \Omega \\ \end{array}\right. }, \end{aligned}$$
(1)

where \(\boldsymbol{\ddot{u}}=\partial ^2\boldsymbol{u}/\partial t^2\) and \(\boldsymbol{\dot{u}}=\partial \boldsymbol{u}/\partial t\) are the acceleration and velocity fields. Here, \(\rho \) is the density coefficient, \(\eta \) is a dimensionless damping coefficient, \(h:[0,T]\times \mathcal {P}\rightarrow \mathbb {R}\) and \({\boldsymbol{b}}:\Omega \times \mathcal {P}\rightarrow \mathbb {R}^{d_\Omega }\) are two source functions depending on time and space, respectively, \(\boldsymbol{\sigma }=\boldsymbol{\sigma }(\boldsymbol{u};\boldsymbol{\mu })\) is the stress tensor, \(\boldsymbol{n}\) and \(\boldsymbol{\tau }\) are the outward normal and tangential (unit) vectors to \(\partial \Omega \), respectively, \(\boldsymbol{t}_N=\boldsymbol{t}_N(\boldsymbol{x},t;\boldsymbol{\mu })\) is the traction vector used in the definition of the free-slip boundary conditions, \(\boldsymbol{u}_0=\boldsymbol{u}_0(\boldsymbol{x};\boldsymbol{\mu })\) and \(\boldsymbol{v}_0=\boldsymbol{v}_0(\boldsymbol{x};\boldsymbol{\mu })\) describe the initial displacement and velocity in space, respectively.

The ultimate goal is to emulate the real sensor response at m given sensor locations \(\boldsymbol{x}_i\in \Omega \) for \(1\le i\le m\). To do so, let \(\ell :\mathbb {R}^{d_\Omega }\times \mathcal {P}\rightarrow \mathbb {R}^{d_g}\) be an input–output function and \(g_i:[0,T]\times \mathcal {P}\rightarrow \mathbb {R}^{d_g}\) a (parametric) output of interest, i.e., an approximation of the sensor response at time t and location \(\boldsymbol{x}_i\):

$$\begin{aligned} g_i(t;\boldsymbol{\mu }) = \ell (\boldsymbol{u}(\boldsymbol{x}_i,t;\boldsymbol{\mu });\boldsymbol{\mu }), \quad 1\le i\le m. \end{aligned}$$
(2)

Before proceeding with classic discretization techniques such as the finite element method, consider the vector space \(\mathbb {V}=\{\boldsymbol{w}\in [H^1(\Omega )]^{d_\Omega }:\boldsymbol{w}\cdot \boldsymbol{n}=\boldsymbol{0}\text { on }\partial \Omega \}\), equipped with a suitable inner product \(\langle \cdot ,\cdot \rangle _\mathbb {V}\) and the corresponding induced norm \(\Vert \cdot \Vert _\mathbb {V}\). Moreover, consider a parametrized linear form \(f:\mathbb {V} \times \mathcal {P}\rightarrow \mathbb {R}\), where the linearity is with respect to the first argument, the bilinear form \(m:\mathbb {V}\times \mathbb {V}\rightarrow \mathbb {R}\), and the parametrized bilinear form \(a:\mathbb {V}\times \mathbb {V}\times \mathcal {P}\rightarrow \mathbb {R}\), where the bilinearity is with respect to the first two arguments. Then, the acoustic-elastic problem in abstract form reads: given \(t\in [0,T]\) and \(\boldsymbol{\mu }\in \mathcal {P}\), find \(\boldsymbol{u}=\boldsymbol{u}(t;\boldsymbol{\mu })\in \mathbb {V}\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \rho \left( m(\boldsymbol{\ddot{u}},\boldsymbol{\psi })+ \eta \, m(\boldsymbol{\dot{u}},\boldsymbol{\psi })\right) + a(\boldsymbol{u},\boldsymbol{\psi };\boldsymbol{\mu }) = h(t;\boldsymbol{\mu }) f(\boldsymbol{\psi };\boldsymbol{\mu }) &{} \forall \boldsymbol{\psi }\in \mathbb {V}, \\ \boldsymbol{u}(t=0;\boldsymbol{\mu }) = \boldsymbol{0},\quad \boldsymbol{\dot{u}}(t=0;\boldsymbol{\mu }) = \boldsymbol{0}, &{} \end{array}\right. } \end{aligned}$$
(3)

with

$$\begin{aligned} \begin{gathered} m(\boldsymbol{ u},\boldsymbol{\psi }) =\int \limits _\Omega \boldsymbol{ u} \cdot \boldsymbol{\psi } \,\mathrm {d}\Omega , \quad f(\boldsymbol{\psi };\boldsymbol{\mu }) = \int \limits _\Omega {\boldsymbol{b}}(\boldsymbol{ \mu }) \cdot \boldsymbol{\psi } \,\mathrm {d}\Omega , \\ a(\boldsymbol{u},\boldsymbol{\psi };\boldsymbol{\mu }) =\int \limits _\Omega \left[ {\boldsymbol{H}}({\boldsymbol{\mu }}):\boldsymbol{\varepsilon }(\boldsymbol{u})\right] : \boldsymbol{\varepsilon }(\boldsymbol{\psi }) \,\mathrm {d}\Omega , \end{gathered} \end{aligned}$$
(4)

in which \({\boldsymbol{H}}\) is Hooke’s stiffness tensor and \(\boldsymbol{\varepsilon }(\cdot ) = \frac{1}{2}[\nabla \cdot +(\nabla \cdot )^\text {T}]\) is the Cauchy strain operator. Note that, for the sake of simplicity, homogeneous boundary and initial conditions are considered in (3), i.e., \({\boldsymbol{u}}_0 = \boldsymbol{0}\), \({\boldsymbol{v}}_0 = \boldsymbol{0}\), and \({\boldsymbol{t}}_N = \boldsymbol{0}\).

2.2 Parametric Discrete Problem

This section introduces a discrete approximation space \(\mathbb {V}_h\subset \mathbb {V}\) as well as a discretization of the time interval [0, T] in which the approximate solution is sought. The approximation space \(\mathbb {V}_h\) is here constructed by a standard finite element method based on piece-wise linear basis functions and a triangulation of \(\Omega \), i.e., non-overlapping triangles (\(d_\Omega = 2\)) or tetrahedra (\(d_\Omega = 3\)) whose union perfectly coincides with \(\Omega \). Alternative discretization strategies include spectral methods or higher-order finite elements. Let \(\mathbb {V}_h\) be equipped with a basis \(\{\boldsymbol{\varphi }_j({\boldsymbol{x}}) \in \mathbb {R}^{d_\Omega }\}_{j=1}^{N_h}\), where \(N_h=\text {dim}(\mathbb {V}_h)\) is the number of degrees of freedom (DOFs). Moreover, divide the time interval [0, T] into \(N_t\) subintervals of equal length \(\Delta t = \tfrac{T}{N_t}\) and define \(t^n = n\Delta t,\, 0 \le n \le N_t\).

The discrete problem with finite element discretization seeks to find \(\boldsymbol{u}_h(t;\boldsymbol{\mu })\in \mathbb {V}_h\), which can be expressed as \(\boldsymbol{u}_h(t;\boldsymbol{\mu }) = \sum _{j=1}^{N_h} (\boldsymbol{u}_{h}(t;\boldsymbol{\mu }))_j \boldsymbol{\varphi }_j(\boldsymbol{x})\) where \((\boldsymbol{u}_{h})_j\) denotes the jth entry of the solution vector \(\boldsymbol{u}_{h}\in \mathbb {R}^{N_h}\). With an additional discretization over time, one can retrieve the solution vector at the nth time step, denoted by \(\boldsymbol{u}_h^n({\boldsymbol{\mu }})= \boldsymbol{u}_{h}(t^n;\boldsymbol{\mu })\), \(n= 1,\ldots , N_t\). Moreover, let \(\boldsymbol{v}_{h}^n(\boldsymbol{\mu }) \in \mathbb {R}^{N_h}\) and \(\boldsymbol{a}_{h}^n(\boldsymbol{\mu })\in \mathbb {R}^{N_h}\) be the velocity and the acceleration vectors, respectively, such that their elements are the multiplicative coefficients of the following expressions: \(\boldsymbol{\dot{u}}_h^n(\boldsymbol{\mu }) = \sum _{j=1}^{N_h} (\boldsymbol{v}_{h}^n(\boldsymbol{\mu }))_j \boldsymbol{\varphi }_j(\boldsymbol{x})\) and \(\boldsymbol{\ddot{u}}_h^n(\boldsymbol{\mu }) = \sum _{j=1}^{N_h} (\boldsymbol{a}_{h}^n(\boldsymbol{\mu }))_j \boldsymbol{\varphi }_j(\boldsymbol{x})\), respectively.

Once the acoustic-elastic equation is spatially discretized by finite elements, the corresponding algebraic formulation is written as follows for a given \(\boldsymbol{\mu }\in \mathcal {P}\) and \(t\in [0,T]\):

$$\begin{aligned} \rho \boldsymbol{M}_h \left[ \ddot{\boldsymbol{u}}_h({\boldsymbol{\mu }})+\eta \dot{\boldsymbol{u}}_h({\boldsymbol{\mu }})\right] + \boldsymbol{A}_h({\boldsymbol{\mu }}){\boldsymbol{u}}_h({\boldsymbol{\mu }}) = h(t;{\boldsymbol{\mu }})\boldsymbol{f}_h({\boldsymbol{\mu }}), \end{aligned}$$
(5)

where \(\boldsymbol{M}_h\in \mathbb {R}^{{N_h}\times {N_h}}\) is the mass matrix, \(\boldsymbol{A}_h(\boldsymbol{\mu })\in \mathbb {R}^{{N_h}\times {N_h}}\) the parametrized stiffness matrix, and \(\boldsymbol{f}_h(\boldsymbol{\mu })\in \mathbb {R}^{{N_h}}\) the parametrized right-hand side vector with entries

$$\begin{aligned} \begin{aligned}&(\boldsymbol{M}_h)_{ij} = m(\boldsymbol{\varphi }_j,\boldsymbol{\varphi }_i),\quad (\boldsymbol{A}_h(\boldsymbol{\mu }))_{ij} = a(\boldsymbol{\varphi }_j,\boldsymbol{\varphi }_i;\boldsymbol{\mu }),\\ \text {and}\quad&(\boldsymbol{f}_h(\boldsymbol{\mu }))_{i}=f(\boldsymbol{\varphi }_i;\boldsymbol{\mu }),\quad 1\le i,j\le N_h. \end{aligned} \end{aligned}$$
(6)

A classic Newmark method is then applied to the temporal discretization and the governing equation becomes: given \(\boldsymbol{\mu }\in \mathcal {P}\), find the acceleration vector \(\boldsymbol{a}_{h}^n(\boldsymbol{\mu })\in \mathbb {R}^{N_h}\) for \(n=1,\dots ,N_t\) such that

$$\begin{aligned} \left[ \rho (1 + \eta \zeta \Delta t) \boldsymbol{M}_h + \beta (\Delta t)^2\boldsymbol{A}_h(\boldsymbol{\mu })\right] \boldsymbol{a}_{h}^n(\boldsymbol{\mu }) = h({t^n;\boldsymbol{\mu }})\boldsymbol{f}_{h}(\boldsymbol{\mu }) - \boldsymbol{q}_{h}^{n-1}(\boldsymbol{\mu }), \end{aligned}$$
(7)

in which \(\beta \) and \(\zeta \) are two constant parameters, here chosen as \(\zeta =2\beta = \tfrac{1}{2}\), which corresponds to a popular second-order method [8, 9], while \(\boldsymbol{q}_{h}^{n-1}(\boldsymbol{\mu })\in \mathbb {R}^{{N_h}}\) is given as

$$\begin{aligned} \begin{aligned} {\boldsymbol{q}}_{h}^{n-1}({\boldsymbol{\mu }}) =&{\boldsymbol{A}}_h({\boldsymbol{\mu }}){\boldsymbol{u}}_{h}^{n-1}({\boldsymbol{\mu }}) + (\rho \eta {\boldsymbol{M}}_h +\Delta t{\boldsymbol{A}}_h({\boldsymbol{\mu }})){\boldsymbol{v}}_{h}^{n-1}({\boldsymbol{\mu }}) \\&+ \left( \rho \eta (1-\zeta )\Delta t{\boldsymbol{M}}_h +\tfrac{1-2\beta }{2}(\Delta t)^2{\boldsymbol{A}}_h({\boldsymbol{\mu }})\right) {\boldsymbol{a}}_{h}^{n-1}({\boldsymbol{\mu }}). \end{aligned} \end{aligned}$$
(8)

Finally, the displacement solution vector \(\boldsymbol{u}_{h}^{n}(\boldsymbol{\mu })\) is obtained using the updating rule of the implicit Newmark method, introduced in [10] and defined as:

$$\begin{aligned} \boldsymbol{u}_{h}^n(\boldsymbol{\mu })&= \boldsymbol{u}_{h}^{n-1}(\boldsymbol{\mu }) +\Delta t \boldsymbol{v}_{h}^{n-1}(\boldsymbol{\mu })+(\Delta t)^2\left( \beta \boldsymbol{a}_{h}^{n}(\boldsymbol{\mu })+\tfrac{1-2\beta }{2}\boldsymbol{a}_{h}^{n-1}(\boldsymbol{\mu }) \right) \end{aligned}$$
(9)
$$\begin{aligned} \boldsymbol{v}_{h}^{n}(\boldsymbol{\mu })&= \boldsymbol{v}_{h}^{n-1}(\boldsymbol{\mu })+\Delta t \left( \zeta \boldsymbol{a}_{h}^n(\boldsymbol{\mu })+(1-\zeta )\boldsymbol{a}_{h}^{n-1}(\boldsymbol{\mu }) \right) . \end{aligned}$$
(10)

Problem (7) is denoted as the truth problem and \(\boldsymbol{u}_{h}^n(\boldsymbol{\mu })\) as the truth solution at the n-th time step, which, in principle, can be computed with as high accuracy as desired. However, many degrees of freedom may be involved in the problem, thus leading to a computationally expensive method due to the inversion of the \(N_h\)-dimensional matrix on the left-hand side of (7). In addition, to fully represent the healthy variations of the structure, one needs to estimate an approximation to the output of interest (2) for many input parameter values \(\{\boldsymbol{\mu }_1,\boldsymbol{\mu }_2,\dots \}\) over the whole discrete time window \(0=t^0,t^1,\dots ,t^{N_t}=T\), i.e.,

$$\begin{aligned} \boldsymbol{g}_i^k =\left[ g_i(t^0;\boldsymbol{\mu }_k),g_i(t^1;\boldsymbol{\mu }_k),\ldots ,g_i(t^{N_t};\boldsymbol{\mu }_k)\right] ,\quad k = 1,2, \dots , \end{aligned}$$
(11)

evaluated at all the sensor locations \({\boldsymbol{x}}_i\), \(1\le i\le m\). For each input parameter value \(\boldsymbol{\mu }_k\), the total computational cost involves the resolution of \(N_t\) linear systems of dimension \(N_h\).
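To make the time-marching scheme (7)–(10) concrete, the following Python sketch advances a generic full-order system of the form (5) with the average-acceleration Newmark parameters \(\zeta =2\beta =1/2\). All operators are random placeholders standing in for finite element matrices, and the source modulation h is a hypothetical choice; only the structure of the loop mirrors the equations above.

```python
import numpy as np
from scipy.sparse import identity, random as sprandom
from scipy.sparse.linalg import splu

# Placeholder full-order operators (in practice assembled by a FE code).
N_h, N_t, T = 200, 400, 1.0
rho, eta, beta, zeta = 1.0, 0.1, 0.25, 0.5            # Newmark: zeta = 2*beta = 1/2
dt = T / N_t
M_h = identity(N_h, format="csc")
A_h = (sprandom(N_h, N_h, density=0.05, random_state=0) * 1e2).tocsc()
A_h = A_h + A_h.T + 1e3 * identity(N_h, format="csc")  # symmetric, positive definite placeholder
f_h = np.random.default_rng(0).standard_normal(N_h)
h = lambda t: np.sin(2 * np.pi * 5 * t) * np.exp(-t / 0.2)  # hypothetical source modulation

# Factorize the (time-independent) left-hand side of (7) once.
lhs = (rho * (1 + eta * zeta * dt) * M_h + beta * dt**2 * A_h).tocsc()
solve = splu(lhs).solve

u = np.zeros(N_h); v = np.zeros(N_h); a = np.zeros(N_h)    # homogeneous initial conditions
snapshots = []
for n in range(1, N_t + 1):
    # Right-hand side contribution q^{n-1} from (8).
    q = A_h @ u + (rho * eta * M_h + dt * A_h) @ v \
        + (rho * eta * (1 - zeta) * dt * M_h + 0.5 * (1 - 2 * beta) * dt**2 * A_h) @ a
    a_new = solve(h(n * dt) * f_h - q)
    # Newmark updates (9)-(10): displacement uses the old velocity/acceleration.
    u = u + dt * v + dt**2 * (beta * a_new + 0.5 * (1 - 2 * beta) * a)
    v = v + dt * (zeta * a_new + (1 - zeta) * a)
    a = a_new
    snapshots.append(u.copy())
S = np.column_stack(snapshots)      # snapshot matrix for one parameter value
```

Since the left-hand side of (7) does not depend on time, it is factorized once and reused at every step; the cost estimate of \(N_t\) linear systems per parameter value should be read in this sense.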

3 Techniques of Reduced-Order Modeling

When the dimensionality of a full-order system, defined as \(N_h\) in Sect. 2, is large, the repeated solution of such a time-dependent problem with varying input parameters can result in great demands on both CPU time and memory, which is often computationally prohibitive. To reduce the computational cost without significantly compromising the overall accuracy, reduced-order models (ROMs) have been developed. In general, reduced-order modeling seeks to find a low-dimensional representation of the full-order solution manifold and hence reduce the dimensionality by projecting the original governing equations onto a low-dimensional space.

The reduced basis (RB) method [11, 12] is a typical projection-based approach to ROMs and features an offline–online framework. With a significantly smaller dimension than the full-order model, a reduced space is spanned by a set of RB modes that are extracted offline from a collection of full-order snapshots at several time-parameter locations. Once the RB space is constructed, the approximate solution for an unseen parameter value is recovered online in the reduced space. Conventionally, a Galerkin projection is adopted to determine the combination coefficients associated with the RB, yielding the reduced-order solutions during the online stage.

For time-dependent problems, due to the traveling-wave behavior of the solution, classic projection-based ROM strategies [13] may pose several challenges, e.g., the manifold of all possible solutions can often not be compressed into a small reduced basis. Furthermore, the sampling strategy is more complicated since it has to combine the solution at different time instants and for different values of the parameter. For example, the reader can refer to the POD-greedy sampling strategy [14] and the randomized SVD algorithms [15]. Recent efforts have been made in the direction of space–time approaches, where projection in space and time is performed simultaneously, see, e.g., [16]. A different effective strategy is to replace the time-domain formulation with a frequency-domain formulation and to apply a ROM method to the full-order problem in the frequency domain. In this way, the number of time instances \(N_t\) at which one needs to solve a linear system equivalent to (7) is reduced to a number of principal frequencies \(N_z\), with \(N_z \ll N_t\). In addition, resorting to a ROM strategy reduces the number of degrees of freedom of each linear system to the size of the reduced basis, i.e., from \(N_h\) to r, with \(r \ll N_h\). Without going into too much detail, the reader is referred to [5, 17], where the authors, motivated by the interest in studying the transient response of damaged structures under the effect of active sources, construct a reduced-order model of the acoustic-elastic equation in the Laplace domain.

The goal here is to provide a brief introduction to several basic elements of the RB method, which lay the foundation for more advanced techniques. In particular, after introducing a general formulation of the proper orthogonal decomposition (POD) in Sect. 3.1, the construction of RB from full-order snapshots using the POD is also described. The technique to retrieve the RB solution, an approximation of the high-fidelity solution, is ultimately presented in Sect. 3.2.

3.1 Proper Orthogonal Decomposition

General formulation of the POD

In a vector space \(\mathbb {X}\), equipped with an inner product \(\langle \cdot ,\cdot \rangle _\mathbb {X}\), consider a collection of snapshot vectors, denoted by \(\{p_1,\ldots ,p_{N_s}\}\subset \mathbb {X}\). A correlation matrix \(\boldsymbol{C}\in \mathbb {R}^{N_s\times N_s}\) of the snapshots is formed as

$$\begin{aligned} \boldsymbol{C}_{ij} = \langle p_i,p_j\rangle _\mathbb {X},\quad 1\le i,j \le N_s. \end{aligned}$$
(12)

The eigenvalue problem of such a correlation matrix \(\boldsymbol{C}\) is then written as

$$\begin{aligned} \boldsymbol{C}\boldsymbol{z}^{(i)} = \lambda ^{(i)} \boldsymbol{z}^{(i)},\quad 1\le i \le N_s, \end{aligned}$$
(13)

in which \(\lambda ^{(1)}\ge \cdots \ge \lambda ^{(N_s)} \ge 0\). By taking

$$\begin{aligned} \phi _i = \sum _{j=1}^{N_s}p_j\left( \boldsymbol{z}^{(i)}\right) _j/\sqrt{\lambda ^{(i)}},\quad 1\le i \le r \text { with }\,r \le N_s, \end{aligned}$$
(14)

an orthonormal basis is formed, i.e., \(\langle \phi _i ,\phi _j \rangle _\mathbb {X}=\delta _{ij}\), \(1\le i,j \le r\), and an r-dimensional subspace is then constructed as \(\mathbb {X}_r=\text {span}\{\phi _1,\ldots ,\phi _{r}\}\subset \mathbb {X}\). The projection onto this subspace, denoted by \(P_r[\cdot ]:\mathbb {X}\rightarrow \mathbb {X}_r\), is thus defined as

$$\begin{aligned} P_r[f]= \arg \min _{\xi \in \mathbb {X}_r} \Vert f-\xi \Vert _\mathbb {X}^2=\sum _{i=1}^{r}\langle f,\phi _i\rangle _\mathbb {X} \phi _i, \quad f\in \mathbb {X}. \end{aligned}$$
(15)

It can be shown that the projection error of the snapshots only depends on the truncated eigenvalues, written as

$$\begin{aligned} \sum _{i=1}^{N_s}\Vert p_i -P_r[p_i]\Vert _\mathbb {X}^2 = \sum _{i = r+1}^{N_s} \lambda ^{(i)}. \end{aligned}$$
(16)

In addition, \(\mathbb {X}_r\) is the r-dimensional subspace of \(\mathbb {S}=\text {span}\{p_1,\ldots ,p_{N_s}\}\) that minimizes the projection error, i.e.,

$$\begin{aligned} \sum _{i=1}^{N_s}\Vert p_i -P_r[p_i]\Vert _\mathbb {X}^2 = \min _{\mathbb {U}\text { being an } r\text {-dimensional subspace of } \mathbb {S}}\left\{ \sum _{i=1}^{N_s}\min _{\xi \in \mathbb {U}}\Vert p_i - \xi \Vert _\mathbb {X}^2 \right\} . \end{aligned}$$
(17)

Construction of RB using the POD

At the algebraic level, the solution space for the full-order discrete system (5) is \(\mathbb {R}^{N_h}\), i.e., \(\mathbb {X} = \mathbb {R}^{N_h}\), and it is correspondingly equipped with the Euclidean inner product. To construct an RB space, one has to collect the solution snapshots at \(N_t\) time instances \(\{t^1,t^2,\ldots , t^{N_t}\}\) and \(N_\mu \) parameter locations \(\{{\boldsymbol{\mu }}_1,{\boldsymbol{\mu }}_2,\ldots ,{\boldsymbol{\mu }}_{N_\mu }\}\), i.e., \(\{p_i\}_{i=1}^{N_s}=\{\boldsymbol{u}_h^n({\boldsymbol{\mu }}_k):1\le n\le N_t,1\le k\le N_\mu \}\) and \(N_s = N_t N_\mu \). Let \(\boldsymbol{S}\in \mathbb {R}^{N_h \times N_s}\) denote the snapshot matrix collecting all the \(N_s\) snapshot vectors as columns.

Using the POD, r basis vectors are obtained and collected in a matrix \(\boldsymbol{V}_r\in \mathbb {R}^{N_h \times r}\), whose i-th column, i.e., the i-th basis vector, corresponds to the i-th eigenvalue \(\lambda ^{(i)}\) of the correlation matrix \(\boldsymbol{C} = \boldsymbol{S}^\text {T}\boldsymbol{S}\), \(1\le i \le r\). In fact, given the SVD of the snapshot matrix \(\boldsymbol{S}\), written as

$$\begin{aligned} \boldsymbol{S}= \boldsymbol{U}{\boldsymbol{\Sigma }} \boldsymbol{Z}^\text {T}, \end{aligned}$$
(18)

the basis vectors in \(\boldsymbol{V}_r\) are the first r columns of \(\boldsymbol{U}\), i.e., \(\boldsymbol{V}_r = \boldsymbol{U}[:,0:r-1]\) in a Python notation, and \({\boldsymbol{\Sigma }}\) is a diagonal matrix of singular values, i.e., \({\boldsymbol{\Sigma }} = \text {diag}(\sqrt{\lambda ^{(1)}},\sqrt{\lambda ^{(2)}},\ldots ,\sqrt{\lambda ^{(N_s)}})\). Especially when the singular values decay fast, a small number of basis vectors can achieve a small projection error according to (16).

In this way, a reduced basis \(\boldsymbol{V}_r\) is obtained, reducing the \(N_h\)-dimensional, full-order solution space \(\mathbb {R}^{N_h}\) to an r-dimensional, reduced space \(\text {Col}(\boldsymbol{V}_r)\), \(\text {Col}\) representing the column space. With a rapid decay of singular values, the dimensionality reduction is significant (\(r\ll N_h\)) but the accuracy is still under control.
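A minimal sketch of the POD construction just described, assuming a snapshot matrix S is available (for instance the one assembled in the Newmark sketch above); the truncation rank r is selected from the decay of the singular values, and the identity (16) is used as a consistency check.

```python
import numpy as np

def pod_basis(S, tol=1e-6):
    """POD of a snapshot matrix S (N_h x N_s) w.r.t. the Euclidean inner product."""
    U, sv, _ = np.linalg.svd(S, full_matrices=False)     # S = U Sigma Z^T, cf. (18)
    energy = np.cumsum(sv**2) / np.sum(sv**2)
    r = int(np.searchsorted(energy, 1.0 - tol) + 1)      # smallest r capturing 1 - tol of the energy
    return U[:, :r], sv

V_r, sv = pod_basis(S)
r = V_r.shape[1]

# Consistency check of (16): the projection error equals the sum of truncated eigenvalues.
proj_err = np.linalg.norm(S - V_r @ (V_r.T @ S), "fro")**2
print(r, proj_err, np.sum(sv[r:]**2))                    # the last two numbers should agree
```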

3.2 Reduced-Order Solutions

The discrete solution is approximated as a linear combination of the RB vectors in \(\boldsymbol{V}_r\), written as \(\boldsymbol{u}_h \approx \boldsymbol{V}_r \boldsymbol{q}_r\) with \(\boldsymbol{q}_r\in \mathbb {R}^r\) denoting the RB coefficients, and the full-order system (5) is projected onto the reduced space \(\text {Col}(\boldsymbol{V}_r)\). A reduced-order system is thus obtained as

$$\begin{aligned} \rho \boldsymbol{M}_r \left[ \ddot{\boldsymbol{q}}_r({\boldsymbol{\mu }})+\eta \dot{\boldsymbol{q}}_r({\boldsymbol{\mu }})\right] + \boldsymbol{A}_r({\boldsymbol{\mu }}){\boldsymbol{q}}_r({\boldsymbol{\mu }}) = h(t;{\boldsymbol{\mu }})\boldsymbol{f}_r({\boldsymbol{\mu }}), \end{aligned}$$
(19)

in which the reduced-size matrices \(\boldsymbol{M}_r\in \mathbb {R}^{r\times r}\), \(\boldsymbol{A}_r\in \mathbb {R}^{r\times r}\) and \(\boldsymbol{f}_r\in \mathbb {R}^{r}\) are defined as \(\boldsymbol{M}_r=\boldsymbol{V}_r^\text {T}\boldsymbol{M}_h\boldsymbol{V}_r\), \(\boldsymbol{A}_r=\boldsymbol{V}_r^\text {T}\boldsymbol{A}_h\boldsymbol{V}_r\) and \(\boldsymbol{f}_r=\boldsymbol{V}_r^\text {T}\boldsymbol{f}_h\), respectively. Such an r-dimensional reduced system is solved in the online stage for any new parameter value \({\boldsymbol{\mu }}\).

If the full-size, parameter-dependent stiffness matrix \(\boldsymbol{A}_h({\boldsymbol{\mu }})\) and source term vector \(\boldsymbol{f}_h({\boldsymbol{\mu }})\) can be expressed as a linear combination of parameter-independent matrices/vectors with scalar-valued, parameter-dependent coefficients, often referred to as an affine form, i.e., \(\boldsymbol{A}_h({\boldsymbol{\mu }})=\sum _j \omega _j^a({\boldsymbol{\mu }}) \boldsymbol{A}_j\) and \(\boldsymbol{f}_h({\boldsymbol{\mu }})=\sum _j \omega _j^f ({\boldsymbol{\mu }})\boldsymbol{f}_j\), one can evaluate their reduced-size counterparts offline as \(\boldsymbol{A}_{r,j}=\boldsymbol{V}_r^\text {T}\boldsymbol{A}_j\boldsymbol{V}_r\) and \(\boldsymbol{f}_{r,j}=\boldsymbol{V}_r^\text {T}\boldsymbol{f}_j\), \(j=1,2,\ldots \), and the online assembly only requires the linear combinations \(\boldsymbol{A}_r({\boldsymbol{\mu }})=\sum _j \omega ^a_j({\boldsymbol{\mu }})\boldsymbol{A}_{r,j}\) and \(\boldsymbol{f}_r({\boldsymbol{\mu }})=\sum _j \omega ^f_j({\boldsymbol{\mu }})\boldsymbol{f}_{r,j}\), respectively. In this case, the online assembly is conducted in the reduced dimensionality and guarantees good online efficiency. However, if an affine form of the full-size matrices/vectors is not available, one has to recall the full-size operators during the reduced-size assembly, which can often compromise the online efficiency. To overcome the difficulties stemming from non-affinity, hyper-reduction techniques have been developed to recover an affine approximation of the non-affine operators, see [18, 19] for example.
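The following sketch illustrates the offline projection and the affine offline–online split for the reduced system (19). It reuses N_h, M_h, f_h, and V_r from the previous sketches; the affine terms A_1, A_2 and the coefficient functions omega are hypothetical placeholders, since the actual decomposition depends on the parametrization at hand.

```python
import numpy as np
from scipy.sparse import identity, random as sprandom

# Hypothetical affine decomposition A_h(mu) = w_1(mu) A_1 + w_2(mu) A_2 (placeholder terms).
A_1 = 1e3 * identity(N_h, format="csc")
A_2 = (sprandom(N_h, N_h, density=0.05, random_state=1) * 1e2).tocsc()
A_2 = A_2 + A_2.T

def omega(mu):
    """Hypothetical parameter-dependent coefficients, e.g., Lame-like combinations of (E, nu)."""
    E, nu, _ = mu
    return [E / (2 * (1 + nu)), E * nu / ((1 + nu) * (1 - 2 * nu))]

# --- offline: project every parameter-independent term once ---
M_r = V_r.T @ (M_h @ V_r)                              # r x r reduced mass matrix
A_r_terms = [V_r.T @ (A_j @ V_r) for A_j in (A_1, A_2)]
f_r = V_r.T @ f_h                                      # load assumed parameter independent here

# --- online: assemble the r x r operators of (19) for any new mu ---
def reduced_operators(mu):
    A_r = sum(w * A_rj for w, A_rj in zip(omega(mu), A_r_terms))
    return M_r, A_r, f_r       # then time-step (19) with the same Newmark scheme as before
```

The online cost of reduced_operators is independent of \(N_h\): only r-by-r matrices are combined, which is precisely what makes the affine form attractive.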

An alternative approach to recover reduced-order solutions is through non-intrusive surrogate modeling. In addition to the construction of an RB space using the POD, one has to train a regression model to approximate \(\boldsymbol{q}_r: ]0,T]\times \mathcal {P}\rightarrow \mathbb {R}^{r}\), \((t,{\boldsymbol{\mu }})\mapsto \boldsymbol{V}_r^\text {T}\boldsymbol{u}_h(t;{\boldsymbol{\mu }})\), mapping the time-parameter inputs to the projection coefficients onto the RB. The training data of input–output pairs are derived from a set of collected full-order snapshots. Gaussian process regression has been used for the non-intrusive, reduced-order surrogate modeling in [20,21,22,23].
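As a sketch of this non-intrusive alternative, the reduced coefficients can be regressed with scikit-learn's Gaussian process regressor; here, for brevity, only time is used as input and a single parameter value is assumed, whereas in the full method the parameter vector \(\boldsymbol{\mu }\) would be appended to each input row.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Training data: time instants of the snapshots in S and the corresponding RB coefficients.
X_train = (dt * np.arange(1, N_t + 1)).reshape(-1, 1)
Q_train = (V_r.T @ S).T                      # q_r(t^n) = V_r^T u_h^n, shape (N_t, r)

kernel = ConstantKernel() * RBF(length_scale=0.1) + WhiteKernel(noise_level=1e-8)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, Q_train)

def u_rb(t):
    """Non-intrusive prediction of the full displacement vector at time t."""
    q = gpr.predict(np.array([[t]]))         # predicted RB coefficients, shape (1, r)
    return V_r @ q.ravel()
```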

4 Automatic Anomaly Detection with Unbalanced Datasets

This section presents a data-driven technique to detect, localize, and estimate the severity of structural anomalies by observing healthy configurations only. One-class classification methods offer the possibility of training on a set of samples all belonging to the same class and testing whether a new sample is abnormal, i.e., whether it belongs to a different class with respect to the training data. Typical one-class classification methods, sometimes called semi-supervised methods, include one-class support vector machines (oc-SVMs), isolation forests, and local outlier factors. During the offline phase, these methods learn a description of the salient features that the training data have in common to ultimately detect whether a previously unseen object reflects this description by means of an online anomaly (or novelty) score. If the new unseen sample is associated with an anomaly score close to the ones observed in the training phase, the new object is classified as healthy, otherwise it is classified as damaged. The crucial part is to define what “close to” means from a mathematical standpoint. Let x be an unseen object; then the outcome of all one-class classification methods can be summarized as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\text {score}}(x) \ge \theta &{} {\text {damaged}}/{\text {outlier}} \\ {\text {score}}(x) < \theta &{} {\text {healthy}}/{\text {inlier}} \end{array}\right. }, \end{aligned}$$
(20)

where \(\text {score}(x)\) is the anomaly score associated with x and \(\theta \) is an ad hoc threshold to be estimated by observing the anomaly score values of healthy data only. From a practical perspective, in the semi-supervised context, \(\theta \) is heuristically chosen by observing the highest anomaly score in the training data, i.e., \(\theta \) should be equal to the anomaly score of the most outlying sample among all the inliers. Let \(\mathcal {D}\) be the dataset of healthy measurements and \(\varepsilon \in \mathbb {R}\); then the threshold value is fixed as

$$\begin{aligned} \theta = \max _{x\in \mathcal {D}} \,{\text {score}}(x) + \varepsilon , \end{aligned}$$
(21)

where a negative value of \(\varepsilon \) indicates that the user accepts a higher false alarm rate, while a positive value implies a higher missed detection rate. The trade-off between false positive and false negative errors should guide the choice of \(\varepsilon \) and ultimately of \(\theta \). It becomes clear that, to choose an effective threshold value, the training set has to be as comprehensive as possible, covering several healthy environmental and operational scenarios.
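A sketch of the decision rule (20) with the threshold calibration (21), using the one-class SVM implementation from scikit-learn; since scikit-learn's decision_function is larger for inliers, its negative is taken as the anomaly score, and the feature data below are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_detector(X_healthy, eps=0.01, nu=0.05):
    """Train a one-class SVM on healthy features and calibrate the threshold as in (21)."""
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_healthy)
    # anomaly score: higher means more anomalous (sign flip of scikit-learn's convention)
    score = lambda X: -clf.decision_function(X)
    theta = score(X_healthy).max() + eps
    return clf, score, theta

def is_damaged(x, score, theta):
    """Decision rule (20) for a single feature vector x."""
    return bool(score(np.atleast_2d(x))[0] >= theta)

# hypothetical usage on a synthetic healthy feature set
rng = np.random.default_rng(0)
X_healthy = rng.normal(size=(1000, 6))
clf, score, theta = fit_detector(X_healthy)
print(is_damaged(X_healthy[0], score, theta), is_damaged(X_healthy[0] + 10.0, score, theta))
```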

An alternative approach to detect anomalies is to include sensor measurements belonging to damaged structures in the training set, which leads to using traditional two-class supervised learning methods to distinguish healthy scenarios from damaged ones. In this approach, the choice of the threshold value benefits from the availability of two (or more) classes in the training phase. However, it is increasingly acknowledged in the literature that anticipating all types of damages is unreasonable; as a consequence, representing only some damaged configurations would bias the classifier toward certain damage types and therefore lead to possible misdetections with high probability (see, e.g., [5, 24]). For this reason, in this chapter, one-class classification methods are used instead.

Fig. 1 Flowchart to compare the feature-level (top) and the decision-level (bottom) fusion approaches for the semi-supervised damage detection strategy with multidimensional training data captured by m sensors

The general one-class approach is introduced in Sect. 4.1, and the need for feature selection is explained in Sect. 4.2. For a detailed description of the classic one-class models, the reader is referred to [25, 26] for the oc-SVMs, [27] for the isolation forest, and [28] for the local outlier factor. A Python implementation of these methods can be found, for example, in the scikit-learn library [29].

4.1 Local Semi-supervised Method

Considering that the time signals, i.e., the outputs of interest (2), are collected at multiple sensor locations, one has to decide how to best aggregate these data. There exist two typical approaches in the literature to combine sensor data: decision-level fusion and feature-level fusion. The latter combines data after feature extraction and considers one global classifier (sensor independent), thus exploiting the correlations across sensors. In decision-level fusion, on the other hand, the signals are classified for each sensor location by a local classifier (sensor dependent) and the results are then combined into a decision output. The two strategies are summarized in Fig. 1. While the superiority of one method over the other depends strongly on the problem at hand, to exploit the local aspect of the data the authors propose to use the decision-level fusion approach. This choice facilitates a hierarchical classification approach in which increasing levels of damage identification can be defined, ultimately providing information on the existence, localization, and severity of the damage.

In a decision-level fusion approach, one has to train as many one-class algorithms as the number of sensor locations. Thus, the global classification model (20) is replaced with m local detection models, where m is the number of sensors:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\text {score}}_i(x) \ge \theta _i &{} {\text {damage in the proximity of the}}\, i^{th} \,{\text {sensor}}, \\ {\text {score}}_i(x) < \theta _i &{} {\text {no damage in the proximity of the}}\, i^{th} \,{\text {sensor}}, \end{array}\right. }, \end{aligned}$$
(22)

for \(1\le i\le m\). From a computational cost point of view, note that the process can be run in parallel since the local models are independent. Moreover, in the feature-level fusion approach, aggregating the local features leads to high-dimensional input data, while the dimensionality of the input data for the classifiers in the decision-level strategy remains small. As further explained in Sect. 4.2, high-dimensional data may lead to overfitting, a well-known problem in machine learning.
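Reusing fit_detector from the previous sketch, decision-level fusion then amounts to one independent detector per sensor whose binary outputs are combined afterwards; the per-sensor organization of the feature arrays is an assumption of this sketch.

```python
import numpy as np

def fit_local_detectors(X_train_local, eps=0.01):
    """X_train_local[i]: healthy feature matrix of the i-th sensor, shape (N_train, n_features)."""
    # independent local models, one per sensor; this loop is embarrassingly parallel
    return [fit_detector(X, eps=eps) for X in X_train_local]

def local_decisions(models, x_test_local):
    """x_test_local[i]: feature vector of the i-th sensor for one new measurement."""
    return np.array([score(np.atleast_2d(x))[0] >= theta
                     for (_, score, theta), x in zip(models, x_test_local)])

def global_decision(flags):
    # flags[i] == True flags damage in the proximity of the i-th sensor, cf. (22);
    # the structure is declared damaged if any local model raises an alarm
    return bool(flags.any())
```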

4.2 Damage-Sensitive Features

Overfitting refers to the phenomenon observed when the model performs well on training data but fails to generalize to new observations. An overfitted model is described by more parameters than can be justified by the data and is typically associated with high-dimensional input data. While sometimes adding more training samples may be a remedy to overfitting, a more effective solution to prevent this behavior is to reduce the dimensionality of the data, i.e., to perform what is called feature selection.

Recall that the input data correspond to \(N_t\)-dimensional sensor signals, with \(N_t\) the number of time steps, which may be very large. Therefore, one needs to express these high-dimensional time signals by means of a few variables, extracted from the signals themselves. Ideal features should be sensitive to damage and, at the same time, robust toward noise and healthy variations. Common choices for engineering-based, damage-sensitive features can be found, for example, in [24, 30]. When studying the acoustic-elastic equation, i.e., in the context of guided waves, relevant features are the crest factor, which indicates how extreme the peaks are in a waveform; the maximum and minimum values of the time response; the corresponding arrival times, i.e., the onset; and the number of peaks and troughs in the signals. Without further detail, the reader is referred to [5] and references therein for a thorough description of damage-sensitive features for a guided-wave monitoring approach.
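A possible implementation of a few of these features is sketched below for a raw discrete time signal; the exact definitions (for instance the envelope threshold used for the onset) are illustrative assumptions rather than the definitions used in the cited references.

```python
import numpy as np
from scipy.signal import find_peaks

def damage_sensitive_features(g, dt, onset_frac=0.05):
    """Map an N_t-dimensional sensor signal g to a small feature vector."""
    rms = np.sqrt(np.mean(g**2))
    crest_factor = np.max(np.abs(g)) / rms              # how extreme the peaks are
    g_max, g_min = g.max(), g.min()
    t_max, t_min = g.argmax() * dt, g.argmin() * dt      # arrival times of the extrema
    onset_idx = np.argmax(np.abs(g) > onset_frac * np.abs(g).max())
    onset = onset_idx * dt                                # first time the signal exceeds a threshold
    n_peaks = len(find_peaks(g)[0])
    n_troughs = len(find_peaks(-g)[0])
    return np.array([crest_factor, g_max, g_min, t_max, t_min, onset, n_peaks, n_troughs])
```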

Finally, note that autoencoders, neural networks trained to reproduce their inputs at their outputs, have gained particular interest in the framework of anomaly detection, see, e.g., [31,32,33]. The main advantage of using autoencoders for anomaly detection is that, differently from classic one-class methods, specific engineering-based damage indicator features do not need to be specified by the user. Instead, by learning the features which suffice to describe and reconstruct the input, autoencoders provide a purely data-driven feature extraction method. Hence, raw measurements such as sensor time signals can be used directly.

5 Finding Optimal Sensor Locations Using Gaussian Processes

Until this point, this work has covered the problem of how to best detect damages given a fixed network of sensors. The complementary research question is how to choose the location (and the number) of sensors in order to best detect defects. Sensor placement strategies are extremely important to optimally equip structures, whose monitoring performance depends critically on the quality of the information collected by sensors. Hence, it is no surprise that this problem has been extensively addressed in the SHM literature, see, e.g., the thorough review [34] and references therein. For most sensor placement strategies, the objective is to optimize a suitable cost function with respect to some operational parameters, e.g., the candidate sensor locations and the available number of sensors. However, classic cost functions are usually formulated in terms of damage detectability, which poses a problem when one wishes to make no assumption on the potential damages. Thus, a procedure to place a fixed budget of sensors in the context of anomaly detection is proposed, i.e., when only healthy scenarios are included in the training phase. The proposed strategy relies on sparse Gaussian processes to identify the spatial positions that minimize the reconstruction error of an output of interest at all “unsensed” locations. The quantity of interest that defines the cost function for the sensor placement optimization algorithm is the same quantity used to train the anomaly detection classifier, i.e., the damage-sensitive features extracted from the synthetic sensor measurements (11), as described in Sect. 4. As such, the proposed placement strategy is based on an appropriate indicator of the damage detection performance of a given network. Note that, while this approach requires the number and type of sensors to be fixed, it can be easily extended to help the user identify the minimum number of sensors needed to achieve a preset coverage.

This section presents the sensor placement strategy introduced in [35], to which the reader is referred for a more in-depth discussion. In particular, after a brief introduction to Gaussian process (GP) regression and sparse GP in Sect. 5.1, the description of how to leverage this technique to systematically place sensors on a structure of interest is provided in Sect. 5.2.

5.1 (Sparse) Gaussian Process Regression

A GP is a collection of random variables, any finite number of which obeys a joint Gaussian distribution. In Gaussian process regression (GPR), the prior of the regression function is assumed to be a GP, and the observations are corrupted by an independent Gaussian noise term, i.e., for \(({\boldsymbol{x}},{\boldsymbol{x}}^{\prime })\in \Omega \times \Omega \) with \(\Omega \subset \mathbb {R}^{d_\Omega }\) denoting the domain of regression,

$$\begin{aligned} f({\boldsymbol{x}})\sim \text {GP}(\boldsymbol{0},\kappa ({\boldsymbol{x}},{\boldsymbol{x}}^{\prime })),\quad y=f(\boldsymbol{x})+\epsilon ,\quad \epsilon \sim \mathcal {N}(0,\chi ^2). \end{aligned}$$
(23)

There are many different options for the covariance/kernel function \(\kappa :\Omega \times \Omega \rightarrow \mathbb {R}\), a typical form of which is written as

$$\begin{aligned} \kappa ({\boldsymbol{x}},{\boldsymbol{x}}^{\prime })= \sigma ^{2} \phi (r), \end{aligned}$$
(24)

where \(\phi (\cdot )\) is a radial basis function and r can be defined as

$$\begin{aligned} r=\Vert {\boldsymbol{x}}-{\boldsymbol{x}}^{\prime }\Vert /\ell \quad \text {or}\quad r=\sqrt{\sum _{k=1}^{d_\Omega }\frac{(x_k-x_k^{\prime })^2}{\ell _k^2}}, \end{aligned}$$

the former for a stationary kernel with an isotropic lengthscale \(\ell \), the latter for an automatic relevance determination (ARD) kernel that considers an individual lengthscale \(\ell _k\) for each input dimension and thus allows for differentiated relevances of the input features to the regression.

Given a finite number of training input locations in the domain \(\Omega \), a prior joint Gaussian is defined for the corresponding outputs:

$$\begin{aligned} \boldsymbol{y}|\boldsymbol{X}\sim \mathcal {N}(\boldsymbol{0},\boldsymbol{K}_y),\quad \boldsymbol{K}_y=\mathrm {cov}[\boldsymbol{y}|\boldsymbol{X}]=\kappa (\boldsymbol{X},\boldsymbol{X})+\chi ^2\boldsymbol{I}_M, \end{aligned}$$
(25)

where \(\boldsymbol{y}=\{y_1,y_2,\ldots ,y_M\}^{\mathrm {T}}\), \(\boldsymbol{X}=[{\boldsymbol{x}}_1|{\boldsymbol{x}}_2|\cdots |{\boldsymbol{x}}_M]^\text {T}\), \(\boldsymbol{I}_M\) is the M-dimensional identity matrix, and M is the number of training samples.

The goal of a regression model is to predict the noise-free output \(f^*({\boldsymbol{s}})\) at any new, unseen input location \({\boldsymbol{s}}\in \Omega \). By the standard Bayesian rule \(p(f^*({\boldsymbol{s}})|\boldsymbol{X},\boldsymbol{y})=p(f^*,\boldsymbol{y}|{\boldsymbol{s}},\boldsymbol{X})/p(\boldsymbol{y}|\boldsymbol{X})\), the posterior distribution conditioned on the training data \((\boldsymbol{X},\boldsymbol{y})\) can be obtained as a new GP:

$$\begin{aligned} \begin{aligned}&f^*({\boldsymbol{s}})|\boldsymbol{X},\boldsymbol{y}\sim \text {GP}(m^*({\boldsymbol{s}}),c^*({\boldsymbol{s}},{\boldsymbol{s}}^{\prime })), \\ m^*({\boldsymbol{s}}) =&\kappa ({\boldsymbol{s}},\boldsymbol{X})\boldsymbol{K}_y^{-1}\boldsymbol{y}, \quad c^*({\boldsymbol{s}},{\boldsymbol{s}}^{\prime }) = \kappa ({\boldsymbol{s}},{\boldsymbol{s}}^{\prime })-\kappa ({\boldsymbol{s}},\boldsymbol{X})\boldsymbol{K}_y^{-1}\kappa (\boldsymbol{X},{\boldsymbol{s}}^{\prime }), \end{aligned} \end{aligned}$$
(26)

The values of the hyperparameters \({\boldsymbol{\theta }}=\{\ell \text { or } (\ell _1,\ldots ,\ell _{d_\Omega }),\sigma ^2,\chi ^2\}\) make a significant difference in the predictive performance. In this chapter, an empirical Bayesian approach of maximizing the marginal likelihood is adopted to determine a set of optimal values of the hyperparameters. Using a standard gradient-based optimizer, the optimal hyperparameters \({\boldsymbol{\theta }}^*\) can be estimated via the following maximization problem:

$$\begin{aligned} \begin{aligned} {\boldsymbol{\theta }}^* =&\arg \max _{{\boldsymbol{\theta }}}\log p(\boldsymbol{y}|\boldsymbol{X},{\boldsymbol{\theta }}) = \arg \max _{{\boldsymbol{\theta }}}\log \left[ \mathcal {N}(\boldsymbol{y}|\boldsymbol{0}, \boldsymbol{K}_y({\boldsymbol{\theta }}))\right] \\&= \arg \max _{{\boldsymbol{\theta }}}\left\{ -\frac{1}{2}\boldsymbol{y}^{\mathrm {T}}\boldsymbol{K}_y^{-1}({\boldsymbol{\theta }}) \boldsymbol{y}-\frac{1}{2}\log \left| \boldsymbol{K}_y({\boldsymbol{\theta }})\right| -\frac{M}{2}\log (2\pi )\right\} , \end{aligned} \end{aligned}$$
(27)

where \(p(\boldsymbol{y}|\boldsymbol{X},{\boldsymbol{\theta }})\) is the density function of \(\boldsymbol{y}\) given \(\boldsymbol{X}\) under hyperparameters \({\boldsymbol{\theta }}\), considered as the marginal likelihood \(p(\boldsymbol{y}|\boldsymbol{X},{\boldsymbol{\theta }})=\int p(\boldsymbol{y}|\boldsymbol{f},\boldsymbol{X},{\boldsymbol{\theta }})p(\boldsymbol{f}|\boldsymbol{X},{\boldsymbol{\theta }})\,\mathrm {d}\boldsymbol{f}\).
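In practice, the prior (23), the posterior (26), and the marginal-likelihood maximization (27) are available off the shelf; the sketch below uses scikit-learn, whose GaussianProcessRegressor maximizes the log marginal likelihood over the kernel hyperparameters when fit is called. The data are synthetic placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Hypothetical training data: noisy observations of a scalar field over a 2D domain.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1]) + 0.05 * rng.standard_normal(200)

# sigma^2 * RBF with ARD lengthscales plus chi^2 noise, cf. (23)-(24).
kernel = ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.2]) + WhiteKernel(noise_level=1e-2)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)    # hyperparameters estimated as in (27)

# Posterior mean and standard deviation (26) at unseen locations.
X_new = rng.uniform(0.0, 1.0, size=(5, 2))
mean, std = gp.predict(X_new, return_std=True)
print(gp.kernel_)                                          # optimized hyperparameters
```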

It is important to remark that the computational complexity of generating a GP model is \(\mathcal {O}(M^3)\) and the associated storage requirement \(\mathcal {O}(M^2)\), which becomes intractable for large datasets. To overcome this computational limitation, the corresponding sparse methods rely on a small set of \(m \ll M\) points, called inducing points, to summarize the information contained in the whole dataset, thus allowing for a complexity reduction, i.e., \(\mathcal {O}(Mm^2)\). An overview of well-known sparse GP regression methods can be found for example in [36], where each sparse method is described as an exact inference with a specific approximated prior, different from the true GP prior (23). A different approach is presented in [37], where both the m inducing points, indicated as \(\boldsymbol{D} = [{\boldsymbol{d}}_1|{\boldsymbol{d}}_2| \cdots |{\boldsymbol{d}}_m]^\text {T}\), and the hyperparameters \({\boldsymbol{\theta }}\) are considered as variational parameters to be estimated by minimizing the Kullback–Leibler (KL) divergence between the true posterior (26) and a variational posterior. This is equivalent to maximizing the following variational lower bound:

$$\begin{aligned} \begin{aligned} (\boldsymbol{D}^*,{\boldsymbol{\theta }}^*)&= \arg \max _{\boldsymbol{D},{\boldsymbol{\theta }}} \mathcal {L}(\boldsymbol{D},{\boldsymbol{\theta }}) \\&= \arg \max _{\boldsymbol{D},{\boldsymbol{\theta }}} \left\{ \log \left[ \mathcal {N}(\boldsymbol{y}|\boldsymbol{0}, \boldsymbol{Q}(\boldsymbol{D},{\boldsymbol{\theta }})+\chi ^2\boldsymbol{I}_M)\right] - \frac{1}{2\chi ^2}\text {Tr}(\kappa (\boldsymbol{X},\boldsymbol{X}) -\boldsymbol{Q}(\boldsymbol{D},{\boldsymbol{\theta }}) )\right\} ,\\ \end{aligned} \end{aligned}$$
(28)

where \(\boldsymbol{Q} = \kappa (\boldsymbol{X},\boldsymbol{D})(\kappa (\boldsymbol{D},\boldsymbol{D}))^{-1}\kappa (\boldsymbol{D},\boldsymbol{X})\) is the Nyström approximation of the true prior covariance. Note that the trace term in (28) acts as a regularization of the marginal log likelihood and can be viewed as an indicator of how well the inducing points summarize the overall statistics.
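To make the bound (28) concrete, the sketch below evaluates it for a squared-exponential kernel and a given set of candidate inducing points; in the placement procedure this quantity would then be maximized over \(\boldsymbol{D}\) and the hyperparameters, which is omitted here.

```python
import numpy as np
from scipy.spatial.distance import cdist

def sq_exp_kernel(A, B, lengthscale=0.2, sigma2=1.0):
    """Squared-exponential covariance, cf. (24)."""
    return sigma2 * np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

def variational_lower_bound(X, y, D, chi2=1e-2, **kern):
    """Variational bound (28): log N(y | 0, Q + chi^2 I) - Tr(K_XX - Q) / (2 chi^2)."""
    M = X.shape[0]
    K_XD = sq_exp_kernel(X, D, **kern)
    K_DD = sq_exp_kernel(D, D, **kern) + 1e-10 * np.eye(D.shape[0])   # jitter for stability
    Q = K_XD @ np.linalg.solve(K_DD, K_XD.T)                          # Nystrom approximation
    C = Q + chi2 * np.eye(M)
    _, logdet = np.linalg.slogdet(C)
    log_marg = -0.5 * (y @ np.linalg.solve(C, y) + logdet + M * np.log(2 * np.pi))
    trace_term = np.trace(sq_exp_kernel(X, X, **kern) - Q) / (2.0 * chi2)
    return log_marg - trace_term
```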

5.2 Variational Approximation for Systematic Sensor Placement

The aforementioned variational sparse GP model, together with the numerical approach defined in the previous sections, can be used to systematically place a network of sensors on a structure of interest. Following the description in Sect. 4.2, let \(\boldsymbol{y}=\{ y_1, \dots , y_{n_\text {dof}}\}^\text {T}\) be the damage-sensitive features extracted from the synthetic time signals (11), collected at the \(n_\text {dof}\) points of a coarse mesh of the input domain \(\Omega \), denoted as \(\boldsymbol{X}=[{\boldsymbol{x}}_1|{\boldsymbol{x}}_2|\cdots |{\boldsymbol{x}}_{n_\text {dof}}]^\text {T}\), where \(n_\text {dof}\ll N_h\). Given the number m of sensors that the user wishes to place on the structure, one can apply the variational sparse GP strategy to this collection of data, with m being the number of desired inducing points. Ultimately, one can identify the sought sensor locations with the inducing points obtained by variational inference.

Although the procedure is quite simple, some remarks ought to be made. First of all, observe that the hyperparameters and the inducing inputs are estimated by maximizing the variational lower bound (28), which is in general an unconstrained, non-convex optimization problem. This may be problematic because one needs to impose some locality constraints on the inducing points to prevent them from being outside the input domain, especially when this is non-convex. Therefore, the standard variational approximation should be replaced with a constrained optimization:

$$\begin{aligned} (\boldsymbol{D}^*,{\boldsymbol{\theta }}^*) = \arg \max _{\boldsymbol{D}\in \Omega _s,{\boldsymbol{\theta }}}\,\, \mathcal {L}(\boldsymbol{D},{\boldsymbol{\theta }}), \end{aligned}$$
(29)

where \(\Omega _s \subset \Omega \) indicates the admissible domain for sensor locations and, with a slight abuse of notation, \(\boldsymbol{D}\in \Omega _s\) means that each inducing point \({\boldsymbol{d}}_i\), \(1\le i\le m\), is constrained to belong to \(\Omega _s\). For real-world problems, the complexity of the domain may be such that the boundaries of \(\Omega _s\) cannot be easily specified analytically and, in such cases, it may be worthwhile to replace \(\Omega _s\) with a discrete counterpart. If that is the case, instead of gradient-based optimization techniques, one could opt for discrete optimization methods such as the genetic algorithm [38].

A second point to notice is that the output of interest \(\boldsymbol{y}\) is in general parameter dependent, i.e., \(\boldsymbol{y}=\boldsymbol{y}(\boldsymbol{\mu })\). Hence, choosing the sensor locations as the optimal inducing points obtained for one specific input configuration may not be optimal for another context, described by a different parameter. To overcome this, this work proposes to apply the variational sparse GP approach to \(N_\mu \) outputs of interest \(\boldsymbol{y}(\boldsymbol{\mu }_i)\), with \(\boldsymbol{\mu }_i\in \mathcal {P}\) for \(1\le i\le N_\mu \). To summarize the information from the so-obtained \(N_\mu m\) inducing points, the K-medoids algorithm, a well-known unsupervised clustering technique, is employed to find m clusters and their corresponding centers, called centroids. As a last step, the desired sensor locations will be chosen as the clusters’ centers.
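A minimal k-medoids sketch that could be used for this summarization step is given below; restricting the representatives to actual candidate points keeps the selected locations inside the admissible set. The stacking of the \(N_\mu \) sets of inducing points is indicated only as a comment, since those arrays come from the variational step above.

```python
import numpy as np

def k_medoids(points, m, n_iter=100, seed=0):
    """Minimal alternating k-medoids: the cluster centers are actual data points."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)   # pairwise distances
    medoids = rng.choice(len(points), size=m, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)          # assign each point to its closest medoid
        new_medoids = medoids.copy()
        for k in range(m):
            members = np.where(labels == k)[0]
            if members.size:
                # pick the member minimizing the total intra-cluster distance
                new_medoids[k] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return points[medoids]

# hypothetical usage: stack the N_mu sets of m inducing points and summarize them
# all_inducing = np.vstack([D_star[i] for i in range(N_mu)])   # shape (N_mu * m, d_Omega)
# sensor_locations = k_medoids(all_inducing, m)
```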

To quantify the quality of the placement, the simplest choice is to compute the relative reconstruction error of the high-fidelity quantity of interest at unsensed locations with respect to the mean of the posterior distribution of the sparse model based on the estimated variational parameters. Alternatively, the pointwise relative variance reduction, defined as

$$\begin{aligned} V_i = \frac{\kappa ({\boldsymbol{x}}_i,\boldsymbol{D}^*)(\kappa (\boldsymbol{D}^*,\boldsymbol{D}^*))^{-1}\kappa (\boldsymbol{D}^*,{\boldsymbol{x}}_i)}{\kappa ({\boldsymbol{x}}_i,{\boldsymbol{x}}_i)}, \quad \text {for} \, 1\le i\le n_\text {dof}, \end{aligned}$$
(30)

provides an indicator of how much variance reduction is achieved at \({\boldsymbol{x}}_i\) by the set of selected sensor locations. When the relative variance reduction is close to one, it means that the inducing variables alone can well reproduce the full GP prediction at that location.
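Reusing the kernel helper from the sketch of (28), the indicator (30) can be evaluated for all candidate locations at once:

```python
import numpy as np

def relative_variance_reduction(X, D_star, **kern):
    """Pointwise indicator (30): fraction of the prior variance explained by the inducing set."""
    K_XD = sq_exp_kernel(X, D_star, **kern)
    K_DD = sq_exp_kernel(D_star, D_star, **kern) + 1e-10 * np.eye(D_star.shape[0])
    explained = np.einsum("ij,ij->i", K_XD, np.linalg.solve(K_DD, K_XD.T).T)
    prior = np.diag(sq_exp_kernel(X, X, **kern))
    return explained / prior          # values close to 1 indicate a good placement
```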

6 Numerical Example

In this section, a numerical problem in 2D is used to illustrate the results in terms of damage detection and sensor placement. Similar results for more complex 3D problems can be found in [5, 35].

Damage detection

The first step consists in generating a synthetic database of sensor observations. Equation (1) is applied, with homogeneous free-slip boundary conditions and homogeneous initial conditions, i.e., \(\boldsymbol{u}_0=\boldsymbol{v}_0=\boldsymbol{t}_N=0\), to the healthy geometry illustrated in Fig. 2a, equipped with \(m=15\) sensors. The high-fidelity numerical solutions are computed using the FE approximation with \(\mathbb {P}_1\) elements over a discretized domain with a total of \(N_h = 30{,}912\) degrees of freedom, while the RB solver relies on a reduced basis of dimension \(r=267\). The natural variations are described by \(d_\mu =3\) parameters, i.e., \(\boldsymbol{\mu }=(E,\nu ,k)\in \mathcal {P}= [0.999, 1.001] \times [0.329, 0.331] \times [1.9, 2.1]\), where E is the Young’s modulus and \(\nu \) the Poisson’s ratio, defining the stress tensor \(\boldsymbol{\sigma }\), while k is a parameter representing the number of cycles before attenuation of the source impulse. The position of the active source as well as the density and damping coefficients are fixed, i.e., \(\rho = 1\) and \(\eta = 0.1\), respectively.

Fig. 2 Summary of the one-class SVM classification results for one healthy geometry (a) and five damaged ones (b–f)

For each sensor location, the pipeline introduced in Sect. 4 is followed to generate a training dataset of \(N_\text {train}=1000\) samples, obtained by extracting a few damage-sensitive features from the discrete time signals (11), in turn obtained with a suitable reduced-order modeling approach, as described in Sect. 3. Then, m separate one-class SVM models are trained to learn the common traits of the local healthy features, and the results are tested on sensor measurements belonging to both healthy and damaged geometries. In this example, the test signals are generated synthetically using the high-fidelity model. Differently from the training set, to approximate experimental measurements, a Gaussian noise term \(\varepsilon _i\sim \mathcal {N}(0,\gamma _i^2)\) is added to each test time signal (11), with \(\gamma _i\) being \(0.01\%\) of the maximum amplitude of 30 randomly chosen healthy signals in the training set. Damages are obtained by modifying the geometry of the structure to include discontinuities, as shown in Fig. 2b–f. This crack-modeling approach is a common choice in the literature, see, e.g., [39], where artificial damages on the blade of a wind turbine are implemented via a trailing edge opening. Anomaly detection results are shown in Fig. 2, where for each geometry the average outcome of the oc-SVM over 10 simulations (i.e., for 10 different input parameters) is presented. For each damaged scenario, at least one sensor is classified as damaged, while for the healthy scenario all sensors are classified as healthy. Moreover, for most of the damaged scenarios, one can observe a certain level of proximity between the cracks and the sensors classified as damaged, thus guiding the localization of damages. An exception is the geometry in Fig. 2c, where almost all sensors are classified as damaged, thus preventing localization. This issue is attributable to the relative position of the source and the crack, i.e., to localize damage (c) the source should be placed differently. A reasonable solution is to consider different locations for the active source, for example by defining an additional input parameter in the model. This approach is already employed in SHM with Lamb wave propagation, where piezoelectric transducers are used as both sensors and actuators (see, e.g., [40]).

Sensor placement

To identify the optimal sensor locations, a new database of \(N_\mu =100\) synthetic observations is created, sampled from a Sobol sequence [41]. The same parameters as in the previous paragraph are kept but, instead of computing the synthetic time signals at a few predefined points in space (the sensor locations), the measurements are collected at all the nodes of a new coarse mesh of \(\Omega \), for a total of \(n_\text {dof}=360\) degrees of freedom. Following the description given in Sect. 5.2, for each one of the \(N_\mu \) time signals, the constrained variational approximation (29) is performed with an ARD exponential kernel over \(\Omega _s = \Omega \setminus \partial \Omega \) to obtain a set of m inducing points \(\boldsymbol{D}^*(\boldsymbol{\mu }_i)\) for \(1\le i\le N_\mu \). Figure 3 sketches the clustering results obtained for different values of \(N_\mu \). Observe that the centroids tend to stabilize after considering as few as 10 samples, especially when the sensor budget is high, i.e., for large m values. This can be explained by noticing that, when m is small, the algorithm tries to reconstruct a non-trivial quantity of interest over a complex domain with only a few points, which may lead the sparse model to get stuck in a local optimum without reaching convergence. Finally, the relative variance reduction (30) is used to evaluate the quality of the estimated locations; the results are shown in Fig. 4, where one can observe a variance reduction almost equal to 1 near the sensor locations and a general reduction above 0.7 for all the unsensed locations, even for \(m = 4\), thus indicating a good sensor placement.

Fig. 3 Comparison of the centroids obtained with the K-medoids clustering algorithm for different cluster sizes \(N_\mu \). Each plot shows a fixed number m of inducing points, which increases from left to right

Fig. 4 Relative variance reduction obtained using m centroids and averaged over \(N_\mu =100\) samples. Each plot shows a fixed number m of inducing points, which increases from left to right. Values close to 1 indicate a good placement quality

7 Conclusion

This chapter presents how a model-based numerical approach can be integrated with different data-driven techniques in the context of predictive maintenance. A peculiarity of this work is that the authors make no assumption on the type of damages that a structure may undergo during its lifetime, while modeling many environmental and operational healthy scenarios. From a technical point of view, this work describes how reduced-order modeling techniques can be leveraged to generate large and robust datasets of synthetic sensor measurements and it explains how such datasets can be used to learn the salient features that healthy scenarios have in common to ultimately detect damages. The same damage-sensitive features are also used to guide an automated data-driven sensor placement strategy to increase the detection accuracy for a given budget of sensors.

Although a simple 2D example is used to validate the proposed method, the generalization to more complex, possibly nonlinear problems is possible. Note, however, that for real-world engineering problems the parameter space describing healthy variations is expected to be high dimensional, thus requiring a high computational effort to generate the synthetic time signals. To this end, sensitivity analysis techniques such as variance-based global sensitivity indices [42] or derivative-based global sensitivity measures (DGSM) [43] are popular choices to identify the few parameters that influence the output of interest the most. Finally, note that while the proposed method can be used for real-time predictions thanks to the offline–online decomposition of tasks, a filtering technique to integrate the evolution of the structure and update the model would be a valuable addition.