Keywords

1 Introduction

Due to rapid improvements in sensing and communication technologies, embedded systems are now often spatially distributed. Such spatially distributed systems (SDS) consist of heterogeneous components embedded in a specific topological space, whose time-varying behaviors evolve according to complex mutual inter-dependence relations [16]. In the formal methods community, tremendous advances have been achieved for verification and analysis of distributed systems. However, most formal techniques abstract away the specific spatial aspects of distributed systems, which can be of crucial importance in certain applications. For example, consider the problem of developing a bike-sharing system (BSS) in a “sharing economy.” Here, the system consists of a number of bike stations that would use sensors to detect the number of bikes present at a station, and use incentives to let users return bikes to stations that are running low. The bike stations themselves could be arbitrary locations in a city, and the design of an effective BSS would require reasoning about the distance to nearby locations, and the time-varying demand or supply at each location. For instance, the property “there is always a bike and a slot available at distance d from a bike station” depends on the distance of the bike station to its nearby stations. Evaluating whether the BSS functions correctly is a verification problem where the specification is a spatio-temporal logic formula. Similarly, consider the problem of coordinating the movements of multiple mobile robots, or a HVAC controller that activates heating or cooling in parts of a building based on occupancy. Given spatio-temporal execution traces of nodes in such systems, we may be interested in analyzing the data to solve several classical formal methods problems such as fault localization, debugging, invariant generation or specification mining. It is increasingly urgent to formulate methods that enable reasoning about spatially-distributed systems in a way that explicitly incorporates their spatial topology.

In this paper, we focus on one specific aspect of spatio-temporal reasoning: mining interpretable logical properties from data in an SDS. We model a SDS as a directed or undirected graph where individual compute nodes are vertices, and edges model either the connection topology or spatial proximity. In the past, analytic models based on partial differential equations (e.g. diffusion equations) [6] have been used to express the spatio-temporal evolution of these systems. While such formalisms are incredibly powerful, they are also quite difficult to interpret. Traditional machine learning (ML) approaches have also been used to uncover the structure of such spatio-temporal systems, but these techniques also suffer from the lack of interpretability. Our proposed method draws on a recently proposed logic known as Spatio-Temporal Reach and Escape Logic (STREL) [2]. Recent research on STREL has focused on efficient algorithms for runtime verification and monitoring of STREL specifications [2, 3]. However, there is no existing work on mining STREL specifications.

Mined STREL specifications can be useful in many different contexts in the design of spatially distributed systems; an incomplete list of usage scenarios includes the following applications: (1) Mined STREL formulas can serve as spatio-temporal invariants that are satisfied by the computing nodes, (2) STREL formulas could be used by developers to characterize the properties of a deployed spatially distributed system, which can then be used to monitor any subsequent updates to the system, (3) Clustering nodes that satisfy similar STREL formulas can help debug possible bottlenecks and violations in communication protocols in such distributed systems.

There is considerable amount of recent work on learning temporal logic formulas from data [8, 11, 14, 15]. In particular, the work in this paper is closest to the work on unsupervised clustering of time-series data using Signal Temporal Logic [11]. In this work, the authors assume that the user provides a Parametric Signal Temporal Logic (PSTL) formula, and the procedure projects given temporal data onto the parameter domain of the PSTL formula. The authors use off-the-shelf clustering techniques to group parameter values and identify STL formulas corresponding to each cluster. There are a few hurdles in applying such an approach to spatio-temporal data. First, in [11], the authors assume a monotonic fragment of PSTL: there is no such fragment identified in the literature for STREL. Second, in [11], the authors assume that clusters in the parameter space can be separated by axis-aligned hyper-boxes. Third, given spatio-temporal data, we can have different choices to impose the edge relation on nodes, which can affect the formula we learn.

To address the shortcomings of previous techniques, we introduce PSTREL, by treating threshold constants in signal predicates, time bounds in temporal operators, and distance bounds in spatial operators as parameters. We then identify a monotonic fragment of PSTREL, and propose a multi-dimensional binary-search based procedure to infer tight parameter valuations for the given PSTREL formula. We also explore the space of implied edge relations between spatial nodes, proposing an algorithm to define the most suitable graph. After defining a projection operator that maps a given spatio-temporal signal to parameter values of the given PSTREL formula, we use an agglomerative hierarchical clustering technique to cluster spatial locations into hyperboxes. We improve the method of [11] by introducing a decision-tree based approach to systematically split overlapping hyperbox clusters. The result of our method produces axis-aligned hyperbox clusters that can be compactly described by an STREL formula that has length proportional to the number of parameters in the given PSTREL formula (and independent of the number of clusters). Finally, we give human-interpretable meanings for each cluster. We show the usefulness of our approach considering four benchmarks: COVID-19 data from LA County, Outdoor air quality data, BSS data and movements of the customer in a Food Court.

Running Example: A Bike Sharing System (BSS). To ease the exposition of key ideas in the paper, we use an example of a BSS deployed in the city of Edinburgh, UK. The BSS consists of a number of bike stations, distributed over a geographic area. Each station has a fixed number of bike slots. Users can pick up a bike, use it for a while, and then return it to another station in the area. The data that we analyze are the number of bikes (B) and empty slots (S) at each time step in each bike station. With the advent of electric bikes, BSS have become an important aspect in urban mobility, and such systems make use of embedded devices for diverse purposes such as tracking bike usage, billing, and displaying information about availability to users over apps. Figure 1b shows the map of the Edinburgh city with the bike stations. Different colors of the nodes represent different learned clusters as can be seen in Fig. 1a. For example, using our approach, we learn that stations in orange cluster have a long wait time, and stations in red cluster are the most undesirable stations as they have long wait time and do not have nearby stations with bike availability. If we look at the actual location of red points in Fig. 1b, they are indeed far away stations.

Fig. 1.
figure 1

Interpretable clusters automatically identified by our technique.

2 Background

In this section, we introduce the notation and terminology for spatial models and spatio-temporal traces and we describe Spatio-Temporal Reach and Escape Logic (STREL).

Definition 1

(Spatial Model). A spatial model \(\mathcal {S}\) is defined as a pair \(\langle L, W\rangle \), where \(L\) is a set of nodes or locations and \(W\subseteq L\times \mathbb {R}\times L\) is a nonempty relation associating each distinct pair \(\ell _1, \ell _2 \in L\) with a label \(w \in \mathbb {R}\) (also denoted \(\ell _1 \xrightarrow {w} \ell _2\)).

There are many different choices possible for the proximity relation \(W\); for example, \(W\) could be defined in a way that the edge-weights indicate spatial proximity, communication network connectivity etc. Given a set of locations, unless there is a user-specified \(W\), we note that there are several graphs (and associated edge-weights) that we can use to express spatial models. We explore these possibilities in Sect. 3. For the rest of this section, we assume that \(W\) is defined using the notion of \((\delta ,d)\)-connectivity graph as defined in Definition 2.

Definition 2

(\((\delta ,d)\)-connectivity spatial model). Given a compact metric space \(M\) with the distance metric \(d:M\times M\rightarrow \mathbb {R}^{\ge 0}\), a set of locations \(L\) that is a finite subset of \(M\), and a fixed \(\delta \in \mathbb {R}, \delta > 0\), a \((\delta ,d)\)-connectivity spatial model is defined as \(\langle L,W\rangle \), where \((\ell _1,w,\ell _2) \in W\) iff \(d(\ell _1,\ell _2) = w\), and \(w < \delta \).

Example 1

In the BSS, each bike station is a node/location in the spatial model, where locations are assumed to lie on the metric space defined by the 3D spherical manifold of the earth’s surface; each location is defined by its latitude and longitude, and the distance metric is the Haversine distanceFootnote 1. Figure 2b shows the \(\delta \)-connectivity graph of the Edinburgh BSS, with \(\delta =1\,\text {km}\).

Definition 3

(Route). For a spatial model \(\mathcal {S}=\langle L, W\rangle \), a route \(\tau \) is an infinite sequence \(\ell _0\ell _1\cdots \ell _k\cdots \) such that for any \(i \ge 0\), \(\ell _i \xrightarrow {w_i} \ell _{i+1}\).

For a route \(\tau \), \(\tau [i]\) denotes the \(i^{th}\) node \(\ell _i\) in \(\tau \), \(\tau [i..]\) indicates the suffix route \(\ell _i\ell _{i+1}...\), and \(\tau (\ell )\) denotes \(\min {i\mid \tau [i]=\ell }\), i.e. the first occurrence of \(\ell \) in \(\tau \). Note that \(\tau (\ell ) = \infty \) if \(\forall i \tau [i]\ne \ell \). We use \(\mathcal {T}(\mathcal {S})\) to denote the set of routes in \(\mathcal {S}\), and \(\mathcal {T}(\mathcal {S},\ell )\) to denote the set of routes in \(\mathcal {S}\) starting from \(\ell \in L\). We can use routes to define the route distance between two locations in the spatial model as follows.

Definition 4

(Route Distance and Spatial Model Induced Distance). Given a route \(\tau \), the route distance along \(\tau \) up to a location \(\ell \) denoted \(d_{\mathcal {S}}^\tau (\ell )\) is defined as \(\sum _{i=0}^{\tau (\ell )} w_i\). The spatial model induced distance between locations \(\ell _1\) and \(\ell _2\) (denoted \(d_{\mathcal {S}}(\ell _1,\ell _2)\)) is defined as: \(d_{\mathcal {S}}(\ell _1,\ell _2) = \min _{\tau \in \mathcal {T}(\mathcal {S},\ell _1)} d_{\mathcal {S}}^\tau (\ell _2)\).

Note that by the above definition, \(d_{\mathcal {S}}^\tau (\ell ) = 0\) if \(\tau [0] = \ell \) and \(\infty \) if \(\ell \) is not a part of the route (i.e. \(\tau (\ell ) = \infty \)), and \(d_{\mathcal {S}}(\ell _1,\ell _2) = \infty \) if there is no route from \(\ell _1\) to \(\ell _2\).

Spatio-temporal Time-Series. A spatio-temporal trace associates each location in a spatial model with a time-series trace. Formally, a time-series trace x is a mapping from a time domain \(\mathbb {T}\) to some bounded and non-empty set known as the value domain \(\mathcal {V}\). Given a spatial model \(\mathcal {S}= \langle L, W\rangle \), a spatio-temporal trace \(\sigma \) is a function from \(L\times \mathbb {T}\) to \(\mathcal {V}\). We denote the time-series trace at location \(\ell \) by \(\sigma (\ell )\).

Example 2

Consider a spatio-temporal trace \(\sigma \) of the BSS defined such that for each location \(\ell \) and at any given time t, \(\sigma (\ell ,t)\) is \((B(t),S(t))\), where \(B(t)\) and \(S(t)\) are respectively the number of bikes and empty slots at time t.

2.1 Spatio-temporal Reach and Escape Logic (STREL)

Syntax. STREL is a logic that was introduced in [2] as a formalism for monitoring spatially distributed cyber-physical systems. STREL extends Signal Temporal Logic [12] with two spatial operators, reach and escape, from which is possible to derive other three spatial modalities: everywhere, somewhere and surround. The syntax of STREL is given by:

$$ \varphi :\,\!:= true\mid \mu \mid \lnot \varphi \mid \varphi _{1} \wedge \varphi _{2} \mid \varphi _{1} \, \mathrm {U}_{I} \, \varphi _{2} \mid \varphi _{1} \, \mathcal {R}_{D} \, \varphi _{2} \mid \mathcal {E}_{D} \, \varphi . $$

Here, \(\mu \) is an atomic predicate (AP) over the value domain \(\mathcal {V}\). Negation \(\lnot \) and conjunction \(\wedge \) are the standard Boolean connectives, while \(\mathrm {U}_{I}\) is the temporal operator until with I being a non-singular interval over the time-domain \(\mathbb {T}\). The operators \(\mathcal {R}_{D}\) and \(\mathcal {E}_{D}\) are spatial operators where \(D\) denotes an interval over the distances induced by the underlying spatial model, i.e., an interval over \(\mathbb {R}^{\ge 0}\).

Semantics. A STREL formula is evaluated piecewise over each location and each time appearing in a given spatio-temporal trace. We use the notation \((\sigma ,\ell ) \models \varphi \) if the formula \(\varphi \) holds true at location \(\ell \) for the given spatio-temporal trace \(\sigma \). The interpretation of atomic predicates, Boolean operations and temporal operators follows standard semantics for Signal Temporal Logic: E.g., for a given location \(\ell \) and a given time t, the formula \(\varphi _1 \mathrm {U}_{I} \varphi _2\) holds at \(\ell \) iff there is some time \(t'\) in \(t \oplus I\) where \(\varphi _2\) holds, and for all times \(t''\) in \([t,t')\), \(\varphi _1\) holds. Here the \(\oplus \) operator defines the interval obtained by adding t to both interval end-points. We use standard abbreviations \(\mathbf {F}_{I}\varphi = true \mathrm {U}_{I} \varphi \) and \(\mathbf {G}_{I}\varphi = \lnot \mathbf {F}_{I}\varphi \), for the eventually and globally operators. The reachability (\( \mathcal {R}_{D}\)) and escape (\(\mathcal {E}_{D}\))operators are spatial operators. The formula \(\varphi _1 \mathcal {R}_{D} \varphi _2\) holds at a location \(\ell \) if there is a route \(\tau \) starting at \(\ell \) that reaches a location \(\ell '\) that satisfies \(\varphi _2\), with a route distance \(d_{\mathcal {S}}^\tau (\ell ')\) that lies in the interval \(D\), and for all preceding locations, including \(\ell \), \(\varphi _1\) holds true. The escape formula \(\mathcal {E}_{D}\varphi \) holds at a location \(\ell \) if there exists a location \(\ell '\) at a route distance \(d_{\mathcal {S}}(\ell _1,\ell _2)\) that lies in the interval \(D\) and a route starting at \(\ell \) and reaching \(\ell '\) consisting of locations that satisfy \(\varphi \). We define two other operators for notational convenience: The somewhere operator, denoted , is defined as \( true \mathcal {R}_{[0,d]} \varphi \), and the everywhere operator, denoted is defined as , where d is a real positive value; their meaning is described in the next example.

Example 3

In the BSS, we use atomic predicates \(S> 0\) and \(B> 10\), and the formula is true if always within the next 3 h, at a location \(\ell \), there is some location \(\ell '\) at most 1 km from \(\ell \) where, the number of bikes available exceed 10. Similarly, the formula is true at a location \(\ell \) if for all locations within 1 km, for the next 30 mins, there is no empty slot.

Fig. 2.
figure 2

Different approaches for constructing the spatial model for the BSS. (a) shows an \((\infty ,d_\mathrm {hvrsn})\)-connectivity spatial model where \(d_\mathrm {hvrsn}\) is the Haversine distance between locations. (b) shows a \((\delta ,d_\mathrm {hvrsn})\)-connectivity spatial model where \(\delta =1\,\text {km}\). Observe that the spatial model is disconnected. (c) shows an MST-spatial model. (d) shows an \((\alpha ,d_\mathrm {hvrsn})\) enhanced MSG spatial model with \(\alpha =2\). Observe that this spatial model is sparse compared even to the \((\delta ,d_\mathrm {hvrsn})\)-connectivity spatial model.

3 Constructing a Spatial Model

In this section, we present four approaches to construct a spatial model, and discuss the pros and cons of each approach.

  1. 1.

    \((\infty ,d)\)-connectivity spatial model: This spatial model corresponds to the \((\delta ,d)\)-connectivity spatial model as presented in Definition 2, where we set \(\delta = \infty \). We note that this gives us a fully connected graph, i.e. where \(|W|\) is \(O(|L|^2)\). We remark that our learning algorithm uses monitoring STREL formulas as a sub-routine, and from Lemma 2 in AppendixFootnote 2, we can see that as the complexity of monitoring a STREL formula is linear in \(|W|\), a fully connected graph is undesirable.

  2. 2.

    \((\delta ,d)\)-connectivity spatial model: This is the model presented in Definition 2, where \(\delta \) is heuristically chosen in an application-dependent fashion. Typically, the \(\delta \) we choose is much smaller compared to the distance between the furthest nodes in the given topological space. This gives us \(W\) that is sparse, and thus with a lower monitoring cost; however, a small \(\delta \) can lead to a disconnected spatial model which can affect the accuracy of the learned STREL formulas. Furthermore, this approach may overestimate the spatial model induced distance between two nodes (as in Definition 4) that are not connected by a direct edge. For instance, in Fig. 2b, nodes 1 and 8 are connected through the route \(1 \rightarrow 9 \rightarrow 8\), and sum of the edge-weights along this route is larger than the actual (metric) distance of 1 and 8.

  3. 3.

    MST-spatial model: To minimize the number of edges in the graph while keeping the connectivity of the graph, we can use Minimum Spanning Tree (MST) as illustrated in Fig. 2c. This gives us \(|W|\) that is \(O(|L|)\), which makes monitoring much faster, while resolving the issue of disconnected nodes in the \((\delta ,d)\)-spatial model. However, an MST can also lead to an overestimate of the spatial model induced distance between some nodes in the graph. For example, in Fig. 2c, the direct distance between nodes 1 and 8 is much smaller than their route distance (through the route \(1 \rightarrow 2 \rightarrow 3 \rightarrow 4 \rightarrow 5 \rightarrow 6 \rightarrow 7 \rightarrow 8\)).

  4. 4.

    \((\alpha ,d)\)-Enhanced MSG Spatial Model: To address the shortcomings of previous approaches, we propose constructing a spatial model that we call the \((\alpha ,d)\)-Enhanced Minimum Spanning Graph Spatial model. First, we construct an MST over the given set of locations and use it to define \(W\) and pick \(\alpha \) as some number greater than 1. Then, for each distinct pair of locations \(\ell _1,\ell _2\), we compute the shortest route distance \(d_{\mathcal {S}}(\ell _1,\ell _2)\) between them in the constructed MST, and compare it to their distance \(d(\ell _1,\ell _2)\) in the metric space. If \(d_{\mathcal {S}}(\ell _1,\ell _2) > \alpha \cdot d(\ell _1,\ell _2)\), then we add an edge \((\ell _1, d(\ell _1,\ell _2), \ell _2)\) to \(W\). The resulting spatial model is no longer a tree, but typically is still sparseFootnote 3. In our case studies, the cost of building the enhanced MSG spatial model was insignificant compared to the other steps in the learning procedureFootnote 4.

4 Learning STREL Formulas from Data

In this section, we first introduce Parametric Spatio-Temporal Reach and Escape Logic (PSTREL) and the notion of monotonicity for PSTREL formulas. Then, we introduce a projection function \(\pi \) that maps a spatio-temporal trace to a valuation in the parameter space of a given PSTREL formula. We then cluster the trace-projections using Agglomerative Hierarchical Clustering, and finally learn a compact STREL formula for each cluster using Decision Tree techniques.

Parametric STREL (PSTREL). Parametric STREL (PSTREL) is a logic obtained by replacing one or more numeric constants appearing in STREL formulas by parameters; parameters appearing in atomic predicates are called magnitude parameters \(\mathcal {P}_{\mathcal {V}}\), and those appearing in temporal and spatial operators are called timing \(\mathcal {P}_{\mathbb {T}}\) and spatial parameters \(\mathcal {P}_{d_{\mathcal {S}}}\) respectively. Each parameter in \(\mathcal {P}_{\mathcal {V}}\) take values from \(\mathcal {V}\), those in \(\mathcal {P}_{\mathbb {T}}\) take values from \(\mathbb {T}\), and those in \(\mathcal {P}_{d_{\mathcal {S}}}\) take values from \(\mathbb {R}^{\ge 0}\) (i.e. the set of values that the \(d_{\mathcal {S}}\) metric can take for a given spatial model). We define a valuation function \(\nu \) that maps all parameters in a PSTREL formula to their respective values.

Example 4

Consider the PSTREL versions of the STREL formulas introduced in Example 3 \(\varphi (\mathbf {p}_\tau ,\mathbf {p}_d,\mathbf {p}_c)\) = . The valuation \(\nu \): \(\mathbf {p}_\tau \mapsto 3\,\text {h}\), \(\mathbf {p}_d \mapsto 1\mathrm {km}\), and \(\mathbf {p}_c \mapsto 10\) returns the STREL formula introduced in Example 3.

Definition 5

(Parameter Polarity, Monotonic PSTREL). A polarity function \(\gamma \) maps a parameter to an element of \(\{+,-\}\), and is defined as follows:

figure a

The monotonic fragment of PSTREL consists of PSTREL formulas where all parameters have either positive or negative polarity.

In simple terms, the polarity of a parameter \(\mathbf {p}\) is positive if it is easier to satisfy \(\varphi \) as we increase the value of \(\mathbf {p}\) and is negative if it is easier to satisfy \(\varphi \) as we decrease the value of \(\mathbf {p}\). The notion of polarity for PSTL formulas was introduced in [1], and we extend this to PSTREL and spatial operators. The polarity for PSTREL formulas \(\varphi (d_1,d_2)\) of the form , \(\psi _1\mathcal {R}_{[d_1,d_2]}\psi _2\), and \(\mathcal {E}_{[d_1,d_2]}\psi \) are \(\gamma (d_1) = -\) and \(\gamma (d_2) = +\), i.e. if a spatio-temporal trace satisfies \(\varphi (\nu (d_1),\nu (d_2))\), then it also satisfies any STREL formula over a strictly larger spatial model induced distance interval, i.e. by decreasing \(\nu (d_1)\) and increasing \(\nu (d_2)\). For a formula , \(\gamma (d_1) = +\) and \(\gamma (d_2) = -\), i.e. the formula obtained by strictly shrinking the distance interval. The proofs are simple, and provided in Appendix for completeness.

Definition 6

(Validity Domain, Boundary). Let \(P= \mathcal {V}^{|\mathcal {P}_{\mathcal {V}}|} \times \mathbb {T}^{| \mathcal {P}_{\mathbb {T}}|}\times (\mathbb {R}^{\ge 0})^{|\mathcal {P}_{d_{\mathcal {S}}}|}\) denote the space of parameter valuations, then the validity domain \(V\) of a PSTREL formula at a location \(\ell \) with respect to a set of spatio-temporal traces \(\varSigma \) is defined as follows: \( V(\varphi (\mathbf {p}),\ell ,\varSigma ) = \{ \nu (\mathbf {p}) \mid \mathbf {p}\in P, \sigma \in \varSigma , (\sigma ,\ell ) \models \varphi (\nu (\mathbf {p})) \} \) The validity domain boundary \(\partial V(\varphi (\varphi ),\ell ,\varSigma )\) is defined as the intersection of \(V(\varphi ,\ell ,\varSigma )\) with the closure of its complement.

Spatio-temporal Trace Projection. We now explain how a monotonic PSTREL formula \(\varphi (\mathbf {p})\) can be used to automatically extract features from a spatio-temporal trace. The main idea is to define a total order \(>_\mathcal {P}\) on the parameters \(\mathbf {p}\) (i.e. parameter priorities) that allows us to define a lexicographic projection of the spatio-temporal trace \(\sigma \) at each location \(\ell \) to a parameter valuation \(\nu (\mathbf {p})\) (this is similar to assumptions made in [8, 11]). We briefly remark how we can relax this assumption later. Let \(\nu _j\) denote the valuation of the \(j^{th}\) parameter.

Definition 7

(Parameter Space Ordering, Projection). A total order on parameter indices \(j_1> \ldots > j_n\) imposes a total order \(\prec _\mathrm {lex}\) on the parameter space defined as:

$$ \nu (\mathbf {p}) \prec _\mathrm {lex}\nu '(\mathbf {p}) \Leftrightarrow \exists j_k \text { s.t. }\left\{ \begin{array}{l} \gamma (\mathbf {p}_{j_k}) = + \Rightarrow \nu _{j_k}< \nu '_{j_k} \\ \gamma (\mathbf {p}_{j_k}) = - \Rightarrow \nu _{j_k} > \nu '_{j_k} \end{array}\right. \, \mathrm {and}\ \forall m <_\mathcal {P}k, \nu _m = \nu '_m. $$

Given above total order, \(\pi _{\mathrm {lex}}(\sigma ,\ell ) = \inf _{\prec _\mathrm {lex}} \{\nu (\mathbf {p}) \in \partial V(\varphi (\mathbf {p}),\{\sigma \}\}\).

In simple terms, given a total order on the parameters, the lexicographic projection maps a spatio-temporal trace to valuations that are least permissive w.r.t. the parameter with the greatest priority, then among those valuations, to those that are least permissive w.r.t. the parameter with the next greater priority, and so on. Finding a lexicographic projection can be done by sequentially performing binary search on each parameter dimension [11]Footnote 5. It is easy to show that \(\pi _{\mathrm {lex}}\) returns a valuation on the validity domain boundary.

Remark 1

The order of parameters is assumed to be provided by the user and is important as it affects the unsupervised learning algorithms for clustering that we apply next. Intuitively, the order corresponds to what the user deems as more important. For example, consider the formula . Note that \(\gamma (d) = +\), and \(\gamma (c) = -\). Now if the user is more interested in the radius around each station where the number of bikes exceeds some threshold (possibly 0) within 3 h, then the order is \(d >_\mathcal {P}c\). If she is more interested in knowing what is the largest number of bikes available in any radius (possibly \(\infty \)) always within 3 h, then \(c >_\mathcal {P}d\).

Remark 2

Similar to [18], we can compute an approximation of the validity domain boundary for a given trace, and then apply a clustering algorithm on the validity domain boundaries. This does not require the user to specify parameter priorities. In all our case studies, the parameter priorities were clear from the domain knowledge, and hence we will investigate this extension in the future.

Clustering. The projection operator \(\pi _{\mathrm {lex}}(\sigma ,\ell )\) maps each location to a valuation in the parameter space. These valuation points serve as features for off-the-shelf clustering algorithms. In our experiments, we use the Agglomerative Hierarchical Clustering (AHC) technique [5] to automatically cluster similar valuations. AHC is a bottom-up approach that starts by assigning each point to a single cluster, and then merging clusters in a hierarchical manner based on a similarity criteriaFootnote 6. An important hyperparameter for any clustering algorithm is the number of clusters to choose. In some case studies, we use domain knowledge to decide the number of clusters. Where such knowledge is not available, we use the Silhouette metric to compute the optimal number of clusters. Silhouette is a ML method to interpret and validate consistency within clusters by measuring how well each point has been clustered. The silhouette metric ranges from \(-1\) to \(+1\), where a high silhouette value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters [17].

Example 5

Figure 1a shows the results of projecting the spatio-temporal traces from BSS through the PSTREL formula \(\varphi (\tau ,d)\) shown in Eq. (1).

$$\begin{aligned} \varphi (\tau , d) = \mathbf {G}_{[0,3]}(\varphi _\mathrm {wait}(\tau ) \vee \varphi _\mathrm {walk}(d)) \end{aligned}$$
(1)

In the above formula, \(\varphi _\mathrm {wait}(\tau )\) is defined as \(\mathbf {F}_{[0,\tau ]}(B\ge 1) \wedge (\mathbf {F}_{[0,\tau ]} S\ge 1)\), and \(\varphi _\mathrm {walk}(d)\) is . \(\varphi (\tau ,d)\) means that for the next 3 h, either \(\varphi _\mathrm {wait}(\tau )\) or \(\varphi _\mathrm {walk}(d)\) is true. Locations with large values of \(\tau \) have long wait times or with large d values are typically far from a location with bike/slot availability (and are thus undesirable). Locations with small \(\tau ,d\) are desirable. Each point in Fig. 1a shows \(\pi _{\mathrm {lex}}(\sigma ,\ell )\) applied to each location and the result of applying AHC with 3 clusters.

Let \( numC \) be the number of clusters obtained after applying AHC to the parameter valuations. Let \(C\) denote the labeling function mapping \(\pi _{\mathrm {lex}}(\sigma ,\ell )\) to \(\{1,\ldots , numC \}\). The next step after clustering is to represent each cluster in terms of an easily interpretable STREL formula. Next, we propose a decision tree-based approach to learn an interpretable STREL formula from each cluster.

Learning STREL Formulas from Clusters. The main goal of this subsection is to obtain a compact STREL formula to describe each cluster identified by AHC. We argue that bounded length formulas tend to be human-interpretable, and show how we can automatically obtain such formulas using a decision-tree approach. Decision-trees (DTs) are a non-parametric supervised learning method used for classification and regression[13]. Given a finite set of points \(X \subseteq \mathbb {R}^m\) and a labeling function \(\mathcal {L}\) that maps each point \(x \in X\) to some label \(\mathcal {L}(x)\), the DT learning algorithm creates a tree whose non-leaf nodes \(n_j\) are annotated with constraints \(\phi _j\), and each leaf node is associated with some label in the range of \(\mathcal {L}\). Each path \(n_1,\ldots ,n_i,n_{i+1}\) from the root node to a leaf node corresponds to a conjunction \(\bigwedge _{j=1}^{i} h_j\), where \(h_j\) = \(\lnot \phi _j\) if \(h_{j+1}\) is the left child of \(h_j\) and \(\phi _j\) otherwise. Each label thus corresponds to the disjunction over the conjunctions corresponding to each path from the root node to the leaf node with that label.

Recall that after applying the AHC procedure, we get one valuation \(\pi _{\mathrm {lex}}(\sigma ,\ell )\) for each location, and its associated cluster label. We apply a DT learning algorithm to each point \(\pi _{\mathrm {lex}}(\sigma ,\ell )\), and each DT node is associated with a \(\phi _j\) of the form \(p_j \ge v_j\) for some \(p_j \in \mathbf {p}\).

Lemma 1

Any path in the DT corresponds to a STREL formula of length that is \(O((|\mathcal {P}| + 1)\cdot |\varphi |)\).

Proof

Any path in the DT is a conjunction over a number of formulas of the kind \(p_j \ge v_j\) or its negation. Because \(\varphi (\mathbf {p})\) is monotonic in each of its parameters, if we are given a conjunction of two conjuncts of the type \(p_j \ge v_j\) and \(p_j \ge v'_j\), then depending on \(\gamma (p_j)\), one inequality implies the other, and we can discard the weaker inequality. Repeating this procedure, for each parameter, we will be left with at most 2 inequalities (one specifying a lower limit and the other an upper limit on \(p_j\)). Thus, each path in the DT corresponds to an axis-aligned hyperbox in the parameter space. Due to monotonicity, an axis-aligned hyperbox in the parameter space can be represented by a formula that is a conjunction of \(|\mathcal {P}|+1\) STREL formulas (negations of formulas corresponding to the \(|\mathcal {P}|\) vertices connected to the vertex with the most permissive STREL formula, and the most permissive formula itself) [11] (see Fig. 3a for an example in a 2D parameter space). Thus, each path in the DT can be described by a formula of length \(O((|\mathcal {P}|+1)\cdot |\varphi |)\), where \(|\varphi |\) is the length of \(\varphi \).

Example 6

The result of applying the DT algorithm to the clusters identified by AHC (shown in dotted lines in Fig. 1a) is shown as the axis-aligned hyperboxes. Using the meaning of \(\varphi (\tau ,d)\) as defined in Eq. (1), we learn the formula \(\lnot \varphi (17.09,2100) \wedge \lnot \varphi (50,1000.98) \wedge \varphi (50,2100)\) for the red cluster. The last of these conjuncts is essentially the formula \( true \), as this formula corresponds to the most permissive formula over the given parameter space. Thus, the formula we learn is:

$$ \varphi _{red} = \lnot \mathbf {G}_{[0,3]}(\varphi _\mathrm {wait}(17.09) \vee \varphi _\mathrm {walk}(2100)) \wedge \lnot \mathbf {G}_{[0,3]}(\varphi _\mathrm {wait}(50) \vee \varphi _\mathrm {walk}(1000.98)) $$

The first of these conjuncts is associated with a short wait time and the second is associated with short walking distance. As both are not satisfied, these locations are the least desirable.

Pruning the Decision Tree. If the decision tree algorithm produces several disjuncts for a given label (e.g., see Fig. 4a), then it can significantly increase the length and complexity of the formula that we learn for a label. This typically happens when the clusters produced by AHC are not clearly separable using axis-aligned hyperplanes. We can mitigate this by pruning the decision tree to a maximum depth, and in the process losing the bijective mapping between cluster labels and small STREL formulas. We can still recover an STREL formula that is satisfied by most points in a cluster using a k-fold cross validation approach (The formal procedure is presented in Algorithm 3 in the arXiv version.) The idea is to loop over the maximum depth permitted from 1 to N, where N is user provided, and for each depth performing k-fold cross validation to characterize the accuracy of classification at that depth. If the accuracy is greater than a threshold (\(90\%\) in our experiments), we stop and return the depth as a limit for the decision tree. Figure 4b illustrates the hyper-boxes obtained using this approach. For this example, we could decrease the number of hyper-boxes from 11 to 3 by miss-classifying only a few data points (less than \(10\%\) of the data).

Fig. 3.
figure 3

Illustration of clustering on the BSS locations

5 Case Studies

We now present the results of applying the clustering techniques developed on three benchmarks: (1) COVID-19 data from Los Angeles County, USA, [9] (2) Outdoor Air Quality data from California, and (3) BSS data from the city of Edinburgh [10] (running example)Footnote 7. A summary of the computational aspects of the results is provided in Table 1. The numbers indicate that our methods scale to spatial models containing hundreds of locations, and still learn interpretable STREL formulas for clusters.

Table 1. Summary of results.

COVID-19 Data from LA County. Understanding the spread pattern of COVID-19 in different areas is vital to stop the spread of the disease. While this example is not related to a software system, it is nevertheless a useful example to show the versatility of our approach to spatio-temporal data. The PSTREL formula allows us to number of cases exceeding a threshold c within \(\tau =10\) days in a neighborhood of size d for a given locationFootnote 8. Locations with small value of d and large value of c are unsafe as there is a large number of new positive cases within a small radius around them.

We illustrate the clustering results in Fig. 4. Each location in Fig. 4a is associated with a geographic region in LA county (shown in Fig. 4c), and the red cluster corresponds to hot spots (small d and large c). Applying the DT classifier on the learned clusters (shown in Fig. 4a) produces 11 hyperboxes, some of which contain only a few points. Hence we apply our DT pruning procedure to obtain the largest cluster that gives us at least \(90\%\) accuracy. Figure 4b shows the results after pruning the Decision Tree. We learn the following formula:

figure c

This formula means that within 4691.29 m from any red location, within 10 days, the number of new positive cases exceeds 3180. The COVID-19 data that we used is for September 2020Footnote 9.

Fig. 4.
figure 4

Procedure to learn STREL formulas from COVID-19 data

Outdoor Air Quality Data from California. We next consider Air Quality data from California gathered by the US Environmental Protection Agency (EPA). Among reported pollutants we focus on PM2.5 contaminant, and try to learn the patterns in the amount of PM2.5 in the air using STREL formulas. Consider a mobile sensing network consisting of UAVs to monitor pollution, such a STREL formula could be used to characterize locations that need increased monitoring.

We use the PSTREL formula \(\varphi (c,d)\) = \(\mathbf {G}_{[0,10]}(\mathcal {E}_{[d,16000]}{}(PM2.5 < c))\) and project each location in California to the parameter space of cd. A location \(\ell \) satisfies this property if it is always true within the next 10 days, that there exists a location \(\ell '\) at a distance more than d, and a route \(\tau \) starting from \(\ell \) and reaching \(\ell '\) such that all the locations in the route satisfy the property \(PM2.5 < c\). Hence, it might be possible to escape to a location at a distance greater than d always satisfying property \(PM2.5 < c\). The results are shown in Fig. 5a. Cluster 8 is the best cluster as it has a small value of c and large value of d which means that there exists a long route from the locations in cluster 8 with low density of PM2.5. Cluster 3 is the worst as it has a large value of c and a small value of d. The formula for cluster 3 is \(\varphi _3\) = \(\varphi (500,0) \wedge \lnot \varphi (500,2500) \wedge \lnot \varphi (216,0)\). \(\varphi _3\) holds in locations where, in the next 10 days, PM2.5 is always less than 500, but at least in 1 day PM2.5 reaches 216 and there is no safe route (i.e. locations along the route have \(PM2.5 < 500\)) of length at least 2500.

Fig. 5.
figure 5

Clustering experiments on the California Air Quality Data

6 Related Work and Conclusion

Traditional ML Approaches for Time-Series Clustering. Time-series clustering is a popular area in the domain of machine learning and data mining. Some techniques for time-series clustering combine clustering methods such as KMeans [7], Hierarchical Clustering, agglomerative clustering [4] and etc., with similarity metrics between time-series data such as the Euclidean distance, dynamic time-warping (DTW) distance, and statistical measures (such as mean, median, correlation, etc.). Some recent works such as the works on shapelets automatically identify distinguishing shapes in the time-series data [19]. Such shapelets serve as features for ML tasks. All these approaches are based on shape-similarity which might be useful in some applications; however, for applications that the user is interested in mining temporal information from data, dissimilar traces might be clustered in the same group [11]. Furthermore, such approaches may lack interpretability as we showed in BSS case study.

STL-Based Clustering of Time-Series Data. There is considerable amount of recent work on learning temporal logic formulas from time-series data using logics such as Signal Temporal Logic (STL) [8, 11, 14, 15]; however, there is no work on discovering such relations on spatio-temporal data. In particular, the work in [11] which addresses unsupervised clustering of time-series data using Signal Temporal Logic is closest to our work. There are a few hurdles in applying such an approach to spatio-temporal data as explained in Sect. 1. We address all the hurdles in the current work.

Monitoring Spatio-temporal Properties. There is considerable amount of recent work such as [2, 3] on monitoring spatio-temporal properties. Particularly, MoonLight [3] is a recent tool for monitoring of STREL properties, and in our current work, we use MoonLight for computing the robustness of spatio-temporal data with respect to STREL formulas. MoonLight uses \((\delta ,d)\)-connectivity approach for creating a spatial model, which has several issues, including dis-connectivity and distance overestimation. We resolve these issues by proposing our new method for creating the spatial graph, which we call Enhanced MSG. While there are many works on monitoring of spatio-temporal logic, to the best of our knowledge, there is no work on automatically inferring spatio-temporal logic formulas from data that we address in this work.

Conclusion. In this work, we proposed a technique to learn interpretable STREL formulas from spatio-temporal time-series data for Spatially Distributed Systems. First, we introduced the notion of monotonicity for a PSTREL formula, proving the monotonicity of each spatial operator. We proposed a new method for creating a spatial model with a restrict number of edges that preserves connectivity of the spatial model. We leveraged quantitative semantics of STREL combined with multi-dimensional bisection search to extract features for spatio-temporal time-series clustering. We applied Agglomerative Hierarchical clustering on the extracted features followed by a Decision Tree based approach to learn an interpretable STREL formula for each cluster. We then illustrated with a number of benchmarks how this technique could be used and the kinds of insights it can develop. The results show that while our method performs slower than traditional ML approaches, it is more interpretable and provides a better insight into the data. For future work, we will study extensions of this approach to supervised and active learning.