Keywords

1 Introduction

In the past few years, Semantic Web technologies have been successfuly applied to the domain of the Internet of Things (IoT). Various systems such as SPITFIRE [6], mixing RDF and machine-generated data, were developed with promising results.

In this paper, we explore the possibility of exchanging RDF data in constrained environments, in order to extend the scope of the Semantic Web for the IoT. In particular, we are interested in the problem of storing and querying RDF data on micro-controllers with IP connectivity (8 to 64 kB RAM).

2 Related Work

Until recently, no realistic use case could be found where computational devices had limited resources but still IP connectivity. As a consequence, the problem of storing RDF in constraind environments, as opposed to storing billions of statements in very large databases, has remained mostly unexplored. The situation has changed with the coming of the IoT where RDF has found new usages.

The first work that addressed constrained devices —and to the best of our knowledge, the only one— is the Wiselib TupleStore (as part of SPITFIRE) [4]. It compares the performance of various C++ data structures to compress URIs, inherent to any RDF document.

The Wiselib TupleStore features insertion and removal operations. There exists other proposals to compress RDF data but none of them have these characteristics. The most notable ones are the Header Dictionary Triples (HDT) binary representation for RDF [2] and \(k^2\)-triples, a variant of HDT [1]. The main objective of HDT was to compress large RDF datasets. e.g. to fit in the main memory of a standard PC. But its compression scheme could reasonably be used on small datasets as well.

In the original proposal for HDT, triples are stored using bitmaps on which compression is applied. \(k^2\)-triples indexes triples in subject-object indexes (vertical partitioning) and applies \(k^2\)-tree compression on the two-dimensional arrays.

HDT and \(k^2\)-triples show high compression ratios. However, in a typical IoT configuration, RDF data might be dynamically updated, which requires insertion and removal operations. Moreover, neither the Wiselib Tuplestore nor HDT/\(k^2\)-triples go beyond triple pattern matching to query RDF. Our proposal, the \(\mu \)RDF store, combines both a small memory footprint with insertion/removal operations and Basic Graph Pattern (BGP) processing. It is presented next.

3 Overview of the \(\mu \)RDF Store

A few assumptions were made in the design of the \(\mu \)RDF store. First, it is expected that datasets contain a limited number of triples, to fit in the RAM of a micro-controller. Second, we expect datasets to contain mostly assertional data, as opposed to terminological data.

3.1 Data Structure

The data structure underlying the \(\mu \)RDF store is optimized for navigational queries (i.e. with bound predicates) against datasets having few distinct properties (max. n properties). Each resource is given an index number where the first n indices are reserved for predicates (in practice, \(n=32\) or 64, i.e. a multiple of the size of a machine word). Triples are stored with variable-length byte (vbyte) encoding in a way that navigation from every resource to its neighbors in the RDF graph is possible both forward and backward. Triples with literals are only stored once.

Formally, for I, B and L mutually disjoint sets of IRIs, blank nodes and literals respectively, a \(\mu \)RDF resource \(\rho \in P\) is the pair (RL) where \(R \subset I \times (I \cup B) \times \{0,1\}\) is the set of direct and reverse relations to other resources and \(L \subset I \times L\) is the set of literal relations associated to \(\rho \). An instance of the \(\mu \)RDF store is a map \(\mu : (I \cup B) \rightarrow P\).

3.2 Query Interface

The \(\mu \)RDF store supports BGP queries (given that at least one subject or object is bound). Queries are processed in a greedy fashion: as soon as a binding is found for a triple pattern, the next triple pattern is processed. This guarantees an upper bound of O(|V|) for intermediate storage, where V is the set of variables in the query. This process, that might lead to duplicated calls, avoids managing arbitrarily large intermediary results (e.g. for patterns with the rdf:type predicate), which is critical in constrained environments.

To exchange data over the network, data is serialized in the Efficient XML Interchange (EXI) format. Our original proposal for the \(\mu \)RDF store suggests that EXI, that can take advantage of schema information from RDF/XML, offers satisfactory compression ratios [5].

Moreover, a straightforward mapping of the HTTP SPARQL 1.1 protocol can be defined for the Constrained Application Protocol (CoAP), where both query and update are supported. The CoAP specification limits the payload size to 1024 bytes. If the serialized result set is above this value, the algorithm described above allows data to be sent “block-wise”: each full mapping found by the algorithm is sent in a separate blockFootnote 1.

4 Evaluation

We implemented the \(\mu \)RDF store for the ESP8266, a microcontroller with an integrated Wi-Fi chip (64 kB RAM, 80 MHz). To test its performances, we used the dataset generator and the query mix provided by the Lehigh University Benchmark (LUBM) [3]. The generator had to be adapted to small datasetsFootnote 2. All fourteen queries of the benchmark can be processed by our implementation.

We compared the memory footprint of our implementation to the Wiselib tuplestore, HDT and \(k^2\)-triples (Fig. 1a). The comparison suggests that, overall, HDT performs best, followed by the \(\mu \)RDF store, in the range we consider of interest (up to 10,000 triples). It is reported in the literature that \(k^2\)-triples performs better but the graphic shows that this does not apply to small RDF datasets. It also shows that URI compression as performed by the Wiselib tuplestore performs poorly for datasets with less than 1,000 triples.

Fig. 1.
figure 1

Evaluation of the \(\mu \)RDF store with LUBM synthetic data

The original idea behind the \(\mu \)RDF store is to explore SPARQL as a decentralized discovery mechanism among IoT devices. This requires at least BGPs to be supported by individual nodes. As mentioned before, no work in the state-of-the-art has proposed an implementation of BGP processing at the scale of a micro-controller.

On the MSP8266, all queries from the LUBM benchmark —especially Q9 that requires many intermadiary joins—are processed in less than 20 ms, including EXI coding. In comparison, sending static RDF/EXI data over CoAP takes at least 500 ms and this number grows with the number of triples being sent (Fig. 1b), which means that query processing has no significant impact on the overall exchange between two devices. On the contrary, local BGP processing, in constrat to other RDF query mechanisms such as Triple Pattern Fragments (TPFs) [7], can significantly improve the rapidity of an exchange by reducing the total amount of exchanged data.

Our work on the \(\mu \)RDF store has led us to the following conclusion: with today’s hardware, an “Embedded Semantic Web” is possible. As opposed to HDT, the \(\mu \)RDF store supports updates and BGP processing; without degrading data transmission; with a limited overhead in memory.

5 Demonstration: Ask Me (almost) Anything

The purpose of this demonstration is to illustrate the conclusions of our evaluation in an interactive fashion. We developed a simple chat application that allows one to submit queries in a simplified manner to an instance of the \(\mu \)RDF store from their Web browser. Its design is inspired by a concept popularized by Reddit called Ask Me Anything (AMA). Here, a microcontroller invites you to an AMA session to let you discover its name, its capabilites and the “Things” it is semantically connected to.

Figure 2a shows the Web interface on which queries can be formulated. The technical details of the demonstration are given in Fig. 2b. A client query is first pre-processed by the so-called “advisor”, a non-constrained machine that first translates the query into SPARQL, then serializes it in EXI; the EXI is processed by the micro-controller which responds with another EXI message; the server response is post-processed by the advisor to print it in a human-readable form. All the messages are sent via the Message Queue Telemetry Transport (MQTT) protocol following a simple publish-subscribe interaction pattern. The delay between requests and responses does not exceed a few seconds.

Fig. 2.
figure 2

Overview of the ask me (almost) anything (AMaA) application

6 Conclusion

IoT devices that embed a full IP-based communication stack usually come with unused computational power. As current applications mostly concentrate on sending sensor data to the Cloud as fast as possible, the capacities of edge devices remain untapped. With the \(\mu \)RDF store, we aim at relocating intelligence to the edge, starting with semantic processing. Our demonstration underlines the relevance of an Embedded Semantic Web for the IoT.

Another aspect that this demonstration tends to show is that human–to–machine communication is tedious. IoT systems might be much more performant if machines themselves formulated the queries. This is our next line of research. Support for the OPTIONAL operator is also being considered in our implementation.