Evaluating Geospatial RDF Stores Using the Benchmark Geographica 2

Ioannidis, Theofilos; Garbis, George; Kyzirakos, Kostis; Bereta, Konstantina; Koubarakis, Manolis

doi:10.1007/s13740-021-00118-x

Evaluating Geospatial RDF Stores Using the Benchmark Geographica 2

Original Article
Published: 23 April 2021

Volume 10, pages 189–228, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal on Data Semantics

Evaluating Geospatial RDF Stores Using the Benchmark Geographica 2

Download PDF

Theofilos Ioannidis ORCID: orcid.org/0000-0002-1754-8748¹,
George Garbis¹,
Kostis Kyzirakos¹,
Konstantina Bereta¹ &
…
Manolis Koubarakis¹

284 Accesses
10 Citations
3 Altmetric
Explore all metrics

Abstract

Geospatial extensions of SPARQL, like GeoSPARQL and stSPARQL, have been defined since 2007, and while several geospatial RDF stores have implemented a substantial part of these extensions, other stores limited their support mostly on point geometry features. A parallel process with the above was that RDF frameworks evolved in an interesting way by presenting a more mature set of geospatial features, such as GeoSPARQL support and including the latest indexing technologies. As a logical consequence, a shift in the use of RDF frameworks is to be expected, from base platforms that users extend to create more complete geospatial RDF stores, to attractive finished RDF solutions for many geospatial applications. Alongside with the ever-increasing size of linked geospatial data that semantic stores need to handle, all the above provided our group the motivation to improve our single-node systems benchmark Geographica, originally defined in 2013. Geographica 2 is more comprehensive, because it now includes new geospatial RDF stores and frameworks, big real-world datasets of many hundred million triples with up to 50 million features of complex geometries, new tests and queries that reveal the scalability of these systems. The augmented and revised real-world workload of Geographica 2 tests the efficiency of primitive spatial functions in RDF stores, their performance in the geocoding scenario against the new Census dataset in addition to many other real use case scenarios and finally includes computation of statistics for geospatial datasets. A more detailed and systematic evaluation is performed using the synthetic workload. The new scalability workload aims at discovering the limits of centralized geospatial RDF stores of various architectures. It employs a set of six well-balanced real-world datasets with highly complex geometries covering many European countries and compares three RDF stores in terms of storage space, bulk loading and query response time. In addition, a special version of the benchmark has been created for systems with limited geospatial functionality and two more systems of this category are introduced along the six systems of the main benchmark, all stressed against point-only subsets of the workloads. Three out of the eight systems use an RDBMS for the persistence layer, while some of them offer a variety of persistence options.

Geographica: A Benchmark for Geospatial RDF Stores (Long Version)

Ontop of Geospatial Databases

Strabo 2: Distributed Management of Massive Geospatial RDF Datasets

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Many geospatial datasets have recently been added to the Web of data and geospatial extensions to SPARQL, such as GeoSPARQL and stSPARQL, have been defined.

GeoSPARQL [1] is a standard of the Open Geospatial Consortium (OGC) for a SPARQL-based query language for geospatial data expressed in RDF. GeoSPARQL defines a vocabulary (classes, datatypes and properties) that can be used in RDF graphs to represent geographic features with vector geometries.

The query language stSPARQL [2, 3] is an extension of SPARQL 1.1 developed by our group for representing and querying geospatial data that changes over time. Similarly to GeoSPARQL, the geospatial part of stSPARQL defines datatypes that can be used for representing in RDF the serializations of vector geometries encoded according to the widely adopted OGC standards Well-Known Text (WKT) [4] and Geography Markup Language (GML) [5]. stSPARQL and GeoSPARQL also define extension functions from the OGC standard “OpenGIS Simple Feature Access” (OGC-SFA) [4] that can be used for querying vector geometries.

The query languages stSPARQL^{Footnote 1} and GeoSPARQL were developed independently at around the same time, and have produced very similar representational and querying constructs. A detailed comparison of stSPARQL and GeoSPARQL is given in [6] and the forthcoming book [7].

In parallel with the appearance of GeoSPARQL and stSPARQL, researchers have been implementing geospatial RDF stores that support these SPARQL extensions (e.g., Strabon [2], Parliament [8], and uSeekM). The earlier approach for the implementation was by extending existing RDF frameworks (e.g., Sesame) with limited geospatial functionality and relying on state of the art spatially enabled RDBMSs (e.g., PostGIS) for the storage and querying of geometries (e.g., Strabon and uSeekM with PostGIS). One reason that this hybrid approach had been successful is that the relational realization of the OGC-SFA standard has been widely adopted by many RDBMS for storing and manipulating vector geometries. The state of the art in this area is summarized in the early survey paper [6].

However, new highly competitive geospatial RDF stores appeared lately that belong to the NoSQL graph databases technology family, e.g., GraphDB. In addition, as of mid-2018, some RDF Frameworks (e.g., RDF4J, formerly known as Sesame) advanced substantially in terms of GeoSPARQL support, the availability of indexing and search technologies and may make an attractive starting point for building geospatial RDF stores rich in terms of features and more efficient in terms of performance.

The above advances to the state of the art in query languages and implemented systems have also been matched with work on evaluation and benchmarking of geospatial RDF stores. Although there are various benchmarks for spatially enabled Relational Database Management Systems (RDBMS) [9,10,11,12,13,14], there are some publications [2, 15,16,17] that study the performance of geospatial RDF stores but no widely accepted benchmark exists.

The work described in [15] has preceded the GeoSPARQL and stSPARQL proposals, therefore it does not cover many of the features available in these languages. Only point and rectangle geometries and only few topological and non-topological functions are included in its workload. Similarly, only the geospatial RDF store SPAUK [18], which is a precursor to Parliament, has been evaluated using this benchmark. In [15] uses a synthetic workload only and does not consider real linked geospatial datasets such as the ones that are available in the LOD cloud today.

In [2], authors present the geospatial RDF store Strabon^{Footnote 2} and they include a section that is an evaluation targeted mostly to Strabon than a general evaluation benchmark. Both a real-world workload and a synthetic one is used in [2]. The synthetic workload uses only point geometries and spatial selection queries, but it allows the study of performance in a controlled environment. In [16] presents a benchmark based on [15] and adapted to the technological advances at the time. In [16] evaluates several geospatial RDF stores taking into account the expressive power of GeoSPARQL and using real data from OpenStreetMap (OSM)^{Footnote 3} of various geometry types (points, lines, polygons). Its workload covers the primary query types covered in [15] (spatial location queries, spatial range queries, spatial join queries, and nearest-neighbor queries) and additional query types, such as queries using non-topological spatial functions, and negation and aggregation queries that use spatial filters.

Finally, the previous version of our benchmark [17], named Geographica^{Footnote 4} was a comprehensive proposal at the time and has been used to evaluate RDF stores supporting GeoSPARQL and stSPARQL. It comprises two workloads with their associated datasets and queries: a real-world workload and a synthetic workload. The real-world workload uses publicly available linked geospatial data, covering a wide range of geometry types (e.g., points, lines, polygons). This workload, follows the approach of the benchmark Jackpine [13] and defines a micro-benchmark and a macro-benchmark. The micro-benchmark tests primitive spatial functions. The spatial component of a system is tested with queries that use non-topological functions, spatial selections, spatial joins and spatial aggregate functions. The macro-benchmark tests performance of selected RDF stores in typical application scenarios like reverse geocoding, map search and browsing and a real-world use case from the Earth Observation (EO) domain. For the synthetic workload of Geographica, a generator was developed that produces synthetic datasets of various sizes and generates queries of varying spatial and non-spatial selectivity. In this way, performance of geospatial RDF stores can be studied in a closely controlled environment. This workload follows the rationale of earlier papers [2, 12, 19]. For reasons of reproducibility, both workloads are publicly available on the web site^{Footnote 5} of the benchmark.

The present article revisits [17] and offers the following contributions:

We present a new version of Geographica, called Geographica 2, which contains the following extensions. We extended the macro-part of the real-world workload of Geographica [17] by adding two more application scenarios: the geocoding scenario and a scenario that involves the computation of statistics for geospatial datasets. Geocoding is tested against a new dataset Census with detailed information on New York’s street addresses.
The second important addition to the original benchmark is the scalability workload, which aims at discovering the limits of centralized geospatial RDF stores of various architectures. Six increasingly bigger and well-balanced datasets have been constructed by combining OSM and CORINE Land Cover datasets with highly complex geometries covering many European countries. The queryset comprises a spatial selection and two spatial joins of different selectivity. We study the behavior of three stores (Strabon, GraphDB and RDF4J) in three key areas, storage space, bulk loading and query response time, all with respect to the number of triples of the dataset. These metrics help discover the positive and negative aspects of each system and can assist future research in the areas of large data storage, indexing strategies and query processing.
We include in our evaluation a qualitative comparison of geospatial RDF stores in order to stress the differences between them in terms of supported geospatial features and functionality.
We also include in our evaluation a variant of the main benchmark which compares two systems of limited geospatial functionality along with the six systems that qualified for the main benchmark. OpenLink Virtuoso does not have substantial support of GeoSPARQL^{Footnote 6} and another proprietary RDF store, called here System Y implements point-only functionality. We had not included these systems in experiments presented in our previous work [17], because our focus at the time was only on systems that exhibited a high level of GeoSPARQL compliance. We believe that this comparison will help shed light on the performance trade-offs of spatial indexing methods employed by these RDF stores. For the purpose of this special benchmark, we used a point-only subset of the real-world and synthetic workloads of Geographica 2.
In [17], we chose to test three well-known open-source RDF stores that provide GeoSPARQL functionality, namely Strabon, Parliament and uSeekM. In this benchmark, we also include a geospatial RDF solution offered as option of one of the leading proprietary RDBMSs, called here System X^{Footnote 7}, the free edition of the GraphDB v8.6.1 NoSQL graph database and the RDF4J v2.4.3 Semantic Framework. To the best of our knowledge, these six systems are the only ones that currently provide support for a rich subset of GeoSPARQL and stSPARQL, so we did not include any other system in the main part of Geographica.
The runtime of Geographica [17] supported systems that are compliant with Sesame API. Geographica 2 provides an additional runtime which allows easy integration of systems that are compliant with the RDF4J API.

The rest of the paper is organized as follows. Section 2 presents the main data models and query languages for linked geospatial data. Section 3 presents previous related work. Section 4 presents well-known geospatial RDF stores and compares them in terms of geospatial functionality that they offer. The benchmark is described in Sect. 5 and its results are discussed in Sect. 6. Section 7 discusses the performance of generic RDF stores with limited geospatial capabilities in comparison with geospatial RDF stores providing full geospatial capabilities. Finally, the contributions of the paper are summarized and future work is discussed in Sect. 8.

2 Background

In this section, we introduce GeoSPARQL and stRDF/stSPARQL. GeoSPARQL allows the representation of geographic data in RDF and querying it using an extension of SPARQL. stRDF is an extension of RDF that allows the representation of geospatial linked data that evolves over time. stSPARQL is an extension of SPARQL that permits querying stRDF data taking into account its spatial and temporal dimension.

2.1 GeoSPARQL

GeoSPARQL is a standard, developed by the OGC, that defines a core RDF/OWL vocabulary and a set of SPARQL extension functions for representing and querying linked geospatial data. GeoSPARQL follows a modular architecture, shown in Fig. 1, that defines six conformance classes. Each implementation may support one or more conformance classes.

The Core conformance class defines a basic RDFS/OWL vocabulary for representing geospatial data. This vocabulary includes the class SpatialObject and its subclasses Feature and Geometry. Features can have geometries and geometries can be encoded by the OGC standards WKT and GML. The Topology Vocabulary Extension defines a vocabulary for asserting topological relations between spatial objects. This conformance class is parameterized so that an implementation can use any of the well-known topological relation families: RCC8 [20], Egenhofer [21], and OGC SFA. The Geometry Extension conformance class defines a vocabulary for asserting information about geometry data and query functions operating on geometry data. This class defines the appropriate RDFS datatypes for asserting geometry data as literal values. A geometry literal can be encoded in WKT or in GML; this is defined by a parameter of the conformance class. The Geometry Extension conformance class also defines non-topological functions that operate on geometry data and return geometry or numeric data (e.g., the distance between two geometries). The Geometry Topology Extension conformance class defines topological query functions that operate between two geometry literals and return if a topological relation holds between their corresponding geometries. According to parameters of the Geometry Topology Extension, GeoSPARQL implementations can support any of the geometry serializations (WKT, GML) and any of the aforementioned topology relation families (RCC8, Egenhofer, OGC SFA). The RDFS Entailment Extension conformance class defines a mechanism for matching implicitly derived RDF triples in GeoSPARQL queries. Finally, the Query Rewrite Extension conformance class defines rules to support implication of direct topological predicates between features based on the geometries of these features. This is achieved by a set of RIF rules that expand direct topological predicates (from Topology vocabulary) into a series of triple patterns and an invocation of the corresponding extension function (from Geometry Topology vocabulary). For example, a RIF rule asserts that if the function geof:sfIntersects holds between two geometry literals then the topological relation geo:sfIntersects holds among the corresponding features. Using these rules, queries that contain a topological relation between two variables standing for features (e.g., ?x geo:sfIntersects ?y) can be rewritten into queries that contain topological functions standing for two literals (e.g., geof:sfIntersects(?f1, ?f2)).

2.2 stRDF and stSPARQL

stRDF and stSPARQL are extensions of RDF and SPARQL that allow the representation and querying of linked spatiotemporal data. stSPARQL has been developed by our group at the same time as GeoSPARQL, and has resulted in a similar representation model. It follows the categorization for feature characteristics, proposed in Perry’s PhD thesis [22]: spatial, temporal and the non-spatiotemporal attributes which are named thematic. Similar categorization appears in GIS-related papers such as [23]. stSPARQL, like GeoSPARQL, defines two datatypes (strdf:WKT, strdf:GML) for encoding geometry literals and a set of functions that correspond to the functions of the Geometry extension and the Geometry Topology extension of GeoSPARQL. In addition to these functions, stSPARQL defines directional relation functions that are based on the minimum bounding boxes of two geometries (e.g., if a geometry is strictly on the left of another geometry) and spatial aggregate functions that operate on sets of geometries and compute new geometry objects (e.g., the union of a set of geometries). Note that both GeoSPARQL and stSPARQL include functions that compute the union of two given geometry literals, stSPARQL additionally includes a function computes the union of a given set of geometry literals.

In addition to its geospatial features, stRDF has a temporal component which can represent the valid time of a triple and stSPARQL defines a set of temporal functions for querying the valid time of triples. The temporal component of stRDF and stSPARQL are described in [3] and it will not be considered in the rest of the paper.

2.3 Selection of features to test

In the Geographica 2 benchmark, an effort has been made to test against a fusion of non-redundant features from both geospatial extensions of SPARQL, while taking into consideration some common design choices of the majority of the systems under test.

The developers of several RDF stores, when faced with optional requirements or multiple equivalent alternatives of the GeoSPARQL standard, they initially settle for a minimal set of features that provide adequate spatial functionality. Such dilemmas are, the support for multiple CRSs in the WKT serialization, or the three topological relation families (OGC SFA, Egenhofer, RCC8) for spatial relations and functions. It is common ground that several RDF stores implement only the WKT serialization of geospatial data with support just for the default CRS84 and realize only the functions of the Simple Features topological relation family. Anyway, the GeoSPARQL standard also provides an equivalence matrix between the three topological relation families, which allows an implementer to consider using only one of them, without some loss of expressivity or functionality.

With this in mind and since GeoSPARQL is the widely accepted standard, the benchmark experiments focus on features included in the Core, Geometry Extension and Geometry Topology Extension conformance classes, with the addition of the very useful aggregate functions offered by stSPARQL. Datasets and querysets are using the WKT serialization with the default CRS84 and only the Simple Features set of functions is tested. The Topology Vocabulary and Query Rewrite conformance classes are rarely implemented by the majority of the systems and are not considered in this version of the benchmark. The RDFS Entailment Extension albeit supported by a number of systems, is intentionally not tested since: (i) it is not the primary focus of this work and (ii) it would place additional computing resource requirements to the very demanding task of querying large geospatial data sources.

3 Related Work

This section discusses the most important benchmarks that are relevant to Geographica. First benchmarks for SPARQL query processing are presented, followed by those from the geospatial relational databases area and, finally, we concluded with benchmarks for querying linked geospatial data.

3.1 Benchmarks for SPARQL Query Processing

A well-known benchmark for Semantic Web knowledge base systems is the Lehigh University Benchmark (LUBM) [24]. It tests scalability, efficiency and reasoning capabilities of memory-based systems and systems with persistent storage. Concerning reasoning capabilities, three degrees are tested: (i) RDFS reasoning, (ii) partial OWL reasoning, and (iii) complete or almost complete OWL Lite reasoning. The authors propose a benchmark with fourteen queries over a large dataset that commits to an ontology describing the university domain. This is one of the first benchmarks for SPARQL query processing and its design is based on techniques applied to older database benchmarks. For example, its data are synthetically generated so that the data size can be arbitrarily large and the selectivity and output size of each query can be predefined. Finally, LUBM uses a set of predefined performance metrics, namely load time, repository size, and query response time and it also suggests two new metrics about completeness and soundness of the query evaluations.

EvoGen [25], a LUBM derivative, is a synthetic benchmark for evolving RDF. It includes configurable schema evolution, change logging and representation between versions, as well as query workload generation functionality. EvoGen extends LUBM’s ontology with 10 new classes and 19 new properties and adds queries commonly performed in evolving settings, such as temporal querying, queries on changes and longitudinal queries across versions. The implemented change logging mechanism produces logs of the changes between consecutive versions following the representational schema of the change ontology.

The SPARQL performance benchmark (\(\mathrm {SP^{2}Bench}\)) [26] is an RDF benchmark directed toward a comprehensive performance evaluation of RDF stores. The authors of this benchmark cover a wide spectrum of SPARQL features. They define queries with various SPARQL operators (e.g., UNION, OPTIONAL, FILTER) and solution modifiers (e.g., DISTINCT, ORDER BY, LIMIT) and they also test negation as failure queries. Queries are grouped in two categories: (i) long path chains and (ii) bushy patterns and they are designed so they are amenable to SPARQL optimization techniques (e.g., triple reordering, FILTER pushing). \(\mathrm {SP^{2}Bench}\) defines a data generator that produces datasets resembling the DBLP dataset.

Another SPARQL query processing benchmark is the Berlin SPARQL Benchmark (BSBM) [27]. This benchmark compares the performance of native RDF stores with the performance of SPARQL-to-SQL rewriters. BSBM uses synthetic data that describes an e-commerce use case. Different vendors offer products and reviews have been posted about these products by consumers. Unlike the systematic approach of \(\mathrm {SP^{2}Bench}\), the Berlin SPARQL Benchmark uses an application-based query mix that emulates the search and navigation pattern of a consumer looking for a product. Thus, the query mix covers an adequate range of SPARQL features but not all of them. Since BSBM is application-oriented, it uses metrics defined for application scenarios and not for single queries, such as query mixes per hour (QMpH) and queries per second (QpS).

The DBpedia SPARQL benchmark (DBPSB) [28] follows a different approach and proposes a generic SPARQL benchmark creation procedure which is based on real application data and query logs. DBPSB proposes a technique to create data of arbitrary size that resembles real data. This technique enables increasing or decreasing the size of a real RDF dataset so that generated data retains the basic network characteristics (in and out degree) and other characteristics, such as the number of classes and properties of the original data. Also, DBPSB proposes a query analysis technique to extract representative queries of a set of real queries. The techniques of DBPSB were applied in the use case of DBpedia^{Footnote 8} but they can be applied to any dataset and query log to produce a use case-specific benchmark.

The Waterloo SPARQL Diversity Test Suite (WatDiv) [29] provides stress testing tools for RDF systems that face diverse queries and varied workloads. It defines two classes of query features based on which it discusses the variability of the datasets and workloads in a SPARQL benchmark: (i) structural features such as triple pattern count, join vertex count, degree and type and (ii) data-driven features such as result cardinality, filtered triple pattern (f-TP) selectivity, basic graph patterns (BGPs) restricted f-TP selectivity and join-restricted f-TP selectivity. The second part of [29] includes an experimental evaluation of other SPARQL benchmarks with emphasis on identifying test cases that are not handled by these benchmarks. The last part of [29] is an experimental evaluation of five RDF stores of different architectures using WatDiv, demonstrating that none of the systems is sufficiently robust across a diverse set of queries.

The Social Network Benchmark (SNB) [30] is the first benchmark issued by the Linked Data Benchmark Council (LDBC),^{Footnote 9} an independent EU sponsored authority responsible for specifying benchmarks, benchmarking procedures, verifying and publishing benchmark results. SNB targets all types of graph database systems, such as property graphs and RDF stores. Its core component is a synthetic dataset modeling a Facebook-like social network, consisting of persons, their friendship connections and messages posted in forums. It also comprises three different workloads, which all use the common dataset, basically making available three separate SNB benchmarks: SNB-Interactive, SNB-BI and SNB-Algorithms. LDBC has released a draft version of SNB-Interactive^{Footnote 10} and a first stab of the SNB-BI formulated in SPARQL which is tested against OpenLink Virtuoso. The SNB-Algorithms workload is not available yet. The SNB-Interactive queryset is defined in plain text, but example implementations exist in Cypher, SPARQL and SQL. It is an OLTP-like workload which measures a system’s throughput using a mixed set of simple and complex queries along with concurrent updates. Key contributions of SNB-Interactive are: (i) the DATAGEN synthetic graph generator produces scalable datasets, through a scaling factor, which at the same time are more realistic than previous generators, by employing skewed value distributions, and exploiting plausible correlations between property values and graph structures, (ii) the expert and user community-driven choke-point workload design which helps identify important technical challenges for query optimization, (iii) the query driver manages to generate a highly parallel workload to achieve high throughput, on a difficult to partition dataset with a complex structure of connected components, (iv) the introduction of the parameter curation benchmarking concept, which basically involves using data mining techniques during data generation to find good query substitution parameters with equivalent behavior.

LDBC also issued the first draft of their Semantic Publishing Benchmark (SPB v2.0),^{Footnote 11} an RDF focused benchmark inspired by the Media/Publishing industry. More specifically, British Broadcasting Corporation (BBC) helped define this benchmark and also contributed with workloads, ontologies and data. Currently, the coordinator and main contributor for SPB is Ontotext^{Footnote 12} company, developer of the well-known GraphDB semantic store, and the Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH).^{Footnote 13} The benchmark considers large volume of streaming content and assumes that an RDF database is used to store both the reference knowledge and related metadata. The main operations on the repository are: (i) updates, that add new metadata or alter the repository, and (ii) aggregation queries, that retrieve content according to various criteria. The benchmark is very much based upon the BBC use-case and places a strict requirement that the engine should handle instantly large number of updates in parallel with massive amount of aggregation queries. To properly execute SPB v2.0, an RDF store has to also satisfy the following requirements: support for storage of RDF data, support for RDF named graphs, loading of data in one of the standard RDF serializations (Turtle for ontologies, N-Quads for data), support for standards SPARQL 1.1 Query/Update/Protocol, RDFS v1.0 and the RL profile of Web Ontology Language (OWL2). SPB provides a data generator which produces scalable in size synthetic data based on ontologies, reference and previously generated data. Following the logic of SNB, SPB identifies eleven choke points (CP1–CP11), such as CP1:join ordering and CP8:geo-spatial predicates, which, respectively, test the ability of RDF store engines to decide which type of join to be used based on cardinality constraints and the ability to handle queries about entities within a geospatial range. The two types of workloads offered are: basic and advanced. The basic workload consists of 9 queries of types: search, aggregate, geospatial, full-text search, time-range queries and the advanced workload adds: analytical, drill-down and faceted search queries, a total of 25 queries. Each query definition includes the list of choke points tested by it.

gMark [31] is a benchmark that concentrates on schema-driven generation of graphs and queries. It is considered [32] the first domain-independent and query language-independent synthetic graph benchmarking tool. Its design aims to cover features and capabilities commonly found in graph query processing, graph analytics, and schema validation. The authors identify the following three problem areas: (i) there is no community consensus on schema formalisms for graph data, (ii) the construction of synthetic workloads with given selectivity and desired structural features is a very difficult problem and (iii) the approaches, used by WatDiv and LDBC, to perform selectivity estimation based on generated graph instances, does not scale for massive graphs and queryworkloads. In gMark, graph instance generation is leveraged through an optional graph configuration, which includes the enumeration of predicates (i.e., edge labels) and node types (i.e., node labels) occurring in the data, along with their properties in generated instances. For query workload generation with desired behavior, gMark does not rely on graph instances to generate them. Furthermore, the query selectivity of the workload can be specified. gMark is independent from particular query language syntaxes or systems and supports various practical output formats for the graphs and for the queries, including N-triples for data, and SQL, SPARQL, Datalog and openCypher as concrete query language syntaxes for query workloads. In the last part of [31], an experimental comparison of selected state-of-the-art graph query engines is performed, which reveals some limitations of current graph query processing engines such as recursive query processing.

3.2 Benchmarks for Geospatial Relational Databases

One of the first benchmarks for spatial databases is the SEQUOIA benchmark [9] which focuses on Earth Science use cases and which has been used for testing many Geographic Information Systems (GIS). SEQUOIA uses real data (satellite raster data, point locations of geographic features, land use/land cover polygons and data about drainage networks covering the area of USA) and real queries. It also considers different scales of datasets and use cases (e.g., local or national scale). The SEQUOIA benchmark is formed by 11 queries trying to cover the most usual tasks in Earth Science, like (i) data loading and building of respective indexes, (ii) raster data management, (iii) selections based on spatial and non-spatial filters, (iv) spatial joins, (v) and a recursive spatial query.

SEQUOIA was later extended by [10] to evaluate the geospatial DBMS Paradise. In [10], DeWitt et al. study traditional database techniques and how these techniques can be used (or extended to be used) in geospatial query processing. SEQUOIA takes into account only points and polygons, while [10] also tests polylines and circles and broadens the tested functionality (e.g., it tests spatial aggregate functions). Finally, a methodology, called resolution scaleup, is applied to scale up geospatial data. This technique simulates the zoom-in operation of map applications. Existing spatial features are represented in more detail by adding more points to their boundaries, and at the same time new smaller spatial features appear around the existing ones.

Rather than focusing on evaluating performance of systems, the Á La Carte [11] benchmark compares performance of spatial join techniques. In particular, [11] tests the following algorithms: nested loops, scan and index, and synchronized tree traversal. A data generator is presented that generates rectangles with edges parallel to the axes. The Á La Carte generator enables data of arbitrary size that can follow various statistical distributions (uniform, normal and exponential). This allows for the generation of realistic data in terms of spatial distribution. However, the fact that generated rectangles have edges parallel to the axes does not allow testing the full process of spatial evaluation in a DBMS. The usual workflow of a spatial evaluation is composed of two steps. The first step utilizes a spatial index, which is built according to the minimum bounding boxes of geometries, to find candidate results. This step is called filtering step. The second step, which is called refinement step, tests the actual geometries and discards false positives generated by the filtering step. Using rectangles with edges parallel to the axes means that their actual geometries are identical to their minimum bounding boxes. So, the exact answer is already found by the filtering step that does not generate any false positive.

In order to generate data and conduct experiments, Á La Carte defines some statistical models for the generator that resembles typical cartographic applications. These models are the following: (i) “Biotopes” simulates a biotope map where there are few large rectangles uniformly distributed that may overlap but not to a large degree, (ii) “Cities” simulates the distribution of cities and it is composed of many small rectangles (modeled as squares) uniformly distributed around the map, finally, (iii) a hybrid model is defined that resembles a word map. This model comprises two nested submodels. First, a “Biotopes” model creates the continents of the world and inside each continent there are rectangles modeled by the “Cities” model.

A more complex data generator is used in VESPA [12] to compare PostgreSQL with the Rock & Roll deductive object-oriented database system. The data generator of VESPA produces spatial features that resemble real maps. The spatial features that are produced by the VESPA generator represent land ownership, states, land use, roads, streams, gas lines and points of interest. They are uniformly distributed, in contrast to spatial features generated by the Á La Carte generator, but they are more complex than simple rectangles. The dataset consists not only of polygons but also of lines and points. The produced polygons are hexagons and triangles, so their edges are not parallel to the axes and both filtering and refinement step of spatial joins can be tested. Apart from spatial selection and spatial analysis queries VESPA also tests updates and spatial aggregate queries which are not tested by previous benchmarks.

Finally, a more generic benchmark is Jackpine [13]. Jackpine defines two kinds of benchmarking, micro and macro. Micro-benchmarking tests spatial functions in isolation, in order to evaluate the performance of systems in evaluating spatial selection, spatial join, and spatial analysis queries. Macro-benchmarking defines real application scenarios as series of queries and tests the performance of systems for evaluating the entire series of queries for each scenario. Tested scenarios range from simple ones, like geocoding and reverse geocoding, or more complex scenarios like flood risk and toxic spill analysis.

3.3 Benchmarks for Geospatial RDF Stores

The first published benchmark for querying geospatial data encoded in RDF has been proposed in [15]. In [15] extends LUBM to include spatial entities and test the performance of spatially enabled RDF stores. The data generator of LUBM is extended so that each university, department or student gets a spatial extent (rectangle or point). LUBM queries are extended to cover four primary types of spatial queries, namely spatial location queries, spatial range queries, spatial join queries, and nearest-neighbor queries. Range queries aim to test cases of various selectivity, while spatial joins aim to test whether the query planner selects a good plan by taking into account the selectivity of the spatial and ontological part of each query.

Another evaluation of geospatial RDF stores has been done in [2]. In the context of presenting the geospatial RDF store Strabon, experiments studying its performance were conducted. Strabon is compared with Parliament [8] and an implementation on top of RDF-3X [19] that supports spatial queries. In this evaluation, more emphasis is given to study Strabon itself rather than creating a benchmark for various RDF stores. This is why different variations of Strabon are studied in order to demonstrate advantages and disadvantages of different implementation choices. Two workloads are used: one based on real-world linked data and a synthetic one. The first workload consists of eight real-world queries that are either frequently used in Semantic Web applications (e.g., DBpedia and LinkedGeoData endpoints) or they demonstrate the spatial extensions of Strabon. This workload contains thematic queries as well as spatial selection and spatial join queries and queries using non-topological spatial functions. The second workload uses a modified version of the data generator of [19] to generate spatial datasets with arbitrary size and predefined characteristics. The data generator produces spatial data with point geometries and only spatial query selections are studied.

Patroumpas et al. [16] have reviewed the state of the art in managing geospatial data in the Semantic Web. In [16] starts by presenting basic concepts and standards (e.g., GeoSPARQL) about geospatial data in the Semantic Web, then it presents the current state of the art geospatial RDF stores and a qualitative comparison between them. Finally, [16] presents and performs an evaluation of the performance of the geospatial RDF stores. For this evaluation, [16] uses data from OSM and it follows the guidelines of [15] to define a query workload. This workload consists of basic queries that cover the four primary types of spatial queries that have been suggested in [15] and geospatial analysis queries that cover query types not studied in [15]. These are queries that use non-topological spatial functions, combine thematic and spatial criteria, and aggregate and negation queries that use spatial filters.

Bellini and Nesi [33] evaluate the needs and constraints for RDF stores to be used for smart cities services. Several well-known systems, including Virtuoso, GraphDB, Oracle, and Stardog, are assessed for semantically enabled services. The identified smart city requirements for RDF stores include: (i) spatial indexing, like information near a given geographical point, elements along a cycle path or inside a given polygonal area, (ii) high spatial querying performance, (iii) support for quads (named graph), to enable tracking the data source, with metadata and associated licenses, (iv) SPARQL version 1.0 or 1.1, and (v) actively maintained source base. The included benchmark found evidence of partial support of spatial operations in the majority of the RDF stores, and verified that only few of them support GeoSPARQL adequately.

3.4 Benchmarks distilled

At this point, it is useful to contemplate about the common requirements, techniques and metrics that got inherited and evolved in the reviewed benchmarks. This will allow us to more clearly envision which features a contemporary, comprehensive geospatial semantic benchmark should comprise.

Metrics. The metrics used in the most comprehensive or generic benchmarks include: (i) load time, (ii) repository size, (iii) cold and warm cache query response time for OLTP applications and (iv) query throughput for real-world use cases (OLAP applications).

Datasets. Synthetic datasets and generators are used in all cases because they allow the creation of data of: (i) arbitrary size and (ii) with the desired structural features (complexity) which in turn enables tests of controlled selectivity. The attempts, though, to create real world-like synthetic data by using various elaborately engineered distributions at the end cannot adequately substitute real-world datasets even when accompanied by real-world inspired use cases. Real world datasets are indispensable as they provide real data complexity, as in many thousands vertices polygons and line strings instead of rectangles, hexagons or simple sloped lines and points. At the same time, these datasets give the opportunity of running real application scenarios for demanding applications. Furthermore, by combining and interlinking real-world datasets, with the same spatial extent, we can increase even more the opportunities for hosting many more different application use cases. Another real-world requirement for some of the datasets under consideration is that they should contain quads (triples with a named graph context), an aspect overlooked by several benchmarks. Quads further complicate benchmarking activities as it involves an extra context index, which increases load time and repository size, but may favor query execution times for appropriately formulated queries.

Querysets. The categories of spatial queries used, include: (i) selections, (ii) joins, (iii) aggregates, (iv) nearest neighbor, (v) distance. The cold or warm cache response time of selections, joins and distance queries can be studied in detail in controlled settings either with synthetic datasets or within the context of a Micro-benchmark with small to medium real-world datasets. On the other hand, the addition of nearest-neighbor and aggregate queries comes natural to a Macro-benchmark setting where we are primarily interested on the query throughput for a complex use case scenario against real-world datasets. For datasets of quads, querysets must be designed that can take advantage of the named graph context.

Geographica 2 goes beyond the previously reviewed benchmarks [15,16,17, 33] as it is the first one that takes advantage of the majority of the lessons learned by its predecessors. It makes use of the most effective geospatial benchmarking techniques for dataset and queryset creation and uses appropriate metrics for evaluating geospatial RDF stores. It contains a real-world workload that uses publicly available linked geospatial data, covering a wide range of geometry types (e.g., points, lines, polygons), some of which are highly complex. Each dataset is loaded in a different named graph and therefore allows for named graph query patterns. The real-world workload follows the approach of the benchmark Jackpine [13] and defines both a micro- and a macro-benchmark. The micro-benchmark tests primitive spatial functions. The spatial component of a system is tested with queries that use non-topological functions, distance, spatial selections, spatial joins and spatial aggregate functions. The macro-benchmark tests performance of selected RDF stores in typical application scenarios like geocoding, reverse geocoding, map search and browsing and a real-world use case from the EO domain. In these scenarios, a mix of spatial nearest neighbor, spatial selections and spatial joins are tested over multiple named graphs. The benchmark also features a synthetic workload, which is based primarily on VESPA [12] and [2, 19]. It uses a generator that produces synthetic datasets of various sizes and generates queries of varying thematic and spatial selectivity. In this way, performance of geospatial RDF stores can be studied in a closely controlled environment.

4 A Functional Comparison of Geospatial RDF Stores

This section presents all of the RDF stores known to the authors that implement some geospatial functionality, and compares them in terms of the geospatial functionality that they offer.

Although GeoSPARQL is an OGC standard since 2012, it is not fully supported by any geospatial RDF store. Usually systems do not implement the Query-Rewrite Extension Also there are some RDF stores that provide geospatial capabilities which are limited to point geometries.

A common problem area is CRS support. A coordinate reference system (CRS) also referred to as spatial reference system (SRS) is a coordinate system related to an object (e.g., the Earth) through a datum which specifies its origin, scale, and orientation. OGC is among the authorities that maintain partial or non-fully compatible lists of CRSs and it provides a set of CRS URIs.^{Footnote 14} Another related organization is the International Association of Oil and Gas Producers (IOGP)^{Footnote 15} which after the absorption of the European Petroleum Survey Group (EPSG) maintains the EPSG online registry of geodetic parameters.^{Footnote 16}

We organize our comparison according to the GeoSPARQL standard. We indicate which extensions of GeoSPARQL are supported by each RDF store, which spatial relation classes and geometry serialization formats are implemented, and whether multiple CRSs are supported. We have also included a selection of available geospatial extensions which are not part of GeoSPARQL, such as the use of geometry literals as objects in triple patterns, the spatial aggregate functions defined by stSPARQL [2] and three main spatial query classes that are used for querying points. A tabular view of this comparison can be found in Table 1. The rest of the section essentially explains the contents of Table 1 by discussing in detail the functionality of each system.

Table 1 Functionality of geospatial RDF stores

Evaluating Geospatial RDF Stores Using the Benchmark Geographica 2

Abstract

Similar content being viewed by others

Geographica: A Benchmark for Geospatial RDF Stores (Long Version)

Ontop of Geospatial Databases

Strabo 2: Distributed Management of Massive Geospatial RDF Datasets

Explore related subjects

1 Introduction

2 Background

2.1 GeoSPARQL

2.2 stRDF and stSPARQL

2.3 Selection of features to test

3 Related Work

3.1 Benchmarks for SPARQL Query Processing

3.2 Benchmarks for Geospatial Relational Databases

3.3 Benchmarks for Geospatial RDF Stores

3.4 Benchmarks distilled

4 A Functional Comparison of Geospatial RDF Stores

4.1 Geospatial RDF Stores that Conform to the GeoSPARQL Standard

4.2 RDF Stores with Limited Geospatial Capabilities

5 The Benchmark Geographica 2

5.1 Real-World Workload

5.1.1 Datasets

5.1.2 Micro-benchmark

5.1.3 Macro-benchmark

5.2 Synthetic Workload

5.2.1 Datasets

5.2.2 Queries

5.3 Scalability Workload

5.3.1 Datasets

5.3.2 Queries

5.3.3 Systems

6 Benchmark Results

6.1 Experimental Setup

6.2 Real-World Workload

6.2.1 Dataset Storage

6.2.2 Micro-benchmark

6.2.3 Macro-benchmark

6.3 Synthetic Workload

6.3.1 Dataset Storage

6.3.2 Queries

6.4 Scalability Workload

6.4.1 Dataset Storage

6.4.2 Queries

7 Evaluating the Performance of RDF Stores with Limited Geospatial Capabilities

7.1 Real-World Workload

7.1.1 Dataset Storage

7.1.2 Queries

7.2 Synthetic Workload

7.2.1 Dataset Storage

7.2.2 Queries

7.3 Summary

8 Summary and Future Work

8.1 Summary

8.2 Limitations and Future Work

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation