Keywords

1 Introduction

To enforce access control on a database with limited or even no access control features, like the community version of Neo4j, we have various options. The approach we chose is rewriting insecure queries before they are handed over to the database as secure queries [1], including authorization-specific filters. Our approach even supports attribute-based access control (ABAC) [9] by operating on data stored in the database. We already motivated this approach in our previous work [6]. However, our current work does not rely on a predefined query structure, but can handle freely formulated queries.

Authorization requirements are expressed in terms of rules in the authorization policy, referencing filter templates to be used in the query rewriting. A filter template defines authorization-specific constraints to be added to the insecure query. Graph database query languages like Cypher distinguish between nodes and relationships. For this work, we refer to both using the term element. To rewrite the insecure query applying authorization-specific constraints, we consider the following research questions:

  • RQ1 Which elements of a query influence the result?

  • RQ2 What information must be provided in the authorization policy?

  • RQ3 How can we find mappings between a policy and a query?

  • RQ4 How do we apply access control filter templates on queries?

Our contributions are (1) identifying the influencing elements and how they impact the query result in Sect. 3, (2) a policy model influenced by XACML as a policy having a set of rules with conditions and references to filter templates in Sect. 4, (3) a query rewriting approach to extend the insecure query with filters encoding authorization requirements in Sect. 5, and (4) a proof-of-concept prototypeFootnote 1 using Cypher and a preliminary evaluation in Sect. 6. Related works and a summary including an outlook on future work are provided in Sects. 2 and 7 respectively.

2 Related Work

The idea of protecting data by query rewriting is influenced by Browder et. al. and their work about per-user views in Oracle databases [3]. Another influence comes from Bogaerts et. al. [2] as they propose entity-based access control, taking not only attributes but also the relations between entities into account. While their focus is on relational databases, we primarily consider graph databases and thus attributes on nodes and edges. The dynamic rewriting approach was already proposed by Jarman et. al. [7], however, on relational databases and role-based access control. Our policy model is highly influenced by XACML (Ramli et. al. [10]), although we reduced the features to a subset suitable for our requirements. Colombo et. al. proposed an approach similar to ours in [4], as they generate authorized views to replace the original collection in the query. However, their focus is on document-oriented stores with focus on IoT data analysis. Access control by query rewriting for RDF and SPARQL was also proposed by Kirrane in [8]. A slightly different approach is presented by Shay et. al. [11] which checks queries against a policy and blocks them altogether if necessary. The current work is also based on our previous work, especially [5] for query parsing and modification and [9] for XACML policies for graphs.

3 Relevant Information in the Insecure Query

To answer RQ1, we start with checking the elements of the query pattern and identify the relevant elements influencing the query result. The pattern for example in the Cypher query is . While a confidential node in the RETURN clause of a query clearly reveals information, other cases are less obvious. For example, we have a graph database with information about students and their grades. A node stores all student data and links to a node with the student’s grade for a certain exam. To protect grades, we block returning the grade-nodes. However, a malicious user could return a student’s node and include a WHERE clause filtering only for a name (which is not confidential) and a specific grade. By only returning the data-node, no access violation is detected, but it implicitly confirms the guessed grade. Therefore, an element which has a filter applied might still lead to information leaks although it is not directly returned. On the other hand aggregating functions (e.g. average) prevent access to individual elements (e.g., a student’s grade). Thus, we check the combination of filter and return status for each element (see example in Table 1). The filter status is (1) filtered or (2) unfiltered and the return status is (1) aggregated, (2) direct value or (3) not included in the return clause.

Table 1. Influencing factors in .

4 Policy Model

The purpose of the policy is to specify all authorization-relevant information. A policy P describes a pattern of elements E (i.e., nodes and relationships) and a set of rules R. The function \(\varPhi (e_{policy}, e_{query}) \rightarrow \{true, false\}\) decides whether an element of the policy pattern (\(e_{policy}\)) can be mapped to an element of the insecure query pattern (\(e_{query}\)). Let one policy be:

$$\begin{aligned} P =&(E, R, \varPhi ) \\ E =&\langle e_1, e_2, ... , e_n \rangle \\ R =&(e, C, f) \\ C =&\{c_1, c_2, ... , c_n\} \end{aligned}$$

Each rule \(r \in R\) references a single element of the policy pattern (\(e \in E\)) and specifies one or more boolean combined conditions C on the pattern elements and references a filter template f to be applied to e. A condition \(c \in C\) checks whether filter and return properties (cp. Sect. 3) are satisfied by any element of the policy pattern \(e' \in E\).

Filter templates F are used to exclude unauthorized results in the secure query. They define authorization-relevant constraints to be added to the insecure query. We define a filter template f with placeholders for runtime-specific information as follows:

$$\begin{aligned} F =&\{f_1, f_2, ... , f_n\} \\ f =&(t, A) \\ A =&\langle a_1, a_2, ..., a_n \rangle \end{aligned}$$

Every filter template \(f \in F\) includes a query fragment t containing placeholders. For each placeholder in t, its kind a (e.g., ruleElement or username) is given in A. The kind ruleElement indicates that the placeholder stands for the element in the rule which references this filter template.

5 Query Processing

For a policy and its rules to be applicable, each element defined in the policy \(e_{policy}\) is mapped to an equivalent one in the query \(e_{query}\) based on its labels and pattern structure. To find a mapping (cp. RQ3), we define a function getPaths returning a set of paths from the pattern. Each path consists of a start node, a relationship and an end node \((e_{start}, e_{relationship}, e_{end})\). The relationship and end node can be empty if the start node is isolated.

$$\begin{aligned}\begin{gathered} getPaths(E) \rightarrow E^{\star } \\ E^{\star } = \{ (e_{start}, e_{relationship}, e_{end}), ...\} \end{gathered}\end{aligned}$$

This step converts the patterns of policy and query into a common and comparable structure. We search for mappings using the function \(map(e_{policy}) \rightarrow e_{query}\):

$$\begin{aligned}\begin{gathered} map(e_{policy}) \rightarrow e_{query} \Leftrightarrow \forall \, (a, b, c) \in E_{policy}^{\star } \, \exists \, (x, y, z) \in E_{query}^{\star }: \\ \varPhi (a, x) \wedge \varPhi (b, y) \wedge \varPhi (c, z) \wedge (a = e_{policy} \wedge x = e_{query}) \wedge \\ ((b = \emptyset \wedge c = \emptyset ) \vee (b = e_{policy} \wedge y = e_{query} \wedge c = e_{policy} \wedge z = e_{query})) \end{gathered}\end{aligned}$$

The overall mapping is valid if (1) the path elements of the policy and the insecure query are successfully mapped using the function \(\varPhi \) (e.g., \(\varPhi (a, x)\)), (2) the start nodes are matched, and (3) the relationships and end nodes are either empty or matched. Accordingly, we evaluate all conditions C in all rules R for a policy P. We generate a set S of 2-tuples \((e_i, f_i)\) denoting an element from the query \(e_i\) and a filter template \(f_i\) to be applied on the insecure query q.

$$\begin{aligned}\begin{gathered} S = \{(e_i, f_i) \; | \; \exists (E, R, \varPhi ) \in P, (e, C, f) \in R, \, e_{i} \in q, \, f_i \in F \; \forall c \in C: \, \\ \Gamma (q, c) \wedge map(e) = e_{i} \wedge f_i = f) \; \} \end{gathered}\end{aligned}$$

The function \(\Gamma (q, c) \rightarrow \{true, false\}\) checks whether a condition c in the rule’s conditions C is satisfied by the insecure query q. Further, the element from the rule e must map to the element in the insecure query \(e_{i}\) and the applied filter template \(f_i\) is the same as the one f in the rule. To apply the filters of the matched rules on the insecure query (RQ4), we use the following function:

$$\begin{aligned}\begin{gathered} \varXi (q, S) \rightarrow q' \end{gathered}\end{aligned}$$

It takes an insecure query q and for each assignment \((e_i, f_i) \in S\), it instantiates \(f_i \rightarrow f_{i_q}\) according to Section 4. This \(f_{i_q}\) can then be added to \(e_i\) or it extends existing filters using boolean AND. With all filters in place, we have rewritten an insecure query q to a secure version \(q'\).

6 Evaluation

We evaluate our query rewriting approach by implementing a proof-of-concept prototypeFootnote 2 using Cypher, ANTLR, Spring Boot and Kotlin. We rewrite the insecure Cypher query based on the specified policy. The secure query and information about the applied rules are returned. In our prototypical implementation, we only support reading queries with one MATCH clause. In experiments with a set of queries, we tested all currently supported features and visually confirmed that the filters were applied correctly. However, in our prototype we did not consider potential vulnerabilities or attack vectors not addressed by ABAC.

When measuring the performance of the query rewriting (no database access), we noticed the standard deviation to be higher than the average rewriting time (\(\approx \)  0.2 ms on a HP ELITEBOOK 850 G6 with 32 GB, CPU i7-8665U). Therefore, we assume the performance overhead to be negligible.

7 Conclusion

In this paper, we proposed a runtime rewriting approach for freely formulated graph-DB queries to enforce ABAC independent of the underlying database and application. First, we defined how various elements of a query contribute to its result (RQ1). We introduced the strategy of categorizing the elements based on whether they have a filter applied and how they are used for returning data. Next, we introduced a policy model encoding our authorization requirements. Then, we formally defined a policy model including a filter template. The policy consists of a pattern and rules to decide if access control constraints apply to an element of the insecure query and which filter template to use (RQ2). The policy pattern is a sequence of elements, which is used in the query processing.

The policy and the insecure query are processed by first splitting their patterns into tuples representing either paths or isolated nodes. Accordingly, we mapped the policy elements with their respective ones in the insecure query (RQ3). A mapping is valid if each path tuple of the insecure query matches one of the policy. In this case, if all conditions of a rule are successfully evaluated, its filter template is instantiated replacing its placeholders with runtime values from the insecure query and/or user information. The last step of query processing is enhancing the insecure query with these access control filters (RQ4).

As we only consider one policy, we plan to support policy sets according to the XACML policy language model in the future. This further demands for combining algorithms. Additionally, we currently support reading queries only with one MATCH clause. Thus, we not only need to increase the supported number of MATCH clauses, but also the types of supported queries. This could be added using additional conditions or dedicated rules for reading and writing access. Above all, intensive evaluation is needed especially with complex authorization policies and large graph models. Finally, we need to identify possible potential security vulnerabilities.