1 Introduction

Attribute-based access control is a significant security component of a service-oriented architecture (SOA) software system; it is an access control paradigm whereby access rights are granted to users through policies that combine attributes. Access control is an important security mechanism for protecting sensitive information and authorizing system resources [1, 2]. The operating efficiency of an authorization service is determined by the evaluation performance of the policy decision point (PDP), a vital component in an access control model. The PDP needs to load a policy set composed of a large number of policies, and its evaluation performance degrades severely as the policy set grows. This problem puts authorization service systems in a challenging position [3]. A large-scale policy set is a major bottleneck for improving PDP evaluation performance in an authorization system because of its flexible construction, the difficulty of describing it, and the massive number of policies it contains [4].

XACML stands for “eXtensible Access Control Markup Language”; it not only suits specific environments, resources and application systems, but is also versatile enough to fit actual requirements. XACML is one of the standard implementations of attribute-based access control, with great adaptability, compatibility and expressive ability [5]. However, XACML cannot effectively detect conflicts in a policy. Deng et al. solve this problem by presenting a conflict detecting and eliminating engine termed XDPCE, which not only detects and eliminates conflicts within a policy, but also has the same evaluation ability as a PDP [6]. In this paper, a creative locomotive algorithm is designed based on XDPCE. Furthermore, as the evaluation demands of web-based information systems grow, the rise in the scale and complexity of XACML policies is becoming more notable. The major measures to address this problem include optimizing the policy set, formulating a policy evaluation engine, and so on. Academics and practitioners at home and abroad have carried out extensive research, with findings including decision diagrams, policy reordering, clustering strategies, numericalization, caching mechanisms, and so on [7]. Researchers have optimized XACML policy sets from different perspectives, but the available studies mainly have the following two problems [4, 7].

  1. The status quo that a lot of storage space is required to statically preprocess large-scale complicated policy sets does not change.

  2. The high time consumption of evaluating requests to dynamically adjust large-scale complicated policy sets is not alleviated.

Ngo et al. [8] present a solution called MIDD (Multi-data-type Interval Decision Diagram) using data interval partition aggregation together with new decision diagram combinations. They later improve it so that complex logical expressions in policies can be parsed and transformed into decision tree structures [9]. Niu et al. [10] adopt a multi-level caching mechanism, based on statistical analysis, to store frequently invoked request-results, attributes and policy information. Policy reordering and clustering strategies are adopted to optimize policy processing in [11]. Marouf et al. [12] describe an adaptive approach for XACML policy optimization by applying a clustering technique to policy sets based on the K-means algorithm. An approach based on Satisfiability Modulo Theories (SMT) easily lends itself to verification at run-time during authorization query answering [13]. Qi et al. [14] optimize an XACML policy by numericalization; attribute numericalization transforms the textual attributes of XACML policies into numerical attributes. Azzam et al. [15] elaborate a set-based language that covers all the XACML components and establish an intermediate layer to which policies are automatically converted.

This paper makes the following contributions.

  1. A novel policy evaluation engine is presented based on numericalization, batch processing and secondary classification.

  2. A creative algorithm called the locomotive algorithm is studied, which can efficiently improve the PDP evaluation performance, especially on large-scale complicated policy sets.

The rest of the paper is organized as follows. Related works are reviewed in Sect. 2. Section 3 introduces the framework of our proposed policy evaluation engine called XDPNBE. Section 4 outlines our propositions including numericalization, hash coding and a locomotive algorithm. In Sect. 5, a numericalization method of policy sets is described. We present a creative locomotive algorithm combined with XDPCE, secondary classification and batch processing in Sect. 6. Section 7 implements XDPNBE and compares its performance with Sun PDP, XEngine, HPEngine and SBA-XACML, respectively. Finally, we conclude this paper in Sect. 8.

2 Related work

In this section, we review related works addressing PDP evaluation performance, including numericalization, secondary classification and batch processing for XACML policies. Their limitations are discussed to emphasize our contributions.

Qi et al. [14] establish a hash table for all strings in an XACML policy and map the strings to numbers, converting them into more easily processed numerical values. However, this method consumes a lot of memory space. In this paper, we numericalize different rules according to the characteristics of the different attributes of an XACML policy set: attributes with few changes are counted and assigned numbers, hash functions are used for attributes with frequent changes, and strings are converted into fixed-length numbers for processing.

The SBA-XACML [15] reduces the complexity of policy sets and the overhead of real-time policy evaluation. However, when a policy set is extremely large and complicated, conflicts and redundancies will occur. A novel XACML policy evaluation engine, namely XEngine [16], converts all strings in an XACML policy to numerical values and the hierarchical structure of an XACML policy into a flat structure. Niu et al. [10] improve the XEngine and propose a novel XACML policy evaluation engine called HPEngine. Compared with the direct numericalization in the XEngine, the HPEngine dynamically refines policies with a policy optimization mechanism based on statistical analysis and then transforms the textual form of a policy into a numerical form. The HPEngine performs better than the XEngine, though the problem of converting all strings into integers by the same function still exists.

Meng et al. [17] introduce a feature selection based dual-graph sparse non-negative matrix factorization for local discriminative clustering (DSNMF-LDC). Redundant and irrelevant features are eliminated through dimensionality reduction, and discriminative features are selected by a dual-graph model. Local discriminative clustering is utilized to divide the selected features into several categories. Blockchain technology is exploited to define an access control system that guarantees the auditability of policy evaluation [18]; policies and attributes are managed by smart contracts deployed on the blockchain. However, its evaluation performance is limited by the chosen blockchain protocol. Deng et al. [19] explore a novel distributed PDP model based on a combination of two-stage clustering and reordering to reduce the computational limitations of a single PDP.

The numericalization methods proposed in [20] and [21] only support one-to-one matching; for example, resource 1 is represented by integer 1. These methods are too simple to deal with complicated policy sets. JBoss XACML [22] implements a package based on the Sun XACML, but still uses the inefficient strategy matching pattern. AXESCON XACML [23] amplifies the load and cache functions in an evaluation engine, and presents policy referencing and multi-policy matching; nevertheless, it needs further improvement in its matching and indexing logic. In the aspect of indexing, Enterprise XACML [24] is fruitful for its high-efficiency indexing structure and reduces the scope of policies to be retrieved, though this may cause duplicate indexing and multiple matching.

A locomotive algorithm combined with numericalization, secondary classification and batch processing is designed to avoid the problems above.

3 Policy decision engine XDPNBE

In this section, we present the overall architecture of our approach. A policy evaluation engine called XDPNBE (XiDian Policy Numericalization & Batch processing Engine) is proposed. By loading policies, the XDPNBE can evaluate access requests and return authorization results to context processors and the PEP (Policy Enforcement Point).

The evaluation process of the XDPNBE includes two parts:

  1. Policy sets and requests are preprocessed with numericalization;

  2. A locomotive algorithm is designed to make authorization decisions by employing secondary classification and batch processing.

Numericalization of policy sets and requests simplifies the matching procedure and is a significant foundation for the locomotive algorithm. When requests arrive, the locomotive algorithm eliminates conflicts in policy sets by using XDPCE [6] as preprocessing, and makes authorization decisions efficiently by using secondary classification and batch processing. Owing to the locomotive algorithm, the output of the XDPNBE can be generated correctly even under massive requests, dramatically reducing the time needed to process large-scale complicated policy sets. Our propositions include numericalization, hash coding and a locomotive algorithm. The structure of the XDPNBE is described in Fig. 1.

Fig. 1 Structure of XDPNBE

4 Approach overview

Considering that most attributes in rules are character strings, there are two distinct methods to analyze and optimize rules. One is dividing long strings into shorter ones to improve analysis performance; the other is numericalization, transforming character strings into numbers. We address attribute numericalization on the basis of hash coding and hash functions. Among the universal hash functions, the BKDRHash function [25] performs best in both implementation and application, and maintains high evaluation performance as character strings grow longer. Under such circumstances, subject attributes are simply labeled with integers because their limits of authority are fixed, while resource and action attributes in large-scale complicated policy sets are mapped to numbers using the BKDRHash function.

We also propose a locomotive algorithm that takes every sub table of a secondary classification policy set as a train and each rule as a compartment. Based on secondary classification and batch processing, this algorithm can successfully optimize rules and policies. To improve efficiency by narrowing the matching scope, we classify policies first according to their subject attributes and then according to their action attributes. Processing requests in batches makes the locomotive algorithm suitable for concurrent processing.

This paper aims at reforming policy sets and simplifying the policy decision procedure so that the evaluation performance on large-scale policy sets can be improved. An innovative locomotive algorithm is proposed as an effective means to this end.

5 Numericalization of policy sets

In [26], the numerical values of attributes in a policy are restricted by the system: the same attributes in the policy must be successive integers, which not only increases the cost of maintaining a policy, but also may affect the numbers of subsequent attributes when one attribute is added or deleted. This numericalization is limited and does not consider security issues. In [14], the numericalization method uses hash functions to digitize the attributes of subject, action, resource and condition. However, there are many types of hash functions with different effects; no indication is given of which hash function to use, and a uniform hash function handles all attributes, without considering that a single hash function does not work well for attributes of different string lengths. Again, security issues are not considered.

In this paper, a hash function is adopted to process attributes with long strings, and attributes with short strings are numbered. A zipper method is designed to resolve conflicts. When a policy set is preprocessed, a hash table is created for all the string attributes of subject, resource, and action. In practice, the string of a subject is short, while the strings of a resource and an action are relatively long. Therefore, a numbering method is used for subject attributes, and a hash function is adopted to map resource and action attributes to numbers, hence converting inefficient character matching into efficient numerical comparisons.

If policies have dynamic natures, numericalization can also be used flexibly. For example, given “age > 21” or “has been with the company for more than 6 months”, we can define that “age > 21” is number 1 and “has been with the company for more than 6 months” is number 2. A range can thus be represented by a number, analyzing each concrete problem concretely.

Requests are grouped and then handled in batches, adopting the idea of a pipelined processor architecture: after the previous batch of requests is numericalized, it is matched, and the next batch is numericalized during the matching of the previous one, hence improving the matching speed of requests in each batch.
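To make the pipelining idea concrete, the following is a minimal Java sketch, not the engine's actual implementation; all class and method names (RawRequest, NumericRequest, numericalize, match) are illustrative. Stage 1 numericalizes batch i+1 while stage 2 is still matching batch i, with a one-slot queue enforcing the overlap.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class RawRequest { String subject, action, resource; }        // textual attributes
class NumericRequest { int subject; long action, resource; }  // numericalized attributes

public class PipelinedEvaluator {
    // One-slot buffer: the producer numericalizes the next batch while the
    // consumer is still matching the previous one.
    private final BlockingQueue<List<NumericRequest>> ready = new ArrayBlockingQueue<>(1);

    void run(List<List<RawRequest>> batches) throws InterruptedException {
        Thread matcher = new Thread(() -> {
            try {
                for (int i = 0; i < batches.size(); i++) {
                    match(ready.take());                      // stage 2: match batch i
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        matcher.start();
        for (List<RawRequest> batch : batches) {
            ready.put(numericalize(batch));                   // stage 1: numericalize batch i+1
        }
        matcher.join();
    }

    List<NumericRequest> numericalize(List<RawRequest> batch) { /* Sect. 5 */ return List.of(); }
    void match(List<NumericRequest> batch) { /* locomotive matching, Sect. 6 */ }
}
```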

5.1 Comparison and selection of numericalization methods

A hash function is used to digitize strings and a zipper method to resolve conflicts. The commonly used string hash functions are BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, ELFHash, etc. The results of evaluating these hash functions [25] are shown in Table 1 [27].

Table 1 Comparisons of hash function performance

In Table 1, data 1 is the number of hash collisions over 100,000 random strings of letters and digits. Data 2 is the number of hash collisions over 100,000 meaningful English sentences. Data 3 is the number of collisions when the hash values of data 1 are stored in a linear table modulo 1,000,003 (a large prime number). Data 4 is the number of collisions when the hash values of data 1 are stored in a linear table modulo 10,000,019 (a larger prime number). After the evaluation, the scores on the four data sets can be obtained. The quadratic mean is used to compute the average score, as shown in Eq. (1).

$${Q}_{n}=\sqrt{\frac{\sum_{i=1}^{n}{x}_{i}^{2}}{n}}=\sqrt{\frac{{x}_{1}^{2}+{x}_{2}^{2}+\dots +{x}_{n}^{2}}{n}}$$
(1)

According to Eq. (1), the average scores over the four data sets, computed as the quadratic mean, are 92.64 for BKDRHash, 86.28 for APHash, 83.43 for DJBHash, 81.94 for JSHash, 75.96 for RSHash, 72.41 for SDBMHash, 21.95 for PJWHash, and 21.95 for ELFHash.

It can be seen that BKDRHash is the most prominent in both actual effect and coding implementation. Therefore, the BKDRHash function is selected for numericalization and used to compute the hash value of a string, where a is the string and s is a seed, as shown in Eq. (2).

$$ hash(a) = \Big(\sum\nolimits_{n} {a[n]} \cdot s^{n} \Big)\bmod 2^{31} $$
(2)

We consider the size of a policy set in the test, where the seed is 31. The hash function of Eq. (2) is used to create a hash table for resource and action attributes. Subject attributes are numbered starting from zero; each new subject attribute receives the previous number plus one. A rule is quantified in the form < N, HashCode1, HashCode2, environment attribute, 0|1 >, and a request in the form < N, HashCode1, HashCode2, environment attribute >.
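A minimal Java sketch of Eq. (2) follows, using Horner's rule to accumulate the polynomial in the seed and a 31-bit mask as the mod 2^31 step; the seed is 31 as chosen above, and the method name is illustrative.

```java
// BKDRHash as in Eq. (2): hash(a) = (sum over n of a[n] * s^n) mod 2^31, seed s = 31.
public static int bkdrHash(String a) {
    final int seed = 31;                      // s in Eq. (2)
    int hash = 0;
    for (int n = 0; n < a.length(); n++) {
        hash = hash * seed + a.charAt(n);     // Horner's rule accumulates the polynomial
    }
    return hash & 0x7FFFFFFF;                 // keep the low 31 bits, i.e. mod 2^31
}
```

For instance, bkdrHash("borrowBook") yields a fixed 31-bit integer, so comparing two action attributes becomes a single integer comparison instead of a character-by-character match.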

5.2 Conflict handling

To resolve conflicts, a classic zipper (separate chaining) approach is adopted. For hash addresses shared by multiple key values, a singly linked list is established. When a conflict occurs, the singly linked list corresponding to the hashed address is searched to handle the conflict, as shown in Fig. 2.

Fig. 2 Zipper method
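The following is a minimal sketch, under assumed names, of the zipper method in Fig. 2: colliding attribute strings that share a bucket are chained in a singly linked list.

```java
// Zipper (separate chaining): each bucket holds a singly linked list of entries.
class ZipperTable {
    private static final int SEED = 31;

    private static final class Node {
        final String key;   // original attribute string
        final long value;   // assigned number (e.g., a hash code)
        Node next;
        Node(String key, long value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets = new Node[1 << 16];

    private static int hash(String a) {          // BKDRHash as in Eq. (2)
        int h = 0;
        for (int n = 0; n < a.length(); n++) h = h * SEED + a.charAt(n);
        return h & 0x7FFFFFFF;
    }

    void put(String key, long value) {
        int i = hash(key) % buckets.length;
        buckets[i] = new Node(key, value, buckets[i]);   // prepend to the bucket's chain
    }

    Long get(String key) {
        for (Node n = buckets[hash(key) % buckets.length]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;       // walk the chain on collision
        }
        return null;                                     // attribute not in the table
    }
}
```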

For example, rules in the three commonly used XACML policy sets LMS [28], VMS [29], and ASMS [30] are quantified. Tables 2, 3, and 4 show the partial results of quantifying rules in the policy sets of LMS, VMS and ASMS.

Table 2 Numericalization results of LMS policy set
Table 3 Numericalization results of VMS policy set
Table 4 Numericalization results of ASMS policy set

In Tables 2, 3, and 4, the mapping relationships after quantification of all attributes are stored in hash tables, so the time complexity of their digitization and restoration is O(1). Although receiving a request attribute mapping requires a string comparison and consumes some time, this loss is negligible compared with the large number of string comparisons over the entire policy set. At the same time, because memory must always maintain a hash table mapping string attributes to integer values, the method consumes memory space; in essence, it trades space for time. However, with the falling cost of computer memory and the strong demand for improving PDP evaluation performance, this method of exchanging space for time is particularly in line with the development of a system.

6 Policy decision method based on a locomotive algorithm

On the basis of numericalization, a locomotive algorithm is designed to make policy decisions. The algorithm combines XDPCE, secondary classification and batch processing to optimize PDPs. XDPCE is used as preprocessing to eliminate conflicts. Secondary classification determines the rules that need to be matched by means of indexing. Considering the reality of large data traffic, we select batch processing to increase throughput by matching multiple requests in batches, which helps achieve the goal of improving average efficiency.

6.1 Locomotive algorithm

In order to improve the speed of making decisions, every sub table of the secondary classification is regarded as a train and each rule as a compartment. The first request of each category matches the locomotive, that is, the subject and action attribute class. A new node is created as a mark to make sure that the same type of passenger (request) belongs to the same train (sub table), using the characteristics of the vertical classification (if there is no match in any sub table, there is no corresponding rule in the policy set). After that, the requests of the same category (in batches) are retrieved directly from the train, which means that the search area is greatly reduced: only the compartments from the front to the rear of that train need to be searched. On average, the time to find the locomotive equals the matching time of the subject attribute class plus that of the action attribute class. The total matching process is shown in Fig. 3.

Fig. 3 Matching process of requests based on locomotive algorithm

A secondary table is used to store category information after a policy set is secondary classified, and a request table is employed to store the category information after requests are classified. The locomotive algorithm is shown in Algorithm 2.
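As a complement to Algorithm 2, the following is a minimal Java sketch of the train/compartment idea under assumed data structures (TrainKey, Rule, Request and the permit flag are all illustrative, not the paper's listing): the first request of a batch locates the locomotive, and the whole batch is then matched against that sub table alone.

```java
import java.util.List;
import java.util.Map;

record TrainKey(int subjectClass, long action) {}                 // locomotive identifier
record Rule(int subject, long resource, boolean permit) {}        // a compartment
record Request(int subject, int subjectClass, long action, long resource) {}

class LocomotiveMatcher {
    private final Map<TrainKey, List<Rule>> trains;               // secondary classification table
    LocomotiveMatcher(Map<TrainKey, List<Rule>> trains) { this.trains = trains; }

    // Matches a batch of requests that share the same (subject class, action) key.
    Boolean[] matchBatch(List<Request> batch) {
        Boolean[] decisions = new Boolean[batch.size()];          // null = not applicable
        Request first = batch.get(0);
        List<Rule> train = trains.get(new TrainKey(first.subjectClass(), first.action()));
        if (train == null) return decisions;                      // no train: no matching rule
        for (int i = 0; i < batch.size(); i++) {
            Request r = batch.get(i);
            for (Rule rule : train) {                             // search compartments only
                if (rule.subject() == r.subject() && rule.resource() == r.resource()) {
                    decisions[i] = rule.permit();
                    break;
                }
            }
        }
        return decisions;
    }
}
```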

6.2 Classification

We improve the PDP evaluation performance by matching as few irrelevant policies as possible, reducing the scope of the required traversal of a policy set when matching requests one by one. Our method is based on the classification of subject attributes: a policy set is divided into several subclasses according to the subject attributes of each policy. Thus, a request is distributed to one of the subclasses, avoiding a traversal of the other unrelated subclasses.

6.2.1 Secondary classification based on action attributes

If a matching object is composed of two attributes chosen freely, the number of possible classes is the product of the numbers of values of the two attributes. But if we determine one attribute first and then the other, the number of candidates is only the sum of the two. Correspondingly, if there are multiple attributes, we first treat the attribute combinations other than the first attribute as the second attribute, and then process requests on the second attribute. The common attributes of both policies and requests are subject, action, resource and condition. Among them, action attributes are the most numerous; subject attributes come next, followed by resource attributes with fewer values; condition attributes sometimes have no value and are the fewest in total. Therefore, policies are classified according to their subject and action attributes in order. It is difficult to merge multiple action attributes into one class because the values of action attributes are irregular and discontinuous. In different rules, the subject attributes are often different if the corresponding action attributes are distinct. In order to avoid redundancies, each action attribute is regarded as a classification basis. To sum up, the result of the secondary classification in this paper is that subject and action attributes are both classification bases, and the classification results are shown in Fig. 4.


6.2.2 Comparison of classification bases of action attributes and resource attributes

Suppose that the total number of subject attributes is X and the number of subject attribute classes is x, the total number of action attributes is Y, and the total number of resource attributes and condition attributes is Z. After the subject attributes are divided into x classes, the number of subject attributes contained in each subject attribute class is X/x, and the time for matching a specific subject attribute changes from O(X) to O(X/x) + O(x). Since the product of X/x and x is constant, their sum (\(X/x + x\)) is minimized when the difference (\(\left| {X/x - x} \right|\)) is zero. Therefore, we set x = X/x, that is, x and X/x are both approximately equal to \(\sqrt X\). It can also be deduced that the time complexities of matching the subject attribute class and the action attributes are O(x) and O(Y), respectively, the time complexity of specifying the subject within the subject attribute class is O(X/x), and the time complexity of matching resource and condition attributes is O(Z). The total matching time is the sum of O(x), O(Y), O(X/x) and O(Z).
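As a hypothetical numerical illustration of this analysis: with X = 10,000 subject attributes partitioned into x = √X = 100 classes, locating a specific subject costs about O(x) + O(X/x) = O(100) + O(100) = O(200) comparisons instead of O(10,000).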

Fig. 4 Secondary classification of decision sets

6.2.3 Efficiency of secondary classification

The main index used in this paper classifies the rules of policy sets according to subject attributes. For example, the rules with subject attributes E0001 ~ E0010 belong to the same subject attribute class. When a request is received, it is necessary to identify the subject attribute class, determine the subject attribute, and match one by one until the corresponding rule is found. The matching time of the subject attribute index is T0. We have

$$ T_{0} = x + X/x + YZ $$
(3)

where x is the time used to determine the subject attribute class of the request, X/x is the time used to determine the specific subject in a subject attribute class, and YZ is the time used to match the remaining attributes (action, resource and condition). Secondary classification matches the action attributes after the subject attribute class is fixed, determines the subject within the subject attribute class, and then matches the resource and condition attributes. The total secondary classification index time is Tn, as shown in Eq. (4).

$$ T_{n} = x + X/x + Y + Z $$
(4)

Equation (4) reflects the matching sequence used in secondary classification. After the subject attribute class and the action attributes are successfully matched, the search scope is reduced to the sub table whose attributes include the subject, resource and condition of the corresponding subject attribute class. Compared with the subject index, the time saved by the secondary classification index is shown in Eq. (5).

$$ T_{0} - T_{n} = YZ - Y - Z $$
(5)
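As a hypothetical illustration of Eq. (5): with Y = 20 action attributes and Z = 10 resource and condition attributes, the secondary classification index saves YZ − Y − Z = 200 − 20 − 10 = 170 matching steps, and the saving grows with the product YZ.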

6.3 Batch processing for requests

In the optimization of PDPs, it is common to improve the efficiency of decision making by improving the speed of processing a single decision. Considering the large network coverage in the information age, people's demand for Internet operations is much higher than before, which leads to many concurrency problems. We envisage that when requests are dense and similar enough, the decision making method can be improved by batch processing: when multiple regular requests arrive concurrently, they can be processed in batches according to their connections. This method bypasses the optimization of single-request processing time and takes the number of requests processed per unit time as the reference standard for working efficiency. Assuming that the speed of single processing is the same, if the throughput is increased, the decision time is optimized. In order to process multiple associated requests, requests need to be categorized.

6.3.1 Criteria for requests classification

In this paper, requests are classified by finding the correlation or similarity of attributes. We considered the subject and resource attributes as classification bases (using the condition attribute is not significant), but in an actual system it is rare for the same subject to send multiple requests in a short time (not just more than one, but a certain minimum number). As for resource attributes, a single resource is rarely repeated (not suitable for batch processing), and its base number is small (the batch processing space is very limited). On the contrary, action attributes are often repeated in a system's requests, and the secondary classification described above has already classified the policy sets according to action attributes, so action attributes are finally chosen as the criteria for classifying requests and the basis for batch processing. The classification of requests follows the classification method of the policy set: two requests belong to the same class if and only if their subject attributes belong to the same class (avoiding that requests in the same group are divided into different classes of policy sets, and speeding up on the basis of the subject attribute index) and their action attributes are completely equal (since numericalized attribute values are used). Using the vertical classification of policy sets, the result of the classification is shown in Fig. 5, and a sketch of the grouping follows the figure.

Fig. 5 Vertical classification of requests
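A minimal Java sketch of this grouping criterion follows, with an illustrative Req record: requests fall in the same batch if and only if their subject attributes share a class and their numericalized action attributes are equal.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

record Req(int subject, int subjectClass, long action, long resource) {}

class RequestGrouper {
    // Composite key (subject class, action): equal keys put requests in one batch.
    static Map<List<Long>, List<Req>> group(List<Req> batch) {
        return batch.stream().collect(Collectors.groupingBy(
                r -> List.of((long) r.subjectClass(), r.action())));
    }
}
```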

6.3.2 Time analysis of related algorithms

We assume that the number of pending requests per unit time is B, and the number of categories after the request classification is b.

6.3.2.1 Subject attribute index

In the average case, the time used to match the subject attribute class is x/2, the time used to specify the subject is X/(2x) = x/2, and the time used to match the other attributes is (Y/2)(Z/4) = YZ/8. The total matching time is x + YZ/8.

6.3.2.2 Secondary classification based on action attributes

In the average case, the time used to find the locomotive is Ts, as shown in Eq. (6).

$$ T_{s} = Bx/2 + BY/2 $$
(6)

The time used to match the specific rules is Tm, as shown in Eq. (7).

$$ T_{m} = B{*}(X/x)/2 + BZ/4 = Bx/2 + BZ/4 $$
(7)

The total matching time is T1, as shown in Eq. (8).

$$ T_{1} = BY/2 + Bx + BZ/4 $$
(8)
6.3.2.3 Batch processing in request classification

In order to show the conditions, advantages and disadvantages of batch processing more directly, we discuss the average time, maximum time and minimum time of its usage. Next, the average time is used to illustrate the influence of the similarity of requests on the batch processing algorithm.

Average case The classification time of requests Tc is \(b^{2} /2\). The time used to find the locomotive Ts is bx/2 + bY/2. The index time in the table after the secondary classification Tm is Bx/2 + BZ/4. The total matching time T2(average) is the sum of Tc, Ts and Tm, as shown in Eq. (9).

$$ T_{2} (average) = b^{2} /2 + b\left( {x + Y} \right)/2 + B\left( {x/2 + Z/4} \right) $$
(9)

Worst case (b = B) The time used to match the action attributes is \(B^{2} /2\), and the other times are the same as in method 1. Adding the matching time of the action attributes, the total matching time is T2(worse), as shown in Eq. (10).

$$ T_{2} (worse) = B^{2} /2 + BY/2 + Bx/2 + BZ/4 $$
(10)

Best case The best case occurs when b = 1. Ignoring the locomotive matching time (there is only one locomotive), T2(best) is shown in Eq. (11).

$$ T_{2} \left( {best} \right) = B + Bx + BY/2 + BZ/4 $$
(11)
6.3.2.4 Mathematical proof of effectiveness of batch processing

In order to test the effectiveness of the batch processing method and find out the applicable conditions, a lemma is introduced and its proof is given.

Lemma 1

Classifying requests using batch processing can enhance the efficiency of policy decision when \(b < \left( {B/b - 1} \right)\left( {Y + x} \right)\).

Proof

To analyze the effect of classifying requests using batch processing, let r denote \(T_{1} - T_{2}\).

If and only if r > 0, batch processing is effective.

Next, we prove that r > 0 when \(b < \left( {B/b - 1} \right)\left( {Y + x} \right)\).

Since \(T_{1} = BY/2 + Bx + BZ/4\) and \(T_{2} = b^{2} /2 + b\left( {x + Y} \right)/2 + B\left( {x/2 + Z/4} \right)\), we have

$$ \begin{aligned} r = T_{1} - T_{2} & = BY/2 + Bx + BZ/4 - b^{2} /2 - bx/2 - bY/2 - Bx/2 - BZ/4 \\ & = [(B/b - 1)(Y + x) - b]b/2 \end{aligned} $$

Since \(b < \left( {B/b - 1} \right)\left( {Y + x} \right)\), we have \((B/b - 1)(Y + x) - b > 0\).

According to the definition of b, we have b > 0.

Thus, r > 0 when \(b < \left( {B/b - 1} \right)\left( {Y + x} \right)\).

Similarly, \(b < \left( {B/b - 1} \right)\left( {Y + x} \right)\) when r > 0.

Thus batch processing is effective if and only if r > 0, where Y is the number of action attribute types and B/b is the average number of requests per category.
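As a hypothetical numerical check of Lemma 1: with B = 1,000 pending requests, b = 10 categories, Y = 20 and x = 100, we get (B/b − 1)(Y + x) = 99 × 120 = 11,880 > b = 10, so the condition holds, and r = [(B/b − 1)(Y + x) − b]b/2 = (11,880 − 10) × 10/2 = 59,350 > 0; batch processing pays off.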

6.4 Cache and real-time matching

When dealing with requests in batch processing, we inevitably face a waiting time problem. Assuming that a group of requests is accepted within one second, the system must wait one second before processing the first request; if the time actually saved is less than the waiting time, the batch processing method fails.

6.4.1 Shortening waiting time and caching instead of waiting

The first request does not need to wait for all subsequent requests within the time limit; instead, while the current batch is being matched against the policy set, newly arrived requests are read into the waiting area and form the next batch. This makes the waiting time more even and avoids some requests waiting for a long time. Alternatively, the system caches the values of subject and action attributes, and a request is matched directly against the cache without waiting. The significance of batch processing is that it improves efficiency when data repeatability and data volume are high, so it should only be used when these conditions are met. In real-time matching, each new action class needs to be compared more than once; the cache saves the waiting time but takes up some space. From a macro point of view, when the action attributes in requests repeat a lot, real-time matching is suitable; when the action attributes are not concentrated, cache matching is suitable.

6.4.2 Efficiency comparison between caching and real-time matching

The difference between caching and real-time matching is mainly reflected in the time of matching action attributes. The average time of matching action attributes with caching is \(Y^{2} /2\), and the average time of matching action attributes in real time is \(b^{2} /2\) (this value may vary with the density of request types). Real-time matching requires the previous request to wait for the next request.

Definition 1

Waiting time \(T_w\) refers to the average interval between two consecutive requests received by a system.

The average time of action attribute matching in real time is therefore \(b^{2}/2 + T_{w}\). Under known conditions, the matching method with the shorter average time is more efficient; the time difference \(T_{b}\) is shown in Eq. (12).

$$ T_{b} = Y^{2} /2 - b^{2} /2 - T_{w} $$
(12)

When \(T_{b}\) is greater than zero, real-time matching is adopted; otherwise, the caching mechanism (secondary classification) is adopted.
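As a hypothetical illustration of Eq. (12): with Y = 30 action attribute types, b = 10 request categories and an average request interval Tw = 100, Tb = 450 − 50 − 100 = 300 > 0, so real-time matching is adopted; with sparser requests, say Tw = 500, Tb = −100 < 0 and the caching mechanism is chosen.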

6.5 Summary

In practice, the batch processing method can be used to improve the PDP evaluation performance of practical systems, and obtains preferable performance (i.e., it greatly improves the speed of decision making) when a large number of requests arrive simultaneously, or when the arriving requests share the same subject and action attributes.

7 Experimental results and analysis

In order to verify that the method based on the locomotive algorithm can improve the PDP evaluation performance, this section introduces the policies adopted in the experiments and the method of generating test requests, and then compares the evaluation performance of the policy decision engine XDPNBE with that of the Sun PDP [31], XEngine [16], HPEngine [32] and SBA-XACML [15].

The experiments are carried out on a laptop computer running Windows 10, with Intel (R) Core (TM) i7-6700HQ 2.6 GHz processor and 16 GB of RAM. To eliminate the performance factors of implementation languages, the XDPNBE, XEngine, HPEngine and SBA-XACML are implemented in Java since the Sun PDP is written in Java.

7.1 Experimental policies

In order to simulate practical application scenarios, we select four policy sets from practical systems for testing. The Library Management System (LMS) [28] provides access control policies by which a public library can manage books via the Web. The Virtual Meeting System (VMS) [29] provides access control policies for managing Web conference services. The Auction Sale Management System (ASMS) [30] provides access control policies by which items can be bought or sold online. The Continue-a policy is taken and converted from [33].

The numbers of rules in the LMS, VMS, ASMS and Continue-a are 3000, 6000, 9000 and 12,000, respectively. We consider these four policy sets as the small-scale policy sets. We expand the numbers of rules in the LMS, VMS, ASMS and Continue-a to 30,000, 60,000, 90,000 and 120,000, respectively. The new obtained policy sets are referred to as the ELMS, EVMS, EASMS and EContinue-a. We consider the new four policy sets as the large-scale policy sets.

7.2 Generation of test requests

In order to improve test coverage, Martin and Xie [34] analyze policies by change-impact analysis to automatically generate access requests. They point out that conflicting policies or rules can be found by conflict detection tools, based on the fact that different policies, or different rules in the same policy, may produce inconsistent evaluation results for the same request. Correlative access requests can then be constructed for testing according to the conflicting policies or rules.

According to Wei et al. [35], access requests can be automatically generated to test the correctness of the PDP as well as the configured policies. They propose that the Context Schema, defined by the XML Schema of XACML, describes all the structures of access requests that might be accepted by the PDP, i.e., all the valid input requests. Their tool X-CREATE can generate possible structures of access requests according to the Context Schema of XACML. The policy analyzer obtains possible input values of every attribute from a policy, and the policy manager randomly distributes the obtained input values into the structures of access requests. Simple Combinatorial is another test scheme that generates access requests according to all possible combinations of the subject, action and resource attribute values in XACML policies.

Inspired by the above, we use composite transformation and the Context Schema to simulate actual access requests according to the actual requirements of performance testing.

7.3 Policy evaluation engines

Nowadays, the PDP implementation [36] termed Sun PDP [31] is widely used; it can evaluate access requests selectively according to an internal rule matching mechanism [37]. We adopt the Sun PDP as a decision engine to evaluate requests in the following experiments. There are two main reasons for choosing the Sun PDP. One is that the Sun PDP is the first and the most widely deployed implementation of XACML evaluation engines and has become the industrial standard. The other is that the Sun PDP is open source, which provides convenience.

XEngine [16] can convert text XACML policies into numerical policies, transform numerical policies from complex structures to canonical structures, convert numerical policies into tree-type data structures and efficiently process requests.

HPEngine [32] uses a statistical analysis mechanism to create caches for frequently invoked attributes, policies and request results, thereby reducing the size of the policy and optimizing the matching method.

SBA-XACML [15] is a semantic-based language that establishes an intermediate layer for a policy automatic transformation. It contains formal semantics and algorithms that use mathematical operations to provide an effective policy evaluation.

In practical applications, reducing the time spent in matching policy sets is a principal issue to improve the evaluation performance of PDPs. In our experiments, we compare the evaluation performance of our proposed XDPNBE with that of the Sun PDP, XEngine, HPEngine and SBA-XACML.

7.4 Performance tests and comparisons

In order to make the experimental results more convincing, we conduct four experiments as follows.

  1. Performance evaluation of XDPNBE before and after batch processing.

  2. Comparisons of evaluation performance on the small-scale policy sets.

  3. Comparisons of evaluation performance on the large-scale policy sets.

  4. Self-comparisons of evaluation performance of XDPNBE on the small-scale and large-scale policy sets.

7.4.1 Performance evaluation of XDPNBE before and after batch processing

In order to highlight the efficiency gained by batch processing of requests, we compare the performance of XDPNBE with and without batch processing in the access control process. We generate 1000, 2000, …, 10,000 access requests randomly to measure the evaluation time of the PDP. For the four policy sets LMS, VMS, ASMS and Continue-a, the variation of the evaluation time of XDPNBE before and after batch processing with the number of access requests is shown in Fig. 6.

Fig. 6 Variation of evaluation time of XDPNBE before and after batch processing

From Fig. 6, we observe that

  • Whether requests are batch processed or not, the evaluation time of the XDPNBE increases when the number of access requests grows.

  • The growth rate of the evaluation time of XDPNBE with batch processing is lower than that without batch processing.

  • For the policy sets of LMS, VMS, ASMS and Continue-a, the XDPNBE with batch processing of requests can effectively reduce evaluation time.

7.4.2 Comparisons of evaluation performance on small-scale policy sets

We compare our proposed XDPNBE with the Sun PDP, XEngine, HPEngine and SBA-XACML in terms of evaluation time for the four small-scale policy sets LMS, VMS, ASMS, and Continue-a, respectively. We randomly generate 1000, 2000, …, 10,000 access requests to record the evaluation time and matching speed of these policy evaluation engines. The variations of the evaluation time of each policy evaluation engine with the amount of requests are shown in Fig. 7.

Fig. 7 Comparisons of evaluation time on small-scale policy sets

From Fig. 7, we conclude that

  • Compared with the Sun PDP, XEngine, HPEngine and SBA-XACML, XDPNBE is the most efficient engine in all experimental conditions.

  • For the same policy set, as the number of requests increases, the evaluation time of XDPNBE grows much more slowly than that of the Sun PDP, XEngine, HPEngine and SBA-XACML.

  • For the policy set Continue-a containing 12,000 rules, when the amount of requests reaches 10,000, the evaluation time of XDPNBE is approximately 3.12%, 27.25%, 31.47% and 33.34% of that of the Sun PDP, XEngine, HPEngine and SBA-XACML, respectively.

For the small-scale policy sets, the variations of the matching speed of each policy evaluation engine with the amount of requests are shown in Fig. 8.

Fig. 8 Comparisons of matching speed for small-scale policy sets

From Fig. 8, we conclude that

  • Compared with the Sun PDP, XEngine, HPEngine and SBA-XACML, XDPNBE is the fastest engine in the matching speed of rules.

  • For the same policy set, with the increase in the number of requests, the matching speed of the XEngine, HPEngine and SBA-XACML does not change significantly, while the matching speed of XDPNBE improves significantly.

7.4.3 Comparisons of evaluation performance on large-scale policy sets

We compare the proposed XDPNBE with the Sun PDP, XEngine, HPEngine and SBA-XACML in terms of evaluation time for the four large-scale policy sets ELMS, EVMS, EASMS and EContinue-a, respectively. We randomly generate 1000, 2000, …, 10,000 access requests to record the evaluation time and matching speed of these policy evaluation engines. The variations of the evaluation time of each policy evaluation engine with the amount of requests are depicted in Fig. 9.

Fig. 9 Comparisons of evaluation time on large-scale policy sets

From Fig. 9, we conclude that

  1. As the amount of requests rises, XDPNBE spends less evaluation time than the other four policy evaluation engines.

  2. For the same policy set, as the number of requests increases, the evaluation time of XDPNBE grows much more slowly than that of the Sun PDP, XEngine, HPEngine and SBA-XACML.

  3. For the policy set EContinue-a containing 120,000 rules, when the number of requests reaches 10,000, the evaluation time of XDPNBE is approximately 0.21%, 4.69%, 5.67% and 9.66% of that of the Sun PDP, XEngine, HPEngine and SBA-XACML, respectively.

For the large-scale policy sets, the variations of the matching speed of each policy evaluation engine with the amount of requests are depicted in Fig. 10.

Fig. 10 Comparisons of matching speed for large-scale policy sets

From Fig. 10, we conclude that

  1. As the amount of requests rises, XDPNBE is faster in the matching speed of rules than the other four policy evaluation engines.

  2. For the policy set ELMS, with the increase in the number of requests, the matching speed of the Sun PDP, XEngine, HPEngine and SBA-XACML is stable and lower than that of XDPNBE, while the matching speed of XDPNBE gradually decreases and tends to be stable. For the policy sets EVMS, EASMS and EContinue-a, the matching speed of all the engines is basically stable, and XDPNBE has obvious advantages.

7.4.4 Self-comparisons of evaluation performance of XDPNBE on small-scale and large-scale policy sets

We conduct self-comparison experiments of XDPNBE in terms of evaluation time for both the three small-scale policy sets and three large-scale policy sets. We randomly generate 1000, 2000, …, 10,000 access requests to record the evaluation time and matching speed of XDPNBE. The variations of the evaluation time of XDPNBE with the amount of requests for both the small-scale and large-scale policy sets are visualized in Fig. 11.

Fig. 11 Self-comparisons of evaluation time

From Fig. 11, we conclude that

  • For the small-scale and large-scale policy sets, the evaluation time of XDPNBE grows approximately linearly as the amount of requests rises.

  • For the three small-scale policy sets, the variations of the evaluation time of XDPNBE are nearly the same as the amount of requests rises.

  • For the large-scale policy sets, as the size of the policy set rises, the growth rate of evaluation time of XDPNBE gets faster.

The variations of the matching speed of XDPNBE with the amount of requests for both the small-scale and large-scale policy sets are visualized in Fig. 12.

Fig. 12 Self-comparisons of matching speed

From Fig. 12, we conclude that

  1. XDPNBE achieves a higher matching speed on large-scale policy sets than on small-scale ones.

  2. For the three small-scale policy sets, the more requests arrive, the faster the matching speed of XDPNBE becomes. However, for the large-scale policy sets, the matching speed of XDPNBE gradually stabilizes. This means that when a large-scale policy set reaches a certain size, the growth of the matching speed of XDPNBE reaches a bottleneck and remains stable.

8 Conclusions

A policy evaluation engine termed XDPNBE is presented in this paper, which not only improves the PDP evaluation performance in a safer way, but can also be used for concurrent processing. XDPNBE is based on a locomotive algorithm that combines the ideas of XDPCE, secondary classification and batch processing.

Before making policy decisions based on the locomotive algorithm, a numericalization method is employed to preprocess policy sets and real-time requests. In the numericalization part, we adopt the BKDRHash function to process data with long strings and a zipper method to resolve conflicts. The locomotive algorithm takes every sub table of the secondary classification policy set as a train and each rule as a compartment; it classifies policies according to their subject and action attributes successively, then increases throughput by matching multiple requests in batches, which helps improve average efficiency, especially under concurrency.