1 Introduction

Model checking has proven effective for automatically verifying the correctness of protocols, controllers, schedulers and other systems. Because a model checker relies on exhaustive exploration of the system's state space, its power depends on the efficient storage of states.

Fig. 1.

Lamport’s “Bakery” mutual exclusion protocol for two threads. The wait loop is unrolled at Lines 4–7, where the process waits until all threads j with smaller numbers, or with the same number but higher priority, expressed as \({\texttt {(N[j],j) < (N[i],i)}}\), have passed their critical section. The boolean variable E[i] associated with process i allows other threads to wait until i has received a number in N[i]. For simplicity, we assume that each line can be executed atomically.

To illustrate the structure of typical states in model checking problems, consider Lamport’s Bakery algorithm in Fig. 1, a mutual exclusion protocol that mimics a bakery with a numbering machine to prioritize customers. Due to limitations of computing hardware, the number is not maintained globally but reconstructed from local counters N[i] (one for each process i). For two processes, the state vector of this program consists of the two program counters (pcs) and all variables, i.e. \(\left\langle \texttt {E} [0], N[0], pc _0, \texttt {E} [1], N[1], pc _1 \right\rangle \).Footnote 1 Their respective domains are:

\(\left\langle \left\{ \top , \bot \right\} , [0\ldots 2], [0\ldots 7],\left\{ \top , \bot \right\} , [0\ldots 2], [0\ldots 7] \right\rangle \).

There are \(2 \times 3 \times 8 \times 2 \times 3 \times 8 = 2304\) possible state vectors. The task of the model checker is to determine which of those are reachable from the initial state; here \(\iota \,\triangleq \,\left\langle \bot , 0, 0, \bot , 0, 0 \right\rangle \). It does this using a next-state function, which in this case implements the semantics of the Bakery algorithm to compute the successor states of any state. For example, the successors of the initial state are:

$$ \left\langle \top , 0, 1, \bot , 0, 0 \right\rangle \text { and } \left\langle \bot , 0, 0, \top , 0, 1 \right\rangle $$

One successor represents the case where the first process executed Line 0; its program counter is set to 1 and \(\texttt {E} [0]\) is updated as a consequence. Similarly, the other successor represents the case where the second process executed Line 0.

Algorithm 1. The basic reachability procedure.
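Since the listing itself is not reproduced here, the following minimal sketch approximates it; the function names and exact line numbering are assumptions, but the roles of s, \(s'\), V and Q match their uses in the text.

```python
# A hedged sketch of a basic reachability procedure; names are illustrative.
from collections import deque

def reachability(initial, next_states, is_error):
    V = {initial}                      # visited states
    Q = deque([initial])               # states whose successors are pending
    while Q:
        s = Q.popleft()
        for s2 in next_states(s):      # s and its successor s' ("line 6")
            if is_error(s2):
                return "error reachable"
            if s2 not in V:            # visited-set lookup ("line 7")
                V.add(s2)
                Q.append(s2)
    return "no error reachable"
```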

While exhaustively exploring all reachable states, the model checker checks whether it can reach a state from the set \( Error \). For the Bakery algorithm with two threads, we have \( Error \,\triangleq \,\{ \left\langle b_0, n_0, 7, b_1, n_1, 7 \right\rangle \mid b_0,b_1 \in \left\{ \top , \bot \right\} , n_0,n_1\in [0\ldots 2] \}\), i.e., all states where both processes reside in their critical section (pc location 7). For completeness, Algorithm 1 shows the basic reachability procedure. The more states the reachability procedure can process, the more powerful the model checker, i.e., the larger the program instances it can automatically verify. This number depends crucially on the size of the visited states set V in memory. Several techniques exist to reduce V: partial order reduction [19, 26], symmetry reduction [10, 30], BDDs [3, 7], etc. Here we focus on explicitly storing the states in V using state compression.

The potency of compression becomes apparent from two related observations:

  • Locality. Successors computed in the next-state function exhibit locality: typically only a few slots, such as the program counter of the process that takes a step, change value with respect to the predecessor state.

  • Combinatorics. Similar to the set of all possible state vectors, the set of reached state vectors is also highly combinatorial. Assuming \(\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle \) can be reached from the initial state \(\iota \), we indeed saw four different vectors sharing large sub-vectors:

    $$ \left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle , \left\langle \bot , 1, 5, \bot , 2, 6 \right\rangle , \left\langle \bot , 1, 4, \bot , 2, 7 \right\rangle , \left\langle \bot , 1, 5, \bot , 2, 7 \right\rangle $$

We hypothesize that the typical locality of the next-state function ensures that the set of reachable states exhibits this combinatorial structure in the limit. Therefore, storing each vector in its entirety in a hash table would duplicate a lot of data. By folding the reachable state vectors in a tree, however, the shared sub-vectors have to be stored only once (more in Sect. 3).

In this paper, we investigate the lower bound on the space requirements of typical state spaces occurring in model checking. We do this by modeling the state spaces as an information stream. The values in this stream probabilistically depend on previously seen values, in effect modeling the locality in the next-state function. A simple application of Shannon’s information theory yields a lower bound for the storage requirements of our “state space stream”.

Subsequently, in Sects. 3 and 4, we investigate whether this lower bound can be reached in practice. To this end, we provide an implementation for the visited set V. A practical compressed data structure has the additional requirement that the query time, i.e., the time it takes to look up and insert individual state vectors, is constant (or at least poly-logarithmic) in the length of the vector. The technique suggested by the information theoretical model, i.e., maintaining differences between successor states, does not satisfy this requirement. Therefore, we utilize a binary tree in combination with a compact hash table. By analyzing the best-case compression of this structure, we show that it can indeed reach the information theoretical lower bound (at least in theory).

According to the same best-case analysis, our implementation of the ‘Compact Tree’ can compress up to tens of billions of large state descriptors (of tens to hundreds of integers) down to only one 32-bit integer per state. Extensive experimentation in Sect. 5 with diverse input models in four different input languages shows moreover that this compression is also reached in practice, and with little computational overhead.

2 An Information Theoretical Lower Bound

The fact that state spaces have a combinatorial structure is related to the fact that states generated by a model checker exhibit locality, as discussed in Sect. 1. We make no assumptions on the nature of the inputs besides this locality of state generation. In the current section, we derive the information entropy of a single state vector, which equals the minimum number of bits needed to store it, using basic notions from information theory.

Information theory abstracts away from the computational nature of a program by considering sender and receiver as black boxes that communicate data (signals) via a channel. The goal for the sender is to encode the data as compactly as possible, such that the receiver is still able to decode it back to the original. The encoded size depends on the amount of entropy in the data. In the most basic case, no statistical information is known about the data: each of the \(|X|\) possible messages is equally probable and the entropy H is maximal: \(H(X) = \log _2(|X|)\) bits, i.e., the entropy directly corresponds to using one fixed-size bit pattern of \(\log _2(|X|)\) bits for each possible message.

If more is known about the statistical nature of the information coming from the sender, the entropy is lower, as more elaborate encodings can be used to reduce the number of bits needed per piece of information. A simple example is when we take into account the character frequency of the English language for encoding sentences. Since certain characters are much more frequent, a code of fewer bits can be used for them, while longer codes are reserved for infrequent characters. To calculate the entropy in this example, we need the probability of occurrence p(x) for each character \(x\in X\) in the English language. We can deduce this from analyzing a dictionary, or better, a large corpus of texts. The entropy then becomes: \(H(X) = \sum _{x \in X}-p(x)\log _2(p(x))\).
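As a quick numeric illustration of this formula (with a made-up toy distribution rather than real English letter frequencies):

```python
# H(X) = sum of -p(x) * log2(p(x)); the distributions are illustrative.
from math import log2

def entropy(p):
    return sum(-px * log2(px) for px in p.values() if px > 0)

uniform = {c: 1 / 4 for c in "abcd"}
skewed = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}
print(entropy(uniform))  # 2.0 bits: no gain over a fixed 2-bit code
print(entropy(skewed))   # ~1.36 bits: a shorter code for 'a' pays off
```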

We now apply the same principle to state vectors. As data source, we use the next-state function that computes new states, as we saw in Sect. 1.

As a simplification, let states consist of k variables. By storing full states in the queue Q, the predecessor state is always known in the model checker’s reachability procedure (see s and \(s'\) on line 6 in Algorithm 1). Hence, we can abstract away from the one-to-many relation of the next-state function and instead view the arriving states as a k-periodic stream of variable assignments:

$$ \left\langle v^0_0,\dots v^0_{k-1} \right\rangle ,\left\langle v^1_0,\dots v^1_{k-1} \right\rangle ,\cdots , \left\langle v^{n-1}_0,\dots v^{n-1}_{k-1} \right\rangle $$

It thus makes sense to describe the probability that a variable holds a certain value with respect to the same variable in the predecessor state: for each variable \(v^{i}_{j}\) with \(i \ge 1\) and \(0\le j \le k-1\), both encoder and decoder can always look at the corresponding variable \(v^{i-1}_{j}\) in the predecessor to retrieve its previous value.

Since we are interested in establishing a lower bound, we may safely under-approximate the number of variables changing value with respect to a state’s predecessor. It makes sense to assume that only one variable changes value, since with zero changes, the same state is generated (requiring no “new” space in V). Hence, we take the following relative probabilities (see example Fig. 2):

$$ ~~~~~~p(v^i_j \ne v^{i-1}_j) = \frac{1}{k} ~~~~~~~~~~~~~~~~~~~~~~~~~~ p(v^i_j = v^{i-1}_j) = \frac{k-1}{k} $$
Fig. 2.
figure 2

The states generated with the next-state function seen as a stream exhibiting locality. To derive a lower bound, we assume that locality changes only one value in each new vector, i.e., each vector that has to be stored. As there are k variables in the vector, the resulting probability that a variable changes is \(\frac{1}{k}\). So the chance that it remains the same with respect to the predecessor is \(\frac{k-1}{k}\).

Let \(\left\langle d^0_0,\dots d^0_{k-1} \right\rangle ,\left\langle d^1_0,\dots d^1_{k-1} \right\rangle ,\dots , \left\langle d^{n-1}_0,\dots d^{n-1}_{k-1} \right\rangle \) be the domains of the state slots. As a simplification, we assume that all domains have u bits, resulting in \(y=2^{u}\) values. Therefore, there are \(y-1\) possible values that variable \(v^{i}_j\) can take when it differs from its predecessor \(v^{i-1}_j\). Hence, the probability of each of these other values \(x\in d^i_j\) becomes \(p(x) = \frac{1}{k} \times \frac{1}{y-1}= \frac{1}{k(y-1)}\) (this equal probability distribution over the possible values results in a higher entropy, but recall that we make no other assumptions on the nature of the inputs). Of course, there is only one value assignment when the variable \(v^{i}_j\) does not change, i.e., the valuation of the same variable in the predecessor state \(v^{i-1}_j\).Footnote 2 This results in the following definition of entropy per variable in the stream:

$$ H_{ var }(v^i_j) = -\frac{k-1}{k}\log _2\left( \frac{k-1}{k}\right) + \sum ^{y-1}_{n=1}-\frac{1}{k(y-1)} \log _2\left( \frac{1}{k(y-1)}\right) $$

After some simplification, we can derive the state vector’s entropy:

$$\begin{aligned} H_{ state }(v^i_0,\ldots ,v^i_{k-1})= \sum ^{k-1}_{j=0} H_{ var }(v^i_j) = \log _2(y{-1}) + \log _2(k{-1}) + k \log _2\left( \frac{k}{k{-1}}\right) \end{aligned}$$
(1)
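The omitted simplification step: summing \(H_{ var }\) over the k variables gives

$$\begin{aligned} \sum ^{k-1}_{j=0} H_{ var }(v^i_j) = (k-1)\log _2\left( \frac{k}{k-1}\right) + \log _2(k) + \log _2(y-1), \end{aligned}$$

which equals Eq. (1), because \((k-1)\log _2\left( \frac{k}{k-1}\right) + \log _2(k) = \log _2(k-1) + k \log _2\left( \frac{k}{k-1}\right) \).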

Theorem 1

(Information Entropy of States Exhibiting Locality). For \(k>1\), the information entropy of state vectors in state spaces exhibiting locality, abbreviated with \(H_{ state }\), is bounded by:

$$ \log _2(y-1) + \log _2(k-1) + 1 \le H_{ state } \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2 $$

Proof

We first show that \(H_{ state } \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2\).

Simplification using \(\log _2(y-1) \le \log _2(y)\) and \(\log _2(k-1) \le \log _2(k)\) reduces the claim to \((1 + \frac{1}{k-1})^k \le 4\). For \(k=2\) (recall that \(k > 1\)), we have \( (1 + \frac{1}{k-1})^k = 4\), and \((1 + \frac{1}{k-1})^k\) decreases monotonically in k towards \(\lim _{k \rightarrow \infty }(1 + \frac{1}{k-1})^k = \lim _{k \rightarrow \infty }(1 + \frac{1}{k})^k = e\). Hence \((1 + \frac{1}{k-1})^k \le 4\) holds for all \(k \ge 2\), and \(H_{ state } \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2\).

Now we show that \(\log _2(y-1) + \log _2(k-1) + 1 \le H_{ state } \).

Simplification reduces the claim to \(2 \le (1 + \frac{1}{k-1})^k\). As shown above, \((1 + \frac{1}{k-1})^k\) decreases monotonically from 4 (at \(k=2\)) towards its limit \(e > 2\), hence \(2 \le (1 + \frac{1}{k-1})^k\) and \(\log _2(y-1) + \log _2(k-1) + 1 \le H_{ state } \).    \(\square \)

Intuitively, this approximation makes sense, since a single modification in each new state vector can be encoded with solely the index of the changed variable, in \(\log _2(k)\) bits, plus its new value, in \(\log _2(y) = u\) bits, plus some overhead to accommodate cases where more than one variable changes value. This result indicates that locality could allow us to store sets of arbitrarily long (\(k\cdot u\)-bit) state vectors using a small integer of fewer than \(u +\log _2(k) + 2\) bits per vector.

In practice, this could mean that vectors of a thousand (1024) byte-sized variables can be compressed to 20 bits each, which is only slightly more than the 18 bits these states would occupy if they were numbered consecutively, but far less than the 8192 bits required for storing the full state vectors.
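Evaluating Eq. (1) and the bounds of Theorem 1 for this example confirms the arithmetic:

```python
# Eq. (1) and the Theorem 1 bounds for k = 1024 byte-sized variables (u = 8).
from math import log2

k, u = 1024, 8
y = 2 ** u
H_state = log2(y - 1) + log2(k - 1) + k * log2(k / (k - 1))
print(H_state)                        # ~19.44 bits per state vector
print(log2(y - 1) + log2(k - 1) + 1)  # lower bound: ~18.99 bits
print(u + log2(k) + 2)                # upper bound: 20.0 bits
print(k * u)                          # 8192 bits for the full vector
```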

3 An Analysis of Binary Tree Compression

The interpretation of the results in Sect. 2 suggests a trivial data structure to reach the information theoretical lower bound: simply store the incremental differences between state vectors. However, as noted in the introduction, such an incremental data structure does not provide the required efficiency for lookup operations (the reachability procedure in Algorithm 1 needs to determine on Line 7 whether states have been visited before).

The current section shows how large sets of state vectors can be folded into a single binary tree of hash tables to achieve sharing among sub-vectors, while also achieving poly-logarithmic lookup times in the worst case. This is the first step towards achieving the optimal compression from Sect. 2 in practice; Section 4 presents the second step. We focus here on the analysis of tree compression. For the tree algorithms, refer to [21].

The shape of the binary tree is fixed and depends only on k. Vectors are folded in the tree until only tuples remain; these are stored in the leaves. Using hashing, tuples receive a unique index, which is propagated back upwards, forming new tuples in the tree nodes that can be hashed again. This process continues until a tuple is stored in the root node, representing the entire vector.
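The following is a minimal sketch of this folding process; the dict-based node tables and the handling of odd-length sub-vectors are our own illustrative choices, not LTSmin's implementation:

```python
# Tree folding: intern tuples per node; indices are negative, as in Fig. 3.

def new_node(k):
    """Build the fixed tree shape for vectors of length k (k-1 node tables)."""
    if k == 1:
        return None                   # a single value needs no table
    half = (k + 1) // 2
    return {"T": {}, "L": new_node(half), "R": new_node(k - half)}

def fold(node, vec):
    """Fold vec into the subtree and return its (negative) index."""
    if node is None:
        return vec[0]                 # bare value, passed up to the parent
    half = (len(vec) + 1) // 2
    key = (fold(node["L"], vec[:half]), fold(node["R"], vec[half:]))
    if key not in node["T"]:
        node["T"][key] = -(len(node["T"]) + 1)
    return node["T"][key]

root = new_node(6)                    # the Bakery example: k = 6, 0 = "false"
for v in [(0,1,4,0,2,6), (0,1,5,0,2,6), (0,1,4,0,2,7), (0,1,5,0,2,7)]:
    fold(root, v)
```

After these four insertions, the five node tables together hold 10 tuples, matching the count discussed below.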

Fig. 3.
figure 3

Tree folding process for \(\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle \) (in (a)–(d)), \(\left\langle \bot , 1, 5, \bot , 2, 6 \right\rangle \) (in (e)), \(\left\langle \bot , 1, 4, \bot , 2, 7 \right\rangle \) (in (f)) and \(\left\langle \bot , 1, 5, \bot , 2, 7 \right\rangle \) (in (g)).

Figure 3(a) demonstrates how the state \(\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle \) is folded into an empty tree, which consists of \(k-1\) tree nodes, each with an initially empty hash table storing tuples. The process starts at the root of the tree (a), and recursively visits children while splitting the vector (b). When the leaves of the tree (colored gray) are reached, they are filled with the values from the vector (c). The tuples inserted into the hash tables are indexed (we use negative numbers to distinguish indices from values). Indices are then propagated back upwards to fill the tree up to the root (d).

Using a similar process, we can insert vector \(\left\langle \bot , 1, \mathbf 5 , \bot , 2, 6 \right\rangle \) (e). The hash table in the left child of the root is extended with the tuple \((-1, 5)\), which receives index \(-2\), while the root is extended with the tuple \((-2, -1)\). Notice how sub-vector sharing already occurs, since the tuple \((-1, 5)\) in the left child of the root points again to the leaf tuple \(\left\langle \bot , 1 \right\rangle \). In (f), the vector \(\left\langle \bot , 1, 4, \bot , 2, 7 \right\rangle \) is also added. In this case, only the right child of the root needs to be extended (with the tuple \((-1, 7)\)), while the tuple \((-1, -2)\) is added to the root.

With these three vectors in the tree (f), we can now easily add a new vector \(\left\langle \bot , 1, 5, \bot , 2, 7 \right\rangle \) by merely adding the tuple \((-2, -2)\) to the root of the tree (g). We observe that an entire state vector (of length k in general) can be compressed to a single tuple of integers in the root of the tree, provided that the sub-vectors are already present in the left and the right sub-tree of the root.

The tree containing four vectors in Fig. 3 (g) uses 20 “places” (= 10 tuples in tree nodes) to store four vectors with a total of 24 variables. The more vectors are added, the more sharing can occur and the better the compression. We now recall the worst-case and best-case compression ratios for this tree database, making the following reasonable assumptions about the dimensions:

  • The respective database stores \(n = \left| {V}\right| \) state vectors of k u-bit variables.

  • The size of tree tuples is 2w bits, and w bits is enough to store either a variable valuation (in a leaf) or a tree reference (in a tree node), hence \(u \le w\).

  • Keys can be stored without overhead in tables.Footnote 3

  • k is a power of 2.Footnote 4

Fig. 4.
figure 4

From left to right: a hash table and a tree table with their dimensions.

Fig. 5.
figure 5

Optimal entries per tree node level.

Figure 4 provides an overview of the different data structures and the stated assumptions about their dimensions.

To arrive at the worst-case compression scenario (Theorem 2), consider the case where all states \(s\in V\) have k identical data values: \(V = \{ v^k \mid v\in \{1,\dots ,n\}\}\), where \(v^k\) is a vector of length k: \(\left\langle v, \dots ,v \right\rangle \). No sharing can occur between state vectors in the database, so for each state we store \(k-1\) tuples at the tree nodes.

Theorem 2

([4]). In the worst case, the tree database requires at most \(k-1\) tuple entries of 2w bits per state vector.

Table 1. Theoretical bounds for the compressed state sizes in the tree database and in plain hash table storage. Note that while \(u\le w\), often u and w are in the same ballpark.

The best-case scenario (Theorem 3) is easy to comprehend from the effects of a good combinatorial structure on the size of the parent tables in the tree. If a certain tree table contains d tuple entries, and its sibling contains e entries, then the parent can have up to \(d\times e\) entries (all combinations, i.e. the Cartesian product). In a tree that is perfectly balanced (\(d=e\) for all sibling tables), the root node has n entries (1 per state), its children have \(\sqrt{n}\) entries, its children’s children \(\root 4 \of {n}\), etc. Figure 5 depicts this scenario.

Hence there are a total of \(\sum \limits _{i=0}^{\log _2(k)-1} 2^{i} \root 2^{i} \of {n}\) tuple entries. Dividing this series by n gives a series for the expected number of tuple entries per state: \(\sum \limits _{i=0}^{\log _2(k)-1} 2^{i} \frac{\root 2^{i} \of {n}}{n}\). It is hard to see where this series exactly converges, but Theorem 3 provides an upper bound. The theorem is a refinement of the upper bound established in [4]. Note that the example above of a tree with the four Bakery algorithm states already represents an optimal scenario, i.e., the root table is the cross product of its children.
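A quick numeric evaluation of this series (sample values, chosen only for illustration) shows the convergence:

```python
# Expected tuple entries per state in the best case: sum over tree levels.
from math import log2

def tuples_per_state(n, k):
    return sum(2 ** i * n ** (1 / 2 ** i) / n for i in range(int(log2(k))))

for n in (10 ** 6, 10 ** 9):
    print(n, tuples_per_state(n, k=64))  # ~1.002 and ~1.0001
```

So the expected number of tuple entries per state rapidly approaches one, in line with Corollary 2 below.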

Theorem 3

In the best case and with \(k \ge 8\), the tree database requires less than \(n + 2\sqrt{n} + 2 \root 4 \of {n} (k-4)\) tuple entries of 2w bits to store n vectors.

Proof

In the best case, the root tree table contains n entries and its children both contain \(\sqrt{n}\) entries. The entries in the 4 children’s children of the root represent vectors of size \(k/4\). These 4 tree nodes each contain \(\root 4 \of {n}\) entries, and each entry requires at most \(k/4 - 1\) tuples, taking the worst case according to Theorem 2 (hence also \(k \ge 8\)). This yields a total of at most \(n + 2\sqrt{n} + 4\root 4 \of {n}(k/4 - 1) \le n + 2\sqrt{n} + 2 \root 4 \of {n} (k-4)\) tuple entries.

   \(\square \)

Corollary 1

([21]). In the best case, the total number of tuple entries l in all descendants of the root table is negligible (\(l \ll n\)), assuming a relatively large number of vectors is stored: \(n \gg k^{2} \gg 1\).

Corollary 2

([21]). In the best case, the compressed state size approaches 2w.

Table 1 lists the achieved compressed sizes for states, as stored in a normal hash table and in a tree database. As a simplifying assumption, we take u to be equal to w, which can be the case if the tree is specifically adapted to accommodate u-bit references.

Performance. We conclude the current section with a note on the performance of the tree database compared to a plain hash table. The tree trades ku-bit vector lookups for \(k-1\) lookups of 2u-bit tuples in its nodes. Apart from the additional data accessed (\(ku - 2u\) bits), it seems that the increase in random memory accesses could cause poor behavior on modern CPUs. However, with good compression, the lower tables in the tree typically contain fewer entries, which can more easily be cached, whereas effective caching of the large plain vectors in hash table solutions is nigh impossible. Moreover, we can further use locality to speed up tree lookups by keeping the tree of the predecessor state in the search stack (Q), as explained in [21]. Figure 6 illustrates this.
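A small sketch of the bookkeeping behind this incremental insertion (the recursion mirrors the tree layout of the earlier folding sketch; names and ranges are illustrative):

```python
# Which node tables must be recomputed for a successor, given its
# predecessor? All other subtree indices can be reused (crosses in Fig. 6).

def dirty_nodes(pred, succ, lo=0, hi=None):
    if hi is None:
        hi = len(pred)
    if hi - lo == 1 or pred[lo:hi] == succ[lo:hi]:
        return []                     # bare value, or a reusable index
    if hi - lo == 2:
        return [(lo, hi)]             # a leaf table entry must be added
    mid = lo + (hi - lo + 1) // 2
    return (dirty_nodes(pred, succ, lo, mid) +
            dirty_nodes(pred, succ, mid, hi) + [(lo, hi)])

# Changing slot 2 (value 4 -> 5) touches only 2 of the 5 node tables:
print(dirty_nodes((0,1,4,0,2,6), (0,1,5,0,2,6)))  # [(0, 3), (0, 6)]
```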

Fig. 6.
figure 6

Incremental insertion of state \(\left\langle \bot ,1,\mathbf 5 , \bot ,2,6 \right\rangle \). Only a small part of the tree needs to be updated (dashed boxes), because the predecessor state \(\left\langle \bot ,1,4, \bot ,2,6 \right\rangle \) is used to look up the unchanged parts (the crosses in the tree of the successor state).

4 A Novel Compact Tree

The current section shows how a normal tree database can be extended to reach the information theoretical optimum using a compact hash table.

Hash Tables and Compact Hash Tables. A hash table stores a subset of a large universe U of keys and provides the means to look up individual keys in constant time. It uses a hash function to calculate an address h from the key. The entire key is then stored at this hash or home location in a table T (an array of buckets): \(T[h] \leftarrow key \). Because typically \(|U|\gg |T|\), multiple keys may have the same hash location. These so-called collisions are handled by calculating alternate hash locations and inserting the key there if empty. This process is known as probing. For this reason, the entire key needs to be stored in the bucket: to distinguish which key is currently mapped to it.

Fig. 7.
figure 7

Cleary table T storing keys K from universe U with three admin. bits/bucket. (We omit that keys should be hashed with an invertible function for a good distribution.)

Observe, however, that in the case that \(|U|\le |T|\), the table can be replaced with a perfect hash function and a bit array. Compact hashing [8] generalizes this idea for the case \(|U|>|T|\) (where the table size is relatively close to the size of the universe). The compact table first splits a key k into a quotient q(k) and a remainder \( rem (k)\), using a reversible operation, e.g., \(q(k) = k \% \left| {T}\right| \) and \( rem (k)= k / \left| {T}\right| \). When the key has \(x = \lceil \log _2(|U|)\rceil \) bits, the quotient has \(m = \lceil \log _2(|T|)\rceil \) bits and the remainder \(r = x - m\) bits. The quotient is used for addressing in T (like in a normal hash table), and only the remainder is stored in the bucket. The complete key can then be reconstructed from the value in T and the home location of the key. If, due to collisions, the key is not stored at its home location, additional information is needed. Cleary [8] solved this problem with little overhead by imposing an order on the keys in T and introducing three administration bits per bucket. For details, see [8, 12, 28]. Because of the administration bits, the bucket size b of compact hash tables is \(b = r + 3\) bits. The ratio \(b/x\) can come arbitrarily close to zero, yielding good compression. For instance, a compact table only needs 5 bits per bucket to store \(2^{30}\) 32-bit keys (Fig. 7).
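A toy sketch of the quotient/remainder split follows; collisions are simply rejected here, as resolving them order-preservingly is exactly what Cleary's three administration bits enable:

```python
T_SIZE = 2 ** 30                # |T| buckets, so addresses take m = 30 bits
table = {}                      # sparse stand-in for the bucket array

def insert(key):                # key: a 32-bit integer, so x = 32
    home = key % T_SIZE         # quotient q(key): the home address
    rem = key // T_SIZE         # remainder rem(key): r = x - m = 2 bits
    assert home not in table or table[home] == rem, "collision: needs probing"
    table[home] = rem           # only r (+3 admin) bits stored per key

def reconstruct(home):          # full key follows from address + remainder
    return table[home] * T_SIZE + home

insert(0xCAFEBABE)
assert reconstruct(0xCAFEBABE % T_SIZE) == 0xCAFEBABE
```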

Compact Tree Database. To create a compact tree database, we replace the hash tables in the tree nodes with compact hash tables.

Let the tree references again be w bits; tuples in a tree node table are thus 2w bits. The tree node table’s universe therefore contains \(2^{2w}\) tuples. However, tree node tables cannot contain more than \(2^w\) entries, as otherwise the entries could not be referenced (with w-bit indices) by parent tree node tables. As the tree’s root table has no parent, it can contain up to \(2^{2w}\) entries. Let o be the overcommit of the tree root table \(T_{ root } \), i.e., \(\left| {T_{ root }}\right| = 2^{w+o}\) for \(0 \le o \le w\).

Overcommitting the root table in the tree can yield better reductions as we will see. However, it also limits the subsets of the state universe that the tree can store. Close-to-worst-case subsets might be rejected as the left or right child (\(2^w\) tuples max) of the root can grow full before the root does (\(2^{w+o}\) tuples max).

We will only focus on replacing the root table with a compact hash table, as it dominates the tree’s memory usage in the optimal case according to Corollary 1. The following parameters follow for using a compact hash table for \(T_{ root } \):

The key size is \(x = 2w\) bits, the address size \(m = \log _2(\left| {T_{ root }}\right| ) = w + o\) bits, the remainder \(r = x - m = w - o\) bits, and the bucket size \(b = r + 3 = w - o + 3\) bits.

Let the Compact Tree Database be a Tree Database with the root table replaced by a compact hash table with the dimensions provided above, ergo: \(n = \left| {V}\right| = \left| {T_{ root }}\right| = 2^{w+o} = 2^m\). Theorem 4 gives its best-case memory usage.

Theorem 4

(Compact Tree Best-Case). In the best case and with \(k\ge 8\), the compact tree database requires less than \( CT _{ opt } \,\triangleq \,(w- o +3)n + 4w\sqrt{n} + 4w\root 4 \of {n} (k-4) \) bits to store n vectors.

Proof

According to Theorem 3, there are at most \(n + 2\sqrt{n} + 2 \root 4 \of {n} (k-4)\) tuples in a tree with optimal storage. The root table contains n of these tuples; its descendants store at most \(2\sqrt{n} + 2 \root 4 \of {n} (k-4)\) tuples of 2w bits, i.e., \(4w\sqrt{n} + 4w\root 4 \of {n} (k-4)\) bits. The n tuples in the root table can now be stored using \(w - o + 3\) bits in the compact hash table buckets instead of 2w bits, hence the root table uses \(n(w - o +3)\) bits.    \(\square \)
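Plugging in concrete numbers (w = 30 and o = 2, matching the implementation discussed in Sect. 5; k = 64 is an arbitrary example) shows how the root table dominates:

```python
# Best-case bits per state from Theorem 4, for assumed parameter values.
w, o, k = 30, 2, 64
n = 2 ** (w + o)                # best case: the root table is filled
ct_opt = (w - o + 3) * n + 4 * w * n ** 0.5 + 4 * w * n ** 0.25 * (k - 4)
print(ct_opt / n)               # ~31.002 bits per state vector
```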

Finally, Theorem 5 relates the compact tree compression results to our information theoretical model from Sect. 2, under the reasonable assumption that \(8 \le k \le \root 4 \of {n} + 4\). As a consequence, when the overcommit (\(o - 7\) bits) fills the gap of \(w - u\) bits between the sizes of references in the tree (w bits) and the sizes of variables (u bits), the optimal compression of the compact tree is approached. If \(o-7 > w -u\), the compact tree can even surpass the compression predicted by our information theoretical model. This is not surprising, as the tree with \(k=2\) reduces to a compact hash table, for which a different information theoretical model holds [12, 27].

Theorem 5

Let \( CT _{ opt } \) be the best-case compact-tree compressed vector sizes. We have \( CT _{ opt } \le \frac{w - o +7}{u} H_{ state } \) provided \(8 \le k \le \root 4 \of {n} + 4\).

Proof

According to Theorem 1, the entropy is bounded from below: \(nH_{ state } \ge n(\log _2(y-1) + \log _2(k-1) + 1)\) bits. According to Theorem 4, the compact tree database uses at most \( CT _{ opt } \,\triangleq \,(w-o+3)n + 4w\sqrt{n} + 4w\root 4 \of {n} (k-4)\) bits in the best case and with \(k\ge 8\).

We show that \( CT _{ opt } \le c H_{ state } \) using lower bound Theorem 1 and derive c.

After simplification using \((u-1) \le \log _2(y-1)\) and \(0 \le \log _2(k-1)\), it remains to show: \((w-o+3)n + 4w\sqrt{n} + 4w\root 4 \of {n} (k-4) \le cun\). As the premise ensures that \( n \ge (k-4)^4\), i.e., \(\root 4 \of {n} (k-4) \le \sqrt{n}\), this can be further simplified to \((w-o+3)n + 8w\sqrt{n} \le cun\) and then to \(8w/\sqrt{n} \le cu - w + o - 3 \).

In an intermediate step, we show that \(8w/\sqrt{n} \le 4\) under the premises \(n \ge (k-4)^4\) and \(k \ge 8\). We have \(w \le \log _2(n)\) in order to accommodate the worst-case compression (see Theorem 2 in Sect. 3). Therefore, it suffices to prove \(8\log _2(n)/\sqrt{n} \le 4\). Implied by the two earlier assumptions from the premise, we have \(n \ge 256\), for which the inequality indeed holds, as \(8\log _2(n)/\sqrt{n}\) equals 4 at \(n = 256\) and decreases for larger n.

With \(8w/\sqrt{n} \le 4\), the above gives \( 4 \le cu - w + o - 3 \) and hence \( \frac{w - o + 7}{u} \le c \).

Therefore, we obtain \( CT _{ opt } \le \frac{w -o + 7}{u} H_{ state } \), provided that \(n \ge (k-4)^4\).    \(\square \)

5 Experiments

We implemented the Compact Tree in the model checker LTSmin  [22]. This implementation is based on two concurrent data structures: a tree database [21] and a compact hash table [28], based on Cleary’s approach [8]. The parameters of the Compact Tree Table in this implementation are (for details see [23]):

The references are \(w = 30\) bits (tree tuples of \(2w = 60\) bits), with a root table overcommit of \(o = 2\), supporting up to \(n = 2^{w+o} = 2^{32}\) state vectors.

LTSmin is a language-independent model checker based on a partitioned next-state interface [18]. We exploit this property to investigate the compression ratios of the Compact Tree for four different input types: DVE models written for the DiVinE model checker [1], Promela models written for the spin model checker [15], process algebra models written for the mCRL2 model checker [9], and Petri net models from the MCC contest [20]. Table 2 provides an overview of the models in each of these input formats and a justification for the selection criterion used. In total, over 400 models were used in these benchmarks.

Table 2. Input languages and model selection criteria

All experiments ran on a machine with 128 GB memory and 48 cores: four AMD Opteron 6168 processors with 12 cores each.

Compression Ratio. Compressed state sizes of our implementation can roughly approach \(w - o + 3 = 31\) bits, or about 4 Bytes, by Corollary 1 and Theorem 4. We first investigate whether this compression is actually reached in practice. Figure 8 plots the compressed sizes of the state vectors against the length of the uncompressed vector. We see that for some models the optimal compression is indeed reached. The average compression is 7.88 Bytes per state. The fact that there is little correlation with the vector length confirms that the compressed size indeed tends to be constant: vectors of up to 1000 Bytes are compressed to just over 4 Bytes. Figure 9 furthermore reveals that good compression correlates positively with the state space size, which can be expected, as a larger tree can exhibit more sharing.

Only for Petri nets and DVE models do we find instances that exhibit worse compression (between 10 and 15 Bytes per state), even when the state space is large. However, we observed that in these cases the vector length k is also large; e.g., the two Petri net instances with a compressed size of around 12 Bytes have \(k > 400\). Based on some earlier informal experiments, we believe that with some variable reordering, this compression might well be improved to reach the optimum. Thus far, however, we were unable to derive a reordering heuristic that consistently improves the compression.

Fig. 8.
figure 8

Compressed sizes in Compact Tree for all benchmarks against the length k of the uncompressed state vector.

Fig. 9.
figure 9

Compressed sizes in Compact Tree for all benchmarks against the size n of state space.

Fig. 10.
figure 10

Sequential runtimes of LTSmin with Compact Tree and spin with optimal settings (as reported in [2]) on (translated) DVE models and Promela models.

Fig. 11.
figure 11

Runtimes, sequentially and with 48 threads, of LTSmin with compact tree on all models: DVE, Promela, process algebra and Petri nets.

Runtime Performance and Parallel Scalability. In the introduction, we mentioned the requirement that a visited-set data structure ideally features constant lookup times, like a normal hash table. To this end, we compare runtimes on the DVE models with those of the spin model checker, which is known for its fast state generator.Footnote 5 Figure 10 confirms that the runtimes of LTSmin with the Compact Tree are sequentially on par with those of spin, and often even better. We attribute this performance mainly to the incremental vector insertion discussed in Sect. 3 (see Fig. 6). Based on the MCC 2016 [20] results, we believe that LTSmin ’s performance is on par with other Petri net tools as well.

The measured performance first of all confirms that the Compact Tree satisfies its requirements. Secondly, it provides a good basis for the analysis of parallel scalability (if we had chosen to implement the Compact Tree in a slow scripting language, the slowdown would yield “free” speedup). Figure 11 compares the sequential runtimes to the runtimes with 48 threads. The measured speedup often surpasses 40x, especially when the runtimes are longer and there is more work to parallelize. Speedups are good regardless of input language.

Fig. 12.
figure 12

Compressed sizes per state of LTSmin with Compact Tree and spin with collapse compression [14] on DVE models.

Fig. 13.
figure 13

Absolute memory use of LTSmin with Compact Tree and spin with collapse compression [14] on DVE models.

Comparison with Other Data Structures. spin ’s collapse compression uses the structure in the model to fold vectors, similarly to tree compression. The lower bounds reported in the current paper cannot be reached with collapse, because it uses an n-ary tree of only two levels. Figures 12 and 13 show additional experiments revealing an order of magnitude difference in practice.

Jensen et al. [16] propose a Trie for storing state vectors. Tries compress vectors by ensuring sharing between prefixes. BDDs [6] also store state vectors efficiently; however, Jensen et al. [17] consider them too slow for state space exploration. We compared both Tries and BDDs with the Compact Tree and found (1) that the Trie’s compression is weaker than the Compact Tree’s, though the Trie is sometimes faster (Figs. 14 and 15), and (2) that BDDs are not always prohibitively expensive with LTSmin (because it learns the transition relation [18]), but are nonetheless hard to compare to tree compression (Figs. 16 and 17).

Fig. 14.
figure 14

Memory use per state of LTSmin with Compact Tree and Trie from [16] on Petri net models.

Fig. 15.
figure 15

Runtime (sequential) of LTSmin with Compact Tree and Trie from [16] on Petri net models.

Fig. 16.
figure 16

Memory use per state of LTSmin with Compact Tree and BDD [3] on mCRL2, Promela and Petri net models.

Fig. 17.
figure 17

Runtime (sequential) of LTSmin with Compact Tree and BDD [3] on mCRL2, Promela and Petri net models.

6 Discussion and Conclusion

The tree compression method discussed here is a more general variant of recursive indexing [14], which only breaks down processes into separate tables. Hash compaction [25] compresses states to an integer-sized hash, but this lossy technique becomes redundant with the compact tree database. Bloom filters [13] still present a worthwhile lossy alternative using only a few bits per state, but of course abandon soundness when applied in model checking.

Evangelista et al. [11] report on a hash table storing incremental differences of successor states (similar to the incremental data structure discussed in Sect. 3). Their partial vectors take \(2u + \log (E)\) bits, where E is the set of (deterministic) actions in the model. Defying our requirement of poly-logarithmic lookup times, Evangelista et al. reconstruct full states by reconstructing all of their ancestors.

A Binary Decision Diagram (BDD) [6] can store an astronomically sized state set using only constant memory (the true leaf). Our information theoretical model suggests, however, that compressed sizes are merely linear in the number of states (and constant in the length of the state vector). We can explain this by the fact that we assume only locality of the inputs. Compression in BDDs, on the other hand, depends on the entire state space. Therefore, we would have to assume structural, global properties to describe the non-linear compression of BDDs (e.g. the input’s decomposition into processes, symmetries, etc.).

Much like in BDDs [5], the variable ordering influences the number of nodes in a tree table and thus the compression, as mentioned in Sect. 5. Consider the vector set \(\left\{ {\varvec{i}}{\varvec{,}}{\varvec{i}}{\varvec{,}}{\varvec{j}}{\varvec{,}}{\varvec{j}} \mid i,j \in [1\dots N] \right\} \): only the root node in a compact tree will contain \(N^2\) entries, while the leaf nodes contain N entries each. On the other hand, we have no such luck for the set \(\left\{ {\varvec{i}}{\varvec{,}}{\varvec{j}}{\varvec{,}}{\varvec{i}}{\varvec{,}}{\varvec{j}} \mid i,j \in [1\dots N] \right\} \). Preliminary research [29] revealed that the tree’s optimum can be reached in most cases for DVE models, but we were unable to find a heuristic that consistently realizes this.