1 Introduction

1.1 Static Trees

A binary search tree is one of the most fundamental data structures in computer science. In response to a search operation, some binary trees perform changes in the data structure, while others do not. For example, the splay tree [18] data structure performs a sequence of rotations that moves the searched item to the root. Other binary search tree data structures do not change at all during a search, for example, red-black trees [13] and AVL trees [1]. We will call BSTs that do not perform changes in the structure during searches static and call trees that perform changes BSTs with rotations. In this work we do not consider insertions and deletions, only searches in the comparison model, and thus can assume without loss of generality that all structures under consideration store the integers from 1 to n and that all searches are to these items.

We consider two variants of static BSTs: root finger and lazy finger. In the classic root finger method, every search, the first and all subsequent ones, proceeds from the root to the item being searched. A lazy finger BST instead starts each search at the destination of the previous search and moves from there to the item being searched. In general, this movement goes up from the previous item to the least common ancestor (LCA) of the previous and current items, and then down from the LCA to the current item. To facilitate such a search, each node of the tree is augmented with the minimal and maximal elements in its subtree. (In fact, this augmentation can be reduced: if one is willing to look at the data in the parent of the LCA, only the minimum or the maximum needs to be stored, depending on whether a node is the left or right child of its parent.)
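A minimal sketch of this search procedure (Python; the names `Node`, `build`, and `lazy_search` are illustrative, the balanced tree shape is just one possible static tree, and keys are assumed to be integers):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.parent = None
        # Augmentation: the range [lo, hi] of keys stored in this subtree.
        self.lo = left.lo if left else key
        self.hi = right.hi if right else key
        for child in (left, right):
            if child:
                child.parent = self

def build(keys):
    """Build one fixed (here: balanced) static tree over sorted keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def lazy_search(finger, target):
    """Move the finger up to the LCA, then down to target.
    Returns (node found, number of edges traversed)."""
    cost = 0
    # Ascend until the current subtree's key range covers the target;
    # the last node reached this way is the LCA of finger and target.
    while not (finger.lo <= target <= finger.hi):
        finger = finger.parent
        cost += 1
    # Standard BST descent from the LCA.
    while finger.key != target:
        finger = finger.left if target < finger.key else finger.right
        cost += 1
    return finger, cost
```

For example, on the balanced tree over keys 1..7, searching 1, then 3, then 7 moves the finger 2, 2, and 4 edges respectively.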

1.2 Notation and Definitions

A static tree T is a fixed binary search tree containing n elements. No rotations are allowed. The data structure must process a sequence of searches by moving a single pointer in the tree. Let \(r(T,i,j)\) be the time to move the pointer in the tree T from node i to node j. If \(d_T(i)\) represents the depth of node i, with the root defined as having depth zero, then

$$\begin{aligned} r(T,i,j)&=d_T(i)-d_T(\text {LCA}_T(i,j)) + d_T(j)- d_T(\text {LCA}_T(i,j))\\&=d_T(i)+d_T(j)-2d_T(\text {LCA}_T(i,j)). \end{aligned}$$

Observe that if i is the root, \(r(T,i,j)\) is simply \(d_T(j)\), as the other two terms are the depth of the root, which is zero.

The runtime to execute a sequence \(X=x_1,x_2, \ldots x_m\) of searches on a tree T using the root finger method is

$$\begin{aligned} R_{\text {root}}(T,X) = \sum _{i=1}^m r(T,root(T),x_i)=\sum _{i=1}^m d_T(x_i) \end{aligned}$$

and the runtime to execute the same sequence on a tree T using the lazy finger method is

$$\begin{aligned} R_{\text {lazy}}(T,X)&= \sum _{i=1}^m r(T,x_{i-1},x_i)\\&=\left( 2\sum _{i=1}^m (d_T(x_i)-d_T(\text {LCA}_T(x_i,x_{i-1})))\right) -d_T(x_m) \end{aligned}$$

where \(x_0\) is defined to be the root of T, which is where the first search starts.
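To check these definitions concretely, the following sketch (Python; the parent map and search sequence are made up for illustration) computes \(R_{\text{root}}\) and \(R_{\text{lazy}}\) by direct pointer-move counting and verifies the closed form for \(R_{\text{lazy}}\):

```python
# A hypothetical balanced tree over keys 1..7, given as child -> parent links.
parent = {4: None, 2: 4, 6: 4, 1: 2, 3: 2, 5: 6, 7: 6}

def depth(v):
    d = 0
    while parent[v] is not None:
        v, d = parent[v], d + 1
    return d

def lca(u, v):
    anc = set()
    while u is not None:
        anc.add(u)
        u = parent[u]
    while v not in anc:
        v = parent[v]
    return v

def r(i, j):
    """Pointer-move cost r(T, i, j) = d(i) + d(j) - 2 d(LCA(i, j))."""
    return depth(i) + depth(j) - 2 * depth(lca(i, j))

X = [3, 1, 5, 7, 6, 2]
root = 4   # x_0 is the root
R_root = sum(depth(x) for x in X)
R_lazy = sum(r(a, b) for a, b in zip([root] + X, X))
# Closed form: 2 * sum(d(x_i) - d(LCA(x_i, x_{i-1}))) - d(x_m), since d(x_0) = 0.
closed = 2 * sum(depth(b) - depth(lca(a, b))
                 for a, b in zip([root] + X, X)) - depth(X[-1])
assert R_lazy == closed
```

On this example \(R_{\text{root}} = 10\) while \(R_{\text{lazy}} = 13\); the two methods trade off differently depending on the sequence.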

1.3 History of Optimal Static Trees with Root Finger

For the root finger method, once the tree T is fixed, the cost of any single search in T depends only on the search and the tree, not on the search history. Thus, the optimal search tree for the root finger method is a function only of the frequency of searches to each item. Let \(f_X(a)\) denote the number of searches in X to a. Given \(f_X\), computing the optimal static BST with root finger has a long history. In 1971, Knuth gave an \(O(n^2)\) dynamic programming solution that finds the optimum tree [15]. More interesting is the discovery of a connection between the runtime of the optimal tree and the entropy of the frequencies:

$$\begin{aligned} H(f_X) = \sum _{a=1}^{n} \frac{f_X(a)}{m}\lg \frac{m}{f_X(a)}. \end{aligned}$$
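For instance, the entropy of a frequency table can be computed directly from this definition (a small Python sketch; the frequencies are made up):

```python
from math import log2

def entropy(freq, m):
    """H(f_X) = sum over items a of (f(a)/m) * lg(m/f(a)),
    restricted to items actually searched."""
    return sum(f / m * log2(m / f) for f in freq.values() if f > 0)

# Hypothetical frequencies over a sequence of m = 8 searches.
f = {1: 4, 2: 2, 3: 1, 4: 1}
print(entropy(f, 8))   # 1.75 bits per search
```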

Mehlhorn [16] showed that a simple greedy heuristic proposed by Knuth [15], and shown to have a linear-time implementation by Fredman [11], produces a static tree where an average search takes time \(2+\frac{1}{1-\lg (\sqrt{5}-1)}H(f_X)\). Furthermore, Mehlhorn demonstrated a lower bound of \(\frac{1}{\lg 3} H(f_X)\) for an average search in an optimal static tree, and showed this bound is tight for infinitely many distributions. (The \(\lg 3\) comes from the fact that the comparisons at each node of a BST are three-way and thus yield at most \(\lg 3\) bits of information.) Thus, by 1975, it was established that the runtime of an average search in an optimal static tree with root finger is \(\varTheta (H(f_X))\), and that such a tree can easily be computed in linear time.

1.4 Our Results

We wish to study the natural problem of what we call search with a lazy finger in a static tree, i.e. have each search start where the last one ended. We seek to characterize the optimal tree for this search strategy, and describe how to build it.

The lazy finger method is clearly asymptotically no worse than the root finger method: moving up to the LCA and back down costs no more than moving to the root and back down, which is exactly double the cost of the root finger method. But, in general, is the lazy finger method better? For the lazy finger method, the cost of a single search in a static tree depends only on the current search and the previous one. Thus the optimal search tree for the lazy finger method depends only on the frequency of each search transition; let \(f_X(a,b)\) be the number of searches in X to b where the previous search was to a. This sits between the power of root finger, which depends only on the current search, and trees with rotations, where the runtime can depend on a superconstant number of previous searches. Given these pairwise frequencies (from which the frequencies \(f_X(a)\) can easily be computed), is there a nice closed form for the runtime of the optimal BST with lazy finger? One candidate runtime to consider is the conditional entropy, which takes into account the entropy of one event given another:

$$\begin{aligned} H_c(f_X) = \sum _{a=1}^{n} \sum _{b=1}^{n} \frac{f_X({a,b})}{m}\lg \frac{f_X(a)}{f_X({a,b})} \end{aligned}$$
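An empirical version of this quantity can be computed from a sequence (a Python sketch; as an illustrative convention it counts \(f_X(a)\) over all but the last search, so every counted occurrence of a has a successor):

```python
from collections import Counter
from math import log2

def conditional_entropy(X):
    """Empirical H_c(f_X): pair frequencies f(a, b) against
    predecessor frequencies f(a), normalized by m = len(X)."""
    m = len(X)
    f = Counter(X[:-1])               # f(a): occurrences of a with a successor
    pairs = Counter(zip(X, X[1:]))    # f(a, b): b searched right after a
    return sum(c / m * log2(f[a] / c) for (a, b), c in pairs.items())
```

A fully predictable sequence such as 1, 2, 1, 2, ... has conditional entropy zero, matching the intuition that each search is determined by the previous one.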

This is of interest as information theory gives this as an expected lower bound if the search sequence is derived from a Markov chain where the n states represent searching each item; this is because the conditional entropy gives the expected information (and thus a lower bound on the number of binary decisions) in a search given the previous search.

While a runtime related to the conditional entropy is the best achievable by any algorithm parameterized solely on the pairwise frequencies, we will show in Lemma 2 that no BST, static or dynamic, can achieve the conditional entropy to within any \(o(\log n)\) factor. Thus, for the root finger, the lower bound given by information theory is achievable, yet for the lazy finger it is not related to the runtime of the optimal tree. In Sect. 7 we present a simple static non-tree structure whose runtime is related to the conditional entropy.

This still leaves us with the question: is there a simple closed form for the runtime of the optimal BST with lazy finger? We answer this in the affirmative by showing an equivalence between the runtime of BSTs with lazy finger and something known as the weighted dynamic finger runtime. In the weighted dynamic finger runtime, if item i is given weight \(w_i\), then the time to execute search \(x_i\) is

$$\begin{aligned} \lg \frac{ \sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w_{k}}{\min (w_{x_{i-1}},w_{x_i})}. \end{aligned}$$

Our main theorem is that the runtime of the best static tree with lazy finger, LF(X), is given by the weighted dynamic finger runtime bound with the best choice of weights:

$$\begin{aligned}LF(X)= \min _T R_\text {lazy}(T,X) = \varTheta \left( \min _W \left\{ \sum _{i=1}^m \lg \frac{\sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w_{k}}{\min (w_{x_{i-1}},w_{x_i})} \right\} \right) \end{aligned}$$
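The weighted dynamic finger cost of a sequence under given weights can be computed directly from this expression (a Python sketch; the names are illustrative, and \(x_0\) is passed explicitly as `start`):

```python
from math import log2

def wdf_cost(X, w, start):
    """Weighted dynamic finger cost of sequence X under weights w
    (dict: key -> positive weight), with the finger starting at `start`."""
    total, prev = 0.0, start
    for x in X:
        lo, hi = min(prev, x), max(prev, x)
        interval = sum(w[k] for k in range(lo, hi + 1))
        total += log2(interval / min(w[prev], w[x]))
        prev = x
    return total
```

With unit weights this degenerates to \(\sum_i \lg(|x_i - x_{i-1}| + 1)\), the classic (unweighted) dynamic finger bound; the theorem says the optimal lazy finger tree matches the minimum of this cost over all weight assignments.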

To prove this, we first state the result of Seidel and Aragon [17] in Sect. 2 of how to construct a tree with the weighted dynamic finger runtime given a set of weights. Then, in Sect. 3, we show how, given any static tree T, there exist weights such that the runtime of T on a sequence using a lazy finger can be lower bounded using the weighted dynamic finger runtime with these weights. These results are combined in Sect. 4 to give the main theorem.

While a nice closed-form formula for the runtime of splay trees is not known, there are several different bounds on their runtime: working set, static finger, dynamic finger, and static optimality [7, 8, 18]. One implication of our result is that the runtime of the optimal lazy finger tree is asymptotically as good as that of all of the aforementioned bounds with the exception of the working set bound (see Theorem 3 for why the working set bound does not hold on a lazy finger static structure).

While these results have served to characterize the best runtime for the optimal BST, a concrete method is needed to compute the best tree given the pairwise frequencies. We present a dynamic programming solution in Sect. 6; this solution takes time \(O(n^3)\) to compute the optimal tree for lazy finger, given a table of size \(n^2\) with the frequency of each pair of searches occurring adjacently. This method could be extended using the ideas of Iacono and Mulzer [14] into one which periodically rebuilds the static structure using the observed frequencies so far; the result would be an online structure that for sufficiently long search sequences achieves a runtime that is within a constant factor of the optimal tree without needing to be initialized with the pairwise frequencies.

1.5 Relation to Finger Search Structures

The results here have a relation to the various finger search structures that have been proposed. We note, first of all, that the trees we are considering are not level linked; the only pointers are to the parent and children. Secondly, while the basic finger search runtime of \(O(\sum _{i=2}^m \log |x_i-x_{i-1}|)\) (recall that we are assuming the \(x_i\) are integers from 1 to n) has long been known to be achievable in a static tree, it is easy to show that there are search sequences X for which the optimal tree performs far better. For example, the search sequence \(x_i= i \sqrt{n} \bmod n\), where n is a perfect square, can be executed in time O(m) on the best static tree with lazy finger, which is much better than the \(\varTheta (m \log n)\) of dynamic finger.

But this limitation of the \(O(\sum _{i=2}^m \log |x_i-x_{i-1}|)\) runtime has been long known, which is why the weighted version of finger search was proposed. Our main contribution is to realize that the weighted dynamic finger runtime bound, which was not proposed in the context of lazy finger, is the asymptotically tight characterization of BSTs with lazy finger when used with the best choice of weights.

1.6 Why Static Trees?

Static trees are less powerful than dynamic ones in terms of the classes of search sequence distributions that can be executed quickly, so why are we studying them? One should use the simplest structure with the least overhead that gets the job done. By completely categorizing the runtime of the optimal tree with lazy finger, one can know if such a structure is appropriate for a particular application or whether one should instead use the more powerful dynamic trees, or simpler root-finger trees.

Rotation-based trees have horrible cache performance. In contrast, there are methods to map the nodes of a static tree to memory so as to have optimal performance in the disk-access and cache-oblivious models of the memory hierarchy [6, 9, 12, 19]. One leading cache-oblivious predecessor query data structure that supports insertion and deletion works by moving data around within a fixed static tree in response to insertions and deletions, and only periodically rebuilding the static structure [4]; in such a structure, an efficient static tree is the key to obtaining good performance even with insertions and deletions.

Also, concurrency becomes a real issue in dynamic trees, which requires another layer of complexity to resolve (see, for example Bronson et al. [5]), while static trees trivially support concurrent operations.

2 Weights Give a Tree

We use the following theorem:

Theorem 1

(Seidel and Aragon [17]) Given a set of positive weights \(W=w_1, w_2, \ldots w_n\), there is a randomized method to choose a tree \(T_W\) such that the expected runtime (with respect to the choice of tree) of a search with lazy finger is \( r(T_W,i,j) = O\left( \lg \frac{ \sum _{k=\min (i,{j})}^{\max ({i},{j})} w_{k}}{\min (w_i,w_j)} \right) .\)

The method to randomly create \(T_W\) is a straightforward random tree construction using the weights: recursively pick the root using the normalized weights of all nodes as probabilities. Thus, by the probabilistic method [2], there is a deterministic tree, call it \(T_{W}(X)\), whose runtime over the sequence X is at most the expected runtime bound of Seidel and Aragon for X with the weights W.
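This random construction can be sketched as follows (Python; an illustrative sketch returning nested `(key, left, right)` tuples, not Seidel and Aragon's actual implementation):

```python
import random

def random_tree(keys, w, rng=random):
    """Recursively choose a root among `keys` with probability proportional
    to its weight in w, then recurse on the left and right pieces.
    Returns a nested (key, left, right) tuple, or None for an empty tree."""
    if not keys:
        return None
    total = sum(w[k] for k in keys)
    r = rng.uniform(0, total)
    acc = 0.0
    for idx, k in enumerate(keys):
        acc += w[k]
        if r <= acc:
            break
    return (keys[idx],
            random_tree(keys[:idx], w, rng),
            random_tree(keys[idx + 1:], w, rng))

def inorder(t):
    """In-order traversal; a valid BST yields its keys in sorted order."""
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])
```

Heavier keys are more likely to be chosen near the root, which is what places them at small depth and yields the weighted runtime bound.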

Corollary 1

For any set of positive weights \(W=w_1, w_2, \ldots w_n\) there is a tree \(T_{W}(X)\) such that

$$\begin{aligned} \sum _{i=1}^m r(T_{W}(X),x_{i-1},x_i) = O \left( \sum _{i=1}^m \lg \frac{\textstyle \sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w_{k}}{\min (w_{x_{i-1}},w_{x_i})} \right) \end{aligned}$$

Proof

This follows directly from Seidel and Aragon, where \(T_{W}(X)\) is a tree that achieves the expected runtime of their randomized method for the best choice of weights. \(\square \)

3 Trees Can Be Represented by Weights

We show that a tree can be represented by weights:

Lemma 1

For each tree T there is a set of weights \(W^T=w_1^T,w_2^T,\ldots w_n^T\) such that for all ij \(r(T,i,j) = \varTheta \left( \lg \frac{ \sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}}{\min (w_i^T,w_j^T)} \right) .\)

Proof

These weights are simple: give a node at depth d in T a weight of \({4^{-d}}\). Consider a search that starts at node i and goes to node j. Such a path goes up from i to \(\text {LCA}_T(i,j)\) and down to j. A lower bound on \(\sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}\) is the weight of \(\text {LCA}_T(i,j)\) which is included in this sum and is \({4^{-d_T(\text {LCA}_T(i,j))}}\). Thus we can bound \(\lg \frac{ \sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}}{\min (w_i^T,w_j^T)}\) as follows: \(\lg \frac{ \sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}}{\min (w_i^T,w_j^T)} \ge \lg \frac{{4^{-d_T(\text {LCA}_T(i,j))}}}{\min \left( {4^{-d_T(i)}},{4^{-d_T(j)}}\right) } =2\max (d_T(i),d_T(j))-2d_T(\text {LCA}_T(i,j)) \ge d_T(i)+d_T(j)-2d_T(\text {LCA}_T(i,j)) =r(T,i,j)\).

Similarly, an upper bound on \(\sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}\) is twice the weight of \(\text {LCA}_T(i,j)\): \({2 \cdot 4^{-d_T(\text {LCA}_T(i,j))}}\). This is because each of the two paths down from the LCA have weights that when summed are geometric and sum to less than half that of the LCA: \(\lg \frac{ \sum _{k=\min (i,{j})}^{\max ({i},{j})} w^T_{k}}{\min (w_i^T,w_j^T)} \le \lg \frac{{2 \cdot 4^{-d_T(\text {LCA}_T(i,j))}}}{\min \left( {4^{-d_T(i)}},{4^{-d_T(j)}}\right) } =1+ \lg \frac{{4^{-d_T(\text {LCA}_T(i,j))}}}{\min \left( {4^{-d_T(i)}},{4^{-d_T(j)}}\right) }\) which is \(1+2r(T,i,j)\) using the same math as in the previous paragraph. \(\square \)
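The two bounds in this proof can be checked numerically; the sketch below (Python, on a small made-up tree given by parent links) assigns each node at depth d the weight \(4^{-d}\) and verifies \(r(T,i,j) \le \lg(\cdot) \le 1+2\,r(T,i,j)\) for all pairs:

```python
from math import log2

# A hypothetical (unbalanced) BST over keys 1..6: root 4, then 2 with
# children 1 and 3, and 5 with right child 6. Given as child -> parent links.
parent = {4: None, 2: 4, 1: 2, 3: 2, 5: 4, 6: 5}

def depth(v):
    d = 0
    while parent[v] is not None:
        v, d = parent[v], d + 1
    return d

def lca(u, v):
    anc = set()
    while u is not None:
        anc.add(u)
        u = parent[u]
    while v not in anc:
        v = parent[v]
    return v

# Lemma 1's weights: a node at depth d gets weight 4^(-d).
w = {v: 4.0 ** -depth(v) for v in parent}

for i in parent:
    for j in parent:
        r = depth(i) + depth(j) - 2 * depth(lca(i, j))
        bound = log2(sum(w[k] for k in range(min(i, j), max(i, j) + 1))
                     / min(w[i], w[j]))
        assert r <= bound <= 1 + 2 * r
```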

4 Proof of Main Theorem

Here we combine the results of the previous two sections to show that the runtime of the optimal tree with lazy finger is asymptotically the weighted dynamic finger bound for the best choice of weights.

Theorem 2

$$\begin{aligned} \min _T \left\{ \textstyle \sum _{i=1}^m r(T,x_{i-1},x_i) \right\} = \varTheta \left( \min _W \left\{ \textstyle \sum _{i=1}^m \lg \frac{\sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w_{k}}{\min (w_{x_{i-1}},w_{x_i})} \right\} \right) \end{aligned}$$

Proof

Start by setting \(T^{\min }\), to be the optimal tree. That is, \(T^{\min }={{\mathrm{\arg \min }}}_T \left\{ \textstyle \sum _{i=1}^m r(T,x_{i-1},x_i) \right\} \):

$$\begin{aligned} \min _T \left\{ \textstyle \sum _{i=1}^m r(T,x_{i-1},x_i) \right\} = \textstyle \sum _{i=1}^m r(T^{\min },x_{i-1},x_i) \end{aligned}$$

Using Lemma 1, there is a constant c and a set of weights \(w^{T^{\min }}\) such that:

$$\begin{aligned} \ge c \textstyle \sum _{i=1}^m \lg \frac{\sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w^{T^{\min }}_{k}}{\min (w^{T^{\min }}_{x_{i-1}},w^{T^{\min }}_{x_{i}})} \end{aligned}$$

The sum with the particular weights \(w^{T^{\min }}\) is at least the sum with the optimal weights:

$$\begin{aligned} \ge c \min _W \sum _{i=1}^m \lg \frac{\textstyle \textstyle \sum _{k=\min (x_i,{x_{i-1}})}^{\max ({x_i},{x_{i-1}})} w_{k}}{\min (w_{x_{i-1}},w_{x_i})} \end{aligned}$$

Using Theorem 1, there is a constant \(c^{\prime }\) such that:

$$\begin{aligned} \ge c^{\prime } \textstyle \sum _{i=1}^m r(T_W,x_{i-1},x_i) \end{aligned}$$

The sum for \(T_W\) is at least the sum for the optimal tree:

$$\begin{aligned} \ge c^{\prime } \min _T \left\{ \textstyle \sum _{i=1}^m r(T,x_{i-1},x_i) \right\} \end{aligned}$$

\(\square \)

5 Hierarchy and Limitations of Models

In this section we show there is a strict hierarchy of runtimes from the root finger static BST model, to the lazy finger static BST model, to the rotation-based BST model. Let OPT(X) be the fastest time in which any binary search tree with rotations can execute X.

Theorem 3

For any sequence X, \(\min _T R_{\text {root}}(T,X) = \varOmega \left( \min _T R_{\text {lazy}}(T,X) \right) =\varOmega (OPT(X))\). Furthermore, there exist classes of search sequences of any length m, \(X^{\prime }_m\) and \(X^{\prime \prime }_m\), such that \(\min _T R_{\text {root}}(T,X^{\prime }_m) = \omega \left( \min _T R_{\text {lazy}}(T,X^{\prime }_m) \right) \) and \( \min _T R_{\text {lazy}}(T,X^{\prime \prime }_m) =\omega (OPT(X^{\prime \prime }_m)).\)

Proof

We address each of the claims of this theorem separately.

Root finger can be simulated with lazy finger: \(\min _T R_{\text {root}}(T,X) = \varOmega ( \min _T R_{\text {lazy}}(T,X) )\). For lazy finger, moving up to the LCA and back down is no more work than moving to the root and back down, which is exactly double the cost of the root finger method.

Lazy finger can be simulated with a rotation-based tree: \(\min _T R_{\text {lazy}}(T,X) =\varOmega (OPT(X))\). The normal definition of a tree allowing rotations has a pointer that starts at the root on every operation and can move around the tree performing rotations, where following pointers and performing rotations have unit cost. The work of Demaine et al. [10] shows how to simulate, with constant-factor overhead, any number of lazy fingers in a tree that allows rotations using the normal rotation model with a single pointer that starts at the root. This transformation can be applied to a static tree with lazy finger to get the result.

Some sequences can be executed quickly with lazy finger but not with root finger: there is an \(X^{\prime }_m\) such that \(\min _T R_{\text {root}}(T,X^{\prime }_m) = \omega \left( \min _T R_{\text {lazy}}(T,X^{\prime }_m) \right) \). One choice of \(X^{\prime }_m\) is the sequential search sequence \(1,2,\ldots n,1,2, \ldots \) repeated until a search sequence of length m is created. So long as \(m\ge n\), this takes time O(m) to execute on any tree using lazy finger, but takes \(\varOmega (m \lg n)\) time to execute on every tree using root finger.

Some sequences can be executed quickly using a BST with rotations, but not with lazy finger: pick some small k, say \(k=\lg n\). Create the sequence \(X^{\prime \prime }_m\) in rounds as follows: in each round pick k random elements from \(1,\ldots ,n\), search each of them once, and then perform n random searches on these k elements. Continue with more rounds until a total of m searches are performed. A splay tree can perform this in time \(O(m \lg k)\). This is because splay trees have the working-set bound, which states that the amortized time to search an item is at most big-O of the logarithm of the number of distinct items searched since the last search of that item. For the sequence \(X^{\prime \prime }_m\), the n random searches in each round have been constructed to have a working-set bound of \(O(\lg k)\) amortized, while the k other searches in each round have a working-set bound of \(O(\lg n)\) amortized. Thus the total cost to execute \(X^{\prime \prime }_m\) on a splay tree is \( O\left( \frac{m}{n+k}(n \lg k + k \lg n) \right) \), which is \(O(m \lg \lg n)\) since \(k=\lg n\).

However, for a static tree with lazy finger, \(X^{\prime \prime }_m\) is basically indistinguishable from a random sequence and takes \(\varOmega (m \lg n)\) expected time. This is because the majority of the searches are random searches where the previous item was a random search, and in any static tree the expected distance between two random items is \(\varOmega (\lg n)\). \(\square \)
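A sketch of the round-based construction of \(X''_m\) used in this proof (Python; the helper name and seeding are illustrative):

```python
import random

def adversarial_sequence(n, m, seed=0):
    """Round-based hard sequence for lazy finger: per round, pick k ~ lg n
    fresh random keys, search each once, then do n random searches
    among those k. Repeat rounds until m searches are produced."""
    rng = random.Random(seed)
    k = max(1, n.bit_length() - 1)       # roughly lg n
    X = []
    while len(X) < m:
        chosen = rng.sample(range(1, n + 1), k)
        X.extend(chosen)                       # the k "expensive" searches
        X.extend(rng.choices(chosen, k=n))     # n small-working-set searches
    return X[:m]
```

On such a sequence a splay tree pays \(O(\lg k)\) amortized for most searches, while any fixed tree pays \(\varOmega(\lg n)\) in expectation for the random transitions.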

Lemma 2

The runtime of a BST in any model cannot be related to the conditional entropy of the search sequence.

Proof

Wilber [20] proved that there is a particular sequence, known as the bit reversal sequence, for which searching the items in the order of the sequence takes \(\varOmega (n\lg n)\) time in an optimal dynamic BST. This sequence is a precise permutation of all elements in the tree. However, any single permutation repeated over and over has a conditional entropy of 0, since every search is completely determined by the previous one. \(\square \)

6 Constructing the Optimal Lazy Finger BST

Recall that \(f_{a,b} =f_X({a,b})\) is the number of searches in X where the current search is to b and the previous search is to a, and \(f_X(a)\) is the number of searches to a in X. We first describe a method to compute the cost of executing X on some tree T. Suppose the nodes in [a, b] constitute the nodes of some subtree of T; call this subtree \(T_{a,b}\) and denote its root by \(r(T_{a,b})\). Let \(R_{\text {lazy}}(T,X,a,b)\) be the number of edges of \(T_{a,b}\) traversed when executing X. Thus, \(R_{\text {lazy}}(T,X,1,n)\) equals the runtime \(R_{\text {lazy}}(T,X)\). There is a recursive formula for \(R_{\text {lazy}}(T,X,a,b)\):

$$\begin{aligned} R_{\text {lazy}}(T,X,a,b) ={}&\overbrace{R_{\text {lazy}}(T,X,a,r(T_{a,b})-1)}^{(a)} +\overbrace{R_{\text {lazy}}(T,X,r(T_{a,b})+1,b)}^{(b)}\\&+2\,\overbrace{\sum _{a \le i< r(T_{a,b})} \; \sum _{r(T_{a,b})< j \le b} (f_{i,j}+f_{j,i})}^{(c)} +\overbrace{\sum _{\begin{subarray}{c} a \le i \le b\\ i \ne r(T_{a,b}) \end{subarray}} (f_{i,r(T_{a,b})}+f_{r(T_{a,b}),i})}^{(d)}\\&+\overbrace{\sum _{\begin{subarray}{c} a \le i \le b\\ i \ne r(T_{a,b}) \end{subarray}} \; \sum _{j<a \text { or } j>b} (f_{i,j}+f_{j,i})}^{(e)} \end{aligned}$$

with \(R_{\text {lazy}}(T,X,a,b)=0\) when \(a>b\). The formula is long but straightforward. First we recursively include the number of edges traversed in the left (a) and right (b) subtrees of the root \(r(T_{a,b})\). All that is left to account for is the traversals of the edges between \(r(T_{a,b})\) and its up to two children. Both of these edges are traversed when a search moves from the left to the right subtree of \(r(T_{a,b})\) or vice-versa (c). A single one of these edges is traversed if a search moves from the left or right subtree of \(r(T_{a,b})\) to \(r(T_{a,b})\) itself or vice-versa (d), or if a search moves from any node but the root of the subtree \(T_{a,b}\) out to the rest of T or vice-versa (e).

This formula can easily be adjusted into one that determines the optimal cost over all trees: since at each step the only dependence on the tree is the root of the current subtree, the minimum can be obtained by trying all possible roots. Here is the resulting recursive formulation \(R^{*}(a,b)\) for the minimum number of edges traversed in and among all subtrees containing [a, b]:

$$\begin{aligned} R^{*}(a,b)=\min _{a \le r \le b}\Bigg [ R^{*}(a,r-1)+R^{*}(r+1,b)&+2\sum _{a \le i< r}\;\sum _{r < j \le b}(f_{i,j}+f_{j,i}) +\sum _{\begin{subarray}{c} a \le i \le b\\ i \ne r \end{subarray}}(f_{i,r}+f_{r,i})\\&+\sum _{\begin{subarray}{c} a \le i \le b\\ i \ne r \end{subarray}}\;\sum _{j<a \text { or } j>b}(f_{i,j}+f_{j,i})\Bigg ] \end{aligned}$$

with \(R^{*}(a,b)=0\) when \(a>b\), so that \(R^{*}(1,n)=\min _T R_{\text {lazy}}(T,X)\).

This formula can be evaluated directly using dynamic programming in \(O(n^5)\) time, as there are \(O(n^3)\) choices for a, b, and r, and evaluating the summations in the brute-force way takes \(O(n^2)\) time. The dynamic program gives not only the cost of the best tree; the minimizing roots chosen at each step give the tree itself. The runtime can be improved to \(O(n^3)\) by observing that, when f is viewed as a 2-D array, each of the sums is a constant number of partial sum queries (queries that ask for the sum of a contiguous block of the array) on f, each of which can be answered in O(1) time after \(O(n^2)\) preprocessing. (The folklore method is to store all 2-D partial sums from the origin; a generic partial sum can then be computed with a constant number of additions and subtractions.)
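The folklore 2-D partial-sum method mentioned above can be sketched as follows (Python; the function names are illustrative):

```python
def build_prefix(f):
    """2-D prefix sums: P[i][j] = sum of f[0..i-1][0..j-1].
    Built in O(n^2) time from an n x n array f."""
    n = len(f)
    P = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            P[i + 1][j + 1] = f[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P

def block_sum(P, r1, r2, c1, c2):
    """Sum of f[r1..r2][c1..c2] (inclusive) in O(1) time via
    inclusion-exclusion on the prefix sums."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]
```

Each summation in the recurrence is a constant number of such `block_sum` queries on the frequency table f, which is what brings the dynamic program down to \(O(n^3)\).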

We summarize this result in the following theorem:

Theorem 4

Given the pairwise frequencies \(f_X\), finding the tree that minimizes the execution time of search sequence X using lazy finger takes time \(O(n^3)\).

This algorithm computes an optimal tree. Computing f from X takes O(m) time, for a total runtime of \(O(m+n^3)\). It remains open whether there is any approach to speed up the computation of the optimal tree, or of an approximation thereof. Note that although our closed-form expression for the asymptotic runtime of the best tree was stated in terms of an optimal choice of weights, the dynamic program presented here in no way attempts to compute these weights. It would be interesting if some weight-based method were discovered.

7 Multiple Trees Structure

Here we present a static data structure in the comparison model on a pointer machine that guarantees an average search time of \(O(H_c(f_X) \log _d n )\) for any fixed value \(1\le d\le n\), a runtime which we have shown to be impossible for any BST algorithm, static or dynamic. This data structure requires O(dn) space. In particular, setting \(d=n^\epsilon \) gives a search time of \(O(H_c(f_X))\) with space \(O(n^{1+\epsilon })\) for any \(\epsilon >0\). The purpose of this structure is to demonstrate that while no tree can have a runtime related to the conditional entropy, pointer based structures can.

As a first attempt, a structure could be made of n binary search trees \(T_1, T_2, \ldots T_n\), where each tree \(T_i\) is an optimal static tree given that the previous search was to i. By using tree \(T_{x_{i-1}}\) to execute search \(x_i\), the asymptotic conditional entropy can easily be obtained. However, the space of this structure is \(O(n^2)\). The space can be reduced by observing that the nodes far from the root of each tree are searched slowly anyway, and thus need not be stored in every tree.

The multiple trees structure has two main parts. The first is a complete binary search tree \(T^{\prime }\) containing the entire set of keys to be stored, \(S=[1,\ldots ,n]\); thus the height of \(T^{\prime }\) is \(O(\lg n)\). The second part is n binary search trees \(\{T_{1},T_{2},\ldots ,T_{n}\}\). A tree \(T_{i}\) contains the d elements j with the greatest frequencies \(f_X(i,j)\); these are the d elements most frequently searched immediately after i has been searched. The depth of an element j in \(T_{i}\) is \(O(\lg {f_X(i) \over f_X(i,j)})\). For each element j in the entire structure we add a pointer linking j to the root of \(T_j\). The tree \(T^{\prime }\) uses O(n) space and every tree \(T_{j}\) uses O(d) space. Thus the space used by the entire structure is O(dn).

Suppose we have just searched the element i and our search finger is located at the root of \(T_i\). We proceed to the next search, for the element j, in the following way: search for j in \(T_i\). If j is in \(T_i\) then we are done; otherwise search for j in \(T^{\prime }\). After we find j, either in \(T_i\) or in \(T^{\prime }\), we move the finger to the root of \(T_j\) by following the aforementioned pointer.
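A cost-model sketch of this structure (Python; the class and method names are illustrative, search costs are simulated by the depths \(O(\lg(f_X(i)/f_X(i,j)))\) and \(O(\lg n)\) rather than by walking pointers, and \(f_X(i)\) is approximated by the total frequency of the stored successors):

```python
from math import ceil, log2

class MultipleTrees:
    """Sketch of the multiple trees structure: for each key i, keep the d
    successors j with the largest pair frequencies f(i, j); fall back to a
    balanced tree T' over all n keys when j is not among them."""
    def __init__(self, n, d, pair_freq):
        self.n = n
        self.trees = {}
        for i in range(1, n + 1):
            succ = sorted(((c, j) for (a, j), c in pair_freq.items() if a == i),
                          reverse=True)[:d]
            f_i = sum(c for c, _ in succ) or 1
            # Modeled depth of j in T_i: about lg(f(i) / f(i, j)).
            self.trees[i] = {j: ceil(log2(f_i / c)) + 1 for c, j in succ}

    def search_cost(self, prev, j):
        small = self.trees.get(prev, {})
        if j in small:
            return small[j]                  # found in T_prev
        return ceil(log2(self.n)) + 1        # fall back to the balanced T'
```

For example, with `pair_freq = {(1, 2): 8, (1, 3): 1}`, a search for 2 after 1 is modeled as cheap, while a key absent from \(T_1\) costs about \(\lg n\).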

If j is in \(T_i\), then it is found in time \(O(\lg {f_X(i) \over f_X(i,j)})\). Otherwise, j is found in \(T^{\prime }\) in \(O(\lg n)\) time. If j is not in \(T_i\), then even an optimal structure requires \(\varOmega (\lg d)\) comparisons to find it, since \(T_i\) contains the d elements that have the greatest probability of being searched after i has been accessed. Hence every search is at most \(O(\lg n/\lg d)\) times the optimal search time of \(O(\lg {f_X(i) \over f_X(i,j)})\). Thus a search for \(x_i\) in X takes time \(O\left( \log _d n \lg {f_X(x_{i-1}) \over f_X(x_{i-1},x_i)} \right) .\) Summing over all m searches \(x_i\) in X gives the runtime to execute X:

$$\begin{aligned} \textstyle O\left( \sum \limits _{i=1}^m \log _d n \lg {f_X(x_{i-1}) \over f_X(x_{i-1},x_i)} \right)&= O\left( \textstyle \sum \limits _{a=1}^n \textstyle \sum \limits _{b=1}^n f_X(a,b) \log _d n \lg {f_X(a) \over f_X(a,b)} \right) \\&=O( m H_c(f_X) \log _d n) \end{aligned}$$

We summarize this result in the following theorem:

Theorem 5

Given the pairwise frequencies \(f_X\) and a parameter d, \(1\le d \le n\), the multiple trees structure executes X in time \(O( m H_c(f_X) \log _d n)\) and uses space O(dn).

8 Open Problems

We conjecture that no pointer-model structure has space O(n) and search cost \(O( H_c(f_X))\).

One interesting implication of this result is that any search tree which is dynamically optimal must have the weighted dynamic finger runtime for every choice of weights, including the minimizing one. However, no rotation-based binary search tree algorithm is known to have this runtime bound. The closest known result is the dynamic finger property of splay trees, which is equivalent to the weighted dynamic finger bound with unit weights [7, 8]. However, the proof in [7, 8] does not lend itself to an obvious extension to non-unit weights.