A binary search tree (BST) is a fundamental data structure that is widely used in applications. There is a large variety of algorithms for constructing BSTs. A first approach is based on sequentially adding nodes to the tree. The nodes may be added to leaves [5] or to the root of the tree [7]. A second approach reconstructs a BST from its preorder or postorder traversal (e.g., see [1, 2] and references therein). A third approach is based on halving the given sorted array and recursively building the left and the right subtrees [9]. There are also algorithms that account for the probabilities of hitting specific nodes and try to build optimal BSTs (e.g., see [4] and references therein).

There also exist algorithms that do not adhere to any of the approaches above. The recursive algorithm in [8] constructs the tree by sequentially constructing perfect BSTs. After a perfect BST is constructed, it is incorporated into the new BST as the left subtree of the root. Then the new BST is transformed into a perfect BST, and so on.

We present a new non-recursive algorithm for constructing a binary search tree. The algorithm has O(N) time and O(1) auxiliary memory complexity if the given array of N numbers is sorted. We use an array-based representation of the BST. The O(1) auxiliary memory complexity means that, apart from the resulting arrays used to store the tree, we need O(1) auxiliary memory. If a link-based representation is needed, then the algorithm will additionally need O(N) auxiliary memory. The resulting BST has minimal height. However, it may fail to be balanced in the sense of AVL trees, i.e., trees in which the heights of the two child subtrees of each node differ by at most one. The new algorithm, though non-recursive, somewhat resembles the recursive algorithm in [8]. Moreover, we can use the rotations algorithm from [8] to make the BST complete (i.e., make all the levels, except possibly the lowest one, completely filled) while retaining the minimal height, which needs \(O(\log N)\) time and O(1) auxiliary memory.

As mentioned above, we assume that the given array of N numbers is sorted. To simplify notations, we will build a BST for the numbers \(0,\dots ,N-1\). This will not limit generality even when the keys are not pairwise distinct.

Our algorithm is substantially based on the binary representation of a number. We will mark the binary numbers with a leading zero to distinguish them from decimal ones; e.g., \(2=010\).

First, let us consider the case when \(N=2^K-1\) for some integer \(K\ge 1\). In this case, the minimal-height BST is perfect. For example, for \(K=4\), the tree is shown in Fig. 1.

Fig. 1 The perfect binary search tree for the binary numbers from 0 to \(N - 1 = 2^4-2\)

Let the level of a node be the distance from the node to a nearest leaf in the perfect BST. That is, in a perfect BST, the leaves lie on level 0, the parents of the leaves lie on level 1, and so on. Note that in our case the level of a node depends on the number of the node only and does not depend on the height of the tree.

We see that the binary representations of the nodes of level k end with a zero followed by k ones, since consecutive nodes on level k differ by \(2^{k+1}\). Thus, the level L(j) of a node j can be calculated as the location of the least significant zero in the binary representation of j.

To calculate L(j), one may use the operation that finds the least significant one in a binary number, which is implemented in many modern processor architectures [3]. Popular compilers also provide built-in functions for this operation. For instance, in GCC, L(j) can be computed as __builtin_ffs(~j)-1 (see [6]). We assume that the binary representation of \(N-1\) contains at least one zero, which is the case when N is representable in the same unsigned integer type as is used for indexing the cells of the given sorted array of numbers.
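As an illustration of this definition, the following Python sketch (arbitrary-precision integers standing in for the compiler built-in) computes L(j) by scanning for the least significant zero:

```python
def level(j: int) -> int:
    """L(j): the location (0-indexed) of the least significant zero bit of j."""
    k = 0
    while (j >> k) & 1:  # skip the trailing ones of j
        k += 1
    return k

# Levels of the nodes of the perfect BST in Fig. 1 (N = 15, root 7):
print([level(j) for j in range(15)])
# [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]
```

As expected, the leaves (even numbers) lie on level 0 and the root 7 lies on level 3.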

However, Algorithms 1 and 2 below utilize not L(j) itself but \(2^{L(j)}\), and Algorithm 3 below can obviously be modified to use \(2^{L(j)}\) instead of L(j) if needed. It is well known [3] that

$$\begin{aligned} 2^{L(j)}=(j+1) \& (-(j+1)) \end{aligned}$$

or

$$\begin{aligned} 2^{L(j)}=(\sim j) \& (-(\sim j)), \end{aligned}$$

where & is the bitwise AND operator, \(\sim\) is the bitwise NOT operator, and \(-j\) denotes the negative of j, treating j as a signed integer in two's complement arithmetic, which is standard on modern processors. For instance, in R, \(2^{L(j)}\) can be computed as bitwAnd((j+1L),-(j+1L)).
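Both identities can be checked directly; here is a small Python sketch (Python integers behave as two's complement under bitwise operations, so \(\sim j\) and \(-j\) work as in the formulas):

```python
def pow2_level(j: int) -> int:
    """2**L(j) as (j+1) & (-(j+1)): incrementing j turns its trailing ones into
    zeros and its least significant zero into a one, and the lowest-set-bit
    trick x & -x then isolates that bit."""
    return (j + 1) & -(j + 1)

def pow2_level_not(j: int) -> int:
    """The equivalent form (~j) & (-(~j))."""
    return (~j) & (-(~j))

for j in range(64):
    assert pow2_level(j) == pow2_level_not(j)

print(pow2_level(7), pow2_level(11), pow2_level(4))   # 8 4 1
```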

A node j on level \(k\ge 1\) has the left child \(j-2^{k-1}\) and the right child \(j+2^{k-1}\) . Moreover, a node j on level \(k\ge 0\) has the parent \(j+2^k\) if the binary representation of j ends with “\(001\cdots 1\)” (ending with k ones) and the parent \(j-2^k\) if the binary representation of j ends with “\(101\cdots 1\)” (ending with k ones).
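These relations can be sketched in Python as follows (the sketch assumes the perfect-BST numbering of Fig. 1; parent() is only meaningful for non-root nodes):

```python
def children(j: int) -> tuple:
    """Children j - 2**(k-1) and j + 2**(k-1) of a node j on level k >= 1."""
    half = ((j + 1) & -(j + 1)) >> 1   # 2**(L(j) - 1)
    return j - half, j + half

def parent(j: int) -> int:
    """Parent of a node j on level k. Bit k+1 of j distinguishes the two
    cases: '...001...1' (parent j + 2**k) and '...101...1' (parent j - 2**k)."""
    step = (j + 1) & -(j + 1)          # 2**L(j)
    return j - step if j & (step << 1) else j + step

# Fig. 1 (root 7): node 11 has children 9 and 13 and parent 7.
print(children(11), parent(11), parent(3), parent(4))
# (9, 13) 7 7 5
```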

Thus, for the case \(N=2^K-1\), we can write the algorithm as the following pseudocode. Below, p, l, r denote the resulting arrays of parents, left children, and right children, respectively; the algorithm constructs the BST as these three arrays. t stands for the number of the root node. M(j) denotes the location of the most significant one in the binary representation of j; e.g., \(M(01001)=3\). For instance, in GCC, the __builtin_clz() built-in function may be used to compute M(j) [6]. Unlike L(j), the function M(j) has no known representation in terms of common bitwise operations.

[Algorithm 1 (pseudocode figure)]
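The published pseudocode is not reproduced here; the following Python sketch is a hypothetical reconstruction of Algorithm 1 from the relations above (the names p, l, r, t follow the text; using -1 to mark a missing child or parent is an assumption of this sketch):

```python
def build_perfect_bst(N: int):
    """Reconstruction of Algorithm 1: BST arrays for 0..N-1, N = 2**K - 1."""
    p, l, r = [-1] * N, [-1] * N, [-1] * N
    t = N // 2                        # root 2**(K-1) - 1, binary 011...1
    for j in range(N):
        step = (j + 1) & -(j + 1)     # 2**L(j)
        if step > 1:                  # level >= 1: children j -/+ 2**(L(j)-1)
            l[j], r[j] = j - step // 2, j + step // 2
        if j != t:                    # parent j + 2**L(j) or j - 2**L(j)
            p[j] = j - step if j & (step << 1) else j + step
    return t, p, l, r

t, p, l, r = build_perfect_bst(7)
print(t, l, r, p)
# 3 [-1, 0, -1, 1, -1, 4, -1] [-1, 2, -1, 5, -1, 6, -1] [1, 3, 1, -1, 5, 3, 5]
```

An in-order traversal of the resulting arrays visits \(0,\dots ,N-1\) in increasing order, which is easy to check for small N.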

Now it remains to modify Algorithm 1 for the case of arbitrary N. If we try to build a binary tree with Algorithm 1, then some edges may point to missing nodes that are greater than \(N-1\).

Lemma. All the edges pointing to missing nodes in the “tree” built by Algorithm 1, except the down-right edge of the last node if any, are located on the ascending path from the node \((N-1)\) to the root in the perfect BST of the same height.

Proof

Consider the “tree” constructed by Algorithm 1. Let a node j, \(j\ne N-1\), of the “tree” have its down-right edge pointing to a missing node i. We have \(j<N-1<i\). Let k be the level of the node j, and let m be the ancestor of the node \((N-1)\) on level k in the corresponding perfect BST. Then we have \(m\ge j\), since \(j<N-1\). Moreover, we cannot have \(m>j\), since that would imply \(N-1>i\). Hence, \(m=j\), and the ancestor of the node \((N-1)\) at level \(k-1\) in the corresponding perfect BST is i, since \(j<N-1<i\).

Now let a node j of the “tree” constructed by Algorithm 1 have its up edge pointing to a missing node i, let k be the level of the node j, and let m be the ancestor of the node \((N-1)\) on level k in the corresponding perfect BST. Then again \(j<N-1<i\) and \(m\ge j\), and again we cannot have \(m>j\), since that would imply \(N-1>i\). Hence, \(m=j\).

The lemma is proved.

To correct the “tree” built by Algorithm 1, it remains to follow the descending path from the root to the node \((N-1)\) in the corresponding perfect BST and merge the edges pointing to missing nodes. The resulting algorithm, explained in more detail below, is as follows. Below, / denotes integer division, e.g., \(1/2 = 0\).

[Algorithm 2 (pseudocode figure)]

The time complexity is still O(N), since merging edges after the for loops takes \(O(\log N)\) time and O(1) auxiliary memory.

The while loops that merge edges are explained as follows. Traveling along the descending path from the root node t to the node \((N-1)\), we move by \(\pm 2^{L(j)-1}\) when we go from a node j to its right or left child. Thus \(N-1-t=A-B\), where A is the binary number with 1’s at the locations \(m=L(j)-1\) for which the path contains the edge from j to its right child, and B is the binary number with 1’s at the locations \(m=L(j)-1\) for which the path contains the edge from j to its left child. The path goes through nodes \(>N-1\) exactly when it contains a subpath of edges down right–left–left–\(\,\cdots \,\)–left such that the next edge is down-right or the last edge of the subpath is the last edge of the path. Only the first and the last nodes of such a subpath are \(\le N-1\), so we must merge each such subpath into a single edge. Let m and n be the levels of the first and the last nodes of the subpath. Then \(N-1-t\) contains the binary digits \(100\cdots 0-011\cdots 1=00\cdots 01\) at the locations \(m-1,\dots ,n\). The while loops simply search for all such subpaths (all such patterns in \(N-1-t\)) and connect the first and last nodes of each subpath with an edge.
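Since the published pseudocode is not reproduced here, the following Python sketch is a hypothetical reconstruction of Algorithm 2. It implements the merge as an explicit walk down the path from t to \(N-1\), sliding down-left past missing nodes, rather than as the bit-pattern scan over \(N-1-t\) described above; the two formulations merge the same subpaths. The -1 sentinel is again an assumption of this sketch.

```python
def build_bst(N: int):
    """Reconstruction of Algorithm 2: minimal-height BST for 0..N-1, any N >= 1."""
    p, l, r = [-1] * N, [-1] * N, [-1] * N
    t = (1 << (N.bit_length() - 1)) - 1    # root of the perfect BST of the same height
    for j in range(N):                     # perfect-BST rules for the existing nodes
        step = (j + 1) & -(j + 1)          # 2**L(j)
        if step > 1:                       # level >= 1: children j -/+ 2**(L(j)-1)
            l[j], r[j] = j - step // 2, j + step // 2
        if j != t:
            q = j - step if j & (step << 1) else j + step
            p[j] = q if q < N else -1      # up edges to missing nodes: fixed below
    if r[N - 1] >= N:                      # down-right edge of the last node, if any
        r[N - 1] = -1
    j = t                                  # merge edges on the path from t to N-1
    while j != N - 1:
        c = j + (((j + 1) & -(j + 1)) >> 1)    # down-right step in the perfect BST
        while c > N - 1:                       # slide down-left past missing nodes
            c -= ((c + 1) & -(c + 1)) >> 1
        r[j], p[c] = c, j
        j = c
    return t, p, l, r

t, p, l, r = build_bst(10)
print(t, r[7], l[9], r[9])   # 7 9 8 -1
```

For N = 10, the missing node 11 is merged away: the root 7 gets the right child 9 directly. When \(N = 2^K - 1\), the merge loop changes nothing and the result coincides with that of Algorithm 1.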

FormalPara Remark 1

Algorithm 2 admits straightforward and effective vectorization and/or parallelization. The only loop that cannot be vectorized or parallelized is the loop correcting edges that point to nodes \(>N-1\); that loop has complexity \(O(\log N)\). Hence, the time complexity of the parallelized algorithm is \(O((N/s)+s+\log N)\), where s is the number of parallel threads. Note that this parallelization is much simpler than that of recursive algorithms.

FormalPara Remark 2

If the user does not need the array of parents p, then the array can be excluded from Algorithm 2 as well as from Algorithm 3 below, since those algorithms never read the values from p.

To make the tree complete while retaining the minimal height, we can use the rotations algorithm from [8] as follows.

In the algorithm below, R(j) stands for the height of the right subtree of a node j in the BST built by Algorithm 2 when the node j is reachable from the root node by descending via down-right edges only, h is the level of the current node, and x is the current node.

[Algorithm 3 (pseudocode figure)]

Algorithm 3 needs \(O(\log N)\) time since h decreases by 1 in each iteration of the while loop.

Performance of the new algorithm

To test the performance of the new algorithm, we compared it with the classical algorithm of Wirth [9], which recursively builds the left and right subtrees of each node, and with Vaucher’s algorithm [8]. Rotations balancing the BST were included in the implementation of Vaucher’s algorithm, as well as in the implementation of the new algorithm. Recall that all three algorithms have linear time complexity and build complete BSTs. All the algorithms were implemented in the R language and used array-based representations of BSTs. We ran three tests: for all \(N=10,\!001,\,\dots ,\,20,\!000\) in random order; for 1000 values of N randomly selected without replacement from the range \(100,\!001,\,\dots ,\,200,\!000\); and for 1000 values of N randomly selected without replacement from the range \(1,\!000,\!001,\,\dots ,\,2,\!000,\!000\). All three algorithms were run on the same choices of values of N. No parallelization was implemented for the algorithms. The tests were run on an AMD 3950X processor. Note that, in the tests, the time was measured for constructing a BST for the numbers \(0,\dots ,N-1\), while, in practice, the tree may be constructed for an unsorted array of numbers, in which case the array must be sorted before applying any of the algorithms. The results of the tests are reported in Table 1 as mean time ± standard deviation.

Table 1 Execution times of the algorithms, presented as mean ± standard deviation

We see that in all the tests the new algorithm ran at least eight times faster than each of the other two algorithms. This can be explained by the fact that the new algorithm does not use recursive function calls, which are expensive. In addition, a vectorized version of the new algorithm was tested (note that the vectorized version has O(N) auxiliary memory complexity, unlike the loop-based version, which needs O(1) auxiliary memory).

Conclusion

We have suggested a new fast algorithm for constructing complete binary search trees. The new algorithm has linear time complexity like previously known algorithms but, in contrast to those algorithms, does not use recursive function calls. It also allows simple vectorization and parallelization. In the performance comparison with Wirth’s and Vaucher’s algorithms, the new algorithm proved to be at least eight times faster than its competitors in the considered tests.