Abstract
A framework for the analysis of the amortized complexity of (functional) data structures is formalized in Isabelle/HOL and applied to a number of standard examples and to three famous non-trivial ones: skew heaps, splay trees and splay heaps.
1 Introduction
Amortized complexity [3, 14] of an algorithm averages the running times of a sequence of invocations of the algorithm. In this paper we formalize a simple framework for the analysis of the amortized complexity of functional programs and apply it to both the easy standard examples and the more challenging examples of skew heaps, splay trees and splay heaps. We have also analyzed pairing heaps [4] but cannot present them here for lack of space. All proofs are available online [9].
We are aiming for a particularly lightweight framework that supports proofs at a high level of abstraction. Therefore all algorithms are modeled as recursive functions in the logic. Because mathematical functions do not have a complexity they need to be accompanied by definitions of the intended running time that follow the recursive structure of the actual function definitions. Thus the user is free to choose the level of granularity of the complexity analysis. In our examples we simply count function calls.
Although one can view the artefacts that we analyze as functional programs, one can also view them as models or abstractions of imperative programs. For the amortized complexity we are only interested in the input-output behaviour and the running time complexity. As long as those are the same, it does not matter in what language the algorithm is implemented. In fact, the standard imperative implementations of all of our examples have the same complexity as the functional model. However, in a functional setting, amortized complexity reasoning may be invalid if the data structure under consideration is not used in a single-threaded manner [10].
1.1 Related Work
Hofmann and Jost [7] pioneered automatic type-based amortized analysis of heap usage of functional programs. This was later generalized in many directions, for example allowing multivariate polynomials [6]. Atkey [1] carries some of the ideas over to an imperative language with separation logic embedded in Coq. Charguéraud and Pottier [2] employ separation logic to verify the almost-linear amortized complexity of a Union-Find implementation in OCaml in Coq.
2 Lists and Trees
Lists are constructed from the empty list \({{[}{]}}\) via the infix cons-operator “\({{{\cdot }}}\)”, \({{|}xs{|}}\) is the length of \({xs}\), \(tl\) takes the tail and \({rev}\) reverses a list.
Binary trees are defined as the data type \({{'}a\ tree}\) with two constructors: the empty tree or leaf \({{{\langle }}{{\rangle }}}\) and the node \({{{\langle }}l{,}\ a{,}\ r{{\rangle }}}\) with subtrees \({l{,}\ r\ {:}{:}\ {'}a\ tree}\) and contents \({a\ {:}{:}\ {'}a}\). The size of a tree is the number of its nodes:
For convenience there is also a modified size function.
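The tree definitions can be sketched in Python, representing the leaf \({{\langle }}{{\rangle }}\) as \(None\) and a node \({{\langle }}l{,}\ a{,}\ r{{\rangle }}\) as a triple. The modified size function is assumed here to be \({|}t{|}\ {+}\ {\mathrm{1}}\), which is consistent with its later use inside logarithms; the paper's own definition is not reproduced in this extract.

```python
# Binary trees as nested tuples: None is the leaf <>, (l, a, r) a node.
def size(t):
    """Number of nodes of t."""
    if t is None:
        return 0
    l, _, r = t
    return size(l) + 1 + size(r)

def size1(t):
    """Modified size |t|+1 (an assumed definition, not shown in this extract)."""
    return size(t) + 1
```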
3 Amortized Analysis Formalized
We formalize the following scenario. We are given a number of operations that may take parameters and that update the state (some data structure) as a side effect. Our model is purely functional: the state is replaced by a new state with each invocation of an operation, rather than mutating the state. This makes no difference because we only analyze time, not space.
Our model of amortized analysis is a theory that is parameterized as follows (a locale in Isabelle-speak):
-
\({{'}s}\) is the type of state.
-
\({{'}o}\) is the type of operations.
-
\({{}init\ {:}{:}\ {'}s{}}\) is the initial state.
-
\({nxt\ {:}{:}\ {'}o\ {{\Rightarrow }}\ {'}s\ {{\Rightarrow }}\ {'}s{}}\) is the next state function.
-
\({{}inv {:}{:}\ {'}s\ {{\Rightarrow }}\ bool{}}\) is an invariant on the state. We assume that the invariant holds initially (\({inv\ init}\)) and that it is preserved by all operations (\({inv\ s\ {{\Longrightarrow }}}\) \({inv\ {(}nxt\ f\ s{)}}\)).
-
\({{}t\ {:}{:}\ {'}o\ {{\Rightarrow }}\ {'}s\ {{\Rightarrow }}\ real{}}\) is the timing function: \({t\ f\ s}\) represents the time it takes to execute operation \({f}\) in state \({s}\), i.e. to compute \({nxt\ f\ s}\).
The effect of each operation \({f}\) is modeled as a function \({nxt\ f}\) from state to state. Since functions are extensional, the execution time is modeled explicitly by function \({t}\). Alternatively one can instrument \({nxt}\) with timing information and have it return a pair of a new state and the time the operation took. We have separated the computation of the result and the timing information into two functions because that is what one would typically do in a first step anyway to simplify the proofs. In particular this means that \({t}\) need not be a closed form expression for the actual complexity. In all of our examples the definition of \({t}\) will follow the (usually recursive) structure of the definition of \({nxt}\) precisely. One could go one step further and derive \({t}\) from an intensional formulation of \({nxt}\) automatically, but that is orthogonal to our aim in this paper, namely amortized complexity.
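The locale parameters can be rendered as a record of functions. The following Python sketch uses hypothetical field names mirroring the text; the actual development is in Isabelle/HOL.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical Python rendering of the locale parameters ('s and 'o become
# whatever concrete state and operation types an instance supplies).
@dataclass
class Amortized:
    init: object        # initial state
    nxt: Callable       # nxt(f, s): the state after executing operation f in state s
    inv: Callable       # inv(s): invariant on states
    t: Callable         # t(f, s): time to execute operation f in state s

def state(am, f, n):
    """The state after executing the first n operations of the sequence f."""
    s = am.init
    for i in range(n):
        s = am.nxt(f(i), s)
    return s
```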
For the analysis of amortized complexity we formalize the potential method. That is, our theory has another parameter:
-
\({{}{{\Phi }}\ {:}{:}\ {'}s\ {{\Rightarrow }}\ real{}}\) is the potential of a state. We assume the potential is initially 0 (\({{{\Phi }}\ init\ {=}\ {\mathrm{0}}}\)) and never becomes negative (\({inv\ s\ {{\Longrightarrow }}\ {\mathrm{0}}\ {{\le }}\ {{\Phi }}\ s}\)).
The potential of the state represents the savings that can pay for future restructurings of the data structure. Typically, the higher the potential, the more out of balance the data structure is. Note that the potential is just a means to an end, the analysis, but does not influence the actual operations.
Let us now analyze the complexity of a sequence of operations formalized as a function of type \({nat\ {{\Rightarrow }}\ {'}o}\). The sequence is infinite for convenience but of course we only analyze the execution of finite prefixes. For this purpose we define \({state\ f\ n}\), the state after the execution of the first \({n}\) elements of \({f}\):
Now we can define the amortized complexity of an operation as the actual time it takes plus the difference in potential:
By telescoping (i.e. induction) we obtain
where
is the sum of all \(F\ i\) with \(i\ {<}\ n\). Because of the assumptions on \({{\Phi }}\) this implies that on average the amortized complexity is an upper bound of the real complexity:
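Spelled out, the telescoping argument is the standard potential-method reasoning (a reconstruction, writing \(F\ i\) for the amortized cost of the \(i\)-th operation):

\[ F\ i \;=\; t\ (f\ i)\ (state\ f\ i) \;+\; \Phi\ (state\ f\ (i{+}1)) \;-\; \Phi\ (state\ f\ i) \]

\[ \sum_{i<n} F\ i \;=\; \sum_{i<n} t\ (f\ i)\ (state\ f\ i) \;+\; \Phi\ (state\ f\ n) \;-\; \Phi\ (state\ f\ 0) \]

Since \(\Phi\ (state\ f\ 0) = \Phi\ init = 0\) and \(\Phi\ (state\ f\ n) \ge 0\), the total actual cost \(\sum_{i<n} t\ (f\ i)\ (state\ f\ i)\) is at most the total amortized cost \(\sum_{i<n} F\ i\).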
To complete our formalization we add one more parameter:
-
\({{}U\ {:}{:}\ {'}o\ {{\Rightarrow }}\ {'}s\ {{\Rightarrow }}\ real{}}\) is an explicit upper bound for the amortized complexity of each operation (in a certain state), i.e. we assume that \({inv\ s\ {{\Longrightarrow }}\ t\ f\ s\ {+}}\) \({{{\Phi }}\ {(}nxt\ f\ s{)}\ {-}\ {{\Phi }}\ s\ {{\le }}\ U\ f\ s}\).
Thus we obtain that \({U}\) is indeed an upper bound of the real complexity:
Instantiating this theory of amortized complexity means defining the parameters and proving the assumptions, in particular about \({U}\).
4 Easy Examples
Unless noted otherwise, the examples in this section come from a standard textbook [3].
4.1 Binary Counter
We start with the binary counter explained in the introduction. The state space \({{'}s}\) is just a list of booleans, starting with the least significant bit. There is just one parameterless operation “increment”. Thus we can model type \({{'}o}\) with the unit type. The increment operation is defined recursively:
In complete analogy the running time function for \({incr}\) is defined:
Now we can instantiate the parameters of the amortized analysis theory:
The key idea of the analysis is to define the potential of \({s}\) as \({{|}filter\ id\ s{|}}\), the number of \({True}\) bits in \({s}\). This makes sense because the higher the potential, the longer an increment may take (roughly speaking). Now it is easy to show that 2 is an upper bound for the amortized complexity: the requirement on \({U}\) follows immediately from this lemma (which is proved by induction):
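The counter and its analysis can be sketched as follows (costs count calls of \(incr\), as in the text; the amortized cost \(t{\_}incr\ s + {{\Phi }}\ {(}incr\ s{)} - {{\Phi }}\ s\) is exactly 2 for every state).

```python
# Binary counter: a list of booleans, least significant bit first.
def incr(s):
    if not s:
        return [True]
    if not s[0]:
        return [True] + s[1:]
    return [False] + incr(s[1:])

def t_incr(s):
    """Running time: number of calls of incr."""
    if not s or not s[0]:
        return 1
    return 1 + t_incr(s[1:])

def phi(s):
    """Potential: number of True bits."""
    return sum(1 for b in s if b)
```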
4.2 Stack with Multipop
The operations are
where \({Pop\ n}\) pops \({n}\) elements off the stack:
In complete analogy the running time function is defined:
Now we can instantiate the parameters of the amortized analysis theory:
The necessary proofs are all automatic.
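A sketch of the multipop stack in Python. The cost model, one step per element actually popped plus one for the call itself, is an assumption consistent with counting function calls; with the stack length as potential, \(Push\) has amortized cost 2 and \(Pop\) has amortized cost 1.

```python
# Operations: ('Push', x) and ('Pop', n); the state is a list with the top at the front.
def nxt(op, xs):
    if op[0] == 'Push':
        return [op[1]] + xs
    return xs[op[1]:]          # Pop n drops (up to) n elements

def t(op, xs):
    # Assumed cost model: one step per element actually popped, plus one.
    if op[0] == 'Push':
        return 1
    return min(op[1], len(xs)) + 1

def phi(xs):
    """Potential: the length of the stack."""
    return len(xs)
```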
4.3 Dynamic Tables
Dynamic tables are tables where elements are added and deleted and the table grows and shrinks accordingly. We ignore the actual elements because they are irrelevant for the complexity analysis. Therefore the operations
do not have arguments. Similarly the state is merely a pair of natural numbers \({{(}n{,}\ l{)}}\) that abstracts a table of size \({l}\) with \({n}\) elements. This is how the operations behave:
If the table overflows upon insertion, its size is doubled. If a table is less than one quarter full after deletion, its size is halved. The transition from and to the empty table is treated specially.
This is the corresponding running time function:
The running time for the cases where the table expands or shrinks is determined by the number of elements that need to be copied.
Now we can instantiate the parameters of the amortized analysis theory. We start with the system itself:
This is the first time we have a non-trivial invariant. The potential is also more complicated than before:
Now it is automatic to show the following amortized complexity:
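A Python sketch of the whole development, following the standard textbook scheme; the paper's exact definitions (e.g. the treatment of the empty table) may differ in details. With the potential below, insertion has amortized cost at most 3 and deletion at most 2.

```python
# State (n, l): a table of size l holding n elements.
def ins(s):
    n, l = s
    return (n + 1, l if n < l else max(1, 2 * l))   # double on overflow

def delete(s):
    n, l = s
    if n == 1:
        return (0, 0)                               # back to the empty table
    if 4 * (n - 1) < l:
        return (n - 1, l // 2)                      # halve when < 1/4 full
    return (n - 1, l)

def t_ins(s):
    n, l = s
    return 1 if n < l else n + 1                    # n elements copied on expansion

def t_del(s):
    n, l = s
    return n if n > 1 and 4 * (n - 1) < l else 1    # n-1 copied on shrinking, +1

def phi(s):
    n, l = s
    return 2 * n - l if 2 * n >= l else l / 2 - n
```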
4.4 Queues
Queues have one operation for enqueueing a new item and one for dequeueing the oldest item:
We ignore accessing the oldest item because it is a constant time operation in our implementation.
The simplest possible implementation of functional queues (e.g. [10]) consists of two lists (stacks) \({{(}xs{,} ys{)}}\):
Note that the running time function counts only allocations of list cells and that it assumes \({rev}\) is linear. Now we can instantiate the parameters of the amortized analysis theory to show that the average complexity of both \({Enq}\) and \({Deq}\) is 2. The necessary proofs are all automatic.
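The two-list queue can be sketched in Python (lists stand in for stacks; the cost model counts cell allocations, with \(rev\) assumed linear). With the length of the first list as potential, \(Enq\) has amortized cost 2 and \(Deq\) amortized cost 0, so 2 bounds both.

```python
# A queue is a pair (xs, ys): xs collects enqueued items, ys holds items to dequeue.
def enq(x, q):
    xs, ys = q
    return ([x] + xs, ys)

def deq(q):
    xs, ys = q
    if ys:
        return (xs, ys[1:])
    rev = xs[::-1]                       # reverse xs when the front list runs empty
    return ([], rev[1:])

def t_enq(q):
    return 1                             # one cell allocated

def t_deq(q):
    xs, ys = q
    return 0 if ys else len(xs)          # the reversal allocates |xs| cells

def phi(q):
    return len(q[0])                     # potential: length of the first list
```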
In the same manner I have also verified [9] a modified implementation where the reversal happens earlier; this improves the worst-case behaviour but the amortized complexity of \(Enq\) increases to 3.
5 Skew Heaps
This section analyzes a beautifully simple data structure for priority queues: skew heaps [13]. Heaps are trees where the least element is at the root. We assume that the elements are linearly ordered. The central operation on skew heaps is \(meld\), which merges two skew heaps and swaps children along the merge path:
We consider the two operations of inserting an element and removing the minimal element:
They are implemented via \(meld\) as follows:
For the functional correctness proofs see [9].
The analysis by Sleator and Tarjan is not ideal as a starting point for a formalization. Luckily there is a nice, precise functional account by Kaldewaij and Schoenmakers [8] that we will follow (although their \(meld\) differs slightly from ours). Their cost measure counts the number of calls of \(meld\), \(Insert\) and \(Delmin\):
Kaldewaij and Schoenmakers prove a tighter upper bound than Sleator and Tarjan, replacing the factor of 3 by 1.44. We are satisfied with verifying the bound by Sleator and Tarjan and work with the following simple potential function which is an instance of the one by Kaldewaij and Schoenmakers: it counts the number of “right heavy” nodes.
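A Python sketch of skew heaps together with this potential (trees as tuples, \(None\) for the empty heap). The \(meld\) below is the standard one and may differ in minor details from the paper's; the potential counts the nodes whose right subtree is larger than the left.

```python
def meld(h1, h2):
    """Merge two heaps, swapping the children of the node descended into."""
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    l1, a1, r1 = h1
    l2, a2, r2 = h2
    if a1 <= a2:
        return (meld(h2, r1), a1, l1)   # swap the children of h1
    return (meld(h1, r2), a2, l2)       # swap the children of h2

def insert(a, h):
    return meld((None, a, None), h)

def del_min(h):
    l, _, r = h
    return meld(l, r)

def size(t):
    return 0 if t is None else size(t[0]) + 1 + size(t[2])

def phi(t):
    """Potential: number of 'right heavy' nodes."""
    if t is None:
        return 0
    l, _, r = t
    return phi(l) + phi(r) + (1 if size(l) < size(r) else 0)
```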
To prove the amortized complexity of \(meld\) we need some further notions that capture the ideas of Sleator and Tarjan in a concise manner:
Two easy inductive properties:
Now the desired logarithmic amortized complexity of \(meld\) follows:
Now it is easy to verify the following amortized complexity for \(Insert\) and \(Delmin\) by instantiating our standard theory with
Note that Isabelle supports implicit coercions, in particular from \(nat\) to \(real\), that are inserted automatically [15].
6 Splay Trees
A splay tree [12] is a subtle self-adjusting binary search tree. It achieves its amortized logarithmic complexity by local rotations of subtrees along the access path. Its central operation is \(splay\) of type \({'}a\ {{\Rightarrow }}\ {'}a\ tree\ {{\Rightarrow }}\ {'}a\ tree\) that rotates the given element (of a linearly ordered type \({'}a\)) to the root of the tree. Most presentations of \(splay\) confine themselves to the case where the given element is in the tree. If the given element is not in the tree, the last element found before a \({{\langle }}{{\rangle }}\) was met is rotated to the root. The complete definition is shown in Fig. 1.
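Fig. 1 can be transcribed roughly as follows (a Python sketch with trees as tuples; the Isabelle definition in the paper is the authoritative one). Note that \(splay\) of a nonempty tree is always nonempty, so the unpacking of recursive results is safe.

```python
def splay(a, t):
    """Rotate a (or the last element found before a leaf) to the root."""
    if t is None:
        return None
    cl, c, cr = t
    if a == c:
        return t
    if a < c:
        if cl is None:
            return t
        bl, b, br = cl
        if a == b:                                   # zig
            return (bl, a, (br, c, cr))
        if a < b:
            if bl is None:                           # zig: a not in the tree
                return (bl, b, (br, c, cr))
            al, a2, ar = splay(a, bl)                # zig-zig
            return (al, a2, (ar, b, (br, c, cr)))
        if br is None:                               # zig: a not in the tree
            return (bl, b, (br, c, cr))
        al, a2, ar = splay(a, br)                    # zig-zag
        return ((bl, b, al), a2, (ar, c, cr))
    # symmetric case a > c
    if cr is None:
        return t
    bl, b, br = cr
    if a == b:                                       # zag
        return ((cl, c, bl), a, br)
    if a > b:
        if br is None:                               # zag: a not in the tree
            return ((cl, c, bl), b, br)
        al, a2, ar = splay(a, br)                    # zag-zag
        return (((cl, c, bl), b, al), a2, ar)
    if bl is None:                                   # zag: a not in the tree
        return ((cl, c, bl), b, br)
    al, a2, ar = splay(a, bl)                        # zag-zig
    return ((cl, c, al), a2, (ar, b, br))

def inorder(t):
    """Contents in order, to check that splaying only reorganizes the tree."""
    return [] if t is None else inorder(t[0]) + [t[1]] + inorder(t[2])
```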
Given splaying, searching for an element in the tree is trivial: you splay with the given element and check if it ends up at the root. For insertion and deletion, algorithm texts typically show pictures only. In contrast, we show the code only, in Figs. 2–3. To insert \(a\), you splay with \(a\) to see if it is already there, and if it is not, you insert it at the top (which is the right place due to the previous splay action). To delete \(a\), you splay with \(a\) and if \(a\) ends up at the root, you replace it with the maximal element removed from the left subtree. The latter step is performed by \(splay{\_}max\) that splays with the maximal element.
6.1 Functional Correctness
So far we have ignored functional correctness, but for splay trees we actually need it in the verification of the complexity. To formulate functional correctness we need the two auxiliary functions shown in Fig. 4: one collects the set of elements in the tree, the other, \(bst\), checks if the tree is a binary search tree according to the linear ordering “\({<}\)” on the elements. The key functional properties are that splaying does not change the contents of the tree (it merely reorganizes it) and that \(bst\) is an invariant of splaying:
Similar properties can be proved for insertion and deletion, e.g.,
Now we present two amortized analyses: a simpler one that yields the bounds proved by Sleator and Tarjan [12] and a more complicated and precise one due to Schoenmakers [11].
6.2 Amortized Analysis
The timing functions are straightforward and not shown. Roughly speaking, they count only the number of splay steps: one counts the number of calls of \(splay\), another counts the number of calls of \(splay{\_}max\), and a third counts the time for both \(splay\) and \(splay{\_}max\) together.
The potential of a tree is defined as a sum of logarithms as follows:
The amortized complexity of splaying is defined as usual:
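The potential presumably takes the standard Sleator–Tarjan form (a reconstruction consistent with the later use of \({{\varphi }}\); here \({|}t{|}_1\) abbreviates \({|}t{|}\ {+}\ {\mathrm{1}}\)):

\[ \Phi\ \langle\rangle = 0 \qquad \Phi\ \langle l, a, r\rangle = \Phi\ l + \Phi\ r + \varphi\ \langle l, a, r\rangle \quad \text{where } \varphi\ t = \log_2 |t|_1 \]

The amortized complexity of splaying is then the actual time plus the change in potential, \(t_{splay}\ a\ t + \Phi\ (splay\ a\ t) - \Phi\ t\), with \(t_{splay}\) standing for the (not shown) timing function.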
Let \(subtrees\) yield the set of all subtrees of a tree:
The following logarithmic bound is proved by induction on \(t\) according to the recursion schema of \(splay\): if \(bst\ t\) and \({{\langle }}l{,}\ a{,}\ r{{\rangle }}\ {{\in }}\ subtrees\ t\) then
Let us look at one case of the inductive proof in detail. We pick the so-called zig-zig case shown in Fig. 5. Subtrees with root \(x\) are called \(X\) on the left and \(X'\) on the right-hand side. Thus the figure depicts the overall effect of splaying, assuming the recursive call transforms \(X\) into \(X'\).
This looks similar to the proof by Sleator and Tarjan but is different: they consider one double rotation whereas we argue about the whole input-output relationship; also our \(log\) argument is simpler.
From (3) we obtain in the worst case (\(l\ {=}\ r\ {=} {{\langle }}{{\rangle }}\)):
In the literature the case where \(a\) is not in the tree is treated informally by stating that it can be reduced to the case where it is: one could have called \(splay\) with some element that is in the tree instead of \(a\) and the behaviour would have been the same. Formally we prove by induction that if \(t\ {{\not =}}\ {{\langle }}{{\rangle }}\) and \(bst\ t\) then
This gives us an upper bound for all binary search trees:
The \({{\varphi }}\ t\ {-}\ {\mathrm{1}}\) was increased to \({{\varphi }}\ t\) because the former is negative if \(t\ {=}\ {{\langle }}{{\rangle }}\).
We also need to determine the amortized complexity of \(splay{\_}max\)
A derivation similar to, but simpler than, the one for \(splay\) yields the same upper bound.
Now we can apply our amortized analysis theory:
The fact that the given \(U\) is indeed a correct upper bound follows from the upper bounds for \(splay\) and \(splay{\_}max\); for \(Insert\) and \(Delete\) the proof needs more case distinctions and \(\log \)-manipulations.
6.3 Improved Amortized Analysis
This subsection follows the work of Schoenmakers [11] (except that he confines himself to \(splay\)), who improves upon the constants in the above analysis. His analysis is parameterized by two constants \({{\alpha }}\ {>}\ {\mathrm{1}}\) and \({{\beta }}\) subject to three constraints, where all the variables are assumed to be \({{\ge }}\ {\mathrm{1}}\):
The following upper bound is again proved by induction but this time with the help of the above constraints: if \(bst\ t\) and \({{\langle }}l{,}\ a{,}\ r{{\rangle }}\ {{\in }}\ subtrees\ t\) then
From this we obtain the following main theorem just like before:
Now we instantiate the above abstract development with
and \({{{\beta }}\ {=}\ {\mathrm{1}}{/}{\mathrm{3}}}\) (which includes proving the three constraints on \({{\alpha }}\) and \({{\beta }}\) above) to obtain a bound for splaying that is only half as large as in (4):
The subscript is our indication that we refer to this instance of \({{\alpha }}\) and \({{\beta }}\). Schoenmakers additionally showed that this specific choice of \({{\alpha }}\) and \({{\beta }}\) yields the minimal upper bound.
A similar but simpler development leads to the same bound for \(splay{\_}max\). Again we apply our amortized analysis theory to verify upper bounds for \(Splay\), \(Insert\) and \(Delete\) that are also only half as large as before:
The proofs in this subsection require a lot of highly nonlinear arithmetic. Only some of the polynomial inequalities can be automated with Harrison’s sum-of-squares method [5].
7 Splay Heaps
Splay heaps are another self-adjusting data structure and were invented by Okasaki [10]. Splay heaps are organized internally like splay trees but they implement a priority queue interface. When inserting an element \(x\) into a splay heap, the splay heap is first partitioned (by rotations, like \(splay\)) into two trees, one \({{\le }}\ x\) and one \({>}\ x\), and \(x\) becomes the new root:
Function \(del{\_}min\) removes the minimal element and is similar to \(splay{\_}max\):
In contrast to search trees, priority queues may contain elements multiple times. Therefore splay heaps satisfy a weaker invariant:
This is an invariant for both \(partition\) and \(del{\_}min\):
For the functional correctness proofs see [9].
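Okasaki's \(partition\) and \(del{\_}min\) can be transcribed as follows (a Python sketch following [10], with trees as tuples; details may differ from the formalized definitions).

```python
def partition(p, t):
    """Split t into (elements <= p, elements > p), rotating like splay."""
    if t is None:
        return (None, None)
    l, x, r = t
    if x <= p:
        if r is None:
            return (t, None)
        rl, y, rr = r
        if y <= p:
            small, big = partition(p, rr)
            return (((l, x, rl), y, small), big)
        small, big = partition(p, rl)
        return ((l, x, small), (big, y, rr))
    if l is None:
        return (None, t)
    ll, y, lr = l
    if y <= p:
        small, big = partition(p, lr)
        return ((ll, y, small), (big, x, r))
    small, big = partition(p, ll)
    return (small, (big, y, (lr, x, r)))

def insert(x, t):
    small, big = partition(x, t)
    return (small, x, big)          # x becomes the new root

def del_min(t):
    """Remove the (leftmost) minimal element, restructuring like splay_max."""
    if t is None:
        return None
    l, y, r = t
    if l is None:
        return r
    ll, x, lr = l
    if ll is None:
        return (lr, y, r)
    return (del_min(ll), x, (lr, y, r))

def find_min(t):
    l = t
    while l[0] is not None:         # the minimum is the leftmost element
        l = l[0]
    return l[1]
```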
7.1 Amortized Analysis
Now we verify the amortized analysis due to Okasaki. The timing functions are straightforward and not shown: they count the number of calls of \(partition\) and \(del{\_}min\), respectively. The potential of a tree is defined as for splay trees in Sect. 6.2. The following logarithmic bound of the amortized complexity is proved by computation induction on \(partition\ p\ t\): if the invariant holds for \(t\) and \(partition\ p\ t\ {=}\ {(}l{'}{,}\ r{'}{)}\) then
Okasaki [10] shows the zig-zig case of the induction; I show the zig-zag case in Fig. 6. Subtrees with root \(x\) are called \(X\) on the left and \(X'\) on the right-hand side. Thus Fig. 6 depicts the overall effect of \(partition\), assuming the recursive call transforms \(X\) into \(X'\).
The proof of the amortized complexity of \(del{\_}min\) is similar to the one for \(splay{\_}max\). Now it is routine to verify the following amortized complexities by instantiating our standard theory:
References
Atkey, R.: Amortised resource analysis with separation logic. Logical Methods Comput. Sci. 7(2), 33 (2011)
Charguéraud, A., Pottier, F.: Machine-checked verification of the correctness and amortized complexity of an efficient union-find implementation. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 137–154. Springer, Heidelberg (2015)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, New York (1990)
Fredman, M.L., Sedgewick, R., Sleator, D.D., Tarjan, R.E.: The pairing heap: A new form of self-adjusting heap. Algorithmica 1(1), 111–129 (1986)
Harrison, J.: Verifying nonlinear real formulas via sums of squares. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 102–118. Springer, Heidelberg (2007)
Hoffmann, J., Aehlig, K., Hofmann, M.: Multivariate amortized resource analysis. ACM Trans. Program. Lang. Syst. 34(3), 14 (2012)
Hofmann, M., Jost, S.: Static prediction of heap space usage for first-order functional programs. In: Proceedings of the 30th ACM Symposium Principles of Programming Languages, pp. 185–197 (2003)
Kaldewaij, A., Schoenmakers, B.: The derivation of a tighter bound for top-down skew heaps. Inf. Process. Lett. 37, 265–271 (1991)
Nipkow, T.: Amortized complexity verified. Archive of Formal Proofs (2014). http://afp.sf.net/entries/Amortized_Complexity.shtml. Formal proof development
Okasaki, C.: Purely Functional Data Structures. Cambridge University Press, Cambridge (1998)
Schoenmakers, B.: A systematic analysis of splaying. Inf. Process. Lett. 45, 41–50 (1993)
Sleator, D.D., Tarjan, R.E.: Self-adjusting binary search trees. J. ACM 32(3), 652–686 (1985)
Sleator, D.D., Tarjan, R.E.: Self-adjusting heaps. SIAM J. Comput. 15(1), 52–69 (1986)
Tarjan, R.E.: Amortized complexity. SIAM J. Alg. Disc. Meth. 6(2), 306–318 (1985)
Traytel, D., Berghofer, S., Nipkow, T.: Extending Hindley-Milner type inference with coercive structural subtyping. In: Yang, H. (ed.) APLAS 2011. LNCS, vol. 7078, pp. 89–104. Springer, Heidelberg (2011)
Acknowledgement
Berry Schoenmakers patiently answered many questions about his work whenever I needed help.
© 2015 Springer International Publishing Switzerland
Nipkow, T. (2015). Amortized Complexity Verified. In: Urban, C., Zhang, X. (eds.) Interactive Theorem Proving. ITP 2015. Lecture Notes in Computer Science, vol. 9236. Springer, Cham. https://doi.org/10.1007/978-3-319-22102-1_21
Print ISBN: 978-3-319-22101-4
Online ISBN: 978-3-319-22102-1