1 Introduction

In algorithmics, that is, the “study of algorithms” [12], one is concerned with constructing and analyzing algorithms for given computing problems that perform well with respect to some given constraints, for instance, being efficient or achieving a certain solution quality. In a classical setup, an input is given to an algorithm, and some particular information needs to be extracted; this is usually done with full knowledge of the input. For instance, the information that needs to be extracted may be a cheapest Hamiltonian cycle that is “hidden” in an instance that corresponds to a complete weighted graph. Many computing problems, however, are what is called “intrinsically online,” which means that the whole input is not known in advance, but arrives gradually in consecutive time steps, while parts of the definite output already need to be produced. Typical members of this class are scheduling and packing problems, and various kinds of resource management problems.

Such problems are called “online problems” [1, 6, 15, 22], and they are found in many real-world situations. One of the most prominent and well-understood online problems is the paging (caching) problem. Here, an online algorithm maintains a cache containing up to k logical pages out of m possible ones. The input is a sequence of n requests for logical pages, and the algorithm has to process each request: if the requested page is in the cache (this is called a cache hit), nothing happens; if not (which is called a cache miss, or page fault), the algorithm has to evict one page from the cache, and replace it with the requested one. The goal of the algorithm is to process the input while minimizing the number of page faults. Let us give a formal definition; for the ease of presentation, we identify pages with their indices.

Definition 1

(Paging Problem). An instance of the paging problem is a sequence of integers representing requests to logical pages \(I=(x_1,x_2,\dots ,x_n)\), \(x_i>0\). An online algorithm \(\textsc {Alg}\) maintains a buffer (content of the physical memory) \(B=\{b_1,b_2,\dots ,b_k\}\) of k integers, where k is a fixed constant known to \(\textsc {Alg}\). Before processing the first request, the buffer gets initialized as \(B=\{1,2,\dots ,k\}\). Upon receiving a request \(x_i\), if \(x_i\in B\), then \(\textsc {Alg}\) creates the partial output \(y_i=0\). If \(x_i\not \in B\), then a page fault occurs, and \(\textsc {Alg}\) has to find some victim \(b_j\), that is, \(B:=B\setminus \{b_j\}\cup \{x_i\}\), and \(y_i=b_j\). The cost of the solution \(\textsc {Alg} (I)\) is the number of page faults, that is, \(\mathrm {cost}(\textsc {Alg} (I))=|\{i\mid y_i>0\}|\).
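
To make the model concrete, the following minimal sketch (ours, not from the paper; all names are ours) simulates Definition 1 for an arbitrary eviction policy and counts the page faults. Note that, matching the definition, pages are only evicted on a page fault.

```python
def simulate_paging(requests, k, choose_victim):
    """Simulate the paging model of Definition 1 and return the cost.

    requests:      the input (x_1, ..., x_n) as a list of positive integers
    k:             the cache size, known to the algorithm
    choose_victim: policy(cache, i, requests) -> page to evict on a fault
    """
    cache = set(range(1, k + 1))   # B is initialized to {1, 2, ..., k}
    faults = 0
    for i, x in enumerate(requests):
        if x in cache:
            continue               # cache hit: partial output y_i = 0
        faults += 1                # page fault: y_i is the evicted victim
        cache.remove(choose_victim(cache, i, requests))
        cache.add(x)
    return faults                  # cost(Alg(I)) = number of page faults
```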

In this paper, we consider deterministic online algorithms for the paging problem. The performance of an algorithm is measured by the competitive ratio (which is essentially the online counterpart of the approximation ratio for offline problems), where the cost of the algorithm on a particular input I is compared to the cost of an optimal solution for I. In a very general setting, one may consider the inputs to come from a probability distribution \(\rho \) over all possible inputs. The quality of a solution depends on the data model that specifies two orthogonal aspects of the setting. First, the data model specifies the class of possible input distributions \(\mathcal {P}\); an algorithm \(\textsc {Alg}\) is called c-competitive if there is a constant \(\alpha \) such that

$$\begin{aligned} \forall \rho \in \mathcal {P} : \mathbb {E}_{\rho }\mathopen {}\left[ \frac{\mathrm {cost}(\textsc {Alg} (I))-\alpha }{\mathrm {cost}(\textsc {Opt} (I))}\right] \le c\end{aligned}$$
(1)

where the instance I is taken according to the distribution \(\rho \). Second, the data model specifies what information about the distribution \(\rho \) is known to the algorithm.

When \(\mathcal {P}\) is the class of all point mass distributions (that is, distributions where one particular input has probability 1), the impact of the algorithm’s knowledge has been widely studied. One extreme case is when the whole point mass distribution (that is, the input) is known to the algorithm; this corresponds to the offline deterministic case, which is easily solvable by a greedy algorithm (\(\mathrm {LFD}\), longest forward distance, called \(\textsc {Min}\) by Bélády [2] who first proved its optimality). The other extreme case, when the point mass distribution is unknown, corresponds to the online deterministic worst-case scenario, and it is known [21] that no deterministic online algorithm can be better than k-competitive. The spectrum between these two extremes has been studied by means of advice complexity [4, 5, 11, 14], which was first studied for paging by Dobrev et al. [10].
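For illustration, LFD can be phrased as an eviction policy for the simulator sketched after Definition 1 (again our own, deliberately simple quadratic-time rendering): on a fault, it evicts the cached page whose next request lies farthest in the future.

```python
def lfd_victim(cache, i, requests):
    """Belady's Min / LFD: evict the cached page whose next request
    is farthest in the future (or that is never requested again)."""
    def next_use(page):
        for j in range(i + 1, len(requests)):
            if requests[j] == page:
                return j
        return float('inf')        # never requested again: ideal victim
    return max(cache, key=next_use)

# Example with k = 2 and initial cache {1, 2}:
# simulate_paging([3, 1, 2, 3], 2, lfd_victim) returns 2.
```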

Orthogonally, in what is known as the distributional approach [6] to the analysis of online algorithms, it is supposed that the inputs come from a fixed distribution (that is, \(\mathcal {P}\) is a singleton, in our setting), which usually is a uniform distribution. Franaszek and Wagner [13] studied the particular input distribution where each request is selected independently from a given distribution. Note that this was done before competitive analysis was introduced by Sleator and Tarjan [21]. In the Markov paging model by Shedler and Tung [20], the inputs are generated by a Markov chain. Pandurangan and Upfal [18] studied a setting where the page requests come from a stochastic process with given entropy.

There are also approaches where the worst case over several distributions is analyzed. Notably, in the access graph model by Borodin et al. [7], \(\mathcal {P}\) is the set of point mass distributions corresponding to walks in a (known) graph G, and the particular distribution \(\rho \) is, of course, unknown. The statistical adversary introduced by Raghavan [19] considers (in the role of \(\mathcal {P}\) in our notation) the class of all point mass distributions that fulfill certain statistical properties, for instance, that each page is requested the same number of times. This model was later successfully applied to two-way currency trading by Chou et al. [9].

Our setting follows the general notion of the diffuse adversary introduced by Koutsoupias and Papadimitriou [17], with the only distinction that in the diffuse adversary model, the particular distribution is always unknown. The diffuse adversary is a generalization of the statistical adversary (and of other models that have been introduced in the past). In 1998, Young further studied the diffuse adversary [25]; he gave tight bounds (up to a factor of 2) for both deterministic and randomized online algorithms in this setting.

As has been pointed out before [1, 3, 8, 15, 17, 23], both extreme cases, that is, either knowing nothing about the input or knowing everything about it, are unrealistic. As the requested pages depend on the user’s actions, it is clearly impossible for any operating system to completely foresee which pages will be accessed in the future. On the other hand, it seems equally unrealistic to assume that every input sequence is possible.

In Sect. 2, we briefly state our contribution, and put it into context. We formally state and prove our results in Sect. 3; we conclude in Sect. 4. Throughout this paper, \(\log \) denotes the logarithm with base 2.

2 Our Contribution

As already mentioned, a known point mass distribution corresponds to the offline case, which is easily solvable [6]. Moreover, it follows from Yao’s principle [24] and the results on randomized paging that, for any known distribution, there is an \(\mathcal {O}\mathopen {}\left( \log k\right) \)-competitive algorithm, and that there are distributions on which every algorithm is at least \(\mathrm {\Omega }\mathopen {}\left( \log k\right) \)-competitive. Furthermore, as shown by Komm and Královič [16], for any point mass distribution, there is an \(\mathcal {O}\mathopen {}\left( \log k\right) \)-bit long binary string from which an \(\mathcal {O}\mathopen {}\left( \log k\right) \)-competitive algorithm can be efficiently decoded.

Motivated by this, we analyze known distributions that are somewhat “close” to being point mass: instead of the probability mass being concentrated in one input, it can be distributed arbitrarily among \(\ell \) inputs.

Definition 2

(Class \(\varvec{\mathcal {P} _\ell }\) of Distributions). By \(\mathcal {P} _\ell \) we denote the class of \(\ell \)-point distributions, that is, probability distributions over inputs such that at most \(\ell \) inputs have non-zero probability.

An online algorithm for paging that evicts pages only in the case of a page fault is called a “demand paging” algorithm (for k-server, a generalization of paging, the term “lazy” is more common). The laziness requirement comes with no loss of generality [6]; hence, we shall consider only lazy algorithms, which simplifies our arguments. Note that we have already incorporated this into our formal definition of paging, which only allows an algorithm to evict a page if a page fault occurs.

We analyze the expected competitive ratio of deterministic paging algorithms over a known \(\ell \)-point distribution. We show that, if \(\ell \) is constant, there is an “almost optimal” (that is, 1-competitive) algorithm. Basically, we prove that, since \(\ell \) is fixed, the overhead the algorithm pays until it realizes which input it is working on can be hidden in the additive constant \(\alpha \) from the definition of the competitive ratio. For the strict competitive ratio (that is, when demanding that \(\alpha =0\)), we show an \(\mathcal {O}\mathopen {}\left( \log \ell \right) \)-competitive algorithm for \(\ell <k\) (for \(\ell \ge k\), one can use the \(\mathcal {O}\mathopen {}\left( \log k\right) \)-competitive algorithm). We complement the result by a matching lower bound stating that no online algorithm can be better than \((\log \ell )/2\)-competitive.

3 Results

We start with the non-strict case, that is, the case where the additive constant \(\alpha \) from the definition of the competitive ratio may be strictly positive. From the order of the quantifiers in (1), it follows that \(\alpha \) may depend on the cache size k and on the class of possible distributions (in our case parametrized by \(\ell \)). In this case, we can prove the following theorem.

Theorem 1

There is a 1-competitive paging algorithm for any known distribution from the class \(\mathcal {P} _\ell \), for any constant \(\ell \).

Before presenting an online algorithm that obtains this bound, let us make the following observation.

Lemma 1

Consider two inputs \(I_1\) and \(I_2\). Let \(\textsc {Opt} _1\) and \(\textsc {Opt} _2\) be the optimal (LFD) algorithms for \(I_1\) and \(I_2\), respectively. Then \(\textsc {Opt} _1\) and \(\textsc {Opt} _2\) make the same number of page faults on the common prefix of \(I_1\) and \(I_2\).

Fig. 1. \(I_1\), \(I_2\), and the branching point

Proof

Let j be the first position where \(I_1\) and \(I_2\) differ (see Fig. 1). To prove the claim by contradiction, suppose that some page requested in the prefix \(x_1,x_2,\dots ,x_{j-1}\) causes a page fault for \(\textsc {Opt} _1\) but not for \(\textsc {Opt} _2\); let \(x_i\) be the first request with this property. \(\textsc {Opt} _1\) evicted this page in some preceding time step because, at that point, it was the cached page requested farthest in the future, namely in time step i. But since this eviction happened on the common prefix of \(I_1\) and \(I_2\) (that is, \(i\le j-1\)), \(\textsc {Opt} _2\) would have taken the same action.    \(\square \)

We now prove Theorem 1 using Lemma 1.

Proof

(of Theorem 1). Let \(\rho \in \mathcal {P} _\ell \) be the input distribution, and let \(I_1,I_2,\dots ,I_\ell \) denote the inputs that are chosen with non-zero probability, sorted in non-increasing order \(\rho (I_1)\ge \rho (I_2)\ge \cdots \ge \rho (I_\ell )\). Let \(\textsc {Opt} _i\) be the optimal (LFD) solution for \(I_i\), \(1\le i\le \ell \).

The algorithm \(\textsc {Alg} \) starts by simulating \(\textsc {Opt} _1\). If the actual input is \(I_1\), \(\textsc {Alg} \) is optimal. If, at some point, \(\textsc {Alg} \) realizes that the instance is not \(I_1\), it switches to \(\textsc {Opt} _2\): it replaces the cache content by the pages that would be in the cache of \(\textsc {Opt} _2\) at this moment, and continues as \(\textsc {Opt} _2\) (actually, since we are considering lazy algorithms exclusively, \(\textsc {Alg}\) only remembers this state and replaces the pages as needed). Let j be the time step in which \(I_1\) and \(I_2\) differ for the first time; thus, if \(\textsc {Alg}\) is not optimal on \(I_2\), the actions of \(\textsc {Opt} _1\) and \(\textsc {Opt} _2\) differ in some time step \(i<j\).

From Lemma 1 it follows that, after \(\textsc {Alg}\) has switched from \(\textsc {Opt} _1\) to \(\textsc {Opt} _2\), the number of page faults incurred so far equals that of \(\textsc {Opt} _2\) plus at most k additional faults needed to make the cache content consistent with that of \(\textsc {Opt} _2\).

This process can be iterated: \(\textsc {Alg}\) simulates \(\textsc {Opt} _2\) until it sees a difference in the input, switches to \(\textsc {Opt} _3\), and so on. After any switch to \(\textsc {Opt} _d\), the cost of \(\textsc {Alg}\) so far is bounded by the cost of \(\textsc {Opt} _d\) plus at most kd page faults needed to replace the cache after each switch. Overall, since there are at most \(\ell \) switches, we have

$$\begin{aligned} \mathrm {cost}(\textsc {Alg})\le \mathrm {cost}(\textsc {Opt})+k\ell . \end{aligned}$$

Since this inequality holds for any execution, (1) holds for \(\alpha =k\ell \), which finishes the proof.    \(\square \)
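
The switching strategy from the proof can be sketched as follows (our own rendering, reusing lfd_victim from Sect. 1): \(\textsc {Alg}\) keeps the candidates that are still consistent with the prefix seen so far and lazily follows LFD with respect to the most probable survivor, paying at most k extra faults per switch.

```python
def switching_alg(requests, k, candidates):
    """Sketch of the algorithm from the proof of Theorem 1.

    candidates: the inputs I_1, ..., I_l with non-zero probability,
                sorted by non-increasing probability; the actual
                input `requests` must be one of them.
    """
    alive = list(candidates)            # candidates consistent so far
    cache = set(range(1, k + 1))
    faults = 0
    for i, x in enumerate(requests):
        # Drop every candidate contradicted by the current request.
        alive = [I for I in alive if len(I) > i and I[i] == x]
        guess = alive[0]                # most probable surviving input
        if x not in cache:              # lazy: act only on a page fault
            faults += 1
            cache.remove(lfd_victim(cache, i, guess))
            cache.add(x)
    return faults
```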

The proof of the previous theorem charges a price of at most k for each switch of the algorithm \(\textsc {Alg}\). If we consider the strict competitive ratio, in which \(\alpha =0\) in (1), this price is too high, since for any known distribution there is an \(\mathcal {O}\mathopen {}\left( \log k\right) \)-competitive algorithm. In the following theorem, we present a more detailed accounting of the expected price of switching. In the proof, we again make use of Lemma 1.

Theorem 2

For any constant \(\ell \), there is an online paging algorithm for any known distribution from the class \(\mathcal {P} _\ell \) with an expected strict competitive ratio of at most \(\ln \ell +1\).

Proof

Let \(\rho \in \mathcal {P} _\ell \), and let \(\mathcal {I}:=\{I_1,I_2,\dots ,I_\ell \}\) be the instances that are chosen with non-zero probability. \(\textsc {Alg}\) computes the prefix tree T of \(\mathcal {I}\) with root r. Note that in T there is exactly one node for each distinct prefix of a request string in \(\mathcal {I}\), and any instance is represented by a path from r to a distinct leaf of T. Each node of T that is neither the root nor a leaf is labeled by its requested page, that is, the page that distinguishes its prefix from its predecessor’s prefix; the root and the leaves obtain an empty label. We define \(N^+(v)\) to be the out-neighborhood of a node v, that is, the children of v.

For each node v of T, we define a probability p(v) inductively. If v is a leaf, then there is exactly one instance \(I_v \in \mathcal {I} \) that corresponds to the path from r to v, and we set \(p(v) = \rho (I_v)\). If v is not a leaf but all vertices in \(N^+(v)\) have already been assigned probabilities, we set

$$\begin{aligned} p(v) = \sum _{w \in N^+(v)} p(w). \end{aligned}$$

Clearly, eventually all nodes are labeled and \(p(r)=1\).

We now identify a subgraph \(\mathcal {T}\) of T as follows. Initially, \(\mathcal {T}\) has all nodes of T but no arcs. Then, for each internal node v, we introduce exactly one arc (v, w), where

$$\begin{aligned} w = \mathop {{{\mathrm{arg\,max}}}}\limits _{w' \in N^+(v)}p(w'), \end{aligned}$$

breaking ties arbitrarily. Note that \(\mathcal {T}\) is a collection of directed paths (possibly of length 0) that end in leaves of T. For each node v, denote by \(P_v\) the suffix, starting at v, of the path in \(\mathcal {T}\) that contains v. The strategy of \(\textsc {Alg}\) is then to move within T according to the requests and to follow the LFD strategy of the instance defined by the labels along \(P_v\), where v is the currently visited node; note that the strategy may change after a request. For a leaf w, let \(\textsc {Opt} _w\) be the optimal LFD solution for the instance defined by the path from r to w.
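
For concreteness, T, p, and the heavy arcs of \(\mathcal {T}\) could be computed as follows (a sketch of ours that represents every node of T by its prefix and assumes that no instance is a proper prefix of another).

```python
from collections import defaultdict

def heavy_arcs(instances, rho):
    """Compute p(v) for every prefix v and, for every internal node,
    the arc (v, w) of the subgraph with w maximizing p(w)."""
    p = defaultdict(float)
    children = defaultdict(set)
    for I in instances:                  # instances as tuples, rho[I] > 0
        for t in range(len(I) + 1):
            p[I[:t]] += rho[I]           # leaf mass propagates to ancestors
            if t < len(I):
                children[I[:t]].add(I[:t + 1])
    # Keep, per internal node, the child of maximum probability (ties
    # broken arbitrarily by max); following these arcs from the current
    # node v spells out the path P_v.
    return {v: max(ws, key=lambda w: p[w]) for v, ws in children.items()}
```

To estimate the impact of strategy changes, we use the following claim, which follows by applying Lemma 1 to all pairs of vertices that are reachable from a given vertex.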

Claim. Let v be a node of T, and let S be the set of leaves reachable from v. Then the number of page faults on the prefix defined by the path from r to v is identical for all solutions \(\textsc {Opt} _w\) with \(w \in S\).

The claim allows us to estimate the cost of changing strategies. For each pair of leaves \(w,w'\), there is a vertex v at which the paths from r to w and from r to \(w'\) fork. There is a critical phase, ranging from the first time step in which \(\textsc {Opt} _w\) and \(\textsc {Opt} _{w'}\) differ until v, and there is some number \(\kappa \) of differences between the two solutions within this critical phase. Therefore, changing the strategy from \(\textsc {Opt} _w\) to \(\textsc {Opt} _{w'}\) causes at most \(\kappa \) page faults, and \(\kappa \le \mathrm {cost}(\textsc {Opt} _{w'})\). As a consequence, the number of different strategies used is an upper bound on the attained competitive ratio. In the remaining analysis, we give an upper bound on the expected number of strategy changes.

Let us fix an internal node v, that is, a node with \(|N^+(v)| > 0\). Furthermore, let w be the node that succeeds v on \(P_v\). For each \(w' \in N^+(v)\) we define

$$\begin{aligned} p'(w') = \frac{p(w')}{\sum _{w'' \in N^+(v)} p(w'')}. \end{aligned}$$

In other words, \(p'(w')\) is the probability that the path corresponding to the actual input continues with \(w'\), given that it contains v. Hence, the probability of changing the strategy after v is \(1 - p'(w)\). We now give an upper bound on the fraction of reachable leaves that are left after leaving v.

For any node \(w'\), let \(\text {leaves} (w')\) be the set of leaves reachable from \(w'\). Then, for each \(w' \in N^+(v)\), we define the fraction of leaves reachable from v via \(w'\) by

$$\begin{aligned} \gamma (v,w'):= \frac{|\text {leaves} (w')|}{\sum _{w'' \in N^+(v)}|\text {leaves} (w'')|}. \end{aligned}$$

Then the expected fraction of leaves left after leaving v is

$$\begin{aligned} \sum _{w' \in N^+(v)} p'(w') \gamma (v,w'). \end{aligned}$$

Since w maximizes \(p'\) among the children of v, this expression is maximized if \(\gamma (v,w)=1\) and \(\gamma (v,w')=0\) for all \(w' \ne w\); hence, the expected fraction of remaining leaves is at most \(p'(w)\).

Therefore, we obtain an upper bound on the total number of strategy changes if we consider an integer t and a sequence \((p_i)_{i=1}^t\) of probabilities such that

$$\begin{aligned} \sum _{i=1}^t (1-p_i) \end{aligned}$$

is maximized subject to

$$\begin{aligned} \prod _i p_i \ge 1/\ell . \end{aligned}$$

The meaning of the objective is that there are t nodes with probabilistic decisions. The probability of the selected strategy at the i-th node is \(p_i\), and thus there is a strategy change with probability \(1-p_i\). Intuitively, the maximum is attained when both the length t of the sequence is large and the probabilities of changing strategies are large. However, for increasing values of t, the constraint enforces that most of the \(1-p_i\) are small. The constraint stems from the fact that there is no leaf left once the fraction of remaining leaves is smaller than \(1/\ell \).

We claim that the maximum can be attained with \(p_i = p_{i'}\) for all pairs of indices \(i,i'\). Suppose towards a contradiction that there is no maximal solution with this property. Let \(p_1,p_2,\dots ,p_t\) be a solution attaining the maximum such that

$$\begin{aligned} \mu := \max _{i,i'} |p_i - p_{i'}| \end{aligned}$$

is minimal and among these solutions one where

$$\begin{aligned} |\{(i,i') \mid |p_i - p_{i'}| = \mu \}| \end{aligned}$$

is minimal. Let us fix two indices \(\hat{\imath },\hat{\imath }'\) such that \(|p_{\hat{\imath }} - p_{\hat{\imath }'}| = \mu \). Now let us consider the solution \(p'_1,p'_2,\dots ,p'_t\), where \(p'_{i} = p_{i}\) for all indices except \(\hat{\imath }\) and \(\hat{\imath }'\), and where

$$\begin{aligned} p'_{\hat{\imath }} = p'_{\hat{\imath }'} = \frac{p_{\hat{\imath }} + p_{\hat{\imath }'}}{2}. \end{aligned}$$

Clearly, we still have \(\max _{i,i'} |p'_i - p'_{i'}| \le \mu \). Also, the value of the objective function did not change since \(p_{\hat{\imath }} + p_{\hat{\imath }'} = p'_{\hat{\imath }} + p'_{\hat{\imath }'}\). To show that the constraint is still satisfied, we claim that

$$\begin{aligned} p_{\hat{\imath }}p_{\hat{\imath }'} \le p'_{\hat{\imath }}p'_{\hat{\imath }'}. \end{aligned}$$

By renaming the indices, we assume without loss of generality that \(p_{\hat{\imath }} \le p_{\hat{\imath }'}\). Then we have

$$\begin{aligned} p'_{\hat{\imath }}p'_{\hat{\imath }'}&= \left( \frac{p_{\hat{\imath }} + p_{\hat{\imath }'}}{2}\right) ^2\\&= \frac{p^2_{\hat{\imath }} +2p_{\hat{\imath }}p_{\hat{\imath }'} + p^2_{\hat{\imath }'}}{4}\\&= p_{\hat{\imath }}p_{\hat{\imath }'} + \frac{p^2_{\hat{\imath }}}{4} -\frac{p_{\hat{\imath }}(p_{\hat{\imath }}+\delta )}{2} + \frac{(p_{\hat{\imath }}+\delta )^2}{4}\\&= p_{\hat{\imath }}p_{\hat{\imath }'} + \frac{\delta ^2}{4} \end{aligned}$$

where \(\delta = p_{\hat{\imath }'} - p_{\hat{\imath }}\). Hence, the modified solution is still feasible and attains the maximum. Moreover, unless \(\delta = 0\),

$$\begin{aligned} |\{(i,i') \mid |p'_i - p'_{i'}| = \mu \}| < |\{(i,i') \mid |p_i - p_{i'}| = \mu \}|, \end{aligned}$$

which is a contradiction to the minimality of \(|\{(i,i') \mid |p_i - p_{i'}| = \mu \}|\). Hence, from now on we may assume that \(p_i=p\) for some probability p and all indices i. At the maximum, the constraint is tight, that is, \(p^t=1/\ell \), and we obtain

$$\begin{aligned} t = \log _p p^t = \log _p(1/\ell ) = \frac{\ln \ell }{\ln (1/p)}. \end{aligned}$$

As a consequence, we have to find

$$\begin{aligned} \max _p \frac{(1-p) \ln \ell }{\ln (1/p)}. \end{aligned}$$

We have

$$ \frac{\partial }{\partial p} \frac{(1-p) \ln \ell }{\ln (1/p)} = \ln \ell \frac{\ln p - 1 + 1/p}{\ln ^2 p}. $$

For \(p \in (0,1)\), the derivative is always positive since \(\ln x < x - 1\) for every \(x = 1/p > 1\), that is, \(\ln p - 1 + 1/p > 0\). Thus,

$$\begin{aligned} \frac{(1-p) \ln \ell }{\ln (1/p)} \end{aligned}$$

is monotonically increasing in p, and the supremum is approached as \(p \rightarrow 1\). On a high level, this means that t is large whereas the probability of a strategy change, \(1-p\), is small. Hence, the number of strategy changes is bounded from above by

$$ \lim _{p\rightarrow 1}\frac{(1-p) \ln \ell }{\ln (1/p)} = \lim _{p\rightarrow 1}\frac{-\ln \ell }{-1/p} = \ln \ell $$

where we used l’Hôpital’s rule. Together with our previous discussion, the statement of the theorem follows.    \(\square \)
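
As a quick numeric sanity check (ours, not part of the paper), the bound \((1-p)\ln \ell /\ln (1/p)\) is indeed increasing in p and approaches \(\ln \ell \) from below.

```python
import math

def change_bound(p, ell):
    """The bound (1 - p) * ln(ell) / ln(1/p) from the proof above."""
    return (1 - p) * math.log(ell) / math.log(1 / p)

# For ell = 16, ln(ell) is roughly 2.77; the bound approaches it:
for p in (0.5, 0.9, 0.99, 0.999):
    print(p, change_bound(p, 16))   # 2.00, 2.63, 2.76, 2.77 (rounded)
```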

Finally, we argue that the algorithm from Theorem 2 is in a sense best possible by proving the following lower bound.

Theorem 3

Any online algorithm \(\textsc {Alg}\) on the class \(\mathcal {P} _\ell \) with \(\ell \le k\) has an expected strict competitive ratio of at least \((\log \ell )/2\).

Proof

First, assume that \(\ell \) is a power of 2; without loss of generality, let \(\textsc {Alg}\) be a demand paging (that is, lazy) algorithm. We further assume that the cache is always organized such that the page indices of the cached pages are in increasing order; this is without loss of generality and keeps our arguments simple, as we do not have to argue about permutations of the cache cells. We describe a class \(\mathcal {I}\) of \(\ell \) instances, and the probability distribution \(\rho \) is the uniform distribution over \(\mathcal {I}\). Let us assume that initially the cache contains the pages \(1,2,\dots ,k\).

All instances in \(\mathcal {I}\) start by requesting a page \(k+1\), and a particular instance can be described by a number i such that, in \(I_i\), the optimal algorithm evicts page i from the cache in the first time step. For every \(I_i\), the optimal cost will be one, so this is the only page fault the optimal algorithm incurs. In each instance, the first request is followed by \(\log \ell \) rounds, and we show that any algorithm causes a page fault in every round with probability at least 1/2. Together with the one page fault in the first time step, this gives the expected number of faults, and thus also the expected competitive ratio.

Consider any algorithm \(\textsc {Alg}\) running on an instance \(I_i\). Every round starts and ends with a sequence requesting the pages \(\ell +1,\ell +2,\dots ,k\); if these pages are not in the cache of \(\textsc {Alg}\) at the beginning of the round, \(\textsc {Alg}\) makes a page fault with probability 1, and we are done. Hence, we can assume that \(\textsc {Alg} \) never evicts any of the pages \(\ell +1,\ell +2,\dots ,k\) from the cache. Let us call the pages \(1,2,\dots ,\ell \) active.

In round 2, the active pages are partitioned into two halves, and all pages of the half that does not contain i are requested consecutively (see Fig. 2). Clearly, \(\textsc {Alg}\) makes another page fault with probability at least 1/2. This procedure is applied recursively to the half that contains i until, in round \(\log \ell +1\), the two halves contain a single page each (one of which is page i). Moreover, in every round, all pages that were requested in the previous round are requested again.

Fig. 2. Example for \(k=\ell =16\); the optimal solution removes page 5 in time step one, when page 17 is requested; after that, there are 4 rounds that each consist of requests that were in the cache of \(\textsc {Opt}\) at the beginning

It follows that, in each round r, \(2\le r\le \log \ell +1\), \(\textsc {Alg}\) makes a page fault with probability at least 1/2 (note that the corresponding events are all independent), and it makes an additional page fault in round 1 with probability 1. At the same time, \(\textsc {Opt}\) makes exactly one page fault in total. Summing up, the expected ratio of the costs is at least \((\log \ell )/2+1\).

Finally, assume that \(\ell \) is not a power of 2, and let \(\ell '\) denote the largest power of 2 that is smaller than \(\ell \). We follow exactly the same strategy as above with \(\ell '\) instead of \(\ell \); in particular, we request all pages \(\ell '+1,\ell '+2,\dots ,k\) in every round. By definition, we have \(\ell '>\ell /2\); thus, there is at most one round fewer, and consequently the lower bound decreases by at most 1.    \(\square \)
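
The construction from the proof can be summarized by the following instance generator (a sketch of ours for the case that \(\ell \) is a power of 2; the variable names are not from the paper). For \(k=\ell =16\) and \(i=5\), it reproduces the structure of Fig. 2.

```python
def hard_instance(i, ell, k):
    """Adversarial instance I_i from the proof of Theorem 3.

    Requires: ell a power of two, ell <= k, and 1 <= i <= ell.
    Opt evicts page i on the first request (page k + 1) and never
    faults again; page i itself is never requested.
    """
    safe = list(range(ell + 1, k + 1))  # pages Alg must not evict
    requests = [k + 1]                  # round 1: the only fault of Opt
    revealed = []                       # active pages requested so far
    lo, hi = 1, ell                     # candidate interval containing i
    while lo < hi:                      # log(ell) halving rounds
        mid = (lo + hi) // 2
        if i <= mid:                    # request the half without i ...
            revealed += range(mid + 1, hi + 1)
            hi = mid
        else:
            revealed += range(lo, mid + 1)
            lo = mid + 1
        # ... and re-request everything seen before, framed by the
        # safe pages at the start and the end of the round.
        requests += safe + revealed + safe
    return requests
```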

4 Conclusion

We studied paging against a known distribution. On the one hand, there are distributions for which no algorithm can perform better than \(\mathrm {\Omega }\mathopen {}\left( \log k\right) \)-competitive; on the other hand, for a point mass distribution, an easy optimal algorithm exists. We addressed the general question of characterizing distributions in terms of the complexity of paging algorithms for them. We showed that, if the distribution has at most \(\ell \) inputs that are chosen with non-zero probability, there is an \((\ln \ell +1)\)-competitive online algorithm. Complementing this, by constructing a class of hard instances, we showed that this bound is tight up to a small factor when \(\ell \le k\). If the additive constant \(\alpha \) from the definition of the competitive ratio is allowed to be positive, there is a simple 1-competitive online algorithm with \(\alpha =k\ell \).