Register Loading via Linear Programming

Calinescu, Gruia; Li, Minming

doi:10.1007/s00453-014-9888-2

Register Loading via Linear Programming

Published: 30 May 2014

Volume 72, pages 1011–1032, (2015)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Algorithmica Aims and scope Submit manuscript

Register Loading via Linear Programming

Download PDF

Gruia Calinescu¹ &
Minming Li²

234 Accesses
1 Citation
Explore all metrics

Abstract

We study the following optimization problem. The input is a number $k$ and a directed graph with a specified “start” vertex, each of whose vertices may have one “memory bank requirement”, an integer. There are $k$ “registers”, labeled $1, \ldots , k$. A valid solution associates to the vertices with no bank requirement one or more “load instructions” $L[b,j]$, for bank $b$ and register $j$, such that every directed trail from the start vertex to some vertex with bank requirement $c$ contains a vertex $u$ that has been associated $L[c,i]$ (for some register $i \le k$) and no vertex following $u$ in the trail has been associated an $L[b,i]$, for any other bank $b$. The objective is to minimize the total number of associated load instructions. We give a $k(k+1)$-approximation algorithm based on linear programming rounding, with $(k+1)$ being the best possible unless Vertex Cover has approximation $2 - {\epsilon }$ for ${\epsilon }> 0$. We also present a $O(k \log n)$ approximation, with $n$ being the number of vertices in the input directed graph. Based on the same linear program, another rounding method outputs a valid solution with objective at most $2k$ times the optimum for $k$ registers, using $2k-1$ registers.

Optimal Register Allocation in Polynomial Time

Register Allocation

Alternating Automatic Register Machines

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Partitioned memory architecture is common in 8-bit microcontrollers. For example, Freescale [12] 68HC11 8-bit microcontrollers allow multiple 64 KB memory banks to be accessed by their 16-bit address registers with only one bank being active at a time. Zilog [31] Z80 also addresses a maximum of 64 KB memory using 16-bit address registers. Other examples include Intel 8051 processor family and MOS technology 6502 series microcontrollers. For embedded systems using these 8-bit microcontrollers, how to insert bank selection instructions (or load instructions) to minimize the code size is an important research topic. Given a Control Flow Graph, where each node represents some code block, we can insert bank selection instructions (loading a bank index to a register) either immediately before the block or immediately after the block to activate some memory bank. We want to use the minimum number of bank selection instructions to guarantee that no matter from which path the program enters the current code block, the memory bank which contains this block is active at this moment. Formal definitions are in the next subsection.

Note that we are not optimizing the run-time of the program (in which case the problem would resemble caching [25]) nor the number of registers needed for a program (as in Thorup [27], Kannan and Proebsting [18], and Jansen and Reiter [16]). An early related work is [15]. More recent work appears in [24], while [6, 19–21, 26] and many other papers deal with practical issues of NP-hard variants of register allocation. Also, as opposed to most theoretical work, we do not assume any structure (such as low treewidth) for the input graph. Most related to our model are the “spill heuristics” discussed in [3–5, 9, 11, 28], but as the name suggests we do not know of any previous approximation algorithms. Here “spill” means putting some variables in the RAM instead of registers and the aim of those heuristics is to minimize the number of variables “spilled”.

So, while our problem resembles register allocation, it differs in the following ways. We do not force a bank with a “live-range” to stay in the same register but allow the register to change content over time. We also have the restriction that a bank variable cannot be stored in RAM when it is to be visited (it must come back to a register in time, which is different from the Register Allocation problem where a variable can be spilled). In the new setting, our goal is to minimize the total number of content switching instructions for registers inserted into the program.

We organize the remaining of the paper as follows. Section 1.1 gives the original problem formulation $k$-OBSIM together with the more pure version $k$-BSIM. Section 1.2 describes previous and new results and further discussion. In Sect. 2, we show a reduction to prove the hardness of $k$-OBSIM and also show that the $k$-OBSIM problem can be transformed to the $k$-BSIM problem and hence we will focus on solving $k$-BSIM. Section 3 presents our integer linear program for $1$-BSIM and the rounding procedure giving the $2$-approximation. We also discuss derandomization in Sect. 3.2, using linear programming duality to reduce the running time of the rounding procedure. Section 4 presents the generalization of the approximation results from $1$-BSIM to $k$-BSIM, as well as the two other approximation results mentioned in the abstract. We conclude in Sect. 5.

1.1 Problem Formulation

Starting directly from embedded systems, we obtain the Original $k$-Bank Selection Instruction Minimization problem ($k$-OBSIM), defined as follows: the input is a number $k$ and a directed graph, called the control flow graph (CFG), with a specified “start” vertex, and for each vertex we have at most one “memory bank requirement”, an integer. The vertices of the CFG correspond to blocks of code in an embedded system, and arcs represent possible jumps in the code. Many embedded systems use partitioned memory architecture, and program variables are stored in “banks” that must be activated by storing its index in registers before use. For illustration convenience, we simply say a bank is stored in a register when the bank is activated. A vertex with no bank requirement is called a transparent vertex, and a vertex with one bank requirement is called a required vertex. There are $k$ registers, labeled $1, \ldots , k$. Let $B$ be the set of possible banks.

Each vertex of the CFG may “load” a bank (use a bank load instruction), either at the “entrance” or at the “exit” of the vertex (or both). We write $L_{in}^u [b,j]$ for loading bank $b$ in register $j$ at the entrance of vertex $u$, and $L_{out}^u [b,j]$ for loading at the exit. Although loading is also allowed on the arcs of the CFG, we prefer to subdivide such arcs with transparent vertices to keep the problem description simpler. Also, when $k=1$, a small proof shows that any load on an arc can be done instead at the entry of the head of the arc, resulting in another feasible solution not worse in the number of load instructions.

For a trail (directed path, not necessarily simple) $P$, let ${\widehat{P}}$ denote the set of interior vertices of $P$, obtained as follows: from the sequence of vertices of $P$, remove the first and last vertex, and then eliminate duplicates. Note that the start or end of $P$ may appear in ${\widehat{P}}$. If for trail $P$ we have vertex $w \in {\widehat{P}}$, let $P[w]$ be the subtrail of $P$ from the last occurrence of $w$ in $P$ to the end vertex of $P$. Let $s$ be the start vertex of the CFG. In a feasible solution, bank load instructions must be associated to CFG vertices such that, for any trail $P$ from $s$ to some node $v$ that has a bank requirement $b$, $P$ has a vertex $w$ (which may be $v$) and a register $j$ for some $j \le k$ such that one of the following holds:

1.
$w=v$, and we have $L_{in}^w [b,j]$ but no $L_{in}^w [c,j]$ for any $c \in B$ with $c \ne b$
2.
$w$ has $L_{out}^w [b,j]$, and for no $c \in B$ with $c \ne b$ there is $L_{in}^v [c,j]$ or $L_{out}^w [c,j]$ or a vertex $u$ of ${\widehat{P[w]}}$ with either $L_{out}^u [c,j]$ or $L_{in}^u [c,j]$
3.
$w$ has $L_{in}^w [b,j]$, and for no $c \in B$ with $c \ne b$ there is $L_{in}^v [c,j]$ or $L_{in}^w [c,j]$ or $L_{out}^w [c,j]$ or a vertex $u$ of ${\widehat{P[w]}}$ with either $L_{out}^u [c,j]$ or $L_{in}^u [c,j]$.

We call such a trail fulfilled. See Figs. 1, 2, and 3 for examples of feasible solutions. In other words, in a trail from the start vertex to some vertex $v$ requiring bank $b$, bank $b$ is always loaded in some register and there are no other bank loads over $b$ further on the trail. The objective is to minimize the total number of bank load instructions (as to keep the embedded code as short as possible). See the right example in Fig. 1 for a scenario where simple paths can be fulfilled but there exists an unfulfilled trail.

We prefer to work with a slightly more pure problem, called $k$-BSIM. The input is a number $k$ and a directed graph with a specified “start” vertex, each of whose vertices may have one “memory bank requirement”, an integer. There are $k$ “registers”, labeled $1,\ldots , k$. A valid solution associates to the vertices with no bank requirement one or more “load instructions” $L[b,j]$, for bank $b$ and register $j$, such that every directed trail from the start vertex to some vertex with bank requirement $c$ contains a vertex $u$ that has been associated $L[c,i]$ (for some register $i \le k$) and no vertex following $u$ in the trail has been associated an $L[b,i]$, for any other bank $b$. The objective is to minimize the total number of associated load instructions.

The difference between $k$-OBSIM and $k$-BSIM is that in $k$-BSIM, “load instructions” can only be added inside nodes with no bank requirement (transparent nodes) while in $k$-OBSIM, “load instructions” can be added before and after any node.

1.2 Results and Discussion

The paper [22] studies $1$-OBSIM without transparent nodes, showing NP-hardness and a $2$-approximation algorithm. We generalize this result, by giving a $k(k+1)$-approximation algorithm for $k$-OBSIM (transparent nodes also allowed) based on linear programming rounding. In a personal communication, Yuan Zhou [30] claimed that $1$-OBSIM without transparent nodes does not have a $2-{\epsilon }$ approximation algorithm unless vertex cover has a $2-{\epsilon }$ approximation algorithm (and it is believed that such an algorithm does not exist). We also present such a reduction, and generalize it to show that $k$-OBSIM without transparent nodes does not have a $\alpha - {\epsilon }$ approximation algorithm unless $(k+1)$-uniform Hypergraph Vertex Cover has such an algorithm. It is known that it is NP-hard to approximate Hypergraph Vertex Cover in a $r$-uniform-hypergraph to within a factor of $(r-1-{\epsilon })$ [10], and it is believed that an approximation ratio of $r$ is the best a polynomial-time algorithm can do. Thus it is NP-hard to approximate $k$-OBSIM ($k > 2$) within a factor of $k-{\epsilon }$.

The results in [22] only work for acyclic CFGs with no transparent nodes, while all our approximation algorithms work for arbitrary CFGs (with cycles and transparent nodes) by adding an essential new constraint and also using a more sophisticated rounding analysis, which is akin to how node-multiway-cut [13] generalizes Vertex Cover. For $k=1$, the existence of transparent nodes poses a serious challenge, since given a transparent node $v$, solutions without a “load instruction” at $v$ may exist, and for such a solution, different program flows (trails) going through $v$ can have different banks being active when leaving $v$. However, for a node with a bank requirement, no matter which flow the program goes through, the active bank will be the same after going through this node. For $k>1$, there are similar issues.

Based on the same linear program, we also present a $O(k \log n)$-approximation, with $n$ being the number of vertices in the input directed graph, for $k$-BSIM. Another rounding method outputs a valid solution with objective at most $2k$ times the optimum for $k$ registers, however using $2k-1$ registers instead of $k$ registers. The linear program contains one “clever” constraint which makes it similar, for $k=1$, to the linear program used by Garg et al. [13] to obtain a $2$-approximation for Node Weighted Multiway Cut. We discussed earlier the hardness of $k$-BSIM, obtained from the hardness of $k$-uniform Hypergraph Vertex Cover. Informally, $k$-BSIM also inherits some hardness from $k$-Coloring (as in the register allocation papers [27] and [18], with the latter using, as one of our algorithms above, $2k$ instead of $k$ registers) and we see intuitive connections to Directed Steiner Tree [7, 29, 32] and Multicut in directed graphs [2, 8, 14].

2 Reductions

A reduction from Vertex Cover to $1$-OBSIM without transparent nodes was announced by Yuan Zhou [30]. We show directly how $(k+1)$-uniform Hypergraph Vertex Cover reduces to $k$-OBSIM without transparent nodes.

$r$-uniform Hypergraph Vertex Cover is the following problem: The input $H = (V,E)$ is a $r$-uniform hypergraph; that is each hyperedge $e \in E$ is a subset of $V$ of size $r$. A set $C$ of vertices is said to cover a hyperedge $e$ if $e \cap C \ne \emptyset $, and is said to be a vertex cover if it covers all hyperedges. The objective is to find a minimum size vertex cover. Vertex cover is $2$-uniform Hypergraph Vertex Cover.

Please refer to Fig. 4 for an illustration. Given a $(k+1)$-uniform Hypergraph Vertex Cover instance $G=(V,E)$ (with $|V| = n$ and $|E| = m$), construct the CFG as follows: add to its vertices many (say, $k m^2$) copies of each vertex of $v \in V$, all with required bank $v$, creating groups with vertices in different groups having different requirements. For each hyperedge $e \in E$, add to the CFG a vertex with bank $0 \not \in V$, and put an arc from this vertex to all the copies of the vertices of $V$ included in $e$. Add to the CFG the start vertex $s$ and put arcs from it to all the CFG-vertices obtained from hyperedges of $E$. Call this $k$-OBSIM instance $I$.

We claim that if $G$ has a vertex cover $C$ of size $q$, we can find a feasible solution for $I$ that uses at most $1 + km + qk m^2$ bank load instructions: (Fig. 5) all the $qk m^2$ copies of all the vertices in $C$ will load their respective requirement at the entry, all the vertices of $I$ obtained from some $e \in E$ will load, at the exit, the requirement of the vertices in $ e \setminus C$ (there are at most $k$ such vertices since $e$ has cardinality $(k+1)$ and at least one vertex of $C$ is contained in the set $e$). $s$ loads bank $0$ at the exit.

Moreover, if $I$ has a solution with less than $qk m^2$ bank load instructions, then $G$ has a vertex cover $Q$ of size less than $q$: put in $Q$ a vertex $v$ if all the $k m^2$ copies of $v$ load their requirement. Then $|Q| < q$ and it remains to prove that all the hyperedges of $G$ are covered by $Q$. Indeed, if hyperedge $e \in E$ is not covered by $Q$, then for every $u \in e$, there is a copy of $u$ with no bank load instruction; however then no matter what $k$ load instructions we associate with the vertex of $I$ corresponding to $e$ or with the start vertex, we do not obtain a feasible solution for $I$: since $|e| > k$ there will be a vertex $w \in e$ such that bank $w$ is not loaded after leaving the vertex of $I$ corresponding to $e$, and since also one of $w$’s copy (call it $w'$) in $I$ does not have an associated load instruction at its entry, the trail (here, a simple path) $s,e,w'$ is not fulfilled.

In conclusion, if $G$ has optimum vertex cover of size $q$, $I$ has optimum between $q k m^2$ and $q k m^2 + k m +1$, and thus $k$-OBSIM cannot be approximated with ratio $k-{\epsilon }$ unless $P = NP$ (using [10]) and $1$-OBSIM cannot be approximated with ratio $2-{\epsilon }$ unless Vertex Cover can be approximated with ratio $2 - {\epsilon }$.

We continue by showing how $k$-OBSIM reduces to $k$-BSIM (with transparent nodes), a problem easier to describe. Given an instance of $k$-OBSIM, for every node $v$ in CFG with bank requirement $b$, add a transparent node $v_{in}^t$ which takes in all the incoming arcs of $v$ and has one arc to $v$, thus $v$ has exactly one incoming arc. Also add a transparent node $v_{out}^t$ which sends out all the outgoing arcs of $v$, and has one arc from $v$, thus $v$ has exactly one outgoing arc. If for the $k$-OBSIM instance, there is a load operation at the entrance of some vertex $v$ with bank requirement $b$, then in the transformed $k$-BSIM instance, we do the same load to node $v_{in}^t$; if there is a load operation at the exit of some vertex $v$ with bank requirement $b$, then in the transformed $k$-BSIM instance, we do the same load to node $v_{out}^t$. Thus, with the above correspondence, a feasible solution for the $k$-OBSIM instance can be changed to a feasible solution for the transformed $k$-BSIM instance. Also, it is easy to see that a feasible solution for the transformed $k$-BSIM instance can be changed to a feasible solution for the original $k$-OBSIM instance, without an increase in the objective function.

3 1-BSIM

Remove nodes from the CFG such that every node is reachable from the start vertex $s$. We again do a similar transformation for the given OBSIM instance. The linear program obtained later after this transformation is more intuitive (but this reduction only works for $k=1$).

Create a new transparent start vertex, $s'$, with exactly one arc, outgoing to the original $s$. For every node $v$ in CFG with bank requirement $b$, split $v$ in two nodes, $v_{in}$ with all the incoming arcs of $v$, and $v_{out}$ with all the outgoing arcs of $v$; both have requirement $b$. This operation is illustrated in Fig. 6. Note that we do not have an arc from $v_{in}$ to $v_{out}$. Moreover, for $v_{in}$, add a transparent node which takes in all the incoming arcs of $v_{in}$ and has one arc to $v_{in}$, thus $v_{in}$ has exactly one incoming arc. Also, for $v_{out}$, add a transparent node which sends out all the outgoing arcs of $v_{out}$, and has one arc from $v_{out}$, thus $v_{out}$ has exactly one outgoing arc. We now insist that all load instructions are done at transparent nodes. Figure 7 gives an example.

Call the resulting directed graph $G=(V,E)$. Let $R^I \subset V$ (lowest level in Fig. 7) contain all the required nodes with one incoming arc each (that is, the $v_{in}$ nodes), let $R^O \subset V$ (highest level in Fig. 7) contain $s'$ and the required nodes with one outgoing arc each (that is, the $v_{out}$ nodes), and let $F$ be the set of transparent nodes other than $s'$ (in Fig. 7, these nodes are not on the highest or lowest levels). For $a \in B$, let $R^I_a$ be the subset of $R^I$ with requirement $a$, and $R^O_a$ be the subset of $R^O$ with requirement $a$. In $G$, we insist that for every bank $a \in B$ and every vertex $v \in R^I_a$, every trail ending in $v$ and starting at a vertex of $R^O \setminus R^O_a$ contains a vertex $u \in F$ loading bank $a$, and no load instructions after $u$. Call BSIM this new problem. One can check that a $1$-OBSIM feasible solution for the original instance corresponds to a BSIM feasible solution to the constructed instance, with the same number of load instructions.

For $v \in V$ and $a \in B$, let $\mathcal{{T}}^v_a$ be the (possibly infinite) set of trails of $G$ from $v$ to some node of $R^I_a$, and let $\mathcal{{P}}^v_a$ be the set of simple paths of $G$ from $v$ to some node of $R^I_a$. Write the following integer linear program (IP1), with variables $x^v_b$ for every node $v \in F$ and bank requirement $b \in B$ ($x^v_b$ in the IP would be 1 if node $v \in F$ loads bank $b$), and variables $d^v_b$ for every node $v \in \left( F \cup R^O \right) $ and bank requirement $b \in B$ ($d^v_b$ in the IP would be 1 if either $\mathcal{{P}}^v_b = \emptyset $ or, for any simple path $P \in \mathcal{{P}}^v_b$, ${\widehat{P}}$ contains at least one node that loads bank $b$). Note that $\mathcal{{P}}^v_b = \emptyset $ iff $\mathcal{{T}}^v_b = \emptyset $, and that if any simple path $P \in \mathcal{{P}}^v_b$, ${\widehat{P}}$ contains at least one node that loads bank $b$, then for any trail $T \in \mathcal {{T}}^v_b$, ${\widehat{T}}$ contains at least one node that loads bank $b$. Also note that if, for some $v \in \left( F \cup R^O \right) $, we have that for any simple path $P \in \mathcal{{P}}^v_b$, ${\widehat{P}}$ contains at least one node that loads bank $b$, it does not necessarily follow that bank $b$ “arrives loaded at the destination” on such a path since it is not mentioned that another bank is not loaded “over $b$” later on the path.

$$\begin{aligned}&\min \sum _{v \in F, b \in B} x^v_b \nonumber \\ \text{ subject } \text{ to }&\sum _{b \in B} x^v_b \le 1 \; \; \forall v \in F \end{aligned}$$

(1)

$$\begin{aligned}&d^u_a \ge 1 \; \;\quad \forall a \in B \wedge \forall u \in \left( R^O \setminus R^O_a \right) \end{aligned}$$

(2)

$$\begin{aligned}&d^v_a \ge x^v_b \; \;\quad \forall a \ne b \in B \wedge \forall v \in F \end{aligned}$$

(3)

$$\begin{aligned}&d^u_a \le d^v_a \!+\! x^v_a \qquad \forall a \in B \wedge \forall u \in \left( F \cup R^O\right) \wedge \forall v \in F \text{ such } \text{ that } uv \in E\nonumber \\ \end{aligned}$$

(4)

$$\begin{aligned}&d^u_a = 0 \qquad \forall a \in B \wedge \forall u \in F \text{ such } \text{ that } \exists v \in R^I_a \text{ such } \text{ that } uv \in E \end{aligned}$$

(5)

$$\begin{aligned}&d^u_a + d^u_b \ge 1 \; \;\quad \forall a \ne b \in B \wedge \forall u \in F \end{aligned}$$

(6)

$$\begin{aligned}&x^v_a \ge 0 \; \; \quad \forall v \in F \wedge \forall a \in B \end{aligned}$$

(7)

$$\begin{aligned}&d^v_a \ge 0 \; \;\quad \forall v \in \left( F \cup R^O\right) \wedge \forall a \in B \end{aligned}$$

(8)

$$\begin{aligned}&x^v_a, d^v_a \in {\mathbb {Z}}\quad \forall v \in V \wedge \forall a \in B \end{aligned}$$

(9)

See an example in Fig. 8. We argue the fact that any IP solution obtained from a BSIM solution satisfies all these constraints, and that we can construct a valid BSIM solution from any IP solution. It is rather obvious the objective function matches. Constraint (1) enforces only one load per vertex of $F$. Constraint (2) enforces the condition that for every bank $a \in B$ and every vertex $v \in R^I_a$, every simple path (and also every trail) ending in $v$ and starting at a vertex of $R^O \setminus R^O_a$ contains a transparent vertex loading bank $a$; it does not guarantee however no load instructions after $u$. This is done by Constraint (3), which enforces the following observation: if bank $b$ is loaded in vertex $v$, then for any simple path (and also every trail) from $v$ to a vertex requiring bank $a$, there must be at least one load of bank $a$.

Constraint (4) enforces the following: if for bank $a$ and vertices $u,v$ with $uv \in E$, we have $\mathcal{{P}}^v_a \ne \emptyset $ and there exists a simple path $P \in \mathcal{{P}}^v_a$ such that ${\widehat{P}}$ contains no node that loads bank $a$, and $v$ also does not load $a$, then $\mathcal{{P}}^u_a \ne \emptyset $ and there exists a path $P' \in \mathcal{{P}}^u_a$ (namely, shortcut if needed the trail starting with arc $uv$ followed by $P$) such that ${\widehat{P'}}$ contains no node that loads bank $a$. Constraint (5) means that if $v \in R^I_a$ and $uv \in E$, then $\mathcal{{P}}^u_a \ne \emptyset $ and there exists a trail $P' \in \mathcal{{P}}^u_a$ (namely, arc $uv$) such that ${\widehat{P'}}$ contains no node that loads bank $a$.

The trickier to verify constraint is (6), which indeed holds for integer solutions as, if for vertex $v$ and banks $a\ne b$, $\mathcal{{P}}^v_a$ and $\mathcal{{P}}^v_b$ are both non-empty, then no matter if or what bank is loaded in $v$ or in any other free vertex, either we must have that every simple path $P \in \mathcal{{P}}^v_a$ satisfies that ${\widehat{P}}$ contains at least one node that loads bank $a$, or we must have that every simple path $P \in \mathcal{{P}}^v_b$ satisfies that ${\widehat{P}}$ contains at least one node that loads bank $b$. Indeed, if there is a simple path $P \in \mathcal{{P}}^v_a$ with ${\widehat{P}}$ not loading $a$, then we must have that either $v$ loads $a$, or all the simple paths from $R^O$ to $v$ load $a$ or are coming from $R^O_a$ (and a trail from $R^O$ to $v$ must exist since we assumed every vertex of the CFG is reachable from $s$). Thus if such a $P$ exists, we must have that every simple path $P' \in \mathcal{{P}}^v_b$ satisfies that ${\widehat{P'}}$ contains at least one node that loads bank $b$. It is the crucial (and clever) Constraint (6) that allows good approximation algorithms.

Now, given an IP1 feasible solution, it remains to argue that loading bank $b$ at vertex $u$ whenever $x^u_b=1$ gives a feasible BSIM solution. Indeed, let $a \in B$, $v \in R^I_a$ and $P$ be a trail from some $w \in \left( R^O \setminus R^O_a \right) $ to $v$. Constraints (2),(4), and (5) ensure that at least one vertex $u$ of ${\widehat{P}}$ has $x^u_b = 1$. Pick $z$ to be the last such vertex of ${\widehat{P}}$. If any vertex $y$ following $z$ on ${\widehat{P}}$ has $x^y_a=1$ then Constraint (3) ensures $d^y_b=1$ and therefore another vertex $v$ of ${\widehat{P}}$, following $y$, has $x^v_b=1$, contradicting the selection of $z$.

3.1 LP Rounding

Let LP1 be the linear programming relaxation of IP1, which can be solved in polynomial time. See Fig. 9 for an example of a fractional solution. Let ${\bar{x}}^v_a,{\bar{d}}^v_a$ be an optimum LP1 solution. Pick uniformly at random a real number $\delta \in (0,1/2)$. Set (for all possible $v,a$) $x^v_a = 1$ iff ${\bar{d}}^v_a < \delta \le {\bar{d}}^v_a + {\bar{x}}^v_a$. Set (for all possible $v,a$) $d^v_a = 1$ iff $\mathcal{{P}}^v_a = \emptyset $ or any path $P$ in $\mathcal{{P}}^v_a$ has some $u \in {\widehat{P}}$ with $x^u_a = 1$ (this can be achieved by Breadth First Search). It is immediate that $Pr[x^v_a = 1] \le 2 {\bar{x}}^v_a$, and thus we have a 2-approximation, provided we prove that for any such $\delta $, we get a valid $IP$ solution.

Lemma 1

For any $\delta \in (0,1/2)$, and for any $v \in (F \cup R^O)$ and $b \in B$, if ${\bar{d}}^v_b \ge 1/2$ and $\mathcal{{P}}^v_b \ne \emptyset $, then any simple path $P \in \mathcal{{P}}^v_b $ has a vertex $z \ne v$ with $x^z_b = 1$ (in other words, bank load $b$ at CFG vertex $z$).

Proof

Let $P$ be such a simple path from $v$ to some $w \in R^I_b$. Note that the vertex $y$ before $w$ in $P$ has ${\bar{d}}^y_b = 0$. Therefore $P$ must have consecutive vertices $u$ and $u'$ such that ${\bar{d}}^{u'}_b < \delta $ and ${\bar{d}}^u_b \ge \delta $; here $u$ may be $v$. Note that $u' \in F$. Constraint (4) also gives ${\bar{d}}^{u'}_b + {\bar{x}}^{u'}_b \ge {\bar{d}}^u_b \ge \delta $, and therefore $x^{u'}_b$ is set to $1$ by the algorithm. The lemma holds with $z = u'$. $\square $

Now we check the feasibility of all constraints. For Constraint (1), note that for $a \ne b \in B$ and $v \in F$, in order to have both $x^v_a$ and $x^v_b$ be made $1$, we must have ${\bar{d}}^v_a < 1/2$ and ${\bar{d}}^v_b < 1/2$, leading to ${\bar{d}}$ violating Constraint (6).

Constraint (2), for $a \in B$ and $u \in R^O \setminus R^O_a$ follows from ${\bar{d}}^u_a \ge 1$ and the lemma above (if $\mathcal{{P}}^u_a \ne \emptyset $), or the way we set all $d^v_a = 1$ above if $\mathcal{{P}}^v_a = \emptyset $.

Constraint (3), for $ a \ne b \in B$ and $v \in F$ follows from the following argument: if $x^v_b = 1$, then ${\bar{d}}^v_b < 1/2$, and therefore by ${\bar{d}}$ satisfying Constraint (6), ${\bar{d}}^v_a \ge 1/2$. Therefore, by the lemma above applied to $v$ and $a$, we set $d^v_a = 1$ whether $\mathcal{{P}}^v_a = \emptyset $ or not.

Constraint (4), for $a \in B$ and $uv \in E$, follows from the way $d$ was constructed: if both $d^v_a = 0$ and $x^v_a = 0$, then $\mathcal{{P}}^u_a \ne \emptyset $ since $\mathcal{{P}}^v_a \ne \emptyset $, and there is a simple path $P \in \mathcal{{P}}^u_a$ such that, for all $z \in {\widehat{P}}$, $x^z_a = 0$: shortcut, if needed, the trail that starts with arc $uv$ and then the simple path $P' \in \mathcal{{P}}^v_a$ with, for all $z \in {\widehat{P}}'$, $x^z_a = 0$; the existence of $P$ implies $d^u_a =0$. If $d^v_a \ne 0$ and $x^v_a \ne 0$, then $d^v_a =1 $ or $x^v_a = 1$, so Constraint (4) is satisfied.

Constraint (5), for $a \in B$ and $u \in F$ such that there exists $uv \in E$ with $v \in R^I_a$ is also satisfied since $\mathcal{{P}}^u_a \ne \emptyset $ and the simple path with its only arc $uv$ has no interior.

Constraint (6), for $a \ne b \in B$ and $u \in F$ follows as follows: either ${\bar{d}}^u_a \ge 1/2$ or ${\bar{d}}^u_b \ge 1/2$, and the lemma above ensures that the one at least $1/2$ becomes $1$. Constraints (7), (8), and (9) are immediate.

3.2 Derandomization

Note that only a polynomial number of values of $\delta $ must be tried, so derandomization is immediate. We go further, and write a relaxation of LP1, and its dual and use complementary slackness to show that every value of $\delta $ gives a $2$-approximation, as it also happens for the linear program of Garg, Vazirani, and Yannakakis [13]. Let LP1’ be the variant of LP1 without constraints (1) and (3); LP1’ is a relaxation of BSIM and thus any BSIM feasible solution within $2$ of the optimum of LP1’ is a $2$-approximation. Note that we do not claim that the integral version of LP1’ is equivalent to the original BSIM instance.

The following linear program with exponentially many constraints (as we only consider simple paths) can be seen to be equivalent to (and solved by) LP1’.

$$\begin{aligned}&\min \sum _{v \in F, b \in B} x^v_b \nonumber \\ \text{ subject } \text{ to } \sum _{v \in {\widehat{P}}} x^v_a \ge 1 \; \;&\forall a \in B \wedge \forall u \in \left( R^O \setminus R^O_a \right) \wedge \forall P \in \mathcal{{P}}^u_a \end{aligned}$$

(10)

$$\begin{aligned} \sum _{v \in {\widehat{P}}_1} x^v_a + \sum _{v \in {\widehat{P}}_2} x^v_b \ge 1 \; \;&\forall a < b \in B \wedge \forall u \in F \wedge \forall P_1 \in \mathcal{{P}}^u_a \wedge \forall P_2 \in \mathcal{{P}}^u_b \end{aligned}$$

(11)

$$\begin{aligned} x^v_a \ge 0 \; \;&\forall v \in F \wedge a \in B \end{aligned}$$

(12)

To see the equivalence, use the same values $x^v_a$, and set in LP1’: $d^u_a = \min _{P \in \mathcal{{P}}^u_a} \sum _{v \in {\widehat{P}}} x^v_a$.

The program above is in fact a covering linear program and combinatorial $1+{\epsilon }$ approximations also exist [17, 23]; however the method below requires an optimum, and one would need to try all $\delta $ if only an approximate linear programming optimum is given.

The dual of the above program, given below, has variables $\alpha _P$ for all $a \in B$ and for all $u \in \left( R^O \setminus R^O_a \right) $ and for all $P \in \mathcal{{P}}^u_a$, and variables $\beta _{P_1,P_2}$ (the order of the paths matter) for all $a < b \in B$, all $u \in F$, all $P_1 \in \mathcal{{P}}^u_a$ and all $P_2 \in \mathcal{{P}}^u_b$.

$$\begin{aligned}&\max \sum _{a \in B} \sum _{u \in \left( R^O \setminus R^O_a \right) } \sum _{P \in \mathcal{{P}}^u_a} \alpha _P + \sum _{a \in B} \sum _{b \ne a \in B} \sum _{u \in F} \sum _{P_1 \in \mathcal{{P}}^u_a} \sum _{P_2 \in \mathcal{{P}}^u_b} \beta _{P_1,P_2}\nonumber \\&\text{ subject } \text{ to } \sum _{u \in \left( R^O \setminus R^O_a \right) } \sum _{P \in \mathcal{{P}}^u_a \; | \; v \in {\widehat{P}}} \alpha _P + \sum _{b < a \in B} \sum _{u \in F} \sum _{P_2 \in \mathcal{{P}}^u_a} \sum _{P_1 \in \mathcal{{P}}^u_b \; | \; v \in {\widehat{P}}_1} \beta _{P_1,P_2}\qquad \end{aligned}$$

(13)

$$\begin{aligned}&\quad + \sum _{b > a \in B} \sum _{u \in F} \sum _{P_1 \in \mathcal{{P}}^u_a} \sum _{P_2 \in \mathcal{{P}}^u_b \; | \; v \in {\widehat{P}}_2} \beta _{P_1,P_2} \le 1 \; \; \forall v \in F \wedge \forall a \in B \end{aligned}$$

(14)

$$\begin{aligned}&\quad \alpha _P \ge 0 \end{aligned}$$

(15)

$$\begin{aligned}&\quad \beta _{P_1,P_2} \ge 0 \end{aligned}$$

(16)

Let ${\bar{x}}^v_a,{\bar{d}}^v_a$ be an optimum solution to the primal. Pick any real number $\delta \in (0,1/2)$. Set ${\hat{x}}^v_a = 1$ iff ${\bar{d}}^v_a < \delta \le {\bar{d}}^v_a + {\bar{x}}^v_a$.

Claim

${\hat{x}}$ gives a valid BSIM solution.

Proof

Assume ${\hat{x}}^v_a =1$. This implies ${\bar{d}}^v_a < 1/2$, and Constraint (11) gives that for all $b \ne a \in B$, ${\bar{d}}^v_b > 1/2$, and thus ${\hat{x}}^v_b = 0$.

Note also that Lemma 1 holds as well (same proof). Let $P$ be an arbitrary trail from a vertex of $R^O \setminus R^O_b$ to a vertex of $R^I_b$. Let $P'$ be the simple path obtained by short-cutting $P$. Then Constraint (10) ensures the existence of a vertex $u$ in ${\widehat{P'}}$ with ${\hat{x}}^u_b = 1$. Then such a vertex also exists on $P$, and choose $w$ to be the last vertex on $P$ with ${\hat{x}}^w_b = 1$. Thus on trail $P$, $w$ loads bank $b$.

Suppose for a contradiction that some vertex $v$ that follows or equals $w$ on $P$ loads bank $a \ne b$. This means that ${\bar{d}}^v_a < 1/2$ and Constraint (11) ensures that ${\bar{d}}^v_b \ge 1/2$. Let $P''$ be the simple path from $v$ to the endpoint of $P$ obtained by short-cutting $P$. Lemma 1 gives a vertex $z \ne v$ on ${\widehat{P}}$ with ${\hat{x}}^z_b = 1$; note that ${\widehat{P}}$ is a subtrail of $P$ that strictly follows $w$, contradicting the choice of $w$. Thus no vertex $v$ that follows or equals $w$ on $P$ loads bank $a \ne b$, which means that $P$ is fulfilled. As $P$ was arbitrary, the claim follows.$\square $

As for the approximation ratio of 2, write the complementary slackness conditions (below, in the summation $\sum _{v \in {\widehat{P}}} x^v_a$, the bank $a$ is such that path P ends at a vertex of $R^I_a$) :

$$\begin{aligned} \alpha _P > 0&\implies&\sum _{v \in {\widehat{P}}} x^v_a = 1 \end{aligned}$$

(17)

$$\begin{aligned} \beta _{P_1,P_2} > 0&\implies&\sum _{v \in {\widehat{P}}_1} x^v_a + \sum _{v \in {\widehat{P}}_2} x^v_b = 1 \end{aligned}$$

(18)

$$\begin{aligned} x^v_a > 0&\implies&\sum _{\ldots } \alpha _P + \sum _{\ldots } \beta _{P_1,P_2} = 1 \end{aligned}$$

(19)

which hold for ${\bar{x}}$ and an optimum dual solution (Condition (19) says Constraint (14) is tight; we did not give all the details above). With respect to the same dual solution, it is immediate that ${\hat{x}}^v_a > 0 $ only if ${\bar{x}}^v_a > 0$ and therefore Condition (19) holds. Any path $P \in \mathcal{{P}}^u_a$ with $\alpha _P > 0$ must have that going through the vertices $v \in {\widehat{P}}$, we see non-increasing ${\bar{d}}^v_a$-values (we know $P$ is a shortest path w.r.t ${\bar{x}}_a$, from Condition (17)), and thus only one such vertex $v$ can have ${\hat{x}}^v_a = 1$, and thus Condition (17) is respected by ${\hat{x}}$ as well. Any two paths $P_1,P_2$ with $\beta _{P_1,P_2} > 0$ must be such that $P_1$ and $P_2$ are each a shortest path, and thus as argued above, ${\widehat{P}}_1$ has at most one $v$ with ${\hat{x}}^v_a =1$ and ${\widehat{P}}_2$ has at most one $w$ with ${\hat{x}}^w_b =1$. For ${\hat{x}}$, Condition (18) holds approximately - with a factor of 2, and as in the primal-dual method, we obtain that ${\hat{x}}$ is a 2-approximation of ${\bar{x}}$.

We do not see half-integrality as in [13], and we do not see a direct primal-dual algorithm.

4 $k$-BSIM

Without loss of generality, assume that every node is reachable from the start vertex $s$. For technical reasons, add a new transparent start vertex, $s'$, with exactly one arc, outgoing to the original $s$ (do this regardless if $s$ is transparent or not). Let $F$ be the set of transparent nodes other than $s'$, $R$ be the set of required vertices, and, for $a \in B$, let $R_a$ be the subset of $R$ with requirement $a$. As before, for $v \in V$ and $a \in B$, $\mathcal{{T}}^v_a$ denotes the (possibly infinite) set of trails of $G$ from $v$ to some node of $R^I_a$, and $\mathcal{{P}}^v_a$ denotes the set of simple paths of $G$ from $v$ to some node of $R^I_a$.

Write the following integer linear program (IP2), with variables $x^v_b$ for every node $v \in F$ and bank requirement $b \in B$ ($x^v_b$ in the IP would be 1 if transparent node $v$ loads bank $b$, in any of its registers), and variables $d^v_b$ for every node $v \in \left( (F \cup R) \setminus R_b \right) $ and bank requirement $b \in B$ ($d^v_b$ in the IP would be 1 if either $\mathcal{{P}}^v_b = \emptyset $ or, for any $P \in \mathcal{{P}}^v_b$, ${\widehat{P}}$ contains at least one node that loads bank $b$, in any register).

$$\begin{aligned} \min \sum _{v \in F, b \in B} x^v_b \nonumber \\ \text{ subject } \text{ to } \sum _{b \in B} x^v_b \le k \; \;&\forall v \in F \end{aligned}$$

(20)

$$\begin{aligned} d^u_a \ge 1 \; \;&\forall a \in B \wedge \forall u \in \left( \{s'\} \cup R \setminus R_a \right) \end{aligned}$$

(21)

$$\begin{aligned} \sum _{a \in B} d^u_a \ge |B| - k \; \;&\forall u \in F \end{aligned}$$

(22)

$$\begin{aligned} d^u_a \le d^v_a + x^v_a&\forall a \in B \wedge \forall u \in \left( F \cup R \setminus R_a \right) \wedge \forall v \in F \nonumber \\&\qquad \text{ such } \text{ that } uv \in E \end{aligned}$$

(23)

$$\begin{aligned} d^u_a = 0&\forall a \in B \wedge \forall u \in F \nonumber \\&\qquad \text{ such } \text{ that } \exists v \in R_a \text{ such } \text{ that } uv \in E \end{aligned}$$

(24)

$$\begin{aligned} 1 \ge x^v_a \ge 0 \; \;&\forall v \in F \wedge \forall a \in B \end{aligned}$$

(25)

$$\begin{aligned} 1 \ge d^v_a \ge 0 \; \;&\forall a \in B \wedge \forall v \in \left( F \cup R \setminus R_a \right) \end{aligned}$$

(26)

$$\begin{aligned} x^v_a, d^v_a \in {\mathbb {Z}}&\forall v \in F \wedge \forall a \in B \end{aligned}$$

(27)

Constraints (22) are the generalization of the “clever” constraints (6); in fact they could (and should, if one uses an IP solver) be used together with Constraints (26) to replace Constraints (6) in LP1. Indeed, Constraints (22) hold for integer solutions since, if for vertex $v$ there are $k+1$ banks $b$ with variable $d^v_b = 0$, then for any of these banks $b$, $\mathcal{{P}}^v_b \ne \emptyset $ and there is at least one path $P \in \mathcal{{P}}^v_b$ such that no node in ${\widehat{P}}$ loads bank $b$ in any of its registers. Then no matter what banks arrive or are loaded at $v$, we do not get a valid $k$-BSIM solution. Note that constraints

$$\begin{aligned} \sum _{a \in Q} d^u_a \ge |Q| - k \; \; \quad \forall Q \subseteq B \wedge \forall u \in F \end{aligned}$$

(28)

are implied by (26) and (22).

IP2 above is not equivalent to $k$-BSIM, as it does not specify in which register a bank is loaded. Nevertheless, from a $k$-BSIM solution, we can get an IP2 solution of the same value (but not vice versa; that will be a coloring problem), by setting $x^v_b$ to be 1 iff transparent node $v$ loads bank $b$ and $d^v_b$ to be 1 iff either $\mathcal{{P}}^v_b = \emptyset $ or, for any $P \in \mathcal{{P}}^v_b$, ${\widehat{P}}$ contains at least one node that loads bank $b$. We relax IP2 to the linear program LP2, and solve it in polynomial time.

Let ${\bar{x}}^v_a,{\bar{d}}^v_a$ be an optimum LP2 solution. Pick uniformly at random a real number $\delta \in (0,1/(k+1))$. Set $x^v_a = 1$ iff ${\bar{d}}^v_a < \delta \le {\bar{d}}^v_a + {\bar{x}}^v_a$.

It is immediate that $Pr[x^v_a = 1] \le (k+1) {\bar{x}}^v_a$. We load at $v$ all the $q$ banks $a$ with ${\bar{d}}^u_a < 1/(k+1)$ if at least one of them has $x^v_a = 1$, using registers $1, 2, \ldots , q$. We have $q \le k$, since Constraint (28) implies that, for any vertex $u$, at most $k$ banks $a$ can have ${\bar{d}}^u_a < 1/(k+1)$. Thus the expected number of loads is at most $k(k+1)$ times the LP cost.

One needs to check that indeed this is a valid solution of $k$-BSIM: pick an arbitrary $a \in B$, an arbitrary $u \in R_a$, and an arbitrary trail $P$ from $s'$ to $u$. Let $u'$ be the next-to-last vertex of $P$. From Constraints (24) we get ${\bar{d}}^{u'}_a = 0$. From Constraints (21) we get ${\bar{d}}^{s'}_a \ge 1/k$. Let $v$ be, as we go on $P$, the last vertex (last occurrence also) such that ${\bar{d}}^{v}_a \ge \delta $; such a $v$ exists since ${\bar{d}}^{s'}_a \ge \delta $ and ${\bar{d}}^{u'}_a = 0$. Let $v'$ be the vertex following the last occurrence of $v$ on $P$. Note that ${\bar{d}}^{v'}_a < \delta $, and Constraints (23) give ${\bar{d}}^{v}_a \le {\bar{d}}^{v'}_a + {\bar{x}}^{v'}_a$. Thus ${\bar{d}}^{v'}_a < \delta \le {\bar{d}}^{v}_a \le {\bar{d}}^{v'}_a + {\bar{x}}^{v'}_a$. Now, let $y$ be, as we go on $P$, the last vertex (last occurrence also) such that there exists $b \in B$ with ${\bar{d}}^{y}_b < \delta \le {\bar{d}}^{y}_b + {\bar{x}}^{y}_b$; such a vertex $y$ exists since $v'$ is a candidate. Note that also ${\bar{d}}^y_a < \delta $ since otherwise we could go on $P$ from $y$ to $u$ and find $v'$ after $y$ as explained above. Then also ${\bar{d}}^y_a < 1/(k+1)$ and therefore bank $a$ is loaded in some register at $y$. As we go on $P$ from $y$ to $u$, no load instructions are selected by the algorithm after $y$ (as we cannot have vertices $w \in {\widehat{P}}[y]$ and banks $b$ with $x^w_b = 1$, since this contradicts the choice of $y$), and thus $P$ is fulfilled.

For the derandomization, we can try polynomially many values of $\delta \in (0,1/(k+1))$, or prove as in Sect. 3.2 that any such value will do. The following comprises this discussion (as well as that of Sect. 3):

Theorem 1

There is a $k(k+1)$-approximation algorithm for $k$-BSIM.

Theorem 2

There is a polynomial-time algorithm whose output uses at most $2k-1$ registers and a number of load instructions at most $2k$ times the optimum solution with $k$ registers.

Proof

For this bicriteria result, we choose independently and uniformly at random real numbers $\delta _b \in (0,1/2)$. Then, for every $v \in F$, if there is an $a$ with ${\bar{d}}^v_a < \delta _a \le {\bar{d}}^v_a + {\bar{x}}^v_a$, we set $x^v_a = 1$. If $x^v_a =1 $, we load at node $v$ (in some of the available $2k-1$ registers) all the banks $b$ with ${\bar{d}}^v_b < \delta _b$; we write this as ${\hat{x}}^v_b = 1$. Indeed, there can be at most $2k-1$ banks $b$ with ${\bar{d}}^v_b < 1/2$, from Constraint (28).

We next argue that the necessary trails are fulfilled. Pick an arbitrary $a \in B$, an arbitrary $u \in R_a$, and an arbitrary trail $P$ from $s'$ to $u$. Let $u'$ be the next-to-last vertex of $P$. From Constraints (24) we get ${\bar{d}}^{u'}_a = 0$. From Constraints (21) we get ${\bar{d}}^{s'}_a \ge 1/2$. Let $v$ be, as we go on $P$, the last vertex (last occurrence also) such that ${\bar{d}}^{v}_a \ge \delta _a$; such a $v$ exists since ${\bar{d}}^{s'}_a \ge \delta _a$ and ${\bar{d}}^{u'}_a = 0$. Let $v'$ be the vertex following the last occurrence of $v$ on $P$. Note that ${\bar{d}}^{v'}_a < \delta _a$, and Constraints (23) give ${\bar{d}}^{v}_a \le {\bar{d}}^{v'}_a + {\bar{x}}^{v'}_a$. Thus ${\bar{d}}^{v'}_a < \delta _a \le {\bar{d}}^{v}_a \le {\bar{d}}^{v'}_a + {\bar{x}}^{v'}_a$.

Now, let $y$ be, as we go on $P$, the last vertex (last occurrence also) such that there exists $b \in B$ with ${\bar{d}}^{y}_b < \delta _b \le {\bar{d}}^{y}_b + {\bar{x}}^{y}_b$; such a vertex $y$ exists since $v'$ is a candidate. Note that also ${\bar{d}}^y_a < \delta _a$ since otherwise we could go on $P$ from $y$ to $u$ and find $v'$ after $y$ as explained above. Therefore also bank $a$ is loaded in some register at $y$. As we go on $P$ from $y$ to $u$, no load instructions are selected by the algorithm after $y$ (as we cannot have vertices $w \in {\widehat{P}}[y]$ and banks $b$ with $x^w_b = 1$, since this contradicts the choice of $y$), and thus $P$ is fulfilled.

Moreover, for any $u \in V$, requiring $a \in B$, any path $P$ from $s$ to $u$ has vertex $v'$ with ${\bar{d}}^{v'}_a < \delta _a \le {\bar{d}}^{v'}_a + {\bar{x}}^{v'}_a$. Let $v$ be the last vertex on ${\widehat{P}}$ with ${\bar{d}}^v_b < \delta _b \le {\bar{d}}^v_b + {\bar{x}}^v_b$, for some $b \in B$. Then also ${\bar{d}}^v_a < \delta _a$ (as otherwise there is a further, on ${\widehat{P}}$, vertex $v''$ with ${\bar{d}}^{v''}_a < \delta _a \le {\bar{d}}^{v''}_a + {\bar{x}}^{v''}_a$). and therefore bank $a$ is loaded in some register at $y$. As we go on ${\widehat{P}}$ from $v$ to $u$, no further load instructions are selected by the algorithm, and thus at vertex $u$ bank $a$ is loaded.

Let $Q_v$ be the set of banks $a$ with ${\bar{d}}^v_a < 1/2$. The probability that bank $b$ is loaded at vertex $v \in Q_v$ is (using the independence of the choices of $\delta _a$):

$$\begin{aligned} Pr[{\hat{x}}^v_b = 1] \le Pr[x^v_b = 1]&+ \sum _{a \in Q_v \setminus \{b\} } Pr[x^v_a = 1] \cdot Pr[{\bar{d}}^v_b < \delta _b] \le 2 {\bar{x}}^v_b \nonumber \\&+\sum _{a \in Q_v \setminus \{b\} } 2 {\bar{x}}^v_a \cdot Pr[{\bar{d}}^v_b < \delta _b] \end{aligned}$$

(29)

and thus the expected number of loads at node $v$ is at most

$$\begin{aligned} \sum _{b \in Q_v} \left( 2 {\bar{x}}^v_b \!+\! \sum _{a \in Q_v \setminus \{b\} } 2 {\bar{x}}^v_a \cdot Pr[{\bar{d}}^v_b < \delta _b] \right) \!=\! \sum _{b \in Q_v} {\bar{x}}^v_b \left( 2 + 2 \sum _{a \in Q_v \setminus \{b\} } Pr[{\bar{d}}^v_a < \delta _a] \right) \end{aligned}$$

If $|Q_v| \le k$, then

$$\begin{aligned} \left( 2 + 2 \sum _{a \in Q_v \setminus \{b\} } Pr[{\bar{d}}^v_a < \delta _a] \right) \le 2 + 2 \left( |Q_v| - 1 \right) \le 2k \end{aligned}$$

and thus the expected number of loads at node $v$ is at most $2k \sum _{b \in B} {\bar{x}}^v_b$.

Otherwise, $|Q_v| \ge k+1$, and Constraint (28) together with ${\bar{d}}^v_b < 1/2$ gives:

$$\begin{aligned} \sum _{a \in Q_v \setminus \{b\} } {\bar{d}}^v_a \ge |Q_v| - k - 1/2. \end{aligned}$$

(30)

We have

$$\begin{aligned} 2 + 2 \sum _{a \in Q_v \setminus \{b\} } Pr[{\bar{d}}^v_a < \delta _a]&= 2 + 2 \sum _{a \in Q_v \setminus \{b\} } ( 1 - 2 {\bar{d}}^v_a) \\&= 2 + 2 ( |Q_v| - 1) - 4 \sum _{a \in Q_v \setminus \{b\} } {\bar{d}}^v_a \\&\le 2 |Q_v| - 4 (|Q_v| - k - 1/2) \\&= 4k + 2 - 2 |Q_v| \\&\le 4k + 2 - 2 (k+1) \\&= 2k \end{aligned}$$

where we used Inequality (30) for the first inequality and $|Q_v| \ge k+1$ for the last. Thus in all cases, the expected number of bank loads is at most $2k$ times the LP solution value. $\square $

Assuming $\ln n << k$, the following result is an improvement:

Theorem 3

There is a $O(k \ln n)$ randomized approximation algorithm for $k$-BSIM.

Proof

Use the rounding method of the previous theorem, with the interval $(0, 1/(8 \ln n) )$ for each $\delta _a$. Let $Q_v$ be the set of banks $a$ with ${\bar{d}}^v_a < 1/(8 \ln n)$; as above $|Q_v| \le 2k$. Let $Q'_v$ be the (random) set of banks loaded by the algorithm at $v$.

Claim

$Pr[|Q'_v| > k] \le \frac{1}{n^2}$

Proof

We are setting up a Chernoff bound. We define $d_a = {\bar{d}}^v_a$, $Q = Q_v$, $q = |Q|$, and $\sigma = \sum _{a \in Q} d_a$. We may assume $q>k$ or else the claim is trivially true. For bank $a \in Q$, define the random variables:

$$\begin{aligned} Z_a = \left\{ \begin{array}{ll} 1 &{} \quad \mathrm if d_a > \delta _a \\ 0 &{} \quad \mathrm otherwise \end{array} \right. \end{aligned}$$

Define the random variable $Z = \sum _{a \in Q} Z_a$. Let $p_a = 8 d_a \ln n$ and $p = \frac{ \sum _{a \in Q} p_a }{q}$. Let $X_a \; (a \in Q)$ be the random variables $Z_a - p_a$. Then $X_a$ are mutually independent with $Pr[X_a = 1 - p_a] = p_a$ and $Pr[X_a = -p_a] = 1 - p_a$. Define the random variable $X = \sum _{a \in Q} X_a$. Then $X$ satisfies Assumptions A.1.3 of [1] and therefore Theorem A.1.13 of [1] states that, for any $\alpha >0$,

$$\begin{aligned} Pr[X < - \alpha ] < e^{-\alpha ^2/2pq}. \end{aligned}$$

(31)

We have that the event $|Q'_v| > k$ is included in the event $Z < q-k$, which is the event $X < (q-k) - \sum _{a \in Q} p_a$. Note that

$$\begin{aligned} - (q-k) + \sum _{a \in Q} p_a = - (q-k) + 8 \ln n \sum _{a \in Q} d_a \ge 7 \sigma \ln n, \end{aligned}$$

where we used Constraints (28), which state $\sigma \ge (q-k)$. Note also that $q-k \ge 1$, and thus Chernoff’s bound from Eq. (31) gives

$$\begin{aligned} Pr[|Q'_v| > k] < e^{-(7 \sigma \ln n )^2/(2 \cdot 8 \sigma \ln n)} \le e^{-2 \sigma \ln n} \le e^{ - 2 \ln n}, \end{aligned}$$

which is what the claim requires. $\square $

The expected number of banks loaded at vertex $v$, computed as in the bicriteria algorithm, does not exceed $ \left( 2 + 8 k \ln n \right) \sum _{b \in Q_v} {\bar{x}}^v_b$. Thus Markov’s inequality gives $Pr[ \text{ number } \text{ of } \text{ banks } \text{ loaded } > 20 k (\ln n) Z^*_{LP2}] \le 1/2$, where $Z^*_{LP2}$ is the objective value of LP2. From Claim 4, taken as a union over all $v$, the probability that there is a vertex loading more than $k$ banks is at most $1/n$. So with probability $1/3$ no vertex is overloaded and less than $20 k (\ln n) Z^*_{LP2}$ banks are loaded in total. This concludes the proof of Theorem 3. $\square $

4.1 Integrality Gap

Let $ opt (I)$ be the optimum value for a $k$-BSIM instance $I$, $Z^*_{IP2}(I)$ be the optimum value for the constructed IP2 instance, and $Z^*_{LP2}(I)$ be the optimum value for the LP2 relaxation. We are unable to find the worst case ratio between $ opt (I)$ and either $Z^*_{IP2}(I)$ or $Z^*_{LP2}(I)$. The next theorem relates $Z^*_{IP2}(I)$ to $Z^*_{LP2}(I)$ and is not surprising in view of the connection of $k$-BSIM to $(k+1)$-uniform Hypergraph Vertex Cover.

Theorem 4

For any ${\epsilon }> 0$, there exists a $k$-BSIM instance $I$ with $Z^*_{IP2}(I) > (k+1 -{\epsilon }) Z^*_{LP2}(I)$.

Proof

We use the first reduction from Sect. 2, starting with a complete $(k+1)$-uniform hypergraph $G$ with $n$ vertices (and thus $\left( {\begin{array}{c}n\\ k+1\end{array}}\right) $ hyperedges). The number $n$ will be picked large enough (and depend on ${\epsilon }$). We obtain a $k$-OBSIM instance, which we call $J$. In $J$, we call nodes corresponding to the hyperedges in $G$ hyperedge-nodes and call nodes corresponding to vertices in $G$ hypervertex-nodes. The hypervertex-nodes come in $n$ groups, where all the $k \left( {\begin{array}{c}n\\ k+1\end{array}}\right) ^2$ nodes of $I$ coming from the same vertex of $G$ are in the same group and have the same requirement, and vertices in different groups have different requirements. To simplify some notation, we change $J$ by making the hyperedge-nodes transparent. We further transform $J$ into a $k$-BSIM instance $I$ using the second reduction of Sect. 2; this adds for every hypervertex-node $v$ a node $v_{in}$ (it should also add $v_{out}$, but with no arc leaving $v_{out}$ we ignore it) and then we add $s'$ and construct an IP2 instance we call $I'$.

Next we describe a fractional (LP2) solution to $I'$. For every hypervertex-node $v$, in a group with requirement $a$ (where $a$ is a vertex in $G$), set $d^{v_{in}}_a = 0$, and for all $b \ne a \in B$, set $d^{v_{in}}_b = 1$. Also set $x^{v_{in}}_a = 1/(k+1)$, and for all $b \ne a \in B$, set $x^{v_{in}}_b = 0$. For every hyperedge-node $w$, corresponding to a hyperedge $e$ of $G$, set for all $a \in e$, $d^w_a = 1/(k+1)$ and $x^w_a = k/(k+1)$, and for all $b \not \in e$, set $d^w_b = 1$ and $x^w_b = 0$. Set for both $s$ and $s'$ all $d$-values to be 1 and all $x$-values to be 0. One can check that this is indeed a feasible LP2 solution, and its objective is $n \cdot k \left( {\begin{array}{c}n\\ k+1\end{array}}\right) ^2 \cdot (1/(k+1) ) + \left( {\begin{array}{c}n\\ k+1\end{array}}\right) \cdot (k+1) \cdot (k/(k+1))$. Thus:

$$\begin{aligned} Z^*_{LP2}(I) \le \frac{nk}{k+1} \left( {\begin{array}{c}n\\ k+1\end{array}}\right) ^2 + k \cdot \left( {\begin{array}{c}n\\ k+1\end{array}}\right) \end{aligned}$$

(32)

Now assume for a contradiction that

$$\begin{aligned} Z^*_{IP2}(I) \ge (n-k) k \left( {\begin{array}{c}n\\ k+1\end{array}}\right) ^2 \end{aligned}$$

(33)

does not hold. Let ${\bar{d}}$ and ${\bar{x}}$ be this feasible solution. Then there exist $k+1$ groups, from vertices $y_1, y_2, \ldots , y_{k+1}$ of $G$, such that each has a node $v^i$, for $i = 1, 2, \ldots , k+1$ such that ${\bar{x}}^{v^i_{in}}_{y_i} = 0$. $G$ being complete, there is a hyperedge $e = \{y_1, y_2, \ldots , y_{k+1}\}$, and let $w$ be the hyperedge-node corresponding to hyperedge $e$. For all $i = 1, 2, \ldots , k+1$, Constraints (24) for $v^i_{in}$ and $v_i$ give ${\bar{d}}^{v^i_{in}}_{y_i} = 0$, and Constraints (23) for $w$ and $v^i_{in}$ give ${\bar{d}}^{w}_{y_i} = 0$, leading to contradicting Constraint (22) for $w$, a contradiction to ${\bar{d}}$ and ${\bar{x}}$ being a feasible solution.

Thus Eq. (33) holds, and together with Eq. (32), $n \leftarrow \infty $ and $k$ fixed, we conclude that indeed for any ${\epsilon }> 0$, there exists a $k$-BSIM instance $I$ with $Z^*_{IP2}(I) > (k+1 -{\epsilon }) Z^*_{LP2}(I)$. $\square $

5 Conclusion

For $k$-BSIM we presented approximation algorithms with ratios $k(k+1)$, and $O(k \log n)$. Another algorithm outputs a valid solution with objective at most $2k$ times the optimum for $k$ registers, using $2k$ registers. These results hold in arbitrary input CFGs. Our algorithms can easily be extended to the case when required nodes each has a set of banks $A \subset B$ with $|A| \le k$, and all banks of $A$ must be loaded.

Our hardness results, that hold for acyclic CFGs as well, are that for any ${\epsilon }> 0$, an approximation ratio of $k - {\epsilon }$ is NP-hard, and an approximation ratio of $k+1 - {\epsilon }$ is unlikely, as it would imply that Vertex Cover has approximation $2 - {\epsilon }$.

We leave open the existence of a $O(k)$-approximation. It is not immediate to apply our methods to the following variant of the problem: For every CFG node $v$ that requires bank $b$, we must select a register $i \in \{1, 2, \ldots , k \}$ and ensure every directed path from $s$ to $v$ loads bank $b$ in register $i$, and no further loads in register $i$ are allowed. Notice, as for example in Fig. 3, that $k$-OBSIM does not require such an $i$ to be selected.

References

Alon, N., Joel, H.: Spencer. The probabilistic method, 2nd edn. Wiley, Hoboken, NJ (2000)
Book Google Scholar
Amit, A., Noga, A., Moses, S.: Charikar. Improved approximation for directed cut problems. In: Proceedings of the thirty-ninth annual ACM symposium on Theory of Computing, STOC ’07, pp. 671–680. ACM, New York, NY (2007)
Bergner, P., Dahl, P., Engebretsen, D., O’Keefe, M.: Spill code minimization via interference region spilling. In: Proceedings of the ACM SIGPLAN 1997 conference on Programming Language Design and Implementation, PLDI ’97, pp. 287–295. ACM, New York, NY (1997)
Bernstein, D., Golumbic, M., Mansour, Y., Pinter, R., Goldin, D., Krawczyk, H. Nahshon, I.: Spill code minimization techniques for optimizing compliers. In: Proceedings of the ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation, PLDI ’89, pp. 258–263. ACM, New York, NY (1989)
Briggs, P., Cooper, K.D., Torczon, L.: Coloring register pairs. ACM Lett. Program. Lang. Sys. 1, 3–13 (1992)
Article Google Scholar
Changqing, Fu., Wilken, K.: A faster optimal register allocator. In: Proceedings of the 35th annual ACM/IEEE International symposium on Microarchitecture, MICRO 35, pp. 245–256. IEEE Computer Society Press, Los Alamitos, CA (2002)
Charikar, M., Chekuri, C., Cheung, T.Y., Dai, Z., Goel, A., Guha, Sudipto, Li, Ming: Approximation algorithms for directed Steiner problems. J. Algo. 33(1), 73–91 (1999)
Article MATH MathSciNet Google Scholar
Cheriyan, J., Karloff, H.J., Rabani, Y.: Approximating directed multicuts. Combinatorica 25(3), 251–269 (2005)
Article MATH MathSciNet Google Scholar
Cooper, K., Dasgupta, A., Eckhardt, J.: Revisiting graph coloring register allocation: a study of the Chaitin–Briggs and Callahan–Koblenz algorithms. In: Ayguad, E., Baumgartner, G., Ramanujam, J., Sadayappan, P., (eds) Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, vol. 4339, pp. 1–16. Springer, Berlin (2006)
Dinur, I., Guruswami, V., Khot, S., Regev, O.: A new multilayered PCP and the hardness of hypergraph vertex cover. SIAM J. Comput. 34(5), 1129–1146 (2005)
Article MATH MathSciNet Google Scholar
Falk, H.: Wcet-aware register allocation based on graph coloring. In: Proceedings of the Fourty-Sixth ACM/IEEE Design Automation Conference, DAC’09. ACM, New York , NJ (2009)
Freescale. http://www.freescale.com
Garg, N., Vazirani, V.V., Yannakakis, M.: Multiway cuts in node weighted graphs. J. Algo. 50(1), 49–61 (2004)
Article MATH MathSciNet Google Scholar
Gupta, A.: Improved results for directed multicut. In: Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’03, pp. 454–455. Society for Industrial and Applied Mathematics, Philadelphia, PA (2003)
Jansen, K.: The allocation problem in hardware design. Discret. Appl. Math. 43(1), 37–46 (1993)
Article MATH Google Scholar
Jansen, K., Reiter, J.: An approximation algorithm for the register allocation problem. Int. VLSI J. 25(2), 89–102 (1998)
Article MATH Google Scholar
Jansen, K., Porkolab, L.: On preemptiveresource constrained scheduling: polynomial-time approximation schemes. SIAM J. Discret. Math. 20(3), 545–563 (2006)
Article MATH MathSciNet Google Scholar
Kannan, S., Proebsting, T.A.: Register allocation in structured programs. J. Algo. 29(2), 223–237 (1998)
Article MATH MathSciNet Google Scholar
Koes, D.R., Goldstein, S.C.: A global progressive register allocator. In: Proceedings of the ACM SIGPLAN 2006 conference on Programming language design and implementation, PLDI ’06, pp. 204–215. ACM, New York, NY (2006)
Koes, D., Goldstein, S.C.: A progressive register allocator for irregular architectures. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO ’05, pp. 269–280. IEEE Computer Society, Washington, DC (2005)
Koes, D.R., Goldstein, S.C.: Register allocation deconstructed. In: Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems, SCOPES ’09, pp. 21–30. ACM, New York, NY (2009)
Li, M., Chun Jason, X., Tiantian, L., Yingchao, Z.: Analysis and approximation for bank selection instruction minimization on partitioned memory architecture. J. Comb. Optim. 23(2), 274–291 (2012)
Article MATH MathSciNet Google Scholar
Plotkin, S.A., Shmoys, D.B., Tardos, E.: Fast approximation algorithms for fractional packing and covering problems. Math. Oper. Res. 20, 257–301 (1995)
Article MATH MathSciNet Google Scholar
Quintão Pereira, F.M., Pereira, J. Palsberg.: Register allocation by puzzle solving. In: Proceedings of the 2008 ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI ’08, pp. 216–226. ACM, New York, NY (2008)
Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2), 202–208 (1985)
Article MathSciNet Google Scholar
Smith, M.D., Ramsey, N., Holloway, G.: A generalized algorithm for graph-coloring register allocation. In: Proceedings of the ACM SIGPLAN 2004 conference on Programming Language Design and Implementation, PLDI ’04, pp. 277–288. ACM, New York, NY (2004)
Thorup, M.: All structured programs have small tree-width and good register allocation. Inf. Comput. 142(2), 159–181 (1998)
Article MATH MathSciNet Google Scholar
Zalamea, J., Llosa, J., Ayguad, E., Valero, M.: Modulo scheduling with integrated register spilling. In: Dietz, H. (ed.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 2624, pp. 239–253. Springer, Berlin (2003)
Chapter Google Scholar
Zelikovsky, A.: A series of approximation algorithms for the acyclic directed Steiner tree problem. Algorithmica 18(1), 99–110 (1997)
Article MATH MathSciNet Google Scholar
Zhou, Y.: Hardness of register loading, 2010. Personal communication (2010)
Zilog.: http://www.zilog.com
Zosin, L., Khuller, S.: On directed Steiner trees. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’02, pp. 59–63. Society for Industrial and Applied Mathematics, San Francisco, CA (2002)

Download references

Acknowledgments

Research supported in part by NSF Grant NeTS-0916743 and a grant from the Research grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 124411].

Author information

Authors and Affiliations

Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
Gruia Calinescu
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Minming Li

Authors

Gruia Calinescu
View author publications
You can also search for this author in PubMed Google Scholar
Minming Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Minming Li.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Calinescu, G., Li, M. Register Loading via Linear Programming. Algorithmica 72, 1011–1032 (2015). https://doi.org/10.1007/s00453-014-9888-2

Download citation

Received: 05 September 2012
Accepted: 25 April 2014
Published: 30 May 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s00453-014-9888-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Register Loading via Linear Programming

Abstract

Similar content being viewed by others

Optimal Register Allocation in Polynomial Time

Register Allocation

Alternating Automatic Register Machines

1 Introduction

1.1 Problem Formulation

1.2 Results and Discussion

2 Reductions

3 1-BSIM

3.1 LP Rounding

Lemma 1

Proof

3.2 Derandomization

Claim

Proof

4 \(k\)-BSIM

Theorem 1

Theorem 2

Proof

Theorem 3

Proof

Claim

Proof

4.1 Integrality Gap

Theorem 4

Proof

5 Conclusion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Register Loading via Linear Programming

Abstract

Similar content being viewed by others

Optimal Register Allocation in Polynomial Time

Register Allocation

Alternating Automatic Register Machines

1 Introduction

1.1 Problem Formulation

1.2 Results and Discussion

2 Reductions

3 1-BSIM

3.1 LP Rounding

Lemma 1

Proof

3.2 Derandomization

Claim

Proof

4 \(k\)-BSIM

Theorem 1

Theorem 2

Proof

Theorem 3

Proof

Claim

Proof

4.1 Integrality Gap

Theorem 4

Proof

5 Conclusion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation