
Consider a box counting heuristic using radius-based boxes, e.g., Maximum Excluded Mass Burning. There is no guarantee that the number of boxes it computes is minimal, or even near minimal. However, if a lower bound on the minimal number of boxes is available, we can immediately determine the deviation from optimality of the computed value. A method that provides such a lower bound is presented in [44]. The lower bound is computed by formulating box counting as an uncapacitated facility location problem (UFLP), a classic combinatorial optimization problem. This formulation provides, via the dual of the linear programming relaxation of UFLP, a lower bound on \(B(r)\), the minimal number of boxes of radius r needed to cover the network. The method also yields an estimate of \(B(r)\); this estimate is an upper bound on \(B(r)\). Under the assumption that \(B(r) = a \, (2r+1)^{-d_B}\) holds for some positive constant a and some range of r, a linear program [6], formulated using the upper and lower bounds on \(B(r)\), provides an upper and lower bound on \(d_B\). In the event that the linear program is infeasible, a quadratic program [18] can be used to estimate \(d_B\).

4.1 Mathematical Formulation

Let the box radius r be fixed. For simplicity, we refer to a node by its index j. Define \({\mathbb {N}} \equiv \{ 1, 2, \cdots , N \}\). Let \(C^r\) be the symmetric \(N \times N\) matrix defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl} C^r_{ij} = \left\{ \begin{array}{ll} 0 &\displaystyle \mbox{if } dist(i, j) \leq r, \\ \infty &\displaystyle \mbox{otherwise.} \end{array} \right . \end{array} \end{aligned} $$

(As with the matrix \(M^r_{ij}\) defined by (3.1), the superscript r in \(C^r_{ij}\) does not mean the r-th power of the matrix C.) For example, for r = 1, the matrix \(C^1\) corresponding to the network of Fig. 4.1 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} C^1 = \left( \begin{array}{ccccccc} 0 &\displaystyle 0 &\displaystyle - &\displaystyle - &\displaystyle - &\displaystyle - &\displaystyle 0 \\ 0 &\displaystyle 0 &\displaystyle 0 &\displaystyle - &\displaystyle - &\displaystyle - &\displaystyle - \\ - &\displaystyle 0 &\displaystyle 0 &\displaystyle 0 &\displaystyle - &\displaystyle - &\displaystyle 0 \\ - &\displaystyle - &\displaystyle 0 &\displaystyle 0 &\displaystyle 0 &\displaystyle - &\displaystyle - \\ - &\displaystyle - &\displaystyle - &\displaystyle 0 &\displaystyle 0 &\displaystyle 0 &\displaystyle - \\ - &\displaystyle - &\displaystyle - &\displaystyle - &\displaystyle 0 &\displaystyle 0 &\displaystyle 0 \\ 0 &\displaystyle - &\displaystyle 0 &\displaystyle - &\displaystyle - &\displaystyle 0 &\displaystyle 0 \end{array} \right), \end{array} \end{aligned} $$

where a dash “–” is used to indicate the value \(\infty\).

Fig. 4.1

Example network with seven nodes and eight arcs
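To make the construction concrete, the matrix \(C^r\) can be assembled from shortest-path distances. The following sketch is purely illustrative (it is not code from [44]); it uses the networkx library, with math.inf standing in for \(\infty\), and rebuilds the example of Fig. 4.1 from the arcs implied by the matrix \(C^1\) displayed above.

```python
import math
import networkx as nx

def distance_cost_matrix(G, r):
    """Return C^r as a dict of dicts: C[i][j] = 0 if dist(i, j) <= r, else infinity."""
    C = {i: {j: math.inf for j in G} for i in G}
    for i in G:
        # Nodes within distance r of i get cost 0.
        for j in nx.single_source_shortest_path_length(G, i, cutoff=r):
            C[i][j] = 0
    return C

# The seven-node, eight-arc network of Fig. 4.1, with arcs read off the matrix C^1 above.
G = nx.Graph([(1, 2), (1, 7), (2, 3), (3, 4), (3, 7), (4, 5), (5, 6), (6, 7)])
C1 = distance_cost_matrix(G, r=1)
```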

For \(j \in {\mathbb {N}}\), let

$$\displaystyle y_j = \left\{ \begin{array}{ll} 1 & \mbox{if the box of radius } r \mbox{ centered at node } j \mbox{ is used in the covering of } \mathbb{G}, \\ 0 & \mbox{otherwise.} \end{array} \right.$$

A given node i will, in general, be within distance r of more than one center node j used in the covering of \(\mathbb {G}\). However, we will assign each node i to exactly one node j, and the variables \(x_{ij}\) specify this assignment. For \(i,j \in {\mathbb {N}}\), let

$$\displaystyle x_{ij} = \left\{ \begin{array}{ll} 1 & \mbox{if node } i \mbox{ is assigned to the box centered at node } j, \\ 0 & \mbox{otherwise.} \end{array} \right.$$

With the understanding that r is fixed, for simplicity we write \(c_{ij}\) to denote element (i, j) of the matrix \(C^r\). The minimal network covering problem is

$$\displaystyle \mbox{minimize} \;\;\; \sum_{j \in \mathbb{N}} y_j \; + \; \sum_{i \in \mathbb{N}} \sum_{j \in \mathbb{N}} c_{ij} x_{ij} \qquad\qquad (4.1)$$

$$\displaystyle \mbox{subject to} \;\;\; \sum_{j \in \mathbb{N}} x_{ij} = 1 \;\; \mbox{for } i \in \mathbb{N}, \qquad\qquad (4.2)$$

$$\displaystyle x_{ij} \leq y_j \;\; \mbox{for } i, j \in \mathbb{N}, \qquad\qquad (4.3)$$

$$\displaystyle x_{ij} \geq 0 \;\; \mbox{for } i, j \in \mathbb{N}, \qquad\qquad (4.4)$$

$$\displaystyle y_j \in \{0, 1\} \;\; \mbox{for } j \in \mathbb{N}. \qquad\qquad (4.5)$$

Let UFLP denote the optimization problem defined by (4.1)–(4.5). Constraint (4.2) says that each node must be assigned to the box centered at some j. Constraint (4.3) says that node i can be assigned to the box centered at j only if that box is used in the covering, i.e., only if \(y_j = 1\). The objective function (4.1) is the sum of the number of boxes in the covering and the total cost of assigning each node to a box. Problem UFLP is feasible, since we can always set \(y_i = 1\) and \(x_{ii} = 1\) for \(i \in {\mathbb {N}}\); i.e., let each node be the center of a box in the covering. Given a set of binary values \(y_j\) for \(j \in {\mathbb {N}}\), since each \(c_{ij}\) is either 0 or \(\infty\), if there is a feasible assignment of nodes to boxes then the objective function value is the number of boxes in the covering; if there is no feasible assignment for the given \(y_j\) values then the objective function value is \(\infty\). Note that UFLP requires only \(x_{ij} \geq 0\); it is not necessary to require \(x_{ij}\) to be binary. This relaxation is allowed since if (x, y) solves UFLP then the objective function value is not increased, and feasibility is maintained, if we assign each i to exactly one k (where k depends on i) such that \(y_k = 1\) and \(c_{ik} = 0\).
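On a network as small as that of Fig. 4.1, the minimal covering that UFLP encodes can also be found by exhaustive search, which gives a concrete check of what v(UFLP) represents. The brute-force sketch below is purely illustrative (it is exponential in N and is not the bounding method of [44]); the function name is hypothetical.

```python
from itertools import combinations
import networkx as nx

def minimal_cover_size(G, r):
    """Smallest number of radius-r boxes that cover G, i.e., B(r) = v(UFLP).
    Tries every candidate set of box centers, so use only on tiny networks."""
    nodes = set(G)
    # ball[j] = nodes within distance r of j, i.e., the box of radius r centered at j.
    ball = {j: set(nx.single_source_shortest_path_length(G, j, cutoff=r)) for j in G}
    for k in range(1, len(nodes) + 1):
        for centers in combinations(nodes, k):
            if set().union(*(ball[j] for j in centers)) == nodes:
                return k
    return len(nodes)

# The network of Fig. 4.1, as in the previous sketch.
G = nx.Graph([(1, 2), (1, 7), (2, 3), (3, 4), (3, 7), (4, 5), (5, 6), (6, 7)])
print(minimal_cover_size(G, r=1))   # minimal number of boxes of radius 1
```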

The primal linear programming relaxation PLP of UFLP is obtained by replacing the restriction that each \(y_j\) is binary with the constraint \(y_j \geq 0\). We associate the dual variable \(u_i\) with the constraint \(\sum_{j \in \mathbb{N}} x_{ij} = 1\), and the dual variable \(w_{ij}\) with the constraint \(x_{ij} \leq y_j\). The dual linear program [18] DLP corresponding to PLP is

$$\displaystyle \begin{aligned} \mbox{maximize} \;\; & \sum_{i \in \mathbb{N}} u_i \\ \mbox{subject to} \;\; & \sum_{i \in \mathbb{N}} w_{ij} \leq 1 \;\; \mbox{for } j \in \mathbb{N}, \\ & u_i - w_{ij} \leq c_{ij} \;\; \mbox{for } i, j \in \mathbb{N}, \\ & w_{ij} \geq 0 \;\; \mbox{for } i, j \in \mathbb{N}. \end{aligned}$$

Following [11], we set \(w_{ij} = \max \{0, u_i - c_{ij}\}\) and express DLP using only the \(u_i\) variables:

$$\displaystyle \mbox{maximize} \;\;\; \sum_{i \in \mathbb{N}} u_i \qquad\qquad (4.6)$$

$$\displaystyle \mbox{subject to} \;\;\; \sum_{i \in \mathbb{N}} \max \{0, u_i - c_{ij}\} \leq 1 \;\; \mbox{for } j \in \mathbb{N}. \qquad\qquad (4.7)$$
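Because each \(c_{ij}\) is either 0 or \(\infty\), the constraint (4.7) for node j reduces to \(\sum_{i \in \mathbb{N}_j} u_i \leq 1\), where \(\mathbb{N}_j\) is the set of nodes within distance r of j. The small helper below (a hypothetical name, shown only for illustration) checks (4.7) for a given dual vector u and returns the dual objective (4.6).

```python
import networkx as nx

def dual_value_if_feasible(G, r, u):
    """Return the dual objective (4.6) if u (a dict: node -> value >= 0) satisfies
    every constraint (4.7), and None otherwise."""
    for j in G:
        # Constraint (4.7) at node j: the u_i within distance r of j must sum to at most 1.
        if sum(u[i] for i in nx.single_source_shortest_path_length(G, j, cutoff=r)) > 1:
            return None
    return sum(u.values())
```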

Let v(UFLP) be the optimal objective function value of UFLP. Then \(v(\mbox{UFLP}) = B(r)\), the minimal number of boxes of radius r needed to cover \(\mathbb{G}\). Let v(PLP) be the optimal objective function value of the linear programming relaxation PLP. Then v(UFLP) ≥ v(PLP). Let v(DLP) be the optimal objective function value of the dual linear program DLP. By linear programming duality theory, v(PLP) = v(DLP). Define \(v(u) \equiv \sum_{i \in \mathbb{N}} u_i\). If u is feasible for DLP as defined by (4.6) and (4.7), then the dual objective function satisfies \(v(u) \leq v(\mbox{DLP})\). Combining these relations, we have

$$\displaystyle v(u) \; \leq \; v(\mbox{DLP}) \; = \; v(\mbox{PLP}) \; \leq \; v(\mbox{UFLP}) \; = \; B(r).$$

Thus \(v(u) = \sum_{i \in \mathbb{N}} u_i\) is a lower bound on \(B(r)\). As described in [44], to maximize this lower bound subject to (4.7), we use the Dual Ascent and Dual Adjustment methods of [11]; see also [42].

4.2 Dual Ascent and Dual Adjustment

Call the N variables \(u_i\), \(i \in {\mathbb {N}}\), the dual variables. The Dual Ascent method initializes u = 0 and increases the dual variables, one at a time, until constraints (4.7) prevent any further increase in any dual variable. For \(i \in {\mathbb {N}}\), let \({\mathbb {N}}_i = \{ j \in {\mathbb {N}} \, \vert \, c_{ij} = 0 \}\). By definition of \(c_{ij}\), we have \({\mathbb {N}}_i = \{ j \, \vert \, dist(i, j) \leq r \}\). Note that \(i \in {\mathbb {N}}_i\). From (4.7), we can increase some dual variable \(u_i\) from 0 to 1 only if \(\sum_{k \in {\mathbb {N}}_j} u_k = 0\) for \(j \in {\mathbb {N}}_i\). Once we have increased \(u_i\) then we cannot increase \(u_k\) for any k such that \(c_{kj} = 0\) for some \(j \in {\mathbb {N}}_i\). This is illustrated, for r = 1, in Fig. 4.2: once we set \(u_i = 1\), we cannot increase the dual variable associated with any node within distance 1 of a node in \({\mathbb {N}}_i\).

Fig. 4.2

Increasing \(u_i\) to 1 blocks other dual variable increases

Recalling that \(\deg(j)\) denotes the node degree of node j, if \(c_{ij} = 0\) then the number of dual variables prevented by node j from increasing when we increase \(u_i\) is at least \(\deg(j) - 1\), where we subtract 1 since \(u_i\) itself is being increased from 0. In general, increasing \(u_i\) prevents at least approximately \(p(i) \equiv \sum_{j \in {\mathbb {N}}_i} \bigl( \deg(j) - 1 \bigr)\) dual variables from being increased. This is approximate, since there may be arcs connecting the nodes in \({\mathbb {N}}_i\), e.g., there may be an arc between two neighbors of node i in Fig. 4.2. However, we can ignore such considerations since we use \(p(i)\) only as a heuristic metric: we pre-process the data by ordering the dual variables in order of increasing \(p(i)\). We have \(p(i) = 0\) only if \(\deg(j) = 1\) for \(j \in {\mathbb {N}}_i\), i.e., only if each node in \({\mathbb {N}}_i\) is a leaf node. This can occur only for the trivial case that \({\mathbb {N}}_i\) consists of two nodes (one of which is i itself) connected by an arc. For any other topology we have \(p(i) \geq 1\). For \(j \in {\mathbb {N}}\), define s(j) to be the slack in constraint (4.7) for node j, so s(j) = 1 if \(u_i = 0\) for each \(i \in {\mathbb {N}}_j\), and s(j) = 0 otherwise.

Having pre-processed the data, we run the following Dual Ascent procedure. This procedure is initialized by setting u = 0 and s(j) = 1 for \(j \in {\mathbb {N}}\). We then examine each \(u_i\) in the sorted order and compute \(\gamma \equiv \min \{ s(j) \, \vert \, j \in {\mathbb {N}}_i \}\). If γ = 0 then \(u_i\) cannot be increased. If γ = 1 then we increase \(u_i\) from 0 to 1 and set s(j) = 0 for \(j \in {\mathbb {N}}_i\), since there is no longer slack in those constraints.
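A direct reading of this Dual Ascent procedure, including the penalty ordering and the slacks s(j), is sketched below. It is one possible implementation, not the code of [44], and the function name is illustrative. It returns the 0/1 dual vector u together with the set {j : s(j) = 0}, which is used next to build the covering.

```python
import networkx as nx

def dual_ascent(G, r):
    """Dual Ascent for the condensed dual (4.6)-(4.7)."""
    # N_i: nodes within distance r of i (always contains i itself).
    ball = {i: set(nx.single_source_shortest_path_length(G, i, cutoff=r)) for i in G}
    # Penalty p(i) = sum over j in N_i of (deg(j) - 1); dual variables are examined
    # in order of increasing penalty.
    penalty = {i: sum(G.degree(j) - 1 for j in ball[i]) for i in G}
    u = {i: 0 for i in G}
    s = {j: 1 for j in G}                          # slack in constraint (4.7) at node j
    for i in sorted(G, key=penalty.get):
        if min(s[j] for j in ball[i]) == 1:        # gamma = 1: u_i can be raised
            u[i] = 1
            for j in ball[i]:
                s[j] = 0                           # the constraints for j in N_i become tight
    return u, {j for j in G if s[j] == 0}
```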

Figure 4.3 shows the result of applying Dual Ascent, with r = 1, to Zachary’s Karate Club network [37], which has 34 nodes and 77 arcs. In this figure, node 1 is labelled as “v1”, etc. The node with the smallest penalty is node 17, and the penalty (p in the figure) is 7. Upon setting \(u_{17} = 1\) we have s(17) = s(6) = s(7) = 0; these nodes are pointed to by arrows in the figure. The node with the next smallest penalty is node 25, and the penalty is 12. Upon setting \(u_{25} = 1\) we have s(25) = s(26) = s(28) = s(32) = 0. The node with the next smallest penalty is node 26, and the penalty is 13. However, \(u_{26}\) cannot be increased, since s(25) = s(32) = 0. The node with the next smallest penalty is node 12, and the penalty is 15. Upon setting \(u_{12} = 1\) we have s(12) = s(1) = 0. The node with the next smallest penalty is node 27, and the penalty is 20. Upon setting \(u_{27} = 1\) we have s(27) = s(30) = s(34) = 0. No other dual variable can be increased, and Dual Ascent halts, yielding a dual objective function value of 4, which is the lower bound on \(B(1)\).

Fig. 4.3

Results of applying Dual Ascent to Zachary’s Karate Club network

We can now calculate the upper bound on \(B(r)\). For \(j \in {\mathbb {N}}\), set \(y_j = 1\) if s(j) = 0 and \(y_j = 0\) otherwise. Setting \(y_j = 1\) means that the box of radius r centered at node j will be used in the covering of \(\mathbb {G}\). For Zachary’s Karate Club network, at the conclusion of Dual Ascent with r = 1 there are 12 values of j such that s(j) = 0; for each of these values we set \(y_j = 1\).
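For example, applying the dual_ascent sketch above to the Karate Club graph bundled with networkx produces both bounds. (That dataset has 78 edges and nodes labelled 0–33, rather than the 77-arc, 1-based network used in the text, so the resulting numbers and tie-breaking may differ slightly from Fig. 4.3.)

```python
import networkx as nx

G = nx.karate_club_graph()          # 34 nodes, labelled 0..33 in this dataset
u, opened = dual_ascent(G, r=1)
lower = sum(u.values())             # dual objective: lower bound on B(1)
upper = len(opened)                 # boxes opened by setting y_j = 1 whenever s(j) = 0
print(lower, upper)
```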

We have shown that if u satisfies (4.7) then

$$\displaystyle \sum_{i \in \mathbb{N}} u_i \; \leq \; B(r) \; \leq \; \sum_{j \in \mathbb{N}} y_j.$$

If \(\sum_{i \in \mathbb{N}} u_i = \sum_{j \in \mathbb{N}} y_j\) then we have found a minimal covering. If \(\sum_{i \in \mathbb{N}} u_i < \sum_{j \in \mathbb{N}} y_j\) then we use a Dual Adjustment procedure [11] to attempt to close the gap \(\sum_{j \in \mathbb{N}} y_j - \sum_{i \in \mathbb{N}} u_i\). For Zachary’s Karate Club network, for r = 1 we have \(\sum_{i \in \mathbb{N}} u_i = 4\) and \(\sum_{j \in \mathbb{N}} y_j = 12\).

The Dual Adjustment procedure is motivated by the complementary slackness optimality conditions of linear programming. Let (x, y) be feasible for PLP and let (u, w) be feasible for DLP, where \(w_{ij} = \max \{0, u_i - c_{ij}\}\). The complementary slackness conditions state that (x, y) is optimal for PLP and (u, w) is optimal for DLP if

$$\displaystyle y_j \Bigl( 1 - \sum_{i \in \mathbb{N}} w_{ij} \Bigr) = 0 \;\; \mbox{for } j \in \mathbb{N}, \qquad\qquad (4.8)$$

$$\displaystyle w_{ij} \, ( y_j - x_{ij} ) = 0 \;\; \mbox{for } i, j \in \mathbb{N}. \qquad\qquad (4.9)$$

We can assume that x is binary since, as mentioned above, we can assign each i to a single k (where k depends on i) such that \(y_k = 1\) and \(c_{ik} = 0\). We say that a node \(j \in {\mathbb {N}}\) is “open” (i.e., the box centered at node j is used in the covering of \(\mathbb {G}\)) if \(y_j = 1\); otherwise, j is “closed.” When (x, y) and u are feasible for PLP and DLP, respectively, and x is binary, constraints (4.9) have a simple interpretation: if for some i we have \(u_i = 1\) then there can be at most one open node j such that dist(i, j) ≤ r. For suppose to the contrary that \(u_i = 1\) and there are two open nodes \(j_1\) and \(j_2\) such that \(dist(i, j_1) \leq r\) and \(dist(i, j_2) \leq r\). Then \(w_{i j_1} = w_{i j_2} = u_i = 1\). Since x is binary, by (4.2), either \(x_{i j_1} = 0\) or \(x_{i j_2} = 0\). Suppose without loss of generality that \(x_{i j_1} = 1\) and \(x_{i j_2} = 0\). Then

$$\displaystyle w_{i j_1} \, ( y_{j_1} - x_{i j_1} ) = 1 \cdot (1 - 1) = 0,$$

but

$$\displaystyle w_{i j_2} \, ( y_{j_2} - x_{i j_2} ) = 1 \cdot (1 - 0) = 1 \neq 0,$$

so complementary slackness fails to hold. This argument is easily extended to the case where there are more than two open nodes j such that dist(i, j) ≤ r. The conditions (4.9) can also be visualized using Fig. 4.2, where \({\mathbb {N}}_i\) consists of node i and its neighbors. If \(u_i = 1\) then at most one node in the set \({\mathbb {N}}_i\) can be open.
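This interpretation can be checked mechanically. The illustrative helper below (a hypothetical name) lists the nodes i with \(u_i = 1\) that see more than one open node within distance r; if the list is nonempty, no binary assignment x can satisfy (4.9).

```python
import networkx as nx

def condition_4_9_violators(G, r, u, open_nodes):
    """Nodes i with u_i = 1 that have two or more open nodes within distance r."""
    bad = []
    for i in G:
        if u[i] == 1:
            ball_i = nx.single_source_shortest_path_length(G, i, cutoff=r)
            if sum(1 for j in ball_i if j in open_nodes) > 1:
                bad.append(i)
    return bad
```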

If \(\sum_{j \in \mathbb{N}} y_j > \sum_{i \in \mathbb{N}} u_i\), we run the following Dual Adjustment procedure to close some nodes, and construct x, to attempt to satisfy constraints (4.9). Define

$$\displaystyle Y \equiv \{ j \in \mathbb{N} \, \vert \, y_j = 1 \},$$

so Y is the set of open nodes. The Dual Adjustment procedure, which follows Dual Ascent, has two steps.

Step 1

For \(i \in {\mathbb {N}}\), let α(i) be the “smallest” node j in Y such that \(dist(i, j) \leq r\). By “smallest” node we mean the node with the smallest node index, or the alphabetically lowest node name; any similar tie-breaking rule can be used. If for some j ∈ Y we have j ≠ α(i) for all \(i \in {\mathbb {N}}\), then j can be closed, so we set Y = Y −{j}. In words, if the chosen method of assigning each node to a box in the covering results in the box centered at j never being used, then j can be closed.

Applying Step 1 to Zachary’s Karate Club network with r = 1, using the tie-breaking rule of the smallest node index, we have, for example, α(25) = 25, α(26) = 25, α(27) = 27, and α(30) = 27. After computing each α(i), we can close nodes 7, 12, 17, and 28, as indicated by the bold X next to these nodes in Fig. 4.4. After this step, we have Y = {1, 6, 25, 26, 27, 30, 32, 34}. This step lowered the primal objective function from 12 (since originally |Y | = 12) to 8.

Fig. 4.4

Closing nodes in Zachary’s Karate Club network
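Step 1 amounts to computing α(i) for every node and discarding any open node that is never selected. A sketch, using the smallest-index tie-breaking rule of the text (the function name is illustrative, not from [44]):

```python
import networkx as nx

def dual_adjustment_step1(G, r, open_nodes):
    """Close every open node j that is not alpha(i) for any i (Step 1)."""
    Y = set(open_nodes)
    alpha = {}
    for i in G:
        within_r = nx.single_source_shortest_path_length(G, i, cutoff=r)
        alpha[i] = min(j for j in within_r if j in Y)   # smallest open node within distance r
    return {j for j in Y if j in set(alpha.values())}, alpha
```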

Step 2

Suppose we consider closing j, where j ∈ Y. We consider the impact of closing j on i, for \(i \in {\mathbb {N}}\). If j ≠ α(i) then closing j has no impact on i, since i is not assigned to the box centered at j. If j = α(i) then closing j is possible only if there is another open node β(i) ∈ Y such that β(i) ≠ α(i) and \(dist(i, \beta(i)) \leq r\) (i.e., if there is another open node, distinct from α(i), whose distance from i does not exceed r). Thus we have the rule: close j if for each \(i \in {\mathbb {N}}\) either

$$\displaystyle \begin{aligned} j \neq \alpha(i) \end{aligned}$$

or

$$\displaystyle \begin{aligned} j = \alpha(i) \mbox{ and } \beta(i) \mbox{ exists.} \end{aligned}$$

Once we close j and set Y = Y −{j} we must recalculate α(i) and β(i) (if it exists) for \(i \in {\mathbb {N}}\).

Applying Step 2 to Zachary’s Karate Club network with r = 1, we find that, for example, we cannot close node 1, since 1 = α(5) and β(5) does not exist. Similarly, we cannot close node 6, since 6 = α(17) and β(17) does not exist. We can close node 25, since 25 = α(25) but β(25) = 26 (i.e., we can reassign node 25 from the box centered at 25 to the box centered at 26), 25 = α(26) but β(26) = 26, 25 = α(28) but β(28) = 34, and 25 = α(32) but β(32) = 26. After recomputing α(i) and β(i) for \(i \in {\mathbb {N}}\), we determine that node 26 can be closed. Continuing in this manner, we determine that nodes 27 and 30 can be closed, yielding Y = {1, 6, 32, 34}. Since now the primal objective function value and the dual objective function value are both 4, we have computed a minimal covering. When we execute Dual Ascent and Dual Adjustment for Zachary’s Karate Club network with r = 2 we obtain primal and dual objective function values of 2, so again a minimal covering has been found.
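Step 2 can be sketched as repeatedly looking for an open node j such that every node i either does not select j (j ≠ α(i)) or can be re-assigned to some other open node β(i) within distance r, with α recomputed after each closure. The code below is an illustrative reading of the rule above, not the code of [44]; whether it reproduces Y = {1, 6, 32, 34} exactly depends on the node labelling and tie-breaking of the dataset used.

```python
import networkx as nx

def dual_adjustment_step2(G, r, Y):
    """Close open nodes one at a time as long as every affected node can be re-assigned."""
    Y = set(Y)
    ball = {i: set(nx.single_source_shortest_path_length(G, i, cutoff=r)) for i in G}
    closed_one = True
    while closed_one:
        closed_one = False
        alpha = {i: min(j for j in ball[i] if j in Y) for i in G}
        for j in sorted(Y):
            # j can be closed if, for every i, either j != alpha(i) or beta(i) exists.
            if all(alpha[i] != j or len(ball[i] & Y) >= 2 for i in G):
                Y.remove(j)
                closed_one = True
                break                   # recompute alpha before attempting further closures
    return Y
```

Applying Step 1 and then Step 2 to the open set returned by dual_ascent yields the final covering.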

4.3 Bounding the Fractal Dimension

Assume that for some positive constant a we have

$$\displaystyle B(r) = a \, (2r+1)^{-d_B}. \qquad\qquad (4.10)$$

Suppose that for r = 1, 2, ⋯ , K we have computed a lower bound \(B_L(r)\) and an upper bound \(B_U(r)\) on \(B(r)\). From

$$\displaystyle B_L(r) \; \leq \; B(r) \; \leq \; B_U(r)$$

we obtain, for r = 1, 2, ⋯ , K,

$$\displaystyle \log B_L(r) \; \leq \; \log a - d_B \log (2r+1) \; \leq \; \log B_U(r). \qquad\qquad (4.11)$$

The system (4.11) of 2K inequalities may be infeasible, i.e., it may have no solution a and \(d_B\). If the system (4.11) is feasible, we can formulate a linear program to determine the maximal and minimal values of \(d_B\) [44]. For simplicity of notation, let the K values \(\log (2r+1)\) for r = 1, 2, ⋯ , K be denoted by \(t_k\) for k = 1, 2, ⋯ , K, so \(t_1 = \log 3\), \(t_2 = \log 5\), \(t_3 = \log 7\), etc. For k = 1, 2, ⋯ , K, let the K values of \(\log B_L(k)\) and \(\log B_U(k)\) be denoted by \(L_k\) and \(U_k\), respectively. Let \(b = \log a\). The inequalities (4.11) can now be expressed as

$$\displaystyle L_k \; \leq \; b - d_B \, t_k \; \leq \; U_k \;\; \mbox{for } k = 1, 2, \cdots, K.$$

The minimal value of \(d_B\) is the optimal objective function value of BCLP (Box Counting Linear Program):

$$\displaystyle \begin{aligned} \mbox{minimize} \;\; & d_B \\ \mbox{subject to} \;\; & L_k \leq b - d_B \, t_k \leq U_k \;\; \mbox{for } k = 1, 2, \cdots, K. \end{aligned}$$

This linear program has only two variables, b and \(d_B\). Let \(d_B^{\min }\) and \(b^{\min }\) be the optimal values of \(d_B\) and b, respectively. Now we change the objective function of BCLP from minimize to maximize, and let \(d_B^{\max }\) and \(b^{\max }\) be the optimal values of \(d_B\) and b, respectively, for the maximize linear program. The box counting dimension \(d_B\), assumed to exist by (4.10), satisfies

$$\displaystyle d_B^{\min } \; \leq \; d_B \; \leq \; d_B^{\max }.$$
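As one way to carry this out (not the implementation of [44]), the two-variable linear program can be solved with scipy.optimize.linprog; the helper name and argument conventions below are illustrative. The variables are x = (b, \(d_B\)), and the 2K inequalities \(L_k \leq b - d_B t_k \leq U_k\) are written in the standard form \(A_{ub} x \leq b_{ub}\).

```python
import numpy as np
from scipy.optimize import linprog

def bclp_bounds(t, L, U):
    """Solve BCLP twice, minimizing and then maximizing d_B, over x = (b, d_B)
    subject to L_k <= b - d_B * t_k <= U_k.  Returns (d_min, d_max), or None if
    the system (4.11) is infeasible (or d_B is unbounded, e.g., when K = 1)."""
    t, L, U = map(np.asarray, (t, L, U))
    ones = np.ones_like(t, dtype=float)
    A_ub = np.vstack([np.column_stack([ones, -t]),     #  b - d_B t_k <= U_k
                      np.column_stack([-ones, t])])    # -(b - d_B t_k) <= -L_k
    b_ub = np.concatenate([U, -L])
    bounds = [(None, None), (0, None)]                 # b = log a is free, d_B >= 0
    low = linprog(c=[0, 1], A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    high = linprog(c=[0, -1], A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not (low.success and high.success):
        return None
    return low.x[1], high.x[1]
```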

For example [44], for the much-studied jazz network [19], the linear program BCLP is feasible, and solving the minimize and maximize linear programs yields \(2.11 \leq d_B \leq 2.59\).

Feasibility of BCLP does not imply that the box counting relationship (4.10) holds, since the upper and lower bounds might be so far apart that alternative relationships could be posited. If the linear program is infeasible, we can assert that the network does not satisfy the box counting relationship (4.10). Yet even if BCLP is infeasible, it might be so “close” to feasible that we nonetheless want to calculate \(d_B\). When BCLP is infeasible, we can compute \(d_B\) using the solution of BCQP (box counting quadratic program), which minimizes the sum of the squared distances to the 2K bounds [44]: