
The arrival of the submicron era has profoundly changed VLSI (very large scale integration) design: delay on interconnects now far exceeds that on gates, so the total delay at a sink can no longer be assessed simply by the length of weighted edges, which makes routing more complicated than ever. Traditional methods for VLSI routing are either infeasible or of low precision. In this chapter, multiple objectives are comprehensively reflected as a cost, and the optimization problem is abstracted as constructing a minimal rectilinear Steiner tree with rectangular obstacles (MRSTRO) under timing constraints. The relationship between cost and sink delay is then carefully discussed and partially proved to be contradictory using the Elmore delay model, which is of high fidelity. To address the MRSTRO problem effectively, a synergy feedback based ant colony algorithm (SFB-ACO) is configured and implemented. In SFB-ACO, a synergy function is designed to lead each branch to join others, thus reducing the total tree length. Additionally, in view of the intrinsic contradiction between objective and constraint, a constraint-oriented feedback module is introduced to prevent over-constraint while regulating the formation of solutions. Following the configuration principle, the two modules are uniformly connected with existing ACO operators to form a hybridization of deterministic strategies and an evolutionary process. The experimental results verify the advantage of SFB-ACO over other algorithms and practices in VLSI global routing.

1 Introduction

Global routing [14] arranges each part of a net into different wiring channels and determines the connection of nodes and their initial wiring courses while satisfying certain design requirements. Its result can have a significant impact on the success of the follow-up detailed routing and on the overall performance of the chip [5, 6]. It is therefore a core link in very large scale integration (VLSI) physical design. Wires in VLSI can be divided into several types (signal lines, power lines, ground lines, clock lines, etc.), each with its own optimization objectives. Generally speaking, the total interconnect length and the area of the wiring district should be as small as possible, and time delay for signal lines and clock skew for clock lines also need to be considered. Further objectives include reducing power dissipation and heat loss and avoiding noise and crosstalk between lines. These objectives can be comprehensively reflected by weighting each edge in the net; the optimization problem is then narrowed down to minimizing the weighted length of wires, which is usually called the cost. The procedure for constructing interconnects in VLSI global routing has been abstracted as a minimal rectilinear Steiner tree with rectangular obstacles (MRSTRO) problem.

Studies on the Steiner tree in graph theory can be traced to 1941, when Courant and Robbins [7] pointed out that for a net consisting of n endpoints, at most n-2 additional points need to be introduced so that, together with the original points, the cost of the Steiner tree established on them is reduced to the lowest. These introduced points are called Steiner points. (Note that not all yielding points are Steiner points, as illustrated in Fig. 8.1.) How to pick them correctly is the key issue of the Minimal Steiner tree (MST) problem [4, 8–10], which has been proved to be NP-hard [4, 9, 11]. The computational effort of exact algorithms [9, 10, 12, 13] for the Steiner tree increases exponentially with the number of nodes. In particular, wires in VLSI must follow the Manhattan routing architecture [14, 15], in which only two perpendicular wiring directions are allowed. The optimal tree spanned under such an architecture is a minimal rectilinear Steiner tree (MRST). Algorithms for MRST construction are usually computationally expensive in space, since they require divisions between each pair of nodes and separate consideration of the weighted cost of the divided sections, which leads to prohibitively large memory usage.

Fig. 8.1 Steiner points and non-Steiner points

To overcome these shortcomings, Hanan [16] defined Hanan points and stated a well-known theorem: for any endpoint collection P, there exists a minimum Steiner tree solution whose Steiner point set S is a subset of its Hanan point set U. Later, Snyder [13] demonstrated that this theorem can be extended to Manhattan space with higher dimensions. Such endeavors make MRST a much less overwhelming task. In addition, scholars found that MST can serve as a good estimate of MRST, and tried to obtain an approximate solution by using MST as a starting point, either decomposing nets into several two-pin subnets to ease maze routing [3, 17, 18], or simply adopting a pattern technique that restricts connections to L-shapes or Z-shapes [3, 19, 20]. Hwang [21], in 1976, stated and proved that \( cost\left( {MST} \right)/cost\left( {MRST} \right) \leq 3/ 2 \) and that this bound cannot be improved, which means that the ratio between the costs of MST and MRST reaches 3/2 in the worst case. Such poor precision cannot be accepted in VLSI routing design. Meanwhile, the existence of obstacles, created by various macro modules, IP modules and other blocks on the chip, results not only in a more complicated process for deriving MRST from MST, but also in a larger discrepancy between their costs. All of the above has led to increasing interest in finding MRST in a more direct and accurate way.

Thus far, numerous mature algorithms directed at MRST have been put forward, such as the GeoSteiner package [12, 22], the edge-based heuristic algorithm [23], and the 1-Steiner heuristic [24–26]. More recently, FLUTE [27, 28] was proposed with improved performance: it is optimal for nets up to degree 9 and is still very accurate for nets up to degree 100. It has been well received, and its direct applications are BoxRouter [3, 29] and FastRouter [3, 30, 31]. For tackling MRSTRO problems [32–36], an obstacle-avoiding Steiner tree for an arbitrary \( \lambda \)-geometry has been constructed by Delaunay triangulation and shown to outperform the conventional construction-by-correction approach [35]. Most of the algorithms mentioned above are heuristic; they find a near-optimal solution within a relatively short period of time and have therefore been widely used in multicast network optimization [4]. However, most nets in VLSI circuits have a low degree [28], so the quality of the solution is a more important factor than a low runtime complexity. Discouragingly, to date none of the heuristic algorithms performs twice as well in its best cases as in its worst ones. Rita and Bryant successfully applied a genetic algorithm (GA) to MRSTRO based on MST [36], and Consoli [37] proposed a Jumping Particle Swarm Optimization methodology for the minimum labelling Steiner tree problem; both imply a promising application of intelligent algorithms in VLSI routing [38].

Among intelligent algorithms, ant colony optimization (ACO) [39–41] is a bionic algorithm proposed and quickly developed by Dorigo. Using pheromone to transmit messages between ants, its most distinctive characteristic is that it integrates historical experience, excellent solutions and their interactions in a distributed way through weighted edges in the search space. Guided by the positive feedback of pheromone as well as by heuristic information, both the efficiency of search and message exchange and the quality of the result can be maintained, which has made ACO a very promising algorithm. At present, ACO is still at the very outset of its development; it is mainly used in path planning, where it has obtained better results than the genetic algorithm (GA) and the simulated annealing algorithm (SA). MRSTRO belongs to path planning, and its optimal route can be found by relying on information distributed over edges. However, it differs from general path planning in that it usually contains multiple endpoints. Research on how to apply ACO to multi-terminal connection is still rare.

On the other hand, integrated circuits are developing towards higher speed and higher integration levels. With the coming of the deep submicron era, interconnect on VLSI has become thinner and longer, leading to a substantial increase in its resistance and capacitance. Consequently, the delay on interconnect is no longer negligible, while that on gates decreases as the feature size shrinks. For instance, at 100 nm, the intrinsic switching delay of a MOSFET is 5 ps, whereas the RC response time for 1 mm of interconnect is 30 ps. At 35 nm, this 6-to-1 differential turns into a 100-to-1 difference [42]. These changes have made interconnect routing on VLSI very different from before [43]. The earlier linear delay estimator, which takes the Manhattan distance between nodes as the delay and is more intuition-based, pales in fidelity compared to the Elmore delay [42, 44–47]. Under the Elmore model, a routing tree with the shortest length, though it occupies a comparatively small wiring area and sometimes gives better synchronicity at critical sinks, pays a price elsewhere. By maximizing the sharing of the tree's branches, it adds extra nodes to the mainstream from source to sinks, and this can severely lengthen the delay. As a result, the delay constraints at some sinks may be violated, which adversely affects the performance of the circuit or even leaves it malfunctioning. Since the delay of each sink depends on the structure of the Steiner tree and is highly coupled with other branches, such information is rather difficult to incorporate into distributed edges and hence cannot be appraised by the tree's cost. Therefore, the traditional approach, which empirically identifies delay with connecting length and then focuses global routing on reducing the total cost, is not applicable in today's deep submicron regime.

Based on the above analysis, the delay on each sink is intrinsically in conflict with the total wiring length. In other words, to ensure the least delay on a particular sink, we only need to link it directly to the source, which may lead to a star-like topology of the Steiner tree. Its total cost is relatively high and thus unwanted. When the delay calculated from the resulting tree is less than the given constraint, it usually means that there is still room for merging branches to reduce cost. Therefore, the ideal situation is that the delay on each sink is less than, but as close as possible to, its respective constraint. Common methods for dealing with constrained optimization problems can be roughly summarized as follows: one is to accept or abandon a solution (AAS) directly according to whether it meets the constraints, and the other is to bring in a penalty function to turn the problem into an unconstrained one. The former fails to make full use of solutions that have good objective values but cannot satisfy the constraints; in the latter, likewise, such solutions are penalized and then degrade. In this context, we hope to take advantage of the delay information from the last iteration as guidance for generating solutions in the next. Inspired by the positive feedback in ACO, and considering the contradictory relationship between objectives and constraints in VLSI global routing, a negative feedback is introduced to regulate the merging of branches according to the degree to which a constraint is satisfied. Specifically, if the delay constraint on a sink is severely violated, its synergy coefficient decreases so as to restrain it from meeting other branches; otherwise it increases to encourage merging.

The primary works of this chapter are as follows:

Optimization of global routing in VLSI is abstracted as an MRSTRO problem. To address this problem, the MST construction process is skipped, and an enhanced ACO, which contains a synergy function in addition to the pheromone and heuristic factors, is proposed and applied to multi-terminal path planning problems.

Differences in VLSI routing between today's deep submicron era and earlier times are carefully investigated. The more accurate Elmore delay is employed, and a constraint-oriented feedback is introduced to adjust how a branch merges with others in order to prevent over-constraint.

Through experimental tests, the effectiveness of the synergy function and the constraint-oriented feedback in our proposed SFB-ACO is verified by comparisons with other algorithms and practices.

2 Preliminary

2.1 Terminology in Steiner Tree

The model for VLSI global routing in this chapter is based on MRSTRO, where a Steiner tree consists of a collection of given points, additionally introduced Steiner points, and their connecting relationships. For any two points in the Steiner tree, one and only one path can be found. Other requirements in VLSI routing are that wires run only horizontally or vertically and that no wire crosses a functional area on the chip. Some terms related to the Steiner tree [8, 10] are defined as follows.

  • Root, Node, Leaf and Edge

    Any point belonging to a Steiner tree is called a node; the node set, denoted as T, is the union of P and S. A connection between two nodes is called an edge, which defines a parent-child relationship. Within the structural hierarchy of the tree, there is a node with special status, usually called the root. The closer a node is to the root, the higher its rank. A leaf is defined as a node whose degree is 1, namely a point that has only one connection.

  • Steiner Points and Hanan Points

    Any additionally introduced points that help reduce the length of the spanning tree are called Steiner points. By drawing horizontal and vertical lines through the points in P, we obtain a Hanan grid. The intersections of the grid are called Hanan points, and their collection is denoted as U.

  • Mainstream, Segment and Subtree

    For any element in P, the path from it to the root is called its mainstream, and the connection between two adjacent nodes in the mainstream is called a segment. A partial tree rooted in node \( T_{i} \), consisting of \( T_{i} \) and all its child nodes, is called a subtree of \( T_{i} \), denoted as Sub(i).

  • Connectivity

    In the wiring diagram, the existence of obstacles may prevent some Hanan points from being selected as Steiner points. Consider a pair of nodes \( P_{1} \) and \( P_{2} \) and their Hanan points \( U_{1} \) and \( U_{2} \). If their connection cannot be completed by simply choosing \( U_{1} \) or \( U_{2} \) as the yielding point, but requires two or more yielding points, the pair is called non-connectable; otherwise, we call it connectable, as illustrated in Fig. 8.2.

    Fig. 8.2 Instances of connectivity: (a) connectable, (b) non-connectable
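The Hanan grid construction described above can be sketched in a few lines. This is only an illustration under our own assumptions: the terminals are given as integer (x, y) pairs, and the function name `hanan_points` is hypothetical rather than taken from the chapter.

```python
from itertools import product

def hanan_points(terminals):
    """Return the Hanan point set U: all intersections of the horizontal and
    vertical lines drawn through the given (x, y) terminals."""
    xs = {x for x, _ in terminals}
    ys = {y for _, y in terminals}
    return set(product(xs, ys))

# Illustrative terminal set P (coordinates are made up for the example).
P = [(0, 0), (2, 3), (5, 1)]
U = hanan_points(P)
steiner_candidates = U - set(P)   # grid intersections that are not terminals
print(len(U), sorted(steiner_candidates))
```

Note that U contains the terminals themselves, so the candidate Steiner points are obtained by removing P from U.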

2.2 Elmore Delay

Elmore delay is a relatively accurate and commonly used model for calculating signal delay over a network. In today's deep submicron era, delay on the VLSI interconnect can no longer be ignored. A wire with length L can be divided into N segments, each of length \( \Updelta L \); it can then be described by the RC network model illustrated in Fig. 8.3.

Fig. 8.3 RC model for a wire

Assuming that the wire itself is homogeneous, that is, the resistance and capacitance per unit length are constants, denoted as \( R_{rate} \) and \( C_{rate} \), respectively, the total delay along this wire can be expressed as in Eq. (8.1) [45].

$$ \begin{gathered} \tau_{L} = \left( {R_{rate} \cdot \Updelta L} \right)\left( {C_{rate} \cdot \Updelta L} \right) + 2\left( {R_{rate} \cdot \Updelta L} \right)\left( {C_{rate} \cdot \Updelta L} \right) + \cdots + N\left( {R_{rate} \cdot \Updelta L} \right)\left( {C_{rate} \cdot \Updelta L} \right)\; \hfill \\ \, = R_{rate} \cdot C_{rate} \cdot (\Updelta L)^{2} \cdot \sum\limits_{i = 1}^{N} i = \frac{1}{2}R_{L} \cdot C_{L} \hfill \\ \end{gathered} $$
(8.1)

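where \( R_{L} = R_{rate} \cdot L \) and \( C_{L} = C_{rate} \cdot L \) represent the wire's total resistance and capacitance. The last equality holds in the limit of large N, since \( (\Updelta L)^{2} \cdot \sum\nolimits_{i = 1}^{N} i = L^{2} (N + 1)/(2N) \).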
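As a quick numerical check of Eq. (8.1), the lumped sum over N segments can be compared with the closed form \( \frac{1}{2}R_{L} C_{L} \). The sketch below is ours; the function names and the numeric values are illustrative assumptions.

```python
def lumped_wire_delay(r_rate, c_rate, length, n_segments):
    """Sum of i * (R_rate * dL) * (C_rate * dL) for i = 1..N, as in Eq. (8.1)."""
    dl = length / n_segments
    return sum(i * (r_rate * dl) * (c_rate * dl) for i in range(1, n_segments + 1))

def distributed_wire_delay(r_rate, c_rate, length):
    """Closed form 0.5 * R_L * C_L with R_L = R_rate * L and C_L = C_rate * L."""
    return 0.5 * (r_rate * length) * (c_rate * length)

# Illustrative values only; units are arbitrary.
r_rate, c_rate, length = 0.08, 0.2, 1000.0
for n in (1, 10, 1000):
    ratio = lumped_wire_delay(r_rate, c_rate, length, n) / distributed_wire_delay(r_rate, c_rate, length)
    print(n, ratio)   # the ratio equals (N + 1) / N and approaches 1 as N grows
```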

For a node \( T_{i} \) in Steiner tree, the signal delay from the root T 0 to \( T_{i} \) can be formulated as Eq. (8.2) [46].

$$ \tau (T_{i} ) = R_{{T_{0} }} \cdot C_{{T_{0} }} + \sum\limits_{all \, segments \, along \, mainstream(i)}{\tau_{{e_{j} }} } $$
(8.2)
$$ \tau_{{e_{j} }} = R_{{T_{j} }} \cdot C_{{T_{j} }} + R_{{e_{j} }} \left( {\frac{{C_{{e_{j} }} }}{2} + C_{sub(j)} } \right) $$
(8.3)
$$ C_{sub(j)} = \sum\limits_{all \, nodes \, in \, subtree(j)}{C_{{T_{k} }} } + \sum\limits_{all \, edges \, along \, subtree(j)}{C_{{e_{l} }} } $$
(8.4)

In Eq. (8.2), \( R_{{T_{i} }} (i = 0,1,2, \ldots ) \) represents the resistance to drive node \( T_{i} \), \( C_{{T_{i} }} (i = 0,1,2, \ldots ) \) represents the capacitance of node \( T_{i} \), and \( e_{j} \) represents the edge from \( T_{j} \) to its next nearest node along \( T_{i} \)’s mainstream. Correspondingly, the delay along this edge is denoted as \( \tau_{{e_{j} }} \) and computed by Eq. (8.3), where \( R_{{e_{j} }} \) and \( C_{{e_{j} }} \) respectively represent the total resistance and capacitance of the edge, and \( C_{sub(j)} \) is defined as the equivalent capacitance of the subtree rooted in \( T_{j} \)’s nearest child node along the mainstream, which is the sum of the capacitances of all nodes and edges in the subtree. If the child node of \( T_{j} \) is a leaf, then the value of \( C_{sub(j)} \) is identical to the capacitance of this leaf.

With these definitions, the delay of each node in the Steiner tree from the root along its mainstream can be calculated iteratively according to Eqs. (8.2)–(8.4).
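The iterative computation can be sketched as follows. This is one possible reading of Eqs. (8.2)–(8.4), not a definitive implementation: we assume the tree is stored as a child-to-parent map, each edge is indexed by its downstream node, the node term in Eq. (8.3) belongs to the upstream endpoint of the edge (as in Eqs. (8.12) and (8.13)), and wires are homogeneous. All names and values are hypothetical.

```python
def elmore_delays(parent, length, node_r, node_c, r_rate, c_rate, root=0):
    """Delay of every node from the root, following Eqs. (8.2)-(8.4) literally.

    parent[child] is the upstream node; length[child] is the wire length of
    the edge parent[child] -> child (an assumed data layout).
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def subtree_cap(node):
        # Eq. (8.4): capacitance of all nodes and edges in the subtree of `node`.
        cap = node_c[node]
        for ch in children.get(node, []):
            cap += c_rate * length[ch] + subtree_cap(ch)
        return cap

    def segment_delay(child):
        # Eq. (8.3) for the edge parent[child] -> child.
        par = parent[child]
        r_e, c_e = r_rate * length[child], c_rate * length[child]
        return node_r[par] * node_c[par] + r_e * (c_e / 2 + subtree_cap(child))

    delays = {root: node_r[root] * node_c[root]}   # leading term of Eq. (8.2)
    stack = [root]
    while stack:
        node = stack.pop()
        for ch in children.get(node, []):
            delays[ch] = delays[node] + segment_delay(ch)
            stack.append(ch)
    return delays

# A three-node chain 0 -> 1 -> 2 with made-up electrical values.
delays = elmore_delays(parent={1: 0, 2: 1}, length={1: 2.0, 2: 3.0},
                       node_r={0: 2.0, 1: 0.0, 2: 0.0},
                       node_c={0: 0.5, 1: 0.0, 2: 1.0},
                       r_rate=1.0, c_rate=1.0)
print(delays)
```

The recursion recomputes subtree capacitances for clarity; caching them bottom-up would be the natural optimization for large trees.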

2.3 Problem Formulation

This section discusses the MRSTRO problem with timing constraints for VLSI global routing in the deep submicron era. The relationship between sink delay and tree length is scrutinized, and an appropriate candidate pool for Steiner points is determined with regard to solution precision and space complexity.

We are given a point set \( P = \{ P_{i} |i = 1:n\} \) corresponding to the sinks on the VLSI chip to be optimized, where n is the number of sinks, and a root \( T_{0} \) corresponding to the source on the chip. Each sink and the source is located by its coordinate \( (x_{i} ,y_{i} ) \) on the board. Besides, there is a collection of modules, viewed as obstacles. Because of the Manhattan rule in VLSI, their actual shape does not affect the area available for wiring, so they can be formulated as, or divided into, a number of rectangles. These rectangular obstacles are denoted as \( R = \{ R_{i} |i = 1:r\} \), where r is the number of obstacles, and the position and size of each are expressed by its bottom-left and upper-right vertex coordinates. In addition to the sinks and the source, some other special points, known as Steiner points, need to be introduced. The issue to be addressed here is how to construct an MRST using these points without violating the delay constraint of any sink.

As mentioned before, the wiring length, chip area, power consumption, heat loss, and clock synchronicity can be assessed by the total cost of weighted edges in the spanning tree. Here all the wires in the chip are assumed to be homogeneous, which means that all edges are of uniform weight (set to 1), so that minimizing the total cost of the Steiner tree is equivalent to minimizing its total length. With the delay constraints of the sinks given as \( T_{limit} = \{ T_{limit} (i)|i = 1:n\} \), the optimization problem can be formulated as follows.

$$ Min\left( {\sum\limits_{all \, segments}{L_{e} } } \right) $$
(8.5)
$$ subject\quad to\quad \tau (T_{i} ) \le T_{limit} (i),\quad \forall i = 1:n $$
(8.6)

In Eqs. (8.5) and (8.6), \( L_{e} \) represents the length of each segment in the spanning tree, and \( \tau \) and \( T_{limit} \) respectively represent the actual Elmore delay of the sink and its delay constraint.

The selection of Steiner points is the key to solving the above problem. Note that if the candidate pool is infinite or only weakly limited, the space complexity of the problem unavoidably increases.

Theorem 1 (Hanan [16]) For any MRST, all of its Steiner points are Hanan points.

Corollary 1 If an MRSTRO problem is solvable, its optimal solution can be obtained by selecting Steiner points from the Hanan point set or from points located on the rims of obstacles.

Corollary 2 For two points \( T_{i} \) and \( T_{j} \) to be connected, the shortest path between them is equal to the Manhattan distance between them, as defined in Eq. (8.7), when they are connectable; otherwise it should at least contain one portion of obstacle’s edge.

$$ D(T_{i} ,T_{j} ) = \left| {x_{i} - x_{j} } \right| + \left| {y_{i} - y_{j} } \right| $$
(8.7)
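Eq. (8.7) and the notion of connectability can be illustrated with a short sketch. The check below assumes, as our own simplification, that obstacles are axis-aligned rectangles `(x1, y1, x2, y2)`, that a wire may run along an obstacle rim but not through its interior, and that a pair is connectable when at least one of its two L-shaped routes is unobstructed. The function names are hypothetical.

```python
def manhattan(a, b):
    """Eq. (8.7): Manhattan distance between points a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def segment_blocked(a, b, rect):
    """True if the axis-aligned segment a-b passes through the open interior of rect."""
    x1, y1, x2, y2 = rect
    lo_x, hi_x = sorted((a[0], b[0]))
    lo_y, hi_y = sorted((a[1], b[1]))
    if lo_y == hi_y:                                    # horizontal segment
        return y1 < lo_y < y2 and lo_x < x2 and hi_x > x1
    return x1 < lo_x < x2 and lo_y < y2 and hi_y > y1   # vertical segment

def connectable(p1, p2, obstacles):
    """True if an L-shaped route via one of the two Hanan corners avoids all obstacles."""
    for corner in ((p1[0], p2[1]), (p2[0], p1[1])):
        legs = ((p1, corner), (corner, p2))
        if not any(segment_blocked(a, b, r) for a, b in legs for r in obstacles):
            return True
    return False

p1, p2 = (0, 0), (4, 4)
print(manhattan(p1, p2))
print(connectable(p1, p2, [(1, 1, 3, 3)]))    # small block: an L-route goes around it
print(connectable(p1, p2, [(1, -1, 3, 5)]))   # tall wall: both L-routes are cut
```

When `connectable` returns True, the shortest path length equals the Manhattan distance; otherwise the route has to follow part of an obstacle edge, as stated in Corollary 2.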

Theorem 2 For a partial tree T and an unconnected point \( P_{k} \) outside the tree, the best location on a segment for a Steiner point that connects \( P_{k} \) to T, so as to control the tradeoff between tree length and sink delay, lies between SP and CUC, as shown in Fig. 8.4, where SP is the shadow point of \( P_{k} \) on the segment, and CUC is the closest upstream connection to \( P_{k} \).

Fig. 8.4 Diagram of CUC, SP and CDC

Proof Let \( T_{i} \) denote the closer-to-root endpoint of the segment to be connected, which is CUC, and let its coordinate be (0, 0). Let the other endpoint, i.e., the closest downstream connection (CDC), be denoted by \( T_{j} \), with coordinate \( (x_{j} ,0) \). Let the coordinate of \( P_{k} \) be \( (x_{k} ,y) \), and its shadow point on the segment be \( (x_{k} ,0) \). Let \( R_{{T_{i} }} \) and \( C_{{T_{i} }} \) denote the resistance and capacitance of node \( T_{i} \), respectively. Let \( R_{S} \) and \( C_{S} \) respectively represent the Steiner point’s resistance and capacitance if it does not lie on CUC or CDC. Denote the resistance and capacitance per unit length on the segment by \( R_{rate} \) and \( C_{rate} \), respectively, and the equivalent capacitance of the subtree rooted in \( T_{j} \) by \( C_{sub(j)} \), and assume that the coordinate of the selected Steiner point on the segment is (x, 0). Then, according to the Elmore model, the delay from the source to node \( P_{k} \) along its mainstream can be expressed as follows.

$$ \tau (P_{k} ) = \tau (predecessor) + \tau_{{e_{i} }} + \tau_{{e_{S} }} $$
(8.8)

In Eq. (8.8), \( \tau (predecessor) \) represents the signal delay from the source to node \( T_{i} \), \( \tau_{{e_{i} }} \) represents the delay from node \( T_{i} \) to the selected Steiner point, and \( \tau_{{e_{S} }} \) represents the delay from the Steiner point to node \( P_{k} \).

Additionally, we can infer from Eqs. (8.2)–(8.4) that connecting \( P_{k} \) only affects the value of \( C_{sub} \), and each \( C_{sub(l)} \) on the upstream route can be expressed as in Eq. (8.9).

$$ C_{sub(l)} = C_{predecessor} + C_{sub(i)} ,\quad \forall T_{l} \; \in \;Predecessor\left( {T_{i} } \right) $$
(8.9)

So that we can rewrite Elmore delay to be a linear function of \( C_{sub(i)} \), expressed as follows.

$$ \tau (predecessor) = ConA + ConB \cdot C_{sub(i)} $$
(8.10)

where ConA and ConB are both constants that depend only on the resistance and capacitance of \( T_{i} \)’s upstream route. A change of \( C_{sub(i)} \) does not alter the value of ConA or ConB.

Hence the influence of Steiner point’s location on \( C_{sub(i)} \) can be calculated according to Eq. (8.11).

$$ C_{sub(i)} = C_{{T_{i} }} + C_{S} + C_{{P_{k} }} + C_{sub(j)} + C_{rate} \cdot \left( {x_{j} + \left| {x_{k} - x} \right| + y} \right) $$
(8.11)

Also, its influence on \( \tau_{{e_{i} }} \) and \( \tau_{{e_{S} }} \) can be expressed as in Eqs. (8.12) and (8.13), respectively.

$$ \tau_{{e_{i} }} = R_{{T_{i} }} \cdot C_{{T_{i} }} + R_{rate} \cdot x \cdot \left\{ {C_{rate} \cdot \frac{x}{2} + C_{S} + C_{sub(j)} + C_{{P_{k} }} + C_{rate} \left[ {(x_{j} - x) + \left| {x_{k} - x} \right| + y} \right]} \right\} $$
(8.12)
$$ \tau_{{e_{S} }} = R_{S} \cdot C_{S} + R_{rate} \cdot \left( {\left| {x_{k} - x} \right| + y} \right) \cdot \left( {C_{rate} \cdot \frac{{\left| {x_{k} - x} \right| + y}}{2} + C_{{P_{k} }} } \right) $$
(8.13)

According to Eqs. (8.8)–(8.13), we can easily find that \( \tau \left( {P_{k} } \right) \) is a piecewise quadratic function of x, expressed as follows.

$$ \tau \left( {P_{k} } \right) = A + Bx + Cx^{2} $$
(8.14)

where

$$ A = \left\{ {\begin{array}{*{20}c} \begin{gathered} ConA + ConB\left[ {C_{{T_{i} }} + C_{S} + C_{{P_{k} }} + C_{sub(j)} + C_{rate} \left( {x_{j} + x_{k} + y} \right)} \right] + R_{{T_{i} }} C_{{T_{i} }} \hfill \\ \, + R_{S} C_{S} + R_{rate} \left( {y + x_{k} } \right)\left( {C_{rate} \cdot \frac{{y + x_{k} }}{2} + C_{{P_{k} }} } \right), \quad 0 \le x \le x_{k} \hfill \\ \end{gathered} \\ \begin{gathered} ConA + ConB\left[ {C_{{T_{i} }} + C_{S} + C_{{P_{k} }} + C_{sub(j)} + C_{rate} \left( {x_{j} - x_{k} + y} \right)} \right] + R_{{T_{i} }} C_{{T_{i} }} \hfill \\ \, + R_{S} C_{S} + R_{rate} \left( {y - x_{k} } \right)\left( {C_{rate} \cdot \frac{{y - x_{k} }}{2} + C_{{P_{k} }} } \right), \quad x_{k} < x < x_{j} \hfill \\ \end{gathered} \\ \end{array} } \right. $$
(8.15)
$$ B = \left\{ {\begin{array}{*{20}c} { - ConB \cdot C_{rate} + R_{rate} \left( {C_{S} + C_{sub(j)} + C_{rate} \cdot x_{j} } \right), \quad 0 \le x \le x_{k} } \\ { \, ConB \cdot C_{rate} + R_{rate} \left( {C_{S} + C_{sub(j)} + C_{rate} \cdot x_{j} } \right), \quad x_{k} < x < x_{j} } \\ \end{array} } \right. $$
(8.16)
$$ C = \left\{ {\begin{array}{*{20}c} { - R_{rate} \cdot C_{rate} , \, 0 \le x \le x_{k} } \\ { \, R_{rate} \cdot C_{rate} , \, x_{k} < x < x_{j} } \\ \end{array} } \right. $$
(8.17)
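To make the dependence on x concrete, the sketch below evaluates \( \tau \left( {P_{k} } \right) \) directly from Eqs. (8.8) and (8.10)–(8.13) and compares it with the quadratic form of Eq. (8.14), using the upstream branch \( 0 \le x \le x_{k} \) of Eqs. (8.15)–(8.17). The function names and all numeric values are illustrative assumptions, not data from the chapter.

```python
def tau_pk(x, *, con_a, con_b, r_ti, c_ti, r_s, c_s, c_pk, c_sub_j,
           r_rate, c_rate, x_j, x_k, y):
    """Delay at P_k when the Steiner point sits at (x, 0), from Eqs. (8.8), (8.10)-(8.13)."""
    stub = abs(x_k - x) + y                                  # wire from Steiner point to P_k
    c_sub_i = c_ti + c_s + c_pk + c_sub_j + c_rate * (x_j + stub)          # Eq. (8.11)
    tau_pre = con_a + con_b * c_sub_i                                      # Eq. (8.10)
    tau_ei = r_ti * c_ti + r_rate * x * (
        c_rate * x / 2 + c_s + c_sub_j + c_pk + c_rate * ((x_j - x) + stub))  # Eq. (8.12)
    tau_es = r_s * c_s + r_rate * stub * (c_rate * stub / 2 + c_pk)        # Eq. (8.13)
    return tau_pre + tau_ei + tau_es                                       # Eq. (8.8)

def tau_pk_upstream_quadratic(x, *, con_a, con_b, r_ti, c_ti, r_s, c_s, c_pk,
                              c_sub_j, r_rate, c_rate, x_j, x_k, y):
    """A + B*x + C*x^2 with the 0 <= x <= x_k branch of Eqs. (8.15)-(8.17)."""
    a = (con_a + con_b * (c_ti + c_s + c_pk + c_sub_j + c_rate * (x_j + x_k + y))
         + r_ti * c_ti + r_s * c_s
         + r_rate * (y + x_k) * (c_rate * (y + x_k) / 2 + c_pk))
    b = -con_b * c_rate + r_rate * (c_s + c_sub_j + c_rate * x_j)
    c = -r_rate * c_rate
    return a + b * x + c * x * x

# Made-up parameter values for illustration.
params = dict(con_a=1.0, con_b=2.0, r_ti=0.5, c_ti=0.3, r_s=0.2, c_s=0.1,
              c_pk=0.4, c_sub_j=1.5, r_rate=0.7, c_rate=0.9,
              x_j=10.0, x_k=6.0, y=3.0)
for x in (0.0, 1.5, 3.0, 6.0):
    print(x, tau_pk(x, **params), tau_pk_upstream_quadratic(x, **params))
```

Sweeping x over the whole segment with `tau_pk` gives the kind of curve that is drawn qualitatively for \( P_{k} \) in the following discussion.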

The signal delay of the other nodes, apart from those that connect directly to the source without any other node in their mainstream, is also affected by the newly connected point \( P_{k} \). Among them, nodes located on \( T/Sub(i) \) mainly suffer from the change of \( C_{sub(i)} \), while the major cause of delay variation for those on \( Sub(j) \) is the increase in the number of segments and the corresponding delay change along the mainstream. The delay on these nodes is also a piecewise quadratic function of x, which can be obtained in the same way as above.

Drawing qualitative curves of the signal delay of \( P_{k} \) and of nodes in other positions, as shown in Fig. 8.5, we can see that the delay on all sinks increases when the Steiner point is inserted after SP. In that case the tree is also longer than when the Steiner point lies before SP. Therefore, an appropriate location for the new Steiner point is between CUC and SP. In this region, the length of the spanning tree gradually decreases as the Steiner point shifts from CUC towards SP, and reaches its minimum at SP. Another conclusion drawn from Fig. 8.5 is that the delay on each sink changes continuously as the Steiner point moves between CUC and SP, and some delays change in opposite directions, which implies that the timing conditions at different sinks are sometimes contradictory when the position of one Steiner point is adjusted. This makes it possible for us to regulate the synergy function of branches deliberately in order to meet their respective timing constraints.

Fig. 8.5 Change of signal delay on different nodes

From the above, we also see that limiting the candidate Steiner points to the Hanan pool is actually not conducive to adjusting the sinks’ delay. Moreover, according to Corollaries 1 and 2, the Hanan points are not sufficient if the design model itself is non-connectable. Taking the space complexity into account, \( S = U_{refined} \cup RIM \cup PEAK \) can serve as an appropriate candidate pool for Steiner points, where \( U_{refined} \) is the collection of Hanan points that lie outside the obstacle regions, RIM comprises the intersections created by drawing horizontal and vertical lines from sinks to the obstacles’ rims, and PEAK is the collection of all rectangular obstacles’ vertices, as illustrated in the right panel of Fig. 8.8. □
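A minimal sketch of how such a candidate pool might be assembled is given below. It rests on several assumptions of ours: obstacles are tuples `(x1, y1, x2, y2)`, a Hanan point is discarded only when it lies strictly inside an obstacle, and RIM is approximated by projecting each terminal onto both rims of every obstacle whose span it falls within (ignoring whether another obstacle blocks the projection line). The name `candidate_pool` is hypothetical.

```python
from itertools import product

def inside(pt, rect):
    """True if pt lies strictly inside the rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return x1 < pt[0] < x2 and y1 < pt[1] < y2

def candidate_pool(terminals, obstacles):
    """Return (U_refined, RIM, PEAK); their union is the candidate set S."""
    xs = {x for x, _ in terminals}
    ys = {y for _, y in terminals}
    u_refined = {p for p in product(xs, ys)
                 if not any(inside(p, r) for r in obstacles)}
    rim, peak = set(), set()
    for x1, y1, x2, y2 in obstacles:
        peak |= {(x1, y1), (x1, y2), (x2, y1), (x2, y2)}
        for x, y in terminals:
            if x1 <= x <= x2:            # vertical line through the terminal meets the rims
                rim |= {(x, y1), (x, y2)}
            if y1 <= y <= y2:            # horizontal line through the terminal meets the rims
                rim |= {(x1, y), (x2, y)}
    return u_refined, rim, peak

terminals = [(0, 2), (4, 2), (2, 5)]
obstacles = [(1, 1, 3, 3)]
u_refined, rim, peak = candidate_pool(terminals, obstacles)
S = u_refined | rim | peak
print(len(u_refined), sorted(rim), sorted(peak))
```

The pool stays finite and small: its size grows with the number of Hanan intersections plus a few points per sink and obstacle, rather than with the area of the routing region.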

3 SFB-ACO for Addressing MRSTRO Problem

This section starts with a brief overview of ACO for two-endpoint path planning. Based on it, an SFB-ACO algorithm encompassing a synergy function and constraint-oriented feedback is proposed and then applied to the multi-terminal routing optimization presented before.

3.1 ACO for Path Planning with Two Endpoints

The basic idea of traditional ACO can be summarized as follows. Ants’ paths represent feasible solutions, and all paths searched constitute the solution space of the given optimization problem. Ants release more pheromone on paths whose total length is relatively shorter, so as time goes by more and more pheromone accumulates on such paths, and they become more likely to be selected by other ants. Under this intense positive feedback, the ants ultimately converge to a path with the shortest length, which is taken as the optimal solution.

The main framework for standard ACO on path planning with two-endpoints is depicted in Fig. 8.6.

Fig. 8.6 Framework of standard ACO

At first, the pheromone concentration on all edges is the same, denoted as \( \zeta_{ij} (0) = \tau_{0} \). Ant k (k = 1,2,…,m) chooses the next node to visit according to the amount of pheromone deposited on the edges as well as heuristic information. The corresponding transferring rate, i.e., the probability for ant k to move from node i to node j, is denoted as \( P_{ij}^{k} (t) \) and calculated by Eq. (8.18).

$$ P_{ij}^{k} = \left\{ {\begin{array}{*{20}l} {\frac{{\left( {\zeta_{ij} } \right)^{\alpha } \cdot \left( {\eta_{ij} } \right)^{\beta } }}{{\sum\limits_{{s \in allow_{k} }} {\left( {\zeta_{is} } \right)^{\alpha } \cdot \left( {\eta_{is} } \right)^{\beta } } }},\quad j \in allow_{k} } \\ {0,\qquad\qquad\qquad\;\;j \notin allow_{k} \quad } \\ \end{array} } \right. $$
(8.18)

where \( \zeta_{ij} \) represents the pheromone concentration on the edge between i and j, \( \eta_{ij} \) represents the heuristic function signifying the expectation for ants to move from i to j, and \( allow_{k} \) represents the collection of nodes that ant k is allowed to visit. \( \alpha \) is the pheromone factor, whose value represents the importance of pheromone concentration in the ant’s transfer, and \( \beta \), referred to as the heuristic factor, represents the importance of heuristic information.
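The transferring rate of Eq. (8.18) can be sketched as follows. We assume, for illustration only, that pheromone and heuristic values are stored in dictionaries keyed by `(i, j)` edge tuples and that all weights are positive; the default values of `alpha` and `beta` are arbitrary choices, not settings from the chapter.

```python
import random

def transition_probabilities(i, allowed, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Eq. (8.18): probability of moving from node i to each allowed node j."""
    weights = {j: (pheromone[i, j] ** alpha) * (heuristic[i, j] ** beta)
               for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next(i, allowed, pheromone, heuristic, alpha=1.0, beta=2.0, rng=random):
    """Sample the next node according to the probabilities of Eq. (8.18)."""
    probs = transition_probabilities(i, allowed, pheromone, heuristic, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]

# Two candidate moves from node 0; the heuristic is taken as 1 / edge length.
pheromone = {(0, 1): 1.0, (0, 2): 1.0}
heuristic = {(0, 1): 1.0, (0, 2): 0.5}
probs = transition_probabilities(0, [1, 2], pheromone, heuristic)
print(probs)
print(choose_next(0, [1, 2], pheromone, heuristic, rng=random.Random(0)))
```

Nodes outside `allowed` simply do not appear in the returned dictionary, which corresponds to the zero branch of Eq. (8.18).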

At the same time, the pheromone concentration on each edge will be updated with its formulation expressed as below.

$$ \zeta_{ij} (t + 1) = (1 - \rho ) \cdot \zeta_{ij} (t) + \Updelta \zeta_{ij} $$
(8.19)
$$ \Updelta \zeta_{ij} = \sum\limits_{k = 1}^{m} {\Updelta \zeta_{ij}^{k} } $$
(8.20)

where \( \rho \) represents the degree of pheromone evaporation, whose value lies in the interval [0,1], \( \Updelta \zeta_{ij}^{k} \) represents the pheromone released by ant k on the edge from i to j, and \( \Updelta \zeta_{ij} \) represents the total pheromone released by all ants on this edge.

As for the updating mechanism, Dorigo has given three different models, which are ant cycle system, ant quantity system, and ant density system. Among them, the first one employs the global information on ants’ routing, thus being most commonly used. Its updating mechanism is introduced as follows.

$$ \Updelta \zeta_{ij}^{k} = \left\{ {\begin{array}{*{20}l} {\frac{Q}{{L_{k} }},\quad if \, ant(k) \, visit \, node(j) \, from \, node(i)} \\ {0,\quad\;\; otherwise} \\ \end{array} } \right. $$
(8.21)

where Q is a constant representing the total amount of pheromone released in one cycle, and \( L_{k} \) is identical to the length of ant k’s route.
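A minimal sketch of the ant cycle update in Eqs. (8.19)–(8.21) is shown below. The data layout is our assumption: pheromone is a dictionary keyed by edge, and each ant's route is given as a pair of its edge list and its total length \( L_{k} \). The values of `rho` and `q` are illustrative.

```python
def update_pheromone(pheromone, routes, rho=0.5, q=1.0):
    """Evaporate and deposit pheromone following Eqs. (8.19)-(8.21).

    routes is a list of (edges, length) pairs, one per ant, where edges is
    the list of (i, j) edges the ant traversed and length is L_k.
    """
    updated = {edge: (1.0 - rho) * level for edge, level in pheromone.items()}
    for edges, length in routes:
        for edge in edges:
            # Eq. (8.21): each ant deposits Q / L_k on every edge it used.
            updated[edge] = updated.get(edge, 0.0) + q / length
    return updated

pheromone = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
routes = [([(0, 1), (1, 2)], 4.0),   # ant 1: longer route
          ([(0, 2)], 2.0)]           # ant 2: shorter route
print(update_pheromone(pheromone, routes, rho=0.5, q=2.0))
```

In this example the shorter route ends up with more pheromone on its edge than either edge of the longer route, which is the positive feedback described above.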

Standard ACO is perfectly suitable for shortest path planning with a sole source and destination. However, in a VLSI circuit board, there are multiple sinks and one source; what requires to be optimized is not the separate path from each sink to the source as in standard ACO, but the whole spanning tree created by all these points. In other words, no branch is completely independent, and only by merging branches in the maximum degree can we expect the shortest length of the tree. That’s the reason that a synergy function is introduced in our proposed SFB-ACO.

3.2 Procedure for Constructing Steiner Tree Using SFB-ACO

Here we incorporate a synergy matrix \( \gamma \) of size \( n \times n \), where \( \gamma (i,j) \) represents the synergy weight for branch i to join branch j. The procedure for constructing a Steiner tree is described in Fig. 8.7. For n sinks to be connected in VLSI, let the number of ants in one group be n and the number of ant groups be m. The initial positions of the n ants in a group are the locations of sink 1, sink 2, …, sink n. S is the candidate pool of Steiner points; together with the source and the sinks, it makes up the set of points that ants may visit. As in standard ACO, each ant chooses its next node according to the transfer probability and builds its own routing table. If an ant moves to the source, or to a node that has previously been visited by another ant in its group, it has finished its task and its travel ends. Once all ants in the group have finished their tasks, we check whether the resulting route is a Steiner tree and, if so, record its length. If instead any ant reaches a dead end, i.e., there is no allowed node left to choose, we mark the group as failed and cancel all movements of its ants. This completes one cycle. The pheromone concentration is updated once per cycle, and only a group that successfully finishes its task can release pheromone on its path. As the pheromone accumulates over several cycles, the optimal Steiner tree can be found.
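The movement of one ant group described above can be sketched in Python as follows. This is only a simplified illustration under several assumptions: the graph is an adjacency dictionary, the next node is picked by a pluggable `choose` function (uniform random by default, rather than the transfer probability), and the Steiner tree check is left out. The names `run_group`, `graph`, and `choose` are hypothetical.

```python
import random


def run_group(graph, source, sinks, choose=random.choice, max_steps=100):
    """Move one ant per sink; return the routes, or None if the group fails."""
    routes = [[s] for s in sinks]       # each ant starts at its own sink
    active = set(range(len(sinks)))     # ants that are still travelling
    for _ in range(max_steps):
        if not active:
            return routes
        for k in sorted(active):
            here = routes[k][-1]
            allowed = [v for v in graph[here] if v not in routes[k]]
            if not allowed:
                return None             # dead end: the whole group is cancelled
            nxt = choose(allowed)
            routes[k].append(nxt)
            others = {v for r, route in enumerate(routes) if r != k for v in route}
            if nxt == source or nxt in others:
                active.discard(k)       # reached the source or merged into a branch
    return routes if not active else None


# A line-shaped toy graph: s1 - a - src - b - s2
graph = {"s1": ["a"], "a": ["src", "s1"], "src": ["a", "b"],
         "b": ["src", "s2"], "s2": ["b"]}
routes = run_group(graph, "src", ["s1", "s2"], choose=lambda allowed: allowed[0])
# routes == [["s1", "a", "src"], ["s2", "b", "src"]]
```

A successful return value still has to be checked for being a valid tree before its length is recorded, since two ants could merge into each other without either reaching the source.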

Fig. 8.7 Framework of SFB-ACO

The pseudo code for function FindRoute in Fig. 8.7 is as follows.

In line 9, the transferring rate is calculated as follows.

$$ P_{ij}^{k} = \left\{ {\begin{array}{*{20}l} {\frac{{\left( {\zeta_{ij} } \right)^{\alpha } \cdot \left( {\eta_{ij} } \right)^{\beta } \cdot fun_{{\gamma_{ij} }}^{k} }}{{\sum\limits_{{s \in allow_{k} }} {\left( {\zeta_{is} } \right)^{\alpha } \cdot \left( {\eta_{is} } \right)^{\beta } \cdot fun_{{\gamma_{is} }}^{k} } }},\quad j \in allow_{k} } \\ {0, \qquad \qquad\qquad \, j \notin allow_{k} } \\ \end{array} } \right. $$
(8.22)

where

$$ \eta_{ij} = 1/D\left( {T_{0} ,T_{j} } \right) $$
(8.23)
$$ fun_{{\gamma_{ij} }}^{k} = \prod\limits_{r = 1,r \ne k}^{n} {\left[ {\frac{{\sum\limits_{l = 1}^{{N_{node}^{r} }} {\frac{1}{{D\left( {T_{j} ,T_{{tabu^{r} (l)}} } \right)}}} }}{{N_{node}^{r} }}} \right]^{{\gamma_{kr} }} } $$
(8.24)

In Eq. (8.22), \( \eta_{ij} \) is the reciprocal of the Manhattan distance D from the source \( T_{0} \) to node j, as expressed in Eq. (8.23). \( fun_{{\gamma_{ij} }}^{k} \) is the synergy function for ant k to transfer from node i to node j, whose value is obtained by Eq. (8.24), where \( T_{{tabu^{r} (l)}} \) is the l-th node in the route table of ant r, \( N_{node}^{r} \) is the number of nodes that have been visited by ant r, and \( \gamma_{kr} \) is the importance of synergy for branch k to join branch r.
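A minimal Python sketch of Eqs. (8.23) and (8.24), together with the numerator of Eq. (8.22), is given below. Nodes are assumed to be integer grid coordinates, and the clamp `min_dist` that avoids division by zero when a candidate node already lies on another ant's route is an assumption of this sketch, not part of the chapter. All function names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance D between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def synergy(k, node_j, tabu, gamma, min_dist=0.5):
    """Synergy of ant k moving to node_j, following the shape of Eq. (8.24).

    tabu[r] is the list of nodes visited so far by ant r. Distances are
    clamped to `min_dist` to avoid dividing by zero (an assumption).
    """
    value = 1.0
    for r, visited in enumerate(tabu):
        if r == k or not visited:
            continue
        mean_inv = sum(1.0 / max(manhattan(node_j, t), min_dist)
                       for t in visited) / len(visited)
        value *= mean_inv ** gamma[k][r]
    return value


def transfer_weight(k, node_j, source, tabu, gamma, zeta_ij, alpha=1.0, beta=2.0):
    """Unnormalized numerator of Eq. (8.22) for one candidate node."""
    eta = 1.0 / max(manhattan(source, node_j), 0.5)   # Eq. (8.23), clamped
    return (zeta_ij ** alpha) * (eta ** beta) * synergy(k, node_j, tabu, gamma)


tabu = [[(0, 0)], [(4, 0), (4, 1)]]   # routes of ant 0 and ant 1
gamma = [[0, 1], [1, 0]]
far = synergy(0, (2, 0), tabu, gamma)    # mean of 1/2 and 1/3, i.e. 5/12
near = synergy(0, (3, 0), tabu, gamma)   # mean of 1/1 and 1/2, i.e. 0.75
```

The candidate that lies closer to the other ant's route receives the larger synergy value, which is what pulls branches toward each other. Setting an entry of `gamma` to zero switches the pull toward that branch off.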

The pseudo code for function Checking in Fig. 8.7 is as follows.

Another novel practice, concerning the handling of constraints, is presented in the next subsection.

3.3 Constraint-Oriented Feedback in SFB-ACO

According to the above analysis, the Elmore delay at each sink is closely related to the number of Steiner points, their positions along the main path, and the connecting topology of the other branches. Put simply, the more a branch merges with others, the more it may shorten the spanning tree, but at the price of added delay at the relevant sinks. Therefore, the values in the synergy matrix have a direct impact on the delay of the corresponding sinks. Earlier methods deal with solutions that violate the constraints by either abandoning or penalizing them; in contrast, this chapter applies a constraint-oriented feedback to the elements of \( \gamma \) in order to prevent over-constraint.

First, the constraint breach is defined as follows.

$$ Breach\left( {T_{i} } \right) = \tau \left( {T_{i} } \right) - T_{\textit{limit}} (i) $$
(8.25)

Here \( \tau \left( {T_{i} } \right) \) is the delay at sink \( T_{i} \) and \( T_{\textit{limit}} (i) \) is its timing constraint. Any positive value in the vector Breach indicates a violation of the constraint. A negative value, on the other hand, may indicate over-constraint, meaning that there is still room to further reduce the tree length. The ideal value of every element of Breach is therefore zero. This is difficult to achieve in practice, however, because the candidate pool of Steiner points is finite. For that reason, each value in Breach should be non-positive and as close to zero as possible.

The mechanism for regulating \( \gamma \) based on Breach is shown in Fig. 8.7 with red marks, in which the route-finding procedure for the n ants is the same as in Sect. 8.3.2, and the sink delay is calculated with the Elmore model given in Sect. 8.2.

In addition, the pseudo code for pheromone updating function UpdatePh is as follows.

The coefficient decay_zeta in the above pseudo code is calculated by Eq. (8.26).

$$ decay\_zeta = e^{{ - \lambda \times \left( {\sum\limits_{{l = 1, \ldots ,n;\;Breach_{i} (l) > 0}} {\left| {Breach_{i} (l)} \right|} } \right)}} $$
(8.26)

where \( \lambda \) is a constant to be determined, and decay_zeta represents the degree to which the accumulated pheromone decays because of constraint violations.
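Equations (8.25) and (8.26) can be checked numerically with the short Python sketch below. The delay and limit values and the choice \( \lambda = 1 \) are made up for illustration, and the function names are hypothetical. How the coefficient enters the pheromone update is not shown here.

```python
import math


def breach(delays, limits):
    """Eq. (8.25): per-sink breach, delay minus timing limit."""
    return [d - t for d, t in zip(delays, limits)]


def decay_zeta(breach_vec, lam=1.0):
    """Eq. (8.26): exponential decay driven only by the violated sinks."""
    violation = sum(abs(b) for b in breach_vec if b > 0)
    return math.exp(-lam * violation)


delays = [1.0, 2.0, 0.5]   # illustrative sink delays
limits = [1.5, 1.5, 0.5]   # illustrative timing limits
b = breach(delays, limits)           # [-0.5, 0.5, 0.0]
coeff = decay_zeta(b)                # exp(-0.5), roughly 0.61
no_violation = decay_zeta([-0.5, 0.0])   # 1.0: nothing to decay
```

Only the second sink violates its limit, so it alone contributes to the exponent; slack on the first sink does not offset it. A solution with no violation yields a coefficient of exactly 1.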

The pseudo code for synergy regulation function RegulateSy is as follows.

The negative feedback introduced above can effectively direct and regulate the synergy among branches, thus controlling the tradeoff between length and delay. On the other hand, through the positive feedback from pheromone, paths that have a desirable objective value and satisfy the constraints are repeatedly strengthened. In the end, under this double feedback, an optimal solution whose Breach values are all non-positive and as close to zero as possible can be found.

4 Implementation and Results

Two groups of experiments were designed and carried out to evaluate the performance of the proposed SFB-ACO algorithm. In the first, two candidate pools of different sizes for Steiner points are selected for the same chip, which consists of a certain number of sinks and obstacles; renewed Prim [48], standard ACO, and SFB-ACO are then applied to optimize the routing with each pool, the results are assessed, and the roles of the synergy function and the pool size are discussed. In the second experiment, stringent timing constraints are set according to the Elmore delays measured in the first experiment, the constraint-oriented feedback is introduced to prevent over-constraint, and its effectiveness is validated through comparison with AAS.

4.1 Parameters Selection

In order to apply the relevant algorithms to VLSI routing, several parameters have to be determined. The inputs related to the chip to be optimized, including the source, sinks, and obstacles and their positions, can be read from Fig. 8.8, where these objects are indicated by a red star, red circles, and cyan rectangles, respectively. Figure 8.8 also shows the two selected candidate pools of different sizes for Steiner points, denoted as pool I and pool II from left to right.

Fig. 8.8 Instance of the chip to be optimized and the two candidate pools for Steiner points

Parameters used in our proposed SFB-ACO are shown in Table 8.1, and those related to the calculation of Elmore delay are summarized in Table 8.2.

Table 8.1 Parameters related to algorithms
Table 8.2 Parameters related to Elmore delay

4.2 Improvement of Synergy

In the first experiment, the timing constraints for each sink are set quite loose, so that the problem reduces to an MRSTRO problem without constraints. Using the two candidate pools of different sizes for Steiner points, Table 8.3 records the respective results of renewed Prim, standard ACO, and our proposed SFB-ACO, and their best routing diagrams are given in Figs. 8.9, 8.10, 8.11, 8.12 and 8.13. Pool I is a simplified subset of Pool II, from which the points that are not easily accessible, namely those behind obstacles, are removed to achieve a lower space complexity. The data show little difference in solution quality and convergence rate between the two candidate pools. This suggests that the algorithm's space complexity can be reduced without sacrificing its precision or efficiency. However, this holds only when the timing constraints are left aside. If these constraints are stringent, the points behind obstacles may be needed as additional choices for reaching a spanning-tree topology that meets the constraints. This is why we adopt Pool II in our second experiment.

Table 8.3 Comparisons of different algorithms under different pools
Fig. 8.9 Routing diagram given by renewed Prim

Fig. 8.10 Routing diagram given by standard ACO using pool I

Fig. 8.11 Routing diagram given by standard ACO using pool II

Fig. 8.12 Routing diagram given by SFB-ACO using pool I

Fig. 8.13 Routing diagram given by SFB-ACO using pool II

In Table 8.3, renewed Prim is a greedy algorithm similar to the Prim algorithm but with several adjustments, mainly to account for the Manhattan architecture and obstacles. Its primary mechanism is as follows. Starting with a partial tree containing only the source, at each step we select the sink with the shortest attainable Manhattan distance to the existing tree. Once the sink is selected, the Steiner point is determined, and an edge between them is established. The iteration continues until all sinks have been added to the tree. Renewed Prim is thus an essentially deterministic algorithm with high efficiency, which explains why the average length and the number of iterations are not recorded for it in the table. However, while building the tree it is guided only by the information of the nodes already added, without any consideration of the effect on subsequent sinks. Its minimum length, 42, though good, is still not optimal compared with 41 for SFB-ACO. Besides, because it is deterministic, it can only produce a solution with one fixed set of sink delays, which makes it unsuitable under stringent timing constraints.
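The greedy mechanism described above can be sketched in Python as follows. This is not the renewed Prim of [48]: obstacles are ignored, points are assumed to lie on an integer grid, and each new sink is attached by a horizontal-then-vertical L-shaped path. The names `greedy_tree` and `l_path` are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def l_path(a, b):
    """Grid points on an L-shaped path from a to b (horizontal leg first)."""
    (x0, y0), (x1, y1) = a, b
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    points = [(x, y0) for x in range(x0, x1 + sx, sx)]
    points += [(x1, y) for y in range(y0 + sy, y1 + sy, sy)]
    return points


def greedy_tree(source, sinks):
    """Attach, at each step, the sink closest to any grid point of the tree."""
    tree = {source}
    remaining = list(sinks)
    total, edges = 0, []
    while remaining:
        dist, sink, attach = min(
            (manhattan(s, t), s, t) for s in remaining for t in tree
        )
        remaining.remove(sink)
        tree.update(l_path(attach, sink))   # every point on the wire joins the tree
        edges.append((attach, sink))        # `attach` acts as the Steiner point
        total += dist
    return total, edges


total, edges = greedy_tree((0, 0), [(4, 0), (4, 2), (2, 3)])
# total == 9; the last sink attaches at the interior point (2, 0)
```

In this toy instance the third sink joins the first wire at an interior grid point, which plays the role of a Steiner point. The choice depends only on the tree built so far, which is the limitation noted in the text.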

The last two lines in Table 8.3 clearly show the advantage of the synergy function introduced in SFB-ACO. Under either pool, SFB-ACO yields a higher solution quality and better efficiency than standard ACO (minimum lengths of 41 versus 43 and 41 versus 42, and iteration counts of 12 versus 30 and 18 versus 34, respectively). This is because, under the synergy function, branches are no longer independent: they try to join the tree instead of reaching the source. Once a branch merges into another, it stops travelling, and the total length is reduced. Since the length obtained in the early iterations is already near-optimal, the algorithm converges faster. Comparing the average lengths obtained by the two algorithms (42 and 48), we also learn that SFB-ACO converges better than standard ACO. As the iterations proceed, not all solutions in standard ACO converge to the best one, because the pheromone released by the best solution has no noticeable effect on guiding the formation of subsequent solutions. This, in turn, implies that accounting for pheromone and heuristic information alone is not enough. Another force, such as our proposed synergy function, is needed.

Comparing the data in Table 8.3 with the routing schemes in Figs. 8.9, 8.10, 8.11, 8.12 and 8.13, we also see that two schemes with the same length do not necessarily have the same spanning-tree topology, let alone the same delay at each sink. The Elmore delays recorded in the last columns also serve as references for setting the constraints later.

4.3 Effectiveness of Constraint-Oriented Feedback

This part uses Pool II for the candidate Steiner points, and a more stringent timing constraint is set based on the Elmore delays measured before. The effectiveness of the proposed practice in preventing over-constraint is then tested through controlled experiments comparing conventional AAS with our constraint-oriented feedback.

Table 8.3 lists the sink delays of the shortest routing solutions under the different algorithms. Because of the contradiction between sink delay and tree length, as well as the contrasting relationship between the delays at different sinks, the delays at some sinks can be further reduced by increasing the total length or changing the topology of the final tree. This analysis leads us to set the timing constraint to \( T_{limit} = [1.5,1.5,1.4,0.4,1.5,0.5] \).

Figures 8.14 and 8.15 depict the change of tree length during the iterations, where BestF is the length of the best solution regardless of whether it violates the constraints, BestC is the length of the best solution that meets the constraints, and AveC is the average length of the solutions that meet the constraints. Under AAS, only solutions that satisfy the timing constraints are retained and allowed to release pheromone on their paths; this procedure often takes longer for the AveC and BestC curves to meet, and the resulting length is not very good. Constraint-oriented feedback, in contrast, can take advantage of solutions that have a better objective value but slightly violate the constraints. By regulating them little by little, a process that also takes quite long, it obtains a better solution: 44 compared to 46 under AAS. Finally, the three curves in Fig. 8.15 merge, implying that most of the solutions retained in the last iteration satisfy the constraints, so that the feedback regulation itself has converged.

Figure 8.16, which depicts the change of the Elmore delay at each sink during the iterations, supports the same point; here cyan, blue, green, and red represent the timing constraint, BestF, BestC, and AveC, respectively. In the early search, the blue curves in most panels lie above the cyan ones, indicating that the best solution regardless of constraints violates the constraints on some of its sinks. Meanwhile, some of the green curves fall far below the cyan ones, leaving considerable room for improving the quality of solutions. As the iterations proceed, the blue curves decline, as do the red ones, while the green curves adjust against one another so that the overall breach becomes smaller than before. Some sinks, such as sink 4, have to enlarge their breaches to give others the chance to reduce theirs. These changes occur around iterations 5, 20, and 40, each corresponding to a step-down in the BestC curve in Fig. 8.15.

Fig. 8.14 Change of length under AAS

Fig. 8.15 Change of length under constraint-oriented feedback

Fig. 8.16 Change of delay at each sink under constraint-oriented feedback

Figures 8.17 and 8.18 show the final routing diagrams under AAS and under our constraint-oriented feedback. Table 8.4 records their respective shortest lengths and the corresponding Elmore delays and breaches at each sink. The solution with the shorter length does not necessarily have the smallest delay at every sink, but, roughly speaking, its breach is comparatively closer to zero. Also, the shortest length does not automatically mean that all elements of its Breach vector are near zero. Instead, an accommodation between sinks must be reached, which is why the breaches under constraint-oriented feedback are not always smaller than those under AAS. The contradictory relationship discussed before is thus confirmed once again, and the effectiveness of our constraint-oriented feedback in preventing over-constraint is demonstrated.

Fig. 8.17 Resulting routing diagram under AAS

Fig. 8.18 Routing diagram under constraint-oriented feedback

Table 8.4 Comparisons between AAS and constraint-oriented feedback

5 Summary

Global routing in VLSI is a multi-terminal path planning problem and can be abstracted as an MRSTRO problem. With the arrival of the submicron era, the delay on interconnects can no longer be ignored, which makes the optimization model quite different from before. Previous algorithms for constructing Steiner trees are either inapplicable or far from satisfactory. This chapter presented a novel SFB-ACO algorithm, which can serve as a useful tool for connecting nets with multiple endpoints under constraints. The main contributions are summarized as follows.