
1 Introduction

The Internet is increasingly changing the way we carry out everyday tasks at work and at home, and the way we communicate with one another. Internally, the Internet is structured as a network of networks. From a bottom-up perspective, it is made up of networks of routers, each one under the control of a single technical administration. These networks are called autonomous systems (ASes). An AS can use an exterior gateway protocol to route packets to other ASes [35], forming one of the largest synthetic complex systems ever built. The Internet comprises a decentralized collection of more than 30,000 computer networks from all around the world. Two ASes are connected if and only if they establish a business relationship (customer-provider or peer-to-peer), making the Internet a “living” self-organized system.

The topology of the Internet has been studied at the inter-domain level by Faloutsos et al. [18]. The ASes exhibit a power-law degree distribution with an exponent of γ = 2.1, and the average path length is close to 3.2, which highlights its small-world character. Networks whose degree distribution follows a power law, at least asymptotically, are called scale-free networks. Table 1 summarizes the main properties of the network.

Table 1 Properties of the Internet ASes network for June 2009 [1]

The main function of the Internet is to forward information from an origin host, traversing switches, routers, and other network nodes, to reach its destination. To do so it uses the Border Gateway Protocol (BGP) [39], the routing protocol of the Internet, which uses a vector with end-to-end paths to guide the routing of information packets. This vector is called the Routing Table (RT), and it represents in fact a distributed global view of the network topology. To maintain this consistent map, routers need to exchange reachability information through the network. This routing scheme is currently the root of one of the most challenging problems in network architecture: ensuring the scalability of the Internet [30].

The Internet is growing rapidly, driven by nontrivial dynamics. Empirical studies describe this overall growth as the net balance between births and deaths involving a large fraction of the nodes in the system [33]. This growth is estimated to be exponential, and the number of entries in the RT is currently growing at a super-linear rate at the inter-domain level. Other dynamics such as failures to aggregate prefixes, address fragmentation, load balancing, and multi-homing are increasing the RT demands [11]. This behavior is compromising the scalability of BGP because of technological constraints. In addition, the RT updates needed to maintain shortest-path information involve a huge amount of data exchange and significant convergence times of up to tens of seconds, hindering the communication process.

The poor scaling properties of routing schemes have been studied in depth within the context of compact routing (see [25] for a review). Unfortunately, these studies have concluded that in the presence of topology dynamics, better scaling on Internet-like topologies is fundamentally impossible: while routing tables can be greatly reduced, the number of messages per topology change cannot grow slower than linearly. This limitation has raised the need to explore new lines of research. Given the scale-free topology of the Internet [33], the work of Boguñá et al. [7, 8] and our recent study [16, 17] rest on the presumption that complex network theory is the natural framework in which to analyze the problem and propose solutions. Here we review these works. This chapter is organized as follows. Section 2 briefly reviews some designs in the fields of compact routing and network architecture. In Sect. 3 we present two methodologies based on complex network theory to construct navigable maps of the Internet. Finally, in Sect. 4 we discuss the limitations of these new approaches and future directions to explore.

2 Schemes Based on Compact Routing and Network Architecture

To reduce the cost of communication networks, we should be concerned about the routing of messages through the network, which ideally should follow a shortest path, and the amount of information required, which should be minimal. Simple solutions can guarantee optimal shortest paths at the expense of keeping big routing tables in memory, but these solutions are too expensive for large systems. In the research field of compact routing, Peleg and Upfal [34] addressed the trade-off between the average stretch (the average ratio of every path length relative to the shortest path) and the space needed to store RTs for general networks.
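The definition of average stretch can be illustrated with a minimal sketch. The toy ring network, the `clockwise` routing results, and the function names below are hypothetical and chosen only for illustration; shortest paths are obtained by breadth-first search on an unweighted graph.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_stretch(adj, routed_lengths):
    """Average of (routed hops / shortest hops) over the given pairs.

    routed_lengths maps (source, target) -> hops used by some routing scheme.
    """
    ratios = []
    for (s, t), hops in routed_lengths.items():
        optimal = shortest_path_lengths(adj, s)[t]
        ratios.append(hops / optimal)
    return sum(ratios) / len(ratios)

# Toy ring of 6 nodes (illustrative only).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Hypothetical scheme that always forwards clockwise: reaching node 5 from
# node 0 takes 5 hops although the shortest path has length 1.
clockwise = {(0, 1): 1, (0, 3): 3, (0, 5): 5}

print(average_stretch(ring, clockwise))  # (1 + 1 + 5) / 3
```

A stretch of 1 means that every routed path is a shortest path; the compact routing results discussed in this section bound how far above 1 this ratio can grow when the routing tables are made smaller.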

Earlier, Kleinrock and Kamoun [24] presented an alternative routing strategy to reduce the RT length in large networks. Their method creates a nested hierarchical structure of clusters (groups of nearby nodes according to some measure), where any node only needs to maintain a small amount of information about distant nodes in other clusters, while it maintains complete information about its neighbors in its own cluster. With this approach, the authors achieved a substantial size reduction of the RT (from N to ln N), at the expense of increasing the average message length due to extra labels. However, the stretch analysis is satisfactory only for certain topologies, those in which the shortest path distance between nodes increases rapidly with the network size. Because of this, scale-free networks (which exhibit the small-world phenomenon) are not good scenarios for hierarchical routing and suffer severe increases in path length. Despite this problem, hierarchical routing is the basis of the deployed inter-domain addressing strategy, Classless Inter-Domain Routing (CIDR) [20]. CIDR was introduced in 1993 and allows addresses to be grouped (aggregated) into blocks using bitwise masks, reducing the number of entries in the RT.
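As a small illustration of prefix aggregation with bitwise masks, the following sketch uses Python's standard `ipaddress` module. The prefixes are documentation address ranges chosen for the example and do not come from a real routing table.

```python
import ipaddress

# Four announced prefixes; the first three are contiguous and can be merged.
blocks = [ipaddress.ip_network(p) for p in (
    "192.0.2.0/26",
    "192.0.2.64/26",
    "192.0.2.128/25",
    "198.51.100.0/24",
)]

# collapse_addresses merges adjacent or overlapping networks into the
# smallest equivalent set of prefixes.
aggregated = list(ipaddress.collapse_addresses(blocks))
print(aggregated)  # two entries instead of four

# Matching an address against an aggregated entry is a bitwise AND with the mask.
addr = ipaddress.ip_address("192.0.2.200")
net = aggregated[0]
matches = (int(addr) & int(net.netmask)) == int(net.network_address)
print(net, net.netmask, matches)
```

Aggregation of this kind only reduces the table when address blocks follow the topology hierarchy, which is why address fragmentation and multi-homing work against it.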

CIDR is an implicit scheme, i.e., the nodes are a priori labeled with structural information. The routing process exploits this information to choose the neighbor to which a message should be sent [37]. It is also a name-dependent or labeled routing scheme [6], which means that nodes are tagged with topology-dependent identifiers used to guide the packet forwarding. Among the newer name-dependent solutions, we highlight the schemes of Thorup and Zwick [40] and Brady and Cowen [9] because of their proven efficiency in scale-free networks.

Thorup and Zwick (TZ) presented a scheme with stretch 3 and RT sizes of \(O(n^{1/2}\log^{1/2} n)\) bits, both results holding for the worst-case graph. In the case of scale-free networks Krioukov et al. estimated the performance of this routing and found an average stretch of 1.1 and very small RT sizes [26]. The basic idea behind TZ is to use a small set of landmarks (nodes potentially involved in the process of routing) to guide the navigation. This reveals why the TZ scheme achieves the best possible performance: scale-free graphs are optimally structured to exploit high-degree nodes (which will turn out to be landmarks), and these nodes are very important for finding shortest paths in such networks [23]. In turn, Brady and Cowen (BC) designed a routing scheme for undirected and unweighted graphs based on the idea that trees cover scale-free graphs with minor deviations. In scale-free networks the BC scheme guarantees an average stretch of 1.1 and logarithmically scaling RT sizes of \(O(\log^{2} n)\).

As an alternative to name-dependent solutions, we find name-independent routing schemes [6]. In this variant, nodes may be labeled arbitrarily, which generally makes routing harder: first we need to know the location of the destination, i.e., we need dictionary tables to translate name-independent labels into locators in a name-dependent map. Abraham et al. [3] presented a nearly optimal name-independent routing scheme for undirected graphs. The scheme has stretch 3 and an upper bound on the RT size of \(O(\sqrt{n})\) per node, the same upper limits as in TZ. According to these results, the use of name-independent schemes provides no clear advantage over the use of name-dependent routing [25]. Other name-independent routing solutions have been designed specifically to address the Internet routing problems [38]. Particularly noteworthy are the Locator-Identifier Split (LIS) approaches like LISP [19], ENCAPS [22], and NIMROD [12]. All these LIS proposals separate the identifier and the locator of each node, allowing aggressive aggregation techniques. During the communication process, the locators are encapsulated in each packet in a special wrapper, and at the inter-domain level, packets are forwarded using only these locators. Krioukov et al. [25] identified the problems that may affect these solutions: they generally require a database to maintain updated locator information, and because of their hierarchical structure, aggressive aggregation is impossible on scale-free topologies.

It must be emphasized that all these proposals are effective for static networks. Under time-varying networks, the proposals assume that each change in the network structure generates a new graph and its routing solution must be recalculated. Krioukov et al. [25] reported the pessimistic scenario that the communication cost lower bound for scale-free graphs is O(n). Nevertheless, adaptive solutions have not been studied in depth yet.

3 Solutions Based on Complex Network Theory

Complex network theory gives us a new perspective, and new tools, to address the problem of scalability of the Internet routing protocol. The study of the special characteristics of the navigability of complex networks was initiated in 2000 by Kleinberg [23]. Kleinberg highlighted that in complex networks, without a global view of the network, a message can be routed efficiently between any pair of nodes.

Adamic et al. [4] studied the role that high-connectivity nodes (hubs) play in the communication process. Hubs are important actors in the routing process that facilitate search and information distribution, especially in large networks. The authors also introduced several local search strategies that exploit high-degree nodes and whose costs scale sublinearly with the network size.

Another recent work that studies the routing process in scale-free networks was introduced by Lattanzi et al. [29]. This study is focused on social networks and uses the model of affiliation networks that considers the existence of an interest space lying underneath. The search is greedily conducted in this space, and their results show that low-degree nodes not connected directly to hubs are hard to find and that large hubs are essential for an efficient routing process.

In this section, we present two heuristic approaches that use the scale-free characteristic of the Internet to propose alternatives to BGP. The first work was presented by Boguñá et al. [8] and builds a navigable map of the Internet. The second is a study of the current authors on the routability of the Internet using local information from its structure [16].

3.1 Hyperbolic Mapping of the Internet

Let us introduce the model proposed by Boguñá et al. with an archetype. There is a classical example in artificial intelligence that aims to find the shortest path between two cities in Romania, from Arad to Bucharest [36]. Let us assume that we know the straight-line distances from all cities of Romania to Bucharest. With this information, we can use an informed search strategy to efficiently find the shortest path to our destination. In the given example a successful strategy is to choose the neighboring city (connected by road) whose distance to the destination is shortest. Representing the problem as a network, an informed search method chooses the node with the lowest cost according to a heuristic function, e.g., the straight-line distance in the above case. As in this case, if we can build a coordinate system that reveals the network structure in sufficient detail, at each point in our path we can use this map to determine which direction to take to get closer to the destination. This amounts to a greedy algorithm that mimics the routing process.

According to this philosophy, Boguñá et al. built a map of the Internet on a hyperbolic geometric space [8]. It has been proved that hyperbolic geometry matches strong heterogeneity (in terms of the power-law degree distribution exponent) and clustering properties of complex network topologies [27]. To achieve this goal they used a combination of geometric features, a distance measure between nodes inversely proportional to their probability of being connected, and topological characteristics, in this case the degree of the nodes.

3.1.1 Mapping of the Network

In the first step the nodes are placed in a hyperbolic disk of radius R, uniformly distributed through the angular component \(\theta \in \left [0,2\pi \right ]\), and with radial coordinates r inversely proportional to the degree of each node. Nodes with higher degree will have smaller r values and will be closer to the center of the disk, and low-degree nodes will be more external. Secondly, Boguñá et al. computed the angular component θ to satisfy the requirement that nearby nodes in the hyperbolic space are connected, i.e., the probability \(p(x_{ij})\) that two nodes i and j are connected decreases with the distance \(x_{ij}\) between them. This distance can be calculated using the hyperbolic law of cosines

$$\displaystyle{ \cosh x_{ij} =\cosh r_{i}\cosh r_{j} -\sinh r_{i}\sinh r_{j}\cos \left (\theta _{i} -\theta _{j}\right )\,. }$$
(1)

The authors propose the following relationship between probabilities and distances:

$$\displaystyle{ p\left (x_{ij}\right ) ={ \left (1 + {e}^{\frac{x_{ij}-R} {2T} }\right )}^{-1}\,, }$$
(2)

where T is a parameter related to the clustering of the network.
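Equations (1) and (2) translate directly into code. The sketch below is only an illustration: the function names and the numerical values of R and T are assumptions made for the example, not values reported by Boguñá et al.

```python
import math

def hyperbolic_distance(r_i, theta_i, r_j, theta_j):
    """Distance x_ij from the hyperbolic law of cosines, Eq. (1)."""
    cosh_x = (math.cosh(r_i) * math.cosh(r_j)
              - math.sinh(r_i) * math.sinh(r_j) * math.cos(theta_i - theta_j))
    # Rounding can push the value slightly below 1, outside the domain of acosh.
    return math.acosh(max(1.0, cosh_x))

def connection_probability(x, R, T):
    """Fermi-Dirac-like connection probability, Eq. (2)."""
    return 1.0 / (1.0 + math.exp((x - R) / (2.0 * T)))

# Illustrative parameters (assumed).
R, T = 5.0, 0.5

# Two diametrically opposed nodes: the distance is the sum of the radii.
x_far = hyperbolic_distance(1.0, 0.0, 2.0, math.pi)
# Two peripheral nodes that are angularly close.
x_near = hyperbolic_distance(2.0, 0.0, 2.0, 0.05)

print(x_far, connection_probability(x_far, R, T))
print(x_near, connection_probability(x_near, R, T))
```

At x = R the probability is exactly 1/2, and smaller values of T make the transition around R sharper.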

The estimation of these coordinates \((\theta_i, r_i)\) is performed by maximization of the likelihood that the Internet topology has been produced by this model. It is given by

$$\displaystyle{ L =\prod _{i<j}p{\left (x_{ij}\right )}^{a_{ij} }{\left [1 - p\left (x_{ij}\right )\right ]}^{1-a_{ij} }\,, }$$
(3)

where \(a_{ij}\) are the elements of the adjacency matrix of the network.
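In practice one usually works with the logarithm of the likelihood, in which each existing link contributes \(\ln p(x_{ij})\) and each absent link contributes \(\ln [1 - p(x_{ij})]\). The following sketch evaluates it for a toy three-node network; the coordinates, R, T, and function names are assumptions for illustration, and the actual maximization procedure used by the authors is not reproduced here.

```python
import math
from itertools import combinations

def hyperbolic_distance(p, q):
    """Hyperbolic law of cosines for points given as (r, theta)."""
    (r_i, t_i), (r_j, t_j) = p, q
    cosh_x = (math.cosh(r_i) * math.cosh(r_j)
              - math.sinh(r_i) * math.sinh(r_j) * math.cos(t_i - t_j))
    return math.acosh(max(1.0, cosh_x))

def connection_probability(x, R, T):
    return 1.0 / (1.0 + math.exp((x - R) / (2.0 * T)))

def log_likelihood(coords, edges, R, T):
    """Sum over node pairs i<j of a_ij ln p + (1 - a_ij) ln(1 - p)."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in combinations(sorted(coords), 2):
        p = connection_probability(hyperbolic_distance(coords[i], coords[j]), R, T)
        connected = frozenset((i, j)) in edge_set
        total += math.log(p) if connected else math.log(1.0 - p)
    return total

edges = [("a", "b")]          # toy network: a single link
R, T = 4.0, 0.5               # assumed parameters

# Embedding that places the linked pair close together ...
good = {"a": (1.0, 0.0), "b": (1.0, 0.1), "c": (4.0, math.pi)}
# ... and one that places the linked pair far apart.
bad = {"a": (1.0, 0.0), "b": (4.0, math.pi), "c": (1.0, 0.1)}

print(log_likelihood(good, edges, R, T), log_likelihood(bad, edges, R, T))
```

The embedding that keeps connected nodes close and unconnected nodes distant obtains the higher log-likelihood, which is the quantity the coordinate inference tries to maximize.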

3.1.2 Navigation on the Hyperbolic Map

Once we have a coordinate pair for each AS, we need to define the routing process over this map. Boguñá et al. took advantage of the characteristics of the underlying hyperbolic space to perform a greedy routing process. Hyperbolic spaces expand exponentially, making the distance between two points approximately the sum of their radial coordinates, with less influence of their angular difference, as can be proved from Eq. (1).
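A brief sketch of this argument, under the assumption that both radial coordinates are large and that the angular separation \(\Delta \theta _{ij} \in (0,\pi ]\) between the two nodes is not too small: replacing the hyperbolic functions in Eq. (1) by their leading exponentials, \(\cosh r \approx \sinh r \approx {e}^{r}/2\), gives

$$\displaystyle{ \cosh x_{ij} \approx \frac{{e}^{r_{i}+r_{j}}}{4} \left (1 -\cos \Delta \theta _{ij}\right ) = \frac{{e}^{r_{i}+r_{j}}}{2} {\sin }^{2}\frac{\Delta \theta _{ij}}{2} \,, }$$

and, using \(\cosh x_{ij} \approx {e}^{x_{ij}}/2\) on the left-hand side,

$$\displaystyle{ x_{ij} \approx r_{i} + r_{j} + 2\ln \sin \frac{\Delta \theta _{ij}}{2} \,. }$$

The angular difference therefore enters only through a logarithmic correction to the sum of the radial coordinates.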

Krioukov et al. [27] established a congruence between the geodesic distances and the shortest paths. Like the trace drawn by the geodesics, shortest paths tend to originate in the outer part of the hyperbolic disk, then move toward the center of the embedded space, and finally return to the exterior of the disk to reach the destination. Using this similarity, Boguñá et al. designed a greedy packet forwarding that selects at each step of the path the neighbor closest to the destination, following the geodesic. Figure 1 shows an example of a network embedded in hyperbolic space and a route path.
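The greedy forwarding rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the four-node network, its coordinates, the `max_hops` limit, and the choice to stop when no neighbor is closer to the destination than the current node are all assumptions made for the example.

```python
import math

def hyperbolic_distance(p, q):
    """Hyperbolic law of cosines for points given as (r, theta)."""
    (r_i, t_i), (r_j, t_j) = p, q
    cosh_x = (math.cosh(r_i) * math.cosh(r_j)
              - math.sinh(r_i) * math.sinh(r_j) * math.cos(t_i - t_j))
    return math.acosh(max(1.0, cosh_x))

def greedy_route(adj, coords, source, target, max_hops=50):
    """Forward to the neighbor closest to the target; None if the packet gets stuck."""
    path, current = [source], source
    while current != target and len(path) <= max_hops:
        here = hyperbolic_distance(coords[current], coords[target])
        best = min(adj[current],
                   key=lambda k: hyperbolic_distance(coords[k], coords[target]))
        if hyperbolic_distance(coords[best], coords[target]) >= here:
            return None  # local minimum: no neighbor is closer to the target
        path.append(best)
        current = best
    return path if current == target else None

# Toy map: one hub near the center of the disk and three peripheral nodes.
coords = {"hub": (0.5, 0.0), "a": (3.0, 0.2), "b": (3.0, 3.0), "c": (3.0, 3.3)}
adj = {"a": ["hub"], "hub": ["a", "b"], "b": ["hub", "c"], "c": ["b"]}

print(greedy_route(adj, coords, "a", "c"))
```

In this toy case the path first moves inward to the hub and then outward toward the destination, mirroring the shape of the geodesics described above. Each node only needs its own coordinates, those of its neighbors, and those of the destination.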

Fig. 1 Hyperbolic mapping of a synthetic network. The dashed green lines correspond to two geodesic paths and the solid lines to the greedy routing paths (reprinted from [32])

3.1.3 Results

As a result of the high concordance between the network topology and the underlying space, the authors achieved near-optimal results. The average success ratio is 97%, a very high value considering that the nodes only have knowledge about their neighbors. The greedy paths are very close to the shortest paths, with an average stretch of 1.1. Furthermore, the local nature of the design minimizes the routing communication needs in the face of dynamic topology changes, favoring the network scalability.

More important than the behavior of the scheme in the static network is how it adapts to the dynamic topology. Boguñá et al. have shown that their projection successfully overcomes the failures of links and nodes, with very small losses, and that the addition of new nodes is possible without recalculating the entire map. They have also shown that the projection of the nodes in the metric space is very stable in time, eliminating the need to recalculate the coordinates.

3.2 Routing Using Modular Information

In 2000, Kleinberg published his work about navigation in small-world networks [23], pointing out that the correlation between the local structure of a network and long-range links provides critical cues for finding paths through the network. We associate these notions of local structure and long-range links with the modular structure of the network. The modular structure (or community structure) refers to the clustering of nodes in communities, groups of nodes that are more connected among themselves than with the rest of the network. Interestingly, most real-world networks, and the Internet in particular [21], present modular structure as a typical fingerprint of a self-organized and decentralized evolution. Thus, our initial hypothesis is that the community structure, which provides meaningful insights into the structure and function of complex networks, can be an important actor in the Internet routing properties. To exploit this, we propose to analyze the contribution of each node to the modules using the projection technique presented by Arenas et al. [5]. We will use this information to guide the forwarding of information packets through the network using a simple greedy routing algorithm that aims first to find the neighborhood of the target destination and then to find the destination node within it.

3.2.1 Detecting the Modular Structure

The first step in our study is to detect communities in the ASes network. Whatever strategy is applied to detect these communities (groups), all of them are blind to content and only aware of the topological structure.

A widespread method to quantify how good a given partition of a network into communities is was proposed in [31]. This measure is known as modularity, and it rests on the intuitive idea that random networks do not exhibit a clear community structure. Modularity is a quality function that reads

$$\displaystyle{ Q =\sum _{\alpha }\left (e_{\alpha \alpha } - a_{\alpha }^{2}\right )\,. }$$
(4)

where e is a matrix whose elements \(e_{\alpha \beta }\) represent the fraction of total links starting at a node in community α and ending at a node in community β, and \(e_{\alpha \alpha }\) is the fraction of links starting and ending in community α. The quantity \(a_{\alpha }\) is the sum of row α of e, i.e., the fraction of links connected to community α, and \(a_{\alpha }^{2}\) is the expected fraction of intra-community links in an equivalent random network.
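Equation (4) can be evaluated directly from an edge list and a partition. The sketch below assumes an undirected, unweighted graph; the toy network (two triangles joined by one link) and the function name are illustrative only.

```python
def modularity(edges, community):
    """Q = sum over communities of (e_aa - a_a^2), for an undirected graph."""
    m = len(edges)
    internal = {}    # number of links with both ends inside each community
    degree_sum = {}  # total degree of the nodes of each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    q = 0.0
    for c, d in degree_sum.items():
        e_cc = internal.get(c, 0) / m   # fraction of links inside community c
        a_c = d / (2 * m)               # fraction of link ends attached to c
        q += e_cc - a_c ** 2
    return q

# Two triangles joined by a single bridge link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single_group = {n: "A" for n in range(6)}

print(modularity(edges, by_triangle))   # 5/14, about 0.357
print(modularity(edges, single_group))  # 0.0
```

Placing every node in one community gives Q = 0, since the observed and expected fractions of internal links coincide, whereas splitting the network along the bridge gives a clearly positive value.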

Algorithms that optimize this function yield good community structure compared to an (equivalent) random model network. The problem is that the partition space of any graph is huge (the search for the optimal modularity value is an NP-hard problem [10]), and one needs a guide to explore this space and find local maximum values. For more information on the most successful heuristics, see the comparison of different methods performed by Arenas et al. in [13]. See also this toolset [2]. We have applied a technique to optimize modularity based on the extremal optimization process [14].

3.2.2 Projecting the Network Structure

The modular structure inherent to real complex networks provides useful information whose exploitation is competitive with global shortest path strategies. To use this information, we analyze the contribution of each node to maintaining the community structure. The object of this analysis is the contribution matrix C of N nodes to M communities,

$$\displaystyle{ C_{i\alpha } =\sum _{ j=1}^{N}W_{ ij}S_{j\alpha } }$$
(5)

where W is the graph matrix, whose elements \(W_{ij}\) are the weights of the connections from any node i to any node j, and S is the partition matrix, where \(S_{j\alpha } = 1\) if node j belongs to community α and \(S_{j\alpha } = 0\) otherwise.
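A minimal sketch of Eq. (5) follows, written with plain Python lists. The toy graph (two triangles joined by one link) and the names `membership` and `contribution_matrix` are assumptions for illustration; the partition matrix S is represented implicitly by a list that gives the community index of each node.

```python
def contribution_matrix(W, membership, n_communities):
    """C[i][alpha] = sum_j W[i][j] * S[j][alpha], with S[j][alpha] = 1 iff membership[j] == alpha."""
    n = len(W)
    C = [[0.0] * n_communities for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if W[i][j]:
                C[i][membership[j]] += W[i][j]
    return C

# Unweighted toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
W = [[0] * 6 for _ in range(6)]
for u, v in edges:
    W[u][v] = W[v][u] = 1

membership = [0, 0, 0, 1, 1, 1]  # community index of each node

C = contribution_matrix(W, membership, 2)
for node, row in enumerate(C):
    print(node, row)
```

Row i of C lists how many links node i sends to each community: internal nodes such as 0 have a single non-zero entry, while the boundary nodes 2 and 3 also contribute to the neighboring community.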

Unfortunately, the study of this matrix involves a computationally prohibitive handling of a huge amount of data, especially to be used as a basis for a feasible routing system. To reduce this problem, we propose to analyze the contribution of each node of a network to communities using the projection technique introduced by Arenas et al. [5]. This projection is based on a rank 2 truncated singular value decomposition (TSVD) and allows building a map \(\mathcal{U}_{2}\) where each node n has a coordinate pair or contribution projection vector \(\tilde{v}_{n}\). This two-dimensional plane reveals the structure of communities and their boundaries, and we will use it as the navigable coordinate system of the complex network.

For each coordinate pair we calculate the polar coordinates \((R_{n},\theta _{n})\), where \(R_{n}\) is the length of the contribution projection vector \(\tilde{v}_{n}\) and \(\theta _{n}\) is the angle between \(\tilde{v}_{n}\) and the horizontal axis. To interpret this outcome correctly, we also need to know the intramodular projection \(\tilde{e}_{\alpha }\) of each community, the distinguished direction line of the projection of its internal nodes (those that have links exclusively inside the community).

With these values, we can compute a new pair \((R_{n},\phi _{n})\), where

$$\displaystyle{ \phi _{n} = \vert \theta _{n} -\theta _{\tilde{e}_{\alpha }}\vert, }$$
(6)

and the new values

$$\displaystyle{ R_{\mbox{ int}_{n}} = R_{n}\cos \phi _{n}, }$$
(7)

and

$$\displaystyle{ R_{\mbox{ ext}_{n}} = R_{n}\sin \phi _{n}. }$$
(8)

Here, \(R_{\mathrm{int}}\) informs about the internal contribution of nodes to their corresponding communities, and \(R_{\mathrm{ext}}\) reflects the boundary structure of communities. Both values, \(R_{\mathrm{int}}\) and \(R_{\mathrm{ext}}\), are the basis of our routing framework.

3.2.3 Greedy Routing

Now we have a geometrical projection of the modular structure that presumably will be helpful to design a local routing strategy. How to make use of this map for navigation purposes is the aim of this section.

In an unstructured network an algorithm based on hubs’ transit (i.e., routing toward hubs in the hope that they will be directly connected to any target destination) results in decent routing and can even improve on classical routing techniques [4]. This basic idea is also used in our modular routing strategy; we look for nodes in our current community that could act as hub connectors within the community and eventually with other communities. To find an efficient path between two nodes in the network, we choose the local neighbor (node) that has a high value of \(R_{n}\). The nodes with high values of \(R_{n}\) are nodes with many connections that will presumably lead us to the destination quickly, reducing the path length. We must, however, distinguish two scenarios: when we are inside the destination community and when we are not.

The algorithm we propose works as follows (see Algorithm 1). Let us assume we want to go from node i to node j, and let \(N_{i}\) refer to the neighbors of node i. Each node \(k \in N_{i}\) is a candidate in the path. In the routing process, for each node \(k \in N_{i}\), we have to compute a cost function and select the candidate that minimizes it in each step. This process is repeated until the destination is reached, the current node i does not find a feasible successor, or a time constraint is violated. We do not allow loops.

This heuristic algorithm distinguishes two scenarios: when our neighbor k belongs to the same community \(\alpha _{j}\) as the destination node j and when it does not. In the first case, when \(k \in \alpha _{j}\), we are interested in finding nodes with an important weight in the community. In the other case, if \(k\notin \alpha _{j}\), we look for nodes near the boundaries of other communities. Figure 2 shows a typical path in the Internet projection.
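The control flow described above (evaluate a cost for each unvisited neighbor, move to the minimum, never revisit a node, stop after a step limit) can be sketched as follows. The cost function used here is a hypothetical stand-in and not the cost function of Algorithm 1: it simply prefers the destination itself, then neighbors inside the destination community with a large internal contribution, and otherwise neighbors with a large external contribution. The toy network and the `r_int` and `r_ext` values are invented for illustration and were not obtained from a TSVD projection.

```python
def modular_greedy_route(adj, community, r_int, r_ext, source, target, max_steps=20):
    """Greedy forwarding guided by community membership; returns a path or None."""

    def cost(k):
        # Hypothetical cost (assumption): tuples compare element by element,
        # so the first entry ranks the scenario and the second breaks ties.
        if k == target:
            return (0, 0.0)
        if community[k] == community[target]:
            return (1, -r_int[k])   # inside the destination community
        return (2, -r_ext[k])       # outside: look for community boundaries

    path, visited, current = [source], {source}, source
    for _ in range(max_steps):      # stands in for the time constraint
        if current == target:
            return path
        candidates = [k for k in adj[current] if k not in visited]  # no loops
        if not candidates:
            return None             # no feasible successor
        current = min(candidates, key=cost)
        visited.add(current)
        path.append(current)
    return path if current == target else None

# Toy network: triangles {0, 1, 2} and {3, 4, 5} joined by the link (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
r_int = {n: 2.0 for n in adj}                            # illustrative values
r_ext = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0}  # illustrative values

print(modular_greedy_route(adj, community, r_int, r_ext, 0, 5))
```

In this toy example the packet first moves to the boundary node of the source community, crosses into the destination community, and then reaches the destination, which is the qualitative behavior the two scenarios are meant to produce.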

Fig. 2 Example of routing in the projection of the Internet network. The color of the nodes represents their community; some communities have been omitted to simplify the figure. We see in the example a path that begins in a small degree node, which searches the network hubs and gets closer in θ, to enter the destination community and reach the destination node

3.2.4 Results

To evaluate our proposal we used a snapshot of the Internet for June 2009 [1]. In this network, we found 349 communities using [2], with sizes ranging between 2 and 546 nodes. We have simulated 106 paths randomly selected. Figure 3 presents the distribution of path lengths obtained with our algorithm. We achieved a success rate of 94% and a median path length of 5 steps (the shortest path median is 4). Given that we are using only local information, the percentage of success is very good. We attribute the unreachable destinations to nodes that are very far from any hub of the network. Nodes with longer paths are, in most cases, internal nodes connected only to other nodes of small degree. Our projection reflects the boundaries of the communities of the network but only provides information about the number of connections a node has in the internal topology of its community. As a result, our search strategy has difficulty finding the pathway to those poorly connected nodes with only intramodular contribution. Because of this, the stretch of our algorithm can become very unfavorable, as suggested by the long tail of the distribution shown in Fig. 3. Taking customer-provider roles into account can partially mitigate this problem by reducing the number of possible paths.

Fig. 3 Distribution of path lengths of the simulation of 106 paths using our proposal compared with the shortest path. The value λ = 0.7 has been determined experimentally

This study is also concerned with the scalability of the Internet routing protocol. In situations where the data is continuously changing, as in an evolving network, a TSVD projection might become obsolete. It is an interesting question whether the TSVD projection of an initial data set remains reliable. In our earlier work [15] we defined two measures to quantify the differences between a sequence of computed TSVD projections of growing Barabási-Albert scale-free networks. In that situation, we proved that the stability of a TSVD map is very high when considering neighborhood stability (note that \(R_{\mathrm{int}}\) and \(R_{\mathrm{ext}}\) are relative magnitudes). Thanks to this, under topology dynamics, addition and removal of nodes in the network can be done without recalculating the projection.

4 Discussion

The theory of complex networks is a powerful tool to address the scalability issues of the Internet routing protocol. Here, we have seen two recent works that are alternative proposals to the classical approaches of compact routing. Boguñá et al. [8] used statistical inference techniques to assign a pair of coordinates to each node in a hyperbolic metric space. In a Poincaré representation, the nodes with high degree go to the center of the space and their angular position is determined by their probability of being connected. Alternatively, in Erola et al. [16], we proposed to exploit the community structure of the network to obtain a linear projection in which nodes of the same community are organized around specific singular directions and hubs have large values of the radial coordinate. In the end, both methods use the same principle: a projection of the network (a map) is used to define a local routing to the destination based on the local coordinates on it. The main difference is that, while Boguñá et al. [8] look for the best projection in terms of the statistical distribution of connections, in Erola et al. [16] we look for the best projection in terms of the modular structure inherent in the network. Both approaches prove to be competitive and probably complementary.

Moreover, these proposals have proved to be resilient to changes in the network structure due to failures and growth. Therefore, there is no need to continuously recalculate the corresponding projections, while the routing algorithms still achieve high performance. Nevertheless, profound changes in the core of large hubs may accelerate the obsolescence of these projections. For instance, Labovitz et al. [28] have identified a significant evolution of provider interconnection strategies involving a rapid transition to a more densely interconnected and less hierarchical inter-domain topology. This particular time-varying behavior may affect the success of these new routing schemes and should be analyzed in the future.