
1 Introduction

Today the Internet has grown into a large-scale, complex network. To understand its structure, we can view it at four levels: the IP interface level, the router level, the POP (Point of Presence) level and the AS (Autonomous System) level, as shown in Fig. 78.1.

Fig. 78.1

Four-level Internet topology sketch map

The lowest level is the IP interface topology, which describes the logical connections between the IP interfaces of different routers. The second level is the router topology, which can be obtained from the IP interface topology by merging the IP interfaces that belong to the same router into one network node. Merging the routers at the same geographic location into one node yields the third level, the POP topology. The top level is the AS topology. An AS (Autonomous System) is a network of routers operated by a single administrative authority. With each AS node representing that AS's internal network and each AS link representing a business relationship between two ASes, the AS topology reflects the macrostructure of the whole Internet.
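Each step up the hierarchy merges lower-level nodes that share an owner (router, POP or AS) and keeps only the links that cross owners. The minimal Python sketch below illustrates this as graph contraction, assuming the node-to-owner mappings are already known; the function `contract`, the addresses and the router/AS names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

def contract(edges: Iterable[Edge], owner: Dict[str, str]) -> Set[Edge]:
    """Merge nodes with the same owner into one node.

    edges: undirected links at the lower level.
    owner: hypothetical mapping from each lower-level node to the
           higher-level node it belongs to (e.g. interface -> router).
    Links inside one owner are dropped; parallel links collapse into one.
    """
    result: Set[Edge] = set()
    for a, b in edges:
        u, v = owner[a], owner[b]
        if u != v:
            result.add((min(u, v), max(u, v)))  # canonical order for undirected links
    return result

# Hypothetical toy data: four interfaces on three routers in two ASes.
iface_to_router = {"10.0.0.1": "R1", "10.0.0.2": "R2", "10.0.1.1": "R2", "10.0.1.2": "R3"}
iface_links = [("10.0.0.1", "10.0.0.2"), ("10.0.1.1", "10.0.1.2")]
router_links = contract(iface_links, iface_to_router)   # router-level topology

router_to_as = {"R1": "AS1", "R2": "AS1", "R3": "AS2"}
as_links = contract(router_links, router_to_as)         # AS-level topology
```

The same function yields the POP level if it is given a router-to-POP mapping instead of a router-to-AS mapping.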

At present, BGP routing information, IRR data and traceroute probing data are the three main sources for building the AS topology. Mahadevan extracts the AS nodes common to the three sources and builds a separate AS topology from each of them [1]. Oliveira analyzes the evolution of AS nodes and links using each of the three sources separately [2]. To study the characteristics of the different relationships between ASes, Cohen builds an AS topology from BGP routing information and IRR data [3]. Because each data source is biased when used alone, we consider how to fuse the three sources to build a more comprehensive AS topology.

2 The Structure of AS Topology

According to the role an AS plays in the Internet, the AS topology can be divided into three layers: the network core layer, the network transit layer and the terminal customer layer. The core layer consists of a few top-tier ASes. These ASes are large and provide backbone transit between continents. Because connectivity is the most important concern among top-tier ASes, they usually establish peer-to-peer relationships to exchange traffic with each other. In the transit layer, ASes carry traffic within a country or region. Generally, a larger AS provides connectivity to smaller ASes, forming provider-to-customer relationships, so the transit layer of the AS topology is loosely connected. In the customer layer, the number of AS nodes far exceeds that of the other two layers [4], but the structure is simple: these ASes usually connect to one or several provider ASes, forming a tree-like structure. The overall structure of the AS topology is shown in Fig. 78.2.
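The paper describes the three layers qualitatively and gives no classification rule. As an illustration only, the sketch below assigns layers from provider-to-customer links using an assumed heuristic: an AS with customers but no provider is core, an AS with both is transit, and an AS without customers is a terminal customer. The function name and the AS numbers (taken from the 64496 to 64511 documentation range) are hypothetical.

```python
from typing import Dict, Set, Tuple

def classify_layers(p2c: Set[Tuple[int, int]], ases: Set[int]) -> Dict[int, str]:
    """Map each AS to 'core', 'transit' or 'customer' (assumed heuristic).

    p2c holds (provider, customer) pairs; peer-to-peer links are not needed.
    """
    has_provider = {c for _, c in p2c}
    has_customer = {p for p, _ in p2c}
    layers: Dict[int, str] = {}
    for asn in ases:
        if asn in has_customer and asn not in has_provider:
            layers[asn] = "core"      # top of the provider hierarchy
        elif asn in has_customer:
            layers[asn] = "transit"   # buys and sells transit
        else:
            layers[asn] = "customer"  # stub AS, possibly multihomed
    return layers

# Two top-tier ASes, one regional transit AS, one multihomed customer.
p2c = {(64496, 64498), (64498, 64499), (64497, 64499)}
layers = classify_layers(p2c, {64496, 64497, 64498, 64499})
```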

Fig. 78.2

AS topology structure sketch map

The AS topology changes constantly. For example, some ASes prefer to connect to large ASes to shorten their data transit paths, while others connect to several provider ASes (multihoming) to make their networks more robust. Furthermore, because of backup links, quite a few AS links carry traffic only when a network failure occurs. All these factors make the AS topology more complex.

3 AS Topology Building Method

3.1 Data Source Analysis

BGP routing information contains a series of AS paths to destination networks, from which the links between ASes can be extracted. At present, the best-known project is the RouteViews Project [5]. The University of Oregon has run the RouteViews Project since 1999: it deploys BGP routers that establish BGP sessions with BGP routers in other ASes and receives route updates over these sessions. Both real-time and historical routing data can be downloaded directly from the RouteViews Project website.
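Extracting AS links from AS paths amounts to taking every pair of adjacent ASes in each path. The minimal sketch below assumes the paths have already been parsed into integer lists (parsing the RouteViews dump format is not shown); the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

def links_from_paths(paths: Iterable[List[int]]) -> Set[Tuple[int, int]]:
    """Collect undirected AS links from adjacent AS pairs in each path."""
    links: Set[Tuple[int, int]] = set()
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # a repeated ASN is not a link
                links.add((min(a, b), max(a, b)))
    return links

paths = [[2497, 1668, 10796, 11060], [2497, 1668, 10796, 12262]]
links = links_from_paths(paths)
```

Storing each link as an ordered pair lets paths observed in opposite directions map to the same undirected link.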

The Internet Routing Registry (IRR) is a distributed, manually maintained database of routing information. There are currently 32 IRR organizations [6], the best known of which is the RIPE IRR. Because IRR data contain some information that is hard to obtain by other methods, they are also important for discovering AS relationships.

Using traceroute, we can obtain the IP-level path to a target address. Mapping each IP address on this path to its AS number yields the AS path indirectly, which can also be used to build the AS topology.
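Converting an IP path into an AS path replaces each hop with its AS number and merges consecutive hops inside the same AS. The sketch below assumes a precomputed IP-to-AS lookup table and simply skips hops it cannot map (such as non-responding "*" hops), which is a simplification; the function name, addresses and documentation-range AS numbers are hypothetical.

```python
from typing import Dict, List, Optional

def ip_path_to_as_path(ip_path: List[str], ip_to_as: Dict[str, int]) -> List[int]:
    """Map hops to ASes and merge consecutive hops in the same AS."""
    as_path: List[int] = []
    for hop in ip_path:
        asn: Optional[int] = ip_to_as.get(hop)
        if asn is None:
            continue  # unmapped or non-responding hop: skipped in this sketch
        if not as_path or as_path[-1] != asn:
            as_path.append(asn)
    return as_path

ip_to_as = {"192.0.2.1": 64496, "192.0.2.9": 64496,
            "198.51.100.3": 64497, "203.0.113.7": 64498}
hops = ["192.0.2.1", "192.0.2.9", "*", "198.51.100.3", "203.0.113.7"]
as_path = ip_path_to_as_path(hops, ip_to_as)
```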

3.2 Comparison of the Three Data Sources

Although each data source can be used to build the AS topology, the sources have different characteristics. First, BGP routing information comes from real BGP routers, so it is valid and accurate, but it lacks information about backup links between ASes. Because of routing policy restrictions, a large number of peer-to-peer links cannot be observed [4].

Second, IRR data come from manually maintained databases, so the main problem in using them is that their validity cannot be guaranteed. However, IRR data also have advantages that the other sources lack. They contain the latest routing information with low redundancy, and they record a large number of peer-to-peer links and backup links. In addition, IRR records carry the richest attributes of all the sources, such as AS name, management, geographic location and contact.

Third, traceroute data are obtained by probing a series of target IP addresses. This source is highly controllable, and IXP information can be discovered while converting IP paths to AS paths. However, if one IP address maps to several ASes at the same time, the mapping becomes ambiguous, and rigorous rules are needed to resolve this ambiguity.

Table 78.1 summarizes the characteristics of the three data sources.

Table 78.1 Characteristics of the three data sources

Both BGP routing information and traceroute data are measured from particular vantage points, so they have high validity but are biased toward the locations of those vantage points. Although traceroute data are costly to obtain, they improve the IXP discovery rate in the AS topology. IRR data are publicly published information whose quality varies between regions, but analyzing them helps discover a large number of peer-to-peer links and backup links, so they play an important role in building the AS topology.

3.3 AS Topology Building Algorithm

Given the strengths of BGP routing information in node completeness and accuracy, our algorithm uses it as the base data. First, a rough AS topology is built from BGP routing information. Then the new AS nodes and links extracted from IRR data are added, and additional information describing each AS is stored. Finally, IXP information is discovered from traceroute probing data: we collect an IXP list from public websites [7] and adjust the corresponding nodes and links in the current AS topology. The overall algorithm is outlined below.

There are three main problems in analyzing BGP routing information.

  • Aggregation problem. Aggregation is a method BGP routers use to reduce the number of routes. When several routes to destination networks share a common sub-path, they may be aggregated into one route. For example, the aggregated route (2497 1668 10796 {11060, 12262}) contains two independent routes: (2497 1668 10796 11060) and (2497 1668 10796 12262).

  • Fake AS number problem. According to local policy, BGP routers usually prefer customer routes, then peer routes, then provider routes. Among routes of the same type, the shortest AS path is chosen. Repeated ("fake") AS numbers, commonly called AS-path prepending, can be added to change the route selection result. For example, the AS paths (4513 701 6496) and (4513 8701 11853 6496) are two routes of the same type to a certain destination, and under this policy the first one is chosen. By prepending AS 6496 twice, the first route becomes (4513 701 6496 6496 6496), which is longer than the second, so the second route is chosen instead. Our analysis algorithm therefore removes such repeated AS numbers from each AS path.

  • Private AS number problem. Just as there are private IP addresses, a range of AS numbers is reserved for private use, from 64512 to 65535. Private AS numbers are used to partition one AS internally into several areas. Therefore, if a private AS number appears in BGP routing information, it results from a configuration mistake, and for simplicity we discard such routes.
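The three fixes above can be combined into one cleaning step per raw AS path. The minimal Python sketch below assumes paths arrive as text such as "2497 1668 10796 {11060, 12262}" with any AS set appearing only at the end of the path; the function names are hypothetical, and parsing real RouteViews dumps is not shown.

```python
import re
from typing import List

PRIVATE_MIN, PRIVATE_MAX = 64512, 65535  # private 16-bit range as stated in the text

def expand_as_set(raw: str) -> List[List[int]]:
    """Split a path ending in an AS set into one path per set member."""
    m = re.fullmatch(r"\s*([\d\s]*?)\s*\{([\d,\s]+)\}\s*", raw)
    if m is None:  # no AS set: a single ordinary path
        return [[int(x) for x in raw.split()]]
    prefix = [int(x) for x in m.group(1).split()]
    members = [int(x) for x in m.group(2).replace(",", " ").split()]
    return [prefix + [asn] for asn in members]

def remove_prepending(path: List[int]) -> List[int]:
    """Collapse consecutive repeats of the same ASN ("fake" AS numbers)."""
    out: List[int] = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out

def has_private_asn(path: List[int]) -> bool:
    return any(PRIVATE_MIN <= asn <= PRIVATE_MAX for asn in path)

def clean_bgp_path(raw: str) -> List[List[int]]:
    """Expand AS sets, strip prepending, and drop routes with private ASNs."""
    cleaned: List[List[int]] = []
    for path in expand_as_set(raw):
        path = remove_prepending(path)
        if not has_private_asn(path):
            cleaned.append(path)
    return cleaned

print(clean_bgp_path("4513 701 6496 6496 6496"))  # [[4513, 701, 6496]]
```

Removing prepending before extracting links matters because a repeated ASN would otherwise be read as an AS linked to itself.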

After the BGP routing information has been analyzed, the IRR data are fused into the current result.

  • Select only the aut-num objects in the IRR data that were updated within the last three years. If there are several records for the same aut-num object, keep the most recent one.

  • For each AS node, first check whether it exists in the preceding BGP analysis result. If it does not, the node cannot be added to the AS topology directly, because some ASes register their routing policies in the IRR but never implement them in the real network. In that case, we check the related aut-num objects for a symmetric routing policy. For example, if the aut-num object of AS207 has an export policy to AS701, we check whether the aut-num object of AS701 has an import policy from AS207. Only when both policies exist is the aut-num object considered valid.

  • AS links between two aut-num objects are validated with the same rule.

  • Extract auxiliary information. From each aut-num object we extract auxiliary AS attributes, such as AS name, management, geographic location and contact.
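The IRR rules above (recency filter, newest record per AS, symmetric export/import check) can be sketched as follows. The `AutNum` record is a hypothetical, simplified stand-in for a parsed RPSL aut-num object; real RPSL policy parsing is not shown, and three years is approximated as 3 × 365 days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple

@dataclass
class AutNum:
    """Simplified aut-num record (hypothetical, not a real RPSL parser)."""
    asn: int
    last_modified: date
    imports: Set[int] = field(default_factory=set)  # ASes it accepts routes from
    exports: Set[int] = field(default_factory=set)  # ASes it announces routes to

def latest_recent(records: Iterable[AutNum], today: date, years: int = 3) -> Dict[int, AutNum]:
    """Keep records updated within `years` years, newest one per ASN."""
    cutoff = today - timedelta(days=365 * years)  # approximate years as 365 days
    latest: Dict[int, AutNum] = {}
    for r in records:
        if r.last_modified < cutoff:
            continue
        if r.asn not in latest or r.last_modified > latest[r.asn].last_modified:
            latest[r.asn] = r
    return latest

def validated_links(objs: Dict[int, AutNum]) -> Set[Tuple[int, int]]:
    """A link a-b is valid if a exports to b and b imports from a."""
    links: Set[Tuple[int, int]] = set()
    for a in objs.values():
        for peer in a.exports:
            b = objs.get(peer)
            if b is not None and a.asn in b.imports:
                links.add((min(a.asn, b.asn), max(a.asn, b.asn)))
    return links

def new_valid_nodes(objs: Dict[int, AutNum], bgp_nodes: Set[int]) -> Set[int]:
    """ASes absent from BGP that appear in at least one validated link."""
    endpoints = {asn for link in validated_links(objs) for asn in link}
    return {asn for asn in objs if asn not in bgp_nodes and asn in endpoints}

records = [
    AutNum(207, date(2011, 5, 1), imports={701}, exports={701}),
    AutNum(701, date(2010, 6, 1), imports={207}, exports={207}),
    AutNum(701, date(2009, 12, 1)),  # older duplicate, superseded
    AutNum(701, date(2008, 1, 1)),   # outside the three-year window
    AutNum(3356, date(2011, 1, 1), exports={207}),  # AS207 does not import it
]
objs = latest_recent(records, today=date(2012, 3, 31))
```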

Finally, the traceroute data are fused into the current AS topology. We use longest prefix matching to map each IP address to a network prefix and then look up the AS to which that prefix belongs. When one IP address maps to several ASes, we treat these ASes as an AS-Group.
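Longest prefix matching with multi-origin prefixes can be sketched with Python's standard `ipaddress` module. The sketch assumes a prefix-to-origin-AS table has already been built (for example from the origin ASes of BGP routes); the prefixes, documentation-range AS numbers and function names are hypothetical. A linear scan keeps it short, whereas a production lookup would use a radix trie.

```python
import ipaddress
from typing import Dict, FrozenSet, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

def build_table(prefix_origins: Dict[str, FrozenSet[int]]) -> Dict[Network, FrozenSet[int]]:
    """Parse prefix strings; a prefix may have several origin ASes."""
    return {ipaddress.ip_network(p): asns for p, asns in prefix_origins.items()}

def lookup(ip: str, table: Dict[Network, FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """Return the origin ASes of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best: Optional[Network] = None
    for net in table:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return None if best is None else table[best]

table = build_table({
    "192.0.2.0/24": frozenset({64496}),
    "192.0.2.128/25": frozenset({64497, 64498}),  # multi-origin prefix
    "198.51.100.0/24": frozenset({64499}),
})
group = lookup("192.0.2.200", table)           # the /25 wins over the /24
is_as_group = group is not None and len(group) > 1
```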

For each AS-Group, our algorithm first checks whether it contains the AS number of an IXP. If it does, we check whether the previous AS and the next AS of this AS-Group are directly connected in the current AS topology; if they are, the AS-Group is replaced by the IXP AS number.

For each non-IXP AS-Group, we use the current result to check how the previous AS and the next AS of this AS-Group are connected. Based on the analysis in Sect. 78.3.2, let $ASList_{prev}$ denote the list of ASes adjacent to the previous AS in the BGP result, and $ASList_{next}$ the list of ASes adjacent to the next AS. Then

$$ASList_{common} = ASList_{prev} \cap ASList_{next}$$

is the list of ASes adjacent to both. Let $ASSet = ASList_{common} \cap \text{AS-Group}$. If $ASSet \neq \emptyset$, any AS in $ASSet$ can replace the AS-Group, so we choose one at random. If $ASSet = \emptyset$, we treat the AS-Group as erroneous information.
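The two AS-Group rules can be sketched as one resolution function over an adjacency map of the current topology. Two details are assumptions not fixed by the text: if several IXP AS numbers match, the smallest is taken, and a group containing an IXP AS whose neighbors are not directly connected falls through to the common-neighbor (ASSet) rule. The function name and documentation-range AS numbers are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Optional, Set

Adjacency = Dict[int, Set[int]]  # AS -> neighboring ASes in the current topology

def resolve_as_group(prev_as: int, next_as: int, group: FrozenSet[int],
                     adj: Adjacency, ixp_asns: Set[int],
                     rng: random.Random) -> Optional[int]:
    """Return the ASN that replaces an AS-Group, or None to discard it."""
    prev_nbrs = adj.get(prev_as, set())
    next_nbrs = adj.get(next_as, set())
    # IXP rule: group holds an IXP ASN and prev/next are directly linked.
    ixp_members = group & ixp_asns
    if ixp_members and next_as in prev_nbrs:
        return min(ixp_members)  # assumption: smallest ASN if several match
    # Common-neighbor rule: ASSet = (prev neighbors & next neighbors) & group.
    as_set = (prev_nbrs & next_nbrs) & group
    if as_set:
        return rng.choice(sorted(as_set))  # any member may replace the group
    return None  # empty ASSet: treat the AS-Group as wrong information

adj = {64496: {64497, 64500}, 64497: {64496, 64501},
       64500: {64496}, 64501: {64497}}
rng = random.Random(0)
```

Passing in a seeded `random.Random` keeps the random choice reproducible between runs.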

After these steps, a more comprehensive AS topology can be built.

4 Experiment

We collected traceroute data through our own probing in March 2012, and the BGP routing information and IRR data were collected during the same period. The BGP routing information comes from the RouteViews Project, and the IRR data were obtained from 32 IRR databases. The AS nodes and links extracted from the three data sources are shown in Table 78.2.

Table 78.2 Data analysis results

In Table 78.2, the nodes and links from BGP make up the largest share. The main reason is that our BGP data come from a BGP router connected to the backbone BGP routers of other ASes, so they include a large number of global routes. In addition, because BGP data serve as the base data, many of the nodes and links found when analyzing the IRR and traceroute data had already been discovered from BGP. The traceroute data come from probing 228,860 aggregated global network prefixes. Fusing the three types of data, the AS topology contains 30,775 AS nodes and 67,838 links in total.

Because the AS topology is large and complex, current research usually validates results by comparing their network characteristics with other published results. We mainly compute the node degree distribution of our AS topology and compare it with related results. Figure 78.3 shows the node degree CCDF (complementary cumulative distribution function) curves as the three types of data are fused, where $k$ denotes the node degree and $P_k$ the complementary cumulative distribution value. To distinguish the curves in Fig. 78.3, the inset at the lower left shows the CCDF curves of BGP and BGP ∪ IRR. The CCDF curve of pure BGP data clearly exhibits power-law characteristics. When the IRR results are fused, the middle of the curve rises slightly, because more peer-to-peer links are found in the IRR data, which rapidly increases the number of medium-degree nodes. Fusing the traceroute data has little influence on the CCDF curve, because BGP data and traceroute data have similar characteristics. Although the traceroute data reveal IXP information and change the AS topology slightly, this change does not noticeably affect the CCDF curve.
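The degree CCDF plotted in Fig. 78.3 can be computed from an undirected link list as follows. The text does not say whether $P_k$ counts nodes with degree at least $k$ or strictly greater than $k$; this sketch assumes "at least $k$". The function name and toy links are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def degree_ccdf(links: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Return {k: P_k}, where P_k is the fraction of nodes with degree >= k."""
    # Canonicalize undirected links; drop self-loops and duplicates.
    edges = {(min(a, b), max(a, b)) for a, b in links if a != b}
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    hist = Counter(degree.values())  # k -> number of nodes with degree exactly k
    ccdf: Dict[int, float] = {}
    remaining = n  # nodes whose degree is >= the current k
    for k in sorted(hist):
        ccdf[k] = remaining / n
        remaining -= hist[k]
    return ccdf

ccdf = degree_ccdf([(1, 2), (1, 3), (1, 4), (2, 3), (3, 2)])
```

Plotting these $(k, P_k)$ pairs on log-log axes gives curves of the kind shown in Fig. 78.3, where a power-law tail appears roughly as a straight line.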

Fig. 78.3

The node degree CCDF of our result

Figure 78.4 shows the CCDF curves of each data source separately. Compared with Fig. 78.3, the results are consistent. Note that Fig. 78.4 uses only the information common to all three data sources, whereas our work fuses the data, so its data scale is larger and it retains more additional information.

Fig. 78.4

The CCDF of each data source [1]

5 Conclusion

In summary, this paper analyzes the characteristics of BGP routing information, IRR data and traceroute probing data. To overcome the limitations of building the AS topology from a single data source, we propose a new AS-level topology building algorithm that starts from BGP routing information and fuses it with IRR data and traceroute data. In experiments, we ran the algorithm on real data and compared the results with related research, which verifies the validity and correctness of the algorithm.