• No results found

5.3 Path Service Architecture

6.1.2 Network Model

The input to our experiments is an Internet AS-level topology which is constructed

using CAIDA’s AS relationship dataset [1]. The CAIDA AS relationship dataset con-

sists of AS adjacencies along with annotated relationships such as customer-provider and peer-to-peer. We used the CAIDA dataset to build an initial topology by adding a single channel between the domains that have business relationships according to the

dataset. As we discuss in Sections 6.1.2.1and 6.1.2.2, we made modifications to the

initial topology to obtain an equivalent AS-level topology that met our requirement that domains be either access or transit, but not both.

The current Internet topology, having domains that act both as access and as transit at the same time (e.g., a tier-1 domain hosting content), does not map directly to our model where each domain is transit or access, but not both. We describe the

channels to the topology to reflect the existence of multiple channels between the domains in the Internet and to reflect the possible implication of settlement-free relationships. We describe the addition of extra channels to the CAIDA topology in Section6.1.2.2.

6.1.2.1 Separating Access and Transit Roles

We began by transforming the CAIDA AS topology into a domain-level topology by separating the roles of access and transit. The two steps of the transformation are: i) identify ASs in the CAIDA topology with mixed-roles, and ii) split the ASs with mixed-roles into an access and a transit domain, connected to each other by a channel. In order to identify domains with mixed-roles, we first classified each domain in the CAIDA topology as either i) primarily transit or ii) primarily access. A method

suggested by Dhamdhere et al [27] infers the primary business of each domain based

on the size of its customers, providers and peers. In particular, the method classifies each domain into one of the three categories: i) Transit Providers (TPs), ii) Con- tent/Access/Hosting providers (CAHP), and iii) Enterprise Customers (ECs). The EC domains correspond to various companies, universities and organizations. Apply- ing the above classification method to the CAIDA data resulted in 35,287 domains that are primarily EC, 4,646 primarily TP, and 1,755 primarily CAHP. Using these results, we formed the set of primarily access domains from the domains classified as either EC or CAHP and formed the set of primarily transit domains from the domains classified as TP.

We then used the NetFlow traces to identify domains with mixed-roles among the primarily transit domains. Our approach to identify mixed-role transit domains is based on determining whether the domain is the destination in a large percentage of the flows in the NetFlow traces. Based on the traces, 2,700 of the 4,646 primarily transit domains were the destinations in a significant number of flows (i.e., over 10% of all the collected flows). Therefore, we identified those 2,700 domains as mixed-role

transit domains. We also identified mixed-role primarily access domains using the CAIDA AS relationships data. In particular, a primarily access domain is identified as mixed-role if the domain is the transit provider of at least one other domain according to the CAIDA data. Finally, we split the mixed-role domains into a transit and an access domain, connected with a channel. After splitting the domain into two, we assign all the existing channels of the original domain to the transit component.

6.1.2.2 Adding Extra Channels

In the Internet today, there are typically multiple channels between domains, es- pecially between large ISPs. However, the CAIDA AS relationship topology dataset does not provide the number of channels between the ASs. In order to reflect the mul-

tiple channels in our network model, we added 60,000 additional channels2 between

the transit providers, placing them between pairs of adjacent transit domains. We

performed a biased random selection of channel endpoints by favoring high degree3

domains in the selection. In particular, we followed the three step procedure to add a new channel: i) pick one domain with a probability proportional to its degree, ii) pick a second domain from the existing neighbors of the first domain with a probability proportional to its degree, and iii) add a channel between the two domains.

After adding extra channels between the transit domains, we also added new channels from access domains to transit domains in order to reflect the possible impact

of settlement-free relationships between the domains (as described in Section3.1). In

particular, we added a total of 18,694 new edges between access and transit providers in order for access domains to have an average multi-homing degree of 4 (as opposed to 2.8 in the original topology). When adding the extra channels to access domains, we selected access providers uniformly at random in order to create a topology where

2

60,000 additional channels equate to around 10 extra channels per transit domain on average. However, we performed a biased random selection favoring high degree domains when choosing the channel endpoints. As a result, the largest transit domains obtained hundreds of additional channels.

3

We made the assumption that the number of channels of a domain (i.e., degree) is a good indication of its size.

all access providers were equally likely to be multi-homed. The resulting topology contains 37,986 access domains and 6,402 transit domains. In the topology, there are 171,074 bidirectional channels.

The final topology approximates the structure of the current Internet topology. Because our approach scales through parallelism, future Internet topologies with larger number of paths than the current Internet can be accommodated by adding additional instances of parallel components to our path service. Next, we describe how the paths are generated in the pre-computation stage.