GigaSunet Implementation Report


Swedish University Network KTHNOC


September, 2003

1. Summary

GigaSunet – the new generation of production network within SUNET.

The traffic load in the Swedish University Network, SUNET, has doubled every year since 1980. To meet this traffic growth, major redesigns of the network have been done at intervals of roughly four years. The need for high availability in later years has pushed the design towards a high level of redundancy. After SUNET-155, with its 622 Mbit/sec core connections, a network based on 2.5 Gbit/sec access lines and a 10 Gbit/sec core was needed. This was realized with four core rings, using Cisco OC192 POS technology over Telia DWDM channels. Access connections to the universities were done with 2.5 Gbit/sec SRP rings and Cisco 10720 access routers, nicknamed “yellow brick”. Routing was initially set up with route reflectors, but this was later changed to “full mesh” to allow hot potato routing for external peering purposes.

To make the network more resilient to disturbances, service hosts for DNS, authentication etc. have been placed in the POPs in Malmö, Göteborg and Umeå. These hosts serve both as backup for, and share load with, the main hosts for those services in Stockholm.

After nearly a year of operation, we can conclude that GigaSunet has fulfilled the goal of being a robust production network, with very high availability. So far we have had very few disturbances, in spite of fiber cuts, line card failures etc. The availability for the universities has been almost 100%, thanks to the robust configuration together with the redundant infrastructure and equipment.

2. Introduction

The dual purpose of this report is to describe the process of designing GigaSunet, and to document the network as such. The documentation describes physical topologies and routing configurations, as well as some of the design criteria of the physical and logical network.

The report has been produced as a joint effort between SUNET and KTH Network Operation Center, KTHNOC, which has been given the task to implement and operate the various generations of SUNET since 1988.

Börje Josefsson, SUNET, and Peter Graham, KTHNOC, have written and edited the report with contributions by specialists from KTHNOC (chapters 5-7).


3. Design and background

Background

SUNET is the organization for the national higher education and research network (NRN) of Sweden. Its network is used by researchers, teachers, students, and administrative personnel at 32 universities nationwide. In addition, some national government museums and external organizations are also connected to the network.

The previous generation of the SUNET network (SUNET-155) consisted of redundant 155 Mbit/sec connections to all the universities in Sweden, and was based on connections leased from the National Rail Administration until late 2002. That network was in full operation for nearly four years, and was – with minor operational disturbances – functioning very well.

However, we reached the point in time, roughly according to the plan that was drawn up when the network was designed, when the connections to certain universities (Luleå and Lund, with Chalmers soon following them) were reaching the limit of their capacity. Therefore, an expansion had to take place at these universities. At the same time [early 2001], it was time to look forward and decide what the next generation of the network was going to look like. It was proposed that the new network should be called GigaSunet.

Infrastructure

The fiber-optic infrastructure in Sweden is being developed on a wide front, to a varying degree and at a different speed in different parts of the country. After discussions with several infrastructure owners, we made the assessment that national fiber networks would cover all of the university cities, apart from Visby, with redundant paths during 2002. Several of the cities were covered as early as the end of 2001.

At first sight, one may find it natural that SUNET should lease “dark fiber” and itself connect equipment to light it up. However, SUNET is far too small, and its requirements far too modest, for this alternative to be profitable. For example, regenerators and/or amplifiers are needed roughly every 80 kilometers, and the organization for running, handling and troubleshooting them would alone be substantial. Here we are thinking about a network in the order of 6 000 km of fiber!

The other alternative available would be to lease wavelengths (“lambdas”) from some operator. The normal practice for operators today (simplified) is that, in a pair of fibers, they use lasers with different colors to light up the fiber. In this way they can run several parallel and completely separate connections in the same pair of fibers. For SUNET this would be an alternative that would provide much better value for money.

A third possible alternative for SUNET was to cooperate with some major ISP in making a joint investment and running a fiber network in Sweden. However, this alternative has not been studied to any greater extent, since the “lambda-leasing alternative” was so attractive. Also, not owning the infrastructure makes changes to the network more flexible.

SUNET carried out a preliminary study where we analyzed the suppliers’ possibilities of realizing our target network during 2002 at a reasonable cost. It was our opinion that there were several suppliers in Sweden who were capable of offering what we were asking for.


Bandwidth

Since the models (not just for SUNET) for forecasting the development of bandwidth requirements seem to have been accurate so far (the forecasts for SUNET-155 turned out to be very good), one can probably use them for GigaSunet too. This means that we had to construct a network that, 4 years from its start, was capable of providing the largest users/producers with 2 480 Mbit/sec (= 155 × 2^4, i.e. SUNET-155’s capacity of 155 Mbit/sec doubled every twelve months). This was a challenge in itself, but not at all impossible.
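The forecasting rule above is plain compound growth. As a minimal sketch (the function name and the use of Python are our own illustration, not part of the original planning material), the target figure can be reproduced like this:

```python
def forecast_mbit(base_mbit: float, years: int, growth_per_year: float = 2.0) -> float:
    """Capacity needed after `years`, assuming demand multiplies by
    `growth_per_year` every twelve months (2.0 = doubling, as in the text)."""
    return base_mbit * growth_per_year ** years


if __name__ == "__main__":
    # Start from the SUNET-155 access capacity and double once per year.
    for year in range(5):
        print(f"year {year}: {forecast_mbit(155, year):.0f} Mbit/sec")
```

After four doublings the result is the 2 480 Mbit/sec quoted above, which is what motivates the 2.5 Gbit/sec access links.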

[Figure: SUNET transmission capacity (bit/sec, logarithmic axis from 1000 to 10 G) plotted against year, 1980 to 2005. Capacity steps shown: 1200, 9600, 64 K, 2 M, 34 M, 155 M, 622 M, 10 G, followed by “???”. Annotation: “2 x traffic every year since 1980…”]

Figure 1. Transmission capacities in the SUNET backbone since 1980.

The design was therefore based on a core network with a capacity of 10 Gbit/sec between the university cities, with a delivery of 2.5 Gbit/sec to each University.

Topology

SUNET-155 was, to the highest degree possible, a production network that was used 24 hours a day, every day of the year. It was therefore of great importance that GigaSunet should maintain at least the same availability. The network was to be structurally simple, and to provide a high degree of redundancy. Consequently, we studied a topology in Sweden in the form of a “snowman”, with a number of (wavelength) rings which were to be joined together at several places and which allow the traffic to continue to flow even if individual connections were cut off, e.g. by excavation or hardware failures. Within each university city, the access was also to be redundant in the form of some kind of ring or triangular structure.


Figure 2: The “snowman” topology.

Another design criterion was that no backbone ring should be overbooked, at least not to any great extent. Therefore, no city should have more than 4 cities between itself and the main node in Stockholm when the network is fully operational. In case of an arbitrary fiber or node failure, that distance should never exceed 5. This is mainly based on the fact that a lot of the traffic in our network is external, and the main part of that traffic exchange is done in Stockholm.

Uniform technology

One of the factors that complicated the operation of SUNET-155, was the diversity of equipment types. The equipment installed centrally in Stockholm was not of the same type as that at the nodal points in Sundsvall, Göteborg and Malmö. Moreover, this equipment was different from the equipment that SUNET had installed in the universities. In addition, the local equipment at the universities was also different to a high degree. We proposed a concept that involved all the equipment in the core network being of the same type, with almost identical configurations.

Likewise, we proposed that the access routers in the universities should be identical everywhere. In this way, all the access points from GigaSunet would be technically of the same type.


In those cities that have several universities (e.g. Stockholm, Uppsala, Göteborg and Malmö/Lund), the universities in each city share the same ring, with the exception of the Stockholm region, where we proposed that the six universities share three local rings.

Figure 4: Layout in a city with several universities.

Operations

After some initial discussions, our proposal was that the network personnel of the universities should be given the option of operational responsibility for the SUNET access routers. The situation that we had in SUNET-155, where the access routers were centrally managed, had sometimes involved a university or college placing an additional “layer” of routers between SUNET’s access router and its own local network, in order to have a point for inserting filter lists, other connections, etc. By gaining access to the new access routers, the universities obtain better possibilities of adapting the SUNET delivery to their own equipment, as well as powerful equipment for running filters, etc. The central operating organization will only run the core network itself, which, as said above, consists of equipment of the same type, and will deliver to universities which logically appear to be the same (i.e., from the point of view of the core network, have equipment of the very same type). This gives us operational advantages, both centrally and locally. This setup was accompanied by courses and seminars that provided the local operating personnel with knowledge of how to run the new equipment. Those universities that, nevertheless, were not able or did not wish to operate the equipment placed there by SUNET have been able to obtain assistance from KTHNOC. SUNET is in all cases responsible for service agreements, etc. for all of the access routers.

Objective

The objective was to build one of the best academic production networks in the world. The next generation of university network in Sweden would then hopefully once again be a forerunner in Sweden and in Europe, as well as for the commercial suppliers of network services and applications, while having all the qualities necessary to provide Internet services for the country’s universities.

Call for tender

A call for tender for the infrastructure part of GigaSunet was sent out in March 2001. The tenders were evaluated during May, and on the basis of that evaluation the SUNET board decided to sign a contract with Telia (together with its subsidiaries) as supplier of dark fiber and lambdas for GigaSunet. Telia also houses the SUNET core routers in each city within its premises and provides “room service” (i.e., its personnel can do smaller tasks, such as servicing routers).

All hardware was bought from Cisco, via their Swedish partner Eterra, using the agreement between them and Statskontoret.

Design team

During the planning phase, several different groups of people were involved. After the contracts were signed, a team of representatives from SUNET, KTHNOC, Cisco Systems Sweden and Eterra (Cisco partner in Sweden) was formed to discuss details in routing configurations, deployment of equipment and other aspects concerning the details of the installation process.

Figure 5: Nationwide 10 Gbit/sec network.

Implementation

All POPs (points of presence) in the core network are built in a very similar manner. Each consists of two Cisco GSR 12410 routers, connected to the 10 Gbit/sec lambdas that link the POP with the ones in the neighboring cities. The routers also connect to a dark fiber ring within the city, which connects the core POP with the Cisco 10720 access routers at the university, using the 2.5 Gbit/sec DPT/SRP protocol. Exceptions to this standardized POP layout are Umeå, Göteborg and Malmö, where we have an extra router in the POP to be able to connect serial lines, Ethernet connections and service hosts.


Figure 6: GigaSunet POP

To be able to reach the core routers in case of network breach or hardware problems, an out-of-band-network, based on ISDN, has been formed. This consists of a small router at each site, connected to the console ports on the core routers. This network is only reachable from the operations centre at KTHNOC.


4. Implementation and rollout of GigaSunet

The implementation and rollout of GigaSunet was divided into three phases. Phase one, consisting of 8 cities, was implemented during the first months of 2002. That implementation was done with SUNET-155 as a backup, because it was not possible to get full infrastructural redundancy in the new network at that time. The second phase, with another 10 cities, was implemented during the spring of 2002, again with SUNET-155 providing redundancy. Full redundancy in GigaSunet was accomplished on July 1, 2002. The last phase of installation was done in September 2002, when the remaining cities and universities were connected. All cities are listed in table 1 below.

Figure 8: Cities connected within GigaSunet.

In addition to the three major phases there was also an initial test phase.

• Phase 0: [Nov 2001] Test of GSR, OC-192 POS and Telia DWDM links and equipment.

• Phase 1: [Jan-Apr 2002] Fourteen universities connected (Luleå, Umeå, Sundsvall, KTH, SU, KI, HLS, SH, HHS, Linköping, GU, CTH, Malmö, Lund)

• Phase 2: [June 2002] Eleven universities connected (UU, SLU, Örebro, Västerås, Borlänge, Karlstad, Skövde, Jönköping, Borås, Trollhättan and Halmstad)

• Phase 3: [Sept 2002] The last seven universities connected (Gävle, Visby, Växjö, Ronneby, Kristianstad, Kalmar and Kiruna).


University name                     Abbreviation  City
Blekinge tekniska högskola          BTH           RONNEBY
Chalmers Tekniska Högskola          Chalmers      GÖTEBORG
Göteborgs universitet               GU            GÖTEBORG
Handelshögskolan                    HHS           STOCKHOLM
Högskolan Dalarna                   DU            BORLÄNGE
Högskolan i Borås                   HIB           BORÅS
Högskolan i Gävle/Sandviken         HIG           GÄVLE
Högskolan i Halmstad                HH            HALMSTAD
Högskolan i Jönköping               HJ            JÖNKÖPING
Högskolan i Kalmar                  HIK           KALMAR
Högskolan i Kristianstad            HKR           KRISTIANSTAD
Högskolan i Skövde                  HIS           SKÖVDE
Högskolan i Trollhättan/Uddevalla   HTU           TROLLHÄTTAN
Högskolan på Gotland                HGO           VISBY
Institutet för rymdfysik            IRF           KIRUNA
Karlstads universitet               KAU           KARLSTAD
Karolinska institutet               KI            STOCKHOLM
Kungliga tekniska högskolan         KTH           STOCKHOLM
Linköpings universitet              LIU           LINKÖPING
Luleå tekniska universitet          LTU           LULEÅ
Lunds universitet                   LU            LUND
Lärarhögskolan i Stockholm          LHS           STOCKHOLM
Malmö högskola                      MAH           MALMÖ
Mitthögskolan                       MH            SUNDSVALL
Mälardalens Högskola                MDH           VÄSTERÅS
SLU Uppsala                         SLU           UPPSALA
Stockholms universitet              SU            STOCKHOLM
Södertörns högskola                 SH            STOCKHOLM
Umeå universitet                    UMU           UMEÅ
Uppsala Universitet                 UU            UPPSALA
Växjö universitet                   VXU           VÄXJÖ
Örebro universitet                  ORU           ÖREBRO


5. Routing in GigaSunet

Björn Rhoads, KTHNOC, [email protected]

First published 21 May 2002, revised June 2003.

Abstract

This chapter describes the initial routing configuration within GigaSunet, how the IGP and EGP are set up. Below is the quick list of the routing options used:

1. The IGP used is IS-IS

2. iBGP is done with one level of route reflectors

3. eBGP is done either from the bricks at the Universities or from the GSR's at the POPs, depending on how the bricks are managed.

4. Multicast uses Anycast RP with internal MSDP and external MSDP/MBGP

This description is then updated with the current routing configuration, covering both the change from route reflectors to full mesh and the tuning of IS-IS and BGP for fast convergence.

Network structure

The basic network layout can be described as a "snowman" with four major parts, i.e. four main rings. This simplification is not directly obvious when looking at the physical structure of the network, see figure 9.

To better see what the layout looks like, one can untwist some of the rings and get a cleaner layout of the network, as seen in figure 10. This structure was the basis for the choice of how to lay out the routing structure.

Note that the physical map has the different stages of network deployment marked, but the logical map only shows the final network layout.


Figure 10: The logical network layout

The main POP for external traffic out of SUNET is Stockholm. In Stockholm there are more routers than shown in figure 10. Besides stockholm3 and stockholm4, which handle the local traffic to the universities, Stockholm has one more GSR: stkpr3, which is a router for external peerings. In addition to these five routers at the main POP, there are two at KTHNOC (KTHNOC-1 and 2) which handle the traffic to national museums, schools of performing arts and external


Figure 11: Layout for the Stockholm region (when SUNET-155 was still in operation).

Internal routing

IS-IS

As internal routing protocol, SUNET has decided to use IS-IS. It is the same protocol that was used in the previous network, SUNET-155. The entire network consists of one single IS-IS L2 area. So both GigaSunet and SUNET-155 belonged to the same L2 area.

Figure 12: Layout of a POP

Since IS-IS disregards the difference in speed on links, special considerations had to be taken into account at the POP's. A simplified version of a POP can be seen in figure 12. Here the routers 1 and 2 are the core GSR's, and the B1 and B2 routers are the “bricks” at the university. The thick line through 1 and 2 is the OC-192 backbone, and the SRP ring is the local connection for that university. The problem comes from the fact that IS-IS has equal metric on all links when calculating paths. In this case the SRP and the OC-192 will each be one hop with equal cost, so without modification the traffic between 1 and 2 is just as likely to go over the SRP ring as over the OC-192 POS interfaces.

This was fixed by raising the metric for the SRP interfaces and lowering the metric for the POS interfaces. KTHNOC chose to set the metric for the POS interfaces to 2, and the metric for the SRP interfaces to 60. This was done to ensure that if a POS link in a POP fails, the traffic going through the POP will go over the backbone and not over the SRP ring, since the SRP ring has only ¼ of the speed of the POS links.


As an example, look at figure 10 at POP 6, Uppsala. The layout of each POP is not visible there, but Uppsala has the layout of figure 12. If the POS link between 1 and 2 were to fail, the traffic that normally goes through Uppsala should go over Västerås and Gävle instead, and not through the local SRP ring at the POP.
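The effect of the metric change can be checked with a small shortest-path calculation. The sketch below is only an illustration under stated assumptions: the node names (u1, u2, r1 ...), the six-hop detour around the ring and the untuned metric of 10 per link are hypothetical, while the tuned values 2 (POS) and 60 (SRP) are the ones given above. It models a POP whose internal POS link has failed, leaving the SRP ring and the detour over the backbone as the two candidates.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over undirected links given as (node_a, node_b, metric)."""
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return None


def path_after_pos_failure(pos_metric, srp_metric, detour_hops=6):
    """Best path between the two core routers of one POP (u1, u2) when the
    POS link between them is down. Hypothetical topology: one SRP link
    u1-u2 and a backbone detour of `detour_hops` POS links."""
    detour = ["u1"] + [f"r{i}" for i in range(1, detour_hops)] + ["u2"]
    links = [(a, b, pos_metric) for a, b in zip(detour, detour[1:])]
    links.append(("u1", "u2", srp_metric))
    return shortest_path(links, "u1", "u2")


if __name__ == "__main__":
    print("equal metrics:", path_after_pos_failure(10, 10))
    print("tuned metrics:", path_after_pos_failure(2, 60))
```

With equal metrics the single SRP hop wins; with the tuned metrics the detour over the backbone is cheaper as long as it is shorter than 30 POS hops (30 × 2 = 60).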

iBGP

The first design of the internal BGP structure was rather simple; it consisted of one layer of route reflectors (RR's) serving the rings. In figure 13 the route reflectors are marked in green. There were a total of 9 RR's in GigaSunet, and all were in full mesh. The route reflector locations were selected on the basis of the physical structure of the network. All RR's were at the intersections, or close to intersections, of the rings.


Figure 14: iBGP between Route reflectors and clients

There were two RR's for each ring. This was to ensure redundancy for the routing throughout the network. The layout of the clients towards the reflectors is a bit hard to visualize in an easy manner. The basic idea was that each router speaks with two route reflectors, close to each exit point on the rings for traffic towards Stockholm.

In figure 14, it is visible that the route reflectors were servicing the rings that they were on. The router with most iBGP-peerings is boras2 (16).

External routing

A common way of doing external peering is to try to avoid doing it in the backbone routers, so that internal forwarding is affected as little as possible. That idea is, however, hard to realise in a real network. During the startup phase of GigaSunet, all external peerings were done in the bricks at the universities, which kept the external peerings off the backbone. That has now changed. For the final GigaSunet, the universities had the choice of managing the brick themselves, and by doing so the brick moved into the customer (university) AS. This in turn moved the eBGP peering for the university to the backbone routers. Figure 12 shows the simplified layout of a university connection.

eBGP

Customer peerings are done by the backbone routers at the POP's if the customers are managing the bricks; otherwise the peering is done by the brick at the university. At some POP's there is more than one university on the same ring. An example is Stockholm, where there are three SRP rings with two universities each. Those SRP rings are connected to stockholm3 and stockholm4. A separate peering router is not always used. The peerings with NORDUnet, the main upstream provider, are done by stockholm1 and stockholm2.

Multicast

Multicast is run with PIM-SM and Anycast RP. There are 9 rendezvous points (RP's) in a layout similar, but not identical, to that of the (original) iBGP route reflectors. Each RP was close to an RR, or in some instances on the same router. This has been done in order to spread the work between the pairs of backbone routers in the POPs. Figure 15 shows the layout of the RP's. All RP's are in full mesh, meaning that there is an MSDP peering between all RP's. This is not strictly needed, but it ensures that all SA's are always announced to all RP's regardless of which physical links are down.


Figure 15: RP layout

The source active (SA) information can be spread in several ways. One way is to set up MSDP peerings directly with 1 or 2 RP’s close to the customer RP. Another is to let the customer MSDP peering be to the POP closest to the customer and then have internal MSDP peerings to the RP’s in the core.

The decision was to have the external MSDP peerings to the RP’s. Multicast on the connections to the customers, the SRP rings, should be as stable as possible. Since SRP is a shared medium, there is a PIM DR (designated router) on each SRP ring. The DR is elected by the highest IP address, so without modification the DR will always be a customer router and not a core router. The command to set the priority for the DR election was not initially available in the version of IOS that SUNET was running, but it is now.
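The election rule can be illustrated with a short sketch. It assumes the usual PIM-SM behaviour, in which the highest DR priority wins and the highest IP address only breaks ties. The router names, the addresses (taken from the 192.0.2.0/24 documentation range), the default priority of 1 and the raised priority of 10 are hypothetical values chosen for the example.

```python
import ipaddress


def elect_dr(routers):
    """Return the name of the designated router.

    `routers` is a list of (name, ip, priority) tuples for one shared
    segment. Highest priority wins; highest IP address breaks ties.
    """
    def rank(router):
        _name, ip, priority = router
        return (priority, int(ipaddress.ip_address(ip)))

    return max(routers, key=rank)[0]


# Hypothetical SRP ring: the core router has the lowest address.
ring_default = [
    ("core-gsr", "192.0.2.1", 1),
    ("customer-a", "192.0.2.10", 1),
    ("customer-b", "192.0.2.11", 1),
]
# Same ring after raising the DR priority on the core router.
ring_tuned = [("core-gsr", "192.0.2.1", 10)] + ring_default[1:]

if __name__ == "__main__":
    print("without priority:", elect_dr(ring_default))
    print("with priority:   ", elect_dr(ring_tuned))
```

With equal priorities the customer router with the highest address becomes DR, which is the situation described above; raising the priority on the core router moves the DR role to the core.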

There are two ways of scoping the multicast address space: TTL based scoping and address based scoping. Address based scoping seems to be the most common (see RFC 2365 for details). SUNET has implemented address based scoping, i.e. filtering is done on 239/8 at the GigaSunet borders and RP’s. RFC 2365 also defines the range 239.192.0.0/14 as the organization local scope. This range, or part of it, could be used as SUNET local addresses, to be used by SUNET and customers together.
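As a rough illustration of address based scoping, the sketch below classifies a group address against the two ranges mentioned above. The function and the scope labels are our own; only the prefixes 239.0.0.0/8 and 239.192.0.0/14 come from the text and RFC 2365. The sketch does not model the individual well-known groups that the access lists also deny.

```python
import ipaddress

ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")    # filtered at the borders
ORG_LOCAL = ipaddress.ip_network("239.192.0.0/14")    # organization local scope


def classify_group(group: str) -> str:
    """Return a scope label for a multicast group address."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if addr in ORG_LOCAL:
        return "organization-local"
    if addr in ADMIN_SCOPED:
        return "administratively-scoped"
    return "global"


if __name__ == "__main__":
    for group in ("239.193.1.1", "239.1.2.3", "233.1.2.3"):
        print(group, classify_group(group))
    # The /14 corresponds to the wildcard mask used in the access lists.
    print(ORG_LOCAL.hostmask)
```

The host mask of the /14 prefix is 0.3.255.255, which is the wildcard used for 239.192.0.0 in the access lists of the router configurations.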

Router configurations

Multicast configuration – condensed

For all routers:

ip multicast-routing distributed
ip sdr cache-timeout 240
ip pim ssm default
!
! Define the RP to use
!
ip pim rp-address 193.10.80.229 multicast-accept-register override
ip pim accept-rp 193.10.80.229 multicast-accept-register
!

ip access-list standard multicast-boundary-sweden deny 224.0.1.75 deny 224.0.1.35 deny 224.0.1.39 deny 224.0.1.40 deny 224.0.1.60 deny 224.0.1.1 deny 224.0.2.2 deny 224.0.1.3 deny 224.0.2.1 deny 224.0.1.20 deny 224.0.1.22 deny 224.0.1.24 deny 239.0.0.0 0.255.255.255 permit any !

ip access-list standard multicast-boundary-sunet deny 224.0.1.75 deny 224.0.1.35 deny 224.0.1.39 deny 224.0.1.40 deny 224.0.1.60 deny 224.0.1.1 deny 224.0.2.2

(20)

deny 224.0.1.3 deny 224.0.2.1 deny 224.0.1.20 deny 224.0.1.22 deny 224.0.1.24 permit 239.192.0.0 0.3.255.255 deny 239.0.0.0 0.255.255.255 permit any !

ip access-list standard multicast-accept-register deny 232.0.0.0 0.255.255.255

permit 224.0.0.0 15.255.255.255 deny any

!

ip access-list extended multicast-sa-sweden deny ip any 232.0.0.0 0.255.255.255 deny ip any 239.0.0.0 0.255.255.255 deny ip 10.0.0.0 0.255.255.255 any deny ip 192.168.0.0 0.0.255.255 any deny ip 172.16.0.0 0.15.255.255 any deny ip 127.0.0.0 0.255.255.255 any deny ip 0.0.0.0 0.255.255.255 any deny ip 169.254.0.0 0.0.255.255 any deny ip 192.0.2.0 0.0.0.255 any deny ip any host 224.0.1.1 deny ip any host 224.0.1.2 deny ip any host 224.0.1.3 deny ip any host 224.0.1.20 deny ip any host 224.0.1.22 deny ip any host 224.0.1.24 deny ip any host 224.0.1.35 deny ip any host 224.0.1.39 deny ip any host 224.0.1.40 deny ip any host 224.0.1.60 deny ip any host 224.0.1.75 deny ip any host 224.0.2.1 deny ip any host 224.0.2.2 permit ip any any

!

ip access-list extended multicast-sa-sunet deny ip 10.0.0.0 0.255.255.255 any deny ip 192.168.0.0 0.0.255.255 any deny ip 172.16.0.0 0.15.255.255 any deny ip 127.0.0.0 0.255.255.255 any deny ip 0.0.0.0 0.255.255.255 any deny ip 169.254.0.0 0.0.255.255 any deny ip 192.0.2.0 0.0.0.255 any deny ip any host 224.0.1.1 deny ip any host 224.0.1.2 deny ip any host 224.0.1.3 deny ip any host 224.0.1.20 deny ip any host 224.0.1.22 deny ip any host 224.0.1.24 deny ip any host 224.0.1.35 deny ip any host 224.0.1.39 deny ip any host 224.0.1.40 deny ip any host 224.0.1.60 deny ip any host 224.0.1.75 deny ip any host 224.0.2.1 deny ip any host 224.0.2.2

(21)

permit ip any 239.192.0.0 0.3.255.255 deny ip any 239.0.0.0 0.255.255.255 permit ip any any

For access routers:

!
! for interfaces that are towards customers
! these are the added commands for multicast
!
interface <int> <num>
 ip pim bsr-border
 ip pim sparse-mode
 ip mroute-cache distributed
 ip multicast boundary multicast-boundary-sunet
 [ip multicast ttl-threshold <ttl>]

For RP’s:

!
! The RP address
!
interface Loopback1
 description sunet-rp.sunet.se
 ip address 193.10.80.229 255.255.255.255
 no ip directed-broadcast
 ip pim sparse-mode
!
! Full mesh MSDP with the other RP’s
! This is from borlange2
!
ip pim rp-address 193.10.80.229 62 override
ip pim accept-rp 193.10.80.229 62
ip msdp peer 130.242.80.1 connect-source Loopback0
ip msdp peer 130.242.80.5 connect-source Loopback0
ip msdp peer 130.242.80.8 connect-source Loopback0
ip msdp peer 130.242.80.12 connect-source Loopback0
ip msdp peer 130.242.80.14 connect-source Loopback0
ip msdp peer 130.242.80.25 connect-source Loopback0
ip msdp peer 130.242.80.31 connect-source Loopback0
ip msdp peer 130.242.80.32 connect-source Loopback0
ip msdp mesh-group GSCore 130.242.80.1
ip msdp mesh-group GSCore 130.242.80.5
ip msdp mesh-group GSCore 130.242.80.8
ip msdp mesh-group GSCore 130.242.80.12
ip msdp mesh-group GSCore 130.242.80.14
ip msdp mesh-group GSCore 130.242.80.25
ip msdp mesh-group GSCore 130.242.80.31
ip msdp mesh-group GSCore 130.242.80.32
ip msdp cache-sa-state
ip msdp originator-id Loopback0
!
!
! MSDP peering with customer
!
ip msdp peer <peer address> connect-source Loopback0 [remote-as <AS num>]
ip msdp sa-filter in <peer address> list multicast-sa-sunet
ip msdp sa-filter out <peer address> list multicast-sa-sunet
[ip msdp ttl-threshold <peer address> <ttl>]

Routing improvements

The design with route reflectors (RR) was based on the information that the routers in the core network could not handle full mesh peering in a network the size of GigaSunet. This situation changed by the end of 2002. GigaSunet has several external peering points with other networks, and some of those networks requested hot potato routing, which was not possible with our RR setup. Therefore the decision was made to implement a full mesh in the backbone network at the beginning of 2003.
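To give a feeling for why full mesh was considered demanding, the sketch below compares the number of iBGP sessions in the two designs. The 9 route reflectors in full mesh and the two reflectors per client come from the description of the original design; the total router counts (20 and 50) are hypothetical, since the exact number of iBGP speakers is not stated here.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions when every router peers with every other."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 9,
                             reflectors_per_client: int = 2) -> int:
    """Sessions with one layer of fully meshed route reflectors, where each
    remaining router (client) peers with `reflectors_per_client` reflectors."""
    clients = routers - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors_per_client


if __name__ == "__main__":
    for n in (20, 50):  # hypothetical network sizes
        print(n, "routers:",
              full_mesh_sessions(n), "sessions in full mesh,",
              route_reflector_sessions(n), "with route reflectors")
```

In a full mesh every router also holds n - 1 sessions itself, which is the per-router load that the core routers first had to be able to handle.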

After careful planning, an approach was chosen to introduce full mesh ring by ring in the core. The sequence in which the routers were changed was important in order not to disturb the traffic more than necessary. With a team of 9 engineers, and all configurations prepared and downloaded to the routers in advance, it took only 50 minutes to do the whole change. Traffic was affected in only one case, when the sequence was by mistake not adhered to (which just proved the plan right).

As a preparation we implemented some fine tuning features for BGP to get the network to converge faster when we had made the alterations. The fine tuning of BGP is described in the next section.

IS-IS and BGP tuning for faster convergence

BGP fine tuning

The fine tuning of BGP was implemented just before the migration from the RR setup to the full mesh. The fine tuning consisted of some features for TCP and increased interface queues.

The first thing we did was to increase the queue depth on all the interfaces in the core. The depth of the queues was set to 1500 packets instead of the default value of 75 packets. This will minimize the number of dropped packets and retransmissions in the convergence phase. The command used for this operation was:

router(config-if)#hold-queue 1500 in

The SPD (Selective Packet Discard) queues were also modified. The SPD headroom was extended from 100 to 1000 packets and the SPD extended-headroom was extended from 10 to 1000 packets. An SPD headroom of 1000 is the default value in Cisco IOS 12.0(23)S1, and therefore it will not show up in the configurations anymore. The commands used for this operation were:

router(config)#spd headroom 1000

router(config)#spd extended-headroom 1000

We also enabled TCP path MTU discovery, so that the MSS (Maximum Segment Size) of the BGP sessions is derived from the path MTU. With a larger MSS the routers can send larger IP packets, so fewer BGP packets are needed to send a complete update with the default-less routing table. The command used was:

router(config)#ip tcp path-mtu-discovery


The last thing we changed was the TCP window size, which determines how much data a router may send before it needs an acknowledgment. The parameter was set to its maximum value of 65535 bytes. This minimizes the number of acknowledgments a router needs to wait for and, alongside the larger queue depth, minimizes retransmissions. We used the command:

router(config)#ip tcp window-size 65535

IS-IS Fine tuning

To improve the performance of the IGP (IS-IS) we wanted to change some timers that affect the convergence times for IS-IS.

To measure the difference in convergence times before and after tuning we used two UNIX hosts that were connected directly to the backbone. One is located in Umeå (head.snowman.sunet.se) and the other in Malmö (foot.snowman.sunet.se). The path that is normally used between the hosts is shown in the figure below.


Fping was used on the UNIX hosts:

fping -p 1 -C 10000 -r 1 -t 50 head.snowman.sunet.se

This will result in the following behaviour:

• 1 ms between pings
• 50 ms for timeouts
• One retry per ping
• Average roundtrip ~23-30ms

This means that the time between each lost packet is ~51ms. This assumption is used for the calculation of convergence times.

Initial Convergence Test

The link between Uppsala and Borlänge was selected as target for the convergence test. It was selected on the basis that it wasn't close to the UNIX hosts and that the alternative path is involving many routers that need to update the routing table.


When the link was taken down we lost approx 210 packets: (50ms + 1ms) * 210 = 10.7s

When the link was restored we lost just 1 packet.

IS-IS Tuning

The following commands were added to the core routers.

int pos x/x
 carrier-delay msec 8
router bgp 1653
 bgp update-delay 300
router isis
 spf-interval 1 1 50
 prc-interval 1 1 50
 lsp-gen-interval 5 1 50
 set-overload-bit on-startup wait-for-bgp
 lsp-refresh-interval 65000
 max-lsp-lifetime 65535
 ignore-lsp-errors
 log-adjacency-changes

Not all of the above commands affect the convergence time of IS-IS. Some of them control the behaviour of IS-IS when the router is booting.

Convergence test after tuning

When the link was taken down we lost approx. 28 packets: (50 ms + 1 ms) * 28 = 1.4 s

When the link was restored we lost approx. 14 packets: (50 ms + 1 ms) * 14 = 0.7 s

Here we saw worse performance when restoring the link. This behaviour has been discussed with Cisco. There seem to be a number of factors which can explain a variation in convergence time between two measurements. To get a more reliable estimate, a large number of measurements would have to be done, which is not feasible in a production network. Since the overall performance is better, we accepted this as a good result.

Summary for the tests

• “Normal” IS-IS lost approx. 215 packets in total = 10.9 s

• “Tweaked” IS-IS lost approx. 42 packets in total = 2.1 s

• ~5 times improvement in total

• ~7.6 times improvement when fault introduced

• ~14 times degradation on restoration
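The ratios in the summary can be rechecked directly from the packet counts, since the ~51 ms per lost packet cancels out. The sketch below does this with the approximate counts quoted in the tests; the dictionary labels are ours. Computed from packet counts, the fault case comes out at 7.5, while the ~7.6 in the summary corresponds to dividing the rounded times (10.7 s / 1.4 s).

```python
def ratio(before_lost: int, after_lost: int) -> float:
    """Values above 1 mean fewer packets were lost after tuning."""
    return before_lost / after_lost


# (lost before tuning, lost after tuning), approximate counts from the tests
measurements = {
    "link taken down": (210, 28),
    "link restored": (1, 14),
    "total": (215, 42),
}

for name, (before, after) in measurements.items():
    r = ratio(before, after)
    if r >= 1:
        print(f"{name}: {r:.1f} times improvement")
    else:
        print(f"{name}: {1 / r:.1f} times degradation")
```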

The overall performance was about 5 times better than default configuration, so we decided to keep the new settings. More tests are needed to make the behavior even better. Since each test is


6. Distributed services

Måns Nilsson, KTHNOC, [email protected]

To augment and support the basic service in GigaSunet – forwarding of packets – a service host concept was formulated and proposed by KTHNOC. The services were previously concentrated in a single facility at KTHNOC, and while the basic design and performance of the network in effect remove the distinction between WAN and LAN, it was felt that a distribution of services would augment the network favourably, and also emphasize the decentralisation element of the network design.

These services included:

• DNS; both full-service resolvers and name service

• Authentication for network management

• Time synchronization

Also, the resilience against catastrophic failures is increased with this model.

Design criteria

We felt that we should try to emulate the structure of the network in the placement of servers. The decision was made to place three servers in the network:

• One in Malmö, being DNS and NTP server.

• One in Göteborg, Authentication and NTP server.

• One in Umeå, similar to Malmö.

The placement and distribution decisions were based on a combination of available services and network structure. Göteborg has good connections to more than one part of the network; Malmö and Umeå are at the extreme ends of GigaSunet. Thus, authentication services would do well if placed in Göteborg and the Malmö and Umeå sites would offer minimum RTT to similar servers that are in fact clones of services only available in Stockholm.

Hardware considerations

The platform chosen was Sun Netra DC200, due to:

• 48V DC power being stable and redundant in Telia facilities

• The “LOM” concept closely mimics the manageability of other network elements like the Cisco routers. LOM stands for "Lights Out Management", a service processor subsystem which activates as soon as power is applied to a computer equipped with it, and gives an interactive interface which can be reached over the out-of-band network, much like a router. It enables controlled cold starting (including power off/power on) of the computer remotely.


Rollout

The machines were configured, set up and installed by KTHNOC, due to the “three-off” nature of the project. A larger rollout like the one employed for the rest of the network was considered but rejected as having too much overhead. The servers were installed adjacent to the backbone routers (Telia POPs) in the three chosen cities. Network-wise, the hosts are connected to Cisco 10720s that we already had in those POPs for other reasons. The console port of each service host is also connected to the out-of-band network.

Services deployed

The Malmö node was first to be deployed, and has been in operation since autumn 2002. It serves DNS for all SUNET forward and reverse domains, and also runs NTP.

The Göteborg node was second to go. Some issues with the authentication services setup, and the lack of integral distribution systems in Tacacs, have delayed its authentication role, but the NTP service has been operative for some time.

The Umeå node was last. It is a clone of the Malmö machine, but since we were waiting for a router to connect it to, it has had the shortest time of deployment of them all.


7. Time in the network

Magnus Carlebjörk, KTHNOC, [email protected]

Overview

The purpose of this chapter is to present the configuration of the NTP protocol in GigaSunet. To make the NTP configuration easy to understand and to minimize the load on the core routers, a simple overlay hierarchy was selected.

The old topology

As a reference, here is the old NTP topology as it was in Sunet-155.

[Diagram not reproduced. Nodes shown: umea-1, umea-2, SVL-BB-1, SVL-BB-2, KTHNOC-1, stk-pr-1, stk-pr-2, trollhat1, STK-BB-1, STK-BB-2, GBG-BB-1, GBG-BB-2, MLM-BB-1, MLM-BB-2, lund-1, lund-2, spntp1, spntp2, time1, time2, time3. Legend: NTP router (stratum >= 2), NTP server (stratum >= 1), NTP peer relationship, NTP self-peer relationship, NTP server relationship.]

Figure 18: Topology map for NTP in SUNET-155

Many peering relationships were used, including many self-peers3. This design probably worked as it was supposed to, but it contains some relationships that are unnecessary for GigaSunet. For instance, the self-peerings were most likely safeguards against the NTP process dying. Some Unix NTP versions die if their NTP sources are unreachable for a long time, but that does not apply to the routers we are using in GigaSunet.

3 In this chapter, a “peering” relationship refers to an NTP association where both parties are willing to receive and

Topology

The topology relies on three strategically placed NTP servers (and three backbone routers) to supply time via NTP to GigaSunet. These NTP servers are named foot, belly and head (.snowman.sunet.se).

[Diagram not reproduced. Stratum 2: the backbone routers malmo1, stockholm1 and lulea1 and the NTP servers foot, belly and head. Stratum 1: ntp1, ntp2, krusovice, time1 and time2. Legend: NTP router (stratum >= 2), NTP server (stratum >= 1), NTP peer relationship, NTP server relationship.]

Figure 19: Topology map for NTP in GigaSunet

The core routers and NTP servers all peer with each other. They get their time from public and private stratum 1 servers. The stratum 1 time servers we use are time1 and time2 from Statens provningsanstalt, ntp1 and ntp2 from Netnod and one internal server at KTHNOC (krusovice), see figure 19. The rest of the core routers collect their time from these six NTP servers at stratum 2 and then relay their time to GigaSunet customers if they request it. The selected topology is not overly redundant. We could have used a full mesh topology with all the backbone routers collecting and giving time to each other, but experience has shown that there is little to gain from that. In the current topology, the routers can still get their time even if a few links break. Additionally, when a line in GigaSunet breaks, it is without exception repaired in a very short time. During that time each router's own clock is sufficient to keep correct time until the faulty lines are repaired.

NTP configuration on the backbone routers

This is the configuration that is implemented on the backbone routers:

ntp authentication-key 1 md5 key
ntp authenticate
ntp trusted-key 1
ntp source loopback 0
ntp update-calendar
ntp server foot.snowman.sunet.se key 1
ntp server belly.snowman.sunet.se key 1
ntp server head.snowman.sunet.se key 1
ntp server malmo1.sunet.se key 1
ntp server stockholm1.sunet.se key 1
ntp server lulea1.sunet.se key 1

The key keyword on the first line denotes the MD5 key for connection to the NTP servers. It is not mandatory to use MD5 (the NTP servers will serve us time anyway) but it gives us authentication in the odd event that someone is spoofing the IP address of the NTP server(s).
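How the MD5 key protects against a spoofed server can be sketched as follows. In NTP symmetric-key authentication, as we understand the NTP specification, the sender appends a key ID and an MD5 digest computed over the shared key followed by the packet, and the receiver recomputes the digest with its own copy of the key. The 48-byte all-zero header, the key value and the function names below are placeholders for illustration, not anything used in GigaSunet.

```python
import hashlib
import hmac
import struct

DIGEST_LEN = 16           # size of an MD5 digest in bytes
MAC_LEN = 4 + DIGEST_LEN  # 32-bit key ID followed by the digest


def append_mac(packet: bytes, key_id: int, key: bytes) -> bytes:
    """Append key ID and MD5(key || packet) to an NTP packet."""
    digest = hashlib.md5(key + packet).digest()
    return packet + struct.pack("!I", key_id) + digest


def verify_mac(message: bytes, keys: dict[int, bytes]) -> bool:
    """Recompute the digest with our own key and compare it to the received one."""
    packet, mac = message[:-MAC_LEN], message[-MAC_LEN:]
    (key_id,) = struct.unpack("!I", mac[:4])
    key = keys.get(key_id)
    if key is None:
        return False      # unknown (untrusted) key ID
    expected = hashlib.md5(key + packet).digest()
    return hmac.compare_digest(expected, mac[4:])


header = bytes(48)                   # placeholder NTP header (assumption)
keys = {1: b"example-secret"}        # hypothetical shared key with ID 1
signed = append_mac(header, 1, keys[1])
tampered = bytes([signed[0] ^ 1]) + signed[1:]

print(verify_mac(signed, keys))      # authentic packet
print(verify_mac(tampered, keys))    # packet modified in transit
```

A spoofer who can forge the source IP address but does not know the key cannot produce a matching digest, so the packet is rejected.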

NTP sends its messages on UDP port 123, roughly one message per minute per association (peer, server, broadcast etc). Because of this, it can take a few minutes before the NTP process has configured its associations and decided which time source to use. Therefore, the NTP associations might look like they are malfunctioning at first. If we want to track the progress of NTP on a router, we can use the following commands on that router:

show ntp status

show ntp associations

show ntp associations detail

Example:

stockholm2>show ntp associations

  address         ref clock       st  when  poll  reach  delay  offset  disp
*~130.242.80.21   192.36.143.150   2    75  1024    377   21.2   -0.58   0.2
+~130.242.80.23   192.36.143.150   2   330  1024    377   13.4   -0.02   41.
 ~130.242.80.31   192.36.144.22    2    67  1024    377  173.7   84.67  83.3
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured
stockholm2>

Here we can see that 130.242.80.21 is chosen as this router's master NTP server. For more information about poll, reach, delay etc, search the Cisco web site or go to the official NTP home page at http://www.ntp.org.
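For monitoring purposes the association table can also be read by a script. The sketch below parses the three rows of the example output and picks out the master; the class and function names are ours. It also decodes the reach column, which is an 8-bit register printed in octal, so 377 means that the last eight polls were all answered.

```python
from dataclasses import dataclass

# Rows copied from the example output (header, legend and prompt omitted).
SAMPLE = """\
*~130.242.80.21 192.36.143.150 2 75 1024 377 21.2 -0.58 0.2
+~130.242.80.23 192.36.143.150 2 330 1024 377 13.4 -0.02 41.
~130.242.80.31 192.36.144.22 2 67 1024 377 173.7 84.67 83.3
"""


@dataclass
class Association:
    flags: str       # e.g. "*~" = master (synced), configured
    address: str
    ref_clock: str
    stratum: int
    reach: int       # 8-bit reachability register (shown in octal)
    delay: float
    offset: float


def parse_associations(text: str) -> list[Association]:
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        flags = ""
        while line and line[0] in "*#+-~":   # leading status markers
            flags, line = flags + line[0], line[1:]
        fields = line.split()
        if len(fields) != 9 or not fields[2].isdigit():
            continue                          # skip header, legend, prompt
        address, ref_clock, st, _when, _poll, reach, delay, offset, _disp = fields
        rows.append(Association(flags, address, ref_clock, int(st),
                                int(reach, 8), float(delay), float(offset)))
    return rows


associations = parse_associations(SAMPLE)
master = next(a for a in associations if "*" in a.flags)
print(master.address, "answered", bin(master.reach).count("1"), "of the last 8 polls")
```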

Output from show running

To see how the previous configuration looks when it is used in a router, we can issue the command

show running-config | inc ntp :

ntp authentication-key 2 md5 0117510F0A3B56150A4B 7
ntp authenticate
ntp trusted-key 2
ntp clock-period 17179848
ntp source Loopback0
ntp update-calendar
ntp server 130.242.94.50 key 2
ntp server 130.242.94.34 key 2
ntp server 130.242.94.18 key 2
ntp server 130.242.80.23 key 2
ntp server 130.242.80.31 key 2
ntp server 130.242.80.21 key 2

5 The number 7 indicates the type of encryption used. It is not necessary to specify it when pasting the configuration into a router. Currently, only MD5 is supported.


When pasting this configuration into a router, the router translates the domain names of the NTP servers to IP addresses. Since link addresses may change, the names point to the loopback interfaces. Therefore, the backbone can change its topology in terms of links and their associated IP addresses without affecting the NTP topology.

We can see that some sort of conversion has been performed on our key and that it is displayed as a long sequence of letters and numbers. One might think that this is the md5 encrypted password that will be used in peering relationships and such, but that is not the case.

When we issue a show running-config | inc ntp command, we see the key as it is stored in the running configuration, namely in a weak reversible type 7 “encryption”7, although we specified MD5. Interestingly, this does not matter, because the running configuration is protected by the enable password and the key is actually converted to a real MD5 password before it is inserted into an NTP packet. We cannot copy and paste the first line directly from another router, since the target router will make a new hash of the source hash. In other words, we have to paste the original keyword in plain text8. In general, when it comes to NTP and authentication, what we want is primarily to protect ourselves against someone injecting false time into our routers. However, even without MD5 authentication, the NTP algorithm itself is highly resistant to false time; that is what it was designed to avoid in the first place.

The clock-period is an internal variable for the NTP protocol. It is automatically generated and should not be pasted from one configuration to another. Remove the clock-period command before pasting a configuration into another router.
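When configurations are copied between routers with a script, the removal step can be automated. The following is a minimal sketch with a hypothetical helper name and a shortened sample configuration; it simply drops every line that starts with ntp clock-period.

```python
def strip_clock_period(config: str) -> str:
    """Return the configuration without any 'ntp clock-period' lines."""
    kept = [line for line in config.splitlines()
            if not line.strip().startswith("ntp clock-period")]
    return "\n".join(kept)


# Shortened sample configuration (authentication key and servers left out).
source = """\
ntp authenticate
ntp trusted-key 2
ntp clock-period 17179848
ntp source Loopback0
ntp update-calendar"""

print(strip_clock_period(source))
```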

7 The last number on the line denotes the encryption type; 0 = clear text, 5 = MD5, 7 = weak password.

8 It should be possible to paste type 7 passwords into a running configuration, but the author has not succeeded with


8. Management model

SUNET has introduced a three-layer model for management and problem resolution.

• Level 1 is the 24x7 helpdesk for monitoring the network’s physical layer and receiving customer complaints. This is handled by Telia IT-Services in Göteborg. They report line problems to Telia/Skanova and hardware problems to Eterra. If the fault cannot be analyzed or solved at this level it is escalated to level 2.

• Level 2 handles routing problems, configurations and advanced fault detection and isolation. KTHNOC is responsible for this level. There is a close cooperation with Eterra, Cisco NRN and Cisco TAC in case of IOS bugs or difficult hardware problems. TAC cases are opened via Eterra.

• Level 3 is the management and design level handled by SUNET. KTHNOC provides feedback and also participates in the design process.

Figure 20: Management model with problem resolution process

Problem resolution process

Level 1: Network operations, basic fault detection and isolation.

Level 2: Config. review/correction, advanced fault detection/isolation. Deployment.

Level 3: Review reports and tickets. SLA follow-ups etc. Network design.

Contacts on management level

[Diagram not reproduced. Parties shown: SUNET, Skanova/STOKAB, Cisco TAC, Cisco AS/NRN, Eterra, SUNET CERT. Link labels: HW, obvious HW, connectivity, security.]


9. Experiences

After more than 9 months of operation, we can now conclude that GigaSunet has fulfilled several of the goals we had, i.e. designing a robust production network with very high availability. So far, we have had very few disturbances. Even though we (not surprisingly) have had fiber cuts, line card failures etc, the availability for the universities has been almost 100%, thanks to the robust configuration together with the redundant infrastructure and equipment. The goal of providing the universities with good bandwidth is also fulfilled.

This is in spite of the fact that a major part of the equipment and software was leading edge and not well proven. We have had card series problems and IOS version problems which could have proven fatal without the resilient design. It is essential, though, to have a well-functioning relationship and channels of communication with the manufacturer and its development team to tease out both hardware and software bugs.

About 75% of the access routers have been handed over to the universities to be managed locally. This has put more demand on the local university regarding competence and staff availability. Central operations are staffed essentially 24x7, but the local university usually operates during office hours. From an operations point of view it is necessary to have a well-defined interface for handing over, and to get access in order to troubleshoot with the router on site.

First priority was to get the new network operational with basic functions and then we could start with optimizing performance.

The initial routing design was built on information from Cisco that the GSRs could not handle full mesh peering in a network with nearly 50 nodes. Later information changed that and we could implement full mesh. This made it possible to do hot potato routing towards external peers with several peering points. After careful planning, the changeover from RR topology to full mesh in the backbone took only 50 minutes with a team of 9 engineers.

Tuning of IS-IS showed that convergence time was improved 5 times. BGP tuning gave the same or better results.

Tests have been done to verify the end-to-end performance within GigaSunet. The network could, without problems, cope with a 1 Gbit/sec connection between hosts connected to the access routers in Luleå and Stockholm (approx. 1,000 km apart). Jitter and packet loss measurements show very low figures, as expected.
