NCGuard validation engine - Methods and techniques for disruption-free network reconfiguration

While relying on a high-level model reduces the likelihood of making some types of mistakes (e.g., duplication mistakes), some of them can still hap- pen. Even worse, these mistakes could be mirrored in every single device as the model gets translated into low-level configurations. Therefore, the model should be validated prior to the generation phase so as to ensure that it complies with the configuration objectives of the operator. In this section, we first characterize the objectives (or intents) pursued by network operators when configuring a network, then we describe how to rep- resent them in NCGuard by using validation rules.

8.4.1 Characterizing configuration objectives

To identify the configuration objectives pursued by operators, we studied the network-wide configuration of two research networks, the Belgian Re-

1 ################# Abilene ###################### 2 abilene = Network("Abilene", 11537) 3 4 ################# Prefix Defintion ###################### 5 abilene.addInternalPrefix([IPv4Network(’100.0.0.0/8’)]) 6 abilene.setInternalLoopbackPrefix(IPv4Network(’100.0.0.0/16’)) 7 abilene.setSecondaryInternalLoopbackPrefix(IPv4Network(’50.0.0.0/16’)) 8 abilene.setInternalLinksPrefix(IPv4Network(’100.1.0.0/16’)) 9 abilene.setExternalLinksPrefix(IPv4Network(’100.10.0.0/16’)) 10 11 ################# Routers Definition ###################### 12 SEAT = Router(’SEAT’) 13 LOSA = Router(’LOSA’) 14 SALT = Router(’SALT’) 15 KANS = Router(’KANS’) 16 HOUS = Router(’HOUS’) 17 CHIC = Router(’CHIC’) 18 ATLA = Router(’ATLA’) 19 NEWY = Router(’NEWY’) 20 WASH = Router(’WASH’)

21 abilene.addRouter(SEAT, LOSA, SALT, KANS, HOUS, CHIC, ATLA, NEWY, WASH) 22

23 ################# Intra-Links Definition ######################

24 abilene.addIntraLink(SEAT, LOSA, attributes={’ospf_weight’:1342}) 25 abilene.addIntraLink(SEAT, SALT, attributes={’ospf_weight’:913}) 26 abilene.addIntraLink(LOSA, SALT, attributes={’ospf_weight’:1303}) 27 abilene.addIntraLink(LOSA, HOUS, attributes={’ospf_weight’:1705}) 28 abilene.addIntraLink(SALT, KANS, attributes={’ospf_weight’:1330}) 29 abilene.addIntraLink(KANS, CHIC, attributes={’ospf_weight’:690}) 30 abilene.addIntraLink(KANS, HOUS, attributes={’ospf_weight’:818}) 31 abilene.addIntraLink(CHIC, NEWY, attributes={’ospf_weight’:1000}) 32 abilene.addIntraLink(CHIC, WASH, attributes={’ospf_weight’:905}) 33 abilene.addIntraLink(NEWY, WASH, attributes={’ospf_weight’:277}) 34 abilene.addIntraLink(WASH, ATLA, attributes={’ospf_weight’:700}) 35 abilene.addIntraLink(CHIC, ATLA, attributes={’ospf_weight’:1045}) 36 abilene.addIntraLink(ATLA, HOUS, attributes={’ospf_weight’:1385}) 37 38 ################ IGP Configuration ################## 39 abilene.igp.enableOSPF() 40 abilene.igp.setOSPFOptions([’AllInternalLinks’, ’Flat’]) 41 42 ############### BGP Configuration ################## 43 abilene.bgp.enableBGP() 44 abilene.bgp.createIBGPFullMesh()

Listing 8.1 By using NCGuard’s programmatic interface, only 45 lines of code are needed to describe the Internet2 IGP and BGP configuration.

search Network and the Abilene network. We also analyzed several ven- dors recommendations such as [35, 36, 78, 60] as well as the configuration problems found by tools such as rcc [44], minerals [81] and others [87, 103, 45, 10].

Typically, configuration objectives can be hierarchically decomposed into high-level objectives and low-level objectives. High-level objectives correspond to general decisions concerning the network organization. Ex- amples of high-level objectives include: ensuring the distribution of interdomain routes to all BGP routers, ensuring the distribution of intradomain routes to all routers or, ensuring sub-second convergence upon any IGP link failures. In contrast, low-level objectives are more specific and imple- ment high-level objectives. Usually, the same high-level objective can be realized trough different combinations of low-level objectives. For instance, to ensure the distribution of interdomain routes, three different realiza- tions exist: (i) use an iBGP full-mesh, (ii) use route reflectors and, (iii) use confederations. Each of these objectives further depends on others. For example, an iBGP full-mesh requires the IGP to advertise reachability for the endpoints of all iBGP sessions. This IGP objective is usually realized by using OSPF or ISIS. If OSPF is used it further requires the OSPF areas to be connected.

Most configuration objectives, in particular the low-level ones, can be classified into three different objective patterns: (non-)presence, uniqueness and symmetry. The last two patterns were already mentioned in [45].

The presence pattern represents objectives in which some configura- tion command must be present on a set of devices. For instance, [35] rec- ommends to define an identifier on all routers to avoid letting them select a different one after a reboot. Another example is the passive keyword which should appear on the interfaces that connect an OSPF router to a different network to avoid creating OSPF adjacencies with a neighboring peer [35]. The non-presence pattern is the dual of the presence pattern. For instance, it can check that stub areas in OSPF do not contain virtual links [36].

The uniqueness pattern represents objectives where several network elements must have a unique value for a given configuration parameter. One example is the IP addresses assigned to different physical interfaces that must be different [35]. Another example is that, for security reasons, all eBGP sessions must use a different MD5 password.

The symmetry pattern represents objectives where two configurations must contain related or identical parameters. We distinguish two types of symmetry patterns. The first type is the simple equality pattern in which some configuration parameters in two different network elements must be equal. For example, two routers attached to the same physical link must use the same layer-2 encapsulation. As another example, the same OSPF timers must be configured on two adjacent routers. A third example could be that the same OSPF weights must be used on both sides of a link (to enforce symmetrical routing). The second type of symmetry pattern is the cross equality pattern in which the equivalence has to be ensured across

multiple parameters. For instance, in order to establish an iBGP session between routersA and B, router A has to configure B as its neighbor, and B has to configure A as its neighbor [36, 35].

Finally, the custom pattern represents more complex configuration ob- jectives that do not belong to any of the previous patterns. Usually, these configuration objectives relate to the entire network. One example of such objective could be that the network should remain connected after any single link failure or after the simultaneous failure of any pair of routers. Another example could be that at least two disjoint equal-cost-paths must exist between any routers.

8.4.2 Modeling and verifying configuration objectives

In NCGuard, a configuration objective is expressed as a rule. A rule veri- fies that a given property is satisfied by the high-level representation. As with the high-level representation, we chose to represent the rules with XML elements. Since rules are hierarchically organized, the XML elements representing the rules are organized as a tree, where the root corresponds to a very high-level rule such as “the network is correct” (see Fig. 8.2). Each rule can define one or more dependencies (i.e., branches in the tree). Depending on the rule, all the dependencies must be verified (and depen- dencies) in order for the high-level rule to be verified or only one of them (or dependencies).

<rules rootid="CORRECT_NETWORK"> <rule id="CORRECT_NETWORK">

<description>This is the main rule.</description> <dependencies type="and"> <depends>CORRECT_CONFIGURATION</depends> <depends>CORRECT_IP_CONNECTIVITY</depends> <depends>CORRECT_VLAN</depends> <depends>CORRECT_OSPF</depends> <depends>CORRECT_BGP</depends> </dependencies> </rule> ... </rules>

Listing 8.2 NCGuard hierarchically organizes validation rules

NCGuard uses three techniques to verify rules. First, rules can be verified by constraining the syntactical structure of the high-level XML representation. For instance, forcing each XML node representing a router to have exactly one child node containing its identifier. Typically, such constraints are expressed by associating the high-level representation with a grammar which precisely defines the syntax of the high-level XML representation. In NCGuard, this grammar is implemented by using a XML Schema [43]. Rules implemented with this technique are called structural rules. Second, rules can be checked by executing queries on the network representation. These rules are called query rules. In NCGuard, the queries

Structural Query Language Total Uniqueness 14 6 - 20 Symmetry 10 - - 10 Presence 82 15 - 97 Custom - 6 3 9 Total 106 27 3 136

Table 8.1 More than 100 rules have been defined to check Abilene’s configuration objectives. Most of them are implemented with structural rules, as syntactic constraints expressed on the high-level representation.

rules can be checked by using a programming language (Java in our proto- type). These rules are called language rules. Depending on the rule, one verification technique is typically more appropriate. For instance, presence and uniqueness rules are easily implemented using a structural or a query rule. Conversely, custom rules usually require a query rule or a language rule as they require more expressiveness. Finally, symmetry rules are implicitly verified since redundant informations are factorized in the high-level representation. For instance, the symmetry of the MTU definition is ensured by defining it directly on the link and not on the interfaces connected to the link. As an illustration, we developed a high-level representation of the Abilene network by reverse-engineering their router configurations. Based on these configurations, we also inferred 136 configuration objectives. We represented most of these configuration objectives by using structural and query rules. Table 8.1 details the distribution of the techniques used to validate the configuration rules.

In the following, we provide four examples of NCGuard rules implemented with different techniques.

Rule#1: All routers have a unique router id This combined rule (presence and uniqueness) is expressed by two structural constraints inside the XML schema (see Listing 8.3).

Rule#2: A loopback interface is present on all nodes This presence rule is expressed by a query rule. As described earlier, NCGuard imple- ments query rules by using XQuery expressions. To simplify the addition of query rules, we abstracted them in a XML element composed of a scope, a descendant, and a condition sub-elements (see Listing 8.4). The scope element identifies the set of XML nodes in the high-level representation on which the rule applies. The descendant element identifies (if needed) specific children nodes of the scope node. Finally, the condition element states a property that must be respected by at least one descendant node for each scope node. Consequently, the rule depicted in Listing 8.4 gets translated by NCGuard into a XQuery which checks that for all routers

<xs:complexType name="router"> <xs:sequence>

...

<xs:element name="rid" type="InetAddressIPv4" minOccurs="1" maxOccurs ="1"/> ... </xs:sequence> </xs:complexType> <xs:key name="nodeIdKey"> <xs:selector xpath="topology/routers/router"/> <xs:field xpath="@id"/> </xs:key>

Listing 8.3 Presence and uniqueness rules are implemented with syntactical constraints expressed on the high-level representation by using a XML schema. In this excerpt, the first constraint mandates a router element to have a rid, while the second constraint mandates the rid to be unique.

(scope), there is at least one interface (descendant) for which the id is a loopback id (condition).

<rule id="LOOPBACK_INTERFACE_ON_EACH_NODE" type="presence"> <presence> <scope>ALL_ROUTERS</scope> <descendant>interfaces/interface</descendant> <condition>substring(@id,1,2)=’lo’</condition> </presence> </rule>

Listing 8.4 Presence rule: Loopback interface on each node

Notice that the distinction between scope and descendant, although not technically needed, allows to reuse scope definitions in different rules, which happens frequently. Examples of frequently used scopes include: the set of all routers, the set of all border routers, the set of all loopback interfaces, etc.

Rule#3: Loopback addresses are advertised in OSPF This presence rule (see Listing 8.5) is also verified by a query rule. Notice that this rule is perfectly equivalent to the previous one except for the scope definition.

<rule id="LOOPBACK_ADVERTISED_OSPF" type="presence"> <presence> <scope>OSPF_NODE</scope> <descendant>interfaces/interface</descendant> <condition>substring(@id,1,2)=’lo’</condition> </presence> </rule>

Rule#4: OSPF areas must be directly connected to the backbone area This rule is an example of a custom rule as it belongs to none of the patterns identified above. However, it can be implemented with a tailored XQuery query. To report useful errors and warnings, NCGuard queries should always identify the elements that do not respect the rule. In List- ing 8.6, the query identifies the OSPF areas without an area border router connected to the backbone area.

<rule id="AREAS_CONNECTED_TO_BACKBONE_AREA" type="custom"> <custom>

<xquery><![CDATA[

for $area in /domain/ospf/areas/area[@id!="0.0.0.0"] let $backbone_nodes :=

/domain/ospf/areas/area[@id="0.0.0.0"]/nodes/node

where not(exists($backbone_nodes[@id=$area/nodes/node/@id])) return <result><area id="{$area/@id}"/></result>]]>

</xquery>...

Listing 8.6 OSPF areas must be directly connected to the backbone area

To actually validate the representation, the validation engine proceeds in two steps. First, it checks if the XML representation is syntactically correct. If it is the case, the validation engine recursively verifies the rules located in the tree, starting at the root. Whenever a rule is not verified, an error containing a counter-example is printed (see Listing 8.7 for an example). Indeed, as noted by [39], meaningful error messages are important to help network operators troubleshoot the model. Regarding its imple- mentation, the validation engine was written in about 1500 lines of Java, leveraging the saxon XML libraries.

Node NY does not have the ’next-hop-self’ option enabled. Therefore, routes announced on eBGP session ’NY-GEANT’ might be unreachable inside the AS.

Activate ’next-hop-self’ or put interface ’ge-4/0/0.102’ inside OSPF (as passive).

Listing 8.7 Excerpt of NCGuard output. To help debugging, meaningful errors are important for network operators. To that extent, NCGuard prints counter- examples whenever it is possible.

In document Methods and techniques for disruption-free network reconfiguration (Page 169-175)