Hui Zang1 and Antonio Nucci2
1 Sprint Advanced Technology Laboratories
1 Adrian Court, Burlingame, CA, USA [email protected]
2 Narus, Inc.
500 Logue Avenue, Mountain View, CA, USA [email protected]
Abstract. This paper investigates the problem of deploying NetFlow with optimized coverage and cost in an IP network. Deploying a network-wide monitoring infrastructure in operational networks is necessary for practical reasons, and Cisco NetFlow is a promising solution. However, several cost factors are associated with enabling NetFlow given the current conditions in such a network. We argue that enabling NetFlow to cover a major portion of the traffic, rather than all of it, achieves significant cost savings while still giving operators sufficient monitoring capability. We therefore aim to solve the Optimal NetFlow Location Problem (ONLP) for a given coverage ratio. We analyze the cost factors of enabling NetFlow in such a network and model the problem as an Integer Linear Program (ILP). Although we are able to obtain optimal solutions for Sprint’s North America network by solving the ILP, we develop two heuristic algorithms, Max-Plus (MP) and Least-Minus (LM), to cope with larger problems, given the NP-hard nature of ONLP. Numerical results demonstrate the performance of the ILP and the heuristics: the LM heuristic achieves solutions within 1-2% of the optimal cost in a mixed router environment. We observe that covering 95% instead of 100% of the network traffic yields 55% cost savings. The problem and the proposed methodology can be generalized to the optimal deployment of new services and features in any type of network.
Keywords: NetFlow, Integer Linear Programming, Optimal Placement
1 Introduction
Operating a large IP network without a detailed, network-wide knowledge of the traffic demands is challenging. An accurate view of the traffic demands is crucial for a number of important tasks, such as failure diagnosis, capacity planning and forecasting, routing and load-balancing policy optimization, attack identification, etc. It is obvious to operators now that network monitoring and traffic measurement is a necessity and Cisco’s
NetFlow [1] emerges as a viable solution to this problem. NetFlow has received attention from both industry and academic researchers. For example, NetFlow data has been used to examine the accuracy of traffic matrix estimation techniques [2]. Prior work on NetFlow has focused on performance issues. Reference [3] compares NetFlow to SNMP and packet-level data collection, while [4] proposes new sampling techniques that can be used by NetFlow. In this paper, we study the issues in the deployment of NetFlow. NetFlow is a set of features available on Cisco routers and other switching devices that provide network operators with access to IP flow information from their data networks [1]. The NetFlow infrastructure consists of two main components: NetFlow Data Export (NDE) and NetFlow Collector (NFC). The NDE is a module that is configured on routers and captures each IP flow traversing a router (see footnote 3). When a timer expires or the NetFlow cache becomes full, IP flow statistics, such as the number of IP flows, the number of packets and bytes associated with each flow, source/destination AS numbers, source/destination prefix masks, etc., are exported to an NFC as UDP packets.
IP networks usually contain a large diversity of routers, and not all interfaces on all routers can run NetFlow. Although NetFlow can be configured on a per-interface basis, NetFlow-supporting capability is determined by the linecard and the router. There are three types of linecards in terms of NetFlow support: 1) linecards that support NetFlow in most traffic conditions, 2) linecards that do not support NetFlow, and 3) linecards that support NetFlow in only certain (light) traffic conditions. Care must be taken with type 3) linecards, since turning on NetFlow could impact the linecard’s packet-forwarding performance, i.e., cause losses and large latency, or generate inaccurate flow statistics. Linecards of types 2) and 3) can usually be upgraded to a newer configuration to support NetFlow.
Enabling NetFlow at specific router interfaces is not enough. The IP flow statistics exported by the NDE modules at each router must be collected by NFCs. Operators process the data stored in NFCs to gather the information they need (see footnote 4). Two problems arise when NFCs are considered. First, only a limited number of routers can be served by the same NFC. Second, carriers prefer to collocate NFCs with the NDEs that they serve to avoid flooding large amounts of information over long-haul IP links.
Therefore, in order to enable NetFlow and utilize the data properly, operators need to identify: 1) a proper configuration for each router enabled to support NetFlow (NDE); and 2) a proper location for each NetFlow Collector (NFC).
The goal of this paper is to provide a methodology and a set of recommendations for optimizing the NetFlow deployment process. More precisely, we are interested in identifying which routers, and which linecards on those routers, should be NetFlow-enabled so as to cover a major portion of network traffic while minimizing the total capital investment required. We refer to the problem of covering a given fraction of traffic on the selected routers while minimizing the total cost as the Optimal NetFlow Location Problem (ONLP). The solution to this problem will assist an operator in two situations: i) for an operator who has decided to deploy NetFlow, identify the routers on which to enable NetFlow to achieve the lowest capital investment; ii) for an operator who has not decided to deploy NetFlow, obtain a partial NetFlow deployment that achieves the best coverage with a limited investment, so that the operator can examine the functions and benefits of NetFlow. We formulate ONLP as an Integer Linear Program (ILP) model. We also propose two efficient heuristic algorithms to solve it. We present results from both solving the ILP and applying the heuristics, and show that substantial cost benefits can be achieved by carefully choosing the locations of NetFlow deployment.
3 An IP flow is identified by the combination of seven fields: Source and Destination IP addresses, Source and Destination Port numbers, IP protocol type, ToS byte, and Input Logical Interface (ifIndex).
4 NetFlow Data Analyzer (NDA) is a NetFlow-specific traffic analysis tool that enables the users/operators to
We target NetFlow and IP networks in this paper to demonstrate how location optimization for a given network functionality should be pursued. This does not limit generality: the methodology proposed in this paper can also be applied to other services and features in other types of networks.
The rest of the paper is organized as follows. In Section 2, we formally state ONLP and propose an Integer Linear Programming model that can be solved for the optimal solutions of ONLP. In Section 3, we introduce two efficient heuristic algorithms to solve ONLP for larger networks, for which the optimal solutions are too expensive to compute. Section 4 presents numerical results that demonstrate the trade-off between traffic coverage and required investment, as well as the performance achieved by both the ILP and the heuristics. Section 5 provides recommendations on a NetFlow deployment strategy with the best coverage-cost trade-off and concludes the paper.
2 Optimal NetFlow Location Problem: Problem Statement and ILP Formulation
We consider a network with a set of routers and a set of interfaces on these routers. Our goal is to monitor a portion of the traffic switched by these interfaces by enabling NetFlow on the linecards on which these interfaces reside. The solution to the NetFlow location problem can then be applied to a set of interfaces in a given network that switch a particular traffic type independently, i.e., for any two routers R1 and R2 identified, there is no flow f ∈ T∗ that is switched by both R1 and R2 in the same ingress/egress direction.
We can formally state the Optimal NetFlow Location Problem (ONLP) as follows. Given:
– The routers in a network $R = \{R_1, R_2, \ldots, R_N\}$, and for each router $R_i \in R$, a set of linecards $I_{R_i} = \{I^{R_i}_1, I^{R_i}_2, \ldots, I^{R_i}_{S_i}\}$.
– A set of PoPs $P = \{1, 2, \ldots, L\}$, and for each PoP $i$, the set of associated routers $P_i \subseteq R$, with $P_i \cap P_j = \emptyset$ for all $i, j$ with $i \neq j$, and $\bigcup_{1 \le i \le L} P_i = R$.
– A traffic volume $t_{i,j}$ associated with each linecard $I^{R_i}_j$ on each router $R_i$.
– A cost function $F$ for any router $R_i$ to have NetFlow enabled at a subset of linecards $I'_{R_i} \subseteq I_{R_i}$, $F : R \times I \to Z^+ \cup \{0\}$, where $Z^+$ denotes the set of positive integers and $I = \bigcup_{1 \le i \le N} I_{R_i}$.
– A cost function $C$ for the collectors deployed at PoP $i$ when $n$ ($n \ge 0$) routers in $P_i$ are NetFlow-enabled, $C : Z^+ \cup \{0\} \to Z^+ \cup \{0\}$.
– A coverage ratio $D$: $0 < D \le 1$.
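We need to find a subset of routers $R' \subseteq R$ such that for each $R_i \in R'$, NetFlow is enabled on a subset of linecards $I'_{R_i} \subseteq I_{R_i}$ and at least a fraction $D$ of the total traffic is covered by NetFlow:
$$\sum_{i: R_i \in R'} \; \sum_{j: I^{R_i}_j \in I'_{R_i}} t_{i,j} \;\ge\; D \times \sum_{1 \le i \le N} \; \sum_{1 \le j \le S_i} t_{i,j},$$
while at the same time minimizing
$$\sum_{i: R_i \in R'} F(R_i, I'_{R_i}) + \sum_{1 \le j \le L} C(|R' \cap P_j|),$$
where $|\cdot|$ denotes the cardinality of a set.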
We formulate the Optimal NetFlow Location Problem (ONLP) as an Integer Linear Program (ILP). Different constraints may be applied to different routers. We consider Cisco GSR routers [5] and 7500 routers [6] in this exercise. In Appendix C, we discuss the details of their capability in supporting NetFlow and we also set up a testbed to study the impact of NetFlow on 7500 routers and determine the need and cost of upgrading a 7500 linecard. Although totally different methods are applied to obtain cost figures for both families of routers, from the modeling perspective, the main differences between both families of routers are the following. First, when upgrading a 7500 linecard, only the processor and memory are upgraded and the interfaces on the linecard remain unchanged, while the entire linecard is replaced when upgrading a GSR linecard which implies that the number of interfaces on the linecard may change with the upgrade. Second, a router consists of a Route Switch Processor (RSP) and a number of linecards. When upgrading a 7500 router’s linecards, sometimes we need to upgrade the RSP on this router as well. However, when upgrading a GSR router’s linecards, we do not need to upgrade the GSR’s RSP because most of the processing is done by the linecards. These differences will be reflected in the ILP formulation.
2.1 Notation
Let $G_{7500}$ and $G_{GSR}$ be the sets of all 7500 and GSR routers, respectively. Let $P = \{1, 2, \ldots, L\}$ be the set of all PoPs in the network and $P_i$ represent the set of routers belonging to PoP $i$. A router is present in one and only one PoP. For a router $g$, let $S(g)$ be the set of slots on router $g$, whose cardinality is denoted by $|S(g)|$. Let $t(g, s)$ be the traffic processed at slot $s$ on router $g$. We define the specific notation for 7500 routers, GSRs, collectors, and traffic coverage in turn.
7500 routers Let c(g) be the minimal cost to upgrade the current configuration of router g to one that supports NetFlow. c(g) = 0 if the current one supports NetFlow. Binary parameter r(g) = 1 if such an upgrade is available, and r(g) = 0 otherwise. Let c(g, s) be the minimal cost to upgrade the current configuration at slot s on router g to one that supports NetFlow. c(g, s) = 0 if the current configuration supports NetFlow. Binary parameter r(g, s) = 1 if such an upgrade is available, and r(g, s) = 0 otherwise.
GSR routers Let $T$ be the set of all linecard types present on the routers in $G_{GSR}$. A linecard type $t \in T$ may or may not be upgradable to another linecard type that supports NetFlow. Let $r(t)$ be a binary parameter that equals 1 if linecard type $t$ can be upgraded to a new version supporting NetFlow and 0 otherwise. Let $c(t)$ represent the cost of the upgrade if $r(t) = 1$. For each router $g \in G_{GSR}$ and each $t \in T(g)$, we define $V_g(t)$ as the set of slot indices where a linecard of type $t$ is present. Let $p_{g,s}(t)$ represent the number of used ports of the linecard of type $t \in T(g)$ in slot $s$ on router $g$ in the current configuration. Let $a_g(t)$ denote the number of available ports in the upgraded version of linecard type $t \in T$.
Collectors Let C represent the cost of a single collector. Let N be the maximum number of routers that can be supported by a single collector. According to [1], N = 5 and varies with traffic and the NetFlow sampling rate. In this study, we assume N to be constant, since so far there has been no public documentation on how N varies. The model can easily be extended to incorporate different constraints on N.
Traffic Coverage We define D (0 < D ≤ 1) as the minimum fraction of traffic that needs to be covered by NetFlow.
2.2 Decision Variables
The following decision variables are to be solved:
– Binary variable $\eta(g, s)$, for $g \in G_{GSR} \cup G_{7500}$ and $s \in S(g)$, equals 1 if slot $s$ on router $g$ is selected to run NetFlow, and 0 otherwise.
– Binary variable $\gamma(g)$, for all $g \in G_{7500}$, equals 1 if router $g$ is selected to run NetFlow, and 0 otherwise.
– Integer variable $\nu_g(t)$ is the number of linecards of type $t \in T(g)$ on router $g \in G_{GSR}$ that need to be upgraded to run NetFlow.
– Integer variable $NC_i$ is the number of collectors needed at PoP $i$ to cover all the routers that have NetFlow enabled.
2.3 Objective
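The objective of the ONLP problem is to minimize the total cost
$$F = F_{7500} + F_{GSR} + F_{Col},$$
where
$$F_{7500} = \sum_{g \in G_{7500}} \Big( c(g)\,\gamma(g) + \sum_{s \in S(g)} c(g, s)\,\eta(g, s) \Big), \qquad F_{GSR} = \sum_{g \in G_{GSR}} \; \sum_{t \in T(g)} \nu_g(t)\, c(t), \qquad F_{Col} = \sum_{i \in P} NC_i \times C.$$
2.4 Constraints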
• Relationship between variables $\gamma$ and $\eta$ for 7500 routers:
$$\gamma(g) \;\le\; \sum_{s \in S(g)} \eta(g, s) \;\le\; |S(g)| \times \gamma(g) \qquad \forall g \in G_{7500} \qquad (1)$$
Constraint (1) links the variable $\gamma$ associated with each router to the variables $\eta$ associated with its slots. The left inequality in (1) forces $\gamma(g)$ to be 0 if none of its slots has been selected to run NetFlow. The right inequality in (1) forces $\gamma(g)$ to be 1 if one or more of its slots have been selected to run NetFlow.
• Relationship between $r(g)$ and $\gamma(g)$, and between $r(g, s)$ and $\eta(g, s)$:
$$r(g) \ge \gamma(g) \qquad \forall g \in G_{7500} \qquad (2)$$
$$r(g, s) \ge \eta(g, s) \qquad \forall g \in G_{7500} \cup G_{GSR},\ \forall s \in S(g) \qquad (3)$$
Constraints (2) and (3) guarantee that a router/slot can be selected to have NetFlow enabled only if its current configuration supports NetFlow or it can be upgraded to another configuration that supports NetFlow.
• Number of interfaces on GSR routers:
$$a_g(t)\,\nu_g(t) \;\ge\; \sum_{s \in V_g(t)} \eta(g, s)\, p_{g,s}(t) \qquad \forall g \in G_{GSR},\ \forall t \in T(g) \qquad (4)$$
Constraint (4) guarantees that we invest in the minimum number of linecards necessary according to the selection we made. For example, if router $g$ has two linecards of type $t$ with one port being used on each, and the upgraded version of linecard type $t$ has four ports available, then Constraint (4) implies that only one upgraded linecard of type $t$ is necessary, i.e., $\nu_g(t) \ge 1$. When the total cost is minimized by the objective function, $\nu_g(t)$ will be forced to be 1.
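The port-counting argument behind Constraint (4) can be checked with a few lines of Python. This is only an illustrative sketch: the function name and argument layout are hypothetical, and it assumes that all selected slots of one linecard type on one router are passed in together.

```python
def min_upgraded_linecards(used_ports, ports_per_upgraded_card):
    """Smallest integer nu satisfying a * nu >= sum of used ports (Constraint (4)).

    used_ports: used-port counts p_{g,s}(t) of the selected slots (eta = 1)
                holding linecards of one type t on one router g.
    ports_per_upgraded_card: a_g(t), ports available on the upgraded linecard.
    """
    total_used = sum(used_ports)
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_used // ports_per_upgraded_card)


# Example from the text: two linecards with one used port each,
# and an upgraded linecard type that offers four ports.
print(min_upgraded_linecards([1, 1], 4))  # prints 1
```

Because the objective minimizes cost, the ILP settles on exactly this ceiling value whenever the upgrade cost c(t) is positive.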
• Fraction of the total traffic to be covered by enabling NetFlow on specific routers and slots:
$$\sum_{g \in G_{7500}} \sum_{s \in S(g)} t(g, s)\,\eta(g, s) + \sum_{g \in G_{GSR}} \sum_{s \in S(g)} t(g, s)\,\eta(g, s) \;\ge\; D \times \sum_{g \in G_{7500} \cup G_{GSR}} \; \sum_{s \in S(g)} t(g, s) \qquad (5)$$
Constraint (5) ensures that the final solution covers at least a fraction $D$ of the total traffic. Clearly, the larger $D$ is, the larger the number of slots enabled to support NetFlow and the associated deployment cost.
• The number of collectors needed per PoP:
$$N \times NC_i \;\ge\; \sum_{g \in P_i} \gamma(g) \;\ge\; NC_i \qquad \forall i \in P \qquad (6)$$
Constraint (6) ensures that for any PoP, if there are routers with NetFlow enabled, the number of collectors in this PoP will be sufficient to cover all these routers. At the same time, no collectors should be placed at any given PoP where no router is enabled with NetFlow.
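For very small instances, the formulation can be cross-checked by exhaustive enumeration. The Python sketch below is not the CPLEX model used in this paper. It assumes a simplified setting with per-slot upgrade costs only (no RSP cost and no GSR port constraint), and the function and parameter names are hypothetical. It enforces the coverage requirement of Constraint (5) and uses the smallest collector count allowed by Constraint (6).

```python
from itertools import combinations
from math import ceil


def solve_onlp_brute_force(slots, pop_of, coverage, collector_cost,
                           routers_per_collector):
    """Enumerate all slot subsets and return (best_cost, best_selection).

    slots:  dict mapping (router, slot) -> (traffic, upgrade_cost)
    pop_of: dict mapping router -> PoP identifier
    """
    keys = sorted(slots)
    total = sum(t for t, _ in slots.values())
    best = None
    for k in range(len(keys) + 1):
        for subset in combinations(keys, k):
            covered = sum(slots[x][0] for x in subset)
            if covered < coverage * total:          # Constraint (5)
                continue
            # A router counts as NetFlow-enabled (gamma = 1) exactly when
            # at least one of its slots is selected, as in Constraint (1).
            enabled = {}
            for router, _ in subset:
                enabled.setdefault(pop_of[router], set()).add(router)
            # Smallest NC_i with N * NC_i >= enabled routers, Constraint (6).
            collectors = sum(ceil(len(r) / routers_per_collector)
                             for r in enabled.values())
            cost = (sum(slots[x][1] for x in subset)
                    + collectors * collector_cost)
            if best is None or cost < best[0]:
                best = (cost, set(subset))
    return best
```

The running time grows exponentially with the number of slots, so this is useful only as a reference for validating an ILP model or a heuristic on toy inputs.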
3 Heuristic Algorithms
We can prove that the Optimal NetFlow Location Problem (ONLP) is NP-hard by reducing the NP-complete Knapsack problem [7] to ONLP (Appendix A), which means that there exist problem instances that are unlikely to be solved within a reasonable amount of time. For example, the size of the network studied, changes in the network traffic distribution, and changes in the pricing of the upgrade options are all factors that can make the ILP model hard to solve to optimality. Therefore, heuristic algorithms are needed. We develop two heuristic algorithms in this section. To simplify the discussion, we assume that there is no need to upgrade 7500 RSPs. This assumption is supported by our network data, which show that the current CPU utilization on 7500 RSPs is extremely low (Appendix C). Hence we consider only three types of cost in the heuristics, associated respectively with: i) collectors, ii) GSR linecard upgrades, and iii) 7500 linecard upgrades. The heuristics can easily be extended if the 7500 RSP cost were to be included.
The input and output of the two heuristics are the same as those of the ILP model. We remind the reader that t(g, s) and c(g, s) are the traffic and the upgrade cost associated with slot s on router g, respectively. In addition, the following notation is used in the heuristics:
– $T_{total}$, the total traffic under consideration. The target is to cover $D \times T_{total}$ by NetFlow.
– $T_{covered}$, the variable representing the traffic that is covered by NetFlow.
– $C_{total}$, the variable representing the total cost of deployment, which is the objective in the ILP.
To make the presentation concise, we assume all linecards are upgradeable to support NetFlow. The heuristics can be easily generalized to cover the other case.
We first develop a heuristic called “Max-Plus (MP)”; a formal specification is given in Algorithm 1. In MP, we start with a network with no NetFlow and keep adding NetFlow-enabled router slots until the required traffic coverage is met. Collectors are added as needed. A slot is selected based on the traffic flowing through it and the associated cost of enabling NetFlow, including any necessary collector deployment. At each step, the slot with the currently largest traffic/cost ratio is NetFlow-enabled.
The second heuristic, called “Least-Minus (LM)”, approaches the problem from the opposite direction; a formal specification can be found in Algorithm 2. In LM, we start with a network with full NetFlow coverage and keep removing NetFlow-enabled router slots and collectors until the traffic coverage is at or below the required threshold. A slot is selected for NetFlow removal based on its traffic and the “cost” of enabling NetFlow on it, including both the upgrade cost and a “fair” share of the collector cost at the PoP. At each step, the slot with the currently lowest traffic/cost ratio is removed. At the end, if the resulting traffic coverage is below the requirement, the last slot that was removed (and its associated collector, if applicable) is added back.
Algorithm 1 Heuristic I - Max-Plus (MP)
1.0 Initialize $T_{covered} = 0$ and $C_{total} = 0$. Set
$$T_{remaining} = T_{total} \times D - T_{covered} \qquad (7)$$
1.1 Examine all slots without NetFlow enabled. For each slot $s$ on router $g$ at PoP $p$, calculate $C_{collector}(g, s)$ as the additional collector cost at PoP $p$ if slot $s$ were to be selected to enable NetFlow:
$$C_{collector}(g, s) = \begin{cases} 0 & \text{if router } g \text{ has NetFlow on, or if collectors at PoP } p \text{ can support one more router} \\ C & \text{otherwise} \end{cases} \qquad (8)$$
$$CostPerBit(g, s) = \frac{c(g, s) + C_{collector}(g, s)}{\min(t(g, s),\, T_{remaining})} \qquad (9)$$
1.2 Enable NetFlow on the slot $s$ at router $g$ with the smallest $CostPerBit(g, s)$. Set $T_{covered} = T_{covered} + t(g, s)$ and $C_{total} = C_{total} + c(g, s) + C_{collector}(g, s)$. Update $T_{remaining}$ by Eqn. (7).
1.3 Repeat Steps 1.1 through 1.2 until $T_{remaining} \le 0$ and return.
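The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the results in this paper: the data layout and all names are hypothetical, every slot is assumed to be upgradable, and slots with zero traffic are skipped to avoid dividing by zero in Eqn. (9).

```python
def max_plus(slots, pop_of, coverage, collector_cost, routers_per_collector):
    """Greedy Max-Plus sketch.

    slots:  dict mapping (router, slot) -> (traffic, upgrade_cost)
    pop_of: dict mapping router -> PoP identifier
    Returns (total_cost, enabled_slots, covered_traffic).
    """
    total = sum(t for t, _ in slots.values())
    covered, cost = 0, 0
    enabled = set()      # slots with NetFlow enabled
    routers_on = {}      # PoP -> routers that already have NetFlow on
    collectors = {}      # PoP -> collectors deployed so far
    remaining = total * coverage - covered                      # Eqn. (7)
    while remaining > 0:
        best = None
        for key, (t, c) in slots.items():                       # Step 1.1
            if key in enabled or t <= 0:
                continue  # assumption: zero-traffic slots are never enabled
            router = key[0]
            pop = pop_of[router]
            on = routers_on.get(pop, set())
            capacity = collectors.get(pop, 0) * routers_per_collector
            needs_collector = router not in on and len(on) >= capacity
            extra = collector_cost if needs_collector else 0    # Eqn. (8)
            per_bit = (c + extra) / min(t, remaining)           # Eqn. (9)
            if best is None or per_bit < best[0]:
                best = (per_bit, key, extra, needs_collector)
        if best is None:
            break  # nothing left to enable
        _, key, extra, needs_collector = best                   # Step 1.2
        t, c = slots[key]
        pop = pop_of[key[0]]
        enabled.add(key)
        routers_on.setdefault(pop, set()).add(key[0])
        if needs_collector:
            collectors[pop] = collectors.get(pop, 0) + 1
        covered += t
        cost += c + extra
        remaining = total * coverage - covered                  # Eqn. (7)
    return cost, enabled, covered
```

Dividing by min(t(g, s), T_remaining) means that traffic beyond the remaining target earns no credit, so a very large slot is not favored once only a small amount of coverage is still missing.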
4 Numerical Results
In this section, we present numerical results obtained by applying the ILP model and the heuristics to Sprint’s North America IP backbone network (SNAIB-NET) with real traffic. We consider traffic carried on all links between gateway (GW) routers and backbone (BB) routers. We choose to enable NetFlow on gateway routers because it is more cost-effective to upgrade gateway routers than backbone routers, as we found out by going through the router configurations.
4.1 Platform and Speed
We solve the ILP models using CPLEX [8] running on a 2.4 GHz Xeon processor with 1 GB of RAM. The time it takes to solve the ILP models for SNAIB-NET gateway routers ranges from a few seconds to 30 minutes. Note that we solved for several hundred routers, which form a subset of SNAIB-NET. Therefore, for networks of up to a few hundred routers, it is feasible to use the ILP model to find an optimal solution for ONLP. The heuristics run much faster: each heuristic takes from under a second to a few seconds to solve the problem for all coverage ratios.
4.2 The ILP Model and the Heuristics
In this subsection, we present the solutions from the ILP and the two heuristics and compare the performance of the heuristics with the optimal solution obtained from the ILP model. Figure 1(a) shows the normalized cost obtained from the ILP model and the two heuristics for coverage ratios from 50% to 100%. The costs are normalized by the cost required to provide 100% coverage, which is the same for all three methods. We notice that the cost to achieve 95% coverage is only about 45% of the cost required for 100% coverage. In Fig. 1(b), we plot the relative difference between the results obtained by each heuristic and those obtained by solving the ILP, i.e., the cost difference normalized by the optimal value. The two heuristics perform differently in terms of optimality: LM performs significantly better than MP. At 50% coverage, the solution from MP is 7% higher than the optimal solution, while the difference between the results from LM and the optimal results is less than 2% for 50% coverage and consistently less than 1% for coverage ratios greater than 50%. LM performs better because it uses an amortized collector cost in determining which slot should have NetFlow disabled. MP, in contrast, considers the full collector cost whenever a collector is to be added, because an amortized cost cannot be computed without knowing how many slots will be enabled later on.

Algorithm 2 Heuristic II - Least-Minus (LM)
2.0 For each slot $s$ on router $g$ at PoP $p$, enable NetFlow. Set
$$C_{total} = \sum_{g} \sum_{s \in S(g)} c(g, s) + \sum_{p} NC(p) \times C \qquad (10)$$
$$T_{covered} = \sum_{g} \sum_{s \in S(g)} t(g, s) \qquad (11)$$
$$T_{extra} = T_{covered} - T_{total} \times D \qquad (12)$$
2.1 Go to Step 2.3 if $T_{extra} \le 0$. Otherwise, examine all slots with NetFlow enabled. For each slot $s$ on router $g$ at PoP $p$, calculate $C_{collector}(g, s)$ as its share of the collector cost at PoP $p$. Let $N_r(p)$ denote the number of routers with NetFlow enabled at PoP $p$ and $N_s(g)$ denote the number of slots with NetFlow enabled at router $g$.
$$C_{collector}(g, s) = \frac{NC(p) \times C}{N_r(p) \times N_s(g)} \qquad (13)$$
$$CostPerBit(g, s) = \frac{c(g, s) + C_{collector}(g, s)}{t(g, s)} \qquad (14)$$
2.2 Find a slot with the largest $CostPerBit(g, s)$ and remove NetFlow at this slot. Assume this slot is slot $s$ on router $g$ in PoP $p$. Calculate the reduction of the number of collectors at PoP $p$ as
$$\Delta_{collector}(p) = \begin{cases} 1 & \text{if router } g \text{ has no other NetFlow-enabled slots, and the remaining routers at PoP } p \text{ can be served with one less collector} \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$
Update $NC(p) = NC(p) - \Delta_{collector}(p)$, $C_{total} = C_{total} - c(g, s) - C \times \Delta_{collector}(p)$, and $T_{covered} = T_{covered} - t(g, s)$. Recalculate $T_{extra}$ by Eqn. (12). Go back to Step 2.1.
2.3 If $T_{extra} = 0$, return. Otherwise, return after re-enabling NetFlow on the slot picked by the last execution of Step 2.2 and updating the number of collectors.
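The Least-Minus heuristic (Algorithm 2) can be sketched in Python in the same spirit. Again, this is an illustration under stated assumptions and not the code behind the reported results: the data layout and names are hypothetical, all slots are assumed upgradable, the initial collector count per PoP is taken as the smallest number that serves all routers, and a slot with zero traffic is treated as having infinite cost per bit so that it is removed first.

```python
from math import ceil, inf


def least_minus(slots, pop_of, coverage, collector_cost, routers_per_collector):
    """Least-Minus sketch.

    slots:  dict mapping (router, slot) -> (traffic, upgrade_cost)
    pop_of: dict mapping router -> PoP identifier
    Returns (total_cost, enabled_slots, covered_traffic).
    """
    total = sum(t for t, _ in slots.values())
    target = total * coverage
    enabled = set(slots)                                        # Step 2.0

    def routers_at(pop):
        return {r for r, _ in enabled if pop_of[r] == pop}

    def slots_on(router):
        return sum(1 for r, _ in enabled if r == router)

    collectors = {p: ceil(len(routers_at(p)) / routers_per_collector)
                  for p in {pop_of[r] for r, _ in slots}}
    cost = (sum(c for _, c in slots.values())
            + collector_cost * sum(collectors.values()))        # Eqn. (10)
    covered = total                                             # Eqn. (11)
    last = None
    while covered - target > 0 and enabled:                     # Step 2.1
        worst = None
        for key in sorted(enabled):
            t, c = slots[key]
            pop = pop_of[key[0]]
            share = (collectors[pop] * collector_cost
                     / (len(routers_at(pop)) * slots_on(key[0])))  # Eqn. (13)
            per_bit = (c + share) / t if t > 0 else inf         # Eqn. (14)
            if worst is None or per_bit > worst[0]:
                worst = (per_bit, key)
        key = worst[1]                                          # Step 2.2
        t, c = slots[key]
        pop = pop_of[key[0]]
        enabled.remove(key)
        delta = 0
        if slots_on(key[0]) == 0:
            needed = ceil(len(routers_at(pop)) / routers_per_collector)
            if needed < collectors[pop]:
                delta = 1                                       # Eqn. (15)
        collectors[pop] -= delta
        cost -= c + collector_cost * delta
        covered -= t
        last = (key, delta)
    if covered - target < 0 and last is not None:               # Step 2.3
        key, delta = last
        t, c = slots[key]
        enabled.add(key)
        collectors[pop_of[key[0]]] += delta
        cost += c + collector_cost * delta
        covered += t
    return cost, enabled, covered
```

The collector share in Eqn. (13) is recomputed on every pass, so a slot on the last enabled router of a PoP carries the whole collector cost, which is what lets LM retire collectors that serve little traffic.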
Note that in practice, it is not trivial to determine the feasibility of running NetFlow on a 7500 linecard at a particular network location and to obtain the upgrade cost of a 7500 linecard. We refer the readers to Appendix B for details.
[Fig. 1. (a) Normalized cost obtained from ILP and heuristics: Normalized Cost (%) versus Traffic Cover Ratio D (%), for ILP, Heuristic MP, and Heuristic LM. (b) Difference between heuristic results and ILP results: Relative Difference (%) versus Traffic Cover Ratio D (%), for Heuristic MP and Heuristic LM.]
5 Conclusions
In this paper, we studied the optimization problem for NetFlow deployment in an IP network. Specifically, we considered a partial NetFlow deployment to achieve the lowest cost for a given coverage ratio, which is the Optimal NetFlow Location Problem (ONLP). We developed an ILP model and two heuristic algorithms to select routers and slots to support NetFlow and the associated configurations such that a certain amount of network traffic is covered at a minimum cost.
We solved ONLP for Sprint’s IP backbone network in North America. We presented numerical results from applying the ILP model and the two heuristics. We demonstrated that it is possible to achieve significant cost savings by adopting a partial NetFlow deployment strategy, i.e., by covering a major portion of the network traffic instead of the entire traffic. We suggest 95% as a good coverage ratio, which yields a 55% cost reduction.
Although our discussion was focused on Cisco NetFlow and the results were collected from Sprint’s operational IP backbone network only, the results can be referenced in similar practices and the methodology proposed can be extended and applied to a wide variety of network location problems to enable different features and services. Besides NetFlow from Cisco, other vendors also support similar flow-based monitoring services, such as sFlow [9], and our methodology can be applied to the deployment of sFlow as well. In addition, as ongoing work, we are extending our approach to network monitoring functions of finer granularity such as packet trace collection.
Acknowledgment
We thank Travis Dawson and Beng-Ong Lee at Sprint ATL for their support in the 7500 router testing and answers to our various NetFlow-related questions.
References
1. “NetFlow Services Solutions Guide,” Cisco white paper.
2. A. Soule, A. Nucci, R. Cruz, E. Leonardi and N. Taft, “How to Identify and Estimate the Largest Traffic Elements in a Dynamic Environment,” Proceedings of ACM Sigmetrics, New York, USA, July 2004.
3. R. Sommer and A. Feldmann, “NetFlow: Information Loss or Win?” Proceedings of Internet Measurement Workshop, Marseille, France, Nov. 2002.
4. C. Estan, K. Keys, D. Moore, and G. Varghese, “Building a Better NetFlow,” Proceedings of ACM Sigcomm, Portland, OR, USA, August 2004.
5. “Cisco 12000 Series Router,” Cisco white paper.
6. “Cisco 7500 Series Router,” Cisco white paper.
7. M. R. Garey and D. S. Johnson, “Computers and Intractability, A Guide to the Theory of NP-Completeness,” Bell Telephone Laboratories, Inc., 1979.
8. http://www.ilog.com/products/cplex.
9. http://www.sflow.org.
Appendix A: ONLP is NP-Hard
In this section, we prove that the Optimal NetFlow Location Problem (ONLP) is NP-hard. First, we show that the following decision (“yes/no”) version of the NetFlow Location Problem reduces to ONLP:
Given a traffic amount T and a cost C, is it possible to upgrade the network with cost no more than C and cover at least traffic amount T?
We name the decision version of the NetFlow Location Problem DNLP.
If ONLP is solved with the optimal cost C∗ to cover traffic amount T, then for any cost C ≥ C∗ the answer to DNLP is “yes”, and for any C < C∗ the answer is “no”. Therefore, DNLP is solvable if ONLP is solvable.
We then prove that DNLP is NP-hard by transforming a known NP-hard problem, the Knapsack problem, to DNLP.
A formal statement of the Knapsack problem is as follows [7]. Instance: a finite set $U$, a “size” $s(u) \in Z^+$ and a “value” $v(u) \in Z^+$ for each $u \in U$, a size constraint $B \in Z^+$, and a value goal $K \in Z^+$. Question: Is there a subset $U' \subseteq U$ such that $\sum_{u \in U'} s(u) \le B$ and $\sum_{u \in U'} v(u) \ge K$?
We restrict DNLP to the case in which the cost of a collector is zero, the cost of a 7500 RSP is zero, and there is no GSR in the network. We can then focus on 7500 router slots, since they are the sole source of upgrade cost. Each slot has an associated traffic $t$ and upgrade cost $c$. There is a one-to-one mapping from Knapsack to DNLP: for each $u \in U$ with size $s(u)$ and value $v(u)$, construct a router slot with traffic $t = v(u)$ and cost $c = s(u)$, and set the DNLP cost bound to $B$ and the traffic target to $K$. With this mapping, Knapsack is solvable if and only if this restricted version of DNLP is solvable.
Since Knapsack is known to be NP-complete [7], DNLP is NP-hard. Therefore, the optimization version, ONLP, is NP-hard.
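The mapping used in the proof can be made concrete with a small Python sketch. The function names and the dictionary layout are hypothetical, and the exhaustive DNLP check is included only to show that the two instances have the same answer on a toy example; it plays no part in the proof.

```python
from itertools import combinations


def knapsack_to_dnlp(sizes, values, size_limit, value_goal):
    """Build the restricted DNLP instance: one 7500 slot per Knapsack item.

    Item size becomes upgrade cost and item value becomes traffic; the
    Knapsack bound B becomes the cost bound and the goal K the traffic target.
    """
    slots = [{"traffic": v, "cost": s} for s, v in zip(sizes, values)]
    return slots, value_goal, size_limit


def dnlp_brute_force(slots, traffic_goal, cost_limit):
    """Return True if some set of slots costs at most cost_limit and
    covers at least traffic_goal (collector and RSP costs are zero)."""
    for k in range(len(slots) + 1):
        for subset in combinations(slots, k):
            cost = sum(x["cost"] for x in subset)
            traffic = sum(x["traffic"] for x in subset)
            if cost <= cost_limit and traffic >= traffic_goal:
                return True
    return False


# Toy Knapsack instance: sizes (3, 4, 5), values (4, 5, 6), B = 7, K = 9.
slots, traffic_goal, cost_limit = knapsack_to_dnlp([3, 4, 5], [4, 5, 6], 7, 9)
print(dnlp_brute_force(slots, traffic_goal, cost_limit))  # prints True
```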
Appendix B: Generating Inputs
In this section, we generate the input to the ONLP ILP and heuristics for SNAIB-NET, which includes the traffic at all GSR/7500 slots and the upgradability and upgrade cost to a NetFlow-supporting configuration for all GSR/7500 slots and 7500 routers. For both GSRs and 7500 routers, the traffic processed at a slot can be obtained by simply processing the SNMP data, which record the load of each interface at 5-minute intervals. Since GSR linecard upgrades follow a one-to-one mapping based on the engine type, which we already know, it is straightforward to generate the input data for the GSRs. For the 7500 routers, in order to determine the upgradability and upgrade cost for a router/slot configuration to support NetFlow, we set up a testbed (Appendix C) and investigate the impact of enabling NetFlow in terms of CPU and memory utilization on the most common router configurations with different RSP and VIP models under different traffic scenarios. Our findings can be summarized in a few rules.
Let $c^{(r)}(g)$ represent the RSP CPU utilization at 7500 router $g$ before NetFlow is enabled. For an RSP configuration to support NetFlow, $c^{(r)}(g)$ must not exceed the threshold $R_{CPU}$: $c^{(r)}(g) \le R_{CPU}$, where $R_{CPU}$ can be as high as 99%. There is no memory constraint
Let $c^{(v)}(g, s)$ represent the VIP CPU utilization at slot $s$ on 7500 router $g$ before NetFlow is enabled. For a VIP configuration to support NetFlow, $c^{(v)}(g, s)$ must not exceed the threshold $TH_{CPU}$: $c^{(v)}(g, s) \le TH_{CPU}$, where $TH_{CPU}$ is at most 90% since enabling NetFlow can increase the CPU utilization by 10%. In practice, we may want $R_{CPU}$ and $TH_{CPU}$ to be even lower to accommodate temporal burstiness in CPU utilization.
Let $m^{(v)}(g, s)$ represent the total VIP memory capacity at slot $s$ on 7500 router $g$, and let $m^{(v)}_{req}(g, s)$ be the memory required when NetFlow is enabled. $m^{(v)}(g, s)$ must be greater than or equal to $m^{(v)}_{req}(g, s)$ for slot $s$ to support NetFlow: $m^{(v)}(g, s) \ge m^{(v)}_{req}(g, s)$. Let $\Delta_{MEM}$ be the difference between $m^{(v)}_{req}(g, s)$ and the VIP memory usage before NetFlow is enabled at slot $s$ on router $g$. $\Delta_{MEM}$ follows a step function of a given $m^{(v)}(g, s)$ as shown in Appendix 5. A higher VIP version always provides the same or higher memory capacity options.
Let $\Delta_{CPU}$ be the difference in CPU utilization between two adjacent VIP families under exactly the same configuration and traffic condition; $\Delta_{CPU} = 20\%$ according to Appendix 5. For example, by upgrading a VIP2-50 to a VIP4-50, or a VIP4-80 to a VIP6-80, the CPU utilization will be reduced by 20%.
Given the constraints above, for every router/slot configuration, if we know the current CPU utilization and memory usage, we can, for every upgrade option, calculate the CPU utilization and memory usage under the current traffic condition. Therefore, to obtain the upgradability and upgrade cost to support NetFlow, we enumerate all possible configurations for a given router/slot and choose the one that satisfies the above constraints with the minimum cost. The router/slot cannot be upgraded if no possible configurations support NetFlow.
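The enumeration described above can be sketched as follows for a single 7500 slot. Everything concrete here is an assumption made for illustration: the option names, memory sizes, and costs are invented, the required memory is passed in directly instead of being derived from the step function, and the 20% CPU reduction per VIP family is interpreted as 20 percentage points.

```python
def cheapest_upgrade(cpu_now, mem_required_mb, current_family, options,
                     cpu_threshold=90.0, delta_cpu=20.0):
    """Pick the cheapest VIP option that satisfies the CPU and memory rules.

    cpu_now:         current VIP CPU utilization (%) before NetFlow is enabled
    mem_required_mb: memory needed once NetFlow is enabled
    current_family:  index of the current VIP family (higher means newer)
    options:         list of (name, family_index, memory_mb, cost); the
                     current configuration is listed with cost 0
    Returns (name, cost), or None if the slot cannot be upgraded.
    """
    best = None
    for name, family, memory_mb, cost in options:
        # Assumption: each step up in VIP family lowers CPU utilization
        # by delta_cpu percentage points under the same traffic.
        projected_cpu = max(0.0, cpu_now - delta_cpu * (family - current_family))
        if projected_cpu <= cpu_threshold and memory_mb >= mem_required_mb:
            if best is None or cost < best[1]:
                best = (name, cost)
    return best


# Hypothetical options for one slot (names and prices are made up).
options = [
    ("current", 0, 64, 0),
    ("more-memory", 0, 128, 100),
    ("next-family", 1, 128, 300),
]
print(cheapest_upgrade(95.0, 100, 0, options))  # prints ('next-family', 300)
```

A returned cost of 0 corresponds to c(g, s) = 0 (the current configuration already supports NetFlow), and a result of None corresponds to r(g, s) = 0.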
By processing router data, we found that all current router configurations support NetFlow since c(r)(g) is always low. We are therefore interested in the CPU and memory utilization on the slots. To get an accurate measurement, we collect CPU and memory utilizations during peak hours for five consecutive weekdays. We use the collected data to calculate their minimum, maximum, average, and 95th-percentile statistics and plot their cumulative distribution functions (CDFs) in Fig. 2. We can see from Fig. 2(a) that the four CPU curves are far apart, so we expect that using each statistic as input to the problem would produce a different solution.5 No significant difference is observed among the four memory statistics (Fig. 2(b)). Later on we use the average memory statistics as the input to the problem.
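The four statistics and the CDF described above can be computed along the following lines. This is only a sketch: the function names are hypothetical, and the nearest-rank definition of the 95th percentile is an assumption, since the text does not state which percentile definition was used.

```python
from typing import Dict, List, Sequence, Tuple


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, average and 95th percentile of one slot's samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / n,
        "p95": ordered[rank - 1],
    }


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (x, F(x)) of the empirical CDF over per-slot values."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


stats = summarize(list(range(1, 21)))      # 20 synthetic samples: 1..20
cdf = empirical_cdf([30, 10, 20, 40])      # 4 synthetic per-slot values
```

One CDF curve per statistic (for example, the list of per-slot `p95` values) would give the four curves of each panel.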
Since 7500 and GSR routers are characterized by different constraints and requirements, we decide to apply our methodology, both ILP and heuristics, in three different scenarios: i) the 7500 case, ii) the GSR case, and iii) the combined case. In the 7500/GSR case, only 7500/GSR routers and their associated traffic are considered, while in the combined case, both types of routers and all traffic are considered.
Appendix C: NetFlow Support by Cisco GSR and 7500 routers
For both 12000 series (GSR) and 7500 series routers, NetFlow can be enabled at the interface level, but the NetFlow-supporting capability is determined by the linecard and the router.
5 The minimum CPU utilization should not be used in practice because it does not reflect the actual requirement.
Fig. 2. CDF plots for CPU and memory utilizations on 7500 router slots: (a) CPU utilization (%), (b) memory usage (MB). Each panel shows the minimum, average, 95-percentile, and maximum curves.
We summarize the different factors in supporting NetFlow by the two router families and argue that GSR series contributes to a major fraction of the upgrade cost.
7500 series routers potentially support NetFlow. However, proper functioning of NetFlow is determined by the following factors: i) traffic load in terms of bits per second (bps) and packets per second (pps), ii) number of active flows, iii) RSP (Route Switch Processor, the central processor of the router) type and memory capacity, and iv) VIP (Versatile Interface Processor, the processor of a 7500 linecard) type and memory capacity. Therefore, the decision of whether or not a 7500 router or its linecards need an upgrade depends on both the router/linecard configuration and the traffic condition. The traffic load information can be obtained through SNMP. However, it is difficult to obtain the number of active flows without turning on NetFlow. Therefore, we use packet traces that have been collected from several links in the network to identify the “typical” number of active flows going through a certain interface type on a 7500 series router. By testing the combination of traffic load, number of active flows, RSP type/memory, and VIP type/memory, we can determine whether or not a certain router/linecard configuration supports NetFlow at a given network location.
For GSRs, the capability of supporting NetFlow is determined by the engine type. Some engines fully support NetFlow (Engine 3 and 4+), some do not support it (Engine 4), and some support it with limitations (Engine 0, 1, and 2).6 Our strategy is to upgrade all non-supporting and supporting-with-limitations linecards to fully supporting ones. One important constraint is that the upgrade must keep the same interface speed. As a consequence, some linecards equipped with certain low-speed interfaces do not have any corresponding upgrade option.
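The GSR rule above amounts to a lookup on the engine type plus a check that an upgrade exists at the same interface speed. A minimal sketch follows; the engine-to-support mapping comes from the text, while the function names, the interface speed labels, and the set of speeds that have an upgrade option are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Support level by GSR engine type, as listed in the text.
ENGINE_SUPPORT: Dict[str, str] = {
    "0": "limited", "1": "limited", "2": "limited",
    "3": "full", "4": "none", "4+": "full",
}


def needs_upgrade(engine: str) -> bool:
    """A linecard needs an upgrade unless its engine fully supports NetFlow."""
    return ENGINE_SUPPORT[engine] != "full"


def plan_upgrades(linecards: List[Tuple[str, str]],
                  upgradable_speeds: Set[str]
                  ) -> Dict[str, List[Tuple[str, str]]]:
    """Split (engine, interface speed) pairs into ready / upgrade / stuck."""
    plan: Dict[str, List[Tuple[str, str]]] = {
        "ready": [], "upgrade": [], "stuck": [],
    }
    for engine, speed in linecards:
        if not needs_upgrade(engine):
            plan["ready"].append((engine, speed))
        elif speed in upgradable_speeds:
            plan["upgrade"].append((engine, speed))   # same speed is kept
        else:
            plan["stuck"].append((engine, speed))     # no upgrade option
    return plan


# Hypothetical inventory and hypothetical set of upgradable speeds.
cards = [("3", "OC48"), ("4", "OC192"), ("0", "OC3"), ("2", "DS3")]
plan = plan_upgrades(cards, upgradable_speeds={"OC3", "OC48", "OC192"})
```

Linecards that end up in the "stuck" group correspond to the low-speed interfaces for which no upgrade option exists.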
Therefore, whether a GSR linecard needs an upgrade is based solely on its Engine type, while the capability of 7500 routers to support NetFlow depends on both the configuration and the traffic conditions at the routers. We need to identify the traffic conditions under which a certain router configuration can support NetFlow without experiencing any degradation in the packet forwarding process. As a result, we determine whether a router configuration needs an upgrade or not. We do that by setting up a TestBed and running several experiments. We use the results to formalize a set of requirements that each router configuration must satisfy to support NetFlow properly. These requirements are used to generate input to the Integer Linear Program (ILP) or heuristics.
6 These linecards either cannot support NetFlow with other desired features or have a performance limitation.
5.1 TestBed Configuration
We set up a TestBed using the configuration shown in Fig. 3. Traffic is generated by an Agilent Router Tester and routed by a 7513 router, which is a member of the 7500 family. The Router Tester is connected to the 7513 router using multiple POS OC3 and Fast Ethernet links. To provide useful results, we configured the 7513 router using one of the most common 7500 router configurations in the operational network. Note that although we only test a single 7513 router, with the help of the Router Tester we are actually emulating a real network environment with both customer routers (Cus) and backbone routers (BB).
A 7500 router has a central processor and several linecards. The central processor is referred to as a “Route Switch Processor” (RSP). There are various RSP models with different processing power. The most common RSP on Sprint gateway routers is the RSP4, so we examine the impact of NetFlow on the RSP4 in our testing. The linecards on 7500 routers are called Versatile Interface Processors (VIPs) [6]. Each VIP has its own processor. There are different VIP families with different levels of processing power. About 90% of the VIPs on 7500 routers in the Sprint network are VIP2-50s, which can have up to 128 MB of memory (DRAM) [6]. A good upgrade candidate for them is the VIP4-80, which has higher CPU power and up to 256 MB of memory. Therefore we focus on these two types of VIPs in our testing: VIP2-50 and VIP4-80. A VIP can be used on different types of routers belonging to the 7500 family. Therefore, although we only use a specific 7513 router in our TestBed, we expect similar results on other 7500 family routers since they all use similar types of VIPs with similar configurations.
Fig. 3. Testbed setup. The Agilent Router Tester emulates customer (Cus) and backbone (BB) routers and connects to the 7513 router over FE and OC3 links.
In order to create a set of representative testing scenarios, we use traces collected from three different OC3 links on Sprint’s gateway routers. The traces capture a diversity of behaviour over time on the three links, since the collection process spans a 6-month period. We plot the typical number of active flows and the packet size distributions in Figure 4. The number of active flows is the major factor in determining CPU utilization and memory usage on VIPs, while the packet size distribution is an important factor in packet forwarding performance.
As shown in Figs. 4 (a) through (c), the typical number of active flows on an OC3 gateway link ranges from a few thousand to 35k. We use 60k in our testing as a worst-case working scenario and expect the actual number of active flows to be lower. We can observe from Figs. 4 (d) through (f) that the packet size distributions have three modes: 40-byte, 576-byte, and 1500-byte. However, the distribution around the three modes differs from link to link. We will cover the three packet sizes in our testing.
Fig. 4. Active flows and packet size distribution on three links. Panels (a) through (c) plot active flows (‘000) against time of day (HH:MM UTC) on Links 1, 2, and 3; panels (d) through (f) plot packets (%) against packet size (bytes) on Links 1, 2, and 3. Copyright (c) 2002 − 2004 Sprint ATL.
Based on our discussion above, we will use the following as configurations in our tests:
– RSP4
– VIP2-50 and VIP4-80
– Up to 60k flows
– Packet sizes at 40-byte, 576-byte, and 1500-byte and a proper mix of them as on the three OC3 links.
5.2 NetFlow Impact on Forwarding Performance
In Fig. 5 we show the impact of enabling NetFlow on the maximum forwarding rate for a wide range of packet sizes, from 40 bytes to 1500 bytes. We can see that for packet sizes below 256 bytes, NetFlow affects the maximum forwarding rate on both VIP2-50 and VIP4-80, while no significant impact is observed for larger packet sizes. This is because, with small packets, the maximum forwarding rate is limited by the packets-per-second (pps) rate, which is reduced when NetFlow is enabled. For larger packets, the maximum forwarding rate is limited by the interface speed, which is NOT affected by NetFlow. Since CPU utilization is the actual limiting factor, we focus on the CPU utilization of a VIP when assessing the feasibility of enabling NetFlow, which is studied in Section 5.4.
Fig. 5. NetFlow impact on maximum forwarding rate. Normalized maximum forwarding rate versus packet size (bytes) for VIP2-50 and VIP4-80 with NetFlow off and on.
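The two regimes described above follow from a simple model in which the forwarding rate is the smaller of a pps-bound rate and the interface speed. The sketch below illustrates that model only; the two pps limits are hypothetical values chosen for illustration, not measurements from the TestBed, and the nominal OC3 rate of 155.52 Mbps is used without accounting for framing overhead.

```python
OC3_BPS = 155.52e6     # nominal OC3 line rate; framing overhead is ignored
PPS_OFF = 100_000      # hypothetical VIP packet rate limit, NetFlow off
PPS_ON = 80_000        # hypothetical VIP packet rate limit, NetFlow on


def normalized_max_rate(packet_bytes: int, pps_limit: float,
                        line_bps: float = OC3_BPS) -> float:
    """Maximum forwarding rate as a fraction of the interface speed."""
    pps_bound_bps = pps_limit * packet_bytes * 8
    return min(pps_bound_bps, line_bps) / line_bps


def crossover_bytes(pps_limit: float, line_bps: float = OC3_BPS) -> float:
    """Packet size above which the interface, not the pps limit, binds."""
    return line_bps / (8 * pps_limit)


for size in (40, 256, 576, 1500):
    off = normalized_max_rate(size, PPS_OFF)
    on = normalized_max_rate(size, PPS_ON)
    print(f"{size:5d} bytes: off={off:.2f} on={on:.2f}")
```

Under these assumed limits, 40-byte packets are pps-bound and lose forwarding rate when NetFlow lowers the pps limit, whereas 576-byte and 1500-byte packets saturate the interface in both cases.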
5.3 NetFlow Impact on RSP
Figures 6(a) and 6(b) show the behavior of an RSP4 before and after enabling NetFlow. 60k flows are tested at an aggregate rate of 200 Mbps. NetFlow was enabled at approximately 19:30. The figures demonstrate that there is NO negative impact on the RSP when enabling NetFlow. More specifically:
– NetFlow’s impact on RSP CPU is negligible. We can safely conclude that unless an RSP CPU is utilized at peak, i.e., 99%, turning on NetFlow will not have any negative impact on the RSP CPU.
– NetFlow’s impact on RSP memory usage is negligible. The main reason is that, by implementation, NetFlow can move the execution of some tasks, such as Access Control Lists (ACLs), from the RSP to the corresponding VIP. Therefore, there is no RSP memory constraint for enabling NetFlow, but we expect a significant impact on the VIP memory usage.
5.4 NetFlow Impact on VIPs
In this section we study the impact of enabling NetFlow on VIP cards in terms of CPU utilization and memory usage. As a first test, we vary the number of flows from 2.5k to 60k and fix the packet size at a relatively small 256 bytes. As we can see from Fig. 7, NetFlow increases the CPU utilization on a VIP2-50 by 5∼10%. We also observe in the tests that when the CPU utilization on a VIP is at 99%, NetFlow does not record flow information correctly and therefore does not export correct flow data. There is no forwarding performance degradation, however.
As a second experiment, we fix the number of flows at 60k and vary the packet size distribution according to the three-mode empirical distributions observed in the traces collected from OC3 links. We summarize the results in Table 1. Our objective is to see how VIP4-80 outperforms VIP2-50 in terms of CPU utilization.
Fig. 6. NetFlow impact on an RSP4 and a VIP2-50: (a) CPU utilization on an RSP4, (b) memory usage on an RSP4, (c) memory usage on a VIP2-50.
We observe first that, for both VIP2-50 and VIP4-80, enabling NetFlow increases the CPU utilization by about 5%, and the increase never exceeds 10%. Second, VIP4-80 supports NetFlow with about 20% less CPU than VIP2-50 under the same traffic conditions. Note that some traffic load configurations were not tested on all distributions because they exceed the maximum forwarding rate for that particular average packet size. We would have liked to extend this study by testing the performance of VIP6-80, but a lack of resources prevented us from doing so. In the following we therefore assume that the CPU utilization reduction from upgrading VIP4-80 to VIP6-80 is again 20%. This number can be adjusted in the ILP model if a discrepancy is revealed in future tests.
Fig. 7. [Plot of CPU utilization (%) versus number of flows, 0 to 6 x 10^4. Curves: 25% and 50% OC3 load with NetFlow off, NetFlow on, and the on/off difference.]
Table 1. CPU test results on traffic with 60K flows and different packet size distributions.
Link ID   OC3 Load   VIP2-50, NetFlow     VIP4-80, NetFlow
                     OFF       ON         OFF       ON
1         25%        85%       90%        63%       69%
1         50%        94%       98%        83%       87%
2         25%        97%       98%
3         25%        48%       54%        33%       37%
3         50%        69%       73%        49%       54%
3         75%        90%       94%        65%       69%
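The two observations drawn from Table 1 can be checked with a few lines of arithmetic. The sketch below uses only the five rows of Table 1 that report all four values (the Link 2 row lists only two values and is left out); the variable names are illustrative.

```python
# (link, OC3 load %, VIP2-50 off, VIP2-50 on, VIP4-80 off, VIP4-80 on)
# CPU utilization in percent, copied from the complete rows of Table 1.
ROWS = [
    (1, 25, 85, 90, 63, 69),
    (1, 50, 94, 98, 83, 87),
    (3, 25, 48, 54, 33, 37),
    (3, 50, 69, 73, 49, 54),
    (3, 75, 90, 94, 65, 69),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# CPU points added by NetFlow on each VIP type.
netflow_vip2 = [on2 - off2 for _, _, off2, on2, _, _ in ROWS]
netflow_vip4 = [on4 - off4 for _, _, _, _, off4, on4 in ROWS]

# CPU points saved by VIP4-80 relative to VIP2-50, NetFlow off and on.
saving_off = [off2 - off4 for _, _, off2, _, off4, _ in ROWS]
saving_on = [on2 - on4 for _, _, _, on2, _, on4 in ROWS]

print(mean(netflow_vip2), mean(netflow_vip4))   # 4.6 4.6
print(max(netflow_vip2 + netflow_vip4))         # 6
print(mean(saving_off), mean(saving_on))        # 18.6 18.6
```

The average NetFlow overhead of 4.6 points (at most 6) and the average VIP4-80 saving of 18.6 points are consistent with the "about 5%" and "about 20%" figures quoted in the text.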
According to Cisco’s document [1], memory usage on a VIP will increase when NetFlow is enabled because an extra amount of memory will be allocated as NetFlow cache. The size of the NetFlow cache allocated is determined by the total size of the VIP memory. Table 2 summarizes the increase of memory usage on VIP2-50s when enabling NetFlow.
Table 2. Required DRAM by NetFlow on Cisco 7500 VIPs.
DRAM Capacity   Default NetFlow Cache Entries   DRAM Required by NetFlow Cache
256 MB          256K                            16 MB
128 MB          128K                            8 MB
64 MB           64K                             4 MB
32 MB           32K                             2 MB
16 MB           2K                              128 KB
We verify the memory impact of NetFlow on our TestBed. Figure 6(c) shows the memory usage is increased by 8 MB when NetFlow is enabled on a VIP2-50 with 128 MB memory. Therefore, to have NetFlow safely enabled, we have to make sure the free memory space under normal working conditions without NetFlow is larger than 8 MB.
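The memory condition above generalizes to the other DRAM capacities through Table 2. A minimal sketch of the check follows; the function names are hypothetical, and the cache sizes are copied from Table 2.

```python
# NetFlow cache size in KB by VIP DRAM capacity in MB (values from Table 2).
NETFLOW_CACHE_KB = {256: 16 * 1024, 128: 8 * 1024, 64: 4 * 1024,
                    32: 2 * 1024, 16: 128}


def netflow_cache_mb(dram_mb: int) -> float:
    """DRAM (MB) that the default NetFlow cache takes on a VIP."""
    return NETFLOW_CACHE_KB[dram_mb] / 1024


def memory_allows_netflow(dram_mb: int, used_mb: float) -> bool:
    """True if free memory without NetFlow exceeds the NetFlow cache size."""
    free_mb = dram_mb - used_mb
    return free_mb > netflow_cache_mb(dram_mb)


# Two synthetic examples for a VIP with 128 MB of DRAM.
ok = memory_allows_netflow(128, used_mb=70.0)       # 58 MB free
tight = memory_allows_netflow(128, used_mb=121.0)   # 7 MB free
```

With 128 MB of DRAM the cache takes 8 MB, so the first example passes and the second does not.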
To conclude what we have learned from the TestBed:
– NetFlow has negligible impact on RSP CPU and memory utilization. As a consequence, an RSP whose CPU is loaded up to 99% can still support NetFlow without any upgrade.
– NetFlow increases VIP CPU utilization by less than 10%.
– The increase in VIP memory usage caused by NetFlow is a step function of the total VIP memory capacity, as shown in Table 2.
– Upgrading to the closest higher VIP type (e.g., from 2-50 to 4-80, from 4-80 to 6-80) reduces the CPU utilization by 20%.
We would like to thank Beng-Ong Lee and Travis Dawson for completing most of the testbed experiments.