CSL 858 Virtual Network Simulation

Academic year: 2021

CSL 858: Simulation of a CDN in a virtual network

by

Ashish Khullar

2007CS50212

under the guidance of

Contents

1 Introduction

2 The Virtual Network

3 The Implementation

4 The Emulation

4.1 Simulation 1: The Effectiveness of CDN

4.2 The Study of our emulation

5 Conclusion and Future Work

Chapter 1

Introduction

A Content Distribution Network (CDN) consists of a set of systems placed at various nodes in a network. A CDN works on two principles: caching content that is accessed frequently, and making content available close to the requesting machine. Reducing the number of nodes between the requesting machine and the machine that satisfies the request lowers the latency and increases the available bandwidth. Strategically placed edge servers decrease the load on backbones and on public and private peers, because they serve traffic that would otherwise have to cross those links. In all cases the intent is to get the content as near to the end user as possible. CDNs also give more control over asset delivery and network load, and they can be optimized per customer need.

Several providers offer CDN services commercially, including Akamai Technologies, Amazon and Windows Azure CDN. There are also free CDNs such as the Coral Content Distribution Network. Open source projects that aim to build CDNs include MirrorBrain and CoDeeN.

Chapter 2

The Virtual Network

The virtual network for this simulation was created on a VMware ESXi server installation. The front-end queries were executed against a vCenter client, which was installed on a VM inside the VMware ESXi server itself.

We have divided the network into separate regions, as shown in the diagram. These regions are referred to as ASes in the rest of the discussion. With these ASes we draw a parallel to service providers in India such as BSNL and Reliance. Real service providers have multiple routers, but for the purpose of our simulation we have given each AS only one router, as shown in the diagram.

We model the Internet scenario by emulating the network behind each service provider with a DHCP server running on every machine that acts as a router. By setting the IP range of the DHCP server, we ensure that all machines connected to it end up in the same AS. This also means that we can connect any number of clients to the router without having to assign IPs by hand. We do need to assign a static IP to each CDN server so that DNS resolution can be handled. Effectively, we emulate a scenario in which there is only one service provider within an AS and all traffic leaving an AS passes through a single gateway. We also assume that the AS assigns IPs dynamically (which most ASes do). We have only two routes between the ASes right now, but the approach extends easily to a very large number of systems: we can add many more links between the ASes and define which routes may transmit and receive packets by means of forwarding tables, as explained later. The CDN server in each AS currently services only requests originating in that AS; a load-balancing extension of the present emulation would be more accurate. The diagram below illustrates how requests are resolved.

As we can see from the diagram, a machine in a particular AS ultimately ends up having its requests serviced from within its own AS.
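
The resolution idea described above, where a client is directed to the CDN server of its own AS, can be sketched as a lookup from the client address to an AS and then to that AS's statically addressed CDN server. This is only an illustration: the address ranges, the server addresses and the fallback to the main server are assumptions made for the sketch and are not taken from the actual setup.

```python
import ipaddress

# Hypothetical address plan: one /16 per AS and one static CDN address per AS.
AS_NETWORKS = {
    "AS1": ipaddress.ip_network("10.1.0.0/16"),
    "AS2": ipaddress.ip_network("10.2.0.0/16"),
    "AS3": ipaddress.ip_network("10.3.0.0/16"),
}
CDN_SERVERS = {"AS1": "10.1.0.2", "AS2": "10.2.0.2", "AS3": "10.3.0.2"}
MAIN_SERVER = "10.0.0.2"  # hypothetical fallback for clients outside every AS


def resolve(client_ip: str) -> str:
    """Return the address of the server a client should be directed to."""
    address = ipaddress.ip_address(client_ip)
    for as_name, network in AS_NETWORKS.items():
        if address in network:
            # The client lies in this AS, so it gets this AS's CDN server.
            return CDN_SERVERS[as_name]
    return MAIN_SERVER
```

With this plan, a client at 10.2.3.4 would be directed to 10.2.0.2, the CDN server of AS2.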

Chapter 3

The Implementation

For this purpose we made four templates for the virtual machines. We briefly explain the changes that were made to each type of machine.

1. Client VMs: On these machines the /etc/udev/rules.d/70-persistent-net* files were removed so that the network labels could be changed using the vSwitch. Other than that we did not make any major changes to the clients. We exchanged the public/private keys of the client and the controlling machine so that the controlling machine can log in remotely without having to give a password.

2. The CDN servers: The CDN servers have been assigned static IPs. Each CDN server runs an nginx server, which we have configured in accordance with the network we created. Certain files have to be created for nginx to be able to cache content; we have created these files so that it caches files of the indicated types.

3. The Routers: Each router in our simulation has 3 NICs. In general, since we want to control the bandwidth of each router, we have to add a NIC for each link whose bandwidth we want to control and connect these NICs to separate port groups on the vSwitch. Each router also has entries added to its forwarding table specifying the traffic that can come to it from a particular AS. For instance, if we want the connection on its eth1 to also forward traffic coming from another AS, say AS2, to which it is connected via AS1 and not directly, we have to add entries that look like

route add -host {IP range of AS1} dev eth1

route add -net {IP range of AS2} netmask 255.255.0.0 gw {gateway IP in AS1}

Also, a DHCP server runs on the eth0 interface of the router. This is the NIC attached to the AS for which that machine acts as a router. To set up this DHCP server, changes have to be made to the /etc/dhcp3/dhcpd.conf file of the machine. This file contains information such as the range from which machines connecting to this NIC are assigned IPs, their netmask, their gateway, etc.

4. The Main server: This is the server from which all files are downloaded. We have set up a DNS server on this machine. To handle the download requests coming from the other machines, an Apache server runs on it. For the purpose of conducting experiments we have created files of sizes from 1 to 100 Mb on the main server. To set up the DNS server we had to define views on the main server. This consisted of adding a definition of each AS in /etc/bind/named.conf.local. The definition of the view has to be added to /etc/bind/zones/*.conf, where * is the name of the view being created. Changes also have to be made in the named.hosts and named-rev.hosts files.
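
The forwarding entries described for the routers (item 3 above) can be pictured as a small table that is searched for the most specific matching network. The sketch below is an illustration only: the `Route` type, the interface assignments and all addresses are hypothetical, and the real routers rely on the kernel routing table rather than on code like this.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    device: str
    gateway: Optional[str]  # None means the network is directly connected


# Hypothetical table for one router: AS1 is reached directly over eth1,
# AS2 is reached through a gateway inside AS1, and the router's own AS
# hangs off eth0.
TABLE = [
    Route(ipaddress.ip_network("10.1.0.0/16"), "eth1", None),
    Route(ipaddress.ip_network("10.2.0.0/16"), "eth1", "10.1.0.1"),
    Route(ipaddress.ip_network("10.3.0.0/16"), "eth0", None),
]


def lookup(table: List[Route], destination: str) -> Optional[Route]:
    """Return the longest-prefix match for destination, or None if no route."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in table if address in route.network]
    if not matches:
        return None  # no entry: the router will not forward this traffic
    return max(matches, key=lambda route: route.network.prefixlen)
```

A destination without a matching entry yields `None`, which mirrors the way the forwarding tables are used here to decide which traffic a router is willing to carry.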

Chapter 4

The Emulation

For the emulation we have added two client machines to each of the ASes. We have used two types of traffic generator for our study: a random traffic generator and a Poisson traffic generator. In the random traffic generator, a number between 1 and 100 is generated randomly and the file of the corresponding size is requested from the server. The machine then sleeps for a random time between 1 and 10 seconds, after which the process of requesting a file is repeated. The requests are made using wget. The file sizes are not modelled on any standard traffic distribution, but this scheme ensures a fairly large spread of file sizes. We could also have created files whose sizes follow a Poisson distribution, or any other distribution.
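
A minimal sketch of the random traffic generator described above is given below. The URL layout (`BASE_URL` and the `<size>.bin` file names) is hypothetical, and drawing both the size and the sleep time uniformly is an assumption; the original generator is not reproduced here.

```python
import random
import subprocess
import time

# Hypothetical location of the test files on the main server.
BASE_URL = "http://main-server.example/files"


def next_request(rng):
    """Pick a file size between 1 and 100 and a pause between 1 and 10 seconds."""
    size = rng.randint(1, 100)
    pause = rng.uniform(1, 10)
    return f"{BASE_URL}/{size}.bin", pause


def wget_fetch(url):
    """Download the file with wget and discard the data."""
    subprocess.run(["wget", "-q", "-O", "/dev/null", url], check=False)


def run(iterations, rng=None, fetch=wget_fetch, sleep=time.sleep):
    """Request a randomly sized file, sleep for a random time, and repeat."""
    if rng is None:
        rng = random.Random()
    for _ in range(iterations):
        url, pause = next_request(rng)
        fetch(url)
        sleep(pause)
```

The `fetch` and `sleep` parameters exist only so that the loop can be exercised without touching the network, for example by passing `fetch=print` and `sleep=lambda s: None`.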

For the Poisson traffic generator, we have used a modified version of the traffic generator created by Prof. Ribero. The original generator opened a port to a given IP and sent packets of a given size to that IP at Poisson intervals. We have modified this sender to request files using wget at Poisson intervals. The parameter of this generator is the length of time for which requests following a Poisson process are to be issued. The size of the file that is requested is once again randomly generated.
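
The timing of such a generator can be sketched as follows: in a Poisson process the gaps between consecutive requests are exponentially distributed, so request times are obtained by summing exponential inter-arrival times until the requested duration is exhausted. The `rate_per_s` parameter is an assumption introduced for this sketch (the text names only the duration as a parameter), and the code is not the modified generator itself.

```python
import random


def poisson_request_times(rate_per_s, duration_s, rng):
    """Return request timestamps in [0, duration_s) for a Poisson process.

    rate_per_s is the mean number of requests per second (hypothetical
    parameter); gaps between requests are drawn from an exponential
    distribution with that rate.
    """
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times
```

A sender would then sleep until each timestamp and issue a wget request for a randomly sized file, as described above.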

To initialize the simulation, we have created a bash script that remotely logs into these machines. The traffic shaping parameter that has been set in all the scenarios is the average bandwidth. To set the traffic shaping parameters, the user fills in a config file, and the bash script runs against the vCenter client using the data in the config file and sets the traffic shaping parameters of all the vSwitches in the network. The script then remotely logs into all the machines, whose IPs have to be listed by the user, and executes the scripts corresponding to the emulation scenario.

4.1 Simulation 1: The Effectiveness of CDN

For this simulation, we have used the random traffic generator. We ran the simulation for three scenarios. For these runs we divided our vSwitches into two classes. The vSwitches that connect the routers of two ASes have been put in class 1. The vSwitches inside an AS have been put in class 2.

In scenario 1 we have set the bandwidth of all class 1 vSwitches to 1 Mbps and the bandwidth of all class 2 vSwitches to 10 Mbps. In this scenario the bottleneck is the links between the ASes, while the links between the systems and the CDNs inside the ASes are over-capacity. In scenario 2 we have set the bandwidth of all class 1 vSwitches to 10 Mbps and the bandwidth of all class 2 vSwitches to 1 Mbps. In this scenario the links between the ASes are over-capacity and the bottleneck is the links between the clients and the CDNs. In scenario 3 we have set the bandwidth of both class 1 and class 2 vSwitches to 5 Mbps. By means of these three emulations we look to understand how severely the presence of a bottleneck at different points in the network affects the overall transmission rates at various points in the network.

Table 4.1: Transmitted/Received rates in MBps (each cell reads transmitted-received)

Scenario   Main Server   AS-CDN1       AS-CDN2       AS-CDN3
1          1.210-0.033   1.122-0.938   0.132-0.129   0.129-0.128
2          0.148-0.005   0.036-0.040   0.039-0.040   0.047-0.050
3          0.663-0.021   0.212-0.209   0.212-0.209   0.208-0.209

As we can see from Table 4.1, scenario 1 has the maximum amount of data being transmitted from the CDN servers and the main server. This is the case in which the bandwidth of the local connections is highest. We can also see that the effective use of the bandwidth of the main server is highest in scenario 1. This supports the case for localizing the data, as the overall performance of the network clearly increases.
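
The comparison drawn from Table 4.1 can be checked with a little arithmetic. The sketch below copies the table values and sums the transmitted rate over all four servers for each scenario; it assumes that the first number of each pair is the transmitted rate, following the order in the table caption.

```python
# (transmitted, received) rates in MBps, copied from Table 4.1.
# Assumption: the first value of each pair is the transmitted rate.
TABLE_4_1 = {
    1: {"Main Server": (1.210, 0.033), "AS-CDN1": (1.122, 0.938),
        "AS-CDN2": (0.132, 0.129), "AS-CDN3": (0.129, 0.128)},
    2: {"Main Server": (0.148, 0.005), "AS-CDN1": (0.036, 0.040),
        "AS-CDN2": (0.039, 0.040), "AS-CDN3": (0.047, 0.050)},
    3: {"Main Server": (0.663, 0.021), "AS-CDN1": (0.212, 0.209),
        "AS-CDN2": (0.212, 0.209), "AS-CDN3": (0.208, 0.209)},
}


def total_transmitted(scenario):
    """Sum the transmitted rates of all servers in one scenario."""
    return sum(tx for tx, _rx in TABLE_4_1[scenario].values())


def best_scenario():
    """Return the scenario number with the highest total transmitted rate."""
    return max(TABLE_4_1, key=total_transmitted)
```

The totals come to about 2.593 MBps for scenario 1, 0.270 MBps for scenario 2 and 1.295 MBps for scenario 3, so scenario 1 transmits the most in aggregate.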

In this simulation, we also had a bandwidth manager running in the background on all the clients. Using the bandwidth manager we recorded the bandwidth being used at each client.

As we can see from Figure 4.1, there are times when the amount of traffic being received or transmitted by a particular machine is almost negligible. At such times it would be wise to use the CDN machine of its AS to service requests from other ASes. Thus, for effective usage of the CDN, some form of load balancing among the servers forming the CDN makes a lot of sense.

4.2 The Study of our emulation

Figure 4.1: Each panel plots the bandwidth against the time in minutes. The two panels in each horizontal pair belong to machines connected to the same vSwitch.

In this simulation, we examine how close we are to simulating real Internet traffic and the aggregate behaviour that arises from the interaction in our emulation. We check whether the packets sent and received by any server in the network show self-similar behaviour. We also look at the effect of aggregate Poisson traffic in our emulation. In an emulation in which no interaction was happening between the systems, the aggregate would ideally have been a normal distribution. For this purpose we run our simulation for 12 hours using the Poisson traffic generator. The Poisson traffic generator requests a file of random size at Poisson intervals. On each of the CDN servers and the main server a tcpstat process runs in the background and collects data every second. We plot the number of packets sent and received per unit time against time. We then expand the graphs around arbitrary regions and check whether they show self-similarity. One such zoomed-in expansion is shown in Figure 4.2. We observe that in general the graphs do show a great deal of self-similarity. We will, however, also try to establish this mathematically.

For a rigorous mathematical treatment we use the approach described in "On the Self-Similar Nature of Ethernet Traffic" [1]. The paper uses a pox plot of the R/S data. For a time series $X = \{X_i, i \ge 1\}$ with partial sum

$$Y(n) = \sum_{1 \le i \le n} X_i$$

and sample variance

$$S^2(n) = \frac{1}{n} \sum_{1 \le i \le n} X_i^2 - \frac{1}{n^2} Y^2(n),$$

the R/S statistic is given by

$$\frac{R}{S}(n) = \frac{1}{S(n)} \left\{ \max_{0 \le t \le n} \left( Y(t) - \frac{t}{n} Y(n) \right) - \min_{0 \le t \le n} \left( Y(t) - \frac{t}{n} Y(n) \right) \right\}.$$

To determine the R/S statistic for a time series of length $N$, we divide the series into $K$ blocks, each of size $N/K$. Then for each lag $n$ we compute $R(k_i, n)/S(k_i, n)$ starting at the points $k_i = iN/K$, $i = 1, 2, \ldots$, such that $k_i + n \le N$. We choose logarithmically spaced values of $n$ and plot $\log(R/S)$ against $\log n$. For each $n$ we end up with several points on the graph. The figure shows the graph that we obtain. From this figure we estimate the Hurst parameter to be 0.67, which lies between 0.5 and 1 as required for self-similar traffic and thus supports our claim about the self-similarity of the traffic that we generate. The interaction in our emulation therefore does not give us a normal distribution, as one might expect, but rather a self-similar result.
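
The R/S procedure described above can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions and is not the script used to produce the reported estimate: block start points are taken 0-based (i = 0, ..., K-1) instead of starting at i = 1, logarithms are base 10, and the Hurst parameter is read off as the least-squares slope of the pox plot points.

```python
import math


def rescaled_range(x):
    """Return R/S for the sequence x, or None if its variance is zero."""
    n = len(x)
    total = sum(x)  # Y(n)
    variance = sum(v * v for v in x) / n - (total / n) ** 2  # S^2(n)
    if variance <= 0:
        return None
    # Deviations Y(t) - (t/n) Y(n) for t = 0, 1, ..., n.
    deviations = [0.0]
    partial = 0.0
    for t, value in enumerate(x, start=1):
        partial += value
        deviations.append(partial - t * total / n)
    return (max(deviations) - min(deviations)) / math.sqrt(variance)


def pox_points(series, num_blocks, lags):
    """Return (log10 n, log10 R/S) points for every block start and lag."""
    length = len(series)
    points = []
    for n in lags:
        for i in range(num_blocks):
            start = i * length // num_blocks  # 0-based block start point
            if start + n > length:
                break  # this lag no longer fits from here on
            rs = rescaled_range(series[start:start + n])
            if rs is not None and rs > 0:
                points.append((math.log10(n), math.log10(rs)))
    return points


def hurst_estimate(points):
    """Least-squares slope of the pox plot, used as the Hurst estimate."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Given a list `counts` of per-second packet counts (for example from tcpstat), a call such as `hurst_estimate(pox_points(counts, 10, [16, 32, 64, 128, 256, 512]))` would give the slope; the block count and lags in this call are arbitrary choices for illustration.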

Figure 4.2: Self-similarity of traffic. The highlighted parts show the regions that are magnified.

Chapter 5

Conclusion and Future Work

We have managed to create a virtual network and successfully simulate a CDN on it. Using this simulation we have been able to confirm some of the principles behind the use of CDNs. We have seen that, by using traffic generators, we can arrive at a simulation scenario that parallels a real network fairly accurately. This simulation could thus be used to explore any changes in CDN technology that one might look to implement. Also, since the traffic in the simulation is self-similar in nature, we expect the results obtained to be close to what would be experienced in an actual deployment of the simulated setup or change.

Features that could be added to this simulation include

1. Adding load balancing between the CDN machines. This makes sense because, as we observed, load balancing could lead to faster processing of a file request at times when the load on the CDN server of the requesting AS is very high. The setup could then also be used to study the various load balancing algorithms that have been proposed.

2. The vCenter API currently allows changing the bandwidth only of an entire vSwitch. The API should be extended to allow the same for every port group. This would give greater control over the traffic shaping.

Thus we have managed to create an effective system for simulating network traffic and for creating a CDN in a virtual network. Looking at the system load of our emulation, if a typical 1 TB server is provided to host the virtual deployment, we will be able to run about 25 virtual machines on it, since 1 TB / 40 GB = 25. (The 40 GB hard disk given to each VM in our experiment leaves room for a log of 20 GB, which should be sufficient for any server log.) Thus virtualization gives us roughly a 25-fold reduction in hardware: a single machine with, say, a 1 TB physical hard disk takes the place of 25 machines.

Chapter 6

Rural CDN Network

The rural CDN project is an ongoing project in the Department of Computer Science, run in collaboration with the Department of Information Technology and the Ministry of Health and Family Welfare. The project aims at creating a robust rural CDN network, whose purpose is to enable medical care in rural scenarios. With the flaky connections that exist in rural India, it is not feasible for doctors there to connect to mainframe servers elsewhere in India. The idea instead is that they should be provided with a robust CDN network that covers all states. This idea is currently being implemented, with actual testbeds deployed to cover the states of Kerala and Uttar Pradesh. A testbed for the same also exists at IIT Delhi.

One of the major problems with the Rural CDN project is the enormous cost of testing. With no virtualization-based framework currently in place, the cost of buying physical servers increases linearly with every new state that is added, with a typical server costing around 7-8 lakh. For every server added, at least 10 machines have to be added to test whether the server behaves appropriately under heavy traffic. With the virtual CDN network, however, we are able to reduce this linear cost. We have successfully deployed a CDN with 3 servers and 8 machines running on a single server, with the machines emulating traffic using the Poisson and random traffic generators described earlier. As our earlier results show, the traffic in the simulation is self-similar, as actual network traffic is. We are thus able to save the cost of 2 servers and 24 machines with a single server. The testbed at IIT was successfully emulated using this CDN server installation to show that the method works. Effectively we have achieved a 5-fold reduction in the cost of an installation: taking the cost of 8 machines as equal to the cost of one server, 2 servers + 24 machines cost as much as 2 + 24/8 = 5 servers.

Thus, using the virtual CDN setup, testing and deployment will be much cheaper.

Bibliography

[1] Leland, Taqqu, Willinger, Wilson, On the Self-Similar Nature of Ethernet Traffic

[2] Keshav, Shenker, Demers, Analysis and Simulation of a Fair Queueing Algorithm

[3] http://www.cyberciti.biz/tips/howto-configure-linux-virtual-local-area-network-vlan.html

[4] http://blog.unixy.net/2010/07/how-to-build-your-own-cdn-using-bind-geoip-nginx-and-varnish/

[5] http://communities.vmware.com/docs/DOC-11800

[6] Saroiu, Gummadi, Dunn, Gribble, Levy, Analysis of Internet Content Delivery Systems

[7] Labovitz, McPherson, Oberheide, Jahanian, Internet Inter-Domain Traffic

[8] http://www.fromdev.com/2011/06/create-cdn-content-delivery-network.html
