Content Distribution Problem - Data Movement Challenges and Solutions with Software Defined Networking

Content can be disseminated to users at scale using a variety of solutions. Which solution is chosen depends primarily on the type of content being served: static or dynamic.

6.1.1 Static Content Distribution

In content delivery networks (CDNs), static content is pre-distributed to redundant, geographically dispersed edge servers. As the name implies, and as shown in Figure 6.1.1, these edge servers sit close to the edge of the network and serve content to nearby consumers. Each edge server determines and updates its available content based on the content pushed to it by the origin server.

A request to download content is typically a lightweight operation handled by a front end server, such as a web server through which the consumer interacts with the service and browses the available content. This front end server determines where the consumer is located and assigns an appropriate edge server near the consumer, from which the consumer then fetches its data. The advantage of this design is twofold. First, bandwidth in the network core is conserved, since content requests and download responses are redirected to and served at the network edge. Second, the design acts as a “natural” load balancer for a given piece of content: as the number of consumers who wish to consume that content simultaneously grows, additional edge servers can be added to the CDN deployment to handle the additional load. Without the redundant edge servers, the origin server would be responsible for handling every individual content request and would quickly become a bottleneck.

Figure 6.1.1: The architecture of a content delivery network
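The front end's edge-assignment step can be sketched briefly. This is a minimal Python illustration under stated assumptions. The `EdgeServer` class, the `assign_edge` function, the server names and their coordinates are all hypothetical. Great-circle distance stands in for whatever proximity and load metrics a production CDN would actually use.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    """Hypothetical CDN edge server and the content IDs the origin pushed to it."""
    name: str
    lat: float  # degrees
    lon: float  # degrees
    content: set = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometers."""
    r = 6371.0  # mean Earth radius (km)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assign_edge(consumer_lat, consumer_lon, content_id, edges):
    """Hypothetical front-end logic: return the nearest edge holding the content,
    or None if no edge has it (the origin would then have to serve it)."""
    holders = [e for e in edges if content_id in e.content]
    if not holders:
        return None
    return min(holders, key=lambda e: haversine_km(consumer_lat, consumer_lon, e.lat, e.lon))


edges = [
    EdgeServer("edge-nyc", 40.71, -74.01, {"movie-42"}),
    EdgeServer("edge-sfo", 37.77, -122.42, {"movie-42", "show-7"}),
    EdgeServer("edge-lon", 51.51, -0.13, {"show-7"}),
]

# A consumer in Boston requesting "movie-42" is sent to the New York edge.
print(assign_edge(42.36, -71.06, "movie-42", edges).name)  # edge-nyc
```

Content that no edge server holds yields `None` in this sketch. In that case the request would fall back to the origin server, which is exactly the bottleneck the edge servers exist to avoid.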

CDNs have been adopted by many of today’s popular content providers and web services. Streaming services for static video, for example, use CDNs to host video content such as movies and television shows on edge servers. Meanwhile, a front end server or cluster of servers provides a single “pane of glass” through which consumers interact with the service via a graphical user interface. In an operation that is typically hidden from consumers, the CDN delivers the requested video from an edge server close to the consumer rather than from the front end server the consumer is interacting with. Although CDNs are widely deployed in today’s networks, they require content to be pre-deployed on edge servers, which makes them ineffective for distributing dynamic content.

6.1.2 Dynamic Content Distribution

Dynamic content is content whose characteristics, such as location and size, are unknown before the content is requested. Live video streaming is an example of dynamic content. A live stream does not have a fixed size. In an environment where stream producers can come and go from unpredictable locations, it does not have a predefined content location or source either. As such, it cannot be distributed over a CDN.

For dynamic content distribution, two solutions are popular in today’s network architectures: IP unicast and IP multicast.

With unicast, the network carries a separate traffic flow from the producer to every interested consumer. The number of consumers a producer can support is therefore limited by the video bitrate and by the available bandwidth of the network’s minimum cut between the producer and all consumers. That cut may well be the link between the producer and the network. IP multicast attempts to overcome this bottleneck by distributing a producer’s data throughout the network to interested consumers along a multicast tree. Because each link of the tree carries only one copy of the stream, a single producer can serve an arbitrarily large number of consumers. The maximum number of consumers no longer depends on the available network bandwidth at the producer; it depends instead on the available capacity of each individual link in the tree. Although IP multicast distributes content efficiently from one to many throughout the network, it has several limitations.
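A small calculation makes the difference between unicast and multicast link loads concrete. The sketch below assumes a hypothetical four-consumer distribution tree (`PARENT`) and a 5 Mb/s stream. Under unicast, every link carries one copy per downstream consumer; under multicast, it carries a single copy. The 100 Mb/s access-link capacity is likewise an assumed figure.

```python
from collections import defaultdict

# Hypothetical distribution tree as child -> parent; "P" is the producer.
PARENT = {
    "R1": "P",
    "R2": "R1", "R3": "R1",
    "c1": "R2", "c2": "R2", "c3": "R3", "c4": "R3",
}


def path_links(node):
    """Links (parent, child) on the path from the producer down to `node`."""
    links = []
    while node in PARENT:
        links.append((PARENT[node], node))
        node = PARENT[node]
    return links


def link_loads(consumers, bitrate_mbps, multicast):
    """Per-link load in Mb/s: one flow per consumer (unicast)
    or one copy per link of the union tree (multicast)."""
    load = defaultdict(float)
    if multicast:
        tree = {link for c in consumers for link in path_links(c)}
        for link in tree:
            load[link] = bitrate_mbps
    else:
        for c in consumers:
            for link in path_links(c):
                load[link] += bitrate_mbps
    return dict(load)


consumers = ["c1", "c2", "c3", "c4"]
uni = link_loads(consumers, 5.0, multicast=False)
multi = link_loads(consumers, 5.0, multicast=True)
print(uni[("P", "R1")], multi[("P", "R1")])  # 20.0 5.0

# Unicast ceiling imposed by the producer's access link (the min cut here).
access_capacity_mbps = 100.0
print(int(access_capacity_mbps // 5.0))  # 20
```

Under unicast in this example, the producer's access link saturates at 20 consumers. Under multicast, its load stays at one stream however many consumers join.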

First, IP multicast is not well suited to switching rapidly from one multicast group to another. A consumer that wishes to watch a video stream must join the IP multicast group of the desired content, whether as an initial subscription or when transitioning from one video to another. This join is not guaranteed to complete quickly, especially if the producer’s multicast tree does not already extend close to the requesting consumer.
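A simplified model shows why join latency depends on how close the tree already is. In this model, a join travels upstream hop by hop until it reaches a router that is already on the tree, and data then flows back down the same hops. The `UPSTREAM` table, the router names and the constant per-hop delay are hypothetical assumptions. They are not measurements of any particular protocol implementation.

```python
# Hypothetical upstream next hops toward the source (or RP): router -> next router.
UPSTREAM = {"E1": "A1", "A1": "A2", "A2": "CORE", "E2": "A3", "A3": "CORE"}


def join_hops(first_hop_router, on_tree):
    """Number of hops a join travels before reaching a router already on the tree
    (0 if the consumer's first-hop router is already on it)."""
    hops, node = 0, first_hop_router
    while node not in on_tree:
        node = UPSTREAM[node]  # follow the path toward the source
        hops += 1
    return hops


def join_latency_ms(first_hop_router, on_tree, per_hop_ms):
    """Rough time to first packet: the join goes up `hops` links and data comes
    back down the same links, at an assumed constant delay per hop."""
    return 2 * join_hops(first_hop_router, on_tree) * per_hop_ms


print(join_hops("E1", {"CORE"}))        # 3: tree only reaches the core
print(join_hops("E1", {"A1", "CORE"}))  # 1: tree already passes through A1
```

Assume 5 ms per hop. The first case then gives `join_latency_ms("E1", {"CORE"}, 5) == 30` ms, and the second case gives 10 ms.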

Second, IP multicast constructs trees in the network core that distribute each group’s traffic efficiently and without redundancy. It does not, however, use bandwidth efficiently when consumers transition between groups. To switch rapidly between videos carried in different IP multicast groups, a consumer must repeatedly join and leave those groups. Many of these groups may keep delivering data toward the consumer in parallel until a timeout occurs or until the user’s local router sends a leave message to the upstream router. This wastes bandwidth both on the local network and between the upstream router and the consumer’s local router. Fundamentally, the waste stems from the reliance on timers to prune consumers and branches from the multicast tree. Timers are needed because multicast is distributed and does not maintain accurate state about the exact demand for multicast content. As a result, bandwidth is consumed from the moment downstream demand disappears until the corresponding timer expires.
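A short model can estimate the cost of timer-based pruning during rapid channel switching. The sketch makes several hypothetical assumptions: the consumer switches groups at the given times, each abandoned group keeps flowing for a fixed `prune_delay_s` before it is pruned, and every stream has the same bitrate. The `surfing_cost` function reports the megabits wasted and the peak number of streams arriving in parallel on the consumer’s access segment.

```python
def surfing_cost(switch_times_s, bitrate_mbps, prune_delay_s):
    """Model a consumer that joins a new group at each time in `switch_times_s`
    (assumed sorted) and leaves the previous group at that moment.

    Each abandoned group keeps flowing for `prune_delay_s` seconds before it is
    pruned. Returns (wasted megabits, peak number of concurrent streams).
    """
    # Half-open interval [start, end) during which each group is delivered.
    intervals = []
    for i, t in enumerate(switch_times_s):
        if i + 1 < len(switch_times_s):
            end = switch_times_s[i + 1] + prune_delay_s  # left, but pruned late
        else:
            end = float("inf")  # the final channel is still being watched
        intervals.append((t, end))

    n_left = len(switch_times_s) - 1
    wasted_mbit = n_left * prune_delay_s * bitrate_mbps

    # Sweep start (+1) and end (-1) events; ends sort before starts at equal times.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    peak = cur = 0
    for _, delta in events:
        cur += delta
        peak = max(peak, cur)
    return wasted_mbit, peak


print(surfing_cost([0, 1, 2, 3], 5.0, 3.0))  # (45.0, 4)
print(surfing_cost([0, 1, 2, 3], 5.0, 0.0))  # (0.0, 1)
```

Suppose the consumer switches once per second and the prune delay is an assumed 3 s. Four streams then arrive at once. With an instantaneous prune, only one stream would arrive at a time.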

Third, IP multicast does not initially deliver content along the lowest-cost path. It first requires a shared tree to be built through the rendezvous point of the multicast group being joined. Only after this tree has been completed and content is being delivered is a second tree, the shortest path tree, constructed along the paths defined in the routing tables between the consumer and the producer. During this time, and until the rendezvous point tree is torn down, the content is delivered simultaneously along both the rendezvous point tree and the shortest path tree.
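A small weighted graph illustrates the extra cost of the rendezvous point tree. In this sketch the topology, its link costs and the node names (`S` for the source, `RP`, and `R` for the consumer’s router) are hypothetical. Link costs are assumed symmetric. Data on the rendezvous point tree is assumed to travel the shortest path from the source to the RP, and then from the RP to the receiver. Dijkstra’s algorithm computes each shortest path.

```python
import heapq

# Hypothetical topology with symmetric link costs.
GRAPH = {
    "S":  {"A": 1, "RP": 4},
    "A":  {"S": 1, "B": 1, "RP": 2},
    "RP": {"S": 4, "A": 2, "C": 3},
    "B":  {"A": 1, "R": 1},
    "C":  {"RP": 3, "R": 3},
    "R":  {"B": 1, "C": 3},
}


def dijkstra(src):
    """Shortest-path cost from `src` to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in GRAPH[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


from_src, from_rp = dijkstra("S"), dijkstra("RP")
rpt_cost = from_src["RP"] + from_rp["R"]  # source -> RP -> receiver
spt_cost = from_src["R"]                  # shortest path source -> receiver
print(rpt_cost, spt_cost, rpt_cost + spt_cost)  # 7 3 10
```

Here the rendezvous point path costs 7 and the shortest path tree costs 3. While both trees carry the stream, each packet therefore consumes 10 units of link cost instead of 3.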

Fourth, IP multicast is not easy to deploy. There are many flavors of IP multicast for distributing content within and across administrative domains, each of which is complex to implement correctly and requires hop-by-hop configuration. For this reason, many choose to forgo IP multicast. Furthermore, although improvements to IP multicast security have been proposed, IP multicast is inherently insecure. An attacker could mount a denial of service attack on a network by issuing many different IP multicast group joins in rapid succession (with or without leave notifications). This poses a serious security risk in an IP multicast implementation. It is also part of the reason IP multicast is not widely deployed across administrative domains, where precise control over usage is not always possible.
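The state buildup behind such an attack can be modeled simply. Each join creates or refreshes a state entry that survives until a hold timer expires. Joins to distinct groups therefore accumulate state roughly as the join rate multiplied by the hold time. The `state_entries` function, the 100 joins per second rate, the 60 s hold time and the `239.1.x.y` group addresses are all assumptions for illustration.

```python
def state_entries(join_times_s, groups, hold_s, at_s):
    """Count multicast state entries alive at time `at_s` on one router.

    Assumed model: each join creates or refreshes an entry for its group that
    lives `hold_s` seconds; no leave messages are sent. `join_times_s` is
    assumed to be in increasing order.
    """
    expiry = {}
    for t, g in zip(join_times_s, groups):
        if t <= at_s:
            expiry[g] = t + hold_s  # a repeated join refreshes the same entry
    return sum(1 for e in expiry.values() if e > at_s)


rate = 100  # assumed attacker join rate, one new group per join
times = [i / rate for i in range(10 * rate)]  # 10 s of joins
groups = [f"239.1.{i // 256}.{i % 256}" for i in range(10 * rate)]

print(state_entries(times, groups, hold_s=60.0, at_s=10.0))          # 1000
print(state_entries(times, ["239.1.0.1"] * len(times), 60.0, 10.0))  # 1
```

The same number of joins to a single group, as a well-behaved consumer would send, leaves one entry. Sending them to distinct groups leaves a thousand.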

Finally, IP multicast does not take network capacity into consideration. Because IP multicast is a collection of distributed protocols, no single forwarding device in the network has the global view needed to determine the best path for forwarding a group’s content to downstream consumers. Each video source’s IP multicast tree is constructed independently, without regard for other network users. As such, IP multicast can be dangerous where network bandwidth is limited or where there is a risk of oversubscription. Similarly, IP multicast allows anyone to publish to a group, leaving consumers responsible for filtering out the data they are not interested in. This can result in excessive bandwidth use, because unwanted data is still transmitted through the network only to be discarded by the consumers.
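The oversubscription risk from independently built trees can be shown by tallying the load each tree places on every link. No single router in a distributed protocol computes this tally. The `oversubscribed` function, the camera stream names, the 6 Mb/s bitrates and the link capacities below are hypothetical.

```python
from collections import Counter


def oversubscribed(trees, capacity_mbps):
    """Sum the load that independently built trees place on each link.

    trees: {stream_name: (bitrate_mbps, set_of_links)}, each tree built without
    regard for the others. Returns {link: load_mbps} for links over capacity.
    """
    load = Counter()
    for bitrate, links in trees.values():
        for link in links:
            load[link] += bitrate
    return {link: mbps for link, mbps in load.items() if mbps > capacity_mbps[link]}


# Hypothetical links (router pairs) and their capacities in Mb/s.
capacity = {("R1", "R2"): 10.0, ("R2", "R3"): 100.0, ("R1", "R4"): 100.0}
trees = {
    "camera-a": (6.0, {("R1", "R2"), ("R2", "R3")}),
    "camera-b": (6.0, {("R1", "R2")}),
    "camera-c": (6.0, {("R1", "R4")}),
}
print(oversubscribed(trees, capacity))  # {('R1', 'R2'): 12.0}
```

Each tree on its own fits within every link's capacity. Together, the two trees that share the 10 Mb/s link R1–R2 exceed it.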

In document Data Movement Challenges and Solutions with Software Defined Networking (Page 127-130)