To conclude, the topic of video requests is vast, especially once the technologies that can improve network performance for VoD are brought into focus. To limit the scope of this Thesis to an achievable goal, the main interests and conclusions drawn from the brief introduction of technologies above are as follows:
1. Variable size segmentation cannot be combined with caches without a method of appropriately weighing the historic request frequency of items, which rules out the simple and widely used cache eviction algorithms such as LRU and FIFO. For this reason the “Fixed” segmentation policy will be applied throughout this Thesis, so that a wider array of simple cache eviction algorithms can be used. Any cache-specific research involving variable length segmentation, such as pyramid segmentation and skyscraper segmentation, will be considered out of scope for this Thesis.
2. Video popularity distributions are a point of discussion in the research community, specifically whether Zipf or Zipf-Mandelbrot is the superior distribution for modelling video popularity data. This Thesis treats video popularity distributions as a primary focus of study, with special attention to the discussion surrounding video request data and the methods used to determine which model most closely replicates real VoD data. Additional candidate distributions, such as the log-normal distribution, will be considered outside the scope of this Thesis.
3. In addition to video popularity, other characteristics of video will be discussed, but not analysed in the same detail as the video popularity models, due to the lack of available data for such analysis. However, data from the limited existing sources will be combined in an effort to provide a pseudo-realistic request simulation environment, so that the simulated request data encompasses as many of the known variable characteristics as possible. This work will require leaps of judgement, owing to the restricted availability of details on the characteristics involved.
4. Caching has been brought forward as a possible method of reducing the strain that video delivery places on a network. Caching, though possible, is restricted in present-day IP infrastructures, creating the need for alternative infrastructures that enable and support a cache-enabled network. Any-cast protocols such as CCN and ICN provide such an environment. For this reason these technologies will be brought into focus in this Thesis, with a primary focus on cache implementation and the development of cache eviction algorithms. A simulation environment for an ICN cache-enabled network is available (Icarus [43]) and will be utilised throughout this Thesis to develop and test novel cache eviction algorithms.
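Point 1 can be illustrated with a minimal sketch, which is not the implementation used in this Thesis or in Icarus. It assumes a hypothetical segment length of 10 seconds and identifies each cache item by a (video_id, segment_index) pair; the names SEGMENT_SECONDS, segment_index and LRUSegmentCache are illustrative only. Because every fixed segment has the same size, the cache capacity can be counted in items and plain LRU eviction applies without any weighting of items.

```python
from collections import OrderedDict

SEGMENT_SECONDS = 10  # hypothetical fixed segment length, not taken from the Thesis


def segment_index(playback_second, segment_seconds=SEGMENT_SECONDS):
    """Map a playback position (in seconds) to its fixed-size segment index."""
    return int(playback_second // segment_seconds)


class LRUSegmentCache:
    """LRU cache whose items are equally sized (video_id, segment_index) keys."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # number of segments the cache can hold
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def request(self, video_id, index):
        """Return True on a cache hit; on a miss, cache the segment and return False."""
        key = (video_id, index)
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._items[key] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used segment
        return False
```

With variable size segments the same structure would need a size-aware admission and eviction rule, which is the complication that the fixed policy avoids.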
3 Video Popularity Distribution Analysis
Global IP video traffic is predicted to constitute 82% of all consumer internet traffic by 2020, up from 70% in 2015 [6]. Innovations that reduce the cost of video streaming are therefore necessary and are being developed. Examples include distributed server farms and video compression techniques, most of which require simulation to estimate how effective a specific solution is likely to be. Simulation input such as total video request distributions may be accessible to some VoD hosting platforms, but this is not so for most. Identifying the key characteristics of such data will allow more people to simulate and reconstruct video request behaviour, creating further opportunity for innovation. The goal of this Chapter is to identify the video request distribution of a VoD system for the purpose of reconstruction. To achieve this, two models are analysed to determine which model, with which parameters, most closely recreates the empirical dataset.
3.1 Introduction
Accurate modelling of video consumption patterns is a key factor in the efficient utilisation of network resources. However, the lack of publicly available consumption data has meant that video consumption models have not been put under scrutiny and have instead been assumed to follow the same model, namely Zipf, as many other Internet services, such as the World Wide Web, news feeds and email [26, 52]. Existing research has provided extensive testing that shows Zipf-like distributions to be sufficiently accurate for modelling such services [26, 52, 53]. However, the same cannot be said for video consumption models. In fact, various observations have shown that Zipf-like distributions may not be the best model for describing the popularity distribution of video consumption patterns, and that Zipf-Mandelbrot may be a better fit.
The danger of simply assuming a Zipf-like distribution for video consumption, without supporting evidence, is that the resultant model may not be sufficiently representative of user demands. Assuming a badly fitting model may lead to sub-optimal design choices in areas such as service planning and provisioning, utilisation of network resources, and accommodation of Service Level Agreements (SLAs). A video caching system is one example where accurate video consumption models have a significant impact on network performance [14, 54, 55]. Such systems need to accommodate different types of video service, such as VoD, live streaming and UGC, which in some cases follow vastly different distribution models.
VoD has been argued to follow a Zipf-like consumption pattern [9, 10, 12]. However, the observed distribution of data shows a curvature on the log-log scale, which suggests a possible alternative model, namely Zipf-Mandelbrot [56]. A similar observation can be made for UGC consumption patterns [17]. This deviation may increase in the future Internet, with the rapid growth of user demands, the widening variety of video services on offer and the expected innovation in Internet architectures, such as ICN [42, 50, 57].
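The curvature mentioned above can be made concrete with a short sketch. It uses the standard textbook forms, in which Zipf assigns rank k a probability proportional to k^(-alpha) and Zipf-Mandelbrot a probability proportional to (k + q)^(-alpha). The catalogue size, alpha and q below are illustrative assumptions and are not values fitted in this Thesis; the function names are likewise hypothetical.

```python
import math


def zipf_mandelbrot_pmf(n_items, alpha, q=0.0):
    """Probabilities for ranks 1..n_items with p(k) proportional to (k + q)**-alpha.

    With q == 0 this reduces to the plain Zipf distribution.
    """
    weights = [(k + q) ** -alpha for k in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_log_slope(pmf, rank_a, rank_b):
    """Slope of the line through two ranks on a log-log plot of the pmf."""
    rise = math.log(pmf[rank_b - 1]) - math.log(pmf[rank_a - 1])
    run = math.log(rank_b) - math.log(rank_a)
    return rise / run


# Illustrative parameters only (assumptions, not fitted values).
zipf = zipf_mandelbrot_pmf(10_000, alpha=0.8)
zipf_m = zipf_mandelbrot_pmf(10_000, alpha=0.8, q=50.0)

# Zipf is a straight line on the log-log scale: head and tail slopes both equal -alpha.
# Zipf-Mandelbrot is flatter at the head and approaches -alpha only in the tail.
head, tail = (1, 10), (1_000, 10_000)
print("Zipf slopes:", log_log_slope(zipf, *head), log_log_slope(zipf, *tail))
print("Zipf-Mandelbrot slopes:", log_log_slope(zipf_m, *head), log_log_slope(zipf_m, *tail))
```

The flattened head produced by q > 0 is the kind of deviation from a straight line that a pure Zipf model cannot reproduce.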
This Chapter addresses the shortage of evaluation of video-focused consumption models. While there have been hints that Zipf-Mandelbrot is a good model, this work shows, through empirical data, that consumption patterns do indeed conform to this distribution. This is achieved by analysing a large-scale empirical dataset, provided by BT, as well as synthetically generated consumption data following the Zipf and Zipf-Mandelbrot distributions. The analysis uses standard testing methods such as the Pearson chi-square test, Pearson's correlation coefficient and KL divergence. Our study demonstrates that Zipf-Mandelbrot fits realistic consumption data better than the previously widely used Zipf-like distribution. Furthermore, this study shows that the expected behaviour of cache-enabled video delivery systems is closer to that of the empirical dataset when using the Zipf-Mandelbrot model than when using the Zipf-like model.
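The three measures named above can be sketched in a few lines using their standard definitions. How this Chapter applies them in detail (for example any binning, rank truncation or log scaling) is not stated at this point, so the function compare_fit and its inputs are assumptions made for illustration: it takes per-rank request counts and a candidate model's probabilities over the same ranks.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E; expected counts must be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def pearson_correlation(xs, ys):
    """Pearson's correlation coefficient; undefined if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p_i == 0 contribute zero, q_i must be positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compare_fit(observed_counts, model_pmf):
    """Score a candidate model (hypothetical helper) against per-rank request counts."""
    total = sum(observed_counts)
    expected = [total * p for p in model_pmf]
    empirical = [c / total for c in observed_counts]
    return {
        "chi_square": pearson_chi_square(observed_counts, expected),
        "correlation": pearson_correlation(observed_counts, expected),
        "kl_divergence": kl_divergence(empirical, model_pmf),
    }
```

Under these definitions a better-fitting model yields a lower chi-square statistic, a correlation closer to 1 and a KL divergence closer to 0, so the Zipf and Zipf-Mandelbrot candidates can be ranked by calling compare_fit once for each.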