2.3 Video as an Emerging Application
2.3.1 Protocols for Video Delivery
2.3.1.6 HTTP Adaptive Streaming
Despite the relative success of HTTP as a technology, there were other challenges that simply could not be overcome by HTTP alone. As networks expanded and bandwidth became more available, a disparity appeared between the connections of different users. Many Internet Service Providers offer tiered services, with different levels of throughput guaranteed at different price points. Similarly, some of these tariffs enforce a limit or cap on the amount of traffic a household can consume over a billing period. Geographically, not all countries moved at the same technological pace either, making it difficult for services to cater for a wide range of connection types and speeds. The explosion of data-based cellular networks only exacerbated the situation, leading to an even greater spectrum of network capabilities.
Despite the ubiquity of the Internet, the capacity of the networks which underpin it is not an infinite resource: with an increase in popularity came a need to further increase the provision of resources. Unfortunately, this is a relatively time- and capital-intensive process, especially when considered in the scope of a nationwide network. These national networks are often connected together, with connections sometimes spanning continents. Clearly this requires vast amounts of coordination, planning and foresight to expand and upgrade. Yet despite this ever-increasing technology provision, it is inevitable that congestion will sometimes occur during busy periods in the path between a client and the service they are requesting. This congestion often results in a loss of observed throughput and, in the context of video, a potential reduction in the quality of video a stream can carry.
As alluded to previously, mobile networks have also seen a huge explosion in popularity. In particular, data services capable of carrying video have become an affordable reality for many. A mobile context carries its own challenges though, as the physical movement of a client can lead to a fluctuation in service strength, which consequently impacts usable bandwidth. Despite this, as transmission technologies have evolved, there has been a notable trend towards increased capacity in these cellular networks. This trend is not set to change with the development of future networks [167], which should enable more users to access even higher quality videos whilst on the move.
The suitability of mobile networks has also led to an increase in the specification of the mobile devices themselves. As these handsets advanced to a level capable of video playback, new video codecs were developed in order to allow a relatively resource constrained device to play back video. Content providers now needed to not only encode a video in multiple quality levels, but also with multiple codecs. However, the capability of at least some of these devices has now progressed so far that they can decode videos that were previously limited to personal computers. Regardless of this innovation, these devices are often still resolution constrained due to their very nature as hand-held devices.
It is clear that content creators need to offer various levels of quality, resolutions and encodes in order to match the diversity in both networks and devices. This situation is not fixed either; despite the increase in available bandwidth, the impairments described previously result in fluctuations in bandwidth for an end-host. This can vary from day to day, and even from minute to minute in the case of a moving client in a mobile network. Clearly a blanket approach of delivering a single quality level is no longer appropriate. Adaptive video streaming aims to combat the unpredictable nature of modern networks by enabling a client to dynamically adjust the quality of video by requesting the representation that best matches its own available bandwidth. The premise behind this is that user experience is maximised: a client will always request the maximum video quality possible given the resources it has at its disposal.
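The adaptation principle described above can be illustrated with a minimal sketch. It assumes a hypothetical bitrate ladder and a list of per-segment bandwidth measurements (both invented for illustration), and applies the simplest possible rule: pick the highest bitrate that fits, falling back to the lowest when nothing does. Real players typically use more elaborate estimation and buffer-aware logic; this is not the algorithm of any particular client.

```python
from typing import Sequence


def pick_bitrate(ladder_kbps: Sequence[int], available_kbps: float) -> int:
    """Return the highest bitrate that fits within the available bandwidth.

    Falls back to the lowest bitrate when nothing fits, so that playback
    can continue at reduced quality rather than stop.
    """
    fitting = [rate for rate in ladder_kbps if rate <= available_kbps]
    return max(fitting) if fitting else min(ladder_kbps)


# Hypothetical bitrate ladder and bandwidth observed before each segment (kbit/s).
LADDER_KBPS = [400, 1200, 2500, 5000]
measured_kbps = [6000, 3000, 900, 300, 2600]

# One decision per segment: quality follows the fluctuating bandwidth.
choices = [pick_bitrate(LADDER_KBPS, bw) for bw in measured_kbps]
print(choices)
```

The decision is made independently for every segment, which is what allows the quality to drop during congestion and recover afterwards.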
With a huge number of content variations to be provided, storage requirements are vastly increased for providers. Rather than storing a single copy of the content, many versions need to be available to handle all possible requests. Given that HTTP is used to deliver thousands of files every day, it became a natural choice for serving the many variants found in an adaptive representation, and thus a new set of HTTP Adaptive Streaming (HAS) based technologies came into being. These can be broadly categorised by the fact that they rely on HTTP for their transport mechanism. Proprietary commercial solutions (such as Apple HLS [3] and Microsoft Smooth Streaming [16]) are complemented by open and standardised techniques (such as MPEG-DASH [159]). In the case of the latter, content of different qualities and encodings is chunked into smaller, often fixed-length (in terms of playback), segments. These segments are then grouped together to form a single representation (an entire video, from start to finish). Alternate representations are collected together in the same manifest, which is used by the player to enable playback. If the player wishes to change representation during playback, this process is as simple as requesting an alternate representation from the manifest.
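The relationship between segments, representations and the manifest can be sketched as a small data model. This is a deliberately simplified illustration and not the MPEG-DASH manifest format itself: the class names, the `build_manifest` helper and the URL pattern are all hypothetical, and it assumes that segment boundaries are aligned across representations so that segment `i` covers the same playback interval in each.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Representation:
    """One complete encoding of the video, split into ordered segments."""
    rep_id: str
    segment_urls: List[str]  # one URL per segment, in playback order


@dataclass
class Manifest:
    """Collects the alternate representations of the same content."""
    representations: Dict[str, Representation]

    def segment_url(self, rep_id: str, index: int) -> str:
        # Switching quality is just a lookup in a different representation.
        return self.representations[rep_id].segment_urls[index]


def build_manifest(rep_ids: List[str], segment_count: int) -> Manifest:
    """Build a toy manifest using a hypothetical URL naming scheme."""
    representations = {
        rep_id: Representation(
            rep_id,
            [f"video/{rep_id}/seg{i}.m4s" for i in range(segment_count)],
        )
        for rep_id in rep_ids
    }
    return Manifest(representations)


manifest = build_manifest(["low", "high"], segment_count=4)

# Play the first two segments at low quality, then switch to high.
plan = ["low", "low", "high", "high"]
requests = [manifest.segment_url(rep_id, i) for i, rep_id in enumerate(plan)]
print(requests)
```

Because every representation exposes the same segment indices, a switch mid-playback only changes which list the next URL is taken from.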
The manifest also includes annotation and metadata for each of the representations. This includes information necessary for the client to determine the most appropriate representation given its own capabilities (decoding capability and resolution) and those of the connected networks (required throughput). This ability gives the playback client the adaptability required to maximise the experience of a user in a constantly shifting environment.
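As a sketch of how a client might use this metadata, the example below filters representations by the three properties mentioned above: codec support, display resolution and required throughput. The field names, codec labels and figures are assumptions made for illustration and are not taken from any specific manifest schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class RepresentationInfo:
    """Per-representation metadata (hypothetical field names)."""
    rep_id: str
    codec: str
    width: int
    height: int
    bandwidth_bps: int  # throughput required to stream this representation


def select_representation(
    reps: Iterable[RepresentationInfo],
    supported_codecs: Set[str],
    max_width: int,
    max_height: int,
    throughput_bps: int,
) -> Optional[RepresentationInfo]:
    """Pick the most demanding representation the device and network can handle.

    Returns None when no representation satisfies every constraint.
    """
    candidates = [
        r for r in reps
        if r.codec in supported_codecs          # decoding capability
        and r.width <= max_width                # display resolution
        and r.height <= max_height
        and r.bandwidth_bps <= throughput_bps   # network capability
    ]
    return max(candidates, key=lambda r: r.bandwidth_bps, default=None)


# Hypothetical metadata for four representations of the same video.
REPS = [
    RepresentationInfo("a", "avc", 640, 360, 800_000),
    RepresentationInfo("b", "avc", 1280, 720, 2_500_000),
    RepresentationInfo("c", "hevc", 1920, 1080, 4_000_000),
    RepresentationInfo("d", "avc", 1920, 1080, 6_000_000),
]

# A handset that decodes only "avc", has a 1280x720 screen and sees 5 Mbit/s.
chosen = select_representation(REPS, {"avc"}, 1280, 720, 5_000_000)
print(chosen.rep_id if chosen else None)
```

Here the network could carry representation "c", but the device cannot decode or display it, so the device constraints dominate the choice.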
Importantly, content can be organised and described in two main ways. Firstly, content can be segmented on a file-system level, with a number of different smaller files, each of which corresponds directly to an individual segment. Together, these segments represent an entire video. In this context, each chunk is individually playable, affording the player the flexibility to freely swap between representations without the need to download header information. In the second method, segments can also be represented as a byte-range of a much larger file (much the same as the HTTP progressive download technique described previously in Section 2.3.1.5). By downloading a particular byte-range, the client can reassemble the file necessary for playback. This method also requires an initial set of headers to be downloaded, often before playback starts. Without the headers, the playback element will not be able to correctly recognise and process the content, as the header contains the information and structure necessary to do so.
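The difference between the two addressing methods is visible in the HTTP requests a client would issue. The sketch below only constructs the requests and does not send them; the host name, path layout and byte offsets are hypothetical, and in practice the offsets would come from the manifest or a segment index rather than being hard-coded.

```python
from urllib.request import Request


def file_segment_request(base_url: str, rep_id: str, index: int) -> Request:
    """Method 1: each segment is its own file, addressed by its own URL."""
    return Request(f"{base_url}/{rep_id}/seg{index}.m4s")


def byte_range_request(file_url: str, first_byte: int, last_byte: int) -> Request:
    """Method 2: a segment is a byte-range within one larger file.

    HTTP byte ranges are inclusive at both ends.
    """
    request = Request(file_url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical locations and offsets, for illustration only.
by_file = file_segment_request("https://example.com/video", "high", 3)
header_part = byte_range_request("https://example.com/video/high.mp4", 0, 999)
by_range = byte_range_request("https://example.com/video/high.mp4", 1000, 1999)

print(by_file.full_url)
print(header_part.get_header("Range"))
print(by_range.get_header("Range"))
```

In the byte-range case the client would fetch the header portion of the file first (here assumed to occupy the first 1000 bytes), since the media ranges that follow cannot be decoded without it.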