
4.5 Case Studies

4.6.2 Simulations for 360-Degree Video Streaming

In this subsection, we evaluate the performance improvement of our MF-HTTP middleware for the 360° video streaming case study. We obtain three test videos from YouTube⁵ at four resolutions: 1080s, 720s, 480s, and 360s ("s" stands for spherical). We recruit 10 volunteers to watch each video on a Nexus 6 phone, and we modify the 360° video player to record user touches during playback. Each viewing session lasts 1 minute. To support tile-based DASH streaming, we use the GPAC⁶ toolbox to slice and package the 360° videos into 4 × 4 tiles. We then segment the encoded tile-based videos into segments of 1 second each and generate the corresponding MPD files, after which the content is ready to be streamed over DASH. The viewport movement and the resulting tile and rate selection are generated by MF-HTTP based on the collected traces of user touches.

⁵ YouTube IDs of the three test videos: -xNN-bJQ4vI, rG4jSz_2HDY, wXeKxY3F0sE.

⁶ https://gpac.wp.imt.fr/home/

Figure 4.11: Bandwidth consumption with fixed resolution (vertical axis: average data rate in KB; series: MF-HTTP, Fixed resolution)

Figure 4.12: A sample trace of one video watching session (series: MF-HTTP (1080s), Fixed 1080s, Num of streamed tiles)

Results

We first check the bandwidth consumption of MF-HTTP at different resolutions. As shown in Fig. 4.11, MF-HTTP significantly reduces the bandwidth consumption at each resolution (average bandwidth savings of 52% at 360s, 59% at 480s, 60% at 720s, and 56% at 1080s) compared to the baseline approach, which streams the whole frame at a fixed resolution without considering the viewport. The result suggests that, at the same video quality, MF-HTTP transmits far less data than blind downloading. We further plot a sample trace of one video watching session in Fig. 4.12, which shows that MF-HTTP does not necessarily share network load peaks with the baseline streaming approach. On the other hand, the bandwidth consumption of MF-HTTP closely follows the number of tiles that appear in or overlap the viewport, as the valleys of the MF-HTTP curve and the streamed-tile curve coincide in Fig. 4.12.
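To make the link between viewport position and the number of streamed tiles concrete, the following is a minimal sketch of how the tiles overlapping a viewport could be counted on a 4 × 4 grid. It is not the MF-HTTP implementation. It assumes an equirectangular frame, approximates the viewport as a rectangle in yaw/pitch space (ignoring spherical distortion), and uses hypothetical field-of-view values of 100° × 90°; the function name `tiles_in_viewport` is likewise an illustration only.

```python
import math


def tiles_in_viewport(yaw, pitch, fov_h=100.0, fov_v=90.0, cols=4, rows=4):
    """Return the set of (row, col) tiles overlapping a rectangular viewport.

    yaw is in degrees and wraps every 360; pitch is in [-90, 90].
    fov_h and fov_v are assumed values, not taken from the text.
    Row 0 is the top of the frame, column 0 starts at yaw = -180.
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows

    # Horizontal extent, shifted so that 0 is the left edge of the frame.
    left = yaw - fov_h / 2.0 + 180.0
    right = yaw + fov_h / 2.0 + 180.0
    first_col = math.floor(left / tile_w)
    last_col = math.ceil(right / tile_w) - 1
    # The modulo handles viewports that wrap around the left/right seam.
    col_set = {c % cols for c in range(first_col, last_col + 1)}

    # Vertical extent, clamped at the poles and measured from the top edge.
    top = min(90.0, pitch + fov_v / 2.0)
    bottom = max(-90.0, pitch - fov_v / 2.0)
    first_row = max(0, math.floor((90.0 - top) / tile_h))
    last_row = min(rows - 1, math.ceil((90.0 - bottom) / tile_h) - 1)

    return {(r, c) for r in range(first_row, last_row + 1) for c in col_set}


if __name__ == "__main__":
    for yaw in (0.0, 45.0, 170.0):
        tiles = tiles_in_viewport(yaw, 0.0)
        print(f"yaw={yaw:6.1f}: {len(tiles)} of 16 tiles -> {sorted(tiles)}")
```

Under these assumptions, a viewport aligned with tile boundaries touches fewer tiles than one that straddles them, which is one plausible reason the per-second data rate of a tile-based scheme fluctuates with viewport position.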

Figure 4.13: Video quality composition with different bandwidth (Videos 1 to 3 from left to right; each panel shows the percentage of playback time, from 0 to 100%, spent at 1080s, 720s, 480s, 360s, and NA)

We next vary the available bandwidth from 250 KB/s to 1000 KB/s to examine the streaming quality of MF-HTTP, and compare its performance with a greedy DASH scheme that maximizes bandwidth usage and streams at the highest possible resolution. Fig. 4.13 shows the percentage of playback time that the test videos spend at each resolution under the two streaming approaches, where "NA" denotes that the bandwidth is insufficient for any of the given resolutions. As shown, MF-HTTP consistently outperforms the greedy DASH scheme under all bandwidth conditions for all test videos. MF-HTTP maintains good video quality when the bandwidth is low, and it responds quickly when the bandwidth increases. This result suggests that MF-HTTP uses the network resource more efficiently by concentrating it on downloading high-quality video segments in the viewport.
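The difference between the two schemes can be illustrated with a simplified sketch. The code below is not the MF-HTTP rate adaptation algorithm. It assumes a hypothetical bitrate ladder (the `FULL_FRAME_RATE` values are invented for illustration, not measured) and assumes that all tiles are equally expensive, so that streaming k of 16 tiles costs k/16 of the full-frame rate. A greedy full-frame scheme corresponds to `visible_tiles=16`.

```python
# Hypothetical full-frame bitrates in KB/s; not measurements from the text.
FULL_FRAME_RATE = {"360s": 150, "480s": 300, "720s": 600, "1080s": 1200}
LADDER = ["1080s", "720s", "480s", "360s"]  # highest resolution first


def pick_resolution(bandwidth, visible_tiles=16, total_tiles=16):
    """Return the highest resolution whose required rate fits the bandwidth.

    With visible_tiles == total_tiles this behaves like a greedy scheme
    that streams the whole frame; with fewer tiles it models a
    viewport-aware scheme under the equal-tile-cost assumption.
    Returns "NA" when even the lowest resolution does not fit.
    """
    share = visible_tiles / total_tiles
    for res in LADDER:
        if FULL_FRAME_RATE[res] * share <= bandwidth:
            return res
    return "NA"


if __name__ == "__main__":
    for bw in (100, 250, 500, 1000):
        greedy = pick_resolution(bw)
        aware = pick_resolution(bw, visible_tiles=6)
        print(f"{bw:5d} KB/s  full frame: {greedy:6s} viewport tiles only: {aware}")
```

With these invented numbers, the viewport-aware choice is never lower than the full-frame choice at the same bandwidth, which mirrors the qualitative trend reported above; the actual margins depend on the real bitrates and tile sizes.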

Chapter 5

Viewport-Aware Adaptive 360-Degree Video Streaming

Numerous novel multimedia services and content types are emerging on mobile platforms. In this chapter, we present the design and evaluation of a customized enhancement for streaming a new and important content type, 360° video.

5.1 Background

360° video, which provides panoramic views to give users an immersive experience, is becoming popular on major video sharing websites and social media channels such as YouTube and Facebook. 360° videos are shot using omnidirectional cameras and thus consist of panoramic frames. When watching a 360° video, the user views only a limited portion of the whole spherical image, which is determined by the user's Field-of-View (FoV) on the smartphone or the head-mounted display (HMD). During playback, the FoV follows the user's head motions or other navigation interactions.

Streaming 360° videos is challenging. First, due to their panoramic nature, 360° videos are much larger (4x to 6x) than conventional videos at the same perceived quality [77]. Transmitting a 360° video therefore consumes much more bandwidth than a regular video, and bandwidth can be scarce, especially in wireless networks. Second, compared to regular videos, streaming 360° videos introduces higher computation and energy overhead for mobile devices [50], which have limited CPU, GPU, storage, and battery capacities.

Existing off-the-shelf 360° video streaming systems (e.g., YouTube and Oculus) stream the entire 360° frames to the client [139], directly employing the conventional approach for regular videos. Streaming all the pixels of a 360° video is wasteful since the user only has a limited FoV. To this end, tile-based streaming [118] has been proposed for 360° videos. The general idea is to divide each panoramic frame of the 360° video into smaller non-overlapping rectangular regions called tiles. As each tile is independently decodable, the client can request only the tiles that are expected to be in the user's FoV. Therefore, viewport prediction is a crucial component of an adaptive 360° video streaming system: it is the basis of the rate adaptation algorithm and thus significantly affects the user's Quality-of-Experience (QoE).

Dataset            # of users   # of videos   Sampling freq.
1. UNantes [25]    57           19            5 Hz
2. THU [115]       48           18            90 Hz
3. NTHU [62]       50           10            30 Hz

Table 5.1: Datasets used in our analysis

Viewport prediction algorithms have been widely studied in existing works and can be classified into two main categories: user-based algorithms and content-based algorithms. User-based algorithms take historical user behaviors (e.g., head movements) as hints to predict future user behaviors. Linear regression has been used in many previous studies [77, 38, 76] and has proved to perform relatively well when predicting the near future. Probabilistic models have been built to fit the linear regression prediction error [118] and to capture the cross-user interests across different tiles [119]. Content-based algorithms are proposed by the computer vision (CV) community and apply traditional saliency or object detection algorithms to 360° videos to find the user's Region-of-Interest (ROI) in the content [45, 122]. However, this class of algorithms usually employs complicated learning structures (e.g., deep neural networks) and thus introduces significant computation demands, which makes it unsuitable for real-time rate adaptation on mobile clients.
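As an illustration of the user-based approach, the sketch below fits a least-squares line to recent yaw samples and extrapolates it a short horizon into the future. It is a generic linear-regression predictor under stated assumptions, not the method of any cited work: the function name `predict_yaw`, the sampling times, and the horizon are hypothetical, and only the yaw angle is handled (pitch could be treated the same way without the wrap-around step).

```python
def predict_yaw(times, yaws, horizon):
    """Predict the yaw angle `horizon` seconds after the last sample.

    times: sample timestamps in seconds (at least one sample required).
    yaws:  yaw angles in degrees, in [-180, 180).
    Fits yaw = slope * t + intercept by ordinary least squares.
    """
    # Unwrap the angles so that moving from 179 to -179 counts as +2
    # degrees rather than -358.
    unwrapped = [yaws[0]]
    for y in yaws[1:]:
        delta = (y - unwrapped[-1] + 180.0) % 360.0 - 180.0
        unwrapped.append(unwrapped[-1] + delta)

    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(unwrapped) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        slope = 0.0  # a single sample (or identical timestamps): hold position
    else:
        cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, unwrapped))
        slope = cov / var_t

    predicted = mean_y + slope * (times[-1] + horizon - mean_t)
    # Wrap the result back into [-180, 180).
    return (predicted + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    t = [0.0, 0.1, 0.2, 0.3]
    print(predict_yaw(t, [0.0, 2.0, 4.0, 6.0], horizon=0.5))
    print(predict_yaw(t, [176.0, 178.0, -180.0, -178.0], horizon=0.5))
```

Because the fit is a straight line, the prediction error grows with the horizon whenever the head motion changes direction, which is consistent with the observation that such predictors work best for the near future.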
