Enhancing mobile multimedia services


Enhancing Mobile Multimedia Services

by

Lei Zhang

M.Sc., Simon Fraser University, 2013

B.Eng., Huazhong University of Science and Technology, 2011

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy in the

School of Computing Science Faculty of Applied Science

© Lei Zhang 2019

SIMON FRASER UNIVERSITY Summer 2019

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.


Approval

Name: Lei Zhang

Degree: Doctor of Philosophy (Computing Science)

Title: Enhancing Mobile Multimedia Services

Examining Committee:

Chair: Ouldooz Baghban Karimi, Lecturer

Jiangchuan Liu, Senior Supervisor, Professor

Qianping Gu, Supervisor, Professor

Ping Tan, Internal Examiner, Associate Professor

Shervin Shirmohammadi, External Examiner, Professor, School of Electrical Engineering and Computer Science, The University of Ottawa


Abstract

The deep penetration of mobile smart devices, e.g., smartphones, tablets, and wearable devices, has significantly enriched the multimedia services on the Internet and undoubtedly reshaped the way that users access them. The unique user interactions and the featured user interfaces of mobile smart devices have brought both opportunities and challenges to delivering today’s multimedia services. In this thesis, we present our studies on understanding and enhancing multimedia services on mobile platforms. First, we present a measurement and enhancement study on instant video clip sharing, an emerging mainstream multimedia service. We systematically investigate its distinct mobile interface, service framework, and user watching behaviors, revealing how this service type differs from its traditional counterparts, based on which we formulate and solve an optimization problem to maximize the viewing experience as well as the cost efficiency. Second, by utilizing the touch screen user interactions on mobile devices, we design and build a mobile-friendly HTTP middleware, which interprets user input gestures and optimizes the HTTP download of media contents for dynamic-viewport mobile applications, a class of mobile Internet applications that download contents beyond the user’s viewing region. Further, we jointly consider user behaviors and application features to optimize the streaming of 360-degree videos, a novel type of multimedia content. We propose a viewport-aware adaptive 360-degree video streaming framework with robust viewport prediction and QoE-based rate adaptation optimization. Finally, we attempt to improve mobile multimedia services by leveraging cloud resources. We closely examine the performance and energy efficiency of offloading realtime video applications to the cloud, propose a scheduling algorithm that makes adaptive offloading decisions at a fine granularity in dynamic wireless network conditions, and verify its effectiveness through real-world case studies with advanced mobile platforms and practical applications.


Dedication

This study is wholeheartedly dedicated to my beloved parents, who have been my source of inspiration and strength, who continually provide their moral, spiritual, emotional, and financial support.


Acknowledgements

First, I would like to thank my senior supervisor, Dr. Jiangchuan Liu. His attitude towards research has greatly influenced me. His insights and suggestions on my research have been invaluable for the success of my Ph.D. studies. His significant enlightenment, guidance and encouragement have made me not only a better researcher but also a better man.

Second, I want to thank my supervisor, Dr. Qianping Gu, the internal examiner, Dr. Ping Tan, and the external examiner, Dr. Shervin Shirmohammadi, for reviewing this thesis and providing helpful comments that helped me to improve its quality. I also want to thank Dr. Ouldooz Baghban Karimi for taking the time to chair my thesis defense.

Many colleagues and friends of mine provided help in my studies and my everyday life. The time with them is unforgettable. I am deeply indebted to them.

Finally, I want to thank my family for their love and care. None of this would have happened without your support, my parents. I love you all!


List of Tables

Table 4.1 Comparison and proportions of two-way traffic

Table 5.1 Datasets used in our analysis

Table 5.2 Pearson correlation coefficient between prediction error and other factors

Table 5.3 Normalized Rebuffering Time

Table 6.1 Delay tolerance in online gaming

Table 6.2 Average number of frames updated before the response arrived

Table 6.3 Power Model

Table 6.4 Energy efficiency of the optimal offloading scheduling and the proposed approach with different bandwidth probing methods

Table 6.5 Comparison of video platforms and our generic model (EA-MRDA)


List of Figures

Figure 3.1 Typical main interface of mobile instant video sharing

Figure 3.2 Service framework of mobile instant video clip sharing (Vine as an example)

Figure 3.3 Video popularity

Figure 3.4 Video lifespan and propagation

Figure 3.5 An illustration of playlist optimization

Figure 3.6 Popularity distribution

Figure 3.7 Average video uploading rate

Figure 3.8 Density of inter-arrival time of user gestures

Figure 3.9 The initial scrolling speed of flings

Figure 3.10 Impacts of p, q and r

Figure 3.11 Impact of α

Figure 3.12 Impact of downloading bandwidth when p/q = 1.5, p/r = 1.5

Figure 3.13 Impact of downloading bandwidth when p/q = 3.5, p/r = 3.5

Figure 4.1 An example of dynamic-viewport mobile application

Figure 4.2 360-degree video watching application

Figure 4.3 Top 50 websites’ viewport distribution

Figure 4.4 CDF of normalized viewport size for the top websites with dynamic viewports

Figure 4.5 An example of screen scrolling process

Figure 4.6 Middleware architecture

Figure 4.7 Normalized viewport size

Figure 4.8 Viewport load time

Figure 4.9 Screenshots of two browsing sessions with the same timestamp

Figure 4.10 Amount of traffic

Figure 4.11 Bandwidth consumption with fixed resolution

Figure 4.12 A sample trace of one video watching session

Figure 4.13 Video quality constitutions with different bandwidth (Video 1 to 3 from left to right)

Figure 5.1 Recall vs Sampling Frequency

Figure 5.3 Recall vs Tile Setting

Figure 5.4 Precision vs Tile Setting

Figure 5.5 CDF of Normalized Prediction Error

Figure 5.6 CDF of the Ratio of Angular Speed/Longitude Error

Figure 5.7 CDF of Angular Speed

Figure 5.8 Clustering Longitude Traces

Figure 5.9 Probability of Head Turn

Figure 5.10 Tile Classes

Figure 5.11 System Architecture

Figure 5.12 Recall at 5Hz

Figure 5.13 Precision at 5Hz

Figure 5.14 Recall at 2Hz

Figure 5.15 Precision at 2Hz

Figure 5.16 Average Quality Level

Figure 5.17 Maximal Quality Gap

Figure 5.18 Standard Deviation of Quality Level

Figure 5.19 Actual Consumed Bandwidth

Figure 6.1 A snapshot of the test devices

Figure 6.2 Impact of packet loss

Figure 6.3 Comparison of power consumption

Figure 6.4 Workflow of video-based MRDA

Figure 6.5 Video-based MRDA with baseline and high profiles, and adaptive profile switching

Figure 6.6 Remote accessing CAD 2D and 3D tasks in video-based approach

Figure 6.7 Video-based vs. primitive-based, and adaptive switching between them


Chapter 1

Introduction

During the past decade, we have witnessed the pervasive penetration of mobile smart devices such as smartphones, tablets and wearable devices, which significantly enrich multimedia services and improve their user experience. In the foreseeable future, this growth of mobile smart devices will undoubtedly continue. Cisco has reported that, by 2021, nearly three-quarters of all devices connected to the mobile network will be smart devices, which will contribute more than four-fifths (86%) of mobile data traffic [21]. Along with the increasing availability of various mobile terminals, most multimedia services have been provided and optimized for access on mobile platforms. Moreover, mobile multimedia services are taking an ever larger share of the market. According to the same report, more than three-fourths of the world’s mobile data traffic will be video by 2021 [21]. For example, YouTube mobile gets over 1 billion views a day, making up more than half of YouTube views. Given the fast-growing population of mobile users, enhancing their quality-of-experience (QoE) is becoming increasingly important.

Compared to more powerful platforms such as desktop computers, mobile devices usually have far less capability in terms of computation, bandwidth, power supply, storage, etc. Despite the fast development of the technologies and the effort towards unifying hand-held and desktop computers (e.g., through Windows 8/10, iOS/MacOS, and Android/ChromeOS), it remains widely agreed that mobile terminals will not completely replace laptop and desktop computers in the near future. Migrating popular multimedia services to mobile platforms, or developing similar services, is constrained by the limited computation capability of mobile devices, as well as their unique and lightweight operating systems and hardware architectures. Further, even though the hardware and the mobile networks continue to evolve, energy is still a major impediment to providing reliable and sophisticated mobile multimedia services that meet user demands, as mobile devices continue to suffer from limited battery capacity. Batteries, the only power source for most mobile devices, have shown relatively slow technological improvement in the past decade, with average capacity growing only 5% annually [82].
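To put the 5% annual figure in perspective, the following back-of-the-envelope sketch compounds it over a decade. It assumes the rate applies uniformly every year, which is an illustrative simplification; [82] is cited here only for the average rate, and the function name is hypothetical.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Growth factor after compounding `annual_rate` for `years` years.

    Assumes a constant rate every year (an illustrative simplification).
    """
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    factor = compound_growth(0.05, 10)
    # 1.05 ** 10 is roughly 1.63, i.e. about a 63% capacity gain in a decade.
    print(f"Capacity after 10 years: {factor:.2f}x the starting capacity")
```

Under this assumption, a decade of battery progress yields well under a doubling of capacity, which is why energy remains a first-order constraint throughout this thesis.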


As mobile multimedia services are often characterized by their user interactions, application interfaces, hardware capabilities, etc., it is crucial to identify the key factors that influence their performance (e.g., QoE and cost efficiency). The research goal of this thesis is to improve the user experience and cost efficiency of mobile multimedia services. To achieve this goal, we start from a measurement-based study to gain knowledge about today’s representative mobile multimedia applications, and based on our observations we further conduct enhancement studies from different aspects, including utilizing user interactions, application features and cloud resources. The four works in this thesis are organized as follows. First, we take advantage of the unique user-touch screen interactions and the resulting screen scrolling effects on mobile devices to enhance the emerging mobile instant video sharing services. Second, we identify and exploit the opportunities from the typical application interfaces on mobile clients (e.g., limited and fixed size of screen and viewport) to improve dynamic-viewport applications, a class of mobile Internet applications that make HTTP downloads for media contents outside user viewports. Further, specifically for adaptive 360° video streaming, we investigate a QoE-based rate adaptation scheme built on robust viewport prediction, which jointly considers the user behavior and the application interface. Finally, besides focusing on the mobile features, we also seek assistance from cloud resources, offloading the computations of mobile multimedia services to the cloud to boost energy efficiency with a performance guarantee.

In Chapter 3, we present an initial study on mobile instant video clip sharing empowered by a combination of advanced mobile and cloud platforms. With Twitter’s Vine as a representative, we systematically investigate the distinct interface and service framework of this mainstream service type, and identify the unique viewing behaviors, including batch views and passive views. We develop a data collection engine to track the metadata of video clips and user accesses from Vine. Compared to early-generation videos, instant mobile video clips have much shorter lifespans and highly skewed popularity that quickly decays over time, which is aggravated by the unique screen scrolling operation. As such, the download-and-watch scheduling widely used by existing platforms can hardly achieve a quality user experience and high cost efficiency. We closely investigate and model the user gestures for scrolling, including drag and fling, and analyze the scheduling policy, partitioning it into pre-fetching scheduling and watch-time download scheduling. We then develop effective solutions to both subproblems as well as their integration with screen scrolling. The superiority of our enhancement is demonstrated by extensive trace-driven evaluations.

As dynamic-viewport mobile applications usually use HTTP for content downloading, in Chapter 4, we showcase the design of Mobile-Friendly HTTP middleware (MF-HTTP), which acts at the application layer, interprets screen scrolling processes on mobile devices by tracking user touch screen operations, and optimizes the downloading of media objects to improve the QoE and cost efficiency of such applications. We first examine in depth how screen scrolling works in mobile operating systems. With the opportunity to collect and understand user touch screen operations, we show how to precisely break down the viewport movement and identify the media objects involved in the process. By examining the key influential factors for media object downloading, we develop an optimal download scheme. Towards building a practical middleware, we further discuss the implementation details for MF-HTTP, based on which we implement a prototype on Android platforms. We conduct concrete case studies on two typical dynamic-viewport mobile applications, namely, web browsing and 360° video streaming, integrate them with our MF-HTTP middleware implementation, and evaluate the performance through extensive experiments. This optimization flow can easily be applied to other protocol/system enhancements for dynamic-viewport mobile applications.

In Chapter 5, we conduct a systematic study that carefully examines the design space of viewport prediction algorithms and analyzes the potential prediction performance issues therein, aiming to develop a robust solution that can smartly accommodate the prediction inaccuracies caused by various dynamics. In particular, with our fine-grained data-driven analysis, we divide the prediction errors into different categories so as to handle each differently. We also identify good indicators to bound the estimation errors, which are further integrated with cross-user interest analysis to better weight tiles into various classes. We further design an adaptive 360° video streaming framework with our robust viewport prediction and QoE-based rate adaptation optimization, and address a series of implementation issues. We conduct extensive trace-driven simulations to evaluate our solution against other state-of-the-art approaches. The results show that our solution outperforms other approaches by achieving good and stable performance and being robust against various dynamics.

Chapter 6 presents our investigation of energy-efficient mobile offloading for video-based applications. Different from other types of applications, realtime video applications have stringent delay constraints and dynamic bandwidth requirements. The playback needs to be continuous, and the video quality can vary to adapt to the changing network conditions. Through measurements of dynamic wireless network channels on state-of-the-art mobile platforms, we examine the performance and energy efficiency of migrating representative applications to the cloud. We identify the critical issues in mobile offloading for realtime video applications. We then develop a generic offloading model accordingly in this context and propose a scheduling algorithm that adaptively offloads tasks to accommodate the dynamics of wireless channels at a fine granularity. Trace-driven simulation results confirm the effectiveness of our solution. We further present two case studies of practical applications with advanced mobile platforms to demonstrate the significant gain of our approach over existing approaches.


Chapter 2

Related Work

2.1 Understanding and Enhancing Multimedia Services

2.1.1 Video Sharing

Video sharing has been a killer application since its emergence and has attracted much attention from academia. Pioneering works have carefully studied the characteristics of YouTube, a representative of video sharing over the Internet. To understand the explosive growth of user-generated content (UGC) and its implications for the underlying infrastructure, M. Cha et al. [12] conducted an extensive data-driven analysis of YouTube covering the popularity distribution, popularity evolution, and content duplication. Besides studying the nature of the user behavior and the key elements that shape the popularity distribution, the authors further discussed different UGC cache schemes, as well as a potential peer-assisted system design. Another measurement work by X. Cheng et al. [17] examined the intrinsic statistics of YouTube videos and investigated the social networking in YouTube videos. Compared to traditional video contents, YouTube mostly comprises videos with shorter length and smaller file size, whose active life spans follow a Pareto distribution. The small-world property is confirmed in the graph formed by YouTube’s related videos. As the emerging online social networks (OSNs) reshaped the way people watch videos, H. Li et al. [57] presented a study on video sharing in Renren, a Facebook-like OSN in China, which shows a much more skewed popularity distribution of videos in Renren than in YouTube. Based on the unique characteristics of video sharing propagation in OSNs, an extended epidemic model, the SI2_RP model, was proposed in [16] to effectively capture the propagation process of video sharing in OSNs. Further, Pop-Forecast [120], a systematic method for accurately forecasting the popularity of videos promoted through social networks, was designed to optimize the forecasting accuracy and the timeliness with which forecasts are issued. A more recent study presented a qualitative directed content analysis of youth-authored videos on YouTube and Vine, and discussed the possible differences in how adults and youths approach online video sharing [129].


Another stream of research efforts has focused on evaluating and improving video-based services for mobile users over various communication networks [111, 121, 37], mainstream platforms [101], efficient coding schemes [88, 131], emerging cloud computing architectures [112, 107, 109], and novel transmission standards [69, 26]. Leveraging device-to-device (D2D) communication, a propagation- and mobility-aware content replication strategy for edge-network regions was proposed in [111], in which social contents are assigned to users in edge-network regions according to joint consideration of social graphs, content propagation and user mobility. To integrate D2D-supported content delivery into 5G cellular networks, a resource allocation scheme providing a delay QoS guarantee was designed and thoroughly analyzed in [121]. To better understand the energy consumption of mobile video delivery, the experiments conducted by R. Trestian et al. [101] show that TCP is generally more energy efficient than UDP under most circumstances and that, by changing the quality level of the multimedia stream, considerable energy can be saved while the user-perceived quality level remains acceptable. T. Schierl et al. [88] discussed the potential use of scalable video coding (SVC) in mobile networks, and outlined use cases of mobile media delivery that can benefit from SVC. By using MPEG-7-based coding, Z. Yuan et al. [131] proposed ADAMS, an adaptive mulsemedia delivery solution for delivering both scalable video and sensorial data to enhance end-user quality of experience. As an emerging video delivery standard on top of existing infrastructure, different solutions of Dynamic Adaptive Streaming over HTTP (DASH) (Microsoft Smooth Streaming, Adobe Dynamic Streaming, Apple HTTP Live Streaming, and one prototype implementation of the MPEG-DASH standard) have been evaluated for mobile networks [69]. D. De Vleeschauwer et al. [26] further designed a DASH scheduler for mobile cellular networks by formulating and solving a utility maximization problem. The proposed algorithm achieves the required fairness among the video flows and automatically and fairly adapts video quality as congestion increases, thereby preventing throughput starvation of data flows.

2.1.2 Dynamic-Viewport Applications

Mobile smart devices such as smartphones, phablets, and tablets have undoubtedly reshaped the way that users access Internet services. Existing studies have tackled the challenges brought by the intrinsic mobile nature and enhanced multimedia services to accommodate seamless mobility [123, 133], inefficient retransmission [52], unstable channel quality [8, 138, 132], and unexpected interference [60, 116] in wireless and mobile networks. Among these works, some have attempted to improve multimedia applications by utilizing the rich interfaces and user interactions on mobile smart devices.

Dynamic-viewport applications, a class of mobile Internet applications that make HTTP downloads for media contents outside user viewports, are widely seen and have drawn the interest of many researchers. A series of studies have been conducted to optimize web browsing, an application that is largely affected by the user viewport. Prior work [108] suggested that client-only approaches have significant limitations for mobile users: caching [78] web contents does not remove the true bottleneck of web page loading, namely RTT, and predictive prefetching [73] cannot work well either, since most pages will only be requested once by a user. A recent measurement study [49] showed that only a few web sites have fully deployed HTTP/2 (the state-of-the-art standard in industry) servers, and few of them have correctly realized the new features in HTTP/2, which implies the necessity of research efforts on optimizing web performance. Scheduling network requests is a widely exploited approach to reduce page load time, and it is designed based on the dependency between web page elements [71]. Butkiewicz et al. [10] proposed KLOTSKI, a system that prioritizes the contents that are most relevant to the user preference and have the least rendering time. By collecting the traces of user gaze fixation during web browsing, Kelton et al. [51] examined the focus of user attention and reordered the loading of web objects accordingly. To achieve the best performance-energy tradeoff, Ren et al. [81] adopted a machine learning based approach to predict the optimal processor configurations at runtime for heterogeneous mobile platforms.

For video streaming, another killer application greatly influenced by the user viewport, rate adaptation is one of the fundamental research issues. By studying the responsiveness and smoothness trade-off in DASH, Tian et al. [99] showed that client-side buffered video time is a helpful feedback signal to guide rate adaptation. Instead of constantly predicting future capacity, Huang et al. [44] proposed to use simple capacity estimation only in the startup phase and then choose the video rate based on the current buffer occupancy in the steady state. Novel techniques, e.g., deep learning [67], and emerging computing architectures, e.g., edge computing [46, 100, 68], have also been adopted to improve the rate adaptation for video streaming. Recently, the MPEG DASH standard [94] has included a new Spatial Relationship Description (SRD) [72] feature to support the streaming of spatial sub-parts of a video to display devices, in combination with the adaptive multirate streaming that is intrinsically supported by DASH. Following this advance, DASH has been further exploited to stream zoomable and navigable videos [24], virtual reality videos [40], and multiview videos [27]. For 360° video streaming, Qian et al. [77] designed a viewport prediction mechanism based on the analysis of user head movement traces to optimize the rate adaptation, and reworked other related components in the streaming pipeline to further boost the performance against non-viewport-adaptive approaches. He et al. [38] identified that viewport prediction error can result in significant video quality degradation, and thus proposed a novel tile-based layered approach to adaptively stream 360° content on smartphones.
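The steady-state idea of choosing the video rate from buffer occupancy alone can be illustrated with a minimal sketch. This is a simplified linear mapping written for illustration, not the exact algorithm of [44]; the function name, the `reservoir_s` and `cushion_s` thresholds, and the bitrate ladder are all hypothetical assumptions.

```python
from typing import Sequence


def buffer_based_rate(
    buffer_s: float,
    bitrates_kbps: Sequence[int],
    reservoir_s: float = 5.0,   # assumed: below this, always play the lowest rate
    cushion_s: float = 20.0,    # assumed: width of the linear ramp above the reservoir
) -> int:
    """Pick a bitrate using only the current buffer occupancy (in seconds)."""
    rates = sorted(bitrates_kbps)
    if buffer_s <= reservoir_s:
        return rates[0]
    if buffer_s >= reservoir_s + cushion_s:
        return rates[-1]
    # Map the buffer level linearly onto the range between lowest and highest rate.
    fraction = (buffer_s - reservoir_s) / cushion_s
    target = rates[0] + fraction * (rates[-1] - rates[0])
    # Select the highest available bitrate that does not exceed the target.
    return max(r for r in rates if r <= target)


if __name__ == "__main__":
    ladder = [300, 750, 1500, 3000]
    for level in (2.0, 15.0, 30.0):
        print(level, buffer_based_rate(level, ladder))
```

A fuller design would add the capacity-estimation startup phase described above and some hysteresis to avoid frequent switching; both are omitted here for brevity.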

2.2 Improvements for Mobile Clients

2.2.1 Tuning Mobile Hardware

In the early days, the research focus was on leveraging mobile hardware (e.g., processors, displays, network interfaces) to improve the energy efficiency of multimedia services on mobile devices with little or no performance degradation. Among many popular techniques, dynamic voltage frequency scaling (DVFS) [7] for multi-core processors and dynamic backlight scaling (DBS) [15] for mobile displays have been deeply examined and widely applied.

To reduce the power consumed by executing computational tasks, considerable efforts have been put into efficiently scheduling media processing tasks on mobile multi-core processors using DVFS. Lee et al. [55] investigated the energy-efficient scheduling of realtime video processing tasks running on DVFS-enabled multi-core platforms. The proposed scheme reduces energy consumption by executing the tasks in parallel on an appropriate number of cores at as low a frequency as possible while still meeting the deadline, with the other cores kept powered off. As the knowledge of how much computation is needed for each task is hard to obtain in practice, Ma et al. [66] proposed a complexity model for video decoding using H.264/AVC. They used this model to accurately predict the required clock frequency of ARM Cortex A8 processors and hence perform DVFS for energy-efficient video decoding. The DVFS technique has also been applied to modern processor architectures/platforms such as ARM’s big.LITTLE [34] and Nvidia’s Tegra K1 [95], in which the processors are heterogeneous in terms of processing capability and power consumption, and thus should be treated differently for scheduling.
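The general principle of running on an appropriate number of cores at the lowest frequency that still meets the deadline can be sketched as an exhaustive search. This is a toy model under stated assumptions, not the scheme of [55]: it assumes an Amdahl-style speedup with a fixed `serial_fraction`, a cubic dynamic power model `k * f**3`, and a constant static power per active core; all names and parameter values are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def pick_cores_and_frequency(
    cycles: float,
    deadline_s: float,
    freqs_hz: Sequence[float],
    max_cores: int,
    serial_fraction: float = 0.1,  # assumed non-parallelizable share of the task
    k: float = 1e-27,              # assumed coefficient of the cubic power model
    static_w: float = 0.05,        # assumed static power per active core (watts)
) -> Optional[Tuple[int, float, float]]:
    """Return (cores, frequency, energy_joules) minimizing energy, or None if infeasible."""
    best: Optional[Tuple[int, float, float]] = None
    for n, f in product(range(1, max_cores + 1), freqs_hz):
        # Amdahl-style execution time on n cores at frequency f.
        t = cycles / f * (serial_fraction + (1.0 - serial_fraction) / n)
        if t > deadline_s:
            continue  # this configuration misses the deadline
        # Active cores draw dynamic plus static power; idle cores are powered off.
        energy = n * (k * f ** 3 + static_w) * t
        if best is None or energy < best[2]:
            best = (n, f, energy)
    return best


if __name__ == "__main__":
    print(pick_cores_and_frequency(1e9, 1.0, [0.5e9, 1e9, 2e9], max_cores=4))
```

With these illustrative numbers, the search settles on three cores at the lowest frequency: a single core can only meet the deadline at a higher and far more power-hungry frequency, while a fourth core adds static power without being needed.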

Among the operational components of a mobile phone, the display is one of the most crucial in terms of power. Not only have displays been identified as one of the most power-consuming subsystems [11], but display power is also consumed across a broad range of multimedia-related applications for as long as content is playing on the mobile phone. Cheng et al. [15] first proposed a Concurrent Brightness and Contrast Scaling (CBCS) technique that aims at conserving power by reducing the backlight illumination of TFT-LCD screens, while retaining the image fidelity through preservation of the image contrast. As for video streaming, one of the initial research works treated the video stream as a series of image frames and dynamically changed the backlight by applying backlight-scaling techniques to each image frame individually [86]. As QoE can be impacted differently by DBS strategies [125], QoE-aware approaches attempt to achieve backlight energy savings for video playback while preventing frequent backlight switching [14] or preserving the video’s perceptual quality [59]. In a more recent work, Yan et al. [124] extracted the inherent relationship among bitrate, display brightness, and video quality from a real-world dataset. The proposed rate and brightness adaptation scheme jointly considers video transfer and display energy to shift the conventional rate-distortion (R-D) tradeoff to a novel rate-distortion-energy (R-D-E) tradeoff specifically tailored for mobile devices.

2.2.2 Utilizing Cloud Resources

Supporting sophisticated multimedia applications on mobile platforms requires huge processing power and high battery consumption that usually exceed a mobile device’s capabilities. Offloading [54] is a solution to prolong the battery lifetime and expand these mobile systems’ capabilities by migrating computation to more resourceful cloud servers. This is particularly attractive for many multimedia-related applications (e.g., AR/VR, cloud gaming, virtual desktop infrastructure, etc.) that are generally computation intensive. With offloading, the user inputs from a mobile device can be sent to the cloud and executed remotely. The results can be rendered as high-quality videos (e.g., desktop or game scenes) in the cloud and then be streamed back to the mobile device. A significant amount of research has been performed on feasible and smart computation offloading for multimedia services. MAUI [23] and CloneCloud [20] partition applications using a framework that combines static program analysis with dynamic program profiling to optimize execution time or energy consumption. Wolski et al. [114] proposed to monitor and predict the offloading bandwidth using a Bayesian scheme, based on which offloading costs can be estimated and offloading decisions are made accordingly. The dynamic offloading algorithm designed by Huang et al. [42] focuses on achieving energy savings under changing wireless connections. Meanwhile, the interdependency among the partitioned application components should be considered, because they have different execution latency constraints and incur data sharing costs with each other. Tham et al. [102] formulated the opportunistic offloading scheduling problem as a Markov Decision Process to minimize the offloading and processing cost, as well as to guarantee the availability of cloudlets in the presence of user mobility. Using smartphone VM (Virtual Machine) images inside the cloud, ThinkAir [53] targets computation offloading in a commercial cloud scenario with multiple mobile users instead of a single user. It considers not only offloading efficiency and convenience for developers, but also the elasticity and scalability of the cloud for the dynamic demands from users. From the communication perspective, managing energy consumption for networked transactions is a critical issue for mobile devices. Ra et al. [80] discussed the trade-off between QoS and delay of data transmission for mobile platforms and presented a stable and adaptive link selection algorithm. Catnap [28] exploits the bottlenecks of wireless and wired links and utilizes an application proxy to decouple data units into segments, which are then scheduled as bursts during transmission for energy saving. Bartendr [89] demonstrates through an empirical study that a strong signal reduces energy cost. It then develops energy-aware scheduling algorithms for different workloads, including background synchronization traffic and video stream traffic, based on signal prediction by location and history.
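The energy trade-off behind the offloading decisions surveyed above can be made concrete with a minimal sketch. It uses a simple textbook-style energy comparison rather than the method of any cited system; the `Task` type, the `should_offload` function, and all power and speed parameters are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float     # CPU cycles the task requires
    data_bits: float  # bits that must be uploaded if the task is offloaded


def should_offload(
    task: Task,
    bandwidth_bps: float,
    local_hz: float = 1e9,      # assumed mobile CPU speed
    cloud_hz: float = 1e10,     # assumed cloud CPU speed
    p_compute_w: float = 0.9,   # assumed device power while computing
    p_tx_w: float = 1.3,        # assumed device power while transmitting
    p_idle_w: float = 0.3,      # assumed device power while waiting for the cloud
) -> bool:
    """Offload only if the device spends less energy than running locally."""
    local_energy = p_compute_w * task.cycles / local_hz
    transmit_energy = p_tx_w * task.data_bits / bandwidth_bps
    waiting_energy = p_idle_w * task.cycles / cloud_hz
    return transmit_energy + waiting_energy < local_energy


if __name__ == "__main__":
    task = Task(cycles=2e9, data_bits=8e6)
    print(should_offload(task, bandwidth_bps=20e6))  # fast link
    print(should_offload(task, bandwidth_bps=1e6))   # slow link
```

The same task is worth offloading on the fast link but not on the slow one, which illustrates why the works above track bandwidth and signal quality before deciding.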

With cloud computing, considerable research efforts have been dedicated to improving mobile multimedia services. CALMS [105] adaptively leases and adjusts resources in the cloud servers to meet the dynamic demands from users, offering a generic framework for migrating live streaming services. In [39], considering the geographical diversity of cloud resource prices, a Nash bargaining solution is developed for the bandwidth provisioning and the video placement strategies. AMES-Cloud [107], a mobile video service framework proposed by X. Wang et al., leverages cloud computing technology to construct private agents for mobile users, which adaptively adjust the streaming rate using SVC and pre-fetch video contents according to the classified social activities. For instant social video sharing, the multi-cloud hosting problem was formulated by Z. Wang et al. [109] to disperse contents so that the globally located demands are satisfied and the inter-cloud traffic is minimized. An emerging mobile cloud computing paradigm that involves both offloading and video streaming is Mobile Remote Desktop Access (MRDA) [103]. In MRDA, the entire desktop environment is hosted on the remote server while the client is only in charge of receiving and displaying the contents. Another related application is cloud gaming, which migrates game execution to the cloud and streams the gaming scenes back to the end users [41]. However, it demands ultra-low latency, and the video decoding on the client side may result in excessive use of energy. It has been shown that naive offloading can incur even higher energy consumption on a state-of-the-art mobile platform [91].


Chapter 3

Understanding and Enhancing Mobile Instant Video Clip Sharing

Identifying the influential factors of user experience and cost efficiency is crucial for properly understanding the mobile multimedia service, which can motivate the design of the enhancements. In this chapter, we conduct a measurement study on the new generation of mobile instant video clip sharing service. Based on the key observations and knowledge gained from our analysis, we attempt to improve this service type by proposing and solving a download scheduling optimization problem.

3.1 Background

In the past two decades, we have witnessed the great success of user-generated multimedia content sharing, in particular online video sharing, and its rapid evolution. The first generation, sharing over the Internet, is represented by a number of video sharing sites (VSSes) such as YouTube [12, 17]. Later, online social networks (OSNs), e.g., Facebook and Twitter, emerged to offer the second generation of video sharing, in which users access multimedia content by proactively sharing the video links from external VSSes among friends [83, 57]. Recently, the rapid development and penetration of mobile social networking have enabled the third generation of video sharing services, which use smart mobile terminals to instantly capture and share ultra-short video clips (usually of several seconds). Many mobile apps, e.g., Twitter’s Vine, Instagram, and Snapchat, to name but a few, have incorporated such multimedia services and seen great acceptance, particularly by the youth community [129]. It has also become a mainstream service type in China, where similar emerging apps (e.g., Miaopai, Weishi, Kuaishou, Douyin, and Huoshan) have attracted a tremendous number of users and investments. For example, Miaopai, with 70 million daily active users, closed a $500 million funding round in 2016, and it now handles 1.5 million uploads per day, with 2.5 billion videos watched every 24 hours; Kuaishou, with 50 million daily active users who upload 10 million videos per day, received a $350 million investment from Tencent in 2017. The instant video clips in these services are directly consumed at smart terminals with specially designed mobile interfaces and operations. The expanded social relations and the distinct operations on the mobile terminals, particularly screen scrolling, have greatly increased the number of videos available to watch and, in the meantime, shortened the time spent on individual videos from tens of minutes to only a few seconds.

User experience is crucial to mobile instant video clip sharing. An instant video clip itself is only several seconds long, so a mobile user can hardly tolerate a long delay, which would completely ruin the viewing experience. A straightforward solution is to pre-fetch video clips, which is known to be cost-effective and energy-efficient [32]. Yet given the massive number of ultra-short video clips, deciding which to pre-fetch and when to pre-fetch becomes a much greater challenge. Users of mobile instant video clips also tend to request video clips but fail to finish watching them, and with fast screen scrolling many clips do not even have a chance to start playing. Smart and adaptive watch-time scheduling is thus needed to cope with these distinct operations in the mobile context. To the best of our knowledge, this new service type has not yet been studied in the literature. In this work, we present a systematic study on mobile instant video clip sharing, an emerging mainstream multimedia service empowered by a combination of advanced mobile and cloud platforms.

3.2 Overview of Mobile Instant Video Clip Sharing

3.2.1 Motivation

We next present a case study on Twitter’s Vine, which enables users to create ultra-short video clips (limited to a maximum of 6 seconds), as well as post and share them with followers or in OSNs, particularly Twitter (which acquired Vine in October 2012) and Facebook. Vine exclusively focused on mobile users from the very beginning, attracted over 200 million active users since its initial release in January 2013, and continued its core service on Twitter. Other products in the market, e.g., Instagram, Snapchat, Miaopai, and Kuaishou, share similar service architectures and interfaces.

With a Vine client, a user can view, like, comment on, and share (repost) the recent posts from others in the Home/Feed page, which is, as shown in Fig. 3.1, a typical and necessary interface for mobile instant video clip sharing and is commonly seen in similar apps. In the Explore page, the user can also search for video clips and people of interest, as well as dedicated channels for specific topics. Compared to traditional OSNs with follower-followee social relationships, a key (and significant) difference is that the media of interest here are ultra-short video clips. This makes its user experience notably different.


(a) Vine (b) Instagram (c) Miaopai

Figure 3.1: Typical main interface of mobile instant video sharing


3.2.2 Service Framework

We have conducted a traffic measurement on Twitter’s Vine from our university campus. We captured the traffic between test devices and servers, and intercepted the SSL connections between them with the mitmproxy tool to view the detailed requests from the application. The traces show that Twitter builds the mobile instant video sharing system on a cluster of cloud services, including Amazon EC2, Amazon S3, and Amazon CloudFront, as well as CDNs provided by Akamai and Fastly. We accordingly illustrate the service framework in Fig. 3.2. A Vine client initiates and maintains an HTTPS connection with the application server running on the EC2 instances with domain name api.vineapp.com. After an authorization process, the user can make requests, and the server in turn offers responses for the user to complete such actions as browsing, search, post, comment, and like. When the user logs into the app (or returns to the Home/Feed page), the client makes a GET request for the timeline information, which corresponds to the recent updates. After receiving the response, the client can further make GET requests to CDNs with domain name v.cdn.vine.co or mtc.cdn.vine.co to download the video clips and the corresponding thumbnails. From the meta-data in captured packets, we infer that the videos and the thumbnails are stored on Amazon S3. Similar operations are performed when visiting the Explore page. A slight difference is that the static web images in the page layout are distributed by Amazon CloudFront.

3.2.3 Screen Scrolling and Key User Behaviors

In traditional VSSes and OSNs, users need to click to view or link to one specific video, which only allows them to view one video per click. Vine-like services, however, return a playlist of video clips when a user touches the screen to view the updates for certain users, tags, or channels. As the user scrolls the screen of the smartphone or tablet, instant video clips are seamlessly played from the generated list. Scrolling includes a series of user gestures, typically click, drag and fling, and the speed, acceleration, and continuity vary depending on the user’s input. Given the fixed organization of instant video clips in playlists, scrolling has become an essential user action.

We use Batch View to refer to the unique user behavior of viewing multiple video clips with screen scrolling. The batch view implies that mobile users can watch a considerable number of instant video clips within the playback time of one conventional video (e.g., from YouTube). A related new behavior is Passive View. The media contents are arranged in order and a user has limited control over the order of playback (recall the Vine case). If two video clips of interest are separated in the playlist, the user may have to download (and watch) all the video clips between them. These videos of no interest have to be passively watched, and resources for downloading and playing them are consumed.


3.3 A Closer Investigation: Measurements and Observations

3.3.1 Properties of Instant Video Clips

Datasets

We developed customized crawlers and collected the traces of Vine videos that were posted in 16 user channels (47,794 posts) and 2 promotion channels (8,891 posts). For each instant video clip, we accessed and recorded its repost history, including the exact time when it was shared and the user who reposted it. User channels focus on dedicated topics, where each channel has two sections: recent and popular. An instant video clip can be uploaded to any of the recent sections in these 16 channels, and each user channel lists a small number of popular posts in the popular section. The promotion channels do not accept posts directly from normal users; instead, they choose the most popular and most trending video clips among all the recent posts in Vine.

Popularity

We use the number of reposts to evaluate the video popularity, since the actual number of views for each video clip is hard to obtain by our crawlers. Fig. 3.3a plots the number of reposts as a function of the rank of the video clip by its popularity for all 16 user channels. The plot does not follow a Zipf distribution (which should be a straight line on a log-log scale). This result is different from the previous observations on traditional video sharing services: while the popularity of YouTube videos exhibits a Zipf-like waist with a truncated tail [12, 17], the distribution of requests versus video ranks for Renren (the largest Facebook-like service in China) videos follows a perfect power-law pattern [57]. To further understand how the popularity is distributed among Vine videos, we plot the cumulative proportion of the total number of reposts versus the percentile of the video clip in Fig. 3.3b. As shown, the popularity of video clips in the user channels is extremely skewed: the top 5% of video clips account for more than 99% of reposts. It heavily deviates from the Pareto Principle (or 80-20 rule). This result is quite surprising, since other video sharing services show much smaller skewness: the top 10% popular YouTube videos account for nearly 80% of views [12], whereas the top 2% of videos in Renren take up 90% of the total requests, and the top 5% attract 95% of requests [57]. The popularity distributions for different generations of video sharing services show a trend of becoming more and more skewed throughout the 3-stage evolution (YouTube: 10%-80%; Renren: 2%-90%, 5%-95%; Vine: 2%-95%, 5%-99%). The YouTube result implies that, originally, users’ interests across videos are not evenly distributed (biased towards popular videos). People tend to watch what others have watched, and this tendency is amplified when OSNs are introduced, as users in the same social group share common interests. On top of social networking, Vine-like services further offer users ubiquitous mobile access, which leads to a more efficient and more extensive propagation of the instant video clips.

Figure 3.3: Video popularity. (a) Video clips rank-ordered by the number of reposts (log-log axes: rank vs. number of reposts). (b) Skewness of popularity across video clips from the user channels (percentile (%) vs. fraction of aggregate reposts, log scale).
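The skewness statistic quoted above (the share of all reposts captured by the top x% of clips) can be computed as in the following minimal sketch. The function name `top_share` and the sample counts are hypothetical illustrations, not the crawled Vine data or the code used in this study.

```python
def top_share(reposts, top_fraction):
    """Fraction of all reposts received by the most-reposted `top_fraction` of clips.

    `reposts` is a list of per-clip repost counts (hypothetical input format).
    """
    if not reposts:
        raise ValueError("reposts must be non-empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(reposts, reverse=True)          # rank clips by popularity
    k = max(1, round(top_fraction * len(ranked)))   # number of clips in the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Synthetic example (made-up numbers): 5 very popular clips and 95 clips with one repost each.
sample = [4000, 3000, 2000, 600, 300] + [1] * 95
share = top_share(sample, 0.05)  # share of reposts held by the top 5% of clips
```

On this synthetic input the top 5% of clips hold more than 99% of the reposts, which mirrors the kind of skew reported for the user channels.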

Lifespan and Propagation

To investigate how the number of reposts changes with time, we plot Fig. 3.4a, which shows the average daily number of reposts after the video clips were created. As the popularity of the collected video clips is highly skewed, we only consider popular video clips in the following analysis, specifically, the top 5% reposted video clips from the user channels and all the video clips from the promotion channels. One may notice that the plot lasts slightly longer than the data collection period. This is because many of the video clips that we explored may have been there for a while when we started crawling. In Fig. 3.4a, the average number of reposts for the popular video clips monotonically decreases day by day. Even the popular video clips are mostly at their most popular during the first day after the initial post and become less and less popular afterwards. This fast decay feature of mobile instant video clips is quite unique: YouTube videos also reach the global peak immediately after introduction to the system, but decay much more slowly, while the requests for the new videos published in Renren generally take two or three days to reach the peak value, and then change dynamically with a series of unpredictable bursts [57].

By defining the active lifespan of a video post as the duration from its initial post to the first day in which it gets no repost, we plot the CDF of the active lifespan of the popular video clips in Fig. 3.4b. Here we use a fixed value (0) as the threshold to decide whether the video clip is active in propagation, instead of other metrics such as the changing rate and the moving average. The reason is two-fold: first, as shown in Fig. 3.4a, although the number of reposts for the popular video clips may change dramatically in the first few days, it can still remain large; second, we can hardly know the impact of one repost, as the number of passive viewers after each repost varies significantly (if the user who shares the video clip has a large population of followers, this repost can have a potentially significant impact on the propagation of the video clip). Even with such a “loose” definition of active lifespan, Fig. 3.4b shows that more than half of the popular video posts stay active for less than 10 days. This result is quite different from the related observations on traditional video sharing services: popular videos in Renren can continuously attract requests for several months [57]; some YouTube videos can still get views more than 1 year after they were published, which implies that YouTube users’ interests are video-age insensitive on a gross scale [12]. The fast decay feature can possibly be explained by the ubiquity of mobile access. As mobile users can upload and, more importantly, watch instant video clips at any time and anywhere, the clips can propagate very efficiently and extensively, and thus reach their peak immediately. The frequent video watching and uploading by mobile users also accelerates the fading of existing instant video clips, even the popular ones.

Figure 3.4: Video lifespan and propagation. (a) Daily number of reposts (average number of reposts vs. number of days after post). (b) CDF of lifespan (CDF vs. lifetime in days). Both panels compare the top 5% video clips from the user channels with the video clips from the promotion channels.
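As a minimal sketch of the active lifespan definition used above, the following code takes a per-clip list of daily repost counts and returns the first day with no repost. The input format, the day indexing (index 0 is the day of the initial post), and the treatment of clips that never reach a zero-repost day within the observation window are assumptions made for illustration only.

```python
def active_lifespan(daily_reposts):
    """Days from the initial post to the first day with zero reposts.

    `daily_reposts[d]` is the number of reposts on day d after posting
    (assumed format). If no zero-repost day is observed, the observed
    length is returned, i.e., the value is censored.
    """
    for day, count in enumerate(daily_reposts):
        if count == 0:
            return day
    return len(daily_reposts)


def empirical_cdf(values, x):
    """Fraction of `values` that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical repost histories of four clips.
histories = [[5, 3, 0, 2], [9, 4, 1, 0], [7] * 12 + [0], [2] * 30]
lifespans = [active_lifespan(h) for h in histories]
```

Note that the clip with history `[5, 3, 0, 2]` is counted as inactive from day 2 on, even though it is reposted again later, which matches the threshold-at-zero definition.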

3.3.2 Summary and Implications

We have revealed a series of unique features of mobile instant video clip sharing, including the extreme skewness, fast decay, and short lifespan. Although common VCR controls (such as rewind and fast forward) are lacking, batch/passive views with scrolling are effective in approaching successive instant video clips in the playlist, enabling users to find interesting contents more easily, and accelerating the propagation of popular videos. Yet, if not handled properly, screen scrolling may ruin the viewing experience. Currently, most Vine-like services employ a naive download-and-watch scheme, which is clearly not an optimal solution. In the worst case (e.g., downloading every instant video clip through a poor cellular connection), a vicious circle can be formed: the downloading of a just-skipped video takes up the network resources and blocks the downloading of the videos of interest, which in turn forces the user to give up watching the target videos and scroll forward to search for other interesting videos.

As the unpopular video clips move towards the bottom of the playlist, users hardly see them again. On the contrary, the popular video clips will be promoted to the popular section, and users can easily reach these posts. They become more and more popular, staying at the top of the playlist and thereby being accessed more frequently. With the batch and passive views, the above process is accelerated and exacerbated. Although this extreme skewness suggests that identifying the popular videos and pre-fetching them could be beneficial, deciding which video clips will fall into the user’s interests is never an easy task. Moreover, the short lifespan and fast decay imply that the popular contents are much more dynamic than those in other mobile VoD or video streaming applications. This introduces a dilemma for pre-fetching: on one hand, we would like to cache as many videos as possible to provide a fluent watching experience; on the other hand, if the cached videos are not watched soon enough by the user, fetching them becomes a huge waste, as they will probably be flushed out by more recent feeds without any chance of being viewed. As such, neither a simple download-and-watch scheme nor a naive pre-fetching/caching scheme would work efficiently, and a smart adaptive solution is expected. More importantly, it must work well with screen scrolling, a rich operation whose multiple factors, e.g., speed and acceleration, are to be considered.

3.4 Enhancements on Mobile Side: Pre-fetching and Watch-time Scheduling

3.4.1 Problem Formulation

We now present a generic formulation for the video download scheduling problem in mobile instant video sharing. As mentioned, instant video clips are usually organized in different playlists, which can be classified into three types: the list of video updates from followees (social videos), the list of promoted videos in popular sections (popular videos), and the list of user uploaded videos in recent sections (recent videos). Only the playlist of social videos changes with different users, and the other two types of playlists remain the same across users. Consider one video watching event of a specific user. Denote the playlist of instant video clips that will be watched by the user as $V = \{v_1, v_2, \ldots, v_n\}$. As illustrated in Fig. 3.5, according to different user input actions, each instant video clip may remain in the user’s viewport for a specific duration. We use $U = \{u_1, u_2, \ldots, u_n\}$ to denote such durations, where $u_i$ corresponds to the duration that the user watches video $v_i$. Also, we let $u_0$ denote the time that the user starts watching the playlist. We consider two types of network connections in this formulation: mobile cellular connections (e.g., 3G/4G) and wireless local connections (e.g., WiFi). We use $B(t)$, $C(t)$, and $E(t)$ to denote the available bandwidth, the monetary cost, and the energy consumption at a given time $t$, respectively, where $B(t) \in \{B_{wifi}, B_{3/4G}\}$, $C(t) \in \{C_{wifi}, C_{3/4G}\}$ ($C_{wifi} = 0$, since the cost for WiFi connections is usually negligible), and $E(t) \in \{E_{wifi}, E_{3/4G}\}$. As in previous studies [135, 111], we divide the time evenly into discrete time slots. Let $R_i$ be the video streaming rate of video $v_i$, and $L$ be the maximum video length. In practice, most users capture video clips till reaching the maximum length (in Vine’s case, 6 seconds); hence, their file sizes after transcoding to a certain resolution are almost the same, i.e., $R_i$ and $L$ can be treated as given constants.³ Define a video downloading schedule as $S = \{(\hat{v}_1, \hat{t}_1, \hat{l}_1), (\hat{v}_2, \hat{t}_2, \hat{l}_2), \ldots, (\hat{v}_k, \hat{t}_k, \hat{l}_k)\}$, where a tuple $(\hat{v}_i, \hat{t}_i, \hat{l}_i)$ ($\hat{v}_i \in V$ and $\hat{l}_i > 0$) means that at time $\hat{t}_i$, we start to download video $\hat{v}_i$ for the duration $\hat{l}_i$.

Figure 3.5: An illustration of playlist optimization

³ It is worth noting that for ease of exposition, here we assume homogeneous video length. Our model and solutions can be easily extended to afford various specifications for individual videos, which does not change the fundamental problem studied in this work.

Our problem is to find a proper schedule $S$ that optimizes the video watching experience with high efficiency in terms of monetary cost and energy consumption. We define the playback discontinuity of a single video $v_i$ watched for the duration $u_i$ as:

$$
\mathrm{discontinuity}(v_i) = 1 - \frac{1}{\min(u_i, L)} \cdot \sum_{t \in \left( \sum_{k=0}^{i-1} u_k,\ \min\left( \sum_{k=0}^{i} u_k,\ \sum_{k=0}^{i-1} u_k + L \right) \right]} I\left[ \sum_{\substack{(\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S,\\ \hat{v}_j = v_i,\ \hat{t}_j \le t}} \ \sum_{\hat{t} = \hat{t}_j}^{\min(\hat{t}_j + \hat{l}_j - 1,\ t)} B(\hat{t}) \ \ge\ \left( t - \sum_{k=0}^{i-1} u_k \right) \cdot R_i \right],
$$

where I[·] is the indicator function, which checks whether the download progress stays ahead of the playback at a given time slot. Given the watch duration (if it is longer than the video length, we use the video length instead), we can calculate the ratio of continuous playback, and thus define the playback discontinuity accordingly (between 0 and 1). The single-video playback discontinuity naturally reflects the user experience of continuous playback: it measures the fraction of time slots in which the downloading of this video misses the playback deadline. We further define the playback discontinuity of the playlist V as a weighted sum of those of individual videos:

$$
\mathrm{Discontinuity} = \sum_{v_i \in V} w_i \cdot \mathrm{discontinuity}(v_i),
$$

where $w_i$ is the normalized weight for $v_i$. An intuitive assignment of $w_i$ can be $\frac{1}{\sum_{k=1}^{n} u_k} \cdot u_i$, which assigns higher weights to the videos that have longer watching durations, as longer watching durations usually imply higher user interests. We will further discuss more specific assignments of $w_i$ later.
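The two definitions above can be evaluated directly, as in the following minimal sketch. It is not the implementation used in this work; it assumes integer time slots, 1-based video indices (with `u[0]` holding the start time $u_0$ and `R[0]` unused), a callable `B(t)` returning the bandwidth in slot t, and a schedule given as a list of `(video_index, start_slot, length)` tuples. Returning 0 for a video that is never displayed ($u_i = 0$) is also an assumption, since the formula is undefined in that case.

```python
def discontinuity(i, u, L, R, B, schedule):
    """Playback discontinuity of video i (1-based), following the definition above."""
    start = sum(u[:i])               # sum_{k=0}^{i-1} u_k: slot at which v_i enters playback
    window = min(u[i], L)            # number of slots in which v_i is actually played
    if window == 0:
        return 0.0                   # assumption: a clip that is never displayed causes no stall
    on_time = 0
    for t in range(start + 1, start + window + 1):
        # Data of v_i downloaded up to and including slot t.
        downloaded = sum(
            B(tt)
            for (v, t0, length) in schedule
            if v == i and t0 <= t
            for tt in range(t0, min(t0 + length - 1, t) + 1)
        )
        # Indicator: download progress is ahead of the playback position.
        if downloaded >= (t - start) * R[i]:
            on_time += 1
    return 1 - on_time / window


def playlist_discontinuity(u, L, R, B, schedule):
    """Weighted sum with the intuitive weights w_i = u_i / sum_k u_k."""
    n = len(u) - 1
    total_watch = sum(u[1:])
    return sum(
        (u[i] / total_watch) * discontinuity(i, u, L, R, B, schedule)
        for i in range(1, n + 1)
    )


# Toy instance: two clips watched for 2 slots each, unit bandwidth and unit rate.
u = [0, 2, 2]                 # u[0] is the start time
R = [None, 1, 1]              # R[0] is a placeholder
B = lambda t: 1
schedule = [(1, 0, 3)]        # only clip 1 is downloaded (slots 0 to 2)
value = playlist_discontinuity(u, 6, R, B, schedule)
```

In this toy instance the first clip plays without stalls and the second clip is never downloaded, so the playlist discontinuity is 0.5.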

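Our objective is thus to minimize the playback discontinuity, as well as the total monetary and energy cost:

$$
C_{total} = \sum_{(\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S} \ \sum_{t = \hat{t}_j}^{\hat{t}_j + \hat{l}_j - 1} C(t), \qquad E_{total} = \sum_{(\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S} \ \sum_{t = \hat{t}_j}^{\hat{t}_j + \hat{l}_j - 1} E(t).
$$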

It is easy to see that the above objectives conflict with each other, as downloading more portions of the playlist can reduce the playback discontinuity but will also inevitably consume more energy and may increase the monetary expense. We thus adopt the following linear combination form to align them together:

$$
p \cdot \mathrm{Discontinuity} + q \cdot \frac{C_{total}}{C_{max}} + r \cdot \frac{E_{total}}{E_{max}}, \qquad (3.1)
$$

where $p$, $q$ and $r$ are the parameters to assign different weights to the three goals. As in Eq. 3.1, we normalize the total monetary cost and the total energy consumption by their corresponding maximum values, where $C_{max}$ is the maximum total cost of the case that all the videos in the playlist are downloaded through 3G/4G links, and $E_{max}$ can be obtained similarly. We then have the following theorem:

Theorem 1. The decision version of the modeled generic downloading scheduling problem is NP-complete.

Proof. The corresponding decision problem can be described as: given all the required parameters, is there a schedule for the playlist such that the objective value given by Eq. 3.1 is at most M? First, we show that this decision problem is in NP. Given an instance of this decision problem, a certificate that it is solvable would be a specification of the downloading schedules for each video. We can then easily check each video’s playback discontinuity, downloading monetary cost, energy consumption and whether the objective value is no greater than M, and thus verify the solution in polynomial time, which suggests the decision problem is in NP.

We next show that the Knapsack problem is reducible to our problem. The decision version of the Knapsack problem can be stated as: given $n$ items with sizes $\hat{l}_1, \ldots, \hat{l}_n$ and values $\hat{s}_1, \ldots, \hat{s}_n$, capacity $W$ and value $S$, is there a subset $I \subseteq \{1, 2, \ldots, n\}$ such that $\sum_{i \in I} \hat{l}_i \le W$ and $\sum_{i \in I} \hat{s}_i \ge S$? To construct an equivalent scheduling instance of our problem, one may be struck initially by the fact that we have so many parameters to manage. The key is to sacrifice some of the flexibility, producing a simpler “skeletal” instance of the problem that still encodes the Knapsack problem. Let $p = 1$, $q = 0$ and $r = 0$ in Eq. 3.1. The objective of the optimization problem is thus to minimize Discontinuity of the playlist $V$, which is equivalent to maximizing $1 - \mathrm{Discontinuity} = 1 - \sum_{v_i \in V} w_i \cdot \mathrm{discontinuity}(v_i) = \sum_{v_i \in V} w_i \cdot (1 - \mathrm{discontinuity}(v_i))$, where the last equality holds because the normalized weights sum to 1, and $\mathrm{Discontinuity} \in [0, 1]$, $\mathrm{discontinuity}(v_i) \in [0, 1]$. Let $B_{3/4G} = B_{wifi} = B$, which implies that we disregard the difference of connection type. Assume that all the available downloading slots exist before the watch-time, which suggests that, instead of producing detailed download schedules, we only need to make download decisions (i.e., different downloading times make no difference).

Given the Knapsack instance, we now show how to convert it to an instance of our problem in polynomial time. Corresponding to the capacity $W$ and the $n$ items in the Knapsack problem, we have $W$ downloading slots and $n$ videos $v_1, \ldots, v_n \in V$. Assume the watch duration (effective length) for each video $v_i$ is 1 time slot. The problem then becomes to decide whether to download each $v_i$ (i.e., $\mathrm{discontinuity}(v_i) \in \{0, 1\}$). We set the streaming rate of $v_i$ as $R_i = \hat{l}_i \cdot B$, so that downloading $v_i$ takes $\hat{l}_i$ time slots. Note that, in our problem, the playlist discontinuity is a weighted sum of individual video discontinuity. We set the weight of $v_i$ as $w_i = \hat{s}_i / \sum_{j \in [1,n]} \hat{s}_j$, and $M = 1 - S / \sum_{j \in [1,n]} \hat{s}_j$. Now our problem is to download the videos with the given $W$ available time slots such that $\sum_{i \in [1,n]} w_i \cdot x_i \ge 1 - M = S / \sum_{j \in [1,n]} \hat{s}_j$, where $x_i = 1$ if $v_i$ is downloaded and $x_i = 0$ otherwise. This described instance is equivalent to the original Knapsack decision problem except that the value of each item is scaled down by a constant factor of $\sum_{j \in [1,n]} \hat{s}_j$.

Consider any instance that satisfies (answers “Yes” to) the Knapsack decision problem with the chosen subset $I$. In our scheduling problem, we download videos with indices in $I$, which suggests $\mathrm{discontinuity}(v_i) = 0, \forall i \in I$ and $x_i = 1 - \mathrm{discontinuity}(v_i) = 1, \forall i \in I$. This download schedule (downloading videos with indices in $I$) uses at most $W$ time slots since $\sum_{i \in I} \hat{l}_i \le W$. The objective value given by Eq. 3.1 is $\mathrm{Discontinuity} = \sum_{i \in [1,n]} w_i \cdot \mathrm{discontinuity}(v_i) = \sum_{i \in [1,n]} w_i \cdot (1 - x_i) = 1 - \sum_{i \in [1,n]} w_i \cdot x_i = 1 - \sum_{i \in [1,n]} x_i \cdot \hat{s}_i / \sum_{j \in [1,n]} \hat{s}_j = 1 - \sum_{i \in I} \hat{s}_i / \sum_{j \in [1,n]} \hat{s}_j \le 1 - S / \sum_{j \in [1,n]} \hat{s}_j = M$. Therefore, downloading videos with indices in $I$ satisfies (answers “Yes” to) our decision problem.

Conversely, suppose there is a schedule (a set of download decisions) for our constructed decision problem instance such that $\mathrm{Discontinuity} \le M$. The subset $I$ for the Knapsack decision problem can be defined as the set of indices of the videos that are downloaded ($i \in I$ if $v_i$ is downloaded). Since $S / \sum_{j \in [1,n]} \hat{s}_j = 1 - M \le 1 - \mathrm{Discontinuity} = 1 - \sum_{i \in [1,n]} w_i \cdot \mathrm{discontinuity}(v_i) = \sum_{i \in [1,n]} w_i \cdot (1 - \mathrm{discontinuity}(v_i)) = \sum_{i \in I} w_i = \sum_{i \in I} \hat{s}_i / \sum_{j \in [1,n]} \hat{s}_j$, we have $\sum_{i \in I} \hat{s}_i \ge S$. As the schedule is valid, i.e., it uses at most $W$ downloading slots, we naturally have $\sum_{i \in I} \hat{l}_i \le W$. Therefore, this subset $I$ satisfies (answers “Yes” to) the Knapsack decision problem. This finishes the proof that the decision version of our original modeled optimization problem is NP-complete.
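The construction in the proof can be checked on small instances with the following sketch. It is only an illustration of the reduction: the function names and the dictionary layout are hypothetical, item values are assumed to be positive integers, both decision procedures use exhaustive search (exponential time), and exact fractions are used so that the comparison with $M$ is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def subsets(n):
    """All index subsets of {0, ..., n-1}."""
    for r in range(n + 1):
        yield from combinations(range(n), r)


def knapsack_decision(sizes, values, W, S):
    """Is there a subset with total size <= W and total value >= S?"""
    return any(
        sum(sizes[i] for i in I) <= W and sum(values[i] for i in I) >= S
        for I in subsets(len(sizes))
    )


def reduce_to_scheduling(sizes, values, W, S, B=1):
    """Build the 'skeletal' scheduling instance (p = 1, q = r = 0) from the proof."""
    total = sum(values)
    return {
        "slots": W,                                         # W downloading slots
        "B": B,                                             # single bandwidth value
        "rates": [l * B for l in sizes],                    # R_i = l_i * B
        "weights": [Fraction(s, total) for s in values],    # w_i = s_i / sum_j s_j
        "M": 1 - Fraction(S, total),                        # M = 1 - S / sum_j s_j
    }


def scheduling_decision(inst):
    """Is there a set of downloads within the slot budget with Discontinuity <= M?"""
    for I in subsets(len(inst["rates"])):
        slots_used = sum(inst["rates"][i] // inst["B"] for i in I)
        discontinuity = 1 - sum(inst["weights"][i] for i in I)
        if slots_used <= inst["slots"] and discontinuity <= inst["M"]:
            return True
    return False


# Hypothetical Knapsack instance.
sizes, values = [2, 3, 4], [3, 4, 5]
answer = scheduling_decision(reduce_to_scheduling(sizes, values, 5, 7))
```

For this instance both problems answer “Yes” with $W = 5$, $S = 7$ (items 1 and 2) and “No” with $S = 8$, as the equivalence argued in the proof requires.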

3.4.2 Pre-fetching Scheduling

We first consider pre-fetching, which happens well before the user starts watching the playlist, i.e., without a stringent time constraint; hence we can offload the mobile traffic to the WiFi network to reduce the transmission cost. The objective is to find a schedule $S_{pf}$ to pre-fetch the videos, subject to the following constraints:

(1) Storage Constraint:

$$
\sum_{(\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S_{pf}} \ \sum_{t = \hat{t}_j}^{\hat{t}_j + \hat{l}_j - 1} B(t) \le StorageSize;
$$

(2) Cost Constraint:

$$
\forall (\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S_{pf}, \quad \sum_{t = \hat{t}_j}^{\hat{t}_j + \hat{l}_j - 1} C(t) = 0.
$$

The storage constraint ensures that the total amount of pre-fetched video will not exceed the limited local storage. The cost constraint implies that the pre-fetch is performed only through WiFi links. As the pre-fetched videos may not be watched during the watch-time, the performance gain of pre-fetching is uncertain. In order to achieve this unguaranteed performance gain at a lower cost, we do not consider cellular communications during pre-fetching. The playlist V during the pre-fetching may only be a subset of that during the watch-time, as pre-fetching occurs before video watching and new videos may be added to the playlist after pre-fetching; these will be handled by the watch-time download scheduling to be discussed later.

Given that the user behavior during the video watching (and hence the watching duration $U$) is unknown at this stage, we introduce $P = \{p_1, p_2, \ldots, p_n\}$ to denote the user preference on each video in the playlist, which can reflect the potential lengths of the watch durations. In practice, $P$ can be evaluated by video popularity, video timeliness, or the social distance between the publisher (the user who reposts the video) and the consumer (the user who may watch the video), or a combination of them. Without loss of generality, here we use the video popularity as the metric of user preference. In addition, we introduce a parameter $\alpha \in [0, 1]$ to represent the aggressiveness of the pre-fetching. For each instant video clip, we pre-fetch a fraction $\alpha$ of the total video, instead of downloading the whole clip. The playback discontinuity of a single video $v_i$ can then be rewritten as

$$
\mathrm{discontinuity}(v_i) = 1 - \frac{1}{\alpha \cdot L} \cdot \frac{pf(v_i)}{R_i},
$$

where $pf(v_i)$ defines how much of $v_i$ has been pre-fetched:

$$
pf(v_i) = \sum_{\substack{(\hat{v}_j, \hat{t}_j, \hat{l}_j) \in S_{pf},\\ \hat{v}_j = v_i}} \ \sum_{\hat{t} = \hat{t}_j}^{\hat{t}_j + \hat{l}_j - 1} B(\hat{t}).
$$

The next step is to find a proper assignment of $w_i$. For this subproblem, we define $w_i$ as

$$
w_i = \frac{1}{\sum_{k=1}^{n} p_k \cdot \mathrm{discontinuity}(v_k)} \cdot p_i \cdot \mathrm{discontinuity}(v_i),
$$

which considers both the user preference for $v_i$ and its current playback discontinuity. Note that $w_i$ decreases as more of $v_i$ has been pre-fetched: given the batch view behavior, it is not reasonable to allocate all the resources to a tiny portion of extremely popular videos. In practice, the first several units of a video are normally requested with a much higher probability than its later part. Together with the pre-fetching aggressiveness $\alpha$, this assignment of $w_i$ allows us to pre-fetch more videos with the video preference still being considered.

As the monetary cost for WiFi links is usually negligible, our goal here is to minimize Discontinuity with the form:

$$
\mathrm{Discontinuity} = \frac{1}{\sum_{k=1}^{n} p_k \cdot \mathrm{discontinuity}(v_k)} \cdot \sum_{v_i \in V} p_i \cdot \mathrm{discontinuity}(v_i)^2.
$$

Different from Eq. 3.1, this objective function does not directly involve the energy consumption of pre-fetching. Here, we use α to control the trade-off between the energy consumption and the playlist playback discontinuity. As α gets larger, more videos would be pre-fetched, which consumes more energy; on the contrary, if α is small, only a small portion of videos will be pre-fetched, and thus little energy is consumed. Therefore, the above objective function can still represent the overall performance.

This pre-fetching scheduling subproblem is a variation of the knapsack problem with a total weight limit:

$$
W = \min\Big(StorageSize, \sum_{\forall t \text{ such that } C(t) = 0} B(t)\Big),
$$

where an object is one time slot length of video playback, and its value is the amount of decrease of $p_i \cdot \mathrm{discontinuity}(v_i)^2$ after pre-fetching one more time slot, if the object belongs to video $v_i$. It is easy to see that while the weight of each object is the same, the value changes as the decisions are made, i.e., as one object of video $v_i$ is downloaded, the value of all the remaining objects of video $v_i$ decreases, since $\mathrm{discontinuity}(v_i)$ decreases. We use a greedy algorithm that, in each iteration, searches for and downloads the object that currently has the greatest value. Recall that all the objects have the same weight. Given the optimal result in each iteration, the algorithm returns the final optimal pre-fetching schedule.
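The greedy rule described above can be sketched as follows. This is an illustrative reading of the description rather than the implementation evaluated in this work: the function name and arguments are hypothetical, all clips are assumed to share the same rate `R` (so every object has weight `R`), `budget` stands for the weight limit $W$, and $\alpha \cdot L$ is rounded down to a whole number of slots.

```python
import heapq


def greedy_prefetch(prefs, alpha, L, R, budget):
    """Return how many playback slots to pre-fetch for each video.

    prefs[i] is the preference p_i; budget is the total amount of data
    that may be pre-fetched (the weight limit W).
    """
    target = int(alpha * L)          # at most alpha * L slots per clip (assumed rounding)
    fetched = [0] * len(prefs)
    if target == 0:
        return fetched

    def gain(i):
        # Decrease of p_i * discontinuity(v_i)^2 when one more slot of v_i is fetched.
        d_now = 1 - fetched[i] / target
        d_next = 1 - (fetched[i] + 1) / target
        return prefs[i] * (d_now ** 2 - d_next ** 2)

    # Max-heap (via negated gains) over the next object of each video.
    heap = [(-gain(i), i) for i in range(len(prefs))]
    heapq.heapify(heap)
    remaining = budget
    while heap and remaining >= R:
        _, i = heapq.heappop(heap)   # object with the greatest current value
        fetched[i] += 1
        remaining -= R
        if fetched[i] < target:
            heapq.heappush(heap, (-gain(i), i))   # value of the next object of v_i has dropped
    return fetched


# Two clips with hypothetical preferences 4 and 1, alpha = 0.5, L = 6 slots, unit rate.
plan = greedy_prefetch([4, 1], 0.5, 6, 1, 3)
```

With a budget of three slots, the sketch spends two slots on the preferred clip and one on the other clip, which illustrates how the shrinking value of later objects spreads the budget over more videos.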

3.4.3 Watch-time Download Scheduling

Unlike in pre-fetching, the video watching durations during watch-time can be largely determined from the input user gestures, typically click, drag and fling, where the last two gestures can cause screen scrolling. Once a gesture is given, the following process of screen moving is predetermined. Given the fixed display size of each clip (specifically, the fixed height), the motion of screen scrolling can be modeled and calculated, and the details of the scrolling process can be obtained accurately (e.g., how many videos are present, how long each video will stay in the viewport), which can hardly be done in VoD or video streaming applications. Although different operating systems have different technical details for implementation, the philosophy for animating the screen scrolling is generally the same, which is to gradually decelerate the scrolling speed until it reaches zero if there is no other finger touch detected during the
