5.3 Traffic analysis
5.3.4 Relationship between request frequency and request inter-arrival rate
Figure 5.3: YT+FB data (x-axis: request rate 'λ' binned every 900 seconds; y-axis: number of requests per day; series: Facebook videos, YouTube videos)
(a) YT data (x-axis: request rate 'λ' binned every 900 seconds; y-axis: mean and standard deviation of the number of requests per day; series: mean YouTube requests, std with Sunday, std without Sunday)
(b) FB data (same axes; series: mean Facebook requests, std with Sunday, std without Sunday)
Figure 5.4: a: Relationship between the daily request frequency and the daily request inter-arrival rate on 01/13/2014; b and c: on the week starting from 01/13/2014
Figure 5.3 quantifies the relationship between the request rate and the request inter-arrival rate for YT videos and FB videos, respectively, on one given day, here Monday, January 13, 2014. The raw data is a vector of (x_λ, y_N)_u values, each representing one single user u, where x_λ is the inter-arrival rate of requests over one day (expressed in seconds) and y_N is the number of videos viewed per day. On the x-axis, we create bins 900 seconds long, which yields 96 bins to cover the entire day. Each user is then assigned to one bin. Bin 0 on the x-axis contains the users whose request inter-arrival rate ranges from 0 to 900 seconds, bin 1 corresponds to the range [900, 1800) seconds, and so on. On the y-axis, for each of these 96 groups, we show the number of requests per day (y_N) of the 99th-percentile most active user within each bin. Figure 5.3 shows that the activity of users could be modeled and quantified with an exponential decay. Users do not request more than 3 videos per day when their request inter-arrival rate is higher than 20 × 900 = 18,000 seconds (5 hours).
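The binning procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the input layout (one `(inter_arrival_s, requests_per_day)` pair per user), the helper names `percentile` and `bin_users`, the nearest-rank percentile definition, and the toy input values are all assumptions.

```python
import math
from collections import defaultdict

BIN_WIDTH_S = 900                 # bin width used in the text (15 minutes)
N_BINS = 86_400 // BIN_WIDTH_S    # 96 bins cover one day


def percentile(values, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def bin_users(users, q=99):
    """Map each non-empty bin index to the q-th percentile of requests per day.

    `users` is an iterable of (inter_arrival_s, requests_per_day) pairs,
    one pair per user (assumed layout).
    """
    per_bin = defaultdict(list)
    for inter_arrival_s, requests_per_day in users:
        # Bin k holds values in [k * 900, (k + 1) * 900) seconds;
        # anything at or beyond one day is clamped into the last bin.
        k = min(int(inter_arrival_s // BIN_WIDTH_S), N_BINS - 1)
        per_bin[k].append(requests_per_day)
    return {k: percentile(v, q) for k, v in sorted(per_bin.items())}


# Toy input, invented for illustration only.
users = [(120, 40), (450, 12), (899, 3), (900, 7), (18_500, 2)]
curve = bin_users(users)
```

With the toy input, the first three users fall into bin 0, the fourth into bin 1, and the last into bin 20, so `curve` holds one value per non-empty bin.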
Figures 5.4(a) and 5.4(b) generalize this observation to the rest of the days of the week (until 20 January). In these figures, users are binned on the x-axis according to the daily average inter-arrival rate of the requests they generated in the week of Monday, January 13, 2014. On the y-axis, for each of the 96 bins, we show the mean over the seven days of the number of requests of the 99th-percentile most active user within each bin. The standard deviation is also given twice: once over the seven days, and once over the six days from Monday to Saturday (excluding Sunday). Figures 5.4(a) and 5.4(b) show that the request inter-arrival rate remains fairly similar across the days of the week. The standard deviation quickly approaches zero for the least active users, and it also remains relatively low for the heaviest users. The standard deviation is slightly
Chapter 5. CPSys: A system for mobile video prefetching
higher when it includes Sunday. This illustrates that users have more heterogeneous consumption behaviors on Sunday than on other days. On Sunday, we record a lower activity on mobile devices. This suggests that the majority of users consume fewer FB and YT videos on their mobile devices on Sundays, while a minority is much more active. The patterns are quite similar for YT (Figure 5.4(a)) and FB (Figure 5.4(b)); only the mean request inter-arrival rate of the heaviest users is higher for YT.
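The per-bin statistics plotted in Figure 5.4 (the mean over seven days, and the standard deviation with and without Sunday) can be sketched as below. The function name `summarize`, the dictionary layout, and the daily values are invented for illustration, and the choice of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

# Hypothetical values for ONE bin: requests per day of the 99th-percentile
# user, Monday to Sunday (invented numbers, not taken from the dataset).
week = {"Mon": 30, "Tue": 32, "Wed": 31, "Thu": 29,
        "Fri": 30, "Sat": 28, "Sun": 44}


def summarize(daily):
    """Return (mean over all days, std over all days, std excluding Sunday)."""
    all_days = list(daily.values())
    no_sunday = [v for day, v in daily.items() if day != "Sun"]
    # pstdev is the population standard deviation (assumed estimator).
    return mean(all_days), pstdev(all_days), pstdev(no_sunday)


avg, std_with_sunday, std_without_sunday = summarize(week)
```

Repeating this for each of the 96 bins would give the three series shown in each panel.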
F3: The request inter-arrival rate might be used to distinguish the heaviest users from the least active ones in order to apply a specific prefetching strategy to each group. Moreover, Figures 5.4(a) and 5.4(b) provide additional insights to fine-tune the prefetching system: the 99th-percentile most active users request no more than 35 FB videos and 52 YT videos per day on average, which gives an indication of the daily number of videos to prefetch.
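As a purely illustrative reading of F3, the sketch below turns the figures reported in this section into a per-user daily prefetch cap. The policy itself, the function name `prefetch_budget`, and the constant names are hypothetical; only the numbers (the 20 × 900 second threshold, the cap of 3, and the bounds of 35 FB and 52 YT videos) come from the text.

```python
BIN_WIDTH_S = 900
LOW_ACTIVITY_THRESHOLD_S = 20 * BIN_WIDTH_S   # 18,000 s, from Figure 5.3
LOW_ACTIVITY_CAP = 3                          # videos per day beyond the threshold
DAILY_CAP = {"FB": 35, "YT": 52}              # 99th-percentile averages, Figure 5.4


def prefetch_budget(inter_arrival_s, service):
    """Upper bound on videos to prefetch per day (hypothetical policy)."""
    if inter_arrival_s > LOW_ACTIVITY_THRESHOLD_S:
        # Least active users: the text reports at most 3 requests per day.
        return LOW_ACTIVITY_CAP
    return DAILY_CAP[service]


budget_heavy_yt = prefetch_budget(600, "YT")
budget_light_fb = prefetch_budget(19_000, "FB")
```

A real strategy would presumably use a finer mapping from bin to budget; the two-level cap is only the simplest rule consistent with the reported bounds.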
5.3.5 Video lifetime distribution
Figure 5.5: Lifetime distribution of videos made available on January 8, 2014 (x-axis: time in hours, ticks at 1, 10, 100, 1000; y-axis: CDF of the number of views over time)
In Figure 5.5, we show the lifetime distribution of YouTube videos across time. We limited our study to videos that have been uploaded to YouTube on January 8, 2014, which is the starting day of the dataset. In this study, the following process was applied:
• First, the unique reference IDs of the YT videos in the dataset are extracted from the URIs (as explained in Section 5.3.1).
• Second, the reference ID of each of these videos is converted using a base64 encoding scheme to obtain the unique identifier that YouTube assigns to index the content.
• Third, the YouTube API² is used to retrieve the day the videos were made available online and their category.
• Fourth, we observe that 4554 YouTube videos were uploaded on 8 January 2014 and are present in the dataset.
• Fifth, the aggregated cumulative distribution of the number of views of these videos is computed at a granularity of 1 hour over the whole period of the dataset (94 days). An offset is associated with each video so that its first request is shifted to the origin; the same offset is then applied to the remaining requests for the same content.
We plot in Figure 5.5 the cumulative distribution of the number of views of these videos. The figure clearly shows that a large share of the views happens shortly after the videos are made available: 10% within the first hour and 40% within the first day.
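The offset-and-accumulate step of the process above can be sketched as follows. This is a minimal illustration under assumptions: requests are given as `(video_id, timestamp_s)` pairs, the function name `views_cdf` is hypothetical, and the sample timestamps are invented.

```python
from collections import Counter

HOUR_S = 3600


def views_cdf(requests):
    """Aggregate CDF of views per hour since each video's first request.

    `requests` is an iterable of (video_id, timestamp_s) pairs (assumed
    layout). Returns a list of (hour_index, cumulative_fraction) tuples,
    where hour index 0 covers the first hour after the first request.
    """
    requests = list(requests)
    first_seen = {}
    for video_id, ts in requests:
        # The offset of a video is the timestamp of its first request.
        first_seen[video_id] = min(ts, first_seen.get(video_id, ts))

    # Shift every request by its video's offset and bucket it per hour.
    per_hour = Counter((ts - first_seen[vid]) // HOUR_S for vid, ts in requests)

    total = len(requests)
    cdf, running = [], 0
    for hour in sorted(per_hour):
        running += per_hour[hour]
        cdf.append((hour, running / total))
    return cdf


# Toy input: two videos with invented timestamps (seconds).
sample = [("a", 1000), ("a", 1500), ("a", 9000), ("b", 50_000), ("b", 140_000)]
cdf = views_cdf(sample)
```

Because every video is shifted to the same origin, the curve aggregates videos regardless of when their first request occurred, which is what allows a single CDF to be drawn for all of them.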
F4: According to Cha et al. [28], a large part of content items is immutable, which means that users tend to lose interest in an item immediately after they have consumed it. Figure 5.5 confirms that any prefetching strategy should be proactive and quickly anticipate the interest of each user in each video.