VII. CellShift: A System to Efficiently Time-shift Data on the Cellular Net-
8.6 Cloudlet Feasibility Analysis
Having examined our prediction system using trace-based analysis, we now examine the feasibility and challenges of using it in the real world. We use cloudlets as our main motivating system, and start by considering which overheads would be of concern for these systems. We examine how the total data downloaded affects the storage required and the time to migrate data between cloudlets. We then build a simple prefetching proxy and find that it is difficult to download prefetched content before it is needed.

Figure 8.12: Impact of varying the parameter for how many times we need to have seen a static URL to prefetch it later. [Plot: percent of data downloaded (by size and by object) and false positives (objects) versus the parameter used in the analysis.]

Figure 8.13: Impact of varying the parameter for how many times we need to have seen a match for this URL pattern when training in order to prefetch it later. [Plot: percent of data downloaded (by size and by object) and false positives (objects) versus the number of template matches needed to download an object.]
First, we estimate the impact of the volume of downloaded content on a cloudlet. We use an approach similar to our analysis in the last section, but now we also estimate the size of the content. We assume that the objects we mispredict are, on average, the same size as the content we did download. For false positives caused by prefetching the right URL with the wrong parameters, we take the object's size to be the size of the object fetched with the correct parameters. We expect this estimate to be conservative, since in many cases a smaller error message would be returned instead of real content.
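The size estimate described above can be sketched as a per-session sum. This is a minimal illustration, not the implementation used for the analysis: the record type and function names are hypothetical, and it assumes the average size is taken over the objects downloaded in the same session.

```python
from dataclasses import dataclass, field


@dataclass
class SessionPrefetch:
    """Hypothetical per-session record of prefetch outcomes."""
    # Sizes (bytes) of prefetched objects whose sizes are known.
    downloaded_sizes: list[int] = field(default_factory=list)
    # Number of mispredicted objects whose true size is unknown.
    mispredicted_count: int = 0
    # For right-URL/wrong-parameter predictions: size (bytes) of the object
    # fetched with the *correct* parameters, used as a conservative proxy.
    wrong_param_proxy_sizes: list[int] = field(default_factory=list)


def estimated_bytes(session: SessionPrefetch) -> float:
    """Conservative estimate of the bytes a cloudlet would store for a session."""
    known = sum(session.downloaded_sizes)
    # Mispredicted objects are charged the mean size of the downloaded content.
    avg = known / len(session.downloaded_sizes) if session.downloaded_sizes else 0.0
    mispredicted = session.mispredicted_count * avg
    wrong_params = sum(session.wrong_param_proxy_sizes)
    return known + mispredicted + wrong_params


# Two known objects (100 B, 300 B), two mispredictions charged at the
# 200 B average, and one wrong-parameter fetch charged at 50 B.
print(estimated_bytes(SessionPrefetch([100, 300], 2, [50])))  # 850.0
```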
We assume each app and each user has an independent data store, and thus that we do not deduplicate by keeping only one copy of each object globally; such deduplication could introduce privacy issues. We calculate the amount of data needed for each of the sessions we recorded, each of which is several minutes long.
For our sessions, the mean is about 41 MB, including both static and dynamic content, but the median is 5 MB. As shown in Figure 8.14, the mean is skewed by a small number of apps (Instagram and PBS Kids in particular) that use over a hundred megabytes because of their heavy use of high-resolution images and video, respectively. Overall, storage is not a problem: a 500 GB hard disk costs about thirty dollars when bought individually¹. It is thus reasonable to support about 100,000 app instances for individual users on a single cloudlet, or about 12,500 if we take the mean instead of the median. Even a disk of only a few gigabytes could support a significant number of apps.
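The capacity figures follow from dividing disk size by per-session data size. The sketch below uses decimal units and treats one recorded session's data as the footprint of one app instance, both assumptions. With the 41 MB mean it gives about 12,200 instances, close to the rounder 12,500 obtained by taking the mean as 40 MB. The 4 GB disk in the last line is an illustrative value, and `app_instances` is a hypothetical helper.

```python
# Decimal units, as disk vendors use them.
GB = 10**9
MB = 10**6


def app_instances(disk_bytes: int, per_instance_bytes: int) -> int:
    """Number of independent per-user app data stores that fit on one disk.

    Assumes no deduplication across users or apps (one store per app per user).
    """
    return disk_bytes // per_instance_bytes


print(app_instances(500 * GB, 5 * MB))   # median session size: 100000
print(app_instances(500 * GB, 41 * MB))  # mean session size: 12195
print(app_instances(4 * GB, 5 * MB))     # a "few gigabytes" disk: 800
```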
¹ https://www.amazon.com/Seagate-Pipeline-3-5-Inch-Internal-ST3500312CS/dp/B002CMOH26/
² http://gizmodo.com/americas-internet-inequality-a-map-of-whos-got-

Figure 8.14: Total amount of data downloaded during a short session of using an app. [Plot: CDF, fraction of apps versus data required in a session (MB).]

A bigger problem is bandwidth. The average bandwidth in the US appears to be around 18 Mbps², or about 2.25 MB/s. Even assuming we can saturate the link, forwarding the median session's 5 MB to the next cloudlet would take about two seconds, and for something like Instagram, which uses over a hundred megabytes, transferring all the data could take 45 seconds or more. If the user has several such apps, it could take minutes. Given the low cost of storage, one solution could be to store data in advance on cloudlets the user is likely to visit next. Another would be to focus only on apps with small amounts of data. We could also focus on cases where the user is stationary for a period of time, and not guarantee that Server Push will work for the first few minutes after moving elsewhere.
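A back-of-the-envelope check of these migration times must convert the link rate from megabits to megabytes per second before dividing session sizes by it. The sketch assumes a fully saturated link and ignores protocol overhead; the helper name and the 100 MB example size are illustrative.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Seconds to move size_mb megabytes over a saturated link of link_mbps megabits/s."""
    return size_mb * 8 / link_mbps  # 8 bits per byte


LINK_MBPS = 18  # average US bandwidth cited in the text

for name, mb in [("median session", 5), ("mean session", 41), ("100 MB app", 100)]:
    print(f"{name}: {transfer_seconds(mb, LINK_MBPS):.1f} s")
# median session: 2.2 s
# mean session: 18.2 s
# 100 MB app: 44.4 s
```

Several apps of this size add up quickly: three 100 MB apps would take over two minutes to migrate at this rate.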
In the case of WiFi, cloudlets could be associated with a building rather than a specific AP, limiting the amount of data that needs to be moved around within the building. An examination of traceroute results from WiFi in this building shows that the first hop after the wireless access point in the BBB (that hop is in fact located in the School of Information building, not the BBB) is less than 2 milliseconds away, whereas a server on the other side of the continent (https://berkeley.edu) is about 77 milliseconds away. This suggests that for something like a university campus, only a few cloudlet sites may be needed, since in many cases the last hop adds only a small fraction of the latency; this would also limit how often data needs to be migrated.
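The comparison above can be made concrete with the two measured values. Because the first-hop figure is an upper bound ("less than 2 milliseconds"), the computed share is an upper bound as well. The helper name is illustrative.

```python
def last_hop_share(last_hop_ms: float, end_to_end_ms: float) -> float:
    """Fraction of an end-to-end latency attributable to the hop past the AP."""
    if end_to_end_ms <= 0:
        raise ValueError("end-to-end latency must be positive")
    return last_hop_ms / end_to_end_ms


# Measured values from the traceroute discussion: < 2 ms to the first hop,
# about 77 ms to a server across the continent.
share = last_hop_share(2.0, 77.0)
print(f"{share:.1%}")  # 2.6%
```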
Figure 8.15: Time to download and process the content to be loaded in the initial batch. [Plot: CDF, fraction of apps versus time to load the first batch of prefetch content (seconds).]

We created a simple prototype to examine some of the challenges of applying prefetching to real devices. We were not able to get prefetching to work effectively in practice, primarily because content was not loaded fast enough. As Figure 8.15 shows, the time to download and parse the content could be substantial, and the apps with short download times often had little content that could be prefetched. This is an initial prototype and not heavily optimized, but aggressively fetching a large amount of content can be slow, and we would need to reduce the download and processing time to roughly the time needed to download a single object to see benefits. A method for identifying or prioritizing the content to prefetch could alleviate this problem. We also ran into other engineering problems, such as the account apparently being logged out as a result of the requests we made, which prevented prefetching from working effectively.
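One possible way to prioritize, sketched here only as an illustration and not as the method used in the prototype, is a greedy knapsack-style heuristic: rank candidates by estimated probability of use per byte and take them in order until a byte budget is exhausted. The `Candidate` type, the `p_use` estimates, and how the byte budget is chosen (for example, link rate times the time available before the app's first request) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical prefetch candidate."""
    url: str
    size_bytes: int
    p_use: float  # estimated probability that the app will request this object


def prioritize(candidates: list[Candidate], byte_budget: int) -> list[Candidate]:
    """Greedily pick candidates with the highest expected hits per byte
    until no remaining candidate fits in the byte budget."""
    ranked = sorted(
        candidates,
        key=lambda c: c.p_use / max(c.size_bytes, 1),  # avoid division by zero
        reverse=True,
    )
    chosen: list[Candidate] = []
    used = 0
    for c in ranked:
        # Skip objects that would overflow the budget, but keep scanning
        # for smaller ones that still fit.
        if used + c.size_bytes <= byte_budget:
            chosen.append(c)
            used += c.size_bytes
    return chosen


cands = [
    Candidate("a", 100_000, 0.9),
    Candidate("b", 2_000_000, 0.95),
    Candidate("c", 50_000, 0.2),
]
print([c.url for c in prioritize(cands, 500_000)])  # ['a', 'c']
```

With this ranking, small, likely-needed objects are fetched first so the first batch finishes sooner; a large object with a high probability of use is deferred when it does not fit in the budget.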