

2.5 Evaluation

2.5.2 Multi-Label Classification Model

We have proposed multi-label learning as a prediction approach to web content prefetching for mobile apps, and discussed the implementation details of this learning algorithm in Section 2.4. We now evaluate two components in the design of our multi-label learner: (1) the accuracy of preparing the training data for the ML algorithm, and (2) the accuracy of the ML model in predicting the correct transformations to apply to new embedded URL or string input instances.

5 The number of unique static URLs is sometimes less than the number of static URLs fetched by the app, as some static URLs are fetched multiple times in a single app launch.

Table 2.9: Insight into app web traffic by profiling the median app behavior per-run

Everyday Food                 7      9    163      7
Facebook                      6   46.5  374.5     48
The Wall Street Journal       7     23     63     21
The Weather Channel           6     71   32.5     40
Twitter                       7     14    215      5
USA Today                     7     18    156      8
500px                         7      8    102      6

URL Extraction and Matching

Constructing ML models for a given system requires a set of examples or known instances in order to train the ML model. Preparing the training data for our app-specific, web prefetching models requires effectively extracting the relevant strings from HTTP payload content, followed by accurately matching the fetched URLs to the corresponding embedded content. We now evaluate this accuracy in mapping the fetched URLs to the correct embedded strings or URLs. This accuracy ultimately affects the prediction accuracy of the final ML model for future, unknown input instances.

In Table 2.10, we report the accuracy of our URL extraction and matching techniques. Using the Heavy Browsing dataset, we compute the number of fetched URLs that were successfully matched with URLs extracted from previous HTTP payload content. Note that we exclude static URLs from the total number of fetched URLs, as those are not embedded in payload content. These values are representative of the behavior we see across datasets with different browsing loads. We also find that the Some Browsing dataset covers example URL transformations as comprehensively as the Heavy Browsing dataset.
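As a rough illustration of how such a matching accuracy could be computed, the sketch below extracts URL-like strings from payloads and counts the fetched, non-static URLs that appear among them. The function names, the regular expression, and the exact-string matching criterion are assumptions made for illustration; the matcher evaluated here also pairs fetched URLs with embedded strings that differ by a transformation, which this sketch does not attempt.

```python
import re

# Hypothetical pattern for URL-like strings embedded in HTTP payloads.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\\]+")


def extract_urls(payload):
    """Return the set of URL-like strings found in one payload."""
    return set(URL_PATTERN.findall(payload))


def matching_accuracy(fetched_urls, payloads, static_urls):
    """Percentage of fetched, non-static URLs that appear in the payloads.

    Simplification: every payload is treated as having been seen before
    every fetch, and only exact string matches are counted.
    """
    embedded = set()
    for payload in payloads:
        embedded |= extract_urls(payload)
    # Static URLs are excluded because they are not embedded in payloads.
    candidates = [u for u in fetched_urls if u not in static_urls]
    if not candidates:
        return 0.0
    matched = sum(1 for u in candidates if u in embedded)
    return 100.0 * matched / len(candidates)
```

With one payload embedding two URLs and three fetched URLs, one of which is static, the accuracy is computed over the two remaining fetched URLs only.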

We observe that our URL extraction and matching methods achieve high accuracy for most apps. This accuracy reflects the success of our system in preparing the appropriate data to feed to the ML model. Some apps, however, namely Geek Trivia and Newser, expose a limitation in Airplane Mode's URL extraction and matching: we do not handle URLs with dynamic content, that is, URLs constructed from variable content such as timestamps.

Once Airplane Mode has matched URLs, it extracts the differences, or transformations, between each pair of matching URLs; these transformations are the labels of its ML classification algorithm. Next, we evaluate the accuracy of Airplane Mode's ML predictions, where the ML model is generated from the training data we have just discussed.
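To make the idea of a transformation label concrete, the following sketch diffs an embedded string against the URL that was actually fetched and reports each non-matching segment as a label. This is only an illustration under stated assumptions: the tokenization on URL delimiters, the use of Python's `difflib`, the `(operation, old, new)` label format, and the example URLs are all hypothetical and are not taken from the Airplane Mode implementation.

```python
import re
from difflib import SequenceMatcher


def tokenize(url):
    """Split a URL into tokens, keeping delimiters as separate tokens."""
    return re.split(r"([/?&=._-])", url)


def extract_transformations(embedded, fetched):
    """Return (operation, old_text, new_text) tuples describing how the
    embedded string must change to become the fetched URL."""
    a, b = tokenize(embedded), tokenize(fetched)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    labels = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            labels.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return labels
```

For instance, diffing a hypothetical thumbnail URL ending in `123_small.jpg` against a fetched URL ending in `123_large.jpg` yields a single `replace` label, and an identical pair yields no labels at all. A URL can carry several such labels at once, which is what makes the learning problem multi-label.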

Table 2.10: Preparing the ML training data using the Heavy Browsing dataset: accuracy of URL extraction and matching. Stats are reported over multiple runs.

Apps                     # Runs  # Fetched URLs*  # Matched URLs  Accuracy (%)
Everyday Food                 7             1710            1697          98.5
Facebook                      6             2455            2359            96
Flixster                      7               58              44         33.33
Geek Trivia                   7               17               0             0
HowStuffWorks                 7              292             277          95.5
Khan Academy                  7              231             221          93.3
Manga Z                       7              887             882          99.2
Newser                        7              954             644          72.2
The Wall Street Journal       7              548             544          99.5
The Weather Channel           6              203             138          70.2
Twitter                       7             1611            1223          78.6
USA Today                     7              922             873            95
500px                         7              706             703           100

* We exclude static URLs from the total number of fetched URLs.

ML Predictions

We adopt a multi-label learning algorithm from the literature called HOMER.6 After feeding HOMER the training data described earlier, we use the app-specific ML prediction models that it generates and evaluate their accuracy in predicting fetched URLs.

In Table 2.11, we apply our HOMER-based ML approach and two alternative approaches for parsing and prefetching URL content to the same traffic dataset, and compare their accuracy. The first alternative is a naïve, general approach that fetches URLs exactly as they appear embedded in HTTP response payload content. The second alternative is a manual approach, in which we manually inspect the traffic generated by an app and construct observation-based, app-specific rules to generate a list of candidate URLs to fetch. Since the manual approach requires detailed analysis of app traffic, we restrict our analysis period to 4 hours per app and implement it for four apps only.

Our manual approach mostly outperforms the naïve, general approach; however, it is impractical for two reasons. First, a manual approach is not scalable, as specific rules must be created for each app based on an analysis of its HTTP traffic. Second, by observing HTTP traffic alone, the person writing the rules is bound to miss some, as is clear for Twitter, where the general approach performs better than the manual approach. We observe that popular social networking apps that generate and update content at a high rate are hard to emulate; in Table 2.9, we observe a high dynamic content count for Twitter. All in all, Airplane Mode outperforms the manual and general techniques, and our solution is both scalable and generalizable.

In Table 2.12, we analyze the prediction accuracy of Airplane Mode in greater detail. We evaluate the impact of using different training data on predicting URLs for app launches with different browsing behavior, or loads. In particular, we evaluate the impact of browsing load on the completeness of the training data, and on its ability to accurately predict URLs for different user browsing behavior with the app. Note that in computing the number of URLs that Airplane Mode extracts from HTTP content, we consider all identified URLs, even though some of them might not be valid URLs.
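The per-launch breakdown into accurately predicted (AP), missed (M), and static (S) URLs, and its median across runs, could be computed along the following lines. This is a sketch under assumptions: we take each percentage relative to the URLs fetched in one launch and take the median of each column independently. The exact denominators used in Table 2.12 are not spelled out in this section, and the function names are hypothetical.

```python
from statistics import median


def launch_breakdown(fetched, predicted, static):
    """Return (AP, M, S) percentages for a single app launch.

    fetched:   list of URLs the app actually fetched
    predicted: set of URLs the ML model predicted
    static:    set of URLs known to be static
    """
    total = len(fetched)
    if total == 0:
        return (0.0, 0.0, 0.0)
    s = sum(1 for u in fetched if u in static)
    ap = sum(1 for u in fetched if u not in static and u in predicted)
    m = total - s - ap  # everything neither static nor predicted is missed
    return tuple(100.0 * x / total for x in (ap, m, s))


def median_breakdown(runs):
    """Median of each of AP, M, and S across independent runs."""
    rows = [launch_breakdown(*run) for run in runs]
    return tuple(median(column) for column in zip(*rows))
```

One consequence of taking per-column medians is that the three reported values need not sum to exactly 100, even though they do within any single launch.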

The general trend in Table 2.12 is that prediction accuracy improves, though not by much, as the ML training data is derived from traffic with heavier browsing load. This can be explained by the additional observations of URL transformations that the training data might cover with heavier browsing of app content. Also, as the training data is derived from datasets with heavier browsing loads, the number of static URLs identified increases. This observation indicates that not all static URLs are fetched on app launch; some are fetched only after browsing the app and clicking on particular links with static URL identifiers, such as the sports page section.

We now take a closer look and categorize apps based on their behavior. USA Today exhibits unique behavior: we achieve high accuracy in extracting and matching URLs, as shown in Table 2.10, but the performance of the ML algorithm falls short. The reason is the wealth of transformations that can be applied to URLs with the same features, where different subsets of these transformations are applied to any given URL. This phenomenon limits how accurately the ML algorithm, here HOMER, can predict the correct set of transformations to apply to a given URL.
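The effect described for USA Today can be illustrated with a small calculation: when training examples with identical features carry different label sets, no deterministic classifier can predict all of them exactly. The sketch below computes this upper bound on exact-match accuracy for a training set. It is not HOMER and not part of Airplane Mode; the function name and the feature and label strings used with it are hypothetical.

```python
from collections import Counter, defaultdict


def exact_match_ceiling(examples):
    """Upper bound on exact-match accuracy for a deterministic classifier.

    examples: iterable of (features, labels) pairs, where features is a
    sequence of hashable values and labels is a collection of label names.
    For each distinct feature vector, the best possible prediction is its
    most frequent label set; all other examples with those features are
    necessarily mispredicted.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    by_features = defaultdict(Counter)
    for features, labels in examples:
        by_features[tuple(features)][frozenset(labels)] += 1
    best = sum(c.most_common(1)[0][1] for c in by_features.values())
    return best / len(examples)
```

For example, if three URLs share the same features but one of them needs an extra transformation, at most two of the three can be predicted exactly, regardless of how well the model is trained.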

Table 2.11: Evaluation of the accuracy of the proposed prefetching approaches for a single app run.

Apps                     General  Manual  Airplane Mode
CNN                         0.81     1.0            1.0
The Wall Street Journal     0.49    0.74           0.77
USA Today                   0.49    0.85            0.5
Khan Academy                0.58    0.81           0.94

Other apps, namely HowStuffWorks, Khan Academy, Everyday Food, and Flixster, rely heavily on static URLs to populate their pages, or cache content for long periods of time. As discussed in Section 2.4.5 and shown in Table 2.5, a certain set of apps (namely Geek Trivia, The Weather Channel, Flixster, and Newser) fetch URLs that contain dynamic entries, such as a timestamp or geolocation. Airplane Mode identifies the format of these entries and is able to handle them. However, within the context of extracting generic URL transformation rules, the URL extractor of Airplane Mode is unable to effectively extract such URLs for those apps, and therefore the ML alone does not handle these cases. As a result, the prediction values from the ML alone indicate poor performance for these apps. Nevertheless, Airplane Mode is able to identify and handle all of these URLs using a separate learning mechanism that looks for generic timestamp and geolocation literals in URLs and correctly fetches those URLs, as described in detail in Section 2.4.5.
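A mechanism that looks for generic timestamp and geolocation literals might, in its simplest form, resemble the sketch below, which replaces such literals with placeholders so that URLs differing only in these values map to the same template. The regular expressions (10- or 13-digit epoch timestamps, signed decimal coordinates), the placeholder names, and the example URL are assumptions chosen for illustration; the actual mechanism is the one described in Section 2.4.5.

```python
import re

# Assumed formats: Unix epoch in seconds (10 digits) or milliseconds (13 digits).
TIMESTAMP = re.compile(r"(?<!\d)\d{10}(?:\d{3})?(?!\d)")
# Assumed format: signed decimal number with at least two fractional digits.
COORDINATE = re.compile(r"(?<![\d.])-?\d{1,3}\.\d{2,}(?![\d.])")


def templatize(url):
    """Replace timestamp and coordinate literals in a URL with placeholders."""
    url = TIMESTAMP.sub("{timestamp}", url)
    url = COORDINATE.sub("{coord}", url)
    return url
```

Applied to a hypothetical weather request such as `http://api.example.com/wx?lat=42.36&lon=-71.06&ts=1400000000`, both coordinates and the timestamp are replaced, while URLs without such literals are returned unchanged. At prefetch time, the placeholders would be filled with the current time and location.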

Table 2.12: Analysis of ML predictions per app launch, reported as the median across multiple independent runs. We evaluate the impact of the browsing load of the training data on the prediction accuracy of test suites with different browsing load.

{ML Training Data, Test Suite}
                          {No browsing,          {Some browsing,        {Some browsing,        {Heavy browsing,
                           Some browsing}         No browsing}           Some browsing}         Some browsing}
Apps                      #URLs   AP    M    S   #URLs   AP    M    S   #URLs   AP    M    S   #URLs   AP    M    S
Allrecipes                 55.5   19   79    2       7   86    0   14      56   88   10    2    54.5   87   11    2
CNN                          39   56    5   39      30   50    0   50      39   62    0   38      39   62    0   38
Everyday Food                37   61   36    3       3   33   33   33    32.5   86   11    3    32.5   86   11    3
Facebook                    279   49   33   19     179   70    3   27   283.5   57   25   18   259.5   54   33   13
Flixster                      9   33   22   44       5    0   20   80       9   33   22   44       9   33   22   44
Geek Trivia                 1.5    0  100    0       0    0    0    0     1.5    0  100    0     1.5    0  100    0
HowStuffWorks                52   13   10   77      42    0   14   86      58   14    7   79      44   25    9   66
Khan Academy                 15    0   73   27      13   38    8   54    24.5   60    4   36    24.5   60    4   36
Manga Z                    67.5    0   48   52      35    0    0  100   111.5   13    4   83     104   19    9   72
Newser                     78.5   54   36   10      16   56   25   19    94.5   59   20   21    66.5   16   55   29
The Wall Street Journal      58   29   59   12      24   79    0   21      64   58   17   25      69   54   16   30
The Weather Channel        70.5   18   33   49      51    2   37   61    78.5    3   43   54      78   18   29   53
Twitter                    58.5   11   86    3       6   33   33   33   1058563378136845
USA Today                  65.5   41   38   21      48   29   38   33    69.5   37   37   26    69.5   37   37   26
500px                        63   10   87    3       8   75    0   25      60   80   16    4      60   55   42    3

* AP: % URLs Accurately Predicted, M: % URLs Missed, S: % Static URLs