2.2 Design considerations
2.2.1 Challenges
To inform our design, we have studied how apps behave with respect to the web content they download. We have manually examined the MSIL code of several tens of highly rated Windows Phone 8 apps (similar to examining the Java bytecode of Android apps). We have also manually examined network traces of a few tens of highly rated Windows Phone 8 and Windows 8.1 tablet apps.¹
We have found that apps typically behave very differently than web browsers.
When a user enters or clicks on a URL in the browser, such as http://www.cnn.com/, the browser takes a standard set of actions. It begins by fetching the default object at this URL, which may be index.html. That HTML page may refer to images, CSS, JavaScript, and other URLs using standard HTML tags that the browser will fetch, execute, and render. In this way, browser behavior when visiting a website is not only predictable but also similar across browsers, and it can be determined without observing past browser behavior: one simply needs to fetch the starting URL and follow the same rules that any browser would follow.
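The browser's rule-based discovery of subresources can be sketched in a few lines of Python. This is an illustrative sketch only: FETCHED_ATTRS is a small, hypothetical subset of the tag/attribute pairs a real browser honors, the page at www.example.com is made up, and URLs generated at run time by scripts are ignored. Because the rules depend only on the HTML, any program applying them to the same page derives the same list of URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical subset of tag/attribute pairs that a browser fetches
# automatically; the real rules in the HTML standard are much broader.
FETCHED_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",    # e.g. stylesheets and icons
    "iframe": "src",
}

class SubresourceCollector(HTMLParser):
    """Collects URLs of subresources referenced by standard HTML tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = FETCHED_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Relative references are resolved against the page URL.
                self.urls.append(urljoin(self.base_url, value))

def subresources(base_url, html):
    collector = SubresourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls

page = """<html><head>
<link rel="stylesheet" href="/css/main.css">
<script src="https://cdn.example.com/app.js"></script>
</head><body><img src="logo.png"><a href="/world/">World</a></body></html>"""

print(subresources("http://www.example.com/index.html", page))
```

Running it prints the stylesheet, script, and image URLs resolved against the page URL. The `<a href>` link is not collected, since following it requires a user click rather than an automatic fetch.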
This is a direct result of web standards that are designed for interoperability across different browsers on different operating systems.
When an app, such as a news app or social networking app, is launched, it will typically fetch one or a few URLs that are deterministic. These URLs may be statically defined or hardcoded in the app’s source code, and sometimes include a user ID or device ID that varies between users. The underlying OS fetches these URLs for the app and returns the contents. The contents may be a variety of standard web objects, such as XML, XHTML, JSON, JPG, PNG, and unformatted text. Some of these objects, such as XML, may contain URLs or portions of URLs. The app parses out these URLs and uses custom logic to fetch a subset of them, possibly modifying some of the URLs before fetching them. The contents of those fetches may in turn identify additional URLs that the app may fetch. Some of these fetches happen automatically when the app is launched, while others happen only if the user clicks on an icon in the display. In this way, the app operates in a fetch-parse-compute-fetch cycle with respect to downloading web content.
¹ In this work, we consider only “modern” apps that are purchased by users from the app store on Windows.
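The fetch-parse-compute-fetch cycle described above can be illustrated with a minimal, offline Python sketch. Everything in it is hypothetical: WEB is an in-memory stand-in for the network, URL_RE is a deliberately simple URL pattern, and app_logic is an invented per-app rule that fetches image URLs verbatim, rewrites news links before fetching them, and ignores everything else. User-triggered fetches are not modeled; every discovered URL is fetched automatically.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> response body. Stands in for real
# network fetches so the sketch runs offline.
WEB = {
    "http://feeds.example.com/top.xml": (
        "<rss><item><link>http://news.example.com/story1.html</link>"
        "<thumb>http://img.example.com/a.jpg</thumb></item>"
        "<item><link>http://news.example.com/story2.html</link></item></rss>"
    ),
    "http://api.example.com/story1.xml": "<story>one</story>",
    "http://api.example.com/story2.xml": "<story>two</story>",
    "http://img.example.com/a.jpg": "<binary>",
}

# Simplified pattern for absolute http(s) URLs embedded in markup.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def app_logic(found_url):
    """Hypothetical app-specific rule: return the URL to fetch, or None to skip."""
    if found_url.endswith(".jpg"):
        return found_url                      # fetched exactly as found
    if found_url.startswith("http://news.example.com/"):
        # Rewrite host and extension before fetching.
        path = found_url[len("http://news.example.com/"):]
        return "http://api.example.com/" + path.replace(".html", ".xml")
    return None                               # this app ignores the URL

def run_app(static_urls):
    fetched = []
    queue = deque(static_urls)                # fetch: start from static URLs
    while queue:
        url = queue.popleft()
        if url in fetched or url not in WEB:
            continue
        fetched.append(url)
        body = WEB[url]
        for found in URL_RE.findall(body):    # parse
            target = app_logic(found)         # compute
            if target is not None:
                queue.append(target)          # fetch again
    return fetched

print(run_app(["http://feeds.example.com/top.xml"]))
```

Note that the original news.example.com links are never fetched; only the URLs produced by app_logic are, which is exactly the kind of behavior a browser-style rule set cannot predict.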
In the rest of this chapter, we classify the set of URLs that an app may fetch into four buckets: static, modified, verbatim, and partial URLs. Static URLs are those that are either hardcoded in the app’s source code or calculated deterministically, and that do not change across multiple app executions. They are fetched when the app is launched, or when a major placeholder, such as the sports section of a news app, is clicked for viewing. Static URLs are typically not present in the contents of previous URL fetches. In contrast, modified, verbatim, and partial URLs can be observed in previous HTTP response payloads. A verbatim URL is one that is present in a downloaded web object and that the app subsequently fetches exactly as it appears. A modified URL is one that is present in a downloaded web object but that the app modifies before fetching it. A partial URL is one where a simple string or portion of a URL is present in downloaded content, and the app uses that string to construct the URL it fetches.
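To make the four buckets concrete, the following Python sketch labels a fetched URL by searching earlier response payloads. It is a heuristic of our own construction for illustration, not the classification procedure used in this work: "partial" is approximated as an earlier embedded URL appearing inside the fetched URL, and "modified" as sharing a long, ID-like path token with an embedded URL (the 12-character threshold is arbitrary). The example payload combines URLs from Table 2.1 with a hypothetical example.com article URL.

```python
import re

# Simplified pattern for absolute http(s) URLs embedded in text or markup.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def _long_tokens(url, min_len=12):
    """Split a URL on common delimiters and keep long, ID-like tokens."""
    return {t for t in re.split(r"[/:.?=&]", url) if len(t) >= min_len}

def classify(url, previous_payloads):
    """Heuristically label a fetched URL as verbatim, partial, modified, or static.

    previous_payloads holds the bodies of responses downloaded before `url`
    was fetched. The partial and modified tests are illustrative assumptions."""
    embedded = set()
    for body in previous_payloads:
        embedded.update(URL_RE.findall(body))
    if url in embedded:
        return "verbatim"   # fetched exactly as it appeared
    if any(u in url for u in embedded):
        return "partial"    # an embedded URL was wrapped with extra text
    wanted = _long_tokens(url)
    if any(wanted & _long_tokens(u) for u in embedded):
        return "modified"   # shares an ID-like token; host or extension differ
    return "static"         # no trace in earlier payloads

# Hypothetical payload. The blog and image URLs come from Table 2.1; the
# example.com article URL is invented to illustrate a "modified" URL.
payload = (
    "<link>http://blogs.wsj.com/five-things/2014/03/07/"
    "5-takeaways-from-the-february-employment-report/</link>"
    '<media:content url="http://s.wsj.net/public/resources/images/'
    'BN-BV058_russia_G_20140307121641.jpg"/>'
    "<link>http://www.example.com/articles/"
    "SB10001424052702304732804579423454141533882.html</link>"
)

for fetched in [
    "http://mobilefeeds.wsj.com/xml/rss/3_5557.xml",
    "http://online.wsj.com/xml/djml/http://blogs.wsj.com/five-things/2014/03/07/"
    "5-takeaways-from-the-february-employment-report/.xml",
    "http://mobilefeeds.wsj.com/xml/djml/SB10001424052702304732804579423454141533882.xml",
    "http://s.wsj.net/public/resources/images/BN-BV058_russia_G_20140307121641.jpg",
]:
    print(classify(fetched, [payload]), fetched)
```

A real classifier would also need to handle non-URL strings (for example, bare IDs) that an app splices into partial URLs, which this URL-only heuristic does not see.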
Type      URL
static    http://mobilefeeds.wsj.com/xml/rss/3_5557.xml
static    https://iwap.wsj.com/iphone/rolesByUuid?uuid=MSOFT_sagarwal&udid=WSJ-IPAD
static    http://mobilefeeds.wsj.com/xml/feed/v2/3_5549_2.rss
static    http://www.marketwatch.com/mw2/mediarss/wsjdn/wsjtv.asp?type=wsj-section&query=PersonalFinance&count=30
partial   http://online.wsj.com/xml/djml/http://blogs.wsj.com/five-things/2014/03/07/5-takeaways-from-the-february-employment-report/.xml
modified  http://mobilefeeds.wsj.com/xml/djml/SB10001424052702304732804579423454141533882.xml
verbatim  http://s.wsj.net/public/resources/images/BN-BV058_russia_G_20140307121641.jpg
Table 2.1: Some of the URLs fetched by a single run of the Wall Street Journal app.
Figure 2.1: Snippets of the response payload (a 190 KB RSS file in XML format) from fetching the third URL in the Wall Street Journal example.
As a concrete example, we present the behavior of the Wall Street Journal app by Dow Jones & Company Inc. on a Windows 8.1 tablet. Table 2.1 shows some of the URLs that a single invocation of the app fetches. The first four URLs listed are static URLs and are fetched every time the app is launched on this tablet. The second URL contains a unique user identifier and a unique app identifier. Snippets of the payload returned by the third URL are shown in Figure 2.1. Notice that the last URL in Table 2.1, which we classify as “verbatim”, is embedded exactly as fetched in the middle of Figure 2.1. The top of the figure shows a URL for a news article, which the app embeds in a new URL by prepending http://online.wsj.com/xml/djml/ and appending .xml; we call this a partial URL. The bottom of the figure contains a URL that the app modifies by replacing the hostname and changing the file extension from .html to .xml; we call this a modified URL.
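The partial and verbatim URLs in this example can be reproduced, under stated assumptions, by a short Python sketch that parses an RSS payload. The feed below is a hypothetical, heavily abridged stand-in for Figure 2.1 (we use the standard Media RSS namespace), and urls_to_fetch is not the app's actual logic; only the prefix, the suffix, the two URLs, and the image attributes come from this section. The modified URL is omitted because the original .html URL it was derived from is not listed in Table 2.1.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"        # Media RSS namespace
PARTIAL_PREFIX = "http://online.wsj.com/xml/djml/"
PARTIAL_SUFFIX = ".xml"

# Hypothetical, heavily abridged feed; item structure is assumed.
RSS = f"""<rss version="2.0" xmlns:media="{MEDIA_NS}"><channel>
<item>
  <link>http://blogs.wsj.com/five-things/2014/03/07/5-takeaways-from-the-february-employment-report/</link>
</item>
<item>
  <media:content url="http://s.wsj.net/public/resources/images/BN-BV058_russia_G_20140307121641.jpg"
                 medium="image" height="369" width="553"/>
</item>
</channel></rss>"""

def urls_to_fetch(rss_text):
    """Hypothetical app logic: derive the next URLs to fetch from an RSS feed."""
    root = ET.fromstring(rss_text)
    to_fetch = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            # Partial URL: wrap the article link in a fixed prefix and suffix.
            to_fetch.append(PARTIAL_PREFIX + link + PARTIAL_SUFFIX)
        for media in item.findall(f"{{{MEDIA_NS}}}content"):
            if media.get("medium") == "image":
                # Verbatim URL: fetch the image URL exactly as embedded.
                to_fetch.append(media.get("url"))
    return to_fetch

for url in urls_to_fetch(RSS):
    print(url)
```

It prints the two derived URLs in feed order; they match the partial and verbatim rows of Table 2.1. Nothing in the feed itself marks the article link as something to wrap in a prefix and suffix; that knowledge lives only in the app's code.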
As the example shows, custom logic built by the app developer determines which URLs to fetch. That logic may select a subset of URLs to fetch, and may construct or modify URLs from content that is downloaded from the web. This behavior differs from that of webpages in a browser, where HTML and other documents clearly demarcate URLs and the browser follows common logic to fetch them. One exception is JavaScript code executing in the browser, where potentially arbitrary logic can determine additional URLs to fetch.