7.4 Implementation
7.4.3 Client-Side JavaScript-Based Interaction Tracking
With the logged HTTP requests and responses, it is already possible to determine users’ navigation behaviour on non-AJAX sites, including popular paths through the site, the most important entry and exit pages, or the average number of page views. UsaProxy’s client-side JavaScript code augments this information with details about the user actions on each of the requested pages.
The primary concern which guided the development of UsaProxy’s JavaScript code is the necessity to avoid any interference with other code that may be loaded by the web application itself. The first precaution of the code is to use a “_UsaProxy” suffix for all global JavaScript variables to reduce the chance of name conflicts.
Another problem is more challenging at first: User actions like mouse clicks are mapped to events like onclick. To receive the event, JavaScript code needs to register an event handler for it. Traditionally, AJAX applications achieve this by obtaining a reference for an object in the Document Object Model (DOM) tree, which represents the element structure of the HTML document. Each object corresponds to an HTML element such as a button, and the code can subscribe to an event by overwriting certain methods of the object, for example the onclick method for mouse clicks.
Unfortunately, one implication of this event registration model is that only a single subscriber to any event is supported. If UsaProxy registered its own onclick and related methods for all objects in the DOM tree, any code of the web application which is loaded afterwards would overwrite that handler with its own method, and UsaProxy would be unable to track user behaviour for the respective element.
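The single-subscriber limitation can be illustrated without a browser. The following is a minimal sketch, not UsaProxy code: a plain object stands in for a DOM element, and the names button and calls are hypothetical. Assigning to onclick a second time silently replaces the first handler.

```javascript
// Stand-in for a DOM element; in a browser this would be a real element object.
const button = { onclick: null };
const calls = [];

// The tracking code registers its handler first...
button.onclick = function () { calls.push("tracker"); };

// ...then the web application loads and assigns its own handler,
// which replaces the tracker's handler.
button.onclick = function () { calls.push("application"); };

// Simulate a click: only the handler that was assigned last runs.
button.onclick();
console.log(calls); // ["application"]
```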
As a solution to this problem, two extended event registration models were added to browsers. They allow any number of interested parties to register for a particular event. Furthermore, they support the registration of an event handler not only for a single object, but also globally for all elements in the DOM tree or a subtree of it. The W3C version of the event registration uses the addEventListener() method and is supported by Netscape 6 and Safari/Konqueror, whereas Microsoft Internet Explorer only supports the alternative attachEvent() method. The Opera browser supports both variants. Because UsaProxy is intended to work with all of the most popular browsers, it can use either method.
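Choosing between the two registration models at run time might look roughly like the sketch below. The function name addListener_UsaProxy is hypothetical (it only follows the suffix convention described earlier) and the code is not taken from UsaProxy itself. The Microsoft model expects event names with an "on" prefix, whereas the W3C model uses the bare event name.

```javascript
// Registers `handler` for `type` (e.g. "click") on `target`, using whichever
// registration model the object offers. Returns the name of the model used.
// Hypothetical helper for illustration only.
function addListener_UsaProxy(target, type, handler) {
  if (target.addEventListener) {
    // W3C model; `false` requests the bubbling phase.
    target.addEventListener(type, handler, false);
    return "w3c";
  }
  if (target.attachEvent) {
    // Microsoft model: event names carry an "on" prefix.
    target.attachEvent("on" + type, handler);
    return "ie";
  }
  throw new Error("no multi-subscriber event registration available");
}

// In a page, a document-wide handler could then be registered with:
//   addListener_UsaProxy(document, "click", function (ev) { /* log it */ });
```

Because the handler is added alongside any existing ones, handlers that the web application registers later are unaffected.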
The UsaProxy logging code needs to be notified of many of the available events, and this needs to happen for all elements in the DOM tree. For this reason, it registers global event handlers which are later called in addition to any event handlers that the application may have registered. The object on which an event occurs gives access to further useful data about the respective HTML element, such as its id, href or src attributes. Additionally, the object’s position in the DOM tree can be determined. Via its event handlers, UsaProxy obtains the following information and causes it to be logged:
• Loading of new pages and (for some browsers only) closing of a window/tab with a page;
• Resizing of the browser window, with new window width and height in pixels;
• Whether the page or one of the elements on it gains or loses the input focus;
• Mouse movements, with pixel coordinates relative to the upper left corner of the page;
• Mouse clicks on elements, or hovering the mouse pointer over an element, with pixel coordinates;
• Scrolling inside the document, with vertical pixel offset from the top of the page;
• Keys that are pressed, including modifier keys like Shift;
• Changes made to the value of any field in a form, including radio buttons, drop-down menus, checkboxes and text fields;
• Text that is selected using the mouse or keyboard, either inside text fields or anywhere else on the page.
For mouse clicks and hovered-over elements, the element is identified in the log entry via any id, href or src attributes it has. Additionally, for anchor tags the text of the anchor is logged. However, in some cases the element will not have been assigned an ID, so this information may not be enough to uniquely identify it in the document.
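As a rough illustration of how a global handler could collect these attributes, consider the sketch below. The function names, the field names and the "&"-separated key=value layout are assumptions made for this example and do not reproduce UsaProxy's actual log format. Plain objects stand in for DOM elements so that the snippet also runs outside a browser.

```javascript
// Returns the element an event happened on: `target` in the W3C model,
// `srcElement` in the Microsoft model.
function eventTarget_UsaProxy(ev) {
  return ev.target || ev.srcElement;
}

// Collects the identifying attributes of an element into key=value pairs.
// Field names and separator are illustrative, not UsaProxy's real format.
function describeElement_UsaProxy(el) {
  const parts = [];
  for (const name of ["id", "href", "src"]) {
    if (el[name]) parts.push(name + "=" + encodeURIComponent(el[name]));
  }
  // For anchors, also record the link text.
  if (el.tagName === "A" && el.textContent) {
    parts.push("text=" + encodeURIComponent(el.textContent));
  }
  return parts.join("&");
}

// A fake click event on a link without an id attribute.
const link = { tagName: "A", href: "http://example.org/", textContent: "Home" };
const clickedOn = eventTarget_UsaProxy({ srcElement: link });
console.log(describeElement_UsaProxy(clickedOn));
// href=http%3A%2F%2Fexample.org%2F&text=Home
```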
To account for this problem, the JavaScript code also creates a string which encodes the exact position of the element in the DOM tree: Starting from the root of the tree, each character specifies which nth child element must be selected to reach the node, where the character “a” stands for the first child, “b” for the second, etc. For example, the string “aba” indicates that the first element is selected (very likely <html>), then the second child (probably <body>), and then the first child element again (e.g. a <h1> heading at the start of the page). An extension of the syntax also allows this scheme to work for nodes with more than 26 children.
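The basic encoding can be sketched as follows. This is an illustration under stated assumptions rather than UsaProxy's own code: the names domPath_UsaProxy and makeNode are hypothetical, and since the extended syntax for more than 26 children is not specified here, the sketch simply rejects such nodes. A small tree of plain objects replaces the real DOM; in a browser, the same function could be passed an element directly, because elements expose parentNode and children.

```javascript
// Encodes the position of `node` as one letter per tree level, root first:
// "a" = first child element, "b" = second, and so on.
function domPath_UsaProxy(node) {
  let path = "";
  while (node.parentNode) {
    const index = Array.prototype.indexOf.call(node.parentNode.children, node);
    if (index < 0 || index > 25) {
      // The extension for more than 26 children is not covered by this sketch.
      throw new RangeError("child index outside a-z: " + index);
    }
    path = String.fromCharCode(97 + index) + path; // 97 is the code of "a"
    node = node.parentNode;
  }
  return path;
}

// Hypothetical helper that builds a minimal tree for the example.
function makeNode(tag, parent) {
  const node = { tag: tag, parentNode: parent || null, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

const doc = makeNode("#document");
const html = makeNode("html", doc);
const head = makeNode("head", html);
const body = makeNode("body", html);
const h1 = makeNode("h1", body);

console.log(domPath_UsaProxy(h1)); // "aba"
```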
If the tracking code recorded every small change of the mouse position or the scroll offset, a significant amount of data would need to be sent back to the proxy. As the aim of the logging solution was to also support users with slow Internet connections, e.g. using a modem link, the amount of data is reduced by only recording changes of these values at most every 150 milliseconds.
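One possible way to enforce such a limit is a simple throttle around the recording function, sketched below. Whether UsaProxy uses this approach or, for instance, periodic polling is not stated here, so the code should be read as an assumption. The name throttle_UsaProxy and the injectable clock are introduced only for this example.

```javascript
// Wraps `record` so that it runs at most once per `intervalMs`.
// `now` is injectable so the behaviour can be checked without real waiting.
function throttle_UsaProxy(record, intervalMs, now = Date.now) {
  let last = -Infinity;
  return function (...args) {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      record(...args);
    }
    // Calls that arrive too early are dropped.
  };
}

// Simulated mouse positions arriving every 50 ms on a fake clock.
let clock = 0;
const recorded = [];
const onMove = throttle_UsaProxy(
  function (x, y) { recorded.push([clock, x, y]); },
  150,
  function () { return clock; }
);
for (clock = 0; clock <= 400; clock += 50) onMove(clock, 2 * clock);

console.log(recorded.map(function (r) { return r[0]; })); // [0, 150, 300]
```

Of the nine simulated events, only three are recorded, which shows how the limit reduces the amount of data to be transmitted.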
Logging of Client-Side Interaction Data
The JavaScript part of UsaProxy’s implementation collects log information in a string variable and sends it back to the proxy at regular intervals. This is required because there is no way for the script to store larger amounts of information on the client machine due to security restrictions. However, it is also a useful feature, since the proxy can conveniently store the logs for many test participants in a single, central place.
Figure 7.5: Log data is sent to the proxy by the JavaScript running on the client. The special form of the URL that is used for this causes the proxy not to forward it to any server.
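The collect-and-send pattern can be sketched as a small buffer object. All names, the newline separator between entries and the example entries themselves are assumptions for illustration and are not UsaProxy's actual format.

```javascript
// Accumulates log entries in a string and hands them to `send` in batches.
function createLogBuffer_UsaProxy(send) {
  let buffer = "";
  return {
    append: function (entry) {
      buffer += entry + "\n";
    },
    flush: function () {
      if (buffer === "") return false; // nothing to send
      const data = buffer;
      buffer = ""; // start a new batch before sending
      send(data);
      return true;
    },
  };
}

const sent = [];
const log = createLogBuffer_UsaProxy(function (data) { sent.push(data); });
log.append("event=load");
log.append("event=scroll&y=120");
log.flush(); // sends both entries as one batch
log.flush(); // the buffer is empty, so nothing is sent
console.log(sent.length); // 1
```

In a page, flush would be called from a timer, for example setInterval(log.flush, 3000). The interval length here is a placeholder, not a value taken from UsaProxy.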
To deliver the log data, the same trick is used as for the initial download of the JavaScript code itself (see figure 7.5): A request to a special URL is made via an XMLHttpRequest invocation. The proxy recognizes this URL and does not forward it to a server, but instead writes a line to its log file. The line consists of the IP from which the log request originated, a timestamp and the submitted log data. The URL of the request now includes the string “log”, followed by the text of the log line, i.e. http://a.b.c.d/usaproxylolo/log?log-data
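Both ends of this exchange can be sketched in a few lines. The example is illustrative only: the function names, the use of a GET request, the space-separated layout of the log line and the timestamp format are assumptions, and the proxy-side logic is written in JavaScript purely for consistency with the client code, without implying that the proxy is implemented this way. The log data is assumed to be URL-safe already.

```javascript
const LOG_PATH_UsaProxy = "/usaproxylolo/log";

// Client side: build the special URL that carries the log data.
function buildLogUrl_UsaProxy(host, data) {
  return "http://" + host + LOG_PATH_UsaProxy + "?" + data;
}

// Client side (browser only, not called in this example): issue the request.
function sendLog_UsaProxy(url) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url, true); // asynchronous GET is an assumption
  xhr.send(null);
}

// Proxy side: returns the log line for a log request, or null if the
// request is an ordinary one that should be forwarded to a server.
function handleRequest_UsaProxy(clientIp, url, timestamp) {
  const parsed = new URL(url);
  if (parsed.pathname !== LOG_PATH_UsaProxy) return null;
  const data = parsed.search.slice(1); // drop the leading "?"
  return clientIp + " " + timestamp + " " + data;
}

const logUrl = buildLogUrl_UsaProxy("a.b.c.d", "event=load");
console.log(logUrl);
// http://a.b.c.d/usaproxylolo/log?event=load
console.log(handleRequest_UsaProxy("192.0.2.7", logUrl, "2006-01-01T12:00:00Z"));
// 192.0.2.7 2006-01-01T12:00:00Z event=load
console.log(handleRequest_UsaProxy("192.0.2.7", "http://example.org/index.html", "2006-01-01T12:00:01Z"));
// null
```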
The storage requirements for UsaProxy’s log data are relatively modest even though the logging is far more detailed than that of server-side logs. In the example user studies that were conducted with the system and which involved viewing a number of pages on several websites, on average the amount of disk space that was needed for the recorded HTML documents was below 100 kBytes per user and minute, and the disk space for the log file was about 3 kBytes per user and minute.
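To put these figures into perspective, the following sketch extrapolates them to a study of a given size. The study dimensions (20 participants, 30 minutes each) are hypothetical and chosen only for the arithmetic. Since the HTML figure is an upper bound, the HTML result is an upper bound as well.

```javascript
// Per-user, per-minute figures from the text (the HTML figure is an upper bound).
const HTML_KB_PER_USER_MINUTE = 100;
const LOG_KB_PER_USER_MINUTE = 3;

// Scales the per-user, per-minute figures to a whole study.
function estimateStorageKb_UsaProxy(users, minutes) {
  const userMinutes = users * minutes;
  return {
    htmlKb: userMinutes * HTML_KB_PER_USER_MINUTE,
    logKb: userMinutes * LOG_KB_PER_USER_MINUTE,
  };
}

// Hypothetical study: 20 participants, 30 minutes each.
const estimate = estimateStorageKb_UsaProxy(20, 30);
console.log(estimate); // { htmlKb: 60000, logKb: 1800 }
```

Under these assumptions, the recorded HTML documents would take up at most about 60,000 kBytes and the log file about 1,800 kBytes.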