
chapter five iMPULS: internet music program user logging system

5.3 system architecture

5.3.2 data collection

During use, the program records a variety of events relating to the user’s interaction – both input and output. The data collected is as full and raw as possible, not only to support the planned analyses, but also to allow for investigations that were not envisaged, without necessitating further data collection.

9 The storing of this mapping is also necessary for occasions when a user requires reminding of their activation code, since the server must remember which unique identifier was assigned to which participant. Theoretically, this enables a brute-force attack, where an experimenter with access to the database could use it to generate all the possible keys from the email addresses and experiment IDs it contains, and then compare them against the code used to tag specific interaction data. However, the separation of this database from the experiment data increases the effort required for such an attack, which can be simply averted by denying experimenters access to the database (e.g. having it maintained by a trusted third party).

At the same time, recording all data is not feasible, as it would constitute too great an invasion of privacy and place a significant processing overhead on the client program. The collection mechanisms must not interfere with normal program operation or use, as might be the case if the experiment data required too much computer memory or processing power. Moreover, the data collected must be quickly transmittable over the Internet, as the program closes, without interrupting the user experience.

Table 1 – different event frequencies in the user experience

              in the order of...                    frequency range
interaction   hertz (Hz)                            up to 10 Hz
audio         kilohertz (kHz)                       20 to 192,000 Hz
processor     megahertz (MHz) or gigahertz (GHz)    300,000,000 to 4,000,000,000 Hz

In programming, the process of instrumentation is used to record and study program use, but can significantly reduce program performance, as timing data is collected and stored at a very high rate: for each executed function, line of code, or CPU instruction. As illustrated in Table 1, the frequencies of interaction are orders of magnitude lower than those relating to audio, which itself runs at a significantly lower rate than the computer processor. As such, instrumenting a program to record user interaction need have little or no impact on the user experience, as long as it does not delay or interrupt other program processes, such as audio processing or disk access.

data encoding and bit-packing

To ensure as small a processor and memory footprint as possible, events are bit-packed and stored in memory, then flushed to disk during periods when the computer is idle.10 Figure 9 gives an overview of the different log entries and data encodings used to record each type of interaction event. Further technical details of each event type are given in Appendix D.
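As a minimal sketch of the two techniques involved in bit-packing (addressing single bits with masks, and a packing directive to remove padding), the following C++ fragment may help. The record layout, field names and helper functions are hypothetical, chosen only for illustration; the actual iMPULS encodings are those given in Figure 9 and Appendix D.

```cpp
#include <cassert>
#include <cstdint>

// Mask selecting the nth bit of a byte, counting from n = 1,
// i.e. the value 2 raised to the power (n - 1).
constexpr std::uint8_t bitMask(unsigned n) {
    return static_cast<std::uint8_t>(1u << (n - 1));
}

// Read, set and clear a single bit using bitwise AND, OR and NOT.
inline bool testBit(std::uint8_t x, unsigned n) {
    return (x & bitMask(n)) != 0;
}
inline std::uint8_t setBit(std::uint8_t x, unsigned n) {
    return static_cast<std::uint8_t>(x | bitMask(n));
}
inline std::uint8_t clearBit(std::uint8_t x, unsigned n) {
    return static_cast<std::uint8_t>(x & ~bitMask(n));
}

// Hypothetical record with default alignment: the compiler is free to
// insert padding around the 4-byte member.
struct AlignedEntry {
    std::uint8_t  type;
    std::uint32_t time;
    std::uint8_t  flags;
};

// The same members with 1-byte packing: no padding between members.
#pragma pack(push, 1)
struct PackedEntry {
    std::uint8_t  type;
    std::uint32_t time;
    std::uint8_t  flags;
};
#pragma pack(pop)  // restore default alignment for the rest of the program

static_assert(sizeof(PackedEntry) == 6, "1 + 4 + 1 bytes, no padding");
static_assert(sizeof(AlignedEntry) >= sizeof(PackedEntry),
              "default alignment never yields a smaller record");
```

The push/pop pair confines the packing to the log record, which matches the intent that the rest of the program keeps its faster, default-aligned memory accesses.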

10 By default, the smallest data type (datum size) in the C++ programming language is 1 byte (8 bits), which may then be aligned to boundaries of up to 8 bytes (64 bits) to improve the speed of memory accesses. In the best case, this means a simple true/false (1/0) boolean (bool) value takes 8 times the memory required (1 bit); in the worst case, it can take 64 times. To pack the bits more densely, bit masks and bitwise Boolean operations (AND, OR, NOT) are employed to address single bits within a byte (e.g. the value of the nth bit of x is accessed using the expression x & 2^(n-1), where 2^(n-1) denotes 2 raised to the power n-1). At the same time, a dedicated compiler directive (#pragma pack(push, 1)) is used to override the alignment of members in the data structure used to record log entries. The remainder of the program is unaffected, and thus free to use faster, if more greedy, memory access methods.

Figure 9 – an overview of the different event types recorded as part of the experiment

The corresponding data types are derived from a base class, representing the members and functions generic to all interaction log entries. This abstract data type provides a single data member to identify the type of log entry, and declares pure virtual functions that specify an interface, allowing code to handle collections of interaction events without worrying about the differing event types and their implementation or encoding. These functions require that derived classes define code that:

• returns a human-readable description of the event (text)

• specifies a colour associated with the event type (colour)

• returns the object size (_size), for fast memory copying
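The interface listed above might be sketched as follows. Only the function names text, colour and _size come from the description; the class names (LogEntry, KeyEntry, CommandEntry), the return types, the type codes and the colour values are assumptions made for illustration, and do not reproduce the declarations in iMPULS.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Abstract base: one data member identifying the entry type, plus the
// pure virtual interface that every derived entry must implement.
class LogEntry {
public:
    explicit LogEntry(std::uint8_t type) : type_(type) {}
    virtual ~LogEntry() = default;
    virtual std::string   text()   const = 0;  // human-readable description
    virtual std::uint32_t colour() const = 0;  // colour for this event type
    virtual std::size_t   _size()  const = 0;  // size of the concrete object
    std::uint8_t type() const { return type_; }
private:
    std::uint8_t type_;
};

// Hypothetical keyboard entry.
class KeyEntry : public LogEntry {
public:
    KeyEntry(std::uint8_t key, bool down) : LogEntry(1), key_(key), down_(down) {}
    std::string text() const override {
        return "key " + std::to_string(static_cast<int>(key_)) +
               (down_ ? " down" : " up");
    }
    std::uint32_t colour() const override { return 0x3366CC; }
    std::size_t   _size()  const override { return sizeof(KeyEntry); }
private:
    std::uint8_t key_;
    bool down_;
};

// Hypothetical command entry.
class CommandEntry : public LogEntry {
public:
    explicit CommandEntry(std::uint16_t id) : LogEntry(2), id_(id) {}
    std::string text() const override {
        return "command " + std::to_string(static_cast<int>(id_));
    }
    std::uint32_t colour() const override { return 0xCC6633; }
    std::size_t   _size()  const override { return sizeof(CommandEntry); }
private:
    std::uint16_t id_;
};

using Log = std::vector<std::unique_ptr<LogEntry>>;

// Code working on a collection needs only the base-class interface.
std::string describe(const Log& log) {
    std::string out;
    for (const auto& entry : log) { out += entry->text(); out += '\n'; }
    return out;
}

std::size_t totalSize(const Log& log) {
    std::size_t bytes = 0;
    for (const auto& entry : log) bytes += entry->_size();
    return bytes;
}

// Usage sketch: a mixed collection of two entry types.
Log makeSampleLog() {
    Log log;
    log.push_back(std::unique_ptr<LogEntry>(new KeyEntry(65, true)));
    log.push_back(std::unique_ptr<LogEntry>(new CommandEntry(7)));
    return log;
}
```

Neither describe nor totalSize refers to a concrete entry type, which is the decoupling the abstract base class is intended to provide.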

Additional functions are declared and defined for the loading and saving of entries from file or memory, which can be overridden by child classes (for example, to save entries of variable length, such as those containing strings):

• loads event data from a file (load(FILE*))

• loads event data from a memory buffer (load(BYTE**))

• saves event data to a file (save(FILE*))

instrumenting the user experience

The timestamp used for the session is set on creation of an iMPULS object, which hosts the functions, buffers and other mechanisms used to manage data collection.11 However, hook functions and data collection are not started until the iMPULS::start() function is called, which should be triggered upon successful conclusion of the program’s startup.

11 The code to support data collection is contained in three files: a header file defining constants and parameters, such as connectivity settings (iMPULS_Constants.h), a header file declaring data types and the support functions (iMPULS.h), and a source code file (iMPULS.cpp) providing the function bodies. The code is integrated into an existing program’s source by including the main header file (#include "iMPULS.h") and creating a single, global instance of the iMPULS controller object. These files and details about integrating iMPULS with other programs are available from the author upon request.

Hooked events (such as host notifications and help system calls) are recorded automatically, through callback functions provided by the iMPULS controller, but other events are recorded manually, using explicit calls to an appropriate iMPULS function:

• keyboard(...) and mouse(...), called from the program’s input handlers, upon user input.

• message(...), called from the program’s window procedure, upon certain Windows notification messages.

• cursor(...) and focus(...), called as the user moves, within or between controls, tabs or pages.

• command(...), called to log specific program functions as they are triggered (e.g. as the result of input), or activity not automatically caught by other handlers (e.g. occurring as a result of activity in the host, such as tempo changes).

Each function follows a similar procedure: constructing the log entry using the appropriate data type (see Figure 9 and Appendix D), then passing it to a function that adds the entry to the memory buffer, which is flushed to disk as appropriate.
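The procedure shared by these functions (construct an entry, append it to the memory buffer, flush later) can be sketched as below. This is a simplified stand-in, not the iMPULS controller itself: the class name Logger, the record layouts, the parameter lists and the explicit millisecond offsets are all assumptions, and idle-time detection is left out, so flush() is called explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <vector>

// Hypothetical packed records; the real encodings are in Appendix D.
#pragma pack(push, 1)
struct KeyRecord     { std::uint8_t type; std::uint32_t offsetMs; std::uint8_t key; std::uint8_t flags; };
struct CommandRecord { std::uint8_t type; std::uint32_t offsetMs; std::uint16_t id; };
#pragma pack(pop)

class Logger {
public:
    // The session timestamp is fixed when the object is created.
    Logger() : sessionStart_(std::time(nullptr)), running_(false) {}

    // Nothing is recorded until start() has been called.
    void start() { running_ = true; }

    // Each logging call builds its record, then hands it to append().
    void keyboard(std::uint8_t key, bool down, std::uint32_t offsetMs) {
        KeyRecord r{1, offsetMs, key, static_cast<std::uint8_t>(down ? 1u : 0u)};
        append(&r, sizeof r);
    }
    void command(std::uint16_t id, std::uint32_t offsetMs) {
        CommandRecord r{2, offsetMs, id};
        append(&r, sizeof r);
    }

    // Write the buffered bytes to an open file and empty the buffer.
    bool flush(std::FILE* out) {
        if (buffer_.empty()) return true;
        const std::size_t written =
            std::fwrite(buffer_.data(), 1, buffer_.size(), out);
        const bool ok = (written == buffer_.size());
        if (ok) buffer_.clear();
        return ok;
    }

    std::size_t buffered() const { return buffer_.size(); }
    std::time_t sessionStart() const { return sessionStart_; }

private:
    // Copy the raw bytes of a record onto the end of the memory buffer.
    void append(const void* data, std::size_t bytes) {
        if (!running_) return;
        const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
        buffer_.insert(buffer_.end(), p, p + bytes);
    }

    std::time_t sessionStart_;
    bool running_;
    std::vector<std::uint8_t> buffer_;
};

// Usage sketch: the call made before start() is ignored, the later two
// are buffered as one 7-byte record each.
std::size_t demoBufferedBytes() {
    Logger log;
    log.keyboard(65, true, 0);
    log.start();
    log.keyboard(65, true, 120);
    log.command(7, 450);
    return log.buffered();
}
```

Keeping the buffer as raw bytes means an event costs only a small copy at the moment of interaction, with the slower disk write deferred to a time chosen by the caller.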