chapter five iMPULS: internet music program user logging system

5.4 interactive visualisation environment (IVE)

5.4.1. visualisation and analysis

Although many analyses were already planned, the experiment was designed to support flexible exploration of the user experience – allowing different aspects of the interaction to be explored in greater detail, as their relative importance was established. As such, the tool was designed not only to manage the execution of the experiment, but to provide an experimental platform for testing new analyses and visualisations of the data.

Figure 13 – visualisations in iMPULS|IVE (see Appendix E for specific details of visualisations used)

Shneiderman’s Visual Information-seeking Mantra

In this way, the application provides capabilities consistent with Shneiderman’s Visual Information-seeking Mantra (1996): “Overview first, zoom and filter, then details-on-demand”. This exploits the visual-processing and pattern-matching capabilities of the human brain – providing as many different visual perspectives as possible and allowing the user to guide the visualisation process, in order to identify trends and relationships in data.

visualisation and the scientific method

Visualisations, such as those in Figure 13, can suggest both new analyses and findings, but the added flexibility increases the risk of cherry-picking data – focusing (possibly inadvertently) on analyses that appear to support a specific conclusion or opinion, and overlooking those that produce less clear-cut results. When it becomes quicker and easier to perform analyses, it also becomes easier to over-analyse data, tinkering with a methodology or sample until a finding emerges. In a scientific tool, it is therefore important to balance the use of visualisation for exploration against its use for explanation (Tall, 1991). Appendix E details the visualisations used to support and guide analyses in subsequent chapters.

overview first...

The main screen (Figure 11) presents the data in hierarchical (tree) form, with nodes for each user, containing nodes for each user’s sessions, which themselves contain additional nodes for files and windows described in the session. Selecting a node brings up information about the corresponding object in the right pane, which can also include summary information about the objects it contains. For example, the root node provides an overview of all users and sessions in the dataset; a user’s summary page presents details about the user and all their sessions. This hierarchy is explicit in both the interface and the internal data types used by the program (see Figure 12).
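The containment hierarchy described above can be sketched as nested data types. This is a minimal illustration in Python, not the program's actual types: the class fields, the `summary()` methods and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    session_id: str
    minutes: float                                      # assumed summary field
    files: List[str] = field(default_factory=list)      # files described in the session
    windows: List[str] = field(default_factory=list)    # windows described in the session


@dataclass
class User:
    user_id: str
    sessions: List[Session] = field(default_factory=list)

    def summary(self) -> dict:
        """Summarise this user and everything the node contains."""
        return {
            "user": self.user_id,
            "sessions": len(self.sessions),
            "minutes": sum(s.minutes for s in self.sessions),
        }


@dataclass
class Corpus:
    users: List[User] = field(default_factory=list)

    def summary(self) -> dict:
        """Root-node overview of all users and sessions."""
        per_user = [u.summary() for u in self.users]
        return {
            "users": len(per_user),
            "sessions": sum(u["sessions"] for u in per_user),
            "minutes": sum(u["minutes"] for u in per_user),
        }


# Invented sample data, for illustration only.
corpus = Corpus([
    User("u1", [Session("s1", 12.5, ["song.mod"], ["pattern editor"]),
                Session("s2", 30.0)]),
    User("u2", [Session("s3", 7.5)]),
])
overview = corpus.summary()
```

Each level answers a summary request by delegating to the objects it contains, which mirrors how selecting a node in the tree shows information about that node and its children.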

Figure 14 – (top) user and session filters on toolbar; (bottom) interaction event filter dialog

zoom and filter...

The tree hierarchy allows the experimenter to ‘zoom in’ on individual users or individual sessions, but other filtering systems allow them to restrict analyses or visualisations to groups of users or sessions (see Figure 12). Summary information for each user and session is cached in a database, and can be used to include or exclude users or sessions with certain properties. For example, Figure 14 excludes users with fewer than 30 minutes of total interaction, and sessions with fewer than 10 minutes in the pattern editor. Individual logs can also be filtered with regard to interaction events, limiting processing to specific event types (see Figure 14). The text representation required by the Entry class (Figure 9) enables events to be filtered using simple string comparisons – looking for combinations of key words or phrases that appear in, or are absent from, the description. Figure 14 also illustrates how different subsets of events can be extracted, combining several simple filters using logical operators.13
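Combining simple string filters with logical operators might look something like the following Python sketch. The helper names (`contains`, `all_of`, `any_of`, `negate`) and the sample event descriptions are hypothetical; only the iKEYBOARD event name comes from the text, and the real log format is not reproduced here.

```python
from typing import Callable

# A filter is any predicate over an event's text description (assumed type).
EventFilter = Callable[[str], bool]


def contains(phrase: str) -> EventFilter:
    """Match events whose description includes the phrase."""
    return lambda text: phrase in text


def all_of(*filters: EventFilter) -> EventFilter:
    """Logical AND of several filters."""
    return lambda text: all(f(text) for f in filters)


def any_of(*filters: EventFilter) -> EventFilter:
    """Logical OR of several filters."""
    return lambda text: any(f(text) for f in filters)


def negate(f: EventFilter) -> EventFilter:
    """Logical NOT: match events the given filter rejects."""
    return lambda text: not f(text)


# Invented descriptions; 'iMOUSE' is a made-up event name for the example.
events = [
    "iKEYBOARD key=C4 window=pattern",
    "iMOUSE click window=pattern",
    "iKEYBOARD key=F5 window=sample",
]

keyboard_in_pattern = all_of(contains("iKEYBOARD"), contains("window=pattern"))
not_mouse = negate(contains("iMOUSE"))

selected = [e for e in events if keyboard_in_pattern(e)]
without_mouse = [e for e in events if not_mouse(e)]
```

Because every filter is just a function from text to a boolean, the combinators can be nested freely to express phrases that must appear, may appear, or must be absent.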

… then details on demand

Once the dataset has been optimised and filtered, the program offers different ways to analyse the data, for visualisation or exporting to another program, such as R or Excel.

Analysis typically involves iterating over each user or session, extracting quantitative information about interaction events. For example, an analysis might extract the average keyboard input rate for all sessions belonging to a user, and export the results to disk as tab-delimited or comma-separated values. Such interaction data can then be cross-tabulated with data from questionnaires (see Appendix C), enabling comparisons between users of different backgrounds and levels of experience. Alternatively, where a user has supplied enough data, similar comparisons can be made between their formative and more recent stages of development to look closer at the learning process – by, for example, looking at behaviour, averaged over set intervals – and the role of expertise.

As shown in Figure 12, analyses can be written for any Data Object type, and typically operate on the collection of entries they contain. As such, most analyses target the Corpus object, which contains all the data in the experiment – allowing access to all users and their sessions. An analysis is created by sub-classing the abstract Analysis class, and implementing the process() function. Compiler macros were written to abstract common or complex analysis operations, such as iteration or the use of multiprocessing.
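As an illustration of the sub-classing pattern, the keyboard input rate example could be sketched as follows. This is a Python approximation under stated assumptions: the dictionary-based corpus, the `KeyboardRate` class, the field names and the `export_csv` helper are hypothetical, and only the `Analysis` class and `process()` names come from the text.

```python
import csv
import io
from abc import ABC, abstractmethod


class Analysis(ABC):
    """Abstract base: each concrete analysis implements process()."""

    @abstractmethod
    def process(self, corpus):
        ...


class KeyboardRate(Analysis):
    """Average keyboard input rate (events per minute) for each user."""

    def process(self, corpus):
        rows = []
        for user_id, sessions in corpus.items():
            # One rate per session, skipping empty sessions.
            rates = [s["key_events"] / s["minutes"]
                     for s in sessions if s["minutes"] > 0]
            if rates:
                rows.append((user_id, sum(rates) / len(rates)))
        return rows


def export_csv(rows, out):
    """Write results as comma-separated values to any text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user", "keys_per_minute"])
    writer.writerows(rows)


# Invented sample corpus: user id -> list of session summaries.
corpus = {
    "u1": [{"key_events": 120, "minutes": 10.0},
           {"key_events": 300, "minutes": 20.0}],
    "u2": [{"key_events": 50, "minutes": 5.0}],
}

rows = KeyboardRate().process(corpus)
buffer = io.StringIO()
export_csv(rows, buffer)
```

Passing `delimiter="\t"` to `csv.writer` would produce the tab-delimited variant, and the resulting file can be loaded directly into R or Excel.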

13 Such filtering helps researchers visually explore the data, but can also greatly speed up data analysis. For example, if an analysis only concerns keyboard input, the program can use the filtering system to extract iKEYBOARD events to separate files, which can then be analysed without loading the full session.

Figure 16 – analysis options in iMPULS|IVE

The user triggers analyses from the Analysis menu, shown in Figure 16. A wide range of analyses were developed for the reViSiT experiment, the specifics of which are detailed in the next chapter. Despite their diversity, most analyses follow a common procedure: loading, extracting, aggregating and exporting data. Figure 17 presents an example code template for a new analysis, which aggregates extracted data from sessions by user, and enables the use of multi-processing – allowing the computer to process more than one user at a time.

Figure 17 – code template for data analysis, using macros (emphasis denotes separate process)

multi-processing

The multi-processing optimisation is achieved by performing the analysis in two passes – a first pass that collects data from separate users or sessions and saves it to a file, and a second pass that collects the data from these files and aggregates it. Since no session data is shared between users, the first pass, which includes the costly loading of session data, can be split between different processes. To implement this, the iMPULS|IVE program simply spawns other processes of itself, passing each a command-line argument that defines an affinity. The affinity is an integer, defining which ordinals in a given collection (i.e. users or sessions) the process will handle. For example, in a two-process scenario the original program creates one new process with an affinity of 1. With the original program assuming an affinity of 0, the two processes will thus divide the collection into the sets {0,2,4,6...} and {1,3,5,7...}, respectively.

Processing is split using the PREPARE_MULTIPROCESSING() macro. Analysis is restricted to the appropriate ordinals using the USE_MULTIPROCESSING macro, which can be placed inside either the user or session loops, to split processing by users or sessions, respectively. The compiler macros ON_COLLECT{…} and ON_AGGREGATE{…} are then used to define what should happen in each of the two passes. The macros allow unnecessary technical details to be hidden from the experimenter, making it easier to follow the line of the analysis.
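The affinity scheme and the two passes can be sketched in Python as below. This does not reproduce the macros: the function names (`handles`, `collect`, `aggregate`), the JSON intermediate files and the sample data are assumptions, and the two affinities are run one after the other here, whereas the program described above would run each in its own process.

```python
import json
import tempfile
from pathlib import Path


def handles(ordinal: int, affinity: int, n_processes: int) -> bool:
    """True if the process with this affinity owns the given ordinal."""
    return ordinal % n_processes == affinity


def collect(users, affinity, n_processes, out_dir: Path) -> None:
    """First pass: process only the owned ordinals and save each result to a file."""
    for ordinal, (user_id, sessions) in enumerate(users):
        if not handles(ordinal, affinity, n_processes):
            continue
        total = sum(sessions)  # stands in for the costly loading of session data
        record = {"user": user_id, "total": total}
        (out_dir / f"{ordinal}.json").write_text(json.dumps(record))


def aggregate(out_dir: Path) -> dict:
    """Second pass: merge the files written by every process."""
    results = {}
    for path in sorted(out_dir.glob("*.json")):
        record = json.loads(path.read_text())
        results[record["user"]] = record["total"]
    return results


# Invented sample data: (user id, per-session values).
users = [("u0", [1, 2]), ("u1", [3]), ("u2", [4, 5]), ("u3", [6])]

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    for affinity in range(2):  # sequential stand-in for two separate processes
        collect(users, affinity, 2, out_dir)
    merged = aggregate(out_dir)

even_ordinals = [o for o in range(8) if handles(o, 0, 2)]
odd_ordinals = [o for o in range(8) if handles(o, 1, 2)]
```

Since each ordinal is owned by exactly one affinity, the first pass needs no locking or shared state; the only coordination point is the set of files read in the second pass.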

Figure 18 – code template for visualisation, using macros

preparing visualisations

The visualisations developed for the program (Figure 13) follow a similar template to analyses: loading, extracting and aggregating data – as illustrated by the example code in Figure 18. Instead of exporting the results to a file for use in another program, the code represents data visually, on screen. The program’s separate console window, inset in Figure 11, can also be used to quickly prototype visualisations, using low-fidelity ANSI text.

Visualisations are tightly integrated with the program, making it difficult to split the workload between separate processes. In many visualisations, the code itself still operates in two passes, where pre-processing is needed to establish drawing parameters – for example, in the case of normalising a graph where the maximum value must be known before the others can be scaled. While this makes processing slower, visualisations typically target single users or sessions, so there is less data to process – though more prolific users result in longer delays.
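The two-pass normalisation mentioned above can be illustrated with a low-fidelity text chart. This is a hypothetical Python sketch: the `render_bars` function and the sample values are invented, and it uses plain characters rather than any particular console escape codes.

```python
def render_bars(values: dict, width: int = 20) -> list:
    """Render a horizontal bar chart as text, scaled to the largest value."""
    # Pass 1: establish the drawing parameter (the maximum value).
    peak = max(values.values(), default=0)

    # Pass 2: scale every value against the maximum and draw it.
    lines = []
    for label, value in values.items():
        length = round(width * value / peak) if peak else 0
        lines.append(f"{label:<6}|{'#' * length}")
    return lines


# Invented sample data: events per user.
chart = render_bars({"u1": 5, "u2": 10, "u3": 0}, width=10)
print("\n".join(chart))
```

The second pass cannot begin until the first has seen every value, which is why the amount of data for a prolific user directly affects the delay before anything is drawn.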

In addition to their use in analysis, visualisations are invaluable for monitoring the experiment and debugging the client program, as discussed in the next section.