Capturing contextual data - Data collection and labelling

2.4 Data collection and labelling

2.4.2 Capturing contextual data

The usefulness of the interruptibility label produced is ultimately tied to its ability to be predicted from contextual data. Therefore a key design consideration is the choice of what data to capture and how, which can include data representing the current moment (e.g., [72, 99]) as well as historical activity (e.g., [24]).

2.4.2.1 Common contextual data traces collected

Capturing signals to represent the current context is an essential component in predicting interruptibility (or the effects of interruptibility). Table 2.4 details the types of data traces commonly collected in the literature, classified as being from either external sources to an individual or more latent. These can loosely be described as capturing what is currently happeningand how the user feels respectively. Ideally, this data should be as rich as possible, however resource constraints and scenario environments typically dictate a subset of these being used (as shown in Table 2.4). Collection of this data can involve the use of explicit human annotation through surveys (likewise to labelling), real world machine sources, or in some cases, simulated data; with Table 2.2 and Figure 2.4

26 2.4 Data collection and labelling

Source Example data traces and studies

External

Smartphone sensors: such as hardware sensors (e.g., [115, 59, 97, 99, 105, 92, 30, 102, 89, 72, 24, 108, 76, 77, 83]) and/or software APIs (e.g., [65, 115, 97, 111, 30, 18, 102, 77, 106, 21])

Other physiological sensors: capturing physical state (such as heart rate) (e.g., [59, 105]) or activity (e.g., [105, 57, 43])

Other environmental sensors: such as sound or motion in a room (e.g., [82, 32, 48, 15, 44, 45]) or car (e.g., [59])

Software events: e.g., active windows, keyboard and mouse activity (e.g., [50, 51, 55, 82, 32, 118, 80, 42, 15, 44, 46, 45, 33, 116, 61, 85, 72]) Calendar schedules: (e.g., [113, 115, 44, 106])

Temporal logs: e.g., of user actions (e.g., [55, 65, 97, 45, 106])

Spatial logs: e.g., GPS (e.g., [113, 115, 111, 105, 30, 102]) or connections to antennas (e.g., [82, 99, 111, 92, 46])

Latent

Self reports: experience sampling (e.g., [29, 91, 82, 119, 105, 92, 47, 51, 57, 43, 94, 76, 108]) or post-experiment surveys (e.g., [8, 37, 54])

Qualitative feedback: e.g., post-interviews (e.g., [29, 47])

Third party observer reports: e.g., in situ observation (e.g., [54]) or video annotations (e.g., [48, 57, 31, 67])

Physiological sensors: e.g., mental state or workload (e.g., [71, 9, 105, 13, 72, 129])

Table 2.4: A categorisation of commonly captured data traces. Table extended from [121].

showing the use of these different practices over time.

Advances in ubiquitous sensing (such as mobile devices) is a likely cause of the rise in the use of real world machine sources, such as sensors (e.g., [99, 15]) and experience sampling (e.g., [92]) over simulated sensors (e.g., [48, 31]) in recent years (Figure 2.4). Additionally, the personal relationship between these devices and their user has been argued to allow more “ecologically valid data" [79], rather than using peripheral devices, such as external cameras (e.g., [48, 31]) or wearable accelerometers (e.g., [43, 57]). However there is still disagreement over whether sensors should be used [92, 99, 115] or not over user annotation (e.g., through ESM), due to accuracy and reliability concerns [113, 64, 105], resource requirements [111], and limitations for measuring latent variables.

2.4 Data collection and labelling 27 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Year 0 1 3 5 7 8 10 12 14 16 Number of studies

Experience Sampling Real Sensors/Software Events Simulated Sensors

Figure 2.4: Use of different data collection strategies over time. Figure extended from [121].

However, human involvement either from a third party observer (e.g., [48, 57]) or by the participant themselves (e.g., [47, 119, 91]) has also been argued to suffer from similar issues (e.g., [82, 91, 72]). On one hand, it has benefits including being highly flexible in what can be asked, having a low cost overhead in terms of technical resources, and allowing the collection of latent variables (e.g., mental state) which aren’t easily ob- servable by readily available sensors [64]. However, the use of ESM for interruptibility research specifically has been controversial due to the additional interruption cost it places on the user [30, 72, 78]. Overall, likewise to labelling, the use of explicit human annotation and/or implicit machine sources remains a contested issue.

Beyond this debate, a trend in recent years is the consolidation of technologies used to collect this contextual data, such as only using a smartphone (e.g., [92, 99, 30]). However the emergence of networked pervasive technologies in the environment (i.e., the Internet of Things) and upon the person (i.e., smart wearables), could lead to this becoming unconsolidated once more with these technologies augmenting existing data traces. For example, a light sensor in a room may be more consistent and accurate

28 2.4 Data collection and labelling

than a smartphone equivalent. Additionally this could extend the possibilities of what contextual data is possible to collect - leading to the question:

(RQ4) How can emerging sensor-equipped ubiquitous technologies (such as wearables) improve sampling accuracy and reduce collection and processing complexity in-the- wild?

Tapping into these technologies does not form part of the scope for this thesis, however forms a direction for future research; with their presence becoming more natural and accepted (like the smartphone), rather than the presence of foreign peripheral devices introduced just for experiments.

2.4.2.2 Extracting feature variables from the raw traces

After collecting data traces, feature variables are extracted from the raw data to create a vector representing the context for predicting interruptibility from (or the effect of the interruption). A common first step is to apply smoothing techniques to the data, in order to remove noise (e.g., [71]). However, in conducting the meta-analysis, broadly speaking there is little evidence of widely adopted conventions within interruptibility studies - likely due to scenario differences in the use of different data traces and hardware.

The extracted feature variables can be categorised as representing either the: user, environment, interruption, or the relationships between these. A previous survey by Ho and Intille [43] detailed 11 measures/variables that have previously been considered to influence interruptibility. However, due to the volume and breadth of studies since their work, Table 2.5 extends their observations of the types of features commonly used across the literature. It should be noted that the variables included here were identified where they were either explicitly stated or could be confidently inferred. While some features are likely scenario dependent (e.g., location), there are still large differences across works in the features used, with only a few reoccurring often. Again this supports that comparing and building from interruptibility works is challenging [107].

2.4 Data collection and labelling 29

Type Example types of features

User Features Pupil size event statistics (e.g., [9, 13]), EEG event statistics (e.g., [71, 72, 129]), emotion (e.g., [105, 92, 36, 108]), learning style (e.g., [115]), personality (e.g., [115], demographics (e.g., [72]), time until next calendar event (e.g., [46, 113, 106]).

Environment Features

Location (e.g., [119, 91, 99, 105, 92, 115, 113, 82, 47]), other people present (e.g., [48, 31, 92]), states e.g., door open/closed (e.g., [31, 32]), cell tower id (e.g., [111]), connectivity (e.g., [111, 76, 46]), nearby Bluetooth (e.g., [92]), smartphone ringer state [97, 30, 72]), smartphone screen covered (e.g., [97, 99, 30, 77, 72]), smartphone orientation or position (e.g., [99, 67]), ambient noise (e.g., [30, 15]), light intensity (e.g., [72, 122, 24]).

Interruption Features

Content e.g., text or phone number (e.g., [111, 29, 102, 76]), task complexity (e.g., [37, 19]), number of queued interruptions (e.g., [97]), time between interruptions (e.g., [44]).

User and Environment Features

Time of day (e.g., [31, 115, 113, 82, 46, 97, 111, 99, 102]), day of the week (e.g., [115, 46, 97, 111, 105, 102]), user is in conversation (e.g., [119, 46, 31, 105, 46, 45, 108]), user’s current activity (e.g., [31, 82, 115, 105, 92, 54, 57, 102, 83]), user is present (e.g., [31, 15, 44, 43, 48]), software event statistics (e.g., [97, 46, 82, 65, 118, 42, 18, 15, 54, 32, 44, 45, 33, 86]), unusual environment to be in (e.g., [91]), frustration level (e.g., [115, 8, 51]), stress (e.g., [105]), level of annoyance (e.g., [14]) respiration (e.g., [105]), ambient sound (e.g., [82, 46, 44, 108]), car movement (e.g., [59]), human motion (e.g., [59, 43, 117]), smartphone motions or acceleration (e.g., [99, 30]), PC active and inactive time (e.g., [46]), user head position and posture (e.g., [116]), device use statistics (e.g., [72, 89, 24, 77, 106]). User and

Interruption Features

Social relation (e.g., [119, 37, 102, 36, 10]), interruption frequency (e.g., [37]), content desirability (e.g., [91]), perceived mental effort (e.g., [8, 37]), perceived task performance (e.g., [37, 8]), resumption lag (e.g., [50, 8, 65, 80, 51]), perceived timeliness of delivery (e.g., [91]), number of primary task errors (e.g., [55, 14]), primary task duration (e.g., [8, 65, 46, 14]), elapsed time to switch to interruption (e.g., [97, 105, 45, 51]), primary task complexity (e.g., [115, 37]), interruption time (e.g., [111, 72]), interruption duration (e.g., [65, 8, 105, 14]), perceived time pressure (e.g., [8]), previous or next task cue presented (e.g., [55]), elapsed time before user reaction (e.g., [42]), influence from social contexts (e.g., [36]).

30 2.4 Data collection and labelling

Additionally, in highly constrained and volatile environments such as the smartphone, this transformation process (e.g., from raw microphone readings to the level of ambient noise) could be costly in terms of computational resources. Choosing appropriate and technically feasible data sources to create features from is common at the design phase, however reflections on the cost of transforming these into features (where relevant) has received little attention; yet could bring valuable design considerations for future studies and applications:

(RQ5) Can the utility of potentially influential variables be standardised by considering the trade-off between accuracy and sampling / processing complexities?

Several works have touched on this within wider domains, however, this is not common practice for individual interruptibility studies. For example, Lathia et al explore the issues relating to smartphone sensor sampling stability [64]. A future research direction could involve the formulation a standardised framework for quantifying the cost of retrieving individual feature variables on specific hardware, or investigations into the difficulties of doing. This is not a direct focus in the remainder of this thesis, however the stability of using typical Android hardware to support the data collection used is discussed in Chapter 4.

In document Decomposing responses to mobile notifications (Page 55-60)