Chapter 3 Conceptual Framework for Analyzing Pointing-based Interaction

3.2 A Framework for Analyzing Pointing-based Interaction Instruments

3.2.1 An Analysis of a Feedback-based Direct-touch Input Gesture

In this section, I will analyze a typical selection procedure on a feedback-based direct-touch input device. Performing touch input to a digital system is a task that has many cognitive similarities to reaching and deictic pointing (Marteniuk et al., 1987). I will use “Start Firefox on my smart phone” as an example in my analysis. I assume that the user is already holding the device, and that the device is ready to receive user input, i.e., it is turned on, unlocked, and on its home screen.

[Figure 23: Feedback-based]

Production of a Direct-touch Input Gesture in the Model Human Processor

As with most actions, the selection starts with a verbal description of the goal: starting Firefox on the smart phone. The overall goal breaks down into three operators: the first is recalling a visual and spatial representation of the goal’s verbal descriptor (A – C), the second is finding the proxy icon within the menu structure (D – G), and the last is producing a tap gesture on the proxy icon (H – K).

A – C At the beginning of the first operator, people have the verbal descriptor of the overall goal (“start Firefox”) loaded in the cognitive processor or central executive (A) (see 2.4.3). Then they retrieve the visual imagery of the icon from visual long-term memory and the last known location of the icon within the menu structure from spatial long-term memory (B). After this, the previously retrieved information is loaded into working memory (C).

D – G In the second operator, people use both pieces of information to find and visually acquire the icon in the menu structure. First, they formulate the desired outcome of their movements, that is, performing a series of swipe gestures that advances the menu to the correct screen (D). Then they calculate the necessary motor program for this movement from an existing and proficiently known motor-response schema for on-surface swiping gestures (E). After this, they evaluate the success of their search (G) by matching current visual sensory feedback (F) to the visual schematic of the Firefox icon now stored in working memory (C) (see 2.4.3). If the error is within acceptable limits, which in this particular case means that users have successfully identified the correct icon, they are ready to produce a tap gesture at the location of the icon.

H – K In the final operator, people use their knowledge of the icon’s location to produce the appropriate tapping motion. As with every motoric production, people determine the desired outcome of their tapping gesture based on the location of the icon (H). Then they calculate the necessary motor program for this movement from a well-known motor-response schema for arm, hand, and finger movement (I). After this, they evaluate the correctness of their finger movement (K) by matching visual sensory feedback (J) to the calculated trajectory of their tapping gesture. If the error is within acceptable limits, the stroke phase of the deictic tapping gesture is complete, and people enter the holding or retraction phase of the gesture (see 2.4.4).
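The feedback cycle in steps H – K (set a desired outcome, derive a motor program, observe feedback, compare against the goal) can be illustrated with a minimal closed-loop sketch. This is only an illustration under simplifying assumptions, not an implementation of the Model Human Processor: the names `Point`, `error`, and `feedback_movement`, the proportional `gain`, and the `tolerance` value are all hypothetical and do not come from the analysis above.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def error(target: Point, observed: Point) -> float:
    """Euclidean distance between the intended and the observed position."""
    return ((target.x - observed.x) ** 2 + (target.y - observed.y) ** 2) ** 0.5


def feedback_movement(start: Point, target: Point,
                      tolerance: float = 1.0, gain: float = 0.5,
                      max_steps: int = 50) -> tuple[Point, int]:
    """Move toward `target` until the observed error is within `tolerance`.

    Each loop iteration stands for one pass through the cycle:
    H: the desired outcome is the target position,
    I: the "motor program" covers a fraction (`gain`) of the remaining distance,
    J: the new position is observed (here: read back without noise),
    K: the observed position is compared with the desired outcome.
    `gain` and `tolerance` are hypothetical parameters chosen for illustration.
    """
    position = start
    steps = 0
    while error(target, position) > tolerance and steps < max_steps:
        position = Point(
            position.x + gain * (target.x - position.x),
            position.y + gain * (target.y - position.y),
        )
        steps += 1
    return position, steps


final_position, corrections = feedback_movement(Point(0.0, 0.0), Point(100.0, 0.0))
print(final_position, corrections)
```

The loop ends as soon as the remaining error is within the acceptable limit, which corresponds to the end of the stroke phase in the description above.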

Analysis of the Production of Direct-touch Input Gestures

In this section, I analyze the three operators (recall proxy icon, find icon, and perform gesture) required to complete the overall goal.

First operator: Recalling the icon’s visual appearance and former location (B) depends on (semantic) visual memory (see 2.5.5) and spatial memory (see 2.5.3). The visual schematic of the icon is important because it later acts as input in the error assessment process (G); the spatial information is important because it defines a starting position for the relatively slow visual search for the icon. Although neither piece of information is essential, both accelerate the following operator (finding the proxy icon). Without any spatial information, people would have to visually search the entire input space and not just a subsection; without any visual information, people would have to rely on reading and linguistically processing the icon labels or simply guessing the correct icon using relational cues (see 2.5.5). Previous research confirmed that people can recall the visual schematic of icons well, especially when the icons were designed with people’s associative abilities in mind (see 2.5.5). Spatial memory, in contrast, is more expertise-driven than visual and relational memory, i.e., people acquire it more implicitly as a byproduct of interacting with objects (see 2.5.3). That also makes spatial memory more susceptible to failure when object locations change. In HCI, numerous studies have shown that spatial stability benefits people’s performance with user interfaces (e.g., Gutwin, Cockburn, Scarr, Malacria, and Olson, 2014). Overall, it is reasonable to assume that people can recall the proxy icon reasonably well, as long as it does not change its location.

Second operator: Finding the proxy icon within the input space depends on the structure and size of the input space, as well as on people’s familiarity with the input space. The structure of the input space can be flat, e.g., the keys on a keyboard or remote control; linear, e.g., the scrollable list menu common in today’s smart phones; or hierarchical, e.g., file browsers in desktop operating systems. Complexity analysis describes how the time it takes to find an object depends on the structure of the input space. On a flat input space, this time is the same for all objects: 𝑂(1) (e.g., a hash table). On a linear input space, this time grows with the number of elements in the input space, as on average half of the elements have to be traversed: 𝑂(𝑛) (e.g., a linked list). On a hierarchically structured input space, the structure helps decrease retrieval time to 𝑂(log 𝑛) (e.g., a tree). These three examples are based on data access by a computing system, and access times might not be directly comparable to those of a human. There are studies, however, that have reported similar performance behavior in humans, e.g., that a flat menu structure has advantages over a hierarchical one (Scarr, Cockburn, Gutwin, and Bunt, 2012). Overall, it is reasonable to assume that navigating the user interface to find the desired proxy icon will take the majority of the time needed to reach the overall goal, and that this time highly depends on people’s familiarity with the input space, its structure, and its spatial stability.

Third operator: Performing a tap gesture uses a simple and frequently used motor program. Research has shown that people can perform this type of gesture quickly and accurately (Fitts’s Law, see 2.2.3).
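To make the reference to Fitts’s Law for the third operator concrete, the following sketch computes a predicted movement time for a tap. It assumes the widely used Shannon formulation, MT = a + b · log2(D/W + 1); this section does not state which formulation is meant. The constants `a` and `b` are placeholder values chosen only for illustration, since real values have to be fitted empirically per device and user group.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Index of difficulty in bits (Shannon formulation, assumed here)."""
    return math.log2(distance / width + 1)


def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted movement time in seconds.

    `a` (intercept) and `b` (slope, seconds per bit) are hypothetical
    placeholder constants, not measured values.
    """
    return a + b * index_of_difficulty(distance, width)


# Example: a 10 mm wide icon that is 70 mm away from the finger.
print(index_of_difficulty(70, 10))
print(movement_time(70, 10))
```

Because the index of difficulty grows only logarithmically with distance, the predicted tap time changes little across icon positions on a phone-sized screen, which fits the assumption that the tap itself is fast.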

The conclusion of this analysis is that people should be able to achieve the goal of selecting a proxy icon from a menu accurately. The time it takes to make such a selection mostly depends on people’s performance in the second operator: finding the correct icon. For this operator, the structure of the input space and people’s familiarity with the input space are crucial. This also means that a fundamental improvement in selection time (e.g., from 𝑂(𝑛) to 𝑂(log 𝑛)) can only be achieved by changing the structure of the input space.
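The effect of the input space structure on search effort can be sketched by counting idealized search steps for each of the three structures. This is a rough illustration of the complexity argument, not a model of human search time: the function names and the branching factor of 4 are hypothetical, the linear case assumes that every item is equally likely to be the target, and the hierarchical case counts only the levels to descend, not the items scanned within each level.

```python
def flat_steps(n: int) -> int:
    """Flat structure: every item is directly reachable in one step."""
    return 1


def linear_steps(n: int) -> float:
    """Linear structure: on average about half of the n items are traversed."""
    return (n + 1) / 2


def hierarchical_steps(n: int, branching: int = 4) -> int:
    """Hierarchical structure: number of levels needed to address n items.

    `branching` (items per menu level) is a hypothetical parameter.
    """
    levels = 0
    capacity = 1
    while capacity < n:
        capacity *= branching
        levels += 1
    return levels


for n in (16, 64, 256):
    print(n, flat_steps(n), linear_steps(n), hierarchical_steps(n))
```

When the number of items quadruples, the linear count roughly quadruples as well, whereas the hierarchical count increases by a single level. Under these assumptions, a change of structure therefore has a much larger effect than faster traversal within a given structure.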

3.2.2 An Analysis of a Mid-air Full-arm Pointing Gesture toward a Real-world
