2.4 Conceptual Frameworks of (Mobile) Pen-and-Paper Interaction
2.4.1 Pen-and-Paper Interaction Techniques
As defined by Hinckley, an interaction technique consists of input combined with appropriate feedback [Hinckley, 2007]. It essentially represents a mechanism employed by a user to invoke certain functionality, e.g., the drag-and-drop technique in GUIs to open a particular file in an application. The interaction technique thus refers to the mechanism by which a certain functionality is invoked.
The invoked functionality can either be self-contained or apply to data specified as part of the interaction technique, e.g., crop-marks to mark text used in a copy command [Liao et al., 2008]. In the latter case, a selection technique forms part of the interaction technique, i.e., the user employs another technique to specify the data. This concept is referred to as chunking and phrasing or chaining [Buxton, 1986]. Here, chains of interaction techniques can be constructed to form a complex technique out of several basic building blocks.
In the domain of PPI, three main classes of interaction techniques have been proposed: Pidgets and Proxies on one end of the spectrum, Gesture Systems on the other, and Cross-media Links supporting tight integration of paper and digital artifacts.
Pidgets and Proxies
The first class of pen-and-paper interaction techniques relies on attaching digital functionality to certain regions on paper. Whenever the user positions the pen on such a region, the system executes the digital functionality associated with this region. Iconographic representations of this functionality help visualize the concept for the user. This mechanism can be compared to clicking a button in a GUI, as depicted in Fig. 2.6. Anoto coined the term Pidget interaction to describe such techniques in the documentation of their paper SDK. A Pidget thus corresponds to an interactor represented by an icon printed on paper that triggers system functionality when “clicked” on with a digital pen [Signer et al., 2014]. This technique is often combined with selection techniques, e.g., by drawing a line around a document area, to specify input to the command invoked when subsequently “clicking” a Pidget [Costa-Cunha and Mackay, 2003].
Figure 2.6: The Pidget Interaction Technique (courtesy Anoto AB)
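The region lookup behind Pidgets can be illustrated with a minimal sketch. It assumes a hypothetical data model (the `Pidget` class, the `dispatch_pen_down` function, and the coordinates are illustrative and not taken from the Anoto SDK or any cited system): each Pidget is a rectangle on a page bound to an action at design time, and a pen-down event is resolved by a simple hit test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pidget:
    """A rectangular page region bound to a function (hypothetical model)."""
    name: str
    page: int
    x: float
    y: float
    width: float
    height: float
    action: Callable[[], str]

    def contains(self, page: int, px: float, py: float) -> bool:
        # The pen hits the Pidget if it is on the same page and inside the rectangle.
        return (page == self.page
                and self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def dispatch_pen_down(pidgets: List[Pidget], page: int,
                      px: float, py: float) -> Optional[str]:
    """Run the action of the first Pidget hit by the pen, if any."""
    for pidget in pidgets:
        if pidget.contains(page, px, py):
            return pidget.action()
    return None  # the pen landed on an ordinary content area


# Regions are fixed at design time, mirroring the static nature of Pidgets.
pidgets = [
    Pidget("next_slide", page=1, x=10, y=250, width=20, height=10,
           action=lambda: "next slide"),
    Pidget("prev_slide", page=1, x=40, y=250, width=20, height=10,
           action=lambda: "previous slide"),
]

print(dispatch_pen_down(pidgets, 1, 15, 255))
print(dispatch_pen_down(pidgets, 1, 100, 100))
```

Because the lookup depends only on the pen position, no ink has to be interpreted in this sketch; anything outside the registered regions is treated as content.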
Many PPI-based systems use Pidgets, as they provide a very intuitive approach to PPI from both the developer and the user perspective. Examples are PaperPoint [Signer and Norrie, 2007b], which employs the concept as the main paradigm to control slides in a presentation, and NICEBook [Brandl et al., 2010], which categorizes the contents of notes via Pidgets (similarly to [Steimle et al., 2008a]). Furthermore, Pidgets provide a very convenient way to support palettes in structured diagramming applications, e.g., in [Dachselt et al., 2008], as well as for remote control applications based on PPI, e.g., in [Berglund et al., 2006].
The concept of attaching functionality to a certain location makes complex segmentation operations, i.e., determining which digital ink refers to created content and which refers to control commands, unnecessary. However, the main disadvantage of Pidgets is their static nature: a region on paper has to be attached to functionality at design time, and the graphic representation has to be printed on the paper documents. Paper documents encountered in the field cannot be used instantly by such applications, as they lack the printed representations. Additionally, using Pidgets renders re-use of paper documents in other contexts problematic at best (cf. document mobility as described in section 2.3.1).
Furthermore, the amount of paper real estate dedicated to representing Pidgets, as opposed to the real estate dedicated to user-generated content, needs to be carefully balanced. This introduces an upper limit on the number of Pidgets, which the requirements of a given application might exceed.
Gesture Systems
The second class of pen-and-paper interaction techniques is based on associating functionality with gestures. Whenever the user performs a gesture with a digital pen, the system processes and interprets the digital ink, subsequently mapping it to an alphabet of pre-defined gestures. This process is referred to as gesture recognition. If the system recognizes a known gesture, it triggers the functionality associated with that gesture. Chaining and combining gestures is possible, e.g., combining selection techniques with gestures to specify input data as in PapierCraft [Liao et al., 2008].
The actual recognition of gestures is a problem not specific to PPI or mPPI, and a broad set of recognition algorithms exists. Algorithms such as Hidden Markov Models (HMM) [Sezgin and Davis, 2005], Dynamic Time Warping (DTW) [Choe et al., 2010], and feature-based statistical classifiers, e.g., the Rubine classifier [Rubine, 1991, Blagojevic et al., 2010], have been used to recognize PPI gestures. However, simple geometric techniques often satisfy the need for fast and accurate gesture recognition while adding less complexity to the underlying recognition system. Examples of this class of algorithms are the well-known $1 Gesture Recognizer [Wobbrock et al., 2007, Anthony and Wobbrock, 2010] and the 1¢ Recognizer [Herold and Stahovich, 2012].
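To give an impression of how little machinery a geometric recognizer needs, the following sketch implements a simplified template matcher. It is not the $1 or 1¢ algorithm: it only resamples a stroke, normalizes position and size, and compares the result point by point against stored templates, and it omits rotation handling entirely. All names (`resample`, `normalize`, `recognize`) and the two templates are illustrative assumptions; strokes are assumed to be non-empty lists of (x, y) points drawn in the same direction as the templates.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def resample(points: List[Point], n: int = 32) -> List[Point]:
    """Resample a stroke to n points spaced evenly along its path."""
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if total == 0:
        return [points[0]] * n
    step = total / (n - 1)
    result = [points[0]]
    travelled = 0.0
    prev = points[0]
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        # Emit as many evenly spaced points as fit on this segment.
        while seg > 0 and travelled + seg >= step - 1e-9 and len(result) < n:
            t = (step - travelled) / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            result.append(prev)
            seg = math.dist(prev, cur)
            travelled = 0.0
        travelled += seg
        prev = cur
    while len(result) < n:  # guard against floating-point shortfall
        result.append(points[-1])
    return result


def normalize(points: List[Point], n: int = 32) -> List[Point]:
    """Resample, move the centroid to the origin, and scale uniformly."""
    pts = resample(points, n)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in pts]


def recognize(stroke: List[Point],
              templates: Dict[str, List[Point]]) -> Tuple[str, float]:
    """Return the name and mean point distance of the closest template."""
    candidate = normalize(stroke)
    best_name, best_dist = "", math.inf
    for name, template in templates.items():
        reference = normalize(template)
        dist = sum(math.dist(a, b)
                   for a, b in zip(candidate, reference)) / len(candidate)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Hypothetical gesture alphabet with two entries.
templates = {
    "horizontal_line": [(0, 0), (10, 0)],
    "vee": [(0, 0), (5, 10), (10, 0)],
}

print(recognize([(2, 3), (12, 3.5), (22, 3)], templates)[0])
print(recognize([(0, 0), (4, 9), (9, 0)], templates)[0])
```

A practical system would additionally reject strokes whose best distance exceeds a threshold, so that ordinary handwriting is not mapped onto the nearest gesture.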
In order to support gesture recognition on the system side, Signer et al. introduced iGesture [Signer et al., 2007b], a flexible gesture recognition toolkit on top of the iServer/iPaper infrastructure. It offers several standard recognition algorithms in the domain of PPI to choose from. Other algorithms can be integrated by developers if needed.
Gestures have been used in a broad range of PPI-based applications, e.g., PapierCraft [Liao et al., 2005, Liao et al., 2008], a gesture system to manipulate documents, and PaperProof [Weibel et al., 2008], a hybrid paper-digital proofreading system for scientific publications. Special gestures are used to mark certain regions for later use, e.g., in the “hotspot association” gesture described by Yeh et al. [Yeh et al., 2006a], where the user draws a set of crop-marks to act as a placeholder for a digital image. Gestures can also control mixed paper-digital environments, e.g., in Strip’TIC [Gauthier et al., 2014], where gestures are designed to span paper and virtual artifacts.
Gestures do not require any pre-printed interactors on documents. As a result, gesture systems support the use of documents in different application contexts, i.e., document mobility, considerably better than Pidget-based techniques. However, a main problem of gesture systems is the discrimination between gestures and user-generated content. Either complex segmentation techniques can be used, e.g., as proposed by Ao et al. [Ao et al., 2006] or Ispas et al. [Ispas et al., 2011], or the user needs to explicitly define when a gesture starts, e.g., by pressing a button as in PapierCraft [Liao et al., 2008]. Furthermore, chaining of gestures requires methods to distinguish individual gestures in the chain, e.g., by introducing special markers [Hinckley et al., 2005, Hinckley et al., 2006].
Besides this additional design complexity, gesture-based interaction also poses problems regarding learnability and recallability [Norman and Nielsen, 2010]: the user needs to learn and remember the gestures defined in the alphabet, because the interface itself typically does not offer any clues regarding available functionality. This can quickly become a problem in more complex systems. However, the actual performance of users depends on the design of the particular gesture set and the application itself, e.g., Liao and Guimbretière showed that the gesture set employed in PapierCraft can be learned within roughly 30 minutes [Liao and Guimbretière, 2012].
Cross-Media Links
The third class of interaction techniques combines actions on paper and digital artifacts in a single coherent cycle. In most of these techniques, it matters where actions are carried out, e.g., the user needs to perform her actions on a designated paper area. Typically, these techniques are applied to connect paper and digital documents in order to establish cross-media links.
For instance, Steimle proposed a cross-media linking technique involving physical and digital artifacts in one cycle of actions [Steimle, 2009a]: in order to establish a link between a section of a paper document and a digital document, the user marks content by drawing a vertical line at a specially designated area on paper (link source) and another line on a digital document displayed on a screen (link target). After a link has been established in this manner, the markup, i.e., the line drawn on paper, acts as a Pidget and can be “clicked” to activate the link, i.e., open the section of the digital document on the screen.
Similar concepts have been used in the context of links between multiple paper documents [Liao et al., 2008, Brandl et al., 2010]. These gestures are typically based on the stitching concept, as originally described by Hinckley et al. [Hinckley et al., 2004]: the user lays paper documents and/or digital resources physically close together and draws a line spanning these resources to create a link. Subsequently, the physical markup represented by the line can be used to follow the cross-media link. Actions in the digital world, e.g., pressing buttons as in [Liao et al., 2008], serve to initiate or confirm these links.
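The stitching idea can be sketched in a few lines under strong simplifying assumptions: every pen sample is assumed to carry the identifier of the document it was captured on, a stroke that starts on one document and ends on another creates a link, and only the two stroke endpoints (rather than the whole line) serve as the clickable markup. The `Sample`, `Link`, `stitch`, and `follow` names are hypothetical and do not correspond to any of the cited systems.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    doc: str  # identifier of the document the pen was on (assumed available)
    x: float
    y: float


@dataclass(frozen=True)
class Link:
    source: Sample
    target: Sample


def stitch(stroke: List[Sample]) -> Optional[Link]:
    """Create a link if the stroke starts and ends on different documents."""
    if len(stroke) < 2 or stroke[0].doc == stroke[-1].doc:
        return None
    return Link(source=stroke[0], target=stroke[-1])


def follow(links: List[Link], tap: Sample,
           radius: float = 5.0) -> Optional[Sample]:
    """Return the opposite end of the first link whose endpoint is near the tap."""
    for link in links:
        for here, there in ((link.source, link.target),
                            (link.target, link.source)):
            near = math.hypot(here.x - tap.x, here.y - tap.y) <= radius
            if here.doc == tap.doc and near:
                return there
    return None


# A line drawn from the right edge of one sheet onto the left edge of another.
stroke = [
    Sample("paper_A", 190, 50),
    Sample("paper_A", 200, 50),
    Sample("paper_B", 5, 48),
    Sample("paper_B", 15, 48),
]
link = stitch(stroke)
print(follow([link], Sample("paper_A", 191, 52)))
```

In this sketch, the link can be followed from either end, which reflects the use of the drawn line as markup on both resources.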
From an interaction point of view, cross-media linking techniques stand halfway between Pidgets and gestures. On the one hand, these techniques encompass behavioral components, i.e., the user needs to “do something”. This resembles gestures in that users have to learn and remember these components. On the other hand, these techniques typically involve a location component, which enables applications to provide visual clues aiding recall, similarly to Pidgets. As such, cross-media linking techniques inherit advantages and disadvantages of both other classes to a certain degree.
The digital component of cross-media links allows overcoming limitations inherited from Pidgets: Tsandilas and Mackay extended the concept of interaction proxies in the form of knotty gestures [Tsandilas and Mackay, 2010]. Here, the user issues a gesture-like command (cf. gesture systems as described above), which is dynamically bound to the region where this command has been drawn. This concept creates interactors similar to Pidgets without requiring a pre-configured and dedicated document, as the regular Pidget technique does. However, this approach still suffers the same penalties on learnability and recallability as gesture systems.
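The difference to design-time Pidgets can be made concrete with a small sketch of dynamic binding. It is only loosely inspired by the idea described above and does not reproduce the recognition used by Tsandilas and Mackay: the command is assumed to have been recognized already, and the `Page`, `DynamicInteractor`, `bind`, and `tap` names are hypothetical. The interactor region is derived at run time from the bounding box of the ink that issued the command.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class DynamicInteractor:
    """A region created at run time from drawn ink (hypothetical model)."""
    command: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


class Page:
    def __init__(self) -> None:
        self.interactors: List[DynamicInteractor] = []

    def bind(self, command: str, ink: List[Point],
             margin: float = 2.0) -> DynamicInteractor:
        """Bind a recognized command to the bounding box of its ink."""
        xs = [p[0] for p in ink]
        ys = [p[1] for p in ink]
        interactor = DynamicInteractor(command,
                                       min(xs) - margin, min(ys) - margin,
                                       max(xs) + margin, max(ys) + margin)
        self.interactors.append(interactor)
        return interactor

    def tap(self, x: float, y: float) -> Optional[str]:
        """Return the command of the most recently bound interactor hit."""
        for interactor in reversed(self.interactors):
            if interactor.contains(x, y):
                return interactor.command
        return None


page = Page()
page.bind("play_audio", [(50, 50), (54, 53), (52, 56)])
print(page.tap(52, 53))
print(page.tap(10, 10))
```

No region has to be printed or configured beforehand here; the trade-off named in the text remains, since the user still has to know the gesture that creates the interactor.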