2.3 Data Capture
2.3.3 Audience Engagement and Movement
2.3.3.3 Large-Scaled Viewer Navigation and Movement Patterns
Moving beyond closed, indoor environments, it is essential to understand how viewers navigate beyond the immediate vicinity of a digital sign in order to answer questions about display effectiveness and about potential changes in behaviour and movement caused by interactions with the sign. Linking back to the motivating scenario (see Section 1.2), this level of tracking is key to linking the viewer’s path from the display to the nearest shop, and therefore to answering the question of whether seeing the display content led to a change in behaviour. Visual analytics and image processing have been widely explored in research as a common approach to tracking people’s behaviour. Candamo et al. compiled an extensive literature survey of algorithms and tools that use image processing to extract the behaviour of individuals or groups from a video stream [Can+10]. Example applications of such systems include automated behaviour detection, the analysis of interactions between people, and even automated fraud and safety detection [Can+10]. An early example of visual analytics software for recognising customer activity in a retail environment was developed by Krahnstoever et al. [Kra+05]. The authors developed a novel approach in which RFID technology served as an additional source for identifying objects on a shelf that had been picked up by the customer, while video analytics was used to capture the customers’ movement traces. The resulting analytics provided novel insights into the ways customers moved across the store and interacted within the retail space, on a level similar to user interactions on an e-commerce website (e.g. the tracking of products that were added to or removed from the virtual or physical shopping basket). This could be linked to digital signage the customers may have interacted with before picking up and purchasing a certain product in the physical store.
Generally, video cameras combined with subsequent video analytics are a common technique for extracting detailed information about people, including body attributes such as hair type, eyewear and clothing colour [Ham+09]. The authors of that work mention the recognition of the same person across multiple video feeds as an explicit use case of their system, thus allowing the collection of movement patterns on a large scale and across locations as an alternative to using face recognition systems. The tracking of individuals across multiple video sequences and locations has also been explored by Yang et al. [Yan+07]. While their work focuses on the development of the visual computing algorithms used to track an individual across multiple frames and angles, the resulting insights allow an individual to be tracked across locations without the need for an active or passive tracking device. Although the primary use of such a system is often motivated by safety arguments for the automated detection of fraud or crimes, it can also be used for
capturing analytical insights relevant for the digital signage domain. The computation of behaviour and navigation patterns of individuals after passing a digital sign could allow administrators and providers to gain insights into the effectiveness and performance of a display or of the content displayed. The recognition of people across multiple video sequences is a difficult problem. Mitzel et al. designed a system that uses sparse detection and segmentation to follow an individual and that, according to the authors, is robust enough to handle occlusions and multiple camera feeds [Mit+10]. Additional data sources have also been used to improve the accuracy of people tracking. Teixeira, Jung, and Savvides combine visual analytics from existing surveillance infrastructures with on-device sensors (in this case, accelerometer and magnetometer) to uniquely identify people through their mobile phone’s unique identifier [TJS10]. This allows the recognition of the same person even after they return to the monitored area. However, this approach requires the person to carry a smartphone with a dedicated application as an active tracking device. As an example of the use of behavioural analysis, Girgensohn, Shipman, and Wilcox developed a system that uses visual analytics to determine and track activity patterns in a retail store [GSW08]. The tool produces heat maps of common movement patterns and frequently visited places, and the visualisation also includes the location and speed of people. This can help in understanding the kinds of activities customers were performing in the space [GSW08].
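To illustrate the idea of a movement heat map, the following is a minimal sketch and not the implementation described in [GSW08]. It assumes that movement traces are already available as lists of (x, y) floor coordinates in metres; the function name occupancy_heatmap, the cell_size parameter and the sample traces are hypothetical.

```python
from collections import Counter

def occupancy_heatmap(traces, cell_size=1.0):
    """Count how many position samples fall into each square grid cell.

    traces: iterable of traces, each a list of (x, y) positions in metres
            (assumed input format, not taken from the cited work).
    cell_size: edge length of a grid cell in metres (hypothetical default).
    """
    counts = Counter()
    for trace in traces:
        for x, y in trace:
            # Map the continuous position to a discrete grid cell index.
            cell = (int(x // cell_size), int(y // cell_size))
            counts[cell] += 1
    return counts

# Two made-up traces of people walking past a display.
traces = [
    [(0.2, 0.3), (0.8, 0.4), (1.5, 0.5)],
    [(0.1, 0.9), (1.2, 0.2), (2.7, 0.1)],
]
heatmap = occupancy_heatmap(traces)
```

Cells with high counts correspond to frequently visited places. A real system would additionally need to obtain the traces from video, which is the hard part addressed by the cited work.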
Using visual analytics for activity and behaviour recognition and tracking is a common use case in research, as identified in the survey conducted by Candamo et al. [Can+10]. Alternative approaches for tracking people to gain more insights into their navigation patterns, for example in the context of adventure and amusement parks, involve the use of active GPS tracking devices [RCS10]. Konidala et al. use mobile applications to collect additional information about visitors, including their location coordinates as they move around the park [Kon+13]. The authors point out the usefulness of such data for analytics purposes for park providers. On an even larger scale, cellular data can provide movement traces of individuals [Bec+13], though the granularity does not appear to be suitable for use in the context of signage analytics.
The detection of people and objects in video sequences, however, is a complex problem. Researchers have worked on methods to improve the accuracy and performance of such systems, e.g. by combining computer vision with “models of pedestrian dynamics” [Ant+06]. In addition to the technical challenges in audience tracking through video analytics, other challenges and concerns arise. The use of video analytics and the tracking of individuals have been pointed out to be privacy invasive, and a number of concerns have been raised within the research community (e.g. context-aware displays that recognise individual viewers could make their private information visible on public displays) [Gre+14]. To address such issues, researchers have worked on methods that enable the collection and use of such insights while still preserving individuals’ privacy. Zhang et al. [Zha+10], for example, discuss privacy issues of closed-circuit television cameras and provide a privacy-preserving solution for pedestrian tracking and recognition. Instead of storing an image of the viewer’s face (i.e. a frontal image of an individual’s face), the system computes biometric features of the face and stores a hash of
the biometric features [Zha+10], similar to how systems currently hash password strings to avoid storing such sensitive information in plain text. This approach prevents potential attackers from decoding facial and biometric information stored on the device while still allowing the system to recognise recurring viewers.
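The general principle of storing a hash instead of the raw features can be sketched as follows. This is an illustrative assumption and not the scheme of [Zha+10]: the feature vectors, the quantisation step, the key and the function name feature_token are all hypothetical. The sketch coarsely quantises a feature vector and applies a keyed hash (HMAC-SHA-256), so only the resulting token needs to be stored.

```python
import hashlib
import hmac

def feature_token(features, secret_key, step=0.1):
    """Return a keyed hash of a coarsely quantised feature vector.

    features: list of floats describing a face (hypothetical representation).
    secret_key: bytes known only to the deployment (assumption of this sketch).
    step: quantisation step; values within the same bin map to the same token.
    """
    # Quantise so that small measurement noise does not change the token.
    quantised = [round(value / step) for value in features]
    payload = ",".join(str(q) for q in quantised).encode("utf-8")
    # Only this digest is stored, never the image or the raw features.
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()

key = b"hypothetical-deployment-key"
first_visit = feature_token([0.52, 0.31, 0.88], key)
return_visit = feature_token([0.53, 0.29, 0.91], key)   # same person, slight noise
other_person = feature_token([0.12, 0.72, 0.40], key)

print(first_visit == return_visit)   # tokens match for the recurring viewer
print(first_visit == other_person)   # tokens differ for a different viewer
```

Note that this simple quantisation only matches when every noisy value lands in the same bin as before; values close to a bin boundary would produce a different token, which is one reason practical systems need more robust feature encodings.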
Viewer navigation paths (whether captured via video analytics or location tracking systems such as GPS) represent highly sensitive data, and capturing them may violate the privacy of individuals. A number of systems, however, have been developed that address issues of privacy in this context. LocServ is an early example of a system developed by Myles, Friday, and Davies [MFD03] specifically designed to allow individuals to take full control over the location data that is captured and processed by defining and applying policies. For example, users can express times and contexts in which location tracking is acceptable (e.g. while at work) whilst rejecting location tracking in other contexts (e.g. while at home) [MFD03]. The system was partially motivated by pawS (“a privacy awareness system for ubiquitous computing environments”) [Lan02], which supports the implementation of ‘data usage policies’ that allow users both to express their preferences for the usage of personal data and to track its usage.
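A policy of the kind described above (tracking acceptable at work, rejected at home) could be expressed roughly as follows. This is a hedged sketch only: it does not reproduce the policy language of LocServ or pawS, and the names LocationPolicy and tracking_allowed, the context labels and the time windows are hypothetical.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class LocationPolicy:
    """A user-defined rule for one context and time window (hypothetical model)."""
    context: str   # e.g. "work" or "home"
    start: time    # start of the window (inclusive)
    end: time      # end of the window (exclusive)
    allow: bool    # whether tracking is acceptable inside the window

def tracking_allowed(policies, context, now):
    """Return the decision of the first matching policy; deny by default."""
    for policy in policies:
        if policy.context == context and policy.start <= now < policy.end:
            return policy.allow
    # No rule matched: the conservative choice is to reject tracking.
    return False

policies = [
    LocationPolicy("work", time(9, 0), time(17, 0), True),
    LocationPolicy("home", time(0, 0), time(23, 59), False),
]

print(tracking_allowed(policies, "work", time(10, 30)))  # inside working hours
print(tracking_allowed(policies, "home", time(12, 0)))   # explicitly rejected
```

The default-deny fallback is a design choice of this sketch: a location request that matches no user-defined rule is rejected rather than silently permitted.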
2.3.4 Interaction Events
Understanding more about interactions and engagement with digital signage and the content shown on displays is a key aspect in drawing conclusions about the effectiveness and usefulness of both display deployments and content [Alt+12]. Multiple ways exist in which users can interact or visually engage with a public display. She et al. conducted a survey to extract common interaction modalities in the digital signage domain and identified presence, direct touch, gesture and remote control as a set of interaction categories [She+14]. For each of these categories, different software- and hardware-based techniques are available to capture the data that is relevant for signage analytics. Whilst we described proximity-aware systems that take viewer presence into account in Section 2.3.3.1, we will now give an overview of related work in which researchers have developed and deployed digital signage systems that allowed viewers to interact through remote controls in the form of mobile phones, gestures, touch, and, as an additional category, gaze. All of the described technologies could function as data capture techniques and be used to build up analytics-relevant datasets about the interaction and engagement of viewers with digital signage.