3 UIAF system design
3.3 Use case design
4.2.6 Device specific presentation delivery
This section elaborates on steps 12, 13 and 14 as introduced in Table 5. Based on the decision reached in the previous steps, the resulting multimedia presentations are constructed and delivered to the most suitable devices. SMIL is used as the presentation delivery format to assure device specific content delivery. The biggest benefit of using SMIL is its nature as a structured multimedia document format [97]. Since the system deals with documents, and not with encoded content, session continuity, dynamic adaptation, and distributed rendering are easier to implement and deploy. The presentation’s timeline is composed of temporal constructs such as parallels and sequences, as Figure 30 (left) shows. The media items that compose the presentation follow the temporal structure imposed by the Mobile and Web applications. Normally the layout information of the presentation and of the media items (that is, the regions) determines where the items should be rendered. This is in line with the base media item approach introduced earlier.
The multimedia presentation delivery process defines a transformation step (step 13) that covers layout transformations to adjust spatial arrangements and resize media items, temporal transformations to assure session continuity, and media asset transformations to select alternative media items that a specific device can use. Transformations are dynamic and depend on the situation and on the multimedia devices in the user’s environment. They are based on the transformation alternatives included in step 7 of the process. The example in Table 9 shows that parts of the presentation should be played on different devices: the text should be transformed to audio and played on the HI-FI, the accompanying images should be shown on the mobile phone, and the video should be played on the TV set.
Original SMIL document (left):
  <layout>
    <region id="video" width="..." height="..."/>
    <region id="ad" width="..." height="..."/>
    <region id="caption" width="..." height="..."/>
  </layout>
  <seq>
    <par>
      <video region="video" src=".../video.flv"/>
      <text region="caption" src=".../info.rt"/>
      <img region="ad" src=".../ad.mov"/>
    </par>
  </seq>
Annotated documents for device tailored delivery (right):
  Document sent to the TV
    <video region="video" src=".../video.mov" customTest="show"/>
    <audio region="caption" src=".../info.mp3" customTest="show"/>
    <img region="ad" src=".../ad.3gp"/>
  Document sent to the mobile phone
    <video region="video" src=".../video.mov"/>
    <audio region="caption" src=".../info.mp3"/>
    <img region="ad" src=".../ad.3gp" customTest="show"/>
Figure 30: Comparison between the original SMIL document (left) and the annotated one for device tailored delivery (right)
The final presentation will look like the one represented in Figure 30 (right), produced in step 12, where the customTest attributes indicate whether the system shows an element. Device Agents are in charge of checking this attribute for each media item. After producing the adequate structured presentation for each of the rendering Device Agents, the system delivers the modified presentation document to the selected Device Agents. The delivery phase takes advantage of the resulting format being a textual, descriptive format and transmits it via the Device Agent’s delivery mechanism (e.g. socket connectivity or the Internet Engineering Task Force’s SIP mechanism). The usage of SIP as the delivery protocol within an IMS infrastructure [98] is detailed in Chapter 6 on the service platform realisation.
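The annotation step described above can be illustrated with a minimal Python sketch. It is not the UIAF implementation: the function names, the `<smil>` wrapper element, and the `shown_regions` and `replacements` inputs are assumptions that stand in for the decisions reached in the earlier steps. The customTest="show" convention follows Figure 30, and the elided paths are kept as placeholders.

```python
import xml.etree.ElementTree as ET

# Original presentation, following Figure 30 (left). The <smil> wrapper is
# an assumption so that the fragment parses as one document.
ORIGINAL = """<smil>
  <layout>
    <region id="video" width="..." height="..."/>
    <region id="ad" width="..." height="..."/>
    <region id="caption" width="..." height="..."/>
  </layout>
  <seq>
    <par>
      <video region="video" src=".../video.flv"/>
      <text region="caption" src=".../info.rt"/>
      <img region="ad" src=".../ad.mov"/>
    </par>
  </seq>
</smil>"""


def tailor_for_device(smil_text, shown_regions, replacements=None):
    """Return a copy of the document annotated for one device.

    shown_regions: regions this device should render (hypothetical input).
    replacements: optional {region: (tag, src)} media asset transformations.
    """
    root = ET.fromstring(smil_text)
    for item in root.iter():
        region = item.get("region")
        if region is None:
            continue  # layout and timing elements are left untouched
        if replacements and region in replacements:
            # Media asset transformation, e.g. text replaced by audio.
            item.tag, src = replacements[region]
            item.set("src", src)
        if region in shown_regions:
            item.set("customTest", "show")
    return ET.tostring(root, encoding="unicode")


def rendered_items(smil_text):
    """What a Device Agent would render: items marked customTest="show"."""
    root = ET.fromstring(smil_text)
    return [(e.tag, e.get("src")) for e in root.iter()
            if e.get("customTest") == "show"]


phone_doc = tailor_for_device(ORIGINAL, {"ad"})
print(rendered_items(phone_doc))
```

Because every device receives the complete document and only the annotation differs, a Device Agent keeps the full temporal structure, which is what makes a later hand-over to another device a matter of re-annotating rather than re-authoring.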
Finally, in step 14, because different devices might render different parts of the presentation at the same time, the different media elements need to be synchronised. Since the UIAF should be able to render on varying underlying networks, it is important that the synchronisation algorithm works across network boundaries. Furthermore, this synchronisation should be media sensitive. This work does not provide an approach to media synchronisation itself, but given these requirements a NeighbourCast-NM (Non-monolithic) algorithm, as for example presented in [99], could be used.
4.3 Multimodal application control
Multimodal fusion, or multimodal integration, as described in the state of the art, aims at allowing the user to interact naturally with computer systems through various modalities (e.g. speech, gesture, keyboard). Based on the state of the art in multimodal user interfaces presented in Section 2.1.2, this thesis assumes that the user experience can be enhanced by integrating several user input modalities into one consistent fusion model, and extends this model to allow flexible control of mobile applications. Knowledge about single modalities is combined based on their timing dependencies (e.g. recognised inputs from single modality sources happen in parallel, or in a certain time sequence) to extract meaning for multimodal user interaction. Modality Fusion in the scope of this thesis aims to solve flexibility issues for exchangeable user interface devices with modality input capabilities that are available in the user’s vicinity in Mobile and Web applications. Furthermore, implications for different mobility situations can be studied with this approach, as described later in Chapter 5. Normally, multimodal user interaction has been analysed for one device and one application only.
[Figure 31 block labels: Controller, Model, View; input recognition model; Mobile & Web application; GUI/distributed multimedia presentation]
Figure 31: MVC concept mapped to multimodal application control related components
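The timing dependencies mentioned above (inputs that happen in parallel, or in a certain time sequence) can be made concrete with a small Python sketch. The `Token` fields, the `timing_relation` function and the 1.5 second gap are illustrative assumptions, not values or names taken from the UIAF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A recognised input from a single modality (hypothetical structure)."""
    modality: str
    value: str
    start: float  # seconds
    end: float    # seconds


def timing_relation(a, b, max_gap=1.5):
    """Classify two tokens as 'parallel', 'sequence' or 'unrelated'.

    max_gap is an assumed threshold: the longest pause (in seconds) that
    still lets two inputs count as one multimodal interaction.
    """
    first, second = sorted((a, b), key=lambda t: t.start)
    if second.start <= first.end:
        return "parallel"   # the two time intervals overlap
    if second.start - first.end <= max_gap:
        return "sequence"   # close enough in time to belong together
    return "unrelated"


speech = Token("speech", "play", 0.0, 0.8)
gesture = Token("gesture", "point_tv", 0.5, 1.0)
print(timing_relation(speech, gesture))
```

A fusion component would typically apply such a relation to neighbouring tokens before deciding whether they form one command.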
This work proposes a user interface metaphor for applications that translates the well-known Model-View-Controller (MVC) concept, as described in [100], from GUI systems to multimodal application control. A summary of the MVC principles and related work is given in Wikipedia [101]. Figure 31 shows the basic concept with its direct and indirect relations (green/rectangular blocks) and, in addition, the mapping for the UIAF multimodal application control. MVC is an architecture or architectural pattern for the implementation of applications that provide any kind of user interaction. The model defines the application’s domain logic or data model and is linked to a view. The view visualises the user interface, normally a GUI, based on the model’s information. The controller receives the user input and invokes changes to the model in response to the interaction. This mechanism provides advantages for the application design and decouples the three activities needed for application interaction. For example, views can be defined independently of the domain models: data changes in the model do not affect the implementation of the view directly, only the data presented in the view changes. In this work the idea is to move the controller from the application into the UIAF in order to design and implement an application independent multimodal application control mechanism. The controller therefore becomes the Modality Fusion component, and multiple applications are able to reuse the control functionality, which simplifies the implementation of multimodal interaction.
The controller translates user interaction input into changes of the application model. Changes in the application model affect the view of the application. For Mobile applications the view can be the portal device’s GUI and the application’s multimedia presentation as defined for the UIAF multimedia presentation delivery. The mapping is illustrated in Figure 31. Before going into the details of the actual Modality Fusion sequence and its internal mechanisms, Table 10 compares the interactions in MVC with the UIAF multimodal application control flow.
Nr. | MVC flow as described in [101] | UIAF multimodal application control flow
1 | The user interacts with the user interface in some way. | The user interacts with available user interface devices providing multiple modality input means. Input is provided through a token based input recognition model.
2 | The controller handles the input event from the user interface. | Modality Fusion receives the recognised user inputs and matches them against an application provided control model.
3 | The controller notifies the model of the user action, possibly resulting in a change in the model's state. | When actions are recognised the application model is notified and the according model states might change based on the application logic.
4 | The view is automatically notified by the model of changes in state (Observer). | The application model updates its view according to the model changes. This can be mobile terminal GUI changes or multimedia presentation delivery requests.
Table 10: Linking of the MVC control flow to the UIAF multimodal application control flow
This outlines the basic mapping of the MVC to the proposed UIAF application control mechanism. Subsequently the multimodal application control interaction sequence overview, details of the application control model, the token based recognition model and the multimodal application control matching are provided.
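The mapping in Table 10 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UIAF design itself: the class names, the dictionary used as control model, and the play/stop state machine are all hypothetical. It shows one Modality Fusion instance acting as the controller for two independent applications.

```python
class View:
    """View: stands in for a GUI or a presentation delivery request."""

    def __init__(self):
        self.shown = []

    def update(self, state):
        self.shown.append(state)


class ApplicationModel:
    """Model: application state plus Observer-style view notification."""

    TRANSITIONS = {("stopped", "play"): "playing",
                   ("playing", "stop"): "stopped"}

    def __init__(self):
        self.state = "stopped"
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def apply(self, action):
        # Row 3: the application logic decides whether the state changes.
        new_state = self.TRANSITIONS.get((self.state, action))
        if new_state is not None:
            self.state = new_state
            for view in self._views:
                view.update(new_state)  # Row 4: the view is notified


class ModalityFusion:
    """Controller placed outside the applications and shared by them."""

    def __init__(self):
        self._registrations = []  # (control model, application model)

    def register(self, control_model, app_model):
        self._registrations.append((control_model, app_model))

    def on_input(self, token):
        # Row 2: match the recognised input against each control model.
        for control_model, app_model in self._registrations:
            action = control_model.get(token)
            if action is not None:
                app_model.apply(action)


fusion = ModalityFusion()
player, player_view = ApplicationModel(), View()
player.attach(player_view)
fusion.register({("speech", "play"): "play", ("button", "stop"): "stop"}, player)
fusion.on_input(("speech", "play"))
print(player.state, player_view.shown)
```

Neither application model contains any input handling code; each only supplies a control model, which is the reuse argument made for moving the controller into the UIAF.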
4.3.1 Interaction sequence overview
Having set the scene, this section provides an overview of the multimodal application control interaction sequence, highlighting the UIAF components involved and their functionality in the process. For this purpose the initial UIAF system architecture introduced in Section 3.4 is adapted to highlight the involved components and data models. This is shown in Figure 32 together with the sequence numbers for each process step. Furthermore, this relates to the interaction sequences introduced for the high level architecture design in Section 3.4.1.2 and provides further details on the involved data models and the results of each step.
[Figure 32 block labels: Mobile & Web applications; UIAF Stub; application input specification; multimodal application control model; Device and Modality Function (DeaMon) with Modality Fusion, Modality Recognition and input registry; Device and Gateway Function (DeGan) with Session Control; Device Handler with input and output functionality; device agent with integrated interaction stream recognition; token recognition model; application command notifications; internal application input data; user's multimodal interaction; stream input; BAN/PA]
Figure 32: UIAF multimodal application control sequence and data models
The system flow starts at the Mobile or Web application, which forwards the multimodal application control model as part of the application input specification description in order to register it with the UIAF Modality Fusion component. The user’s multimodal interactions are streamed to the DeaMon in order to be recognised and matched against the registered control models. Table 11 describes the involved steps and the data models used in detail.
Nr. | Description of the step | Resulting data
1 | Retrieve the application input specification including the multimodal application control model. | Application input specification
2 | Parse the multimodal application control model into an internal application input data model for the Modality Fusion. | Internal application control model representation
3 | Register internal application input data at the input request registry and generate a dialogue identifier associated with the application. | Model in input request registry, dialogue identifier issued
4 | Retrieve users’ raw user interaction from continuous input streams (e.g. voice, button pressing, and gestures). | Raw application control input
5 | Retrieve device information for input type and quality parameters. | Input type and quality parameters for recognition model
6 | Recognition of application control input. Raw input streams are tokenized for application request matching. | Token based recognition model
7 | Processing continuous tokenized user application control input and matching against application control request registry. Translating matches into application model change notifications. | Application model change notifications
8 | Embedding the application model change in a notification invocation message delivered to the application. | Notification
9 | Delivering application specific notification messages. | Application model and view updates
Table 11: Multimodal application control sequence description
This summarises the multimodal application control interaction sequence. Details about the multimodal application control approach, the token based recognition model and the matching mechanism are provided in the next sub-sections.
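As a rough orientation before those details, the sequence in Table 11 can be walked through in a short Python sketch. Everything concrete in it is an assumption made for illustration: the dictionary layout of the input specification, the "modality:value" token strings, the confidence threshold used as quality parameter, the "dlg-N" identifier format and the contiguous matching rule are not taken from the UIAF.

```python
import itertools


def parse_control_model(spec):
    """Step 2: turn the specification into {token pattern: command}."""
    return {tuple(c["tokens"]): c["command"] for c in spec["controls"]}


def tokenise(raw_events, min_confidence):
    """Steps 4-6: keep inputs that meet the device's quality parameter
    and turn them into 'modality:value' tokens."""
    return [f"{modality}:{value.strip().lower()}"
            for modality, value, confidence in raw_events
            if confidence >= min_confidence]


def contains(tokens, pattern):
    """True if pattern occurs as a contiguous run inside tokens."""
    n = len(pattern)
    return any(tuple(tokens[i:i + n]) == pattern
               for i in range(len(tokens) - n + 1))


class InputRequestRegistry:
    def __init__(self):
        self._ids = itertools.count(1)
        self._models = {}

    def register(self, internal_model):
        """Step 3: store the model and issue a dialogue identifier."""
        dialogue_id = f"dlg-{next(self._ids)}"
        self._models[dialogue_id] = internal_model
        return dialogue_id

    def notifications(self, tokens):
        """Steps 7-8: match tokens and wrap each hit in a notification."""
        return [{"dialogue": dialogue_id, "command": command}
                for dialogue_id, model in self._models.items()
                for pattern, command in model.items()
                if contains(tokens, pattern)]


# Step 1: a hypothetical application input specification.
spec = {"application": "player",
        "controls": [{"command": "play", "tokens": ["speech:play"]},
                     {"command": "next",
                      "tokens": ["gesture:swipe", "speech:next"]}]}
registry = InputRequestRegistry()
dialogue = registry.register(parse_control_model(spec))
raw = [("speech", " Play ", 0.9), ("gesture", "Swipe", 0.8),
       ("speech", "next", 0.95)]
for message in registry.notifications(tokenise(raw, min_confidence=0.5)):
    print(message)  # Step 9 would deliver this to the application
```

The dialogue identifier is what lets a notification be routed back to the application that registered the matching control model, even when several applications are registered at once.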