3.2 Distributed mPPI Processing Infrastructure
3.2.1 Basic Platform Concepts
The basic idea of the distributed PPI processing pipeline, as described in section 3.1, is to decouple processing stages (and hence interaction resources) from both particular applications and deployment locations. This makes deployment flexible, enabling a variety of deployment schemes and approaches to sharing existing interaction resources.
The platform supporting PPI in the mobile domain, which is based on the distributed pipeline architecture, follows this principle of flexibility. It provides a flexible yet light-weight approach to interaction processing: successive processing stages with clearly defined, separate interfaces channel digital ink data to applications and transform it appropriately. Hence, as derived in section 3.1, the two basic building blocks of the infrastructure are:
Processing Stages (PS): The individual components in the pipeline that transform the data, i.e., the Driver Stage accessing the pen hardware, the Region Stage relating digital ink to interactive regions, the Semantic Stage interpreting the digital ink, and the Application Stage preparing it for processing at the application level.
Processing Stage Interfaces (PSI): The interfaces between processing stages, which serve as points of distribution and consist of data definitions and services, i.e., the IPen interface describing raw digital ink and providing services that abstract from the pen hardware, the IRegion interface relating this ink to interactive regions and organizing it into movement sequences, and the IInk interface describing the full tree-like data structure of digital ink with attached interpretations.
In order to provide a flexible and light-weight platform, the platform needs to support the concept of required and optional stages (cf. section 3.1.2). This enables the infrastructure to cope with changing environmental conditions: it can ensure that the required stages remain accessible, while the optional stages are deployed if needed and if resources permit. Additionally, runtime redeployment needs to be possible, including flexible resource sharing via appropriate handover mechanisms.
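To make the required / optional distinction concrete, the following minimal Python sketch plans a deployment: required stages are always included, and optional stages are added only while an abstract resource budget permits. The names `StageSpec` and `plan_deployment`, the resource costs, and the choice of the Semantic Stage as the only optional stage are hypothetical assumptions for illustration, not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """A pipeline stage as seen by a hypothetical deployment planner."""
    name: str
    required: bool
    cost: int  # abstract resource units; purely illustrative


def plan_deployment(stages: list[StageSpec], budget: int) -> list[str]:
    """Always deploy required stages; add optional stages only while resources permit."""
    required = [s for s in stages if s.required]
    used = sum(s.cost for s in required)
    if used > budget:
        raise RuntimeError("required stages exceed the available resources")
    plan = [s.name for s in required]
    for s in stages:
        if not s.required and used + s.cost <= budget:
            plan.append(s.name)
            used += s.cost
    return plan


# Assumed example: which stage is optional and all costs are hypothetical.
STAGES = [
    StageSpec("Driver", required=True, cost=1),
    StageSpec("Region", required=True, cost=1),
    StageSpec("Semantic", required=False, cost=5),
    StageSpec("Application", required=True, cost=1),
]

print(plan_deployment(STAGES, budget=4))  # ['Driver', 'Region', 'Application']
print(plan_deployment(STAGES, budget=8))  # ['Driver', 'Region', 'Application', 'Semantic']
```

A budget below the cost of the required stages raises an error, mirroring the requirement that required stages must remain accessible.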
In a platform for mobile PPI processing based on the distributed PPI processing pipeline, this is achieved by employing a plug and play interoperability scheme between the stages of the pipeline: stages discover other stages using standard service discovery methods and start interoperating within a flexible publish / subscribe based infrastructure. Following this approach, stages are deployed independently and can "plug together", i.e., they discover other stages and interoperate with minimal configuration at deployment time by subscribing to, or publishing on, pre-defined communication channels.
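The plug and play principle can be illustrated with two minimal, in-process stand-ins: a service registry playing the role of standard service discovery, and a topic-based publish / subscribe broker. This Python sketch is an illustration under stated assumptions, not the platform's implementation; the classes `Broker` and `Registry`, the property keys `kind` and `channel`, and the channel name `ipen/pen-1` are all hypothetical. A real deployment would use a network-capable discovery protocol and broker instead.

```python
from collections import defaultdict
from typing import Any, Callable


class Broker:
    """Minimal in-process, topic-based publish / subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in list(self._subscribers[topic]):
            callback(message)


class Registry:
    """Stand-in for a standard service discovery mechanism."""

    def __init__(self) -> None:
        self._services: dict[str, dict[str, str]] = {}

    def register(self, service_id: str, properties: dict[str, str]) -> None:
        self._services[service_id] = properties

    def discover(self, kind: str) -> dict[str, dict[str, str]]:
        return {sid: p for sid, p in self._services.items() if p.get("kind") == kind}


broker, registry = Broker(), Registry()

# A driver stage announces a pen service and the channel the pen publishes on.
registry.register("pen-1", {"kind": "pen", "channel": "ipen/pen-1"})

# A region stage discovers pen services and plugs in by subscribing.
received: list[Any] = []
for props in registry.discover("pen").values():
    broker.subscribe(props["channel"], received.append)

broker.publish("ipen/pen-1", (10.0, 20.0, 0))  # x, y, t
print(received)  # [(10.0, 20.0, 0)]
```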
On the one hand, this facilitates the physical distribution required to support PPI in the mobile domain, e.g., allowing the processing-intensive handwriting recognition to be executed on a backend system rather than on the mobile client itself. On the other hand, it allows for redistribution according to the current needs of an application. Imagine a hybrid mobile / nomadic scenario in which a user interacts with her paper-based note-taking application on the move and then arrives at her office to sort through and archive the notes. Here, the mobile scenario might use a different deployment than the nomadic scenario in the office. Additionally, the plug and play like interoperability increases the resilience of such a system: the shutdown of a component or a lost connection between components can be handled much more flexibly.
Services, Data and Dataflow
At the core of the plug and play like interoperation of components are the services, communication channels and data structures defined at the processing stage interfaces. The mechanisms for exchanging data between components need to support the pipeline-based processing of digital ink, i.e., data successively travels through the pipeline. Here, the direction of dataflow plays an important role. While digital ink is typically pushed through the pipeline, the data on defined interactive regions, i.e., the digital representations of interactive regions, must be pulled through the pipeline by the processing components so that digital ink can be related to the interested applications.
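The opposite directions of the two dataflows can be sketched as follows. In this minimal Python example, ink samples are pushed into the hypothetical `RegionStage.on_sample` callback, while the stage pulls the current region definitions from the applications through a supplied function whenever it needs them. Rectangular regions and the names `Region`, `RegionStage`, `notes`, `calendar` and `todo` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular interactive region owned by an application."""
    app: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


class RegionStage:
    """Relates pushed ink samples to interested applications."""

    def __init__(self, pull_regions: Callable[[], list[Region]]) -> None:
        # Region definitions flow against the ink: they are pulled on demand.
        self._pull_regions = pull_regions

    def on_sample(self, x: float, y: float) -> Optional[str]:
        # Ink samples flow with the pipeline: they are pushed into this callback.
        for region in self._pull_regions():
            if region.contains(x, y):
                return region.app
        return None


app_regions = [Region("notes", 0, 0, 100, 100), Region("calendar", 100, 0, 200, 100)]
stage = RegionStage(lambda: app_regions)

print(stage.on_sample(50, 50))    # notes
print(stage.on_sample(150, 10))   # calendar
print(stage.on_sample(500, 500))  # None

# A region defined later is picked up on the next pull, without reconfiguration.
app_regions.append(Region("todo", 0, 100, 100, 200))
print(stage.on_sample(50, 150))   # todo
```

Because the region definitions are pulled when a sample arrives, a region that an application defines later is taken into account without reconfiguring the stage.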
To allow processing stages to plug together, the mPPI processing infrastructure employs a microservice architecture (MSA), similar to emerging architectures for distributed, light-weight web applications⁵. This architectural approach is based on patterns and principles commonly found in Service Oriented Architectures (SOA), e.g., service discovery and explicit interfaces. However, its services are much more fine-grained and have a high degree of autonomy.
In the MSA employed in the mPPI infrastructure, a set of discoverable services forms the backbone of the system. It is combined with a publish / subscribe (pub / sub) system that defines decoupled, asynchronous communication channels between components and thereby supports the push-based dataflow characteristic of pipeline processing approaches. Services do not necessarily correspond to entire processing stages; e.g., the driver stage exposes an individual service for each connected pen.
Typically, exposed services provide information on available communication channels, e.g., on which channel the data of a particular digital pen can be obtained. However, they also offer state data on particular pipeline components, or can be used to inquire about defined interactive regions or data collected within the pipeline. Pre-defined data structures associated with the PSIs describe the data exchanged on these communication channels in order to allow for easy and convenient interoperation. Hence, the processing stage interfaces consist of:
• communication channels (push based), used to access data and establish dataflow within the pipeline, following the pub / sub communication paradigm
• discoverable services (pull based), used to identify communication channels and inquire about the state of interaction resources
• data structures describing the exchange format of the data traveling on the communication channels
[Figure: Driver Stage, Region Stage and App; services S1, S2 (Pen Discovery / Access) and S3 (IR Discovery / Access); PSI IPen with channels P1, P2 and PSI IRegion with channel R1, each carrying events (e) and samples (s) via PUB / SUB]
Figure 3.7: Dataflow and Services in the distributed PPI processing pipeline (example setup): Data consisting of events (e) and samples (s) is pushed through the pipeline on communication channels P1, P2 and R1 (pub / sub). At the same time, information about active pens (S1 and S2), as well as defined IRs (S3), is pulled as required.
Fig. 3.7 illustrates how services, communication channels and data structures form the interaction processing pipeline.
Channels. Data is pushed through a set of communication channels using pub / sub. This dataflow runs from the pen as the source to one or several applications as sinks, i.e., the applications interested in the particular interactive region on which the pen interaction occurs. While traveling through the pipeline, the data is enriched with additional information, e.g., recognition results or clustering information. The original data nevertheless remains traceable, as it is merely wrapped with the additional information. The infrastructure employs a topic-based pub / sub communication paradigm, in which the topics correspond to the successive stages of processed data (from raw to application level), i.e., to the processing stages of the pipeline architecture.
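The enrichment-by-wrapping idea on topic-based channels might look like the following minimal Python sketch. The topic names `ipen/pen-1` and `iregion/page-1`, the `RegionData` wrapper and the fixed region assignment are hypothetical simplifications; the point is only that the Region Stage republishes the raw sample unchanged inside a richer structure on the topic of the next processing stage.

```python
from collections import defaultdict
from dataclasses import dataclass


class Broker:
    """Minimal topic-based pub / sub broker (in-process, for illustration only)."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subs[topic]):
            callback(message)


@dataclass(frozen=True)
class Sample:
    x: float
    y: float
    t: int


@dataclass(frozen=True)
class RegionData:
    """Enriched data: the original sample is wrapped, not replaced."""
    region_id: str
    original: Sample


broker = Broker()
delivered = []


def region_stage(sample: Sample) -> None:
    # For brevity, every sample is assigned to the hypothetical region "page-1"
    # and republished, wrapped, on the topic of the next stage.
    broker.publish("iregion/page-1", RegionData("page-1", sample))


broker.subscribe("ipen/pen-1", region_stage)          # Region Stage listens to raw ink
broker.subscribe("iregion/page-1", delivered.append)  # application listens to its region
broker.publish("ipen/pen-1", Sample(12.5, 40.0, 1000))

print(delivered[0].region_id)  # page-1
print(delivered[0].original)   # Sample(x=12.5, y=40.0, t=1000)
```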
Services. At the same time, a set of services offers pull-based information on the interaction resources present in the current pipeline setup, e.g., which pens are available or which interactive regions are currently defined. These services typically allow components to inquire about state data of the associated interaction resources, e.g., whether a given digital pen is currently moving on an interactive region. Their main objective, however, is to identify where the data generated by these resources can be obtained, i.e., on which communication channel it will be published and thus to which channel an interested component needs to subscribe.
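A pull-based pen service could be sketched as below. The driver stage's main service starts one service per connected pen and stops it again on disconnection, and each service answers two kinds of requests: where the pen's data is published, and what state the pen is in. All names (`PenService`, `DriverStage`, the `ipen/<pen id>` channel naming) are hypothetical; in the infrastructure these services would be exposed via service discovery rather than as local objects.

```python
from dataclasses import dataclass


@dataclass
class PenService:
    """Hypothetical pull-based service wrapping a single digital pen."""
    pen_id: str
    on_region: bool = False  # state updated by the driver as the pen moves

    def channel(self) -> str:
        # Main objective: tell consumers on which channel this pen publishes.
        return f"ipen/{self.pen_id}"

    def is_on_region(self) -> bool:
        # State inquiry: is the pen currently moving on an interactive region?
        return self.on_region


class DriverStage:
    """Main service of the stage: starts and stops one service per connected pen."""

    def __init__(self) -> None:
        self.services: dict[str, PenService] = {}

    def pen_connected(self, pen_id: str) -> PenService:
        service = PenService(pen_id)
        self.services[pen_id] = service
        return service

    def pen_disconnected(self, pen_id: str) -> None:
        self.services.pop(pen_id, None)


driver = DriverStage()
driver.pen_connected("pen-1")
driver.pen_connected("pen-2").on_region = True
driver.pen_disconnected("pen-1")

for service in driver.services.values():
    print(service.channel(), service.is_on_region())  # ipen/pen-2 True
```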
Data Structures. As described above, the basic data structures are events and samples. Events describe state changes in the course of interaction. Depending on the processing stage where they emerge, these can be state changes regarding the pen movement, e.g., the pen tip being put down, state changes regarding the recognition of continuous traces, or even semantic recognition results, e.g., when a gesture recognizer detects a certain gesture. Samples, in contrast, form the actual digital ink generated by the digital pen. A sample typically consists of the x and y coordinates of the pen tip position at a certain point in time t. Depending on the underlying digital pen hardware, additional information can be included; e.g., in the reference implementation, Letras, the pressure applied to the pen tip (f) is also included in the sample information (cf. section 3.3). These data structures constitute the data that is pushed through the pipeline, i.e., the data traveling along the communication channels.
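The two basic data structures could be modeled as follows. This Python sketch is illustrative only: the field names, the event kinds, the optional pressure field `f` and the millisecond timestamp unit are assumptions and are not taken from the Letras reference implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class EventKind(Enum):
    PEN_DOWN = "pen down"                      # pen movement
    PEN_UP = "pen up"                          # pen movement
    GESTURE_RECOGNIZED = "gesture recognized"  # semantic result, emitted by a later stage


@dataclass(frozen=True)
class Event:
    kind: EventKind
    t: int                        # timestamp (unit assumed: ms)
    detail: Optional[str] = None  # e.g., the name of a recognized gesture


@dataclass(frozen=True)
class Sample:
    x: float
    y: float
    t: int                     # timestamp (unit assumed: ms)
    f: Optional[float] = None  # pen tip pressure, only if the hardware reports it


Message = Union[Event, Sample]

# One pen stroke as it might travel along a raw-ink channel: events frame the samples.
stream: list[Message] = [
    Event(EventKind.PEN_DOWN, 0),
    Sample(10.0, 20.0, 0, f=0.6),
    Sample(11.0, 21.5, 13, f=0.7),
    Event(EventKind.PEN_UP, 26),
]


def ink_of(messages: list[Message]) -> list[Sample]:
    """Keep only the samples, i.e., the actual digital ink."""
    return [m for m in messages if isinstance(m, Sample)]


print(len(ink_of(stream)))  # 2
```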
Deployment Schemes
Following the concepts outlined above, the derived mPPI infrastructure allows plug and play like interoperability of pipeline stages at the processing stage interfaces using standard service discovery and topic-based publish / subscribe communication. This enables flexible and easy distribution: as channels can provide both local and network connections, the distribution decision can be made at deployment time or even at runtime. The deployment layout, i.e., which stage is hosted on which node and the communication links between them (local / network), corresponds to the deployment scheme being used (cf. section 3.1.2, Deployment Schemes).
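The relationship between a deployment layout and its communication links can be expressed in a few lines. The following Python sketch assumes a linear chain of stages and two hypothetical placements that echo the hybrid scenario described earlier: on the move, handwriting recognition (here assumed to live in the Semantic Stage) runs on a backend, while in the office all stages run on one desktop. The function name `link_types` and the node names are illustrative.

```python
Link = tuple[str, str]


def link_types(placement: dict[str, str], links: list[Link]) -> dict[Link, str]:
    """A link is local if both stages share a node; otherwise it crosses the network."""
    return {(a, b): "local" if placement[a] == placement[b] else "network" for a, b in links}


# Assumed chain of stages and two assumed placements (stage -> node).
LINKS = [("driver", "region"), ("region", "semantic"), ("semantic", "application")]
ON_THE_MOVE = {"driver": "phone", "region": "phone", "semantic": "backend", "application": "phone"}
IN_THE_OFFICE = {stage: "desktop" for stage in ("driver", "region", "semantic", "application")}

print(link_types(ON_THE_MOVE, LINKS))
print(link_types(IN_THE_OFFICE, LINKS))
```

Since channels can be local or networked, changing the placement only changes which links cross the network, not how the stages are wired together.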
Each processing stage in the resulting platform consists of a main service that starts or stops services for its encapsulated resources as required, e.g., the services wrapping digital pen resources. A processing stage initiates a continuous service discovery for the services of adjacent processing stages that encapsulate resources of interest, e.g., digital pens or interactive regions.
Upon discovery of such a service, e.g., a service representing a digital pen resource, the processing stage inquires about the communication channels of this resource, e.g., the channel on which the pen streams its data. It automatically subscribes to these channels and thus establishes the connection between the processing stages. It then commences operation immediately, at runtime. This mechanism is therefore referred to as a hotplug mechanism.
[Figure: Driver Stage 1 and Driver Stage 2 exposing pen services (S1, S2, S3) and channels (P1, P2, P3) at PSI IPen to the pipeline]
Figure 3.8: Service abstraction in the Driver Stage
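The hotplug mechanism can be sketched with a consuming stage that, in each discovery round, subscribes to the channel of every service it has not seen before. In this minimal Python sketch, discovery is simulated by a function returning a mapping from service identifiers to channels, and rounds are triggered explicitly instead of running in a background loop; `HotplugStage`, `discovery_round` and the channel names are hypothetical. Handling services that disappear is omitted for brevity.

```python
from collections import defaultdict
from typing import Any, Callable


class Broker:
    """Minimal in-process topic-based pub / sub broker."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in list(self._subs[topic]):
            callback(message)


class HotplugStage:
    """Consuming stage that connects to upstream services as they appear."""

    def __init__(self, broker: Broker, discover: Callable[[], dict[str, str]]) -> None:
        self._broker = broker
        self._discover = discover  # returns {service_id: channel}; stands in for discovery
        self._connected: set[str] = set()
        self.received: list[Any] = []

    def discovery_round(self) -> None:
        """One round of the continuous discovery; a real stage would repeat this."""
        for service_id, channel in self._discover().items():
            if service_id not in self._connected:
                self._broker.subscribe(channel, self.received.append)
                self._connected.add(service_id)


broker = Broker()
announced: dict[str, str] = {}  # services currently visible to discovery
stage = HotplugStage(broker, lambda: dict(announced))

stage.discovery_round()            # nothing announced yet
announced["pen-1"] = "ipen/pen-1"  # a pen is switched on at runtime
stage.discovery_round()            # discovered: subscribe immediately
stage.discovery_round()            # already connected: no duplicate subscription
broker.publish("ipen/pen-1", "sample-1")
print(stage.received)  # ['sample-1']
```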
With an ongoing, continuous service discovery and a hotplug connection mechanism, each pipeline stage is ready to connect to its adjacent stages at runtime. However, one further abstraction is required in order to fully utilize the deployment schemes described in section 3.1.2: processing stages require support for multiple fan-out and multiple fan-in with respect to other processing stages.
In the mPPI infrastructure, this is achieved by using pub / sub channels exclusively as the means of communication between stages, combined with the aforementioned mechanism for detecting new channels via wrapping services, e.g., a pen service providing information on the channel on which its encapsulated pen resource streams its digital ink. An advantage of the pub / sub paradigm is that it naturally supports multiple subscribers observing a particular channel and processing the data received on it. This enables multiple fan-out.
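Fan-out follows directly from the pub / sub paradigm, as this minimal Python sketch shows: two consumers subscribe to the same channel and each receives every message published on it. The broker is an in-process stand-in, and the channel and consumer names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process topic-based pub / sub broker."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives the same message: fan-out.
        for callback in list(self._subs[topic]):
            callback(message)


broker = Broker()
recognizer_inbox, archive_inbox = [], []

# Two independent consumers observe the same channel (hypothetical names).
broker.subscribe("iregion/page-1", recognizer_inbox.append)
broker.subscribe("iregion/page-1", archive_inbox.append)

broker.publish("iregion/page-1", "stroke-1")
print(recognizer_inbox, archive_inbox)  # ['stroke-1'] ['stroke-1']
```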
Furthermore, continuous service discovery in combination with the service abstraction at the PSI level allows for dynamic discovery of available channels at the consuming processing stage (cf. Fig. 3.8). Together with a mechanism supporting subscriptions to multiple channels simultaneously, this enables multiple fan-in at the consuming side: the consuming service receives data on multiple channels (subscriptions) and processes the combined data as needed.
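Fan-in on the consuming side can be sketched as a service that subscribes to several channels and collects their data in one combined stream, tagged with the originating channel. In this minimal Python sketch, the channel list is passed in directly; in the infrastructure it would be supplied by the continuous service discovery. `MergingConsumer` and the channel names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process topic-based pub / sub broker."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subs[topic]):
            callback(message)


class MergingConsumer:
    """Consuming service that subscribes to several channels at once (fan-in)."""

    def __init__(self, broker, channels):
        self.merged = []
        for channel in channels:
            # The default argument binds the current channel to this callback.
            broker.subscribe(channel, lambda msg, ch=channel: self.merged.append((ch, msg)))


broker = Broker()
consumer = MergingConsumer(broker, ["ipen/pen-1", "ipen/pen-2"])

broker.publish("ipen/pen-1", "a")
broker.publish("ipen/pen-2", "b")
broker.publish("ipen/pen-1", "c")
print(consumer.merged)  # [('ipen/pen-1', 'a'), ('ipen/pen-2', 'b'), ('ipen/pen-1', 'c')]
```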
Following this paradigm, the different deployment schemes described in section 3.1.2 are supported. Pipelines can be constructed as required by the current setting, a concept similar to the pipeline concept employed in OpenInterface [Lawson et al., 2009], a generic component model for constructing pipelines for multimodal interaction processing. The flexible hotplug mechanism introduced above supports a broad variety of deployment schemes, as well as runtime redeployment of components, tailored to the requirements of the current setting.