Utility        SLOC†   Files
taptaudio      3794    taptaudio-pimpl, sample, streaming, taptaudio.h, backend-portaudio, backend-jack, backend-generic, backend-disabled, resample
strokereader   763     strokereader, mimiolinux, mimiodefines, schedlock, packet.h
rxstring       597     rxstring, myregex
threadman      534     threadman, semwaiter
sdl-server     509     sdl-server
mimiowarp      469     mimiocal
top-dir        288     scripts: cross-compile, bootstrap, branching, dependencies
libminxml      261     minxml.cpp, minxml.h
handle         130     handle.h
TOTAL          7345
Table 5.2: Cruiser Utility Libraries
†SLOC: Source Lines of Code
In future this could be replaced with a more complex representation, and new Gestures created that can process the additional fields introduced. The bounding box implementation provides a simple common ground for application writers that might not require a more complex representation of input.
5.3 Utility Libraries
The utility libraries summarised in Table 5.2 were written specifically for Cruiser, but have generic applicability and have been decoupled from the Cruiser core. Cruiser is designed to work with these facilities, and their functionality is available to assist the development of plugins.
5.3.1 Audio Subsystem
For providing feedback and recording stories, Cruiser and the PhoTable application needed a flexible, cross-platform audio library. One key requirement not met by any existing library was the ability to manage multiple simultaneous recordings and playing samples without the need for multiple audio cards. Libraries offer varying levels of support for mixing audio output, but none provided the versatility or level of abstraction desired for Cruiser.
For recording, it may sound strange to allow multiple simultaneous recordings, given that in our applications there is still a single microphone. However, it is useful to allow a background recording that can later be split into stories, as well as provide the facility to record explicit stories that can be replayed for immediate feedback. For playback, we also wanted to determine a volume level for each sample playing, rather than a combined level, so that feedback could be indicated over the photograph that the audio is attached to. Thus, multiple playing samples are mixed into the output, but we track the volume level that each sample contributes.
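The per-sample level tracking described above can be sketched as a mixer that keeps each sample's contribution separate until the final sum. This is an illustration under stated assumptions, not Cruiser's code: `Voice` and `mix` are hypothetical names, audio is taken to be 16-bit mono PCM, and the "level" reported is simply the peak absolute value each voice contributed in the last block.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical voice: a play cursor into a shared, immutable PCM buffer.
struct Voice {
    std::shared_ptr<const std::vector<int16_t>> samples;  // shared by every voice playing it (must be set)
    std::size_t position = 0;                             // next frame to play
    float gain = 1.0f;                                    // per-voice volume
    int lastPeak = 0;                                     // peak this voice contributed in the last block
    bool finished() const { return position >= samples->size(); }
};

// Mix `frames` frames from every voice into `out`, recording each voice's own peak.
void mix(std::vector<Voice>& voices, int16_t* out, std::size_t frames) {
    std::vector<int32_t> acc(frames, 0);  // wide accumulator so the sum cannot overflow
    for (Voice& v : voices) {
        int peak = 0;
        for (std::size_t i = 0; i < frames && !v.finished(); ++i, ++v.position) {
            int s = static_cast<int>((*v.samples)[v.position] * v.gain);
            acc[i] += s;
            peak = std::max(peak, std::abs(s));
        }
        v.lastPeak = peak;  // level attributable to this voice alone
    }
    for (std::size_t i = 0; i < frames; ++i)  // clip the combined signal to 16 bits
        out[i] = static_cast<int16_t>(std::clamp<int32_t>(acc[i], INT16_MIN, INT16_MAX));
}
```

Because each `Voice` holds only a cursor and a shared pointer to immutable data, one short in-memory sample can be played by several voices at once, and `lastPeak` gives the per-photograph level that a combined output meter could not.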
Audio libraries also tend to separate themselves from the loading, decoding, encoding and saving of files to and from disk: a library handles either files or audible output, with no link between the two. For example, portaudio²² provides low-level access to the raw Pulse Code Modulation (PCM) data in the operating system’s sound buffer, but no facilities for loading and saving files. Libsndfile²³ provides facilities for decoding and encoding a wide range of audio formats, but no facilities for actual playback or recording. Cruiser’s audio subsystem combines these two libraries and provides an elegant C++ API that protects application writers from the low-level details of audio decoding, encoding and playback.
Streaming
Another facility not provided by typical audio libraries is support for audio streaming. When a sample to be played back is long, or when a continuous recording is being made, it is not feasible to store all of the sample’s sound data in memory. The raw PCM data that is eventually loaded into the operating system’s audio buffer is large: CD-quality audio (44.1 kHz, 16-bit stereo) needs 44 100 × 2 channels × 2 bytes = 176.4 kB for each second of audio, or over 10 MB per minute.
Streaming involves using a background thread to fill a fixed-size buffer in system memory with audio data. Meanwhile, the interrupt-level function that fills the operating system’s audio buffer just copies data from the stream buffer into the audio buffer. This arrangement is necessary because it is not possible to read from or write to disk (nor to allocate more memory) in an interrupt-level function.
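The hand-off between the disk thread and the interrupt-level function can be sketched as a single-producer, single-consumer ring buffer. This is an assumption-laden illustration rather than Cruiser's streaming code: `StreamBuffer` is a hypothetical name, samples are 16-bit PCM, and the consumer side is lock-free only where `std::atomic<std::size_t>` is lock-free (as on common platforms). The consumer never blocks, allocates or touches the disk, which is the property the interrupt-level function needs.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SPSC ring buffer: the disk thread calls write(), the audio
// callback calls read(). All memory is allocated once, in the constructor.
class StreamBuffer {
public:
    explicit StreamBuffer(std::size_t capacity) : data_(capacity + 1) {}  // one slot always left empty

    // Producer (background thread): copy up to n samples in; returns how many fitted.
    std::size_t write(const int16_t* src, std::size_t n) {
        std::size_t w = write_.load(std::memory_order_relaxed);
        std::size_t r = read_.load(std::memory_order_acquire);
        std::size_t copied = 0;
        while (copied < n && next(w) != r) {
            data_[w] = src[copied++];
            w = next(w);
        }
        write_.store(w, std::memory_order_release);  // publish the new samples
        return copied;
    }

    // Consumer (interrupt-level callback): fill dst, padding with silence on underrun.
    std::size_t read(int16_t* dst, std::size_t n) {
        std::size_t r = read_.load(std::memory_order_relaxed);
        std::size_t w = write_.load(std::memory_order_acquire);
        std::size_t copied = 0;
        while (copied < n && r != w) {
            dst[copied++] = data_[r];
            r = next(r);
        }
        read_.store(r, std::memory_order_release);  // hand the slots back to the producer
        for (std::size_t i = copied; i < n; ++i) dst[i] = 0;  // underrun: output silence
        return copied;
    }

private:
    std::size_t next(std::size_t i) const { return (i + 1) % data_.size(); }
    std::vector<int16_t> data_;
    std::atomic<std::size_t> read_{0}, write_{0};
};
```

For recording, the roles simply reverse: the callback writes captured samples and the background thread reads them out to disk.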
Cruiser’s audio subsystem hides this complexity from the programmer. When a sample is loaded from disk (e.g. from a WAV file), or a recording is started, one simply sets the stream flag, and subsequent use of that sample is streamed to or from disk.
Streaming is optional as it can create additional lag. For providing brief user feedback, such as beeps and clicks, keeping the whole sample in memory gives minimal lag and also allows the same sample to be mixed multiple times into the output stream without maintaining multiple open file handles for the same audio file.
Language-Independent
The audio API has been designed not only to support multiple platforms flexibly, but also to allow bindings to be created for programming languages other than C++. Swig²⁴, the Simplified Wrapper and Interface Generator, can use the C++ header file directly to generate bindings for all its supported languages, including Python, Java, Ruby and others. This allowed the intelligence behind the digital photo album creation in the PhoTable application to be implemented in Python, using the splitting and audio filtering algorithms implemented as part of Cruiser’s audio subsystem.
Resampling and encoding
Through libsndfile, we gain support for reading and writing an extensive range of audio formats. This includes the Adaptive Differential Pulse Code Modulation (ADPCM) formats used by most digital still cameras that support adding an audio recording at the time of capture. Libsndfile has an elegant C API for converting to and from the PCM format required for audio buffers. Cruiser adds a C++ API to encapsulate the encoding and decoding processes, and to coordinate the mixing and recording of audio to and from the sound card. It also checks the sample rate of each loaded file and automatically resamples it if required, so that it matches the format required by the sound card. This is important if we are to mix recordings from multiple sources, such as different cameras, which we cannot assume record audio at a particular sample rate.
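The resampling step can be illustrated with linear interpolation, the simplest technique. This is an assumption for illustration only: the section does not say which algorithm Cruiser's resample module uses, `resampleLinear` is a hypothetical name, and a production resampler would also low-pass filter before downsampling to avoid aliasing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample mono 16-bit PCM from srcRate to dstRate by linear interpolation.
// Sketch only: no anti-aliasing filter is applied when downsampling.
std::vector<int16_t> resampleLinear(const std::vector<int16_t>& in,
                                    unsigned srcRate, unsigned dstRate) {
    std::vector<int16_t> out;
    if (in.empty() || srcRate == 0 || dstRate == 0) return out;
    const std::size_t outLen = static_cast<std::size_t>(
        static_cast<unsigned long long>(in.size()) * dstRate / srcRate);
    out.reserve(outLen);
    for (std::size_t j = 0; j < outLen; ++j) {
        double pos = static_cast<double>(j) * srcRate / dstRate;  // position in input frames
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - static_cast<double>(i);
        int16_t a = in[i];
        int16_t b = (i + 1 < in.size()) ? in[i + 1] : in[i];  // hold the last sample at the end
        out.push_back(static_cast<int16_t>(a + frac * (b - a)));
    }
    return out;
}
```

For example, a 22.05 kHz camera recording mixed into a 44.1 kHz output would be passed through with `srcRate = 22050` and `dstRate = 44100`, roughly doubling its length in samples.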
5.3.2 Thread Management
Cruiser integrates the use of concurrent programming techniques to leverage the benefits of modern multi-core systems, and to simplify the task of keeping the main thread responsive; this thread is responsible for accepting user input and drawing the display. However, concurrency can add some confusion. There are many cross-platform libraries for threading (Cruiser uses the API provided by SDL), but most libraries simply provide a wrapper around the threading API provided by the operating platform.
²² portaudio: http://www.portaudio.com, verified 2008-03-19.
²³ libsndfile: http://www.mega-nerd.com/libsndfile/, verified 2008-03-19.
5.3. Utility Libraries CHAPTER 5. DESIGN
Starting and cleaning up threads raises particular problems. It is often desirable to simply make a function call as normal, but have it execute in a different thread. Threading libraries generally require the function used to start a thread to match a particular prototype; Cruiser makes this more flexible by using C++ templates. A harder problem is that the parent thread must also clean up a thread’s execution stack when it completes; otherwise it becomes a zombie thread and leaks memory.
Cruiser provides a thread manager that checks whether threads it has started are ready for cleanup, and cleans them up automatically. On a per-request basis, it can also operate in a mode that serialises calls to the same function. So when the main thread makes 20 requests to run the load_image() function (each with a path argument) in a single iteration, rather than trying to execute all 20 requests in parallel, Cruiser’s thread manager executes them in series. Thus each function type started through the thread manager is treated as a background task, with requests to perform the task being queued if another request is already being processed.
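A minimal sketch of the serialising mode follows, under assumptions: `SerialTask` is a hypothetical name, and it keeps one persistent worker thread per task type (joined in its destructor) instead of starting and reaping a thread per request as Cruiser's manager is described as doing. Templates let any function be queued together with its arguments, and joining the worker is what avoids the zombie-thread leak described above.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical serialised background task: calls are queued and run one at a
// time, in order, on a single worker thread.
template <typename... Args>
class SerialTask {
public:
    explicit SerialTask(std::function<void(Args...)> fn)
        : fn_(std::move(fn)), worker_([this] { run(); }) {}
    SerialTask(const SerialTask&) = delete;
    SerialTask& operator=(const SerialTask&) = delete;

    ~SerialTask() {  // finish queued requests, then join so the thread is cleaned up
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called like the function itself; queues the request and returns immediately.
    void operator()(Args... args) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.emplace_back(std::forward<Args>(args)...);  // arguments stored by value
        }
        cv_.notify_one();
    }

private:
    using Request = std::tuple<std::decay_t<Args>...>;

    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping, and nothing left to do
            Request request = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();  // run the request without holding the lock
            std::apply(fn_, std::move(request));
        }
    }

    std::function<void(Args...)> fn_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Request> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last, so it starts after the members it uses exist
};
```

Assuming load_image takes a path string (its exact signature is not given here), `SerialTask<std::string> loader(load_image);` followed by twenty `loader(path)` calls would queue twenty loads that run one after another, off the main thread.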
Combined with the event framework (§5.2.7), which can easily enqueue operations to be executed in the main thread, this lets Cruiser choose the thread on which each task runs. Typically this needs no additional work from the programmer: after writing a function or member function, the programmer passes the name of the function and its arguments (or the object on which to execute the member function) to the thread manager, as if it were a regular function call.
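The hand-off back to the main thread can be sketched as a mutex-protected queue of closures. `MainThreadQueue` is a hypothetical stand-in for the event framework of §5.2.7, whose real interface is not shown in this section: a background task finishes its work and posts a closure, and the main loop runs all posted closures once per iteration, so display state is only touched from one thread.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical main-thread job queue: any thread may post; only the main loop drains.
class MainThreadQueue {
public:
    void post(std::function<void()> job) {  // safe to call from any thread
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Called once per main-loop iteration; returns the number of jobs run.
    std::size_t drain() {
        std::deque<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(jobs_);  // take everything posted so far, then release the lock
        }
        for (auto& job : ready) job();  // jobs may post more work for the next iteration
        return ready.size();
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> jobs_;
};
```

Swapping the whole queue out before running it keeps the lock held only briefly, so background threads are never blocked while the main thread executes a slow job.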
5.3.3 Input Device Framework and Calibration (strokereader)
To satisfy the goal of creating a tabletop interface, Cruiser must have support for novel hardware. In addition, because hardware independence is one of the goals of Cruiser, a flexible way to introduce inputs from new and unforeseen hardware was needed. Traditional mouse and keyboard are supported internally using SDL, and an emulation layer converts these events into the more flexible input primitive (§5.2.11.3) that is used to interpret Gestures (§5.2.10).
For other inputs, including those currently supported, the input primitive is introduced into the main thread’s event queue from a plugin. This decouples the Cruiser framework and application linkage from any dependencies introduced in supporting a particular piece of hardware. The plugin’s job is to receive hardware input in the hardware’s preferred format, and translate it into Cruiser’s input primitive. One such plugin is bundled with Cruiser; this converts input from Cruiser’s strokereader library.
Strokereader provides screen calibration and reads hardware input from TCP networking sockets, as well as from Linux input event device nodes. This provides support for Mimio and SMARTBoards on Linux (and potentially many more devices, where the device supports the Human Interface Device (HID) class protocol [USB Implementers’ Forum, 2001]). It also provides access to Mimio stroke events in Windows (beyond what Mimio’s mouse emulation can provide).
5.3.3.1 TCP-Based Stroke Reading
In the event that library linkage is not possible, or device-node access is not provided, it may be necessary to use IPC mechanisms to communicate input to Cruiser. One very flexible mechanism is to use TCP networking sockets. This allows the process generating input events to optionally run on a different computer from the Cruiser application, balancing load.
To support Mimio in Windows, a plugin was written for the Merlot application that converts Mimio stroke data, including the pen ID²⁵, to a simple structure that is transferred over TCP in Mimio’s internal coordinate system. To avoid having to pre-configure this plugin with an Internet Protocol (IP) address to send the data to, the plugin operates as a server that listens for connections. It begins transferring stroke data once a connection is made.
Other hardware could also interface with Cruiser’s strokereader over TCP simply by using the same protocol. The coordinate system used does not matter – strokereader has built-in calibration (see below).
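Because strokereader calibrates whatever coordinates arrive, such a protocol only needs to carry raw samples. A hypothetical fixed-size packet and its encoding are sketched below; the field names, sizes and big-endian byte order are assumptions for illustration, not the layout in Cruiser's packet.h. Fixing the byte order explicitly is what lets the sender and receiver run on different machines.

```cpp
#include <array>
#include <cstdint>

// Hypothetical wire format for one stroke sample: 10 bytes, integers big-endian.
struct StrokePacket {
    uint8_t penId;  // which pen (or eraser) produced the sample
    uint8_t state;  // e.g. 0 = pen up, 1 = pen down or moving
    int32_t x, y;   // device-native coordinates; calibration happens on the receiver
};

using Wire = std::array<uint8_t, 10>;

static void putBE32(uint8_t* p, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    p[0] = static_cast<uint8_t>(u >> 24);
    p[1] = static_cast<uint8_t>(u >> 16);
    p[2] = static_cast<uint8_t>(u >> 8);
    p[3] = static_cast<uint8_t>(u);
}

static int32_t getBE32(const uint8_t* p) {
    uint32_t u = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                 (uint32_t(p[2]) << 8) | uint32_t(p[3]);
    return static_cast<int32_t>(u);
}

// Serialise a sample into the bytes written to the socket.
Wire encode(const StrokePacket& s) {
    Wire w{};
    w[0] = s.penId;
    w[1] = s.state;
    putBE32(&w[2], s.x);
    putBE32(&w[6], s.y);
    return w;
}

// Rebuild a sample from bytes read off the socket.
StrokePacket decode(const Wire& w) {
    return StrokePacket{w[0], w[1], getBE32(&w[2]), getBE32(&w[6])};
}
```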
5.3.3.2 Linux Input Events for Mimio and SMARTBoard
On Linux, hardware input is considerably simpler than on Windows. When the strokereader library is initialised, it simply tries to open every file of the form /dev/input/event*²⁶ and begins reading from them. Those already reserved for input to the windowing system (e.g. the mouse) will be inaccessible; these events are received via SDL instead.
Data read from these files has a known structure, which strokereader parses, looking for patterns. If the pattern does not look like a stroke-input device (e.g. extra keyboards), the device is ignored. Otherwise, a fallback mechanism picks a handler: Mimio has some peculiarities that require a dedicated handler, while other devices, such as the SMARTBoard, are handled by a generic HID converter.
These input events will use a coordinate system that is hardware-dependent. Strokereader’s calibrator converts them to screen coordinates.
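The known structure is the kernel's `struct input_event` from `<linux/input.h>`. A minimal sketch of turning that event stream into raw device coordinates follows; `EventParser` and `RawPoint` are hypothetical names, and the sketch covers only a generic absolute-pointer device (ABS_X/ABS_Y axes, a touch or left button, and SYN_REPORT frame boundaries), not Mimio's peculiarities.

```cpp
#include <linux/input.h>

// Hypothetical raw sample in device coordinates, before calibration.
struct RawPoint {
    int x = 0, y = 0;
    bool down = false;
};

// Accumulates events until the kernel marks the end of a sample with SYN_REPORT.
class EventParser {
public:
    // Feed one event; returns true when a complete sample is available in `out`.
    bool feed(const input_event& ev, RawPoint& out) {
        switch (ev.type) {
        case EV_ABS:  // absolute axis change
            if (ev.code == ABS_X) cur_.x = ev.value;
            else if (ev.code == ABS_Y) cur_.y = ev.value;
            break;
        case EV_KEY:  // contact or button change
            if (ev.code == BTN_TOUCH || ev.code == BTN_LEFT) cur_.down = ev.value != 0;
            break;
        case EV_SYN:  // frame boundary: the accumulated state is one sample
            if (ev.code == SYN_REPORT) {
                out = cur_;
                return true;
            }
            break;
        }
        return false;
    }

private:
    RawPoint cur_;  // axes not reported in a frame keep their previous values
};
```

In use, each device node would be opened with `open(path, O_RDONLY)` and read in whole `sizeof(input_event)` records, each passed to `feed`; the resulting device coordinates would then go through the calibrator.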
5.3.3.3 Calibration
Strokereader provides screen calibration by translating a hardware device’s native coordinate system into screen pixel coordinates, using a radial basis function neural network [Park and Sandberg, 1991]. Using a neural network may sound excessive, but Radial Basis Function (RBF) neural networks have some properties that are well suited to screen calibration, and solve particular issues with calibrating projected displays.
Projected displays can be subject to keystoning, stretching and warping (e.g. due to reflection off a mirror). In the ideal situation, a projector would be chosen for a particular table, positioned carefully and fixed with scaffolding so that it projects a rectangular image with perfectly square pixels. In the real world, however, this is simply not the case. If we want to quickly bring tabletop interfaces into people’s homes (e.g. by using a regular white table and a mirror attached to a home theatre projector), we cannot rely on consumers employing theatre professionals to set up their interactive coffee table. Tiled projected displays can also introduce a discontinuity where they join.
Another problem is the relation of the input device to the display. Ideally, they would be perfectly orthogonal and use the same origin. But this might not be possible due to seating arrangements, table shape, or a desire to reposition the device off the table surface to avoid interference. Such repositioning of the input device introduces further problems of rotation and translation, or possibly arbitrary warping, that need to be solved.
The goal of screen calibration is to convert points from one coordinate system to another:
$$f(x_d, y_d) = (x_s, y_s), \qquad f : \mathbb{R}^2 \to \mathbb{R}^2$$
²⁵ This is unavailable using mouse emulation.
The task, then, is to determine f. RBF neural networks provide universal approximation [Park and Sandberg, 1991], and so can approximate this function from training examples. Three examples are the minimum for linear interpolation of points in two dimensions; more examples allow the function to be approximated more accurately. Any number of training examples can be used; by default, Cruiser uses eight.
Solving the network merely involves solving two sets of simultaneous linear equations: one set for each of the two screen dimensions, each set containing one equation per training example. The RBF equations are of the form:
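$$
\begin{aligned}
y(\mathbf{x}_1) &= w_1 e^{-\frac{\|\mathbf{x}_1-\mathbf{x}_1\|^2}{2\sigma}} + w_2 e^{-\frac{\|\mathbf{x}_1-\mathbf{x}_2\|^2}{2\sigma}} + \cdots + w_k e^{-\frac{\|\mathbf{x}_1-\mathbf{x}_k\|^2}{2\sigma}} \\
y(\mathbf{x}_2) &= w_1 e^{-\frac{\|\mathbf{x}_2-\mathbf{x}_1\|^2}{2\sigma}} + w_2 e^{-\frac{\|\mathbf{x}_2-\mathbf{x}_2\|^2}{2\sigma}} + \cdots + w_k e^{-\frac{\|\mathbf{x}_2-\mathbf{x}_k\|^2}{2\sigma}} \\
&\;\;\vdots \\
y(\mathbf{x}_k) &= w_1 e^{-\frac{\|\mathbf{x}_k-\mathbf{x}_1\|^2}{2\sigma}} + w_2 e^{-\frac{\|\mathbf{x}_k-\mathbf{x}_2\|^2}{2\sigma}} + \cdots + w_k e^{-\frac{\|\mathbf{x}_k-\mathbf{x}_k\|^2}{2\sigma}}
\end{aligned}
$$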
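where the $\mathbf{x}_i \in \mathbb{R}^2$ ($i = 1, \dots, k$) are the training examples, in the input device’s native coordinate system, and $y(\mathbf{x}_i)$ is the corresponding target screen coordinate. One set of equations is solved for the x screen coordinate, and one set for the y screen coordinate, resulting in two weight vectors $(w_1, \dots, w_k)$.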
The lapack²⁷ Fortran library is used in the calibrator program to rapidly solve these sets of equations and determine the weight vectors. This takes a fraction of a second. Once solved, the weight vectors are saved to a file, which is later loaded by the strokereader library into a read-only neural network that cannot learn new examples. Straightforward function substitution then smoothly interpolates points between the training examples, so lapack is not needed in strokereader itself. This makes the coordinate conversion performed during regular operation very fast.²⁸
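Fitting the two weight vectors and the run-time function substitution can be sketched in a few lines. In this sketch, plain Gaussian elimination with partial pivoting stands in for the lapack solver, the names `Point`, `basis`, `solve` and `Calibration` are hypothetical, and the basis function follows the equations above, with σ left as a tuning parameter.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Gaussian basis as in the equations above: exp(-||a - b||^2 / (2 sigma)).
double basis(const Point& a, const Point& b, double sigma) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return std::exp(-(dx * dx + dy * dy) / (2.0 * sigma));
}

// Solve A w = t by Gaussian elimination with partial pivoting (stand-in for lapack).
std::vector<double> solve(std::vector<std::vector<double>> A, std::vector<double> t) {
    const std::size_t n = t.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        if (std::fabs(A[pivot][col]) < 1e-12) throw std::runtime_error("singular system");
        std::swap(A[col], A[pivot]);
        std::swap(t[col], t[pivot]);
        for (std::size_t r = col + 1; r < n; ++r) {  // eliminate below the pivot
            double f = A[r][col] / A[col][col];
            for (std::size_t c = col; c < n; ++c) A[r][c] -= f * A[col][c];
            t[r] -= f * t[col];
        }
    }
    std::vector<double> w(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = t[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= A[i][c] * w[c];
        w[i] = s / A[i][i];
    }
    return w;
}

// Two weight vectors (screen x and screen y) sharing one basis matrix.
struct Calibration {
    std::vector<Point> centres;  // training examples, device coordinates
    std::vector<double> wx, wy;
    double sigma;

    // One-off fit, as done by the calibrator program.
    Calibration(std::vector<Point> device, const std::vector<Point>& screen, double s)
        : centres(std::move(device)), sigma(s) {
        const std::size_t k = centres.size();
        std::vector<std::vector<double>> phi(k, std::vector<double>(k));
        std::vector<double> tx(k), ty(k);
        for (std::size_t i = 0; i < k; ++i) {
            for (std::size_t j = 0; j < k; ++j) phi[i][j] = basis(centres[i], centres[j], sigma);
            tx[i] = screen[i].x;
            ty[i] = screen[i].y;
        }
        wx = solve(phi, tx);
        wy = solve(phi, ty);
    }

    // Function substitution: the cheap per-sample mapping used at run time.
    Point map(const Point& d) const {
        Point s{0.0, 0.0};
        for (std::size_t i = 0; i < centres.size(); ++i) {
            double b = basis(d, centres[i], sigma);
            s.x += wx[i] * b;
            s.y += wy[i] * b;
        }
        return s;
    }
};
```

Only `map`, a weighted sum of k exponentials per coordinate, runs for each input sample, which is consistent with the conversion rates reported in the footnote.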
5.3.4 Calibrator Program
The calibrator program is an application separate from Cruiser, used to calibrate the strokereader library. It starts by showing a test pattern that fills the screen, so that a projector can synchronise with the video signal accurately. It also shows the first calibration target. Each input received is paired with the current target, and the targets are worked through in turn. By default, 8 targets are shown in succession around the screen, but any number of points may be calibrated. A divide and conquer algorithm determines where each point should be placed on the screen.
During calibration, a reverse-substitution of points, along with a linear mapping of the coordinate system to screen coordinates, gives feedback for the calibration. A square lattice of points is fed through the neural network to show the calibration mesh on screen. If there is severe distortion in the mesh, it may indicate that extraneous input was received (that may have been calibrated with the incorrect target), or perhaps that the input device
is positioned so that some of the display is out of its operational range.
²⁷ lapack: http://www.netlib.org/lapack/, verified 2008-03-20.
²⁸ In one test using 8 training examples, without optimisations, 500 000 coordinates could be converted in less than 1 second; when the test program was compiled with the -O3 -ffast-math optimisations, this increased to 1.5 million. Since a typical maximum real rate is 60 coordinates per user per second, the speed of the neural network is clearly not an issue. See §6.4.2 for machine specifications.
Utility        SLOC†   Files
libfolder      1276    filedir, folder, image-search
libdb          834     metadb
libvideo       589     video, tcprobe.h
plug           390     plugin-init, plugin.h, debug-init, servers-init
libbrowser     307     browser, sft-layout, browser-attachment
libframe       271     framepic, anotoimage
libslider      236     sliderpic
toplevel       62      main.cpp
TOTAL          3965
Table 5.3: Cruiser Plugin Libraries
†SLOC: Source Lines of Code
5.3.5 Other Utilities
A range of other utilities provide functionality that assists the development of plugins.
5.3.5.1 Cross-Platform TCP Servers
This is a threaded TCP server, similar to that provided by the GNU Common C++ and POCO²⁹ libraries, but built using SDL_net for its cross-platform implementation of TCP sockets. It makes it simple to create a TCP server that listens on a specified port. When a connection is made, a new thread is automatically started, which calls the handler function specified when the server is defined.
5.3.5.2 Reference-Counting Handle
The Handle class provides a reference-counted handle to a pointer, similar to Boost’s shared_ptr. It allows an object allocated with new to be automatically deleted when the last reference to it disappears (goes out of scope). It additionally provides an automatic
dynamic_cast that throws an exception on failure. C++ name demangling is used to
improve the format of the exception error message.
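A handle with a checked cast and a demangled error message can be sketched as follows. For brevity this sketch delegates the reference counting to `std::shared_ptr` rather than implementing its own count; `Handle`, `cast` and `demangle` are hypothetical names, and `abi::__cxa_demangle` is specific to the GCC/Clang C++ ABI (the sketch falls back to the raw name if demangling fails).

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <utility>

// Human-readable type name (GCC/Clang ABI); falls back to the mangled name.
inline std::string demangle(const char* mangled) {
    int status = 0;
    char* name = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && name) ? name : mangled;
    std::free(name);  // __cxa_demangle allocates with malloc
    return result;
}

// Hypothetical reference-counted handle; the object is deleted with the last handle.
template <typename T>
class Handle {
public:
    Handle() = default;
    explicit Handle(T* raw) : ptr_(raw) {}  // takes ownership of an object created with new
    explicit Handle(std::shared_ptr<T> p) : ptr_(std::move(p)) {}

    T* operator->() const { return ptr_.get(); }
    T& operator*() const { return *ptr_; }
    long useCount() const { return ptr_.use_count(); }

    // Checked dynamic_cast: shares ownership on success, throws on failure.
    template <typename U>
    Handle<U> cast() const {
        if (auto p = std::dynamic_pointer_cast<U>(ptr_)) return Handle<U>(std::move(p));
        std::string from = ptr_ ? demangle(typeid(*ptr_).name()) : std::string("null");
        throw std::runtime_error("cannot cast " + from + " to " + demangle(typeid(U).name()));
    }

private:
    std::shared_ptr<T> ptr_;
};
```

Demangling turns an error that would otherwise name a mangled symbol such as `6Circle` into one that names the class the programmer wrote.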
5.3.5.3 “Minimal” XML Library
A non-validating (but syntax-checking) XML parser is provided. It has a simple 250-line implementation and a clean interface that effectively tokenises the XML file into tags (with attributes) and character data. It is provided for plugins that store their configuration in an XML file.
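The tokenising approach can be sketched as below. This is not minxml's actual interface: `Token` and `tokenise` are hypothetical, the sketch skips comments and `<?...?>` declarations, does not expand entities, and drops whitespace-only text between tags. It checks syntax (names, quoting, tag nesting) without validating against any schema.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token: a start tag (with attributes), an end tag, or character data.
struct Token {
    enum Kind { Open, Close, Text } kind;
    std::string value;                         // tag name, or the character data
    std::map<std::string, std::string> attrs;  // attributes of an Open tag
    bool selfClosing;                          // true for <tag ... />
};

// Non-validating tokeniser that still checks syntax, including tag nesting.
std::vector<Token> tokenise(const std::string& s) {
    std::vector<Token> out;
    std::vector<std::string> open;  // stack of currently unclosed tag names
    std::size_t i = 0;
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    auto skipSpace = [&] { while (i < s.size() && isSpace(s[i])) ++i; };
    auto expect = [&](char c) {
        if (i >= s.size() || s[i] != c) throw std::runtime_error(std::string("expected '") + c + "'");
        ++i;
    };
    auto readName = [&] {
        std::size_t start = i;
        while (i < s.size() && (std::isalnum(static_cast<unsigned char>(s[i])) ||
                                s[i] == '_' || s[i] == '-' || s[i] == ':' || s[i] == '.'))
            ++i;
        if (i == start) throw std::runtime_error("expected a name");
        return s.substr(start, i - start);
    };
    while (i < s.size()) {
        if (s[i] != '<') {  // character data up to the next tag
            std::size_t end = s.find('<', i);
            if (end == std::string::npos) end = s.size();
            std::string text = s.substr(i, end - i);
            bool blank = true;
            for (char c : text) blank = blank && isSpace(c);
            if (!blank) out.push_back({Token::Text, text, {}, false});  // drop indentation
            i = end;
        } else if (s.compare(i, 4, "<!--") == 0) {  // comment: skipped
            std::size_t end = s.find("-->", i + 4);
            if (end == std::string::npos) throw std::runtime_error("unterminated comment");
            i = end + 3;
        } else if (s.compare(i, 2, "<?") == 0) {  // declaration or processing instruction: skipped
            std::size_t end = s.find("?>", i + 2);
            if (end == std::string::npos) throw std::runtime_error("unterminated <?");
            i = end + 2;
        } else if (s.compare(i, 2, "</") == 0) {  // end tag must close the innermost open tag
            i += 2;
            std::string name = readName();
            skipSpace();
            expect('>');
            if (open.empty() || open.back() != name) throw std::runtime_error("mismatched </" + name + ">");
            open.pop_back();
            out.push_back({Token::Close, name, {}, false});
        } else {  // start tag, possibly self-closing
            ++i;
            Token t{Token::Open, readName(), {}, false};
            for (;;) {
                skipSpace();
                if (i < s.size() && s[i] == '/') { ++i; expect('>'); t.selfClosing = true; break; }
                if (i < s.size() && s[i] == '>') { ++i; break; }
                std::string key = readName();  // also throws on end of input
                skipSpace();
                expect('=');
                skipSpace();
                if (i >= s.size() || (s[i] != '"' && s[i] != '\''))
                    throw std::runtime_error("attribute value must be quoted");
                char quote = s[i++];
                std::size_t end = s.find(quote, i);
                if (end == std::string::npos) throw std::runtime_error("unterminated attribute value");
                t.attrs[key] = s.substr(i, end - i);
                i = end + 1;
            }
            if (!t.selfClosing) open.push_back(t.value);
            out.push_back(t);
        }
    }
    if (!open.empty()) throw std::runtime_error("unclosed <" + open.back() + ">");
    return out;
}
```

A plugin reading its configuration would walk the returned tokens, picking out the tags and attributes it recognises.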