Extending Metadata Throughout the Lifecycle

Lifecycle

By categorising the functionality of a distributed multimedia system into transport, media, and metadata, layers, and mapping the extent of these layers through the continuous media lifecycle (figure 3.1), it becomes apparent that while transport and media layer functionality are present at all stages of the lifecycle, metadata use and functionality is concentrated in the generation and presentation stages.

While techniques exist for annotating structure for a piece of temporal media during generation, and utilising and displaying this metadata at presentation, there is no means by which to process and manipulate associated metadata during transmission, despite transmission being the defining feature of streamed multimedia.

This is not least because metadata tends to be applied when considering temporal media as a finite, closed, encapsulated, element within a document, rather than considering structure within the stream itself - as the streaming takes place. When referencing occurs within time-based media (as defined by the

Amsterdam Hypermedia Model, and implemented in SMIL), it takes place in the context of a presentation, where the streaming media forms an embedded

element of a multimedia document, albeit a temporal one.

When annotating continuous media, time should not be limited to serving as an index into what is otherwise an isolated and self-contained media object. Just as the media changes and evolves over time, so does the metadata associated with it. The same principles of structured data, hypertext, and knowledge, apply continuously to the content represented by a media stream as it progresses; the hyperstructure itself has a temporal dimension, and must be represented

throughout the lifecycle in parallel with the media.

3.2.1 Timeliness of the Transmission Stage

To see where and how metadata can be used in support of temporal media, we first categorise the multimedia data on two axes: when it is transmitted, and how it is transmitted. For each of these we consider the implications for the

associated metadata.

Stored multimedia data, such as audio or video recordings, are persistent entities which can be described by their associated metadata in exactly the same way as any document, but with a temporal element. In a media-on-demand context, the metadata might be used to assist in finding, delivering and navigating the

multimedia material; for example, the creation of movies by assembling video clips (‘sharable video’, see [112]) requires and creates metadata. The quantity of meta-information is likely to be small in comparison to the size of the original material. There may be little justification for streaming the metadata, which can be processed in advance of streaming the multimedia information, unless the metadata must be streamed due to sheer volume of data.

With live media, the metadata describing the structure of the content might not be available in advance, instead becoming available during the generation of the multimedia stream (or streams). For example, this metadata could include

information about camera positions, or decisions the producer is making

on-the-fly; live interactions might generate links [116]. It could also result from real-time processing of the stream, such as some form of classification,

segmentation or annotation.

In some cases although the multimedia content is stored, it will be viewed as part of an interactive (or collaborative) user experience. Any metadata generated through live interactions with the users as it is viewed (rather than recorded or stored) will be live, as if the media were too.

It is sometimes acceptable to introduce a delay in live media, which can give time for a pipeline of intermediate processes. An analogous situation can arise if the multimedia data takes a long time to present: the first viewer of a

presentation lasting several hours may provide useful annotations for a second viewer who accesses the presentation before the first has completed authoring - if metadata were preloaded at the start of the presentation the second user would not benefit from the information provided by the first (although the media stream is not live, the metadata is). In both cases preparation of the metadata cannot be guaranteed in advance, and must be streamed at the same time as the multimedia content. In some other situations the streaming of pre-existing metadata could be necessitated by the absence of any other form of metadata transport; for example, a receive-only device joining a live radio or TV broadcast at an arbitrary point in time.

Live media that connects two or more parties in real-time, such as video-conferencing, is the most demanding scenario. Session and group membership metadata may be available in advance, but content metadata is created on-the-fly and there is little opportunity for any pre-processing, as there are tight time constraints on this style of synchronous interaction. For example, the anchor generation system in OvalTine [134] was designed with

video-conferencing in mind. Collaborative virtual environments also impose real-time aspects, together with the prospect of a wealth of metadata associated with the objects involved as well as the people.

In document Continuous metadata flows for distributed multimedia (Page 80-83)