6 USING DISTRIBUTED RECORDING TO IMPROVE QUALITY
6.3 A DISTRIBUTED RECORDING ARCHITECTURE
6.3.5 System operation
The task of recording a multi-way multicast session in a distributed fashion is divided into a number of specific operations. Some of these operations are performed continuously and others during a specific phase of the recording. In this section, I describe in detail the various stages of the system’s operation. For some stages, potential solutions are briefly suggested, drawing on existing work where possible. This is feasible because many of the operations have already been considered elsewhere; although developed within a different application context, those ideas may also suit the architecture proposed here.
• Existence: All recorders need to be aware of the existence and the location of other recorders. One way of achieving this requires the recorders to advertise their presence by periodically sending a message to a well-known multicast address. This message should contain information about the location of the recorder and its availability status (e.g. load, sessions currently being recorded and replayed, disc space available, etc.). This allows each recorder to maintain a list of potential ‘co-operators’ which could serve as recording caches. A text-based protocol to achieve something similar has already been designed at the University of Mannheim as part of the mVoD system [HOL97]; it can be modified and extended to suit MMCR.
• Recording setup by the client: A client contacts a recorder to start recording a session. For the purposes of recording that session, this recorder takes the role of the Recording Archive; it is responsible for identifying and co-ordinating the recording caches used to assist in the recording. The selection of suitable caches amongst those available might be achieved via user selection or via a monitoring mechanism. An example of a monitoring mechanism is proposed in [KOU98b]; in it, received loss patterns are advertised and nearby receivers seeing less loss respond. This could also be used to identify better recording caches as the conference proceeds.
• Recording control by the client: The client may issue control commands during recording. This procedure is described in Section 4.3 and is on-going throughout the recording; commands received by the controlling recorder are propagated to the recording caches.
• Recording caches record all streams: The archive and the caches identified during the recording setup phase start recording all the streams of the session and store the data they receive.
• Allocating sources to a nearby cache: Data repair will take place between every source (S1, S2, S3, S4) and its nearest cache. There is an allocation stage where a source identifies its nearest cache and ‘binds’ to it. Allocation might be achieved by a Time-To-Live (TTL) based expanding ring search algorithm; the recording agent at the source sends allocation request messages with increasing TTL until it receives a response from a cache. If the cache serving a source changes during the conference, then the new cache notifies the affected sources of the change. Senders may also choose to bind to a different cache during the conference if the performance metrics (e.g. loss) indicate it to be necessary.
• First repair from recording agents: When a cache detects lost data from the source, it sends a retransmission request which is fulfilled by the source’s recording agent. In this way, the complete data set for each source is recorded by at least one of the caches. The recording agent only has to hold a limited amount of data (e.g. the last 5 seconds), as correct reception will be periodically acknowledged. A mechanism for the request and repair of data between recording agent and recording cache is presented in the next section. If available, exact loss and delay information is periodically transmitted by the recording agent to the recording cache.
• Second repair between recording caches: The second stage of repair involves the cooperation of the caches to construct the final version(s) of the recording. It is possible for this phase to run in parallel with the session. However, it is better to avoid generating additional data while the session takes place; this repair operation can instead be performed after the session has ended. With regard to the repair method, there are two options; which one is used depends on user requirements and the current network conditions.
Over a low-loss network, it should be possible to perform repair at all the caches, resulting in multiple loss-free copies of the recording; each copy also represents a different viewpoint. This could be achieved by using a reliable multicast mechanism; for example, using a slightly modified version of SRM, the caches can take turns in multicasting request messages which the other caches fulfil until all the caches have complete data sets. However, it is pointless to attempt multiple-location repair over a lossy network; in this case repair should occur at a single location, most likely the Recording Archive (RA). The reconstructing cache sends the details of the packets it is missing to the other caches. This can be achieved via multicast, or via unicast taking each cache in turn. The caches may then send the requested packets via UDP unicast or use TCP for reliable delivery.
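As an illustration of the allocation stage described above, the following minimal Python sketch simulates a TTL-based expanding ring search. It is not part of the proposed system: the `probe` callback, the TTL schedule and the simulated hop distances in `CACHE_HOPS` are all assumptions made for the example, and a real recording agent would send multicast allocation requests and wait for a reply instead.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def expanding_ring_search(
    probe: Callable[[int], Optional[str]],
    ttl_steps: Sequence[int] = (1, 2, 4, 8, 16, 32, 64, 127),
) -> Optional[Tuple[str, int]]:
    """Send allocation requests with increasing TTL until a cache answers.

    `probe(ttl)` is a hypothetical callback: it sends one request with the
    given TTL and returns the responding cache's name, or None on timeout.
    Returns (cache name, TTL that reached it), or None if no cache replied.
    """
    for ttl in ttl_steps:
        cache = probe(ttl)
        if cache is not None:
            return cache, ttl
    return None


# Simulated topology (assumption): cache name -> hop distance from the source.
CACHE_HOPS: Dict[str, int] = {"cache-A": 12, "cache-B": 5, "cache-C": 30}


def simulated_probe(ttl: int) -> Optional[str]:
    """Return the closest cache reachable within `ttl` hops, if any."""
    reachable = {name: hops for name, hops in CACHE_HOPS.items() if hops <= ttl}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


result = expanding_ring_search(simulated_probe)
print(result)
```

With the simulated distances, the requests sent with TTL 1, 2 and 4 go unanswered and the source binds to `cache-B` once the TTL reaches 8. The doubling schedule is one possible choice; it trades a small number of request rounds against the precision with which the nearest cache is found.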
At the end of the system’s operation, and provided that all steps have been successfully completed, there will be one or multiple recordings of the session. If all the data from every source has been received by at least one cache, these recordings will be loss-free. It is highly likely that the resulting recordings will differ from what any participant has seen, in terms of both loss and delay.
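The single-location repair and the loss-free condition above can be made concrete with a small sketch. It assumes, purely for illustration, that the packets of one source are identified by integer sequence numbers and that the reconstructing cache already knows which packets every other cache holds; the function name `plan_single_site_repair` and the cache names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def plan_single_site_repair(
    expected: Set[int],
    archive_has: Set[int],
    caches_have: Dict[str, Set[int]],
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Decide which cache should supply each packet the archive is missing.

    Returns (requests, unrecoverable). `requests` maps a cache name to the
    sorted sequence numbers to ask it for. `unrecoverable` lists packets
    that no cache holds, so the final recording cannot be loss-free.
    """
    missing = expected - archive_has
    requests: Dict[str, List[int]] = {}
    unrecoverable: List[int] = []
    for seq in sorted(missing):
        # Take each cache in turn (sorted by name); the first holder is used.
        holder = next(
            (name for name in sorted(caches_have) if seq in caches_have[name]),
            None,
        )
        if holder is None:
            unrecoverable.append(seq)
        else:
            requests.setdefault(holder, []).append(seq)
    return requests, unrecoverable


expected = set(range(1, 11))
archive_has = {1, 2, 3, 5, 8, 9, 10}
caches_have = {"cache-A": {1, 2, 3, 4, 5, 6}, "cache-B": {4, 6, 8, 9, 10}}
requests, unrecoverable = plan_single_site_repair(expected, archive_has, caches_have)
print(requests, unrecoverable)
```

In this example the archive asks `cache-A` for packets 4 and 6, while packet 7 was received by no cache and remains lost. An empty `unrecoverable` list corresponds to the condition that every packet has been received by at least one cache.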
It was clearly not possible to investigate in depth all the steps in the system’s operation. I hence concentrated on designing a protocol for ‘localised’ data repair between sender agents and recording caches. It is assumed that the other parts of the system’s functionality are in place and that sources have been allocated to their nearest cache.
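To show why a recording agent only needs to hold a limited amount of data, the sketch below models a retransmission buffer that is bounded by a time window and by acknowledgements. It is not the protocol presented in the next section: the class `AgentBuffer`, the cumulative form of the acknowledgement and the explicit `now` timestamps are assumptions chosen to keep the example deterministic.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple


class AgentBuffer:
    """Retransmission buffer held by a source's recording agent (sketch)."""

    def __init__(self, window_seconds: float = 5.0) -> None:
        self.window_seconds = window_seconds
        # seq -> (send time, payload), kept in sending order.
        self._packets: "OrderedDict[int, Tuple[float, bytes]]" = OrderedDict()

    def store(self, seq: int, payload: bytes, now: float) -> None:
        """Keep a copy of a sent packet and drop copies older than the window."""
        self._packets[seq] = (now, payload)
        while self._packets:
            oldest_seq, (sent_at, _) = next(iter(self._packets.items()))
            if now - sent_at <= self.window_seconds:
                break
            del self._packets[oldest_seq]

    def acknowledge(self, up_to_seq: int) -> None:
        """Assumed cumulative ACK: the cache holds everything up to `up_to_seq`."""
        for seq in [s for s in self._packets if s <= up_to_seq]:
            del self._packets[seq]

    def retransmit(self, requested: Iterable[int]) -> List[Tuple[int, bytes]]:
        """Return the requested packets that are still buffered."""
        return [(s, self._packets[s][1]) for s in requested if s in self._packets]

    def buffered(self) -> List[int]:
        """Sequence numbers currently held."""
        return list(self._packets)


buf = AgentBuffer(window_seconds=5.0)
for seq in range(1, 7):
    buf.store(seq, f"pkt{seq}".encode(), now=float(seq))  # one packet per second
buf.acknowledge(2)                   # cache confirms packets 1 and 2
repair = buf.retransmit([1, 3, 5])   # packet 1 is no longer held
print(repair)
```

Here the request for packet 1 cannot be served because it has already been acknowledged, while packets 3 and 5 are still available for repair. Packets that are neither acknowledged nor requested simply age out of the window as later packets are stored.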