3.2 Network Layer QoS
4.2.4 Real-Time Audio Processing and Task Scheduling
During the implementation phase of WebAudio, problems were experienced with scheduling concurrent tasks (for example, reading from the socket and writing data to the sound device) in a way that respects their time constraints.
To illustrate the problem, a short description of the task scheduling and the strict time constraints of real-time audio is provided. According to the discussion in section 3.1.1, interactive real-time audio streams use very small packets, in other words only a few audio samples, to minimize the end-to-end delay caused by packetization. In practice, real-time audio tools often use a packet size that encompasses as little as 20 ms or 40 ms of audio. From now on this time is referred to as Packet Playout Time (PPT). The PPT is usually a multiple of the audio frame length (see also section 3.1.1). As a result, the WebAudio server needs to capture, encode and send an audio packet within every PPT period, and the client needs to receive, decode and play out the audio packet within the same interval. The problem becomes obvious when the process scheduling granularity of common operating systems is examined. Unix systems usually have maximum scheduling units of approximately 5-20 ms. This means that all other “runnable” processes (footnote 19) are scheduled, each for at most the maximum scheduling unit, before the WebAudio process is activated again. Fortunately, most processes are in the blocked state, and “runnable” processes often consume only a small part of their processing time, so that the scheduling delay of the WebAudio process is usually short enough to meet the strict time constraints of real-time audio. However, if the machine load is high or time-consuming operations (for example, disk access) are scheduled, deadlines of the real-time processing within WebAudio might be missed.
Footnote 19: All processes that are neither “blocked” while waiting for a resource nor “stopped” are referred to as “runnable”.
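To make the timing budget concrete, the following sketch computes the number of samples per packet and a rough upper bound on the wait before the process runs again. The 8 kHz sampling rate used in the usage note, the function names, and the simplified model in which every other runnable process uses its full scheduling unit are assumptions for illustration and are not taken from WebAudio.

```python
def samples_per_packet(sample_rate_hz: int, ppt_ms: int) -> int:
    """Number of audio samples covered by one Packet Playout Time (PPT)."""
    return sample_rate_hz * ppt_ms // 1000


def worst_case_wait_ms(runnable_others: int, max_sched_unit_ms: int) -> int:
    """Upper bound on the wait if every other runnable process
    consumes its full scheduling unit before we are scheduled again."""
    return runnable_others * max_sched_unit_ms


def deadline_at_risk(ppt_ms: int, runnable_others: int, max_sched_unit_ms: int) -> bool:
    """True if the worst-case wait alone already uses up the PPT budget."""
    return worst_case_wait_ms(runnable_others, max_sched_unit_ms) >= ppt_ms
```

Under these assumptions, a 20 ms packet at 8 kHz holds 160 samples, and three other runnable processes with a 10 ms scheduling unit can delay the process by up to 30 ms, which is longer than one PPT.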
An important implementation issue is that the task that needs to be processed next is always scheduled first. Rather than spending processing time on a task that is ahead of its schedule, the most urgent task should be processed first. For example, rather than reading the whole receive buffer or encoding or decoding several queued packets at once, it is more important to play back the packets whose playout time has already passed.
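The "most urgent task first" rule can be sketched with a priority queue ordered by deadline. This is a generic earliest-deadline-first illustration; the class name, the task names and the millisecond deadlines are hypothetical and do not describe WebAudio's actual data structures.

```python
import heapq
import itertools


class UrgencyQueue:
    """Hands out pending tasks in order of their deadline (earliest first)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, deadline_ms, task_name):
        heapq.heappush(self._heap, (deadline_ms, next(self._counter), task_name))

    def pop_most_urgent(self):
        """Return (deadline_ms, task_name) of the task that is due first."""
        deadline_ms, _, task_name = heapq.heappop(self._heap)
        return deadline_ms, task_name

    def __len__(self):
        return len(self._heap)
```

A frame that is due in 5 ms is then handed out before a packet that only needs decoding within 40 ms, regardless of the order in which the two were queued.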
Since the processing of the audio data within the WebAudio client and the server differs significantly, the two are considered separately here.
With regard to scheduling, the implementation of the server application is less critical than that of the client application. Since the data flow is driven by the audio capturing module, the only time-critical module, the application can simply block on the sound device until a new audio frame can be read. As soon as the “read” call returns, the frame is encoded. Once enough audio frames for a packet have been captured, the packet is sent to all receivers of the stream. Before the server process blocks again (while waiting for the next audio frame to be read), it processes pending stream control requests.
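The server loop described above can be sketched as follows. The callables `read_frame`, `encode`, `send_packet` and `handle_control` are hypothetical stand-ins for the capture, codec, network and control modules, and the convention that `read_frame` returns `None` at the end of the stream is an assumption made only so that the sketch terminates.

```python
def run_server(read_frame, encode, send_packet, handle_control, frames_per_packet):
    """Capture-driven loop: the blocking read on the sound device paces everything."""
    pending = []
    while True:
        frame = read_frame()               # blocks until a new frame is captured
        if frame is None:                  # assumed end-of-stream marker
            break
        pending.append(encode(frame))      # encode as soon as the read returns
        if len(pending) == frames_per_packet:
            send_packet(b"".join(pending))  # one packet for all receivers
            pending = []
        handle_control()                   # serve control requests before blocking again
```

Because only one module is time-critical, a single blocking call is enough to pace the loop, and no explicit scheduler is needed.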
The client application, in contrast, requires a more careful design with respect to task scheduling. The data flow is driven by the audio playback module and the RTP receiver module, two independent, time-critical modules that have to be processed concurrently. The audio playback module has to play out a new frame periodically. The time interval is determined by the frame length. Since a lack of audio samples in the playout buffer of the sound device immediately leads to disturbing crackles in the signal, the client has to make sure that the playout buffer never runs empty. The second time-critical module is the RTP receiver. As soon as a new audio packet arrives at the UDP port, the packet must be unpacked, decoded and queued, and its playout time is assigned according to the current playout delay estimate. As a result, the client's task scheduler must be designed such that it meets the time constraints of both time-critical modules. Using blocking system calls, on the one hand to wait for new data to be received and on the other hand to wait for the sound device to play the next frames, would not provide concurrency.
The “select” mechanism cannot be used within the audio playback module, since the system call would return as soon as a few samples could be written to the sound device. This, however, would be the case at virtually any time, since the sound device permanently plays audio samples from the buffer. Recent work in this area [Riz97] extended the sound device driver of USS such that the “select” call can be programmed to return only if the buffer level falls below a minimum threshold. However, having competing “select” calls for both time-critical modules is not a good solution. The module that invokes the blocking “select” call first would block the process until its resource becomes available.
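One way to avoid two competing blocking calls is to apply “select” only to the network socket and to bound its timeout, so that the check for new packets can never starve the audio module. The following is a minimal sketch of that idea using Python's standard `select` and `socket` modules; the function name `packet_waiting` is hypothetical, and the sketch is not meant to reproduce the WebAudio source.

```python
import select
import socket


def packet_waiting(sock, timeout_s=0.0):
    """Return True if sock becomes readable within timeout_s seconds.
    With timeout_s == 0.0 the call only polls and never blocks."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)
```

With a zero timeout the call returns immediately, so the caller stays free to check the playout buffer on every pass through its loop.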
Figure 4.7 illustrates how the time-critical task scheduling problem is solved within the WebAudio client.
[Figure 4.7 (diagram): the WebAudio Client with a Scheduler, a Network Module (Network Select, Read Packet), an Audio Module (Check Buffer, Write Frame) and a Sleep step. Transition labels: “if new packet were received”, “otherwise”, “if buffer time >> opt buffer time”, “if buffer time > (optimal buffer time + max scheduling unit)”.]
Figure 4.7: Time-Critical Task Scheduling within the WebAudio Client
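The decision logic of Figure 4.7 can be sketched as a pure function that picks the next step of the client loop. This is only one possible reading of the figure: the order of the checks (refilling the sound device before reading the network), the function name and the returned labels are assumptions, chosen to follow the rule that the most urgent task is processed first.

```python
def next_step(packet_available, buffer_ms, optimal_ms, max_sched_unit_ms):
    """Choose the next action of the client loop (one reading of Figure 4.7)."""
    if buffer_ms < optimal_ms:
        return "write_frame"       # audio module: refill the playout buffer first
    if packet_available:
        return "read_packet"       # network module: unpack, decode and queue
    if buffer_ms > optimal_ms + max_sched_unit_ms:
        return "sleep"             # enough audio buffered to give up the CPU
    return "network_select"        # otherwise check the network again
```

Sleeping only when the buffer holds more than the optimal time plus one maximum scheduling unit leaves a margin for the case that the process is not rescheduled immediately after waking up.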
The client implementation makes use of the sound device driver feature that reports the number of audio samples in the playout buffer (footnote 20). Based on the scheduling behavior of the system, the client dynamically computes the optimal buffering time of frames in the sound device such that the buffer hardly ever runs empty. This algorithm “guarantees” that the audio signal is not permanently disturbed by buffer under-runs caused by scheduling irregularities in the client system.
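To compare the fill level of the playout buffer with a buffering time, the value reported by the driver has to be converted into a duration. The sketch below assumes that the fill level is available in bytes and that the audio is linear PCM; the function name and the default of mono 16-bit samples are illustrative assumptions.

```python
def buffered_time_ms(buffered_bytes, sample_rate_hz, channels=1, bytes_per_sample=2):
    """Convert the fill level of the playout buffer from bytes to milliseconds."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return 1000.0 * buffered_bytes / bytes_per_second
```

For example, 640 bytes of 8 kHz mono 16-bit audio correspond to 40 ms of playout time.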
The optimal buffering time is calculated as follows: based on the past scheduling behavior and a threshold percentage T_success, the optimal buffering time is estimated such that T_success percent (usually T_success > 95%) of the past scheduling cycles were re-scheduled within a time less than or equal to the optimal buffering time. This adaptive mechanism adjusts the estimate to changes in the scheduling behavior, which are caused, for example, by an increase or decrease of the processing load. The adaptive behavior aims at optimal performance, since it keeps the buffering delay, and thus the total end-to-end delay, as small as possible. A more detailed description of this adaptive buffering time estimation is provided in section 4.2.5, where the same algorithm is used to estimate the optimal packet playout delay.
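The estimation rule above amounts to taking a percentile of the observed scheduling delays. The following sketch uses the nearest-rank method; the function name, the representation of the history as a list of delays in milliseconds, and the choice of the nearest-rank method are assumptions, since the text does not specify how the percentile is computed.

```python
import math


def optimal_buffer_time(past_delays_ms, t_success=0.95):
    """Smallest observed delay d such that at least t_success of the past
    scheduling cycles were re-scheduled within d milliseconds."""
    if not past_delays_ms:
        raise ValueError("need at least one observed scheduling delay")
    ordered = sorted(past_delays_ms)
    # Nearest-rank percentile: the sample that covers t_success of the history.
    rank = max(1, math.ceil(t_success * len(ordered)))
    return ordered[rank - 1]
```

Keeping only a bounded window of recent delays (for example in a `collections.deque` with a `maxlen`) would let the estimate follow changes in the processing load, which is the adaptive behavior described above.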
Experiments with FreeBSD and Linux on different systems, such as Intel Pentium II, Mobile Pentium 166 and Intel Pentium 90, have shown that the adaptive buffering time estimation operates well on these systems. Debug traces indicated that the algorithm properly adapts to different levels of system load. Since process scheduling behavior is highly dependent on the operating system, the algorithm might have to be adjusted when the application is ported to Microsoft Windows systems. The buffering time estimation algorithm resides in the Audio class, which has to be ported specially in any case.
Footnote 20: The Unix device control call “ioctl” allows the application to request the number of bytes that can be written and the [...]
In retrospect, the implementation of the task scheduling within WebAudio, and in particular within the client application, was not straightforward, but the problem has been solved successfully. Using an adaptive buffering time estimation to compensate for the dynamics of process scheduling has proven to be a useful enhancement.