3. System Implementation
4.5. Experience with the Sound Engineer module
4.5.1. Audio Mixing and Mastering
In our prototype setup, we get up to three audio streams from the AV servers of the lecturer, of the slide PC, and of the audience. As we have routed the audio of the lecturer into his or her presentation computer and combined it with any sounds of animations or simulations of the computer, we only use the slides audio stream for the premixed audio of the lecturer and the computer.
All audio coming from the questioner via the QM client and the QM server is transmitted via the audience audio stream. So, we only have to mix two different audio streams in the AV Mixer/Recorder. Nevertheless, all algorithms are built for processing all three audio sources, as this makes no big difference in the resulting load.
As already mentioned in Section 3.4, the algorithms for applying the noise gate for normalization, for re-sampling, and for mixing are robust and work in real-time. All algorithms except volume normalization operate on a per-sample basis, i.e., the smallest unit they operate on is only one or two samples. In contrast, normalizing operates only on sample sets of audio. The reason is that the algorithm has to find the loudest sample in the sample set to determine the factor by which the whole sample set gets amplified. If the selected part is too small, the algorithm only takes a local maximum into account, which is not representative of the entire recording. Therefore, normalizing is mostly done as the last step during mastering, taking the whole recording into account in order to use the global maximum.
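The normalization step described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the function name `peak_normalize`, the `target_peak` parameter, and the representation of samples as floats in the range -1.0 to 1.0 are assumptions made for this example.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale a whole sample set so that its loudest sample reaches target_peak.

    Samples are assumed to be floats in [-1.0, 1.0] (an assumption of this sketch).
    Returns the scaled samples and the amplification factor that was used.
    """
    # The loudest sample is the one with the largest absolute value.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure digital silence: nothing to amplify.
        return list(samples), 1.0
    gain = target_peak / peak
    return [s * gain for s in samples], gain


quiet = [0.0, 0.1, -0.25, 0.2]
normalized, gain = peak_normalize(quiet, target_peak=0.5)
print(gain)        # 2.0
print(normalized)  # [0.0, 0.2, -0.5, 0.4]
```

Because the factor depends on the maximum of the whole set, the function cannot emit any output before it has seen every sample, which is why this step does not fit a per-sample processing chain.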
As we have to bring all audio streams to the same volume level before mixing them, we have to amplify them unless they contain only silence. There are two possibilities to amplify signals in order to achieve comparable volume levels:
1. Normalizing, i.e., determine the loudest sample, calculate the factor to bring this loudest sample to a defined volume level, and amplify the whole sample set with this factor,
2. Compressing/limiting, i.e., define a characteristic curve with different amplification factors, dependent on the input volume level. Figure 34 shows such a curve as an example.
Figure 34: Screen shot of an exemplary curve of a combined noise-gate, compressor, and limiter. Input-dB on the x-axis, output-dB on the y-axis.
The horizontal axis shows the input signal strength in dB, while the vertical axis shows the resulting output signal strength in dB. The line along the first bisector, i.e., the diagonal on which output equals input, is the neutral element for a compressor/limiter; it does not change the amplification. To amplify a signal, the line must lie above the first bisector, while a line below it attenuates the signal. The example in Figure 34 shows a "noise gate" for input signal strengths of -100 dB to -80 dB, so these input levels are mapped to an output signal strength of -100 dB. In the range of -80 dB to -20 dB for the input signal strength, a linearly increasing amplification takes place, mapping these levels to the range of -80 dB to -9 dB. This part "compresses" the input signal. Input signal strengths in the range of -20 dB to 0 dB get "hard limited", i.e., strictly mapped to the output signal strength of -9 dB. The transitions between noise gate and compression and between compression and limitation are done in this example by so-called "hard knees", which change the behavior very abruptly. In contrast, a so-called "soft knee" rounds the corners, so the transition from one mode to the other is less aggressive.
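The example curve of Figure 34 can be written down as a piecewise function in dB. The sketch below is only an illustration: the breakpoints are taken from the description of the figure, while the function names, the choice that exactly -80 dB already belongs to the compression segment, and the application of the static curve to the instantaneous level of each sample are assumptions of this example.

```python
import math

# Breakpoints read from the description of Figure 34 (all values in dB).
GATE_END_DB = -80.0      # below this input level the noise gate is active
LIMIT_START_DB = -20.0   # from this input level on the limiter is active
FLOOR_DB = -100.0        # output level of gated input
CEILING_DB = -9.0        # output level of limited input


def transfer_db(input_db):
    """Map an input level in dB to an output level in dB using hard knees."""
    if input_db < GATE_END_DB:
        return FLOOR_DB                      # noise gate
    if input_db >= LIMIT_START_DB:
        return CEILING_DB                    # hard limiter
    # Linear segment from (-80 dB, -80 dB) to (-20 dB, -9 dB).
    slope = (CEILING_DB - GATE_END_DB) / (LIMIT_START_DB - GATE_END_DB)
    return GATE_END_DB + slope * (input_db - GATE_END_DB)


def apply_curve(sample, full_scale=1.0):
    """Apply the curve to one sample, treating its magnitude as the input level."""
    if sample == 0.0:
        return 0.0
    input_db = 20.0 * math.log10(abs(sample) / full_scale)
    output_db = transfer_db(input_db)
    return math.copysign(full_scale * 10.0 ** (output_db / 20.0), sample)


for level in (-90.0, -80.0, -50.0, -20.0, -6.0):
    print(level, "->", round(transfer_db(level), 2))
```

With these breakpoints, an input of -50 dB lands at about -44.5 dB, i.e., above the first bisector, while everything louder than -20 dB is held at -9 dB. A soft knee would replace the two corners with smooth transitions.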
The advantages of a compressor/limiter algorithm are that it can operate on a per-sample basis and that it can easily combine many different tasks into one processing step, e.g., "noise-gating", "compressing", and "limiting". The disadvantage is that the algorithm is much more complex than normalizing, and we could not implement and test it successfully due to the time constraints we encountered. Nevertheless, it is planned for future work.
For our prototype, we amplify the audio data streams using the normalizing algorithm. As it needs to operate on a sample set, we need to define a useful one. At first glance, it may seem useful to process the sample set of one audio frame of 40 ms at once. Unfortunately, if we normalize such small parts of audio data, every part is amplified with a different factor, leading to block artefacts at the transition from the end of one part to the beginning of the next. Due to the different amplifications, the slope of the curve changes significantly within a very short time, leading to a clicking noise. As such noises occur repeatedly every 40 ms, the whole recording is spoiled by crackling, making the entire processing unusable. Figure 35 shows two 440 Hz sine curves, one with different amplifications and one without. The red mark indicates the source of the clicking noise.
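The block artefact can be reproduced numerically. The following sketch is an illustration under assumptions that are not taken from the prototype: a sample rate of 48 kHz, a 440 Hz test tone that fades out linearly over five frames, and a target peak of 1.0. It normalizes every 40 ms frame on its own and compares the largest jump at a frame boundary with the largest sample-to-sample step inside a frame.

```python
import math

SAMPLE_RATE = 48000                      # assumed; not stated in the text
FRAME_LEN = SAMPLE_RATE * 40 // 1000     # 40 ms frame = 1920 samples
N_FRAMES = 5                             # length of the test signal in frames


def fading_sine(freq_hz=440.0):
    """A 440 Hz sine whose amplitude fades linearly from 1.0 to 0.1."""
    n_total = N_FRAMES * FRAME_LEN
    return [
        (1.0 - 0.9 * n / n_total) * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_total)
    ]


def normalize_per_frame(samples, frame_len=FRAME_LEN, target_peak=1.0):
    """Normalize every frame separately; returns the output and per-frame gains."""
    out, gains = [], []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(s) for s in frame)
        gain = target_peak / peak if peak > 0.0 else 1.0
        gains.append(gain)
        out.extend(s * gain for s in frame)
    return out, gains


def largest_steps(samples, frame_len=FRAME_LEN):
    """Largest sample-to-sample step at frame boundaries and inside frames."""
    boundary = interior = 0.0
    for n in range(1, len(samples)):
        step = abs(samples[n] - samples[n - 1])
        if n % frame_len == 0:
            boundary = max(boundary, step)
        else:
            interior = max(interior, step)
    return boundary, interior


output, gains = normalize_per_frame(fading_sine())
boundary_step, interior_step = largest_steps(output)
print("gains per frame:", [round(g, 2) for g in gains])
print("largest step at a boundary:", round(boundary_step, 3))
print("largest step inside a frame:", round(interior_step, 3))
```

Each frame receives a different gain, so the waveform jumps at the boundaries by several times the largest step that occurs inside any frame. This sudden change is what is heard as a click.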
Figure 35: A 440 Hz sine curve with different amplification factors in adjacent audio frames.
The next way of defining a useful sample set is to use the entire recording at once. In this way, the global maximum can be taken into account, which normally leads to very good results. This is true for sources with continuous audio signals and without any silence in them. Due to the question-answer interaction of questioner and lecturer, we anticipate silent parts in the audio data streams, which can easily lead to long parts of amplified silence producing significant noise. Thus, this way is not optimal for our prototype. This is a consequence of the way we implemented the noise gate: it checks for silence in the whole sample set, and if only silence is found, the entire sample set is neglected. To keep it simple and fast, however, it does not check for silent parts inside the sample set. Besides this theoretical drawback, we would have to face memory allocation problems when loading three audio streams of 90 minutes into memory in order to find the global maximum level in the audio streams.
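To give a feeling for the memory demand, the following back-of-the-envelope calculation estimates the size of three uncompressed 90-minute streams. The audio format is not stated in this section, so 48 kHz, 16 bit, and two channels are purely illustrative assumptions; the helper `stream_bytes` is hypothetical as well.

```python
def stream_bytes(minutes, sample_rate_hz, bytes_per_sample, channels):
    """Size of an uncompressed PCM stream in bytes."""
    return minutes * 60 * sample_rate_hz * bytes_per_sample * channels


# Assumed format: 48 kHz, 16 bit (2 bytes), stereo. Not taken from the prototype.
per_stream = stream_bytes(90, 48000, 2, 2)
total = 3 * per_stream
print(per_stream)  # 1036800000 bytes, roughly 1 GB per stream
print(total)       # 3110400000 bytes, roughly 3.1 GB for three streams
```

Under these assumptions, holding all three streams in memory at once would require about 3.1 GB before any working copies are made.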
Thus, we need a compromise that derives the amplification factors from a significant amount of data while suppressing noise as well as possible. We found this compromise in the following approach: we save all incoming audio data directly onto the hard disk and process it afterwards, producing a single WAV file. This file is used as the final sound track for the video file. This reduces the load of the AV Mixer/Recorder significantly but still enables us to select any useful sample set size for normalizing. During our tests, we observed a duration of one second to be a useful sample set size, as it leads to similar amplification factors in adjacent sample sets and therefore only rarely to clicking noises. Another consequence is that any period of amplified silence occurring before a questioner starts to speak is shorter than one second, which is noticeable but not very disturbing.
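The offline processing with one-second sample sets might look roughly like the sketch below. It is an illustration only: the function name, the `silence_threshold` value, the float sample representation, and the decision to output digital silence for a "neglected" sample set are assumptions of this example, and reading from and writing to WAV files is left out.

```python
def normalize_in_blocks(samples, sample_rate, block_seconds=1.0,
                        target_peak=0.9, silence_threshold=0.001):
    """Normalize a recording block by block with a whole-block noise gate.

    A block whose loudest sample does not exceed silence_threshold is treated
    as silence and muted (assumed interpretation of "neglected"). Every other
    block is scaled so that its loudest sample reaches target_peak.
    Returns the processed samples and the gain applied to each block.
    """
    block_len = max(1, int(sample_rate * block_seconds))
    out, gains = [], []
    for start in range(0, len(samples), block_len):
        block = samples[start:start + block_len]
        peak = max(abs(s) for s in block)
        if peak <= silence_threshold:
            gains.append(0.0)                 # gate closed: only silence found
            out.extend(0.0 for _ in block)
        else:
            gain = target_peak / peak
            gains.append(gain)
            out.extend(s * gain for s in block)
    return out, gains


# Tiny demonstration with an artificially low sample rate of 8 Hz.
noise_only = [0.0005] * 8                       # one "second" of background noise
speech = [0.0005, 0.0005, 0.25, -0.2, 0.1, 0.0, -0.25, 0.05]
processed, gains = normalize_in_blocks(noise_only + speech, sample_rate=8,
                                       target_peak=0.5)
print(gains)  # [0.0, 2.0]
```

In the demonstration, the first block is gated completely, whereas the low-level noise at the start of the second block is amplified together with the speech. This corresponds to the short stretch of amplified silence, at most one block long, that precedes a question.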
4.5.2. Overall Performance
The overall performance of the virtual sound engineer is good, and all algorithms are fast enough to process the audio data in real-time. Due to the already mentioned time constraints and CPU requirements of the AV Mixer/Recorder, we had to implement a work-around to prove our concept, which we successfully did. So, it fits perfectly into our virtual camera team.
In addition, we pointed out the planned future work to optimize the virtual sound engineer; it will be a perfect supplement to future work on the AV Mixer/Recorder.