3. System Implementation
4.2. Experience with the cameraman module
4.2.1. Performance of the image processing algorithms
The virtual cameraman's main image processing algorithms are motion detection and skin color detection. On the one hand, they provide the measured values for the cameraman's sensor input; on the other hand, they trigger the controlling algorithms so that the system reacts appropriately to events in the images.
For motion detection, we have implemented two slightly different algorithms because the images come from different sources. One algorithm is used only for the output of the slides video server, which is normally characterized by a very static image. Only some converter noise has to be filtered out in order to avoid false alarms of detected motion. The other algorithm has to cope with real-world images which, besides camera sensor noise, may contain arbitrary motion. It is therefore more difficult to differentiate between motion in the foreground, which normally is the motion we want to detect, and motion in the background, such as trees waving in the wind, which is of no interest to us.
For the first algorithm, we use the Frame Differencing approach: for each pixel position, we determine the distance in the RGB color space between the pixel values of two consecutive images. In order to distinguish between converter noise and motion in the image, the distance must be larger than a threshold that is calibrated for the video server.
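The per-pixel test can be sketched in Python as follows. This is a minimal sketch and not the prototype's code: the Euclidean distance, the function names, the sample frames, and the threshold of 20 are assumptions made for illustration.

```python
import math


def rgb_distance(p, q):
    """Euclidean distance between two (R, G, B) pixels (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def changed_pixels(prev_frame, curr_frame, threshold):
    """Return a boolean mask that is True where the RGB distance between
    the two frames exceeds the threshold."""
    return [
        [rgb_distance(p, q) > threshold for p, q in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical 1x3 frames: slight noise, no change, and a real change.
prev_frame = [[(10, 10, 10), (200, 200, 200), (0, 0, 0)]]
curr_frame = [[(12, 9, 11), (200, 200, 200), (90, 80, 70)]]

mask = changed_pixels(prev_frame, curr_frame, threshold=20.0)
print(mask)
```

With these sample values, only the third pixel is reported as changed; the small deviation of the first pixel stays below the threshold and is treated as noise.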
As a trade-off between precision and speed, we decided to tile the image before applying any algorithm to it. Thus, we do not check the whole image for motion on a per-pixel basis but check whether the motion in each tile is larger than the threshold. The total percentage of motion in the image is then calculated by dividing the number of tiles in which motion was above the threshold by the total number of tiles in the image. This is, of course, only an approximation, but it is good enough for our purpose. The main advantage comes not from a single algorithm but from combining it with others. For example, when looking for a person moving around, we search for motion only in tiles that were already marked by the skin color detection algorithm.
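The tile-based estimate can be sketched as follows. The text does not state how the motion within a tile is measured, so the sketch assumes the mean per-pixel difference of the tile; the function name, tile size, and threshold are likewise hypothetical.

```python
def tile_motion_ratio(diff, tile_size, threshold):
    """diff is a 2D list of per-pixel difference magnitudes.

    Returns the fraction of tiles whose mean difference exceeds the
    threshold, together with the set of (tile_row, tile_col) indices
    of those tiles.
    """
    height, width = len(diff), len(diff[0])
    moving = set()
    total = 0
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            values = [
                diff[y][x]
                for y in range(top, min(top + tile_size, height))
                for x in range(left, min(left + tile_size, width))
            ]
            total += 1
            if sum(values) / len(values) > threshold:
                moving.add((top // tile_size, left // tile_size))
    return len(moving) / total, moving


# Hypothetical 4x4 difference image split into four 2x2 tiles.
diff = [
    [0, 0, 50, 60],
    [0, 0, 40, 70],
    [1, 2, 0, 0],
    [3, 2, 0, 0],
]
ratio, moving = tile_motion_ratio(diff, tile_size=2, threshold=10)
print(ratio, moving)
```

In this example only the upper-right tile exceeds the threshold, so one of four tiles counts as moving.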
The second algorithm, which detects motion in real-world images, is a bit more complex. We use the Background Subtraction approach to prevent background motion from causing false alarms. First, we establish a background model for the image based on the Running Gaussian Average algorithm, which is initialized with the first image:
\[ \mathit{BG}_0 = \mathit{IMAGE}_0 \]
\[ \mathit{BG}_i = (1 - \alpha) \cdot \mathit{BG}_{i-1} + \alpha \cdot \mathit{IMAGE}_i \]
Definition/Formula 11: Running Gaussian Average formula to update the background model.
The factor α defines how fast a new object gets integrated into the background model. It can take values in the range [0, 1]. The closer α is to one, the faster new objects are incorporated into the background model. An α value of one leads to the same behavior as the Frame Differencing approach, whereas smaller α values produce so-called "ghosts". Ghosts occur when, e.g., a slowly moving object gets incorporated into the background model before moving on. Choosing α is therefore again a trade-off. For our prototype, α = 0.5 works fine. We now subtract the background model from the current image and again determine the tiles in which motion occurred; in this way, we are able to roughly determine the percentage of motion in the current image.
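The background update and subtraction can be sketched as follows. The sketch makes several assumptions: it uses grayscale intensities instead of RGB values, it compares each image against the model from before the update (which is what makes α = 1 equivalent to frame differencing), and all names and sample frames are hypothetical.

```python
def update_background(background, image, alpha):
    """Per pixel: BG_i = (1 - alpha) * BG_{i-1} + alpha * IMAGE_i."""
    return [
        [(1.0 - alpha) * b + alpha * p for b, p in zip(bg_row, img_row)]
        for bg_row, img_row in zip(background, image)
    ]


def foreground_difference(background, image):
    """Absolute per-pixel difference between image and background model."""
    return [
        [abs(p - b) for b, p in zip(bg_row, img_row)]
        for bg_row, img_row in zip(background, image)
    ]


# Hypothetical 1x2 grayscale frames: an object appears at the second
# pixel in frame 1 and then stays where it is.
frames = [
    [[100.0, 100.0]],
    [[100.0, 200.0]],
    [[100.0, 200.0]],
    [[100.0, 200.0]],
]

alpha = 0.5
background = frames[0]  # BG_0 = IMAGE_0
diffs = []
for image in frames[1:]:
    # Subtract the previous model first, then update it.
    diffs.append(foreground_difference(background, image)[0][1])
    background = update_background(background, image, alpha)

print(diffs)
print(background)
```

The difference at the second pixel halves with every frame, which illustrates how an object that stops moving is gradually absorbed into the background model.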
Both algorithms are resource-friendly and run in real time, so their performance is sufficient for our system.
For skin color detection, we use only one algorithm, as we expect skin color only in real-world images. It is based on the algorithm of the MoCA library (MoCA, 2006) and consists of two steps. First, the red and green values of an image are normalized in order to make the algorithm more robust against changes in brightness. Formula 12 shows the details:
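\[ \mathit{RED}_{norm} = \frac{\mathit{RED}}{\mathit{RED} + \mathit{GREEN} + \mathit{BLUE} + 1} \]
\[ \mathit{GREEN}_{norm} = \frac{\mathit{GREEN}}{\mathit{RED} + \mathit{GREEN} + \mathit{BLUE} + 1} \]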
Definition/Formula 12: Normalizing red and green values for skin color detection.
Second, a pixel is assumed to show skin color if both of its normalized values lie in the following ranges: 0.37 ≤ RED_norm ≤ 0.58 and 0.26 ≤ GREEN_norm ≤ 0.36.
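The two steps can be sketched for a single pixel as follows. This is an illustrative sketch and not the MoCA library's code: the function name is hypothetical, the range bounds are assumed to be inclusive, and the denominator RED + GREEN + BLUE + 1 is an assumption about Formula 12 (the added 1 also guards against a division by zero for black pixels).

```python
def is_skin(red, green, blue):
    """Classify one pixel (channel values 0..255) as skin-colored.

    Step 1 normalizes red and green by the channel sum.
    Step 2 checks both normalized values against fixed ranges.
    """
    total = red + green + blue + 1  # assumed form of the denominator
    red_norm = red / total
    green_norm = green / total
    return 0.37 <= red_norm <= 0.58 and 0.26 <= green_norm <= 0.36


# Hypothetical sample pixels.
print(is_skin(200, 120, 100))  # reddish tone
print(is_skin(100, 100, 100))  # neutral gray
print(is_skin(0, 0, 0))        # black
```

Because only the ratios between the channels are tested, a darker pixel with the same proportions, such as (100, 60, 50), is classified the same way as (200, 120, 100).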
This algorithm is fast and works fairly well but cannot distinguish between real skin and items of a similar color. Therefore, we combine the tiles in which skin color was detected with the tiles in which motion was detected, as most people are always moving a little. The result is sufficient for our needs, and the combination still runs in real time.
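Combining the two detectors at tile level can be sketched as follows. The data structures are assumptions: skin tiles are represented as a set of (row, column) indices and the per-tile motion values as a dictionary; the function name and the threshold are hypothetical.

```python
def person_tiles(skin_tiles, tile_motion, threshold):
    """Keep only the skin-colored tiles whose motion value exceeds the
    threshold; motion is looked up only for tiles marked as skin."""
    return {
        tile for tile in skin_tiles
        if tile_motion.get(tile, 0.0) > threshold
    }


# Hypothetical detector outputs for one image.
skin_tiles = {(0, 1), (2, 3), (4, 4)}
tile_motion = {(0, 1): 35.0, (2, 3): 2.0, (1, 1): 80.0}

result = person_tiles(skin_tiles, tile_motion, threshold=10.0)
print(result)
```

Here the tile (2, 3) is dropped because it is skin-colored but static, and (1, 1) is ignored because it moves but is not skin-colored.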
Concerning the image processing algorithms, the virtual cameraman's performance fulfills our needs. The algorithms provide the information necessary for calculating the sensor inputs for the virtual director and for triggering the autonomous camera control procedures in real time.