3. System Implementation
4.2. Experience with the cameraman module
4.2.2. Performance of the camera controlling algorithms
Besides the parameters already mentioned in Section 3.2, the virtual cameraman module is also able to steer the pan, tilt, and zoom of the PTZ cameras. We defined a Cartesian coordinate system for our lecture hall and set the zero-degree angles of the cameras parallel to the x-axis; in this way we made sure that there were no constraints concerning the valid ranges of pan and tilt angles of the cameras. For a precise comparison, we measured all lengths and positions using a laser-based distance measuring device. We allow two addressing modes for the cameras, absolute and relative. Absolute addressing is used, e.g., for pointing the camera at a questioner, whereas relative addressing is used, e.g., for following a moving person. By the manufacturer's built-in definition, negative values stand for angles left of the zero-degree position for panning and for angles below the horizontal position for tilting. In order to keep the calculation of camera motion angles easy, we made sure that the cameras were located on the opposite side of the origin of the coordinate system, which means that negative angle values have the same meaning as those built into the cameras.
The coordinates of questioners in the room, as transmitted by the sensor tools module, refer to this coordinate system, which enables us to use absolute addressing for all camera movements. First, we determine the distance vector between the position of the camera and the position of the questioner. The arc tangent of the ratio of the y-value to the x-value of the distance vector yields the pan angle for the camera. If the target is left of the camera position, the negative angle has to be taken. Second, we calculate the tilt angle in the same way; if the target is below the camera position, the negative angle has to be taken. Third, we want to show approximately three seats, namely the questioner and his or her right and left neighbors, to compensate for possible position estimation errors of the indoor positioning system used in the sensor module. We defined 1.65 meters as the width of three neighboring seats (Worig). We need three technical specifications of the camera: the width of the CCD sensor (WCCD), the maximal zoom factor (Zmax), and the minimal focal distance (fmin). Given the length of the three-dimensional distance vector (d), we first use the intercept theorem (theorem on intersecting lines) to calculate the focal distance (fDist) at which an object of width Worig = 1.65 m at this distance fills the width of the image. Then, we calculate the necessary zoom factor (Z) as the ratio of fDist to fmin. Z must be at least 1 and at most Zmax:
\[
\frac{f_{Dist}}{W_{CCD}} = \frac{d}{W_{orig}}
\;\Rightarrow\;
f_{Dist} = \frac{W_{CCD} \cdot d}{W_{orig}}
\;\Rightarrow\;
Z = \frac{f_{Dist}}{f_{min}}, \qquad 1 \le Z \le Z_{max}
\]
Definition/Formula 13: Calculating the zoom factor to frame a questioner.
As an example for a distance of four meters, the zoom factor results in:
\[
Z = \frac{(W_{CCD} \cdot d) / W_{orig}}{f_{min}} = \frac{(0.0036 \cdot 4) / 1.65}{0.0041} \approx 2.13
\]
Definition/Formula 14: Calculating the zoom factor for a distance of four meters.
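The three steps of absolute addressing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: the function names `zoom_factor` and `aim` are hypothetical, the value Zmax = 18 is inferred from the integer zoom factors 1 to 18 mentioned in the text, and the sign of the angles simply follows `atan2` for a camera whose zero-degree direction points along the positive x-axis. Whether a negative pan angle then means "left" depends on how the camera is mounted.

```python
import math

W_ORIG = 1.65    # width of three neighboring seats in meters (from the text)
W_CCD = 0.0036   # CCD width in meters (value used in Formula 14)
F_MIN = 0.0041   # minimal focal distance in meters (value used in Formula 14)
Z_MAX = 18.0     # assumption: maximal zoom factor, inferred from factors 1..18


def zoom_factor(d, w_orig=W_ORIG, w_ccd=W_CCD, f_min=F_MIN, z_max=Z_MAX):
    """Zoom factor that lets an object of width w_orig at distance d fill the image."""
    f_dist = w_ccd * d / w_orig          # intercept theorem
    z = f_dist / f_min                   # ratio of needed to minimal focal distance
    return min(max(z, 1.0), z_max)       # enforce 1 <= Z <= Zmax


def aim(camera, target):
    """Return (pan_deg, tilt_deg, zoom) for a camera and a target given as (x, y, z).

    Assumption: the zero-degree pan direction is the positive x-axis and the
    camera uses the same sign convention as atan2.
    """
    dx, dy, dz = (t - c for t, c in zip(target, camera))
    pan = math.degrees(math.atan2(dy, dx))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # negative if target is below
    d = math.sqrt(dx * dx + dy * dy + dz * dz)               # length of the 3D distance vector
    return pan, tilt, zoom_factor(d)
```

For a target four meters straight ahead of the camera, `aim((0, 0, 2), (4, 0, 2))` yields pan and tilt angles of zero and a zoom factor that rounds to 2.13, in line with Formula 14.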
The last step is to map the calculated zoom factor to a zoom parameter value of the camera. As no formula was available for this mapping, we used splines to approximate it. The web interface of our cameras allows setting an integer zoom factor and then reading back which parameter value was used. Having obtained the values for all integer zoom factors from 1 to 18, we used these data pairs to calculate cubic splines. Thus, we are able to calculate the zoom parameter value precisely by evaluating the spline segment that covers the given zoom factor. Finally, the pan and tilt angles and the zoom parameter value are sent to the camera interface.
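The spline mapping can be illustrated with a small, dependency-free natural cubic spline. The text does not state which boundary conditions were used, so the "natural" variant (second derivative zero at both ends) is an assumption. The parameter values in `PLACEHOLDER_PARAMS` are synthetic stand-ins generated by a formula; they are not the values read from the cameras, which are not given in the text.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n]; m[0] = m[n] = 0.
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        lower[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        upper[i] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]

    def evaluate(x):
        # Pick the spline segment [xs[i], xs[i+1]] that covers x.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


ZOOM_FACTORS = [float(z) for z in range(1, 19)]                  # integer factors 1..18
PLACEHOLDER_PARAMS = [1000.0 * (1.0 - 1.0 / z) for z in ZOOM_FACTORS]  # synthetic, NOT measured

zoom_to_param = natural_cubic_spline(ZOOM_FACTORS, PLACEHOLDER_PARAMS)
```

With the real data pairs in place of the placeholders, `zoom_to_param(2.13)` would return the interpolated parameter value for the zoom factor from the example above.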
As all calculation steps needed for the absolute addressing of the cameras only use basic arithmetic operations, raising to the power of at most three and applying the arc tangent, the entire calculation can easily be done in real-time.
The second way to control the cameras is relative addressing, which is useful when a camera has to follow a person. First, the number of faces in the image is determined. If no face is found, nothing happens. If one face is detected, it is used. If more than two faces are found, the group of faces that takes up the largest space in the image is used. If exactly two faces are found, the following checks take place:
- Check whether one face is above the other; if so, take the upper one.
- Check whether both faces are close together; if so, treat them as one area.
- Check the designated alignment. Take the left face if left alignment is desired or take the right face if right alignment is desired.
After these checks, one face area remains. The coordinates of its center are determined and compared with the coordinates of the alignment point, which lies either towards the left or towards the right side of the image. If the difference is above a threshold, the center coordinates of the face area are set as the new center coordinates of the image, and the values are sent to the camera interface.
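The selection rules above can be sketched as follows. Several details are assumptions made only for this illustration: faces are axis-aligned pixel boxes with y growing downwards, "above" means that the boxes do not overlap vertically, "close together" means a horizontal gap of at most `close_gap` pixels, and the difference to the alignment point is measured per axis. The text does not specify how groups of more than two faces are formed, so the sketch substitutes the single largest box in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge in pixels
    y: float  # top edge in pixels (y grows downwards)
    w: float  # width in pixels
    h: float  # height in pixels

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def merge(a, b):
    """Smallest box that contains both a and b."""
    left, top = min(a.x, b.x), min(a.y, b.y)
    right = max(a.x + a.w, b.x + b.w)
    bottom = max(a.y + a.h, b.y + b.h)
    return Box(left, top, right - left, bottom - top)


def select_face_area(faces, align_left, close_gap=40.0):
    """Reduce the detected faces to one face area, or None if there is no face."""
    if not faces:
        return None
    if len(faces) == 1:
        return faces[0]
    if len(faces) > 2:
        # Stand-in for the unspecified grouping step: largest single box.
        return max(faces, key=lambda f: f.w * f.h)
    a, b = sorted(faces, key=lambda f: f.x)   # a is the left face, b the right one
    if a.y + a.h <= b.y:                      # a lies entirely above b
        return a
    if b.y + b.h <= a.y:                      # b lies entirely above a
        return b
    if b.x - (a.x + a.w) <= close_gap:        # close together: treat as one area
        return merge(a, b)
    return a if align_left else b             # designated alignment decides


def new_center(area, align_point, threshold=30.0):
    """Return the new image center, or None if the area is close enough already."""
    cx, cy = area.center
    ax, ay = align_point
    if max(abs(cx - ax), abs(cy - ay)) > threshold:
        return (cx, cy)
    return None
```

The thresholds `close_gap` and `threshold` are hypothetical defaults; suitable values depend on the image resolution and the zoom level.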
The calculation of the new center is very fast, but the cameras we use have a disadvantage that is more severe for relative addressing than for absolute addressing: the cameras do not report when they have finished an operation. As absolute addressing sets the new coordinates only once, there is no need to wait for a completion acknowledgment from the camera. In contrast, relative addressing is used for following a person with the camera, and therefore it is a continuous process. As it relies on image processing, it depends on an image taken after the last movement has finished. The time span between two consecutive useful images is about 1.5 seconds. Therefore, it is impossible to follow fast-moving persons or persons who are very close to the camera, as even small position changes from one image to the next lead to large changes of the camera angles.
A human cameraman overcomes this problem by first zooming out; only if this measure is not sufficient does he or she follow the person. That is why we implemented an algorithm of this kind that takes the motion in the image into account. Every time the virtual cameraman module detects motion, it repeatedly zooms out a little until the percentage of motion in the image is below a threshold. If there is only little motion in the image for a while, the virtual cameraman zooms in again. The advantage is that zooming is performed very quickly by the cameras, so we do not have to wait until it is finished. In addition, we have made sure that this algorithm is executed before we try to follow a person.
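One step of such a motion-driven zoom policy might look like the following sketch. The function name `adjust_zoom`, the step size, the motion threshold, and the number of calm runs required before zooming in again are all hypothetical; the text only states that the module zooms out a little while the share of motion is too high and zooms in again after a calm period.

```python
def adjust_zoom(zoom, motion_share, calm_runs, *, z_min=1.0, z_max=18.0,
                motion_threshold=0.2, step=0.5, calm_runs_needed=5):
    """Perform one control-loop step and return (new_zoom, new_calm_runs).

    motion_share is the fraction of the image in which motion was detected;
    calm_runs counts consecutive loop runs with little motion.
    """
    if motion_share >= motion_threshold:
        # Too much motion: zoom out a little and restart the calm counter.
        return max(z_min, zoom - step), 0
    calm_runs += 1
    if calm_runs >= calm_runs_needed:
        # Calm for a while: zoom in again by one step.
        return min(z_max, zoom + step), 0
    return zoom, calm_runs
```

Calling the function once per loop run, before any attempt to re-center on a person, mirrors the ordering described above.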
In summary, the algorithms themselves are fast enough, and in most combinations they fulfill our expectations. In particular, all algorithms that do not rely on a finished camera operation work without problems. Only following a person is a little slow, but it is still sufficient for a lecturer sitting in front of the audience, as is always the case in our scenarios. Nevertheless, it should be possible in future work to optimize the behavior of the virtual cameraman module in this respect, for example by parallelizing some of the algorithms.
4.2.3. Overall Performance
The virtual cameraman module has proved its ability to process all necessary tasks in real-time. The control loop approach works as expected and provides all necessary steps. These steps, namely the image processing algorithms, the corresponding camera control, and the communication with the virtual director, have a certain amount of complexity which should not be underestimated. That is why we put the cameraman to sleep for 550 ms in each run of the loop. This value is configurable; it has been found to work well for the computer we selected to run all four cameramen on. Figure 33 shows an example of the status message displays of three of the four cameramen during a lecture recording; the fourth did not fit into the image.
Figure 33: Exemplary status message displays of three virtual cameramen.
The complex combination of image processing algorithms and, under some circumstances, the waiting for the camera to finish the last movement order can be improved in future work by introducing asynchronous ways of calculating new instructions, sending them, and receiving feedback on instruction completion, and by minimizing the sleeping times in relation to the load generated. Nevertheless, the virtual cameraman module is still sufficient for the activities in a lecture hall, as we made sure that all instructions from the director get processed even if they are transmitted while the cameraman is sleeping.
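One common way to guarantee that instructions sent during a sleep phase are not lost is to buffer them in a thread-safe queue that the control loop drains at the start of each run. The sketch below illustrates this idea; it is an assumption about a possible design, not a description of the actual implementation, and the class and method names are hypothetical.

```python
import queue
import threading


class Cameraman:
    """Minimal control loop that sleeps between runs but never drops instructions."""

    def __init__(self, sleep_s=0.55):          # 550 ms, configurable as in the text
        self.sleep_s = sleep_s
        self.inbox = queue.Queue()             # buffers instructions from the director
        self.handled = []                      # placeholder for real instruction handling
        self._stop_event = threading.Event()

    def post(self, instruction):
        """Called by the director at any time, also while the loop is sleeping."""
        self.inbox.put(instruction)

    def run_once(self):
        # Drain everything that arrived since the last run.
        while True:
            try:
                instruction = self.inbox.get_nowait()
            except queue.Empty:
                break
            self.handled.append(instruction)
        # Image processing and camera control would follow here.

    def run(self):
        while not self._stop_event.is_set():
            self.run_once()
            self._stop_event.wait(self.sleep_s)  # sleep, but wake up early on stop()

    def stop(self):
        self._stop_event.set()
```

In use, the director thread calls `post(...)` while `run()` executes in the cameraman's own thread; instructions posted during the sleep phase are picked up by the next `run_once()`.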