6. Summary
6.5. Outlook
From our current point of view, there are three important ways to continue our work:
1. improving the current implementation, e.g., with faster or more precise algorithms, especially in image and video processing, or with more intuitive user interfaces;
2. extending our prototype with new implementations which add new features;
3. transferring our approach to a wider set of applications.
The following sections discuss these three ways in detail.
6.5.1. Improving the Current Prototype
Image processing algorithms are a natural area in which to improve performance and precision. To improve a shot/counter-shot scenario, implementing face detection, gaze direction detection, and visual person localization would be very helpful. Moreover, any algorithm able to extract semantic meaning from the images will improve the system, e.g., reliable hand-raising detection. Furthermore, audio processing algorithms can improve not only the audio quality but also the localization of persons.
This leads directly to an important point: up to now, question announcement detection has been based on a GUI on a PDA, and hitting a button on a PDA is a rather non-intuitive way to announce a question. Therefore, research on how to detect a hand-raising questioner in the audience and how to determine his or her position by using video and audio processing techniques has recently been carried out (Herweh, 2009).
Another internal improvement is to implement more cinematographic rules. This requires additional sensors and investigations into how to interpret their signals and measurements, similar to the way a human camera team would interpret them.
Usability can be improved from several points of view. From the system operator's point of view, it would be useful to implement an automated sequence that starts up the different software parts of the system instead of starting them manually. It would also be useful to further automate the transcoding system.
From the lecturer's point of view, the GUI can be improved, e.g., by giving the lecturer detailed control over when to block questions and when to limit the recorded shots to those including the slides camera, for instance while the lecturer shows animations, simulations, or videos. Furthermore, showing the position of an announcing questioner graphically on the GUI will help the lecturer to make eye contact with the questioner. It would also be useful to provide improved support for questioners in remote lecture halls.
From the questioner's point of view, implementing the announcement detection in a more intuitive way and porting the QM client software from PDAs to notebooks would be useful improvements. Although the software then has to be ported to different operating systems, the effort is worthwhile as most students already use a notebook or a netbook during the lecture. Problems with insufficient battery capacity will disappear, and the quality of the audio data transmitted over WLAN will increase as notebooks and netbooks usually have better equipment built in. The most important operating systems to port the QM questioner software to are Microsoft Windows, Apple Mac OS and Linux. As the software solves relatively simple tasks and the protocols used are simple, too, the porting can be done, e.g., by students during a student project.
Finally, concerning usability, it would be useful to conduct empirical evaluations to investigate the influence of the system on a lecture, e.g., to what extent the lecturer and the questioners are distracted by the system.
6.5.2. Extending the Current Prototype
Extending the current prototype to enable live streaming is based on two parts which we presented in Chapter 4.5.3. As already mentioned, both parts are straightforward, but due to the given time constraints it was not possible to fully implement them or to test them completely.
The first part is the replacement of the audio normalizing algorithm by a combination of a noise gate, an expander, a compressor and a limiter. The main advantage is that this combination does not need to detect a global maximum before being applied, so it can run on the audio stream as it arrives.
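The following is a minimal sketch of such a chain, not the implementation of the prototype. It assumes floating-point samples in the range -1 to 1; the class name DynamicsChain, all thresholds, ratios and time constants are hypothetical, and a simple hard clip stands in for a real limiter. The point it illustrates is that the gain of each sample depends only on a running envelope of past samples, so no global maximum is needed.

```python
import math
from dataclasses import dataclass


@dataclass
class DynamicsChain:
    """Hypothetical causal gate/expander/compressor/limiter (levels in dBFS)."""
    sample_rate: int = 48000
    gate_db: float = -60.0      # below this level the signal is effectively muted
    expand_db: float = -45.0    # below this level the signal is expanded downwards
    expand_ratio: float = 2.0
    comp_db: float = -18.0      # above this level the signal is compressed
    comp_ratio: float = 4.0
    ceiling: float = 0.9        # limiter ceiling as linear amplitude
    attack_s: float = 0.005
    release_s: float = 0.100
    envelope: float = 0.0       # running level estimate, kept between blocks

    def _gain_db(self, level_db):
        """Static gain curve: gain in dB for a given envelope level in dB."""
        if level_db < self.gate_db:
            return -120.0
        if level_db < self.expand_db:
            return (level_db - self.expand_db) * (self.expand_ratio - 1.0)
        if level_db > self.comp_db:
            return (self.comp_db - level_db) * (1.0 - 1.0 / self.comp_ratio)
        return 0.0

    def process(self, block):
        """Process one block of samples using only past and current samples."""
        attack = math.exp(-1.0 / (self.attack_s * self.sample_rate))
        release = math.exp(-1.0 / (self.release_s * self.sample_rate))
        out = []
        for x in block:
            magnitude = abs(x)
            # One-pole envelope follower with separate attack and release.
            coeff = attack if magnitude > self.envelope else release
            self.envelope = coeff * self.envelope + (1.0 - coeff) * magnitude
            level_db = 20.0 * math.log10(max(self.envelope, 1e-9))
            y = x * 10.0 ** (self._gain_db(level_db) / 20.0)
            # Hard clip as a stand-in for a limiter.
            out.append(max(-self.ceiling, min(self.ceiling, y)))
        return out
```

Because the only state is the envelope, feeding the audio block by block yields the same output as processing the whole recording at once, which is the property required for live streaming.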
The second part necessary to enable live streaming consists of "DirectShow Source Filters" accepting bitmaps or raw PCM data, respectively, as their input, and "DirectShow Transform Filters" transmitting the encoded streams using RTP.
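To illustrate only the packet framing that such a transmitting filter would have to perform, the following sketch builds and parses the fixed 12-byte RTP header defined in RFC 3550. It is not DirectShow code and not part of the prototype; the function names are hypothetical, and the default payload type 96 is merely the first value of the dynamic range, chosen here as an assumption.

```python
import struct

RTP_VERSION = 2
RTP_HEADER = "!BBHII"  # network byte order, 12 bytes in total


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix an encoded payload with a fixed RTP header (no CSRC, no extension)."""
    first = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        RTP_HEADER,
        first,
        second,
        seq & 0xFFFF,                               # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                     # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                          # 32-bit source identifier
    )
    return header + payload


def parse_rtp_header(packet):
    """Decode the fixed header fields of a packet built as above."""
    first, second, seq, timestamp, ssrc = struct.unpack(RTP_HEADER, packet[:12])
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A sender would increment the sequence number by one per packet and advance the timestamp according to the media clock, so that the receiver can detect loss and restore timing.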
6.5.3. Transferring Automatic Lecture Recording to Other Environments
As our prototype of the distributed Automatic Lecture Recording system is already successful in its first version, it is worth thinking about transferring it to contexts other than lectures. From a technical point of view, all necessary parts are configurable, so there are no restrictions in principle. Mainly, the FSM has to be rewritten to cover all possible situations of the new context, and the configuration files have to be adapted to the hall where the event takes place.
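What rewriting the FSM for a new context involves can be sketched with a table-driven state machine. The state and event names below are purely hypothetical and do not reproduce the FSM of the prototype; the sketch only shows that a new context amounts to a new transition table, and that state/event pairs without an explicit rule can be listed automatically.

```python
# Hypothetical transition table: (current shot, event) -> next shot.
TRANSITIONS = {
    ("lecturer_shot", "slide_changed"): "slides_shot",
    ("slides_shot", "timeout"): "lecturer_shot",
    ("lecturer_shot", "question_announced"): "questioner_shot",
    ("slides_shot", "question_announced"): "questioner_shot",
    ("questioner_shot", "question_finished"): "lecturer_shot",
}


def next_state(state, event, transitions=TRANSITIONS):
    """Return the follow-up state; stay in the current state if no rule exists."""
    return transitions.get((state, event), state)


def uncovered_pairs(transitions=TRANSITIONS):
    """List all (state, event) pairs for which the table defines no rule."""
    states = {state for state, _ in transitions} | set(transitions.values())
    events = {event for _, event in transitions}
    return sorted(
        (state, event)
        for state in states
        for event in events
        if (state, event) not in transitions
    )
```

Even this small table with three states and four events leaves seven of the twelve pairs without an explicit rule, and the number of pairs to consider grows with the product of states and events.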
Nevertheless, there are currently three constraints:
1. If the event to be covered by the FSM is very complex, it is very hard to manually write an FSM covering all possible situations. There might be a limit to the events which can be covered by our Automatic Recording system. The complexity at which such a limit occurs should be examined in future work.
2. Up to now, only fixed, mounted PTZ cameras have been used. If autonomous, moving cameras with arbitrary degrees of freedom are employed, many of the algorithms of the virtual cameraman module have to be rewritten as new kinds of motion will appear in the image.
3. Spontaneous recordings, e.g., on the street, are not the target of our Automatic Lecture Recording system as significant effort is required to measure and calibrate the system for every new location.
However, it should be easy to adapt the Automatic Lecture Recording system to all kinds of frontal presentations as this genre has rigid rules of interaction. Instances of this genre are manifold, e.g., internal presentations in companies, presentation coaching, panel discussions, plenary meetings, party conventions, stockholders' meetings and court hearings.
Transferring the system may also mean exchanging the recording equipment to meet different technical requirements, e.g., to increase the quality. As the cameraman is built modularly, it is quite easy to exchange the cameras as long as they provide the same set of functions. If the functional range is different, it is necessary to rewrite parts of the virtual cameraman module.
Depending on the equipment in use, higher system requirements may arise. To be more precise, while the Axis cameras and video servers in use consume a total bandwidth of about 16 Mbit/s during run-time, one single professional camera using the serial digital interface (SDI) as its main output has a constant bit rate of 270 Mbit/s for standard definition (SD) resolution, as defined by the SMPTE-259M standard. In the case of high definition (HD) cameras, the main output uses the so-called HD-SDI with a constant bit rate of 1.485 Gbit/s, according to the SMPTE-292M standard. In the near future the new 3 Gbit/s standard will be common in the studios. It is obvious that these amounts of data require much higher system capacities in every part of the system in order to keep up with the real-time requirement.
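The order of magnitude of this increase can be made explicit with a short calculation based on the bit rates quoted above. The helper names are hypothetical, and the figures describe the raw interface rates only, not the data volume after encoding.

```python
# Bit rates in Mbit/s as quoted in the text.
BASELINE_MBIT = 16.0     # all Axis cameras and video servers together
SD_SDI_MBIT = 270.0      # one SD camera, SMPTE 259M
HD_SDI_MBIT = 1485.0     # one HD camera, SMPTE 292M


def megabytes_per_second(mbit_per_s):
    """Convert a bit rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def factor_over_baseline(mbit_per_s, baseline=BASELINE_MBIT):
    """How many times larger a bit rate is than the current total bandwidth."""
    return mbit_per_s / baseline


for name, rate in (("SD-SDI", SD_SDI_MBIT), ("HD-SDI", HD_SDI_MBIT)):
    print(
        f"{name}: {megabytes_per_second(rate)} MB/s, "
        f"{factor_over_baseline(rate)} times the current total"
    )
```

A single SD-SDI camera thus delivers about 17 times, and a single HD-SDI camera about 93 times, the data rate that all cameras of the current setup produce together.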
With the already implemented features in mind and aiming at the presented possibilities for future work, this project is able not only to bring Automatic Lecture Recording to a higher level but also to extend its scope to further applications, providing a wider basis for researchers.