CSE 237A Final Project Final Report

Multi-way video conferencing system over 802.11 wireless network

Yanhua Mao and Shan Yan

Motivation

The latest trends in personal mobile computing are towards the integration of various technologies. The new generation of PDAs (Personal Digital Assistants) can not only be used as calculators and organizers; they also have high-resolution color screens, audio and video capabilities, and Internet access via Wi-Fi and WiMAX, enabling them to serve as mobile phones (smartphones), web browsers, or media players. These advances make people believe that one day personal mobile computing will be possible at any time, in any place.

We would like to investigate the key technologies needed to realize this vision, specifically by implementing a multi-way video conferencing system.

Project Goals

In this project, we aim to develop a power-efficient multi-way video conferencing system on PDAs over 802.11 wireless networks.

State of the Art

Most desktop-based video conferencing systems support only two-way communication. AIM is the only system we know of that supports up to four-way video conferencing [6]; however, all participants must use Mac OS X to use that feature. Desktop-based video conferencing systems do not contend with the limited screen size and processing power of a typical PDA. Moreover, power consumption is not an issue for desktop systems.

Goals

We believe that video conferencing on PDAs is going to become popular, and we would like to investigate the key technologies needed to realize a multi-way video conferencing system. We will support up to 4-way communication and deliver real-time video and audio among the participants over a wireless network.

We are tuning the user interface to provide the best possible user experience within the limited resources of the PDA. Power consumption is a very important issue on handheld systems. To achieve reasonable power efficiency, we need to investigate low-power algorithms for A/V encoding and decoding. In addition, we need to balance power consumption against A/V quality to reach a good tradeoff.

Approaches

We implemented a mechanism that supports dynamic frame rates for video encoding and playback. This lets us balance workload and save power at the user's request. From a user interface point of view, we designed a primary/secondary-view GUI: the active participant is shown in the larger primary view, and the inactive participants in the smaller secondary views. This not only lets the user focus on who is speaking but also gives us an opportunity to optimize power efficiency.

Project Implementation

We implemented a multi-way video conferencing system for the XScale platform. The system supports up to 4 end-users at the same time. It has a user-friendly primary/secondary GUI, integrates several power management schemes to meet the energy efficiency requirements of embedded applications, and keeps bandwidth requirements low without sacrificing application quality. In the remainder of this section, we discuss the system architecture, GUI design, audio/video implementation, and energy-efficiency optimizations.

[Figure 1: System architecture. Hardware modules (audio, LCD, camera, network, touch screen) connect, directly or through the ADPCM codec and JPEG encoder/decoder, to a timed event-driven engine, on top of which sit the view manager, peer manager, and config manager.]

[Figure 2: GUI snapshot.]

System architecture

The system can be roughly divided into four layers, as shown in Figure 1. At the bottom are the hardware modules: audio, LCD display, camera, network, and touch screen. The network and touch screen modules interface directly with a high-performance timed event-driven engine, while the audio, LCD, and camera modules connect to the engine through their corresponding codecs. Finally, at the top of the application are the view manager, peer manager, and configuration manager. We will not go into the details of these modules, as their names are self-explanatory.

Primary/secondary GUI design

To maximize the user experience, we implemented a primary/secondary view design. A GUI snapshot is shown in Figure 2.

One large primary view in the middle shows the person who is actively talking, and two small secondary views in the corners show the other, currently silent participants. The user's own video is always shown in the middle-top view.

In the automatic view management mode, our application keeps track of the volume of each participant other than the user, and automatically brings the video of the person who is actively talking to the primary view. The user can also disable this behavior by clicking the lock-view button at the bottom left corner of the GUI; in that mode, the user chooses which participant is shown in the primary view by simply clicking that participant's video. The other three buttons at the bottom of the GUI set the video encoding and playback frame rate. The fast mode gives the best real-time performance at the cost of higher power consumption, while the slower modes save CPU power. In this way, we give the user the flexibility to balance power consumption against user experience.
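
A minimal sketch of the automatic view-management policy described above: track a smoothed audio energy per remote peer and promote the loudest one to the primary view. The function names, the smoothing constant, and the peer count are illustrative, not our exact implementation.

    #define NPEERS 3  /* remote participants (4-way including the user) */

    static double energy[NPEERS];  /* smoothed per-peer volume */

    /* Called whenever a decoded audio buffer arrives from a peer. */
    void update_energy(int peer, const short *pcm, int nsamples)
    {
        double e = 0;
        int i;
        for (i = 0; i < nsamples; i++)
            e += (double)pcm[i] * pcm[i];
        e /= nsamples;
        energy[peer] = 0.9 * energy[peer] + 0.1 * e;  /* exponential smoothing */
    }

    /* Peer that should occupy the primary view (unless the view is locked). */
    int active_speaker(void)
    {
        int best = 0, p;
        for (p = 1; p < NPEERS; p++)
            if (energy[p] > energy[best])
                best = p;
        return best;
    }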

Audio implementation

We use the Open Sound System [3] interface to capture 8 kHz, 16-bit mono audio, which various references suggest is sufficient for speech.

We considered both ADPCM and MP3 (MPEG-1 Layer 3) codecs for audio. A quick survey revealed that MP3 gives a better compression ratio, but its encoding algorithm is substantially more complex, which means higher CPU consumption. Meanwhile, an ADPCM codec can encode the 128 Kbps raw stream into a 32 Kbps stream with fairly good quality. We felt that ADPCM's quality and bit rate were good enough for our application, and therefore picked ADPCM over MP3 in favor of lower CPU consumption. We used Jack Jansen's implementation [4] of Intel/DVI (IMA) ADPCM, which is reported to be faster than the original Intel implementation.
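
A sketch of the audio capture path, using the standard OSS ioctl interface [3]; the adpcm_coder() prototype follows the CWI adpcm.c reference [4] and should be treated as an assumption rather than a documented API.

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>
    #include <unistd.h>

    struct adpcm_state;                     /* opaque; defined in adpcm.c [4] */
    extern void adpcm_coder(short *in, char *out, int nsamples,
                            struct adpcm_state *state);

    int open_capture(void)
    {
        int fd = open("/dev/dsp", O_RDONLY);
        int fmt = AFMT_S16_LE, channels = 1, rate = 8000;

        if (fd < 0)
            return -1;
        if (ioctl(fd, SNDCTL_DSP_SETFMT, &fmt) < 0 ||        /* 16-bit samples */
            ioctl(fd, SNDCTL_DSP_CHANNELS, &channels) < 0 || /* mono           */
            ioctl(fd, SNDCTL_DSP_SPEED, &rate) < 0) {        /* 8 kHz          */
            close(fd);
            return -1;
        }
        return fd;  /* read() now yields the 128 Kbps raw PCM stream;
                       adpcm_coder() compresses it 4:1 down to 32 Kbps */
    }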

We also gave audio higher priority than video in our application. The reasoning is simple: it is generally agreed that audio is the more important channel in a conferencing system.

Video implementation

We implemented the on-screen display via the Linux frame buffer. We picked the YUV color space for the video overlay because it is the native output color space of the video codec we use; choosing differently would have cost us an unnecessary color space conversion.
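
A minimal sketch of blitting a decoded frame through the Linux frame buffer interface, assuming a display or overlay device that accepts the codec's native YUV layout directly; the device name and the pixel format are platform-specific assumptions.

    #include <fcntl.h>
    #include <linux/fb.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int show_frame(const unsigned char *frame, size_t nbytes)
    {
        int fd = open("/dev/fb0", O_RDWR);   /* overlay device is platform-specific */
        struct fb_fix_screeninfo finfo;

        if (fd < 0)
            return -1;
        if (ioctl(fd, FBIOGET_FSCREENINFO, &finfo) < 0) {
            close(fd);
            return -1;
        }

        /* Map the frame buffer memory and copy the decoded frame into it. */
        unsigned char *fb = mmap(NULL, finfo.smem_len, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
        if (fb == MAP_FAILED) {
            close(fd);
            return -1;
        }
        memcpy(fb, frame, nbytes < finfo.smem_len ? nbytes : finfo.smem_len);

        munmap(fb, finfo.smem_len);
        close(fd);
        return 0;
    }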

We read and modified the ci-capture example for our video capturing module. We picked resolutions of 160x120 and 80x60 for the large and small views, respectively; since the LCD screen is only 240x320, higher resolutions are simply unnecessary. For the video codec we considered MPEG-1, MPEG-4, JPEG, and JPEG2000, and finally chose JPEG for the following reasons. First, MPEG-1 and MPEG-4 encoding require motion estimation, which imposes higher CPU utilization. Second, we could not find any existing open-source MPEG-1 or MPEG-4 codec that supports the dynamic frame rate and frame size we planned to use in our application.

With a still-picture codec, we can easily timestamp and resize each individual frame and synchronize it with other frames and with audio playback. We therefore preferred JPEG and JPEG2000 over MPEG-1 and MPEG-4, and finally decided on JPEG because JPEG2000 uses a more complex and slower algorithm. Our experience with the JPEG codec [5] shows that it can compress a 160x120 picture into a baseline-quality JPEG file of less than 2 KB. At the maximum of 15 fps, this gives at most a 30 KB/s (240 Kbps) stream, which we believe is acceptable.
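
A sketch of the per-frame compression path with libjpeg [5]. Note that jpeg_mem_dest() exists only in libjpeg 8 and later (earlier versions need a custom destination manager), and the packed 4:4:4 YCbCr input layout and quality setting are assumptions, not our exact code.

    #include <stddef.h>
    #include <stdio.h>
    #include <jpeglib.h>

    /* Compress one packed YCbCr frame; *out/*out_len receive the result. */
    size_t encode_frame(const unsigned char *ycbcr, int w, int h,
                        unsigned char **out, unsigned long *out_len)
    {
        struct jpeg_compress_struct cinfo;
        struct jpeg_error_mgr jerr;

        cinfo.err = jpeg_std_error(&jerr);
        jpeg_create_compress(&cinfo);
        jpeg_mem_dest(&cinfo, out, out_len);  /* libjpeg 8+ in-memory output */

        cinfo.image_width = w;
        cinfo.image_height = h;
        cinfo.input_components = 3;
        cinfo.in_color_space = JCS_YCbCr;     /* native YUV, no RGB conversion */
        jpeg_set_defaults(&cinfo);
        jpeg_set_quality(&cinfo, 50, TRUE);   /* illustrative baseline quality */

        jpeg_start_compress(&cinfo, TRUE);
        while (cinfo.next_scanline < cinfo.image_height) {
            JSAMPROW row = (JSAMPROW)&ycbcr[cinfo.next_scanline * w * 3];
            jpeg_write_scanlines(&cinfo, &row, 1);
        }
        jpeg_finish_compress(&cinfo);
        jpeg_destroy_compress(&cinfo);
        return (size_t)*out_len;              /* < 2 KB at 160x120 in our tests */
    }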

Finally, we tried real-time encoding and decoding at baseline quality on a PC and found the quality satisfactory.

Customized optimizations

Energy efficiency guided not only our audio and video codec decisions; we also developed a number of power optimization techniques to save further energy. Although each of them yields only a small gain, together, little by little, they deliver the desired performance for our application.

We use efficient low-level hardware access in our code. Most of our code deals directly with the hardware interfaces via operating-system-level APIs instead of third-party libraries. Although this made the coding process painful, it gives better performance and saves power as well.

We also hand-coded the zoom in/out routines. They are the most frequently used routines, and their efficiency directly influences system performance, so we hand-optimized them to run fast and efficiently.

When video is coded as JPEG, every frame of the same size carries identical JPEG header information, so transferring the header with each frame is redundant. When transferring video data, we skip the JPEG header, which is approximately 700 bytes per frame, to save power.
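
A sketch of the header-skipping idea: for same-sized frames, the bytes before the JPEG Start-of-Scan marker (0xFF 0xDA) are identical, so the sender transmits them once and the receiver caches them and prepends them to each later frame. The function name is illustrative.

    #include <stddef.h>

    /* Offset of the JPEG Start-of-Scan marker, or -1 if absent. */
    long find_sos(const unsigned char *jpg, size_t len)
    {
        size_t i;
        for (i = 0; i + 1 < len; i++)
            if (jpg[i] == 0xFF && jpg[i + 1] == 0xDA)
                return (long)i;
        return -1;
    }

    /* Sender: transmit jpg + sos, len - sos  (saves ~700 bytes per frame).
     * Receiver: reassemble as cached_header followed by the received data. */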

As mentioned before, we work in the YUV color space instead of RGB because it is the native color space of the JPEG library, which saves us an unnecessary color space conversion. This made the GUI implementation a little painful for us, but we do gain performance by doing so.

We implemented three modes of video transfer: fast, medium, and slow. Fast mode targets the highest performance, whereas medium and slow modes are designed to save power. The data transfer rate in each mode is also customizable. By default, fast mode supports 10-15 fps, while medium and slow modes support 5-10 fps and 1-5 fps, respectively. In addition to the lower frame rate, the application also transfers smaller pictures in slow mode.
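
An illustrative encoding of the three transfer modes; the default ranges follow the text above, the structure and field names are assumptions, and all values are user-customizable.

    struct transfer_mode {
        const char *name;
        int min_fps, max_fps;
        int shrink;               /* nonzero: also send smaller pictures */
    };

    static const struct transfer_mode modes[] = {
        { "fast",   10, 15, 0 },  /* best real-time feel, highest power */
        { "medium",  5, 10, 0 },
        { "slow",    1,  5, 1 },  /* lowest power: fewer, smaller frames */
    };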

We implemented a dynamic video frame-skipping scheme using a leaky-buffer management algorithm. Both video encoding and decoding are under the control of the leaky buffer. The actual frame rate is a function of the user's preference and the resources available; frames that will not be displayed or transferred are dropped without going through the codec.
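
One way to realize such a scheme is a token-style budget that refills at the target frame rate, so a frame enters the codec only when budget is available and is dropped otherwise. This is a sketch of the idea, not our exact algorithm; all names are illustrative.

    #include <sys/time.h>

    struct leaky_buffer {
        double tokens;        /* fractional frames we may still process */
        double rate_fps;      /* target frame rate (user preference)    */
        double burst;         /* capacity: bounds bursts after idle     */
        struct timeval last;  /* time of the previous refill            */
    };

    /* Returns 1 if the frame should be processed, 0 if it should be dropped. */
    int lb_admit_frame(struct leaky_buffer *lb)
    {
        struct timeval now;
        gettimeofday(&now, NULL);
        double dt = (now.tv_sec - lb->last.tv_sec) +
                    (now.tv_usec - lb->last.tv_usec) / 1e6;
        lb->last = now;

        lb->tokens += dt * lb->rate_fps;
        if (lb->tokens > lb->burst)
            lb->tokens = lb->burst;
        if (lb->tokens >= 1.0) {
            lb->tokens -= 1.0;   /* consume one token, process this frame */
            return 1;
        }
        return 0;                /* no budget: drop before the codec */
    }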

We also implemented a fast, timed, event-driven, single-threaded application engine to control the whole system.
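
A minimal sketch in the spirit of that engine: select() multiplexes the device and network descriptors, with its timeout driven by the next scheduled timer. The names and structure are assumptions; ordering the audio descriptor first is one simple way to realize the audio-over-video priority mentioned earlier.

    #include <sys/select.h>
    #include <sys/time.h>

    typedef void (*handler_fn)(int fd);

    void run_engine(const int *fds, const handler_fn *handlers, int nfds,
                    struct timeval timer_interval, void (*on_timer)(void))
    {
        for (;;) {
            fd_set rd;
            int i, maxfd = -1;

            FD_ZERO(&rd);
            for (i = 0; i < nfds; i++) {
                FD_SET(fds[i], &rd);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }

            struct timeval tv = timer_interval;  /* select() may modify tv */
            int n = select(maxfd + 1, &rd, NULL, NULL, &tv);
            if (n == 0) {
                on_timer();  /* timer expired: e.g. capture/encode next frame */
                continue;
            }
            for (i = 0; i < nfds && n > 0; i++)
                if (FD_ISSET(fds[i], &rd)) {
                    handlers[i](fds[i]);  /* audio listed first for priority */
                    n--;
                }
        }
    }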


Wireless Support

We rebuilt the Linux kernel to support PCMCIA and an 802.11 wireless card, and added packet socket support in the kernel to enable the DHCP client. We cross-compiled the Wireless Tools utilities [1] and the Card Manager utilities [2]. We were able to bring up the wireless interface using the ifconfig and iwconfig commands and associate it with the UCSD network. Finally, we used the built-in udhcpc command to obtain IP addresses dynamically. However, the wireless card we borrowed is an old pre-802.11 card that interoperates poorly with the UCSD wireless network; it fails randomly after working for a while. We tested our application when the card did work, and the results were satisfactory, as the bandwidth our application requires is well below what an 802.11 network can provide. For most of our experiments and tests, we emulated the wireless link with a wired network, since wireless and wired networks use the same API.

Conclusion, experiments, and future work

Implementing a functionally correct multi-way video conferencing system on PDAs is not trivial; making the design power efficient and user friendly is even harder. In the design process we dealt with many hardware interfaces, from the sound card, camera, and LCD to the touch screen, and figuring out how each of them works took a lot of time and effort.

The main focus of our system design was power management. We designed many power-saving schemes and were ultimately able to integrate them all into the design.

We set up two XScale platforms to run our application, and added two dummy peers that play prerecorded video on a PC. All four peers were able to connect to each other. We tested all the features we designed and implemented, and they are fully functional. The audio/video playback quality is also satisfactory.

The mechanisms we implemented can support a wider range of policies than those we have built into the system. For example, peers could dynamically negotiate a video frame transfer rate, avoiding unnecessary data transfer when the destination peer is set to a low frame rate.

From this project, we gained a great deal of valuable experience in embedded system design.

References

[1] Wireless Tools for Linux: http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html

[2] Linux/Unix command: cardmgr: http://linux.about.com/library/cmd/blcmdl8_cardmgr.htm

[3] Open Sound System: http://www.4front-tech.com/oss.html

[4] Jack Jansen's implementation of Intel/DVI (IMA) ADPCM: http://homepages.cwi.nl/~jack/

[5] libjpeg from the Independent JPEG Group: http://www.ijg.org/

[6] AIM home page: http://www.aim.com
