5.2.5 Participant environment

A computer lab with one central bench and a row of PCs running down each side was used to display the still images. Half of the students faced each row of PCs. One group marked the video conferencing stills and the other the virtuality telepresence stills. Neither group could see the other group’s images.

5.2.6 Analysis

Related t-tests were used to compare the identification of facial action units.
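As an illustration only, a related (paired) t-test reduces to a one-sample test on the pairwise differences between the two conditions. The sketch below uses hypothetical counts, not the study's data, and the function name `paired_t` is our own.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a related-samples t-test on two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    # Work on the per-pair differences between the two conditions.
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical counts of one action unit group under each medium.
video = [5, 7, 6, 9]
avatar = [3, 6, 4, 6]
t, df = paired_t(video, avatar)
print(f"t = {t:.3f}, df = {df}")
```

The degrees of freedom are the number of pairs minus one, which is why a test over 16 paired scores would be reported with df=15.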

5.3 Results

A qualitative analysis compared the synchronised images from both mediums in terms of facial qualities using the Facial Action Coding System (FACS).

When testing H1 (Significantly fewer facial actions will be detected through the reconstructed avatar), no significant difference was found overall. Across both mediums, coders identified a total of 626 facial action units, a mean of 4.47 per image. The video conferencing condition had 339 facial action units (mean 4.84 per image) and the virtuality telepresence condition 287 (mean 4.1 per image). There was no significant difference between the overall individual action units identified across each condition (video conferencing mean = 5.84, virtuality telepresence mean = 4.95, t=1.519, df=58, p=0.134). However, there was a significant difference between the lower facial action units identified (video conferencing mean = 7.0, virtuality telepresence mean = 4.19, t=2.858, df=15, p=0.012). The upper face action units and head position had identical scores.
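The reported p values can be cross-checked from the t statistics and degrees of freedom. The sketch below assumes two-tailed tests, which the text does not state explicitly, and integrates the Student t density numerically with Simpson's rule. The function names are our own.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t


# Values as reported in the text above.
print(f"overall:    p = {two_tailed_p(1.519, 58):.3f}")
print(f"lower face: p = {two_tailed_p(2.858, 15):.3f}")
```

Under the two-tailed assumption, both results should land close to the reported p=0.134 and p=0.012.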

5.4 Discussion

This experiment set out to determine the quality of 3D reconstructed avatars created using shape-from-silhouette. It asked: do the shortcomings of the reconstruction method reduce facial detail? The question was answered by comparing the virtuality telepresence condition against video conferencing, thereby testing whether the visual quality of the medium affects a participant’s ability to detect facial muscle movements. The following hypothesis was tested:

H1: Significantly fewer facial actions will be detected when viewing the reconstructed avatar.

The experiment examined which parts of the face the receiver could identify using facial action coding. Ten still frames from each of the mediums were scored by a group of 14 coders. The results showed no significant difference in the overall number of facial action coding units identified across video conferencing and virtuality telepresence. This suggests that, even with the known problems of shape-from-silhouette, the image quality was good enough for identifying facial movements on a par with video conferencing.
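The totals reported in Section 5.3 are consistent with this design if each condition was scored by half of the 14 coders, giving 7 coders × 10 frames = 70 scored images per medium. That split is our reading of Section 5.2.5 rather than a figure stated outright. A short arithmetic check, with hypothetical names:

```python
CODERS_PER_CONDITION = 7   # assumption: the 14 coders were split evenly
FRAMES_PER_CONDITION = 10  # ten still frames from each medium


def per_image_mean(total_action_units, coders, frames):
    """Mean number of action units per scored image."""
    return total_action_units / (coders * frames)


video_total, avatar_total = 339, 287  # totals reported in Section 5.3

video_mean = per_image_mean(video_total, CODERS_PER_CONDITION, FRAMES_PER_CONDITION)
avatar_mean = per_image_mean(avatar_total, CODERS_PER_CONDITION, FRAMES_PER_CONDITION)
# Both conditions pooled: 14 coders in total, 10 frames each.
overall_mean = per_image_mean(video_total + avatar_total,
                              2 * CODERS_PER_CONDITION, FRAMES_PER_CONDITION)

print(round(video_mean, 2), round(avatar_mean, 2), round(overall_mean, 2))
```

Under this assumption the per-image means reproduce the reported 4.84, 4.1 and 4.47, and the two condition totals sum to the reported 626.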

Fig. 26. Screen shot of virtuality avatar illustrating: 1. Slicing. 2. Lack of Concavities. 3. The droop effect.

The above results are promising for the medium of virtuality telepresence. They show that, even with the limiting factors of shape-from-silhouette, virtuality avatars contain enough detail to allow a participant to identify facial muscle movements comparably with video conferencing. One fundamental problem with the approach is shape-from-silhouette’s inability to detect concavities, although the general 3D geometry and the texturing that depicted the concavities contained enough information to allow the receiver to identify them (see Fig. 26). Other issues concerned the camera setup, where spatial calibration caused slicing (see Fig. 26). In this experiment the slicing did not appear to affect performance. However, had the slicing occurred in a different location, at the jaw perhaps, it may have been more of an issue. Droop caused by camera position may have contributed to the significant difference in the number of lower face action units identified across the mediums. This result could also have been affected by one of the participants having a beard, which might have made identification more difficult. However, the beard would have been present in both mediums. Finally, there was a problem with colour calibration causing an unrealistic skin tone, although this did not appear to affect the results.

5.5 Summary

This experiment measured the quality of facial representation afforded by video-based 3D reconstruction. This was achieved by comparing still frames from video conferencing with still frames from a 3D video-based computer graphic medium. The contribution of this work is twofold. Firstly, it demonstrates that a 3D medium aimed at faithfully reproducing a person’s appearance can be comparable to video conferencing in the overall portrayal of facial muscle movements. Secondly, it makes a methodological contribution by showing how facial action coding can be used as a measure of quality for virtual character facial expressions. Overall, there was no significant difference in the number of facial actions identified by coders across the two mediums (video conferencing mean = 5.84, virtuality telepresence mean = 4.95, t=1.519, df=58, p=0.134). This suggests that the two mediums are comparably effective at displaying facial muscle movements overall. However, there was a significant difference across conditions for the lower facial action units (video conferencing mean = 7.0, virtuality telepresence mean = 4.19, t=2.858, df=15, p=0.012).

This study has shown that a 3D graphics avatar, continually reconstructed from live video, is comparable to video conferencing in portraying the overall facial muscle movements of the person it mimics. These findings may contribute to the design of future virtual humans and communication mediums.

In document What aspects of realism and faithfulness are relevant to supporting non verbal communication through 3D mediums (Page 148-152)