Limitations and Future Work

6.2.1 Spatial and Colour Calibration

6.2.1.1 Spatial

The calibration process now consistently produces high-quality results. However, the capture system would benefit from a feedback mechanism that informs the calibrator when the wand spheres are in view of all cameras simultaneously. This could be achieved through audio cues that report how many cameras are observing both spheres, helping the user to guide the wand. Furthermore, if the calibration sparse bundling process were executed on samples of points while the calibrator was waving the wand, for instance at intervals of 1000 points, it could indicate how accurate the result would be, so that the process could be halted once sufficient quality was achieved. This too could be signalled to the calibrator by an audible cue.
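The two feedback ideas above can be sketched in a few lines. This is only an illustration under stated assumptions, not the capture system's code: the per-camera detection dictionary, the function names, the 0.5 pixel threshold and the `patience` parameter are all hypothetical, and the error values are assumed to come from a trial calibration run on each batch of points.

```python
from typing import Dict, List, Optional, Tuple

# A detection is the (x, y) image position of a sphere, or None if not seen.
# This representation is an assumption made for the sketch.
Point = Optional[Tuple[float, float]]


def cameras_seeing_both(detections: Dict[str, Tuple[Point, Point]]) -> int:
    """Count the cameras in which both wand spheres were detected this frame."""
    return sum(1 for a, b in detections.values() if a is not None and b is not None)


def should_stop(errors_px: List[float], threshold_px: float = 0.5,
                patience: int = 2) -> bool:
    """Return True once the last `patience` trial calibrations all have an
    error at or below the threshold (hypothetical values, in pixels)."""
    if len(errors_px) < patience:
        return False
    return all(e <= threshold_px for e in errors_px[-patience:])


# One frame: cam0 sees both spheres, cam1 sees one, cam2 sees none.
frame = {
    "cam0": ((10.0, 20.0), (30.0, 40.0)),
    "cam1": ((5.0, 5.0), None),
    "cam2": (None, None),
}
visible = cameras_seeing_both(frame)   # value that an audio cue could report
halt = should_stop([2.0, 0.4, 0.3])    # errors from three trial calibrations
```

Requiring several consecutive good trials, rather than one, is a design choice intended to avoid halting on a single lucky sample.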

6.2.1.2 Colour

Although the simple method of colour correction has proven effective, further work could be carried out on the advanced method to make it fully autonomous. For example, the silhouette produced by segmentation could be fed into the process to guide the identification of the predominant colours in the areas of interest. It may also be useful to use an optical spectrum analyser to determine the exact wavelength of the light emitted by the balls, to assist with both ball choice and hue thresholding.
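As a rough illustration of using a silhouette to guide colour identification, the sketch below builds a hue histogram over only those pixels that fall inside a mask and returns the centre of the most populated bin. It is an assumption-laden example rather than the advanced method itself: the flat pixel list, the boolean mask, the bin count and the saturation cut-off are all hypothetical choices.

```python
import colorsys
from collections import Counter


def dominant_hue(pixels, mask, bins=36, min_saturation=0.2):
    """Return the centre (in degrees) of the most populated hue bin among
    the masked pixels, or None if no pixel qualifies.

    pixels: list of (r, g, b) tuples with components in 0..255
    mask:   list of booleans, True where the pixel lies inside the silhouette
    """
    counts = Counter()
    for (r, g, b), inside in zip(pixels, mask):
        if not inside:
            continue
        h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue  # hue is unreliable for near-grey pixels
        counts[min(int(h * bins), bins - 1)] += 1
    if not counts:
        return None
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * 360.0 / bins


# Two reddish pixels inside the mask, one blue pixel outside, one grey inside.
pixels = [(255, 0, 0), (250, 10, 10), (0, 0, 255), (128, 128, 128)]
mask = [True, True, False, True]
hue = dominant_hue(pixels, mask)
```

The returned hue could then seed a hue threshold, although how well this works on real footage would need to be evaluated.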

6.2.2 Background-Foreground Segmentation

6.2.2.1 Visible Light Spectrum

The hardware constraints imposed are not fundamental issues and it has been demonstrated that, on newer hardware, the background-foreground segmentation can in fact execute in near real-time.

Although the Ground Truthing adds value and validates the results, further research could be performed to find better methods and metrics. Polygon length is one metric that could be explored.
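One possible reading of a polygon length metric is sketched below. It assumes, as an interpretation not stated in the text, that "polygon length" means the perimeter of the silhouette outline, and compares the segmented outline against the ground-truth outline as a ratio. The function names are hypothetical.

```python
import math


def polygon_length(vertices):
    """Perimeter of a closed polygon given as a list of (x, y) vertices."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def length_ratio(segmented, ground_truth):
    """Ratio of the segmented outline length to the ground-truth length.
    A value of 1.0 means equal length; a noisy, jagged outline would
    usually give a value above 1.0."""
    reference = polygon_length(ground_truth)
    if reference == 0:
        raise ValueError("ground-truth polygon has zero length")
    return polygon_length(segmented) / reference


# Ground truth: a 2 x 2 square. Segmented: a 3 x 2 rectangle.
truth = [(0, 0), (2, 0), (2, 2), (0, 2)]
result = [(0, 0), (3, 0), (3, 2), (0, 2)]
ratio = length_ratio(result, truth)
```

A ratio alone cannot show where two outlines differ, so a metric of this kind would probably complement area-based measures rather than replace them.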

6.2.2.2 Infrared Light Spectrum

Further experiments could use additional infrared lamps of higher quality and greater power output, matched to the exact wavelength of the cut filter present in the cameras. An example setup is shown in Figure 6-1.

Mixing IR and visible light for the segmentation could also be explored. This may be possible with a correctly calibrated narrowband configuration in which a visible light is adjacent to an infrared camera.

Figure 6-1 Example of Infrared Lamp and Camera Configuration

6.2.3 System Architecture

In its current deployment the system does not support micro expressions but this is due to hardware constraints and not the methods employed.

The compression of the 3D mesh is currently achieved in a relatively simple manner, with each mesh compressed in isolation from the rest of the stream. Further work could add some form of mesh sequence encoding and compression.
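The difference between compressing each mesh in isolation and encoding a sequence can be illustrated with a simple delta scheme. This is a sketch under a strong assumption: every frame is taken to hold the same number of quantised integer vertex coordinates in the same order, which may not hold for meshes whose topology changes between frames. LZMA from the Python standard library is used purely for illustration, and all function names are hypothetical.

```python
import lzma
import struct


def pack(values):
    """Pack a list of integers as little-endian 32-bit signed values."""
    return struct.pack(f"<{len(values)}i", *values)


def compress_isolated(frames):
    """Compress every frame on its own, as in the current approach."""
    return [lzma.compress(pack(frame)) for frame in frames]


def compress_sequence(frames):
    """Store the first frame in full and later frames as differences from
    their predecessor, then compress the whole sequence as one stream."""
    payload = [pack(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        payload.append(pack([c - p for p, c in zip(prev, cur)]))
    return lzma.compress(b"".join(payload))


def decompress_sequence(blob, n_values):
    """Invert compress_sequence; n_values is the number of integers per frame
    and must be at least 1."""
    raw = lzma.decompress(blob)
    size = 4 * n_values
    frames = []
    for offset in range(0, len(raw), size):
        vals = list(struct.unpack(f"<{n_values}i", raw[offset:offset + size]))
        if frames:  # later frames hold deltas, so add the previous frame back
            vals = [p + d for p, d in zip(frames[-1], vals)]
        frames.append(vals)
    return frames


# Five synthetic frames of 300 values, each shifted by one unit per frame.
frames = [[(i * 7) % 1000 for i in range(300)]]
for _ in range(4):
    frames.append([v + 1 for v in frames[-1]])

blob = compress_sequence(frames)
restored = decompress_sequence(blob, 300)
```

Whether such a scheme outperforms per-frame compression on real reconstructions would depend on how stable the mesh is from frame to frame, and a single stream also means one lost frame affects those that follow it.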

The end-to-end latency of the system was measured by the author to be 1.5 seconds during the streaming to Germany. This was measured using audio and visual cues and provides a valid indication that the delay is too great for real-time systems. However, the streaming took place over a TCP/IP network on which one currently has no control over the latency that can be expected. The continued adoption of IPv6 and the possibility of guaranteeing bandwidth between two end-nodes provide confidence that this latency can be reduced significantly in the future.

It may be possible to use a single PC for capture and reconstruction. Preliminary investigation shows that if the capture/reconstruction server had a 10 Gbps capable switch and network card and was equipped with multiple graphics cards to perform the segmentation, the requirement for multiple capture nodes could be reduced.

The links between sites are currently neither authenticated nor secure. It may be of interest for further research to investigate ways of enhancing security without impacting performance.
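As one low-cost starting point for authenticating the links, each message could carry a keyed hash so that the receiver can reject data that does not come from a holder of a shared key. The sketch below uses HMAC-SHA256 from the Python standard library. It assumes a pre-shared key, which is a hypothetical arrangement, and it provides authentication and integrity only: the payload is not encrypted.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed under the key."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify(key: bytes, message: bytes) -> bytes:
    """Return the payload if its tag is valid, otherwise raise ValueError."""
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return payload


key = b"example pre-shared key"       # hypothetical; would be provisioned per site
message = sign(key, b"mesh frame bytes")
payload = verify(key, message)
```

The per-message cost is one hash over the payload plus 32 bytes of overhead. Its effect on end-to-end performance would need to be measured, as would that of any alternative such as an encrypted transport.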

6.3 Conclusion

This thesis has presented a complete end-to-end system capable of capturing, reconstructing, streaming and finally rendering the 3D form of people and objects. It has overcome several of the problems that were identified, namely in system architecture, calibration and background-foreground segmentation. It enables researchers without domain-specific knowledge to investigate telepresence and NVB, and it does so using commodity hardware. The thesis also presented an investigation into performing segmentation in the infrared spectrum, which will provide insight for future research.



Appendix A

Revision: 1 Date: 13/06/2016

3D Reconstruction System User Guide


About
Key to notations
Spatial Calibration of Cameras
Acquiring sphere coordinates
Capture Node PC Preparation
Ready the Octave
Acquire the points
Generating Calibration File
Running 3D Reconstruction and Rendering
Configuration of octave
For optimal results in the Octave research facility turn on all ceiling lights and set all wall projectors to white. It is best to leave the projectors which illuminate the floor OFF.
Synchronous video acquisition and background-foreground segmentation
3D Reconstruction
Initialising the software
Performing 3D Reconstruction

In document Video based reconstruction system for mixed reality environments supporting contextualised non verbal communication and its study (Page 147-200)
