6.2.1 Spatial and Colour Calibration
6.2.1.1 Spatial
The calibration process now consistently produces high-quality results. However, the capture system would benefit from a feedback mechanism that informs the calibrator when the wand spheres are in view of all cameras simultaneously. This could be achieved through audio cues that report how many cameras are observing both spheres, helping the user to guide the wand. Furthermore, if the sparse bundling process were executed on samples of points while the calibrator was waving the wand, for instance at intervals of 1000 points, it could indicate how accurate the result would be, so that the process could be halted once sufficient quality was achieved. This too could be signalled to the calibrator by an audible cue.
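The feedback loop described above could be sketched as follows. This is a minimal illustration only, not the system's implementation: the `solve` callback is a hypothetical stand-in for the sparse bundling step and is assumed to return an RMS reprojection error in pixels, and the 0.5 pixel target and the requirement that at least two cameras see both spheres are assumptions chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
# One wand observation: camera id -> image positions of the detected spheres.
Observation = Dict[int, List[Point2D]]


def cameras_seeing_both(obs: Observation) -> int:
    """Count the cameras that detected both wand spheres in this frame."""
    return sum(1 for points in obs.values() if len(points) >= 2)


def calibrate_incrementally(
    frames: Sequence[Observation],
    solve: Callable[[Sequence[Observation]], float],
    batch: int = 1000,
    target_error_px: float = 0.5,  # assumed threshold, not from the thesis
) -> Tuple[int, Optional[float]]:
    """Re-run the solver every `batch` usable frames; stop once the error is low enough.

    `solve` is a hypothetical stand-in for the sparse bundling step.
    Returns (usable frames consumed, last error or None if never solved).
    """
    usable: List[Observation] = []
    error: Optional[float] = None
    for obs in frames:
        # Assumption: a frame is only useful if two or more cameras see both spheres.
        if cameras_seeing_both(obs) < 2:
            continue
        usable.append(obs)
        if len(usable) % batch == 0:
            error = solve(usable)
            if error <= target_error_px:
                break  # this is where an audible cue could be given
    return len(usable), error
```

The value returned by `cameras_seeing_both` is the number that an audio cue would report to the calibrator on each frame.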
6.2.1.2 Colour
Although the simple method of colour correction has proven effective, further work could be carried out on the advanced method to make it fully autonomous. For example, the silhouette resulting from segmentation could be fed into the process to guide the identification of the predominant colours in the areas of interest. It may also be useful to use an optical spectrum analyser to determine the exact wavelength of the light emitted by the balls, to assist with both ball choice and hue thresholding.
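One way the silhouette could guide colour identification is to histogram hue over the foreground pixels only. The sketch below assumes the image is a nested sequence of 8-bit RGB tuples and the silhouette a boolean mask of the same shape; the bin count and the saturation cut-off are illustrative assumptions, and the function name is hypothetical.

```python
import colorsys
from collections import Counter
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def dominant_hues(
    image: Sequence[Sequence[RGB]],
    mask: Sequence[Sequence[bool]],
    bins: int = 36,                # assumed: 10-degree hue bins
    top: int = 3,
    min_saturation: float = 0.2,   # assumed cut-off for near-grey pixels
) -> List[Tuple[int, int]]:
    """Return the most frequent hue bins inside the silhouette.

    Each result is (bin start in degrees, pixel count), most frequent first.
    """
    counts: Counter = Counter()
    for image_row, mask_row in zip(image, mask):
        for (r, g, b), inside in zip(image_row, mask_row):
            if not inside:
                continue  # background pixel, ignored
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            if s < min_saturation:
                continue  # near-grey pixels carry no reliable hue
            index = min(int(h * bins), bins - 1)
            counts[index * 360 // bins] += 1
    return counts.most_common(top)
```

The resulting bins could then seed the hue thresholds, rather than these being chosen by hand.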
6.2.2 Background-Foreground Segmentation
6.2.2.1 Visible Light Spectrum
The hardware constraints imposed are not fundamental issues and it has been demonstrated that, on newer hardware, the background-foreground segmentation can in fact execute in near real-time.
Although the Ground Truthing adds value and validates the results, further research could be performed to find better methods and metrics. Polygon length, for instance, could be explored.
6.2.2.2 Infrared Light Spectrum
Further experiments could use additional infrared lamps of higher quality and greater power output, matched to the exact wavelength of the cut filter present in the cameras. An example setup is shown in Figure 6-1.
Mixing IR and visible light for the segmentation could also be explored. This may be possible with correctly calibrated narrowband where a visible light is adjacent to an infrared camera.
Figure 6-1 Example of Infrared Lamp and Camera Configuration
6.2.3 System Architecture
In its current deployment the system does not support micro expressions, but this is due to hardware constraints and not to the methods employed.
The compression of the 3D mesh is currently achieved in a relatively simple manner, with each mesh being compressed in isolation from the rest of the stream. Further work could include some form of mesh sequence encoding and compression.
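As an illustration of what sequence-level encoding might look like, the sketch below stores each frame as differences from the previous frame before compressing the whole stream with LZMA. It is a simplified example under strong assumptions: vertex coordinates are flattened to a list of floats, quantised to a hypothetical 1 mm step, and vertex count and order are assumed to stay the same between consecutive frames. A reconstruction that produces a new mesh every frame may not satisfy this, in which case the code falls back to a key frame.

```python
import lzma
import struct
from typing import List, Optional, Sequence


def _pack(kind: int, values: Sequence[int]) -> bytes:
    """Serialise one frame: kind byte, value count, then 32-bit signed integers."""
    return struct.pack("<BI", kind, len(values)) + struct.pack(f"<{len(values)}i", *values)


def encode_sequence(frames: Sequence[Sequence[float]], step: float = 0.001) -> bytes:
    """Delta-encode quantised coordinates frame to frame, then LZMA-compress the stream."""
    previous: Optional[List[int]] = None
    payload = bytearray()
    for frame in frames:
        quantised = [round(v / step) for v in frame]
        if previous is None or len(previous) != len(quantised):
            payload += _pack(0, quantised)  # key frame: absolute values
        else:
            payload += _pack(1, [a - b for a, b in zip(quantised, previous)])
        previous = quantised
    return lzma.compress(bytes(payload))


def decode_sequence(blob: bytes, step: float = 0.001) -> List[List[float]]:
    """Invert encode_sequence, returning coordinates rounded to the quantisation step."""
    data = lzma.decompress(blob)
    offset = 0
    previous: List[int] = []
    frames: List[List[float]] = []
    while offset < len(data):
        kind, count = struct.unpack_from("<BI", data, offset)
        offset += 5
        values = list(struct.unpack_from(f"<{count}i", data, offset))
        offset += 4 * count
        if kind == 1:  # delta frame: add back the previous frame
            values = [d + p for d, p in zip(values, previous)]
        previous = values
        frames.append([v * step for v in values])
    return frames
```

When motion between frames is small, the deltas are mostly small integers, which a general-purpose compressor tends to handle better than independent absolute coordinates. Connectivity and texture data are not covered by this sketch.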
The end-to-end latency of the system was measured by the author to be 1.5 seconds during the streaming that took place to Germany. This was measured using audio and visual cues and provides a valid indication that the delay is too great for real-time systems. However, this was done over a TCP/IP network on which one currently has no control over the latency that can be expected. The continued adoption of IPv6 and the possibility of guaranteeing bandwidth between two end-nodes provide confidence that this latency can be reduced significantly in the future.
Using a single PC for both capture and reconstruction could be possible. Preliminary investigation suggests that a capture/reconstruction server fitted with a 10Gbps capable switch and network card, and equipped with multiple graphics cards to perform the segmentation, could reduce the requirement for multiple capture nodes.
The links between sites are currently neither authenticated nor secure. It may be of interest for further research to investigate ways of enhancing the security without impacting on the performance.
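One lightweight starting point for authenticating the link would be to append a message authentication code to each packet. The sketch below is an assumption-laden example rather than a proposal for the system's actual protocol: it presumes a pre-shared key between sites, uses HMAC-SHA256 from the Python standard library, and adds a sequence number to reject replayed packets. It provides authentication and integrity only, not confidentiality, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes
SEQ_LEN = 8


def seal(key: bytes, sequence: int, payload: bytes) -> bytes:
    """Prefix the payload with a sequence number and append an HMAC-SHA256 tag."""
    body = sequence.to_bytes(SEQ_LEN, "big") + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()


def open_sealed(key: bytes, message: bytes, last_sequence: int) -> Optional[Tuple[int, bytes]]:
    """Return (sequence, payload) if the tag is valid and the packet is not a replay."""
    if len(message) < SEQ_LEN + TAG_LEN:
        return None
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        return None
    sequence = int.from_bytes(body[:SEQ_LEN], "big")
    if sequence <= last_sequence:
        return None  # replayed or out-of-order packet
    return sequence, body[SEQ_LEN:]
```

The per-packet cost is one hash over the payload plus 40 bytes of overhead, which would need to be measured against the frame rate before drawing any conclusion about its performance impact.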
6.3 Conclusion
This thesis has presented a complete end-to-end system capable of capturing, reconstructing, streaming and finally rendering the 3D form of people and objects. It has overcome several of the problem characteristics that were identified, namely system architecture, calibration and background-foreground segmentation. It enables researchers without domain-specific knowledge to investigate telepresence and NVB, and it does so using commodity hardware. The thesis also presented an investigation into performing segmentation in the infrared spectrum, which should provide insight for future research.
Appendix A
Revision: 1 Date: 13/06/2016
3D Reconstruction System User Guide
Contents
About
Key to notations
Spatial Calibration of Cameras
Acquiring sphere coordinates
Capture Node PC Preparation
Ready the Octave
Acquire the points
Generating Calibration File
Running 3D Reconstruction and Rendering
Configuration of octave
For optimal results in the Octave research facility turn on all ceiling lights and set all wall projectors to white. It is best to leave the projectors which illuminate the floor OFF.
Synchronous video acquisition and background-foreground segmentation
3D Reconstruction
Initialising the software
Performing 3D Reconstruction