Polarisation photometric stereo

(1)

Elsevier Editorial System(tm) for Computer Vision and Image Understanding

Manuscript Draft

Manuscript Number: CVIU-17-1R1

Title: Polarisation Photometric Stereo Article Type: Research paper

Keywords: Polarisation; photometric stereo; 3D reconstruction Corresponding Author: Dr. Gary Atkinson, PhD

Corresponding Author's Institution: University of the West of England First Author: Gary Atkinson, PhD

Order of Authors: Gary Atkinson, PhD

Abstract: This paper concerns a novel approach to fuse two-source

photometric stereo (PS) data with polarisation information for complete surface normal recovery for smooth or slightly rough surfaces. PS is a well-established method but is limited in application by its need for three or more well-spaced and known illumination sources and Lambertian reflectance. Polarisation methods are less studied but have shown promise for smooth surfaces under highly controlled capture conditions. However, such methods suffer from inherent ambiguities and the depolarising

effects of surface roughness. The method presented in this paper goes some way to overcome these limitations by fusing the most reliable

information from PS and polarisation. PS is used with only two sources to deduce a constrained mapping of the surface normal at each point onto a 2D plane. Phase information from polarisation is used to deduce a mapping onto a different plane. The paper then shows how the full surface normal can be obtained from the two mappings. The method is tested on a range of real-world images to demonstrate the advantages over standalone

(2)

Dear reviewers and editor,

I am very grateful for the time and consideration that you have devoted to reviewing my paper. I was delighted to hear that the comments were mostly positive but also greatly appreciate the constructive feedback. I have addressed the issues raised and feel this has improved the paper significantly.

Please find my response to specific comments attached separately. I shall look forward to hearing your decision.

Best regards, Gary Atkinson

(3)

Reviewer #1:

“1. Do you evaluate the influence of individual steps in the proposed framework, e.g., the azimuth estimation? Which module is the major part regarding the overall accuracy?”

This is a very interesting question! I have conducted some experiments where either the polarisation data (phase and degree of polarisation) or photometric stereo data (intensity) were simulated. This showed clearly that, for simple surfaces at least, it is photometric stereo that is the main limitation. This additional analysis is now described near the end of Section 4.1 in lines 425 to 435.

“2. How is the accuracy compared to utilizing conventional polarization methods such as [11][12]?”

I have compared some of the methods in the literature review more directly to the new method now (scattered throughout lines 61 – 78). As is often the case, direct and perfectly fair comparisons are difficult due to unspecified details in certain papers and different hardware requirements (e.g. multi-view vs. multi-source). The most similar method however, is now compared numerically on line 74. Less-quantitative pros and cons compared to the state-of-the-art are provided throughout the literature review.

“3. The hyperlinks of the figures & references did not work in current pdf?”

Apologies for this. I believe it is a result of the LaTeX compiler in the submission system, which I do not have access to. The links are working when pdfLatex is used offline. I will make absolutely certain the links are all working in the published version.

Reviewer #2:

“1. More evaluating in terms of normal accuracy

As the author also correctly claimed, that normal estimation accuracy cannot be completely reflected from the integrated surface. So, it makes more sense to show the normal comparison for Fig. 6 and Fig. 9. The author uses l2 error for several evaluations, but I think the angular error statistics makes the result more intuitively understandable in Figs. 4, 7 and 8.”

I agree that a little more consideration about the error metric used for each figure was necessary and have made some amendments accordingly:

- Fig 6 (now called Fig 7): I have left this particular figure as a depth graph because surface normal error (which both the reviewer and I agree is generally more representative) is evaluated in great detail afterwards. I feel it is useful to see overall depth errors at some point early in the results section. However, I can change this to normal angular error if the reviewer insists but I think that would not add much to the paper.

- Fig. 7 and 8 (now called 8 and 9): Converted to angular error as requested – good suggestion. Fig. 9 (now 10): Converted to normal error as requested. I have also formatted this graph in the same way as Fig. 10 (now 12) for ease of comparison. However, I have also included the depth error graph as a new figure (Fig 11 in the new version) to demonstrate the effects of large angular errors on depth estimation.

- I’m a little confused about the reviewer’s reference to Fig. 4 (now 5) as this is not a graph at all. However, I am confident that the changes above satisfy this comment.

“2. Specular reflection vs. specular interreflection

When analyzing Figs. 2 and 3, the author said in line 159 that the pi/2 flip is due to specular reflection. Specular reflection can indeed cause such a flip, but the scenario in Fig. 3 is mainly due to specular interreflection. Line 251 says both direct and inter-reflective specular reflection can cause such an effect. I suggest checking the consistency of this term, and make it clear throughout the paper. If possible, the author may also show an example due to specular reflection (e.g., by painting half of the diffuse sphere

(4)

I agree with the reviewer that there is some inconsistency in terminology between “specularity”, “inter-reflection”, etc. I have therefore tidied up the terminology throughout the paper and used the terms “direct specularity” or “specular inter-reflection” throughout or simply used the term

“specular/specularity” where the text refers to either type. In addition, I have added a figure (Fig. 4 in the new version) to demonstrate a large direct specular reflection. This is explained in the last paragraph of Section 2.

“3. Minor issues

- Missing references:

[T. Ngo et al., Shape and Light Directions from Shading and Polarization, CVPR 15]: this paper combines photometric stereo and shape from polarization.

[W. Smith et al., Linear Depth Estimation from an Uncalibrated, Monocular Polarisation Image, ECCV 16]: this is one of the latest paper proposing a purely polarization based approach for complete normal estimation.”

Thank you for suggesting these references. I strongly agree that these deserve a mention and have covered both in Section 1.1 (between lines 72 and 85).

“- It should be highlighted in the conclusion that a major limitation of using two-source photometric stereo is that it cannot deal with non-Lambertian materials (it could be quite sensitive to material reflectance variation due to too few light sources). But shape from polarization is not that sensitive to reflectance properties and thus can deal with much more broader range of material types. I think combining with two-source photometric stereo somehow sacrifices such an advantage of shape from polarization.”

(5)

A method for surface normal recovery using photometric stereo and polarisation

Two source photometric stereo used for 2D normal estimation

Polarisation information used for azimuthal angle estimation

Unknowns in normals due to ambiguities are resolved via region growing across surface

Method is proven robust to specular inter-reflections and evaluated extensively

(6)

Polarisation Photometric Stereo

Gary A. Atkinson

Bristol Robotics Labratory

Department of Engineering Design and Mathematics University of the West of England, Bristol

BS16 1QY UK

Abstract

This paper concerns a novel approach to fuse two-source photometric stereo

(PS) data with polarisation information for complete surface normal recovery

for smooth or slightly rough surfaces. PS is a well-established method but is

limited in application by its need for three or more well-spaced and known

illumination sources and Lambertian reflectance. Polarisation methods are less

studied but have shown promise for smooth surfaces under highly controlled

capture conditions. However, such methods suffer from inherent ambiguities

and the depolarising effects of surface roughness. The method presented in this

paper goes some way to overcome these limitations by fusing the most reliable

information from PS and polarisation. PS is used with only two sources to

deduce a constrained mapping of the surface normal at each point onto a 2D

plane. Phase information from polarisation is used to deduce a mapping onto

a different plane. The paper then shows how the full surface normal can be

obtained from the two mappings. The method is tested on a range of real-world

images to demonstrate the advantages over standalone applications of PS or

polarisation methods.

Keywords: Polarisation, photometric stereo, 3D reconstruction

1. Introduction

Three-dimensional surface reconstruction/analysis has been a major goal of

computer vision for many years. There have a plethora of methods proposed

(7)

in the last few decades, each with their unique pros and cons. For example,

binocular stereo [1] applies at any range and is inexpensive but inappropriate for

5

featureless surfaces. By contrast, time-of-flight cameras [2] are known to operate

on a variety of scenes at high speed but suffer at long range. This paper uses

a fusion of photometric stereo (PS) and polarisation vision aimed at featureless

surfaces at close range: a goal that typically requires expensive methods that

use lasers or active illumination, else only operate at low resolution.

10

The proposed method initially applies a form of two-source PS to estimate

part of the surface normal. This is much more applicable to real world

appli-cations than most PS methods that require three or more sources on different

illumination planes. Polarisation information is then used in conjunction with

the PS-based estimates in order to fully constrain the normals at each pixel.

15

The paper therefore represents a significant step forward towards commercial

robotics applications able to exploit a largely untapped area of light: that of its

polarisation state.

This paper starts by reviewing the state-of-the-art in three-dimensional

sur-face estimation using polarisation. Then, in Section 2, the key background

20

theory from Fresnel Theory and polarisation imaging is provided. The novel

re-construction algorithm is described in detail in Section 3. Experimental results

and conclusions are provided in Sections 4 and 5 respectively.

1.1. Related work

One of the early proposed methods for surface shape estimation was that

25

of shape-from-shading, which aims to acquire geometry from an image using

intensity information only [3]. Due to its single-view and passive nature, it is

an attractive goal and does not suffer from the limitations of the above

ex-amples (binocular stereo and time of flight). Unfortunately, the method is

known to be ill-posed due to the bas-relief ambiguity and other reasons [4]. PS

30

was proposed as a means to overcome the ambiguities at the expense of the

need for three or more light sources, of known direction, illuminated separately

(8)

method become useful in many applications due to the relative ease of high

speed camera–illumination synchronisation [5]. However, the applications are

35

still highly limited due to the need for at least three sources. For example, for

many land-based robotics applications, it is easy to include two light sources

(to the far-left and far-right of the robot), but it is difficult to position a source

high above the ground. That said, Zhang et al. [6] apply geometric assumptions

to force a solution for smooth surfaces with two-source PS.

40

A less-studied approach to 3D vision problems is that ofpolarisation vision.

That is, the use of the polarisation state of reflected light to deduce surface

information [7]. As with the field of hyperspectral imaging, the method taps into

an entire set of information about the incoming light (polarisation state/spectral

composition) that a standard greyscale or RGB camera is unable to. Thus, one

45

may argue that a great deal of potentially useful information about the incoming

light is lost in standard vision techniques. Of course, the additional information

available from polarisation and hyperspectral imaging systems comes at the cost

of more expensive and often slower capture hardware.

Most previous computer vision research using polarisation uses Fresnel

the-50

ory applied to specular reflection [8]. The theory tells us that initially

unpo-larised illumination undergoes a partial linear polarisation process upon

reflec-tion from surfaces. The specific properties of the polarisareflec-tion correlate to the

relationship between the surface orientation and the viewing direction [7, 9].

Unfortunately, there are inherent ambiguities present (see Section 2) and the

55

refractive index of the target is typically required. Further, different equations

are required to model reflection if adiffuse component is present.

Miyazaki et al. [10], Atkinson and Hancock [11] and Berger et al. [12] all use

multiple viewpoints to overcome the ambiguity issue. In [10], specular reflection

is used on transparent objects – a class of material that causes difficulty for most

60

computer vision methods. In [11], a patch matching approach is used for

dif-fuse surfaces to find stereo correspondence and, hence, 3D data. Results are of

comparable quality to this paper, but cannot easily handle

(9)

is applied. Taamazyan et al. [13] use a combination of multiple views and a

65

physics-based reflection model to simultaneously separate specular and diffuse

reflection and estimate shape (Miyazaki et al. [14] also separate reflection

com-ponents but only as a pre-processing step).

Drbohlav and ˇS´ara [15, 16] use PS (as in this paper) but with linearly

po-larized incident illumination. Atkinson and Hancock [17] also use PS but with

70

more light sources and less resilience to inter-reflections than the method of this

paper. Ngo et al. [18] propose a similar combined polarisation/two-source PS

method to the one presented in this paper with optimal results reported to be

of the order 2◦ error from ground truth (this paper manages of 3−4◦).

How-ever, the Ngo paper focuses on estimating the lighting directions rather than

75

addressing complications due to specularities/inter-reflections (which was a key

motivation here).

Garcia et al. [19] use a circularly polarised light source to overcome the

ambiguities while Morel et al. [20] extend the methods of polarisation to metallic

surfaces by allowing for a complex index of refraction. Smith et al. [21] assume

80

a known constant refractive index and albedo but are able to pose the problem

as a system of linear equations. This allows to directly deduce depth up to a

concave/convex ambiguity in the presence of both diffuse and specular reflection.

Furthermore, the method is successfully applied to conditions of uncontrolled

illumination and only relies on a single view.

85

In an effort to overcome the need for the refractive index, Miyazaki et al.

con-strain the histogram of surface normals [14] while Rahmann and Canterakis

[22, 23] use multiple views and the orientation of the plane of polarisation for

correspondence. Huynh et al. [24] actually estimate the refractive index using

spectral information. Kadambi et al. [25] propose a method to allow polarisation

90

information to be combined with alternative depth capture methods resulting

(10)

1.2. Contribution

The motivation for this paper relates to the following weaknesses of existing

methods in related areas: firstly the need for three light sources in PS; secondly

95

the multiple ambiguities in polarisation; and thirdly the complexities due to

specular inter-reflections and diffuse reflection in both PS and polarisation. The

contributions are as follows:

• Exploit the advantages of PS methods but using only two sources.

• Reduce the dependency of PS on light source knowledge.

100

• Use a combination of PS and polarisation data with a novel region

grow-ing algorithm to deduce the full 3D surface normals of an object in the

presence of specular inter-reflections (assuming that the majority of the

surface is not specularly reflecting). The method is also somewhat robust

to direct specularities, depending on their relative intensity compared to

105

the diffuse component.

• Avoid the need to know or estimate the refractive index.

As stated earlier, it is thought that overcoming these limitations allows a

signif-icant step towards the usage of polarisation information in real-world robotics

applications.

110

2. Polarisation Vision

This paper is based on the premise that light undergoes a partial

polarisa-tion process upon reflecpolarisa-tion from smooth surfaces. Consider the experimental

arrangement shown in Fig. 1, which is used for all of this work. Fresnel

re-flectance theory [8] is able to quantify the polarisation process that occurs when

115

initially unpolarised incident light is reflected towards the camera. The reflected

light can be parametrised by three values. First is the intensity,I, of the light.

Second is the phase angle,φ, which defines the principle angle of the electric field

component of the light as shown in the figure. Finally, the degree of

polarisa-tion,ρ, indicates the level of polarisation from 0 (unpolarised) to 1 (completely

120

(11)

x

z

y

b₂ f

b q

a

N

1

Figure 1: Arrangement and definitions. Two polarisation images are acquired using the camera and rotating polariser. Each image has a different source illuminated. The surface normal,N, is defined by its zenith angle,θ, and azimuth angle,α.

As explained elsewhere in the literature [7, 26], for specularly reflected light

the phase angle aligns perpendicularly to the projection of the surface normal

onto the image plane. Assuming that the Wolff subsurface scattering model [27]

applies, diffuse reflection causes parallel alignment. Since there is no

distinc-125

tion in phase angle shifts ofπ radians, there are therefore four possible surface

azimuthal angles (defined in Fig. 1) for an unknown reflectance type:

α∈

φ, φ+π

2, φ+π, φ+ 3π

4

where 0≤φ < π (1)

The degree of polarisation contains information principally related to the

zenith angle of the surface. Unfortunately, the relationship between zenith

an-gle and degree of polarisation varies substantially between specular and diffuse

130

reflection and depends on the refractive index of the reflecting surface, which is

typically unknown [26]. For this reason, its use for surface normal estimation is

avoided in this paper. The method does however use the degree of polarisation

as a measure of reliability of the phase data: any phase data with a

correspond-ing degree of polarisation below a threshold (fixed at 1%) is deemed unreliable

135

due their correspondingly high associated noise levels.

The method described in the next section requires two polarisation images of

an object for input (a single polarisation image comprises a separate intensity,

phase and degree of polarisation value for each pixel). There are now several

commercially available cameras that capture such data directly such as the

[image:11.612.167.451.126.237.2]

(12)

Fraunhofer “POLKA” [28]. The POLKA has a sensor divided into 2×2 cells,

with each element sensitive to light of a given electric field orientation (0, 45◦,

90◦ and 135◦). If the intensities measured for these angles are called I0, I45,

I90 andI135 respectively, then the polarisation data for each cell is calculated

via the Stoke’s parametersS0, S1 andS2 (assuming no circular polarisation is

145

present) [8]:

Stoke’s parameters:

S0=

I0+I45+I90+I135

2 (2)

S1=I0−I90 (3)

S2=I45−I135 (4)

Polarisation image data:

150

I=S0 (5)

φ= 1

2arctan2

S2 S1

(6)

ρ=

p

S2 1+S22 S0

(7)

where arctan2 is the four quadrant inverse tangent [29].

Raw data for this paper were captured using a Dalsa Genie HM1400 camera

with a motor-controlled circularly polarising filter in front of the lens. The motor

155

rotated the filter to angles of 0, 45◦, 90◦ and 135◦between each image capture,

after which (2–7) were applied to calculate the polarisation image; following a

similar method to the POLKA. A circular filter was used, rather than linear, in

order to minimise effects of any inter-reflections in the lens (Schneider KMP-IR

Xenoplan 23/1,4-M30,5) [30]. The rotating polariser method is clearly much

160

slower (but cheaper) than using the POLKA (or similar products) and thus

restricts application to stationary objects. However, the rest of the paper is

independent of the capture method and so could easily be applied to moving

objects if such cameras were used with this method in the future.

Examples of the three components of the polarisation image for a white

165

snooker ball are shown in Figs. 2 to 4. Note that the intensity is normalised

(13)

ball illuminated by a single small white LED placed close to the camera and

blackout curtains surrounding the ball to diminish all inter-reflections from the

environment. The reflection is therefore of diffuse type throughout the surface

170

and so the first or third solutions to (1) must be true for all pixels.

For Fig. 3, a plain matte white board was placed behind the ball. A sudden

phase shift is present near the occluding contours of the ball due to an

inter-reflection from the board. Since this region is effectively undergoing a specular

inter-reflection, the second or fourth solutions to (1) are true. By contrast,

175

the first or third solutions are true for the rest of the ball, which is exhibiting

diffuse reflection. Note also, that the degree of polarisation is higher near the

inter-reflection on the right-hand side of the ball. This is also predicted by the

reflection theory but is of less significance to this paper [26]. The noise-ridden

background to Fig. 3 is due to the very low degree of polarisation for the board

180

and the shadow cast by the ball; neither of which are of significance here.

The polarisation image in Fig. 4 includes an extended light source located to

the upper-right of the ball. The phase shift is clearly present here, demonstrating

the specular nature of the direct reflection. Careful examination of the degree of

polarisation demonstrates some inter-reflections from the laboratory are present

185

just below the main specularity. Since the intensity from these inter-reflections

is very low, the dominant reflection type is diffuse there.

3. Method

The approach can be broadly divided into the following steps assuming that

the starting point is two full polarisation images corresponding to the two light

190

source locations shown in Fig. 1.

1. Extract estimates of 2D (y-z) surface normals from two-source PS.

2. (a) Extract estimates of 2D (x-y) surface normals from polarisation aided

by PS.

(b) Disambiguate spurious data or specular inter-reflections (or,

poten-195

(14)

(a)

0 0.5 1

(b)

0 45 90 135 180

(c)

[image:14.612.139.473.157.300.2]

0 0.1 0.2 0.3

Figure 2: Example polarisation image of a white snooker ball with no inter-reflections and only one specularity (which is saturated in the image). (a) IntensityI, (b) phase angleφ, (c) Degree of polarisation,ρ.

(a)

0 0.5 1

(b)

0 45 90 135 180

(c)

0 0.1 0.2 0.3

[image:14.612.140.472.422.562.2]

(15)

(a)

0 0.5 1

(b)

0 45 90 135 180

(c)

[image:15.612.141.472.126.265.2]

0 0.1 0.2 0.3

Figure 4: Example polarisation image of a white snooker ball with a direct specularity from an extended source and a sequence of uncontrolled environmental inter-reflections on the right-hand side. (a) IntensityI, (b) phase angleφ, (c) Degree of polarisationρ.

3. Combine data from each modality to obtain 3D (x-y-z) surface normals.

The first step essentially applies a 2D version of PS to obtain a 2D surface

normal at each pixel. This is not merely a projection of the 3D normal onto

they-z plane but rather a vector representing one particular degree of freedom

200

of the orientation of the surface at each point. The second step uses the

po-larisation phase angle of the incoming light and assumes diffuse reflection (for

now) to estimate a 2D version of the surface normal in thex-y plane. PS data

is applied to disambiguate most of these vectors but region growing is needed

where specularities occur or where previous disambiguation is deemed incorrect

205

(as defined in Section 3.2.2). Finally, the information above is combined to form

a 3D surface normal map.

3.1. Application of PS

The aim here is to use the method of PS to estimate a 2D normal in they-z

plane at each point. To do this, the original methodology of Woodham [31] is

210

adapted to two images in order to obtain a set ofnnormals as follows:

 



N(ps)_i =





N_y,i(ps)

N_z,i(ps)

     n i=1 ←  

−cos(βR) cos(βL)

sin(βR) −sin(βL)     IL,i IR,i 

(16)

where the angles βL and βR are defined in Fig. 1 and the suffix “ps” is a

reminder that these estimates are from photometric stereo. The “←” symbol

is used throughout this paper to refer to variable assignment. The vectors are

then normalised into unit vectors in preparation for fusion with the polarisation

215

data.

One of the aims of this paper is to ensure that the method is robust to

disruptions in illumination. For example, standard PS yields poor results if one

source appears much brighter than the other or if the distance to the target

object varies (as this affects the light source vector). The results for this paper

220

were obtained with accurately positioned illumination so this potential issue

does not manifest. However, it is possible to use the phase information from

polarisation to calibrate the light source intensities assuming a linear (or at

least known) camera response profile. This is done by scaling the intensities

such that areas where φ = 90◦ _{or 270}◦ _{have equal} _I _{values. Further, it is}

225

worth noting at this point that the distance-to-target issue is less pronounced

here than standard PS since only thez-component of the normal is affected by

incorrect source vectors.

3.2. Application of polarisation vision

This section describes the method to estimate 2D normals in thex-yplane. It

230

assumes that the method for polarisation image acquisition described in Section

2 has been applied to arrive at corresponding intensity, phase and degree of

polarisation values for all npixels: {Ii, φi, ρi}n_i₌₁. This section also makes use

ofnN(ps)_i on

i=1

. This part of the algorithm is in two parts, the code for which is

summarised in Algorithm 1.

235

The raw data yields two estimates of polarisation values for each pixel (one

corresponding to each light source direction). In theory, the data should be

identical between each polarisation image, except for the locations of

speculari-ties and shadows. The proposed algorithm first forms a new polarisation image

corresponding to what is deemed the most reliable data for each pixel. Two

240

(17)

corresponding to the highest intensity at each point, (2) use the polarisation

data from the image corresponding to smallest deviation between raw data and

the transmitted radiance sinusoid [26]. While the latter approach is perhaps

more technically sound, it was experimentally determined that the final quality

245

of results were very similar between the two approaches. For this reason, the

former more computationally efficient approach was used for the remainder of

the research.

3.2.1. Initial azimuth estimation

This is covered by the first three lines of Algorithm 1. The first line applies a

250

sharpening operator to the phase data. The reason for this that the subsequent

specularity detection/disambiguation algorithm (“localAlign”) is more reliable

in the presence of sharp transitions between specularities and diffuse regions.

An alternative to this is to increase the definition of a neighbourhood in the

subsequent function, allowing the method to bridge over any gradual transitions.

255

The method makes the initial assumption that all surface normal projections

Algorithm 1Input: nφi, ρi,N (ps) i

on

i=1 Output:

n N(po)_i o

n

i=1

1: {φi} ←sharpen ({φi})

2: {αi} ←  



φi ∀i|

N_y,i(ps)<0∧φi>π₂

∨N_y,i(ps)>0∧φi< π₂

φi+π otherwise

3:  



N(po)_i =





N_x,i(po)

N_y,i(po)

     n i=1 ←  

−sinαi

cosαi 

 ∀i

4: {Ri} n i=1←

 



1 ∀i|isBorderPixel(i)∨ρi<0.01

0 otherwise

5: while[More seed points?] do

6: j←Get seed point

(18)

onto thex-y (image) plane are aligned parallel to, or anti-parallel to, the phase

angle, as predicted by the theory for diffuse reflection covered in Section 2. Line

2 sets the azimuthal angle, α, of each point accordingly. Whether said

projec-tions are parallel or anti-parallel depend on the best match to the estimated 2D

260

normal from PS. Line 3 simply generates a set of 2D surface normals on the

x-y plane using the calculated azimuth angles. The suffix “po” reminds us that

these estimates are primarily from polarisation.

3.2.2. Final azimuth estimation

This is covered by the last four lines of Algorithm 1. It is expected that

265

most of the normals will be correct at this point. However, it is likely that some

regions of the image will be incorrect due to one of the following reasons:

• Diffusely reflecting regions with incorrect disambiguation. This is typically

whereNy(ps)≈0, meaning the initial disambiguation is not robust. These

regions have an azimuth error orπradians.

270

• Specular (direct or inter-reflective) regions. In these areas the azimuth

angle error isπ/2 radians, as predicted by the theory described in Section

2.

Lines 4 to 7 of Algorithm 1 are intended to address these possibilities, while

enforcing (1). First, a set of pixels,{Ri} n

i=1 are determined that shouldnot be

275

further considered by the algorithm. In the first instance, these pixels

corre-spond to:

• Image border pixels: purely to avoid slowing down the algorithm with

numerous conditional statements.

• Those with very low (<1%) degree of polarisation: since these areas are

280

deemed to have too much noise to be of use.

Next, a seed point is chosen. This can easily be done either manually or

randomly using a point of high confidence (i.e where the degree of polarisation

is high and eitherNy(ps)0 orN

(ps)

(19)

Algorithm 2 “localAlign”.

Input: nN(po)_i , Ri on

i=1

, j Output: nN(po)_i

on

i=1

1: if [Rj = 1]then

2: return

N(po)_{, R}

3: Rj ←1

4: for[k←neighboursOf (j)]do

5: ∆α←arccosN(po)_j ·N(po)_k

6: if [small (∆α)]then

7: N(po) ←localAlign N(po), R , k

8: if [small (π−∆α)]then

9: N(po)_k ← −N(po)_k

10: N(po) _←_localAlign

N(po)_{, R} _{, k}

11: if

small π₂ −∆α

then

12: Nrot← 



0 1

−1 0



·N (po) k

13: if harccosN(po)_j ·Nrot

<π 2 i

then

14: N(po)_k ←Nrot

15: else

16: N(po)_k ← −Nrot

17: N(po) _←_localAlign

N(po)_{, R} _{, k}

this point must be of diffuse reflection so may benefit from some heuristics-based

285

selection in future work. Assume for now that only one seed point is needed,

but note that the code permits more if necessary (e.g. for complicated shapes).

The rest of the process involves a recursive call to a region growing function,

(20)

The region growing works as follows (see Algorithm 2). First, the function

290

checks whether the point in question j should be considered by reference to

{Ri} (lines 1 and 2). On occasions whereRj = 1, the function call terminates.

Where this is not the case, pointj is added to{Ri}, i.e. so it is not considered

again (line 3). The remainder of the function is completed for each neighbour

of the point (line 4). The neighbourhood can be defined in multiple ways but a

295

simple 4-connected region is used for this paper.

Next, the angle between neighbouringx-ysurface normals is calculated, ∆α

(line 5). There are four possibilities for ∆α:

• ∆αis close to zero: assume both azimuth angles are correct, move fromj

to neighbouring point,k, and continue the process (lines 6 and 7).

300

• ∆αis close toπ: as above but assume azimuth disambiguation was

incor-rect so rotate byπ(lines 8 to 10).

• ∆α is close to π/2: rotate azimuth by π/2 since this region is likely

fol-lowing specular reflection. The direction of rotation is chosen to minimise

the angle between the normals atj and k. Again, move from j to kand

305

continue the process thereafter (lines 11 to 17).

• Otherwise: there could be a boundary of orientation so there is no reason

to believe a the azimuth estimate is erroneous.

The definition of “close” for this recursive function remains open for now but

only minor effects on results were observed as the threshold for closeness was

310

varied between 5◦ and 10◦.

3.3. Fusion of data into full 3D surface normals

The methods from the previous two sections give two sets of 2D surface

normals:

N(ps) _{on the} _y_-_z _{plane and}

N(po) _{on the} _x_-_y _{plane. These can}

be combined into a single 3D vector. First assume that the 2D vectors are

315

normalised such that the following is true for each point:

(21)

where the subscriptsiare omitted for the sake of brevity.

Denote the 3D surface normal at a particular point N = [Nx, Ny, Nz]T.

Further, choose the components estimated by polarisation forxand y so there

is only one unknown,Nz:

320

N= [Nx, Ny, Nz] T

=hN_x(po), N_y(po), Nz iT

(10)

Normalising this vector to unit length gives:

n=

h

Nx(po), N (po) y , Nz

iT

p

1 +N2

z

=hn(po)x , n(po)y , nz iT

(11)

This can then be re-normalised such thatn(po)y 2

+n2z= 1, matching the form of

the 2D estimate from PS as in (9). For thez component specifically:

nz q

n(po)y 2

+n2 z

=N_z(ps) (12)

Substituting the components ofnfrom (11) and simplifying:

N_z(ps)= _q Nz

Ny(po) 2

+N2

z

(13)

Rearranging:

325

Nz= N (po) y 1

Nz(ps) 2−1

!−1/2

(14)

where the|·| sign is used sinceNz must be positive. This means that the only

unknown in (10) is resolved and so the surface normal is fully determined.

As stated earlier, the regions of the images where the degree of

polarisa-tion is less that 1% are not used in the algorithm to ensure robustness. For

the results in the next section, bi-cubic interpolation was used to estimate the

330

normals for these areas. It was also found that improvements could be made

by interpolating over regions of very lowNy, although this was kept to a

min-imum. It is acknowledged however, that more sophisticated methods from the

(22)

The normals can be converted to depth using any of a number of integration

335

methods [32]. Since it is not the aim of this paper to develop a new

integra-tor, the well-established Frankot-Chellappa method [33] is primarily used for

this step. The method is fast and highly robust to noise, but can suffer from

over-smoothing. An alternative method that does not over-smooth is briefly

considered in Section 4.3.

340

4. Results

This section contains a detailed analysis of the performance of the method

for a range of target objects. In the first instance, a thorough numerical

evalu-ation for the surface normals and height estimevalu-ations are carried out for a white

snooker ball under varying conditions and compared to ground truth and

base-345

line methods. Subsequently, a range of more challenging shapes and textures

are considered.

4.1. Spherical objects

Consider Fig. 2, the polarisation image for a white snooker ball captured in

ideal conditions as described in Section 2. The angle between the camera and

350

light sources wasβL=βR= 19.6±0.5◦. One hundred images were captured at

each of the four polariser angles (0, 45◦, 90◦ and 135◦, aligned with resolution

to the nearest degree) and the mean intensity used at each pixel to minimise

noise. As expected, the phase angle directly relates to the surface azimuth up

to a 180◦ambiguity and the degree of polarisation is highest near the occluding

355

contours [26].

As mentioned earlier in this paper, and elsewhere in the literature, among

the issues facing polarisation-based computer vision algorithms are the

compli-cations caused by the simultaneous presence of specular and diffuse reflection.

To test the robustness of the method proposed here, a second polarisation

im-360

age was captured in exactly the same conditions as for Fig. 2 but with a large

(23)

[image:23.612.201.416.132.239.2]

Figure 5: Surface normals and depth estimated from the polarisation image shown in Fig. 2 (and its right-illuminated counterpart). Surface normals are encoded by colour (azimuth) and saturation (zenith).

object. The resulting image is shown in Fig. 3, as also discussed in Section 2.

The specular inter-reflection can only just be seen in the intensity image, yet

the 90◦ phase shift and increased degree of polarisation make it clear in those

365

two components of the polarisation image.

The results of applying the surface normal estimation algorithm to the

po-larisation images shown in Figs. 2 and 3 are shown in Figs. 5 and 6,

respec-tively. The results are qualitatively good although the region of interpolation

(described in Section 3.3) near the top-centre and bottom-centre of the normal

370

maps is clearly apparent. Note that the data has been cropped here so only the

object itself is being integrated.

Fig. 7 shows a comparison of the profile of the reconstructions from Figs. 5

and 6 to ground truth data. To aid comparison, the profiles are aligned such

that the tops of each curve are touching. Very similar results were found for

375

pink and yellow balls, while blue and green results were slightly poorer due to

their lower brightness causing higher noise levels.

In order to optimise the surface normal estimates with regards to the light

source separation, both simulated and real-world data were analysed for varying

light source angles. For the simulated data, images were generated assuming no

380

inter-reflections/specularities were present and a range of illumination angles,

(24)

[image:24.612.199.416.178.282.2]

Figure 6: Surface normals and depth estimated from the polarisation image shown in Fig. 3 (and its right-illuminated counterpart). Surface normals are encoded by colour (azimuth) and saturation (zenith).

No backing With backing Ground truth

[image:24.612.194.422.445.587.2]

(25)

5 10 15 20 25

Source angle (deg)

0 1 2 3 4 5

ε

(d

eg

)

[image:25.612.209.389.131.287.2]

0 0.005 0.01 0.015

Figure 8: Surface normal errors using synthetic data with varying levels of Gaussian noise added (standard deviations shown in legend).

ranging from 0 (no noise) to 0.015 (typical for the camera settings used). The

results are shown in Fig. 8, where the performance metric used is the median

angular error,ε, between the method and ground truth. This metric has been

385

chosen since the more obvious choice of height residuals would also be assessing

the integration method used, which is not the contribution of the current paper.

The results to the analysis of simulated data show the following:

• The surface normals are not perfectly estimated even with no noise present

as a result of small surface regions being interpolated over. There may

390

also be some edge-effects degrading results slightly.

• There is relatively little variation in performance with illumination angle

for most noise levels.

• There is an optimum illumination angle of about 15◦ for higher noise

levels. This is due to the fact that (1) at very small angles the difference

395

between intensity images is very small which is poor conditioning for PS,

and (2) at large angles major areas of shadow are present.

Experimental results for varying the illumination angle are shown in Fig. 9.

(26)

10 15 20 Source angle (deg) 0 1 2 3 4 5 6 ε (d eg ) (a)

10 15 20

Source angle (deg) 0 0.02 0.04 0.06 0.08 z er ro r (b)

D = ∞

D = 128mm

D = 257mm

PS only (approx.) Pol only

Figure 9: (a) Angular errors and (b) height errors for the white snooker ball with a white board at various distances behind the ball,D. Height values are in units such that a value of 1 corresponds to a distance of one radius. Best case results for PS-only and polarisation-only are also indicated.

the focus here is on the former due to reasons already mentioned.

Unsurpris-400

ingly, the magnitudes of the errors are slightly higher than for simulated data

for most cases. Surprisingly however, the trend towards an optimum separation

is reversed for the real data. It is not entirely clear why this should be the case

but repeated experiments have yielded similar trends. It was therefore decided

to use 20◦ illumination angles for the remainder of the research (higher values

405

than this were neither practical for the experimental rig being used nor for most

real-world applications).

Fig. 9 also shows results with the white board re-introduced at several

dis-tances and the baseline results for standard photometric stereo and single-source

polarisation. The results with the board present show a strong performance by

410

the method at overcoming specular inter-reflections, which is one of the key

strengths of the method. Indeed, there is evidence that in some cases the

recon-structions may be even better when strong specular behaviour is present. For

[image:26.612.143.470.121.312.2]

(27)

that gave the best surface normal estimates for that method (which was

exper-415

imentally determined to be 25◦). For the single-source polarisation baseline, a

single polarisation image was used with the illumination source as close to the

camera as possible (βL =βR≈5◦ in practice) in order to minimise shadowed

regions. Regions of the image were manually selected in order to disambiguate

the azimuth angles and the zenith angles were calculated according to the Wolff

420

diffuse reflection model with a refractive index of 1.5 [7]. At first sight, the

baseline polarisation method appears comparable to the novel method, but it

is reiterated that the novel method is superior as it does not require manual

disambiguation nor knowledge of refractive index.

For the case with no board present behind the ball, a test on partially

sim-425

ulated data was conducted in order to analyse the relative effect of the PS and

polarisation elements of the method. Average angular errors for (1) entirely

real; (2) simulated polarisation (φandρ); and (3) simulated intensity (I) data

were: 7.6◦, 6.4◦and 2.9◦respectively. This is rather striking as it shows the PS

component has by far the largest associated error. This is presumably due to

430

the resilience of polarisation phase data against reflectance properties compared

to PS and the lesser need for carefully positioned light sources. It should be

noted however, that such a breakdown of experimental error may change for

certain target objects, such as where large areas of the surface have very low

degree of polarisation.

435

These areas of low degree of polarisation have associated high noise levels

and may be a potential weakness of the method. Results so far in this paper

overcame this in a rather expensive manner by capturing 100 images in very

rapid succession for each of the four polariser angles. The effects of using fewer

images is illustrated by Fig. 10 which shows the estimated surface normal (y

-440

andz-components only for compactness) using 1, 5, 20 and 100 images. At first

sight, the quality of results is poor when only few images are used. However,

the increased noise has a relatively low impact on surface height reconstruction,

as shown in the Fig. 11. This is due to the facts that (1) the Frankot-Chellappa

surface integrator is smoothing over the noise, and (2) the higher noise regions

(28)

-1 -0.5 0 0.5 1 Distance from axis -1

-0.5 0 0.5 1

Ny

-1 -0.5 0 0.5 1

Distance from axis 0.2

0.4 0.6 0.8 1

Nz

[image:28.612.144.464.127.277.2]

1 image 5 images 20 images 100 images

Figure 10: Comparison between surface normal estimates for varying number of images per polariser angle for a cross-section of the snooker ball.

50 100 150 200 250

Image column number -100

-50 0

z

1 image 5 images 20 images 100 images

Figure 11: Comparison between surface depth estimates for varying number of images per polariser angle for a cross-section of the snooker ball.

correspond to areas of low zenith angle which have smaller effects on the

inte-gration than those of larger zenith angle.

4.2. Other objects

Surface normal estimates for two cylinders are shown in Fig. 12. For this

case, the cylinders are both porcelain: one pure white and one multicoloured,

450

as shown in Fig. 13. The results clearly show a close match to theory; with the

main discrepancy being at the occluding contours. For the case shown, the axes

of the cylinders were oriented parallel to thex-axis (see Fig. 1). Unfortunately,

the method completely fails for the case where the cylinder axis is parallel to

y-axis. This is due to the fact that all pixels on the surface in such cases appear

[image:28.612.189.421.322.436.2]

(29)

-1 -0.5 0 0.5 1 Distance from axis -1

-0.5 0 0.5 1

Ny

-1 -0.5 0 0.5 1

Distance from axis 0

0.2 0.4 0.6 0.8 1

Nz

[image:29.612.138.471.128.282.2] [image:29.612.267.343.327.424.2]

Ground truth White cylinder Multicolour cylnder

Figure 12: Comparison between surface normal estimates and ground truth for a white porce-lain and multicoloured porceporce-lain cylinder.

Figure 13: Photograph of the multicoloured cylinder used for Fig. 12.

equally bright from each light source, making seed selection and region growing

nearly impossible: a clear weakness of the method which will be addressed in

future work.

Tests on objects of more complicated geometry are shown in Figs. 14 and

15. The general geometries of the fruits in Fig. 14 have clearly been reproduced,

460

albeit with over-smoothing in places. The model dinosaur in Fig. 15 also has

reasonable shape, despite a few areas of incorrect disambiguation on the foot

and head. It is hoped that such issues might be resolved in future work by

(30)

[image:30.612.166.479.198.311.2]

Figure 14: Surface reconstructions of an apple and lemon.

[image:30.612.201.406.475.587.2]

(31)

4.3. An alternative integrator

465

The choice of the Frankot-Chellappa integration method for conversion from

normals to depth was made since (1) it is well-established in the field and (2) it

operates at high speed (much less than one second to reconstruct a VGA frame).

However, the method is not very effective in certain circumstances. The most

known of its weaknesses is its tendency to over-smooth surfaces as a result of

470

its basis functions being unsuited to discontinuities.

Another weakness, which is more specific to the method of this paper,

re-lates to the fact that the integrator applies equal weighting to all points on the

surface. While this is sensible for most reconstruction methods, the approach

here may benefit from weighting certain pixels more than others. Specifically,

475

it is beneficial to give a higher weight to pixels of higher degree of polarisation

since those are less prone to noise/distortion.

An alternative integrator that is much less affected by discontinuities and

allows for variable pixel weighting is that of the M-estimator [34]. Fig. 16

demonstrates the advantage of the M-Estimator approach using the underside

480

of a porcelain bowl as a test object. The figure first shows the result of using

the Frankot-Chellappa method. The result is highly distorted at the base of the

bowl since the degree of polarisation is very low here over a wide area. In an

effort to mitigate these effects, the M-Estimator was used with the weight set

to zero wherever the degree of polarisation is less than 1%. As shown in the

485

figure, this significantly improves results for most of the surface1. The obvious

disadvantage of this approach was that the integration took much longer (≈40 s

compared to≈0.1 s).

5. Conclusion

Results in this paper demonstrate the potential for use of polarisation

in-490

formation for scene understanding in the presence of both diffuse reflection

(32)

[image:32.612.150.477.138.217.2]

Figure 16: Reconstructions of the underside of a porcelain bowl, including the side view. The left reconstructions were based on the Frankot-Chellappa method, while those to the right use the M-Estimator.

and specular inter-reflections. The results show reconstruction quality that is

improved or comparable with the baseline methods while needing only two

il-lumination sources, which is competitive to many other methods in the field.

At present, the data is captured using a rotating polariser and switching light

495

sources meaning the data takes several seconds to capture. As stated in Section

2 however, the method can be applied to data captured from a polarisation

camera such as the POLKA. This means that the only significant limitation in

capture time relates to the light source switching, which has easily been shown

to operate in the order of milliseconds [5]. The processing time for this method

500

is of the order five seconds at VGA resolution on an average personal computer.

The method has clearly demonstrated the use of polarisation information

to reduce the number of light sources needed for photometric stereo to two.

Unlike with many alternative methods in polarisation vision, this advantage

does not come at the expense of needing to know a refractive index or

mak-505

ing strong assumptions on reflectance properties. One major weakness of many

PS methods is that of shadowing. While the novel method still suffers from

reduced information at shadowed regions; only the zenith angle is affected. It

is therefore predicted that the estimated azimuth angles for such regions may

be combined with intensity information from one source only to fully recover

510

shadowed surfaces in follow-on work. The dependence of the method on the

reflectance properties of the surface is also deemed superior to many PS

(33)

reflectance parameters. That said, it is acknowledged that the proposed

ap-proach is still somewhat affected by reflectance properties due to the use of (8)

515

for thez-component of the normal.

One limitation that should be addressed in future work relates to the open

question of how to recover the geometry of an entire scene (as opposed to a

single object). It may be that optimisation of seed point generation and

weight-ing inter-object pixels in the integrator may reduce this limitation. Another

520

limitation relates to the complications caused by very rough surfaces. The main

problem caused by roughness for many polarisation-based methods is the

de-polarising effect of microscopic inter-reflections on the surface. This typically

has a direct degrading effect on the reconstruction because the zenith angle is

inherently dependent on degree of polarisation. For the method proposed in

525

this paper however, the zenith angle is entirely determined by PS, diminishing

the effects of roughness. Nevertheless, the increased noise present due to lower

degrees of polarisation from rough surfaces will still clearly impact results of the

method to an extent.

References 530

[1] M. Z. Brown, G. D. Hager, Advances in computational stereo, IEEE Trans.

Patt. Anal. Mach. Intell. 25 (2003) 993–1008.

[2] S. Foix, G. Aleny`a, Lock-in time-of-flight (ToF) cameras: A survey, IEEE

Sensors J. 11 (2011) 1917–1926.

[3] J. D. Durou, M. Falcone, M. Sagnona, Numerical methods for

shape-from-535

shading: A new survey with benchmarks, Comp. Vis. Im. Understanding

109 (2008) 22–43.

[4] P. N. Belhumeur, D. J. Kriegman, A. L. Yuille, The bas-relief ambiguity,

Intl. J. Comp. Vis. 35 (1999) 33–44.

[5] M. F. Hansen, G. A. Atkinson, L. N. Smith, M. L. Smith, 3D face

(34)

structions from photometric stereo using near infrared and visible light,

Comp. Vis. Im. Understanding 114 (2010) 942–951.

[6] W. Zhang, M. L. Smith, L. N. Smith, A. Farooq, Gender recognition from

facial images: two or three dimensions?, J. Opt. Soc. Am. A 33 (2016)

333–344.

545

[7] L. B. Wolff, T. E. Boult, Constraining object features using a polarisation

reflectance model, IEEE Trans. Patt. Anal. Mach. Intell. 13 (1991) 635–657.

[8] E. Hecht, Optics, 3rd Edition, Addison Wesley Longman, 1998.

[9] M. Saito, Y. Sato, K. Ikeuchi, H. Kashiwagi, Measurement of surface

ori-entations of transparent objects using polarization in highlight, in: Proc.

550

CVPR, Vol. 1, 1999, pp. 381–386.

[10] D. Miyazaki, M. Kagesawa, K. Ikeuchi, Transparent surface modelling from

a pair of polarization images, IEEE Trans. Patt. Anal. Mach. Intell. 26

(2004) 73–82.

[11] G. A. Atkinson, E. R. Hancock, Shape estimation using polarization and

555

shading from two views, IEEE Trans. Patt. Anal. Mach. Intell. 29 (2007)

2001–2017.

[12] K. Berger, R. Voorhies, L. Matthies, Incorporating polarization in stereo

vision-based 3D perception of non-Lambertian scenes, in: Proc. SPIE, Vol.

9837, 2016.

560

[13] V. Taamazyan, A. Kadambi, R. Raskar, Shape from mixed polarization,

arXiv preprint: 1605.02066.

[14] D. Miyazaki, R. T. Tan, K. Hara, K. Ikeuchi, Polarization-based inverse

rendering from a single view, in: Proc. ICCV, Vol. 2, 2003, pp. 982–987.

[15] O. Drbohlav, R. ˇS´ara, Unambiguous determination of shape from

pho-565

tometric stereo with unknown light sources, in: Proc. ICCV, 2001, pp.

(35)

[16] O. Drbohlav, R. ˇS´ara, Specularities reduce ambiguity of uncalibrated

pho-tometric stereo, in: Proc. ECCV, 2002, pp. 46–62.

[17] G. A. Atkinson, E. R. Hancock, Two-dimensional BRDF estimation from

570

polarisation, Comp. Vis. Im. Understanding 111 (2008) 126–141.

[18] T. Ngo, H. Nagahara, R. Taniguchi, Shape and light directions from shading

and polarization, in: In Proc. CVPR, 2015, pp. 2310–2318.

[19] N. M. Garcia, I. de Erausquin, C. Edmiston, V. Gruev, Surface normal

re-construction using circularly polarized light, Opt. Express 23 (2015) 14391–

575

14406.

[20] O. Morel, C. Stolz, F. Meriaudeau, P. Gorria, Active lighting applied to

three-dimensional reconstruction of specular metallic surfaces by

polariza-tion imaging, Applied Optics 45 (2006) 4062–4068.

[21] W. Smith, R. Ramamoorthi, S. Tozza, Linear depth estimation from an

580

uncalibrated, monocular polarisation image, in: In Proc. ECCV, 2016, pp.

109–125.

[22] S. Rahmann, N. Canterakis, Reconstruction of specular surfaces using

po-larization imaging, in: Proc. CVPR, 2001, pp. 149–155.

[23] S. Rahmann, Reconstruction of quadrics from two polarization views, in:

585

Proc. IbPRIA, 2003, pp. 810–820.

[24] C. P. Huynh, A. Robles-Kelly, E. Hancock, Shape and refractive index

recovery from single-view polarisation images, in: Proc. CVPR, 2010, pp.

1229–1236.

[25] A. Kadambi, V. Taamazyan, B. Shi, R. Raskar, Polarized 3D: High-quality

590

depth sensing with polarization cues, in: Proc. ICCV, 2015.

[26] G. A. Atkinson, E. R. Hancock, Recovery of surface orientation from diffuse

(36)

[27] L. B. Wolff, Diffuse-reflectance model for smooth dielectric surfaces, J. Opt.

Soc. Am. A 11 (1994) 2956–2968.

595

[28] Fraunhofer Institute for Integrated Circuits, http://

www.iis.fraunhofer.de/en/ff/bsy/tech/kameratechnik/

polarisationskamera.html, [Accessed: 1 March 2017].

[29] W. Burger, M. J. Burge, Principles of Digital Image Processing:

Funda-mental Techniques, Springer, 2010.

600

[30] N. Karpel, Y. Schechner, Portable polarimetric underwater imaging system

with a linear response, in: Proc. SPIE, Vol. 5432, 2004.

[31] R. J. Woodham, Photometric method for determining surface orientation

from multiple images, Optical Engineering 19 (1980) 139–144.

[32] R. F. Saracchini, J. Stolfi, H. C. G. Leit˜ao, G. A. Atkinson, M. L. Smith,

605

A robust multi-scale integration method to obtain the depth from gradient

maps, Comp. Vis. Im. Understanding 116 (2012) 882–895.

[33] R. T. Frankot, R. Chellappa, A method for enforcing integrability in shape

from shading algorithms, IEEE Trans. Patt. Anal. Mach. Intell. 10 (1988)

439–451.

610

[34] A. Agrawal, R. Raskar, R. Chellappa, What is the range of surface

(37)

LaTeX Souce Files