
2.3 Graph Based Image Segmentation and Tracking

2.3.2 Transductive Image Segmentation

In the context of image segmentation, some terminology from graph theory is reinterpreted as follows. Given an $m \times n$ image, let $G = (V, E)$ be an undirected graph with $N = m \times n$ nodes. Each node $v_i$ denotes a pixel of the image, and an edge $e_{ij}$ denotes a connection between a pixel $v_i$ and a neighbouring pixel $v_j$.¹ The $N \times N$ similarity matrix $W$ describes the similarity between pairs of pixels. In particular, $w_{ij}$ is the similarity measurement between pixels $v_i$ and $v_j$. The similarity matrix is in effect a generalised (weighted) adjacency matrix that describes the connectivity between nodes.

The degree matrix $D$ is an $N \times N$ diagonal matrix whose $i$th diagonal element is $d_i = \sum_j w_{ij}$. A graph cut is a partition that separates the original graph into two disconnected subgraphs $A$ and $A'$. The cost of the cut is the sum of the similarity values over the cut edges, $\mathrm{cut}(A, A') = \sum_{i \in A,\, j \in A'} w_{ij}$. The Laplacian matrix is an $N \times N$ symmetric matrix, with one row and one column for each node, defined by $L = D - W$.

¹ Depending on the definition of the neighbourhood, the graph varies. The 4-connected neighbour-
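These definitions can be illustrated with a short Python/NumPy sketch that builds $W$, $D$ and $L$ for a tiny grey-level image and evaluates a cut. Several details are assumptions made only for this example, because the text above does not fix a similarity function:

- a 4-connected neighbourhood;
- a Gaussian intensity similarity $w_{ij} = \exp(-(I_i - I_j)^2 / 2\sigma^2)$;
- the value of `sigma`;
- the hypothetical helper names `grid_similarity` and `cut_cost`.

```python
import numpy as np

def grid_similarity(image, sigma=0.5):
    """N x N similarity matrix W of an m x n grey-level image.

    Illustrative assumptions: 4-connected neighbourhood and Gaussian
    intensity similarity w_ij = exp(-(I_i - I_j)^2 / (2 sigma^2)).
    """
    m, n = image.shape
    values = image.ravel()                    # node index i = row * n + col
    W = np.zeros((m * n, m * n))
    for r in range(m):
        for c in range(n):
            i = r * n + c
            # Link to the right and lower neighbours; W is filled symmetrically.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < m and cc < n:
                    j = rr * n + cc
                    W[i, j] = W[j, i] = np.exp(-(values[i] - values[j]) ** 2 / (2 * sigma ** 2))
    return W

def cut_cost(W, in_A):
    """cut(A, A') = sum of w_ij over i in A and j in A' (the complement of A)."""
    in_A = np.asarray(in_A, dtype=bool)
    return W[np.ix_(in_A, ~in_A)].sum()

image = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])           # 2 x 3 image, so N = 6
W = grid_similarity(image)
D = np.diag(W.sum(axis=1))                    # degree matrix, d_i = sum_j w_ij
L = D - W                                     # Laplacian matrix
in_A = image.ravel() < 0.5                    # A = dark pixels, A' = bright pixels
print(cut_cost(W, in_A))                      # two cut edges, each exp(-2)
```

Cutting between the dark and bright columns severs only the two low-similarity edges, so the cut cost is small compared with the unit weights inside each region.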

The Laplacian has several useful properties:

1. $L$ is always positive semidefinite.

2. The number of times 0 appears as an eigenvalue of $L$ equals the number of connected components in the graph.

3. The smallest eigenvalue is always 0.

4. The second smallest eigenvalue is called the algebraic connectivity; it is 0 if and only if the graph is disconnected.

5. The smallest non-zero eigenvalue of $L$ is called the spectral gap; for a connected graph it coincides with the algebraic connectivity, also known as the Fiedler value.
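The spectral properties listed above are easy to check numerically. The following Python/NumPy sketch first takes a graph with two connected components, then joins them, and reports three quantities each time:

- the Laplacian eigenvalues;
- the multiplicity of the eigenvalue 0;
- the smallest non-zero eigenvalue.

The 5-node graph, its weights, the tolerance and the helper names are hypothetical choices made for the illustration.

```python
import numpy as np

def laplacian(W):
    """Unnormalised Laplacian L = D - W of a symmetric, non-negative W."""
    return np.diag(W.sum(axis=1)) - W

def spectrum_summary(W, tol=1e-9):
    """Eigenvalues of L (ascending), multiplicity of 0, and smallest non-zero eigenvalue."""
    eig = np.linalg.eigvalsh(laplacian(W))    # real and sorted, since L is symmetric
    n_zero = int(np.sum(np.abs(eig) < tol))
    nonzero = eig[eig > tol]
    gap = float(nonzero[0]) if nonzero.size else 0.0
    return eig, n_zero, gap

# Hypothetical graph: path 0-1-2 and a separate edge 3-4 (two components).
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 2.0), (3, 4, 0.5)]:
    W[i, j] = W[j, i] = w

eig, n_zero, gap = spectrum_summary(W)
print(eig.min() > -1e-9)    # property 1: no negative eigenvalues (up to round-off)
print(n_zero)               # property 2: 2 zero eigenvalues for 2 components
print(gap)                  # property 5: smallest non-zero eigenvalue

W[2, 3] = W[3, 2] = 1.0     # join the two components
eig, n_zero, gap = spectrum_summary(W)
print(n_zero, eig[1] > 0)   # one component; algebraic connectivity is now positive
```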

Optimising an image segmentation often involves an integral over the entire feature space, which is usually analytically intractable. Duchenne et al. [Duchenne et al. 2008] point out that, because image segmentation deals with a finite set of pixels, this integral can be approximated by a discrete summation. In particular, the Laplace-Beltrami operator is approximated by a graph Laplacian matrix. The transductive approach thus reduces an intractable integration problem to a discrete approximation and, ultimately, to a simple, solvable system of linear equations.
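Why a Laplacian matrix can stand in for the Laplace-Beltrami operator is clearest in the simplest case: a flat 1-D domain, where the operator reduces to $d^2/dx^2$. The setup below is an assumption made for this illustration, not taken from Duchenne et al.:

- sample points on a path graph, spaced $h$ apart;
- edge weights of $1/h^2$.

Each interior row of $L = D - W$ is then the central finite difference $(2f_i - f_{i-1} - f_{i+1})/h^2 \approx -f''(x_i)$, a discretisation of the positive semidefinite operator $-d^2/dx^2$. The following Python/NumPy sketch checks this on $f = \sin$.

```python
import numpy as np

def path_laplacian(num_points, h):
    """Unnormalised Laplacian of a path graph whose edge weights are 1/h^2."""
    W = np.zeros((num_points, num_points))
    idx = np.arange(num_points - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0 / h ** 2
    return np.diag(W.sum(axis=1)) - W

h = 0.01
x = np.arange(0.0, np.pi, h)
f = np.sin(x)
Lf = path_laplacian(x.size, h) @ f
# Interior rows give (2 f_i - f_{i-1} - f_{i+1}) / h^2, i.e. -f''(x_i) up to O(h^2);
# for f = sin, -f'' = sin. The two boundary nodes have only one neighbour and are skipped.
err = np.max(np.abs(Lf[1:-1] - np.sin(x[1:-1])))
print(err)   # small, of order h^2
```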

In their work, segmentation is treated as statistical transductive inference: some pixels are already correctly classified, and the remaining ones need to be classified. The method uses a Laplacian graph regulariser. This is a powerful manifold-learning tool, based on estimating variants of the Laplace-Beltrami operator and closely related to diffusion processes. Transductive inference differs from inductive inference in that it involves no unseen inputs: all inputs are available in advance, although only some of them have known class labels. Thus, given a set of classified pixels, the task is to infer the class labels of the remaining pixels of the same image, rather than the labels of novel pixels from different images. In this setting, generalisation to avoid overfitting becomes less important, and a decision boundary that fits the given data well is more desirable.

In traditional optimisation for image segmentation, the objective is to find a smooth function $f$ from the input space to the output space such that $f(x_i)$ is close to the associated output $y_i$ (the class label) on a training set. Duchenne et al. assume that the points are generated by a probability distribution² $p$ with support on a submanifold $M$ of Euclidean space. Further, they argue that the function value should be allowed to vary more in low-density regions (equivalently, the segmentation boundary regions) than elsewhere, since these are the places where misclassifications occur. Hence, by introducing a control parameter $s > 0$, their approach controls how low the density must be to allow large variations of $f$. Taking into account the confidence $c_i$ of each training pixel's assignment, the inference problem can be summarised as follows:

$$\min_f \sum_{i \in \mathrm{Train}} c_i \left(y_i - f(x_i)\right)^2 + \int_M \|\nabla f\|^2 \, p^s \, dV$$

This minimisation seeks a smooth function that infers the output label $y_i$ from $x_i$, balancing two terms:

- The first term penalises estimation errors on the training pixels.
- The second term penalises overfitting by favouring smooth solutions. It is weighted by the density of the input distribution, raised to the power of the low-density control parameter $s$.

The $c_i$ are positive coefficients that measure how much the training pair $(x_i, y_i)$ contributes to the overall error. They also reflect the confidence that the class label $y_i$ is correct. Where $p^s$ is small, the penalty allows $f$ to change rapidly (a large gradient magnitude) at that point; otherwise it encourages small changes. However, the integral in the above formulation is analytically intractable. Building on the results of Hein et al. [Hein et al. 2005], it can be replaced by a discrete approximation (more details can be found in [Duchenne et al. 2008]), given by:

$$\min_{F \in \mathbb{R}^N} \sum_{i \in \mathrm{Train}} c_i \left(y_i - F_i\right)^2 + F^T L_{un} F$$


where $F_i = f(x_i)$, $i = 1, \dots, N$, is the estimated label for $x_i$, and $L_{un}$ is the unnormalised Laplacian matrix. In matrix form, the problem can be written as:

$$\min_{F \in \mathbb{R}^N} (F - Y)^T C \,(F - Y) + F^T L_{un} F$$

where $C$ is an $N \times N$ diagonal matrix whose $i$th diagonal element is $c_i$ for a training pixel and 0 for the remaining pixels. Similarly, $Y$ is an $N$-dimensional vector whose $i$th element is $y_i$ for a training pixel and 0 for the remaining pixels. Setting the gradient of this quadratic objective to zero reduces the minimisation to the following linear system:

$$(L_{un} + C)\,F = C\,Y \qquad (2.3.2)$$
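The step from the quadratic objective to (2.3.2) can be spelled out. It uses the symmetry of $W$ (and hence of $L_{un}$) and the definition $d_i = \sum_j w_{ij}$. Under these, the smoothness term becomes a weighted sum of squared label differences across edges. This is the discrete counterpart of $\int_M \|\nabla f\|^2 p^s \, dV$, and it also shows why $L_{un}$ is positive semidefinite. Setting the gradient of the objective $J$ to zero then gives the linear system:

```latex
F^T L_{un} F = \sum_i d_i F_i^2 - \sum_{i,j} w_{ij} F_i F_j
             = \tfrac{1}{2} \sum_{i,j} w_{ij} \left(F_i - F_j\right)^2 \;\ge\; 0

J(F) = (F - Y)^T C \,(F - Y) + F^T L_{un} F,
\qquad
\nabla_F J = 2C\,(F - Y) + 2L_{un} F = 0
\;\Longrightarrow\; (L_{un} + C)\,F = C\,Y
```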

Treating the training labels as hard constraints, i.e. letting $c_i \to \infty$ so that $F_i = y_i$ for $i \in \mathrm{Train}$, the segmentation is obtained by solving this simple linear system. Once again, image segmentation is cast as a linear problem that nevertheless accounts for non-linear neighbourhood smoothness. The overall transductive segmentation algorithm is outlined in Algorithm 3.
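Since $c_i = \infty$ cannot be used directly in floating point, the following Python/NumPy sketch solves (2.3.2) in two ways and compares them:

- with a large finite $c_i$;
- in the limiting form. For unlabelled rows, $C$ is zero, so (2.3.2) reads $L_{uu} F_u + L_{ul} F_l = 0$. Fixing $F_l = Y_l$ leaves $L_{uu} F_u = -L_{ul} Y_l$ to solve for the unlabelled pixels.

This elimination is a standard way to impose the constraint, not necessarily how Duchenne et al. implement it. The 5-pixel chain, its weights, the $\pm 1$ labels and the helper names are hypothetical.

```python
import numpy as np

def solve_soft(L, c, y):
    """Solve (L + C) F = C Y with finite confidences c (c_i = 0 for unlabelled pixels)."""
    C = np.diag(c)
    return np.linalg.solve(L + C, C @ y)

def solve_hard(L, labelled, y):
    """Limit c_i -> infinity: fix F_i = y_i on labelled pixels and solve
    L_uu F_u = -L_ul y_l for the unlabelled ones (L_uu must be non-singular,
    i.e. every unlabelled pixel must be connected to some labelled pixel)."""
    u = ~labelled
    F = y.astype(float)
    F[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labelled)] @ y[labelled])
    return F

# Hypothetical chain of 5 pixels with a weak link between pixels 1 and 2.
w = np.array([1.0, 0.1, 1.0, 1.0])            # weight of edge (i, i + 1)
W = np.diag(w, 1) + np.diag(w, -1)
L = np.diag(W.sum(axis=1)) - W

labelled = np.array([True, False, False, False, True])
y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])      # +1 / -1 seeds; 0 where unlabelled, as in Y
F_hard = solve_hard(L, labelled, y)
F_soft = solve_soft(L, 1e6 * labelled, y)
print(F_hard)
print(np.max(np.abs(F_hard - F_soft)))        # shrinks as c grows
```

On a chain, the hard-constraint solution falls linearly in the edge resistances $1/w$ (here 1, 10, 1, 1). It is therefore $(1, 11/13, -9/13, -11/13, -1)$, and the sign of $F$ switches across the weak link.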

Algorithm 3 Transductive Segmentation

1. Calculate the kernel $k(x_i, x_j) = \exp\left\{-\frac{\|x_i - x_j\|^2}{2\sigma_g^2} - \frac{\|C(x_i) - C(x_j)\|^2}{2\sigma_c^2}\right\}$ and the degree $d(x_i) = \sum_{j=1}^{N} k(x_j, x_i)$. Here $\sigma_g$ and $\sigma_c$ are scales for the geometric and chromatic neighbourhoods, respectively, and $C(x_i)$ denotes the RGB levels of a square patch of side $2m+1$ around the pixel $x_i$.

2. Calculate the normalised kernel $K(x_i, x_j) = \frac{k(x_i, x_j)}{\left(d(x_i)\, d(x_j)\right)^{\lambda}}$, where $\lambda = 1 - s/2$, and the normalised degree $D(x_i) = \sum_{j=1}^{N} K(x_j, x_i)$.

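3. Compute the unnormalised Laplacian matrix $L = D - W$, where $W$ is the matrix of normalised kernel values $K(x_i, x_j)$ and $D$ is the diagonal matrix of the degrees $D(x_i)$.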

4. Solve the linear system (2.3.2).
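The following Python/NumPy sketch implements Algorithm 3 for small images, where dense $N \times N$ matrices are affordable. Several details are assumptions not fixed by the text:

- RGB values lie in $[0, 1]$;
- patches at the image border use edge padding;
- labels are $\pm 1$, and the segmentation is read off as the sign of $F$;
- the parameter values are arbitrary;
- step 4 applies hard constraints ($c_i \to \infty$) by eliminating the labelled pixels.

The array `patches` plays the role of $C(x_i)$, which is unrelated to the confidence matrix $C$ in (2.3.2).

```python
import numpy as np

def transductive_segmentation(image, labelled, y, sigma_g=1.5, sigma_c=0.5, s=1.0, m=1):
    """Sketch of Algorithm 3 for a small H x W x 3 RGB image, using dense matrices.

    labelled: boolean H x W mask of training pixels.
    y: H x W array of training labels (+1 / -1 is an illustrative choice);
       values at unlabelled pixels are ignored.
    """
    H, Wd, _ = image.shape
    # C(x_i): RGB values of the (2m+1) x (2m+1) patch around each pixel
    # (edge padding at the border is an assumption).
    padded = np.pad(image, ((m, m), (m, m), (0, 0)), mode="edge")
    patches = np.stack([padded[r:r + 2 * m + 1, c:c + 2 * m + 1].ravel()
                        for r in range(H) for c in range(Wd)])
    coords = np.array([(r, c) for r in range(H) for c in range(Wd)], dtype=float)

    # Step 1: kernel k(x_i, x_j) and degree d(x_i) = sum_j k(x_j, x_i).
    geo = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    chrom = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-geo / (2 * sigma_g ** 2) - chrom / (2 * sigma_c ** 2))
    d = k.sum(axis=0)

    # Step 2: normalised kernel K = k / (d_i d_j)^lambda, lambda = 1 - s/2, and its degree.
    lam = 1.0 - s / 2.0
    K = k / np.outer(d, d) ** lam
    D = np.diag(K.sum(axis=0))

    # Step 3: unnormalised Laplacian built from the normalised kernel.
    L = D - K

    # Step 4: solve (L + C) F = C Y in the limit c_i -> infinity, i.e. fix F = y on
    # labelled pixels and solve L_uu F_u = -L_ul y_l for the rest.
    lab = labelled.ravel()
    u = ~lab
    F = np.where(lab, y.ravel(), 0.0).astype(float)
    F[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, lab)] @ F[lab])
    return F.reshape(H, Wd)

# Toy example: left half red, right half blue, one seed pixel in each half.
img = np.zeros((6, 6, 3))
img[:, :3, 0] = 1.0
img[:, 3:, 2] = 1.0
labelled = np.zeros((6, 6), dtype=bool)
labelled[0, 0] = labelled[5, 5] = True
y = np.zeros((6, 6))
y[0, 0], y[5, 5] = 1.0, -1.0
F = transductive_segmentation(img, labelled, y)
print(np.sign(F))
```

In this toy example, the red half should come out positive and the blue half negative, with the switch at the colour boundary. For realistic image sizes, one would need a sparse kernel (for instance, one truncated beyond a few $\sigma_g$) together with a sparse solver. This is a practical suggestion rather than part of Algorithm 3.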

In document Inferring Human Pose and Motion from Images (Page 42-46)