
5.3 Correlation estimation with reference image

Given the set of $K$ atoms $\{g_{\gamma_i}\}$ that approximate the first image $\hat{I}_1$, the disparity or motion estimation problem consists in finding the corresponding visual patterns in the second image $I_2$, which is available only through its compressed random measurements $\hat{Y}_2$. This is equivalent to estimating the correlation between the images $I_1$ and $I_2$ under the joint sparsity model based on local geometric transformations described in Section 5.2.3. We describe here the regularized optimization framework that we propose for estimating this correlation.

5.3.1 Regularized energy function

We are looking for a set of $K$ atoms in $I_2$ that correspond to the $K$ visual features $\{g_{\gamma_i}\}$ selected in the first image. We denote their parameters by $\Lambda = (\gamma'_1, \gamma'_2, \ldots, \gamma'_K)$, where $\gamma'_i$ is the parameter vector of the $i$th atom in $I_2$, $1 \le i \le K$. We select this set of atoms in a regularized energy minimization framework, as a trade-off between the set that best approximates $I_2$ and the set that yields smooth local transformations between the images.

The energy model E proposed in our scheme is expressed as

$$E(\Lambda) = E_d(\Lambda) + \alpha_1 E_s(\Lambda), \tag{OPT-1}$$

where $E_d$ and $E_s$ represent the data and smoothness terms respectively, and $\alpha_1$ is the regularization parameter that balances the two terms. The solution is the set of $K$ atom parameters $\Lambda^*$ that minimizes the energy $E$, i.e.,

$$\Lambda^* = \arg\min_{\Lambda \in S} E(\Lambda), \tag{5.4}$$

where S represents the search space given by

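$$S = \{(\gamma'_1, \gamma'_2, \ldots, \gamma'_K) \mid \gamma'_i = \delta\gamma_i \circ \gamma_i,\ 1 \le i \le K,\ \delta\gamma_i \in L\}, \tag{5.5}$$

where $\{\gamma_i\}$ are the parameters of the atoms in the reference image. The set $L \subset \mathbb{R}^5$ is given by $L = [-\delta t_x, \delta t_x] \times [-\delta t_y, \delta t_y] \times [-\delta\theta, \delta\theta] \times [-\delta s_x, \delta s_x] \times [-\delta s_y, \delta s_y]$, where $\delta t_x$, $\delta t_y$, $\delta\theta$, $\delta s_x$, $\delta s_y$ determine the search window size for the translation parameters $t_x$, $t_y$, the rotation $\theta$, and the scales $s_x$, $s_y$ respectively. We describe below the two cost functions used in OPT-1.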
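As an illustration of the search window $L$, the following minimal sketch enumerates candidate offsets $\delta\gamma$ on a regular grid inside the box. The discretization (the `steps` argument) and the function names are assumptions made for this example; the text does not specify how the window is sampled, and the composition $\delta\gamma_i \circ \gamma_i$ is not modeled here.

```python
import itertools
import numpy as np

def candidate_offsets(window, steps):
    """Enumerate a regular grid of offsets inside the box L.

    window: half-widths (d_tx, d_ty, d_theta, d_sx, d_sy) of the search window.
    steps:  number of samples per dimension (hypothetical discretization;
            use odd values >= 3 so that the zero offset is included).
    """
    axes = [np.linspace(-w, w, k) for w, k in zip(window, steps)]
    return np.array(list(itertools.product(*axes)))

def in_window(delta, window):
    """Return True when the offset delta lies inside the box L."""
    return bool(np.all(np.abs(np.asarray(delta)) <= np.asarray(window)))
```

The number of candidates grows as the product of the per-dimension sample counts, so even a coarse grid over five parameters and $K$ atoms yields a large search space.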

5.3.2 Data cost function

The data fidelity term measures, in the compressed domain, the accuracy of the sparse approximation of the second image with geometric atoms. The decoder receives the measurements $\hat{Y}_2$, which are the quantized projections of $I_2$ onto a sensing matrix $\Phi$. For each set of $K$ atom parameters $\Lambda = \{\gamma'_i\}$, the data cost function $E_d$ reports the error between the measurements $\hat{Y}_2$ and their orthogonal projection onto the column space of $\Psi_\Lambda$, the matrix formed by the compressed versions of the atoms, i.e., $\Psi_\Lambda = \Phi[g_{\gamma'_1} | g_{\gamma'_2} | \ldots | g_{\gamma'_K}]$. The orthogonal projection of $\hat{Y}_2$ is given by $\Psi_\Lambda \Psi_\Lambda^\dagger \hat{Y}_2$, where $\dagger$ denotes the pseudo-inverse operator. More formally, the data cost is computed as

$$E_d(\Lambda) = \|\hat{Y}_2 - \Psi_\Lambda \Psi_\Lambda^\dagger \hat{Y}_2\|_2^2 = \|\hat{Y}_2 - \Psi_\Lambda c\|_2^2. \tag{5.6}$$

The data cost function in Eq. (5.6) first calculates the coefficients $c = \Psi_\Lambda^\dagger \hat{Y}_2$ and then measures the distance between the observations $\hat{Y}_2$ and $\Psi_\Lambda c$. In other words, the data cost function $E_d$ accounts for the intensity variations between the images by estimating the coefficients $c$ of the warped atoms.
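The two steps of Eq. (5.6) can be sketched with NumPy as follows. The function name and the use of a least-squares solver in place of an explicit pseudo-inverse are choices made for this example; both give the same coefficients $c = \Psi_\Lambda^\dagger \hat{Y}_2$.

```python
import numpy as np

def data_cost(Psi, Y_hat):
    """Return (E_d, c) with E_d = ||Y_hat - Psi c||_2^2 and c = Psi^+ Y_hat."""
    # Least-squares coefficients; equal to pinv(Psi) @ Y_hat.
    c, *_ = np.linalg.lstsq(Psi, Y_hat, rcond=None)
    # Residual of the orthogonal projection onto the column space of Psi.
    residual = Y_hat - Psi @ c
    return float(residual @ residual), c
```

Here `Psi` stands for the $M \times K$ matrix $\Phi[g_{\gamma'_1} | \ldots | g_{\gamma'_K}]$ and `Y_hat` for the length-$M$ measurement vector. The cost is zero exactly when the measurements lie in the span of the compressed atoms.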

However, when the measurements are quantized, the coefficient vector $c$ fails to properly account for the error introduced by the quantization. The quantized measurements only provide the index of the quantization interval containing the actual measurement value, which could be any point in that interval. Let $Y_{2,p}$ be the $p$th coordinate of the original measurement, and $\hat{Y}_{2,p}$ be the corresponding quantized value. The joint decoder has access only to the quantized value $\hat{Y}_{2,p}$, not to the original value $Y_{2,p}$. Hence, the joint decoder knows only that the original measurement lies within the quantization interval, i.e., $Y_{2,p} \in \mathcal{R}_{\hat{Y}_p} = (r_p, r_{p+1}]$, where $r_p$ and $r_{p+1}$ define the lower and upper bounds of the quantizer bin $Q_p$. We propose to refine the data term by computing a coefficient vector $\tilde{c}$ as the best solution when considering all the valid measurement values in the quantization interval, i.e., $\tilde{Y}_2 \in \mathcal{R}_{\hat{Y}}$, where $\mathcal{R}_{\hat{Y}}$ is the Cartesian product of all the quantization regions $\mathcal{R}_{\hat{Y}_p}$, i.e., $\mathcal{R}_{\hat{Y}} = \prod_p \mathcal{R}_{\hat{Y}_p}$. The coefficients $\tilde{c}$ and the measurements $\tilde{Y}_2$ can be jointly estimated by solving the following optimization problem:

$$(\tilde{c}, \tilde{Y}_2) = \arg\min_{\tilde{c},\, \tilde{Y}_2} \|\tilde{Y}_2 - \Psi_\Lambda \tilde{c}\|_2^2 \quad \text{s.t.} \quad \tilde{Y}_2 \in \mathcal{R}_{\hat{Y}}. \tag{5.7}$$

By following the steps in Proposition 1 of Chapter 4, it can be shown that the Hessian of the objective function $h(\tilde{Y}_2, \tilde{c}) = \|\tilde{Y}_2 - \Psi_\Lambda \tilde{c}\|_2^2$ is positive semidefinite, i.e., $\nabla^2 h \succeq 0$, and hence that $h$ is convex, though not strictly convex. The region $\mathcal{R}_{\hat{Y}}$ is also a convex set, since each interval $\mathcal{R}_{\hat{Y}_p} = (r_p, r_{p+1}]$, $\forall p$, is convex. Therefore, the optimization problem in Eq. (5.7) is convex, but not strictly convex because the Hessian of the objective function is not positive definite. Finally, the data fidelity term in Eq. (5.6) can be modified with the estimated coefficients $\tilde{c}$ and measurements $\tilde{Y}_2$ as

$$\tilde{E}_d(\Lambda) = \|\tilde{Y}_2 - \Psi_\Lambda \tilde{c}\|_2^2, \tag{5.8}$$

where $\tilde{E}_d(\Lambda)$ represents the robust data function that accounts for the quantization noise.
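One simple way to approach the problem in Eq. (5.7) is alternating minimization over $\tilde{c}$ and $\tilde{Y}_2$. The sketch below is an assumption made for illustration, not necessarily the solver used in this work: the function name, the bin-center initialization, and the fixed iteration count are all hypothetical. It also clips to the closed box $[r_p, r_{p+1}]$, which differs from the half-open bin only at the lower boundary.

```python
import numpy as np

def robust_data_cost(Psi, lower, upper, n_iter=50):
    """Approximate Eq. (5.7) by alternating minimization (assumed solver).

    lower, upper: arrays holding the bin bounds r_p and r_{p+1} per coordinate.
    Returns (cost, c_tilde, Y_tilde).
    """
    Psi_pinv = np.linalg.pinv(Psi)
    Y = 0.5 * (lower + upper)               # start from the bin centers
    for _ in range(n_iter):
        c = Psi_pinv @ Y                    # best coefficients for the current Y
        Y = np.clip(Psi @ c, lower, upper)  # best Y inside the box for the current c
    c = Psi_pinv @ Y
    r = Y - Psi @ c
    return float(r @ r), c, Y
```

Each half-step minimizes the convex objective over one block of variables, so the cost never increases from one iteration to the next. Because the problem is not strictly convex, the minimizer need not be unique.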

5.3.3 Smoothness cost function

The goal of the smoothness term $E_s$ is to penalize the atom transformations so that neighboring atoms undergo coherent transformations. In other words, atoms in the same neighborhood are likely to undergo similar transformations when the correlation between the images is due to object or camera motion. Instead of directly penalizing the transformations $\{F_i\}$ (or equivalently $\{\delta\gamma_i\}$) so that they tend to be coherent for neighboring atoms, we propose to generate a dense motion field from the atom transformations and to penalize the motion (or disparity) field so that it is coherent for adjacent pixels. This regularization is easier to handle than a penalty defined directly on the set of transformations $F_i$, and it corresponds directly to the physical constraints that explain the formation of correlated images. For a given transformation field $f = (f^h, f^v, f^\theta, f^a, f^b)$, we compute the horizontal component $m^h$ and the vertical component $m^v$ of the motion field as

$$\begin{bmatrix} m^h(z) \\ m^v(z) \end{bmatrix} = \begin{bmatrix} m(z) - t_x(z) \\ n(z) - t_y(z) \end{bmatrix} - \underbrace{\begin{bmatrix} f^a(z) & 0 \\ 0 & f^b(z) \end{bmatrix}}_{S} \underbrace{\begin{bmatrix} \cos(f^\theta(z)) & \sin(f^\theta(z)) \\ -\sin(f^\theta(z)) & \cos(f^\theta(z)) \end{bmatrix}}_{R} \underbrace{\begin{bmatrix} m(z) - t_x(z) - f^h(z) \\ n(z) - t_y(z) - f^v(z) \end{bmatrix}}_{T}, \tag{5.9}$$

where $(m(z), n(z))$ represent the Euclidean coordinates, and $t_x(z)$ and $t_y(z)$ represent the translation parameters at location $z$. The matrices $S$, $R$ and $T$ represent the grid transformations due to scale, rotation and translation changes respectively. Finally, the smoothness cost $E_s$ in OPT-1 is computed as

$$E_s = \sum_{z, z' \in N} V_{z,z'} = \sum_{z, z' \in N} \min\left(|m^h(z) - m^h(z')| + |m^v(z) - m^v(z')|,\ \tau\right), \tag{5.10}$$

where $z$ and $z'$ are adjacent pixel locations, and $N$ represents the usual 4-pixel neighborhood. The parameter $\tau$ sets an upper limit on the smoothness penalty, and thus helps to preserve the discontinuities in the motion field [130].
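The motion field and the truncated smoothness penalty can be sketched as below. Two points are assumptions of this example rather than statements from the text: the motion is taken as the centered grid minus its transformed version, so that an identity transformation gives zero motion, and each pair of adjacent pixels is counted once in the sum. The function names are hypothetical.

```python
import numpy as np

def motion_field(m, n, tx, ty, fh, fv, ftheta, fa, fb):
    """Per-pixel motion (m_h, m_v) following the structure of Eq. (5.9).

    All arguments are arrays of the same shape, one value per pixel z.
    """
    # T: grid centered on the atom and shifted by the translation change.
    u, v = m - tx - fh, n - ty - fv
    # R then S: rotation by f_theta followed by anisotropic scaling.
    cos_t, sin_t = np.cos(ftheta), np.sin(ftheta)
    wu = fa * (cos_t * u + sin_t * v)
    wv = fb * (-sin_t * u + cos_t * v)
    # Assumed sign: centered grid minus its transformed version.
    return (m - tx) - wu, (n - ty) - wv

def smoothness_cost(mh, mv, tau):
    """Truncated L1 penalty of Eq. (5.10) over the 4-pixel neighborhood."""
    cost = 0.0
    for axis in (0, 1):  # vertical, then horizontal neighbor pairs
        d = np.abs(np.diff(mh, axis=axis)) + np.abs(np.diff(mv, axis=axis))
        cost += np.minimum(d, tau).sum()
    return float(cost)
```

A uniform translation yields a constant motion field and therefore zero smoothness cost, while a sharp motion boundary contributes at most $\tau$ per neighboring pair, which is how the truncation preserves discontinuities.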

We now describe how the dense transformation field $f$ is estimated from the set of atom transformations. Given the $i$th pair of atoms $g_{\gamma_i}$ and $g_{\gamma'_i}$, with $\gamma_i = (t_x, t_y, \theta, s_x, s_y)$ and $\gamma'_i = (t'_x, t'_y, \theta', s'_x, s'_y)$ in the images $I_1$ and $I_2$ respectively, we first calculate the local transformation captured by these atoms, given by

$$\delta\gamma_i = (t'_x - t_x,\ t'_y - t_y,\ \theta' - \theta,\ s'_x / s_x,\ s'_y / s_y). \tag{5.11}$$

All the pixels in the support of the atom $g_{\gamma_i}$ take the transformation value $\delta\gamma_i$, i.e., $f(z) = \delta\gamma_i$, $\forall z \in Z_i$, where $Z_i$ denotes the set of pixels in the support of the atom $g_{\gamma_i}$, given as

$$Z_i = \{z = (m, n) \mid |g_{\gamma_i}(m, n)| > \epsilon\}, \tag{5.12}$$

where $\epsilon > 0$ is a constant. A local transformation is established in the same way (see Eq. (5.11)) for all the atom pairs. Then, the transformations $\{\delta\gamma_k\}$ captured by the $K$ pairs of atoms are fused together to estimate a global transformation field $f$. For a given location $z$, we first assign relative weights $\{w_z^{(k)}\}$ to the candidate transformations $\{\delta\gamma_k\}$, based on the response of the $k$th atom at the pixel location $z$. Then, the fusion process is simply implemented by choosing the most confident transformation for each pixel position $z$. Mathematically, we write the transformation at position $z$ as