
5.3 Theory and Algorithm

5.3.6 Algorithm for Frame Doubling

Our framework handles any reasonable frame rate conversion.⁴ For testing we have chosen to implement only a frame rate doubler, but at the end of this chapter we will discuss how to implement algorithms for other conversion ratios, which turns out to be mainly a practical problem that is fairly easily solved. Here we look at the implementation of a frame doubler minimizing the energies in (5.5), (5.6) and (5.7) according to (5.3).

Let us start with the flow energy minimization. After exchanging the E2-term of the flow energy in (5.6) with the E2-term from (5.7), we derive the flow Euler-Lagrange equation. It is implemented numerically along the lines given by Brox et al. in [9] and by Lauze in [64], and minimized iteratively using a Gauss-Seidel solver with a fixed point approach to linearize the otherwise nonlinear system.

The same general approach is used for the intensity energy. The optical flow constraint is approximated by the brightness constancy assumption, that is

$$\mathcal{L}_{\vec V} u = \nabla u \cdot \vec v + u_t = \vec V^T \nabla_3 u \approx u(x + \vec v, t + 1) - u(x, t),$$

which suggests the discretization $u(x + \vec v, t + 1) - u(x, t)$. We define $A(u) = 2\psi'(|\nabla u|^2)$ and $B(u) = 2\psi'(|\mathcal{L}_{\vec V} u|^2)$. Then the gradient of the energy in (5.5) is

$$G(A(u), B(u)) := \frac{\partial E}{\partial u} = -\lambda_s \,\mathrm{div}_2\big(A(u)\,\nabla u\big) - \lambda_t \,\mathrm{div}_3\big(B(u)\,(\mathcal{L}_{\vec V} u)\,\vec V\big) \qquad (5.8)$$

where $\mathrm{div}_2$ and $\mathrm{div}_3$ are the 2D and 3D divergence operators, respectively. Discretization is performed as described for deinterlacing and video super resolution in Chapters 3 and 4. Equation (5.8) set equal to zero is the intensity Euler-Lagrange equation. Unfortunately it is nonlinear, with A(u) and B(u) being the nonlinear terms. To linearize the system so that we can apply a Gauss-Seidel solver to it, we use a fixed point approach: A(u) and B(u) are updated only once per outer fixed point iteration. In each outer iteration we run a number of inner iterations in which A(u) and B(u) are constants (fixed), so the system is linear and can be solved with Gauss-Seidel relaxation.
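To make the outer/inner structure concrete, here is a minimal 1D sketch in Python. It assumes the regularized total variation penalty $\psi(s^2) = \sqrt{s^2 + \varepsilon^2}$, so that $2\psi'(s^2) = 1/\sqrt{s^2 + \varepsilon^2}$, and keeps only a spatial smoothness term like the first term of (5.5) on a single scanline with some samples missing; the temporal term along the flow and the 2D/3D stencils are left out. The function names are hypothetical, and this is an illustration of lagged-diffusivity fixed point iterations with Gauss-Seidel relaxation, not the thesis implementation.

```python
import math


def diffusivity(s2, eps=1e-3):
    """2 * psi'(s^2) for psi(s^2) = sqrt(s^2 + eps^2), i.e. 1 / sqrt(s^2 + eps^2)."""
    return 1.0 / math.sqrt(s2 + eps * eps)


def fixed_point_fill_1d(u, known, outer=5, inner=10, tol=1e-7):
    """Fill the unknown samples of a 1D signal.

    Outer loop: freeze the nonlinear diffusivity (the analogue of A(u)).
    Inner loop: Gauss-Seidel relaxation of the resulting linear system.
    Samples with known[i] == True are data and are never changed.
    """
    u = list(u)
    n = len(u)
    for _ in range(outer):
        u_prev = list(u)
        # Diffusivity on each edge (i, i + 1), held constant during the inner loop.
        a = [diffusivity((u[i + 1] - u[i]) ** 2) for i in range(n - 1)]
        for _ in range(inner):
            for i in range(n):
                if known[i]:
                    continue
                num = den = 0.0
                if i > 0:
                    num += a[i - 1] * u[i - 1]
                    den += a[i - 1]
                if i < n - 1:
                    num += a[i] * u[i + 1]
                    den += a[i]
                if den > 0.0:
                    # Gauss-Seidel: the left neighbour is already updated in this sweep.
                    u[i] = num / den
        # Stop once a whole fixed point iteration changes the signal very little.
        if max((abs(x - y) for x, y in zip(u, u_prev)), default=0.0) < tol:
            break
    return u
```

Each unknown sample becomes a weighted average of its neighbours, so the filled values stay within the range of the known data; the convergence threshold plays the same role as the one in Table 5.1.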

In our multiresolution setting, on each level $k$ of the pyramid we first compute the forward and backward flows, $\vec v_0^f$ and $\vec v_0^b$, of the original input sequence $u_0$ (resized to the size of the current level) by minimizing (5.6) (with or without the GCA included in E2), in which the resized input sequence $u_0$ simply replaces $u$.⁵ This gives us a highly reliable anchor flow when calculating the flows $\vec v^f$ and $\vec v^b$ of the full output sequence. At the given level $k$ of the pyramid, we then initialize the intensities and the flows of the new frames by resizing the intensities and flows calculated at the coarser level $k + 1$ above. From these initializations we calculate the flows by minimizing the energy in (5.6), again with or without the GCA in E2. Next we calculate $u$ at level $k$ by minimizing the energy (5.5), knowing $\vec v^f$ and $\vec v^b$ and using the resized intensities from level $k + 1$ as initialization of $u$ in the new frames, just as when calculating $\vec v^f$ and $\vec v^b$.

⁴ We have already found frame doubling to be the worst case scenario, but producing, e.g., 25 fps from 1 fps video or similar is what we would consider unreasonable, as good results would be unrealistic in the case of complex motion in the scene.

The resizing function (resize) we use to initialize flows and intensities at level $k$ from the values calculated at level $k + 1$ above is a simple spatial down- and upscaling function for multiresolution schemes as given in [11]. We also use this resize function to downscale $u_0$ to the given level $k$, so that we always have the best representation of the original frames at that level. The recalculation of the flows of the original frames, $\vec v_0^f$ and $\vec v_0^b$, is important because it rids us of the assumption that the flow is constant/linear and can simply be halved when we insert new frames. When the flows in the original frames are more precise, the flows we calculate in the new frames, which are the ones actually used in the intensity calculations, also become more precise and reliable.
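The resize function of [11] is not reproduced here. As a stand-in, the following sketch uses plain bilinear interpolation with pixel-centre alignment, which is an assumption and not the scheme of [11]. It also shows one detail that any resizing of flow fields needs: the vectors are measured in pixels, so they must be multiplied by the size ratio when the field moves between pyramid levels. The function names are hypothetical.

```python
def bilinear_resize(img, new_h, new_w):
    """Resize a 2D grid (list of rows) with bilinear interpolation.

    Output pixel centres are mapped onto input pixel centres; positions
    outside the input are clamped to the border.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        y = min(max((i + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(y)  # y >= 0, so int() is floor
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = min(max((j + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = (1.0 - fx) * img[y0][x0] + fx * img[y0][x1]
            bottom = (1.0 - fx) * img[y1][x0] + fx * img[y1][x1]
            row.append((1.0 - fy) * top + fy * bottom)
        out.append(row)
    return out


def resize_flow(vx, vy, new_h, new_w):
    """Resize a flow field (vx, vy) and rescale the vectors to the new pixel units."""
    h, w = len(vx), len(vx[0])
    sx, sy = new_w / w, new_h / h
    rx = [[v * sx for v in row] for row in bilinear_resize(vx, new_h, new_w)]
    ry = [[v * sy for v in row] for row in bilinear_resize(vy, new_h, new_w)]
    return rx, ry
```

Upscaling a flow field by a factor of two in both directions therefore doubles every displacement, which is what the finer level expects.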

The use of a multiresolution scheme is considered essential when doing variational flow calculations. In TSR, calculating both flow and intensities at each level solves the chicken-and-egg problem of what comes first in a new frame: the flow or the intensities. We thus iteratively improve first one and then the other, approximating simultaneous computation and optimizing our solution. Using a small scale factor between pyramid levels ensures very good initial guesses of the intensities and flows in the new frames at each level. Further integration to make the calculations 'more' simultaneous (as is done for video super resolution in Chapter 4), by alternating between minimizing $E(\vec v)$ and $E(u)$ internally on each level of the pyramid, is unlikely to improve output quality when small scale factors are used in the pyramid.

At the coarsest level at the top of the pyramid we do not have a level $k + 1$ to initialize our data from, and thus have to use temporal initialization (inferior to initialization from level $k + 1$). For the flow calculation we have chosen to do frame averaging of both flow and intensities. If the new frame is located at time $n$ and the two known frames are at times $n \pm 1/2$, then

$$\vec v(x, n) = \frac{\vec v_0(x, n - 1/2) + \vec v_0(x, n + 1/2)}{2}$$

and

$$u(x, n) = \frac{u_0(x, n - 1/2) + u_0(x, n + 1/2)}{2}.$$

Since the spatial size of the frames at the top level is only a fraction of the size of the frames at the bottom level, even very large motion is downscaled to be very small at the top level, and thus the initialization is a good approximation of the actual values. Even though the flow we compute at the top level is of subpixel (or close to subpixel) size, we still use it to re-initialize the intensities by simple interpolation along the flow,

$$u(x, n) = \frac{u_0(x + \vec v^b, n - 1/2) + u_0(x + \vec v^f, n + 1/2)}{2},$$

before we minimize E(u). As the top level is only a very crude estimate of the finest level at the bottom and still has to go through many corrections down through the pyramid, this initialization should suffice.
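A minimal 1D sketch of the two top-level initializations, assuming linear interpolation between samples and clamping at the frame borders (both assumptions). The function names are hypothetical; the flows are given per sample of the new frame, with $\vec v^b$ pointing to the frame at $n - 1/2$ and $\vec v^f$ to the frame at $n + 1/2$.

```python
def lerp_sample(signal, x):
    """Linearly interpolate a 1D signal at real position x, clamped to the borders."""
    n = len(signal)
    x = min(max(x, 0.0), n - 1.0)
    i0 = int(x)
    i1 = min(i0 + 1, n - 1)
    f = x - i0
    return (1.0 - f) * signal[i0] + f * signal[i1]


def average_init(prev, nxt):
    """Frame averaging, used for both intensities and flows at the top level."""
    return [(a + b) / 2 for a, b in zip(prev, nxt)]


def interpolate_along_flow(prev, nxt, vb, vf):
    """u(x, n) = [u0(x + vb, n - 1/2) + u0(x + vf, n + 1/2)] / 2 on a 1D grid."""
    return [(lerp_sample(prev, x + vb[x]) + lerp_sample(nxt, x + vf[x])) / 2
            for x in range(len(prev))]


# A step edge that moves two samples to the right between the known frames.
prev = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
nxt = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
blurred = average_init(prev, nxt)
sharp = interpolate_along_flow(prev, nxt, [-1.0] * 8, [1.0] * 8)
```

Frame averaging turns the moving edge into a two-sample ramp at half intensity, while interpolation along the flow places a single sharp edge halfway between its positions in the known frames.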

The algorithm we use is as follows (leaving out the special initialization case at the top level).

At each level from the top, coarse to fine, for $k$ = levels down to $k = 1$:

1. Calculate the forward and backward flows, $\vec v_0^f$ and $\vec v_0^b$, of the resized original input sequence $u_0$ by minimizing (5.6) with/without E2 from (5.7).
2. Initialize the new frames: $u(x, t, k) = \mathrm{resize}[u(x, t, k + 1)]$ in the domain $D$.
3. Initialize the forward and backward flows of the new frames: $\vec v(x, t, k) = \mathrm{resize}[\vec v(x, t, k + 1)]$ in the domain $D$.
4. Calculate the flows $\vec v^f$ and $\vec v^b$ of the output sequence $u$ by minimizing (5.6) with/without E2 from (5.7).
5. Calculate the new frames in $u|_D$ by minimizing (5.5).
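The five steps can be sketched as a coarse-to-fine driver loop. The step functions are hypothetical placeholders supplied through an object (the actual energy minimizations are not shown), and `top_init` stands in for the special temporal initialization at the top level that the list leaves out.

```python
def tsr_coarse_to_fine(levels, steps):
    """Coarse-to-fine driver for frame doubling, following steps 1-5.

    `steps` must provide the hypothetical callables used below; none of them
    is defined in the thesis under these names.
    """
    if levels < 1:
        raise ValueError("need at least one pyramid level")
    u = vf = vb = None
    for k in range(levels, 0, -1):  # k = levels (coarsest) down to 1 (finest)
        # Step 1: anchor flows of the resized original frames u0.
        v0f, v0b = steps.original_flows(k)
        if k == levels:
            # Top level: no level k + 1 exists, so use temporal initialization.
            u, vf, vb = steps.top_init(k, v0f, v0b)
        else:
            # Steps 2 and 3: initialize new frames and their flows from level k + 1.
            u = steps.resize(u, k)
            vf, vb = steps.resize(vf, k), steps.resize(vb, k)
        # Step 4: flows of the output sequence, minimizing (5.6).
        vf, vb = steps.output_flows(k, u, vf, vb, v0f, v0b)
        # Step 5: intensities of the new frames, minimizing (5.5).
        u = steps.new_frames(k, u, vf, vb)
    return u, vf, vb
```

Passing the steps in as an object keeps the loop order, which is the point of the list, separate from how each minimization is implemented.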

5.4 Experiments

In our tests we focus on the major problem caused by too low frame rates in image sequences: unnatural motion. The problem is mostly caused by camera pans over scenes containing high contrast edges. By doubling the frame rate we make the apparent motion of these high contrast edges smoother and (partially) reestablish the phi-effect.

As discussed in the previous section, we have implemented our frame doubling algorithm so that we can test it in two versions. The first includes the gradient constancy assumption in the flow calculations and is expected to give the most correct results, as both flow and intensities are subject to minimal blurring when the GCA sharpens the flow. The second, without GCA in the flow, is expected to blur the flow more and, as a consequence, also blur the intensities more. The version without GCA is faster when implemented so that the GCA terms are removed altogether rather than turned off by setting their weight to zero. We test it in the hope that its results, although expected to be smoother, will be convincing to the human viewer and not be perceived as unsharp (as discussed in Section 5.2.2). If we get results with different sharpness in the new frames, this will help us say something about blur acceptance in frame doubling. Before we look at our frame doubling results, there are some topics we wish to discuss.

5.4.1 Test Material

We have tested on a few homemade sequences as well as on cutout sequences of real world motion pictures from standard resolution PAL DVDs (720×576 pixels, 25 fps, telecined). We only process the luminance channel, but the extension to full color processing is discussed in Section 2.4 of this thesis. The cutouts are of areas with high contrast edges in motion, and the chosen sequences were experienced to have jerky motion when watched in their full frame PAL versions on a 43" Pioneer plasma screen at a viewing distance of 2 to 2.5 meters. The frame rate of the screen is unknown, but given the experienced jerky motion, its frame rate up-conversion from 25 fps is unlikely to be motion compensated. The jerkiness in the test sequences was also confirmed by playing back the 25 fps sequences on an LCD PC monitor with a 75 Hz refresh rate using frame repetition.

Online Video Files

All inputs and results given in the figures in Section 5.4.5 (Frame Doubling Results) are also available as video files (*.avi) online at http://image.diku.dk/sunebio/TSR/TSR.zip [58]. References to specific files are given when they are discussed in Section 5.4.5, and all files are named after the input test sequence and the method applied to it. The free AVI video editor VirtualDub is also included in the online material.

                                            Without GCA   With GCA
Multiresolution   Scale factor              1.04          1.04
                  Levels                    55 or 75      55 or 75
Flow              Fixed point iterations    10            5
                  Relaxation iterations     40            20
                  λ3 in (5.6)               30            100
                  λ2 in (5.7)               1             1
                  γ in (5.7)                0             100
Intensities       Fixed point iterations    5             5
                  Relaxation iterations     10            10
                  λs in (5.5)               1             1
                  λt in (5.5)               50            50

Table 5.1: Optimal parameter settings for variational TSR. The parameters are discussed in detail in Section 5.4.2. The eleventh parameter of the algorithms is the convergence threshold, set to $10^{-7}$ in all tests.

5.4.2 Parameters

As with any advanced method, there are a number of parameters that need to be tuned in our algorithms to optimize performance; to be exact, there are eleven. In our case optimal performance means optimal output quality. We have left tuning for lower running times for later, but discuss the issue in this section. Through extensive parameter testing we have optimized two sets of parameters for variational frame doubling TSR: with and without GCA in the flow. Our experience from deinterlacing and video super resolution (see Chapters 3 and 4 of this thesis), together with the literature on inpainting and optical flow calculation ([9], [12], [14], [64] etc.), enabled us to make qualified guesses on the parameters. From these initial guesses we could then push each parameter in both directions, making our parameter tuning deterministic. Testing just three different settings of each parameter in all possible combinations would result in $3^{11} = 177{,}147$ different test results for evaluation.
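For comparison, the sketch below computes the size of the exhaustive grid and of one round of the one-at-a-time strategy. Reading "push each parameter in both directions" as one baseline run plus one run above and one below for each parameter is our assumption, and the function names are hypothetical.

```python
from itertools import product


def exhaustive_grid_runs(n_params, settings_per_param):
    """Runs needed to test every combination of settings."""
    return settings_per_param ** n_params


def one_at_a_time_runs(n_params):
    """Runs for one round of pushing each parameter up and down from a baseline
    (our reading of the tuning strategy, not a procedure stated in the thesis)."""
    return 1 + 2 * n_params


# Cross-check the closed form by enumerating the 3-setting grid explicitly.
enumerated = sum(1 for _ in product(range(3), repeat=11))
```

A single round of the one-at-a-time search costs a few dozen runs instead of the full grid, at the price of ignoring interactions between parameters.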

The settings we found to be optimal (given our obviously incomplete search) are listed in Table 5.1 for both versions of our algorithm. The settings for the intensity energy minimization are the same for both versions; we did of course test other settings, but the given settings optimize output quality. The weight ratio of temporal to spatial diffusion strongly favors temporal diffusion, which ensures that spatial diffusion is only used when temporal information is highly unreliable. Lowering the ratio from 50:1 to 20:1 gave similar results when evaluating on video, but judging from the stills there were minor degradations, so we recommend $\lambda_t:\lambda_s = 50:1$. It is possible that fewer intensity iterations would give results similar to those we get now, but as we consider lowering the number of iterations to be pure running time optimization, we have not yet tested this tuning option thoroughly.

The number of flow iterations needed is higher than the number of intensity iterations needed, which indicates a larger complexity in the flow calculations. We did conduct tests with fewer flow iterations, but we are already close to the lower limit. Running more iterations than given in Table 5.1 gave no visual improvements, with one exception: while tuning, we found that doing 20 fixed point iterations instead of 10 in the flow calculations without GCA gave a very small visible improvement, but only when the results were viewed as stills, so we are confident that 10 iterations are enough here. Even though we appear to be at convergence with the given number of iterations, we never reached the convergence threshold of $10^{-7}$, which would break out of the loops and stop the iterations. A higher convergence threshold might therefore be more useful, if this parameter is to live up to its name.

Table 5.1 also shows that without GCA in the flow we need more iterations to get optimal flows. This is because fewer points give reliable flow information when only brightness constancy is evaluated, which increases the need for flow diffusion by the regularization term on the flow (E3), and this takes extra time. The E3-term is weighted 30 times higher than the OFC when we do not use the GCA, but is given the same weight as the GCA when the GCA is used (with the OFC nearly neglected due to the $\lambda_2:\gamma = 1:100$ ratio). Since we have implemented flow calculations without GCA by setting $\gamma = 0$, we cannot say whether the version without GCA in the flow is faster than the version with GCA in spite of its need for four times as many iterations, but we do expect it to be faster. In any case, it would be less complex to implement, e.g., in hardware.
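The factor of four follows from the flow rows of Table 5.1 if one counts Gauss-Seidel sweeps per pyramid level as fixed point iterations times relaxation iterations, which is an assumption about how the loops nest. As the text notes, this count says nothing about the cost of a single sweep, which is higher when the GCA terms are included. The names below are hypothetical.

```python
# Flow solver iteration counts from Table 5.1.
FLOW_ITERATIONS = {
    "without_gca": {"fixed_point": 10, "relaxation": 40},
    "with_gca": {"fixed_point": 5, "relaxation": 20},
}


def sweeps_per_level(cfg):
    """Gauss-Seidel sweeps per pyramid level, assuming `relaxation` inner
    sweeps are run inside each of `fixed_point` outer iterations."""
    return cfg["fixed_point"] * cfg["relaxation"]


ratio = (sweeps_per_level(FLOW_ITERATIONS["without_gca"])
         / sweeps_per_level(FLOW_ITERATIONS["with_gca"]))
```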

The number of levels in the multiresolution pyramid is set to either 55 or 75 depending on the frame size of the given image sequence, with a (coarse to fine) scale factor of 1.04 between the levels. The low scale factor ensures good information transfer down through the pyramid. We have not tried values above 1.1 (and have tried the value 1.1 on only one sequence), but larger values could reduce running times, as each level above the finest would shrink in size. Whether a larger scale factor would decrease the output quality is unknown. The maximum flow magnitude (relative to the frame size) should be scaled down to (close to) subpixel size at the top level. With larger scale factors the number of levels in the pyramid could also be decreased, but this would only give a minor speedup, as we would remove only the very small coarse levels at the top of the pyramid. A speedup is more likely to come from lowering the number of iterations at all levels except the finest bottom levels of the pyramid (and possibly the top levels, to cope with the inferior initializations used there), but again this is a running time optimization and is left for later.
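A small sketch of the pyramid geometry, assuming level 1 is the full-resolution frame and level $k$ is downscaled by $1.04^{k-1}$; the helper names are hypothetical. It shows how much a given displacement shrinks at the top level and how many levels are needed to bring it down to at most one pixel.

```python
import math


def level_shape(height, width, k, scale=1.04):
    """Approximate frame shape at level k (k = 1 is the full-resolution frame)."""
    f = scale ** (k - 1)
    return max(1, round(height / f)), max(1, round(width / f))


def top_level_displacement(d_pixels, levels, scale=1.04):
    """Size at the top level of a displacement of d_pixels at the finest level."""
    return d_pixels / scale ** (levels - 1)


def levels_for_subpixel(d_pixels, scale=1.04):
    """Smallest number of levels for which d_pixels shrinks to at most one pixel."""
    if d_pixels <= 1.0:
        return 1
    return 1 + math.ceil(math.log(d_pixels) / math.log(scale))
```

Under this assumption a PAL frame (576 rows, 720 columns) is roughly 69 × 87 pixels at the top of a 55-level pyramid, and 55 levels bring a displacement of 8 pixels below one pixel.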

We also conducted experiments using zero as initial values for both flows and intensities in the new frames at the top level. Given the many levels we use, the error introduced was corrected down through the pyramid, showing great robustness against bad initializations. The robustness of our algorithm is further reinforced by the fact that on each level we compute the flows of the original frames from the original frames only.

In document Video Upscaling Using Variational Methods (Page 116-120)