
5.5 Discussion

5.5.1 Improving Variational Optical Flow in Motion Compen-

Scenes with complex motion can cause problems even for advanced optical flow algorithms. We will look at some examples of variational optical flow computed on sequences with complex motion. The flows have been computed by minimizing the flow energy with the gradient constancy assumption (GCA) as given in Equations (5.6) and (5.7).
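To illustrate what the gradient constancy assumption adds to the data term, the sketch below evaluates a discrete data energy for a given flow between two grayscale frames. It assumes the Brox-style form $\psi(|I_2(\mathbf{x}+\mathbf{v}) - I_1(\mathbf{x})|^2 + \gamma|\nabla I_2(\mathbf{x}+\mathbf{v}) - \nabla I_1(\mathbf{x})|^2)$ with $\psi(s^2) = \sqrt{s^2 + \epsilon^2}$, following Brox et al. [9]; Equations (5.6) and (5.7) may differ in detail, and the function names and the values of `gamma` and `eps` are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def psi(s2, eps=1e-3):
    # Robust penalizer psi(s^2) = sqrt(s^2 + eps^2) (assumed, Brox-style)
    return np.sqrt(s2 + eps ** 2)


def warp(img, v1, v2):
    # Sample img at (x + v1, y + v2) with bilinear interpolation;
    # positions outside the image are clamped to the nearest border pixel.
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(img, [yy + v2, xx + v1], order=1, mode="nearest")


def data_energy(I1, I2, v1, v2, gamma=5.0):
    """Discrete sum of a robust data term with brightness and gradient constancy."""
    I1y, I1x = np.gradient(I1)  # derivatives along rows (y) and columns (x)
    I2y, I2x = np.gradient(I2)
    bright = (warp(I2, v1, v2) - I1) ** 2
    grad = (warp(I2x, v1, v2) - I1x) ** 2 + (warp(I2y, v1, v2) - I1y) ** 2
    return psi(bright + gamma * grad).sum()
```

For a frame pair related by a one-pixel horizontal shift, e.g. `I2 = np.roll(I1, 1, axis=1)`, the energy of the flow `v1 = 1, v2 = 0` is far lower than that of the zero flow, since only the border columns then violate the constancy assumptions.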

First, in scenes with many different motions, e.g. crowd scenes, a highly segmented optical flow needs to be computed. Figure 5.7(b) shows a relatively complex scene, School Yard, with multiple motions including a camera pan, and Figure 5.7(a) shows that the variational optical flow is nicely segmented; the two persons in the middle (red in the flow illustration) do in fact have identical projected motion. In the sequence Street, also used in Chapter 4, the motion is even more complex, and many of the moving objects (distant cars and pedestrians) are too small to be segmented in the flow, see Figures 5.7(c) and 5.7(d). As the flows of the sequence Square show (see Figure 5.2), flow computation with GCA produces very fragmented flows even in detailed regions with no or very little motion, e.g. the stationary background in Square and the nearly stationary buildings in Street. Gaussian pre-smoothing of the image sequence would give smoother (and better looking) flows, but it might cause a loss of detail in the flows.
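The pre-smoothing trade-off can be explored with a minimal sketch that blurs every frame before the flow is computed. The function name and the choice of a purely spatial filter (no smoothing across time) are assumptions for illustration; larger `sigma` values give smoother flows at the cost of fine detail.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def presmooth(seq, sigma=1.0):
    """Gaussian pre-smoothing of a (T, H, W) image sequence before flow computation.

    sigma=(0, sigma, sigma) blurs each frame spatially but never mixes frames,
    so the temporal structure of the sequence is left untouched.
    """
    return gaussian_filter(np.asarray(seq, dtype=float), sigma=(0.0, sigma, sigma))
```

Smoothing only within frames keeps the motion itself intact; the detail loss mentioned above comes from the spatial blur, which grows with `sigma`.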

Looking only at the objects with significant flow magnitudes in Street (the cars in the front, the van turning right around the corner in the middle of the frame, the Canadian flag in the upper right and the US flags on the left), we find that they are all well segmented with reliable flows. For both examples in Figure 5.7 we have been able to compute artifact-free video super resolution results.

It is important to have a robust motion-compensated algorithm that switches off temporal input when the flows are unreliable, but the limitations of the human visual system (HVS) will also help: in complex scenes of this kind the HVS will not be able to track all the motions, and thus we might ’get away’ with producing suboptimal outputs. Still, optimal results require precise and reliable flows. Even when a sequence contains only one or a few motions, the HVS will most likely spot the problems if one of them is of a type not included in our motion modelling and is handled in a way that causes artifacts. Such non-modelled motion could be transparent motion, whirls, some cases of accelerated motion and, specifically for block matching algorithms, anything that is not translational at block-size resolution.

Figure 5.7: Variational optical flow with gradient constancy assumption. (a) Forward flow of frame 13 of the sequence School Yard shown in (b). (c) Forward flow of frame 2 of the sequence Street shown in (d). Both sequences are from movies telecined anamorphically to (widescreen) PAL DVDs.
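One simple way to switch off temporal input where flows are unreliable is to weight a motion-compensated neighbouring frame by how well it matches the current frame. The Gaussian weighting, the threshold `tau` and the function names below are assumptions for illustration, not the scheme used in this thesis.

```python
import numpy as np


def reliability_weight(residual, tau=0.05):
    """Map per-pixel motion-compensation residuals to weights in [0, 1].

    A residual of 0 (flow agrees with the data) gives weight 1; residuals much
    larger than tau give weights near 0, i.e. temporal input is switched off.
    """
    return np.exp(-(np.asarray(residual, dtype=float) / tau) ** 2)


def fuse_with_neighbour(current, compensated, tau=0.05):
    """Blend a motion-compensated neighbouring frame into the current frame."""
    w = reliability_weight(np.abs(compensated - current), tau)
    # Trust the neighbour where the flow looks reliable, fall back otherwise
    return w * compensated + (1.0 - w) * current, w
```

Where the flow is wrong, for example on non-modelled motion, the residual is large and the output falls back to the current frame alone, trading temporal information for the absence of artifacts.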

A problem not handled well by our variational flow algorithms, with or without GCA, is fast dynamic change, i.e. flows with large accelerations. An example is the sequence Keyboard, where the fast up-and-down motion of the typing hands causes problems. Four of the twenty frames in Keyboard are shown in Figure 5.8. When we compute the flow of the full sequence as one volume (still with GCA, as in the previous examples), we get a flow that changes very little over the duration of the sequence; the true optical flow of the hands is not found. This is most likely due to the local 3D spatiotemporal regularization we use on the flow, $\int_\Omega \left( \psi(|\nabla v_1|^2) + \psi(|\nabla v_2|^2) \right) d\mathbf{x}$, where $\nabla$ is the spatiotemporal gradient. This regularization term fills in information from local temporal neighbors and should, in theory, switch off the temporal diffusion where there are edges in time (where the moving object has moved away). But for several of the sequences we have tested, the flow stays the same over time at a given spatial location even though the object causing the motion is present at that location in only part of the sequence. Keyboard is one of these sequences, but School Yard, where the correct motion fields nicely follow the objects over time, is not. We have used the 3D regularization on the flow because Brox et al. [9] and Bruhn et al. [12] reported that it gives better results than purely spatial 2D regularization.

Figure 5.8: Optical flows with GCA on the 20-frame sequence Keyboard processed in four bites of five frames each. (a)-(d) Forward flows of the second frame of each bite, corresponding to frames 2, 7, 12 and 17 of the full sequence; these frames are shown in (e)-(h). The flows are somewhat oversegmented, as the weight of the flow prior (regularization) was set very low in this experiment ($\lambda_3 = 30$).
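To make the difference between the 3D prior and a purely spatial 2D prior concrete, the sketch below evaluates a discrete version of $\int_\Omega \left( \psi(|\nabla v_1|^2) + \psi(|\nabla v_2|^2) \right) d\mathbf{x}$ on a flow volume. The central differences from `np.gradient`, the penalizer $\psi(s^2) = \sqrt{s^2 + \epsilon^2}$ and the function names are assumptions for illustration.

```python
import numpy as np


def psi(s2, eps=1e-3):
    # Robust penalizer psi(s^2) = sqrt(s^2 + eps^2) (assumed)
    return np.sqrt(s2 + eps ** 2)


def smoothness_energy(v1, v2, spatiotemporal=True, eps=1e-3):
    """Discrete sum of psi(|grad v1|^2) + psi(|grad v2|^2) over a (T, H, W) flow volume.

    spatiotemporal=True uses the 3D gradient (d/dt, d/dy, d/dx);
    spatiotemporal=False keeps only the spatial derivatives (a 2D prior).
    """
    total = 0.0
    for v in (v1, v2):
        gt, gy, gx = np.gradient(v)
        s2 = gy ** 2 + gx ** 2
        if spatiotemporal:
            s2 = s2 + gt ** 2  # temporal changes of the flow are penalized too
        total += psi(s2, eps).sum()
    return total
```

A flow that is spatially constant but jumps in time at a fixed location (as the typing hands do) costs nothing under the 2D prior but is penalized under the 3D prior, which is one way to see why the 3D regularization favours flows that stay constant over time.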

To handle accelerated motion, one could process the sequence in shorter bites. We tried this on Keyboard, cutting it into four bites of five frames each; the resulting flows are shown in Figure 5.8. As the figure shows, the flows now change dynamically from bite to bite, and the problem is solved. Within each bite the flow is still close to constant over time, but this can be handled by making the bites shorter, possibly even processing the frames in pairs.
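Bite-wise processing itself is simple bookkeeping. The sketch below splits a sequence into consecutive, non-overlapping bites and runs a flow estimator on each; `estimate_flow` is a hypothetical stand-in for the variational solver, assumed to return one flow field per input frame.

```python
def split_into_bites(n_frames, bite_len):
    """(start, stop) index ranges of consecutive, non-overlapping bites."""
    return [(s, min(s + bite_len, n_frames)) for s in range(0, n_frames, bite_len)]


def flows_in_bites(frames, bite_len, estimate_flow):
    """Run a flow estimator bite by bite and concatenate the results.

    estimate_flow is a hypothetical callable taking a list of frames and
    returning one flow field per frame. Bites are processed independently,
    so no information is propagated between them.
    """
    flows = []
    for start, stop in split_into_bites(len(frames), bite_len):
        flows.extend(estimate_flow(frames[start:stop]))
    return flows
```

With `split_into_bites(20, 5)` the bites are zero-based frames 0 to 4, 5 to 9, 10 to 14 and 15 to 19, so the second frame of each bite is frame 2, 7, 12 or 17 in the one-based numbering of Figure 5.8; `bite_len=2` gives the frame-pair variant.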

Solving the acceleration problem by processing the flow in very short bites sacrifices the long-range temporal propagation of information and the time consistency achieved by processing longer bites iteratively. Processing overlapping bites with a sliding temporal window function might prevent some of this loss, but a sound solution to the problem should be formulated theoretically and then implemented.
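The overlapping-bite idea could look roughly like the sketch below: bites of length `bite_len` are advanced by `hop` frames, and each frame's flow is a window-weighted average of the estimates from all bites that cover it. The Hann window, the parameter names and the function names are assumptions for illustration; the sketch assumes `n_frames >= bite_len` and `hop <= bite_len` so that every frame is covered.

```python
import numpy as np


def overlapping_bites(n_frames, bite_len, hop):
    """Start indices of bites of length bite_len advanced by hop frames."""
    starts = list(range(0, max(n_frames - bite_len, 0) + 1, hop))
    if starts[-1] + bite_len < n_frames:  # make sure the last frames are covered
        starts.append(n_frames - bite_len)
    return starts


def blend_bite_flows(bite_flows, starts, n_frames):
    """Combine per-bite flows (each shaped (bite_len, ...)) with a Hann window."""
    bite_len = bite_flows[0].shape[0]
    # Drop the zero end points of the Hann window so every frame gets weight > 0
    win = np.hanning(bite_len + 2)[1:-1]
    acc = np.zeros((n_frames,) + bite_flows[0].shape[1:])
    wsum = np.zeros(n_frames)
    for flow, s in zip(bite_flows, starts):
        w = win.reshape((-1,) + (1,) * (flow.ndim - 1))
        acc[s:s + bite_len] += w * flow
        wsum[s:s + bite_len] += win
    return acc / wsum.reshape((-1,) + (1,) * (acc.ndim - 1))
```

Frames near the centre of a bite dominate the average, so estimates from the bite edges, where temporal support is weakest, contribute less.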

In a multiresolution setting, the 3D regularization (prior) we use gives an advantage for temporally slowly changing motions such as zooms. This is exemplified by the results on the Yosemite sequence given by Brox et al. in [9], where the variational optical flow algorithm using 3D regularization has lower angular errors than when using 2D regularization (and also outperforms all other algorithms tested on Yosemite). In our work on motion adaptive deinterlacing in [55] and [57] we tested a 3D total variation regularization on intensities, but because it did not shut down temporal diffusion across temporal edges in the presence of motion, we abandoned it and used the split 2D and 1D prior instead, as reported in Chapter 3 of this thesis. The assumption that a spatiotemporal volume is 3D seems to hold only in limited cases, and image sequence volumes should instead be defined as 2+1D, separating space and time and using optical flow as the link holding them together (the ’+’). When taking variational optical flow beyond Yosemite and other artificially generated test sequences and using it in generally applicable motion-compensated algorithms (e.g. temporal super resolution and other upscalings), we believe that 2D regularization will help solve the acceleration problem encountered with 3D regularization, and we intend to test that in the future. We would also like to try priors in between 3D and 2D: a local 2D+1D prior with a stronger temporal shutdown than the 3D prior, or a 2D+1D prior including a derivative $\mathcal{L}_{\vec{V}}$ along the flow. Depending on its exact formulation, the latter might actually prevent motion acceleration, in which case a specific acceleration prior should be employed instead. Any improvement of our model that frees us from bite-wise flow processing to handle accelerations is welcome, as it will improve temporal consistency.
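One way to picture a prior with a derivative along the flow is to measure how the flow itself changes along motion trajectories. The sketch below assumes the spatiotemporal flow field $\vec{V} = (v_1, v_2, 1)^\top$ and approximates the derivative of each flow component along $\vec{V}$ by the forward difference $v_i(\mathbf{x} + \mathbf{v}(\mathbf{x}, t), t+1) - v_i(\mathbf{x}, t)$. Both this definition of $\vec{V}$ and the discretization are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def along_flow_difference(v1, v2):
    """Forward difference of (v1, v2) along the flow: v(x + v(x,t), t+1) - v(x,t).

    v1, v2: (T, H, W) horizontal / vertical flow volumes. Returns two arrays of
    shape (T-1, H, W): the change in velocity along each trajectory per frame.
    """
    T, H, W = v1.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    a1 = np.empty((T - 1, H, W))
    a2 = np.empty((T - 1, H, W))
    for t in range(T - 1):
        coords = [yy + v2[t], xx + v1[t]]  # where each pixel moves to at t+1
        a1[t] = map_coordinates(v1[t + 1], coords, order=1, mode="nearest") - v1[t]
        a2[t] = map_coordinates(v2[t + 1], coords, order=1, mode="nearest") - v2[t]
    return a1, a2


def acceleration_penalty(v1, v2, eps=1e-3):
    # Robust sum of the along-flow differences (assumed penalizer)
    a1, a2 = along_flow_difference(v1, v2)
    return np.sqrt(a1 ** 2 + a2 ** 2 + eps ** 2).sum()
```

Because this quantity is the frame-to-frame change of velocity along a trajectory, penalizing it favours constant-velocity motion, which illustrates why such a term could suppress genuine accelerations unless it is formulated carefully.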

In document Video Upscaling Using Variational Methods (Page 129-132)