5.5 Discussion
5.5.2 Pointers to Generic Variational Frame Rate Conversion
Doubling the frame rate, or multiplying it by any other integer, is relatively straightforward, but a non-integer conversion such as 25 to 24 fps, 25 to 72 fps or 24 to 60 fps is more demanding. Our framework allows arbitrary frame rate conversions, but instead of keeping the old frames as every other new frame, the data term will have to specify a mapping from the input temporal resolution to the output temporal resolution, similar to the spatial super resolution constraint used in Chapter 4 of this thesis. Since the temporal point spread function would have to model the aperture time of the camera, one would always need to know the shutter speed used by the photographer for every recording. It would be more practical to let the data term map each new frame to the nearest original frame in each direction, so that it knows where to find reliable data when creating the new frames. Let us leave the more theoretical discussion and look at the implementation issues involved in doing generic (in frame rates) temporal super resolution.
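As an illustration of the mapping to the nearest original frame in each direction, the following minimal sketch places output frame k on the input time axis and returns its two neighboring original frames. The function name and the convention of measuring time in units of the input frame interval are assumptions made for this example, not part of the framework described above.

```python
from fractions import Fraction
from math import floor, ceil


def nearest_original_frames(k, in_fps, out_fps):
    """Locate output frame k relative to the original frames.

    Time is measured in units of the input frame interval, so the
    original frames sit at the integer positions 0, 1, 2, ...
    Returns (prev_index, next_index, dist_to_prev, dist_to_next).
    """
    # Exact rational position of output frame k on the input grid.
    t = Fraction(k * in_fps, out_fps)
    prev_index = floor(t)
    next_index = ceil(t)
    return prev_index, next_index, t - prev_index, next_index - t


if __name__ == "__main__":
    # 24 to 60 fps: the pattern of distances repeats every 5 output frames.
    for k in range(6):
        print(k, nearest_original_frames(k, 24, 60))
```

When both distances are zero the output frame coincides with an original frame. Exact fractions are used so that such coincidences are not missed due to floating point rounding.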
To get a TSR algorithm with arbitrary frame rate conversion, the initializations first need to be changed, which should be rather simple to do. Multiresolution schemes are a great help here, since we can always build a pyramid that downscales the image to a size where the flow magnitude |v| ≈ 0 or at least |v| < 1, and thus get a good initialization of our basic data.
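The depth of pyramid needed for this can be estimated with a small sketch. It assumes that every level scales the image, and hence the displacements, by the same factor; the function name, the default factor of 0.5 and the 1 pixel target are illustrative choices only.

```python
def pyramid_levels(max_displacement, scale_factor=0.5, target=1.0):
    """Count the downscalings needed until the largest displacement
    (in pixels) is strictly below `target`.

    scale_factor is the assumed size ratio between a level and the
    next coarser one, so it must lie strictly between 0 and 1.
    """
    if not 0.0 < scale_factor < 1.0:
        raise ValueError("scale_factor must be in (0, 1)")
    levels = 0
    displacement = max_displacement
    while displacement >= target:
        displacement *= scale_factor  # flow shrinks with the image
        levels += 1
    return levels


if __name__ == "__main__":
    # A 16 pixel motion needs 5 halvings to drop below 1 pixel.
    print(pyramid_levels(16.0))
```

A scale factor closer to 1 gives more levels with smaller changes between them, at the cost of more computation.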
Secondly, we would have to change the nice, regularly spaced temporal grid with grid size ∆t = 1 to a more irregular grid, which mainly affects the calculation of temporal derivatives in the numerical implementation of the energy minimizations.
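To indicate what the change of grid means for the temporal derivatives, here is a sketch of the standard three-point finite difference on a non-uniform grid. It is a generic textbook stencil, not necessarily the discretization one would end up using in the actual implementation, and the function name is hypothetical.

```python
def temporal_derivative(t_prev, t, t_next, f_prev, f, f_next):
    """Three-point estimate of df/dt at time t on an irregular grid.

    h1 and h2 are the (possibly different) distances to the previous
    and next sample. For h1 == h2 this reduces to the usual central
    difference (f_next - f_prev) / (2 * h).
    """
    h1 = t - t_prev
    h2 = t_next - t
    if h1 <= 0 or h2 <= 0:
        raise ValueError("sample times must be strictly increasing")
    return (-h2 / (h1 * (h1 + h2)) * f_prev
            + (h2 - h1) / (h1 * h2) * f
            + h1 / (h2 * (h1 + h2)) * f_next)


if __name__ == "__main__":
    # f(t) = t^2 sampled at t = 0, 0.4, 1; the stencil is exact for
    # quadratics, so the estimate at t = 0.4 should be close to 0.8.
    print(temporal_derivative(0.0, 0.4, 1.0, 0.0, 0.16, 1.0))
```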
Thirdly, in the frame doubler we are in the fortunate situation of having two original frames close by as nearest neighbors to each new frame. When doing arbitrary frame rate TSR, we might have more than one new frame between each pair of original frames, and the question is to what degree we should ignore data in a neighboring new frame in order to get more reliable information from an original frame further away. As we solve our problem iteratively, data in the new frames becomes more and more reliable, and neighboring new frames come to represent reliable information transported there from known input frames. With a good initialization from the level above in the multiresolution pyramid, and a small scale factor between levels, we can also gradually improve quality as we go down through the pyramid.
Finally, one could consider whether any original frame close to the position (±∆t) of a new frame should be used directly as that new frame. The maximum ∆tmax we can allow depends on the maximum projected displacement ∆xmax on the retina that does not annoy the HVS by creating jumpy/jerky motion. ∆tmax depends on ∆xmax through the maximum projected velocity ∆vmax. The limit is, however, not fixed, but depends on many factors such as contrast, brightness, viewing angle and so on.
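The decision to reuse an original frame can be sketched as below. The sketch assumes the simplest possible relation between the three quantities, namely that the tolerable time offset is the tolerable displacement divided by the maximum velocity; this relation, the function names and all numerical values are assumptions for illustration, since the true limits depend on the viewing conditions mentioned above.

```python
def max_time_offset(dx_max, v_max):
    """Largest tolerable time offset, assuming dt_max = dx_max / v_max.

    dx_max: tolerated projected displacement (e.g. in pixels).
    v_max:  largest projected speed (same length unit per second).
    """
    if v_max <= 0:
        return float("inf")  # nothing moves, any offset is tolerable
    return dx_max / v_max


def reuse_original(offset, dx_max, v_max):
    """True if an original frame `offset` seconds away from the new
    frame position may be used directly as the new frame."""
    return abs(offset) <= max_time_offset(dx_max, v_max)


if __name__ == "__main__":
    # Hypothetical numbers: 1 pixel tolerance, motion of 5 pixels/s.
    print(reuse_original(0.1, dx_max=1.0, v_max=5.0))
    print(reuse_original(0.4, dx_max=1.0, v_max=5.0))
```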
5.6 Conclusion
In this chapter we have discussed the requirements that the human visual system puts on the design of temporal super resolution algorithms. We have presented a novel idea of simultaneous flow and intensity calculation in new frames of an image sequence, introduced a novel variational temporal super resolution method, and implemented and tested a variational frame rate doubler in two versions.
Even though we do not always create perfect new frames, variational motion compensated temporal super resolution does provide high quality 50 fps video from 25 fps video without noticeable artifacts during video playback, thus reestablishing the pi-effect for the problem case of high contrast edges in motion. The framework presented also has the potential to be used for frame rate conversions other than frame rate doubling.
Detecting Interlaced or Progressive Source of Video
This chapter is a collaboration with Kim Steenstrup Pedersen and my co-supervisor François Lauze and was originally published as [56]. Minor corrections have been made relative to [56] to improve readability.
In this chapter we introduce an algorithm – commonly known as a film mode detector – for separating progressive source video from interlaced source video. Due to interlacing artifacts in the presence of motion, a difference in isophote curvature can be measured and a threshold for effective classification can be set. This can be used in a video converter to ensure high quality output. We study two approaches.
6.1 Introduction
Many elements are needed to make a full video converter. Some of the most important elements are a deinterlacer, a spatial resolution up-converter (video super resolution) and a frame rate converter (temporal super resolution). The input video can be either interlaced or progressive [82]. An interlaced video signal (broadcast or stored on e.g. DVD) can have progressive video embedded in it, e.g. when the signal is of film source telecined to interlaced [82]. By doing a pull-down, that is, recreating the original progressive frames from the interlaced fields, before further processing, interlacing artifacts can be avoided in progressive material, since deinterlacing would not necessarily remove all interlacing artifacts [55], [4]. Conversely, in the presence of motion the quality of interlaced material will suffer if its fields are just merged to frames instead of being properly deinterlaced.
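The merging of two fields into a frame can be illustrated with a minimal sketch, in which a frame is simply a list of rows and the top field is assumed to hold the even-numbered rows. The function names are hypothetical. If both fields stem from the same progressive frame, merging recovers that frame exactly; if they were captured at different times, the same operation interleaves two different images and produces serration wherever there is motion.

```python
def split_fields(frame):
    """Split a frame (list of rows) into top and bottom fields.

    The top field is assumed to hold rows 0, 2, 4, ... and the
    bottom field rows 1, 3, 5, ...
    """
    return frame[0::2], frame[1::2]


def weave(top_field, bottom_field):
    """Merge two fields of equal height into one frame by
    interleaving their rows, top field first."""
    if len(top_field) != len(bottom_field):
        raise ValueError("fields must have the same number of rows")
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


if __name__ == "__main__":
    progressive = [[r * 10 + c for c in range(4)] for r in range(4)]
    top, bottom = split_fields(progressive)
    # Fields from the same progressive frame weave back losslessly.
    print(weave(top, bottom) == progressive)
```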
Thus determining the scan format of the input is vital for the further processing and the output quality. Hence another key element in a video converter is the input scan format detector. This element is often called film mode detection as film was earlier the only source of progressive material, but today progressive material can also originate from video and television cameras.
If the input source is DVD, the MPEG-2 codec facilitates flagging of video as either interlaced or progressive, which could make source detection obsolete. Unfortunately, it is far from certain that the flagging has been done correctly [76], and if the source is standard broadcast there is no flagging.
Figure 6.1: Interlacing artifacts: serration, none (progressive) and line crawl.
Some material, like documentaries mixing video and film material and ’behind the camera’ shows on movies, mixes progressive and interlaced material... Therefore it is important to have a film mode detector that can switch from one format to the other relatively fast.