
2.3 Overview of Image/Video Super-Resolution

2.3.3 Texture Video Super-Resolution

Because video carries an extra (temporal) dimension compared with still images, video super-resolution can be pursued in two ways: spatial resolution enhancement and frame rate enhancement, also called frame rate up-conversion. As a multiple-input, multiple-output SR problem, the input videos can be captured either from different viewpoints at the same frame rate or from the same viewpoint at different frame rates. The former input type yields super-resolved multiview HR video (spatial resolution enhancement), while the latter yields video with a high frame rate (frame rate up-conversion). It is worth noting that the input videos can have different resolutions, forming mixed-resolution inputs.

Spatial Resolution Enhancement

Since a video can be regarded as a sequence of images captured at different times, most image SR approaches can be applied to video SR. Each frame of an LR video is super-resolved to the target HR frame, so that the whole LR video is super-resolved to the desired HR video.
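The frame-by-frame idea can be sketched as follows. This is a minimal illustration, not a method from the cited works: the function names are hypothetical, and nearest-neighbour replication via `np.kron` merely stands in for whichever image SR method is chosen.

```python
import numpy as np

def upscale_frame(frame, scale):
    """Placeholder single-image SR: nearest-neighbour replication.

    Assumption: any image SR method with the same signature
    could be substituted here.
    """
    return np.kron(frame, np.ones((scale, scale), dtype=frame.dtype))

def super_resolve_video(video, scale, image_sr=upscale_frame):
    """Apply an image SR function independently to every frame.

    video: array of shape (num_frames, height, width).
    """
    return np.stack([image_sr(frame, scale) for frame in video])

# Toy LR video: 2 frames of 3x4 pixels.
lr_video = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
hr_video = super_resolve_video(lr_video, scale=2)
print(hr_video.shape)  # (2, 6, 8)
```

Because each frame is processed independently, this scheme ignores the temporal correlation between frames, which the approaches discussed next try to exploit.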

Based on the LR image observation model, a Bayesian approach was used in [64] for adaptive video SR, simultaneously estimating the camera motion, blur kernel, and noise level while reconstructing the original HR frames. Because the motion and blur kernel are modeled in a general form, the approach achieves accurate estimation and high-quality reconstructed HR frames. On the other hand, it involves many equations and unknown parameters, which makes it too complex and time-consuming for real-time video SR applications. Instead of using Bayesian Maximum A Posteriori (MAP) estimation to determine the unknown parameters of each pixel, the work in [65] uses it to solve for block-based unknown parameters, reducing the overall complexity while maintaining promising results. The work in [8] adopted a mixed-resolution video system in which at least one of the views is captured at LR while the others are captured at HR. In [8], the high-frequency content is extracted from the virtual view frame by frame and then added to the corresponding LR frame to reconstruct the HR frame. However, the high-frequency content is extracted from the whole frame, so the local characteristics of the scene are not taken into account.

Figure 2.18: SR approach for multiview images. A super-resolved image V̂n is created from its low-resolution version, V_nD, a neighboring HR view, Vk, and the depth
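The whole-frame high-frequency transfer attributed to [8] can be illustrated with a small sketch. The text does not specify the low-pass filter or the upsampling used in [8], so the box filter, the function names, and the idealized test data below are all assumptions made for illustration.

```python
import numpy as np

def box_blur(img, radius=1):
    """Simple low-pass filter: mean over a (2r+1)x(2r+1) window, edge-padded."""
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size**2

def add_high_frequency(lr_upsampled, virtual_view, radius=1):
    """Add the high-frequency part of the virtual view to the upsampled LR frame.

    The high-frequency part is taken globally (whole frame), so local
    scene characteristics are not considered.
    """
    high_freq = virtual_view - box_blur(virtual_view, radius)
    return lr_upsampled + high_freq

# Idealized check: the virtual view equals the true HR frame, and the
# upsampled LR frame equals its low-pass version.
rng = np.random.default_rng(0)
hr_true = rng.random((8, 8))
lr_upsampled = box_blur(hr_true)
reconstructed = add_high_frequency(lr_upsampled, virtual_view=hr_true)
```

In this idealized setting the reconstruction matches `hr_true` up to rounding. With a real virtual view, synthesis errors and occlusions would leak into the result wherever the virtual view disagrees with the LR frame.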

In this thesis, Chapters 3 and 4 propose two solutions for video spatial resolution enhancement. In Chapter 3, virtual views are utilized to recover full-resolution (FR) frames in a Mixed-Resolution Multiview Video plus Depth (MR-MVD) framework. The local similarity between the LR view and its corresponding virtual view is used to steer the FR recovery mechanism. In Chapter 4, in addition to super-resolving one LR view, the two FR views are downsampled before encoding and super-resolved after decoding by exploiting inter-view redundancy via virtual views.

Frame Rate Up-conversion

Frame Rate Up-conversion (FRUC) is required for applications such as NTSC-PAL conversion and display on HDTV, where high frame rates are desired [66] [67]. The most commonly used FRUC methods are frame repetition, linear interpolation, and motion-compensated interpolation. Relying on the temporal correlation of the original video sequence, many FRUC algorithms adopt a motion compensation technique to construct the up-sampled frame [68]. Motion compensation is carried out bi-directionally in order to take into account the frames on both sides of the up-converted frame. From a theoretical point of view, linear interpolation and motion-compensated interpolation are simple to implement. However, they cannot deal with temporal aliasing, which arises when the video is captured at a frame rate below the Nyquist rate of the motion trajectory: the true motion of the object cannot be recovered even by performing ideal temporal interpolation (Fig. 2.19). Hence, an insufficient frame rate results in inaccurate motion estimation in FRUC.
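The two simplest FRUC methods named above, frame repetition and linear interpolation, can be sketched for a doubling of the frame rate. The function names are hypothetical, and motion-compensated interpolation is deliberately left out because it requires a motion estimator that the text does not specify.

```python
import numpy as np

def frame_repetition(video):
    """Double the frame rate by showing every frame twice."""
    return np.repeat(video, 2, axis=0)

def linear_interpolation(video):
    """Double the frame rate by inserting the average of each neighbouring pair."""
    video = np.asarray(video, dtype=np.float64)
    midpoints = 0.5 * (video[:-1] + video[1:])
    out = np.empty((2 * len(video) - 1,) + video.shape[1:])
    out[0::2] = video       # original frames keep the even positions
    out[1::2] = midpoints   # interpolated frames fill the odd positions
    return out

# Toy video: 3 frames of a single pixel with values 0, 2, 4.
video = np.array([0.0, 2.0, 4.0]).reshape(3, 1, 1)
repeated = frame_repetition(video)
interpolated = linear_interpolation(video)
print(repeated.ravel())      # [0. 0. 2. 2. 4. 4.]
print(interpolated.ravel())  # [0. 1. 2. 3. 4.]
```

Both methods operate on co-located pixels only, so moving objects are either held still (repetition) or blended into a ghost (interpolation). This is the motivation for motion compensation.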

Spatio-temporal Resolution Enhancement

Many recent consumer digital cameras support a dual shooting mode that captures both LR video and HR still images. By periodically switching between the video and image modes, this type of camera makes it possible to super-resolve the LR video with the assistance of neighboring HR still images. Zhai and Wu proposed converting the LR video to an HR video with the same resolution as the auxiliary HR still images [69]. The target HR frames are modeled by a 2D Piecewise Autoregressive (PAR) process, and the PAR model parameters are learned from these HR still images.
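To give a feel for what learning autoregressive parameters from an HR image involves, the sketch below fits a 2D autoregressive predictor on a single window by least squares. It is not the method of [69]: the four-neighbour support, the single window (rather than a piecewise model), and the function name are assumptions chosen for brevity.

```python
import numpy as np

def fit_ar_coefficients(window):
    """Least-squares fit of x[i, j] ~ sum_k a_k * neighbour_k over one window.

    Returns the four coefficients (up, down, left, right) and the
    prediction residual on the interior pixels of the window.
    """
    centre = window[1:-1, 1:-1].ravel()
    neighbours = np.stack([
        window[:-2, 1:-1].ravel(),   # pixel above
        window[2:, 1:-1].ravel(),    # pixel below
        window[1:-1, :-2].ravel(),   # pixel to the left
        window[1:-1, 2:].ravel(),    # pixel to the right
    ], axis=1)
    coeffs, *_ = np.linalg.lstsq(neighbours, centre, rcond=None)
    residual = neighbours @ coeffs - centre
    return coeffs, residual

# Toy HR window: a linear intensity ramp, which a four-neighbour
# predictor can reproduce exactly.
rows, cols = np.mgrid[0:6, 0:6]
hr_window = 2.0 * rows + 3.0 * cols + 1.0
coeffs, residual = fit_ar_coefficients(hr_window)
max_residual = np.max(np.abs(residual))
```

In a piecewise model, a fit of this kind would be repeated per local region, so that the coefficients adapt to edges and textures instead of being shared across the whole image.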

[Figure 2.19, panels (a) to (c): height versus time, with sampling instants t1 to t4]

Figure 2.19: Temporal aliasing. (a) Trajectory of a ball over time. (b) Trajectory sampled over time by a low frame rate camera. Perceived trajectory is along a straight line. (c) Illustration that even with ideal temporal interpolation of (b) the true motion trajectory cannot be recovered.
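The effect in Figure 2.19 can be reproduced numerically. The sketch below assumes a sinusoidal height trajectory at 1 Hz sampled at 1 frame per second, which is below the Nyquist rate of 2 Hz. The trajectory, the rates, and the variable names are illustrative choices, not values from the thesis.

```python
import numpy as np

def height(t, freq=1.0):
    """Assumed true trajectory: the object oscillates at `freq` Hz."""
    return np.sin(2 * np.pi * freq * t)

# Low frame rate camera: 1 sample per second (Nyquist rate would be 2 Hz).
sample_times = np.arange(0.0, 4.0, 1.0)
samples = height(sample_times)

# Temporal interpolation of the captured samples on a dense time grid.
dense_times = np.linspace(0.0, 3.0, 301)
interpolated = np.interp(dense_times, sample_times, samples)

# Compare with the true trajectory on the same grid.
true_heights = height(dense_times)
max_error = np.max(np.abs(true_heights - interpolated))
print(np.allclose(samples, 0.0, atol=1e-9))  # True
```

Every sample lands on a zero crossing, so the perceived trajectory is a flat line. Linear interpolation is used here for simplicity, but any interpolator that sees only these samples has no evidence of the oscillation, so the error against the true trajectory stays close to the full amplitude.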

In document Depth-Map-Assisted Texture and Depth Map Super-Resolution (Page 49-52)