Computational Video - Computational video enhancement

Computational video is a more recent field than computational photography, so its seminal literature is not yet established. In this dissertation, computational video involves processing video inputs to create video results that modify and combine elements across many frames.

This is made possible by modern computational power and by the low cost of storage, which allows many frames to be processed simultaneously. One type of computational video is temporal resampling, which changes a video’s duration. Computational video techniques can thus warp both space and time, compressing or extending a video’s duration and intermixing frames.

2.7.1 Video Summarization

Temporal resampling is commonly used for multimedia summarization. Early work on Video Skimming (Smith and Kanade, 1997) looked for short, representative video segments that, when pieced together, could tell the story of the video in less time. Segments were chosen based on cues including scene change detection, camera motion, object recognition, and audio. The documentary films they targeted had distinct scene changes, which gave the algorithms additional hints. An alternative summarization approach, proposed by Hua et al. (2003), searched for video segments containing scene and camera motion between shot boundaries and combined them to match the rhythm of an audio source, producing music videos.
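Scene change detection, one of the cues listed above, is often implemented by comparing intensity histograms of consecutive frames. The sketch below shows that common technique under stated assumptions; it is not claimed to be the detector Smith and Kanade used, and the function name, bin count, and `threshold` value are hypothetical choices.

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Return indices i where a cut is detected between frame i-1 and frame i.

    frames: sequence of 2D grayscale arrays with values in [0, 255].
    threshold: hypothetical cutoff on the L1 distance between normalized
    histograms (the distance ranges from 0 to 2).
    """
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize so frame size does not matter
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)
    return cuts

# Three dark frames followed by three bright frames: one cut, at index 3.
dark = [np.full((8, 8), 20) for _ in range(3)]
bright = [np.full((8, 8), 220) for _ in range(3)]
print(shot_boundaries(dark + bright))
```

The detected boundaries split the video into shots, from which a skimming system could then pick representative segments using the other cues.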

Using documentary films similar to those of Smith and Kanade (1997) and standard, uniformly sampled summarization (fast-forward), Wildemuth et al. (2003) explored how fast videos can be played back while remaining coherent to the viewer. They found that showing 1 out of every 64 frames typically allowed viewers to comprehend most of the content. In addition to documentary-style sources, summarization can also be applied to static-camera, time-lapse sources. Time-lapse in particular has been shown to be useful for viewing slowly changing processes, and it is regularly used in fields as varied as biological microscopy (Riddle, 1979) and cinematic effects (Kinsman, 2006).
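Uniform fast-forward at the 1-in-64 rate reported above amounts to simple arithmetic on frame indices. A minimal sketch, where the function name and the 30 fps, 10-minute example clip are illustrative assumptions:

```python
def fast_forward(num_frames, step=64):
    """Indices of the frames kept by uniform temporal subsampling (1 of every `step`)."""
    return list(range(0, num_frames, step))

# A hypothetical 10-minute clip at 30 fps has 18,000 frames.
kept = fast_forward(18_000, step=64)
print(len(kept))           # number of frames in the summary
print(len(kept) / 30.0)    # summary duration in seconds at 30 fps
```

Here roughly ten minutes of footage plays back in under ten seconds, which illustrates how aggressive a 64x speedup is.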

“Video Summarization By Curve Simplification” (DeMenthon et al., 1998) presents an algorithm that chooses a non-uniform temporal sampling by simplifying tracked motion-path curves. These motion curves can be considered a subset of all motion activity in the scene. The sampling is derived using the greedy Douglas-Peucker curve-fitting algorithm (Douglas and Peucker, 1973). Slower but optimal curve-fitting solutions based on dynamic programming (Perez and Vidal, 1994) are also possible.
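To make the sampling idea concrete, the sketch below applies Douglas-Peucker to a toy motion path whose first coordinate is the frame index. The vertices that survive simplification mark the frames where motion changes, and those frames would form the non-uniform sampling. The 2D toy curve and the `epsilon` tolerance are assumptions; DeMenthon et al. build their curves from actual video data.

```python
import numpy as np

def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b (any dimension)."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:
        return float(np.linalg.norm(p - a))
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def douglas_peucker(points, epsilon):
    """Return sorted indices of the vertices kept by Douglas-Peucker simplification."""
    pts = np.asarray(points, dtype=float)
    keep = {0, len(pts) - 1}
    stack = [(0, len(pts) - 1)]
    while stack:
        start, end = stack.pop()
        if end - start < 2:
            continue  # no interior points to test
        dists = [_point_segment_distance(pts[i], pts[start], pts[end])
                 for i in range(start + 1, end)]
        k = int(np.argmax(dists))
        if dists[k] > epsilon:
            split = start + 1 + k          # farthest point becomes a vertex
            keep.add(split)
            stack.append((start, split))   # recurse on both halves
            stack.append((split, end))
    return sorted(keep)

# Toy path (frame, x): an object moves right for 5 frames, then stops.
curve = [(t, min(t, 5)) for t in range(11)]
print(douglas_peucker(curve, epsilon=0.5))  # frames kept as keyframes
```

The greedy recursion only ever splits at the current worst point, which is why it is fast but not guaranteed to produce the fewest vertices for a given tolerance.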

In “Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors” (Divakaran et al., 2003), short sub-clips identified as containing significant motion are played at normal speed and assembled into a shorter video. The combined duration of these essential sub-clips sets a lower bound on the output video’s duration; longer summaries are constructed by padding the result with less interesting frames.
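The lower-bound-plus-padding behavior can be sketched as a selection over fixed-length segments. This is a hypothetical illustration, not Divakaran et al.’s method: the per-segment activity scores, the `threshold`, and the rule of padding with the most active remaining segments are all assumptions.

```python
def summarize_by_activity(activity, threshold, target_len):
    """Pick segment indices for a summary (hypothetical selection rule).

    activity: per-segment motion-activity scores, one per fixed-length sub-clip.
    Segments scoring above `threshold` are essential and always kept, so their
    count is the minimum summary length. If `target_len` is larger, the
    remaining slots are filled with the most active of the other segments.
    Returns the chosen indices in chronological order.
    """
    essential = [i for i, a in enumerate(activity) if a > threshold]
    if target_len <= len(essential):
        return essential  # the summary cannot be shorter than the essential set
    rest = sorted((i for i, a in enumerate(activity) if a <= threshold),
                  key=lambda i: activity[i], reverse=True)
    chosen = essential + rest[: target_len - len(essential)]
    return sorted(chosen)

scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3]
print(summarize_by_activity(scores, threshold=0.5, target_len=4))
```

Returning the indices in chronological order keeps the padded summary playing forward in time, like the original footage.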

Rav-Acha et al. (2006) summarize long videos by relaxing the strict chronological ordering of the source footage, so events may overlap in time. The events are first identified and then combined, with their arrangement determined by a spatio-temporal Markov random field optimization solved with simulated annealing.
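The role of simulated annealing can be illustrated with a toy version of the placement problem: choose a start frame for each event in a shorter output so that events collide as little as possible. This sketch keeps only a temporal-overlap energy. It omits the spatial layout, the activity and chronology terms, and the Markov random field formulation of Rav-Acha et al. The cooling schedule, iteration count, and function names are hypothetical.

```python
import math
import random

def overlap(s1, d1, s2, d2):
    """Number of output frames during which events [s, s + d) are both visible."""
    return max(0, min(s1 + d1, s2 + d2) - max(s1, s2))

def energy(starts, durations):
    """Total pairwise temporal overlap, a stand-in for a collision cost."""
    n = len(starts)
    return sum(overlap(starts[i], durations[i], starts[j], durations[j])
               for i in range(n) for j in range(i + 1, n))

def anneal(durations, out_len, iters=5000, t0=5.0, seed=0):
    """Place events in an output of out_len frames by simulated annealing."""
    rng = random.Random(seed)
    starts = [0] * len(durations)          # every event initially starts at frame 0
    e = energy(starts, durations)
    best, best_e = list(starts), e
    for k in range(iters):
        temp = t0 * (1.0 - k / iters) + 1e-6   # linear cooling schedule
        i = rng.randrange(len(durations))
        old = starts[i]
        starts[i] = rng.randint(0, out_len - durations[i])  # propose a new start
        new_e = energy(starts, durations)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_e <= e or rng.random() < math.exp((e - new_e) / temp):
            e = new_e
            if e < best_e:
                best, best_e = list(starts), e
        else:
            starts[i] = old  # reject the move
    return best, best_e

starts, cost = anneal([30, 30, 30], out_len=90)
print(starts, cost)
```

Accepting occasional uphill moves early, while the temperature is high, is what lets annealing escape local minima that a purely greedy placement would get stuck in.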

2.7.2 Temporal Resampling and Compositing

Computational video considers a wider range of temporal resamplings and frame combinations. For instance, one class of operations extends the length of videos by repeating segments. “Video Textures” (Schödl et al., 2000) looks for the least noticeable transitions within a video in order to extend its playing time indefinitely. Potential jumps are evaluated with a pairwise error metric in combination with a dynamic programming solver.
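The pairwise error metric can be sketched as follows, roughly following the Video Textures formulation: an L2 distance between every pair of frames, a jump from frame i to frame j scored by how similar j is to the frame that naturally follows i, and an exponential mapping from cost to transition probability. The dynamic programming stage is omitted, and the `sigma_scale` parameter and its default are assumptions.

```python
import numpy as np

def transition_probabilities(frames, sigma_scale=0.1):
    """Transition matrix built from pairwise frame distances.

    frames: array of shape (T, H, W) or (T, H, W, C).
    D[i, j] is the L2 distance between frames i and j. A jump i -> j looks
    seamless when frame j resembles frame i + 1, so its cost is D[i + 1, j].
    Row i of the result (i = 0..T-2) gives probabilities of jumping to each j.
    """
    frames = np.asarray(frames)
    X = frames.reshape(len(frames), -1).astype(float)
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    cost = D[1:, :]                                 # row i holds D[i + 1, j]
    sigma = sigma_scale * cost[cost > 0].mean()     # assumed scale for the mapping
    P = np.exp(-cost / sigma)
    return P / P.sum(axis=1, keepdims=True)         # each row sums to 1

# A periodic toy "video": frame values 0, 1, 2, 0, 1, 2.
video = np.array([0, 1, 2, 0, 1, 2], dtype=float).reshape(6, 1, 1)
P = transition_probabilities(video)
print(np.round(P[2], 3))  # after frame 2, jumping to frame 0 or 3 is equally seamless
```

Playback then proceeds by repeatedly sampling the next frame from the current row, for example with `np.random.default_rng().choice(len(video), p=P[i])`, which is what allows the texture to play indefinitely.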

Other approaches have also attempted to warp time, both globally (full frame) and locally (region by region). “Flow-Based Video Synthesis and Editing” (Bhat et al., 2004) extends the play time of natural phenomena with recurring flow characteristics, such as waterfalls, by rearranging their repeating patterns. “Evolving Time Fronts” (Rav-Acha et al., 2005a) plays different image regions of a video at different speeds, either to alter the outcome of events or for artistic effect. “Dynamosaics” (Rav-Acha et al., 2005b) carries this idea further by improving the blending between regions with graph cuts (Boykov et al., 1999). “Panoramic Video Textures” (Agarwala et al., 2005) also finds frame-to-frame jumps, in this case within moving-camera videos, to create panoramas.
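Playing regions at different speeds corresponds to slicing the space-time volume along a surface instead of at a single time. The sketch below is a simplified special case under stated assumptions: each image column gets its own fixed playback rate, so output frame k takes column x from source frame round(k * speed[x]). Real time fronts can have arbitrary shapes; the per-column rates and the function name are hypothetical.

```python
import numpy as np

def variable_speed(video, speeds, num_out):
    """Play each image column of a (T, H, W) video at its own speed.

    speeds: length-W array of playback rates (1.0 = real time, 2.0 = twice as fast).
    Output frame k takes column x from source frame round(k * speeds[x]),
    clamped to the last source frame.
    """
    T, H, W = video.shape
    speeds = np.asarray(speeds, dtype=float)
    k = np.arange(num_out)[:, None]                                   # (num_out, 1)
    src = np.clip(np.rint(k * speeds[None, :]).astype(int), 0, T - 1)  # (num_out, W)
    cols = np.arange(W)[None, :]                                       # (1, W)
    out = video[src, :, cols]        # advanced indexing gives shape (num_out, W, H)
    return out.transpose(0, 2, 1)    # back to (num_out, H, W)

# Toy video whose frame t is filled with the value t.
video = np.arange(10)[:, None, None] * np.ones((1, 2, 4))
out = variable_speed(video, speeds=[1, 1, 2, 2], num_out=4)
print(out[3, 0])  # left half shows frame 3, right half already shows frame 6
```

Because neighboring columns can come from different source frames, visible seams appear where speeds change, which is the problem that better blending between regions addresses.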

Videos can also be processed to alter their component visual elements. “Motion Magnification” (Liu et al., 2005) renders videos with amplified object motion while leaving the underlying temporal sampling unchanged, thus increasing apparent object velocity. Alternatively, “Space-Time Super-Resolution” (Shechtman et al., 2005) creates a composite of multiple videos, using their relative frame rates and spatial positions to maximize spatial and temporal information while reducing overall aliasing.
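The core of motion amplification can be shown on tracked feature positions: each point’s displacement from a reference frame is scaled by a factor alpha, while the frame timing stays the same. This sketch covers only that step. Liu et al. additionally group trajectories into motion layers and re-render the magnified layer with texture and hole filling, none of which is shown here. Using the first frame as the reference is an assumption.

```python
import numpy as np

def magnify_trajectories(tracks, alpha):
    """Amplify tracked motion about each point's first-frame position.

    tracks: array (T, N, 2) of N feature positions over T frames.
    Returns p0 + alpha * (p_t - p0). Frame timing is unchanged, so the larger
    per-frame displacement shows up as a larger apparent velocity.
    """
    tracks = np.asarray(tracks, dtype=float)
    ref = tracks[:1]                 # (1, N, 2) reference positions
    return ref + alpha * (tracks - ref)

# One point drifting 1 pixel per frame in x, magnified 10x.
tracks = [[[0, 0]], [[1, 0]], [[2, 0]]]
print(magnify_trajectories(tracks, alpha=10)[:, 0, 0])
```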

As a parallel to these computational video methods, user-assisted, temporally aware video compositing algorithms have also been developed. “Video Matting of Complex Scenes” (Chuang et al., 2002) approaches the problem using Bayesian methods. “Interactive Video Cutout” (Wang et al., 2005) combines compositing with spline-based, non-planar spatio-temporal volume cuts in its interface for extracting foreground elements. In this interface, the user can give hints to the alpha compositing engine across multiple frames by painting directly onto the cut plane. The VideoShop project (Wang et al., 2007) composites multi-frame visual elements from video clips in the gradient domain using a 3D multigrid Poisson solver.
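Gradient-domain compositing over a space-time volume can be sketched by solving the discrete Poisson equation, with the Laplacian of the result matched to that of the source inside a mask and the result fixed to the target outside it. VideoShop uses a multigrid solver; this sketch substitutes plain Jacobi iteration, which is far slower on large volumes but much shorter to write. The function name, the iteration count, and the requirement that the mask avoid the volume border are assumptions of the sketch.

```python
import numpy as np

def poisson_composite(target, source, mask, iters=500):
    """Gradient-domain paste of `source` into `target` over a (T, H, W) volume.

    mask: bool array, True where the result takes its gradients from `source`.
    The mask must not touch the volume border (np.roll wraps around).
    Solves Laplacian(f) = Laplacian(source) inside the mask, f = target outside,
    with Jacobi iterations over the 6-connected space-time neighborhood.
    """
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    shifts = [(s, ax) for ax in range(3) for s in (1, -1)]   # 6 neighbors in (t, y, x)
    # Guidance term: sum over neighbors q of (source_p - source_q).
    guide = sum(source - np.roll(source, s, axis=ax) for s, ax in shifts)
    f = np.where(mask, source, target)
    for _ in range(iters):
        nbr = sum(np.roll(f, s, axis=ax) for s, ax in shifts)
        f = np.where(mask, (nbr + guide) / 6.0, f)           # Jacobi update inside mask
    return f

rng = np.random.default_rng(0)
target = rng.random((7, 7, 7))
mask = np.zeros((7, 7, 7), dtype=bool)
mask[2:5, 2:5, 2:5] = True
# A source that differs from the target only by a constant offset has the same
# gradients, so the seamless composite converges back to the target.
out = poisson_composite(target, target + 5.0, mask)
print(np.abs(out - target).max())
```

Working with gradients rather than raw values is what hides seams: a constant brightness difference between clips vanishes entirely, as the example shows.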
