Energy minimization - Virtual camera specification


7.3.7 Energy minimization

Overall, the problem of finding the virtual camera trajectory, given the actor bounding boxes and the shot specification, can be summarized as minimizing a convex cost function subject to linear constraints. It is defined as follows:

minimize


Figure 7.9: Illustration of the pull-in penalty for the shot specification "MS A". It penalizes any instance in which an actor not included in the shot specification is chopped by the virtual camera frame (and is close enough to be included in it). This figure illustrates an example where an external actor is chopped on the right side of the virtual camera frame.

A cost equal to the part not included in the virtual camera frame (shaded region) is added to the objective function; in this case it is equal to (xr''_t − fx_t − A_r fs_t), the distance by which the actor extends past the right edge of the frame. By minimizing the missing part, the cost function tries to pull the external actor into the virtual camera frame.

Here, λ1, λ2, λ3, λ4, λ5 and λ6 are parameters that can be adjusted to control the amount of regularization and the weight of each penalty term. In this paper, we use only two parameters, with λ1 = λ2 and λ3 = λ4 = λ5 = λ6, giving a similar preference to each penalty term. These weights can be adjusted in special cases where a higher preference is required for a specific penalty term. One major advantage of our method is that any standard off-the-shelf convex optimization toolbox can be used to solve Equation 8.1. In our case we use cvx [GB14].
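The pull-in cost illustrated in Figure 7.9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation solved with cvx: the frame is assumed to be centred at `fx` with half-height `fs`, so that its right edge lies at `fx + aspect_ratio * fs`; `xr_ext` stands for the right boundary xr''_t of the external actor; the clamp at zero and the `active` flags are hypothetical additions of this sketch.

```python
def pull_in_cost_right(xr_ext, fx, fs, aspect_ratio=16 / 9):
    """Width of an external actor lying beyond the right edge of the frame.

    Assumed (hypothetical) frame model: centre x = fx, half-height = fs,
    so the right edge is at fx + aspect_ratio * fs.
    """
    right_edge = fx + aspect_ratio * fs
    # The clamp at zero is an addition of this sketch: no cost when the
    # actor's right boundary is already inside the frame.
    return max(0.0, xr_ext - right_edge)


def total_pull_in_cost(xr_ext_track, fx_track, fs_track, active, aspect_ratio=16 / 9):
    """Sum the per-frame cost over frames flagged as pull-in events.

    `active[t]` is a hypothetical boolean flag meaning "the external actor
    is chopped and close enough to be included" at frame t.
    """
    return sum(
        pull_in_cost_right(xr, fx, fs, aspect_ratio)
        for xr, fx, fs, on in zip(xr_ext_track, fx_track, fs_track, active)
        if on
    )
```

Because each per-frame term is the maximum of zero and an expression that is linear in `fx` and `fs`, the sum stays convex in the frame parameters, which is the property that lets such a penalty sit inside a convex objective.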

7.4 Results

We present results on five different sequences (each in Full HD, 1920 × 1080) from the play ‘Death of a Salesman’. These sequences were chosen from scenes with two, three and four actors to demonstrate the versatility of our approach.

For each of these master shots, we generate a variety of reframed sequences with different shot specifications. The reframed sequences are generated at a resolution of 640 × 360, maintaining the original 16:9 aspect ratio. These generated sequences can be directly imported and edited in standard video editing software as a multi-clip. Figure 7.10 shows an example of a multi-clip sequence consisting of the original sequence (master shot) and the three reframed sequences generated using our method. On average, the optimization took around 3-4 seconds per rush per minute of video. All the original videos and generated rushes are available online¹.

¹ https://team.inria.fr/imagine/vgandhi/cvmp_2014/

Figure 7.10: Screenshot of a multi-clip editing session in Final Cut Pro. The multi-clip was created using three reframed sequences (MS A, MS B, FS All) and the original master shot. On the right, we see the edited sequence.

Qualitative evaluation

The results on two different sequences are shown in Figure 7.11 and Figure 7.12. Each figure shows a few selected keyframes from the original video and the corresponding frames from the virtual camera sequences generated using our method. A plot of the horizontal position f_x of the virtual camera trajectory against time is shown for each of the generated sequences. The generated sequences allow the editor to highlight details which may not be easy to notice in the original sequence. They also provide much more variety to keep the viewer interested.

We now discuss the generated sequences with respect to three important aspects of cinematography:

Composition

We can observe that the virtual cameras maintain a good composition based on the shot specification. For example, the virtual cameras "MS A" and "MS B" in Figure 7.11 keep stable medium shots of the two actors and prevent the actors from touching the image frame. The generated shots also preserve screen continuity: for example, the camera "MS B" keeps actor B at 1/3 right, as she is positioned on the right side of the stage. Similarly, the camera "MS A" keeps actor A at 1/3 left, as he enters from the left. Another example can be seen with camera "MS B" in Figure 7.12, where the camera keeps the actor in the center, as the actor stays between two other actors on stage.

The virtual cameras also avoid cropping actors not mentioned in the shot specification. For example, the camera "MS B" in Figure 7.11 pulls in actor A when he comes close to actor B at keyframe 6. A similar example can be seen with camera "FS B,C" in Figure 7.12, which maintains a tight full shot of actors B and C but pulls in actor A when that actor comes close to the camera frame.
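The 1/3 left, 1/3 right and centered placements discussed above can be illustrated with a small worked example. It is a sketch under assumptions, not the optimization itself: in the method the placement is only a preference balanced against the other penalty terms, whereas the hypothetical helper below computes the frame center that would place the actor exactly on the chosen vertical line. The frame model (center `fx`, half-height `fs`, half-width `aspect_ratio * fs`) and the function name are assumptions.

```python
def frame_centre_for_placement(actor_x, fs, side, aspect_ratio=16 / 9):
    """Horizontal frame centre that puts the actor at the requested position.

    side: "left"   -> actor on the vertical line at 1/3 of the frame width
          "right"  -> actor on the vertical line at 2/3 of the frame width
          "centre" -> actor in the middle of the frame
    """
    half_width = aspect_ratio * fs  # assumed frame model, see lead-in
    if side == "left":
        # left edge + (2 * half_width) / 3 == actor_x
        return actor_x + half_width / 3.0
    if side == "right":
        # left edge + (4 * half_width) / 3 == actor_x
        return actor_x - half_width / 3.0
    if side == "centre":
        return actor_x
    raise ValueError(f"unknown side: {side!r}")
```

For instance, with a half-width of 180 pixels, an actor at x = 500 framed at 1/3 left gives a frame centered at x = 560, leaving more space on the actor's right.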

Camera motion

The plots of f_x in Figure 7.11 and Figure 7.12 show that the virtual camera path transitions smoothly between long static segments. Observe how the virtual camera remains static for long periods between keyframes 4 and 5 and between keyframes 6 and 7 in Figure 7.11, as the actors do not move significantly. When the camera moves, it moves smoothly, preserving the apparent motion of the actors on stage. For example, observe how the camera "MS A" in Figure 7.11 moves to the right as actor A enters the stage between keyframes 3 and 4.

Figure 7.11: Reframing results on a sequence with two actors (A, B). The top row shows a set of selected keyframes from the original video. The corresponding keyframes from the three different virtual camera sequences are shown below. The three reframed sequences include a medium shot of each actor (MS A, MS B) and a full shot of both actors (FS All). A plot of the horizontal position f_x of the virtual camera trajectory against time is shown for each of the three reframed sequences. The positions of the keyframes on the plot are marked with red dots.
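The long static segments visible in the f_x plots can be located automatically in a computed trajectory. The following is a minimal sketch, assuming the trajectory is available as a list of per-frame horizontal positions; the function name, the tolerance `tol` and the minimum length `min_len` are hypothetical choices for illustration.

```python
def static_segments(fx_track, tol=0.5, min_len=2):
    """Return inclusive (start, end) frame indices of static camera segments.

    A segment is static while consecutive positions differ by at most `tol`.
    Segments shorter than `min_len` frames are discarded.
    """
    segments = []
    start = 0
    n = len(fx_track)
    for t in range(1, n + 1):
        # A segment ends at the end of the track or when the camera moves.
        ended = t == n or abs(fx_track[t] - fx_track[t - 1]) > tol
        if ended:
            if t - start >= min_len:
                segments.append((start, t - 1))
            start = t
    return segments
```

Segments found this way are natural candidates for the cut points discussed in the next subsection, since a static camera is easier to cut to and from.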

Cuttability

Good composition, screen continuity and long static cameras in the generated virtual camera sequences give the editor plenty of choices for cutting. For example, the editor can switch among all four possibilities (including the original) at keyframes 4 and 5 in Figure 7.11. Similarly, the editor can switch among all five options at keyframe 1 in Figure 7.12. In some cases the generated virtual cameras may not be cuttable: for example, cutting between cameras "MS A" and "MS B" at keyframe 6 in Figure 7.11 would create a jump cut. This happens because, due to the pull-in event, both cameras end up framing the same actors with slightly different compositions. In some cases, the virtual camera framing comes too close to the framing of the original master shot, and cutting between them may lead to a jump cut. An example of this can be seen in keyframe 3 of camera "FS All" in Figure 7.11.
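One way to flag the jump-cut risk described above is to measure how much two framings overlap at a candidate cut point. The sketch below is a heuristic introduced here for illustration, not part of the method: it uses the intersection-over-union of the two frame rectangles, with a hypothetical threshold of 0.7 and the same assumed frame model (center `fx`, `fy`, half-height `fs`).

```python
def frame_rect(fx, fy, fs, aspect_ratio=16 / 9):
    """Rectangle (left, top, right, bottom) of a frame; assumed frame model."""
    half_w = aspect_ratio * fs
    return (fx - half_w, fy - fs, fx + half_w, fy + fs)


def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def risks_jump_cut(frame_a, frame_b, threshold=0.7):
    """True when two (fx, fy, fs) framings are similar enough to risk a jump cut.

    The threshold is a hypothetical value, not taken from the text.
    """
    return iou(frame_rect(*frame_a), frame_rect(*frame_b)) >= threshold
```

The same check applies to a virtual camera and the master shot by passing the full master frame as one of the two framings.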

Figure 7.12: Reframing results on a sequence with three actors (A, B, C). Selected keyframes from the original sequence and four virtual camera sequences are shown in this figure. The four virtual camera sequences include a full shot of all three actors (FS All), a full shot of two actors (FS B,C) and medium shots of two of the actors (MS A, MS B). A plot of the horizontal position f_x of the virtual camera trajectory against time is shown for each of the four reframed sequences. The positions of the selected keyframes on the plot are marked with red dots.

7.5 Summary

We have presented a system which can generate multiple reframed sequences from a single viewpoint, taking into consideration the composition, camera movement and cutting aspects of cinematography. We have cast the problem of rush generation as a convex minimization problem and demonstrated qualitatively correct results in a variety of situations. To our knowledge, this is the first time that the problem of rush generation has been addressed and validated experimentally. In effect, our method provides a cost-effective solution for multi-clip video editing from a single viewpoint.


Currently, our system performs the optimization separately for each given shot specification. This may lead to jump cuts in a few cases, as discussed in the previous section. Future work should investigate a joint optimization over the set of given shot specifications. The proposed work focuses on framing actors present on stage but does not allow objects to be included in the shot specification. In future work, it would be interesting to integrate some simple objects in the shot naming conventions using standard object detectors. It would also be interesting to perform further evaluation and validation based on a quantitative metric or user questionnaires.

The Full HD master shots used in the experiments did not provide enough resolution to go closer than medium shots. However, the method can easily be applied to master shots with higher resolutions (4K or 6K), which would extend the range of shots to medium close-ups (MCU) and close-ups (CU). The reframed rushes obtained with our method are automatically annotated with actor and camera movements, which makes them suitable for automatic editing. One promising direction for future research is to investigate the problem of automatic camera selection given the rushes.

The algorithm described in this chapter could become a plugin for Final Cut Pro (or Premiere, After Effects, Nuke or the open-source Natron). A patent is being filed by INRIA (with Gandhi, Ronfard and Gleicher as co-inventors). The algorithm can also be used for role-based reframing in virtual cinematography systems [GRCS14].

Chapter 8 Applications and perspectives

8.1 Floor plan view reconstruction

One of the important aspects of theatre direction is ‘blocking’, which refers to the precise movement and positioning of actors on a stage in order to facilitate the performance of a play. In contemporary theatre, the director usually determines blocking during rehearsal, telling actors where they should move in order to achieve the proper dramatic effect, ensure sight lines for the audience and work with the lighting design of the scene. The term comes from 19th-century theatre, where directors worked out the staging of a scene on a miniature stage using blocks to represent each of the actors.

During rehearsals, the assistant director or the stage manager (or both) usually take notes about where actors are positioned and about their movement patterns on stage. These notes are often made in the form of markings (blocking notations) on the stage floor plan view. This is important to ensure that actors follow the assigned blocking from night to night [Spo85, Sch97].

To replace this tedious task of manual markings for numerous rehearsals, we present a system

Figure 8.1: Illustration of floor plan view reconstruction for four different frames. The gray part in each image shows the top view of the stage floor and the corresponding actor positions.

In document Cadrage et montage automatique de films de théâtre par analyse sémantique de vidéo (Page 127-136)