6. Enhancing the augmented reality system
6.2 Diminished reality
6.2.3 Diminishing 3D objects
Conflicts between virtual and real objects are a challenge that comes up frequently in augmented reality. People seldom use AR applications in an empty space. They often use them in environments that have several real objects, which may then overlap with the augmentation. For example, in interior design the user may want to test a virtual couch on a spot where a physical couch is. In addition, the user may want to see an augmentation from a viewpoint where something comes partly in front of a virtual object (e.g. the backrest of a chair in Figure 82). These kinds of situations, where real objects overlap with virtual objects, are problematic in augmented reality. One solution is to remove the disturbing existing objects virtually. Furthermore, the whole purpose of the application might be to visualise changes in the environment, including the possibility to virtually remove existing structures or objects.
The diminished area for hiding an object is the object's projection onto the image plane.
Diminishing a 3D object differs from diminishing a planar object in one respect: for a planar object it is straightforward to calculate its projection onto the image plane (it is a plane-to-plane homography). For a 3D object, the projection depends on the shape of the object and the viewing direction. In addition, the neighbourhood of the object in the image plane changes depending on the 3D structures of the background. Defining the object and the diminished area is an essential issue in generic object hiding.
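The planar case can be illustrated with a minimal sketch. It assumes that a 3x3 homography H from the object plane to the image plane is already known (e.g. from marker tracking). The function names and the example matrices are hypothetical and serve only as illustration.

```python
def apply_homography(H, point):
    """Map a 2D point on the object plane to the image plane with a 3x3 homography."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to get pixel coordinates.
    return (u / w, v / w)


def project_planar_outline(H, outline):
    """Project every corner of a planar outline; the result bounds the diminished area."""
    return [apply_homography(H, p) for p in outline]


# Hypothetical example values: scale by 2 and translate by (10, 20).
H_EXAMPLE = [[2.0, 0.0, 10.0],
             [0.0, 2.0, 20.0],
             [0.0, 0.0, 1.0]]
UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Because a homography maps straight lines to straight lines, projecting only the corners of a planar polygon is enough to obtain its outline in the image. No comparably simple shortcut exists for a general 3D shape.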
In the following, we first describe methods for defining the diminished area for 3D objects and then discuss further inpainting methods for real-time augmented reality. Later, in Section 6.3, we discuss handling the relations between real and virtual objects in more detail.
Multi-camera systems have been used in diminished reality, e.g. [215, 216].
These two approaches, like most multi-camera systems, use background information for texturing the diminished area. However, single-camera systems are more common in AR. Thus we focus in this work on single-camera AR and discuss multi-camera approaches only briefly here.
A typical situation in augmented reality is that the operator uses the application in an unforeseen environment and the items in the environment are unknown.
Therefore, a conventional approach is to use user interaction to define the object instead of object recognition methods [217].
A straightforward and commonly used approach is that the user draws a polygon around the object with a mouse. Usually this is done in several key-frames
in [218]. In diminished reality, it is sufficient that the reconstruction approximates the shape of the object as long as the object is inside the volume.
The user may also indicate the object by drawing a loop around it. Defining a volume based on round-shaped loops is complex, as is calculating the projections of free shapes. Therefore, free-shape loops are seldom used together with the volume reconstruction approach in real-time AR applications. In contrast, constructing a 3D polygon and calculating the projection of a polygon mesh is feasible.
However, the user can indicate the object by circling it with a mouse if some other approach is used.
For example, the method presented in [219] tracks the object on the image plane after the user has once indicated it. It tracks the boundaries of the object using an active contour algorithm [220]. This approach assumes that the object differs clearly enough from the background and that the appearance of the object in successive frames is almost the same, and it may fail if there are strong boundaries in the background. This implementation enlarges the area of the object in the previous frame for the following frame, and then the object boundary is again searched with the active contour algorithm. The diminished area is always selected a bit larger than the object to ensure that the object is totally covered.
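The step of enlarging the previous object area can be sketched as a binary dilation of the object mask. This is only an illustration under our own assumptions, not the implementation of [219]: the mask is a list of rows of 0/1 values, the square neighbourhood and the parameter name margin are hypothetical, and the subsequent active contour search is not shown.

```python
def grow_mask(mask, margin=1):
    """Return a copy of a binary mask enlarged by `margin` pixels in every direction.

    `mask` is a list of rows containing 0 or 1. The enlarged region can serve
    as the search area for the next frame or as a safety border around the
    object; both uses are assumptions made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    grown = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Mark the square neighbourhood around each object pixel,
            # clipped to the image borders.
            for rr in range(max(0, r - margin), min(rows, r + margin + 1)):
                for cc in range(max(0, c - margin), min(cols, c + margin + 1)):
                    grown[rr][cc] = 1
    return grown
```

A larger margin tolerates faster object motion between frames, but it also enlarges the area that has to be filled with synthesised texture.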
We propose a new approach as one possible solution: the user can select predefined 3D volumes and cover objects with them. In our implementation, the user can select a predefined volume, e.g. a cube. The volume then appears in the scene as a wireframe object (red cube in Figure 81), which the user can scale, move around in the application and position to cover the desired object. For each frame, the projection of the volume onto the image plane is calculated (blue polygon in Figure 81). The projection area is then diminished (right image in Figure 81).
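The per-frame projection of such a volume can be sketched as follows. The sketch assumes an ideal pinhole camera at the origin looking along the +Z axis, a focal length given in pixels, no lens distortion, and cube vertices already expressed in camera coordinates. All function names are hypothetical. The convex hull of the projected vertices gives the outline only for convex volumes such as a cube.

```python
from itertools import product


def cube_vertices(center, size):
    """Eight corners of an axis-aligned cube in camera coordinates."""
    cx, cy, cz = center
    h = size / 2.0
    return [(cx + sx * h, cy + sy * h, cz + sz * h)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def project_point(p, focal, principal):
    """Pinhole projection of a 3D point that lies in front of the camera."""
    X, Y, Z = p
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * X / Z + principal[0], focal * Y / Z + principal[1])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def diminished_polygon(center, size, focal, principal):
    """Outline of the cube's projection, i.e. the area to be diminished."""
    projected = [project_point(v, focal, principal)
                 for v in cube_vertices(center, size)]
    return convex_hull(projected)
```

For a cube centred on the optical axis the outline is the projection of the near face (four corners). Viewed off-axis, the outline generally becomes a hexagon.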
Our approach is fast and well-suited to real-time applications.
Figure 81. Images of our diminished reality implementation using a predefined 3D volume. On the left: the red wireframe illustrates the cube and the blue polygon illustrates the diminished area. On the right: the object removed.
Besides the volumes to be removed, an application can add volumes defining "holes" inside the volume to be removed. This makes it easy to define simple non-convex volumes and volumes with holes. Our current implementation partially supports this functionality. For example, in Figure 82, we diminish the backrest of the chair in such a way that only the parts belonging to the actual chair are manipulated and the holes are left untouched.
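In image space, the effect of hole volumes can be sketched as a mask subtraction. The sketch below is a simplification under our own assumptions: both the volumes to be removed and the hole volumes have already been projected to image-plane polygons, and a hole is treated as fully see-through, which holds only when it passes through the whole volume along the viewing direction. The function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray casting test for a point against a simple polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray running to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def diminish_mask(width, height, remove_polygons, hole_polygons):
    """Binary mask: 1 where a pixel lies in a removed area but not in a hole."""
    mask = []
    for row in range(height):
        line = []
        for col in range(width):
            centre = (col + 0.5, row + 0.5)  # sample at the pixel centre
            removed = any(point_in_polygon(centre, p) for p in remove_polygons)
            in_hole = any(point_in_polygon(centre, p) for p in hole_polygons)
            line.append(1 if removed and not in_hole else 0)
        mask.append(line)
    return mask
```

Only pixels with mask value 1 are passed on to the texture generation step, so the background visible through the holes stays untouched.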
Figure 82 also illustrates the occlusion problem: in the middle image, a virtual object is rendered on top of a real object; in the right image, the situation is the same except that the chair in the foreground is first removed virtually. The middle image illustrates how the visual illusion is disturbed if a background object is rendered partially in front of a foreground object.
Figure 82. Example of our occlusion handling with diminished reality.
An augmented reality system with haptic feedback is a special case; the system knows the visually disturbing object (the haptic device) beforehand, the pose of the object is known, and users can even influence the appearance of the object. In this kind of situation, special approaches such as optical camouflage are possible.
For example, users can paint the device with retroreflective paint, detect it and use a projector to project a background image on top of it, as proposed in [221].
A different approach for avoiding visual obtrusion is to define the volume of the haptic device based on the knowledge of its location, its posture and physical shape. This approach is used in [222], for example, where the haptic device is covered with a combination of boxes and a sphere. These volumes are rendered into the stencil buffer to form a mask of the diminished area. Then pre-recorded background images with associated camera positions and a rough geometric approximation of the background are used for texture generation.
Methods using information about the background structures are usually computationally demanding or require extra equipment (e.g. additional video cameras or depth cameras). Therefore, these methods are unsuitable for lightweight solutions.
Besides, they often require some sort of initialisation, which in turn requires more or less expertise. This limits their use in consumer applications.
Furthermore, fast methods (e.g. texture interpolation) usually blend the colours and textures, which creates unwanted blur on structural edges, see the rightmost image in Figure 84, for example.
Yet with a simple modification, we can improve the visual result significantly. If we take the divisions in the background (e.g. the junction between floor and wall) into account, we can divide the diminished area into sub-areas and achieve a more natural result, as can be seen in our example in the lower image of Figure 84.
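The difference between blending across the whole area and filling each sub-area separately can be sketched for a single image column. This is a deliberately simplified illustration, not our actual implementation: the column holds grey values, rows top..bottom (inclusive) form the diminished area and must have a valid pixel directly above and below, and the row index junction of the floor-wall boundary is assumed to be known. The function names are hypothetical.

```python
def fill_column_blended(column, top, bottom):
    """Linearly interpolate rows top..bottom between their outer neighbours."""
    filled = list(column)
    above, below = column[top - 1], column[bottom + 1]
    span = bottom - top + 2  # distance between the two known pixels
    for r in range(top, bottom + 1):
        t = (r - top + 1) / span
        filled[r] = (1 - t) * above + t * below
    return filled


def fill_column_split(column, top, bottom, junction):
    """Fill each sub-area only from its own side of the junction.

    Rows above `junction` copy the pixel above the diminished area (e.g. wall),
    rows from `junction` downwards copy the pixel below it (e.g. floor).
    """
    filled = list(column)
    above, below = column[top - 1], column[bottom + 1]
    for r in range(top, bottom + 1):
        filled[r] = above if r < junction else below
    return filled
```

For a column [200, 0, 0, 0, 50] with rows 1 to 3 diminished, the blended fill produces a gradient from wall to floor, whereas the split fill with the junction at row 3 keeps a sharp edge at the junction.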
We identify several means for doing this. First, if we have some prior knowledge of the environment and its 3D structure, we can use that. For example, in interior design, the floor plan indicates the boundaries of the room, i.e. the location of the floor-wall intersection. Secondly, we can use image processing and computer vision algorithms to detect the lines between image segments and use them. Figure 83 shows an example of our automatic line detection implementation.
Our real-time method finds dominant lines in the area around the diminished area, finds their counterparts on the opposite boundary and interpolates textures along these directions.
Figure 83. An example of automatic line detection. On the left: the blue area shows the area to be diminished. In the middle: interpolation without line detection. On the right: our texture interpolation method with line detection.
A third possibility is to use simple user interaction to define these kinds of lines, as was the case in our example shown in Figure 84. Our application also supports predefined edge locations.
Figure 84. Lower images: our fast volume hiding, which takes 3D structures into account. Upper images: same method without taking 3D structures into account.
If the object is moving, the system should track the movements of the diminished object. Only in special cases, e.g. in the abovementioned case of a haptic device, does the system know the movement. Object tracking can rely on motion detection and tracking, image segmentation, feature detection and tracking, or any other real-time object tracking method. Our abovementioned object hiding using a 3D volume can be combined with feature tracking to remove moving objects.