CHAPTER 3: MITIGATING DISPARITIES IN ILLUMINATION
3.4 Validation
3.4.3 Application 1: High Dynamic Range, Scene-Adaptive,
DBI was a key enabling component of the low-latency, HDR, scene-adaptive, DMD-based OST AR display demonstrated in our 2017 publication (Lincoln, Blate, et al. 2017). Indeed, DBI was initially conceived to add color to this display’s antecedent, a low-latency 6-bit monochrome OST AR display (Lincoln, Blate, et al. 2016).
3.4.3.1 Overview and Objectives
The goals for the display in (Lincoln, Blate, et al. 2017) were to:
a) implement a display with dynamic range sufficient to match the real world;
b) react to changes in HDR scene brightness by changing the brightness of the virtual scene; and
c) maintain the low motion-to-photon latency of the antecedent display, on the order of 100 μs.
Our DBI illuminator implementation satisfies goal (a) under all but the most intense sunlit conditions. At full intensity, the virtual objects on the display can be bright to the point of mild discomfort. The display’s dynamic range, about 115 dB in total, is characterized in Table 2 and discussed in section 3.4.1.
The proof-of-concept position-sensitive light sensor, discussed below, provides the display controller with the requisite data to satisfy goal (b), i.e., maintaining consistency between scene and virtual object luminosity. We note, however, that the illuminator is capable of reproducing luminosities below the noise floor of the positional light sensor; this did not pose any practical problems for our experiments.
The motion compensation latency-mitigation mechanisms of the antecedent display (Lincoln, Blate, et al. 2016) (see also (P. C. Lincoln 2017)) are used in the HDR display as well. The change from the PDM illumination of the antecedent display to DBI illumination, however, increases the measured motion-to-photon latency from an average of 80 μs to 124 μs. The additional 44 μs arises because the entire DMD must settle before the illuminator is activated (as discussed in section 3.3.2). The antecedent display used a constantly-on light source, so its “photon” event occurred as soon as the first part of the DMD was switched.
Goal (c) was met in significant part: Based upon the display’s present mean motion-to-photon latency (124 μs), real-virtual displacement would remain under one pixel (about 2 to 3 arcminutes) at up to 336°/s of head rotation (yaw or pitch). Additionally, several known optimizations to the display would bring its motion-to-photon latency close to or below 100 μs; the details of said optimizations are beyond the scope of this dissertation.
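The one-pixel bound quoted above follows from multiplying latency by rotational velocity. The following is a minimal sketch of that arithmetic, not code from the display itself. It assumes a pixel pitch of 2.5 arcminutes (the midpoint of the "about 2 to 3 arcminutes" range given above), and the function names are illustrative only.

```python
ARCMIN_PER_DEG = 60.0


def displacement_arcmin(latency_s: float, rate_deg_per_s: float) -> float:
    """Angular real-virtual displacement accumulated during the latency interval."""
    return latency_s * rate_deg_per_s * ARCMIN_PER_DEG


def max_rate_deg_per_s(latency_s: float, pixel_arcmin: float) -> float:
    """Fastest rotation that keeps the displacement within one pixel."""
    return pixel_arcmin / ARCMIN_PER_DEG / latency_s


if __name__ == "__main__":
    latency = 124e-6        # measured mean motion-to-photon latency (s)
    assumed_pixel = 2.5     # ASSUMPTION: pixel pitch in arcminutes
    print(displacement_arcmin(latency, 336.0))         # close to 2.5 arcmin
    print(max_rate_deg_per_s(latency, assumed_pixel))  # close to 336 deg/s
```

Under the assumed pixel pitch, the sketch reproduces the 336°/s figure; a different pitch within the 2 to 3 arcminute range would shift the bound proportionally.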
Figure 18: Position-sensitive light sensor prototype: principle of operation.
(Figurative overhead view, not to scale.) A circuit board (green), upon which are installed four HDR light sensors (orange), is positioned behind a baffle. The vertical slots in the baffle are tapered and angled, giving each sensor about a 9° field of view. Each sensor measures the mean luminosity of its respective partition (a vertical slice) of the scene; the slices are non-overlapping out to a distance of about 2 m.
Figure 19: Cut-away of a prototype baffle for position-sensitive light sensor.
The baffle has been cut along its horizontal axis to reveal its internal structure. This baffle’s geometry is similar but not identical to the baffle used in our experiments. The baffle is facing down-range. The baffle is constructed from a block of Delrin™ plastic, into which tapered slots are milled; the angle between adjacent slots is approximately 9°. The baffle is about 100 mm wide and 60 mm tall. The actual baffle used in our experiments is shown in Figure 7 as part of the positional light sensor assembly. The sensor board mounts to the baffle with the sensors facing down-range, i.e., the sensors align with the near face of the baffle facing into the page.
Figure 20: Position-sensitive light sensor circuit board.
The small black squares are the light sensor chips; the right-most sensor is circled in red. The sensors are located on 9 mm centers, as indicated. The board is mounted to the rear of the baffle with the sensors centered vertically and horizontally with respect to the slots. The board itself is about 100×60 mm.
3.4.3.2 Position-Sensitive Light Sensing
To achieve goal (b), above, “[to] react to changes in HDR scene brightness by changing the brightness of the virtual scene,” the display must match the luminosity of virtual objects with that of the immediately-surrounding scene; the display thus needs to know the luminosity of the scene in the area proximate to each virtual object. Under typical conditions, scene lighting may range, simultaneously, over perhaps 100-120 dB. That is, two objects or areas within the same scene may differ in brightness by five or more orders of magnitude. Measuring mean scene brightness, e.g., with a single ambient light sensor, is insufficient because the mean could be orders of magnitude larger than the brightness of the darkest region.
In Lincoln, Blate, et al. (2017), we introduced the principle of position-sensitive light sensing into the context of OST AR systems. The display of [idem] requires measurements of light levels at many locations over the user’s field of view. We do not know the requisite number of locations, but we estimate that an angular resolution of 5-10° (2.5-5 times the FOV of the fovea) is sufficient; optimization of this design parameter may be a fruitful avenue of future research. To mitigate real-virtual discrepancies in luminosity, the sensor’s dynamic range must span at least that of the display and, in principle, should span or exceed the range of expected scene lighting. Based on our estimates (see section 3.2.3), the requisite sensor dynamic range is at least 100-120 dB referenced to 0.1 lx.
The sensor dynamic range requirement presents a challenge vis-à-vis implementing a position-sensitive light sensor. For example, at the time of the implementation of the display of (Lincoln, Blate, et al. 2017), no readily-available digital camera sensor had sufficient dynamic range for the present purpose. Instead, we chose to implement our prototype position-sensitive light sensor using photodiode-based sensors. Specifically, we used the Avago™ APDS-9250 digital ambient light sensor (Avago 2015). The APDS-9250 has a published dynamic range of “18,000,000:1” or about 145 dB referenced to 1 lx [idem].
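The conversion between a contrast ratio and decibels used in the figures above can be sketched as follows. The 20·log10 convention is an assumption inferred from the quoted numbers (18,000,000:1 corresponding to about 145 dB); the function names are illustrative only.

```python
import math


def ratio_to_db(ratio: float) -> float:
    """Convert a luminosity ratio to decibels (ASSUMED 20*log10 convention)."""
    return 20.0 * math.log10(ratio)


def db_to_ratio(db: float) -> float:
    """Inverse of ratio_to_db: decibels back to a plain ratio."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    print(ratio_to_db(18_000_000))  # about 145 dB, the APDS-9250 figure
    print(db_to_ratio(100.0))       # 1e5, i.e., five orders of magnitude
    print(db_to_ratio(120.0))       # 1e6, i.e., six orders of magnitude
```

Under this convention, the 100-120 dB requirement corresponds to ratios of 10^5 to 10^6, which the sensor's published range exceeds.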
The principle of operation of our prototype position-sensitive light sensor is illustrated in Figure 18. A baffle, machined from black Delrin™ plastic, partitions the field of view into four ~9°-wide vertical strips. A cutaway of an earlier version of the baffle is shown in Figure 19; the baffle has been sliced along its horizontal axis (perpendicular to the slots) to reveal the internal structure. Note that the geometry of the baffle pictured is similar but not identical to our final version. A circuit board, upon which four APDS-9250 sensors are installed, is mounted behind the baffle. The circuit board is shown in Figure 20. The sensors, which appear as small black squares in the figure, are installed on 9 mm centers; the sensors align horizontally and vertically with the narrow ends of the tapered, angled slots in the baffle. The sensors’ digital interfaces (see footnote 67) are connected to the display controller. Each individual sensor measures the mean luminosity of its partition of the scene (see footnote 68).
The fields-of-view of the sensors are non-overlapping out to a distance of about 2 m (the distance of the scenes in the demonstrations of the display in (Lincoln, Blate, et al. 2017)). Taken together, the light measurements from the four sensors cover a superset of the horizontal axis of the display’s field-of-view (about 35°) at a spatial resolution sufficient for the demonstration of the display of (Lincoln, Blate, et al. 2017).
Spatial and temporal filtering is applied to the sensor data to avoid discontinuities between sensors and over short time scales. Without the spatial filtering, for example, large differences in readings from adjacent sensors would result in unnatural discontinuities in the adjusted brightness of virtual objects. Spatial filtering is implemented via linear interpolation between the sensor readings. In the time domain, a simple low-pass filter is applied to the sensor readings. The filter’s step response is on the order of 200-300 ms, which is consistent with perceptual adaptation rates (Riggs 1965). Interestingly, low latency and high sample rates are not requirements for the light sensors themselves, which are sampled at 40 Hz; this, again, is due to perceptual adaptation rates [idem]. The 40 Hz sampling frequency is a limitation of the sensors themselves, which require longer integration periods to obtain the desired dynamic range. This frequency is more than adequate given that the pass-band of our temporal filter is on the order of 3-4 Hz.
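The spatial filtering step described above can be sketched as a piecewise-linear interpolation across the four sensor readings. This is an illustrative sketch, not the display controller's code. It assumes that each sensor is centered in its ~9° slice (azimuths of -13.5°, -4.5°, 4.5°, and 13.5°), that readings are interpolated directly rather than in a logarithmic domain, and that values beyond the outermost sensor centers are clamped; none of these details are specified in the text.

```python
from bisect import bisect_right

# ASSUMPTION: sensor slice centers in degrees of azimuth (four 9-degree slices).
SENSOR_AZIMUTHS_DEG = (-13.5, -4.5, 4.5, 13.5)


def interpolate_luminosity(azimuth_deg, readings, centers=SENSOR_AZIMUTHS_DEG):
    """Linearly interpolate scene luminosity at a given azimuth.

    readings[i] is the filtered reading of the sensor centered at centers[i].
    Azimuths outside the outermost centers are clamped to the nearest reading.
    """
    if len(readings) != len(centers):
        raise ValueError("one reading per sensor is required")
    if azimuth_deg <= centers[0]:
        return readings[0]
    if azimuth_deg >= centers[-1]:
        return readings[-1]
    i = bisect_right(centers, azimuth_deg) - 1   # index of the sensor to the left
    t = (azimuth_deg - centers[i]) / (centers[i + 1] - centers[i])
    return (1.0 - t) * readings[i] + t * readings[i + 1]


if __name__ == "__main__":
    lux = (10.0, 100.0, 1000.0, 500.0)        # hypothetical sensor readings
    print(interpolate_luminosity(0.0, lux))   # midway between sensors 1 and 2
```

A virtual object at a given azimuth would then be rendered at a brightness derived from the interpolated value, so that its brightness varies smoothly as it crosses a slice boundary.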
67 For completeness, the display controller communicates with the sensors via the I2C (Inter-Integrated Circuit) protocol (Phillips 1982). Each sensor has a dedicated connection to the display controller.
68 The APDS-9250 also provides color sensing (luminosities for red, green, and blue); this feature was not used in
Spatial and temporal filtering is performed at a higher rate (1.5 kHz) than the 40 Hz sensor sampling rate to avoid temporal discontinuities. The 1.5 kHz rate was dictated by limited computational resources; the value should be understood as non-normative.
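The temporal filtering can be illustrated with a first-order low-pass filter updated at 1.5 kHz. The text specifies only "a simple low-pass filter" with a pass-band on the order of 3-4 Hz, so the filter form (single-pole exponential smoothing), the 3.5 Hz cutoff, and the class name below are all assumptions made for this sketch.

```python
import math


class LowPassFilter:
    """First-order (single-pole) low-pass filter; an ASSUMED filter form."""

    def __init__(self, cutoff_hz=3.5, update_hz=1500.0, initial=0.0):
        tau = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant in seconds
        dt = 1.0 / update_hz                      # update period in seconds
        self.alpha = dt / (tau + dt)              # smoothing coefficient
        self.value = initial

    def update(self, sample):
        """Advance the filter one update period toward the latest sample."""
        self.value += self.alpha * (sample - self.value)
        return self.value


if __name__ == "__main__":
    lpf = LowPassFilter()
    # Step input: the most recent (40 Hz) sensor reading is held at 1.0 and
    # fed to the filter at every 1.5 kHz update for 300 ms (450 updates).
    for _ in range(450):
        out = lpf.update(1.0)
    print(out)   # has nearly settled to the step value
```

With these assumed parameters the output has essentially settled within 300 ms of a step change, which is in line with the 200-300 ms step response stated above.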
To be clear, the position-sensitive light sensor is not properly a part of the illuminator or DBI. Rather, a position-sensitive light sensor or equivalent measurement device is a necessary component of a high dynamic range OST AR display with scene-adaptive lighting. The sensor and the illuminator independently interface with the display controller.
Figure 21: HDR OST AR display demonstration.
The three photos are of the same scene (real and virtual content). The virtual content comprises two Utah teapots, both of which are visible in photo (b). The camera’s exposure is fixed to provide correct exposure for the teapot under the box. The luminosity of the left-hand teapot is 256 times (48 dB) greater than that of the right-hand teapot; that is, the entire 16-bit range of the illuminator is in simultaneous use. As seen in photo (a), due to the camera’s limited dynamic range and the selected exposure setting, most of the image is saturated and only the darker areas are visible. In photo (b), a neutral density filter (about 98% (~34 dB) attenuation) covers the left half of the camera lens; this allows the camera, with its limited dynamic range, to see the bright and dark sides of the scene. In photo (c), the neutral density filter covers the entire camera lens; note that the teapot on the right is no longer visible. The entire scene and both teapots are clearly visible to the naked eye (Lincoln, Blate, et al. 2017).
(a) No filter. (b) Filter over left half of frame. (c) Filter over full frame.
Figure 22: Motion-to-photon latency demonstration.
Both images were captured from video with the display rotating at approximately 16°/s to the right. A 120 mm-pitch checkerboard is positioned in the center of the real-world scene. A virtual checkerboard is registered to the real checkerboard. Top: motion compensation disabled. The virtual image is displaced to the right by a full checkerboard square due to latency. Bottom: motion compensation enabled. The virtual image remains correctly registered with the real world. The slight misalignment at the upper right is due to (uncorrected) optical distortion, not latency (Lincoln, Blate, et al. 2017).
3.4.3.3 Dynamic Range Demonstration
HDR is the hallmark of the display of (Lincoln, Blate, et al. 2017). In this demonstration, we use a camera with standard dynamic range at a fixed exposure. A neutral density filter, placed in front of the camera’s lens, attenuates the incoming light, revealing the brighter areas of the scene and virtual imagery.
The photographs in Figure 21 are of the same scene, the same virtual images (including intensity), and are photographed with the camera at the same fixed exposure. The left-hand side of the scene is lit by a 250 W-equivalent desk lamp; the objects on the right-hand side of the scene are under an opaque box, placing them in deep shadow. The luminosity of the left-hand teapot is 256 times (~48 dB) greater than the right-hand teapot; that is, the entire 16-bit-per-color programming range of the display is in simultaneous use. The camera’s exposure is set to provide correct exposure for the deep shadow area. Note that the human eye, whose dynamic range far exceeds that of the camera, has no difficulty viewing this scene through the display; both teapots appear at intensities appropriate for their surroundings.
Image (a) of Figure 21 is what the camera sees with the aforementioned fixed exposure; the bright portion of the image is saturated, but the dark area has appropriate exposure. To reveal the content in the bright region, shown in image (b) of Figure 21, we covered the left half of the camera’s lens with a neutral density filter, providing about 98% (~34 dB or about 5.6 f-stops) of attenuation. With the filter in place, we can now see both teapots. Covering the entire camera lens with the neutral density filter, as shown in image (c) of Figure 21, the deep shadow area completely disappears.
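The attenuation figures quoted for the neutral density filter can be cross-checked with a short calculation. This sketch assumes the same 20·log10 decibel convention implied by the "256 times (~48 dB)" figure above, and it treats "98% attenuation" as a transmitted fraction of 0.02; the function name is illustrative only.

```python
import math


def attenuation(transmitted_fraction: float):
    """Return (decibels, f-stops) of attenuation for a transmitted fraction.

    Decibels use the ASSUMED 20*log10 convention; one f-stop is a factor of 2.
    """
    factor = 1.0 / transmitted_fraction
    return 20.0 * math.log10(factor), math.log2(factor)


if __name__ == "__main__":
    db, stops = attenuation(0.02)   # 98% attenuation -> 2% transmitted
    print(round(db, 1), round(stops, 1))
    teapot_db, teapot_stops = attenuation(1.0 / 256.0)  # 256:1 teapot ratio
    print(round(teapot_db, 1), round(teapot_stops, 1))
```

A 50:1 reduction works out to roughly 34 dB and 5.6 stops, which agrees with the values quoted above; the 256:1 ratio between the two teapots corresponds to exactly 8 stops.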
The real-time scene-adaptive component of this display is difficult to illustrate in still images. We refer the reader to the video referred to in (Lincoln, Blate, et al. 2017) for demonstrations of this
Low-Latency Demonstration:
The display’s low latency was evaluated by placing a video camera in the display’s eye box and filming the display as the head-mounted display (HMD) was rotated left and right (yaw). A checkerboard was placed on a table about 2 m from the display, and a virtual checkerboard, registered to the real checkerboard, was shown on the display. Display latency results in displacement of the virtual image relative to the real world. Figure 22 illustrates this effect with the HMD rotating at about 16°/s to the right. In the upper image, the display’s motion compensation logic is disabled, resulting in at least 16.7 ms of motion-to-photon latency (see footnote 69). The effect, even at this relatively low rotational velocity, is a real-virtual displacement of about one full checkerboard square (120 mm) to the right (in the direction of rotation). Seen in real time through the display, the virtual object moves discontinuously, jumping left and right by a distance proportional to the rotational velocity.
In the lower image, motion compensation is enabled and no real-virtual displacement is visible. At higher rotational velocities, motion blur is seen in both the real and virtual content, as would be expected.
69
Figure 23: Volumetric near-eye display depth decomposition.
Figurative example of voxelization, decomposition, and mapping to binary frames of a notional 2-bit-per-color (6 bpp) scene. DBI enables full color, as depicted in the displayed binary volume (right). In practice, the order and intensities of colors are scene-optimized. Note that in the computation of the color volume from the OpenGL scene, only the first voxels encountered (those that would be visible) are preserved. (See also Figure 5 of (Rathinavel, et al. 2018).)
Figure 24: Volumetric near-eye display: view through display.
View through volumetric near-eye display where virtual objects are placed among real objects. Left: Overhead depiction of scene geometry. Icons to the left of the dimensional line correspond to virtual objects in the scene and icons to the right of the dimensional line correspond to real objects in the scene. In each row, the only difference between the see-through views is the camera’s focus setting. (See Figure 9 in (Rathinavel, et al. 2018).)