

2.4 Vision Processing for Retinal Prostheses

2.4.2 Standard Intensity Vision Processing

Current approaches predominantly employ intensity-based vision processing, in which stimulation levels convey the sampled light intensity near the projected electrode location in the visual field. This can be thought of as directly downsampling the input intensity image to the output display units. Although the implementation details of different approaches may differ slightly, this type of method will be referred to as the standard intensity method. An example of the standard intensity vision processing is shown in Figure 2-6.

The aim of these downsampling-based vision processing methods is to filter out all high-frequency content above a given cutoff, such as fine image details, noise, or texture, without affecting lower frequencies, such as relatively large contiguous regions. Selecting an appropriate downsampling filter is important because aliasing can occur if high-frequency signals are not fully removed. In prosthetic vision, for example, small changes in intensity on a single surface (e.g. from texture) could cause significant and potentially misleading variations in phosphene brightness. Furthermore, some filters also attenuate frequencies below the cutoff, which can significantly reduce the sharpness of the filtered image.
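The aliasing effect described above can be illustrated with a small one-dimensional sketch. The signal, the sampling locations, and the window width are hypothetical values chosen for illustration and are not taken from any of the cited systems. The block average merely stands in for a generic low-pass filter.

```python
# Hypothetical 1D "surface": mean intensity 0.5 with a fine texture that
# alternates by +/-0.2 from one pixel to the next.
signal = [0.5 + (0.2 if i % 2 == 0 else -0.2) for i in range(64)]


def sample_nearest(sig, centres):
    """Read the single pixel at each sampling location (no filtering)."""
    return [sig[c] for c in centres]


def sample_block(sig, centres, half_width):
    """Average the pixels in a window around each sampling location."""
    out = []
    for c in centres:
        lo = max(0, c - half_width)
        hi = min(len(sig), c + half_width + 1)
        out.append(sum(sig[lo:hi]) / (hi - lo))
    return out


# Assumed sampling locations with slightly irregular spacing.
centres = [8, 17, 24, 33, 40, 49]

raw = sample_nearest(signal, centres)
smooth = sample_block(signal, centres, half_width=4)

# Spread of the output levels across a surface that is uniform on average.
print("unfiltered spread:", round(max(raw) - min(raw), 3))
print("filtered spread:  ", round(max(smooth) - min(smooth), 3))
```

Without filtering, the sampled levels jump between the two texture values even though the surface has a constant mean intensity. With the averaging window, the levels stay close to that mean.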

In the standard intensity representation, the brightness $B(\phi, I)$ of each phosphene $\phi$ is obtained by sampling the filtered input image at the projected location of the electrode $X(\phi)$. Thus, the general phosphene brightness function for filtering based methods is given by:

$$B(\phi, I) = (I * F) \circ X(\phi), \tag{2.11}$$

where $I * F$ denotes the convolution of $I$ with filter kernel $F$.
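As a minimal sketch of Equation 2.11, the brightness function might be written as follows. It assumes a greyscale image stored as a list of rows, an odd-sized kernel, integer (row, column) electrode locations, and edge replication at the image border, none of which is specified in the text. The function names and the example image are illustrative only.

```python
def convolve2d(image, kernel):
    """Same-size 2D convolution of image with an odd-sized kernel.

    Border pixels are handled by edge replication (an assumed policy).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution indexes the image with the kernel flipped.
                    yy = min(max(y + oy - j, 0), h - 1)
                    xx = min(max(x + ox - i, 0), w - 1)
                    acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out


def phosphene_brightness(image, kernel, locations):
    """B(phi, I) = (I * F) o X(phi): filter, then sample each location."""
    filtered = convolve2d(image, kernel)
    return [filtered[row][col] for (row, col) in locations]


# Hypothetical input: a smooth intensity ramp with one outlier pixel.
image = [[float(x + y) for x in range(6)] for y in range(6)]
image[1][1] = 20.0

identity = [[1.0]]                          # I * F = I (no filtering)
box3 = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3x3 averaging kernel
locations = [(1, 1), (1, 4), (4, 1), (4, 4)]

print(phosphene_brightness(image, identity, locations))
print(phosphene_brightness(image, box3, locations))
```

With the identity kernel, the outlier pixel is passed straight through to the first phosphene. With the averaging kernel, its influence is spread over the neighbourhood and the other three phosphenes keep the value of the underlying ramp.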

The minimal amount of vision processing is performed by simply setting phosphene brightness based on the intensity of the closest pixel to the projected phosphene location in the raw input image [23]. This is a special case of Equation 2.11 where $I * F = I$. This approach does not filter high frequency data within the input image and is prone to large output level variations caused by noise or texture. More recent methods use non-trivial filters, in order to ensure that the stimulation level of each electrode is a more stable and representative depiction of the corresponding region of the visual field. Humayun et al. [65] use block filtering, averaging the intensity values within a block of pixels centred on the target pixel, setting $F$ to a matrix of ones divided by the size of the filter. This type of filter has a high impact on frequencies below the cutoff, heavily blurring boundaries in the scene. Hayes et al. [62] set $F$ to a Gaussian kernel, improving boundary sharpness by giving higher weights to pixels closer to the projected phosphene location. Dowling et al. [45] use median filtering, further improving the sharpness of edges in the input image.
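The three filter choices above can be compared on a synthetic step edge. This is a hedged sketch rather than a reproduction of the cited implementations: the window size, the Gaussian standard deviation, the edge-replication border policy, and all function names are assumptions made for illustration. Note that the median is a non-linear operation, so it is computed directly on the window rather than expressed as a kernel $F$.

```python
import math
from statistics import median


def block_kernel(size):
    """Matrix of ones divided by the number of entries (block filter)."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def gaussian_kernel(size, sigma):
    """Gaussian weights centred on the kernel, normalised to sum to one."""
    c = (size - 1) / 2.0
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def window(image, row, col, size):
    """The size x size neighbourhood of (row, col), edges replicated."""
    h, w, half = len(image), len(image[0]), size // 2
    return [[image[min(max(row + dy, 0), h - 1)][min(max(col + dx, 0), w - 1)]
             for dx in range(-half, half + 1)]
            for dy in range(-half, half + 1)]


def filtered_at(image, row, col, kernel):
    """Weighted sum over the window (equals convolution for symmetric kernels)."""
    win = window(image, row, col, len(kernel))
    return sum(p * k for wr, kr in zip(win, kernel) for p, k in zip(wr, kr))


def median_at(image, row, col, size):
    """Median of the window centred on (row, col)."""
    win = window(image, row, col, size)
    return median([v for wr in win for v in wr])


# Vertical step edge: dark (0.0) in columns 0-5, bright (1.0) in columns 6-11.
image = [[0.0 if x < 6 else 1.0 for x in range(12)] for _ in range(12)]
row, col, size = 6, 5, 5  # sampling location one pixel on the dark side

block_value = filtered_at(image, row, col, block_kernel(size))
gaussian_value = filtered_at(image, row, col, gaussian_kernel(size, sigma=1.0))
median_value = median_at(image, row, col, size)

print(block_value, gaussian_value, median_value)
```

At this location the true intensity is 0.0. In this sketch the block filter leaks the most brightness across the edge, the Gaussian leaks less, and the median leaks none, which matches the ordering of edge sharpness described above.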

2.4.2.1 Lanczos2 Filtering

In signal processing it is well-understood that Nyquist band-limited filtering prevents aliasing when performing image downsampling [154]. Lanczos2 offers a better compromise than other practical filters in reducing aliasing and retaining sharpness [148]. Barnes et al. [11] have shown that applying the Lanczos2 filter to downsample the input image at the Nyquist frequency is effective for prosthetic vision. Since sampling frequency is the inverse of sample spacing, the Nyquist frequency of a phosphene display can be estimated as $1/r$, where $r$ is the average nearest-neighbour distance of each phosphene. Note that in a regular phosphene grid, $r$ is equal to the phosphene spacing. This gives the 2D Lanczos2 reconstruction kernel of size $2r + 1$:

$$L = k\left(\frac{c(x, y)}{r}\right), \tag{2.12}$$

where $c(x, y) = \lVert (x, y) - (r, r) \rVert$ is the distance to the kernel centre, and $k$ is the Lanczos2 kernel given by:

$$k(a) = \begin{cases} \operatorname{sinc}(a)\,\operatorname{sinc}\!\left(\frac{a}{2}\right), & \text{if } a \le 2 \\ 0, & \text{otherwise.} \end{cases} \tag{2.13}$$
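Equations 2.12 and 2.13 can be sketched in code as below. Several details are assumptions not stated in the text: the normalised convention $\operatorname{sinc}(a) = \sin(\pi a)/(\pi a)$, an integer pixel spacing $r$, and a final normalisation to unit sum so that uniform regions keep their intensity. The function names are illustrative.

```python
import math


def sinc(a):
    """Normalised sinc, sin(pi a) / (pi a), with sinc(0) = 1 (assumed)."""
    if a == 0.0:
        return 1.0
    return math.sin(math.pi * a) / (math.pi * a)


def lanczos2(a):
    """Equation 2.13: sinc(a) * sinc(a / 2) for a <= 2, zero otherwise."""
    return sinc(a) * sinc(a / 2.0) if a <= 2.0 else 0.0


def lanczos2_kernel(r):
    """Equation 2.12 on a (2r + 1) x (2r + 1) grid for integer spacing r."""
    size = 2 * r + 1
    # c(x, y) is the Euclidean distance from (x, y) to the centre (r, r).
    kernel = [[lanczos2(math.hypot(x - r, y - r) / r) for x in range(size)]
              for y in range(size)]
    # Unit-sum normalisation is an added assumption, not part of Eq. 2.12.
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


# Hypothetical phosphene spacing of 4 pixels gives a 9 x 9 kernel.
kernel = lanczos2_kernel(4)
print(len(kernel), len(kernel[0]))
print(kernel[4][4])  # centre weight, the largest entry
```

With this definition the weights fall to approximately zero at a distance of $r$ along each axis, so a pixel lying exactly at a neighbouring phosphene location on a regular grid contributes almost nothing.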

Therefore the vision processing method defined by Equation 2.11 with $F = L$ is the current state-of-the-art standard intensity method.

(a) RGB (b) Sampling Locations (c) Phosphene Image

Figure 2-6. Example SPV of a scene with the standard intensity vision processing method. Image (c) shows a simulation of what an implant user might be expected to see when viewing the scene using this visual representation with a 20-electrode retinal implant. Note the difficulty of interpreting this scene with the standard representation.

2.4.2.2 Clinical Trials

The standard intensity visual representation has been evaluated on an orientation and mobility task by Second Sight Medical Products LLC [65]. Using the Argus II epiretinal implant, participants performed the task of walking towards and touching a door-sized black target in a room with white featureless walls. Use of the implant with the standard intensity visual representation (system-ON) was found to result in a higher success rate than the control condition (system-OFF) in which no visual information was conveyed through the implant (55% system-ON vs. 31% system-OFF at 3 months with $N = 29$ participants; 60% system-ON vs. 8% system-OFF at 24 months with $N = 8$ participants). This demonstrates that the standard intensity representation can enable basic wayfinding for implant users assuming an environment with appropriate contrast.

2.4.2.3 Limitations

The standard downsampling-based vision processing methods aim to directly depict the scene intensity values at regular intervals of the input image, without assessing the importance of any part of the scene. However, this often means that details useful for the task, such as the top stair in a flight of steps, or the trip hazard resting on the ground in Figure 2-6c, can be missed due to a lack of display capacity. Additionally, the standard intensity method relies on different scene components having high contrast with each other in order to be discernible on the prosthetic vision display. Therefore it can be desirable to prioritise and enhance the display of crucial scene components.