Application of Program 2 to the Grading of Images

5.5 Application of Program 2 to the Grading of Images

The difference between two graders in making clinical judgements whilst using Program 2 was assessed. Grading was performed using the bitmap image in conjunction with the Fundus Grading Grid program, with reference made to the stereoscopic pair stored in the original JPEG format where necessary. The graders, an optometrist (JA) and a consultant ophthalmologist (JMG), performed all grading independently. The graders were masked to the identity of the patients, and images were graded in a random order to reduce observer bias.

Forty-nine images from patients with AMD were viewed on a 20.1” screen. A prismatic stereoviewer was used to facilitate differentiation of large drusen from areas of hypopigmentation and to aid viewing of any raised lesions, such as a pigment epithelial detachment. Visualisation factors such as the distance and angle of viewing, monitor resolution and magnification were kept constant. Images were graded according to the International Classification and Grading System (Bird et al. 1995). Stage of disease was determined according to the stages of severity defined by an epidemiologic study, based on progression rates of features over a 6.5-year period (van Leeuwen et al. 2003b). Stage of disease was then redefined using an alternative staging system, the CARMS system (Seddon et al. 2006). Because the CARMS system was modified from the AREDS staging system and originally designed for use by graders with minimal training, its stages could be easily identified from the features already graded.

The inter-observer agreement was determined using the weighted kappa statistic (κ) for each feature graded and is shown in Table 5-2. The κ statistic can be used with categorical data and ranges from -1, indicating exact disagreement, to +1, indicating exact agreement (Landis & Koch 1977). Landis & Koch (1977) proposed that κ values of 0.41 to 0.60 indicate moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 0.99 almost perfect agreement. The agreement between graders ranged from 0.42 to 1. All characteristics showed at least substantial agreement except for area covered by drusen, main location of hyper/hypopigmentation and main location of neovascular AMD, which showed moderate agreement. Agreement between graders on identification of stage was excellent.
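As an illustration of how a weighted κ of this kind can be computed, the following is a minimal sketch in Python. It is not the software used in this evaluation. The function names, the choice of linear weights (the weighting scheme is not stated here) and the "exact" and "below moderate" labels are assumptions made for the example; the three named bands are those quoted from Landis & Koch (1977) above.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two graders rating the same images.

    `categories` lists the ordered grades, lowest first.
    Linear weighting is an assumption; the text does not state the scheme.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two ordered categories are required")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    index = {category: i for i, category in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    # Observed counts of every (grader A, grader B) pair of grades.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[index[a]][index[b]] += 1

    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weighting == "linear" else 2

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed += weight * counts[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected == 0.0:
        # Both graders used one identical grade throughout.
        return 1.0
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Label kappa using the bands quoted in the text (Landis & Koch 1977)."""
    if kappa >= 1.0:
        return "exact"
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"  # bands below 0.41 are not quoted in the text
```

Called with the two graders' grades for one feature and the ordered list of possible grades, `weighted_kappa` returns a value between -1 and +1, which `interpret_kappa` then maps onto the bands above.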

The difference between the two staging systems is shown in Figure 5-7. The line of best fit has a gradient of 0.73. Were the two staging systems to agree exactly, this value would be 1 (illustrated by the dotted line).
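The gradient of a line of best fit can be obtained by ordinary least squares, as in the minimal Python sketch below. The fitting method used for Figure 5-7 is not stated, so the use of ordinary least squares is an assumption, and the function name is hypothetical. No data from this evaluation are reproduced.

```python
def least_squares_line(x, y):
    """Return (gradient, intercept) of the ordinary least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept
```

If every image received the same stage under both systems, the function would return a gradient of 1 and an intercept of 0, corresponding to the line of exact agreement.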

Table 5-2. Inter-observer agreement (weighted κ) for each feature graded

Feature                     Weighted κ    Standard error
Drusen
  Type                      0.91          0.03
  Number                    0.90          0.05
  Size                      0.90          0.04
  Main location             0.80          0.07
  Area covered              0.56          0.10
Pigmentary Changes
  Hyperpigmentation         0.76          0.08
  Hypopigmentation          0.94          0.05
  Main location             0.42          0.16
Geographic Atrophy
  Presence                  1             0
  Location                  0.73          0.16
  Area covered              0.82          0.18
Neovascular AMD
  Presence                  1             0
  Typifying features        0.84          0.17
  Location                  0.57          0.35
  Area covered              1             0
Stage of AMD                0.90          0.05

[Figure 5-7: plot of Stage of AMD: CARMS (vertical axis, 0 to 4) against Stage of AMD: The Rotterdam Study (horizontal axis, 0 to 4)]

Figure 5-7. Comparison of two systems for staging of severity of AMD

Two staging systems for AMD were used to grade the same fundus images. A converted scale for the CARMS staging system (Seddon et al. 2006) is plotted against the Rotterdam Study staging system (van Leeuwen et al. 2003b). The solid line shows the line of best fit which has a gradient of 0.73. The dotted line shows the line of exact agreement which has a gradient of 1.

5.6 Discussion

The programming had two separate applications: mapping visual field data onto the fundus image, and mapping the standard AMD grading grid onto the fundus image. The programs were evaluated for the accuracy of the spatial mapping in terms of retinal distances. Once the grading program had been developed and evaluated, it was applied to assess the inter-observer agreement in grading of AMD. The evaluation results present evidence of accurate mapping to within approximately 80µm for Program 1: Perimetric Fundus Map and approximately 65µm for Program 2: Fundus Grading Grid. Interpretation of Program 1 in relating drusen or other features to coinciding visual field defects at individual stimulus locations is therefore limited to features larger than 80µm; hard drusen or other small features in isolation cannot be confidently associated with defects on the output map. Overall inter-observer agreement of the grading of AMD features when using the program was good. It is not surprising that Program 2 was more repeatable than Program 1, since fewer user-defined measures are made in Program 2. Factors influencing the grid positioning were the degree of definition of the optic disc margins and the macula, which affected how easily the user could place mouse clicks. In cases where very small drusen were present in the central subfield, a more magnified image was less likely to give positional errors than a less magnified image. This was due to better visibility of the retinal landmarks, despite the greater number of pixels contained within the same area.

In photography, compression algorithms reduce the size of an image file. TIFF (tagged image file format) and bitmap files are lossless formats; lossless compression algorithms search for redundancy of information in the image in order to recode it more efficiently. Lossless compression permits the original image data to be retrieved, whereas lossy compression does not. JPEG (Joint Photographic Experts Group) files are lossy and were designed to reduce the file size without causing a significant visible difference to the image. It is possible to vary the compression ratio to suit the image quality versus file size requirements. The camera used in this evaluation was a 6.3 megapixel camera with a pixel resolution of 3072 x 2048, and is a camera recommended by the National Screening Committee guidelines for diabetic retinopathy (UK National Screening Committee, 2009). The monitor used for all grading was a 20.1” display with a pixel resolution of 1600 x 1200, in accordance with the standards for grading diabetic retinopathy (UK National Screening Committee, 2009), which fulfils the image resolution requirement of at least 20 pixels per degree. A limitation of the camera was that the best possible image quality output was a high quality JPEG, as opposed to a lossless file such as a TIFF. However, the maximum quality JPEG image reduces the file size by 90.2% compared with a TIFF image, and the visible difference in image quality was considered negligible. The only image file format which could be programmed and manipulated in Liberty BASIC software was the bitmap, which necessitated conversion from the original JPEGs to bitmaps. The bitmap conversion therefore represents a lossless re-encoding of the original JPEG files, with no observable degradation to small features such as hard drusen. Notably, the automated drusen segmentation program (Smith et al. 2005a,b) converts all imported images to bitmaps before performing segmentation, which supports the programming methods in the present evaluation.
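The image figures quoted above can be checked with simple arithmetic, sketched here in Python. The pixel dimensions and the 90.2% reduction are taken from the text. The 24-bit colour depth, the treatment of the TIFF as uncompressed and the field angle passed to `pixels_per_degree` are assumptions made for illustration, and the function names are hypothetical.

```python
WIDTH_PX = 3072          # camera pixel width quoted in the text
HEIGHT_PX = 2048         # camera pixel height quoted in the text
JPEG_REDUCTION = 0.902   # file size reduction quoted in the text
BYTES_PER_PIXEL = 3      # assumption: 24-bit RGB, not stated in the text


def megapixels(width_px, height_px):
    """Total pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


def uncompressed_bytes(width_px, height_px, bytes_per_pixel):
    """Size of the raw pixel data, ignoring any file header."""
    return width_px * height_px * bytes_per_pixel


def reduced_bytes(size_bytes, reduction):
    """File size remaining after a fractional reduction such as 0.902."""
    return size_bytes * (1.0 - reduction)


def pixels_per_degree(width_px, field_deg):
    """Horizontal pixels per degree of field; field_deg is a hypothetical input."""
    return width_px / field_deg


if __name__ == "__main__":
    raw = uncompressed_bytes(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
    print(round(megapixels(WIDTH_PX, HEIGHT_PX), 1), "megapixels")
    print(raw, "bytes of raw pixel data under the 24-bit assumption")
    print(round(reduced_bytes(raw, JPEG_REDUCTION)), "bytes after a 90.2% reduction")
```

The pixel count of 3072 x 2048 is 6,291,456, which rounds to the 6.3 megapixels stated. The byte figures hold only under the stated colour depth assumption.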

The standard assumption of retinal distance used in AMD grading is in itself a major source of error relevant to the grid positioning in the programs. When the COR values in the above results were recalculated using 1850µm as the disc diameter, they were 100µm and 78µm for Programs 1 and 2, respectively. Further caution regarding the interpretation of the COR values is necessary because their calculation is parametric and was here applied to non-Gaussian data. Unfortunately, no non-parametric equivalent exists, and transformation of the variables rendered the COR values clinically meaningless. The International Classification system quotes diameters of hard drusen as <125µm and soft intermediate drusen as >63µm and <125µm (Bird et al. 1995); the recalculated repeatability values therefore still fall within the same size range of drusen type.
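The dependence of a retinal distance on the assumed disc diameter can be illustrated by simple proportional scaling, sketched below in Python. This is not the recalculation performed in this evaluation. The function name is hypothetical, and the value of 1500µm for the standard disc diameter is an assumption that is not stated in this section.

```python
STANDARD_DISC_DIAMETER_UM = 1500.0   # assumption: conventional grading value
REVISED_DISC_DIAMETER_UM = 1850.0    # alternative disc diameter quoted in the text


def rescale_distance(distance_um, assumed_dd_um, revised_dd_um):
    """Rescale a retinal distance when the assumed disc diameter changes.

    A distance measured as a fraction of the disc diameter scales in
    direct proportion to the disc diameter adopted.
    """
    if assumed_dd_um <= 0 or revised_dd_um <= 0:
        raise ValueError("disc diameters must be positive")
    return distance_um * revised_dd_um / assumed_dd_um


if __name__ == "__main__":
    # Approximate mapping accuracies quoted in the text for Programs 1 and 2.
    for label, value_um in (("Program 1", 80.0), ("Program 2", 65.0)):
        rescaled = rescale_distance(
            value_um, STANDARD_DISC_DIAMETER_UM, REVISED_DISC_DIAMETER_UM
        )
        print(label, round(rescaled, 1), "µm")
```

Because the inputs are the rounded values quoted in the text and the scaling is purely proportional, the output is close to, but not identical with, the recalculated values of 100µm and 78µm reported above.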

The MP1 microperimeter allows for superimposition of the retinal differential light thresholds onto the fundus image. This is achieved by a calculation of the true size of the retinal image, involving the camera and ocular factors. Based on the Gullstrand schematic eye, a camera factor of 0.438 is used. The ocular magnification is calculated according to the ametropia of the eye. This is measured by correcting for any size difference in the diameter of the superior temporal vein as it leaves the optic disc. It is assumed that this vessel diameter is constant for all eyes. This technique of superimposition appears to rely on a greater number of assumptions than the methods described in this evaluation.

The worst agreement between graders for features of AMD related to the location of pigmentary and neovascular changes and to the percentage area covered by the drusen. The greatest error in grading thus involved spatial aspects rather than the identification of the features. This highlights the importance of the initial accurate positioning of the grading grid. Spatial inter-observer error could also be reduced by agreement, prior to grading, on precise counting or measuring of drusen at the subfield margins. Less variability surrounded the assignation of images to stage of severity of AMD, since this relied primarily on feature identification. Comparison of two staging systems determined from the same images showed close agreement of severity of disease. Both staging systems distinguished five stages of AMD (Tables 1-2 and 1-3); the main differences between the systems involved the observation of pigmentary changes and drusen together. The CARMS system (Seddon et al. 2006) classified the presence of pigmentary changes and drusen at an earlier stage, whereas this combination was considered to carry higher risk based on incidence data from the Rotterdam Study (van Leeuwen et al. 2003b). Furthermore, the CARMS system included approximate counts of drusen number, whereas the system of van Leeuwen et al. (2003b) did not. Parallel findings were reported in a previous study, where little difference was observed between grading of images using the systems proposed by the Rotterdam Study and AREDS (Tikellis et al. 2006). Although definitions of features differ between staging systems, this evidence suggests that the determination of functional progression of disease may yield similar results between systems.

Program 1: Perimetric Fundus Map was written to relate visual function to structural changes at the macula (see Chapter 8). The clinical application may be to aid the detection and monitoring of central retinal diseases, where visual field defects can be easily mapped to fundus photographs or other images such as infra-red SLO images. Retinal photography and perimetry are routinely carried out in optometric practice; the program could therefore be a useful tool integrated into everyday patient care.

Program 2: Fundus Grading Grid represents a rapid and convenient method of grading AMD for research purposes, which avoids the traditional method of plastic overlays. These programs provide useful tools for the analysis of digital fundus images.

In document Visual field and structural alterations in age-related macular degeneration (Page 155-163)