Modelling the effects of inter-observer variation on colour rendition


MJ Murdoch PhD and MD Fairchild PhD

Munsell Color Science Laboratory, Rochester Institute of Technology, Rochester, NY, USA

Received 7 July 2017; Revised 14 October 2017; Accepted 25 October 2017

The colour rendition characteristics of light sources are quantified with measures based on CIE standard observers, which are reasonable representations of population averages. However, even among people with normal colour vision, the natural range of variation in colour sensitivity means any individual may see something different from the standard observer. Modelling results quantify the effects of these inter-observer differences on colour rendition measures defined by IES TM-30-15. In general, inter-observer differences tend to be smaller for light sources with high colour fidelity values, and they are affected by spectral characteristics of different lighting technologies. The magnitude of variation in colour rendition measures, up to 5–10 units in the IES TM-30-15 (Rf, Rg) measures, is compared with other sources of variability and ambiguity.

1. Introduction

Colour rendition is one of the most important characteristics of a light source, while at the same time being one of the most difficult to describe simply. Throughout the history of modern electric lighting, researchers and practitioners have sought to provide objective and relevant descriptors of colour performance. In recent years, an ongoing conversation on colour fidelity, naturalness, gamut area and preference, along with the evolution of colour difference computations, has played out. So far, this conversation has left out another maturing topic in colour science: the role of inter-observer variability. This paper outlines how the human population distributions that affect colour sensitivity translate to differences in colour rendition, and puts them in context with other sources of variation.

2. Method

2.1. Colour rendition of light sources

The spectral characteristics of an illuminant have a direct and readily apparent effect on the reflected spectral power distribution of objects being illuminated, and as such on the colour stimuli reaching the human eye. As lighting technologies have developed, and as our understanding of human perception has improved, many different measures and indices have been used to describe colour rendition. A good review of progress since 1965 is provided by Houser et al.,1 in which they compare measures that use different classes of colour rendition: fidelity, or the accuracy with which a light source renders object colours as compared to a familiar reference; preference, or the pleasantness or flattery with which a light source renders objects; and discrimination, or the ability of a light source to increase perceived differences between object colours.

The legacy industry standard for colour rendition, which addresses colour fidelity, is the Commission Internationale de l’Eclairage (CIE) General Colour Rendering Index Ra, which describes the average computed accuracy in colour reproduction over a small set of standard colours, relative to a reference illuminant of the same correlated colour temperature (CCT).2 CIE Ra was developed in part to quantify visual differences people noticed in the colour rendition of objects and human skin under gas discharge and fluorescent lighting compared to incandescent sources or natural daylight.3 Rules of thumb and prescriptive norms have followed; for example, ENERGY STAR® requires a CIE Ra of 80 or higher.4 Lighting manufacturers have responded by producing products carefully designed to meet the different levels of CIE Ra values that the market demands.

Address for correspondence: Michael J Murdoch, Munsell Color Science Laboratory, Rochester Institute of Technology, 54 Lomb Memorial Drive, Rochester, NY 14623, USA. E-mail: [email protected]

However, because CIE Ra addresses only colour fidelity, and it does so with a small number of test colours using outdated colorimetry, it is being eclipsed by the Illuminating Engineering Society of North America’s (IESNA) standard TM-30-15 (TM-30, herein),5 presently under consideration internationally by the CIE. As described by David et al.,6 TM-30 uses a set of 99 colour evaluation samples (CES) chosen for spectral and colour space uniformity and computes colour differences for these colours between test and reference sources in CAM02-UCS.7 The results provided by TM-30 are two summary scores, a colour fidelity index Rf and a colour gamut index Rg, along with additional graphics and values to clarify hue-binned colour distortions. The need for a two-measure system follows an interesting line of research into discrimination, naturalness, and preference including the work of Rea and Freyssinier,8,9 Smet et al.10 and Houser et al.1 Importantly, TM-30 makes excellent use of plots, especially colour vector graphics, which grew out of the work of van der Burgt et al.,11,12 to further explain colour rendition differences that the two average indices may obscure. The IESNA provides an Excel spreadsheet that performs the computations described in the TM-30 document. The application of TM-30 to a test light source includes these steps:

1) Determine the CCT of the test source using a standard 2-degree observer.

2) Define a reference illuminant of the same CCT that is Planckian if below 4500 K, using the CIE daylight model if above 5500 K, and a linear mix in between.

3) Compute the coordinates of the 99 spectral colour evaluation samples (CES) in CAM02-UCS using a standard 10-degree observer for both test and reference light sources.

4) Compute colour differences between test and reference as a Euclidean distance in CAM02-UCS, and report a fidelity metric Rf based on the mean colour difference.

5) Compute a polygon in the opponent-colour plane of CAM02-UCS based on 16 binned hues of the 99 CES for both test and reference sources, and report a gamut index Rg based on the ratio between the areas of the test and reference polygons.

6) Provide bar charts of hue-binned fidelity and relative saturation along with a hue-circle distortion colour vector graphic.

It is important to note that neither CIE Ra nor IESNA TM-30 explicitly provides a quality or preference metric. Many of the references mentioned do discuss preference and, in general, the pleasing enhancement of colour saturation in certain parts of the hue circle, depending on context.10,13–15 Recently, Royer et al.16,17 have shown that the relative saturation increase of red hues, as quantified by the TM-30 colour saturation score for hue bin 16, Rcs,h16, is a good predictor for perceived saturation and preference.
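The arithmetic behind Steps 2, 4 and 5 of the TM-30 procedure can be sketched in a few lines of Python. This is an illustrative sketch only, not the IES spreadsheet or the authors' MATLAB code: the function names are hypothetical, the CAM02-UCS coordinates are assumed to be computed elsewhere, and the scaling factor 7.54 and the logarithmic soft clamp are assumed from the TM-30-15 document rather than taken from this paper.

```python
import math

def daylight_weight(cct):
    """Step 2: fraction of the CIE daylight model in the reference illuminant.

    0 below 4500 K (pure Planckian), 1 above 5500 K, linear in between.
    """
    return min(1.0, max(0.0, (cct - 4500.0) / 1000.0))

def fidelity_index(delta_e):
    """Step 4: Rf from the per-sample CAM02-UCS colour differences."""
    mean_de = sum(delta_e) / len(delta_e)
    raw = 100.0 - 7.54 * mean_de  # scaling factor assumed from TM-30-15
    # Soft clamp (assumed form) keeps the score positive for large differences.
    return 10.0 * math.log(math.exp(raw / 10.0) + 1.0)

def polygon_area(points):
    """Shoelace area of a closed polygon given as a list of (a', b') vertices."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0

def gamut_index(test_bins, ref_bins):
    """Step 5: Rg as 100 times the ratio of test to reference polygon areas."""
    return 100.0 * polygon_area(test_bins) / polygon_area(ref_bins)
```

With 16 hue-bin vertices per polygon, a test polygon larger than the reference polygon gives Rg above 100, which corresponds to an average increase in saturation.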

However, the present study makes use of the TM-30 measures explicitly and thus does not comment on preference. To quantify the effects of normal variation in visual sensitivity in the population, we modelled realistic individual observer dependencies and how they cascade through computations to colour rendition measures. This paper presents a complete story based on preliminary findings published previously.18

2.2. Inter-observer variability

Colorimetry has functioned successfully for nearly a century using sets of colour matching functions designed to represent the visual sensitivity of an average human observer. As with any average, the mean sensitivities do not necessarily represent any individual observer. Within any given population of people with normal colour vision (i.e. excluding colour vision deficiencies), there is a natural range in spectral sensitivities caused by variations in cone spectral absorptivity, ocular media density, and other anatomical and physiological parameters. Some of these variations are age dependent, others are genetic, and some have other causes such as diet or environment.

It is not practical to measure many individuals’ visual sensitivities, but using known, measured distributions of the variables that affect sensitivity, population distributions can be estimated. CIE TC 1-36 standardised a representative set of physiological observers in 2006 (CIE06) from which cone responsivities can be computed for observers of various ages (from 20 to 80 years) and for matching stimuli of various field sizes (from 1° to 10°).19 However, each of these computed CIE06 function sets represents an average observer for that particular age and field size. They do not capture the added variability amongst all of the observers for the specified age and field size combination. Sarkar et al.20 provide both an excellent review of the development of the CIE06 observers and an analysis of the effect of physiological perturbations to the average CIE06 observers on colour reproduction. They showed that observer differences caused both differences in computed colorimetric values and changes in direction of error for pairs of stimuli that were metameric to the average observer.

Recently, Asano et al.21 created an individual colorimetric observer model that takes into account the natural distributions of anatomical and physiological features such as physical densities of ocular components and wavelength and density shifts of cone photopigments based on age, size of visual field and random population variability. Using Monte Carlo simulations, their model is able to create populations of individual sets of cones sensitive to long-, medium- and short-wavelength (LMS) light with realistic physiological distributions for observers of specified age and field size. These simulated populations of observers have been verified using population statistics and on an individual basis by predicting observed experimental colour matches. A simulated set of 1000 individual observers’ 10-degree LMS sensitivities is shown as shaded areas in Figure 1(a), along with the mean LMS sensitivities as black lines. Each individual’s sensitivities are a smooth set of curves plotted with transparency so that the shaded regions are denser where more observers happen to overlap.

The relative sensitivities of human LMS cone cells are often transformed to XYZ-like colour matching functions (which will simply be referred to as CMFs in this paper) that can then be used to integrate a stimulus spectral power distribution to colorimetric values. There is not a standard conversion matrix between LMS and CMF curves, so in this work a best-fit 3 × 3 matrix is computed between the mean of a population of LMS curves and the CIE 10-degree standard observer CMFs. The population CMFs corresponding to the LMS curves in Figure 1(a) are shown as shaded in Figure 1(b) with the 10-degree standard observer overlaid as black lines. The specific matrix conversion from mean LMS to mean XYZ sensitivities follows

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}_{10}
\cong
\begin{bmatrix}
0.4503 & 0.2631 & 0.0461 \\
0.1617 & 0.0731 & 0.0013 \\
0.0032 & 0.0048 & 0.2305
\end{bmatrix}
\begin{bmatrix} l \\ m \\ s \end{bmatrix}_{\mathrm{US},10}
\]
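A best-fit 3 × 3 matrix of this kind can be obtained by linear least squares. The sketch below assumes ordinary least squares over the sampled wavelengths, which the paper does not state explicitly; the function names and input layout are hypothetical.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate; raises if singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def fit_lms_to_xyz(lms, xyz):
    """Least-squares 3x3 matrix M such that xyz is approximately M times lms.

    lms, xyz: three lists each (one per channel), sampled on the same
    wavelength grid. Solves the normal equations M = (XYZ LMS^T)(LMS LMS^T)^-1.
    """
    n = len(lms[0])
    gram = [[sum(lms[i][k] * lms[j][k] for k in range(n)) for j in range(3)]
            for i in range(3)]
    cross = [[sum(xyz[i][k] * lms[j][k] for k in range(n)) for j in range(3)]
             for i in range(3)]
    gram_inv = invert_3x3(gram)
    return [[sum(cross[i][k] * gram_inv[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

In the setting described above, `lms` would hold the population-mean LMS curves and `xyz` the CIE 10-degree standard observer CMFs; the fitted matrix is then applied unchanged to every individual's LMS curves.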

The CMFs in Figure 1(b) behave, on average, like the CIE 10-degree observer, but individual curves include variation typical of the US population. Each individual’s set of three curves can be used in place of the standard CMFs for any colorimetric computation. This affects everything from XYZ to CIELAB values. In this work, two simulated populations of observers, composed of realistically distributed individual colour matching functions, are used to analyse how individual differences affect colour rendition measures with selected light sources.

2.3. Modelling methodology

Colour rendition computations essentially mimic a visual comparison between a set of coloured objects illuminated by a test light source and the same objects illuminated by a reference illuminant. One could physically set up this comparison and have a look, but usually this is done computationally using standard colorimetric observers (e.g. the CIE 1964 10-degree observer, in the case of TM-30), giving numbers that describe the colour rendition that the standard observer would see. The situation modelled herein is slightly different: the same comparison between test and reference is set up (implying that the choice of reference illuminant, based on CCT, was made using the standard colorimetric observer, which is the CIE 1931 2-degree observer in the case of TM-30). Then, a simulated but realistic population of observers with normal colour vision queues up to make the visual comparison, and each reports a description of colour rendition based on what they see. Because of inter-observer variation, each has some probability of seeing something different from the standard observer, while on average the population behaves similarly to the standard observer.

Figure 1 Population sensitivities. Plot (a) shows LMS long- (in red), medium- (green) and short-wavelength (blue) cone sensitivities, and plot (b) shows CMF x (red), y (green) and z (blue), of 1000 simulated observers for 10-degree field of view. Solid black lines show average LMS sensitivity in plot (a) and CIE 1964 standard observer in plot (b). Axes: relative sensitivity versus wavelength (nm), 400–700 nm. (Available in colour in online version).

MATLAB implementations of the colour rendition measures discussed above were written based on the IES TM-30 document5 and verified against its Excel spreadsheet (IES TM-30-15 Basic Calculation Tool v1.01.xlsm). The MATLAB re-implementation was necessary both for rapid scriptability and to allow the flexibility to vary the colour matching functions (and thus diverge from the TM-30 recommendation). In Step 3 of the TM-30 computation, the CMFs used in the computation of colorimetric values for both test and reference sources were made variable so that the individual colorimetric observers could be used. The CIE 1931 2-degree standard colorimetric observer was used to calculate CCT and select the reference illuminant for all of the present modelling (Steps 1 and 2). Thus, this is a simulation of realistic populations of observers assessing the comparison of test and reference illuminants and potentially seeing colour rendition characteristics that differ from standard measures thanks to their individual sensitivities.
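The substitution of individual CMFs in Step 3 amounts to repeating the standard tristimulus integration once per observer. The following Python sketch illustrates the idea under simplifying assumptions (uniform wavelength sampling, all spectra on the same grid); it is not the authors' MATLAB code, and the function names are hypothetical.

```python
def tristimulus(spd, reflectance, cmfs):
    """XYZ of one sample under one light source for one observer.

    spd, reflectance: lists sampled at equal wavelength steps.
    cmfs: (xbar, ybar, zbar) lists on the same grid; any individual
    observer's CMFs can be passed in place of the standard ones.
    """
    xbar, ybar, zbar = cmfs
    # Normalise so that a perfect white reflector has Y = 100 for this observer.
    k = 100.0 / sum(s * y for s, y in zip(spd, ybar))
    stimulus = [s * r for s, r in zip(spd, reflectance)]
    return tuple(k * sum(p * c for p, c in zip(stimulus, bar))
                 for bar in (xbar, ybar, zbar))

def per_observer_xyz(spd, reflectance, population):
    """Repeat the same integration for every observer in a population."""
    return [tristimulus(spd, reflectance, cmfs) for cmfs in population]
```

Running this for both test and reference sources, and feeding each observer's XYZ values into the remaining TM-30 steps, gives one (Rf, Rg) pair per simulated observer.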

2.4. Observer populations

In the present work, we simulated two different populations at two different visual field sizes. Because age has a strong effect on the physiological factors that affect visual sensitivity, the populations differ in age distribution. The first population, referred to as the US Age population, comprised 1000 observers sampled from an age distribution taken from the most recent (2010) U.S. census. The age distribution is fairly uniform up to about age 60, then it tails off steeply. A subset of this distribution was made, limited to ages between 10 and 70, inclusive, the same as that described by Asano et al.21 Their paper gives further details on the distributions of physiological characteristics used in the simulation. Our supplementary data file USAgePopulation.xlsx includes all 1000 observers’ physiological parameters and LMS sensitivities; these are the sensitivities shown in Figure 1.

The US Age sensitivities are repeated in Figure 2, with the sensitivities of two example observers, arbitrarily named Pat and Sam, highlighted. Much of the variation visible can be interpreted as a wavelength shift in LMS: the peak sensitivity wavelength shifts for each cone type are explicit parameters in Asano’s model, and one effect of increased macular density is similar to a shift of the S curve to longer wavelengths. The highlighted observers were selected from the 1000 because they are both 38 years old, the mean and median age of the population, and because they are approximately equidistant, on average, in wavelength shift, from the average observer. Their LMS sensitivities are shifted about 3 nm batho (towards longer wavelengths; Pat) and hypso (towards shorter wavelengths; Sam) from the average LMS curves, which puts them at the 10th and 88th percentiles of the population. Their names are intentionally gender-neutral to remind the reader that sex is not a physiological characteristic included in the model. Their results are included in subsequent figures to help show individual and population differences.

Figure 2 US Age population sensitivities. Plot (a) shows LMS long- (in red), medium- (green) and short-wavelength (blue) cone sensitivities, and plot (b) shows CMF x (red), y (green) and z (blue), of 1000 simulated observers drawn from an age distribution matching the US population, for 10-degree field of view. Solid black lines show average LMS sensitivity in plot (a) and CIE 1964 standard observer in plot (b). Dashed lines show example observers ‘Pat’ and ‘Sam,’ explained in the text. Axes: relative sensitivity versus wavelength (nm), 400–700 nm. (Available in colour in online version).

The second population, the Age Group population, includes a bimodal distribution, chosen both because age is one of the few physiological factors that can be practically targeted in the market, and to emphasise that age is not the only factor in inter-observer variation. The Age Group population includes 500 observers of age 25 years and 500 observers of age 65 years, and their LMS and CMF sensitivities can be seen in Figure 3. With the younger observers plotted in transparent red and the older in transparent blue, biases are clearest at short wavelengths, while in regions where they are similar they blend together as purple. Note that while age is a strong contributor to visual differences, it is not the only one: there are differences in sensitivity visible both within and between the two ages present in the Age Group population. All 1000 of the Age Group observers’ physiological parameters and LMS sensitivities are included in the supplementary data file AgeGroupPopulation.xlsx.

As was done with the US Age population, the simulated Age Group LMS sensitivities were transformed to CMFs using a best-fit matrix. However, separate matrices were fit for the average of the younger and the average of the older groups of the population. Our intent behind this choice was to be conservative and not exaggerate the effect of age differences beyond their inherent CMF shape differences. The matrix equations are shown below, respectively

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}_{10}
\cong
\begin{bmatrix}
0.4641 & 0.2784 & 0.0470 \\
0.1746 & 0.0616 & 0.0023 \\
0.0086 & 0.0119 & 0.2300
\end{bmatrix}
\begin{bmatrix} l \\ m \\ s \end{bmatrix}_{25\,\mathrm{yr},10}
\]

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}_{10}
\cong
\begin{bmatrix}
0.4165 & 0.2263 & 0.0445 \\
0.1306 & 0.1000 & 0.0015 \\
0.0112 & 0.0134 & 0.2321
\end{bmatrix}
\begin{bmatrix} l \\ m \\ s \end{bmatrix}_{65\,\mathrm{yr},10}
\]

2.5. Light sources

Nine contemporary light sources with differing spectral characteristics and CCTs were selected, including tungsten, RGB and RGBA mixed-narrowband LEDs, and blue-pumped phosphor-converted white LEDs. The nine sources, with code numbers corresponding to the light source numbers in the TM-30 spreadsheet,5 are listed in Table 1 and their relative spectral power distributions are shown in Figure 4. They were selected both because they include a variety of relevant LED light sources and because they cover a wide range of gamut versus fidelity values. These nine are not meant to be a representative sampling of all light sources, but rather illustrative examples; later in the paper a larger set of light sources is considered.

Figure 3 Age group sensitivities: plot (a) shows LMS cone sensitivities, and plot (b) shows CMF xyz of 1000 simulated observers, 500 of whom are 25 years old (plotted in red) and 500 of whom are 65 (blue). Black lines in (a) indicate the mean LMS of each group of 500 (solid for age 25, dashed for age 65), while solid black lines in (b) show the CIE 1964 10-degree x-bar, y-bar and z-bar standard CMFs. Axes: relative sensitivity versus wavelength (nm), 400–700 nm. (Available in colour in online version).

3. Results and discussion

3.1. Variation in colour rendition measures

The modelling results show how the natural variations in an observer population manifest themselves in terms of colour rendition measures for the selected light sources. Looking first at the variation present in the US Age population, we can see what degree of variation can be expected among observers in the US. Figure 5 shows TM-30 gamut versus fidelity plots, with clouds of dots indicating the computed measures for each individual, and the labelled open circles showing the standard observer’s measures. The trend with these nine light sources is for smaller clouds near the maximum-fidelity cusp: at high levels of fidelity, a light source is spectrally so similar to its reference that little inter-observer variation is possible. In other words, even if different observers see a light source’s colour rendition differently, the difference between its colour rendition and that of the reference is consistently small across observers. Results for the highest-fidelity light source, tungsten (82), are essentially pinpoint. For other sources throughout the (Rf, Rg) space, the cloud shapes are not consistently patterned. A few clouds are essentially elliptical with various orientations, while others appear to have an underlying ‘curl’ to their shapes.

Table 1 Selected light sources: code numbers and descriptions from the TM-30 spreadsheet, with relevant measures. CCT (in kelvin), Duv and Ra are computed as standardised by CIE; Rf, Rg and Rcs,h16 according to TM-30-15

Code  Source type                      CCT (K)  Duv     Ra   Rf   Rg   Rcs,h16
82    Incandescent (60WA19)            2812     0.0001  100  100  100  0%
114   RGB (455/547/623)                3300     0.0001  73   73   113  17%
123   RGB (465/535/590)                5023     0.0004  82   77   90   13%
146   RGBA (455/530/590/635) 3K-A      2943     0.0014  74   83   112  13%
162   RGBA (455/530/590/635) 4K-G      3959     0.0011  92   88   104  4%
169   LED Phosphor Blue Pump (01)      2880     0.0082  92   87   91   4%

Figure 4 Selected spectral power distributions: the eight LED light sources are shown. Tungsten (82) is not shown. Panels (relative power versus wavelength, 400–700 nm): 114 (3300 K), 123 (5023 K), 146 (2943 K), 162 (3959 K), 169 (2880 K), 255 (2623 K), 276 (3081 K), 279 (6811 K).

Further information about the differences in colour rendition seen by the population of observers is shown in the colour vector graphics in Figure 6. These plots show relative distortions of a hue circle on normalised CAM02-UCS axes approximating bluish-yellowish versus greenish-reddish. The gray polygon represents the reference hue circle, made up of the 16 hue bins of TM-30, and the black solid ring shows the location of the hue bins under the test light source computed using the CIE 1964 10-degree standard observer. The short line segments connecting the vertices help indicate the direction of both saturation (radial) and hue (circumferential) colour distortions. Also shown on these plots are the rings for each of the 1000 observers (red) and example observers Pat (dash) and Sam (dot-dash). A few values are presented in the centre of each graphic: the Rcs,h16 overall mean and standard deviation, as well as the values for Pat and Sam.

Looking over the colour vector graphics, in some cases the inter-observer variation is greater in regions where the distortion is greater: for example, light sources 82 and 255 show very little distortion and almost no variation, while light sources 114 and 279 show lots of both. However, light source 123 shows lots of variation, even in regions with low distortion, while light sources 169 and 276 show very low variation even in regions of high distortion. The lack of pattern in these variations seems to reinforce the need for a multi-measure, graphical approach to communicating colour rendition and variability.

3.2. Age-related variation

Turning to the Age Group distribution, results in Rg versus Rf are shown in Figure 7, with the 25-year-old observers’ clouds plotted in red and the 65-year-old observers’ clouds plotted in blue. It is apparent that not only are there large differences between observers within these subgroups, there are even larger differences between them. For most plotted light sources, there is a clear difference between the age-based clouds. In extreme examples such as 146 and 169, there is little to no overlap between the subgroups. This means that observers of different ages are unlikely to perceive the colour rendition of these light sources in the same way.

Colour vector graphics for the Age Group population are shown in Figure 8, in which the red polygons indicate the younger group (age 25) and the blue indicate the older group (age 65). These vector graphics can be compared to those in Figure 6, which highlights the example observers Pat and Sam, both aged 38. While not totally consistent, it may be observed that in many hue regions, for most light sources shown, Pat’s results look like those typical of the younger age group, and Sam’s look like those of the older age group. This is because the LMS sensitivities of Pat and Sam are shifted batho and hypso, respectively, with respect to the standard observer, which is the same trend seen on average in younger and older observers, respectively. Figure 5 may be compared with Figure 7 as well, and a similar relationship can be observed in TM-30 (Rf, Rg). It should be emphasised, however, that averages do not tell the whole story, because while age is a strong factor, the distribution of wavelength shifts and other characteristics reaches nearly the same range for individuals of any given age, as do the distributions seen in the resulting colour rendition measures.

Figure 5 Rg vs. Rf clouds: red clouds of points for each of the selected light sources indicate the modelled variability in observer colour sensitivity. The labelled open circles indicate the (Rf, Rg) values computed using the CIE 1964 Standard Observer, according to TM-30; and symbols + and × show (Rf, Rg) values for example observers Pat and Sam. Axes: Rg “gamut” (80–120) versus Rf “fidelity” (70–100).

Figure 6 US Age colour vector graphics: each of the nine subplots indicates the relative distortion of hues around a hue circle, plotted on normalised axes approximating bluish-yellowish vs. greenish-reddish. On each plot, the gray polygon is an undistorted circle, the red polygons show the distortions per hue for observers in the simulated population, and the solid black polygon indicates the distortion computed with the standard observer. Short vectors connecting the gray and black polygons show the direction of distortion, and dashed and dot-dashed polygons represent the relative distortion seen by observers Pat and Sam, respectively. Text within each graphic lists the relative colour saturation for hue bin 16 (Rcs,h16): overall mean and standard deviation, and values for Pat and Sam. (Available in colour in online version).

3.3. Differences between light sources

Returning to the US Age population and looking closely at the results for Pat and Sam in Figure 5, it can be observed that they do not consistently appear on the same ‘sides’ of the clouds. Specifically comparing light sources 162 and 276 near the middle of the diagram, Pat (+) sees a much smaller difference in Rf, on average, than does Sam (×), because they appear to diverge in different directions from the standard observer. Differences in the Rf and Rg metrics for all observers including Pat and Sam are shown in Figure 9(a). It is worth pointing out the very different spectral characteristics of these two light sources (shown in Figure 4), which is likely the cause of the divergence between Pat and Sam.

Is it significant if one observer sees an Rf difference of 6 and another sees an Rf difference of 10? An easier question to answer is, what if two observers disagree on the direction of a difference (meaning, one thinks A > B while the other thinks B > A)? We found several pairs of light sources in the TM-30 spreadsheet that show this situation, with similar (Rf, Rg) values and reasonably high Rf. To illustrate this, the light sources 141 (RGBA LED of 2994 K, Duv 0.0041, Rf 86.9, Rg 99.5) and 192 (Phosphor-converted LED of 3551 K, Duv 0.0024, Rf 87.2, Rg 99.1) are shown in Figure 9(b). In this figure, it appears that about half the population will find the first source higher in Rf and Rg and the second source lower, and the other half of the population will see the opposite. Figure 10 clarifies the positions of these four illuminants in TM-30 (Rf, Rg) space and shows how Pat and Sam relate to the standard observer. We do not mean to exaggerate the magnitude of these differences (Pat and Sam are about four units of Rf and 2.5 units of Rg apart), but our point is that it is quite possible for observers with normal colour vision to disagree on the direction of a difference in colour fidelity and/or colour saturation between two light sources.
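The question of how many observers disagree on the direction of a difference reduces to a simple count over the per-observer results. The following Python sketch is illustrative only; the function name and the input lists are hypothetical and do not reproduce the paper's data.

```python
def direction_split(values_a, values_b):
    """Fractions of observers for whom source A scores higher than,
    lower than, or equal to source B on a given measure (e.g. Rf).

    values_a, values_b: per-observer values, in the same observer order.
    """
    n = len(values_a)
    higher = sum(1 for a, b in zip(values_a, values_b) if a > b)
    lower = sum(1 for a, b in zip(values_a, values_b) if a < b)
    return higher / n, lower / n, (n - higher - lower) / n
```

A split close to (0.5, 0.5, 0.0) would correspond to the situation described above, in which about half of the population ranks the two sources in the opposite order to the other half.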

3.4. Susceptibility of light sources to inter-observer variation

One goal based on this work might be to design or specify light sources that are less susceptible to inter-observer variation in their colour rendition. It is interesting to observe that some light sources are indeed more susceptible than others, meaning the clouds of observer results for the US Age population are larger. On the TM-30 Rg versus Rf graphic, even though the inter-observer clouds are not strictly elliptical in shape, a first-order assessment of their two-dimensional variance and orientation may be made with fitted ellipses.

Figure 7 Rg vs. Rf clouds by age: distributions of colour rendition measures for 25-year-old observers (red) and 65-year-old observers (blue). The labelled open circles indicate the (Rf, Rg) values computed using the CIE 1964 Standard Observer, according to TM-30. Axes: Rg “gamut” (80–120) versus Rf “fidelity” (70–100). (Available in colour in online version).

Ellipses were fitted to the US Age

distributions’ clouds of Rg versus Rf values for all light sources in the TM-30 data set and are shown in Figures 11 and 12. Both plots show the general trend of smaller ellipses closer to the maximum Rf point, but also clear differences in the characteristics of dif-ferent light source technologies, at least in this set of light source spectra: in Figure 11, the

Code: 82 –0% –0% 0.0% 0.0% Rcs,h16: Mean 25: Std 25: Mean 65: Std 65: Mean 65: Std 65: Mean 65: Std 65: Mean 65: Std 65: Code: 146 14% 11% 0.6% 1.4% Rcs,h16: Mean 25: Std 25: Mean 65: Std 65: Mean 25: Std 25: Mean 65: Std 65: Mean 25: Std 25: Mean 65: Std 65: Mean 25: Std 25: Mean 65: Std 65: Mean 25: Std 25: Code: Greenish - reddish bluish - yellowish bluish - yellowish bluish - yellowish

Greenish - reddish Greenish - reddish 255 –0% –1% 0.5% 0.6% Rcs,h16: Code: 276 –1% –4% 0.6% 1.0% Rcs,h16: Mean 65: Std 65: Mean 25: Std 25: Rcs,h16: Code: 279 –13% –12% 0.6% 0.6% Code: 162 4% 3% 0.8% 1.2% Rcs,h16: Code: 169 –4% –4% 0.5% 0.8% Rcs,h16: Code: 114 19% 15% 1.2% 2.2% Rcs,h16: Mean 25: Std 25: Code: 123 –13% –14% 3.4% 3.0% Rcs,h16: Mean 25: Std 25:

Figure 8 Age Group colour vector graphics: each of the nine subplots indicates the relative distortion of hues around a hue circle, plotted on normalised axes approximating bluish-yellowish vs. greenish-reddish. On each plot, the gray polygon is an undistorted circle and the solid black polygon indicates the distortion computed with the standard observer. Short vectors connecting the gray and black polygons show the direction of distortion. Red and blue polygons show the distortions per hue for observers in the simulated age groups of 25 years and 65 years, respectively. Text within each graphic lists the mean and standard deviation of relative colour saturation for hue bin 16 (Rcs,h16) for


[Figure 9 graphic: two scatter plots, (a) and (b), each showing Difference in Rg vs. Difference in Rf.]

Figure 9 Differences in Rf and Rg: two pairs of light sources are compared. Plot (a) shows the difference in Rf and Rg for light sources 162 and 276 for all 1000 observers (red dots), the standard observer (o), Pat (+), and Sam (). Plot (b) shows the same for light sources 141 and 192

[Figure 11 graphic: Rg “gamut” vs. Rf “fidelity”; legend: Blue pump, RGB/RGBA.]

Figure 11 Ellipse fits part I: ellipses fitted to the US Age distributions in Rg vs. Rf for two categories of LED light sources taken from the TM-30 data set. Ellipses are shown for blue-pump (dotted) and for RGB/RGBA (solid) LEDs. Higher eccentricity and more vertical orientation are characteristic of the RGB/RGBA LEDs

[Figure 10 graphic: Rg “gamut” vs. Rf “fidelity”; legend: CIE 1964, Pat, Sam.]

Figure 10 Rg vs. Rf close up: the light source pairs 162 and 276 along with 141 and 192, shown for all 1000 observers (red dots), the standard observer (o), Pat (+), and Sam (). Black connecting lines are included to show which Pat and Sam points belong with which light sources


ellipses of the blue pump LEDs (blue) appear more circular and more horizontally-oriented, and those of the RGB/RGBA LEDs (red) appear more eccentric and more vertically-oriented, meaning the average gamut, or relative colour saturation, of RGB/RGBA LEDs relative to a reference illuminant can be quite different for different observers. Figure 12 shows three additional light source categories: fluorescent (green solid), whose ellipses are generally moderately eccentric and horizontally oriented; LED hybrid (magenta dot), whose ellipses do not show a clear pattern; and incandescent (black dash), whose ellipses are mostly too small to see at very high Rf values, with the exception of the neodymium-filtered tungsten sources visible at about (86, 109).

The observed ellipse characteristics may be related directly to the Rf fidelity metric, as shown in Figure 13. Figure 13(a) shows clear correlations between Rf and ellipse area, and a distinct difference between the blue pump LED (+), RGB/RGBA LED () and incandescent (o) types. All trend to zero as Rf nears its maximum value of 100, and with few exceptions the RGB/RGBA LEDs () result in the largest ellipses. Incandescent sources cluster near the bottom-right of the plot, with very high fidelity and very small inter-observer variation (again, the exceptions at Rf 86 are neodymium-filtered tungsten). Figure 13(b) shows no clear relationship between ellipse eccentricity and Rf, but indicates that RGB/RGBA ellipses () are overall more eccentric than those for blue pump LEDs (+) and fluorescent light sources (). Note that eccentricity is not meaningful

[Figure 13 graphic: (a) ellipse area and (b) ellipse eccentricity vs. Rf “fidelity”; legend: Blue pump, RGB/RGBA, Fluoro, Incand.]

Figure 13 Ellipse characteristics: As a function of fidelity Rf, the area (in plot (a)) and eccentricity (in plot (b)) are shown for light sources taken from the TM-30 data set, split into four categories based on technology. Ellipse area is computed in Rf × Rg units, and eccentricity is defined such that 0 corresponds to a perfect circle and 1 to an infinite-aspect ratio ellipse

[Figure 12 graphic: Rg “gamut” vs. Rf “fidelity”; legend: Fluoro, LEDHybrid, Incand.]

Figure 12 Ellipse fits part II: ellipses fitted to the US Age distributions in Rg vs. Rf for three categories of light sources taken from the TM-30 data set. Ellipses are shown for fluorescent (solid), LED hybrid (dotted), and incandescent (dashed) sources


for the incandescent sources with near-zero ellipse size.
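For reference, one conventional pair of definitions that is consistent with the caption of Figure 13 can be written as follows. The symbols a and b (semi-major and semi-minor axes of the fitted ellipse, in index units) are introduced here for illustration; the paper does not state its exact formulas, so this is an assumption.

```latex
A = \pi a b, \qquad e = \sqrt{1 - \frac{b^{2}}{a^{2}}}, \qquad a \ge b > 0, \quad 0 \le e < 1
```

Under this definition e depends only on the ratio b/a, so when both axes approach zero, as for the incandescent sources, small numerical differences dominate the ratio and e carries little information.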

4. Future work

4.1. An inter-observer variation index

This paper is based on Monte Carlo simulations used to uncover how statistical variations in visual sensitivities correspond to variations in colour rendition measures. It would be much more convenient if the variations in measures could be predicted directly from characteristics of a light source spectral power distribution, or at least computed directly, rather than via thousands of simulations. Such a directly computed indicator of variability, an inter-observer variation index, would distill this complexity to a manageable number that could be used in communicating the colour quality of lighting products, and especially in the design of better light sources.
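The structure of such a Monte Carlo computation (sample observers, compute a measure per observer, summarise the spread) can be sketched with a deliberately simplified toy model. Everything specific below is a hypothetical stand-in: the Gaussian sensitivities, the peak wavelengths, the 2 nm shift standard deviation and the ratio used as a "measure" are assumptions for illustration, not the individual observer model or the TM-30 computation used in this paper.

```python
import numpy as np

WL = np.arange(380.0, 781.0, 1.0)  # wavelength grid in nm

def toy_sensitivity(peak_nm, width_nm=40.0):
    """Hypothetical Gaussian sensitivity; NOT a real cone fundamental."""
    return np.exp(-0.5 * ((WL - peak_nm) / width_nm) ** 2)

def toy_measure(spd, l_peak, m_peak):
    """Ratio of two channel responses; a stand-in for a rendition measure."""
    l_resp = np.sum(spd * toy_sensitivity(l_peak))
    m_resp = np.sum(spd * toy_sensitivity(m_peak))
    return l_resp / m_resp

def monte_carlo_spread(spd, n_obs=1000, shift_sd_nm=2.0, seed=0):
    """Standard deviation of the toy measure over simulated observers."""
    rng = np.random.default_rng(seed)
    # Each simulated observer gets independently shifted peak wavelengths.
    l_peaks = 565.0 + rng.normal(0.0, shift_sd_nm, n_obs)
    m_peaks = 540.0 + rng.normal(0.0, shift_sd_nm, n_obs)
    values = np.array([toy_measure(spd, lp, mp)
                       for lp, mp in zip(l_peaks, m_peaks)])
    return values.std(ddof=1)

flat_spd = np.ones_like(WL)                         # spectrally flat source
narrow_spd = toy_sensitivity(600.0, width_nm=10.0)  # single narrow emission band
spread_flat = monte_carlo_spread(flat_spd)
spread_narrow = monte_carlo_spread(narrow_spd)
```

In this toy setting the narrow-band spectrum gives a much larger spread than the flat one, because a small shift in a sensitivity curve changes the response to a narrow band but barely changes the integral over a flat spectrum. A direct index would aim to predict that spread without running the sampling loop.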

As with most aspects of colour rendition, the effect of inter-observer variation varies considerably with the spectral characteristics of light sources. With the goal of relating spectral characteristics of the source directly to a measure of inter-observer variation, we computed various predictors based on light source and observer characteristics. For sources, we considered aspects including spectral slopes, near-zero spectral regions, and peak-to-valley power contrasts; and for sensitivity variation over the population, the standard deviations of both sensitivity and wavelength shift per wavelength. It seems that the amount of overlap between source spectral patterns and observer variation should predict the size of clouds in colour rendition measures. However, we could not find a solution with satisfactory correlation for all types of lighting technologies, which implies that some other underlying aspect of source spectral characteristics is involved. Note that the current study highlights nine light sources for illustration, and includes ellipses for the 318 TM-30 sources, neither of which constitutes a comprehensive or statistically defensible selection. We maintain hope that the relationship between source spectral characteristics and the computed ellipses and non-elliptical cloud shapes can be elucidated in future work.

4.2. Variation in context

With the observer-based distributions of colour rendition measures quantified for some example light sources, the broader question is: how important is inter-observer variation in context with other sources of variation or confusion? One important point to reinforce is that the indices Rf and Rg are averages over many colour evaluation samples, which means that it is quite possible for light sources that plot at the same (Rf, Rg) point to render colours differently,20 for the same observer, not even considering inter-observer variation. This means that there can be no easy rule of thumb about what is a meaningful Rf or Rg index difference, because two sources with no index difference at all can still look different. In our simulation, we have shown that the (Rf, Rg) indices of a single light source may vary by 5–10 units between observers (judging by the size of clouds in Figures 5 and 7). Further, the relative difference between light sources is not simply shifted for different observers, but can be inverted, meaning that some observers see one light source as of higher fidelity than another, and other observers see the opposite, easily reaching differences of five units (Figure 9). The magnitude of these variations is not huge, but we argue that it is large enough not to be easily dismissed.
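The inversion described above can be counted directly once per-observer index values are available. The following is a minimal sketch, assuming two arrays of per-observer Rf values for a pair of sources plus the standard-observer values; the function name and the example numbers are hypothetical and are not results from this study.

```python
import numpy as np

def inversion_fraction(rf_a, rf_b, std_a, std_b):
    """Fraction of observers who rank sources A and B in the opposite
    order to the standard observer.

    rf_a, rf_b : per-observer Rf values for sources A and B
    std_a, std_b : standard-observer Rf values for A and B
    Observers with exactly zero difference are not counted as inverted.
    """
    diff_obs = np.asarray(rf_a, dtype=float) - np.asarray(rf_b, dtype=float)
    diff_std = float(std_a) - float(std_b)
    return float(np.mean(np.sign(diff_obs) == -np.sign(diff_std)))

# Made-up values for four observers: the standard observer rates A above B,
# but two of the four simulated observers rate B above A.
frac = inversion_fraction([80, 82, 79, 85], [81, 80, 80, 83], 81, 80)
```

The same function could be applied to Rg, or to any other per-observer measure for which a disagreement in ranking is of interest.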

Inter-observer variation affects everything about colorimetry, so it is useful to compare the reported colour rendition variation to other performance measures such as illuminance and CCT. Another simulation was computed with the US Age distribution to create 1000 2-degree observers for the


purpose of computing illuminance and CCT. Using these individual sensitivities in place of the 1931 Standard Observer, the computed illuminance and CCT of the 318 TM-30 light sources were found to vary slightly. The standard deviation over observers for these light sources was on average 1.9% of the illuminance, with a maximum standard deviation of 4.6%. In terms of CCT variation, the average standard deviation was 123 K and the maximum was 454 K (for a 6811 K source). As with the colour rendition measures, these inter-observer differences would be most interesting if they affect direct comparisons, in which observers may disagree about which of a pair of light sources is higher in illuminance or in CCT. In all cases, these are rather small differences.
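Summary statistics of this kind can be computed from a table of per-observer values with one row per light source. The sketch below assumes the standard deviation is normalised by the mean over observers; the paper does not say whether the mean or the standard-observer value was used, and the function name and numbers are illustrative only.

```python
import numpy as np

def relative_std_percent(values):
    """Per-source standard deviation over observers, as a percentage
    of the per-source mean.

    values : array of shape (n_sources, n_observers)
    """
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(axis=1, ddof=1) / values.mean(axis=1)

# Made-up illuminance values for two sources and three observers.
illuminance = [[98.0, 100.0, 102.0],
               [90.0, 100.0, 110.0]]
pct = relative_std_percent(illuminance)
average_pct = pct.mean()   # average relative standard deviation over sources
maximum_pct = pct.max()    # largest relative standard deviation over sources
```

For CCT, where the text reports absolute values in kelvin, the unnormalised `values.std(axis=1, ddof=1)` would be the corresponding quantity.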

A point of reference might be found in another source of variation that has been well studied and quantified, the effect of visual field size.19,22 To test this, using the CIE 2006 model,19 we computed CMFs for field sizes 1°–10° and age 30 years, and used them in the TM-30 computations in the same way we used the population CMFs described earlier. The results are plotted in Figure 14 as connected black dots coded by size, together with the Age Group population clouds (red), the CIE 1964 10-degree (open circle) and 1931 2-degree (blue diamond) standard observers. For several light sources (82, 162, 255), the field size series dots are hardly visible, coincident with the 10-degree standard; however, for some (114, 123, 146, 279), they result in trails wandering from the standard outward with smaller field sizes, in unpredictable directions. In all cases, however, the length of the wandering path is much smaller than the size of the cloud resulting from inter-observer differences. For all readers who have seen for themselves that colours matched in a 10-degree field appear to not match when changed to a 2-degree field, consider that the difference any two observers might be seeing could be much larger.

Finally, for practical analysis of colour rendition, the scores for index measures are computed for a standardised set of object colours, which will undoubtedly vary from an observer’s experience with real objects in a real lighted room. Royer and Wei noted that objects used in published visual colour rendition experiments range from natural skin and fruit to a variety of colour targets. In their computational comparison, they found that the Rf and Rg indices covered ranges of tens of units,23 meaning that the strongest source of variation in observed colour rendition may well be the objects used in the evaluation. Overall, we find that the magnitude of variation due to inter-observer differences, whether judged by the

[Figure 14 graphic: Rg “gamut” vs. Rf “fidelity”; legend: CIE 1964 10°, CIE 1931 2°, CIE 2006 10°, CIE 2006 5°, CIE 2006 1°.]

Figure 14 Field size comparison: in Rg vs. Rf, the clouds of points in red are the Age Group population, on top of which are highlighted the 1964 10-degree observer (open circle), 1931 2-degree observer (blue diamond), and a series of CIE 2006 observers with field sizes from 1 to 10 degrees (connected black dots, sized accordingly)


typical US population range or age categories, is larger than the range caused by field size differences, but smaller than what may be observed with different selections of test colours.

4.3. Communicating variability

It is important to consider how to appropriately communicate that inter-observer variability is likely, and that it affects the resolution of colour rendition measures for specific light sources. The graphical summary of TM-30 could easily be updated with ellipses and error bars, showing the size of a standard deviation over a relevant population of observers on each value shown. A proposal is shown in Figure 15, which adds a one standard deviation ellipse to the Rg versus Rf plot and one standard deviation error bars to the bar charts. Additionally, it shows the spread of colour vector graphic polygons that was shown in Figure 6, all based on the 1000 observers in the US Age population (and implying that the TM-30 computations were run 1000 times to produce this graphic). A similar alternative might be to show an additional two values for younger and older observers, based on the CIE 2006 CMFs,19 which would allow individuals a slightly more tailored look at colour rendition measures. The TM-30 graphics with these additions clearly expose the inter-observer variability that is to be expected from a light source. We realise that adding yet more information to the already complex TM-30 graphical presentation may either clarify or confuse, depending on the levels of understanding and interest of the reader, but we think it is worth considering how to convey such data. In general, the professional lighting community might like to know if different light sources are more or less susceptible to inter-observer differences in colour rendition.

5. Conclusions

A main concluding message of this study is that inter-observer variation, even among people with normal colour vision, has a significant effect on colour rendition measures. Communicating this clearly adds to the already-difficult burden of conveying the many aspects of colour rendition, but it may aid in developing light sources that minimise inter-observer variation. We have focused on the measures and graphics of IES TM-30-15, but it is important to point out that all colour rendition measures, because they rely on standard observers, will show the impact of inter-observer variability when it is included by methods such as our Monte Carlo approach.

We have shown that there is wide variability in TM-30 Rf and Rg values, when the

[Figure 15 graphic: four panels (test and reference SPD vs. wavelength; Rg vs. Rf; Rf and relative saturation per hue bin; colour vector graphic on bluish-yellowish vs. greenish-reddish axes). Annotated values: CCT = 3300, Rf = 73.3, Rg = 112.8.]

Figure 15 TM-30 graphics: for TM-30 RGB LED light source 114, upper left plot shows test (red) and reference (black) SPDs vs. wavelength, and upper right shows Rg vs. Rf with US Age 1-std ellipse. Lower left bar charts show fidelity and relative colour saturation per hue bin, with 1-std error bars indicated in black. Lower right shows the colour vector graphic with the inter-observer variability shown in red behind the main circle (black)


standard colour matching functions are replaced by those of individuals simulating the variation in the US population, especially far from the fidelity peak of Rf = 100. In many cases, the clouds of values resulting from inter-observer differences reach 5–10 units in size. Differences are visible in the colour vector graphics and bar charts of TM-30, which show the extent to which specific hue bins are affected. Further, we identified situations in which the index value differences for pairs of light sources invert for different observers. Through simulation of a bimodal age distribution of observers 25 and 65 years old, we have quantified large age-related differences in colour rendition, showing not only the categorical difference that could have been predicted from CIE 2006 observers, but also the expected distributions in each population. For some light sources, the clouds of values observed for the two age groups hardly overlap, which means that observers of different ages are unlikely to see the same colour rendition characteristics. This may present an opportunity for age-targeted products and marketing, if visual experiments verify the differences.

We have observed large differences in variability between light sources, between and within technology types. There are general trends including lower variability with higher colour fidelity, but the implication is that different spectral characteristics vary in susceptibility to inter-observer variation. It would be an interesting engineering goal to create light sources to minimise inter-observer variation. We foresee future work to uncover the relationships between spectral differences and variation, which would allow the development of an observer variability index and enable direct computation, bypassing the Monte Carlo simulation employed herein. The payoff would be easier computation of measures to both communicate product specifications and inform the engineering of better products.

To put the magnitude of variation caused by observer differences into context, we have compared the observed clouds of variation in TM-30 Rf and Rg indices of 5–10 units to that observed with field size in the CIE 2006 physiological observers (0–5 units) and that which might be expected using different object colours for assessment (tens of units). Clearly, the amount of ambiguity between different observers is large enough that it should not be ignored, and it should be communicated along with the already complex specifications of light source colour rendition.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

1 Houser KW, Wei M, David A, Krames MR, Shen XS. Review of measures for light-source color rendition and considerations for a two-measure system for characterizing color rendition. Optics Express 2013; 21: 10393–10411.

2 Commission Internationale de l’Eclairage. Method of Measuring and Specifying Colour Rendering Properties of Light Sources. CIE 13.3-1995. Vienna: CIE, 1995.

3 Nickerson D. Light sources and color rendering. Journal of the Optical Society of America 1960; 50: 57–69.

4 U.S. Environmental Protection Agency and U.S. Department of Energy. ENERGY STAR® program requirements for luminaires. Retrieved 27 June 2017, from https://www.energystar.gov/ia/partners/product_specs/program_reqs/Final_Luminaires_Program_Requirements.pdf


5 Illuminating Engineering Society of North America. IES Method for Evaluating Light Source Color Rendition. IES TM-30-15. New York: IESNA, 2015.

6 David A, Fini PT, Houser KW, Ohno Y, Royer MP, Smet KAG, Wei M, Whitehead L. Development of the IES method for evaluating the color rendition of light sources. Optics Express 2015; 23: 15888–15906.

7 Luo MR, Cui G, Li C. Uniform color spaces based on CIECAM02 color appearance model. Color Research and Application 2006; 31: 320–330.

8 Rea MS, Freyssinier-Nova JP. Color rendering: A tale of two metrics. Color Research and Application 2008; 33: 192–202.

9 Rea MS, Freyssinier JP. Color rendering: Beyond pride and prejudice. Color Research and Application 2010; 35: 401–409.

10 Smet K, Ryckaert WR, Pointer MR, Deconinck G, Hanselaer P. Correlation between color quality metric predictions and visual appreciation of light sources. Optics Express 2011; 19: 8151–8166.

11 van der Burgt PJM, van Kemenade J. About color rendition of light sources: The balance between simplicity and accuracy. Color Research and Application 2010; 35: 85–93.

12 de Beer E, van der Burgt P, van Kemenade J. Another color rendering metric: Do we really need it, can we live without it? Leukos 2015; 12: 51–59.

13 Smet KA, Hanselaer P. Memory and preferred colours and the colour rendition of white light sources. Lighting Research and Technology 2016; 48: 393–411.

14 Ohno Y, Miller CC, Fein M. Vision experiment on chroma saturation for colour quality preference: Proceedings of the 28th Session of the CIE, Manchester, UK, 28 June to 4 July 2015. Vienna: CIE.

15 Lin Y, Wei M, Smet KAG, Tsukitani A, Bodrogi P, Khanh TQ. Colour preference varies with lighting application. Lighting Research and Technology 2017; 49: 316–328.

16 Royer MP, Wilkerson A, Wei M, Houser K, Davis R. Human perceptions of colour rendition vary with average fidelity, average gamut, and gamut shape. Lighting Research and Technology 2017; 49: 963–988.

17 Royer MP, Wilkerson A, Wei M. Human perceptions of colour rendition at different chromaticities. Lighting Research and Technology 2017. First published 24 August 2017. DOI: 1477153517725974.

18 Murdoch MJ, Fairchild MD. Effects of inter-observer variation on color rendition measures: Proceedings of the IS&T 24th Color and Imaging Conference, San Diego, CA, USA, 7–11 November 2016: 187–191.

19 Commission Internationale de l’Eclairage. Fundamental Chromaticity Diagram with Physiological Axes – Part 1. CIE 170.1-2006. Vienna: CIE, 2006.

20 Sarkar A, Autrusseau F, Viénot F, Le Callet P, Blondé L. From CIE 2006 physiological model to improved age-dependent and average colorimetric observers. Journal of the Optical Society of America A 2011; 28: 2033–2048.

21 Asano Y, Fairchild MD, Blondé L. Individual colorimetric observer model. PLoS ONE 2016; 11: e0145671.

22 Hu X, Houser KW. Large field color matching function. Color Research and Application 2005; 31: 18–29.

23 Royer M, Wei M. The role of presented objects in deriving color preference criteria from psychophysical studies. Leukos 2017; 13: 143–157.