Introduction

Reflectance spectrophotometry is used to compare colour across individuals, populations or species, often in cross-species comparative studies that assess differences among animals with widely different colours. Reflectance spectra are usually computed as the percentage of reflected light at different wavelengths, by reference to a white standard (Andersson and Prager 2006). Methods to extract information from these spectra fall into two categories: receiver-independent methods that quantify properties of the spectra directly, and methods that use visual models (i.e. functions of visual sensitivity at different wavelengths) to quantify colour (reviewed in Montgomerie 2006; for a recent overview and implementation of these methods, see Maia et al. 2013). Each approach has advantages and limitations: receiver-independent methods provide more objective descriptions of colour properties, while visual models provide a better approximation of how animals perceive colour differences.

Here we note that measuring reflectance on a linear scale (percentage or proportion of reflectance relative to white) is artificial from the perspectives of colour perception, pigment-based production of colour, and some structural colour mechanisms as well. We then explain that working with reflectance on a linear scale (e.g. averaging reflectance spectra using arithmetic means, or computing differences in reflectance by subtraction) can bias analyses. These problems are more severe when using receiver-independent colour metrics, but also apply when processing reflectance data (e.g. averaging multiple reflectance measurements) for later use with visual models. Working with logarithmic reflectance (i.e. reflectance on a ratio scale) avoids them.

Reflectance Ratios in Colour Perception and in Colour Production

A fundamental property of biological sensory and cognitive systems is that discrimination thresholds increase proportionally to the intensity of stimuli (Weber’s law) such that, over a large dynamic range, psychological sensation scales with the logarithm of stimulus intensity (Fechner’s law; Dehaene 2003; Goldstein 2010; Akre and Johnsen 2014). Therefore, animals discriminate better among stimuli of low than of high intensity within their sensory range. In the case of light intensity (perceived as colour brightness), many species have been shown to discriminate brightness approximately on a ratio scale of light intensity (e.g. Griebel and Schmid 1999; Scholtyssek et al. 2008; Lind et al. 2013), though deviations from this pattern exist (e.g. when changing from photopic to scotopic vision, or in fishes with double retinas; Anthony 1981; Nicol 1989). Accordingly, most visual models log-transform quantum catches (i.e. the amount of light stimulation in photoreceptor cells) in order to compute the strength of the sensory signal (e.g. equations 3 and 4 in Vorobyev et al. 1998; equation 10 in Endler and Mielke 2005).
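For reference, the two laws mentioned above are commonly written as follows. The symbols are our own notation, not taken from the cited works: I is stimulus intensity, ΔI the just-noticeable increment, k the Weber fraction, S the sensation magnitude, I_0 the detection threshold and c a scaling constant.

```latex
\frac{\Delta I}{I} = k \quad \text{(Weber's law)}
\qquad\Longrightarrow\qquad
S = c \,\ln\!\left(\frac{I}{I_0}\right) \quad \text{(Fechner's law)}
```

Under this formulation, equal ratios of intensity (rather than equal differences) correspond to equal steps in sensation, which is the sense in which perception operates on a ratio scale.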

From the perspective of pigment-based colour production, ratio scales of reflectance are also more meaningful. Tissue that absorbs little light is either translucent or bright achromatic (i.e. reflecting evenly across wavelengths). Most pigments, on the contrary, absorb light efficiently. Pigments that absorb evenly across the visible wavelengths will darken tissues achromatically, and pigments that preferentially absorb certain wavelengths will give a complementary colour hue to the tissue by subtractive colour mixing (Billmeyer and Saltzman 1981). For example, carotenoid pigments absorb mostly at medium wavelengths, causing reflectance plateaus at the long (yellow to red) and short wavelengths (ultraviolet to blue) that we perceive as the red and yellow colours of many animals; conversely, in plants, chlorophylls absorb mostly long and short wavelengths, causing a peak of reflectance at medium wavelengths (green). All else being equal, increasing the concentration of a pigment by equal amounts should result in progressively smaller changes in linearly-measured reflectance, because pigments absorb light efficiently and the asymptotic, saturated reflectance (i.e. the reflectance of the pure pigment) is approached rapidly. As a commonplace example, we perceive that the first spoon of coffee darkens a cup of milk more than the second spoon, and so forth (despite the bias of visual systems to distinguish dark colours better than bright ones; Goldstein 2010). As a more accurate example, reflectance at wavelengths where chlorophyll absorbs the most decays exponentially as chlorophyll concentration increases (i.e. chlorophyll concentration is negatively proportional to the logarithm of reflectance; e.g. Sims and Gamon 2002), and the best colorimetric proxies for chlorophyll content are thus based on reflectance ratios (Chappelle et al. 1992; Datt 1999; Sims and Gamon 2002).
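The diminishing returns described above can be illustrated with a minimal sketch. It assumes a simple exponential decay of reflectance with pigment concentration that falls towards zero (a simplification, since real tissue approaches the reflectance of the pure pigment). The function name, the baseline reflectance `r0` and the decay constant `k` are hypothetical values chosen for illustration, not estimates from the cited studies.

```python
import math

def reflectance(concentration, r0=80.0, k=0.5):
    """Percent reflectance under an ASSUMED exponential decay with pigment
    concentration (r0 and k are hypothetical illustration values)."""
    return r0 * math.exp(-k * concentration)

# Equal increments of pigment concentration (arbitrary units)
concentrations = [0, 1, 2, 3, 4]
linear = [reflectance(c) for c in concentrations]
log_refl = [math.log10(r) for r in linear]

# Change in reflectance produced by each successive unit of pigment
linear_steps = [b - a for a, b in zip(linear, linear[1:])]
log_steps = [b - a for a, b in zip(log_refl, log_refl[1:])]

for c, r, lr in zip(concentrations, linear, log_refl):
    print(f"concentration={c}  reflectance={r:6.2f} %  log10={lr:.3f}")
print("steps on linear scale:", [round(s, 2) for s in linear_steps])
print("steps on log scale:   ", [round(s, 3) for s in log_steps])
```

Under this assumption, the steps shrink on the linear scale but stay constant on the logarithmic scale, so log-reflectance tracks pigment concentration linearly.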

As a consequence of reflectance decaying exponentially with pigment concentration, log-transformation is also advisable on measurement-theoretical grounds. As linearly-measured reflectance tends to an asymptote with increasing pigment concentrations, variances in measurements will likely decrease: for a given change in pigment concentration, changes in linearly-measured reflectance will be smaller at the lower end of the reflectance scale. Similarly, in other attributes of animals (e.g. size, Gingerich 2000) variance commonly scales with value, approaching geometric normality of variation, and in those cases measurements should be log-transformed to a ratio scale in order to normalize variance (Houle et al. 2011). Working with a ratio scale also has the advantage that ratios are unitless, and therefore the numeric values of colour comparisons are not affected by, for example, differences in calibration of white across studies.

Much of animal coloration is based on pigmentation, but there are also various mechanisms of structural colour production (Kinoshita and Yoshioka 2005; Kinoshita 2008). Because structural colour often involves complex combinations of reflection, refraction and/or absorption, it is not always straightforward which reflectance scale better depicts quantitative changes in the underlying production mechanism. Therefore, we make no sweeping claim that a ratio scale is always advantageous to quantify structural colours. But often reflectance ratios should depict differences in the underlying colour mechanisms better than linear reflectance, because increasing elaboration of the colour-producing structure will eventually converge towards an asymptote of reflectance. For example, all else being equal, linear increases in the reflectance of multi-layer structures imply an exponential increase in the number of layers (i.e. the number of layers is proportional to the logarithm of reflectance; e.g. Figure 4 in Kinoshita and Yoshioka 2005, and pages 22–23 in Kinoshita 2008).

Reflectance Ratios Reduce Analysis Bias

Reflectance spectra (plots of reflectance vs. wavelength) provide a detailed description of colour. Three properties of colour are indicated by the height and shape of these spectra: brightness is indicated by the height of spectra (brighter colours reflect more), saturation by differences in reflectance across wavelengths, and hue by where (i.e. at which wavelengths) those differences in reflectance are located. Receiver-independent colour metrics quantify these three aspects of the height and shape of reflectance spectra (Montgomerie 2006), and can be used to compare spectra that do not differ strongly in shape. The following example shows how this rationale is biased when working with reflectance on a linear scale and, on the contrary, how it holds true when working on a ratio scale.

Figure 1a shows a simple reflectance spectrum increasing linearly from 1 to 50 % across the bird-visible wavelengths, and two additional spectra transposed upwards by adding a constant amount of reflectance each time. The three spectra have the same shape and are equidistant from each other on a linear scale, which erroneously suggests identical hue and saturation, and similar differences in brightness. Figure 1b shows approximately how the hue, saturation and brightness of these spectra would be perceived by us and, using an avian sensory model, quantifies the chromatic and achromatic contrasts between adjacent spectra (see “Appendix 1” for methods): the three colours are perceived as different both chromatically and achromatically, and differences between the brighter spectra are smaller than between the darker spectra. Figure 1c shows the same three spectra plotted on a ratio scale, where it is clear that the three colours are different (the shapes of the upper spectra are shallower), and that the differences in brightness are unequal. If the original reflectance spectrum (the one increasing from 1 to 50 %) is instead transposed on a ratio scale, i.e. each time multiplying reflectance by a constant (Fig. 2a) or, equivalently, adding a constant to log-transformed reflectance (Fig. 2c), then the three colours are identical except for brightness (Fig. 2b). Again, spectra on a ratio scale give a good indication of differences and similarities between colours (the three spectra have the same shape and are equally distant; Fig. 2c), while on a linear scale (Fig. 2a) the different shapes and uneven distances between spectra would wrongly suggest otherwise.
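The contrast between the two kinds of transposition can be sketched numerically. The snippet below is an illustration only: the coarse wavelength grid, the added constant (20 %) and the multiplier (1.5) are hypothetical choices, not the values used to draw Figs. 1 and 2, and `log_contrast` is a crude index of spectral shape introduced here for the example.

```python
import math

# Coarse, hypothetical wavelength grid (nm) and a spectrum rising from 1 to 50 %
wavelengths = list(range(300, 701, 100))
n = len(wavelengths)
base = [1.0 + (50.0 - 1.0) * i / (n - 1) for i in range(n)]

def log_contrast(spectrum):
    """Highest minus lowest log10 reflectance: a crude index of the
    shape of a spectrum on a ratio scale (hypothetical helper)."""
    logs = [math.log10(r) for r in spectrum]
    return max(logs) - min(logs)

added = [r + 20.0 for r in base]       # transposed on a linear scale
multiplied = [r * 1.5 for r in base]   # transposed on a ratio scale

print("base:      ", round(log_contrast(base), 3))
print("added:     ", round(log_contrast(added), 3))
print("multiplied:", round(log_contrast(multiplied), 3))
```

Adding a constant flattens the spectrum on the ratio scale (lower contrast between wavelengths), whereas multiplying by a constant leaves its shape unchanged and only shifts it upwards.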

Fig. 1 Reflectance colour spectra differing by a constant linear difference in reflectance, plotted (a) on a linear or (c) on a ratio scale of reflectance. (b) Illustration of how hue and brightness of these three colours would be approximately perceived by humans, and the extent of chromatic and achromatic contrasts between adjacent colours, in units of just noticeable differences. See “Appendix 1” for methods

Fig. 2 Reflectance colour spectra differing by a constant ratio of reflectance, plotted (a) on a linear or (c) on a ratio scale of reflectance. (b) Illustration of how hue and brightness would be approximately perceived by humans, and chromatic and achromatic contrasts between adjacent colours in units of just noticeable differences. See “Appendix 1” for methods

Thus, reflectance spectra on a linear scale give a biased depiction of colour, while spectra on a ratio scale are true to colour similarities and differences. Among other biases, the above example shows that reflectance on a linear scale overestimates differences among bright colours, which can affect empirical results. As a real-life example, consider Taysom et al.’s (2010) data on sexual dichromatism in pigment-based red to yellow colours of Australasian parrots. Figure 3a plots sexual dichromatism (for simplicity computed as achromatic contrast between the sexes, using an avian visual model) against brightness of the different species (see “Appendix 2” for methods). Sexual dichromatism shows only a weak trend for brighter species to be less dichromatic (Fig. 3a; r = −0.19, P = 0.36, N = 27 species). A similar result is obtained computing brightness differences with a receiver-independent metric (mean reflectance) on a ratio scale (Fig. 3b; r = −0.06, P = 0.75). On the contrary, the same receiver-independent metric on a linear scale would yield an artefactual trend for brighter species to be more dichromatic (Fig. 3c; r = 0.27, P = 0.18), because reflectance on a linear scale overestimates differences among bright colours.

Fig. 3 Differences in brightness between male and female red-to-yellow colours of Australasian parrots, computed as (a) achromatic colour contrast, (b) absolute differences in reflectance on a ratio scale, and (c) absolute differences in reflectance on a linear scale, plotted against the mean colour brightness of the different species. Note how reflectance differences on a linear scale substantially change the apparent relation between sexual dichromatism and mean brightness. Data from Taysom et al. (2010); see “Appendix 2” for methods

Recommendations

Because of the above, working with reflectance on a linear scale can introduce errors at different stages of colour analysis. We next give recommendations to address problems of (a) graphic misrepresentation of colour, from the perspectives of colour perception and production, (b) assessing measurement inaccuracy, (c) distorting colour information during processing, and (d) bias in colour metrics.

(a) Graphic misrepresentation of colour

As illustrated with the examples in Figs. 1 and 2, reflectance spectra on a linear scale misrepresent colour because identical spectral shapes at different heights of reflectance have different chromatic properties. A better visual representation of colour is provided by reflectance spectra on a ratio scale, which is consistent with the mechanisms of perception and with the production of pigment-based and some structural colours; on this scale, the shape of spectra indicates chromatic properties of colour independently of achromatic brightness. The base of the logarithmic transformation used to convert percent reflectance to a ratio scale is arbitrary.

(b) Dealing with measurement inaccuracy

Spectral measurements have a degree of inaccuracy, and light contamination or other problems may also occur. Therefore, common procedures for quality control are to look for outliers across measurements of the same colour patch, or to monitor reflectance in real-time and save a spectrum only after its shape stabilizes across consecutive readings. This helps to avoid measurements with noticeable contamination or transient irregularities in reflectance. Most colour metrics based on visual models are affected by ratios rather than absolute changes in reflectance (e.g. Vorobyev et al. 1998; Endler and Mielke 2005), but monitoring spectra on a linear scale makes it difficult to assess those relative changes near 0 % reflectance (e.g. a doubling of reflectance from 0.5 to 1 % is harder to see than an equivalent doubling from 5 to 10 %). Monitoring spectra on a ratio scale would render those relative differences perceptible, and allow better quality control of measurements.

Small inaccuracies in measuring reflectance have negligible effects on colour metrics when reflectance is high, but when reflectance is near 0 % they translate into large errors in relative reflectance. Therefore, limits to instrument accuracy compromise the usefulness of very low reflectance measurements. Because of instrument inaccuracy, spectra of very dark colours may have small peaks of negative reflectance, and it is common that researchers or software flatten these negative values to 0 % (e.g. the procspec function in the R package pavo; Maia et al. 2013). Flattening low reflectance values changes the shape of spectra, but this may be justifiable as a compromise to avoid measurement error in the low reflectance range. Flattening low reflectance values also seems preferable to shifting spectra, for example by addition of a constant, because shifting would change chromatic properties even in spectra that do not have regions of low reflectance (see Fig. 1 and the following section). We recommend that low reflectance values be flattened higher than 0 % (at least to 1 % reflectance or, equivalently, 0 log-reflectance), to make explicit the compromise with avoiding spurious colour estimates in the low-reflectance range: although flattening regions of low reflectance in very dark colours can strongly change reflectance ratios within spectra, and thus diminish the inferred saturation of colour, this seems preferable to obtaining some very high, spurious saturations that can come about due to slight instrument inaccuracy in this very low reflectance range.
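The flattening recommended above can be sketched as follows. This is an illustrative helper of our own (the function name and the example values are hypothetical, and it is not the pavo implementation); it assumes reflectance is expressed in percent and uses base-10 logarithms, so that a floor of 1 % maps to 0 log-reflectance.

```python
import math

def to_log_reflectance(spectrum_percent, floor=1.0):
    """Flatten values below `floor` (in % reflectance) up to `floor`,
    then return log10 reflectance. Hypothetical helper for illustration."""
    return [math.log10(max(r, floor)) for r in spectrum_percent]

# Hypothetical dark spectrum with a small negative reading from instrument noise
dark = [-0.2, 0.3, 0.9, 2.0, 10.0]
print(to_log_reflectance(dark))
```

Values at or below the floor all become 0 on the log scale, so negative or near-zero readings can no longer generate undefined or extreme log-reflectance values.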

(c) Processing reflectance spectra

Reflectance spectra are typically processed before colour analysis. A common procedure is to average spectra from different points on a colour patch, rather than taking a single measurement, in order to account for colour heterogeneity. When using linear reflectance this distorts colour information because, as explained before, chromatically identical colour spectra change multiplicatively rather than additively. Therefore, using arithmetic means with linear reflectance overestimates mean colour brightness and changes the other chromatic properties of colour. The correct way to average reflectance spectra is to use geometric means with linear reflectance or, equivalently, regular arithmetic means with logarithmic reflectance. Strictly speaking, smaller-scale averaging of reflectance, such as that involved in curve smoothing or in the automatic integration of consecutive readings done by spectrophotometry software, should also use logarithmic reflectance. But since this smaller-scale processing deals with spectra at a single measurement point, rather than measurements at distinct points on the animal, this level of detail is probably of little relevance.
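The two ways of averaging can be compared with a short sketch. The function names and the two example spectra are hypothetical; the code assumes strictly positive reflectance values (e.g. after flattening low values) measured at the same wavelengths.

```python
import math

def arithmetic_mean_spectrum(spectra):
    """Wavelength-by-wavelength arithmetic mean of linear reflectance."""
    return [sum(values) / len(values) for values in zip(*spectra)]

def geometric_mean_spectrum(spectra):
    """Average log-reflectance at each wavelength, then back-transform
    to linear reflectance (requires strictly positive values)."""
    log_means = [sum(math.log(v) for v in values) / len(values)
                 for values in zip(*spectra)]
    return [math.exp(m) for m in log_means]

# Two hypothetical measurement points on the same patch (% reflectance)
point_1 = [2.0, 10.0, 50.0]
point_2 = [8.0, 10.0, 12.0]

arith = arithmetic_mean_spectrum([point_1, point_2])
geom = geometric_mean_spectrum([point_1, point_2])
print("arithmetic:", arith)
print("geometric: ", [round(g, 2) for g in geom])
```

The geometric mean is never higher than the arithmetic mean, which illustrates the overestimation of brightness by arithmetic averaging. Because the result is back-transformed, it is again on a linear scale and can be used where linear reflectance is expected.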

Other processing that is sometimes done to reflectance spectra includes adding a constant or rescaling in order to standardize a property of colour. This type of manipulation is generally unnecessary for computing colour metrics, and is perhaps inadvisable because it can introduce unintended changes in the properties of colour. For example, when working with spectra of linear reflectance, adding a constant to standardize brightness will also change the chromatic saturation of colour (Fig. 1).

(d) Colour metrics

Most implementations of visual modelling work on a ratio scale by log-transforming light quantum catches (Vorobyev et al. 1998; Endler and Mielke 2005), or computing ratios of quantum catches rather than differences (Evans et al. 2010), and colour metrics produced by those models already address the non-linear sensation of light intensity. But, inconsistently with the rationale of these visual models, measuring and processing (e.g. averaging) spectra do not typically address the non-linear nature of light reflectance, thus causing the shortcomings explained in points (b) and (c), above. Therefore, we advise processing reflectance data on a ratio scale prior to running visual models. Visual models are then run as usual (i.e. do not input log-transformed spectra into models designed to accept linear reflectance, but rather back-transform to linear before input into those models).

Some implementations of visual modelling do not log-transform quantum catches, arguing that it homogenises variation in colour saturation and that, although light intensity stimulates photoreceptor cells on a ratio scale according to Fechner’s law, higher-level neural processing may reverse this in order to enhance colour discrimination (Stoddard and Prum 2008). We acknowledge that log-transformation reduces variance in inferred colour saturation, as illustrated by the data in Fig. 4. We also agree that higher-level neural processing may adjust colour discrimination to fit the ecological and social needs of animals. But reflectance ratios are generally more informative regarding the physical and biological world, because they have stronger links to colour production mechanisms (see above); it is thus likely that higher-level adjustment of colour discrimination remains close to a ratio scale, rather than reverting to linear. This is in accordance with the behavioural discrimination of light intensity on a ratio scale found in most animals (e.g. Griebel and Schmid 1999; Scholtyssek et al. 2008; Lind et al. 2013). Since visual models that log-transform or abstain from log-transforming quantum catches do not always estimate colour saturation congruently with each other (Fig. 4), the decision over whether or not to apply Fechner’s law may occasionally affect conclusions.

Fig. 4 Scatterplot of colour saturation, r (quantified as the distance to the achromatic centre of an avian colour space; Endler and Mielke 2005; Stoddard and Prum 2008), for ca. 9000 measurements of plumage colours across 135 species of estrildid finches, using visual models that do or do not apply Fechner’s law (i.e. log-transforming quantum catches of photoreceptor cells). Histograms show the variances in each metric, and the scatterplot shows the non-linear relation between the two metrics. See “Appendix 3” for methods

As for receiver-independent metrics, when computed using a linear scale of reflectance they are prone to the biases explained earlier: misjudging differences in brightness, and chromatic metrics being confounded by the overall level of brightness. Using logarithmic reflectance corrects those biases (Figs. 1, 2). Some receiver-independent metrics are not affected by the reflectance scale used (hue metrics that identify wavelengths of peak reflectance), but most metrics will differ from a linear to a ratio scale (e.g. all brightness and saturation metrics, hue metrics that identify reflectance slopes or mid-reflectance; see Montgomerie 2006, for a list and explanation of receiver-independent metrics). Most receiver-independent metrics are applicable to spectra of logarithmic reflectance with no changes in formulae. The exceptions are metrics that join information from spectrum height and shape, rather than obtaining information on height (for brightness) and shape (for hue and saturation) separately. For example, the proportion of reflected light on selected wavelengths relative to the entire visible range is often used as a metric of colour saturation. Such proportions cannot be calculated using logarithmic reflectance, which is a ratio scale itself and as such does not have a lower boundary (for use with logarithmic reflectance, these metrics should be modified to: mean reflectance on selected wavelengths minus mean reflectance on the entire visible range).
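The modified saturation metric suggested above (mean log-reflectance on selected wavelengths minus mean log-reflectance on the entire range) can be sketched as follows. The function name, the wavelength grid, the selected band (600 to 700 nm) and the example spectrum are hypothetical choices for illustration; reflectance is assumed to be strictly positive.

```python
import math

def log_chroma(wavelengths, spectrum_percent, lo, hi):
    """Mean log10 reflectance within [lo, hi] nm minus mean log10
    reflectance over the whole spectrum (hypothetical helper)."""
    logs = [math.log10(r) for r in spectrum_percent]
    selected = [v for w, v in zip(wavelengths, logs) if lo <= w <= hi]
    return sum(selected) / len(selected) - sum(logs) / len(logs)

wavelengths = [300, 400, 500, 600, 700]
spectrum = [5.0, 5.0, 10.0, 40.0, 40.0]   # reflects mostly at long wavelengths
brighter = [2.0 * r for r in spectrum]    # same shape on a ratio scale, twice as bright

chroma = log_chroma(wavelengths, spectrum, 600, 700)
chroma_brighter = log_chroma(wavelengths, brighter, 600, 700)
print(round(chroma, 3), round(chroma_brighter, 3))
```

Multiplying the spectrum by a constant adds the same amount to every log-reflectance value, so the metric is unchanged: it describes spectral shape independently of brightness.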

In conclusion, we hope to have raised awareness that linear reflectance is inconsistent with principles of colour production and perception, which are better approximated by a ratio scale. As a consequence, quantifying reflectance on a ratio scale improves the workflow of colour analysis: from better quality control of reflectance measurements, to better colour metrics.