
Distance sampling provides a rigorous framework for estimating detectability, allowing us to correct counts of detected animals in covered areas for those that were missed. The fundamental concept involved in estimating detectability in the distance sampling context is the detection function, which represents the probability of detecting an object of interest as a function of its distance from the line or point. Thus a key step in any distance sampling analysis is to choose a plausible and parsimonious model for the detection function.

In this chapter, we introduce tools to help us select a suitable model for the detection function, and to assess how well it fits our data (Sect. 5.1). We then consider conventional distance sampling (Sect. 5.2), in which probability of detection is assumed to be a function of distance from the line or point alone, and for which animals at the line or point are certain to be detected. A set of plausible functions is introduced to model the detection function. Next, we allow the detection function to depend on additional covariates, such as habitat type, observer, animal behaviour or weather conditions, using multiple-covariate distance sampling (MCDS, Sect. 5.3). Finally, we relax the assumption that all animals on the line or point are detected, using mark-recapture distance sampling (MRDS, Sect. 5.4): instead of assuming g(0) = 1, we use information on number of animals detected by either one or both of two observers to account for failure to detect all animals on the line or at the point.

1 Model Selection and Goodness-of-Fit

As noted above, a central concept in distance sampling is that of the detection function. This function, usually referred to as g(y), where y represents either the distance x from a line or the distance r from a point, describes how probability of detection falls off as a function of distance from the line or point. A good model for the detection function should have the following properties.

Shoulder. The model should possess a shoulder. That is, the probability of detection should remain at or close to one as distance from the line or point increases from zero, before falling away at larger distances (Sect. 5.2.1). This is the so-called shape criterion (Buckland et al. 2001, pp. 68–69).

Non-increasing. The model should be a non-increasing function of distance from the line or point (Sect. 5.2.1). That is, the probability of detection at a given distance cannot be greater than the probability of detection at any smaller distance.

Model robust. As we never know the true shape of the detection function, we need flexible models that can fit a variety of shapes. The purpose of adjustment terms (Sect. 5.2.1) is to provide this flexibility.

Pooling robust. Although in conventional distance sampling, we model probability of detection as a function of distance alone, in reality it is a function of many factors. We can attempt to include these factors in the model (Sect. 5.3). Otherwise, we need to rely on the pooling robustness property, which states that inference from our model should be largely unaffected if we fail to include the various factors that influence detectability.

Estimator efficiency. Other things being equal, we would prefer a model that gives high precision (i.e. small standard errors). However, we should not select a model that gives high precision unless we are confident that it also satisfies the above properties, in which case it should have low bias.

While the first two properties stem from intuitive considerations about the search process and the resulting detectability pattern, the last three relate to the desirable statistical properties of the model for the detection function.

Below, we first consider the issue of pooling robustness in greater detail, and then we briefly describe tools used to select a suitable model.

1.1 Pooling Robustness

Pooling robustness is a key concept in distance sampling. For those more familiar with estimating abundance using mark-recapture, it seems implausible that estimates from distance sampling should be largely unaffected if there is unmodelled heterogeneity in probability of detection, yet for conventional distance sampling, this is the case. A mathematical proof of the pooling robustness property of distance sampling estimators is given in Burnham et al. (2004, pp. 389–392). Here we focus on when it applies and when it does not, and we illustrate both cases with examples.

When pooling robustness applies, we can model probability of detection as a function of distance from the line or point, while ignoring other factors that may affect detectability. Thus there is no need to record covariates on individuals (such as gender, whether calling, whether in a cluster), environment (such as habitat, thickness of vegetation, visibility) or observer (such as identity, experience, number of observers). This is a significant advantage relative to mark-recapture, for which such heterogeneity is very problematic (Link 2003). Hence it is important to understand when the property applies.

Pooling robustness does not apply for mark-recapture distance sampling, which is unsurprising given that it does not apply to mark-recapture in general. Thus if probability of detection at the line or point is not certain (g(0) < 1), and double-observer methods are used to address this, every attempt should be made both to reduce heterogeneity through use of standardized field methods and to model remaining heterogeneity.

Provided we can reasonably assume that g(0) = 1, and provided heterogeneity is not extreme, pooling robustness applies to an overall abundance estimate. An example of extreme heterogeneity might be surveys of a songbird in dense forest in the breeding season, for which the male may be detectable at great range due to its song, while the female might be undetectable unless very close. In such an extreme case, it may be better to estimate the more detectable component of the population, separately from the less detectable component. For the songbird example, precision on the female component may be very poor; a better option might be to estimate the number of singing males, allowing estimation of the density of territories. Whether it is then reasonable to assume that, on average, there is one female for every male depends on the species.

If analyses are stratified, but a common detection function is assumed for the separate strata, then stratum-specific abundance estimates are not pooling robust, although an estimate of overall abundance obtained by summing the stratum-specific estimates is, provided the proportion of the stratum sampled is equal across strata. Again using the songbird example, if females were sufficiently detectable to allow their inclusion in the analysis, but still less detectable than males, then if we assume a common detection function, we will overestimate number of males in the population and underestimate number of females, although total population size will be estimated with little bias. The stratum-specific bias can be avoided by fitting separate detection functions for males and females, or by including gender as a factor in multiple-covariate distance sampling.

If a survey region is stratified into two habitats, and detectability is lower in one habitat than the other, the stratum-specific abundance estimates will again be biased, if we assume that the same detection function applies to both habitats. Total abundance across habitats will only have the pooling robustness property if effort is in proportion to stratum area. For example if one stratum is twice the size of the other, it should have twice the survey effort.

We illustrate pooling robustness using a dataset on trees in Sect. 5.2.2.4.

1.2 Information Criteria

Information criteria can be used for selecting between competing models, or for quantifying the degree of support for each model. In conventional distance sampling, we can use information criteria to select between, say, a half-normal and a hazard-rate detection function. The most widely used information criterion is Akaike’s information criterion (AIC), and this is the one that we will use throughout this book. Conceptually, most applied statisticians prefer to use information criteria rather than hypothesis testing, but mathematically, it is easy to show that using AIC to compare two nested models, for which one model has a single additional term, is equivalent to using a likelihood ratio test (Sect. 5.1.3) with size 0.157 (15.7 %).

For a given model with maximized likelihood \(\hat{\mathcal{L}}\), information criteria are of the form \(-2\log _{e}\hat{\mathcal{L}}\) with a penalty added, where the penalty is a function of the number of parameters in the model. For AIC, we have

$$\displaystyle{ \mathrm{AIC} = -2\log _{e}\hat{\mathcal{L}} + 2q }$$
(5.1)

where q is the number of parameters in the model. (Thus q is the number of parameters in the key function plus the number of adjustment terms.) AIC is evaluated for each model, and the model with the smallest AIC value is deemed best.

For ease of comparison, ΔAIC values are often used. These are formed by subtracting the AIC value corresponding to the best model (i.e. the one with the smallest AIC) from the AIC value of each model in turn. Thus the best model has ΔAIC = 0. Burnham and Anderson (2002) suggested that models with a ΔAIC value of around two or less should be deemed to be well supported by the data, with decreasing support the larger ΔAIC becomes; models with values of ΔAIC above ten should be considered to be very implausible.

AIC is the default model selection method in software Distance. Two other methods are provided. The first of these, AICc, includes a small-sample correction for when the data are normally distributed. As the method is not necessarily better for non-normally distributed observations, it is not the default method in Distance. It is defined as

$$\displaystyle{ \mathrm{AICc} = -2\log _{e}\hat{\mathcal{L}} + 2q + \frac{2q(q + 1)} {n - q - 1}\ . }$$
(5.2)

When sample size n is large and q is small (as for most distance sampling datasets), AICc and AIC are approximately the same.

Also provided in Distance is the Bayes information criterion (BIC):

$$\displaystyle{ \mathrm{BIC} = -2\log _{e}\hat{\mathcal{L}} + q\log _{e}n\ . }$$
(5.3)

The penalty \(q\log _{e}n\) exceeds 2q for n ≥ 8, so that for realistic sample sizes, BIC never selects a larger model than does AIC, and often selects a smaller one. If it is believed that truth is low-dimensional, and that one of the competing models represents truth, then BIC has better properties than AIC. If, however, truth is considered to be high-dimensional, and we are seeking a ‘best approximating model’ given the data, then AIC has the better properties. In our examples, we use AIC, which better reflects our philosophy.
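The three criteria of Eqs. (5.1)–(5.3) are simple to compute. The following Python sketch illustrates them; the function names are ours and the log-likelihood values are hypothetical.

```python
import math

def aic(loglik, q):
    # AIC = -2 log L + 2q  (Eq. 5.1)
    return -2.0 * loglik + 2.0 * q

def aicc(loglik, q, n):
    # AICc adds a small-sample correction to AIC (Eq. 5.2)
    return aic(loglik, q) + 2.0 * q * (q + 1) / (n - q - 1)

def bic(loglik, q, n):
    # BIC replaces the penalty 2q by q log_e(n)  (Eq. 5.3)
    return -2.0 * loglik + q * math.log(n)

# Two hypothetical nested fits: the extra parameter raises the
# log-likelihood by only 0.5, not enough to offset the penalty of 2
print(aic(-250.0, 2))  # smaller AIC preferred
print(aic(-249.5, 3))
```

For n = 100 and q = 2, the BIC penalty is 2 logₑ100 ≈ 9.2 against AIC’s 4, illustrating BIC’s preference for smaller models.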

A forwards stepping procedure may be adopted to select adjustment terms; this is computationally more efficient than evaluating AIC for every possible combination of adjustment terms, and in most cases, leads to the same choice of model. Thus we start with the key function on its own as model 1, and the key function with just the first adjustment term added as model 2. If model 1 has the smaller AIC, we select it and stop. Otherwise, we take model 2 and define model 3 by adding a second adjustment term. If model 2 has the smaller AIC of the two, we select it; otherwise we compare model 3 with model 4 (which has three adjustment terms). This process continues until the model with fewer adjustment terms has the smaller AIC, or until we reach the maximum number of adjustments allowed by the software.
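The forwards stepping rule just described can be sketched as follows; this is our own minimal rendering with hypothetical log-likelihoods, not the Distance implementation.

```python
def forward_select(fits):
    """Forward AIC stepping over nested fits.
    fits: list of (loglik, q) for the key function with 0, 1, 2, ...
    adjustment terms, in order. Returns the index of the chosen model."""
    def aic(loglik, q):
        return -2.0 * loglik + 2.0 * q
    best = 0
    for m in range(1, len(fits)):
        if aic(*fits[m]) < aic(*fits[best]):
            best = m      # the extra adjustment term lowers AIC: step on
        else:
            break         # the simpler model wins: stop stepping
    return best

# Key alone, then one and two adjustment terms (hypothetical values)
chosen = forward_select([(-250.0, 2), (-247.0, 3), (-246.8, 4)])
print(chosen)  # the single-adjustment model
```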

Values of information criteria may be transformed into weights, for model averaging or for quantifying relative support for the models (Buckland et al. 1997; Burnham and Anderson 2002). Model averaging is especially useful for when different models have comparable information criterion values, yet give rather different estimates of animal density; the approach then allows uncertainty over which model is appropriate to be incorporated into estimation. The AIC weight for model m is defined by

$$\displaystyle{ w_{m} = \frac{\exp (-\mathrm{AIC}_{m}/2)} {\sum _{m}\exp (-\mathrm{AIC}_{m}/2)} }$$
(5.4)

where \(\mathrm{AIC}_{m}\) is the AIC value for model m. (The weights are unaffected if calculated using ΔAIC values in place of AIC values.)
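Applied directly to large AIC values, Eq. (5.4) can underflow numerically; working with ΔAIC values, which leave the weights unchanged, avoids this. A minimal sketch (our own function name, hypothetical AIC values):

```python
import math

def aic_weights(aics):
    # AIC weights of Eq. (5.4), computed from Delta-AIC values for
    # numerical stability; the shift leaves the weights unchanged
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

w = aic_weights([504.0, 500.0, 501.6])
print([round(x, 3) for x in w])  # weights sum to one
```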

In line transect sampling, if two models give quite different estimates, rather than using model averaging, it is usually worth considering what aspect of the data gives rise to the difference. This may allow one model fit to be rejected, for example because an assumption failure has distorted the fit. This can occur for example if small distances from the line tend to be rounded to zero, perhaps causing the hazard-rate model to fit a spurious spike; or animals avoiding the observer may result in the hazard-rate model fitting an implausibly wide shoulder. By contrast, in point transect sampling, there is relatively little information to judge fit at small distances, and this can result in different plausible model fits yielding quite different estimates of abundance. In this circumstance, model averaging may prove useful, allowing uncertainty to be better quantified.

1.3 Likelihood Ratio Test

The likelihood ratio test is only relevant for testing between nested models, which limits its usefulness relative to information criteria, which can also be used to compare non-nested models. Thus the likelihood ratio test may be used to test for inclusion of adjustment terms, having selected a key function, but it cannot be used to select between different key functions.

Suppose we have model 1, with \(m_{1}\) adjustment terms and maximized likelihood \(\hat{\mathcal{L}}_{1}\), and model 2, with \(m_{1} + m_{2}\) adjustment terms and maximized likelihood \(\hat{\mathcal{L}}_{2}\). Then the test statistic is

$$\displaystyle{ \chi ^{2} = -2\log _{ e}\left (\hat{\mathcal{L}}_{1}/\hat{\mathcal{L}}_{2}\right ) = -2\left [\log _{e}\hat{\mathcal{L}}_{1} -\log _{e}\hat{\mathcal{L}}_{2}\right ] }$$
(5.5)

If model 1 is the true model, then the test statistic has a \(\chi ^{2}\) distribution with \(m_{2}\) degrees of freedom.

A forwards stepping procedure is again most convenient for testing for inclusion of adjustment terms. Thus we start with the key function on its own as model 1 (\(m_{1} = 0\)), and the key function with just the first adjustment term added as model 2 (\(m_{2} = 1\)). If model 1 is not rejected (i.e. the p-value exceeds the size of the test, typically taken to be 0.05 or 5 %), then we use it. Otherwise, we test the model with one adjustment against the one with two adjustments, and so on.
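For the common case of testing a single extra adjustment term, the p-value can be obtained from the χ² survival function with one degree of freedom, which reduces to the complementary error function. A sketch, with hypothetical log-likelihoods:

```python
import math

def lrt_pvalue_1df(loglik_simple, loglik_complex):
    # Test statistic of Eq. (5.5); p-value for one extra parameter, using
    # the chi-squared(1) survival function  P(X > x) = erfc(sqrt(x/2))
    x2 = -2.0 * (loglik_simple - loglik_complex)
    return math.erfc(math.sqrt(x2 / 2.0))

# Statistic = 6.0: significant at the conventional 5 % size
print(round(lrt_pvalue_1df(-250.0, -247.0), 4))
```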

1.4 Goodness-of-Fit Tests

Information criteria do not provide an absolute measure of how well a model fits the data — instead, they allow competing models to be ranked by how well they fit, and they quantify the relative support that each model has, given the data. However, even the best model as judged by AIC might provide a poor fit to the data.

Goodness-of-fit tests allow formal testing of whether a detection function model provides an adequate fit to the data. The \(\chi ^{2}\) goodness-of-fit test cannot be used on continuous data, so it is of limited use for testing MCDS or MRDS models. It is useful for testing models fitted using conventional distance sampling. However, if distances are not grouped, they must first be categorized into groups to allow the test to be conducted. Thus there is a subjective aspect to the test, and different analysts, using different group cutpoints, may reach different conclusions about the adequacy of the model. By contrast, the Kolmogorov–Smirnov and Cramér–von Mises tests can be applied to continuous data. All three tests can be carried out using the Distance software.

Goodness-of-fit tests, as their name implies, assess how well the model fits the data. It is important to realise that, in the event of a poor fit, there are two possible explanations. The first is that the model is unable to approximate the true detection function adequately. The second is that the model provides an excellent approximation, but there is a problem with the data. The most common problem is rounding of distances to favoured values, such as multiples of 10 m. Non-independent detections can also generate significant test statistics. An extreme example of this is cue counting from points, where a single bird may give many songbursts from the same location during the recording period. Hence goodness-of-fit tests cannot be used in that context to assess model fit (Sect. 9.4.2).

In surveys with large numbers of detections, goodness-of-fit tests tend to indicate poor fit even when a perfectly good model has been used. This may be simply because with many observations, very small departures from the assumed model can be detected. More usually, it reflects an assumption failure, such as rounding of distances, non-independence of detections, or responsive movement. If the effect is relatively small, then any resulting bias is likely to be of little concern, despite the apparently poor fit. Indications of poor fit when sample size is small are of greater concern.

1.4.1 The \(\chi ^{2}\) Goodness-of-Fit Test

Suppose we have n animals detected, with counts of \(n_{1},n_{2},\ldots,n_{u}\) in u distance intervals (see Sect. 5.2.2.2), obtained either by recording grouped distances in the field, or by subsequently defining interval cutpoints and counting how many recorded distances fall within each interval. Denote the interval cutpoints by \(c_{0},c_{1},\ldots,c_{u}\), where \(c_{0} = 0\) unless the data are left-truncated, and \(c_{u} = w\). The \(n_{j}\) (\(j = 1,2,\ldots,u\)) are the observed counts by interval, while for a given model for the detection function, the corresponding expected counts are given by n multiplied by the proportion \(\hat{\pi }_{j}\) of the fitted probability density function, \(\hat{f}(y)\), that lies within each interval:

$$\displaystyle{ \hat{\pi }_{j} =\int _{ c_{j-1}}^{c_{j} }\hat{f}(y)\,dy }$$
(5.6)

The \(\chi ^{2}\) statistic is then defined by

$$\displaystyle{ \chi ^{2} =\sum _{ j=1}^{u}\frac{\left (n_{j} - n\hat{\pi }_{j}\right )^{2}} {n\hat{\pi }_{j}} \ . }$$
(5.7)

This statistic has an approximate \(\chi ^{2}\) distribution with \(u - q - 1\) degrees of freedom under the null hypothesis that our model is the true detection function, where q is the number of parameters in the model that we have estimated. This approximation can be poor if there are small expected counts; a rough guide is that expected counts of less than five should be avoided. For continuous data, cutpoints can be chosen accordingly. If data are collected in bins, and an expected count is small for a given bin, the data from that bin can be pooled with the data from a neighbouring bin, thus reducing the number of bins by one.
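The statistic of Eq. (5.7) and its degrees of freedom are straightforward to compute from binned counts and the fitted interval proportions of Eq. (5.6). A sketch with hypothetical binned data:

```python
def chisq_gof(counts, pi_hat, q):
    """Pearson chi-squared goodness-of-fit statistic (Eq. 5.7).
    counts: observed counts n_j in the u distance intervals;
    pi_hat: fitted proportion of the density in each interval (Eq. 5.6);
    q: number of estimated detection function parameters.
    Returns the statistic and its degrees of freedom u - q - 1."""
    n = sum(counts)
    x2 = sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, pi_hat))
    return x2, len(counts) - q - 1

# Hypothetical counts in four intervals and half-normal fitted proportions
x2, df = chisq_gof([40, 30, 18, 12], [0.42, 0.28, 0.18, 0.12], q=1)
print(round(x2, 3), df)  # small statistic on 2 df: no evidence of poor fit
```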

The effects of rounding of continuous data on the \(\chi ^{2}\) goodness-of-fit statistic can be ameliorated by suitable choice of cutpoints. For example, if many distances are rounded to the nearest 10 m, cutpoints can be chosen at 5 m, 15 m, 25 m, 35 m, …. In other words, try to select cutpoints that are mid-way between favoured rounding distances, so that few distances are recorded in the wrong distance interval. This is one circumstance where the subjective choice of intervals in the \(\chi ^{2}\) test can be used to advantage.

1.4.2 The Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov test can be applied to continuous or discrete data. It is thus preferable to the \(\chi ^{2}\) test for MCDS and MRDS methods. Suppose we have a model g(y, z) for the detection function, where z is a vector of covariates. We give two such models in Sect. 5.3: the half-normal model (Eq. (5.40)) and the hazard-rate model (Eq. (5.41)), where the dependence on z is modelled through the scale parameter (Eq. (5.39)).

The probability density function of the distance \(y_{i}\) of the ith detected animal, given covariates \(\mathbf{z}_{i}\), is denoted by \(f_{y\vert z}(y_{i}\vert \mathbf{z}_{i})\). Note that in general, this function is different for each observation \(y_{i}\) (although if there are no covariates, then it is the same for all observations). This creates a problem, as the Kolmogorov–Smirnov test assumes that all observations have the same distribution. However, the test is carried out on values of the cumulative distribution function, defined as \(F_{i} = F_{y\vert z}(y_{i}\vert \mathbf{z}_{i}) =\int _{ 0}^{y_{i}}f_{y\vert z}(y\vert \mathbf{z}_{i})\,dy\). These values are all independently and identically distributed as uniform(0,1), so we can think of the cumulative distribution function as a transformation of the observations \(y_{i}\), which do not all have the same distribution, to the \(F_{i}\), which do. We can then conduct the test on the \(F_{i}\). The remaining complication is that we must estimate these values.

Having maximized the conditional likelihood of Eq. (5.42), we estimate \(f_{y\vert z}(y_{i}\vert \mathbf{z}_{i})\) by \(\hat{f}_{y\vert z}(y_{i}\vert \mathbf{z}_{i})\) and hence \(F_{y\vert z}(y_{i}\vert \mathbf{z}_{i})\) by \(\hat{F}_{y\vert z}(y_{i}\vert \mathbf{z}_{i})\), which we denote \(\hat{F}_{i}\). Having evaluated \(\hat{F}_{i}\) for every observation, we rank these values from smallest to largest. The Kolmogorov–Smirnov test statistic is then defined as

$$\displaystyle{ D_{KS} =\max _{i}\left \{\left \vert \frac{i} {n} -\hat{ F}_{(i)}\right \vert,\left \vert \frac{i - 1} {n} -\hat{ F}_{(i)}\right \vert \right \} }$$
(5.8)

for \(i = 1,\ldots,n\), where \(\hat{F}_{(i)}\) indicates the ordered values. Thus the test statistic is the biggest difference (ignoring sign) between the estimated cumulative distribution function and the empirical distribution function, which is defined to be zero for \(F <\hat{ F}_{(1)}\), one for \(F \geq \hat{ F}_{(n)}\), and \(i/n\) for \(\hat{F}_{(i)} \leq F <\hat{ F}_{(i+1)}\), \(i = 1,\ldots,n - 1\).
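Given the estimated values \(\hat{F}_{i}\), the statistic of Eq. (5.8) is easily computed. The sketch below uses hypothetical fitted CDF values.

```python
def ks_statistic(F_hat):
    """Kolmogorov-Smirnov statistic D_KS of Eq. (5.8), applied to the
    fitted CDF values F_i; under a good model these should look like a
    uniform(0,1) sample."""
    F = sorted(F_hat)   # the ordered values F_(i)
    n = len(F)
    return max(max(abs(i / n - F[i - 1]), abs((i - 1) / n - F[i - 1]))
               for i in range(1, n + 1))

# Hypothetical fitted CDF values for five detections
print(round(ks_statistic([0.05, 0.22, 0.41, 0.66, 0.93]), 2))
```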

1.4.3 The Cramér–von Mises Test

The Cramér–von Mises test is very similar to the Kolmogorov–Smirnov test, except that the test statistic is a function of the sum of the squared differences between the cumulative distribution function and the empirical distribution function. As a consequence, it uses more information, and should have higher power, although in practice, the power is very similar. As with the Kolmogorov–Smirnov test, for MCDS and MRDS, it must be carried out on the \(\hat{F}_{i}\) rather than the observations y i .
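The unweighted statistic has the standard computational form \(W^{2} = 1/(12n) +\sum _{i=1}^{n}\left (\hat{F}_{(i)} - (2i - 1)/(2n)\right )^{2}\), which the following sketch implements on the same hypothetical fitted CDF values used above for the Kolmogorov–Smirnov test.

```python
def cvm_statistic(F_hat):
    # Unweighted Cramer-von Mises statistic on the fitted CDF values:
    # W^2 = 1/(12n) + sum_i ( F_(i) - (2i-1)/(2n) )^2
    F = sorted(F_hat)
    n = len(F)
    return 1.0 / (12.0 * n) + sum((F[i - 1] - (2.0 * i - 1.0) / (2.0 * n)) ** 2
                                  for i in range(1, n + 1))

# Hypothetical fitted CDF values for five detections
print(round(cvm_statistic([0.05, 0.22, 0.41, 0.66, 0.93]), 4))
```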

A weighted version of the test is possible. Software Distance carries out both the unweighted test and a test in which differences close to zero distance from the line or point are given greater weight than other differences, as it is the fit close to zero distance that is most important for reliable estimation. See Burnham et al. (2004, pp. 388–389) for more details.

2 Conventional Distance Sampling

Conventional distance sampling was described in Chap. 1. It refers to the case when all assumptions of Sect. 1.7 hold, and when the detection function is modelled as a function of distance from the line or point only, relying on the pooling robustness property (Sect. 5.1.1) to yield estimates of density with low bias.

2.1 Models for the Detection Function

Models for the detection function g(y) (where y = x, the perpendicular or shortest distance of a detected animal from the line for line transect sampling, and y = r, the distance of a detected animal from the point for point transect sampling) should have a ‘shoulder’ (Fig. 5.1). In mathematical terms, we say that the slope of the detection function at zero distance, denoted by \(g'(0)\), is zero. In practical terms, we assume not only that g(0) = 1 (certain detection at the line or point — the first assumption of Sect. 1.7), but also that g(y) stays at or close to one for animals at small distances from the line or point. Theoretical considerations indicate that a shoulder should normally exist, provided that g(0) = 1 (Buckland et al. 2004, p. 338). More pragmatically, if the true detection function has a wide shoulder, then different models for the function will tend to give similar estimates of density, while if the detection function has no, or only a narrow, shoulder, different models can give rise to very different estimates of density, even if they fit the data equally well. Thus field methods should be adopted that ensure that probability of detection stays close to one for some distance from the line or point, as illustrated in Fig. 5.1.

Fig. 5.1

A good model for the detection function should have a shoulder, with probability of detection staying at or close to one at small distances from the line or point. At larger distances, it should fall away smoothly. The truncation distance w corresponds to the half-width of a strip (line transect sampling) or the radius of the circular plot (point transect sampling)

Models for g(y) should fall away smoothly at middle distances from the line or point. The fall should not be too rapid, and the function should be a non-increasing function of y. At large distances, the function should level off at or close to zero (Fig. 5.1).

The Distance software provides four models for the detection function, but also allows series adjustment terms to be added to the model, for when the simple model (termed a ‘key function’) on its own does not provide an adequate fit. The first key function is the uniform model:

$$\displaystyle{ g(y) = 1\,\ \ 0 \leq y \leq w\ . }$$
(5.9)

This model should be used to carry out a strip transect analysis, assuming that all animals in the strip of half-width w are detected. If series adjustment terms are added, then the model is also useful for analysing line or point transect data.

The second key function is the half-normal model:

$$\displaystyle{ g(y) =\exp \left [\frac{-y^{2}} {2\sigma ^{2}} \right ]\,\ \ 0 \leq y \leq w\ . }$$
(5.10)

Note that g(0) = 1 as required. The parameter σ is a scale parameter; varying it does not change the shape of the detection function, but does affect how quickly probability of detection falls with distance from the line or point.

The third key function is the hazard-rate model:

$$\displaystyle{ g(y) = 1 -\exp \left [-\left (y/\sigma \right )^{-b}\right ]\,\ \ 0 \leq y \leq w\ . }$$
(5.11)

It too has a scale parameter σ, but it also has a shape parameter b, giving it greater flexibility than the other models.

The fourth key function is the negative exponential model. All four models are shown in Fig. 5.2. The negative exponential model is in Distance for historical reasons, having been the first rigorously-developed line transect model (Gates et al. 1968), but we do not recommend its use, as it has no ‘shoulder’ (Sect. 5.1). Often, assumption failure leads to spiked data, for which there are many detections close to the line or point, with a sharp fall-off with distance. For example, in line transect sampling with poor estimation of distance, many detected animals may be recorded as on the line (zero distance). In such cases, a model selection tool such as AIC might select the negative exponential model because of its spiked shape. However, we should consider whether the negative exponential is a plausible model a priori for a detection function. If all animals on the line are certain to be detected, it is implausible that many animals just off the line will be missed. This is confirmed by hazard-rate modelling of the detection process (Hayes and Buckland 1983); even a sharply-spiked hazard of detection leads to a detection function with a shoulder. Thus there are good reasons not to fit a spiked detection function, even if the distance data appear spiked. Instead, we should look critically at field methods, to understand why the data are spiked, and how we might avoid such data in the future. Apart from measurement error (rounding to zero distance), spiked data can occur due to animal movement (e.g. dolphins approaching a ship to ride the bow wave), or because probability of detection on the line is less than one (in which case probability of detection might fall rapidly with distance).

Fig. 5.2

Plots of the key functions available in Distance. Left plot: half-normal model \(g(y) =\exp \left [-y^{2}/(2\sigma ^{2})\right ]\) with σ = 1 (solid line); uniform model \(g(y) = 1/w\) (dashed line); negative exponential model \(g(y) =\exp (-y/\sigma )\) with σ = 1 (dotted line). Right plot: hazard-rate model \(g(y) = 1 -\exp \left [-(y/\sigma )^{-b}\right ]\) with σ = 1 and b = 2 (solid line), b = 3 (dashed line) and b = 5 (dotted line). In each case, truncation distance w = 2.5

The software Distance allows three types of adjustment to be made to a key function. These adjustments are not needed unless the fit of the key function to the distance data is poor. By default, Distance uses Akaike’s Information Criterion and forward stepping to decide whether to add adjustment terms (Sect. 5.1.2). These default options can be changed by the user.

The three types of adjustment are as follows. The first is a cosine series; used in conjunction with the uniform key function, this gives the Fourier series model. The second is a Hermite polynomial series; these polynomials have orthogonal properties with respect to the half-normal key function, and together they give the Hermite polynomial model. The third is a simple polynomial series. In practice, any adjustment type can be used with any key function, and maximum likelihood methods are used to fit the models. Full details are given by Buckland et al. (2001, pp. 58–68).
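To make these shapes and adjustments concrete, the following Python sketch evaluates the half-normal and hazard-rate keys and a cosine-adjusted uniform key (the Fourier series model). The function names and the rescaling to make g(0) = 1 are our own choices, not Distance’s internals.

```python
import math

def key_halfnorm(y, sigma):
    # Half-normal key function (Eq. 5.10)
    return math.exp(-y * y / (2.0 * sigma * sigma))

def key_hazard(y, sigma, b):
    # Hazard-rate key function (Eq. 5.11); g -> 1 as y -> 0
    return 1.0 - math.exp(-((y / sigma) ** (-b))) if y > 0 else 1.0

def cosine_adjusted(y, w, coeffs):
    # Uniform key with cosine adjustment terms a_j cos(j*pi*y/w),
    # rescaled so that g(0) = 1
    raw = 1.0 + sum(a * math.cos(j * math.pi * y / w)
                    for j, a in enumerate(coeffs, start=1))
    return raw / (1.0 + sum(coeffs))

w = 2.5  # truncation distance, as in Fig. 5.2
print(round(key_halfnorm(1.0, 1.0), 3),
      round(key_hazard(1.0, 1.0, 2.0), 3),
      round(cosine_adjusted(1.0, w, [0.5]), 3))
```

Note how the hazard-rate shape parameter b controls the width of the shoulder, while σ in each model only rescales distance.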

2.2 Line Transect Sampling

A key assumption of conventional line transect sampling is that lines are placed independently of animal locations. This is achieved by placing the lines according to a randomized design — usually, a systematic grid of equally-spaced lines, randomly placed over the survey region. This ensures that animals available for detection are on average uniformly distributed with respect to distance from the line. (By having sufficient lines in the design, any chance non-uniformity averages out across the lines.) Hence any decline in numbers of detections with distance from the line reflects a fall in the probability of detection with distance.

2.2.1 Exact Distance Data

In statistical terms, we can model the relative frequencies of observed detection distances by fitting a probability density function f(x) to the distances x, 0 ≤ x ≤ w. Because animals on the surveyed strip and so available for detection are distributed uniformly with respect to distance from the line, we can assume that f(x) has exactly the same shape as g(x). More formally, f(x) is proportional to π(x)g(x) where π(x) represents the distribution of animals (whether detected or not) with distance from the line, and is uniform here, so that \(\pi (x) = 1/w\), 0 ≤ x ≤ w. Thus π(x) is in fact independent of x. A valid probability density function must integrate to one, and so we have

$$\displaystyle{ f(x) = \frac{\pi (x)g(x)} {\int _{0}^{w}\pi (x)g(x)\,dx\ } }$$
(5.12)

which due to the uniform distribution simplifies to

$$\displaystyle{ f(x) = \frac{g(x)} {\mu } }$$
(5.13)

where

$$\displaystyle{ \mu =\int _{ 0}^{w}g(x)\,dx\ . }$$
(5.14)

The quantity μ is the area under the detection function (Fig. 1.5), and is termed the effective strip half-width, because it is the distance from the line at which the expected number of animals detected beyond distance μ (but within w) equals the expected number of animals missed within a distance μ of the line (Fig. 5.3).

Fig. 5.3

The effective strip half-width μ is the distance for which as many animals are detected at distance greater than μ (but less than w) as are missed closer to the line than μ. Thus the two shaded areas have the same size, and μ is the distance for which, if you were able to do a complete count of the strip extending a distance μ either side of the line, you would expect to detect the same number of animals as were detected within a distance w of the line

Thus for line transect sampling, the uniform model g(x) = 1 for 0 ≤ x ≤ w (i.e. strip transect sampling, Sect. 6.2.2.1) gives μ = w and

$$\displaystyle{ f(x) = 1/w\,\ \ 0 \leq x \leq w\ . }$$
(5.15)

In general, the integral in Eq. (5.14) has no closed analytic form, and must be evaluated numerically. See Buckland et al. (2001, pp. 61–68) for details. Below, to illustrate how to fit a model using maximum likelihood methods, we take a model for which the integral does exist in closed form: the half-normal model without truncation.
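As a concrete illustration of this numerical integration, the sketch below evaluates μ by quadrature for the hazard-rate model. The parameter values (σ = 30, b = 3) and truncation distance are illustrative only, not estimates from any dataset in this chapter.

```python
import numpy as np
from scipy.integrate import quad

def effective_half_width(g, w):
    """Effective strip half-width: the area under the detection function,
    i.e. the integral in Eq. (5.14), evaluated numerically."""
    return quad(g, 0.0, w)[0]

# Hazard-rate detection function g(x) = 1 - exp(-(x/sigma)^(-b));
# sigma and b are illustrative values only.
sigma, b = 30.0, 3.0
def hazard_rate(x):
    return 1.0 - np.exp(-(x / sigma) ** (-b))

w = 95.0
mu = effective_half_width(hazard_rate, w)
```

The same routine works for any detection function that can be evaluated pointwise; only the closed-form half-normal case treated next avoids it.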

If we denote the distances from the line of the n detected animals by \(x_{1},x_{2},\ldots,x_{n}\), then the likelihood function, conditional on n, is given by their joint probability density function. If we assume that the distances \(x_{i}\) are independent, this likelihood is given by:

$$\displaystyle{ \mathcal{L}_{x} =\prod _{ i=1}^{n}f(x_{ i}) = \frac{\prod _{i=1}^{n}g(x_{i})} {\mu ^{n}} \ . }$$
(5.16)

We propose a model for f(x); in our example, this is the half-normal model with \(w = \infty \):

$$\displaystyle{ g(x) =\exp \left [\frac{-x^{2}} {2\sigma ^{2}} \right ] }$$
(5.17)

(note that g(0) = 1) so that

$$\displaystyle{ f(x) = \frac{\exp \left [\frac{-x^{2}} {2\sigma ^{2}} \right ]} {\mu } }$$
(5.18)

where

$$\displaystyle{ \mu =\int _{ 0}^{\infty }g(x)\,dx = \sqrt{\frac{\pi \sigma ^{2 } } {2}}\ . }$$
(5.19)

Hence

$$\displaystyle{ \mathcal{L}_{x} =\mu ^{-n}\exp \left [\frac{-\sum _{i=1}^{n}x_{ i}^{2}} {2\sigma ^{2}} \right ] = \left \{\frac{2} {\pi \sigma ^{2}} \right \}^{n/2}\exp \left [\frac{-\sum _{i=1}^{n}x_{ i}^{2}} {2\sigma ^{2}} \right ]\ . }$$
(5.20)

We now find the value of \(\sigma ^{2}\) that maximizes \(\mathcal{L}_{x}\). This is most easily done by taking the logarithm of \(\mathcal{L}_{x}\), differentiating the resulting function with respect to \(\sigma ^{2}\), and setting the derivative to zero, giving \(\hat{\sigma }^{2} = \frac{\sum _{i=1}^{n}x_{ i}^{2}} {n}\). This in turn allows us to estimate μ:

$$\displaystyle{ \hat{\mu }= \sqrt{\frac{\pi \sum _{i=1 }^{n }x_{i }^{2 }} {2n}} \ . }$$
(5.21)
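A quick way to check these closed-form estimators is simulation. The sketch below draws half-normal detection distances and applies the estimators above; the true σ and the sample size are illustrative choices, not the robin data.

```python
import numpy as np

rng = np.random.default_rng(1)

# With w = infinity and uniform availability, the density of detected
# distances is f(x) = g(x)/mu, which for the half-normal key is exactly
# the distribution of |N(0, sigma)|.
sigma_true = 25.0
x = np.abs(rng.normal(0.0, sigma_true, size=500))

n = len(x)
sigma2_hat = np.sum(x**2) / n                 # MLE of sigma^2
mu_hat = np.sqrt(np.pi * sigma2_hat / 2.0)    # Eq. (5.21)
```

With 500 detections, \(\hat{\sigma }^{2}\) should land close to the true value of 625 m².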

More generally, the proportion \(P_{a}\) of animals available for detection (i.e. within distance w of a line) that are actually detected may be expressed as the effective area searched divided by the total area of the sample plots:

$$\displaystyle{ P_{a} = \frac{2\mu L} {2wL} = \frac{\mu } {w}\ . }$$
(5.22)

Further, it follows from Eq. (5.13) that \(\mu = \frac{g(x)} {f(x)}\) for any x in (0, w). In particular, because g(0) = 1 by assumption, \(\mu = 1/f(0)\).

Thus the three parameters \(P_{a}\), μ and f(0) are related as:

$$\displaystyle{ P_{a} = \frac{\mu } {w} = \frac{1} {wf(0)}\ . }$$
(5.23)

Given a maximum likelihood estimate of any one of these parameters, the invariance property of maximum likelihood estimators allows us to calculate the maximum likelihood estimate of the other two. For example, given \(\hat{\mu }\), we have:

$$\displaystyle{ \hat{P}_{a} = \frac{\hat{\mu }} {w} }$$
(5.24)

and

$$\displaystyle{ \hat{f}(0) = \frac{1} {\hat{\mu }} \ . }$$
(5.25)

Because we are using maximum likelihood methods, we can estimate the standard error of \(\hat{f}(0)\) by first estimating the Fisher information matrix (Buckland et al. 2001, pp. 61–68). We can then estimate the standard error of \(\hat{\mu }\) and of \(\hat{P}_{a}\) using the approximate result cv\([\hat{f}(0)] =\mathrm{ cv}(\hat{\mu }) =\mathrm{ cv}(\hat{P}_{a})\), so that se\((\hat{\mu }) =\hat{\mu }\mathrm{ se}[\hat{f}(0)]/\hat{f}(0)\) and se\((\hat{P}_{a}) =\hat{ P}_{a}\mathrm{se}[\hat{f}(0)]/\hat{f}(0)\).
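These relationships are simple enough to code directly. In the sketch below, the fitted value \(\hat{\mu }\) and the standard error of \(\hat{f}(0)\) are hypothetical placeholders, not output from any analysis in this chapter.

```python
# Hypothetical fitted values (placeholders, not real estimates):
mu_hat = 32.0        # effective strip half-width, metres
se_f0 = 0.0004       # standard error of f-hat(0)
w = 95.0             # truncation distance, metres

f0_hat = 1.0 / mu_hat          # Eq. (5.25)
Pa_hat = mu_hat / w            # Eq. (5.24)

# Equal coefficients of variation: cv[f(0)] = cv(mu) = cv(Pa)
cv = se_f0 / f0_hat
se_mu = mu_hat * cv            # = mu_hat * se[f(0)] / f(0)-hat
se_Pa = Pa_hat * cv
```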

2.2.2 Grouped Distance Data

We have assumed that exact distances of detected animals from the line are recorded. Often, distances are grouped into bins. For example, in aerial surveys, the speed may be too great for the observer to record accurate distances to each detected animal, especially in areas of high density. Instead, it is easier simply to count the number of animals by distance bin, where cutpoints between bins are delineated for example by aligned markers on wing struts and windows. In this case, the model for f(x) is fitted to the grouped data, using a multinomial likelihood:

$$\displaystyle{ \mathcal{L}_{m} = \frac{n!} {m_{1}!\ldots m_{u}!}\prod _{j=1}^{u}f_{ j}^{m_{j} } }$$
(5.26)

where \(m_{j}\) is the number of animals counted in distance interval j, \(j = 1,\ldots,u\), \(n =\sum _{ j=1}^{u}m_{j}\), and \(f_{j} =\int _{ c_{j-1}}^{c_{j}}f(x)\,dx\), where \(c_{0},c_{1},\ldots,c_{u}\) are the cutpoints between bins, with \(c_{u} = w\) and \(c_{0} = 0\) (unless data are left-truncated; Buckland et al. 2001, pp. 153–154).

Maximum likelihood methods are again used to fit the model, as described by Buckland et al. (2001, pp. 62–64).
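A minimal version of this multinomial fit can be written directly, here for a half-normal key. The cutpoints and counts below are hypothetical, not data from any survey in this book.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Hypothetical binned line transect counts.
cut = np.array([0.0, 15.0, 25.0, 35.0, 45.0, 55.0])  # cutpoints; w = 55 m
m = np.array([34, 21, 16, 9, 5])                      # counts m_j per bin

def negloglik(sigma):
    # Half-normal key: the integral of g over (a, b) is proportional to
    # Phi(b/sigma) - Phi(a/sigma), so the multinomial cell probabilities
    # f_j are these increments normalised over (0, w).
    p = norm.cdf(cut[1:], scale=sigma) - norm.cdf(cut[:-1], scale=sigma)
    p = p / (norm.cdf(cut[-1], scale=sigma) - 0.5)
    p = np.clip(p, 1e-300, None)   # guard against log(0) at extreme sigma
    return -np.sum(m * np.log(p))

res = minimize_scalar(negloglik, bounds=(1.0, 200.0), method="bounded")
sigma_hat = res.x
```

Maximizing the multinomial likelihood here reduces to a one-dimensional search over σ; models with adjustment terms simply add parameters to the same objective.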

Often, distances are not binned in the field, but substantial rounding error may occur. In this circumstance, especially if there is rounding to zero distances, estimation may be more robust if distances are binned before analysis. In this case, the interval cutpoints that define the bins should be chosen to avoid favoured rounding distances (Buckland et al. 2001, p. 155). For example, if there is rounding to the nearest 10 m, cutpoints might be chosen at 15 m, 25 m, 35 m, \(\ldots\).

Little precision is lost by binning data. However, information is lost for assessing whether assumptions hold, and whether models for the detection function provide an adequate fit. If it is feasible to estimate exact distances, then such data should be recorded using aids (e.g. laser rangefinders, GPS, clinometers, reticles) to reduce estimation error.

If data are analysed as exact, distance intervals still need to be defined for plotting distance data in histograms, and for conducting χ 2 goodness-of-fit tests. The Montrave case study on the book website explains how to specify interval cutpoints in Distance for analyses of exact distance data and of grouped distance data.

2.2.3 The Montrave Case Study: Line Transect Sampling

We use the robin data to illustrate line transect sampling. In total, 82 robins were detected, and the corresponding distance estimates ranged from 0 to 100 m. We see from Fig. 5.4 that there is rounding in the data. Seven distances were each recorded four times or more, and all are divisible by 5: 5, 15, 20, 25, 35, 45, 50 m. In principle, distances could be measured to the nearest metre, as a laser rangefinder was used. However, as is typical of songbird surveys in woodland habitat, many more birds were heard than seen. While the laser rangefinder was useful for measuring to visible objects, and aided distance estimation to singing birds that were not visible, nevertheless distance estimation to the nearest metre simply was not possible. To assess model fit, as distinct from revealing rounding in the data, we would like to select cutpoints both for our histogram and for the χ 2 goodness-of-fit test to avoid these favoured distances. Here, we use the following cutpoints: 0, 12.5, 22.5, 32.5, 42.5, 52.5, 62.5, 77.5, 95 m. Note the use of wider intervals at larger distances, where smaller numbers of birds are detected. For goodness-of-fit testing, a rough guide is that we should avoid expected numbers of observations of less than 5 for any interval (Sect. 5.1.4.1). Note too that we have truncated our distances at w = 95 m. We see from Fig. 5.4 that this truncates two of the 82 detections. Generally, some truncation increases the robustness of the analysis, and Buckland et al. (2001, p. 16) suggest truncation where probability of detection is estimated to be around 0.15. As we will see, this suggests truncating at somewhere close to 80 m. In reality for these data, it makes very little difference. However, for studies where probability of detection shows much more heterogeneity among individuals, resulting in a longer upper ‘tail’ to the distribution of distances, more truncation is typically needed, and choice of truncation distance is then important. 
In such cases, exploratory analyses should use different choices of truncation distance, before selecting a suitable value. Indicators that more truncation may be needed include poor model fit, the need for two or more adjustment terms to improve the fit, or observations at distances much greater than (e.g. more than three times) the mean detection distance.

Fig. 5.4

Estimated distances of the 82 robin detections from the line for line transect sampling, plotted by 1 m interval

We show a histogram of the robin distance data with the above grouping in Fig. 5.5. It is now much easier to assess how detectability changes with distance. There is also a suggestion that some birds on or near the line either are not detected or move further from the line before detection. We will return to this later.

Fig. 5.5

Distribution of robin detections by distance (line transect sampling). Note that probability density is plotted on the y-axis rather than counts, because interval width varies, so that untransformed counts are not a valid guide to the shape of the detection function

We will now consider fitting models for the detection function to these data. The models we will consider are the uniform key with cosine adjustment terms, the half-normal key with Hermite polynomial adjustment terms, and the hazard-rate key with simple polynomial adjustment terms. We could consider various other combinations of key function and adjustment term, but there is generally little to be gained from this.

The next decision is whether to analyse the distance data as if they are exact, or whether to group them. Of course, if they were collected in the field as grouped data, we have no option. The robin data are not grouped. In that circumstance, it is generally better to analyse the data as exact, unless the amount of rounding is fairly extreme, in which case grouping might be preferred to ameliorate the effect of rounding. For example, if there is a ‘spike’ of detections at zero distance caused by rounding, this can cause bias if we select a model that fits the spike. We might therefore choose to group the data, with the first group width being chosen to be large enough that all distances recorded as zero are likely to have come from the first group, had exact distances been recorded. We do not have this level of rounding, and so we will analyse the data as exact. (The Montrave case study description on the book web site clarifies how to analyse the data as grouped.)

We summarize estimates from each model in Table 5.1. We see that AIC indicates that there is very little to choose between the three models, with a slight preference for the uniform key function with cosine adjustments. For this model and the half-normal key with Hermite polynomial adjustments, estimates are very similar. However, the hazard-rate model gives a slightly higher estimated effective strip half-width, with a much smaller coefficient of variation. We can gain some insight into these differences by examining plots of the estimated detection functions (Fig. 5.6). We see that the first two model fits are very similar, while the hazard-rate model gives a wider, flatter shoulder to the fitted detection function. Generally, the wider and flatter the shoulder, the better the precision. However, if the true detection function does not have such a wide shoulder, then the hazard-rate estimate will be biased.

Table 5.1 Analysis summary for three detection function models applied to the robin line transect data from the Montrave case study
Fig. 5.6

Estimated detection functions for the line transect data for robins. Fits of (a) uniform key with cosine adjustments, (b) hazard-rate model and (c) half-normal key with Hermite polynomial adjustments are shown

Of the models available in the Distance software, only the hazard-rate model is derived from a model of the detection process. However, that detection process model does not incorporate any heterogeneity across individuals in probability of detection, except for that due to distance of the animal from the line. If in reality there are other sources of heterogeneity, then this causes the shoulder to become more rounded. This is illustrated in Fig. 11.2 of Buckland et al. (2004).

To further assess which model to use, we can examine the results of the goodness-of-fit tests. The output from Distance reveals that for the Kolmogorov–Smirnov test and both the weighted and the unweighted versions of the Cramér–von Mises test, and for all three models, p-values all exceed 0.2, indicating that all models provide good fits to the data. We show the results of the χ 2 goodness-of-fit tests for each model in Table 5.2. For these tests, we have pooled the last two distance categories (62.5–77.5 m and 77.5–95 m) to avoid expected values less than five (Sect. 5.1.4.1). Again, all three models provide an excellent fit to the data, as judged by this test. (The χ 2 statistic should be significantly higher than its degrees of freedom to find evidence of poor fit, and for all three models, the statistic is close to the degrees of freedom.)
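The mechanics of the χ² test are straightforward to reproduce. The observed and expected counts below are hypothetical values chosen for illustration (they are not the values in Table 5.2), and we assume a model with one estimated detection function parameter.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed and expected counts by distance interval,
# with the final intervals already pooled so that expected >= 5.
observed = np.array([11, 18, 16, 14, 10, 6, 5])
expected = np.array([16.2, 15.1, 13.4, 11.5, 9.6, 7.7, 6.5])

X2 = np.sum((observed - expected) ** 2 / expected)
# Degrees of freedom: number of intervals, minus 1, minus the number
# of estimated detection function parameters (assumed 1 here).
df = len(observed) - 1 - 1
p_value = chi2.sf(X2, df)
```

Here the statistic is close to its degrees of freedom, so, as in the text, there is no evidence of poor fit.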

Table 5.2 Observed numbers of robins detected by distance interval (line transect sampling), together with expected numbers under each of the fitted models

We see from Table 5.2 that only 11 robins were detected within 12.5 m of the line, whereas under these models, around 16 or 17 were ‘expected’. However, we now see that this discrepancy can easily arise by chance, so that there is insufficient evidence here to suggest that an assumption may have failed (either failure to detect all birds on the line, or movement of birds away from the line prior to detection).

We thus have three excellent model fits, and need to decide which model to use. Generally, we would simply choose a criterion, probably AIC, and go with the best model as judged by that criterion (in this case, the uniform key with cosine adjustments).

2.2.4 Cork Oaks: An Illustration of Pooling Robustness

We use data from a study to investigate differences in density of cork oak (Quercus suber) seedlings, saplings and young trees between 13 conservation zones and 14 management zones located in eight different estates. In each zone, three to five 50 m lines were surveyed, and distances from the line of detected seedlings, saplings and young trees recorded. Here, we analyse the distance data to illustrate pooling robustness (Sect. 5.1.1).

We see from Fig. 5.7 that, as might be expected, seedlings are less detectable than saplings, which are in turn less detectable than young trees. When data are pooled (bottom right plot of Fig. 5.7), the histogram shows the characteristics of a dataset with substantial heterogeneity in detection probabilities: a spiked histogram, with frequencies dropping rapidly with distance interval, together with a long upper tail, because a few, highly visible individuals are detected at relatively large distances. Indeed, the seedling and sapling histograms also show these characteristics.

Fig. 5.7

Perpendicular distances of detected cork oak seedlings, saplings and young trees from the line

Preliminary analyses revealed that, of the models considered, the hazard-rate model provided the best fit to these data, and so we show results for this model only, with the addition of a simple polynomial adjustment if this improved the AIC value.

In this example, we have large sample sizes, so that we can fit separate detection functions for each category of tree. Resulting density estimates are given in Table 5.3. We will use these to compare estimates obtained below. All models provided satisfactory fits, as judged by χ 2, Kolmogorov–Smirnov and Cramér–von Mises goodness-of-fit tests.

Table 5.3 Estimated tree densities \(\hat{D}\) (numbers per hectare) together with coefficients of variation cv(\(\hat{D}\)) for the separate categories of seedling, sapling and young tree, and for the three categories combined

Very often, sample sizes are too small to allow independent estimation of the detection function for each stratum. We now consider the implications when data are pooled across strata before fitting the detection function. The distribution for the pooled data has a long tail (Fig. 5.7); for such datasets, substantial truncation gives analyses with greater robustness. Preliminary analyses suggested a truncation distance of w = 4 m was reasonable for these data. To verify that detectability does vary across strata, we will also analyse the data, fitting a separate detection function to each stratum, but this time with truncation at w = 4 m for each stratum, allowing us to use AIC to compare a pooled fit with separate fits by stratum. Results are shown in Table 5.4. (We cannot use AIC to select between the stratified analyses of Table 5.3 and the pooled analyses of Table 5.4 because the truncation distance is not the same in every analysis, hence the need for the stratified analyses reported in Table 5.4.)

Table 5.4 Estimated tree densities \(\hat{D}\) (numbers per hectare) together with coefficients of variation cv(\(\hat{D}\)) for the separate categories of seedling, sapling and young tree, and for the three categories combined

AIC shows a clear preference for the stratified analysis, and the stratified results are almost identical to those of Table 5.3. It is perhaps surprising that the estimated density of young trees is unchanged, and the estimated precision very similar, despite having reduced the truncation distance from w = 15 m to w = 4 m for this tree type, which reduced sample size from n = 543 to n = 247. With a truncation distance of w = 4 m, the model estimates that all young trees in the strip are detected.

We can now see the effect of pooling robustness. If we ignore the large variability in detectability among strata, we estimate total density as 1320 individuals per hectare, a change of a little more than 2 % from the estimate allowing for that variability. However, the stratum-specific estimates do not benefit from pooling robustness, and if we compare density estimates by tree type, we see that seedling density reduces by nearly 20 % and young tree density increases by nearly 140 % when we ignore variability among strata. This is because we overestimate detectability of seedlings, and underestimate detectability of young trees, leading to underestimation and overestimation of density respectively. The message is clear: do not rely on pooling robustness if stratum-specific estimates are important, unless you have evidence that detectability varies little if at all among strata.

We will return to this example in Sect. 5.3.2.3, to show how the heterogeneity in the detection probabilities can be modelled.

2.3 Point Transect Sampling

2.3.1 Exact Distance Data

The mathematics is not quite so straightforward for point transect sampling as for line transect sampling. As for line transect sampling, we assume that g(0) = 1, which in this case equates to assuming that an animal at the point is certain to be detected. We now use the recorded distances r of detected animals from the point (i.e. animal-to-observer distances) to model the detection function g(r), which is the probability of detecting an animal that is at distance r (0 ≤ r ≤ w) from the point.

We assume that points are placed independently of animal locations. Animals available for detection now increase linearly with distance — for example we expect twice as many animals to be present on average at 20 m from a point than at 10 m (Fig. 5.8). This means that animals have a triangular distribution with respect to distance from the point. We therefore expect numbers of detections to increase with distance from the point at small distances. At larger distances, the increasing numbers of animals available for detection are offset by the decreasing probability of detection (Fig. 5.9).

Fig. 5.8

In point transect sampling, the area in an annulus with internal radius r and external radius r + dr is approximately 2π rdr. Thus if we double the distance from the point, the area in the annulus doubles, and we expect twice as many animals to be present. In mathematical terms, the distribution of animals with respect to distance from the point is triangular: π(r) = Cr where C is a constant. The honesty condition \(\int _{0}^{w}\pi (r)\,dr = 1\) gives \(C = 2/w^{2}\)

Fig. 5.9

In point transect sampling, the number of animals available for detection at a given distance increases with distance. Hence histograms of detection distances from the point initially increase roughly linearly. At larger distances, many animals are not detected, so that the initial increase in frequencies slows, then reverses, giving rise to a histogram whose maximum frequency is typically at mid-distance

As for line transect sampling, we model the relative frequencies of observed detection distances by fitting a probability density function f(r) to the distances r, 0 ≤ r ≤ w. We still have f(r) proportional to π(r)g(r), but now, π(r) is triangular: \(\pi (r) = 2r/w^{2}\). Hence f(r) is proportional to \(r \cdot g(r)\). Again using the result \(\int _{0}^{w}f(r)\,dr = 1\), we find:

$$\displaystyle{ f(r) = \frac{\pi (r)g(r)} {\int _{0}^{w}\pi (r)g(r)\,dr\ } = \frac{rg(r)} {\int _{0}^{w}rg(r)\,dr\ } }$$
(5.27)

which can be written as

$$\displaystyle{ f(r) = \frac{2\pi rg(r)} {\nu } }$$
(5.28)

where

$$\displaystyle{ \nu = 2\pi \int _{0}^{w}rg(r)\,dr\ . }$$
(5.29)

The rearrangement above is helpful as the quantity ν is the effective area surveyed per point, and \(\nu =\pi \rho ^{2}\) where ρ is the effective radius — the expected number of animals detected beyond distance ρ (but within w) equals the expected number of animals missed within a distance ρ of the point (Fig. 5.10).
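The effective area and effective radius are easy to compute numerically for any detection function. The sketch below uses a half-normal g(r) with an illustrative σ, not a fitted value.

```python
import numpy as np
from scipy.integrate import quad

# Half-normal detection function; sigma and w are illustrative values.
sigma = 40.0
w = 110.0
g = lambda r: np.exp(-r**2 / (2 * sigma**2))

# Effective area per point, Eq. (5.29), then effective radius from
# nu = pi * rho^2.
nu = 2 * np.pi * quad(lambda r: r * g(r), 0.0, w)[0]
rho = np.sqrt(nu / np.pi)
```

Truncation at w pulls ρ slightly below its untruncated half-normal value of \(\sigma \sqrt{2}\).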

Fig. 5.10

The effective radius ρ is the distance for which as many animals are detected at distance greater than ρ (but less than w) as are missed closer to the point than ρ. Thus the two shaded areas have the same size, and ρ is the distance for which, if you were able to do a complete count of the circular plot of radius ρ, you would expect to detect the same number of animals as were detected within a distance w of the point

If we denote the distances from the point of the n detected animals by \(r_{1},r_{2},\ldots,r_{n}\), then the likelihood function (conditional on n) is:

$$\displaystyle{ \mathcal{L}_{r} =\prod _{ i=1}^{n}f(r_{ i}) = \left [\frac{2\pi } {\nu } \right ]^{n}\prod _{ i=1}^{n}r_{ i}g(r_{i})\ . }$$
(5.30)

We can expect that good models for the detection function g(x) in line transect sampling will also provide good models for g(r) in point transect sampling, which allows us to derive corresponding models for f(r) using Eq. (5.28). As for line transect sampling, we substitute our chosen model into the likelihood, and maximize it, to obtain estimates of the model parameters (Buckland et al. 2001, pp. 61–68).

The proportion \(P_{a}\) may be expressed as the effective area searched divided by the covered area, where K denotes the number of points surveyed:

$$\displaystyle{ P_{a} = \frac{K\nu } {K\pi w^{2}} = \frac{\nu } {\pi w^{2}}\ . }$$
(5.31)

We have from Eq. (5.28)

$$\displaystyle{ f(r)/r = \frac{2\pi g(r)} {\nu } \ . }$$
(5.32)

We also know that g(0) = 1, and so

$$\displaystyle{ \lim _{r\rightarrow 0}\frac{f(r)} {r} = \frac{2\pi } {\nu } \ . }$$
(5.33)

The left-hand side of this equation is the slope of f(r) at r = 0; we denote this by h(0). Hence

$$\displaystyle{ \nu = \frac{2\pi } {h(0)}\ . }$$
(5.34)

Thus, having fitted our model for f(r), we can estimate h(0) (by \(\hat{h}(0)\)), from which

$$\displaystyle{ \hat{\nu }= \frac{2\pi } {\hat{h}(0)} }$$
(5.35)

and

$$\displaystyle{ \hat{P}_{a} = \frac{2} {w^{2}\hat{h}(0)}\ . }$$
(5.36)

We use an estimate of the information matrix to estimate the variance of \(\hat{h}(0)\), and hence of \(\hat{\nu }\) and \(\hat{P}_{a}\). See Buckland et al. (2001, pp. 61–68).
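The identity \(\nu = 2\pi /h(0)\) can be verified numerically: below, h(0) is approximated by the slope of f(r) near the origin for a half-normal model, with illustrative σ and w.

```python
import numpy as np
from scipy.integrate import quad

# Verify numerically that the slope of f(r) at r = 0 satisfies
# h(0) = 2*pi/nu; sigma and w are illustrative values.
sigma, w = 40.0, 110.0
g = lambda r: np.exp(-r**2 / (2 * sigma**2))

nu = 2 * np.pi * quad(lambda r: r * g(r), 0.0, w)[0]   # Eq. (5.29)
f = lambda r: 2 * np.pi * r * g(r) / nu                # Eq. (5.28)

eps = 1e-6
h0 = f(eps) / eps     # slope of f at the origin, since f(0) = 0
```

Because g(0) = 1, the slope depends on the detection function only through the effective area ν, which is what makes h(0) the point transect analogue of f(0).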

2.3.2 Grouped Distance Data

If distances are grouped, instead of Eq. (5.30), we get a multinomial likelihood (Buckland et al. 2001, p. 63):

$$\displaystyle{ \mathcal{L}_{m} = \frac{n!} {m_{1}!\ldots m_{u}!}\prod _{j=1}^{u}\pi _{ j}^{m_{j} } }$$
(5.37)

where \(m_{j}\) is the number of animals counted in distance interval j, \(j = 1,\ldots,u\), \(n =\sum _{ j=1}^{u}m_{j}\), and \(\pi _{j} =\int _{ c_{j-1}}^{c_{j}}f(r)\,dr\), where \(c_{0} = 0,c_{1},\ldots,c_{u} = w\) are the cutpoints between bins.

2.3.3 The Montrave Case Study: Point Transect Sampling

We use the robin data from the snapshot method to illustrate point transect sampling. In total 52 robins were detected, with distance estimates ranging from 20 to 120 m. Note the lack of detections close to the point. This may be partly due to avoidance of the observer, but is primarily because in point transect sampling, the area surveyed and hence the number of birds available to be detected close to the point is low. As for line transect sampling, Fig. 5.11 shows clear evidence of rounding to the nearest 5 m. We should therefore choose cutpoints that avoid these favoured rounding distances to assess model fit. We take the following cutpoints: 0, 22.5, 32.5, 42.5, 52.5, 62.5, 77.5, 110 m. Note the wider intervals at both small and large distances, where smaller numbers of birds are detected, thus ensuring that we avoid small expected values in the χ 2 goodness-of-fit test (Sect. 5.1.4.1). Our choice of truncation distance, w = 110 m, means that we have truncated just two detections of the 52 (Fig. 5.11). When heterogeneity in probability of detection due to factors other than distance is strong, typically substantially more observations should be truncated for robust estimation from point transect data, perhaps 10 % or more (Buckland et al. 2001, p. 151). However, for the robin data, distance was the only factor that significantly affected detectability of a singing male, so that choice here is not influential. If substantial heterogeneity is, or might be, present, exploratory analyses should be carried out with a range of truncation distances. A suitable choice should allow the data to be well-modelled with just one or two parameters, or perhaps three if sample size is large. (The total number of parameters is equal to the number of key parameters — none for the uniform, one for the half-normal and two for the hazard-rate — plus the number of adjustment terms fitted.)

Fig. 5.11

Estimated distances of the 52 robin detections from the point for snapshot point transect sampling, plotted by 1 m interval

Figure 5.12 shows a histogram of the robin distance data with the above grouping. Unlike for line transect sampling, this histogram is not a guide to the shape of the detection function, because the detection function g(r) and the probability density function f(r) do not have the same shape (Sect. 5.2.3.1). However, we note that there are few detections in the first interval (in fact, Fig. 5.11 shows that there is just one observation of less than 22.5 m), and a relatively large number between 32.5 and 42.5 m.

Fig. 5.12

Distribution of robin detections by distance (snapshot point transect sampling). Note that probability density is plotted on the y-axis rather than counts, because interval width varies, so that untransformed counts are not a valid guide to the shape of the probability density function

We will fit the same three detection function models as for line transect sampling: the uniform key with cosine adjustments, the half-normal key with Hermite polynomial adjustments, and the hazard-rate key with simple polynomial adjustments. We will also analyse the data as exact; again, the degree of rounding in these data is not sufficient to warrant grouping the data for analysis. (Remember that the groups defined above are purely for presenting the data in the form of a histogram, and for goodness-of-fit testing using χ 2; they are not used for fitting the model, unless we decide that they should be.)

We summarize estimates from each model in Table 5.5. In contrast with the line transect analysis, AIC gives a clear preference for the hazard-rate model. The hazard-rate model gives the highest estimate of effective radius, although the differences are not large. However, one of the models, the half-normal with Hermite polynomial adjustments, gives a much higher coefficient of variation and much wider confidence interval than do the other models. Again, plots of our model fits help to interpret these results. For point transect sampling, we need two sets of plots. In Fig. 5.13, we show the fitted probability density functions and the corresponding detection functions. In the latter plots, the frequency counts have been scaled, making model fit more difficult to assess, so we use the plots of the probability density functions to assess the fits of our models to the data, and the detection function plots to see the shape of the fitted detection functions.

Table 5.5 Analysis summary for three detection function models applied to the robin snapshot point transect data from the Montrave case study
Fig. 5.13

Estimated detection functions (left) and probability density functions (right) for the snapshot point transect data for robins. Fits of the uniform key with cosine adjustments (a, b), the hazard-rate model (c, d), and the half-normal key with Hermite adjustments (e, f) are shown

We see from these plots that there appear to be too few detections in the first interval (0–22.5 m), and too many in the third interval (32.5–42.5 m). This suggests that avoidance may have occurred, with birds initially close to the point moving to beyond 32.5 m. The hazard-rate model can fit a very flat, wide shoulder to the detection function, and so is able to fit these data most successfully.

We need to examine the goodness-of-fit tests to draw firmer conclusions. Perhaps surprisingly, Distance output shows that for all three models, the Kolmogorov–Smirnov test and both the weighted and the unweighted versions of the Cramér–von Mises test all yield p-values that exceed 0.2, indicating that all models provide good fits to the data. The χ 2 goodness-of-fit tests for each model are summarized in Table 5.6. The p-values corresponding to the χ 2 statistics are 0.069 for the uniform-cosine model, 0.079 for the half-normal-Hermite model, and 0.173 for the hazard-rate model. Thus no model is rejected when testing at the 5 % level, although the uniform-cosine and half-normal-Hermite models give significance levels only slightly above 5 %.

Table 5.6 Observed numbers of robins detected by distance interval (snapshot point transect sampling), together with expected numbers under each of the fitted models

Table 5.6 shows that only one robin was detected within 22.5 m of the point, whereas each model suggests that around six should have been detected. Further, 13 were detected between 32.5 and 42.5 m, whereas the models suggest that around 7–9 should have been detected. The χ 2 goodness-of-fit results therefore give some evidence of avoidance behaviour of robins, although it is not conclusive.

The choice of model is not easy here. AIC favours the hazard-rate model (which has an AIC weight of 0.66, compared with around 0.17 for each of the other two models). The χ 2 goodness-of-fit statistics also give support, albeit inconclusive, for this model. We therefore choose to base our inference on the results from this model, although we note that, if there was avoidance behaviour, then this may be the reason that the hazard-rate model is favoured, and in practice, this model may underestimate density a little.

2.4 Summary

We use y to represent either distance x from the line (line transect sampling) or distance r from the point (point transect sampling), and we denote the distribution of distances of animals (whether detected or not) from the line or point by π(y), where for line transect sampling and random line placement, \(\pi (y) = 1/w\), and for point transect sampling with random point placement, \(\pi (y) = 2y/w^{2}\). The detection function g(y) is the probability that an animal at distance y from the line or point is detected, 0 ≤ y ≤ w. Then \(P_{a}\) is the expected probability of detection for an animal within distance w of the line or point, where expectation is over distance y:

$$\displaystyle{ P_{a} =\int _{ 0}^{w}g(y)\pi (y)\,dy\ . }$$
(5.38)

Further, the effective area surveyed around line or point k is \(\nu _{k} = a_{k}P_{a}\), where \(a_{k} = 2wl_{k}\) for line transect sampling, with \(l_{k}\) the length of line k, and \(a_{k} =\pi w^{2}\) for point transect sampling. The corresponding effective half-width of the strip centred on a line is \(\mu = wP_{a}\), and the effective radius around a point is \(\rho = w\sqrt{P_{a}}\).

We also have \(P_{a} = 1/[wf(0)]\) for line transect sampling, where f(0) is the probability density function of distances y evaluated at y = 0, and \(P_{a} = 2/[w^{2}h(0)]\) for point transect sampling, where h(0) is the slope of the probability density function of distances y, evaluated at y = 0. Statistical tools allow us to fit the probability density function to observed distances \(y_{i}\), \(i = 1,\ldots,n\), and hence we can estimate \(P_{a}\) and any function of \(P_{a}\), such as effective area.
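These identities are easy to verify numerically. The sketch below evaluates \(P_{a}\), the effective strip half-width μ and the effective radius ρ for a half-normal detection function; the truncation distance and scale are arbitrary illustrative values, not estimates from any survey.

```python
import math

def half_normal(y, sigma):
    """Half-normal detection function, with g(0) = 1."""
    return math.exp(-y * y / (2 * sigma * sigma))

def integrate(f, a, b, n=20000):
    """Simple midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

w, sigma = 100.0, 40.0  # truncation distance and scale (illustrative)

# Line transects: pi(y) = 1/w, so P_a = (1/w) * integral of g,
# and the effective half-width is mu = w * P_a.
Pa_line = integrate(lambda y: half_normal(y, sigma), 0, w) / w
mu = w * Pa_line

# Point transects: pi(r) = 2r/w^2, so P_a = (2/w^2) * integral of r*g(r),
# and the effective radius is rho = w * sqrt(P_a).
Pa_point = integrate(lambda r: r * half_normal(r, sigma), 0, w) * 2 / w**2
rho = w * math.sqrt(Pa_point)

print(round(mu, 2), round(rho, 2))
```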

3 Multiple-Covariate Distance Sampling

3.1 Adding Covariates to the Detection Function Model

So far, we have assumed that probability of detection for an animal depends only on its distance from the line or point. In many circumstances, this leads to reliable estimates of abundance even when we know that some animals are inherently more detectable than others (see Sect. 5.1.1). However, it can be useful to model the dependence of detectability on covariates such as habitat, animal behaviour, cluster size, observer, etc.

Multiple-covariate distance sampling (MCDS) is covered in detail by Marques and Buckland (2003, 2004). We give a summary of those methods here.

In Sect. 5.2, we defined three key functions for modelling the detection function g(y). Two of these, the half-normal detection function (Eq. (5.10)) and the hazard-rate detection function (Eq. (5.11)), each have a scale parameter σ. (The negative exponential model also has a scale parameter, but we will not consider that model here.) A natural extension of the modelling of the previous section is to model the scale parameter as a function of covariates. The scale parameter must always be positive, and so a natural model for the scale parameter, ensuring that this constraint is respected, is as follows.

$$\displaystyle{ \sigma (\mathbf{z}_{i}) =\exp \left (\alpha +\sum _{q=1}^{Q}\beta _{ q}z_{iq}\right ) }$$
(5.39)

where \(\mathbf{z}_{i} = \left (z_{i1},z_{i2},\ldots,z_{iQ}\right )^{{\prime}}\) is a vector of covariate values recorded for the i th detected animal, and \(\alpha,\beta _{1},\ldots,\beta _{Q}\) are coefficients to be estimated.

We can now write the half-normal detection function for detection i as

$$\displaystyle{ g(y_{i},\mathbf{z}_{i}) =\exp \left [ \frac{-y_{i}^{2}} {2\sigma ^{2}(\mathbf{z}_{i})}\right ]\ \ 0 \leq y_{i} \leq w }$$
(5.40)

and the hazard-rate detection function as

$$\displaystyle{ g(y_{i},\mathbf{z}_{i}) = 1 -\exp \left [-\left (y_{i}/\sigma (\mathbf{z}_{i})\right )^{-b}\right ]\ \ 0 \leq y_{i} \leq w\ . }$$
(5.41)
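A minimal sketch of Eqs. (5.39)–(5.41), assuming illustrative (not fitted) coefficients and a single binary covariate such as an observer indicator. The hazard-rate function is undefined in floating point at y = 0, so we return its limiting value g(0) = 1 explicitly.

```python
import math

def scale(alpha, betas, z):
    """Scale parameter sigma(z) = exp(alpha + sum_q beta_q * z_q)  (Eq. 5.39)."""
    return math.exp(alpha + sum(b * zq for b, zq in zip(betas, z)))

def g_half_normal(y, sigma):
    """Half-normal detection function with covariate-dependent scale (Eq. 5.40)."""
    return math.exp(-y**2 / (2 * sigma**2))

def g_hazard_rate(y, sigma, b):
    """Hazard-rate detection function with covariate-dependent scale (Eq. 5.41)."""
    if y == 0:
        return 1.0  # limiting value at the line or point
    return 1 - math.exp(-(y / sigma) ** (-b))

# Illustrative coefficients (not fitted values): intercept plus one
# binary covariate, e.g. an indicator for a second observer.
alpha, betas = 3.0, [0.5]
sigma_obs1 = scale(alpha, betas, [0.0])  # exp(3.0), about 20.1
sigma_obs2 = scale(alpha, betas, [1.0])  # exp(3.5), about 33.1

# A larger scale means detectability falls off more slowly with distance.
g1 = g_half_normal(25.0, sigma_obs1)
g2 = g_half_normal(25.0, sigma_obs2)
print(round(g1, 3), round(g2, 3))
```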

Given an appropriately randomized design, we know the distribution of y in the population: it is uniform on (0, w) for line transect sampling, and triangular on (0, w) for point transect sampling. Typically, however, we do not know the distribution of the covariates \(z_{iq}\), \(q = 1,\ldots,Q\). Hence for MCDS, in addition to conditioning on n when specifying the likelihood, we also condition on the observed \(z_{iq}\):

$$\displaystyle{ \mathcal{L}_{y\vert \mathbf{z}} =\prod _{ i=1}^{n}f_{ y\vert \mathbf{z}}(y_{i}\vert \mathbf{z}_{i}) }$$
(5.42)

where \(f_{y\vert \mathbf{z}}(y_{i}\vert \mathbf{z}_{i})\) is the probability density function of \(y_{i}\) conditional on the covariates \(\mathbf{z}_{i}\) (and on n). For line transect sampling, distance \(y_{i}\) becomes \(x_{i}\), the shortest distance from the line of the i th detected animal, and

$$\displaystyle{ f_{x\vert z}(x_{i}\vert \mathbf{z}_{i}) = \frac{g(x_{i},\mathbf{z}_{i})} {\mu (\mathbf{z}_{i})} }$$
(5.43)

where

$$\displaystyle{ \mu (\mathbf{z}_{i}) =\int _{ 0}^{w}g(x,\mathbf{z}_{ i})\,dx\ . }$$
(5.44)

As \(g(0,\mathbf{z}_{i}) = 1\) by assumption, if we set \(x_{i} = 0\) in Eq. (5.43), we can write

$$\displaystyle{ \mu (\mathbf{z}_{i}) = \frac{1} {f_{x\vert z}(0\vert \mathbf{z}_{i})}\ . }$$
(5.45)

For point transect sampling, \(y_{i}\) becomes \(r_{i}\), the distance from the point of the i th detected animal, and

$$\displaystyle{ f_{r\vert \mathbf{z}}(r_{i}\vert \mathbf{z}_{i}) = \frac{2\pi r_{i} \cdot g(r_{i},\mathbf{z}_{i})} {\nu (\mathbf{z}_{i})} }$$
(5.46)

where

$$\displaystyle{ \nu (\mathbf{z}_{i}) = 2\pi \int _{0}^{w}r \cdot g(r,\mathbf{z}_{i})\,dr\ . }$$
(5.47)

Again noting that \(g(0,\mathbf{z}_{i}) = 1\) by assumption, we find from Eq. (5.46) that

$$\displaystyle{ \nu (\mathbf{z}_{i}) = \frac{2\pi } {h(0,\mathbf{z}_{i})} }$$
(5.48)

where

$$\displaystyle{ h(0,\mathbf{z}_{i}) =\lim _{r\rightarrow 0}\frac{f_{r\vert \mathbf{z}}(r\vert \mathbf{z}_{i})} {r}\ . }$$
(5.49)

We can now maximize the conditional likelihood to give estimates of the coefficients \(\alpha,\beta _{1},\ldots,\beta _{Q}\). As for conventional distance sampling, we can add series adjustment terms to the detection function if model fit is poor, although in software Distance, there are no monotonicity constraints when using the mcds analysis engine, so implausible fits may occur.
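The fitting step can be sketched end-to-end. The toy below is not the Distance implementation: it simulates half-normal line transect detections with a single binary covariate, evaluates the conditional log-likelihood of Eqs. (5.42)–(5.44) by numerical integration, and replaces a proper optimizer (e.g. Nelder–Mead) with a crude grid search over (α, β₁). All coefficient values are illustrative.

```python
import math
import random

def sigma_z(alpha, beta, z):
    # Scale parameter model of Eq. (5.39) with a single binary covariate.
    return math.exp(alpha + beta * z)

def mu_z(sig, w, n=200):
    # mu(z) = integral_0^w g(x, z) dx for the half-normal key (Eq. 5.44),
    # approximated by the midpoint rule.
    h = w / n
    return sum(math.exp(-((k + 0.5) * h) ** 2 / (2 * sig**2))
               for k in range(n)) * h

def neg_loglik(alpha, beta, data, w):
    # -log of prod_i f(x_i | z_i), where f(x|z) = g(x, z) / mu(z)  (Eq. 5.43).
    mus = {z: mu_z(sigma_z(alpha, beta, z), w) for z in (0, 1)}
    nll = 0.0
    for x, z in data:
        sig = sigma_z(alpha, beta, z)
        nll -= -x**2 / (2 * sig**2) - math.log(mus[z])
    return nll

# Simulate detections: animals uniform in (0, w), detected with
# probability g(x, z); z might indicate, say, one of two observers.
random.seed(1)
w, true_alpha, true_beta = 50.0, 2.5, 0.7
data = []
while len(data) < 200:
    x, z = random.uniform(0, w), random.randint(0, 1)
    if random.random() < math.exp(-x**2 / (2 * sigma_z(true_alpha, true_beta, z) ** 2)):
        data.append((x, z))

# Crude grid search in place of a proper optimizer.
grid = [(a / 10, b / 10) for a in range(20, 31) for b in range(0, 13)]
alpha_hat, beta_hat = min(grid, key=lambda p: neg_loglik(p[0], p[1], data, w))
print(alpha_hat, beta_hat)
```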

3.2 MCDS Case Studies

3.2.1 Hawaii Amakihi

We illustrate fitting MCDS models to point transect data using a bird survey in Hawaii. The survey was implemented to assess a translocation experiment of an endangered Hawaiian honeycreeper on the island of Hawaii (Fancy et al. 1997). For illustration, we select data on an abundant species, the Hawaii amakihi (Hemignathus virens, Fig. 5.14). The same data set, which is available as a sample project in software Distance, was analysed by Marques et al. (2007), a tutorial-style paper on MCDS that gives further details of the survey and data analysis.

Fig. 5.14

The Hawaii amakihi, one of the most abundant native Hawaiian honeycreepers. Photo: Jack Jeffrey

The survey was conducted over seven time periods, which we consider as temporal strata. In each period, counts were made at up to 41 points, leading to a total of 1485 amakihi detections. Here we pool data across time periods and concentrate on modelling the effects of covariates on the detection function. (In Sect. 6.4.3.4, we build on this to estimate density per time period.)

The available covariates include observer (a factor with three levels) and time of day. Time of day was coded both as a factor covariate (hour, six levels) and as the number of minutes after sunrise. A priori, these covariates were expected to have an influence on detectability: more experienced observers might be expected to detect more birds, and as birds become less active later in the day, they are also likely to become less detectable.

It is important to gain a good understanding of the data prior to fitting the detection function. Exploratory data analysis can reveal problems in the data and anticipate problems in the analysis, guiding the remainder of the modelling process. The detection distances are summarized in Fig. 5.15. Also shown is the relationship between the distances and each of the available covariates.

Fig. 5.15

The detection distances for the Hawaii amakihi survey with different bins and truncation (left column), and as a function of the available covariates (right column). The black line in the middle right plot is a simple regression line

We can now draw some initial conclusions. First, although the largest detection distance is 250 m, there are relatively few detections beyond 100 m. Unusually large detection distances can be difficult to model satisfactorily, and contribute little to the fit of the model at small distances, which is where fit is important. As with CDS models, truncating these distances will facilitate model fitting, for example by reducing or eliminating the need for adjustment terms (Sect. 5.2.1) to achieve a satisfactory fit. Despite reducing sample size, such truncation often results in more precise estimation. See Sects. 5.2.2.3 (line transect sampling) and 5.2.3.3 (point transect sampling) for more discussion of truncation.

The distance data exhibit a mode at mid-distances, as expected for point transects (Sect. 5.2.3.1). If we plot the data using a large number of bins, heaping to the nearest 5 m is evident, especially between 60 and 80 m (Fig. 5.15, top and middle left panels). Although we could proceed with an analysis based on binned data, choosing bins judiciously to solve the problem (Sect. 4.2.2.1), we proceed with the analysis of exact data; modest amounts of heaping have little impact on analyses. However, we still need to select bin cutpoints for conducting a goodness-of-fit test, and in the presence of heaping, those cutpoints need to be selected with care, to ensure that most distances, after rounding, remain in the correct bin (Sect. 5.1.4.1).

Figure 5.15 suggests that all covariates have an effect on the detection distances, and hence we can anticipate that these will be useful for explaining detectability.

After conducting this exploratory analysis and evaluating the sensitivity of results to several truncation distances, we selected w = 82.5 m. This corresponds to truncating about 16 % of the data. Note that this truncation choice avoids a preferred rounding distance. This is reasonable as we expect that distances between 77.5 and 82.5 m would tend to be rounded to 80 m (and hence included in the analysis), while distances between 82.5 and 87.5 m would tend to be rounded to 85 m (and hence excluded); one should never choose as truncation point a distance to which rounding was evident. Alternatively, one might prefer more severe truncation, perhaps somewhere between 50 and 60 m, which would correspond to truncating 42.7 % and 35.4 % of the data, respectively. This would avoid the poor fit due to the apparent lack of detections at around 50–60 m relative to greater distances (bottom left panel of Fig. 5.15), but would not greatly affect the estimates.

We begin by implementing a conventional distance sampling (CDS) analysis over the pooled data. The candidate detection function models were the uniform key with cosine adjustments, the half-normal key with cosine adjustments, and the hazard-rate key with simple polynomial adjustments. The best model according to AIC is the half-normal with four cosine adjustment terms, followed by the uniform with two cosine terms and the hazard-rate without adjustment terms (Table 5.7).

Table 5.7 Summary details for models fitted to the amakihi data: key function used (HN is half-normal, HR is hazard-rate, Uni is uniform); number of parameters in the model (Pars); ΔAIC values; effective detection radius (EDR); and p-values for χ², Cramér–von Mises (CvM) and Kolmogorov–Smirnov (KS) goodness-of-fit tests

As we expect detectability to be affected by the covariates, the next step is to include these in the modelling of the detection function. We therefore consider both the hazard-rate and the half-normal model with various combinations of covariates. (We do not consider models that include both the continuous covariate mas, minutes after sunrise, and the factor hour, as the two variables are highly correlated.) Results appear in Table 5.7. None of the models with the hazard-rate key require adjustment terms, while the half-normal models always require cosine terms. The covariate representing observer seems to be very important in explaining detectability; all models that include it have a considerably lower AIC than any of the CDS models or any of the MCDS models without this covariate. Overall, detectability, as reflected by the estimates of the effective detection radius, is relatively constant across models, suggesting that the analysis is robust to model choice. Models that include time recorded as a continuous variable (mas) are usually preferred to those that include time as a factor (hour). The p-values for all χ² tests indicate poor fits, while none of the Cramér–von Mises tests do so. This probably reflects heaping in the data (Sect. 4.2.2.1) rather than mis-fit of the model; the χ² test is sensitive to such heaping, unless distance intervals can be identified that ensure most detections are recorded in the correct interval, while the Cramér–von Mises test is not.

Here, we draw inference based on the MCDS model with the lowest AIC, which includes observer and the continuous time covariate. Output from this model is shown in Fig. 5.16. The fit of the model seems reasonable. The apparent bad fit due to the spike in detections in the first interval of the top left plot is misleading, as the histogram bars have been rescaled, exaggerating the mis-fit at small distances relative to that at larger distances; we should judge model fit from the top right plot of the probability density function (Sect. 5.2.3.3), which indicates good fit close to the point.

Fig. 5.16

The model selected by AIC for the Hawaii amakihi survey data was a hazard-rate model with no adjustments, and including observer (obs) and minutes since sunrise (mas) as covariates. Top left shows the estimated detection function, top right shows the estimated probability density function, bottom left is the detection function as a function of observer (each line represents one observer), and bottom right is the detection function as a function of minutes since sunrise for the best observer (the top line represents zero minutes from sunrise and subsequent lines represent hourly increments up to 5 h from sunrise)

Figure 5.16 indicates that one of the three observers was able to detect birds at larger distances. We also see that detectability decreases as expected with time since sunrise. Marques et al. (2007) also considered detection functions fitted separately to each of the seven sampling periods, assessing whether a stratified detection function would be better than one including the covariates. AIC showed clearly that this was not the case, suggesting that the available covariates were the predominant sources of heterogeneity in detectability over time. This shows how MCDS can be a parsimonious way of allowing detection probabilities to vary across strata, by fitting a single MCDS detection function across strata. In our case, the strata were the time periods, but they could equally well be spatial strata. Specific software implementation details are provided in the online amakihi case study.

3.2.2 The Montrave Case Study

In this case study, sample sizes for one species, the great tit, tended to be small. In particular, for the snapshot point transect method, the number of detections within a distance w = 110 m was just n = 18. Thus we have insufficient data for reliable modelling of the detection function, and we can expect poor precision. A possible solution is to pool data across species, and conduct an MCDS analysis, with species as a factor-type covariate. The Montrave case study on the book website explains how to do this.

In Table 5.8, we show estimates of the effective radius for each species, both for independent analyses by species and for the above MCDS analysis. AIC favoured the hazard-rate model over the half-normal model for the MCDS analysis. For comparability, for the independent analyses, just the half-normal and hazard-rate models were fitted, with no adjustment terms.

Table 5.8 Estimates \(\hat{\rho }\) of effective detection radius (m) for the snapshot point transect method for independent analyses of each species, and for MCDS with species as a factor. (Coefficients of variation in parentheses.)

We see that point estimates of effective radius are sensitive to choice of model, but not to whether a pooled MCDS analysis is used. The pooled analysis gives a smaller coefficient of variation for all but one of the eight comparisons (four species, two models for each species). Thus the MCDS analysis gives better precision, at the expense of having to assume the same model (hazard-rate or half-normal) for all species. We suggest that data should be pooled across species when multi-species surveys are carried out, but that the pooling should be restricted to sets of species with similar characteristics in terms of their detectability, so that the detection function can be expected to have a similar shape. Those characteristics might reflect song frequency, behaviour, whether the species typically occupies the canopy or the ground stratum, etc.
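One way to picture the pooled MCDS analysis is that the species factor enters the scale parameter of Eq. (5.39) through dummy variables, so all species share the key function's shape but receive species-specific scales. The coefficients below are illustrative, not the fitted Montrave values.

```python
import math

# Species as a factor covariate: the first level is the baseline, and
# each remaining level gets a dummy (0/1) variable with its own beta.
species_levels = ["chaffinch", "great tit", "robin", "wren"]

def sigma_species(alpha, betas, species):
    """sigma = exp(alpha + beta_species), baseline = first factor level."""
    z = [1.0 if species == s else 0.0 for s in species_levels[1:]]
    return math.exp(alpha + sum(b * zq for b, zq in zip(betas, z)))

alpha = math.log(55.0)        # baseline scale in metres (illustrative)
betas = [0.15, -0.10, -0.25]  # offsets for the remaining levels (illustrative)

for sp in species_levels:
    print(sp, round(sigma_species(alpha, betas, sp), 1))
```

A single shared shape with species-specific scales is what makes the pooled fit more parsimonious than four separate detection functions.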

3.2.3 Cork Oaks

In Sect. 5.2.2.4, we saw that for line transect sampling of cork oaks, detectability varied by tree type (seedling, sapling or young tree). Here, we seek to model heterogeneity in detection probabilities using MCDS. Tree type is an obvious covariate to include, and is a factor with three levels, corresponding to the three tree types. Also recorded was tree height, a continuous covariate which may more directly relate to detectability than does tree type. We also consider whether the type of plot (‘conservation’ or ‘management’, and so a factor with two levels) affects detectability. Using a truncation distance of w = 4 m, the hazard-rate model (without adjustment terms) gave substantially better fits than the half-normal model, and we show AIC values for each possible hazard-rate model in Table 5.9.

Table 5.9 Density estimates \(\hat{D}\) for each tree type under hazard-rate models for the detection function

We see that all models give good estimates of overall density (a consequence of pooling robustness), and, apart from those with neither height nor tree type, all models give estimates of the density of each tree type that are in reasonable agreement with those of Table 5.3. The models with small ΔAIC values all have height in them. While there is some evidence for including either tree type or plot type, or both, as well as height, the model with just height seems adequate. We show the fitted detection functions at the median height, and also at the lower and upper quartiles, in Fig. 5.17. As there are relatively few young trees (far less than 25 %), these plots do not give an indication of the fitted detection function for young trees. In Fig. 5.18, we show the comparable plots for each tree type, obtained from the model with both height and tree type. We see that the model estimates that almost all young trees of height 90 cm or more and within 4 m of the transect are detected.

Fig. 5.17

Fitted detection function corresponding to the median (13 cm) and lower (8 cm) and upper (30 cm) quartile heights of trees pooled across categories (seedlings, saplings and young trees)

Fig. 5.18

Fitted detection function corresponding to the median and lower and upper quartile heights of trees for each tree type. Panel (a) depicts seedlings, panel (b) saplings and panel (c) young trees

From the cork-oak study, we construct a small simulation study to assess the robustness of the estimate of overall abundance when the presence of the three types of trees (seedlings, saplings and young trees) is ignored by the analyst. We fitted the hazard-rate detection function to the cork oak data with tree type as a factor using program Distance with truncation distance w = 4 m, giving values of the scale parameter σ of 13.13, 0.90 and 0.65, corresponding to young trees, saplings and seedlings respectively, and a common shape parameter of 1.21.

We performed 500 replicates of the simulation with a true density of 1350 trees ha−1. Simulated trees were assigned one of the three age-specific values of σ with equal probabilities. Candidate models (ignoring size classes whether as strata or as a factor covariate with three levels) were fitted to each replicate set of detection perpendicular distances. The candidate models were hazard-rate key with Hermite adjustment terms or half-normal key with cosine adjustments. Detection distances in the simulated data were truncated at w = 4 m. This resulted in a percent relative bias of −4.4 %, confidence interval coverage of 92 %, and a distribution of estimates shown in Fig. 5.19.

Fig. 5.19

Distribution of density estimates of cork oaks from 500 replicates when data are simulated from hazard-rate detection functions with differing values of σ for three age classes. True density (shown by vertical line) is 1350 trees ha−1

This simulation was performed using the R package DSsim and is included as a case study on the book website. The amount of heterogeneity in detectability in the simulation is evident from comparing the top panel of Fig. 5.18 with the bottom panel: for young trees, probability of detection is still close to one at 4 m, while for seedlings, probability of detection has fallen to around 0.25 at 2 m. As indicated in Sect. 5.1.1, this extreme amount of heterogeneity produces only small bias in the population-level estimate of density. This is in contrast to mark-recapture methods, as described by Link (2003, 2004), where any unmodelled heterogeneity may produce considerable bias in estimates of abundance.
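Using the scale values quoted above (σ = 13.13, 0.90 and 0.65, common shape b = 1.21, w = 4 m), we can compute the type-specific detection probabilities directly; with equal proportions of the three types, as in the simulation, the population-level \(P_{a}\) is just their average. This is a sketch of the heterogeneity present in the simulation, not the simulation itself.

```python
import math

def g_hr(x, sigma, b):
    """Hazard-rate detection function."""
    if x == 0:
        return 1.0  # limiting value at the line
    return 1 - math.exp(-(x / sigma) ** (-b))

def Pa(sigma, b, w, n=20000):
    """P_a = (1/w) * integral_0^w g(x) dx, via the midpoint rule."""
    h = w / n
    return sum(g_hr((k + 0.5) * h, sigma, b) for k in range(n)) * h / w

w, b = 4.0, 1.21
sigmas = {"young trees": 13.13, "saplings": 0.90, "seedlings": 0.65}

per_type = {name: Pa(s, b, w) for name, s in sigmas.items()}
# With equal proportions of the three types, the population-level
# detection probability is the simple average:
pooled = sum(per_type.values()) / 3
print({k: round(v, 3) for k, v in per_type.items()}, round(pooled, 3))
```

The type-specific curves reproduce the contrast described in the text: detection of young trees remains near one out to 4 m, while detection of seedlings drops to roughly 0.25 by 2 m.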

4 Mark-Recapture Distance Sampling

4.1 Modelling the Detection Function

Mark-recapture distance sampling (MRDS; Borchers et al. 2006; Laake and Borchers 2004) is similar to MCDS, in that probability of detection is modelled as a function of both distance from the line or point and other covariates. However, we now remove the assumption that detection at the line or point is certain (g(0) = 1). Burt et al. (2014) provide a non-technical explanation of MRDS methods.

Unfortunately, conventional distance sampling data do not provide information to allow estimation of g(0), and so additional data are needed. We consider several ways in which the additional data might be collected below. Each method generates a form of mark-recapture data, which allows probability of detection to be estimated, without having to assume g(0) = 1. A disadvantage of this approach is that the pooling robustness property (Sect. 5.1.1) does not hold; in common with standard mark-recapture methods, unmodelled heterogeneity in probability of detection generates positive bias in the estimated probability of detection and negative bias in abundance estimates. For this reason, we do not consider MRDS models with no covariates other than distance.

One option for generating the mark-recapture data that we need is to have two observers, searching the same area independently but simultaneously. If one observer detects an animal, then we can conceptualize this as a trial, in which the second observer either ‘recaptures’ (i.e. detects) the animal or not. Thus we need to be able to identify ‘duplicate’ detections: those animals detected by both observers.

In the above arrangement, there is a symmetry: conceptually, each observer sets up trials for the other. For line transect sampling especially, we might instead have one observer setting up the trials (e.g. by searching further ahead with optical aids and tracking animals in, Hammond et al. 2002), and the other carrying out standard line transect search. A tracked animal that is detected by the second observer is a recapture.

Mark-recapture trials might be set up by having identifiable animals at known locations (e.g. radio-tagged animals), and recording whether a single observer detects these animals. This might be during the course of the main survey, or by setting up trials, so that the observer is directed along a route or to a point such that the known-location animal is within detection range. A model may then be fitted to the trial data, predicting the probability that a given animal is detected as a function of distance and of other relevant covariates, such as animal behaviour, cluster size for clustered populations, weather conditions, observer, etc.

In MRDS, it is convenient to scale the function g(y, z) so that g(0, z) = 1 by definition. We then denote the detection function by p(y, z), which represents the probability that an animal at distance y from the line or point and with covariates z is detected. Thus p(0, z) ≤ 1.

When there are two observers, we need to distinguish between their respective detection functions. We denote the detection function for observer 1 by p 1(y, z), and that for observer 2 by p 2(y, z). We will also need further detection functions: p 1 | 2(y, z) is the probability that observer 1 detects an animal at distance y and covariates z given that observer 2 detects it, and the converse: p 2 | 1(y, z). Finally, we need to define the probability that at least one observer detects the animal: \(p_{\cdot }(y,\mathbf{z})\).

These different detection functions are related as follows:

$$\displaystyle\begin{array}{rcl} p_{\cdot }(y,\mathbf{z})& =& p_{1}(y,\mathbf{z}) + p_{2}(y,\mathbf{z}) - p_{2}(y,\mathbf{z})p_{1\vert 2}(y,\mathbf{z}) \\ & =& p_{1}(y,\mathbf{z}) + p_{2}(y,\mathbf{z}) - p_{1}(y,\mathbf{z})p_{2\vert 1}(y,\mathbf{z})\ .{}\end{array}$$
(5.50)

Early models for MRDS assumed full independence. Under this assumption, Eq. (5.50) becomes

$$\displaystyle{ p_{\cdot }(y,\mathbf{z}) = p_{1}(y,\mathbf{z}) + p_{2}(y,\mathbf{z}) - p_{1}(y,\mathbf{z})p_{2}(y,\mathbf{z})\ . }$$
(5.51)

Unfortunately, this approach is typically biased due to unmodelled heterogeneity in the probabilities of detection. Even though the approach incorporates covariates z in an attempt to model heterogeneity, the bias from this source is often greater than the bias that arises by assuming that detection at the line or point is certain. Laake (1999) realised that a weaker assumption is to assume independence at the line or point only, termed point independence. At the line or point, probability of detection is typically high, and heterogeneity consequently less problematic. Under point independence, we have:

$$\displaystyle{ p_{\cdot }(0,\mathbf{z}) = p_{1}(0,\mathbf{z}) + p_{2}(0,\mathbf{z}) - p_{1}(0,\mathbf{z})p_{2}(0,\mathbf{z})\ . }$$
(5.52)

For y > 0, we can write

$$\displaystyle\begin{array}{rcl} p_{\cdot }(y,\mathbf{z})& =& p_{1}(y,\mathbf{z}) + p_{2}(y,\mathbf{z}) - p_{12}(y,\mathbf{z}) \\ & =& p_{1}(y,\mathbf{z}) + p_{2}(y,\mathbf{z}) -\delta (y,\mathbf{z})p_{1}(y,\mathbf{z})p_{2}(y,\mathbf{z})\ .{}\end{array}$$
(5.53)

where

$$\displaystyle{ p_{12}(y,\mathbf{z}) = p_{1\vert 2}(y,\mathbf{z})p_{2}(y,\mathbf{z}) = p_{2\vert 1}(y,\mathbf{z})p_{1}(y,\mathbf{z}) }$$
(5.54)

is the probability that both observers detect an animal at distance y and with covariates z. Hence we can write

$$\displaystyle{ \delta (y,\mathbf{z}) = \frac{p_{12}(y,\mathbf{z})} {p_{1}(y,\mathbf{z})p_{2}(y,\mathbf{z})} = \frac{p_{1\vert 2}(y,\mathbf{z})} {p_{1}(y,\mathbf{z})} = \frac{p_{2\vert 1}(y,\mathbf{z})} {p_{2}(y,\mathbf{z})} }$$
(5.55)

Thus we now need to specify a model for δ(y, z), such that δ(0, z) = 1, and we expect δ(y, z) ≥ 1 for y ≥ 0.
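We can check numerically that Eq. (5.53) reduces to the point-independence form of Eq. (5.52) at y = 0. The observer detection functions and the linear form for δ(y) below are hypothetical choices satisfying δ(0) = 1, not fitted models.

```python
import math

def p_hn(y, p0, sigma):
    """Observer-specific detection function p(y) = p0 * exp(-y^2 / 2 sigma^2);
    detection at the line (y = 0) is p0, not necessarily 1."""
    return p0 * math.exp(-y**2 / (2 * sigma**2))

def delta(y, c=0.01):
    """A hypothetical dependence model with delta(0) = 1 and delta >= 1."""
    return 1.0 + c * y

def p_dot(y, p1_pars, p2_pars):
    """Probability that at least one observer detects the animal (Eq. 5.53)."""
    p1, p2 = p_hn(y, *p1_pars), p_hn(y, *p2_pars)
    return p1 + p2 - delta(y) * p1 * p2

obs1, obs2 = (0.9, 30.0), (0.8, 25.0)  # illustrative (p0, sigma) pairs

# At y = 0, delta = 1 and Eq. (5.53) reduces to Eq. (5.52):
# 0.9 + 0.8 - 0.9 * 0.8 = 0.98.
print(p_dot(0.0, obs1, obs2))
```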

If we take the above point independence argument to its logical conclusion, then it is clear that independence holds in the limit as probability of detection tends to one; there can be no heterogeneity in probability of detection if all probabilities are one. This is the concept of limiting independence (Buckland et al. 2010). In principle, this allows estimation with lower bias, but in practice, the data often provide too little information to allow such models to be fitted reliably. However, the method does provide a framework for assessing formally whether a point independence model or a full independence model provides an adequate fit to the data, for example by comparing AIC values for models with full independence, point independence and limiting independence (Sect. 5.4.2.2).

When formulating the likelihood component corresponding to the detection function for MCDS, we found it convenient to condition on the covariates z, as we did not know the distribution of z in the population. The same applies here, and so we again have

$$\displaystyle{ \mathcal{L}_{y\vert z} =\prod _{ i=1}^{n}f_{ y\vert \mathbf{z}}(y_{i}\vert \mathbf{z}_{i})\, }$$
(5.56)

which we can write as

$$\displaystyle{ \mathcal{L}_{y\vert z} =\prod _{ i=1}^{n}\frac{p_{\cdot }(y_{i},\mathbf{z}_{i})\pi (y_{i}\vert \mathbf{z}_{i})} {E(p_{\cdot }\vert \mathbf{z}_{i})} }$$
(5.57)

where \(E(p_{\cdot }\vert \mathbf{z}_{i}) =\int p_{\cdot }(y,\mathbf{z}_{i})\pi (y\vert \mathbf{z}_{i})\,dy\), and \(\pi (y_{i}\vert \mathbf{z}_{i})\) is the conditional probability density function of y given z i , evaluated at y i . Given a randomized survey design, we can generally assume that \(\pi (y\vert \mathbf{z}_{i}) \equiv \pi (y)\), independent of the covariates z i ; then we have \(\pi (y) = 1/w\) with 0 ≤ y ≤ w for line transect sampling and \(\pi (y) = 2y/w^{2}\) for 0 ≤ y ≤ w for point transect sampling.

However, we now also have a mark-recapture component to the likelihood. We denote the capture history of an animal by the vector \(\boldsymbol{\omega }\), comprising two elements, each of which is zero or one. For an animal detected by observer 1 but not observer 2, \(\boldsymbol{\omega } = (1,0)\), while for an animal detected by observer 2 but not observer 1, \(\boldsymbol{\omega } = (0,1)\). For an animal detected by both observers, \(\boldsymbol{\omega } = (1,1)\). Note that the capture history \(\boldsymbol{\omega } = (0,0)\) is not observed, and so does not appear in the likelihood. We can now write

$$\displaystyle{ \mathcal{L}_{\omega } =\prod _{ i=1}^{n}\mathrm{Pr}\left (\boldsymbol{\omega }_{ i}\vert \mathrm{detected}\right ) =\prod _{ i=1}^{n} \frac{\mathrm{Pr}\left (\boldsymbol{\omega }_{i}\right )} {p_{\cdot }(y_{i},\mathbf{z}_{i})} }$$
(5.58)

where

$$\displaystyle\begin{array}{rcl} & \mathrm{Pr}\left (\boldsymbol{\omega }_{i} = (1,0)\right ) = p_{1}(y_{i},\mathbf{z}_{i})\left (1 - p_{2\vert 1}(y_{i},\mathbf{z}_{i})\right )\, & {}\\ & \mathrm{Pr}\left (\boldsymbol{\omega }_{i} = (0,1)\right ) = p_{2}(y_{i},\mathbf{z}_{i})\left (1 - p_{1\vert 2}(y_{i},\mathbf{z}_{i})\right )\, & {}\\ & \mathrm{Pr}\left (\boldsymbol{\omega }_{i} = (1,1)\right ) = p_{1}(y_{i},\mathbf{z}_{i})p_{2\vert 1}(y_{i},\mathbf{z}_{i}) = p_{2}(y_{i},\mathbf{z}_{i})p_{1\vert 2}(y_{i},\mathbf{z}_{i})\ .& {}\\ \end{array}$$

By substituting these expressions into Eq. (5.58) and rearranging, we obtain:

$$\displaystyle{ \mathcal{L}_{\omega } = \left \{\prod _{i=1}^{n}\left [\frac{p_{2}(y_{i},\mathbf{z}_{i})} {p_{\cdot }(y_{i},\mathbf{z}_{i})} \right ]^{\omega _{2i}}\left [\frac{p_{1}(y_{i},\mathbf{z}_{i})\left (1 - p_{2\vert 1}(y_{i},\mathbf{z}_{i})\right )} {p_{\cdot }(y_{i},\mathbf{z}_{i})} \right ]^{1-\omega _{2i}}\right \} }$$
$$\displaystyle{ \times \left \{\prod _{i=1}^{n_{2}}p_{1\vert 2}(y_{i},\mathbf{z}_{i})^{\omega _{1i}}\left (1 - p_{1\vert 2}(y_{i},\mathbf{z}_{i})\right )^{1-\omega _{1i}}\right \} }$$
(5.59)

where \(n_{2}\) is the number of animals detected by observer 2, and \(\omega _{1i}\) and \(\omega _{2i}\) are the elements of \(\boldsymbol{\omega }_{i}\), with \(\omega _{1i} = 1\) if animal i was detected by observer 1 and \(\omega _{1i} = 0\) otherwise, and similarly for \(\omega _{2i}\). The first product is the likelihood arising from whether or not observer 2 detected each of the n recorded animals, given that at least one of the observers detected them, and the second is the likelihood arising from whether or not observer 1 detected the \(n_{2}\) animals detected by observer 2, given that observer 2 detected them.
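As a consistency check, the three observable capture-history probabilities must sum to \(p_{\cdot }(y,\mathbf{z})\) of Eq. (5.50) at any fixed distance and covariate values. A quick sketch with illustrative probabilities:

```python
def history_probs(p1, p2, p2g1, p1g2):
    """Capture-history probabilities at fixed (y, z).
    p2g1 = p(2|1), p1g2 = p(1|2); consistency requires p1*p2g1 == p2*p1g2."""
    pr_10 = p1 * (1 - p2g1)  # detected by observer 1 only
    pr_01 = p2 * (1 - p1g2)  # detected by observer 2 only
    pr_11 = p1 * p2g1        # detected by both (a duplicate)
    return pr_10, pr_01, pr_11

# Illustrative values satisfying p1 * p(2|1) = p2 * p(1|2) = p12:
p1, p2, p12 = 0.7, 0.6, 0.5
pr_10, pr_01, pr_11 = history_probs(p1, p2, p12 / p1, p12 / p2)

# The three observable histories sum to p.(y,z) of Eq. (5.50); the
# unobservable history (0,0) accounts for the remaining probability.
p_dot = p1 + p2 - p12
print(pr_10 + pr_01 + pr_11, p_dot)  # both 0.8
```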

We can now specify models for the detection functions, assuming full, point or limiting independence, and fit them, for example by maximizing the product of likelihoods \(\mathcal{L}_{y\vert z} \times \mathcal{L}_{\omega }\). See Laake and Borchers (2004) and Borchers et al. (2006) for full details. As noted by Burt et al. (2014), if we wish to use a full independence model, we could base inference on the mark-recapture component \(\mathcal{L}_{\omega }\) alone, including distance as a covariate.

4.2 MRDS Case Studies

4.2.1 Antarctic Crabeater Seals

To illustrate fitting a detection function using double-observer (i.e. mark-recapture distance sampling) methods, we consider a crabeater seals dataset collected during an Antarctic survey. The data were analysed by Borchers et al. (2006) to show use of covariates to model heterogeneity in mark-recapture distance sampling.

The survey used a helicopter to fly line transects over the ice. Front and rear observation platforms operated simultaneously and independently. There were two observers on each platform, one on each side of the helicopter. Detection distances up to 800 m were recorded. However, the flat windows of the helicopter prevented effective searching within 100 m of the line. The data have therefore been left-truncated, by removing all detections recorded as < 100 m, and subtracting 100 m from the remaining distances. Thus we take \(w = 800 - 100 = 700\) m.
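The left-truncation step can be sketched as follows; this is a toy illustration with hypothetical distances, and the function name is ours.

```python
import numpy as np

def left_truncate(distances, left=100.0, right=800.0):
    """Left-truncate at `left`, drop detections beyond `right`,
    and re-origin so distances lie on [0, w] with w = right - left."""
    d = np.asarray(distances, float)
    kept = d[(d >= left) & (d <= right)]
    return kept - left

# Hypothetical recorded distances (m): 50 is dropped (< 100 m),
# 820 is dropped (> 800 m); the rest are shifted by 100 m.
shifted = left_truncate([50, 120, 450, 799, 820])  # -> [20, 350, 699]
```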

Several covariates in addition to distance were available to model detectability. These included factor covariates observer identity (obs, 12 levels), group size (size, three levels), visibility (vis, three levels), ice cover (ice, three levels), platform (two levels), presence of glare (glare, two levels) and helicopter side (side, two levels). Continuous covariates observer experience (exp), time in minutes since start of flight (ssmi) and time in minutes on effort since last rest (fat) were also recorded.

The detection distances are shown in Fig. 5.20. In total, 1740 unique groups were recorded. The front platform detected 1394 groups and the rear platform detected 1471; 1125 groups were detected by both platforms. The distance data appear well-behaved, showing a smoothly decreasing number of detections with distance from the transect line.

Fig. 5.20

Perpendicular distances to detected groups of crabeater seals collected during an Antarctic survey

A standard line transect sampling analysis assuming g(0) = 1 was carried out. The half-normal model was preferred over the hazard-rate model, and gave a point estimate of 2976 seal groups within the covered strips.

If we ignore the distance data and the other covariates, the two-sample Lincoln-Petersen capture–recapture model yields a much lower estimate: (1471 × 1394)∕1125 = 1822 seal groups. This illustrates the danger from unmodelled heterogeneity when using a mark-recapture approach. Even though we have used a method that does not assume that all animals on the line are detected, we obtain a much smaller abundance estimate than was found using distance sampling and assuming certain detection on the line: g(0) = 1.
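The Lincoln-Petersen calculation is simple enough to verify directly; the function name is ours.

```python
def lincoln_petersen(n1, n2, m):
    """Two-sample Lincoln-Petersen abundance estimate.

    n1, n2 : numbers of groups detected by each platform
    m      : number of groups detected by both (the 'recaptures')

    Unmodelled heterogeneity in detectability inflates the overlap m
    relative to independence, so the estimate is biased low.
    """
    return n1 * n2 / m

# Crabeater seal counts from the text: ~1822.7 groups,
# far below the CDS estimate of 2976 groups
n_hat = lincoln_petersen(1394, 1471, 1125)
```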

Note that estimates given by Borchers et al. (2006), although said to be estimated numbers of seals, are in fact estimates of the number of seal groups. The mean observed group size was 1.18 animals. There are also some minor differences in AIC values between our results and those reported by Borchers et al. (2006). The data used here are available on the book website and the results obtained can therefore be easily reproduced using the code provided.

We next explicitly model the double-platform data. If we ignore any available covariates apart from distance, and fit a full independence model (Fig. 5.21), the point estimate for the number of groups in the covered area is 1918 (Table 5.10). We have substantial underestimation compared with the CDS estimate, again suggesting a severe problem from unmodelled heterogeneity. When a point independence model is used (Fig. 5.22), the estimated number of groups increases to 3012, which is in much better agreement with the CDS estimate. Perhaps not surprisingly, AIC strongly favours the point independence model over the full independence model. Also not surprisingly given the fits shown in Figs. 5.21 and 5.22, the goodness-of-fit tests show a significant lack of fit for the full independence model, while the point independence model provides an adequate fit. Also, the full independence model gives the lowest standard error: we gained precision at the expense of large bias.

Fig. 5.21

Fitted detection functions under the full independence model without covariates. Top left: estimated detection function for platform 1, scaled so that g(0) = 1, plotted on a scaled histogram of distances to seal groups detected from platform 1. Top middle: equivalent plot for platform 2. Top right: estimated detection function for the two platforms combined, scaled so that g(0) = 1, plotted on a scaled histogram of distances to seal groups detected by at least one platform. Bottom left: estimated probability that a seal group is detected from both platforms as a function of distance, plotted on a scaled histogram of distances to duplicate detections. Bottom middle: estimated probability that a seal group detected from platform 2 is also detected from platform 1, plotted on a histogram showing proportion of seal groups detected from platform 2 that were also detected from platform 1. Bottom right: equivalent plot for platform 2 given platform 1

Table 5.10 The crabeater seal double-observer results
Fig. 5.22

Fitted detection functions under the point independence model without covariates. See Fig.  5.21 for explanation of the plots

For a full independence analysis, the estimated detection functions for platform 1 conditional on detection from platform 2, and vice versa (Fig. 5.21, bottom right and middle plots), are the same as the unconditional detection functions for platforms 1 and 2 (Fig. 5.21, top left and middle plots). For the point independence models, the conditional detection functions differ from the unconditional ones (Fig. 5.22, bottom right and middle plots compared with top left and middle plots). The distance sampling component estimates the probability of detection for the pooled detections (Fig. 5.22, top right plot), while the mark-recapture component determines the height of the distance sampling detection function at distance zero (which under CDS would equal one by assumption).

Point independence models are strongly favoured over full independence ones. We can investigate which covariates help to explain heterogeneity in detection probabilities. Using a stepwise approach, and including at each step the covariate that gives the biggest reduction in AIC, we obtain a model which includes all the covariates available except ssmi; adding ssmi too gives a slight increase in AIC (Table 5.10). It is reassuring to see that the results are insensitive to the specific point independence model considered, despite large differences in AIC. Given that the point estimate of p(0) for both platforms combined is 0.99 (Table 5.10), this is unsurprising, as we can expect pooling robustness (Sect. 5.1.1) to ensure insensitivity, when p(0) is very close to one. This ties in with the concept of limiting independence (Sect. 5.4.1): the idea that in the limit as probability of detection tends to one, we can assume independence of detections. We illustrate this concept in the next example.
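The stepwise covariate selection described above can be sketched generically. This is our own schematic, not the actual seal analysis: the helper name and the toy AIC function in the usage example are invented for illustration.

```python
def forward_select_by_aic(candidates, fit):
    """Greedy forward selection: at each step add the covariate that
    most reduces AIC; stop when no addition improves it.

    candidates : iterable of covariate names
    fit        : function mapping a list of covariates to an AIC value
    """
    selected, remaining = [], set(candidates)
    best_aic = fit(selected)
    while remaining:
        trials = {c: fit(selected + [c]) for c in remaining}
        c, aic = min(trials.items(), key=lambda kv: kv[1])
        if aic >= best_aic:
            break  # e.g. adding ssmi increased AIC, so stop
        selected.append(c)
        remaining.discard(c)
        best_aic = aic
    return selected, best_aic

# Toy AIC surface: 'a' and 'b' help, 'c' hurts (purely hypothetical)
fake_aic = lambda covs: 100 - sum({'a': 10, 'b': 5}.get(c, -5) for c in covs)
chosen, aic = forward_select_by_aic(['a', 'b', 'c'], fake_aic)  # ['a', 'b'], 85
```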

4.2.2 Minke Whales in the North Sea

We use the northern minke whale (Balaenoptera acutorostrata) data analysed by Buckland et al. (2010) to illustrate limiting independence models. The second Small Cetacean Abundance in the North Sea and adjacent waters (SCANS II) survey was a multinational survey conducted in 2005 by ship and aircraft. Double-observer line transect methods were used because, for many species, probability of detection of animals on the trackline was expected to be less than one. Here we analyse only the shipboard survey data on minke whales. The methods used in the SCANS surveys were designed to break up the dependence between the two observers by ensuring that they are not simultaneously searching the same patch of sea: a tracker scans with high-powered binoculars well ahead of the ship, and tracks detected animals as they approach, to check whether the primary platform, searching with hand-held binoculars and naked eye, detects them.

Using a truncation distance of 700 m, the tracker detected 54 minke groups totalling 62 animals, while the primary platform detected 57 groups totalling 59 animals; 17 groups (19 animals) were detected by both tracker and primary platform.

We take as our full model:

$$\displaystyle{ p_{j}(y,z) = \frac{\exp \left (\lambda _{0j} +\lambda _{1j}y +\lambda _{2j}z\right )} {1 +\exp \left (\lambda _{0j} +\lambda _{1j}y +\lambda _{2j}z\right )} }$$
(5.60)

where j = 1, 2 indicates platform, and

$$\displaystyle{ \delta (y,z) = L(y,z) +\delta _{0}(y,z)\left [U(y,z) - L(y,z)\right ] }$$
(5.61)

where

$$\displaystyle{ L(y,z) =\max \left \{0, \frac{p_{1}(y,z) + p_{2}(y,z) - 1} {p_{1}(y,z)p_{2}(y,z)} \right \}\, }$$
(5.62)
$$\displaystyle{ U(y,z) =\min \left \{ \frac{1} {p_{1}(y,z)}, \frac{1} {p_{2}(y,z)}\right \}\, }$$
(5.63)

and

$$\displaystyle{ \delta _{0}(y,z) = \frac{[1 - L(y,z)]\exp (\alpha +\beta y)} {[U(y,z) - 1] + [1 - L(y,z)]\exp (\alpha +\beta y)}\ . }$$
(5.64)

The covariate z is scalar in this example, and represents sea state (Beaufort).

We can use maximum likelihood methods to fit this full model, and a range of reduced models: with \(\lambda _{2j} = 0\) for j = 1, 2 (models with no covariate z), with β = 0 (also a limiting independence model, but with less flexibility), with α = 0 and β ≠ 0 (a point independence model), and with \(\alpha =\beta = 0\) (a full independence model). We can also remove the dependence of the λ’s on j, so that the same detection function model is assumed for both platforms. We return to this case study in Sect. 6.4.4.4, where we show abundance estimates under each model.
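Equations (5.60)–(5.64) translate directly into code. This is a sketch under our own parameter layout (not the authors' implementation); note that setting \(\alpha =\beta = 0\) recovers δ = 1, i.e. full independence.

```python
import math

def logistic_p(y, z, lam0, lam1, lam2):
    """Eq. (5.60): logistic model for p_j(y, z) for one platform."""
    eta = lam0 + lam1 * y + lam2 * z
    return 1.0 / (1.0 + math.exp(-eta))

def delta(y, z, params):
    """Eqs. (5.61)-(5.64): dependence function delta(y, z).

    params = (lam01, lam11, lam21, lam02, lam12, lam22, alpha, beta);
    the flat tuple layout is our own illustrative choice.
    """
    l01, l11, l21, l02, l12, l22, alpha, beta = params
    p1 = logistic_p(y, z, l01, l11, l21)
    p2 = logistic_p(y, z, l02, l12, l22)

    # Bounds implied by 0 <= Pr(both detect) <= min(p1, p2)
    L = max(0.0, (p1 + p2 - 1.0) / (p1 * p2))   # Eq. (5.62)
    U = min(1.0 / p1, 1.0 / p2)                 # Eq. (5.63)

    # Eq. (5.64): logistic interpolation between the bounds;
    # alpha = beta = 0 gives delta = 1 (full independence)
    e = math.exp(alpha + beta * y)
    d0 = (1.0 - L) * e / ((U - 1.0) + (1.0 - L) * e)
    return L + d0 * (U - L)                     # Eq. (5.61)
```

As α → ∞ the interpolation pushes δ to its upper bound U, and as α → −∞ to its lower bound L, so the reduced models above are nested within this parameterisation.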

4.2.3 Double-Observer Point Transects for Songbirds

In the same way that the assumption of perfect detectability on the transect line can be violated, animals can also be missed at distance zero from point transects. Double-observer methods can be employed with point transects to overcome this violation of classical point transect estimation. The analysis framework for double-observer point transects is the same as for line transects: there is a distance sampling component and a mark-recapture component to the likelihood. Covariates are included in the mark-recapture component to account for heterogeneity in the detection process, and there are models that accommodate both full and point independence between the observers.

Double-observer methods for point transect data that incorporate only distance as a covariate do not adequately model heterogeneity in the detection process and hence tend to produce estimates of density with negative bias, as occurs when g(0) < 1 but is assumed to equal one (Laake et al. 2011).

In songbird surveys conducted by Collier et al. (2013), 453 point transects were sampled with a fixed truncation distance of 100 m and a 5-min sampling visit to each point during spring of 2011 on the Fort Hood military base in Texas. Aural detections of singing males were recorded in binned distance categories. Data from endangered golden-cheeked warblers (Setophaga chrysoparia) and black-capped vireos (Vireo atricapilla) were recorded.

Several candidate models were fitted to the two datasets; we present a subset of the models fitted to the black-capped vireo data. These comprised full independence models (with only a mark-recapture component) and point independence models. For the point independence models, hazard-rate and half-normal key functions with no covariates were considered for the distance sampling component. In all models, the mark-recapture component included distance and observer as covariates: main effects of observer and distance, along with an observer × distance interaction, were considered in the candidate model set (Table 5.11).

Table 5.11 Model selection results for double-observer point transect survey of black-capped vireos

The goodness-of-fit \(\chi ^{2}\) statistic for the selected model (half-normal, distance × observer) shown in Fig. 5.23 was \(\chi ^{2} = 0.59\) with 2 df (p = 0.75) for the distance sampling component and \(\chi ^{2} = 7.88\) with 4 df (p = 0.10) for the mark-recapture component.
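As a check on the reported p-values, the upper-tail chi-square probability has a simple closed form for even degrees of freedom; this sketch (function name ours) reproduces them to rounding.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with
    even df, using P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    assert df % 2 == 0, "closed form shown here requires even df"
    h = x / 2.0
    return math.exp(-h) * sum(h**k / math.factorial(k) for k in range(df // 2))

p_ds = chi2_sf(0.59, 2)  # distance sampling component: ~0.74
                         # (0.75 in the text, consistent with an unrounded statistic)
p_mr = chi2_sf(7.88, 4)  # mark-recapture component: ~0.10
```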

Fig. 5.23

Unconditional detection function for pooled observers fitted to point transect data of black-capped vireo males from Fort Hood in 2011. The model assumed point independence, with a mark-recapture component including main effects of distance and observer and their interaction. Note that the y-intercept corresponds to \(\hat{g}(0) = 0.81\). From Collier et al. (2013)