As methods developers and data analysts, we often see data long after they have been collected. In some instances, no amount of clever analysis can retrieve a study that has fallen foul of poor field methods. In this chapter, we define what is meant by ‘poor field methods’, and by contrast, what constitutes ‘good’ field methods; and we describe means by which ‘poor’ field methods can be diagnosed and remedied before data collection is completed.

There are two aspects to fieldwork: preparation before entering the field, and data collection. Allocation of sampling effort that fails to respect the tenets of design-based inference (i.e. locations sampled are representative of locations over which inference is to be drawn) will produce flawed results.

Difficulties will also arise if the design assumption that animals are distributed independently of the line or point (see Sect. 1.7) is not met. For these reasons, sampling for example near roads or easily accessible areas is likely to result in biased estimates of abundance. If a suitably randomized scheme is not possible, investigators should assess the potential for bias arising from failure of this assumption prior to commencing data collection.

Assumptions exist to make data analysis easier. The need to meet key assumptions necessitates careful attention during data collection. As we will see in Chap. 11, when assumptions cannot be met because of challenges in data collection, assumptions can be exchanged for the collection of additional data.

1 Field Methods

Field methods employed during data collection should be developed with the three key model assumptions of Sect. 1.7 in mind: that animals on the line or point should be detected with certainty, that animals should be detected at their initial location before any movement, and that distances should be measured without error. It is important that observers are trained to understand these assumptions, and how to ensure that they are met.

To ensure that all animals at or very close to the line or point are detected, the observer might move slowly and quietly (line transects), or remain longer at the point (point transects). Another option is to have more than one observer, to increase probability of detection. Technology such as thermal imagers or acoustic sensors can help to ensure that this assumption is met. In marine surveys for example where observers have a tendency to search too far out at the expense of missing animals close to the line, training needs to stress the importance of not missing such animals. The principal aim of the observer is not to maximize the number of detections or to compete with other observers, but to ensure that animals at the line or point are not missed.

Animal movement, whether responsive or not, may be problematic. Field methods should seek to allow observers to detect animals before responsive movement occurs, or if the animal is detected because it flushes, then the location recorded should be where it flushed from. Non-responsive movement should be slow on average relative to the speed of the observer. This may conflict with the requirement that probability of detection for animals on the line or point is certain; the need for certain detection may require that observers move slowly, while animal movement may dictate that observers move quickly. If these conflicting needs cannot be reconciled, then more complex methods may be required, for example double-observer methods (Sect. 4.1.2.4) so that we do not have to assume that animals on the line or point are certain to be detected.

Technology has reduced the difficulties in estimating distances accurately. We will return to this issue in the appropriate sections below.

1.1 Point Transect Sampling

Errors in estimating distances generate appreciably greater bias for point transect sampling than for line transect sampling (Sect. 11.3). It is important therefore to give observers effective training, and laser rangefinders should always be provided. Even for surveys in which most detections are aural, a laser rangefinder is invaluable in measuring the distance to an estimated location.

Point transect sampling is mostly used for songbird surveys. On arriving at a point, typically observers are trained to allow some time for birds to resume normal behaviour after the initial disturbance, and then to record for a fixed time, remaining at the point. Observer training is straightforward for this approach, but it has several difficulties. First, distance sampling is a snapshot method; conceptually, animals are assumed to be frozen at their initial location while the survey takes place. For line transect sampling, non-responsive movement (i.e. movement independent of the observer) generates little bias provided that movement is slow relative to the speed of the observer (Sect. 11.4). However, for point transect sampling, any movement during a count is problematic. Second, in closed habitats, most detections of songbirds tend to be aural, and for an observer who remains at the point, estimating the distance to such detections is difficult (Alldredge et al. 2008). Third, it may prove difficult to keep track of detected birds for the duration of the count, so that double-counting may occur, generating upward bias in density estimates. Fourth, the length of the count is a compromise in the hope that downward bias arising from failure to detect all birds at the point cancels with the upward bias from bird movement and double-counting. In multi-species surveys, the appropriate length of count will vary by species, and will be unknown.

Buckland (2006) proposed a snapshot field method. In it, the observer arrives at a point, and takes several minutes ahead of a snapshot moment to assess what is on the plot. The snapshot moment might be defined by setting a timer on arrival at the point. The observer then records the positions of birds at the snapshot moment. He or she may take as long as is necessary after the snapshot moment to confirm locations, and if a location cannot be confirmed, the bird is not recorded. After the snapshot moment, the observer may move up to say 15 m from the point, to help confirm locations, and to improve estimates of distances of birds from the point. This enables the observer to use triangulation to identify locations of birds that are only heard, or allows the observer to change his or her viewpoint, which may help in locating detected birds.

This snapshot method requires more training, so that observers fully understand what they need to achieve, and how best to use the greater flexibility. It assumes that a bird at or very near the point at the snapshot moment is certain to be located and recorded. If this is problematic, cue counting (Sect. 9.4) or mark-recapture distance sampling (Sects. 5.4.1 and 6.4.4) should be considered.

1.2 Line Transect Sampling

1.2.1 Terrestrial Surveys

The key design assumption of distance sampling is that lines are placed independently of animal locations. It is often difficult to achieve this in terrestrial surveys. A vehicle may be preferred to increase observer speed (thus reducing bias from non-responsive movement of animals) and sample size. However, this may restrict transect lines to roads and tracks (Sect. 11.1). Densities along roads and tracks may be unrepresentative, for example because of greater disturbance, or because roads and tracks avoid less accessible terrain and habitats, or because they create open space and edge habitat that may attract animals. It may prove necessary to do additional off-track transects by foot, to allow calibration of density estimates from track transects. In this case, overall precision might be greater if the whole survey is conducted on foot, despite less coverage and smaller samples, because of the imprecision in estimating the calibration factor for correcting track density estimates.

Access issues will also arise for surveys conducted on foot, either due to topography or because access is restricted. If it is impractical to conduct surveys along random transects, point transect sampling should be considered as an alternative: it is easier to reach a random point by the easiest or most accessible route than to follow the route of a random line, so that the ideal design is less compromised.

When detection distances are very small (e.g. 1–2 m as occurs in dung (Sect. 9.3) and plant (Sect. 10.7) surveys), it is important to have a marked line to measure to, as otherwise, there is a tendency to record short distances as zero. If a high proportion of detection distances is recorded as zero, reliable analysis of the data is not possible. A common strategy is for one person to pull a rope, or lay down a biodegradable thread, while another searches for objects and measures accurate distances.

When detection distances are larger, it is less important to have a marked line, but there should still be a means of locating the line reasonably accurately. This might be achieved for example by one fieldworker identifying the bearing, while another walks ahead, searching for animals. This helps to avoid bias for example when an observer deviates from the ideal transect to skirt around less accessible habitat; distances of detected animals should then be from the ideal line, not from the route taken by the observer.

Often with foot surveys, target animals hear the observer approaching, and slip away undetected. In these circumstances, field methods should be developed to avoid this as much as possible. For example in closed habitats, transects might be cut in advance, to allow quieter and faster passage of the observers. In such cases, the cut should be minimal, so as not to create an obvious path and hence greater disturbance. A minimal cut also prevents observers from seeing far along the transect; a long view would increase bias from non-responsive animal movement, because animals crossing the transect well ahead of the observer might be spotted and recorded as on the line.

Laser rangefinders should be considered an essential item of equipment for most terrestrial surveys.

1.2.2 Aerial Surveys

Special issues arise with aircraft surveys because of the speed of travel. On the positive side, non-responsive animal movement is slow relative to the speed of the observer, so that movement bias is not usually a problem. However, responsive movement may be a problem, if animals have time to move away from the path of the aircraft before being detected and recorded by observers who may have a restricted sideways-only view.

Because of the speed of the platform, it can be difficult to estimate or measure distances of detected animals from the line, and the task may become impossible in high-density areas. If the terrain is relatively flat, this can be addressed by placing markers on the wing struts, together with markers on the window (see Fig. 7.9 of Buckland et al. (2001)). When the observer aligns these markers, they define strips of known width and known distance from the transect. Observers then simply count the number of animals detected in each strip, and analysis of grouped distance data is carried out (Sect. 5.2.2.2). Optical clinometers can also be used to measure vertical sighting angle. Vertical sighting angle along with aircraft altitude can be trigonometrically converted to perpendicular distance from the transect.
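The trigonometric conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes flat terrain and assumes the clinometer angle is a declination measured downward from the horizontal; instruments that report the angle from the vertical would need the complementary angle. The function name is hypothetical.

```python
import math

def perpendicular_distance(altitude_m: float, declination_deg: float) -> float:
    """Horizontal distance from the transect to an animal on flat ground.

    Assumption: declination_deg is measured downward from the horizontal
    (close to 0 = near the horizon, 90 = directly below the aircraft).
    """
    if not 0.0 < declination_deg <= 90.0:
        raise ValueError("declination must be in (0, 90] degrees")
    # Right triangle: altitude is the opposite side, distance the adjacent side.
    return altitude_m / math.tan(math.radians(declination_deg))

# Illustrative values only: flying at 100 m, animal sighted 45 degrees below horizontal.
print(round(perpendicular_distance(100.0, 45.0), 1))
```

Because the distance changes rapidly with angle when the declination is small, measurement error in the angle matters most for animals far from the transect.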

Visibility below the aircraft is usually poor. Some aircraft are designed with bubble and/or belly windows for good downward visibility, as illustrated in Fig. 7.14 of Buckland et al. (2001). Without such aircraft, the transect is usually considered to be offset to a distance where visibility is good, for which it is reasonable to assume that probability of detection is certain. If search is conducted from both sides of the plane, there is thus a gap down the centre of the surveyed strip (Fig. 7.8 of Buckland et al. (2001)). Helicopters offer some advantages for aerial surveys because they give less obstructed views forward and below the aircraft, thereby reducing the need for an offset, as well as reducing the difficulties of responsive movement if the observer is able to detect animals before they respond to the aircraft.

If some animals are missed even if they are on the (possibly offset) transect line, double-observer methods (Sect. 4.1.2.4) may be used, which allows estimation of the detection function without having to assume certain detection at any distance (Sect. 5.4.1). Becker and Quang (2009) take this a step further. They do not offset the line, but model the fall-off in detectability under the aircraft.

Increasingly, visual observers are being replaced by high-resolution imagery (Sect. 10.2.2.2), together with strip transect methods (Sect. 6.2.2.1) for analysis. This allows the aircraft to fly higher, causing less disturbance, and fewer problems in areas with complex topography. Further, the data can be validated in a way that visual counts cannot. Technology is likely to result in substantial changes to how aerial surveys are conducted in the future (Sect. 12.2). It seems likely that many studies will soon be conducted using high-resolution imagery from drones (‘unmanned aerial vehicles’).

1.2.3 Shipboard Surveys

Shipboard surveys are typically costly, and so ship time should be used to full effect. This generally means having a number of observers, perhaps split into two teams, so that continuous search effort can be conducted during daylight hours. There should be sufficient observers on duty at any one time to ensure, if possible, that all animals close to the line are detected. If this is not feasible, then double-observer methods should be used (Sect. 4.1.2.4).

Animals often respond to a vessel, sometimes avoiding it and sometimes being attracted to it. Field methods should seek to ensure that such animals are detected before they respond. This might mean carrying out a behavioural study as part of a pilot study, to assess over what distance animals respond. This may reveal that observers should search with hand-held binoculars rather than by naked eye, or even that large tripod-mounted binoculars (see Fig. 7.5 of Buckland et al. (2001)) should be used. Such binoculars are especially effective if animals are more-or-less continuously visible, such as occurs with larger schools of dolphins, as they allow a wider strip to be searched, thus increasing sample size. For animals only intermittently available for detection, such as whales, porpoise or small groups of dolphin, or diving seabirds, the narrow field of view can result in animals close to the trackline being missed. Further, given the narrow field of view, it may be necessary to have multiple observers. They should be trained to ensure that their combined search pattern is effective, which may require some degree of overlap in search area of the observers.

Sighting conditions (e.g. sea state, glare) should be routinely recorded. It may prove necessary to discard data recorded in poor conditions. Porpoise for example can easily pass undetected even when very close, and the encounter rate drops dramatically in high sea states. If double-observer methods are used, such covariates can be included in the models to reduce bias arising from heterogeneity in detection probabilities. Other covariates that might be recorded for possible inclusion in the detection function model include observer identity, animal behaviour, and group size for clustered populations.

Laser rangefinders cannot measure distances on water, unless there is an object to hit, but they are still very useful in marine surveys for carrying out distance estimation experiments, or to give observers feedback on their ability to estimate distances, by measuring distances to floating objects. Reticles in binoculars are often the most effective way of estimating distances to detected animals at sea. These measure the distance down from the horizon, which is converted to distance from the observer (Buckland et al. 2001, pp. 256–258). These observer-to-animal distances r can be converted to distances x of detected animals from the transect provided that the sighting angle θ is recorded (Fig. 1.1). If angles are estimated by eye, considerable rounding can be expected. Especially problematic is that angles close to zero tend to be rounded to zero, resulting in a perpendicular distance of zero. Such systematic rounding to zero makes it very difficult to identify and fit a good model for the detection function. If high-powered tripod-mounted binoculars are used, angle rings on the tripod are a simple and accurate way to estimate angles, though many observers still have a tendency to round to the nearest 5°, unless trained not to! Otherwise, angle boards (see Fig. 7.10 of Buckland et al. (2001)) may be used: an arrow on the board is pointed at the location of a detected animal, and the corresponding angle read. Observers should be trained to record the angle accurately, not to round; even though there is imprecision in aligning the arrow, discouraging rounding of the resulting angle prevents systematic rounding of small angles to zero.
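The conversion from sighting distance r and sighting angle θ to perpendicular distance x is x = r sin θ. The short sketch below applies it and shows what rounding a small angle to zero does to the recorded distance. The function name and the numerical values are illustrative assumptions, not survey data.

```python
import math

def perp_from_sighting(r: float, theta_deg: float) -> float:
    """Perpendicular distance x = r * sin(theta) from sighting distance and angle."""
    return r * math.sin(math.radians(theta_deg))

# An animal sighted at r = 1000 m and a true angle of 3 degrees lies about
# 52 m from the line; if the angle is rounded to 0, it is recorded at 0 m.
true_x = perp_from_sighting(1000.0, 3.0)
rounded_x = perp_from_sighting(1000.0, 0.0)
print(round(true_x, 1), rounded_x)
```

At long sighting distances even a rounding error of a few degrees shifts the perpendicular distance by tens of metres, which is why small angles rounded to zero produce a spurious spike at zero distance.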

Surveys conducted from small boats are considerably less costly than shipboard surveys, but also have greater restraints on number of observers, on where they can operate, and on how far from the line observers can search.

1.2.4 Double-Observer Surveys

Double-observer methods are useful for when detection of animals on the line is uncertain, as is the case for example with cetaceans. Double-observer data can be analysed using mark-recapture distance sampling methods (Sects. 5.4.1 and 6.4.4). The two observers (or in some cases, the two teams of observers) correspond to the two sampling occasions of mark-recapture. In two-sample mark-recapture, the sampling occasions are ordered: animals are caught and marked in the first sample, and any recaptures occur in the second sample. By contrast, each observer in double-observer methods can fulfil either role. Whether we wish to exploit this symmetry depends on circumstances.

One strategy is to have observer 1 search an area first, setting up trials, and then to have observer 2 search that area, and to record which animals detected by observer 1 are ‘recaptured’. This might be achieved by having observer 1 search with powerful tripod-mounted binoculars from a ship, while observer 2 searches with hand-held binoculars or naked eye, so that observer 1 has finished searching an area before observer 2 searches it. Another strategy is to have two aircraft flying in tandem, with one observer in each. By separating the two areas of search, any dependence across the two observers (caused for example by an animal behaving in a way that makes it more easily detectable to both observers) is weakened or even removed (Sect. 6.4.4.4). Because such heterogeneity in the probability of detection of different animals generates strong bias in mark-recapture estimates if it is not modelled, this is potentially an important advantage.

If we have the two observers searching the same area simultaneously, we generate more data (because either observer can ‘recapture’ an animal that was ‘marked’ by the other), and it is easier to identify animals that are detected by both observers than when they search the same area at different times. The cost however is the need to model the heterogeneity in the probabilities of detection, or to adopt more complex analysis methods to reduce bias from such heterogeneity.
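The correspondence with two-sample mark-recapture can be illustrated with simple proportions. The sketch below deliberately ignores distance and any heterogeneity in detectability, which is exactly the simplification that the text warns can generate strong bias, so it is a teaching aid and not a substitute for mark-recapture distance sampling. The function name and counts are hypothetical.

```python
def double_observer_summary(n1: int, n2: int, n_dup: int) -> dict:
    """Naive detection probabilities from double-observer counts.

    n1, n2 : numbers of animals detected by observers 1 and 2
    n_dup  : number detected by both (duplicates)
    Assumes independent observers and equal detectability of all animals.
    """
    if n_dup > min(n1, n2) or min(n1, n2) <= 0:
        raise ValueError("inconsistent counts")
    p1 = n_dup / n2            # observer 2's detections act as trials for observer 1
    p2 = n_dup / n1            # observer 1's detections act as trials for observer 2
    p_either = p1 + p2 - p1 * p2
    return {"p1": p1, "p2": p2, "p_either": p_either}

print(double_observer_summary(n1=40, n2=30, n_dup=24))
```

In the trial configuration, where only observer 1 sets up trials, only the `p2` line applies; the symmetric configuration uses both.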

Whichever approach is adopted, strategies are needed to allow reliable identification of duplicate detections (those detected by both observers). For shipboard surveys in which observer 1 searches further ahead of the ship than does observer 2, observer 1 stops searching after detecting an animal, and attempts to track it in. Although observer 2 should be unaware of any detections made by observer 1, observer 1 can be informed of detections made by observer 2, to allow a judgement of whether observer 2 detects the tracked animal. This is so-called one-way independence.

In the case of tandem aircraft, it is not possible to track a detection, so a model allowing for uncertainty in identifying duplicate detections can be adopted (Hiby and Lovell 1998).

Secondary information should be recorded on detected animals to aid the identification of duplicate detections. This might include recording exact times of cues and animal behaviour, in addition to the usual distance sampling data: distance from the line and size of group in the case of clustered populations.
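One way such secondary information might be used is a rule-based match on time and distance. The following is a rough sketch only: the greedy pairing rule and the tolerances `max_dt` and `max_dd` are hypothetical choices for illustration, and this is not the probabilistic approach of Hiby and Lovell (1998).

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    time_s: float      # time of cue, seconds since start of transect
    distance_m: float  # perpendicular distance from the line

def match_duplicates(obs1, obs2, max_dt=5.0, max_dd=50.0):
    """Greedy one-to-one pairing of sightings that are close in time and distance.

    Returns a list of (index in obs1, index in obs2) pairs judged to be duplicates.
    """
    pairs, used = [], set()
    for i, a in enumerate(obs1):
        best, best_score = None, None
        for j, b in enumerate(obs2):
            if j in used:
                continue
            dt = abs(a.time_s - b.time_s)
            dd = abs(a.distance_m - b.distance_m)
            if dt <= max_dt and dd <= max_dd:
                score = dt / max_dt + dd / max_dd  # smaller means a closer match
                if best_score is None or score < best_score:
                    best, best_score = j, score
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

obs1 = [Sighting(10, 100), Sighting(60, 20)]
obs2 = [Sighting(61, 30), Sighting(12, 110), Sighting(200, 5)]
print(match_duplicates(obs1, obs2))
```

In practice the tolerances would need to reflect measurement error in times and distances, and ambiguous matches are better flagged for review than resolved automatically.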

For surveys of whales, a survey of cues (whale blows) is sometimes preferred. If some whales are unavailable because they do not surface while in detection range, then whale-based methods are biased unless availability is estimated or modelled, whereas cue-based methods do not assume that all whales are available. Also, for double-observer surveys, exact times of blows can be recorded, making the task of identifying duplicate cues much easier than the equivalent task of identifying animals detected by both observers in whale-based surveys. However, the disadvantage is that cue rate must be estimated, and this can be problematic (Sect. 9.4).

2 Data Issues

2.1 Data Recording

In every survey, there is a compromise to be found between the bare minimum and information overload. The bare minimum comprises a record of effort, together with, for each animal detected, an estimate of its distance from the line or point. If an estimate of abundance in the study area is required, rather than just an estimate of mean density, then the size of the study area is also required. Recording only this minimum has the virtue of simplicity, but limits analysis options.

Effort is line length multiplied by number of visits to a line for line transect sampling, and number of visits to a point for point transect sampling. If only one side of a line is surveyed, the effort is halved. Similarly, if only a sector of a circle around a point is surveyed, then effort is multiplied by θ∕(2π) where θ is the sector angle in radians.
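The effort rules above translate directly into a few lines of code. This is a minimal sketch; the function names and default arguments are illustrative assumptions.

```python
import math

def line_effort(length: float, visits: int = 1, one_sided: bool = False) -> float:
    """Line transect effort: length times visits, halved if one side is surveyed."""
    effort = length * visits
    return effort / 2 if one_sided else effort

def point_effort(visits: int = 1, sector_angle_rad: float = 2 * math.pi) -> float:
    """Point transect effort: visits, scaled by theta / (2 pi) for a sector."""
    if not 0 < sector_angle_rad <= 2 * math.pi:
        raise ValueError("sector angle must be in (0, 2*pi] radians")
    return visits * sector_angle_rad / (2 * math.pi)

print(line_effort(2.0, visits=3))                    # 2 km line visited 3 times
print(line_effort(2.0, visits=3, one_sided=True))    # same line, one side only
print(point_effort(visits=4, sector_angle_rad=math.pi))  # half-circle, 4 visits
```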

Very often, there is interest in modelling animal density as a function of location, habitat or other variables. When initiating new surveys, the objectives may be more basic than this, but objectives often change over time. Hence it is worth considering whether it is practical to collect data to allow a spatial modelling approach later. Thus plot locations should be recorded (e.g. grid reference of a point, or of the two endpoints of a line), together with animal locations. Thus in addition to recording the distance of a detected animal from a line or point, we might record in the case of line transect sampling which side of the line the animal was located, and the animal’s position along the line. (Distances along the line can also be useful for exploring density gradients in the study area.) For point transect sampling, we might record the bearing of the animal from the point together with its distance.

Whether for spatial modelling or modelling the detection function, it is worth considering what covariates are worth recording. Covariates may be static (i.e. do not vary over time, such as altitude) or dynamic (vary over time, such as sea state). Further, covariates might be associated with individual detections (such as animal behaviour, cluster size, gender, species) or with effort (such as observer, sea state, habitat), or they might be spatial covariates available throughout the study area (such as altitude or sea surface temperature available from satellite data).

Careful consideration should be given to how data are recorded. Paper recording forms might be used, or data might be entered onto a computer or data entry device. If it is important that observers are not distracted from searching, but there is no dedicated data recorder, voice-sensitive recording devices might be used, to allow observers to record data without breaking off their search. This is especially useful for aerial surveys. If data are stored electronically, frequent back-ups should be made.

However data are recorded, time should be allocated at the end of each day to checking data, and perhaps to entering data onto a computer, if datasheets or voice recorders are used. In this way, if there are missing data, or errors in the data, corrections can be made while memories are fresh. More importantly, the data can be reviewed, to assess whether there are any problems (see Sect. 4.2.2). If there are, changes to field procedures might be implemented to resolve problems early, before too many data are compromised.
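An end-of-day check of this kind can be partly automated. The sketch below assumes a simple, hypothetical record layout and a handful of plausibility rules; real surveys would adapt both the fields and the rules to their own protocol.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    transect_id: str
    distance_m: Optional[float]        # distance from the line or point
    cluster_size: Optional[int] = None
    observer: Optional[str] = None

def daily_check(detections, known_transects, max_distance_m):
    """Return (record index, problem) pairs for records that need attention."""
    problems = []
    for k, d in enumerate(detections):
        if d.transect_id not in known_transects:
            problems.append((k, "unknown transect"))
        if d.distance_m is None:
            problems.append((k, "missing distance"))
        elif d.distance_m < 0 or d.distance_m > max_distance_m:
            problems.append((k, "implausible distance"))
        if d.cluster_size is not None and d.cluster_size < 1:
            problems.append((k, "cluster size below 1"))
    return problems

records = [
    Detection("T1", 12.5, 3, "A"),
    Detection("T9", None),
    Detection("T2", 950.0, 0),
]
print(daily_check(records, known_transects={"T1", "T2"}, max_distance_m=500.0))
```

Flagged records can then be queried with the observers the same evening, while memories are fresh.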

2.2 Data Checking

2.2.1 Heaping

Heaping in the data (rounding of distances, and possibly angles, to favoured values) is generally evident if distances are plotted in a histogram with a large number of intervals. If in line transect sampling, sighting distances and angles are recorded, from which perpendicular distances are calculated (see Fig. 1.1), rounding is more evident if sighting distances and angles are plotted in a histogram, rather than perpendicular distances. A common problem in shipboard surveys is rounding of angles to zero, which results in estimated perpendicular distances of zero.
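Alongside a fine-interval histogram, a quick numerical check can flag heaping. The sketch below computes the share of recorded values that fall exactly on multiples of a suspected favoured unit, and the share recorded as zero. The function names and example values are illustrative assumptions; what counts as "too many" depends on the measurement method.

```python
def heaping_fraction(values, unit):
    """Share of recorded values that are exact multiples of `unit`."""
    hits = sum(1 for v in values if abs(v / unit - round(v / unit)) < 1e-9)
    return hits / len(values)

def zero_fraction(values):
    """Share of recorded values equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)

distances = [0, 10, 20, 23, 30, 41, 50, 50, 57, 60]   # made-up perpendicular distances (m)
angles = [0, 0, 5, 10, 0, 15, 7, 0, 20, 45]           # made-up sighting angles (degrees)

print(heaping_fraction(distances, 10))  # multiples of 10 m
print(heaping_fraction(angles, 5))      # multiples of 5 degrees
print(zero_fraction(angles))            # angles rounded to zero
```

If distances were measured to the nearest metre without heaping, only about one value in ten would be expected on a multiple of 10 m.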

The effects of heaping can be reduced by grouping distances into intervals, and analysing using the methods of Sect. 5.2.2.2 (line transect sampling) or 5.2.3.2 (point transect sampling). The intervals should be chosen such that as far as possible, all observations end up in the correct distance interval. This is achieved by ensuring that each favoured distance for rounding is roughly in the middle of a distance band, so that distances that belong to a different band are unlikely to be rounded to that distance.
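The choice of intervals described above can be sketched in code: cutpoints are placed midway between adjacent favoured values, so that each favoured value sits in the middle of its band. The function names, the truncation distance `w`, and the example data are assumptions for illustration.

```python
from bisect import bisect_right

def midpoint_cutpoints(favoured, w):
    """Cutpoints from 0 to w, with one cutpoint midway between adjacent favoured values."""
    fav = sorted(favoured)
    mids = [(a + b) / 2 for a, b in zip(fav, fav[1:])]
    return [0.0] + [m for m in mids if 0 < m < w] + [float(w)]

def group_counts(distances, cuts):
    """Count distances per interval; distances beyond the last cutpoint are dropped."""
    counts = [0] * (len(cuts) - 1)
    for d in distances:
        if 0 <= d <= cuts[-1]:
            i = min(bisect_right(cuts, d) - 1, len(counts) - 1)
            counts[i] += 1
    return counts

cuts = midpoint_cutpoints([0, 10, 20, 30, 40], w=45)
print(cuts)
print(group_counts([0, 10, 10, 20, 30, 40, 44], cuts))
```

The first band necessarily has the favoured value 0 at its edge and not in its middle, since distances cannot be negative. Bands may also be merged at larger distances where data are sparse.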

Two examples of heaping in line transect data are shown in Fig. 4.1. In the left-hand plot, although there is clear rounding to the nearest 10 m, this has relatively little impact on estimation. If we analyse the data as if the distances are exact (Sect. 5.2.2.1; for point transect sampling, it would be Sect. 5.2.3.1), the effective strip half-width assuming a half-normal model is estimated to be \(\hat{\mu }= 35.6\) m, while if we group data into intervals of 0–5 m, 5–15 m, 15–25 m, 25–35 m, 35–55 m, we obtain an estimate of \(\hat{\mu }= 34.1\) m. If we instead assume a hazard-rate model, we obtain \(\hat{\mu }= 37.6\) m for exact data and \(\hat{\mu }= 38.3\) m for grouped data.
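For intuition about where such estimates come from, the simplest case has a closed form. For a half-normal detection function g(x) = exp(−x²/(2σ²)) with exact distances and no truncation, the maximum likelihood estimate is σ̂² = Σx²/n and the effective strip half-width is μ̂ = σ̂√(π/2). The sketch below implements only this special case; it is an assumption-laden illustration, not the procedure used to obtain the estimates quoted above, whose data are not reproduced here.

```python
import math

def halfnormal_esw(distances):
    """Effective strip half-width under an untruncated half-normal model.

    Assumes exact perpendicular distances, no truncation, no adjustment terms.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one distance")
    sigma2 = sum(x * x for x in distances) / n    # MLE of sigma squared
    return math.sqrt(sigma2 * math.pi / 2)        # mu = sigma * sqrt(pi / 2)

# Made-up distances in metres, heaped to the nearest 10 m.
print(round(halfnormal_esw([0, 10, 10, 20, 20, 30, 40, 50]), 1))
```

Because the estimate depends on the data only through Σx², rounding that is roughly symmetric about the true distances changes it little, which is consistent with the modest effect of heaping under the half-normal model.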

Fig. 4.1 Two datasets showing strong heaping. The left-hand plot shows n = 49 distances from the line, and rounding to the nearest 10 m is evident. The right-hand plot shows n = 34 detection distances, of which eight were recorded as zero. The data are ‘spiked’, as a consequence of rounding to zero.

The right-hand plot of Fig. 4.1 shows what can occur for example in shipboard surveys, where observers either have been poorly trained to record angles or have not been provided with aids to allow reliable estimation of angles (see Sect. 4.1.2.3). Now if we analyse the data as before, analyses under the half-normal model are still similar: \(\hat{\mu }= 36.0\) m for exact data and \(\hat{\mu }= 34.0\) m for grouped data. However, the fit of the model is poor. If we fit the more flexible hazard-rate model, it attempts to fit the spike in the data at zero distance, and as a consequence, we obtain very different estimates from those obtained under the half-normal model, and also very different estimates depending on whether we group the data or not: \(\hat{\mu }= 2.8\) m from exact data and \(\hat{\mu }= 21.5\) m from grouped data. If we tried different choices of group interval, we would find sensitivity to this choice. Reliable estimation from these data is not possible unless we somehow obtain information on the rounding process to allow us to correct for it.

2.2.2 Responsive Movement

Responsive movement is sometimes detectable from the data. For example if animals close to the line or point move away from the observer before detection, but remain in detection range, then there is a tendency to see excess detections at mid-distances, and too few detections at short distances. A good example of this for point transect sampling is shown in Fig. 10.2. However, if animals that respond to the observer flee beyond detection range, then there is unlikely to be any indication in the data of the problem. The result is an abundance estimate that is biased low.

Some animals are attracted towards the observer, for example dolphins approaching a ship to ride the bow wave, or a songbird investigating an intruder on its territory. An excess of detections at short distances relative to mid-distances may be indicative of this. If the fitted detection function falls more quickly with distance from the line or point than might be expected, attraction to the observer should be considered as a possible cause. For example, field experience might suggest that almost all animals at 20 m from the line or point should be detected, but the fitted detection function might give an estimated probability of detection of just 0.5 at 20 m.

2.2.3 Animals Missed on the Line or Point

In cases such as marine mammals or diving seabirds, or songbirds in dense canopy, there is no guarantee that all animals on the line or point will be detected. Standard distance sampling data give no indication of such a problem, and consequently, standard analysis will give estimates of abundance that are biased low. If it is thought that this occurs, double-observer methods might be adopted. If one observer detects an animal on the line or point that the other does not, this tells us that probability of detection is less than one. Further, we can use mark-recapture distance sampling methods to allow estimation without assuming that animals on the line or point are certain to be detected. Another solution to the problem is to adopt cue-based methods, which assume that the cue (e.g. a whale blow or a songburst for a bird), rather than the animal, is certain to be detected if it is very close to the observer. This assumption may also be relaxed by combining cue counting with double-observer methods. Double-observer field methods are discussed in Sect. 4.1.2.4, and analysis methods in Sect. 5.4. Further discussion on missing animals at the line or point appears in Sect. 11.2, and cue-based methods are addressed in Sect. 9.4.

2.2.4 Overdispersion

When distance data are plotted by distance interval in a histogram, overdispersion in the plotted frequencies may be apparent. This arises when individual detections are not independent. The effect is seldom strong enough to be detectable. An exception to this is when cue counting is conducted from points (Sect. 9.4.2). In this case, the animal may give several cues from the same location; as each cue is separately recorded, this results in multiple records at exactly the same distance from the point. An example of this is shown in Fig. 9.13. Even when we plot the data with just a few, wide distance intervals (Fig. 9.14), the shape of the histogram is still not as smooth as we would wish.

Another circumstance that gives rise to overdispersion is when animals occur in clusters, but are recorded individually. This may be the preferred option if clusters are spread out so that the location of the cluster centre is difficult to identify, or if more distant animals in the cluster may go undetected. See Sect. 10.4.1.

Analysis methods are very robust to failures of the independence assumption. However, when overdispersion is as extreme as in Fig. 9.13, goodness-of-fit tests cannot be used to assess the adequacy of the model fit, and AIC can be an unreliable guide (Sect. 9.4.2).

2.2.5 Biased Estimation of Cluster Size

For populations that occur in clusters, we assume that the size of detected clusters is recorded without error. We can plot recorded cluster size against distance from the observer to assess whether there is a relationship. However, a relationship may arise because larger clusters are easier to detect (an example of size-biased sampling) and so are over-represented at larger distances (in which case there will be a positive trend in size with distance), or because the size of more distant clusters is underestimated (which would lead to a negative trend in recorded size with distance). If we are lucky, these two sources of bias will approximately cancel. For conventional distance sampling, the default method in the software Distance is to conduct a regression of log cluster size on estimated probability of detection; we can then estimate mean cluster size when probability of detection is one. This corresponds to clusters on the line or at the point, where we expect no size bias. If we can also assume that cluster size is accurately estimated for such clusters, then this method also corrects for underestimation of the size of more distant clusters. See Sect. 6.3.1.3.
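The regression idea can be sketched as follows: ordinary least squares of log cluster size on the estimated detection probability at each cluster's distance, with the fitted line evaluated at a detection probability of one. This is a simplified illustration; it uses a naive back-transform exp(a + b) and omits any bias correction that software such as Distance may apply. The function name and the synthetic data are assumptions.

```python
import math

def cluster_size_at_certain_detection(sizes, g_hat):
    """Predict cluster size at detection probability 1 from an OLS fit.

    sizes : recorded cluster sizes (all >= 1)
    g_hat : estimated detection probability at each cluster's distance
    """
    z = [math.log(s) for s in sizes]
    n = len(z)
    g_bar, z_bar = sum(g_hat) / n, sum(z) / n
    sxx = sum((g - g_bar) ** 2 for g in g_hat)
    sxy = sum((g - g_bar) * (zi - z_bar) for g, zi in zip(g_hat, z))
    b = sxy / sxx               # slope: change in log size per unit detection probability
    a = z_bar - b * g_bar       # intercept
    return math.exp(a + b)      # fitted value at g = 1, naive back-transform

# Synthetic example: sizes inflate as detection probability falls (size bias).
g = [1.0, 0.8, 0.5, 0.25]
s = [2.0 * math.exp(1.0 - gi) for gi in g]
print(round(cluster_size_at_certain_detection(s, g), 3))
```

A negative slope indicates size bias (larger recorded clusters where detection is less likely), and the prediction at g = 1 is then smaller than the raw mean of recorded sizes.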

2.2.6 Poor Search Pattern

Training of field staff should include how to search for animals to ensure that resulting data are easy to model. We show plots indicating different shapes for the detection function for line transect sampling in Fig. 4.2. We would prefer a histogram of the type shown in the top left plot. The data here suggest that the detection function has a wide shoulder, which means that the probability of detection stays close to one for some distance from the line. We would expect different models that fit the data well to give very similar estimates of animal density in this case. The top right plot by contrast suggests that the detection function has a narrow shoulder, with the detection probability dropping appreciably below one at relatively small distances from the line. We might expect to see more variability in density estimates using different models for the detection function, but estimation should nevertheless be reasonably reliable. The bottom left plot shows ‘spiked’ data, with apparently a very steep fall-off in probability of detection with distance from the line. Different models may yield very different estimates of animal density for these data. If the spike is real, indicating that some animals are unlikely to be detected unless very close to the line, then we need to attempt to fit the spike. This will result in very poor precision. If the spike is an artefact, arising from rounding many distances to zero, or from animals being attracted towards the observer, then a model that cannot fit the spike, such as the half-normal without adjustments, is likely to give lower bias. The bottom right plot is similar, except that there appears to be a wide shoulder, but with extra observations very close to the line. This can occur due to the strategy of ‘guarding the trackline’ adopted in some shipboard surveys. Suppose the main observers search with binoculars, and some animals on or near the line avoid detection. An additional observer might search by naked eye to detect such animals. However, they search a much smaller area than do the observers using binoculars, and when the two sets of data are combined, we can obtain data like those shown in the bottom right plot of Fig. 4.2. It is not possible to model such data reliably, when the effect is this large. A double-observer approach (Sect. 4.1.2.4) might be the better option in this case.

Fig. 4.2

Plots of number of detections by distance interval. In the top left plot, the data suggest that the detection function has a wide shoulder. The top right plot suggests a narrow shoulder for the detection function. The bottom left plot shows ‘spiked’ data, while the bottom right plot indicates a good shape except that there are extra detections at very small distances.

Similar considerations apply to point transect data, except that there are relatively fewer detections close to the point, so that patterns at small distances tend to be less clear. Field methods that produce a wide shoulder for the detection function are more critical for point transect sampling than for line transect sampling, because the relative lack of data close to the point results in greater variability in density estimates arising from different models for the detection function.

2.2.7 Diagnostics of Poor Data

The previous subsections, dealing with challenges to analysis presented by awkward data, suggest a routine of plotting that should be adopted as a matter of course. A set of plots of collected data should be prepared on a daily basis to look for telltale signs of heaping, responsive movement, cluster size bias or problematic search patterns. Plots of data are not guaranteed to detect all of these problems. Nevertheless, just as diagnostic plots are an accepted practice in standard regression analysis, similar rigour in assessing distance sampling data while data collection is on-going is key to obtaining robust data that yield defensible estimates of animal density or abundance.
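Alongside daily plots, a simple numerical summary can help flag heaping. The sketch below reports the proportion of distances recorded as zero and the proportion falling exactly on selected round values. The function name, the choice of multiples (5 and 10) and the example data are assumptions made for illustration; what counts as a worrying proportion depends on the survey and is not specified here.

```python
def heaping_summary(distances, multiples=(5, 10)):
    """Proportion of recorded distances at zero and at exact round values.

    `multiples` lists the assumed values to which observers might round.
    Zero distances are reported separately and excluded from the multiples.
    """
    n = len(distances)
    summary = {
        "n": n,
        "prop_zero": sum(1 for d in distances if d == 0) / n,
    }
    for m in multiples:
        hits = sum(1 for d in distances if d != 0 and d % m == 0)
        summary[f"prop_multiple_of_{m}"] = hits / n
    return summary


# Hypothetical distances (in metres) entered at the end of a field day.
todays_distances = [0, 0, 3, 5, 10, 10, 12, 15, 20, 20, 20, 27, 30, 40, 50, 50]
report = heaping_summary(todays_distances)
```

If distances were measured to the nearest metre without rounding, roughly one fifth of non-zero records would be expected to fall on multiples of 5; proportions far above that, as in these made-up data, would be a prompt to review measurement practice with the observers while the survey is still under way.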

2.3 Field Testing

Because there are subtleties associated with field craft in both the sighting of animals and the recording of data, often with the aid of field gear, it is imperative that the entire enterprise be tested prior to data collection in earnest. It is axiomatic that if a field study lacks a pilot study, then the first field season will equate to a pilot study.

Field personnel should both be trained to collect and record high quality data, and assessed to determine if their training has been effective. Testing should assess the ability of field crews to

  • properly identify the species of interest,

  • accurately measure group size (if animals occur in clusters),

  • accurately measure distances of detected animals from the point or line, and

  • detect all animals on the transect line.

Trials for each of these skills can be set up under field conditions and crew members can be periodically provided with refresher training. The costs of training as well as periodic reassessment should be incorporated into the project budget.
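One way a distance-measurement trial might be summarised is sketched below: observers estimate distances to targets whose true distances are known, and the relative errors are averaged. The function name, the error measures and the trial values are hypothetical and serve only to illustrate the kind of assessment described above.

```python
def distance_trial_summary(true_distances, estimated_distances):
    """Summarise an observer's distance estimates against known distances.

    Returns the mean relative error (systematic over- or underestimation)
    and the mean absolute relative error (overall accuracy).
    True distances must be positive.
    """
    if len(true_distances) != len(estimated_distances):
        raise ValueError("true and estimated distances must have equal length")
    if any(t <= 0 for t in true_distances):
        raise ValueError("true distances must be positive")
    rel = [(e - t) / t for t, e in zip(true_distances, estimated_distances)]
    n = len(rel)
    return {
        "mean_relative_error": sum(rel) / n,
        "mean_abs_relative_error": sum(abs(r) for r in rel) / n,
    }


# Hypothetical trial: four targets at measured distances (in metres).
true_m = [10.0, 20.0, 40.0, 50.0]
observer_a = [11.0, 18.0, 40.0, 55.0]
result = distance_trial_summary(true_m, observer_a)
```

Repeating such a summary for each observer, and again after refresher training, gives a simple record of whether training has been effective.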

Not only should there be training and assessment of the data collection process, but also of other aspects of the field enterprise. Checks should be made of the equipment used to make distance measurements, e.g. rangefinders, clinometers, altimeters, angle boards, binocular reticles, tape measures. Equipment used to measure effort (maps, stopwatches, GPS units) should be checked for accuracy. Everything used for data entry and recording, e.g. voice-activated microphones, handheld computers and paper forms, needs to be tested by the crew members collecting the data.

The complete field protocol, from laying out transects, through collection of data, to daily diagnostics of gathered data, to daily backup of collected data should be practised by all members of the field crew to build in redundancy in crew expertise. Most importantly, apprising all members of the field crew of the steps from data collection through to data analysis enables the field crews to make enlightened decisions in the field when unforeseen circumstances arise.