Introduction

Forensic entomologists have mainly used either a carrion insect development or succession model to infer some aspect of time since death, often called the postmortem interval (PMI) [1]. Of the two methods, analysis based on development has received much more attention in the published literature, which includes many training (=reference) data sets generated for this purpose (e.g., [26] and many more). A development model is used to determine the age of a carrion insect thought to have fed on the corpse. This value equals a minimum postmortem interval (PMImin), provided that oviposition or larviposition occurred on the deceased following death.

Any forensic science inference should include an objective estimate of the uncertainty associated with that estimate [7], and we advocated a statistical approach to estimating carrion insect age [1, 8]. Adapting the standard methodology of inverse prediction, a model relating size y to age x is fit to training data in which larvae are sampled and measured at predetermined ages. The measured size y * of a mystery specimen, a larva of unknown age, is tested as an outlier at each potential age by comparing it to the fitted model. This produces a probability, a p-value, for each age. The set of ages for which the p-value is greater than 5 % comprises a 95 % confidence set on the age of a mystery specimen.

Inverse prediction is built on the assumption that the training data larvae are sampled at random from the populations of all larvae at their respective ages. However, as is usually the case, this is not possible in real settings, e.g., one could not randomly sample the population of all 24-h-old larvae of a particular insect species. Fortunately the credibility of statistical methods is adequately maintained if the design takes all practical measures to avoid bias.

This raises the issue of whether or not published carrion insect laboratory development data are suitably unbiased for inverse prediction. Many older papers (e.g., [9]) included so few technical details that the reader cannot know how data were collected. More recent publications are more explicit, but there is no standard experimental protocol or set of protocols.

The most common experimental method has been to set up a rearing container with an insect cohort of approximately equal-age individuals of one species, maintain that container under a defined set of conditions, and periodically remove a subset of those individuals (usually without replacement) in order to record their size and/or instar (e.g., [4, 1016]).

Many authors collected subsamples that were clearly biased in that they targeted the largest individuals in the rearing container (e.g., [3, 4, 10, 13, 1720]). Davies and Ratcliffe [21] repeatedly measured all individuals of replicated single cohorts, but reported only the largest measurements for each age. Some authors used subsamples either described as random (e.g., [11, 14, 22, 23]) or with no stated selection criterion (e.g., [12, 15, 16, 24, 25]). However, given that none of these papers described a mechanical randomization procedure for choosing samples, it is very unlikely that sampling was even approximately random [26].

A few authors sampled entire age cohorts [5, 8, 27, 28]. Because a separate rearing container must be assembled and processed for each age, this can require much more labor than the subsampling approach.

Here, we use data from laboratory rearing experiments on the forensically important calliphorid flies Chrysomya megacephala and Sarconesia chlorogaster to illustrate the effect of training data sampling method on age prediction model performance, that is, on specimen age estimation by inverse prediction.

Materials and methods

The study species were the forensically important calliphorid flies C. megacephala (1,405 total larvae) and S. chlorogaster (2417 total larvae), both reared at constant incubator temperature (C. megacephala, 27 °C; S. chlorogaster, 25 °C) and a light cycle of 16L:8D (C. megacephala) or 12L:12D (S. chlorogaster). Individual rearing containers were set up and randomly assigned to predetermined sample ages, and during sampling, all insects were removed from the container and killed in hot ethanol (C. megacephala) or hot water (S. chlorogaster). For C. megacephala, we used the development data in [27], which was a source population from Bangalore, India. That colony had been maintained for six generations prior to the experiment. The S. chlorogaster colony originated from Curitiba, Brazil, and had been maintained for approximately 16 generations. S. chlorogaster rearing and sample methods were similar to those in [27], except that larvae were reared in a 500-ml plastic container with 500 g of ground beef.

Analyses were based on larval length measurements. In addition to the complete data, data sets were generated using the following subsampling strategies in order to determine each method’s effect on the resulting model for predicting age.

  1. 1.

    Simulated random sampling without replacement from a single cohort. In other words, from the youngest age cohort, we selected five individuals using an online random number generator (www.randomizer.org/form.htm). From the next higher age cohort, five individuals were randomly eliminated, and then five of the remaining set of values were selected at random for analysis. From the next higher age, ten individuals were eliminated, and then five selected, etc.

  2. 2.

    Five individuals from each cohort were selected at random.

  3. 3.

    Similar to no. 1, but always selecting the five largest (in length) larvae, some of which may have been of equal size.

  4. 4.

    Similar to no. 2, but again always selecting the five largest values.

Growth curves with associated 95 % inverse prediction bands and confidence intervals about an estimate of age were constructed as in Wells and LaMotte [8]. Calculations and graph production were done using SAS statistical software [29].

Results and discussion

The essential, defining property of a confidence set is that it covers the true age with probability 95 %. Figures 1, 2, and 3 illustrate the performance of prediction models derived from each sample scheme and species. Each graph shows three features. These are (1) histograms showing frequency distributions of specimen lengths for each age cohort in the full data set, (2) 95 % prediction limits on length based either on all data in each cohort (Fig. 1) or on subsamples (Figs. 2 and 3), and (3) average coverage proportions of confidence intervals on age for one complete age cohort (60 h for C. megacephala and 48 h for S. chlorogaster) based on feature 2 for each graph. That is, for the length of each larva in the cohort, the p-value was computed at each hour over a range of ages (18 to 120 h and 18 to 150 h, respectively). Those ages for which the p-value is greater than 5 % are covered by the corresponding 95 % confidence set. Thus for each age, for some larvae, the corresponding confidence set covers the age, and for the rest, it does not. The proportion of confidence sets that cover the age is plotted for each age. It is an estimate of the coverage probability for each age. The histogram of the targeted age cohort is highlighted in each graph.

Fig. 1
figure 1

Larval length as a function of age at a constant incubator temperature based on complete age cohorts. Histograms show frequency distributions of specimen lengths for each age cohort in the full data set. Red lines show 95 % prediction limits on length. Green lines show average coverage proportions of confidence intervals on age for the highlighted age cohort (see text for further explanation). a. Chrysomya megacephala (27 °C, n = 1,405). b. Sarconesia chlorogaster (25 °C, n = 2,417)

Fig. 2
figure 2

Inverse prediction limits for random subsamples superimposed on Fig. 1. a C. megacephala separate age cohorts. b S. chlorogaster separate age cohorts. c C. megacephala simulated repeated sampling from a single cohort. d S. chlorogaster simulated repeated sampling from a single cohort

Fig. 3
figure 3

Inverse prediction limits for subsamples of largest larvae for a given age superimposed on Fig. 1. a C. megacephala simulated repeated sampling from a single cohort. b S. chlorogaster simulated repeated sampling from a single cohort. c C. megacephala separate age cohorts. d S. chlorogaster separate age cohorts

The ideal result for each graph would show a coverage probability (the green line in the figures) of 100 % at the cohort age and 0 at all other ages. In the real world, in which the distributions of lengths will overlap for different ages, an acceptable result would be at least 95 % for the true age, and the more steeply it falls off for other ages, the better. Coverage probabilities for the test cohorts in the full data set (Fig. 1) exceed 0.95 for the true age value.

Random subsamples

A genuinely random subsample, either with separate age cohorts (Fig. 2a, b) or from a simulated single cohort (Fig. 2c, d) yielded a predictive model that closely approximated that of the full data set. The curvature of the confidence limits in these figures reflects the stochastic inclusion of a higher or lower proportion of outliers in a subsample, with corresponding shifts in the variance influencing both a quadratic term in the inverse prediction function and the degrees of freedom estimator [8]. This resulted in discontinuous coverage probability distributions (Fig. 2b, d) and associated confidence intervals and is more likely to occur the closer the true age is to the point in development where postfeeding larvae cease to grow in length. It is possible that postfeeding and feeding larvae can be reliably distinguished using measurements other than body length [2, 30]. If so, then inclusion of such a variable in the model may eliminate this problem.

We are not aware of this sampling scheme having been employed for modeling carrion insect development (see “Introduction”), and while it may offer no practical advantage over generating the full data set if the measurement of each specimen involves little time and expense, it might be worthwhile for an investigator using a labor-intensive and/or expensive measurement technique, such as gene expression [30, 31]. Should that be the case, random samples from full cohorts would be the best subsample method of those we evaluated for an investigator wanting to process fewer specimens while best preserving the form of the development model.

Largest individuals from a simulated single cohort

As described in the “Introduction,” these data mimic a common method in the forensic entomology literature. This subsampling scheme (Fig. 3a, b) understates the variance, and particularly for C. megacephala, it moves prediction bounds progressively lower when more larvae are deleted. This deviation in the C. megacephala growth curve resulted from the relatively small size of the oldest cohort [27], which was reduced to a single individual by the simulated sampling without replacement. The prediction limits are unrepresentatively narrow; consequently, the coverage rates are too low because the confidence intervals are too narrow. Coverage rates for the true ages of 60 and 48 h are less than 5 %.

The obvious drawback to this method is that a confidence interval produced from this model would be much more precise than is justified and would be unlikely to include the true specimen age.

One rationale (e.g., [4, 17]) for this focus on only the most rapidly developing individuals was that this is analogous to evidence collection during a death investigation. In other words, because scene investigators are advised to collect the largest or otherwise apparently most developed carrion insects [32], training data based on only the largest individuals in an age cohort are analogous to the death investigation evidence. We think that this is a flawed analogy. The largest individuals in a rearing container are “large for their age,” that is, they are known to be developing more rapidly compared to others of the same species under the same conditions. The largest larva found on a corpse may or may not have developed more rapidly than others of the same species and age on that corpse. A more developed larva or pupa of the same species and age may have escaped detection. Because relatively fast- and slow-developing individuals from a corpse cannot be distinguished based on appearance in a mixed-age population, a development model for only the fastest subset of the corpse population does not reflect the casework situation.

Largest individuals from separate cohorts

This subsampling method has also, to our knowledge, not been used for a published study. However, it closely resembles to an effective sampling of largest larvae with replacement [21]. As with the previous scheme, such data yield artificially narrow confidence limits and low coverage rates, with a coverage not being highest at the true age (Fig. 3c, d). Again, an estimate of larval age based on this type of model is likely to be both too precise and incorrect.

In conclusion, random subsamples, such as those some authors have claimed to use, would produce a predictive model similar to that of the full data set, although the less even confidence limits may yield less precise and discontinuous predictions of age. As described in the “Introduction” though, it is unlikely that earlier authors actually sampled at random, so the implications of our results for published studies are unclear. Inverse prediction of age based on growth data from largest individuals performs poorly.

We continue to favor carrion fly growth models produced from samples of entire age cohorts. A model for estimating larval age from size performs better when based on such data compared to an analysis based on subsamples. An accurate prediction interval, that is, a statistical confidence interval about an estimate of specimen age, is obviously desirable for death investigation casework. We also believe that it is essential for validating a predictive model because confidence limits objectively define the threshold proportion of validation study observations needed to support the model, e.g., 0.95 if one uses 95 % confidence limits.