Introduction

Although the great majority of forensic DNA analyses are aimed at human identity testing [5], a growing number of applications have been proposed for identifying species (e.g., [11, 12, 17, 32]).

For a forensic entomologist, identifying an insect specimen is typically an important early step in an investigation [1, 20]. Traditional morphological keys are unavailable or difficult to use for many immature stages of forensically important insects, and even for some adult specimens such as female Sarcophagidae [20]. Therefore, many authors have proposed DNA-based methods for forensic insect identification (e.g., [9, 14, 21, 26, 29, 30, 35, 39]).

The published literature on this topic has included considerable discussion of the relative merits of various loci and laboratory methods [4, 19, 21, 22, 26, 35, 39]. Less attention has been paid to more basic issues of determining an adequate study sample when designing a species diagnostic test, including replication and selection of taxa. As a caution to workers in this field, we present in this paper an example of how the apparent utility of a common DNA-based test for identifying insect specimens declined as more information became available.

The blow fly genus Lucilia Robineau-Desvoidy, whose members are often referred to as the greenbottle flies, includes many common species of both forensic and veterinary importance [8, 20]. Compared to most forensically important insects, they have been well-studied using molecular systematic methods [6, 10, 25, 38]. Biosystematic analysis based on sequences of the commonly used cytochrome oxidase I (COI) gene produced robust phylogenetic trees that largely agreed with conclusions based on morphology [25, 38]. One surprise was the discovery that all Hawaiian Lucilia cuprina that were genotyped formed a distinct COI lineage that was a sister group to the COI of Lucilia sericata, while a phylogeny based on the nuclear gene 28S ribosomal RNA produced separate branches corresponding to the two morphologically defined species [27]. Although the COI data for L. sericata and L. cuprina did not agree with classical taxonomy, this did not invalidate the concept of using COI for identifying specimens because the odd L. cuprina haplotypes were both monophyletic and confined to a limited geographic area. In other words, the genetic data simply suggested three rather than two categories for assigning a specimen: L. sericata, Hawaiian L. cuprina, and all other L. cuprina.

Since these studies were conducted, many additional COI sequences for Lucilia spp. and closely related species have been deposited in the public GenBank database. These data, plus some we generated, cast the previous phylogenetic results in a new light.

Materials and methods

New cytochrome oxidase I sequences

Newly sequenced adult fly specimens with associated GenBank accession numbers are listed in Supplementary Table 1. Standard methods, described in Wells and Sperling [34], were used for DNA extraction, PCR, and sequencing.

Previously published sequences

The previously published GenBank calliphorid COI sequences used in our analyses are listed in Supplementary Table 2. Each sequence represents some portion of the region corresponding to positions 1–1,545 of GenBank accession AF295550, Phormia regina [35], which includes the coding region for COI and a few bases of flanking transfer RNA genes. A number of GenBank Lucilia COI records were not included because each was identical to a record that was used for our analysis.

Computer analyses

Maximum parsimony analysis [28] was performed as described in [35]. “Complete” sequences approximately 1,545 bp in length were included in each phylogenetic analysis. P. regina, Chrysomya rufifacies [34], Eucalliphora latifrons [35], and Cynomya (=Cynomyopsis) cadaverina [37] were used to form the root of each phylogenetic tree.

Results and discussion

The COI sequences generated by us were deposited in GenBank (Supplementary Table 1). Among the previously published sequences, that of Hemipyrellia ligurriens (accession number AY097334 [6]) was found to possess a 9-bp deletion, relative to the other sequences, at the site corresponding to base positions 306–314 in AF295550 [35]. As a result, three amino acids are missing from the typical M3 transmembrane helix [13]. A COI amino acid indel that is not very close to one end of the peptide is extremely unusual for an insect [13], and it was not observed for the H. ligurriens specimen that we sequenced (DQ453493).
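The arithmetic behind the three missing amino acids can be checked with a short sketch. The coordinates are those given in the text; the function name and the returned fields are illustrative assumptions, and the sketch reports only the net number of residues lost, not which residues of the helix are affected.

```python
# Deletion coordinates as reported in the text (1-based, inclusive),
# relative to GenBank accession AF295550.
DELETION_START = 306
DELETION_END = 314


def deletion_summary(start: int, end: int) -> dict:
    """Summarize a deletion: its length and whether it preserves the reading frame.

    Hypothetical helper for illustration only.
    """
    length = end - start + 1        # inclusive coordinates
    in_frame = length % 3 == 0      # a multiple of 3 keeps downstream codons in frame
    # Net number of amino acid residues lost; only defined for in-frame deletions.
    residues_lost = length // 3 if in_frame else None
    return {"length_bp": length, "in_frame": in_frame, "residues_lost": residues_lost}


summary = deletion_summary(DELETION_START, DELETION_END)
print(summary)
```

An in-frame deletion of this kind shortens the peptide without shifting the downstream reading frame, which is why the remainder of the sequence can still be aligned to the other COI records.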

The use of DNA sequence data to distinguish closely related species is likely to be unreliable unless a condition of reciprocal monophyly exists for those species [7, 15, 18] (Fig. 1a). Alternative phylogenetic outcomes can make identification based on a particular locus sometimes or always impossible (Fig. 1b,c).

Fig. 1

Hypothetical phylogenies for a genetic locus being evaluated for distinguishing species 1 and 2. a Species 1 and 2 show reciprocal monophyly; therefore, identification of a forensic specimen using this locus is likely to be unambiguous. b Species 1 is monophyletic but species 2 is paraphyletic. Homologous sequence from a forensic specimen that falls between the two will have an uncertain identity. c The two species share an allele (1 = 2) and the relationships are polyphyletic. The locus is unsuitable for specimen identification
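The distinction between panels a and b of Fig. 1 can be made concrete with a minimal sketch that tests whether the haplotypes of a species form a single clade in a rooted tree. The nested-tuple tree encoding, the leaf labels, and all function names are assumptions made for illustration; this is not the software used in our analyses, and the result depends on the tree being correctly rooted.

```python
def leaves(tree):
    """Return the set of leaf labels under a node (a leaf is a string, a node is a tuple)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every node in the tree, including leaves and the root."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, species_of, species):
    """True if all haplotypes of `species`, and no others, make up one clade."""
    target = frozenset(l for l in leaves(tree) if species_of[l] == species)
    return any(clade == target for clade in clades(tree))


def reciprocally_monophyletic(tree, species_of, sp_a, sp_b):
    """True if both species are monophyletic in the same tree."""
    return (is_monophyletic(tree, species_of, sp_a)
            and is_monophyletic(tree, species_of, sp_b))


# Hypothetical haplotype labels and their species assignments.
SPECIES_OF = {"s1_a": "sp1", "s1_b": "sp1",
              "s2_a": "sp2", "s2_b": "sp2",
              "out": "outgroup"}

# Pattern of Fig. 1a: each species forms its own clade.
tree_a = ((("s1_a", "s1_b"), ("s2_a", "s2_b")), "out")
# Pattern of Fig. 1b: species 1 is nested within species 2.
tree_b = (((("s1_a", "s1_b"), "s2_a"), "s2_b"), "out")

print(reciprocally_monophyletic(tree_a, SPECIES_OF, "sp1", "sp2"))
print(is_monophyletic(tree_b, SPECIES_OF, "sp1"),
      is_monophyletic(tree_b, SPECIES_OF, "sp2"))
```

In the second tree, species 1 still forms a clade, but the smallest clade containing both species 2 haplotypes also contains species 1, so species 2 is paraphyletic and reciprocal monophyly fails.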

Our phylogeny of the complete COI haplotypes indicated a paraphyletic pattern for L. cuprina, Lucilia illustris, and Lucilia porphyrina (Fig. 2). The paraphyly of L. cuprina COI seen here was previously described [27, 31]; however, in our earlier paper we thought that it arose solely from a distinct Hawaiian lineage, with all other populations being monophyletic for mtDNA [27]. Based on these additional data, the Hawaiian L. cuprina haplotypes are not as distinct as previously thought. Instead, they closely resemble haplotypes not only of L. cuprina from Taiwan (AY097335 [6]) but also of the separate genus Hemipyrellia, also from Taiwan (the new DQ453493 and AY097334 [6]). Bootstrap support for this lineage is weak (63%), leaving the exact relationship of these H. ligurriens haplotypes to those of L. sericata and L. cuprina poorly resolved. However, based on COI, H. ligurriens is clearly embedded within the L. sericata/cuprina clade (95% bootstrap support, Fig. 2), a result that we feel raises questions about the validity of Hemipyrellia as a genus separate from Lucilia. Although the genus Dyscritomyia was placed within Lucilia, this placement was based on poorly supported basal nodes. Therefore, the exact relationship of Dyscritomyia to the other greenbottle species is ambiguous.

Fig. 2

Maximum parsimony bootstrap (1,000 replicates) consensus tree for Calliphoridae based on the complete cytochrome oxidase I sequence. Numbers on branches indicate percent bootstrap support. Dashed lines indicate species that are not monophyletic for this gene

Phylogenetic analyses of the shorter sequences (Supplementary Fig. 1) were congruent with the pattern shown by the complete sequences. L. illustris and Lucilia caesar could not be distinguished using COI (Supplementary Fig. 1c–k). L. caesar AF017424 was strongly associated with the incorrect species (Supplementary Fig. 1c), even though it is only 199 bp in length (Supplementary Table 2). Some L. cuprina were assigned to the mixed species lineage that also included Hemipyrellia (Supplementary Fig. 1m,p,q), as was a single L. sericata (Supplementary Fig. 1y). Lucilia ampullacea was monophyletic (Supplementary Fig. 1b) and the position of Lucilia papuensis was unresolved (Supplementary Fig. 1a).

Thus, a lack of COI reciprocal monophyly appears to be common within the genus Lucilia. In fact, this was the case for every sister species comparison for which we had replicate haplotypes. Other than a situation in which the single available samples from two species are found to share an allele, a departure from reciprocal monophyly can only be detected with replicate samples. Therefore, additional specimens must be analyzed before the COI monophyly of many of these Lucilia species can be known.
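The sampling requirement described above can be expressed as a simple screen of a reference data set before any tree is built. The sketch below is illustrative only: the record format, species labels, haplotype names, and function names are all hypothetical and do not correspond to entries in the supplementary tables.

```python
from collections import defaultdict


def monophyly_testable(records):
    """Map each species to True if it is represented by at least two distinct haplotypes.

    With a single haplotype, a species is trivially 'monophyletic', so the
    test carries no information.
    """
    haplotypes = defaultdict(set)
    for species, haplotype in records:
        haplotypes[species].add(haplotype)
    return {species: len(found) >= 2 for species, found in haplotypes.items()}


def shared_haplotypes(records):
    """Return haplotypes found in more than one species.

    This is the one departure from reciprocal monophyly that is detectable
    even without replicate samples.
    """
    carriers = defaultdict(set)
    for species, haplotype in records:
        carriers[haplotype].add(species)
    return {hap: spp for hap, spp in carriers.items() if len(spp) > 1}


# Hypothetical reference records: (species label, haplotype label).
records = [("sp1", "hapA"), ("sp1", "hapB"),
           ("sp2", "hapC"),
           ("sp3", "hapC")]

print(monophyly_testable(records))
print(shared_haplotypes(records))
```

Here only the first species has the replication needed for a meaningful test, while the other two are flagged solely because their single samples happen to share a haplotype.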

It has long been observed that L. cuprina shows morphological variation corresponding to geographic distribution [2, 16, 33]. Some authors have labeled these the subspecies Lucilia cuprina cuprina (Asia, Oceania, the New World, and tropical Australia) and Lucilia cuprina dorsalis (Africa to western India, temperate Australia, and New Zealand) [16, 33]. For most L. cuprina in our analyses, we lack the morphological data needed to distinguish these two forms. Therefore, we cannot judge whether they should be considered separate species as advocated by Wallman et al. [31]. In contrast to this mtDNA phylogeny, L. cuprina was monophyletic for nuclear loci [23, 24, 27]. Although animal mtDNA is expected to undergo lineage sorting more quickly than nuclear DNA after speciation [3, 18], and therefore to be more useful for distinguishing recently diverged sister species, this may not be the case for L. cuprina.

Our results do not automatically invalidate the use of COI for identifying Lucilia specimens, but we believe that they highlight at least two major concerns for anyone proposing a forensic DNA-based species diagnostic test. The first is the importance of having replicate samples from a wide geographic range [36]. The second is the importance of knowing the natural history and distribution of the species involved. For example, L. illustris and L. caesar are both Palearctic species, but only L. illustris occurs in the New World [20]. Therefore, COI data may serve to identify an L. illustris specimen from North America but probably not one from Europe. Based on the sequence data available so far, the COI haplotypes of L. cuprina (assuming it is a single species) and L. sericata are distinct in most geographic areas but not so in Taiwan and perhaps other East Asian regions. Knowledge of other aspects of the study organisms, such as seasonal activity patterns, might allow an investigator to eliminate a problematic species from consideration and make identification based on COI possible.
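The way distributional knowledge can rescue an otherwise ambiguous COI result can be sketched as a filter on the candidate species of a mixed lineage. The occurrence table below encodes only the statement in the text about L. illustris and L. caesar; the coarse region labels, the table structure, and the function name are assumptions made for illustration, not a vetted distribution database.

```python
# Occurrence as stated in the text: both species are Palearctic,
# but only L. illustris occurs in the New World.
# The two region labels are a deliberately coarse, illustrative simplification.
OCCURRENCE = {
    "Lucilia illustris": {"Palearctic", "New World"},
    "Lucilia caesar": {"Palearctic"},
}


def candidates_in_region(lineage_species, region, occurrence=OCCURRENCE):
    """Return the species of a mixed COI lineage that are known to occur in `region`."""
    return sorted(sp for sp in lineage_species
                  if region in occurrence.get(sp, set()))


# A COI lineage in which the two species cannot be told apart.
mixed_lineage = {"Lucilia illustris", "Lucilia caesar"}

new_world = candidates_in_region(mixed_lineage, "New World")
palearctic = candidates_in_region(mixed_lineage, "Palearctic")

print(new_world)    # a single remaining candidate: identification is possible
print(palearctic)   # two remaining candidates: COI alone is not diagnostic
```

The same filter could in principle use other natural history data, such as seasonal activity, provided those data are reliable for the locality in question.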

Finally, we cannot exclude the possibility that one or more of the deviations from reciprocal monophyly we observed was the result of a misidentified specimen. It seems unlikely, though, that this could account for all such cases given the multiple research teams that produced the data.

Although this study focused on a single locus and insect genus, the issues of study design raised here apply equally well to any effort, such as those concerned with enforcing food quality [12] or conservation [17] laws, to develop a forensic species diagnostic test.