Introduction

The management of risks posed by potential invasive species requires, at a minimum, the ability to recognize those species. The logistics of early detection and monitoring demand that the tools used for species recognition be rapidly deployable, cost-effective, technically accessible, and accurate; in addition, the best tools will be applicable across a wide range of taxa. Unfortunately, traditional approaches, including identification based on organismal morphology, do not meet these criteria well. Morphological identification can be time-consuming for micro- and meiofauna, and complex communities can require the expertise of multiple taxonomists, significantly elevating costs (Lawton et al. 1998; Nee and Lawton 1996). In addition, depending on the taxa under investigation, taxonomic expertise may be limited or altogether absent (Mallet and Willmott 2003). Such considerations typically force researchers to base biodiversity estimates on identifications to family level or to “morphospecies” (Caesar et al. 2006). Furthermore, the requirements of invasive species monitoring expose a particular weakness of morphological identification: early life history stages (eggs and larvae) are notoriously difficult to identify by morphological criteria (Besansky et al. 2003), yet recognizing these stages is crucial to tracking invasions.

These limitations of morphological approaches to species identification have led many researchers and managers to propose and pursue the development of DNA-based tools for monitoring invasive species. Although other molecular tools have been developed (e.g. those that employ immunological methods (Symondson et al. 1999; Trowell et al. 2000)), DNA-based approaches are currently the most widely adopted and promise to be most broadly applicable in the future. Here we consider a number of different DNA-based tools for monitoring invasive species (Box 1). These tools are in various stages of development, from mere proposal to actual deployment, and potentially address a wide variety of important problems confronting managers and policy-makers. We examine the applicability of these different approaches, along with the technical difficulties associated with their development. This review is meant to provide those most in need of effective tools with a broad overview of the possibilities currently—or perhaps soon to be—available to them, and to indicate where further research is most urgently needed.

Box 1 Summary of potentially useful DNA-based technologies for invasive species monitoring

Choosing the best tool for the job

Invasive species monitoring is a complex undertaking comprising numerous applications, each with its own attendant technical challenges. Not all DNA-based tools are appropriate for all applications. Some may be insufficiently advanced to meet the demands of certain tasks; conversely, some methods may be technological overkill for a given application, and could be replaced with more cost-effective alternatives. We find it useful to categorize the diversity of available applications by considering three characteristics: sample complexity, target specificity, and quantitation. Monitoring tasks may require processing of samples consisting only of single individuals of unconfirmed identity, or they may entail working with complex biological communities drawn from environmental samples, with all of their associated abiotic components. In addition, while certain tasks involve detection or identification of a single species (e.g. a “black-listed” species of particular concern), others may require the ability to target multiple species or even all species in a complex sample. Finally, though it may be sufficient to obtain binary presence/absence data for target species, the potential value of information on propagule pressure (Kolar and Lodge 2001; Leung et al. 2002; Lockwood et al. 2005) might put a premium on the ability to quantitate species abundance.

If we consider these characteristics as axes of a three-dimensional space, then various monitoring tasks can be thought of as points in that space (see Fig. 1A). Generally, the technical difficulty associated with developing tools appropriate to each task increases with distance from the origin. The simplest applications will involve individual specimens and identification aimed at a specific target taxon; in other words, the confirmation of species identity. (Note that we disregard the quantitation of simple samples, which typically consist of single individuals.) Tools appropriate for the most technically demanding applications, on the other hand, would be capable of delivering estimates of abundance for all taxa present in a complex environmental sample. Figure 1A also indicates that the conditions of simple/complex, specified/unspecified, and quantitative/non-quantitative are not strictly binary. Samples can be moderately complex, specification can be to genus or family level rather than species level, and abundance estimates may be absolute or relative.

Fig. 1

Conceptual scheme for delineating DNA-based tools for invasive species monitoring. Applications can be distinguished according to three criteria: sample complexity, specification of the target, and necessity of quantification. (A) provides a spatial metaphor for the relationships between applications, with each of the three criteria shown as an axis in three-dimensional space. (B) provides a decision-making flowchart based on the same criteria. Numbers in both top panels refer to the applications listed in the bottom panel.

Figure 1B re-imagines Fig. 1A as a decision-making flowchart. While Fig. 1A is useful for visualizing the relationships between different applications (particularly in terms of the difficulty associated with tool development), Fig. 1B may be of more practical use to those needing to decide on the best tools for a particular monitoring task. By recasting the three axes of Fig. 1A as “yes/no” questions, it is possible to specify the most appropriate application for a given type of sample and the type of information that must be extracted from it. Figure 2, in turn, summarizes the DNA-based tools discussed in detail below and their suitability for each application outlined in Fig. 1. Together, Figs. 1B and 2 thus provide a simple guide for determining which tools are best pursued given the demands of particular samples and particular research questions. In the following, we discuss these applications in detail, their general relationship to the aims of invasive species monitoring, and the tools that have been (or are being) developed to address them.

Fig. 2

Appropriateness of specific tools for particular monitoring applications. Appropriateness is based both on the capacity of the tool to address the needs of each application and on the difficulty involved in developing the tool. In other words, if two tools can answer the same question but one is far less costly to develop, the less costly tool is listed as “best” and the other as “questionable”.

Potential applications of DNA-based identification

Confirmation of specimen identity

It is often necessary to identify individual specimens for which morphological criteria can provide only approximate or tentative assignments. In these situations, DNA-based identification may serve either as an alternative method for identification or as a means of “quality control,” confirming initial morphological identifications. Methodologies appropriate to this application require the capacity to compare, either directly or indirectly, DNA sequence derived from the unknown specimen with reference sequences associated with previously identified and vouchered specimens.

Given the wide availability of DNA sequencing technology, virtually all contemporary tools for DNA-based species identification are derived ultimately from the analysis of DNA sequence information. Developing a DNA-based tool for confirming identity (“Is this species X?”, as opposed to the more challenging “What species is this?”) typically begins with direct comparisons between DNA sequences obtained from multiple individuals of a target taxon and sequences from representatives of multiple non-target taxa to identify nucleotide polymorphisms specific to the target. Diagnostic polymorphisms are then employed to design downstream applications that determine the character of a sample sequence through indirect means. Such indirect approaches take advantage of available sequence information without incurring the cost (in both time and money) of directly sequencing each sample.
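The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not a published pipeline: it assumes the target and non-target sequences have already been aligned to equal length, and the function name `diagnostic_positions` and the toy sequences are hypothetical. A column is reported only if every target individual shares one nucleotide that no non-target individual carries.

```python
from typing import Dict, List


def diagnostic_positions(targets: List[str], non_targets: List[str]) -> Dict[int, str]:
    """Return {column: base} for alignment columns where every target
    sequence carries the same nucleotide and no non-target carries it.

    Assumes all sequences are already aligned to equal length; columns are 0-based.
    """
    length = len(targets[0])
    if any(len(s) != length for s in targets + non_targets):
        raise ValueError("sequences must be aligned to equal length")

    diagnostic: Dict[int, str] = {}
    for i in range(length):
        target_bases = {s[i].upper() for s in targets}
        if len(target_bases) != 1:
            continue  # variable within the target: not diagnostic
        base = target_bases.pop()
        if base == "-":
            continue  # a shared gap is not a usable nucleotide difference
        if all(s[i].upper() != base for s in non_targets):
            diagnostic[i] = base
    return diagnostic


# Toy aligned fragments, invented for illustration only.
target_seqs = ["ACGTTAGC", "ACGTTAGC", "ACGATAGC"]
non_target_seqs = ["ACCTTGGC", "ACCTTAGC"]
print(diagnostic_positions(target_seqs, non_target_seqs))  # {2: 'G'}
```

Column 3 is not reported because one of the three target sequences differs there; with real data, far more individuals per taxon would be needed before any column could be treated as diagnostic.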

The most widely used indirect approach combines PCR amplification of a small DNA fragment with restriction digestion of the amplified product. Initial amplification uses PCR primers designed to recognize a narrow range of species that includes the target; often this range comprises the target and its congeners or other closely related species. Diagnostic restriction sites that yield restriction fragment length polymorphisms (RFLPs) specific to the target taxon are then exploited to confirm species-level identification of the sample. Weathersbee et al. recently demonstrated the utility of this approach for distinguishing between morphologically cryptic eggs of two closely related root weevils, Diaprepes abbreviatus (a regulated invasive) and Pachnaeus litus (a minor native pest) (Weathersbee et al. 2003). In some cases, underlying variation may even be sufficient to target populations from specific geographic origins. Saltonstall, for instance, developed a rapid and inexpensive means of distinguishing native and non-native haplotypes of the common reed Phragmites australis in North America (Saltonstall 2003). In another study, species-specific restriction sites and genus-specific PCR primers allowed identification of both European and Asian varieties of introduced gypsy moths (Lymantria dispar); this is a tool of some practical importance, given the different dispersal capacities of the two populations (Pfeifer et al. 1995). Most recent studies employing PCR–RFLP to identify introduced species have targeted significant insect pests (Armstrong et al. 1997; Khemakhem et al. 2002; Miller et al. 1999; Muraji and Nakahara 2002; Pfeifer et al. 1995; Scheffer et al. 2001; Szalanski et al. 2004; Szalanski and Powers 1996; Toda and Komazaki 2002; Weathersbee et al. 2003); however, the technology has also been adopted for introduced macroalgae (Teasdale et al. 2002), vascular plants (Saltonstall 2003), and even mammals (Lopez-Giraldez et al. 2005).
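To make the RFLP logic concrete, the following sketch predicts fragment lengths from an in silico digest. It assumes a palindromic recognition site, here EcoRI (GAATTC, which cuts between the G and the first A), and uses invented toy amplicons that differ at one base inside the site; the function name `digest` is hypothetical. A real assay would be designed from aligned sequences of many target and non-target individuals.

```python
from typing import List


def digest(amplicon: str, site: str, cut_offset: int) -> List[int]:
    """Predict fragment lengths after cutting `amplicon` at every occurrence of a
    palindromic recognition `site`, `cut_offset` bases into the site.

    Only the given strand is searched, which is sufficient for palindromic sites.
    """
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# EcoRI recognizes GAATTC and cuts between G and A (offset 1).
# The two "amplicons" are invented toy sequences differing at one diagnostic base.
target_amplicon = "ATGCGAATTCTAGGCT"
non_target_amplicon = "ATGCGAATACTAGGCT"
print(digest(target_amplicon, "GAATTC", 1))      # [5, 11]
print(digest(non_target_amplicon, "GAATTC", 1))  # [16]
```

The single-base difference destroys the site in the non-target amplicon, so the target yields two bands and the non-target a single uncut band.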

Indirect DNA-based identification may also be accomplished by designing primers that recognize binding sites only in the target species (species-specific PCR, or SSP). This approach is conceptually identical to the initial step in PCR–RFLP but is simpler to implement, because it does not require secondary restriction of PCR products. However, it requires sufficient interspecific nucleotide variation, coupled with intraspecific sequence conservation, to permit the design of species-specific PCR primers, and it precludes identification of multiple target species with a single assay. SSP has been adopted for identification of insect pests (Haymer et al. 1994; Szalanski et al. 2004; Weathersbee et al. 2003), weedy plants (Shrestha et al. 2005), and introduced marine invertebrates (Heath et al. 1995). In one early demonstration of the power of this approach, Haymer et al. showed that tephritid flies (including the highly invasive Mediterranean fruit fly Ceratitis capitata) could be accurately identified using individual eggs, larvae, or even body parts as starting material (Haymer et al. 1994). The genetic loci used for SSP vary and include standard “barcoding” loci (e.g. COI and 16S; see below), microsatellite loci, and other sufficiently variable loci. For instance, Shrestha et al. recently used randomly amplified polymorphic DNA (RAPD) markers to develop suites of SSP primers capable of distinguishing economically important weeds of the Sporobolus species complex (including invasive populations) from their less damaging native conspecifics (Shrestha et al. 2005).
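An in silico check of primer specificity can be sketched as follows. It relies on the general observation that mismatches near a primer's 3' end impede extension more than mismatches near its 5' end; the thresholds (at most two mismatches, a three-base 3' region), the helper names, and the sequences are all hypothetical, and only one strand is scanned. Practical primer design would also account for melting temperature, both strands, and a full database search (e.g. with a tool such as NCBI Primer-BLAST).

```python
from typing import List


def could_prime(primer: str, template: str, max_mismatches: int = 2,
                three_prime_len: int = 3) -> bool:
    """Crude heuristic: True if some window of `template` differs from `primer`
    at no more than `max_mismatches` positions, none of them within the
    primer's 3'-terminal `three_prime_len` bases. Forward strand only."""
    p, t = primer.upper(), template.upper()
    n = len(p)
    for i in range(len(t) - n + 1):
        mismatches = [j for j in range(n) if p[j] != t[i + j]]
        if len(mismatches) <= max_mismatches and all(j < n - three_prime_len for j in mismatches):
            return True
    return False


def is_species_specific(primer: str, targets: List[str], non_targets: List[str]) -> bool:
    """Perfect match to every target and no plausible priming on any non-target."""
    perfect = all(primer.upper() in s.upper() for s in targets)
    cross_reactive = any(could_prime(primer, s) for s in non_targets)
    return perfect and not cross_reactive


# Invented toy sequences.
primer = "ACGTTAGCAT"
target = "GGACGTTAGCATCC"
near_miss_3prime = "GGACGTTAGCTTCC"  # single mismatch near the primer's 3' end
near_miss_5prime = "GGACCTTAGCATCC"  # single mismatch near the primer's 5' end
print(is_species_specific(primer, [target], [near_miss_3prime]))                    # True
print(is_species_specific(primer, [target], [near_miss_3prime, near_miss_5prime]))  # False
```

Under this heuristic a single 5' mismatch is not enough to prevent cross-amplification, which is why the second non-target makes the primer fail the check.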

Differences in fragment electrophoresis not attributable to length variation can also be used to discriminate target from non-target taxa. Under certain conditions, PCR-amplified DNA fragments differing by only a single nucleotide can exhibit observable differences in electrophoretic mobility (Muyzer 1999; Sunnucks et al. 2000). Fragment polymorphisms can therefore be attributed to underlying, species-specific nucleotide polymorphisms through electrophoretic techniques such as single-strand conformation polymorphism (SSCP) or denaturing gradient gel electrophoresis (DGGE). DGGE has been used, for example, to distinguish the invasive Asterias amurensis from other native Asterias seastars (Deagle et al. 2003). Like PCR–RFLP, DGGE and SSCP allow resolution of species-level identity even when species-specific PCR primers are unavailable.

One general limitation of these indirect approaches is the possibility of unobserved nucleotide variation. It is possible in all cases to obtain false positive results if the nucleotide differences between target and non-target fail to impact PCR amplification, restriction digestion, or electrophoretic mobility of fragments. False negatives are possible as well (though less likely) in cases of unrecognized intraspecific variation. To guard against these problems, it is crucial that initial screening for sequence level variation be thorough. In particular, comparison of target sequences with non-targets likely to be encountered must be as comprehensive as possible.

Identification of unknown specimens

The identification of unknown specimens (i.e. answering the question “Which species is this?”) has much in common with targeted identification, in large part because there may be few truly unknown specimens. Morphological identification can typically narrow the field at least somewhat, even if only to gross taxonomic levels. In some cases (e.g. identification of early life history stages or of fragments of individuals), however, the lack of knowledge may be more profound. Some of the tools described above may be able to differentiate among anywhere from several to dozens of species, and thus may be suitable for identifying “unknown” targets, but none can identify specimens that have been assigned generically to particularly speciose groups, or that have not been assigned at all. Perhaps the greatest drawback of techniques such as PCR–RFLP for identifying unknown specimens is that they are “hit-or-miss.” While a positive result (i.e. generation of a known RFLP pattern from specimen DNA) may provide a positive identification, a negative result (i.e. generation of an unfamiliar RFLP pattern or failure of PCR amplification) yields almost no information, beyond the knowledge that the specimen does not belong to the small number of species the assay was designed to recognize.

At present the most appropriate technology for the identification of unknown specimens is DNA barcoding. Barcoding ideally allows DNA-based identification of species when initial morphological identification can offer only very approximate assignments to speciose genera or even to taxonomic families or orders. Unlike the techniques described above, this is a direct approach, involving analysis of DNA sequence for each tested specimen. This method involves PCR amplification and sequencing of a short DNA sequence from a specified “barcode” region of the genome. These regions are chosen based on observable patterns of molecular evolution—candidate regions for barcoding must exhibit low intraspecific sequence variation but sufficiently high interspecific variation to unambiguously differentiate species (Blaxter 2004; Hebert and Gregory 2005). Several barcode regions have been proposed or are in use. For example, a section of the mitochondrial gene cytochrome c oxidase subunit I (COI) is commonly adopted for barcoding metazoans (Hebert et al. 2003). Barcode sequences derived from test specimens are compared to those available in a reference database in order to determine species identity.
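The comparison against a reference database can be sketched as a nearest-neighbour search. This minimal version assumes the query and reference barcodes are already aligned over the same region and uses the uncorrected p-distance; many barcoding studies instead use model-corrected distances such as the Kimura two-parameter (K2P) distance. The reference labels and sequences are invented, and `nearest_reference` is a hypothetical helper, not part of any barcoding database's API.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap or an ambiguous N."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x not in "-N" and y not in "-N"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_reference(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return (reference label, distance) for the closest reference barcode."""
    return min(((label, p_distance(query, seq)) for label, seq in references.items()),
               key=lambda item: item[1])


# Hypothetical, pre-aligned 10-bp "barcodes" (real COI barcodes are several hundred bp).
references = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGAAC"}
print(nearest_reference("ACGTACGTTC", references))  # ('species_A', 0.1)
```

Note that the closest reference is always returned, however distant; on its own this says nothing about whether the match is close enough to be trusted.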

The difficulties associated with barcoding are primarily related to the availability (or lack thereof) of sequences in the reference database, and to the problem of parsing inter- versus intra-specific genetic variation in order to assign identity confidently at the species level. The former difficulty is primarily a logistical one, and has provided the motivation for major barcoding initiatives aimed in part at populating databases with the barcode reference sequences necessary to make identifications (Schindel and Miller 2005). In principle, then, the solution to the problem is straightforward. In practice, however, limitations of existing barcode databases seriously constrain the ability to conduct DNA-based identification, and will in many cases require substantial initial investment to generate reliable reference sequence databases for particular research projects.

The problems associated with parsing genetic diversity in such a way as to enable confident species-level identifications are considerably more challenging. Clearly, not all individuals of a given species will have identical sequences over a several hundred-nucleotide stretch of barcode. This in itself is unproblematic, but becomes awkward if the degree of variation present within a species overlaps substantially with the degree of variation existing between closely related species. The so-called “barcoding gap” (Meyer and Paulay 2005)—the lack of overlap between inter- and intra-specific distributions of genetic variation—is crucial to successful species-level assignments. Much of the existing empirical literature on DNA barcoding has been devoted to assessing levels of genetic diversity within and between species. While early results were able to demonstrate clearly that, for many taxa, accurate barcoding is possible due to appropriately distributed variation (e.g. Armstrong and Ball 2005; Barrett and Hebert 2005; Hebert et al. 2004; Hogg and Hebert 2004), some recent research has indicated that this may not always be the case. Absence of the barcoding gap can result in significant levels of mis-assignment in some taxa; predictably, error is more pronounced in cases of poorly studied taxa (Meyer and Paulay 2005) or young species assemblages resulting from recent radiations (Monaghan et al. 2005).
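Whether a gap exists can be checked crudely by comparing the largest within-species distance with the smallest between-species distance. The sketch below pools all species into a single comparison for brevity, which is a simplification; in practice the distributions would be examined for each group of closely related species and with many more specimens. Data and names are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (largest intraspecific, smallest interspecific) pairwise distance
    for (species, aligned_sequence) records."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)


# Invented example: two specimens for each of two species.
records = [
    ("sp_A", "AAAAAAAAAA"), ("sp_A", "AAAAAAAAAC"),
    ("sp_B", "AAAAACCCAA"), ("sp_B", "AAAAACCCAC"),
]
max_intra, min_inter = barcoding_gap(records)
print(max_intra, min_inter, min_inter > max_intra)  # 0.1 0.3 True
```

Here every within-species distance (0.1) is smaller than every between-species distance (at least 0.3), so a gap exists; if the two ranges overlapped, a distance threshold alone could not assign specimens reliably.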

The necessity of a barcoding gap for accurately resolving species-level relationships makes DNA-based identification far more complicated than simply matching a sample barcode sequence to a single reference sequence in the database. Rather, identification requires placing the test sequence within a phylogenetic framework that includes multiple representatives of the target species and of closely related non-target species. The number of related species, and the number of representatives of each that must be investigated, will vary from case to case depending on the degree of relatedness within and between species in the test group. Confident assignment of species identity may thus require dozens to hundreds of individual sequences in the reference database. For invasive species monitoring, this requirement may entail sampling and data collection from related native “reference” species as well as from the introduced taxon.

Barcoding approaches to invasive species identification have already proven effective in several studies. Armstrong and Ball have shown that high levels of accuracy (>93%) are attainable for tussock moths (Lymantriidae) and fruit flies (Tephritidae), both groups that include economically significant introduced pest species (Armstrong and Ball 2005). When compared to other DNA-based identification approaches (e.g. PCR–RFLP or SSP), barcoding methods based on the Folmer fragment of COI (Folmer et al. 1994) proved more accurate and more robust, identifying several specimens that were unassignable by other techniques and correcting misidentifications (Armstrong and Ball 2005). Barcoding has also been utilized to confirm the introduced status and clarify the taxonomy of a widely distributed species of leeches from the Helobdella triserialis species complex (DeSalle et al. 2005), and to identify an invasive swimming crab in New Zealand (Smith et al. 2003).

Screening for the presence of a target species

Although screening for target species can be accomplished utilizing the same basic techniques outlined above, this application raises additional technical demands related to the processing of complex environmental samples. In particular, whereas the identification of specimens requires accuracy, screening requires both accuracy and sensitivity; representatives of target taxa in an environmental sample may be extremely rare, and screening tools must be capable of recognizing this rare “signal” amongst the extraneous “noise” generated by the multiplicity of non-targets present in the sample.

PCR is ideally suited to amplifying such rare signals. However, effective target-specific PCR depends on two things. First, it assumes efficient upstream processing. Specifically, it requires that the isolation of nucleic acids from the environmental sample has not been biased against rare taxa. PCR can often work with very little target template, but it cannot work if that template is lost altogether through inefficient DNA extraction. The extraction step is thus crucial to the success of any DNA-based identification, though this fact may be under-appreciated (but see Nordgard et al. 2005; Valentin et al. 2005). Numerous DNA extraction methods are available, ranging from standard Chelex and organic phase extraction methods to various commercial kits. It may be that every researcher conducting DNA-based identification has a different preferred method for obtaining PCR-quality DNA; more troubling is the possibility that different taxa of interest to a particular researcher may require different modifications of her favorite protocol. Attempts have been made to standardize approaches. Perhaps most notable is the detailed protocol by Ivanova et al. available through the Smithsonian Institution’s barcoding website (http://barcoding.si.edu/PDF/Protocols_for_High_Volume_DNA_Barcode_Analysis.pdf). Even with such ostensibly “universal” protocols, however, concerns over taxon-to-taxon variability in extraction efficiency are very real and must be considered in any attempt to detect targets in environmental samples. For instance, it has been demonstrated for microbial communities that different extraction techniques can yield different estimates of community diversity, indicating bias in extraction procedures (Martin-Laurent et al. 2001). Furthermore, PCR amplification of DNA extracted from environmental samples is frequently hampered by compounds that inhibit DNA-modifying enzymes such as Taq polymerase. This problem is well documented for direct extraction from soil (Tebbe and Vahjen 1993), but may similarly arise for environmental samples such as ballast water or benthic sediments.

More obvious is the dependence of effective screening on the design of target-specific PCR primers. Particularly in the case of rare targets, it is crucial that PCR primers not recognize non-target templates, because those templates may be far more common than the target and may thus swamp any signal. Despite these potential difficulties, targeted screening has been successful in several cases. A single group of researchers in Australia has pioneered these approaches, developing PCR-based screens for the introduced seastar Asterias amurensis (Deagle et al. 2003), the gastropod Maoricolpus roseus (Gunasekera et al. 2005), and the Pacific oyster Crassostrea gigas (Gunasekera et al. 2005; Patil et al. 2005). These studies employed both DGGE (seastars) and SSP (gastropods and oysters) to detect target DNA sequences. In one case, the assay could detect a single Asterias larva in 200 mg of plankton derived from uninfected ballast water (Deagle et al. 2003); in another, sensitivity was sufficient to detect larvae of M. roseus in plankton samples collected from invaded Tasmanian waters (Gunasekera et al. 2005).

Assessing propagule pressure

While in most aspects identical to targeted screening, accurate assessment of propagule pressure is far less permissive of bias in either DNA extraction or PCR amplification. Quantitation mandates representative extraction of DNA from the target species and the elimination of spurious PCR amplification from non-target DNAs. Perhaps the most attractive option for such applications is quantitative PCR (qPCR). In contrast to traditional PCR, where product is monitored only after completion of the reaction, qPCR monitors the amplification of product throughout the reaction. Since the initial appearance of product (unlike the final amount of product) is highly sensitive to the availability of template, monitoring product formation in the early stages of the reaction can provide accurate estimates of the starting amount of target DNA. The result is an extremely powerful means of quantifying the copy number of target DNA available in a sample.
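The relationship that qPCR exploits can be made explicit with an idealized model, assuming (as a simplification) that every cycle multiplies the product by a constant factor $1+E$, where $E$ is the amplification efficiency ($0 \le E \le 1$). After $c$ cycles, a reaction that began with $N_0$ target copies contains

$$N_c = N_0 (1+E)^c.$$

If $C_t$ denotes the threshold cycle at which product first reaches a fixed detection threshold $N_T$, then $N_T = N_0 (1+E)^{C_t}$, and solving for $C_t$ gives

$$C_t = \frac{\log_{10} N_T - \log_{10} N_0}{\log_{10}(1+E)}.$$

$C_t$ is therefore a linear function of $\log_{10} N_0$ with slope $-1/\log_{10}(1+E)$. At perfect efficiency ($E = 1$) the slope is about $-3.32$, so each tenfold increase in starting template lowers $C_t$ by roughly 3.3 cycles. In practice, the slope and intercept are estimated empirically from standard curves built with known amounts of template.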

The application to estimation of propagule pressure may seem obvious, but it is important to note that DNA copy number may not accurately reflect organismal “copy number.” Certain organisms in an environmental sample will simply yield more template DNA than others; the most straightforward explanation for this discrepancy is differences in body mass, although there may be others (e.g. differences in copy number of the target locus). It may be that this problem of “body mass bias” has a relatively simple solution—e.g. the development of conversion factors to translate amount of template DNA in a sample to number of individuals in that sample. Unfortunately, such factors would need to be developed for each target organism.
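The conversion-factor idea can be illustrated with a small sketch. The scenario and the per-individual copy numbers are invented; in practice each factor would have to be measured for the target organism (and possibly for each life stage), and the names used here are hypothetical.

```python
from typing import Dict


def individuals_from_copies(copies: Dict[str, float],
                            copies_per_individual: Dict[str, float]) -> Dict[str, float]:
    """Convert measured target-copy numbers into estimated numbers of individuals
    using one (hypothetical) calibration factor per taxon."""
    return {taxon: n / copies_per_individual[taxon] for taxon, n in copies.items()}


# Invented scenario: 10 individuals of each taxon, but each individual of taxon B
# yields four times as many target copies (e.g. because of greater body mass).
factors = {"taxon_A": 1_000.0, "taxon_B": 4_000.0}  # hypothetical copies per individual
copies = {"taxon_A": 10 * 1_000.0, "taxon_B": 10 * 4_000.0}

share_of_dna_B = copies["taxon_B"] / sum(copies.values())
print(share_of_dna_B)                            # 0.8, although B is half the individuals
print(individuals_from_copies(copies, factors))  # {'taxon_A': 10.0, 'taxon_B': 10.0}
```

Without correction, taxon B appears to make up 80% of the sample because it contributes 80% of the template, although it accounts for only half of the individuals.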

Little has been published on the application of quantitative DNA-based methods to the estimation of propagule pressure. qPCR technology has been used for sensitive detection of several introduced pest species, including the solanum fruit fly Bactrocera latifrons (Yu et al. 2004), the melon thrips Thrips palmi (Walsh et al. 2005), and the potato and beet cyst nematodes Globodera pallida and Heterodera schachtii (Madani et al. 2005). However, in two of these studies qPCR was adopted primarily for its sensitivity advantage over conventional PCR; only in the case of the cyst nematodes was qPCR used explicitly to attempt estimation of pest abundance. The authors of that work generated standard curves with known numbers of individual organisms in order to determine the relationship between template concentration (as determined by qPCR) and the number of individuals present in the sample. This is the most obvious, and probably most effective, means of overcoming problems of bias in the PCR protocol. They subsequently reported a high level of correspondence between the number of juvenile nematodes present in test samples and the number predicted by qPCR analysis, with sensitivity down to a single individual in some cases; however, they also noted the sensitivity of this approach to DNA extraction efficiency, and emphasized the need for unbiased and consistent extraction protocols.
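A calibration of the kind described above can be sketched as an ordinary least-squares fit of threshold cycle (Ct) against the base-10 logarithm of the number of individuals in each standard, which is then inverted to estimate the number of individuals in a test sample. The calibration values are invented for illustration and do not come from Madani et al. (2005); the function names are hypothetical. Because the standards contain whole organisms, extraction efficiency and per-individual copy number are folded into the curve, which is only valid if test samples are extracted in the same way and contain comparable life stages.

```python
import math
from typing import List, Tuple


def fit_standard_curve(individuals: List[float], ct: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(individuals) + intercept."""
    xs = [math.log10(n) for n in individuals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_individuals(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the number of individuals in a test sample."""
    return 10 ** ((ct - intercept) / slope)


# Invented calibration standards containing 1, 10, 100 and 1000 individuals.
standard_counts = [1, 10, 100, 1000]
standard_ct = [34.0, 30.7, 27.4, 24.1]

slope, intercept = fit_standard_curve(standard_counts, standard_ct)
estimate = predict_individuals(29.05, slope, intercept)  # a hypothetical test sample
print(f"slope={slope:.2f}, intercept={intercept:.2f}, estimate={estimate:.1f} individuals")
```

A slope near -3.3 cycles per tenfold change corresponds to roughly a doubling of product per cycle; the estimate is only as good as the assumption that template yield scales in proportion to the number of individuals.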

Biodiversity surveys: species composition

It is tempting to speculate that in the near future an entire environmental sample may be rapidly and automatically processed in an instrument that utilizes DNA-based technology to accurately identify all species in the sample. Such an apparatus would be enormously useful for numerous and diverse environmental monitoring applications. It would, for example, allow simultaneous screening for a wide variety of established or potential invasive species, and would additionally provide a rapid and inexpensive means of assessing the impact of invasives on native biodiversity.

Though conceptually this extension of DNA-based methods is seductively straightforward, the development of the necessary tools is a daunting task. Nevertheless, there are several general approaches that may be appropriate for overall biodiversity surveys. These tools have been developed with reasonable success for applications in microbial and fungal community genetics. In many ways, that field provides a model for tool development for invasive species monitoring, particularly when it comes to overall assessments of biodiversity. Researchers seeking to describe the full diversity of microbial and fungal communities are seriously limited by the lack of morphological criteria for species-level assignments, especially for non-culturable taxa that would require field identification; for roughly two decades now they have relied instead on DNA-based tools to make such identifications (reviewed in Anderson and Cairney 2004; Dorigo et al. 2005).

The first general approach, which may be the easiest to implement but is also the slowest and most expensive, is “shotgun barcoding.” With this approach, bulk DNA is extracted from an environmental sample and a specific barcode fragment of that DNA is amplified using universal barcode primers. For accurate biodiversity assessment, it is important that these primers effectively amplify the barcode sequence from all taxa in the sample (or, at least, from all taxa of the particular target group that is being surveyed). The result is a complex mix of PCR amplicons, comprising barcode sequence from all targeted taxa. These amplicons are subsequently cloned into bacterial vectors; each individual bacterial transformant will thus possess a single amplicon representing a single taxon present in the initial sample. Extraction of bacterial plasmid DNA from a randomly selected clone and subsequent sequencing provides a barcode sequence that can then be analyzed by the procedure described above to generate an assignment for one of the taxa present in the environmental sample. Analysis of multiple clones (hundreds to thousands, depending on the biotic complexity of the sample) will ideally provide a complete description of sample biodiversity. Several groups have used this approach to assess microbial biodiversity, adopting ribosomal DNA sequences (16S or 18S) as barcodes (Dorigo et al. 2002; Glockner et al. 2000; Zwart et al. 2002), but none have yet applied it to large-scale surveys of metazoan biodiversity. Aside from the technical difficulties already discussed above with reference to other PCR-based methods (bias in DNA extraction and amplification leading to over- or under-representation of certain taxa, lack of reference sequences in public databases, etc.), the shotgun barcoding approach presents additional obstacles associated with obtaining an accurate community description from cloned sequences. Specifically, it may not be obvious how many clones need to be sequenced in order to get a complete biodiversity assessment, including rare taxa. Statistical solutions to this problem have been developed (Bohannan and Hughes 2003; Colwell et al. 2004), but in very highly diverse systems the consequent costs of the shotgun approach may be prohibitive.
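One widely used way of judging whether enough clones have been sequenced is a nonparametric richness estimator such as Chao1, which extrapolates total richness from the numbers of taxa seen exactly once (F1) and exactly twice (F2): S_Chao1 = S_obs + F1² / (2·F2). Good's coverage, 1 - F1/N for a library of N clones, is a related summary. The sketch below is illustrative only: the clone labels are invented, it assumes each clone has already been assigned to a taxon, and it is not necessarily the approach used in the works cited above.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def clone_library_summary(clone_labels: Iterable[str]) -> Dict[str, Optional[float]]:
    """Summarize a clone library in which each sequenced clone has already been
    assigned to a taxon (or operational taxonomic unit)."""
    counts = Counter(clone_labels)
    n_clones = sum(counts.values())
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # taxa seen in exactly one clone
    f2 = sum(1 for c in counts.values() if c == 2)  # taxa seen in exactly two clones
    # Classic Chao1 estimate; left undefined here when there are no doubletons.
    chao1 = s_obs + f1 ** 2 / (2 * f2) if f2 > 0 else None
    coverage = 1 - f1 / n_clones  # Good's coverage, 1 - F1/N
    return {"clones": n_clones, "observed": s_obs, "chao1": chao1, "coverage": coverage}


# Invented library of 12 clones.
clones = ["A"] * 5 + ["B"] * 2 + ["C"] * 2 + ["D", "E", "F"]
print(clone_library_summary(clones))
# {'clones': 12, 'observed': 6, 'chao1': 8.25, 'coverage': 0.75}
```

Six taxa were observed, but three of them appear only once, so Chao1 suggests that roughly two further taxa remain unsampled, and sequencing more clones would be warranted.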

More cost-effective indirect methods are also available for assigning identities to multiple taxa present in a single sample. These include gradient gel electrophoresis (denaturing or temperature gradient gel electrophoresis, DGGE and TGGE), DNA fingerprinting, and microarray technology. DGGE and the closely related TGGE have been used with great success to assess microbial diversity. Because these methods can, in theory at least, separate DNA fragments differing by as little as one nucleotide, they allow resolution of all nucleotide variants of a single amplified gene region, typically a barcode region such as COI (for metazoans) or 16S (for bacteria). The application of this technique to microbial community genetics is well reviewed by Muyzer (1999). Analogous to this approach are DNA fingerprinting methods such as terminal restriction fragment length polymorphism (T-RFLP) analysis. T-RFLP likewise relies on PCR amplification of a barcode sequence, but exploits size differences among end-labeled restriction fragments of the barcode amplicons to identify taxa. Ideally, nucleotide differences between the barcode sequences of different species will produce different RFLP patterns when the PCR products are enzymatically digested. Including a fluorescently labeled universal primer in the PCR ensures that only one fragment per amplicon (the terminal fragment) is visualized on a high-resolution acrylamide gel, so that, ideally, each distinct labeled fragment corresponds to a single species in the original sample. Bioinformatic tools can then be used to determine species identities from the pattern of these fragments (Kent et al. 2003), provided sequence information is available for all potential target species, which may not always be the case (Anderson and Cairney 2004). Marsh (1999) provides an excellent review of the usefulness of T-RFLP for microbial community genetics. Although in principle this method is applicable to monitoring invasive species or other metazoan communities, no such studies have yet been done.
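The identification step described above amounts to matching observed terminal-fragment sizes against sizes predicted for reference sequences. A minimal sketch of that lookup follows. It assumes the predicted T-RF lengths are already tabulated. The reference values, the peak sizes, and the 1 bp sizing tolerance are all hypothetical, and this is not the algorithm of any particular published tool (such as that of Kent et al. 2003).

```python
def match_peaks(observed_peaks, reference_trfs, tolerance=1.0):
    """Return the candidate taxa for each observed terminal-fragment length (bp).

    reference_trfs: dict mapping taxon -> predicted T-RF length (bp), e.g. from an
    in silico digest of reference barcodes (values below are hypothetical).
    tolerance: allowed sizing error in bp (an assumed value; real sizing error
    depends on the instrument and size standard).
    """
    matches = {}
    for peak in observed_peaks:
        matches[peak] = sorted(
            taxon for taxon, length in reference_trfs.items()
            if abs(length - peak) <= tolerance
        )
    return matches


# Hypothetical reference table and observed peaks.
reference = {"Taxon A": 142, "Taxon B": 211, "Taxon C": 212, "Taxon D": 388}
peaks = [142.3, 211.6, 305.0]

for peak, taxa in match_peaks(peaks, reference).items():
    print(peak, taxa or "no reference match")
```

In this toy example the 211.6 bp peak matches two reference taxa, the problem of fragments with identical mobilities, and the 305 bp peak matches nothing, as happens when reference sequences are missing.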

Both DGGE/TGGE and T-RFLP suffer from similar limitations. Most problematic, as noted earlier, is that both techniques assess nucleotide variation only indirectly, so a single band may in fact represent multiple sequences with identical fragment mobilities. This may limit the resolution of these techniques and prevent their application to highly diverse communities of unknown composition. Sensitivity also depends strongly on the choice of primers and (for T-RFLP) restriction enzymes (Dorigo et al. 2005; Engebretson and Moyer 2003). Indeed, both techniques have been used most effectively to assess changes in species composition, whether spatio-temporal changes or those associated with exposure to particular environmental stressors (Anderson and Cairney 2004; Dorigo et al. 2005; Marsh 1999), rather than to directly enumerate and identify the taxa present in a sample.
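The dependence on restriction enzymes can be explored before any lab work by digesting reference barcodes in silico and counting how many taxa yield a unique terminal fragment. The sketch below assumes a 5' end label, considers only the first recognition site on the labeled strand, and ignores sizing error. The recognition sites and cut positions of the three enzymes are standard, but the 20 bp "amplicons" are toy sequences, not real barcodes.

```python
from collections import Counter

# Enzyme -> (recognition site, cut offset from the 5' end of the site).
ENZYMES = {
    "HhaI": ("GCGC", 3),    # GCG^C
    "MspI": ("CCGG", 1),    # C^CGG
    "HaeIII": ("GGCC", 2),  # GG^CC
}

# Hypothetical toy amplicons, written 5'->3' from the labelled primer.
AMPLICONS = {
    "sp1": "AAGCGCAAAAAACCGGAAAA",
    "sp2": "AAAAGCGCAAAACCGGAAAA",
    "sp3": "AAAAAAGCGCCCGGAAAAAA",
}


def terminal_fragment(amplicon, site, offset):
    """Length (bp) of the labelled 5' terminal fragment; uncut amplicons stay full length."""
    pos = amplicon.find(site)
    return len(amplicon) if pos == -1 else pos + offset


def resolvable_taxa(amplicons, site, offset):
    """Taxa whose terminal-fragment length is shared with no other taxon."""
    lengths = {t: terminal_fragment(s, site, offset) for t, s in amplicons.items()}
    tally = Counter(lengths.values())
    return sorted(t for t, n in lengths.items() if tally[n] == 1)


for name, (site, offset) in ENZYMES.items():
    print(name, resolvable_taxa(AMPLICONS, site, offset))
```

Here HhaI separates all three toy taxa, MspI leaves sp1 and sp2 in a single 13 bp band, and HaeIII cuts none of them, so every taxon collapses into the uncut 20 bp product.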

Perhaps the most widely advertised tools for biodiversity surveys are microarrays. Microarray technology has only recently been applied to microbial communities (DeSantis et al. 2005; Loy et al. 2002, 2005; Wilson et al. 2002), and has yet to be fully explored for metazoan communities. Of particular interest here are “phylogenetic oligonucleotide arrays” (POAs), which have been adopted with some success for describing microbial communities (see Zhou 2003; Zhou and Thompson 2002 for reviews). POAs can be thought of as “barcoding arrays”: the array substrate is spotted with oligonucleotide probes corresponding to fragments of a barcode region (typically the 16S small subunit rRNA gene in microbial systems). These oligonucleotides span the range of nucleotide variation present in the target group being studied; the broader the target group, the more oligonucleotides are needed to capture the diversity present in the sample. The arrays are then hybridized with PCR-amplified barcode sequences from bulk-extracted DNA. Statistical and bioinformatic approaches are available to relate the resulting hybridization pattern to the most probable taxonomic identities of the sequences present in the amplified mix. In principle, this technology is readily transferable to metazoans by using an appropriate barcode, most likely COI for many applications.
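As a rough illustration of how a hybridization pattern might be related to taxa, the sketch below predicts which probes each reference barcode should light up (perfect-match substrings only). It then calls a taxon present when a sufficient fraction of its expected probes are positive. The probe sequences, reference barcodes, and 0.8 threshold are hypothetical. This simple rule stands in for, rather than reproduces, the statistical methods used in published POA studies.

```python
def expected_probes(probes, barcode):
    """Probes predicted to hybridize with a barcode.

    Simplification: a probe 'hybridizes' only if it occurs verbatim in the
    barcode (same strand); mismatch tolerance and signal intensity are ignored.
    """
    return {name for name, oligo in probes.items() if oligo in barcode}


def call_taxa(probes, references, positive_probes, min_fraction=0.8):
    """Call each reference taxon present if enough of its expected probes are positive."""
    calls = {}
    for taxon, barcode in references.items():
        expected = expected_probes(probes, barcode)
        if not expected:
            continue  # the array carries no diagnostic probe for this taxon
        fraction = len(expected & positive_probes) / len(expected)
        calls[taxon] = fraction >= min_fraction
    return calls


# Hypothetical array and reference barcodes.
PROBES = {"p1": "ACGTAC", "p2": "TTGCAA", "p3": "GGATCC", "p4": "CATTAG"}
REFERENCES = {
    "Taxon A": "AAACGTACAAATTGCAAAAA",
    "Taxon B": "GGGGATCCGGGCATTAGGGG",
}

print(call_taxa(PROBES, REFERENCES, positive_probes={"p1", "p2"}))
```

A real POA must also cope with probes shared across taxa, partial (mismatched) hybridization, and complex mixtures, none of which this all-or-nothing rule models.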

Although some success has been reported using POAs to describe microbial diversity, problems with the technology remain. Wilson et al. (2002) used a 16S rRNA microarray to assess bacterial communities collected from air samples. Although the method accurately identified single organisms and inventoried broad phylogenetic groups as well as or better than standard shotgun barcoding, it was unable to resolve individual taxa in complex samples. This is perhaps unsurprising, given evidence that signal intensity depends on the position and type of mismatch between probe and target DNA (Urakawa et al. 2002). Other studies have also shown reproducible variation in results depending on the DNA extraction method (DeSantis et al. 2005). A similar approach was used to assess the diversity of sulfate-reducing bacteria in tooth pocket samples; although the array in that study was generally effective at identifying reference taxa and discriminating species in environmental samples, there was evidence of occasional false-negative hybridization with reference organisms and false-positive identification of non-targets (Loy et al. 2002). More recently, Loy et al. (2005) demonstrated that restricting the target taxa to a single taxonomic order dramatically increases the accuracy and sensitivity of the POA approach.
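Mismatch effects of the kind reported by Urakawa et al. (2002) can at least be anticipated in silico by locating where a probe's best-matching window on a non-target sequence differs from the probe. The sketch below performs an ungapped sliding comparison and flags mismatches in the middle third of the probe. This reflects the common, though not universal, expectation that central mismatches destabilize duplexes more than terminal ones. The sequences and the middle-third cut-off are illustrative assumptions, and no model of signal intensity is implied.

```python
def best_window_mismatches(probe, target):
    """Slide the probe along the target without gaps and return the best window.

    The target is written in the same sense as the probe (i.e. the sequence the
    probe matches perfectly). Returns (offset, [(position, probe_base, target_base), ...])
    for the window with the fewest mismatches, or None if the target is too short.
    """
    best = None
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        mismatches = [(i, p, t) for i, (p, t) in enumerate(zip(probe, window)) if p != t]
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


def central_mismatches(probe, mismatches):
    """Mismatches in the middle third of the probe (an arbitrary, assumed cut-off)."""
    third = len(probe) / 3
    return [m for m in mismatches if third <= m[0] < 2 * third]


probe = "ACGTTGCA"           # hypothetical probe
non_target = "GGACGATGCAGG"  # hypothetical non-target sequence
offset, mism = best_window_mismatches(probe, non_target)
print(offset, mism, central_mismatches(probe, mism))
```

Here the non-target differs from the probe by a single central substitution. Whether such a near-match still yields a false-positive signal cannot be read from sequence alone and has to be established empirically.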

Very few groups have attempted to use microarray technology for metazoans. Pfunder et al. (2004) developed a small-scale microarray based on COI oligonucleotides and capable of accurately discriminating among a small group of morphologically cryptic rodent species. And although microarrays are often touted as a potential solution to the problem of invasive species monitoring (e.g. Chornesky et al. 2005), only one group has applied the technology to identify commonly introduced insect pests, in the Bactrocera dorsalis species complex (Naeole and Haymer 2003). No large-scale biodiversity survey comparable to those conducted for microbial communities has yet been undertaken for metazoans.

Biodiversity surveys: assessing abundance

In theory, all of the methods for assessing species composition could also be used to estimate abundances: shotgun barcoding by counting the clones representing each taxon, DGGE/TGGE and T-RFLP by measuring the intensity of identifying bands, and microarrays by measuring hybridization intensity. Although quantification has not been attempted with the shotgun barcoding approach, estimates of relative abundance have been obtained from microbial communities using DGGE (Nubel et al. 1999). More promising is the potential of microarray analysis to provide such estimates. Although microarray quantification is standard in many fields, particularly gene expression analysis, its application to community genetic analysis is relatively novel. Some studies of microbial diversity have reported strong correlations (r > 0.9) between target DNA concentration and hybridization signal intensity (DeSantis et al. 2005; Wu et al. 2001). However, the sensitivity of signal to DNA extraction method (DeSantis et al. 2005) and the inability to distinguish intensity shifts due to abundance from those due to nucleotide mismatch (Wu et al. 2001; Zhou and Thompson 2002) counsel caution in interpreting these results. Other authors have recognized potential biases in quantification introduced by template copy number (the microbial equivalent of body mass bias) and by preferential PCR amplification (Polz and Cavanaugh 1998; Webster et al. 2003). All of these pitfalls have been considered previously, and all apply equally to metazoan communities; indeed, the range of body sizes and body compositions among metazoans may further exaggerate the biases associated with copy number/body mass and DNA extraction.
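The template copy-number bias can be illustrated with a simple correction: divide each taxon's counts (clones, reads, or band intensities) by an assumed number of target-gene copies per genome before normalizing. For metazoans, the divisor could be copies per individual. This sketch rests on the strong assumption that copy numbers are known. It does nothing about preferential PCR amplification or differences in DNA extraction efficiency, and the numbers are hypothetical.

```python
def relative_abundance(counts, copies_per_genome=None):
    """Relative abundance from clone, read, or band counts.

    If copies_per_genome (taxon -> assumed gene copies per genome or per
    individual) is given, counts are divided by it before normalising, a simple
    correction for template copy-number bias. PCR amplification bias and DNA
    extraction efficiency are NOT corrected here.
    """
    adjusted = {}
    for taxon, n in counts.items():
        copies = copies_per_genome.get(taxon, 1) if copies_per_genome else 1
        adjusted[taxon] = n / copies
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical counts: taxon A is assumed to carry six gene copies, B one.
counts = {"A": 60, "B": 40}
print(relative_abundance(counts))                      # uncorrected shares
print(relative_abundance(counts, {"A": 6, "B": 1}))    # copy-number corrected
```

With six copies assumed for A, its apparent 60% share shrinks to 20%, showing how strongly uncorrected copy-number differences can distort relative abundance estimates.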

Given the difficulties associated with these approaches, it is not clear that quantification will always be advisable. This will, of course, depend on the application, but in some cases quantification may actually detract from the efficacy of a study. For instance, because detecting rare taxa is so important in invasive species monitoring, one may be better served by preferentially amplifying rare templates than by seeking relative abundance data.

Conclusions

DNA-based technology offers a wealth of potential tools for researchers and managers involved in the detection, identification, and monitoring of invasive species. Indeed, the recent explosion of papers on applications of DNA taxonomy (particularly DNA barcoding) has prompted virtually unqualified speculation about the coming profusion of cheap, rapid, and accurate means of identifying specimens and assessing biodiversity. The promise of even more exotic implements, such as handheld devices for in situ field identification (see Box 2), has further whetted the appetite for molecular technologies. In this review we have attempted to place such speculation more firmly within the context of technology development, in order to realistically assess the difficulties of generating the promised tools. Whereas some approaches are already available, pending only the initiative and the investment of time and money needed to tailor tools to specific tasks, others remain hopeful science fiction. More specifically, tools for confirming specimen identity, identifying unknown specimens, and targeted screening of environmental samples (applications 1–3 in Fig. 1) could likely be adequately developed with currently available technology in the majority of cases. In contrast, methods for assessing propagule pressure and characterizing biodiversity in complex metazoan communities require the solution of numerous technical problems before they become realistic possibilities. The days of the “invasive species chip,” capable of screening samples for hundreds to thousands of potential invasives, are, regrettably, some years off. And whether currently available technologies will ever be able to effectively quantify biodiversity in complex environmental samples remains very much an open question. Nevertheless, the potential benefits of such tools are vast, and worthy of continued effort and investment.

Box 2 “Labs on chips” and PCR-free DNA detection