Introduction

The Australian Weed Risk Assessment (AWRA) system (Pheloung 2001; Pheloung et al. 1999), and systems adapted from it, have been widely used and tested as a means of predicting the risk of adverse impacts from introductions of alien plants. The system was developed to screen proposed new introductions of plants into Australia, and uses a battery of 49 questions relating to a species’ biogeography, undesirable attributes, biology, and ecology. Scores for these questions are combined in a mainly additive manner to generate an overall score. Species with an overall score of 0 or less are rated as acceptable for introduction, those scoring 7 or more are rejected, and those in the range 1 to 6 are subject to further evaluation. The AWRA has been used since 1997 in Australia as a component of the federal regulatory system for proposed new introductions of plants (Weber et al. 2009). The predictive power of the AWRA has been evaluated in a number of political jurisdictions and geographic areas around the world (Daehler and Carino 2000; Daehler et al. 2004; Dawson et al. 2009; Gassó et al. 2010; Gordon et al. 2008b; Kato et al. 2006; Krivánek and Pyšek 2006; Nishida et al. 2009; Williams and West 2000). A recent review of many of these evaluations (Gordon et al. 2008a) concluded that the accuracy of the system across a diverse range of geographies is sufficient that it can be generally adopted as a screening system for proposed new plant introductions.
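The three-way decision rule described above can be sketched as a small function. This is a minimal illustration that assumes the total score has already been computed. The function name `awra_outcome` and the `reject_above` parameter are hypothetical, and the sketch does not reproduce how the 49 individual answers are combined.

```python
def awra_outcome(total_score: int, reject_above: int = 6) -> str:
    """Primary-screen outcome for a total AWRA score (hypothetical helper).

    Scores of 0 or less are accepted, scores above `reject_above`
    (7 or more with the standard value of 6) are rejected, and
    anything in between is referred for further evaluation.
    """
    if total_score <= 0:
        return "accept"
    if total_score > reject_above:
        return "reject"
    return "evaluate further"


for score in (-9, 0, 3, 6, 7, 31):
    print(score, awra_outcome(score))
```

Exposing the rejection threshold as a parameter makes it easy to see how raising the cut-off moves species from "reject" into "evaluate further".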

In Canada the Canadian Food Inspection Agency (CFIA) is the body responsible for regulating plant importation and the declaration of quarantine pests. The CFIA has been using a qualitative weed risk assessment system guided by the International Plant Protection Convention standards ISPM #2 and ISPM #11 (FAO 2004, 2007) since 2000, but these assessments can be labour-intensive and time-consuming, and are best suited to evaluating the risk of unintentional introductions of known weed species from other parts of the world. They are not designed for the rapid assessment of proposed new intentional introductions, or of emerging invaders. This has prompted interest in the potential usefulness of a suitably adapted version of the AWRA for Canadian use (e.g., Daehler and Denslow 2007), and this study was undertaken to determine whether such a system would be accurate enough to use in practice. Although Gordon et al. (2008a) suggested that the AWRA was reliable enough to be adopted as a working system without further testing, we felt that it would not be advisable to implement the system without first evaluating its performance against species previously introduced into Canada. Canada is the largest geographic area to which the AWRA has been applied, has a wider range of climatic and environmental conditions than most areas in which the system has been evaluated, and has the most severe winter conditions, a factor which would be expected to have a significant influence on the success of plant species introduced mainly from milder climates.

As in the evaluations referred to above, we assessed the performance of the AWRA against a set of test plants known to have a history of introduction into Canada, and ranging in status from non-naturalized species to major invaders with significant impacts. The predictions of the AWRA were compared with the status of each plant species in Canada as assessed by an independent panel of agricultural, botanical and conservation experts.

Methods

Development of test plant list

We developed an initial list of 261 non-native vascular plant species present in Canada, including non-naturalized, naturalized, and weedy/invasive species, selected so that the representation of taxonomic groups and growth habits was similar across the three categories. Species were selected using lists compiled during the preparation of an earlier report on the status of invasive plants in Canada (Canadian Food Inspection Agency 2008) together with data in White et al. (1993), Darbyshire (2003), and the Weed Seeds Order (Minister of Agriculture and Agri-Food 2005), and with the addition of some non-naturalized species from Ashley and Ashley (1992–1993), Buckley (1977), Knowles (1995), and Munro and Small (1997). All species were evaluated independently for their weed or invasive status in Canada by a panel of experts including eight botanists, two conservationists and four agriculturalists. They were asked to rate each species as a “major weed”, “minor weed”, or “not a weed” in Canada, based on their own experience and expertise and the definitions in Table 1, and to leave the ratings blank for any species with which they were not familiar.

Table 1 Definitions for weediness categories used in the expert rating process (Daehler et al. 2004)

After compiling the responses, twelve species were deleted from the list because of insufficient responses or taxonomic problems. The remaining species were grouped into the three categories of “major weed”, “minor weed”, and “not a weed”, based on the consensus of the expert assessments, following the same process as Daehler et al. (2004). All of these species were reviewed to confirm their non-native status in Canada and at least a 50-year documented history of introduction. As only three fully aquatic species remained in the list at this stage, these were deleted. Each category was then further reduced by random removals for a final list of 50 major weeds, 50 minor weeds, and 52 non-weeds (Appendix 1). The non-weed category included 19 species not known to have naturalized in Canada.

Modifications to the questions

The questions and scoring system of the Australian Weed Risk Assessment System were used in their original form (Pheloung et al. 1999) except for four questions that refer specifically or by implication to Australian conditions. Question 2.01 “Species suited to Australian climates” was changed to “Species suited to Canadian climates”. We evaluated climate suitability by comparing the species’ known distribution outside Canada with the global version of the USDA Plant Hardiness Zones (Magarey et al. 2008; maps at http://www.nappfast.org/Plant_hardiness/ph_index.htm). Hardiness zone ratings were sometimes available directly from the horticultural literature, but more often were estimated by comparing the species’ known distribution with the 10-year global plant hardiness zone map referred to above. The “known distribution” includes the native and naturalized range, as well as records from cultivation that indicated that the species was able to complete its full life history. Species that were hardy to zone 6 or below scored 2 points, those hardy to zones 7–9 scored 1 point, and those only hardy to zone 10 or above scored zero. Question 2.04 “Native or naturalized in regions with extended dry periods” was changed to “Native or naturalized in regions with cold winters”, which we defined as those where at least 1 month has a mean daily minimum temperature below −10°C. Data in New et al. (2002) were used to produce a global map of these areas (Fig. 1).
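The two climate rules above can be written out as follows. This sketch rests on stated assumptions. The function names are hypothetical. The hardiness input is the coldest zone (1 to 13) in which the species is known to complete its life history. The cold-winter test takes twelve monthly mean daily minimum temperatures (°C) for one location, such as a grid cell from a climatology like that of New et al. (2002).

```python
from typing import Sequence


def hardiness_score(lowest_zone: int) -> int:
    """Score for the Canadian version of question 2.01 (hypothetical helper).

    lowest_zone: coldest USDA plant hardiness zone (1-13) in which the
    species is known to complete its full life history.
    """
    if not 1 <= lowest_zone <= 13:
        raise ValueError("hardiness zone must be between 1 and 13")
    if lowest_zone <= 6:
        return 2  # hardy to zone 6 or colder
    if lowest_zone <= 9:
        return 1  # hardy to zones 7-9 at best
    return 0  # hardy only to zone 10 or warmer


def has_cold_winter(monthly_mean_min_c: Sequence[float], threshold_c: float = -10.0) -> bool:
    """True if at least one month has a mean daily minimum below threshold_c (°C)."""
    if len(monthly_mean_min_c) != 12:
        raise ValueError("expected 12 monthly values")
    return any(t < threshold_c for t in monthly_mean_min_c)
```

A month at exactly −10°C does not qualify, following the strict "below −10°C" wording of the definition.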

Fig. 1 Areas of the world with cold winters, defined as at least 1 month with mean daily minimum temperature below −10°C, based on New et al. (2002)

Question 4.10 “Grows on infertile soils” was changed to “Grows on soil types found in Canada”. For the purposes of this question, soil types found in Canada were considered to be the Canadian soil classification orders Luvisolic, Chernozemic, Podzolic, Brunisolic, and Gleysolic, which together cover over 85% of the area of Canada for which soil mapping is available (Soil Landscapes of Canada Working Group 2007). The equivalent groups in the World Reference Base for Soil Resources classification (IUSS Working Group WRB 2006) are the soil groups luvisol, kastanozem, chernozem, podzol, cambisol, and gleysol. A map showing the general world distribution of these soil groups (Fig. 2) was developed from data in FAO (2003).
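One way to apply this definition when answering question 4.10 is to check whether any WRB soil group recorded within a species' known range is among the six listed above. The sketch below assumes that such a list of soil groups is available for each species. The function name and the simple plural-stripping normalization are hypothetical conveniences.

```python
from typing import Iterable

# WRB soil groups treated as equivalent to the dominant Canadian soil orders
CANADIAN_WRB_GROUPS = frozenset(
    {"luvisol", "kastanozem", "chernozem", "podzol", "cambisol", "gleysol"}
)


def grows_on_canadian_soils(wrb_groups_in_range: Iterable[str]) -> bool:
    """Answer 'yes' if any recorded WRB group matches a 'Canadian' soil type."""
    for name in wrb_groups_in_range:
        # Normalize e.g. " Podzols " -> "podzol" (str.removesuffix needs Python 3.9+)
        key = name.strip().lower().removesuffix("s")
        if key in CANADIAN_WRB_GROUPS:
            return True
    return False
```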

Fig. 2 Areas of the world with soils matching “Canadian” soil types (see text for definitions)

Finally, question 8.05 “Effective natural enemies present in Australia” was changed to “Effective natural enemies present in Canada”.

Answering the questions

Questions were answered by the authors and other CFIA staff on the basis of standard literature searches covering books, refereed journals, extension literature, and some online databases. All information derived from occurrence of the test plant species in Canada was excluded. Data on fifteen species that had previously been assessed in Florida (Gordon et al. 2008b) were provided to us and used with appropriate modifications for the Canadian assessment. We followed the guidance developed at the 2007 International Weed Risk Assessment Workshop (Gordon et al. 2010) as far as possible, and used a Microsoft Excel spreadsheet developed for the Florida evaluation of the AWRA (Gordon et al. 2008b) for data entry and score calculation. It was not possible to score the species “blind” (Gordon et al. 2008b) as most of the assessors were familiar with weeds and invasive plants in Canada. A training workshop conducted by the senior author at the beginning of the scoring phase and a guidance document that was updated by ongoing consultation among the assessors were used to ensure consistent approaches to the questions as far as possible. All assessments were reviewed for consistency by the senior author. On average, each species assessment took approximately 8 h.

The secondary screen described by Daehler et al. (2004) was applied to species in the “evaluate further” range of scores, and a dichotomous key format for the secondary screen was developed to simplify its application (Table 2).

Table 2 Dichotomous key format for the secondary screen of Daehler et al. (2004)

Data analysis

We calculated a pest status index from the mean score of all experts who rated each species, where a score of 0, 1, or 2 corresponded to non-weedy species, minor, or major weeds, respectively (Daehler et al. 2004).
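The index is a simple mean over the experts who rated a species, with blank ratings ignored. The following is a minimal sketch that assumes ratings are stored as the category strings used above and that blanks are stored as `None`. The names are hypothetical.

```python
from typing import Iterable, Optional

RATING_VALUES = {"not a weed": 0, "minor weed": 1, "major weed": 2}


def pest_status_index(ratings: Iterable[Optional[str]]) -> float:
    """Mean expert rating on the 0-2 scale, skipping experts who left it blank."""
    values = [RATING_VALUES[r] for r in ratings if r is not None]
    if not values:
        raise ValueError("no expert rated this species")
    return sum(values) / len(values)


# Four experts, one of whom left the species blank: (2 + 1 + 2) / 3
print(pest_status_index(["major weed", "minor weed", None, "major weed"]))
```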

We used logistic regression to analyze the effect of the AWRA score on the consensus expert ratings. Logistic models were estimated for the probability of a species being rated as a “major weed” or “not a weed” as a function of the AWRA score, and the minor weed probability was estimated as one minus the sum of these two probabilities.

Receiver operating characteristic (ROC) curves (Fawcett 2003; Gordon et al. 2008a) were plotted to assess the predictive power of the system. An ROC curve is a plot of the proportion of correctly diagnosed positive cases (true positives) against the proportion of negatives incorrectly diagnosed as positives (false positives). Standard ROC curves can only be plotted for a test with a binary outcome, whereas we used three weediness categories. We therefore plotted two ROC curves, one with the minor weeds included in the non-weeds (corresponding to a less strict regulatory policy of rejecting only species likely to be major weeds), and one with the minor weeds included with the major weeds (corresponding to a risk-averse policy of rejecting even species likely to be minor weeds). The area under the ROC curve (AUC) is a measure of the power of the test to correctly discriminate true positive from true negative cases. Youden’s index (true positive proportion minus false positive proportion) was used to examine the effect of varying the cutoff threshold on the discriminatory power of the test and to identify the cutoff scores that gave the maximum discriminatory power (Bewick et al. 2004).
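The ROC quantities above can be computed directly from species scores and binary labels. The sketch below assumes that a species is called positive (rejected) when its score is at or above the cutoff; this is an assumption about how cutoffs are defined. It computes the AUC in its rank-based (Mann-Whitney) form, which equals the trapezoidal area under the empirical ROC curve. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """(cutoff, true positive rate, false positive rate) for each observed score."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
        points.append((cutoff, tp / n_pos, fp / n_neg))
    return points


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden_cutoff(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Cutoff maximizing Youden's index (TPR - FPR), and that maximum."""
    cutoff, tpr, fpr = max(roc_points(scores, labels), key=lambda t: t[1] - t[2])
    return cutoff, tpr - fpr


# Hypothetical toy data: six species, True = counted as a weed
scores = [1, 4, 9, 11, 14, 18]
labels = [False, True, False, False, True, True]
print(auc(scores, labels), best_youden_cutoff(scores, labels))
```

The two binary versions of the three weediness categories correspond to labels such as `cat != "not a weed"` (minor weeds counted as weeds) and `cat == "major weed"` (only major weeds counted as weeds).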

The contribution of individual questions to the predictive power of the system was assessed by χ2 tests comparing the distribution of answers among the three weediness categories with the expected distributions if answers were independent of weediness.
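For a yes/no question and three weediness categories, the test uses a 3 × 2 contingency table with (3 − 1)(2 − 1) = 2 degrees of freedom. In this case the upper-tail χ2 probability has the closed form exp(−χ2/2). The sketch below relies on that special case to avoid a statistics library. The function names and the example counts are hypothetical, and the χ2 approximation becomes unreliable when expected counts are small.

```python
import math
from typing import Sequence, Tuple


def chi2_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_2df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Rows: major weeds, minor weeds, non-weeds; columns: "yes", "no" (hypothetical counts)
counts = [[30, 20], [25, 25], [10, 42]]
stat, df = chi2_independence(counts)
print(round(stat, 2), df, p_value_2df(stat))
```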

Statistical tests were carried out using Statistix® 8.0 (Analytical Software, Tallahassee, FL, USA) and the climatic and soil maps were generated using Manifold® System 8.0 (Manifold Net Ltd, Carson City, NV, USA).

Results

AWRA scores ranged from −9 for onion, Allium cepa, and cabbage, Brassica oleracea, to 31 for bull thistle, Cirsium vulgare. According to Pheloung et al. (1999), possible scores range from −14 to +29, but Gassó et al. (2010) found a score of 32 for one species they assessed, and it is not hard to construct hypothetical species with higher scores. The scores were a highly significant predictor (linear regression, F(1,150) = 27.716, P < 0.0001; Fig. 3) of the pest status index calculated from the expert ratings. Mean scores for all weediness categories were significantly different (Table 3), but there was considerable overlap in the distribution of scores (Fig. 4).
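For a single predictor, the F statistic compares the regression and residual mean squares. It has 1 and n − 2 degrees of freedom, which gives 1 and 150 for 152 species. A minimal sketch follows; the function name and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def simple_regression_f(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int, int]:
    """Least-squares slope and F statistic for y = b0 + b1*x.

    Returns (slope, F, df_regression, df_residual).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    ss_regression = sxy ** 2 / sxx  # variation explained by the predictor
    ss_residual = syy - ss_regression
    f_stat = ss_regression / (ss_residual / (n - 2))
    return sxy / sxx, f_stat, 1, n - 2


# x would be AWRA scores and y the pest status index; toy values here
print(simple_regression_f([1, 2, 3, 4], [1, 3, 2, 4]))
```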

Fig. 3 Relationship of pest status index with WRA scores

Table 3 AWRA scores for the three weed categories in Canada

Fig. 4 Proportion of species rated as not weeds (open circles) or major weeds (filled circles) as a function of the AWRA score. Curves show the logistic models for the probability of being not a weed (long dashes), a minor weed (short dashes), or a major weed (solid line)

Logistic regression showed that the AWRA scores were significant predictors of the consensus expert ratings (all model parameters significantly different from zero, P < 0.0001). The logistic models (Fig. 4) were:

$$ \begin{aligned} P_{\text{not}} &= 1/\left(1 + e^{-(1.895 - 0.227a)}\right) \\ P_{\text{major}} &= 1/\left(1 + e^{-(-3.717 + 0.203a)}\right) \\ P_{\text{minor}} &= 1 - P_{\text{major}} - P_{\text{not}} \end{aligned} $$

where $P_{\text{not}}$, $P_{\text{major}}$, and $P_{\text{minor}}$ are the probabilities of being rated as not a weed, a major weed, or a minor weed, respectively, and $a$ is the AWRA score.
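The fitted models can be evaluated at any score. This sketch takes the reported coefficients and assumes the usual logistic form P = 1/(1 + e^(−(b0 + b1·a))), under which the probability of "not a weed" falls and that of "major weed" rises with increasing score, as in Fig. 4. Because the two models are fitted separately, P_minor is not forced to be non-negative. With these coefficients it is non-negative for any score above about −76, which covers the whole observed range. The function names are hypothetical.

```python
import math

# (intercept, slope on AWRA score) as reported in the text.
# Assumed sign convention: P = 1 / (1 + exp(-(b0 + b1 * a))).
COEF_NOT = (1.895, -0.227)
COEF_MAJOR = (-3.717, 0.203)


def logistic(b0: float, b1: float, a: float) -> float:
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * a)))


def category_probabilities(a: float) -> dict:
    """Probabilities of each consensus expert rating for AWRA score a."""
    p_not = logistic(COEF_NOT[0], COEF_NOT[1], a)
    p_major = logistic(COEF_MAJOR[0], COEF_MAJOR[1], a)
    # The two models are fitted separately, so the minor-weed probability is
    # simply the remainder and is not constrained to be non-negative.
    p_minor = 1.0 - p_major - p_not
    return {"not": p_not, "minor": p_minor, "major": p_major}


for score in (0, 6, 12, 20, 31):
    print(score, {k: round(v, 3) for k, v in category_probabilities(score).items()})
```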

Applying the standard cut-off scores of 0 and 6, the system rejected all major weeds and 86% of minor weeds. However, it also rejected 42.3% of the non-weeds. Ten percent of the minor weeds and 44.2% of the non-weeds fell into the “evaluate further” range. After application of the secondary screen, all minor weeds in the “evaluate further” range were accepted. Among the non-weeds in this group 18 were accepted, one rejected, and four remained in the “evaluate further” range (Table 4).
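The percentages above imply species counts that can be checked directly. The sketch below back-calculates counts from the reported percentages, assuming that each percentage corresponds to a whole number of species. It then recomputes the non-weed rejection rate before and after the secondary screen. The variable names are hypothetical.

```python
# Species counts back-calculated from the reported percentages
# (assumes each percentage corresponds to a whole number of species).
N_MINOR, N_NON = 50, 52
minor_reject, minor_eval = 43, 5  # 86% and 10% of 50 minor weeds
non_reject, non_eval = 22, 23     # 42.3% and 44.2% of 52 non-weeds
non_eval_accept, non_eval_reject, non_eval_left = 18, 1, 4  # secondary screen


def pct(n: int, total: int) -> float:
    return round(100 * n / total, 1)


assert non_eval_accept + non_eval_reject + non_eval_left == non_eval
print("non-weeds rejected, primary screen:", pct(non_reject, N_NON))
print("non-weeds rejected, after secondary screen:", pct(non_reject + non_eval_reject, N_NON))
print("minor weeds accepted outright:", pct(N_MINOR - minor_reject - minor_eval, N_MINOR))
```

Adding the one non-weed rejected by the secondary screen raises the rejection rate from 42.3% to 44.2%, which matches the 23 rejected non-weeds listed in the Discussion.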

Table 4 Outcomes of the weed risk assessment process for 152 Canadian test species, before and after applying the secondary screen

Areas under the ROC curves were 0.867 with minor weeds counted as positives and 0.845 with only major weeds counted as positives (Fig. 5). When minor weeds were counted as weeds, the maximum value of Youden’s index (true positive rate minus false positive rate) was 0.638, at a cutoff score of 10. When only major weeds were counted as weeds, the maximum value of Youden’s index was 0.545, at a cutoff score of 14.

Fig. 5 ROC curves for the AWRA system applied to 152 Canadian test plants, with minor weeds counted as weeds or non-weeds. Open circles show the location of the maximum value of Youden’s index

Table 5 shows the results of χ2 tests of individual yes/no questions based on the hypothesis that weediness status is independent of answers to the questions. There are significant or highly significant effects for all the “weed elsewhere” questions, most of the propagule dispersal questions, occurrence in areas with cold winters, broad climate suitability, naturalization beyond the native range, allelopathy, the existence of congeneric weeds, reproductive failure in the native habitat, and forming “dense thickets”. Question 8.03, “well controlled by herbicides”, has a strong effect but in the opposite direction to that assumed by the AWRA, with a “yes” answer being associated with a higher-than-expected number of weedy species. A number of other attributes, including toxicity or unpalatability to animals, shade tolerance, hybridization, apomixis or self-fertilization, fire hazard, wind dispersal, and production of viable seed, showed no signs of association with weediness. Other attributes fell in an intermediate range, with marginally significant effects that might indicate some degree of association with weediness.

Table 5 Results of χ2 tests for independence of weediness categories and question answers

As expected, climatic adaptation was an important predictor of weed status. Questions 2.03 (broad climatic suitability) and 2.04 (native or naturalized in regions with cold winters) both had highly significant χ2 values. Question 2.01, in our version adapted for Canada, requires the user to provide an answer from 1 to 13, representing the lowest plant hardiness zone in which the species is known to live. There was a clear relationship between the pest status index based on expert opinions and the hardiness rating, as seen in Fig. 6 (linear regression, F(1,150) = 34.30, P < 0.0001). The mean pest status index fell into three groups: species in hardiness zones 1 to 3 had a mean index of around 1, those in zones 4 to 6 had means of around 0.4 to 0.5, and those in zones 7 and above (only 6 species) had means close to zero. Based on these test plants, it appears that species hardy only to zones 7 or above are unlikely to become even minor weeds in Canada.

Fig. 6 Relationship between pest status index based on expert opinions (means ± standard error) and plant hardiness scores for 152 introduced plant species in Canada

Discussion

Scores across all categories in our evaluation were consistently higher than in other published assessments of the AWRA (Table 6), despite our efforts to adhere closely to the guidance developed by previous studies (Gordon et al. 2010). The reasons for this discrepancy are not clear. One possibility is that different studies used different criteria for classifying species into the a priori groups; in some studies, all naturalized species are classified as at least minor invaders. However, the mean score for the 19 non-naturalized species in our study was 3.16, still noticeably higher than scores for this group in other studies where non-naturalized species were treated separately (Table 6).

Table 6 Comparison of WRA scores from this study with other published regional assessments

The AWRA performed well in identifying weedy species in Canada, rejecting 100% of major weeds and 86% of minor weeds. There was, however, a high false positive rate, with 44% of non-weeds rejected after applying the standard cut-off of 6 points and the secondary screen. Rejected species that were considered non-weeds in Canada by the experts were Chamaemelum nobile, Kniphofia uvaria, Mimosa pudica, Aira caryophyllea, Amaranthus spinosus, Cardamine flexuosa, Fraxinus excelsior, Galanthus nivalis, Genista tinctoria, Geranium pusillum, Iris pumila, Lobularia maritima, Lonicera periclymenum, Luzula campestris, Matthiola longipetala ssp. bicornis, Medicago polymorpha, Molinia caerulea, Prunus tomentosa, Rorippa amphibia, Rosa canina, Sherardia arvensis, Torilis nodosa, and Veronica spicata. The first three of these are not naturalized in Canada. Several of these species (Iris pumila, Aira caryophyllea, Galanthus nivalis, Kniphofia uvaria, Luzula campestris, Geranium pusillum, Chamaemelum nobile, Torilis nodosa, Mimosa pudica, Amaranthus spinosus) have hardiness zone ratings of 5 or above and thus would not be expected to be hardy in large areas of Canada. This suggests that the scores assigned for climatic adaptation do not penalize a lack of cold-hardiness strongly enough. This could potentially be corrected by adjusting the scoring of the cold-hardiness and “weed elsewhere” questions. However, a more fundamental problem seems to lie in the additive nature of the scoring system: even species very unlikely to establish in Canada because of climatic limitations, and which therefore score very low on the “biogeography” section of the assessment, can be pushed into the “reject” region if they score highly enough on the “undesirable attributes” section.

False positive rates reported in other studies vary widely, although they are difficult to compare because the a priori categories were defined differently and not all studies used the secondary screen. Krivánek and Pyšek (2006) found that only 2% of non-naturalized and casual species were rejected after the secondary screen, and Pheloung et al. (1999) reported that 7% of non-weeds were rejected. Kato et al. (2006) found that 23% of non-pests were rejected after application of the secondary screen. The false positive rate of 44% in our study was among the highest reported, exceeded only by the 50% rejection rate of “casuals” found by Gassó et al. (2010). Correspondingly, the discriminatory power of the system, as measured by the area under the ROC curve, was near the low end of the range of values found in other studies. If minor weeds are included as weeds, AUC values in other studies ranged from 0.79 for East African rainforests (Dawson et al. 2009) and Spain (Gassó et al. 2010) to 0.95 for the Czech Republic (Daehler et al. 2004; Gordon et al. 2008a; Krivánek and Pyšek 2006). Excluding minor invaders, the range is from 0.86 for Australia to 0.99 for the Czech Republic (Gordon et al. 2008a; Krivánek and Pyšek 2006; Pheloung et al. 1999).

The association of a “yes” answer to question 8.03 (“Well controlled by herbicides”) with weediness is probably because information on herbicidal control is much more likely to be available in the literature for known weedy species. Answers to many other questions showed no significant association with weediness categories. For some questions, this lack of association may reflect differences in composition between the Canadian and Australian weed floras. For instance, the inclusion of the question on geophytes (5.04) probably reflects the presence of a significant number of invasive South African bulb species in mediterranean-climate regions of southern Australia (Fox 1990); there is no corresponding component of the Canadian weed flora. Other questions, such as 4.10 (“Grows on soil types found in Canada”) or 6.02 (“Produces viable seed”), were not predictive because they were answered “yes” in virtually all cases. Regardless of the reasons, this suggests that a risk assessment system using a reduced question set could perform as well as the full Australian system in identifying species likely to have adverse impacts if introduced. A similar conclusion was reached by Weber et al. (2009) in a review of the operation of the original Australian system.

Conclusions

Our results showed that the Australian system has significant predictive power in identifying introduced species that have become weedy in Canada. However, its performance in our evaluation was somewhat inferior to that found in other geographic areas. In particular, the high rate of false positives would be problematic for a system used as a screen for restricting proposed introductions, and would leave the system open to the criticism that it was needlessly limiting introductions of species that could have significant economic benefits. Given the high level of interest in and support for new crop development in Canada (e.g., Alberta Agriculture Food and Rural Development 2006; Blade and Slinkard 2002; Li 1999; Small 1999; Zheljazkov et al. 2008), and the historical lack of regulation of new species introductions, it will be difficult to defend a weed risk assessment system with a high false positive rate. The false positive rate could be reduced somewhat by raising the cut-off score for rejection from the standard value of 6 to 10 or 14, the cutoffs at which Youden’s index was maximized. However, this would come at the cost of allowing a higher rate of introduction of species with significant impacts. Our results suggest that there is scope for refining the AWRA to give better predictive power for Canadian purposes, particularly by making better use of information on climatic suitability. In addition to cold winters, the climate of most parts of Canada is characterized by a relatively short and cool growing season (Liu et al. 2002), an aspect that is not captured by measures of climatic adaptation based solely on cold-hardiness. It is possible that using an additional measure of adaptation, such as growing degree-day requirement, would improve the predictive power of the system.

We also expect that the system could be improved by incorporating a more standard risk assessment model in which hazard (consequences) and exposure (likelihood) elements are combined multiplicatively (Cook 1999; Daehler and Virtue 2010) rather than additively as in the Australian system. Species unlikely to establish in Canada because of climatic limitations would then be rated as low-risk, regardless of any other undesirable characteristics they might have. Further development of a risk assessment system using test species previously introduced into Canada should be based on dividing these species into a training set and a test set, to avoid spuriously high estimates of predictive power due to overfitting (Fawcett 2003). We are continuing to use the results from this study to develop a revised risk assessment system that should have improved predictive power for species introduced into Canada.
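A toy calculation can show the contrast between the two ways of combining evidence. In this sketch, the section scores, the rescaling of establishment likelihood and impact to the range 0 to 1, and the function names are all hypothetical. It is not the revised system under development. It only illustrates why a product lets a near-zero likelihood dominate the result, whereas a sum does not.

```python
def additive_total(biogeography: float, undesirable_attributes: float) -> float:
    """AWRA-style combination: section scores are simply summed."""
    return biogeography + undesirable_attributes


def multiplicative_risk(likelihood: float, consequence: float) -> float:
    """Risk as likelihood (exposure) times consequence (hazard), both scaled to 0-1."""
    for value in (likelihood, consequence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("likelihood and consequence must lie in [0, 1]")
    return likelihood * consequence


# Hypothetical species: poorly suited to Canadian climates but with many
# undesirable attributes.
print(additive_total(0, 12))            # 12, above the standard rejection cutoff of 6
print(multiplicative_risk(0.05, 0.9))   # small, because establishment is unlikely
```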