Introduction

Biological invasion is a popular topic in ecological research and an urgent problem to be solved (González et al. 2010; Van Kleunen et al. 2010; Lowry et al. 2013). Researchers have tried to explain the invasion of alien plants from various perspectives, including the characteristics of successful alien plants, habitat invasibility, the invasion history of a plant in other regions, and plant evolution (Rejmánek 1996; Kolar and Lodge 2001; James and Drenovsky 2007; Van Kleunen et al. 2010). Since Baker (1965, 1974) proposed the classic characteristics of weeds, many studies have explained and predicted plant invasions based on plant functional traits (Rejmánek and Richardson 1996; Reichard and Hamilton 1997; Goodwin et al. 1999; Thompson et al. 2001; Daehler 2003; Hamilton et al. 2005; Pyšek and Richardson 2007). However, it is challenging to find universal characteristics that differentiate invasive from non-invasive alien plants. Moreover, very few studies in China have explored the characteristics of invasive alien plants through multi-species comparisons (Chen et al. 2015). Among these few studies, Chen et al. (2009) compared 23 traits among invasive alien plants, non-invasive alien plants, and native weeds in Jiangsu and Zhejiang provinces and Shanghai, China. Chen (2012) compared nine characteristics between invasive and naturalised alien plants in China. Other multi-species comparative studies exist, but they were limited to small geographic areas, and the comparison groups usually consisted of a few invasive alien plants and non-invasive native plants (Chen 2016; Hu et al. 2016; Wei et al. 2017). Overall, research on the characteristics of invasive plants through multi-species comparison is still at a preliminary stage in China, because the comparison groups, their geographic scale, and the range of plant characteristics examined are not yet comprehensive enough.
Identifying the status of alien plants (i.e., invasive or non-invasive) is a prerequisite for multi-species comparative studies.

After years of basic research, information on the names, invasion status (invasive or non-invasive), and damage of alien plants in China has become considerably more comprehensive (Xu and Qiang 2004, 2011; Wan et al. 2009; He 2012; Ma 2013; Huang 2014), which provides a basis for multi-species comparisons at the national scale. Meanwhile, the plant trade in China took off in the early 21st century (Normile 2004). However, existing risk assessment methods are mainly suited to managing invasive plants that have already entered China (Yin et al. 2007). Therefore, an effective pre-border risk assessment tool is urgently needed to screen potentially invasive plants before introduction.

To develop a complete and efficient screening tool for invasive alien plants suited to China, it is important to identify characteristics that differentiate potentially invasive from non-invasive alien plants in China. To this end, in the present study, we (1) screened a range of plant characteristics previously used to distinguish invasive from non-invasive plants in other parts of the world, and (2) used a predictive model to evaluate the discriminatory power of the screened metrics.

Materials and methods

Species selection

Invasive alien plants

A total of 103 invasive alien plant species that cause economic or ecological losses at the national level were selected for this analysis from the monograph ‘The Checklist of the Chinese Invasive Plants’ edited by Ma (2013). This monograph ranks the risk level of invasive alien plants in China into five grades based on their occurrence and on the extent and severity of the damage they cause. Species that cause economic or ecological losses at the national level are classified as malignant invasion (level I; 34 species) or serious invasion (level II; 69 species). Invasive plants that cause local damage, or little or no damage, are classified as local invasion (level III), general invasion (level IV), or to be observed (level V). Because this classification is not static, it is difficult to predict whether species outside levels I and II will cause harm at the national level in the future or remain naturally contained (Yan et al. 2014). Therefore, we selected only the 103 invasive alien plants listed under levels I and II.

Non-invasive alien plants

A total of 107 non-invasive alien plant species were selected for this analysis from the monograph ‘Exotic Plants in China’ (He 2012). Non-invasive alien plants were identified mainly by the criterion that they had not caused economic or environmental harm in China (Daehler et al. 2004); the time since introduction was also considered. Damage status was confirmed mainly by consulting domestic and international literature and authoritative pest and invasive species databases, chiefly the Crop Protection Compendium, the Invasive Species Compendium, and the Global Invasive Species Database.

After introduction, an alien species often needs a period of time to become established, spread, and reach outbreak levels; this delay is called the invasion lag (Song and Xu 2004). Generally, the longer a plant has been present in a non-native region, the higher the probability that it will become invasive (Pyšek et al. 2009a, 2009b). Conversely, if a plant has not caused economic or environmental harm for a long period after its introduction, it will probably not become invasive. It often takes decades for an alien plant to become invasive (Williamson 1996), but the exact number of years is unknown. Related studies have used alien plants introduced at least 40 years earlier in Japan (Nishida et al. 2009) and at least 75 years earlier in the United States (Koop et al. 2012). In this study, we selected only alien plants that had been introduced to China at least 75 years earlier, to reduce the risk of classifying species still in the lag phase as non-invasive.

Characteristic selection

The indices used in existing weed risk assessment methods were derived from the characteristics that discriminate invasive plants (see the indices in Appendix 1). The Australian Weed Risk Assessment (AWRA) is currently widely used worldwide (Gordon et al. 2008; Dawson et al. 2009; Nishida et al. 2009; Crosti et al. 2010; Gassó et al. 2010; McClay et al. 2010). The AWRA comprises 49 evaluation indices covering the historical, biological, and ecological characteristics of invasive plants, and is very comprehensive and broadly applicable (Pheloung et al. 1999). It has also strongly influenced weed risk assessment methods in other regions, such as the United States (Koop et al. 2012). Meanwhile, the Central European weed risk assessment uses characteristics of invasive plants that are easy to query and evaluate as its indices (Weber and Gut 2004).

To quickly obtain the characteristics needed to establish a risk assessment method for invasive plants in China, we compiled 45 discriminative characteristics related to invasive plants from the AWRA and the Central European weed risk assessment (see Appendix 1). Of these, 43 characteristics were taken from the AWRA, two were included in the Central European weed risk assessment, and one was included in both (Pheloung et al. 1999; Weber and Gut 2004). To obtain additional indicators that are easy to assess, we added eight characteristics that are easily obtained and related to plant invasion, drawn from other studies (Reichard and Hamilton 1997; Pyšek and Richardson 2007; Qiang 2009; Chen 2012). In total, we selected 53 plant characteristics, of which 52 were qualitative and one was quantitative. Qualitative characteristics were mainly scored as binary (yes/no). Details of each characteristic are given in Appendix 1. According to their nature, the characteristics were grouped into seven categories: colonisation (9 characteristics), reproduction (12 characteristics), dispersal (8 characteristics), undesirable characteristics (10 characteristics), life form and biological habits (6 characteristics), morphology and taxonomy (3 characteristics), and weed history (5 characteristics).

Information source

Information on each characteristic of each alien plant was compiled mainly from scientific publications. When information was contradictory or ambiguous, it was resolved by consulting experts. Species for which no information was available on a given characteristic were not scored for that characteristic. All searches were completed in July 2021. CNKI (www.cnki.net/), Springer (link.springer.com/), Wiley Online Library (onlinelibrary.wiley.com), and Google Scholar (scholar.google.com/) were mainly used to download the literature and books on the corresponding plants. Online databases and Google searches were used to query the relevant characteristics of plants directly. Search keywords followed the guidance of Gordon et al. (2010). For example, to determine whether a plant was shade tolerant, we searched online using the plant name + ‘shade’/‘sun’/‘light’ as keywords; to determine whether a plant was capable of natural hybridisation, we used the plant name + ‘hybrid’/‘crossing’. The online plant information websites mainly included the following: Crop Protection Compendium (www.cabi.org/cpc), Invasive Species Compendium (www.cabi.org/isc), China National Pest and Quarantine Information System (www.pestchina.com), Global Invasive Species Database (www.iucngisd.org), Hawaii-Pacific Weed Risk Assessment (www.hear.org/plants/), Plants For A Future (www.pfaf.org), Seed Information Database (data.kew.org/sid/), Index to Plant Chromosome Numbers (www.tropicos.org), Useful Tropical Plants Database (tropical.theferns.info), US Department of Agriculture (plants.usda.gov), US Food and Drug Administration Toxic Plants Database (www.accessdata.fda.gov), Canadian Poisonous Plants Information System (www.cbif.gc.ca), Global Compendium of Weeds (www.hear.org/gcw/), and China Online Flora (www.iplant.cn).

Characteristic comparison and metrics selection

To identify metrics that could be used to formulate a predictive model for alien invasive plants, some characteristics were subdivided into multiple binary indicators; for example, the area of origin was subdivided into five metrics (America, Europe, Asia, Africa, Oceania; Table 1 and Appendix 1). This yielded 80 metrics from the 53 characteristics. Each of the 80 metrics was compared individually between the two groups of alien plants in China (103 invasive and 107 non-invasive) to test for differences. For qualitative characteristics, differences were tested with the chi-square test (Lake and Leishman 2004; Sutherland 2004), or with Fisher’s exact test when the assumptions of the chi-square test (sufficiently large expected cell frequencies) were not met. For the quantitative characteristic, an independent-samples t-test was used.
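For a single binary metric, the comparison can be laid out as a 2×2 table (invasive/non-invasive × yes/no). A minimal pure-Python sketch follows. It assumes the common rule of thumb that Fisher’s exact test replaces the chi-square test when any expected count is below 5, and it uses Pearson’s chi-square without continuity correction. The function names are ours; the study ran these tests in SPSS, which also reports a continuity-corrected statistic for 2×2 tables, so exact P-values may differ from this sketch.

```python
import math


def _hypergeom_weight(a, row1, col1, n):
    # Number of 2x2 tables with top-left cell a, given fixed margins
    # (proportional to the hypergeometric probability of that table).
    return math.comb(col1, a) * math.comb(n - col1, row1 - a)


def compare_binary_metric(table, min_expected=5.0):
    """table = [[invasive_yes, invasive_no], [noninvasive_yes, noninvasive_no]].
    Returns (test_name, statistic, p_value); statistic is None for Fisher's test."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [r * k / n for r in rows for k in cols]  # order: a, b, c, d
    if min(expected) >= min_expected:
        observed = [a, b, c, d]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Upper-tail P-value of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
        return "chi-square", chi2, math.erfc(math.sqrt(chi2 / 2))
    # Two-sided Fisher's exact test: sum the probabilities of all tables with the
    # same margins that are no more likely than the observed one (exact integers).
    lo, hi = max(0, rows[0] - cols[1]), min(rows[0], cols[0])
    weights = {x: _hypergeom_weight(x, rows[0], cols[0], n) for x in range(lo, hi + 1)}
    observed_weight = weights[a]
    p = sum(w for w in weights.values() if w <= observed_weight) / math.comb(n, rows[0])
    return "fisher", None, p
```

Comparing integer table counts rather than floating-point probabilities avoids the tolerance issues that can otherwise arise when deciding which tables are "no more likely" than the observed one.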

All of the above analyses were performed in SPSS Statistics V22.0 (Li and Zhang 2015). To account for multiple comparisons and reduce type I errors, we adjusted the P-values from the chi-square comparisons using both the Benjamini and Hochberg (BH) correction, which controls the false discovery rate, and the Holm correction, which controls the family-wise error rate (Tredennick et al. 2021). Only metrics with both adjusted P-values < 0.05 were considered to differ significantly between invasive and non-invasive plants in China, and these metrics were then used in the predictive modelling below.
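The two P-value adjustments can be sketched in plain Python as follows. The function names are ours; `statsmodels.stats.multitest.multipletests` with `method="holm"` or `method="fdr_bh"` provides the same adjustments. Holm is a step-down procedure on the sorted P-values, and BH is a step-up procedure.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted P-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending P-values
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted P-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted, running_min = [0.0] * m, 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_by_both(pvals, alpha=0.05):
    """Indices of metrics whose Holm- and BH-adjusted P-values are both below alpha."""
    holm, bh = holm_adjust(pvals), bh_adjust(pvals)
    return [i for i in range(len(pvals)) if holm[i] < alpha and bh[i] < alpha]
```

For example, raw P-values `[0.01, 0.04, 0.03, 0.005]` give Holm-adjusted values `[0.03, 0.06, 0.06, 0.02]` and BH-adjusted values `[0.02, 0.04, 0.04, 0.02]`, so only metrics 0 and 3 pass both criteria. Because Holm-adjusted P-values are never smaller than BH-adjusted ones (for the i-th smallest of m P-values, (m − i + 1) ≥ m / i), requiring both to fall below 0.05 selects the same metrics as the Holm criterion alone.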

Formulating a predictive model for alien invasive plants in China

The metrics analysed in this study were all discrete binary variables; therefore, LASSO regression was used to select variables and build a predictive model for alien invasive plants in China (Tredennick et al. 2021). LASSO regression was performed in Stata 16. LASSO (least absolute shrinkage and selection operator) is a regularisation method that adds a penalty term to model estimation, shrinking regression coefficients toward zero and setting small coefficients exactly to zero. This introduces some estimation bias but can improve prediction accuracy and the generalisation capability of the model. Specifically, LASSO adds an L1-norm penalty (the sum of the absolute values of the coefficients) to the loss function of ordinary least squares (OLS), or to the negative log-likelihood in a logit model, so that some coefficients are compressed to zero and the corresponding variables are eliminated from the model. LASSO therefore performs model fitting and variable selection simultaneously. In addition, it avoids the overfitting and multicollinearity problems that affect OLS estimation when there are many predictor variables (Tibshirani 1996; Tredennick et al. 2021; Zhang et al. 2020). In Stata, the final model is determined by choosing the value of the penalty parameter λ. Three selection methods are available: cross-validation (CV), adaptive lasso, and plugin lasso, of which cross-validation is currently the most widely used. Cross-validation selects the λ that minimises the CV function f(λ), an estimate of the out-of-sample prediction error, and the model at that λ is the selected model. We used this CV method and took the model at the λ minimising the CV function as the final model. For each candidate model, LASSO reports goodness-of-fit statistics: a larger CV deviance ratio and a smaller CV deviance indicate a better fit (StataCorp 2021; Obuchi and Kabashima 2016).
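The penalised fitting and λ selection described above can be illustrated with a minimal pure-Python sketch. It assumes a logit model in which LASSO minimises the mean negative log-likelihood plus λ times the sum of the absolute coefficients (intercept unpenalised), and in which the CV function is the mean held-out binomial deviance. This is not Stata’s implementation: the solver (proximal gradient descent, also called ISTA), the function names, and the λ grid are illustrative assumptions; Stata’s scaling of λ and its standardisation of predictors may differ, and Python’s random-number generator does not reproduce Stata’s folds for seed 123.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(b0, beta, x):
    return b0 + sum(bj * xj for bj, xj in zip(beta, x))


def fit_lasso_logit(X, y, lam, n_iter=500):
    """Minimise mean negative log-likelihood + lam * sum(|beta_j|) by proximal
    gradient descent (ISTA). The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    # Step size 1/L, where L = 0.25 * max ||(1, x_i)||^2 bounds the gradient's Lipschitz constant.
    step = 4.0 / max(1 + sum(v * v for v in xi) for xi in X)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(linear_predictor(b0, beta, xi)) - yi for xi, yi in zip(X, y)]
        grad = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n
        # Gradient step, then soft-thresholding (the proximal operator of the L1 penalty):
        # coefficients whose magnitude falls below step * lam are set exactly to zero.
        beta = [math.copysign(max(abs(u) - step * lam, 0.0), u)
                for u in (bj - step * gj for bj, gj in zip(beta, grad))]
    return b0, beta


def mean_deviance(X, y, b0, beta):
    """Mean binomial deviance (-2 x log-likelihood per observation)."""
    total = 0.0
    for xi, yi in zip(X, y):
        pr = min(max(sigmoid(linear_predictor(b0, beta, xi)), 1e-12), 1 - 1e-12)
        total -= 2.0 * (yi * math.log(pr) + (1 - yi) * math.log(1 - pr))
    return total / len(y)


def cv_lasso_logit(X, y, lambdas, k=10, seed=123):
    """Return the lambda minimising the k-fold CV function (mean held-out deviance),
    the CV value for every lambda, and the model refitted on all data at that lambda.
    Assumes len(y) >= k so that no fold is empty."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv = {}
    for lam in lambdas:
        errors = []
        for held in folds:
            held_set = set(held)
            train = [i for i in idx if i not in held_set]
            b0, beta = fit_lasso_logit([X[i] for i in train], [y[i] for i in train], lam)
            errors.append(mean_deviance([X[i] for i in held], [y[i] for i in held], b0, beta))
        cv[lam] = sum(errors) / k
    best = min(cv, key=cv.get)
    return best, cv, fit_lasso_logit(X, y, best)
```

With `X` as a list of 0/1 metric vectors (one per training species) and `y` as 1 for invasive and 0 for non-invasive, `best, cv, (b0, beta) = cv_lasso_logit(X, y, [0.3, 0.1, 0.03, 0.01])` returns the selected λ; metrics whose coefficient in `beta` is exactly zero have been dropped from the model.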

A total of 210 alien plant species (103 invasive and 107 non-invasive) were collected in this study; 70% (72 invasive and 75 non-invasive species) were used as the training set, and the remaining 30% (31 invasive and 32 non-invasive species) served as the test set. In Stata 16, we specified a logit model, set the random-number seed to 123, and used 10-fold cross-validation to construct multiple candidate predictive models. The predictive model calculates the probability that a plant species is invasive: species with a probability > 0.5 were classified as invasive, and those with a probability ≤ 0.5 as non-invasive (Tang and Li 2014). The model at the λ that minimised the CV function was taken as the final optimal model, and its performance was evaluated on the test set. Additionally, a receiver operating characteristic (ROC) curve was used to evaluate the predictive performance of the model.
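The data split and the 0.5 decision rule can be sketched in Python as follows. We assume the split was stratified by invasion status, which is consistent with the reported counts (rounding 70% of 103 and of 107 gives 72 and 75). The function names are hypothetical, and Python’s `random` module does not reproduce Stata’s random-number stream for seed 123.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=123):
    """Split species indices so each class keeps about train_frac of its members
    in the training set (labels: 1 = invasive, 0 = non-invasive)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(members)
        n_train = round(train_frac * len(members))
        train += members[:n_train]
        test += members[n_train:]
    return sorted(train), sorted(test)


def classify(probabilities, threshold=0.5):
    # Probability > threshold -> invasive (1); probability <= threshold -> non-invasive (0).
    return [1 if p > threshold else 0 for p in probabilities]


def per_class_accuracy(y_true, y_pred):
    """Share of non-invasive (0) and invasive (1) species classified correctly."""
    out = {}
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        out[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return out
```

With `labels = [1] * 103 + [0] * 107`, `stratified_split(labels)` puts 72 invasive and 75 non-invasive species in the training set and 63 species in the test set; `per_class_accuracy` gives the kind of per-group percentages reported in the Results.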

Results

Colonisation

Among the nine colonisation characteristics, all 210 selected plants had a history of repeated introduction outside their natural range, so there was no significant difference in this characteristic between invasive and non-invasive alien plants in China (Table 1). Among the remaining eight characteristics, there were significant differences between invasive and non-invasive alien plants in China in source area, domestication, occurrence in risky habitats, and shade tolerance (Table 1). In the invasive alien plant group, the proportions of species that were native to the Americas, not domesticated, occurring in risky habitats, and shade tolerant were significantly higher than those in the non-invasive alien plant group (Table 1). In the non-invasive alien plant group, the proportion of species native to Europe and Asia was significantly higher than that in the invasive alien plant group (Table 1).

Table 1 Comparison results of colonisation characteristics between invasive alien plants and non-invasive alien plants in China

Reproduction

Among the 12 reproduction characteristics, there were no significant differences between the invasive and non-invasive alien plant groups in three: seed germination requirements, specialist pollinators, and substantial reproductive failure in native habitats (Table 2). For the last of these, none of the 92 species with available information (49 invasive and 43 non-invasive alien plants) showed substantial reproductive failure. There were significant differences between the two groups for the remaining nine characteristics. In the invasive alien plant group, the proportions of species with a persistent propagule bank, polyploidy, high reproductive capacity, self-fertilisation, and natural hybridisation were significantly greater than those in the non-invasive alien plant group (Table 2). For minimum generative time, the proportion of plants with a generative time of less than one year was significantly greater in the invasive alien plant group than in the non-invasive alien plant group (Table 2). For pollinators, the proportion of wind-pollinated plants was significantly higher in the invasive alien plant group than in the non-invasive plant group (Table 2), whereas there were no significant differences between the two groups in other pollination modes. For reproductive mode, the proportion of plants with only asexual reproduction was significantly lower in the invasive alien plant group than in the non-invasive alien plant group, whereas the proportion of plants with both asexual and sexual reproduction was significantly higher (Table 2). The two groups did not differ significantly in the proportion of plants with only sexual reproduction.
With regard to the reproductive system, the proportion of polygamo-monoecious plants was significantly higher in invasive than in non-invasive alien plants (Table 2). However, there were no significant differences between the two groups in the proportions of hermaphrodite, dioecious, and polygamo-dioecious plants.

Table 2 Comparison results of reproduction characteristics between invasive alien plants and non-invasive alien plants in China

Dispersal

There were significant differences between the invasive and non-invasive alien plant groups in China in all eight dispersal characteristics (Table 3). For the characteristic ‘dispersed intentionally by people’, the proportion of plants intentionally dispersed by people was significantly lower in the invasive alien plant group than in the non-invasive alien plant group (Table 3). For each of the remaining seven characteristics, the proportion of plants possessing the characteristic was significantly higher in the invasive alien plant group than in the non-invasive alien plant group (Table 3).

Table 3 Comparison results of dispersal characteristics between invasive alien plants and non-invasive alien plants in China

Undesirable characteristics

Among the ten undesirable characteristics, the invasive and non-invasive alien plant groups in China did not differ significantly in four: creating a fire hazard, being parasitic, being unpalatable to grazing animals, and having a climbing or smothering growth habit (Table 4). The proportion of plants that can host important pests and diseases was significantly lower in the invasive alien plant group in China than in the non-invasive alien plant group (Table 4). For each of the remaining five characteristics (forming dense thickets, having allelopathic effects, being toxic to animals, causing allergies or being toxic to humans, and having thorns, spines, or burrs), the proportion of plants was significantly higher in the invasive alien plant group than in the non-invasive alien plant group (Table 4).

Table 4 Comparison results of undesirable characteristics between invasive alien plants and non-invasive alien plants in China

Life form and biological habits

There were no significant differences between the invasive and non-invasive alien plant groups in China in nitrogen fixation, aquatic habit, or being a geophyte (Table 5). For each of the other three characteristics (life form, flowering, and fruiting), the proportions differed significantly between invasive and non-invasive alien plants. In terms of life form, the proportion of small herbaceous perennials was significantly lower in the invasive alien plant group than in the non-invasive alien plant group, whereas the proportion of small herbaceous plants in the invasive alien plant group was significantly higher (Table 5). In terms of flowering, the proportion of plants that begin flowering in spring was significantly lower in the invasive alien plant group than in the non-invasive alien plant group (Table 5), whereas the proportion of plants that flower throughout the year was significantly higher. In terms of fruiting, the proportion of plants that bear fruit throughout the year was significantly higher among invasive than among non-invasive alien plants (Table 5).

Table 5 Comparison results of life history and life form between invasive alien plants and non-invasive alien plants in China

Morphology and taxonomy

There were significant differences between the invasive and non-invasive alien plant groups in China in each of the three characteristics: fruit or seed morphology, seed mass, and taxonomy (Table 6). For fruit morphology, the proportion of plants whose fruits or seeds had structures for long-distance dispersal was significantly higher in the invasive alien plant group than in the non-invasive alien plant group (Table 6), whereas the proportion of plants with large fleshy fruits was significantly lower. For seed mass, the mean 1000-seed mass of invasive alien plants (12.66 g) was significantly lower than that of non-invasive alien plants (242.10 g) (Table 6). For taxonomy, the proportion of species belonging to Asteraceae and Poaceae was significantly higher among invasive than among non-invasive alien plants (Table 6).

Table 6 Comparison results of morphology and taxonomy between invasive alien plants and non-invasive alien plants in China

Weed history

For each of four characteristics (weed of agriculture/horticulture/forestry, environmental weed, garden/amenity/disturbance weed, and congeneric weed), the proportion of plants possessing the characteristic was significantly higher in the invasive alien plant group in China than in the non-invasive alien plant group (Table 7). The proportion of plants with weedy races did not differ significantly between the two groups (Table 7).

Table 7 Comparison results of associated characteristics between invasive alien plants and non-invasive alien plants in China

Metrics selection

Two metrics from colonisation and reproduction that did not differ significantly and did not vary among the species with available information were excluded from metric selection (‘repeated introductions outside its natural range’ and ‘substantial reproductive failure in native habitat’). Because all other metrics were binary, we also excluded the only quantitative metric, 1000-seed mass. The P-values of the remaining 77 metrics were then adjusted using the Benjamini and Hochberg (BH) and Holm corrections. Finally, 30 of the 77 metrics differed significantly between the invasive and non-invasive plant groups under both corrections (the top 30 metrics in Table 8): five from colonisation, eight from dispersal, two from morphology and taxonomy, six from reproduction, five from undesirable characteristics, and four from weed history (Table 8).

Table 8 Metrics selected by Benjamini and Hochberg correction (BH) and Holm correction

Optimal predictive model selection

LASSO fitted 100 candidate models along a grid of λ values, and 10-fold cross-validation was used to assess their accuracy. The optimal prediction model for alien invasive plants in China had λ = 0.018 and 18 non-zero coefficients. This λ minimised the CV function (Fig. 1) and yielded the minimum CV deviance and the maximum CV deviance ratio (Table 9). The optimal model correctly identified 97% of non-invasive plants and 94% of invasive plants in the training set. The 18 predictor variables in the optimal LASSO model and their coefficients are listed in Table 10. A positive or negative coefficient indicates that the metric is positively or negatively associated with a plant being scored as invasive, and the magnitude of the coefficient reflects the strength of that association. The metric ‘domesticated’ had the strongest negative association with invasiveness, and the metric ‘allelopathic’ had the weakest positive association.
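A minimal worked form of how the coefficients in Table 10 translate into a prediction, assuming the standard logit link used for the model (the coefficient values themselves are not reproduced here). With binary metrics x_1, ..., x_18, intercept β0, and LASSO coefficients βj:

```latex
\begin{aligned}
\eta(\mathbf{x}) &= \beta_0 + \sum_{j=1}^{18} \beta_j x_j, \qquad x_j \in \{0, 1\},\\
\Pr(\text{invasive} \mid \mathbf{x}) &= \frac{1}{1 + e^{-\eta(\mathbf{x})}},\\
\Pr(\text{invasive} \mid \mathbf{x}) > 0.5 &\iff \eta(\mathbf{x}) > 0,\\
\frac{\operatorname{odds}(\text{invasive} \mid x_j = 1)}{\operatorname{odds}(\text{invasive} \mid x_j = 0)} &= e^{\beta_j} \quad \text{(all other metrics held fixed)}.
\end{aligned}
```

The third line shows why the 0.5 probability threshold is equivalent to the sign of the linear predictor, and the last line shows why a positive (negative) βj raises (lowers) the odds of a species being scored invasive. Because all 18 predictors share the same 0/1 scale, coefficient magnitudes can be compared directly, keeping in mind that LASSO estimates are shrunk toward zero.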

Table 9 Predictive models fitted by LASSO
Table 10 Predictor variables and estimated coefficients of the optimal predictive model
Fig. 1

Values of the CV function for different values of λ. λ: the tuning parameter that controls the degree of shrinkage of the regression coefficients; the larger the value, the stronger the penalty. λcv: the value of λ that minimises the CV function.

Prediction outcome of the optimal predictive model

On the test set, the optimal predictive model correctly identified 90% of invasive plants and 75% of non-invasive plants (Table 11). The ROC analysis showed that the area under the curve (AUC) on the test set was significantly greater than 0.5 (P < 0.05), with an AUC of 0.925 (Fig. 2).
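How an ROC curve and its AUC can be computed from predicted probabilities is sketched below in plain Python. The function names are ours, and the significance test against AUC = 0.5 reported above is not reproduced. The AUC is computed through its equivalence with the Mann-Whitney statistic: the proportion of (invasive, non-invasive) pairs in which the invasive species receives the higher predicted probability, with ties counting one half.

```python
def roc_points(labels, scores):
    """(false-positive rate, true-positive rate) pairs obtained by sweeping the
    classification threshold down through every distinct predicted score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """AUC as the probability that a random invasive species (label 1) scores higher
    than a random non-invasive species (label 0); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and predicted probabilities `[0.9, 0.4, 0.6, 0.2]`, three of the four invasive/non-invasive pairs are ranked correctly, so the AUC is 0.75; the trapezoidal area under the points returned by `roc_points` gives the same value.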

Table 11 Predictive results on test sets by the optimal predictive model
Fig. 2

ROC curve for predictive outcome of optimal predictive model selected on test set

Discussion

Characteristics with no significant differences

Before the Benjamini and Hochberg (BH) and Holm corrections, comparative analysis showed no significant differences in 36 metrics between invasive and non-invasive alien plants in China. These 36 metrics came mainly from the colonisation, reproduction, undesirable characteristics, and life form and biological habits categories.

These results may be explained by plant use, life form, habitat, sample size, and the quality of the information available on alien plants in China. Specifically, five colonisation characteristics showed no significant differences between the two groups: ‘naturalised beyond native range’, ‘repeated introductions outside natural range’, ‘herbicide resistance’, ‘tolerates or benefits from mutilation, cultivation, or fire’, and ‘low-nutrient tolerance’. The lack of difference in ‘naturalised beyond native range’ and ‘repeated introductions outside its natural range’ might be related to the fact that the alien plants were used for agricultural, ornamental, and medicinal purposes (Weber et al. 2008). Such plants circulate widely around the globe and may be introduced repeatedly to the same place. At the same time, continuous planting after introduction would inevitably create high propagule pressure in the introduced area (Gassó et al. 2010), promoting the escape of alien plants and their naturalisation (Pyšek et al. 2009a, 2009b). We therefore infer that these two characteristics do not reliably discriminate invasive alien plants in China.

For the other three colonisation characteristics (‘herbicide resistance’, ‘tolerates or benefits from mutilation, cultivation, or fire’, and ‘low-nutrient tolerance’) and the weed history characteristic ‘having weedy races’, the results of the comparative analysis were strongly affected by a lack of information. Information on invasive alien plants is usually more abundant than that on non-invasive plants (Gassó et al. 2010; McClay et al. 2010), and its quality is generally low (Kueffer et al. 2013; Pyšek et al. 2009a, 2009b). In this study, only 20 to 33 non-invasive plants had information available for each of these four characteristics. Further research and information accumulation are therefore needed for these characteristics, especially for non-invasive plants. Moreover, information on these characteristics was not easy to obtain. Thus, these characteristics may not currently be suitable as indices in a weed risk assessment for China.

There were four undesirable characteristics without significant differences between the two groups: ‘unpalatable to grazing animals’, ‘creating a fire hazard in natural ecosystems’, ‘climbing or smothering growth habit’, and ‘parasitic’. The lack of difference in ‘unpalatable to grazing animals’ might be explained mainly by plant use. According to our data, 32 invasive plants can be used as animal feed, and about 40 non-invasive plants are used as food or vegetables and are commonly eaten by poultry. Both groups therefore contain a number of plants that are palatable to herbivores, resulting in no significant difference in unpalatability. For creating a fire hazard in natural ecosystems, only a very small proportion of plants in either group were combustible. This might be because invasive plants in China occur mainly in human-managed agricultural or forestry ecosystems (Weber et al. 2008), where a grass-fire cycle like that in natural environments is unlikely to develop (Fusco et al. 2019). At the same time, non-invasive plants rarely grow in natural environments, so the two groups did not differ in this characteristic.

There were four characteristics in life form and biological habits that showed no significant differences between the two groups (‘life form’, ‘aquatic’, ‘geophyte’, and ‘nitrogen fixation’). The lack of difference in these four characteristics, and in the two undesirable characteristics ‘climbing or smothering growth habit’ and ‘parasitic’, is likely related to the life form bias of alien plants in China. Approximately 84% of the invasive plants in China are herbaceous; among them, there are many annual herbs, few aquatic plants, and very few vines (Weber et al. 2008; Yan et al. 2014).

There were two reproduction characteristics without significant differences between the two groups: substantial reproductive failure in the native habitat and seed germination requirements. The former rarely occurs under natural conditions (Gordon et al. 2010); therefore, the two groups in this study did not differ in this characteristic. For the latter, the data came mainly from the Seed Information Database (SID), and many results in that database were obtained under laboratory conditions, so they may deviate from germination behaviour in the field.

New characteristics or metrics for constructing weed risk assessment

In this study, we included some new characteristics that are not assessed in the Australian or Central European WRA: ‘the source area from America/Asia/Europe’, ‘polyploidy’, ‘Asteraceae’, and ‘wind-pollinated’. In terms of pollination, this study found more wind-pollinated plants in the invasive alien plant group. Previous studies have reached conflicting conclusions about wind pollination: some found it to be negatively related to invasion (Williamson and Fitter 1996), while others suggested a positive relationship (Daehler 1998). Our results suggest that wind pollination is positively correlated with plant invasion. However, the pollinator data in this study, especially for non-invasive plants, were mainly derived from the PFAF website, whose pollinator information Sutherland (2004) considered inaccurate. Given the inconsistent findings across studies and the practical difficulty of obtaining reliable pollinator data, this characteristic may make predictions unstable. We therefore suggest that wind pollination be applied with caution when constructing a WRA for China.

Polyploidy data were taken from Qiang (2009) and Reichard and Hamilton (1997). In recent years, many studies have examined the relationship between polyploidy and invasiveness and found that polyploid plants are more likely to be invasive, and have greater invasive ability, than diploid plants (Levin 2002; Pandit et al. 2011, 2014; Te Beest et al. 2012). Our results showed that more than half (55%) of the invasive plants in China were polyploid, a significantly higher proportion than among non-invasive plants, which is consistent with these previous studies. We therefore conclude that polyploidy could be an effective predictor of plant invasiveness.

Before applying the Benjamini-Hochberg and Holm corrections, we found that the metrics ‘polygamo-monoecious’ and ‘annual flowering and fruiting’ differed significantly between invasive and non-invasive plants. Although neither metric remained significant after P-value correction and neither was selected for the final optimal predictive model in this study, we suggest that both could still serve as evaluation metrics, especially in a future weed risk assessment method for China. Few studies have examined the effect of being polygamo-monoecious on plant invasiveness. Studies usually compare the invasiveness of hermaphroditic and dioecious plants, but their results are controversial (Pyšek and Richardson 2007). Sutherland (2004) found that the majority of invasive alien plants were hermaphrodites, and Razanajatovo et al. (2016) suggested that hermaphroditism may facilitate invasion because such plants can reproduce without relying on other individuals. Compared with that of hermaphrodites, the breeding system of polygamo-monoecious plants may be more complex and possibly more conducive to survival and reproduction in the natural environment. Although the absolute number of polygamo-monoecious plants among the invasive alien plants in China was not high, their relative proportion was fairly high, at 15%. Because very few studies have tested the influence of being polygamo-monoecious on plant invasion, the preliminary results of this study raise an interesting hypothesis: that being polygamo-monoecious contributes to plant invasion.
Nonetheless, given the difference in this characteristic between invasive and non-invasive plants observed before P-value correction, this study suggests that it could be applied as an effective index in the construction of weed risk assessments.
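The two corrections differ in how strictly they control error across many metrics, which is why a metric can be significant before correction and not after. As an illustration only, the sketch below implements the standard Holm step-down and Benjamini-Hochberg step-up procedures in plain Python; the six P-values are hypothetical, not values from this study, and the function names are our own.

```python
def holm(pvals, alpha=0.05):
    """Holm step-down procedure (controls the family-wise error rate).

    Returns booleans, in the original order, marking rejected null hypotheses.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank)
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are kept
    return reject


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k / m * alpha:
            k_max = k
    # Reject the k_max smallest p-values
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        reject[i] = k <= k_max
    return reject


# Hypothetical P-values for six trait comparisons (illustrative only)
p_values = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2]
uncorrected = [p < 0.05 for p in p_values]
print("uncorrected:", uncorrected)
print("Holm:       ", holm(p_values))
print("BH:         ", benjamini_hochberg(p_values))
```

In this hypothetical case, the comparisons with P = 0.03 and P = 0.04 are significant before correction and under Benjamini-Hochberg but not under the stricter Holm procedure. In practice, `statsmodels.stats.multitest.multipletests` offers both procedures (`method="holm"` and `method="fdr_bh"`).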

In terms of flowering and fruiting periods, our results showed that invasive alien plants generally have longer flowering and fruiting periods than non-invasive alien plants, and that the invasive group in China contained more plants that flower and fruit year-round (‘annual flowering and fruiting’). Previous studies also found that invasive alien plants had longer flowering and fruiting periods than non-invasive plants (Goodwin et al. 1999; Lake and Leishman 2004). This may be because a long flowering and fruiting period extends reproductive time and increases reproductive output, thereby increasing the chance of colonisation (Baker 1974). The larger number of invasive plants that flower and fruit year-round in this study might be related to the geographical distribution of invasive plants in China. Yan et al. (2014) found that invasive plants in China are more numerous in the south and east than in the north and west, with many distributed in the southwestern and southeastern coastal areas, especially Fujian, Guangdong, Guangxi, Yunnan, and Taiwan. These five provinces lie in subtropical and tropical regions (Wu et al. 2006; Yan et al. 2014) and therefore provide environmental conditions suitable for year-round flowering and fruiting.

Prediction accuracy of the optimal model and establishment of a future invasive plant screening tool

Overall, the model identified in this study had high predictive accuracy on both the training set and the test set. In particular, its prediction accuracy for invasive plants exceeded 90% on average across the training and test sets and reached 90% on the test set alone. This was higher than the accuracy of the AWRA and the USA weed screening tool for invasive plants in China (84% and 68%, respectively; He et al. 2018, 2020), and also higher than the highest accuracy (82%) achieved by the machine learning models of Chen et al. (2015) for China.
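Because accuracy is reported separately for invasive and non-invasive plants, it may help to make the calculation explicit. The sketch below computes per-class and overall accuracy from true and predicted labels; the 18-plant test set is hypothetical, with counts chosen only to mirror the 90% (invasive) and 75% (non-invasive) test-set accuracies reported in this discussion, and `class_accuracies` is a name of our own.

```python
def class_accuracies(y_true, y_pred):
    """Return (invasive accuracy, non-invasive accuracy, overall accuracy).

    Labels: 1 = invasive, 0 = non-invasive.
    """
    pairs = list(zip(y_true, y_pred))
    inv = [t == p for t, p in pairs if t == 1]  # correct calls among invasive plants
    non = [t == p for t, p in pairs if t == 0]  # correct calls among non-invasive plants
    overall = [t == p for t, p in pairs]
    return sum(inv) / len(inv), sum(non) / len(non), sum(overall) / len(overall)


# Hypothetical test set: 10 invasive plants (9 correct), 8 non-invasive (6 correct)
y_true = [1] * 10 + [0] * 8
y_pred = [1] * 9 + [0] + [0] * 6 + [1] * 2

inv_acc, non_acc, overall_acc = class_accuracies(y_true, y_pred)
print(f"invasive: {inv_acc:.2f}, non-invasive: {non_acc:.2f}, overall: {overall_acc:.2f}")
```

Overall accuracy (here 15 of 18) depends on the mix of invasive and non-invasive plants in the test set, which is one reason per-class figures are the more informative basis for comparing screening tools.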

These results show that the significantly different metrics identified in this study could be used to construct screening tools for invasive plants in China, and that such tools are likely to discriminate invasive plants well. The high predictive accuracy of the optimal model likely stems primarily from the metrics used: these were mainly derived from the Australian weed risk assessment, which draws on studies of invasive plant characteristics from around the world (Pheloung et al. 1999). In addition, this study adds four new predictors that are not used in the AWRA or the Central European weed risk assessment: origin in the Americas, asexual reproduction only, wind-borne, and polyploidy. These metrics reflect the distinguishing characteristics of invasive plants in China and could therefore help predict which alien plants may become invasive in the future. For example, of the 110 alien plant species from the Americas in this study, 78 had become invasive in China, indicating that the source areas of invasive plants in China are strongly biased. This bias may result from a combination of climatic similarity and trade. China has a climate similar to that of North America (Liu and Wei 1986); according to the continental drift hypothesis, plants from East Asia and North America should share similar genetic backgrounds; and the climate of southeastern China resembles that of South America (Yan et al. 2014). Alien plants native to either North or South America might therefore find suitable habitats in China. In addition, frequent trade between China and the USA has led to frequent introductions of invasive alien plants into China (Callaway et al. 2006), increasing propagule pressure (Pyšek et al. 2015) and thus giving them a better chance of surviving in China.

However, the overall prediction accuracy of the model on the test set was lower than on the training set. The main reason lay in the lower prediction accuracy for non-invasive plants: only 75% of the non-invasive plants in the test set were correctly identified, compared with 97% in the training set. This is very similar to the AWRA's assessment accuracy of 76% for non-invasive plants in China (He et al. 2018). Our result is also consistent with evaluations of the AWRA elsewhere, which generally find lower accuracy for non-invasive plants than for invasive plants (McClay et al. 2010; Koop et al. 2012). An important reason for the AWRA's lower accuracy for non-invasive plants is where the score thresholds dividing the assessment outcomes are set; adjusting these thresholds appropriately can therefore improve its accuracy for non-invasive plants (Nishida et al. 2009; McClay et al. 2010; Koop et al. 2012).
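The trade-off involved in moving a decision threshold can be shown with a short sketch. It assumes, for illustration only, a model that outputs a probability of invasiveness and calls a plant invasive when that probability reaches a threshold; the AWRA itself divides summed scores rather than probabilities, and the probabilities and function name below are hypothetical.

```python
def accuracies_at_threshold(probs_invasive, probs_non_invasive, threshold):
    """Per-class accuracy when plants with probability >= threshold are called invasive."""
    inv_acc = sum(p >= threshold for p in probs_invasive) / len(probs_invasive)
    non_acc = sum(p < threshold for p in probs_non_invasive) / len(probs_non_invasive)
    return inv_acc, non_acc


# Hypothetical predicted probabilities of becoming invasive
probs_invasive = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.62, 0.55]
probs_non_invasive = [0.10, 0.20, 0.30, 0.40, 0.45, 0.58, 0.60, 0.61]

for threshold in (0.50, 0.60, 0.65):
    inv_acc, non_acc = accuracies_at_threshold(probs_invasive, probs_non_invasive, threshold)
    print(f"threshold {threshold:.2f}: invasive {inv_acc:.3f}, non-invasive {non_acc:.3f}")
```

In this toy example, raising the threshold from 0.50 to 0.65 raises non-invasive accuracy from 0.625 to 1.000 while invasive accuracy falls from 1.000 to 0.750, which is the balance any threshold adjustment has to strike.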

For our model, however, there may be two additional reasons for the lower predictive accuracy for non-invasive plants. First, invasive and non-invasive plants are not discrete categories but usually lie along a continuum (Gordon et al. 2008). Some non-invasive plants may become invasive in the future and may already have one or two characteristics typical of invasive plants, which could have affected the result. For example, three of the non-invasive plants in the test set had a predicted probability of becoming invasive of about 0.6, but if the value of a single predictor variable (e.g., allelopathy) were removed, the probability would fall below 0.5. Such plants in an intermediate state may therefore lower the predictive accuracy of our model for non-invasive plants. Second, the lower accuracy may relate to the number of categories the predictive model uses. A complete weed screening tool usually divides its results into three categories: major invasive, minor invasive, and non-invasive plants (e.g., Kato and Hata 2006; Koop et al. 2012; Křivánek and Pyšek 2006; Pheloung et al. 1999). Plants classified as minor invasive can then be re-screened with a secondary evaluation tool, which improves predictive accuracy for non-invasive plants because such a tool usually evaluates only the most important predictive features (Daehler et al. 2004; Koop et al. 2012; McClay et al. 2010).
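The intermediate-probability case described above, and the three-category alternative, can be illustrated with a sketch. It assumes, purely for illustration, a logistic-regression-style model; the coefficients, trait names, function names, and the 0.4/0.7 category cut-offs are all hypothetical and are not the fitted values or thresholds of the model in this study.

```python
import math


def invasion_probability(intercept, coefs, traits):
    """Logistic link: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    Traits missing from `traits` are treated as absent (x_i = 0).
    """
    z = intercept + sum(b * traits.get(name, 0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def three_way_category(p, lower=0.4, upper=0.7):
    """Map a probability to one of three screening outcomes (hypothetical cut-offs)."""
    if p < lower:
        return "non-invasive"
    if p < upper:
        return "minor invasive (re-screen)"
    return "major invasive"


# Hypothetical coefficients on binary trait predictors
intercept = -1.0
coefs = {"polyploid": 0.6, "origin_americas": 0.3, "allelopathy": 0.5}

plant = {"polyploid": 1, "origin_americas": 1, "allelopathy": 1}
plant_without_allelopathy = dict(plant, allelopathy=0)

p_with = invasion_probability(intercept, coefs, plant)
p_without = invasion_probability(intercept, coefs, plant_without_allelopathy)

for label, p in (("with allelopathy", p_with), ("without allelopathy", p_without)):
    binary = "invasive" if p >= 0.5 else "non-invasive"
    print(f"{label}: p = {p:.3f}, binary call: {binary}, three-way: {three_way_category(p)}")
```

Under a single 0.5 cut-off, this hypothetical plant flips between invasive and non-invasive when one trait changes; under the three-way scheme, both versions fall in the minor category and would be passed to a secondary screen instead.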

Overall, we believe the optimal prediction model performed well on the test set, for three reasons. First, it correctly identified 90% of invasive alien plants and thus fulfils the primary purpose of a WRA: preventing the vast majority of invasive alien plants from entering China and causing damage. Second, its 75% accuracy for non-invasive alien plants was still higher than the AWRA's average accuracy of 70% for non-invasive alien plants worldwide. Finally, because of China's large land area and the resulting wide distribution of invasive alien plants, economic losses from invasive alien plants might outweigh the economic benefits forgone by rejecting the 25% of non-invasive alien plants that the model misclassifies.

The current results show that it is feasible to use the metrics screened in this study to construct an efficient screening tool for invasive plants in China. Before formally building such a tool, however, we think the following aspects should be carefully considered and resolved. (1) Division of predicted results: we recommend dividing results into three categories (i.e., non-invasive plant, minor invasive plant, and major invasive plant) and establishing a secondary screening tool based on the most discriminative metrics to improve predictive accuracy for minor invasive plants and better support management decisions. (2) Sources of uncertainty: these should be assessed. For example, lack of information is an important type of uncertainty (Koop et al. 2012), which can be dealt with by building predictive models with suitable algorithms, such as Bayesian methods (Tredennick et al. 2021). (3) Choice of algorithm: different machine learning algorithms should be considered for building the screening tool; several existing studies have shown that various machine learning models achieve high evaluation accuracy (Elith et al. 2006; Keller et al. 2011; Chen et al. 2015). (4) Validation: multiple test sets should be used to evaluate the final screening tool, for example, invasive plants from other regions or countries, such as those that have invaded the Americas or Africa.

Conclusion

Our results suggest that, among the 80 known metrics related to plant invasion, 30 can discriminate between invasive and non-invasive alien plants in China. This implies that not every characteristic known to be related to plant invasion in other regions applies to invasive alien plants in China. Invasive alien plants in China showed particular biases in terms of geographic origin, taxonomy, polyploidy, and reproductive mode. Compared with non-invasive plants, invasive plants in China are more likely to be native to the Americas, to belong to the Asteraceae, to be polyploid, and not to reproduce exclusively asexually. The predictive model built with these 30 metrics had high predictive accuracy on both our training and test sets. These results suggest that there are specific, suitable metrics for constructing a pre-border weed screening tool for China, and our study lays a preliminary foundation for further research on such tools.