Introduction

Classical biological control of weeds is the foreign exploration for, and importation and dissemination of, agents that are adapted to attacking an invasive plant species (McFadyen 1998) on the premise of a co-evolutionary relationship between herbivore and plant (Ehrlich and Raven 1964). An ideal agent, be it insect, mite, or fungus, would be one that can potentially mitigate the negative ecological consequences of the invasion without attacking desired plant species. To avoid non-target effects, potential agents are tested using a list of candidate plant species, mostly including those closely related to the invasive species (Wapshere 1974), but our hypotheses of relationships between species, genera and even families are not necessarily correct. Another challenge is that invasive populations of a species are rarely homogeneous entities. There may be genetic variation that leads to phenotypic variation in resistance or tolerance to attack by biocontrol agents (Gaskin et al. 2011). The phylogenetic relationships of species to each other and the relationships of populations or genotypes within species to each other are challenging to discern using traditional phenotypic information. Advances in molecular tools have contributed an additional dataset to these types of analyses, revolutionizing our understanding of plant phylogenies (Stevens 2001; Angiosperm Phylogeny Group 2016), invasion history, and intraspecific relationships that previously challenged effective biocontrol. Molecular studies of both the target and agent are increasing in biocontrol programs (Hinz et al. 2020), but many programs still lack this research. Here I discuss some of the more recent molecular investigations of invasive plants targeted for biocontrol and suggest enhancing collaborations between molecular biologists and classical biocontrol researchers to improve biocontrol outcomes.
Specifically, I will discuss examples of molecular studies that elucidate information regarding origins, population structure, reproductive mode, hybridization, and phylogenetics of invasive weed targets.

Molecular methods

Over the last few decades, molecular methods have improved, enabling rapid and economical collection of copious data (Flanagan and Jones 2019; Hu et al. 2021). The use of next generation DNA sequencing has grown exponentially in biological studies in general, and various “short-read” sequencing methods produce abundant data for discerning genotypes, making it easier than before to find population level variation and discriminate between intraspecific taxa. Earlier methods such as RFLPs (restriction fragment length polymorphisms), AFLPs (amplified fragment length polymorphisms), ISSRs (inter-simple sequence repeats), SSRs (simple sequence repeats) and Sanger sequencing of single or a few genes gave us from a few to hundreds of variable loci. Next generation, high throughput sequencing methods such as GBS (genotyping by sequencing) and RAD-Seq (restriction site associated DNA sequencing) enable discovery of thousands to tens of thousands of SNPs (single nucleotide polymorphisms), or more, in an individual. In essence, both GBS and RAD-Seq digest genomic DNA of an individual, attach adaptors to the DNA fragments, then sequence the fragments to compare and detect variation between individuals or populations. Pooling of multiple individuals, each with a unique ID tag, in each small reaction makes the process highly efficient and relatively affordable (Elshire et al. 2011). These methods do not require a reference genome, known PCR primers, or assembly of the complete genome of the plant, though studies of whole genomes can be informative when investigating gene functions and rapid, adaptive evolution of invasiveness (McCartney et al. 2019).

Collecting plant DNA samples from the introduced or native range is often a time-consuming and costly step (Hoelmer et al. 2023). As population level variation and any population genetic structure are usually unknown at the beginning of a project, it is important to sample multiple plants per population, and multiple populations per species. This helps determine if genetic variation exists across geographic areas, and if it is mostly among or within populations. The exact number of samples per population depends on budget, scale of genetic variation desired (e.g., distinguishing between individuals would require more samples than distinguishing between species) and reproductive mode of the organism. The number of loci analyzed and the number of samples per population both influence the ability to adequately capture the genetic diversity within a population. Leipold et al. (2020) suggested that 120 loci and 23 samples per population are adequate for a stable estimation of 95% of the genetic diversity. Nazareno et al. (2017) demonstrated that with higher numbers of loci (e.g., > 1000 SNPs), sampling eight plants per population is adequate. Note that, in general, clonally reproducing and primarily selfing species will have lower within-population variation than outcrossing species, requiring fewer samples to describe within-population diversity, but there can be exceptions such as in broadleaved pepperweed (Lepidium latifolium L.), a primarily outcrossing species with very low within-population diversity (Gaskin et al. 2012; i.e., 99% of invasion samples were genetically identical when using 100 polymorphic loci). Since much of the cost of collecting population samples is in travelling to the location, I suggest collecting at least 20 samples per population; a smaller number can then be processed if initial genetic analysis suggests there is little within-population genetic variation.
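The effect of sample size on captured diversity can be explored with a small resampling exercise on pilot data. The following Python sketch is illustrative only and is not the procedure of Leipold et al. (2020) or Nazareno et al. (2017); the function name, the representation of each plant as a tuple of allele states, and the two toy populations are assumptions made here.

```python
import random


def fraction_of_alleles_captured(profiles, n, draws=200, seed=0):
    """Mean fraction of a population's alleles seen in random subsamples of n plants.

    profiles: list of equal-length tuples, one per plant, giving the allele
    state scored at each locus (hypothetical pilot data).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n_loci = len(profiles[0])
    # Every (locus, state) combination present anywhere in the population.
    all_alleles = {(i, p[i]) for p in profiles for i in range(n_loci)}
    total = 0.0
    for _ in range(draws):
        subsample = rng.sample(profiles, n)  # sampling without replacement
        seen = {(i, p[i]) for p in subsample for i in range(n_loci)}
        total += len(seen) / len(all_alleles)
    return total / draws


# Toy populations (assumed, not real data): one clonal, one highly variable.
clonal = [(1, 0, 1, 1)] * 16
variable = [tuple((k >> i) & 1 for i in range(4)) for k in range(16)]

for n in (1, 4, 16):
    print(n,
          fraction_of_alleles_captured(clonal, n),
          round(fraction_of_alleles_captured(variable, n), 2))
```

In the clonal toy population a single plant already captures every allele, whereas the variable population needs more plants before the curve levels off, which is the pattern described above for clonal versus outcrossing species.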

Codominant molecular markers (those able to detect the genetic contribution or alleles from both parents) are important when looking for recent hybridization in an invasion. The only methods mentioned above that do not provide codominant data are AFLPs and ISSRs, though these can be used in hybridization studies if one parental taxon is completely fixed for presence of an allele while the other parent is fixed for absence of an allele (e.g., Falush et al. 2007). When plants have higher ploidy levels than diploid (e.g., triploid) or mixed ploidies in a species, discerning which alleles are associated with each homologous chromosome becomes challenging (Dufresne et al. 2014), and confirming the ploidy of an individual may require microscopic chromosome counts or flow cytometry (e.g., Amsellem et al. 2001).

Regardless of the type of molecular genetic study conducted, estimating the genotyping error rate, based on sufficient replication, is essential for deciding which genotypes (e.g., banding patterns) can be considered different from each other. This is especially important if the objective is to discriminate genotypes at a fine scale with many loci or SNPs, as any errors may suggest a new genotype that is not real (Crawford et al. 2012; Saunders et al. 2007). Error checking of molecular data is also important if using DNA sequences to estimate phylogenies, as errors in DNA reads, often found at the beginning or end of uncleaned sequences, can create errors in phylogenetic estimates (Salas et al. 2005).
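As a minimal illustration of how replicated samples can inform genotype calls, the Python sketch below computes a per-locus mismatch rate from replicate pairs and then uses it to decide whether two profiles should be treated as one genotype. The function names, the 0/1 band coding with None for a failed locus, and in particular the threshold rule (allowing as many differences as the error rate predicts, rounded up) are assumptions chosen for illustration, not the procedure of any study cited here.

```python
import math


def replicate_error_rate(replicate_pairs):
    """Per-locus mismatch rate across pairs of replicate profiles of one plant."""
    mismatches = 0
    compared = 0
    for first, second in replicate_pairs:
        if len(first) != len(second):
            raise ValueError("replicate profiles must score the same loci")
        for a, b in zip(first, second):
            if a is None or b is None:  # locus failed in one of the runs
                continue
            compared += 1
            mismatches += (a != b)
    if compared == 0:
        raise ValueError("no loci were scored in both replicates")
    return mismatches / compared


def same_genotype(profile_a, profile_b, error_rate):
    """Assumed rule: pool profiles that differ at no more loci than error predicts."""
    scored = [(a, b) for a, b in zip(profile_a, profile_b)
              if a is not None and b is not None]
    differences = sum(a != b for a, b in scored)
    allowed = math.ceil(error_rate * len(scored))
    return differences <= allowed


# Hypothetical replicate pairs: 10 loci each, one mismatch in the second pair.
rep_pairs = [
    ((1, 0, 1, 1, 0, 0, 1, 0, 1, 1), (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)),
    ((0, 1, 1, 0, 0, 1, 1, 0, 0, 1), (0, 1, 1, 0, 1, 1, 1, 0, 0, 1)),
]
rate = replicate_error_rate(rep_pairs)  # 1 mismatch over 20 comparisons

plant_a = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
plant_b = (1, 0, 1, 1, 0, 0, 1, 0, 1, 0)  # differs from plant_a at one locus
plant_c = (0, 1, 1, 1, 1, 0, 1, 0, 1, 0)  # differs from plant_a at four loci
print(rate, same_genotype(plant_a, plant_b, rate), same_genotype(plant_a, plant_c, rate))
```

With these toy data the single difference between plant_a and plant_b falls within what scoring error alone could produce, so the two are pooled, while plant_c is kept as a separate genotype.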

The cost of molecular analyses to help find effective biocontrol agents is not trivial, but such analyses may help avoid the monetary and ecological costs of developing and releasing an ineffective biocontrol agent, or one that attacks non-targets (Sheppard et al. 2005). There is also an argument that it is not necessary to collect biological control agents from weed genotypes that match those in the invasive range (new-association biocontrol; Hokkanen and Pimentel 1989), suggesting that lack of co-evolution may result in a lack of ecological equilibrium, making a biocontrol agent potentially more damaging. Roley and Newman (2006) provide an example of this with the aquatic milfoil weevil that co-evolved with the native northern watermilfoil (Myriophyllum sibiricum Komarov) but has now expanded its host range to include the invasive Eurasian watermilfoil (Myriophyllum spicatum L.). Additionally, biocontrol agents with variable success have been collected from genotypes that were found in later studies to differ from those present in the invaded range (e.g., agents for saltcedar Tamarix ramosissima Ledeb. were used on an invasion that is mostly novel hybrid genotypes that do not exist in the native range; Gaskin and Schaal 2002), but that does not contradict the generally accepted doctrine that the best adapted biocontrol agents may be found on genotypes best matching or identical to those that are invasive.

Even so, molecular analysis is likely justified as a part of the agent development process when there are unknown origins, unclear phylogenetic relationships, variation in reproductive mode, population structuring (different genotypes or lineages exist in different geographic areas), or suspected taxonomic confusion or hybridization (Ward et al. 2008; Gaskin et al. 2011; Hinz et al. 2019; Müller‐Schärer et al. 2020). Below I will review recent molecular studies of plant invasions and discuss how they improved our knowledge about weed biocontrol targets.

Origins and population structure

Geographic origins of plant species, especially ones that are weedy, are often broad, such as at the continent or multi-country level, thus exploration for biocontrol agents in the native range can be time-consuming. A population survey across the native range to find origins for the invasive plant genotypes or taxa can expedite agent exploration. Molecular data can match invasive and native range genotypes to find putative origins, thus speeding up the discovery of closely co-evolved agents that may be better adapted to the host and more effective in controlling the invasion (Ward et al. 2008; Harms et al. 2020). If biocontrol agents are host-specific below the species level, or certain plant genotypes are resistant to or tolerant of the agent, a program may also need to determine the distribution and diversity of plants found across the invasion, as each of these may have different origins and co-evolved herbivores (Gaskin et al. 2011).

For example, earleaf acacia (Acacia auriculiformis A. Cunn. ex Benth.) is an Australian tree introduced into the USA and is now invasive in Florida. It is native to north Queensland and the Northern Territory in Australia, and Papua New Guinea. McCulloch et al. (2021) used genotyping by sequencing (GBS), and, based on over 9000 SNPs, found that Florida samples formed a distinct cluster and were genetically most similar to samples from the Northern Territory, Australia. There was no evidence that Florida plants were introduced from other parts of the native range. This information led to surveys for potential biocontrol agents in more precisely defined areas of the native range, finding dozens of arthropod species which have the potential to be host-specific and impactful (Minteer et al. 2020).

Another example of population structure (i.e., where certain genotypes are found only in certain areas of the native or introduced range) is flowering rush (Butomus umbellatus L.), a perennial rhizomatous aquatic invasive plant species in North America that originated in Eurasia. This species is primarily clonal, reproducing via bulbils or root fragments, so low genetic diversity is expected. To identify and narrow down origins of the North American invasions, Gaskin et al. (2021) used 80 AFLP loci and found six invasive genotypes, indicating multiple founding events. The genetic makeup and ploidy (diploid and triploid plants) of the western North American populations were distinct from those of the earlier eastern North American invasion, with different genotypes and ploidy levels dominating different regions. An exact genetic match for the common western North American genotype was only found in the Netherlands. The authors also proposed best estimates for origins of the other invasive genotypes in Hungary and the Republic of Georgia (Gaskin et al. 2021). This work allows exploration for biocontrol agents in more precise locations where they have co-evolved with the various genotypes and ploidies of flowering rush.
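The idea of searching the native range for an exact or nearest genetic match can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: profiles are 0/1 band scores, similarity is a plain count of differing loci, and the site names and data are hypothetical. Published studies, including those discussed here, generally rely on more formal clustering or assignment analyses.

```python
def hamming(profile_a, profile_b):
    """Number of loci at which two equal-length profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def closest_native_sites(invasive_profile, native_profiles):
    """Return (smallest distance, sites that reach it).

    native_profiles: dict mapping a site name to the list of profiles
    genotyped there (hypothetical structure chosen for this sketch).
    """
    best_per_site = {
        site: min(hamming(invasive_profile, p) for p in profiles)
        for site, profiles in native_profiles.items()
    }
    shortest = min(best_per_site.values())
    sites = sorted(s for s, d in best_per_site.items() if d == shortest)
    return shortest, sites


# Hypothetical data: one invasive genotype and three native-range sites.
invasive = (1, 0, 1, 1, 0, 1, 0, 0)
native = {
    "site_A": [(1, 1, 1, 0, 0, 1, 0, 1), (0, 1, 1, 0, 0, 1, 1, 1)],
    "site_B": [(1, 0, 1, 1, 0, 1, 0, 0), (1, 0, 1, 1, 1, 1, 0, 0)],
    "site_C": [(0, 0, 0, 1, 0, 1, 1, 0)],
}
print(closest_native_sites(invasive, native))
```

A distance of zero corresponds to an exact match; any small non-zero distance should be read against the genotyping error rate before it is interpreted as a real difference.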

It is at times difficult to morphologically distinguish between closely related species, leading to taxonomic changes and arguments, which can confound biocontrol exploration. McCulloch et al. (2020) analyzed African boxthorn (Lycium ferocissimum Miers), a weed of national significance in Australia. The authors sampled putative L. ferocissimum from the native range in South Africa and introduced range in Australia and subjected them to both morphometric analysis and DNA sequencing. Nuclear and chloroplast genetic diversity across South Africa and Australia was low, with no evidence of population genetic structure. All of the samples in the introduced range (Australia) were confirmed as L. ferocissimum, and sequence data indicated that one of the two common invasive genotypes was found only near Cape Town, suggesting this location as the origin for this genotype. Multiple samples morphologically identified as L. ferocissimum in the native range were genetically determined to be other Lycium species. Without this molecular analysis, biocontrol researchers may have wasted exploration time on the wrong plant species or wrong origins of the invasive genotypes.

Reproductive mode

Many perennial invasive plant species can utilize both sexual and asexual modes of reproduction (Pyšek 1997; Liu et al. 2006), but it is often not known which reproductive mode prevails in the field (Eckert 2002). The genetic diversity of plants in a population can provide evidence of sexual reproduction (Eriksson 1989; Gaskin and Littlefield 2017; West et al. 2023). This is a less useful method of investigation when the plant species can self-pollinate, because in that case it is difficult to tell if identical genotypes are the result of selfing or vegetative propagation. One example of a study of reproductive mode in a plant invasion is research on leafy spurge (Euphorbia virgata Waldst. and Kit.), a perennial, highly self-incompatible forb that reproduces by root budding and seed. West et al. (2023) studied the abundance of seedling vs. clonal (shoot) recruitment using AFLP loci on 100 transects (1958 plants genotyped) across North Dakota, Montana and Idaho, USA. In the past, leafy spurge was assumed to be mostly clonally reproducing via underground rhizome spread (Chao et al. 2006), but West et al. (2023) found an unexpectedly high genetic diversity across most sites, evidence of frequent recruitment from seed. This suggests that biocontrol strategies for E. virgata should be modified: after decades of biocontrol effort targeting clonal reproduction, increased importance should be placed on developing agents that reduce the production of seeds, or attack seeds.

Another study involves the self-incompatible field bindweed Convolvulus arvensis L. invasions in North America. Gaskin et al. (2023) performed AFLP analysis on 634 plants from 64 populations across western North America and found 399 distinct AFLP genotypes. The production of new shoots within populations was by both seed and rhizome, with reproduction by seed being slightly more common. Some individuals grew to approximately 50 m in length via rhizome spread. Field bindweed’s ability to reproduce successfully via seed from outcrossing may be a key to its extensive phenotypic variation and invasive success. The study suggests that attack on seeds or floral structures might be useful to stop local and long-distance dispersal, but without root attacking biocontrol agents it is unlikely that local spread or persistence of the invasive species can be successfully controlled.
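One simple way to summarize how much of a sample derives from seed rather than clonal spread is genotypic (clonal) richness, R = (G - 1) / (N - 1), where G is the number of distinct multilocus genotypes and N the number of plants genotyped. R is 0 when all plants share one genotype and 1 when every plant is unique. The Python sketch below is an illustration only; the studies above do not necessarily report this index, and the function names are assumptions.

```python
def clonal_richness(n_genotypes, n_plants):
    """Genotypic richness R = (G - 1) / (N - 1)."""
    if n_plants < 2:
        raise ValueError("at least two plants are needed")
    if not 1 <= n_genotypes <= n_plants:
        raise ValueError("genotype count must lie between 1 and the plant count")
    return (n_genotypes - 1) / (n_plants - 1)


def richness_from_profiles(profiles):
    """Count distinct profiles (tuples of marker scores) and return R."""
    return clonal_richness(len(set(profiles)), len(profiles))


# Pooled counts quoted above for field bindweed: 399 genotypes in 634 plants.
# Pooling ignores population boundaries, so this is only a rough summary.
print(round(clonal_richness(399, 634), 3))

# Hypothetical four-plant transect in which two plants share a genotype.
print(richness_from_profiles([(1, 0), (1, 0), (0, 1), (1, 1)]))
```

For the pooled field bindweed counts this gives R of about 0.63. In practice R would be calculated per population, after genotypes that differ only within the genotyping error rate have been merged.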

Hybridization

Human-mediated movement of species, intraspecific taxa, or genotypes that have been historically isolated from each other can lead to novel hybrid genetic combinations within a plant invasion (Gaskin et al. 2011). Hybridization can increase invasiveness in some cases by providing a rapid mechanism for increasing genetic diversity and producing novel gene combinations (Ellstrand and Schierenbeck 2000). Novel hybrid genetic combinations may complicate biocontrol programs, as agents did not co-evolve with these hybrids, and may have never adapted to or even encountered these genotypes. For this reason, any hybrid genotypes should be included in host-specificity testing.

A recent example of hybridization in a plant invasion is Mexican waterlily (Nymphaea mexicana Zuccarini), an aquatic plant native to the southern USA and Mexico that has become problematic in South Africa. N. mexicana hybrids exist in the wild and in the horticultural trade, but identification is difficult. To ensure that potential agents were collected from plants similar to invasive populations in South Africa, Reid et al. (2021) used ISSRs and found both hybrid and pure forms of N. mexicana in South Africa, which may present difficulties for management using biocontrol.

Delta arrowhead (Sagittaria platyphylla (Engelm.) J.G. Sm.), an aquatic plant from the southern USA, is invasive in Australia and South Africa. Kwong et al. (2017) used AFLP markers to analyze populations from the USA, Australia and South Africa. Their results suggest that introduced populations in Australia and South Africa were founded by multiple sources from the USA, and that intraspecific hybridization between genetically distinct lineages from the native range may have occurred. The authors suggest that any hybridization may influence biocontrol effectiveness if a candidate agent is highly specialized to particular genotypes of the species, because a novel plant genotype may be less susceptible to attack by such genotype-specific agents.

Cogongrass, Imperata cylindrica (L.) Palisot de Beauvois, is a federally listed noxious weed invading the southeastern USA and constitutes a significant threat to biodiversity and sustainable agriculture worldwide (Overholt et al. 2016). The geographical origin and native range of this Old World species were obscure, making searches for biocontrol agents difficult. Additionally, hybridization with congener I. brasiliensis Trin. was suspected, complicating identification of the origins of invasive genotypes of this species. Burrell et al. (2015) used 2320 SNPs derived using GBS to identify the reproductive mode, genetic diversity and geographic origins of this invasion in the southeastern USA. Analyses identified four clonal lineages of cogongrass in the USA with no evidence of hybridization among the different lineages, despite geographical overlap. Molecular data supported anecdotal suggestions of southern Japan as the proximal origin of some introductions to the Gulf Coast states, which will simplify searches for co-evolved agents.

Phylogenetics

Classical biocontrol agent development relies strongly on the testing of agents on non-target plants most closely related to the invasive species (Wapshere 1974; Kelch and McClay 2004). Use of molecular markers to create more accurate phylogenies has been ongoing for a few decades, and many of the higher-level relationships (e.g., at the plant family level) have been resolved. Though not so recent, some of these studies had implications for biocontrol host-range testing. For example, the family Scrophulariaceae, which contains invasive toadflax species of the genus Linaria Mill., was shown to not be a monophyletic lineage, and has since been divided into seven or so families, and still a few of the genera formerly assigned to Scrophulariaceae do not fit into any existing clade recognized at the family rank (Tank et al. 2006). These changes can have important impacts on developing a host test list, as genera and families previously thought to be most closely related to the invasive species may actually be more distantly related genera, and vice versa.

A good overview of family level relationships can be found at the Angiosperm Phylogeny Website, version 14 (Stevens 2001). A review of the different molecular markers used in phylogenetic studies can be found in Suyama et al. (2022). Most phylogenetic studies are done by researchers outside of the biocontrol field, so priorities for biocontrol may not be addressed in a timely manner, which can be problematic.

There are still many phylogenetic assignments that need to be revised below the family and genus level to help biocontrol practitioners develop accurate host test lists of most closely related plant taxa. Recent examples include a study of the genus Cirsium Mill. in North America, which contains Canada thistle (Cirsium arvense (L.) Scop.). Ackerfield et al. (2020) found that many of the varietal complexes in Cirsium were polyphyletic (i.e., taxa in the complex were not most closely related to each other) and found evidence to support new relationships. Taxonomic difficulty in the genus is partly the result of phenotypic convergence (taxa look similar but are not as closely related as once believed) and hybridization. Another example is the invasive species houndstongue (Cynoglossum officinale L.) which is in the subtribe Cynoglossinae, and with ca. 200 species is one of the most taxonomically challenging subtribes of tribe Cynoglosseae. Pourghorban et al. (2020) used nuclear and chloroplast DNA sequences and found that only one of the genera in the study was monophyletic, with some Cynoglossum species showing up in different clades of the subtribe, and likely more closely related to other genera. This type of finding improves both host specificity test lists and agent exploration, which is often done on close relatives of the invasive species in order to estimate host range of potential agents from field observations.

Conclusion

Development of a single biocontrol agent is a multi-year process that can consume a researcher’s time, funding and career direction. Biocontrol agent researchers are typically entomologists, mycologists or ecologists with strong knowledge bases of the agents that will attack plants, and typically do not have training in molecular studies. Adding molecular investigations to a project is burdensome to the biocontrol researcher and expensive to the program. Adding to the complexity of this research, the failure of agents to impact the abundance of target species may not be directly related to weed or agent genotype, but instead be due to reasons such as climate, predation, parasitism, insufficient numbers of agents released, etc. (Stiling 1993; Harms et al. 2020). Thus, researchers should only embark on molecular studies of target weed species when there are complications such as unknown origins, phylogenetic relationships, reproductive mode or population structure, or suspected taxonomic confusion or hybridization.

Which type of multilocus analysis should be performed for a biological control of weeds project when questions such as those listed above arise? High throughput (next generation) sequencing methods, such as RAD-Seq (SNPs), can certainly produce an order of magnitude or two more markers than the older process of AFLPs, but the several hundred markers from AFLPs are likely sufficient. Kirschner et al. (2021) found that, in four out of six study species, AFLP led to results comparable to RAD-Seq. AFLPs, per sample, are a cheaper process than high throughput sequencing, though they may be more expensive per locus, and they are less useful for identifying hybrids due to their dominance. In the end, many methods may be successful, but it may come down to what type of multilocus analysis the laboratory or outsource laboratory is set up to perform, and which process the collaborator who reads the data is used to analyzing. Since 2011 the use of AFLPs has dropped and that of RAD-Seq has increased, and during 2019 the number of publications using both methods was roughly equivalent (Kirschner et al. 2021). No doubt high throughput sequencing will be the standard in the near future as costs per sample continue to decrease.

The molecular aspect will necessitate collaboration with geneticists, plant taxonomists and systematists, and their area of focus is not necessarily the target weed, so finding and recruiting molecular researchers or laboratories to work on a project is not always trivial. There are many more invasive plant species in need of biocontrol as part of their integrated management, especially for invasions outside of Australia, Hawaii, New Zealand, North America and South Africa, where strong programs already exist (Schwarzländer et al. 2018). This is a call for the relevant agencies and universities to fund and hire molecular geneticists to work explicitly with biocontrol researchers, to aid in developing safe, effective, and timely biocontrol agents.