Introduction

Cone snails of the genus Conus are predatory venomous marine mollusks feeding on fish, worms, or snails. After decades of biological prospecting, conopeptides expressed in their venom duct have emerged as one of the richest and most promising marine sources of natural products (Blunt et al. 2012). The analysis of cone snail venoms has revealed a complex exogenome that is characterized by an extremely high level of diversity. With more than 600 described Conus species, each producing an estimated 100–200 venom components, cone snails as a whole were, until recently, estimated to produce between 50,000 and 100,000 different toxins (Menez et al. 2006; Olivera 2006). Recent studies, however, clearly demonstrate that this figure is an underestimate, probably by a factor of about ten: several new species are described every year, more venom components are detected in each sample using evolving technologies such as mass spectrometry (Biass et al. 2009; Ueberheide et al. 2009; unpublished results) and NextGen sequencing (Hu et al. 2011; Terrat et al. 2011) or combinations thereof (Violette et al. 2012), and venom composition shows marked intra-species and even intra-specimen variation (Davis et al. 2009; Dutertre et al. 2010; Jakubowski et al. 2005). It is now estimated that the number of cone snail venom components exceeds one million.

An important characteristic of conopeptides which makes them attractive for drug development is their high selectivity for molecular targets that span a broad range of therapeutic applications (Gayler et al. 2005; Leary et al. 2009; Molinski et al. 2009). So far, the conopeptide MVIIA (SNX-111, Prialt, or Ziconotide) from Conus magus (the magician cone) that selectively blocks Cav2.2 N-type voltage-gated calcium channels has been approved for the treatment of severe chronic pain (McGivern 2007; Miljanich 2004) and there are more promising drug candidates in the pipeline (e.g., see Favreau et al. 2012; Han et al. 2008a; Lewis 2012). The potential of this rich source of pharmacological products has stimulated a race for the discovery of new toxins. From the traditional bioactivity-guided identification, lead discovery efforts have evolved toward modern structure-driven characterization (venom peptidomics and proteomics, venom gland transcriptomics, targeted genomics, structure–function studies) and biocomputing-assisted analyses (proprietary databases and bioinformatic tools) (Daly and Craik 2009; Favreau and Stöcklin 2009; Koua et al. 2012; Laht et al. 2011). In addition, phylogenetic approaches have recently emerged as an effective way to quickly identify divergent lineages that are likely to have evolved with different functional characteristics. This approach to identify these previously uncharacterized conopeptides is referred to as concerted discovery (Conticello et al. 2001; Duda and Remigio 2008; Olivera 2006; Puillandre and Holford 2010).

However, despite the effectiveness of phylogenetic approaches in concerted discovery, the technique is rarely used for the classification of conopeptides (but see Aguilar et al. 2009; Conticello et al. 2001; Wang et al. 2008; Zhangsun et al. 2006). Several statistical methods for conopeptide classification, based on Mahalanobis distances (Lin and Li 2007) or on BLAST and Euclidean distances, among others (Mondal et al. 2006), have been described; however, most of these approaches are primarily designed for classification of new sequences rather than for testing the current classification (i.e., checking the validity of each known group by a blind-exploratory approach). Conopeptide precursors are characterized by a typical structural organization consisting of a highly conserved signal region, followed by a more variable pro-region and a hyper-variable mature toxin containing a few conserved amino acids such as the cysteine residues required for disulfide bonds. Conopeptides are mainly named and classified according to three properties. First, they are characterized by their signal sequence; this short sequence (~20 amino acids) is highly conserved and has been used to define superfamilies. Second, the structural families of mature toxins are defined by their pattern of cysteines (the Cys-pattern): the mature toxin can include a variable number of cysteines (most commonly 4 or 6), and their respective positions can vary (4 cysteines can be organized as C–C–C–C or CC–C–C, where “–” represents a variable number of amino acids). Finally, several conopeptides have also been characterized according to their molecular targets, referred to hereafter as “functional families,” and also previously termed “pharmacological families.”
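The Cys-pattern notation described above can be illustrated with a short Python sketch. It is not part of the original analysis: the function name `cys_pattern` and the toy sequences are hypothetical, and an ASCII hyphen stands in for the “–” separator.

```python
import re


def cys_pattern(mature_toxin: str, sep: str = "-") -> str:
    """Return the Cys-pattern of a mature toxin sequence, e.g. 'CC-C-C'.

    Each run of adjacent cysteines forms one block; blocks are joined by
    `sep`, which stands for a variable number of non-cysteine residues.
    An empty string is returned for a sequence without cysteines.
    """
    runs = re.findall(r"C+", mature_toxin.upper())
    return sep.join(runs)


# Made-up sequences, for illustration only (not taken from GenBank).
toy_sequences = {
    "toy1": "GCCSNPACRVNC",
    "toy2": "ACKSTGCAAGCNPC",
    "toy3": "GLWEKIRNA",
}

for name, seq in toy_sequences.items():
    print(name, cys_pattern(seq) or "(no cysteine)")
```

Two sequences with very different residues between the cysteines receive the same pattern, which is why this classification says nothing by itself about common ancestry.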

In a recent paper, Kaas et al. (2010) reviewed the structure, function, and diversity of conopeptides on the ConoServer database (www.conoserver.org). In particular, they proposed that “the ‘gene superfamily’ classification scheme focuses on evolutionary relationships between conopeptides”, while the two other classification schemes (cysteine framework and function) do not. Their underlying hypothesis was that similarities in the Cys-pattern or function might have arisen by convergence. While we fully agree with this statement, we also argue that it could serve as a rationale to assess the congruence between the current gene superfamily classification and the evolution of the corresponding multigenic system, and to accurately demonstrate that convergence phenomena are common in conopeptide structure and function.

Here, we review the current superfamily classification of conopeptides by analyzing all the signal sequences available in GenBank using a phylogenetic approach to check: (i) if all the defined superfamilies correspond to homogeneous groups; and (ii) if all the GenBank signal sequences belong to a known superfamily. This study seeks to provide a “rationale” for a phylogenetic classification of conopeptides and to clarify their current classification, thus complementing the work initiated by Kaas et al. (2010).

Materials and Methods

Sequences from GenBank

Since the signal sequences used for phylogenetic analyses (see below) are only found in complete nucleotide precursors and are not known for conopeptides discovered using proteomic approaches, all the nucleotide sequences associated with the genus Conus were downloaded from GenBank (www.ncbi.nlm.nih.gov). The sequences corresponding to non-coding regions, ribosomal genes, mitochondrial genes, and genes with a function that did not relate to toxin activity were removed from the dataset, thus keeping only coding genes with a potential toxin activity. Only sequences obtained from Conus species belonging to the large major clade (Duda and Kohn 2005) were retained, as a large number of the conopeptides found in species from other clades (e.g., C. californicus) are highly divergent and do not match any of the currently known superfamilies (Biggs et al. 2010; www.conoserver.org). Consequently, the classification in the present analysis is relevant only for conopeptides of the large major clade species. Conopeptide superfamilies are defined by a conserved signal sequence; we therefore used the SignalP 3.0 server (Bendtsen et al. 2004) to identify the signal sequence. All sequences that did not include at least 50 % of the signal region were removed, together with sequences including a stop codon. Only the signal region was used for phylogenetic analyses, as only this part of the conopeptides can be aligned within and, to some extent, between superfamilies.
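The two filtering rules in this paragraph (stop codons and signal-region coverage) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: the names `translate` and `keep_precursor` are hypothetical, the standard genetic code is assumed, only stop codons before the final codon are treated as disqualifying, and the observed and expected signal lengths are assumed to be supplied by a separate prediction step.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate in frame 1; unknown codons become 'X', stops become '*'."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def keep_precursor(nucleotides: str, signal_residues_present: int,
                   expected_signal_length: int, min_fraction: float = 0.5) -> bool:
    """Apply the two filters described in the text (hypothetical helper).

    Assumption: a terminal stop codon is tolerated; only internal stops
    lead to rejection.
    """
    protein = translate(nucleotides)
    has_internal_stop = "*" in protein.rstrip("*")
    enough_signal = signal_residues_present >= min_fraction * expected_signal_length
    return enough_signal and not has_internal_stop


# Toy inputs, for illustration only.
print(keep_precursor("ATGAAACTGTAA", 10, 20))  # kept: no internal stop, 50 % of signal
print(keep_precursor("ATGTAAAAA", 20, 20))     # rejected: internal stop codon
```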

Phylogenetic Analysis

Aligning signal sequences between highly divergent conopeptides (i.e., belonging to different superfamilies) is arduous, and homology hypotheses are doubtful. Thus, sequences were translated to amino acids and automatically aligned using two different algorithms: Muscle (Edgar 2004; www.ebi.ac.uk/Tools/msa/muscle) and ClustalW (http://clustalw.ddbj.nig.ac.jp/top-e.html). The best model of evolution for each of the two resulting datasets was selected using Modelgenerator V.85 (Keane et al. 2006) following the corrected Akaike Information Criterion (with four discrete gamma categories) and used to reconstruct phylogenetic trees. The best model identified by Modelgenerator was JTT + G (Jones Taylor Thornton model, implemented under the name “Jones model” in MrBayes; Jones et al. 1992) for both datasets. Bayesian analyses were performed by running two parallel analyses in MrBayes (Huelsenbeck et al. 2001), each consisting of eight Markov chains of 30,000,000 generations, with a sampling frequency of one tree every ten thousand generations. The number of swaps was set to 5, and the chain temperature to 0.02. A neighbor-joining tree obtained with MEGA5 (Tamura et al. 2011) was used as the starting tree. Convergence of the parameters was evaluated using Tracer 1.4.1 (Rambaut and Drummond 2007), and analyses were terminated when all ESS values were greater than 200. A consensus tree was then calculated after omitting the first 25 % of the trees as burn-in.

As is the case for most multigenic families, the identification of an outgroup was highly problematic. No gene phylogenetically related to, and proven to be an outgroup for conopeptides has been described. Furthermore, the use of toxins from other conoidean species was not possible, as it would require that the toxins from cone snails all arose from duplication events that took place after the divergence between the cone snails and other conoideans. Consequently, no outgroup was included in the analysis. This absence of an outgroup did not allow us to infer ancestor/descendant relationships.

Results

A total of 1,364 sequences potentially corresponding to conopeptides and with a signal sequence were downloaded from GenBank (performed on 1st of July, 2011). Alignments were 34 and 30 amino-acids long with Muscle and Clustal W, respectively. To limit the time of calculation for phylogenetic analysis, only one sequence per amino-acid haplotype was kept; finally, 585 sequences were retained. Overall, the phylogenetic trees obtained from the Muscle and Clustal alignments were congruent; discrepancies were not supported (posterior probabilities <0.90) and concerned phylogenetic relationships between the main clades and the position of a few highly divergent sequences (see details below). For clarity, only the phylogenetic tree based on the Clustal alignment is presented (Fig. 1) but the results obtained from the Muscle alignment, when different, are discussed.
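The reduction to one sequence per amino-acid haplotype can be pictured with a minimal Python sketch. The function name `unique_haplotypes` and the toy records are hypothetical. Keeping the first identifier encountered as the representative is an assumption, since the text does not state which sequence was retained.

```python
def unique_haplotypes(records):
    """Keep one representative per distinct amino-acid sequence.

    `records` is an iterable of (identifier, amino_acid_sequence) pairs.
    Alignment gaps ('-') and letter case are ignored when comparing, and
    the first identifier seen for each haplotype is kept (an assumption).
    """
    representatives = {}
    for identifier, sequence in records:
        haplotype = sequence.replace("-", "").upper()
        representatives.setdefault(haplotype, identifier)
    return [(identifier, haplotype) for haplotype, identifier in representatives.items()]


# Toy records, for illustration only.
toy_records = [
    ("seq1", "MKLTCVVIVA"),
    ("seq2", "MKLTCVVIVA"),   # same haplotype as seq1
    ("seq3", "MKLTCMVIVA"),
    ("seq4", "mkltc-vviva"),  # same haplotype as seq1 once gaps and case are ignored
]

print(unique_haplotypes(toy_records))
```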

Fig. 1

Bayesian phylogenetic tree (midpoint rooting) obtained from the Clustal alignment of the signal sequences of conopeptides from GenBank. Posterior probabilities (when >0.9) are provided for each node. Gray boxes are used to visualize the superfamilies. The B and C superfamilies correspond to the conantokins and contulakins, respectively. The lineages X1–X7 potentially correspond to previously unrecognized superfamilies (see details in the text)

Using information from GenBank and the literature, it was possible to link the clades defined by the Bayesian analysis to known superfamilies. Most of the defined superfamilies (A, D, I1, I2, I3, J, L, O1, O3, P, S, T, V) corresponded to monophyletic groups, some of them highly supported (Fig. 1). With the Muscle alignment, the O2 superfamily was included within the O1 superfamily; the superfamily Y was represented by a single sequence, and corresponded to a unique lineage in the tree. However, some superfamilies did not correspond to a monophyletic group, as they included other conopeptides (e.g., O2 included sequences of contryphans, and M included conomarphin, a result already discussed by Han et al. 2008b). Several conopeptides from GenBank did not cluster in any of the known superfamilies. These corresponded to known cysteine-poor conopeptides, conantokin and contulakin, shown in Fig. 1 as the B and C superfamilies, respectively [the C superfamily has been previously defined by Jimenez et al. (2007)]; two conoCAP sequences (FN868446.1 and FN868447.1, named X1 in Fig. 1 and Appendix 1) described by Möller et al. (2010); and sequences putatively annotated (FJ237364.1, named X2) or without annotation in GenBank (DQ359922.1, EF493183.1/EF493184.1 and DQ359921.1, named X3, X4, and X5, respectively). In the Clustal alignment, two other groups of sequences, FJ375238.1/FJ375239.1/FJ375240.1 and EF208033.1, clustered in the superfamilies A and O1, respectively, with long branches, but corresponded to independent lineages in the Muscle alignment (X6 and X7, respectively).

Function and cysteine pattern were not clade-specific; conopeptides with the same function or cysteine pattern were found in different clades. In addition, sixteen new (i.e., not numbered with Roman numerals) cysteine patterns were identified; however, most of them most likely correspond to minor mutations of the canonical framework in a given family (e.g., C–CC–C–C, C–C–CC–C–CC, and C–CC–C–C, found in the O1 superfamily, differ from the pattern VI/VII by only one mutation), while others may represent a new Cys-pattern number (e.g., the Cys-pattern C–C–C–CC–C, found in the three members of the X6 group). The results are summarized in Table 1 (full details are provided in Appendix 1).

Table 1 Number of sequences found in each superfamily, with list of cysteine patterns identified and known function in each superfamily

Table 2 lists the number of conopeptides found in each superfamily and their distribution among the 71 Conus species. The superfamilies A, M, and O1 were the largest, each containing at least 39 species, followed by the superfamilies T and I2. Conus caracteristicus, C. imperialis, and C. litteratus each express conopeptides belonging to more than 10 different superfamilies in their venom; however, it was difficult to know if this result reflects a higher conopeptide diversity in comparison to other species, or is due to a greater sampling effort in these species. All the superfamilies present in more than 10 Conus species (A, B, I2, M, O1, O2, and T) were found in mollusk, worm, and fish-hunting species.

Table 2 Number of conopeptides in each superfamily and species

Discussion

An Updated Classification of Conopeptides

Overall, the molecular phylogeny, based on more than 1,300 conopeptide signal sequences extracted from GenBank, strongly supports the current superfamily classification based on phenetic resemblances, as established in ConoServer. This relative congruence between phylogenetic and phenetic classifications is not surprising: signal sequences are more conserved within superfamilies than between them, and the phylogenetic tree reflects these differences. However, the phylogenetic approach also revealed several new features, the most striking of which is the presence of deeply divergent lineages that, until now, were not included in the conotoxin superfamily classification. There are two main explanations for this result. First, the conopeptide superfamily classification reviewed by Kaas et al. (2010) includes only what is traditionally referred to as “cysteine-rich” conotoxins [i.e., conopeptides with at least two disulfide bridges in the mature sequence as defined by Norton and Olivera (2006)], thus excluding the conopeptides with two cysteines and linear conopeptides also broadly present in the venom (unpublished results). Although the authors noted that “in future, all disulfide-poor conopeptides will probably have to be attributed to a superfamily,” they refrained from doing so because of the low number of cysteine-poor conopeptides with precursor sequences in ConoServer (21). In GenBank, we identified more than 50 such sequences and included them in the current analysis. The signal sequences of cysteine-poor conopeptides do not cluster separately from the conotoxins; some of them share highly similar signals with known superfamilies (contryphan with O2 and conomarphin with M). Their exclusion from the superfamily classification is therefore not phylogenetically justified. We identified two additional superfamilies, B and C, for conantokins and contulakins, respectively, one of which (C) has been proposed previously (Jimenez et al. 2007). Second, including non-annotated sequences from GenBank in the dataset helped to identify several independent lineages in the tree (X1–X7). The level of divergence between their respective signal sequences and the signals of other superfamilies was equivalent to the level of divergence between known superfamilies, and they thus deserve recognition as new superfamilies. However, as these independent lineages are represented by only one, two, or three sequences, and because some of them may not exhibit toxin activity (even if they were all found in venom ducts of cone snails), we refrained from proposing new superfamily names, and only provided temporary names (X1–X7). It should also be borne in mind that many other conopeptides have been described in the literature, some of which have been given formal names (conkunitzin, conolysin, conomap, conophysin, conopressin, conorfamide, and conorphan). Because their signal sequences are not represented as nucleotides in GenBank, they were not included in the analysis. However, a search in the protein database of GenBank retrieved two complete precursors of conkunitzin with highly similar signal sequences (P0C1X2.1 and P0CY85.1), and a local BLAST search (performed using BioEdit; Hall 1999) against the dataset used for the phylogenetic analyses revealed that the conkunitzin signals were unique and probably represent a new superfamily. Finally, although most of the superfamily-level clades are highly supported, most of the inter-superfamily nodes are not, preventing any reliable conclusion concerning the phylogenetic relationships at this level.

The original results presented herein raise several issues concerning the classification and nomenclature of the conopeptides and, more generally, of the genes that belong to multigenic systems. The updated classification system we propose is based on a phylogenetic reconstruction that guarantees the identification of sequence clusters that share a common ancestor. However, such phylogenetic trees cannot help in deciding which clades deserve a superfamily-level ranking and which ones do not. One common solution is to rely on a threshold of genetic distances, but the analysis of the genetic distances (p distances, i.e., the proportion of differing sites) between all the conopeptide signal sequences revealed that the distribution of genetic distances within superfamilies of conopeptides largely overlaps with the distribution of genetic distances between superfamilies (Fig. 2). This overlap can be linked either to the high level of homoplasy found in conopeptides, which can give two conopeptides from different clades a relatively low genetic distance by chance, or to the fact that two previously defined superfamilies actually correspond to only one. The latter is the case for the L and I3 superfamilies, separated by genetic distances of between 0.38 and 0.69, which would, in most cases, correspond to within-superfamily genetic distances.
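The comparison of within- and between-superfamily distances can be sketched in Python. This is a simplified illustration, not the MEGA5 computation: the function names and toy sequences are hypothetical, and positions where either sequence has a gap are skipped (pairwise deletion), which is an assumption about gap handling.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence carries a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def split_distances(aligned):
    """Split pairwise p distances into within- and between-superfamily lists.

    `aligned` maps a sequence name to (superfamily, aligned_sequence).
    """
    within, between = [], []
    for (_, (fam_a, seq_a)), (_, (fam_b, seq_b)) in combinations(aligned.items(), 2):
        target = within if fam_a == fam_b else between
        target.append(p_distance(seq_a, seq_b))
    return within, between


# Toy aligned signals with made-up superfamily labels, for illustration only.
toy_alignment = {
    "a1": ("A", "MGMRMMFT"),
    "a2": ("A", "MGMRMMFI"),
    "m1": ("M", "MMSKLGVL"),
}

within, between = split_distances(toy_alignment)
print("within:", within)
print("between:", between)
```

With real data, overlapping ranges in the two lists are what rules out a single distance threshold for delimiting superfamilies.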

Fig. 2

Pairwise distribution of genetic distances (p distances) calculated with MEGA5 using the Clustal alignment. Genetic distances between sequences from the same superfamily are shown in gray, genetic distances between sequences from different superfamily in black

Consequently, it is not possible to rely only on a genetic threshold to define superfamilies for conotoxins. A threshold of 0.6, roughly corresponding to the gap between the two distributions of genetic distances (Fig. 2), would lead to the division of the M-superfamily into numerous superfamilies (indeed, Wang et al. 2008 proposed to divide the M-superfamily into M1 and M2), and to the grouping of the superfamilies I1, I3, and L into a single one. Our approach is instead aimed at offering complementary guidance to help decide, in the future, whether a conotoxin or a group of conotoxins deserves a superfamily name: (i) since the minimum genetic distance between superfamilies is 0.32, this distance should be the minimum distance between the potential new superfamily(ies) and all the others; (ii) the new superfamily(ies) should correspond to an independent lineage, i.e., it should not cluster in any of the superfamily clades previously defined; (iii) the molecular target of the new conotoxin(s) should ideally be identified, to avoid naming conopeptides that would not be functional; (iv) the structure (cysteine pattern) and/or function should be different from those of the most closely related superfamilies in terms of genetic distances and/or phylogenetic relationships. All these criteria apply to the B and C superfamilies [genetic distances with other superfamilies >0.3; these two lineages are independent and monophyletic; their molecular targets are identified (Mena et al. 1990; Craig et al. 1999); and their cysteine frameworks are different from those of their respective sister-groups], justifying the attribution of new superfamily names. We followed the traditional nomenclature of conopeptide superfamilies, i.e., a capital Roman letter. As the number of Roman letters is limited, some superfamilies have been named with a Roman letter followed by an Arabic number (e.g., I1, I2, I3, O1, O2, and O3) when several superfamilies share a common cysteine framework or molecular target. Because of the potentially high number of unknown superfamilies of conopeptides, we have no doubt that the nomenclature based on both Roman letters and Arabic numbers will become the reference rule.

The first and fourth criteria also apply to the seven “X” lineages (Fig. 1), but the second applies to only five of them (two clustered within the A and O1 superfamilies with the Muscle alignment) and the third to none of them. We propose to name such potential superfamilies of conopeptides, which do not currently meet all the criteria but could do so in the future, with the Roman letter X followed by an Arabic number, until each is either fully recognized as a separate superfamily or assigned to an existing one.
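The four criteria and the provisional “X” naming can be summarized as a small decision sketch in Python. It is only an illustration of the logic laid out above: the class and function names are hypothetical, the candidate values are invented, and the inputs are assumed to come from separate distance, phylogenetic, and pharmacological analyses.

```python
from dataclasses import dataclass

# Minimum between-superfamily genetic distance reported in the text.
MIN_BETWEEN_DISTANCE = 0.32


@dataclass
class Candidate:
    """Summary of a candidate lineage (hypothetical structure)."""
    name: str
    min_distance_to_others: float         # criterion (i)
    independent_lineage: bool             # criterion (ii)
    target_identified: bool               # criterion (iii)
    distinct_structure_or_function: bool  # criterion (iv)


def evaluate(candidate: Candidate) -> dict:
    """Return the outcome of each of the four criteria."""
    return {
        "i": candidate.min_distance_to_others >= MIN_BETWEEN_DISTANCE,
        "ii": candidate.independent_lineage,
        "iii": candidate.target_identified,
        "iv": candidate.distinct_structure_or_function,
    }


def proposed_status(candidate: Candidate) -> str:
    """Named superfamily if all criteria hold, otherwise a provisional X lineage."""
    if all(evaluate(candidate).values()):
        return "named superfamily"
    return "provisional X lineage"


# Invented values, for illustration only.
example = Candidate("candidate_1", 0.50, True, False, True)
print(evaluate(example), proposed_status(example))
```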

Evolution of the Conopeptides

The phylogenetic analysis clearly confirms that most of the defined superfamilies include conopeptides with different cysteine frameworks and functions. Conversely, similar cysteine frameworks and functions are found in different superfamilies, suggesting that a given cysteine framework or function can appear several times independently, probably as a result of convergent evolution. The multiple appearances of the same framework and function during conotoxin evolution are probably linked to the extremely rapid diversification of the genes. Several molecular mechanisms have been proposed as being responsible for this high rate of diversification. Pi et al. (2006) suggested that alternative splicing, unequal crossing-over, or exon shuffling could explain this diversity. Olivera et al. (1999) proposed two other mechanisms: the lack of a mismatch repair system, at least in the hypervariable part of the sequence (the mature toxin); and recombination mechanisms. Several other hypotheses, such as a high rate of duplication followed by strong diversifying selection on the newly created gene copies, which could lead to the rapid appearance of several structurally and functionally highly divergent genes, have also been proposed and tested by different authors (Duda and Palumbi 1999, 2000; Conticello et al. 2000, 2001; Espiritu et al. 2001; Duda and Remigio 2008; Chang and Duda 2012). All these molecular mechanisms, together with observed differences in the expression pattern between species, possibly linked to episodes of gene silencing and reactivation (“Lazarotoxins”, Conticello et al. 2001; Duda and Palumbi 2004; Duda 2008), could favor the rapid diversification of Conus species by allowing them to envenomate and feed on new prey and thus colonize new niches (Duda and Lee 2009).

A phylogenetic approach could be very useful to identify divergent conopeptides with potentially different functions, even if they share a common structural framework. For example, the cysteine framework IV, found in the A-superfamily, is already linked to two different functions (αA—Hopkins et al. 1995 and κA—Craig et al. 1998). However, conotoxins, described by Conticello et al. (2001), with the same framework, belong to the M-superfamily, suggesting that these IV-conotoxins that are structurally convergent with the IV-conotoxins in a different superfamily, could exhibit a completely different function. A similar strategy could also apply within each superfamily, where not only the signal sequence, but also the propeptide and mature regions can be aligned, and could reveal divergent lineages with as yet uncharacterized functions (e.g., see Aguilar et al. 2009; Puillandre et al. 2010; Wang et al. 2008; Zhangsun et al. 2006).

Furthermore, our identification of numerous new cysteine frameworks among the GenBank sequences was also surprising. Even if some of them may be non-functional genes (pseudogenes), others could correspond to novel protein structures. A few publications have demonstrated that even toxins with odd numbers of cysteines can be functional, for example two 5-Cys toxins forming a functional dimer, or bioactive polymers of the 13-Cys “Con-ikot-ikot” peptide from Conus striatus (Quinton et al. 2009; Walker et al. 2009). Our findings challenge the traditional view in which conotoxins are characterized by a limited number of cysteine frameworks: by exploring new evolutionary pathways, the appearance of novel cysteine frameworks may also participate in the hyper-diversification of the conotoxins. In addition, this raises the question of the total number of cysteine patterns one could expect to find among cone snail toxins. It is possible to predict the theoretical number of cysteine patterns that could exist. If we limit the exercise to patterns with 2, 4, or 6 cysteines and exclude those with more than two consecutive cysteines, 20 different frameworks can be proposed (C–C*, CC, CC–C–C*, CC–CC*, C–CC–C, C–C–CC*, C–C–C–C*, CC–CC–CC, CC–CC–C–C, CC–C–CC–C, CC–C–C–CC*, CC–C–C–C–C*, C–CC–CC–C, C–CC–C–CC, C–CC–C–C–C*, C–C–CC–CC, C–C–CC–C–C*, C–C–C–CC–C*, C–C–C–C–CC, C–C–C–C–C–C*). Ten of these frameworks (marked with an *) can be found in GenBank. Given the extreme capacity of the conopeptides to evolve and the apparent lack of evolutionary constraints (as illustrated by the multiple appearances of identical frameworks during their evolution), there is no reason why all these theoretical patterns should not be found in the future. It could be argued that mechanical constraints would prevent the existence of some cysteine patterns; for example, it could be unfavorable to have a disulfide bridge between two adjacent cysteines. However, we found a short mature toxin in the venom of one cone snail with a disulfide bridge between adjacent cysteines (unpublished results). The peptide has been reproduced by protein synthesis, confirming this finding.
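The count of 20 theoretical frameworks can be reproduced by brute-force enumeration, as in the following Python sketch. The function name `frameworks` is hypothetical and an ASCII hyphen replaces the “–” separator. Each of the n − 1 positions between successive cysteines is either adjacent or separated, and patterns containing a run of more than two cysteines are discarded.

```python
from itertools import product


def frameworks(n_cys: int, max_run: int = 2, sep: str = "-"):
    """Enumerate Cys-patterns with `n_cys` cysteines.

    Patterns containing more than `max_run` consecutive cysteines are
    excluded, following the restriction stated in the text.
    """
    patterns = []
    for joins in product((False, True), repeat=n_cys - 1):
        pattern = "C"
        for adjacent in joins:
            pattern += "C" if adjacent else sep + "C"
        if max(len(run) for run in pattern.split(sep)) <= max_run:
            patterns.append(pattern)
    return patterns


all_patterns = [p for n in (2, 4, 6) for p in frameworks(n)]
for n in (2, 4, 6):
    print(n, "cysteines:", len(frameworks(n)), "patterns")
print("total:", len(all_patterns))
```

The enumeration gives 2, 5, and 13 patterns for 2, 4, and 6 cysteines, respectively, which sums to the 20 frameworks listed above.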

Conus and Conoidea Toxin Diversity

The diversity of conotoxins in the venom of several Conus species (Table 2) confirms that most species are able to express a variety of conotoxins, as widely reported in literature (e.g., Olivera 2002). Furthermore, our results also suggest that Conus diet (fish, mollusk, and worm) is not correlated with differences in venom composition at the superfamily level. If differences exist, as suggested in the literature (e.g., Conticello et al. 2001; Kaas et al. 2010), they most likely occur at the species and intra-superfamily levels. Furthermore, phylogenetic analyses suggest that, at least, the worm- and fish-hunting species are not monophyletic, as these two diets appeared independently several times during the Conus evolution (Duda and Palumbi 2004; Espiritu et al. 2001; Kraus et al. 2011). Thus, differences in the venom composition should not be sought between the three diet groups, but between the monophyletic clades defined within these three groups (Duda and Palumbi 2004).

The diversity of marine snail toxins is not limited to species included in the large major clade of Conus: recent analyses in other conoidean taxa suggest that toxin hyperdiversity is not the privilege of this clade. C. californicus, which is highly divergent from all the other Conus species (Duda and Kohn 2005), showed a high diversity of toxins in its venom, and several of them were thought to correspond to new superfamilies (Biggs et al. 2010; www.conoserver.org/?page=classification&type=genesuperfamilies). To a lesser extent, species in the small major clade of Conus may also contain several novel conotoxins, as suggested by an original Cys-pattern (XIII) found in the species C. delessertii (Aguilar et al. 2005). In addition to the family Conidae, original toxins have already been reported in several other species of Conoidea, such as Polystira albida (Lopez-Vera et al. 2004; Rojas et al. 2008), Gemmula periscelida (Lopez-Vera et al. 2004), G. speciosa, G. sogodensis, G. diomedea, G. kieneri (Heralde et al. 2008), Lophiotoma olangoensis (Watkins et al. 2006), Terebra subulata (Imperial et al. 2003), Hastula hectica (Imperial et al. 2007) and Crassispira cerithina (Cabang et al. 2011). Furthermore, taxonomic surveys (Bouchet et al. 2009) and phylogenetic analyses (Puillandre et al. 2011) suggest that the superfamily Conoidea actually comprises a number of deeply divergent clades, whose species diversity is currently largely underestimated. Presently, around 4,500 species have been described, but the group is believed to include more than 10,000 species (Bouchet et al. 2009). Even if the venom apparatus has been lost in several lineages of Conoidea (e.g., Fedosov 2007; Fedosov and Kantor 2008; Holford et al. 2009; Medinskaya and Sysoev 2003), these findings suggest that the conotoxin diversity characterized so far represents only a small part of the total. If the level of diversity across all conoidean species is similar to that found in those already investigated, the number of toxins produced by this single superfamily could be as high as ten million.