Introduction

Venoms are key evolutionary innovations found across a broad range of animal phyla (Fry et al. 2009). Toxins recruited into venoms belong to only a handful of protein families, and studying the evolutionary trajectories of these convergently recruited toxins will provide a better understanding of the mechanisms of protein and peptide neofunctionalisation. A major limitation of this type of research is the narrow taxonomic range studied, with entire groups being neglected. Hymenoptera, one such neglected but highly speciose lineage of venomous animals, have conquered virtually every terrestrial ecosystem (Grimaldi and Engel 2005). Their venom consists of a mixture of proteins, peptides and low molecular weight compounds, and is employed for antipredator defence of the individual and/or the entire colony, as well as for prey capture (Palma 2013). Multiple proteins present in hymenopteran venoms are allergenic and are commonly associated with local and systemic allergic reactions. As a result, hymenopteran stings are one of the main causes of IgE-mediated anaphylaxis in humans (Bilò 2011; Bircher 2005).

Allergens play an important role in the defence of hymenopteran insects. Schmidt (2009) proposed that these toxins confer an evolutionary advantage because they induce learnt avoidance in predators. Despite their tremendous ecological importance, the evolutionary origin and diversification of venom allergens across hymenopteran species remain poorly understood. Because of their relatively deep origin (280 mya), hymenopterans are an ideal system in which to investigate the dynamics of venom evolution over long evolutionary time (Peters et al. 2017). A better understanding of the molecular evolution of these allergens may help to determine their roles in hymenopteran venoms and may facilitate improvements in the treatment of allergic reactions and hypersensitivity to stings.

In this study, we examine the phylogenetic histories and molecular evolution of the major venom allergens and provide the first comprehensive overview of Hymenoptera venoms. The allergens examined are phospholipase A1 (PLA1), phospholipase A2 (PLA2), hyaluronidase, acid phosphatase, serine protease and antigen 5 (ag5).

Methods

Phylogenetic Reconstruction

These six allergens were selected because they are the major venom allergens identified in hymenopteran venoms (Hoffman 2006, 2008). Protein sequences for each hymenopteran allergen were retrieved from the UniProt database. The sequences were aligned by manually aligning the conserved cysteine positions and then aligning the blocks of sequence between these sites with the MUSCLE algorithm implemented in AliView (Edgar 2004; Larsson 2014). We reconstructed the phylogeny of these sequences using MrBayes 3.2 for 15,000,000 generations with 1,000,000 generations of burn-in, with lset rates = invgamma (allows rates to vary, with some sites invariant and others drawn from a γ distribution) and prset aamodelpr = mixed (allows MrBayes to generate an appropriate amino acid substitution model by sampling from ten predefined models) (Ronquist et al. 2012). The run was stopped when convergence values stabilized at approximately 0.013.
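As an illustration only, the settings named above could be written into a MrBayes command block along the following lines. This is a minimal sketch, not the authors' command file: the sampling frequency (`samplefreq=500`), the conversion of the burn-in from generations to samples, and the function name `mrbayes_block` are assumptions that are not stated in the text.

```python
def mrbayes_block(ngen=15_000_000, burnin_gen=1_000_000, samplefreq=500):
    """Build a MrBayes command block as text.

    ngen and burnin_gen follow the values given in the Methods.
    samplefreq is an ASSUMPTION (not stated in the paper); it is needed
    because an absolute burn-in is expressed in samples, not generations.
    """
    burnin_samples = burnin_gen // samplefreq
    lines = [
        "begin mrbayes;",
        "  lset rates=invgamma;",      # rate variation with invariant sites + gamma
        "  prset aamodelpr=mixed;",    # sample across fixed amino acid models
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        f"  sump relburnin=no burnin={burnin_samples};",
        f"  sumt relburnin=no burnin={burnin_samples};",
        "end;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(mrbayes_block())
```

The block would normally be appended to the NEXUS alignment file; the actual sampling frequency used in the study would change the burn-in count accordingly.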

Tests for Selection

Coding DNA sequences were compiled from GenBank (Benson et al. 2006). The sequences were trimmed to include only those codons that translate to the mature protein, then translated, aligned and reverse translated using AliView and the MUSCLE algorithm (Edgar 2004; Larsson 2014). Phylogenetic trees for each clade were generated from the resulting codon alignments using the same methods as described above. This tree topology was used for all subsequent analyses. We used several of the tests for selection implemented in HyPhy version 2.220150316beta because of their different emphases (Pond and Muse 2005). The AnalyzeCodonData analysis generates overall ω values for an alignment, while the FUBAR method gauges the strength of consistent positive or negative selection on individual amino acids (Murrell et al. 2013). In contrast, the MEME method identifies individual sites that were subject to episodes of positive selection in the past (Murrell et al. 2012).
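The translate, align and reverse-translate step keeps codons intact by placing gaps in whole-codon units. The study performed this in AliView; the sketch below only illustrates the idea, and the function name `thread_codons` and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes one codon from the
    coding sequence; each gap character '-' becomes a three-base gap '---'.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("coding sequence length must be 3 x number of residues")

    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    # Toy example: protein alignment row 'M-K' with coding sequence ATG AAA
    print(thread_codons("M-K", "ATGAAA"))
```

Applying the same function to every row of a protein alignment yields a codon alignment in which all columns remain in frame, which is what codon-based tests for selection require.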

Protein Modelling

Custom models for each clade were generated by inputting representative sequences to the Phyre2 webserver using the Intensive option (Kelley et al. 2015). Alignments of each clade were trimmed to match these structures and attribute files were created from FUBAR and MEME results. The structures were rendered and coloured according to these attributes in UCSF Chimera version 1.10.2 (Pettersen et al. 2004).
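For readers unfamiliar with attribute files, the sketch below writes per-residue values in the plain-text attribute format read by UCSF Chimera (a header followed by tab-indented residue specifiers and values). The attribute name `fubarScore`, the function name and the example values are hypothetical; the paper does not give the files that were actually used.

```python
def chimera_attribute_text(name, values, recipient="residues"):
    """Return the text of a Chimera attribute file.

    name   : attribute name (hypothetical here; should start with a
             lowercase letter)
    values : dict mapping residue number -> numeric value, for example a
             per-site selection estimate taken from FUBAR output
    """
    lines = [
        f"attribute: {name}",
        "match mode: 1-to-1",
        f"recipient: {recipient}",
    ]
    # Data lines start with a tab, then an atom specifier, a tab and the value
    for resnum, value in sorted(values.items()):
        lines.append(f"\t:{resnum}\t{value:.3f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {3: 0.5, 12: -1.25}  # made-up values for illustration
    print(chimera_attribute_text("fubarScore", example), end="")
```

Once such a file is loaded, the structure can be coloured by the attribute, which is how site-wise results can be projected onto a model.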

Results

We investigated the nature of natural selection influencing the evolution of genes encoding various hymenopteran allergens by computing the ratio of non-synonymous (dN) to synonymous (dS) substitutions, called omega (ω), where ω less than, greater than or equal to one is characteristic of negative, positive and neutral selection, respectively. Fast Unconstrained Bayesian AppRoximation (FUBAR) and Mixed Effects Model of Evolution (MEME) were also employed. FUBAR detects sites evolving via pervasive diversifying and purifying selection and MEME identifies sites under episodic diversifying selection.
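The interpretation of ω described above can be written as a small helper. This is only a sketch of the rule of thumb: the function name and the optional tolerance `tol` are assumptions, and a point estimate of ω on its own says nothing about statistical significance.

```python
def classify_omega(omega: float, tol: float = 0.0) -> str:
    """Classify a dN/dS ratio (omega) by its position relative to one.

    tol is a hypothetical tolerance band around 1.0 within which the
    estimate is treated as neutral; the default of 0.0 applies the
    strict textbook rule.
    """
    if omega < 1.0 - tol:
        return "negative"   # purifying selection
    if omega > 1.0 + tol:
        return "positive"   # diversifying selection
    return "neutral"


if __name__ == "__main__":
    # Values quoted in the Results, used here purely as examples
    for label, omega in [("PLA1 clade B", 0.36), ("ag5-like clade E", 1.06)]:
        print(label, omega, classify_omega(omega))
```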

Phospholipases

Our phylogenetic analysis showed that hymenopteran PLA1 sequences fall into two distinct monophyletic clades (Fig. 1). Here, we count only clades with at least five sequences, and 'monophyletic' refers to groups that form clades within this selection of sequences. Clade A consisted of Formicidae species and clade B of Vespidae species. Since all Formicidae sequences are more closely related to one another than to Vespidae sequences and vice versa, it appears that the diversification of this toxin family occurred independently in both families after their divergence. Although purifying selection largely influences both clades, clade B has a lower overall ω value than clade A (ω = 0.36 and 0.56, respectively), far more sites found to be significantly under purifying selection (110 and 16 sites, respectively) and fewer sites under diversifying selection (1 and 5 sites, respectively) according to FUBAR (Table 1). MEME, however, identified more sites that have been subject to episodic diversifying selection in clade B (28 sites) than in clade A (10 sites; Table 1). Figure 2 maps the values generated by these site-specific analyses onto protein structures predicted by the Phyre2 server and shows that, despite the lower overall ω value, clade B possesses several specific residues that are subject to diversifying selection.

Fig. 1

Phylogenetic tree of publicly available hymenopteran phospholipase A1 sequences. A and B represent closely related groups. Scale bar represents an average of 0.3 substitutions per site

Table 1 Tests of selection on the Hymenoptera allergens

Fig. 2

Protein models of phospholipase A1 show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Phylogenetic analysis of hymenopteran PLA2 shows that these proteins belong to at least three distinct monophyletic clades: two distinct groups of Apidae species and one group of Formicidae species (Fig. 3).

Fig. 3

Phylogenetic tree of publicly available hymenopteran phospholipase A2 sequences. A, B and C represent closely related groups. Scale bar represents an average of 0.2 substitutions per site

The rates and patterns of molecular evolution in the three clades are similar, with all clades under the influence of purifying selection and low ω values ranging between 0.12 and 0.34 (Table 1). The FUBAR method identified between 35 and 76 negatively selected sites, and MEME identified between 2 and 9 sites under episodic selection. Protein modelling showed that the majority of sites under positive and episodic selection are on the surface of the protein structure (Fig. 4).

Fig. 4

Protein models of phospholipase A2 show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Acid Phosphatase

Phylogenetic analysis of the hymenopteran acid phosphatase enzyme showed four distinct monophyletic clades (Fig. 5). Clades A and D were formed solely by Formicidae species, while clade B was composed of Apidae and clade C encompassed a variety of parasitoid wasps and sawflies. The rates and patterns of evolution in the acid phosphatase clades were similar, and all clades were influenced by purifying selection. The ω values were consistently low (0.06–0.28), and the FUBAR method identified numerous sites (189–230) under negative selection, while MEME identified only a small number of sites (4–35) as having experienced episodes of diversifying selection (Table 1). Figure 6 combines these tests for selection with protein structures predicted by the Phyre2 server, showing the extent to which these genes are dominated by purifying selection.

Fig. 5

Phylogenetic tree of publicly available hymenopteran acid phosphatase sequences. A, B, C and D represent closely related groups. Scale bar represents an average of 0.3 substitutions per site

Fig. 6

Protein models of acid phosphatase show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Hyaluronidase

Phylogenetic analysis of the hymenopteran hyaluronidase enzyme shows that these sequences belong to two distinct monophyletic clades (Fig. 7). Clade A consists of Braconidae parasitoid wasps and clade B is composed of Vespidae species. Both clades show similarly low ω values (0.15 and 0.21, respectively; Table 1). Despite this, FUBAR identifies many more negatively selected sites in clade A, and MEME also identifies more sites under the influence of episodic diversifying selection in clade A. Figure 8 maps the results of both the FUBAR and MEME tests for selection onto predicted protein structures to provide further context.

Fig. 7

Phylogenetic tree of publicly available hymenopteran hyaluronidase sequences. A and B represent closely related groups. Scale bar represents an average of 0.2 substitutions per site

Fig. 8

Protein models of hyaluronidase show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Serine Protease

Our phylogenetic analysis showed that hymenopteran serine protease sequences fall into five distinct monophyletic clades (Fig. 9), indicating that the serine protease sequences are quite diverse. Clade A is composed of parasitic wasps and sawflies, clade B of Apoidean bees, clades C and D of all of the Formicidae species and clade E of the Braconidae wasps. There were minimal differences in the rates and patterns of molecular evolution between the clades (Table 1). All clades had low ω values (0.11–0.55) and a large number of sites that FUBAR identified as being negatively selected (204–231). MEME identified as many as 54 sites evolving under the influence of episodic diversifying selection in clade D and as few as 7 sites in clade A. On all of these measures, clades A and E exhibited stronger purifying selection. Figure 10 combines the FUBAR and MEME tests for selection with protein structures predicted by the Phyre2 server in order to provide additional context.

Fig. 9

Phylogenetic tree of publicly available hymenopteran serine protease sequences. A, B, C, D and E represent closely related groups. Scale bar represents an average of 0.2 substitutions per site

Fig. 10

Protein models of serine protease show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Antigen 5

Our phylogenetic analyses showed that hymenopteran ag5 and ag5-like sequences are phylogenetically distinct (Fig. 11). The ag5 sequences make up four clades consisting of Pteromalidae, Braconidae, Formicidae and Vespidae. The ag5-like sequences fall into three clades: Apidae, Formicidae and Braconidae.

Fig. 11

Phylogenetic tree of publicly available hymenopteran ag5 sequences. A, B, C, D, E and F represent closely related groups. Scale bar represents an average of 0.2 substitutions per site

There were clear differences in the rates and patterns of molecular evolution between the clades (Table 1). Ag5 sequences were under weak negative selection, with ω values ranging from 0.58 to 0.95. FUBAR identified very few sites under negative or positive selection, whereas MEME identified as many as 66 sites under episodic diversifying selection. Among the ag5-like sequences, two clades were also under weak purifying selection (ω = 0.66 and 0.78), while one clade, E, was under weak positive selection (ω = 1.06). FUBAR identified no sites under negative selection, and only clade G had any sites under episodic diversifying selection. Figures 12 and 13 display the FUBAR and MEME results on protein structures predicted by the Phyre2 server. They show that most of the residues under diversifying selection are highly exposed on the surfaces of the proteins.

Fig. 12

Protein models of antigen 5 show front and back views coloured according to FUBAR’s estimated strength of selection (β–α, left) and MEME’s significance levels (right)

Fig. 13

Protein models of antigen 5 show front and back views coloured according to FUBAR’s estimated strength of selection (β–α)

Discussion

Strong Negative Selection Influences the Evolution of Hymenopteran Allergens

Molecular evolutionary assessments of hymenopteran allergens show that all of them, with the exception of ag5 and ag5-like toxins, are subject to extreme evolutionary conservation (Table 1). The ω values ranged between 0.06 and 0.56, indicating a strong influence of negative selection on a majority of sites in these proteins. This fits with previous findings that, overall, the venoms of ancient lineages evolve under heavy constraints of negative selection, while venoms in relatively recent lineages, such as snake metalloprotease and three-finger toxins, are more likely to be evolving under the influence of positive selection (Brust et al. 2013; Casewell et al. 2011, 2012; Dutertre et al. 2014; Lynch 2007; Sunagar et al. 2012, 2013a, b, c, 2014).

FUBAR identified a number of sites (1–10) under positive selection, while MEME identified several sites (ranging from 3 to 54) across the allergens that experienced episodic bursts of adaptive selection (Table 1). Similar phenomena have been found in scorpions (Sunagar et al. 2013c) and cnidarians (Jouiaei et al. 2015), both of which are ancient venomous lineages, originating ~400 (Dunlop and Selden 2009) and ~600 million years ago (mya) (Park et al. 2012), respectively. These studies suggest that variation in venom-encoding genes accumulates episodically, likely under evolutionary pressure from prey and predators or shifts in ecology (Sunagar and Moran 2015). When beneficial toxins originate, they become fixed in the population and subsequently undergo purifying selection. Hymenoptera began to diversify ~281 mya (Peters et al. 2017), suggesting that they may follow the same pattern of purifying evolution. This is in contrast to advanced snakes and cone snails, which are comparatively evolutionarily younger lineages and show more pronounced rapid evolution of genes under positive selection (Dutertre et al. 2014; Sunagar et al. 2013a, b, c).

Sites Under Episodic Selection Are Surface Accessible

Venom proteins involved in predation have been suggested to evolve through rapid accumulation of variation in the exposed residues (RAVER). Under this model, the surface of the toxin accumulates the bulk of the variation under positive selection, while the core residues involved in stability and activity are conserved (Sunagar et al. 2013a, b, c). Mutations leading to the loss of stable structure and function are removed by purifying selection, so structurally and catalytically important residues are conserved. Additionally, the accumulation of variation on the surface of a protein can be advantageous because it alters the surface chemistry, potentially leading to neofunctionalisation.
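One simple way to quantify the pattern that RAVER predicts is to ask what fraction of the selected sites are surface exposed. The sketch below is illustrative only and is not the procedure used in this study, which relied on visual inspection of coloured models; the relative solvent accessibility (RSA) values, the 0.2 exposure threshold and the function name are all assumptions.

```python
def exposed_fraction(selected_sites, rsa, threshold=0.2):
    """Fraction of selected sites whose RSA is at or above the threshold.

    selected_sites : iterable of site numbers flagged by a selection test
    rsa            : dict mapping site number -> relative solvent
                     accessibility in [0, 1] (hypothetical input)
    threshold      : ASSUMED cut-off separating exposed from buried sites
    Returns None if no selected site has an RSA value.
    """
    scored = [site for site in selected_sites if site in rsa]
    if not scored:
        return None
    exposed = sum(1 for site in scored if rsa[site] >= threshold)
    return exposed / len(scored)


if __name__ == "__main__":
    toy_rsa = {1: 0.5, 2: 0.1, 3: 0.3, 4: 0.9}  # made-up values
    print(exposed_fraction([1, 2, 3, 4], toy_rsa))
```

Comparing this fraction with the exposed fraction among all sites in the protein would indicate whether selected sites are enriched on the surface.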

Evolution through RAVER is congruent with the 3D models of the allergen structures, in which the majority of positively selected or episodically adaptive sites are surface exposed (Figs. 7, 8, 9, 10, 11, 12, 13). RAVER has been established in multiple venom lineages, and it appears that toxins from Hymenoptera also adopt RAVER and favour accumulation of variation on the molecular surface (Brust et al. 2013; Ruder et al. 2013; Sunagar et al. 2012, 2013a, b, c). Certain codon sites are under the influence of episodic diversifying selection; they are mostly concentrated on the surface of the allergen, likely as a consequence of adaptation to host immune responses. This favouring of episodic evolution on the surface of the protein may be one of the reasons that observed clinical cross-reactivity between families is low (Henriksen et al. 2001).

Despite the main codons under positive selection occurring on the surface of the allergen, the majority of sites are still under negative selection. These conserved codons may be the site of IgE-binding epitopes. Tree pollen allergens have structurally conserved molecular surfaces that are the basis for allergic cross-reactivity (Mirza et al. 2000). Mirza et al. (2000) suggested that these conserved codons on the molecular surface of the allergen harbour major IgE-binding epitopes. This may also be the case for hymenopteran allergens; however, further characterisation of the molecular surface is required in order to determine this.

Antigen 5 Under Neutral Selection

Our phylogenetic analyses show that hymenopteran ag5s have a complex evolutionary history with frequent gene duplications and losses. The ag5-like proteins found in Apidae venoms, while clearly related to the group of ant and wasp venom ag5 proteins, form their own distinct clade (Fig. 11). Ag5-like proteins were also identified in multiple other Hymenoptera species. The ω values for ag5 proteins ranged between 0.73 and 0.95, indicating a weak influence of negative selection on a majority of sites in these proteins (Table 1). The ag5-like proteins all show similar ω values, with the exception of clade E, which comprises Apidae species and, at ω = 1.06, is under weak positive selection. The ω and FUBAR values suggest that the protein is under neutral selection, while MEME values indicate that in ag5 proteins there are multiple sites undergoing episodic diversifying selection (0–59 sites).

Allergen Functionality

Allergens have been shown to belong to a small number of protein families with a limited range of molecular functions (Radauer et al. 2008). Of the known allergens, one-sixth have hydrolase activity. Hydrolase activities are present in hymenopteran allergens as phospholipase, acid phosphatase, hyaluronidase and serine protease (Table 2). However, the functional activity of hymenopteran allergens is largely unknown, with only basic molecular functions being ascribed. Snake venoms have been extensively studied, and it is interesting to compare their known functions with the largely unknown functions of hymenopteran venoms.

Table 2 Known functional activities of allergens

Venom allergens do not appear to have any unique antigenic properties. As a result, it is likely that venom allergenicity arises from the activation of accessory cells that secrete cytokines, triggering the development of T helper and regulatory cells, which in turn regulate the development of IgE-producing B cells in susceptible organisms (King and Wittkowski 2011). Despite extensive research on allergens, it is still unknown what factors render proteins allergenic; further evolutionary and molecular studies are required to resolve this question.

Given the almost ubiquitous occurrence of allergic reactions across mammals as a result of being stung by a bee, ant or wasp, the question remains: why are hymenopteran venoms so allergenic? Proteins such as hyaluronidases and PLA2 are present in various other venoms, including those of snakes and centipedes. However, they rarely, if ever, cause the same allergic reaction that hymenopteran venoms induce. Is there an adjuvant present in the venom that may be influencing the allergic potential? Or is it the structure or function of the allergenic protein that causes this allergic reaction? Studies looking at antibody response have suggested that low molecular weight hyaluronic acid polymers and oligomers released in the skin upon being stung may function as adjuvants that promote venom allergenicity (King and Wittkowski 2011). However, this is yet to be explored further.

Conclusion

This study is the first of its kind to examine the phylogenetic histories and molecular evolution of the major allergens found in hymenopteran venoms. We demonstrated that the major allergens present in hymenopteran venoms are evolving predominantly under the influence of strong negative selection, while codon sites that are experiencing episodic diversifying selection are concentrated on the surface, suggesting a conservation of core amino acids. Functional testing is sorely needed in order to further understand the purpose of allergens. These results emphasize the importance of understanding the molecular evolution, diversification and phylogenetic histories of allergen components in Hymenoptera.