Introduction

Streptococcus agalactiae (group B streptococcus or GBS) is a commensal bacterium found in the gastrointestinal, urinary and genital tracts of 30 % of healthy adults (van der Mee-Marquet et al. 2008). It is also the leading cause of invasive infections in neonates and an important cause of disease in pregnant women, immunocompromised adults and the elderly (Le Doare and Heath 2013). In animals, S. agalactiae is a major cause of bovine mastitis (Keefe 1997) and an emerging piscine pathogen (Evans et al. 2008). It has also been isolated from numerous other vertebrates such as camels, dogs, dolphins, gray seals and frogs, in which it can occasionally be associated with infections (Bishop et al. 2007). Ten capsular polysaccharide serotypes have been described in S. agalactiae, but five of them (Ia, Ib, II, III and V) account for the majority of disease (Le Doare and Heath 2013). Multilocus sequence typing (MLST) has become the standard method for determining the population structure of GBS, and several studies have examined the correlation between serogroups and sequence types (Bisharat et al. 2004; Jones et al. 2006; Bohnsack et al. 2008). Capsular serotype was reported not to strictly follow sequence type, and ST17 appears overrepresented in neonatal disease irrespective of the capsular serotype (Bisharat et al. 2004; Jones et al. 2006; Bohnsack et al. 2008). Consistent with the large host range of S. agalactiae, a high genomic diversity was observed in this species, with accessory genes representing 20 % of the genome of each strain (Tettelin et al. 2005). Most of the strain-specific genes of S. agalactiae cluster in chromosomal genomic islands (Glaser et al. 2002; Tettelin et al. 2005). Several chromosomal prophages have been detected in this species, but they are less abundant in S. agalactiae than in S. pyogenes (Canchaya et al. 2003; Domelier et al. 2009).

In addition, a first analysis carried out on the genomes of eight human S. agalactiae strains indicated that a large part of the chromosomal genomic islands could be transferable by conjugation (Brochet et al. 2008). A first category of these elements corresponds to integrative and conjugative elements (ICEs), which are self-transmissible elements encoding their excision from the host, their transfer by conjugation from a donor cell to a recipient cell and their insertion into a target site (Burrus et al. 2002a; Bellanger et al. 2014). Like other mobile genetic elements, ICEs are composed of modules, each of which is a combination of genes or sequences involved in the same function, and they evolve by exchange of these modules with other mobile genetic elements (Burrus et al. 2002a; Toussaint and Merlin 2002; Bellanger et al. 2014). One of these modules contains all genes and sequences necessary for the excision and integration of the element (recombination module). Most ICEs carry a tyrosine recombinase that catalyzes excision by site-specific recombination between sequences flanking the ICE (attL and attR) (Grindley et al. 2006). The conjugation module is composed of all genes and sequences involved in the transfer of the element. It includes genes encoding (1) components of the conjugation pore, (2) a relaxase (helped by other proteins) which initiates conjugal transfer by nicking DNA at a nic site in the transfer origin (oriT) and (3) a coupling protein which makes the connection between the relaxosome (proteins–DNA complex) and the conjugal pore (Wozniak and Waldor 2010; Bhatty et al. 2013). In addition to the genes involved in their mobility, ICEs also carry genes controlling their mobility (regulation module) and cargo genes which can provide new properties (adaptation, virulence, antibiotic resistance) to the recipient cell (Wozniak and Waldor 2010; Bellanger et al. 2014).

A second category of genomic islands, which was found in the eight strains of S. agalactiae previously analyzed, corresponds to integrative mobilizable elements (IMEs), which encode their excision and integration but are not autonomous for their conjugative transfer. Most IMEs encode a relaxase, but none of the proteins belonging to the conjugation pore. They thus need to use the apparatus of another conjugative element to transfer (Burrus et al. 2002b). Another category of elements widespread in the eight genomes of S. agalactiae previously studied is cis-mobilizable elements (CIMEs), which derive from ICEs or IMEs by deletion and are autonomous neither for their integration/excision nor for their transfer (Pavlovic et al. 2004). They only retain flanking att recombination sites, which enable them to be mobilized in cis by related ICEs or IMEs. Indeed, we previously showed that an ICE can integrate in an att recombination site flanking a CIME to form a composite element (accretion event). The whole tandem can then excise and transfer by conjugation (cis-mobilization) (Pavlovic et al. 2004; Bellanger et al. 2011; Puymège et al. 2013).

ICEs detected in our first genome analysis (Brochet et al. 2008) include ICEs of the well-studied Tn916 and Tn5252 families and two families of ICEs relying on a DDE transposase for their excision. A fifth family of ICEs, integrated in the tRNALys CTT locus, was also found to be widespread in GBS (Brochet et al. 2008; Haenni et al. 2010). ICEs of this family can confer useful properties on the host cell. Indeed, the ICEs or degenerate ICEs found in two strains (515 and COH1) carry a second CAMP factor gene, which was found to be active in several streptococci (Chuzeville et al. 2012). Furthermore, ICE_515_tRNALys encodes an antigen I/II adhesin involved in biofilm formation and adhesion to epithelial cells (Chuzeville et al. submitted). This family of elements thus merits further exploration. In addition, our first work indicated a substantial number of IMEs in the genomes, and little is known about this category of elements. In this work, we analyzed the prevalence and diversity of the different classes of elements in the genomes of 303 strains of S. agalactiae isolated from different hosts and belonging to different MLST groups and serogroups. We also examined the possibility of transferring by conjugation an ICE found in a bovine strain to a human recipient strain.

Materials and methods

Sequence database searches and in silico comparison of genetic elements

Characteristic genes and recombination sites attL and attR found in previously described genomic islands integrated in the 3′ end of the tRNALys gene (Brochet et al. 2008; Puymège et al. 2013) were searched by BlastN analysis (default settings, Expect threshold < 10⁻³) in 14 complete genomes and 291 draft genomes of S. agalactiae (http://www.ncbi.nlm.nih.gov/genome/genomes/186, last accessed on September 1, 2014) (see Supplemental Table S1). A search of the geographical data and history of the strains indicated that two strains are identical to two other strains (LMG 15083 identical to A909 and LMG 15084 identical to 18RS21). They were thus not included in the analysis, which leaves 303 genomes. In total, 12 ORFs of the conjugation module of ICEs (homologous to genes orfA-orfM of ICESt3 of S. thermophilus (Pavlovic et al. 2004), except orfI which is absent in ICEs of S. agalactiae), 2 additional relaxase genes and 3 integrase genes were included in the analyses. Sequences of genetic elements were extracted from contigs or complete genomes and analyzed using Vector NTI Advance (Invitrogen). Pairwise comparisons of elements were performed with the Artemis Comparison Tool provided by the Sanger Centre (Carver et al. 2005) using comparison files generated by Double Act (available at: http://www.hpa-bioinfotools.org.uk/pise/double_act.html). Manual editing of comparison figures was performed using Inkscape. Nonannotated contigs were annotated using the RAST Annotation Server (http://rast.nmpdr.org/) (Overbeek et al. 2014). Conserved protein domains were analyzed using the Batch CD search tool (available at: http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) (Marchler-Bauer et al. 2011).
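The screening step described above keeps only BlastN hits below the Expect threshold. The following is a minimal sketch of such a filter, assuming that the hits were exported in the standard 12-column tabular BLAST format (an assumption; the export format is not stated in the text). The names `Hit`, `parse_tabular_hits` and `filter_hits`, as well as the example rows, are hypothetical and serve only as illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str       # reference gene or att site used as query
    subject: str     # genome or contig in which the hit was found
    identity: float  # percent identity of the alignment
    evalue: float    # Expect value of the hit


def parse_tabular_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST hits (12 standard columns assumed).

    Column order assumed: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), float(cols[10])))
    return hits


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits whose Expect value is strictly below the threshold."""
    return [h for h in hits if h.evalue < max_evalue]


# Hypothetical example rows, not taken from the study.
example = [
    "orfA\tcontig_1\t98.5\t900\t13\t0\t1\t900\t1200\t2099\t0.0\t1600",
    "int\tcontig_7\t71.0\t40\t11\t1\t1\t40\t5\t44\t0.5\t30.1",
]
kept = filter_hits(parse_tabular_hits(example))
```

With these two example rows, only the `orfA` hit passes the threshold.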

ISs were analyzed using the IS Finder tool (https://www-is.biotoul.fr/; Siguier et al. 2006). Secondary structure of the conserved region of oriT containing the nic site was analyzed using RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi; Gruber et al. 2008).

MLST typing

MLST typing was performed using a scheme based on seven genes (adhP, pheS, atr, glnA, sdhA, glcK and tkt) developed by Jones et al. (2003). ST groups were defined from genomes using the Batch sequence query tool (with the FASTA file of the genome, or a multi-FASTA file of contigs for draft genomes) available on the S. agalactiae MLST website (http://pubmlst.org/sagalactiae/) hosted at the University of Oxford (funded by the Wellcome Trust; Jolley and Maiden 2010). eBURST analysis was performed using eBURSTv3 (Single dataset analysis) developed and hosted at Imperial College London (available at: http://eburst.mlst.net/; Feil et al. 2004). eBURST diagrams were manually edited using Inkscape.
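To illustrate how sequence types are grouped into clonal complexes, the sketch below links STs that share at least six of the seven MLST alleles, which is the default eBURST group definition. It is a simplified single-linkage grouping and does not reproduce the founder prediction or bootstrap steps of eBURSTv3. The function names and the allele profiles are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Allele numbers at the seven loci (adhP, pheS, atr, glnA, sdhA, glcK, tkt).
Profile = Tuple[int, ...]


def shared_alleles(a: Profile, b: Profile) -> int:
    """Count the loci at which two allelic profiles are identical."""
    return sum(x == y for x, y in zip(a, b))


def group_sts(profiles: Dict[str, Profile], min_shared: int = 6) -> List[List[str]]:
    """Group STs by single linkage: two STs are linked when they share
    at least `min_shared` alleles; linked STs end up in the same group."""
    parent = {st: st for st in profiles}

    def find(st: str) -> str:
        # Union-find lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    # Largest groups first; singletons come last.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical profiles, not real PubMLST data.
profiles = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 1, 2),  # single-locus variant of ST_A
    "ST_C": (1, 1, 1, 1, 1, 3, 2),  # single-locus variant of ST_B
    "ST_D": (9, 9, 9, 9, 9, 9, 9),  # unrelated: singleton
}
groups = group_sts(profiles)
```

In this example ST_A and ST_C share only five alleles, but they fall in the same group because each is a single-locus variant of ST_B.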

In silico determination of serogroup

Since information about capsular antigens was not available for most of the strains included in the analysis, an in silico determination of the serogroup was performed as described by Kong et al. (2002). A first classification was made on the basis of the 23 variable positions in the 3′ end of cpsE-cpsF and the 5′ end of cpsG. Discrimination of serogroups Ib and IV was confirmed by searching for a repetitive sequence at positions 78–86 of the cpsD gene. Discrimination of serogroups Ia and III-3 was made by analyzing positions 627 and 636 of the cpsE gene. Discrimination of serotypes II/III-4 and Ia/III-3 was confirmed by searching for the cpsIIIH polymerase gene, which is specific to serogroup III.

Bacterial strains and culture conditions

Rifampicin and streptomycin resistant mutants of S. agalactiae NEM316, COH1 and A909 and of other species (Streptococcus uberis, Streptococcus dysgalactiae subsp. dysgalactiae, Streptococcus pyogenes, Enterococcus faecalis, Streptococcus thermophilus, Streptococcus salivarius and Streptococcus mutans) used as recipient strains for filter mating experiments were obtained in a preceding work (Puymège et al. 2013). Strain 515 used as donor in conjugation experiments carries an ICE which was labeled by an erythromycin resistance gene (Puymège et al. 2013). Strains FSL S3-026 (LMG 26500) and BSU108 (LMG 26527) were obtained from BCCM™/LMG bacteria collection (University of Gent, Belgium).

S. agalactiae, S. uberis and S. dysgalactiae subsp. dysgalactiae strains were grown in brain heart infusion (BHI, Difco) broth at 37 °C with shaking at 150 rpm. S. pyogenes and E. faecalis strains were grown under the same conditions, but without shaking. Solid cultures of these species were grown on tryptic soy plates supplemented with defibrinated horse blood (5 %). S. thermophilus strains were grown in reconstituted skim milk (10 %, wt/vol) and M17 broth supplemented with 0.5 % lactose (LM17) (Oxoid) at 42 °C under anaerobic conditions (GENbox Anaer atmosphere generators and incubation jars from bioMérieux). S. salivarius strains were grown in M17 broth supplemented with 0.5 % glucose (GM17, Oxoid) at 37 °C under anaerobic conditions. S. mutans strains were grown in Todd Hewitt broth supplemented with 0.1 % yeast extract (THY, Oxoid) at 37 °C under anaerobic conditions. Cultures were supplemented with the following antibiotics when required: chloramphenicol, 16 μg/mL; erythromycin, 50 μg/mL; rifampicin, 75 μg/mL; spectinomycin, 500 μg/mL; or streptomycin, 250 μg/mL.

DNA manipulations and PCRs

Chromosomal DNA was prepared according to standard protocols (Sambrook and Maniatis 1989). Primers used in this study were purchased from Eurogentec and are listed in Supplemental Table S2. PCRs and high-fidelity PCRs were carried out according to the manufacturer’s instructions using the ThermoPol PCR kit (New England Biolabs) and the Phusion High Fidelity DNA polymerase (Thermo Scientific), respectively. PCRs were performed with 4 μg/mL of DNA template and consisted of 30 cycles of amplification. The annealing step was carried out at 5 °C below the primer’s Tm for standard PCR and at 3 °C above the primer’s Tm for high-fidelity PCR. The multiple-locus variant-repeat assay (MLVA) described by Radtke et al. (2010) was used to distinguish transconjugants from donor and recipient cells. This method relies on a multiplex PCR of five VNTRs (variable number of tandem repeats) in the genome of S. agalactiae. The restriction and modifying enzymes were purchased from Thermo Scientific. DNA sequencing was performed by Beckman Coulter Genomics.
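The annealing rule given above for the two PCR protocols amounts to a fixed offset from the primer Tm. A minimal sketch follows; the function name is hypothetical, and the use of a single Tm value per reaction is a simplifying assumption.

```python
def annealing_temperature(primer_tm_c: float, high_fidelity: bool = False) -> float:
    """Return the annealing temperature in degrees Celsius.

    Standard PCR: 5 degrees below the primer Tm.
    High-fidelity PCR: 3 degrees above the primer Tm.
    """
    offset = 3.0 if high_fidelity else -5.0
    return primer_tm_c + offset


standard = annealing_temperature(60.0)                 # standard PCR
hifi = annealing_temperature(60.0, high_fidelity=True)  # high-fidelity PCR
```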

Excision tests

The presence of an attB site (empty site) and an attI site (characteristic of a circular form) was evaluated by nested PCR (two-step PCR). The first step corresponds to a standard PCR except that 25 cycles of amplification were performed instead of 30. In the second step, the template corresponds to the PCR product obtained in the first step and the primers used are internal to this fragment. Primer sequences are indicated in Supplemental Table S2. The conditions used are those of the standard PCR described above.

Filter matings

ICE_FSL S3-026_tRNALys was labeled by insertion of an erythromycin resistance cassette in a gene encoding the ATPase subunit of an ABC transporter located at one of its extremities (Supplemental Fig. S1), as described previously for ICE_515_tRNALys (Puymège et al. 2013).

Filter mating experiments were performed using S. agalactiae 515 (carrying ICE_515_tRNALys Ery) or strain FSL S3-026 (carrying ICE_FSL S3-026_tRNALys Ery) as donor cells and rifampicin-/streptomycin-resistant strains as recipient cells as described previously (Puymège et al. 2013). At least three independent tests were conducted for each conjugation experiment. Rifampicin-/streptomycin-resistant strains of S. agalactiae (NEM316, COH1, A909), S. uberis (20388 and 21458), S. dysgalactiae subsp. dysgalactiae (14998 and 16192), S. pyogenes ATCC 12202, E. faecalis JH2-2, S. thermophilus LMG18311, S. salivarius CIP102503 and JIM8777 and S. mutans UA159 were selected in a previous work (Puymège et al. 2013). Rifampicin-/streptomycin-resistant mutants of S. agalactiae BSU108 were selected to use this strain as recipient cell in the conjugations.

Statistical analysis

Statistical analysis was performed as described by Georgin and Mouet (2000) and Cumming et al. (2007). The means and standard errors from at least three independent experiments are indicated.

Results

Search of genomic islands integrated in the tRNALys CTT gene in the pan-genome of S. agalactiae

Previously described genomic islands (Brochet et al. 2008; Puymège et al. 2013) were used as references to scan 303 genomes of S. agalactiae at the tRNALys CTT gene locus. Further analysis revealed that ICE_2603V/R_tRNALys, IME_A909_tRNALys and CIME_NEM316_tRNALys are composite elements: they correspond, respectively, to an ICE and an IME in tandem at the same locus, to two CIMEs and one IME, and to two CIMEs (Brochet et al. 2008; Supplemental Fig. S2). CIME1 of strain A909 and CIME1 of strain NEM316 are closely related and differ only by 3 ORFs present only in strain A909. ICE_18RS21_tRNALys harbors a second unrelated relaxase gene and a second unrelated integrase gene between orfK and orfJ of the ICE (Supplemental Fig. S2). Searches also included all recombination sites attL and attR found in these genomic islands (Supplemental Fig. S2).

To evaluate the number of genetic elements, we chose to name separately the elements which appear in tandem (whatever the nature of the element: ICE, IME or CIME). That means that each element flanked by recombination sites is considered as an element and the presence of other characteristics allowed us to classify them into putative ICEs, degenerated ICEs, IMEs or CIMEs. To allow unambiguous denomination of the elements and to reflect their diversity, we gave them a name that includes the putative nature of the element (ICE, IME, CIME or GI), the host strain and its insertion site.

Among the 303 genomes examined, only 9 carry an empty tRNALys CTT site (Supplemental Table S1). The other genomes carry 428 elements in total. Eighty-eight genomes carry an element bordered by recombination sites attL and attR which contains a full conjugation module and a recombination module, i.e., a putative ICE. Twenty other elements (including the element found in strain 18RS21, which lacks the orfH gene due to a sequencing gap in the draft genome) lack some of these characteristic features due to missing contigs in the draft genome assembly and were thus not counted as putative ICEs (elements for which gaps are indicated in Table S1). Among them, five elements carry the 12 conjugation genes, the integrase gene and an attR recombination site, but do not have an attL recombination site due to a missing contig in the draft genome assembly. Sixty-nine elements encode a relaxase and an integrase related to those of IME_A909_tRNALys and are bordered by recombination sites attR and attL, thus fulfilling the requirements to be mobilized by conjugative elements (IMEs). Seventeen of these putative IMEs are in tandem with a putative ICE. Two hundred and fifteen elements are devoid of conjugation/mobilization and recombination genes, but are flanked by recombination sites, i.e., are putative CIMEs (Supplemental Table S1). Among these elements, 132 are in tandem with another element (26 with another CIME, 1 with an IME, 102 with a CIME and an IME and 3 with an ICE). Thirty-six elements lack conjugation/mobilization and recombination genes as well as one of the recombination sites and were counted as genomic islands. In addition, the elements found in two genomes were not counted because of ambiguities (presence of two contigs with the tRNALys CTT gene, one with an ICE integrated and another one with a CIME).
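The classification criteria used above can be summarized as a small decision function. This is only a sketch of the stated rules: the field names are hypothetical, relatedness of the relaxase and integrase to those of IME_A909_tRNALys is not modeled, and elements affected by assembly gaps simply fall into an "unclassified" bin. The counts in `REPORTED_COUNTS` are those given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    has_attL: bool
    has_attR: bool
    full_conjugation_module: bool  # all 12 conjugation ORFs detected
    has_relaxase: bool
    has_integrase: bool


def classify(element: Element) -> str:
    """Assign an element to a class following the criteria in the text."""
    both_sites = element.has_attL and element.has_attR
    one_site = element.has_attL != element.has_attR
    mobility_genes = (
        element.full_conjugation_module
        or element.has_relaxase
        or element.has_integrase
    )
    if both_sites and element.full_conjugation_module and element.has_integrase:
        return "putative ICE"
    if both_sites and element.has_relaxase and element.has_integrase:
        return "putative IME"
    if both_sites and not mobility_genes:
        return "putative CIME"
    if one_site and not mobility_genes:
        return "genomic island"
    return "unclassified"  # e.g. elements interrupted by assembly gaps


# Element counts reported in the text; they add up to the 428 elements found.
REPORTED_COUNTS = {
    "putative ICE": 88,
    "incomplete ICE-like element (gaps)": 20,
    "putative IME": 69,
    "putative CIME": 215,
    "genomic island": 36,
}
total_elements = sum(REPORTED_COUNTS.values())
```

For instance, an element carrying the 12 conjugation genes, the integrase gene and attR but no attL is returned as "unclassified", which mirrors the decision not to count such elements as putative ICEs.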

Distribution of the genomic islands integrated in the tRNALys CTT gene in S. agalactiae in different phylogenetic lineages and capsular serogroups

The 303 strains analyzed were isolated from 11 different hosts: 211 from humans, 51 from cattle, 27 from different fish species (1 from flathead mullet, 22 from Nile tilapia, 2 from striped bass and 2 from trout), 5 from gray seals, 3 from dogs, 3 from frogs, 2 from camels and 1 from a dolphin (Supplemental Table S1). Among these strains, 255 can be assigned to an MLST group (Supplemental Table S1).

E-burst analysis of MLST data indicated that strains belong to 49 different sequence types (ST) and can be grouped into ten clonal complexes (CC1, CC3, CC7, CC12, CC17, CC19, CC23, CC61, CC283 and CC452) (Fig. 1). Eight strains appear as singletons (Fig. 1). In silico determination of capsular serogroup revealed that strains belonged to seven different serogroups: Ia (n = 60), Ib (n = 41), II (n = 52), III (n = 70), IV (n = 12), V (n = 60) and VI (n = 3) (supplemental Table S1). The serogroup could not be determined for 5 strains due to missing contigs in the draft genome. Serogroup Ia is found in two major clonal complexes (CC23 and CC7). Most of the strains of serogroups Ib and V belong to clonal complexes CC12 and CC1, respectively. Half of the strains of serogroup IV belong to clonal complex CC1. The situation is more complex for serogroups II and III. Indeed, three major clonal complexes (ST22, CC19 and CC61) harbor serogroup II capsular antigens and three major clonal complexes (CC19, CC17 and CC61) harbor serogroup III antigens. All ST17 strains harbor serogroup III capsular antigens, but strains of clonal complexes CC19 and CC61 can harbor four and three different capsular antigens, respectively.

Fig. 1

Population snapshot obtained by E-burst analysis showing the clusters of linked and unlinked STs in the whole population of typable strains of S. agalactiae (n = 257) included in the analysis. All of the single locus variants (SLVs) are connected by a line to the centrally positioned predicted founder (indicated as a blue circle). The area of the ST circles is proportional to the abundance of the ST in the input dataset. The numbers are those of the STs from the PubMLST database of S. agalactiae (http://pubmlst.org/sagalactiae/; Jolley and Maiden 2010). Since the default definition of a group was used for E-burst analysis, all isolates in a group have the same alleles at six or more of the seven loci. Therefore, all STs linked as a single cluster belong to the same clonal complex. For easier visualization, clonal complexes are indicated by a dashed line. Singletons appear as dots in the center of the diagram. The number of isolates in an ST that carry an ICE (blue arrow) or an IME (red arrow) is indicated in brackets. (color figure online)

The nine strains devoid of genomic islands have three different origins: human (n = 4), bovine (n = 2) and gray seal (n = 3). They belong to two MLST groups, ST23 (n = 5) and ST103 (n = 2) (2 strains could not be assigned to an MLST group), and three serogroups Ia (n = 5), II (n = 1) and III (n = 3).

Among the 88 strains that host a putative ICE, 70 were human strains (33 % of the human strains) and 14 were bovine strains (28 % of the bovine strains), but ICEs were also detected in strains isolated from dog (n = 2) and gray seal (n = 2) (Supplemental Table S1). ICEs disseminated in six clonal complexes (CC1, CC17, CC19, CC23, CC61 and CC452) and in one strain of ST91 (Fig. 1), corresponding to six serogroups (Ia, II, III, IV, V and VI). Two clonal complexes (CC19 and CC23) gather 76 % of the ICEs detected in strains included in MLST analysis (n = 76) (Fig. 1; Supplemental Table S1). In CC19 (n = 39), 87 % of the strains harbor an ICE (7 of these ICEs are in tandem with an IME). Interestingly, all the strains belonging to CC19 that harbor capsular antigen III (n = 25) carry a putative ICE or degenerate ICE. Half of the strains with serogroup Ia, all except three belonging to CC23, carry a putative ICE.

Among the 69 strains that host a putative IME integrated in the tRNALys CTT gene, 39 were human strains (19 % of the human strains), but IMEs were also detected in bovine strains (n = 12, corresponding to 24 % of the strains), in strains isolated from dog (n = 1), dolphin (n = 1), frog (n = 1) or different fish species (n = 15, corresponding to 56 % of the strains) (Supplemental Table S1). IMEs disseminated in eight clonal complexes (CC1, CC3, CC7, CC12, CC19, CC23, CC61 and CC283) (Fig. 1). Two clonal complexes (CC7 and CC12) gather 73 % of the IMEs detected in strains included in MLST analysis (n = 59) and do not host ICE (Fig. 1). In CC7 (n = 23), which includes 54 % of the typable fish isolates, 74 % of the strains (n = 18) harbor an IME. These strains (except 2) harbor capsular antigens of serogroup Ia. All the strains of clonal complex CC12, which almost all (n = 22) harbor serogroup Ib antigen, carry an IME.

CIMEs are found in 151 strains, mostly in human strains (n = 102, corresponding to 48 % of the human strains), but also in bovine isolates (n = 14, corresponding to 28 % of the bovine strains), the two camel isolates, two of the three dog isolates, the dolphin isolate, the three frog isolates and all the fish isolates (Supplemental Table S1). They are found in six clonal complexes (CC1, CC3, CC7, CC12, CC23 and CC283), in five singletons (ST26, ST130, ST260, ST261 and ST609) and in all seven serogroups detected. CC1 is particularly rich in CIMEs since 54 strains harboring at least one CIME belong to this clonal complex (corresponding to 41 % of the typable strains harboring a CIME and to 96 % of the strains belonging to this clonal complex). Most of these strains (n = 41) harbor capsular antigens of serogroup V.

Diversity of the genetic elements integrated in the tRNALys CTT gene of S. agalactiae

For deeper analysis, elements were selected to reflect the diversity of origins of the host strain and the diversity of MLST groups. The DNA sequence of the elements was extracted from genome/contig, annotated if necessary and compared to other sequences.

Thirty-five putative ICEs were chosen according to the origin of the host strain (24 human strains, 8 bovine, 2 from dog and 1 from gray seal) and its MLST group (distribution in 17 different STs) (Supplemental Table S1). These putative ICEs varied in size from 27,840 bp (for ICE_NGBS061_tRNALys) to 42,790 bp (for ICE_FSL S3-026_tRNALys) (Supplemental Table S1). All of them, except ICE_NGBS061_tRNALys, have a conjugation module very close to that of ICE_515_tRNALys, an element that was previously found to excise and transfer (Puymège et al. 2013). These ICEs are closely related and differ only by insertion/deletion of genes at five different locations in the element (selected ICEs representative of these different variants are shown in Fig. 2). ICE_NGBS061_tRNALys, found in an ST459 human strain, carries a different conjugation module. The orfB gene is absent and three genes (orfA, orfC and orfG) differ markedly from those of ICE_515_tRNALys. Indeed, the corresponding proteins show only 50, 46 and 59 % amino acid identity with OrfA, OrfC and OrfG of ICE_515_tRNALys, respectively (Supplemental Table S1). Furthermore, this ICE has a left recombination site (attL) with only 69 % identity with the attL of other ICEs (the alignment was performed on the 200 bp which are conserved in the left end). This ICE is also found in the second ST459 human strain included in the analysis. The plasticity in this group mainly correlates with the presence of ISs. One ICE (ICE_FSL S3-026_tRNALys) identified in a bovine strain is particularly rich in ISs (3 ISSag11 copies and 1 ISSag5 copy) (Fig. 2). ICEs that carry the ICE_515_tRNALys conjugation module, except ICE_MRI Z1-022_tRNALys, carry the same accessory genes, i.e., a putative bacteriocin operon, a mercury resistance operon (merR/merA), a CAMP factor gene and two type II toxin–antitoxin systems (Fig. 2).

Fig. 2

Comparison of ICEs integrated in the tRNALys CTT gene detected in the genomes of S. agalactiae. Only representative elements are indicated in the figure, since other ICEs share more than 90 % of nucleic identity with them. For more clarity, elements in accretion with these ICEs are not shown. Host origin is indicated below the name of the strain in brackets. The two complementary strands of DNA are indicated on separate lines. ORFs appear as arrows (truncated genes are indicated by Δ) with genes of the regulation module colored in green, genes of the conjugation module colored in dark blue and genes of the recombination module colored in red. Genes of the conjugation module were named according to their similarity to ICESt1/St3 of S. thermophilus (Bellanger et al. 2009). Genes encoding proteins with putative function inferred from in silico analysis are indicated with a color: pink for a putative bacteriocin operon, light blue for the CAMP factor gene, yellow for surface proteins and brown for toxin–antitoxin genes. ISs are indicated as arrows colored in light gray. Direct repeats are drawn as pinheads. Nucleic identity higher than 80 % between sequences of ICEs is indicated in gray. Gaps in the genomes due to missing contigs are indicated by a double slash. (color figure online)

Thirty-three strains carrying a putative IME integrated in the tRNALys CTT gene were selected from human (n = 17), bovine (n = 8), dog (n = 1), dolphin (n = 1), frog (n = 1) and different fish species (n = 5) isolates, belonging to 16 different ST (Supplemental Table S1). Twenty strains carry a composite element consisting of one or two CIMEs in tandem with an IME. These elements are very close to the one found in strain A909, differing only by insertion of an IS1381A copy or by a deletion in the CIMEs (see representative elements shown in Fig. 3; Supplemental Table S1). The other elements are related to IME_2603V/R_tRNALys, with differences located only at the left end. All these IMEs are unrelated to ICEs since their relaxase and integrase display no significant similarity or only 41 % of protein identity on the whole protein sequence with the relaxase and integrase of ICE_515_tRNALys, respectively. The size of the IMEs goes from 8290 bp (for IME_A909_tRNALys) to 10,948 bp (for IME_2603V/R_tRNALys) (Supplemental Table S1). IMEs related to IME_2603V/R_tRNALys encode an ABC transporter of the drug resistance transporter subfamily (cd03230, ABC_DR_subfamily). IMEs related to IME_A909_tRNALys encode a putative intracellular protease (cd03140, GATase1_PfpI_3) which may hydrolyze small peptides to provide a nutritional source.

Fig. 3

Comparison of IMEs and composite genomic islands including IMEs integrated in the tRNALys CTT gene detected in the genomes of S. agalactiae. Only representative IMEs are indicated on the figure since other IMEs share more than 90 % nucleic identity with them. In two strains (2603 V/R and FSL S3-654), the IME is in accretion with an ICE which has not been drawn on the figure due to space limitation. Host origin is indicated below the name of the strain in brackets. The two complementary strands of DNA are indicated on separate lines. ORFs appear as arrows (truncated genes are indicated by Δ) with genes of the regulation module colored in green, relaxase (rel) gene colored in dark blue and genes of the recombination module colored in red. ISs are indicated as arrows colored in light gray. Genes encoding proteins with putative function inferred from in silico analysis are indicated with a color: pink for a putative bacteriocin operon, light blue for an ESAT6 family protein, yellow for a putative protease and brown for a putative multidrug ABC transporter. Direct repeats are drawn as pinheads. Nucleic identity higher than 80 % between sequences of IMEs is indicated in gray. (color figure online)

Among the 215 putative CIMEs, 76 and 59 are closely related to CIME1 and CIME2 found in strain A909, respectively (Supplemental Fig. S2; Supplemental Table S1). Seventy-three are closely related to CIME2 found in strain NEM316 at the tRNALys CTT gene locus (Supplemental Fig. S2; Supplemental Table S1). Seven other elements carry an attL of CIME1 and an attR of CIME2 of strain NEM316, but are devoid of an internal recombination site. CIME1 of strain A909 and CIME1 of strain NEM316 are closely related CIMEs which differ only by the presence of three additional ORFs in the latter. These putative CIMEs exhibit 99 % identity with the left end of ICE_515_tRNALys (over 1910 bp) and thus likely derive from an ICE by deletion. CIME2 of NEM316 carries four genes homologous to genes found in the IME of strain A909 (including a truncated integrase gene similar to the integrase gene of this IME) and thus likely derives from an IME. Interestingly, a copy of an ICE carrying the tetM resistance gene and belonging to the Tn916 family, very distantly related to the ICEs integrated in the tRNALys CTT gene, is integrated inside the CIME of strain FSL F2-343. In addition, CIMEs related to CIME1 of strains A909 and NEM316 carry a gene encoding an ESAT6-/EsxA-secreted virulence protein (cl02005, WXG100_ESAT6 motif).

Accretion of genetic elements at the tRNALys CTT gene locus in Streptococcus agalactiae

Many tandems of elements were detected during this analysis indicating that accretion events had occurred. One strain (FSL S3-586) displays a tandem of two ICEs. Seventeen strains carry a tandem of ICE and IME (8 ICE–IME tandems and 9 IME–ICE tandems). Four other strains also likely carry IME–ICE tandems, but were discarded since they had a gap at the left end. Three strains (BSU260, MRI Z1-205 and NGBS061) harbor a tandem CIME–ICE. The ICE–CIME structure was not detected. Fifty-one IMEs are in tandem with two CIMEs and one is in tandem with one CIME (CIME–IME tandem in strain MRI Z1-206, Fig. 3). Thirteen CIME–CIME tandems were also observed. In addition, seven other elements carry an attL of CIME 1 and an attR of CIME2 of strain NEM316 which indicates an accretion event followed by evolution of the tandem by deletion of the internal recombination site. In total, 148 examples of accretion were thus detected.

A novel family of elements: IMEs integrated in oriT of ICEs

A second integrase gene was found in the degenerated ICE integrated in the tRNALys CTT gene of strain 18RS21 (Brochet et al. 2008). The encoded tyrosine integrase does not show any significant similarity with the integrase of ICEs or IMEs integrated in the tRNALys CTT gene, suggesting another site specificity for integration. In addition, analysis of the upstream genes revealed excisionase and relaxase genes (unrelated to those found in ICEs or IMEs integrated in the tRNALys CTT gene), suggesting the presence of an IME integrated inside the ICE. Direct repeats of 8 bp (TTTCTAAT) delineate an element that is located between genes orfK and orfJ of a putative ICE integrated in the tRNALys CTT gene. The S. agalactiae ICE integrated in the tRNALys CTT gene belongs to the ICESt3 family, which is very distantly related to ICEBs1 from Bacillus subtilis and to the Tn916 family (Burrus et al. 2002b). Comparison of the orfK–orfJ region with ICEBs1 reveals that the 8-bp direct repeat corresponds to the conserved left end of the transfer origin (oriT) and includes the nic site recognized by the relaxase (Lee and Grossman 2007). Analysis of this region in ICEs belonging to the ICESt3 family indicates that this conserved region of oriT includes a CTAA sequence located in a stem–loop (Fig. 4). Analysis of this region in ICE_18RS21_tRNALys indicates that the 8-bp direct repeats (TTTCTAAT) result from the integration of an IME in the putative ICE oriT. The IME brings a CCC(A)C sequence at its right end which is complementary to the GGGGG sequence of oriT, thus enabling the formation of a stem–loop structure (ΔG = −6.6 kcal/mol instead of −10.4 kcal/mol in the native oriT) (Fig. 4). A search for this IME integrase gene in the other S. agalactiae genomes revealed that two other strains (GB00984 and GB00957) harbor a related IME.
However, in these strains, the element is integrated in Tn916 (strain GB00984) or in an ICE that carries a Tn916 conjugation module, but encodes an unrelated tyrosine integrase and is site-specifically integrated in the 3′ end of the guaA gene (strain GB00957). Closer analysis reveals a direct repeat of 4 or 5 bp ((T)CTAA; for GB00957 and GB00984, respectively) corresponding to the conserved nic site of oriT (Fig. 4). Furthermore, a second IME is likely integrated in ICE_tRNALys in these two strains. Indeed, no contig covers the intergenic region between the coupling protein and relaxase genes of ICE_tRNALys in the draft genome of these strains. In addition, sequences upstream and downstream from the sequencing gap share identity with the right and left ends of IME_oriT of 18RS21 (81 and 84 % identity, respectively). The size of these IMEs ranges from 5231 bp for IME_GB00957_oriT to 6445 bp for IME_18RS21_oriT (Fig. 5; Supplemental Table S1). IME_18RS21_oriT carries a merR/merA mercury resistance operon as found on ICEs related to ICE_tRNALys (Fig. 5). By contrast, IME_GB00984_oriT and IME_GB00957_oriT both carry the lsa(C) gene, which was shown to confer cross-resistance to lincosamides, streptogramins A and pleuromutilins in S. agalactiae (Malbruny et al. 2011), and IME_GB00984_oriT also carries a tfoX-like gene which could play a role in DNA uptake by competence (Fig. 5).
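The way short direct repeats delimit an integrated element can be illustrated with a simple string scan. The Python sketch below is illustrative only and is not the procedure used in this study: the function names, the toy sequence and the size window are assumptions, and only the TTTCTAAT repeat and the approximate IME size range (5231 to 6445 bp) are taken from the text.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 0-based start of every occurrence of motif in seq (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def candidate_elements(seq: str, motif: str, min_len: int, max_len: int) -> list[tuple[int, int]]:
    """Pair motif copies whose spacing is compatible with the expected element size.

    The spacing is measured from the start of the left copy to the start of the
    right copy, i.e. it covers one repeat copy plus the sequence in between.
    """
    hits = find_motif(seq, motif)
    pairs = []
    for i, left in enumerate(hits):
        for right in hits[i + 1:]:
            if min_len <= right - left <= max_len:
                pairs.append((left, right))
    return pairs


# Toy sequence (hypothetical): 4-bp flank, repeat, 5300-bp filler, repeat, 4-bp flank.
REPEAT = "TTTCTAAT"            # 8-bp direct repeat reported for ICE_18RS21_tRNALys
toy_seq = "GGCC" + REPEAT + "ACGT" * 1325 + REPEAT + "GGCC"

pairs = candidate_elements(toy_seq, REPEAT, min_len=5000, max_len=7000)
print(pairs)  # [(4, 5312)]
```

On real data, such a scan would only yield candidates; the presence of integrase and relaxase genes between the repeats, as described above, is what supports the identification of an IME.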

Fig. 4

Alignment of the conserved region in the origin of transfer of different ICEs. Host species other than S. agalactiae are indicated in parentheses. ICEs belonging to the same family of ICEs are indicated in the same color (red, blue and black for ICEBs1, ICESt3 and Tn916 families, respectively). The nic site of ICEBs1 is indicated by an arrow. The nucleotides which participate in the formation of the stem-and-loop structure are indicated by a red rectangle. The minimal direct repeat generated by integration of an IME_oriT is underlined. (color figure online)

Fig. 5

Comparison of the IMEs integrated in the origin of transfer of an ICE in different strains of S. agalactiae. Host origin is indicated below the name of the strain in brackets. The two complementary strands of DNA are indicated on separate lines. ORFs appear as arrows with genes of the regulation module colored in green, relaxase (rel) gene colored in blue and genes of the recombination module colored in red. Direct repeats are drawn as pinheads. Nucleic identity higher than 80 % between sequences of IMEs is indicated in gray. Genes encoding proteins with putative function inferred from in silico analysis are indicated. (color figure online)

The integrase and the relaxase of these IMEs show 94 and 91 % amino acid identity, respectively, with those of the lsa(C)-carrying element described in strain UCN70 (Malbruny et al. 2011). This element is thus likely an IME. Analysis of the upstream and downstream sequences of this element indicates that it is also integrated in the oriT of an ICE (100 % identity over 1891 bp and 99 % identity over 1680 bp with the orfK–orfJ region of ICE_NGBS572_tRNALys of S. agalactiae). Furthermore, a search in other streptococcal genomes revealed three closely related IMEs in Streptococcus mutans (strains C150, 3SN1 and 11VS1) and one in S. mitis A2. In this latter strain, the IME is integrated in Tn916. In the three strains of S. mutans, no obvious integration site could be determined.

Analysis of the functionality of putative integrative and conjugative elements integrated in the tRNALys CTT gene in S. agalactiae

Further analysis of the 88 putative ICEs detected indicated that 30 have an insertion of IS1381A in the gene orfD leading to its inactivation. This gene encodes a VirB4 homolog of the Agrobacterium conjugation pore, which is likely essential for DNA translocation through the conjugation apparatus (Goessweiner-Mohr et al. 2013). This is the case of ICE_GB00984_tRNALys, which carries an IME integrated in its transfer origin. These elements are thus likely to be nonfunctional like ICE_2603VR_tRNALys, whose gene orfD is also interrupted by IS1381A (Puymège et al. 2013). Among these elements, one harbors a second copy of IS1381A integrated in the adjacent gene orfC, a gene that encodes a VirB6 homolog of the Agrobacterium conjugation pore, which is also likely essential for DNA transfer by conjugation (Goessweiner-Mohr et al. 2013). The putative ICE of strain COH1 (ICE_COH1_tRNALys) was tested experimentally and shown to be defective in excision (Puymège et al. 2013). In total, 31 of the 88 putative ICEs identified in the first in silico analysis are thus likely to be nonfunctional.

As mentioned above, ICE_FSL S3-026_tRNALys from a bovine strain belonging to CC61 (clonal complex which includes only bovine isolates) carries four ISs. One of these ISs is inserted in a gene encoding an ABC transporter located between genes orfJ and orfH of the conjugation module (Fig. 2). This insertion could have an impact on the transcription of the conjugation genes and on ICE transfer. The functionality of this ICE was thus evaluated. An empty tRNALys CTT gene locus (attB) site and an attI site characteristic of a circular form were detected, showing the excision of ICE_FSL S3-026_tRNALys (data not shown). Conjugation assays on filters were performed using the FSL S3-026 strain harboring ICE_FSL S3-026_tRNALys tagged with an erythromycin resistance gene (Supplemental Fig. 1). Rifampicin- and streptomycin-resistant mutants of strains COH1, A909 and NEM316, which already harbor an ICE, an IME and a CIME, respectively, in the tRNALys CTT gene locus, were used as recipient strains as previously described (Puymège et al. 2013). Another strain, BSU108, which was found in the in silico analysis to have an empty tRNALys CTT gene recombination site, was also tested as recipient. ICE_FSL S3-026_tRNALys was found to transfer only to NEM316 recipient cells, with a frequency of transfer tenfold lower than that of ICE_515_tRNALys tested in parallel (4.3 ± 2.3 × 10⁻⁸ versus 4.0 ± 0.8 × 10⁻⁷, respectively). Transconjugants were confirmed to be recipient cells carrying an ICE by multiplex PCR (Fig. 6). By contrast, ICE_515_tRNALys was found to transfer to S. agalactiae BSU108, but at a slightly lower frequency than with NEM316 recipient cells (3.0 ± 1.65 × 10⁻⁷ versus 4.0 ± 0.8 × 10⁻⁷). Other species of Firmicutes (S. uberis, S. dysgalactiae subsp. dysgalactiae, S. pyogenes, E. faecalis, S. thermophilus, S. salivarius and S. mutans) were tested as recipient cells, but transfer of ICE_FSL S3-026_tRNALys was not observed in species other than S. agalactiae.
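The fold differences quoted above follow directly from the mean transfer frequencies. The short Python sketch below simply reproduces that arithmetic; the variable and function names are illustrative, and the standard deviations reported in the text are deliberately ignored.

```python
def fold_difference(reference: float, test: float) -> float:
    """Return how many times lower `test` is than `reference`."""
    return reference / test


# Mean transfer frequencies reported in the text (standard deviations omitted).
ice_515_to_nem316 = 4.0e-7   # ICE_515_tRNALys, NEM316 recipient
ice_fsl_to_nem316 = 4.3e-8   # ICE_FSL S3-026_tRNALys, NEM316 recipient
ice_515_to_bsu108 = 3.0e-7   # ICE_515_tRNALys, BSU108 recipient

# ICE_FSL S3-026_tRNALys versus ICE_515_tRNALys with the same recipient.
print(round(fold_difference(ice_515_to_nem316, ice_fsl_to_nem316), 1))  # 9.3

# ICE_515_tRNALys: NEM316 recipient versus BSU108 recipient.
print(round(fold_difference(ice_515_to_nem316, ice_515_to_bsu108), 1))  # 1.3
```

A ratio of about 9.3 is consistent with the "tenfold lower" description, and a ratio of about 1.3 with the "slightly lower" frequency obtained with BSU108; given the reported standard deviations, the latter difference should be read with caution.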

Fig. 6

Analysis of transconjugants obtained after filter mating experiments between donor strain FSL S3-026 (carrying ICE_FSL S3-026_tRNALys) and recipient strain NEM316 that carries a CIME. a Amplification of the SAL_2079 gene, which is longer in the tagged ICE (1750 bp) than in the recipient strain (750 bp) because of insertion of an erythromycin resistance cassette; b multiplex PCR that differentiates donor and recipient strains. M molecular weight marker, D donor strain, R recipient strain. Transconjugants are numbered 1–17.

Discussion

A previous analysis revealed the presence of a genomic island integrated in the 3′ end of the tRNALys CTT gene in all eight sequenced genomes of human strains of S. agalactiae examined (Brochet et al. 2008). In this work, 303 genomes of S. agalactiae isolated from 11 different hosts and belonging to 49 different STs and 7 capsular serogroups were analyzed. Among these strains, only nine were devoid of a genomic island at this locus (less than 3 %). This confirms that the tRNALys CTT gene is a hotspot of integration of mobile genetic elements in S. agalactiae. In total, 428 elements were detected: 57 putative ICEs, 69 putative IMEs and 215 putative CIMEs. As previously described (Jones et al. 2006; Bohnsack et al. 2008), a correlation between serogroup and clonal lineage was observed (Ia and CC23, Ib and CC12, V and CC1), except for strains of serogroups II and III, which belong to three major clonal complexes. Some clonal complexes appear to be rich in ICEs (CC19 of serogroup III and CC23 of serogroup Ia), IMEs (CC7 of serogroup Ia and CC12 of serogroup Ib) or CIMEs (CC1 of serogroup V). Half of the fish isolates carry an IME integrated in the tRNALys CTT gene that is similar to IME_A909_tRNALys. This is consistent with their close relationship with the A909 human strain (Liu et al. 2013). Even “exotic” strains (frog, dolphin, camel, gray seal) carry a genomic island at this locus. Most of the detected ICEs are closely related to the functional ICE already characterized in S. agalactiae strain 515. They carry a mercury resistance operon (Mathema et al. 2011), a CAMP factor gene (Chuzeville et al. 2012) and toxin–antitoxin genes (genetic addiction systems, Yamaguchi et al. 2011) which likely participate in ICE spreading and maintenance. Interestingly, two ICEs carry a novel type of conjugation module that differs mainly in OrfA, OrfC and OrfG, three major constituents of the conjugative transfer system. orfA encodes a putative peptidoglycan hydrolase harboring a CHAP domain.
Such hydrolases are known to be essential to peptidoglycan opening for conjugation pore assembly (Goessweiner-Mohr et al. 2013). Differences in its amino acid sequence could affect its binding specificity and thus host range of the ICE. PsiBlast analysis reveals that OrfC is related to Orf15 of Tn916 which belongs to the VirB6 family of proteins (Goessweiner-Mohr et al. 2013). In Agrobacterium, VirB6 is a membrane protein with transmembrane motifs that participates in the buildup of the inner-membrane transfer channel (Goessweiner-Mohr et al. 2013). OrfG belongs to the VirB8/TcpC family that is predicted to localize to the exterior face of the cytoplasmic membrane. It could extend across the cell wall together with the C-terminal domain of OrfA to generate a channel for the secretion of substrate. IMEs integrated in the tRNALys CTT gene detected during this analysis are similar to the previously described IME_A909_tRNALys (for 60 % of them) or IME_2603V/R_tRNALys (for the remaining 40 % of IMEs). They also carry genes that could confer a selective advantage to the host bacteria (intracellular protease and drug transporter genes in particular). CIMEs appear widespread in all the populations examined (except in the isolates of gray seal). Interestingly, an insertion of Tn916, an ICE belonging to a very distant family, which carries the tetM tetracycline resistance gene, was detected in one of these CIMEs. Another carries a gene encoding an ESAT6/EsxA protein. This family of proteins is secreted by a type VII secretion system and participates in the virulence of Mycobacterium tuberculosis (Stanley et al. 2003) and Staphylococcus aureus (Burts et al. 2005). Homologs of the proteins that constitute this type VII secretion system were detected in S. agalactiae (SAG1033, SAG1035, SAG1036, SAG1038 and SAG10385) (Burts et al. 2005). 
CIMEs are not mobile genetic elements per se but can be mobilized in cis by an ICE that integrates in tandem and thus genes that they carry can also disseminate (Bellanger et al. 2011; Puymège et al. 2013). This is the second description of a genetic element carrying an ESAT6/EsxA gene, since we already detected this gene on an IME integrated in the 3′ end of rpsI in strains NEM316, 18RS21, H36B and A909 (Brochet et al. 2008).

Tandems of degenerated ICEs; ICE and IME; ICE and CIME; IME and CIME; and CIMEs were detected during this analysis: in total, this represents at least 148 examples of accretion. The mechanism of accretion thus likely plays a major role in the evolution of genetic elements integrated in the tRNALys CTT gene. Only CIME–ICE tandems and not ICE–CIME tandems were detected. This suggests that the ICE has integrated into the attR site flanking the CIME. This is in accordance with the demonstration that the integration of ICE occurs preferentially at the attR recombination site (Puymège et al. 2013). In addition, in the 52 tandems of IME and CIME(s) observed, the IME is also always integrated at the right end, leading to CIME(s)–IME tandems and not IME–CIME(s) tandems. This suggests that IMEs also preferentially integrate in the attR site of a resident element, even if their integrase is unrelated to those of ICEs. In the same way, it was previously reported that, in tandems integrated in the 3′ end of the tmRNA gene of Escherichia coli and Salmonella enterica, the less degenerated element is located at the 3′ end of the target gene (attR end) (Song et al. 2011). This suggests the decay of a resident element that retains attL and attR sites, and the subsequent integration of an incoming element in the attR site. CIMEs are indeed the most prevalent elements detected in the genomes, suggesting decay of the mobile genetic elements and stabilization of adaptation genes in the genome. ISs appear to be major contributors to these deletion events. Two kinds of tandems between an ICE and an IME were observed in almost equal numbers: ICE–IME (as already observed in the composite element of strain 2603V/R) and IME–ICE. Two chronological scenarios of element acquisition could thus explain the tandems between ICE and IME observed: ICE integration followed by IME acquisition or IME integration followed by ICE acquisition.
However, a third scenario involving the acquisition of the whole composite element (ICE+IME) cannot be excluded.

Strain FSL S3-026 is the first bovine strain of S. agalactiae whose genome has been sequenced (Richards et al. 2011). It belongs to clonal complex 67, which has been identified as a putative cow-adapted subgroup of S. agalactiae (Sorensen et al. 2010). A distinctive feature of this bovine GBS genome is the high frequency of ISs. Consistent with this observation, the ICE found integrated in the tRNALys CTT gene in this genome carries four copies of ISs. One of them is located in the conjugation module and could impact the functionality of the ICE. We were thus interested in testing its ability to transfer by conjugation. We showed that ICE_FSL S3-026_tRNALys is functional and is able to transfer to the NEM316 human strain. The presence of four ISs on this ICE thus does not impair its functionality, and transfer of this ICE between bovine and human strains is possible. Like most ICEs of this family examined (Chuzeville et al. 2012), this ICE carries a CAMP factor gene and thus could participate in the dissemination of this virulence factor. Until now, transfer was only tested between human strains and using recipient strains already carrying a genetic element integrated at the tRNALys CTT gene locus. Genome-wide analysis identified several strains devoid of elements at this recombination site. One of these strains was thus used as recipient in conjugation experiments to evaluate the impact of an empty site on transfer frequency. As for ICE_515_tRNALys, no transfer was obtained using the A909 strain as recipient. As already suggested (Puymège et al. 2013), this is probably due to the CRISPR system described in this strain. Indeed, three of the spacers of this CRISPR system match sequences of ICE_515_tRNALys and thus likely protect cells from invasion by ICEs of this family. No transfer was observed either using the BSU108 strain as recipient, even though no CRISPR system was detected in this strain.
By contrast, ICE_515_tRNALys is able to transfer to this strain, albeit at a lower frequency than with the NEM316 strain as recipient. The presence of a resident element (ICE or CIME) at the tRNALys CTT gene locus thus does not appear to be a problem for further element acquisition, consistent with the observation of several tandems of elements in the genomes analyzed. Host factors, which remain to be identified, likely have an impact on the frequency of transfer of ICEs. Several membrane-related functions were recently found to affect the efficiency of transfer of ICEBs1 of Bacillus subtilis and this could apply to other ICEs (Johnson and Grossman 2014).

This comparative genome analysis shed light on a novel family of IMEs which integrate into the nic site of the oriT of very distantly related ICEs belonging to the superfamily Tn916/ICESt3 (ICE_tRNALys, ICE_guaA and Tn916). Integration of these IMEs generates a 4–8 bp direct repeat of the nic site. The IME brings a sequence at its right end which enables the formation of a stem–loop structure at the nic site. However, this structure is less stable than the native one and the impact of this nic site modification on ICE transfer remains to be examined. Two copies seem to be able to coexist in the same strain.

Two of these IMEs carry a mercury resistance operon (Mathema et al. 2011) which likely helps to disseminate the mobile genetic element. One IME carries a tfoX-like gene which in other species controls competence development (Sinha et al. 2009; Lo Scrudato and Blokesch 2013). This is surprising since S. agalactiae is not known to be competent even though it carries all the genes necessary for competence (except genes required for induction). Two of these IMEs carry the lsa(C) gene, which confers cross-resistance to lincosamides, streptogramins A and pleuromutilins. This is worrying since lincosamides constitute, with macrolides, the recommended alternative drugs for intrapartum antibiotic prophylaxis or invasive infection treatment in patients who are allergic to penicillin. Other examples of integration of IMEs carrying antibiotic resistance in ICEs have already been described (Bellanger et al. 2014). It should be emphasized that this enables these IMEs to transfer by two mechanisms: trans- or cis-mobilization. Hijacking a large superfamily of ICEs that is widespread in streptococci will likely allow these IMEs to disseminate widely. Further work is thus needed to evaluate the prevalence and mobility of these IMEs.

In conclusion, previous work carried out on eight sequenced genomes of human isolates of S. agalactiae suggested that integrative and conjugative or mobilizable elements are widespread and contribute to genome plasticity and evolution in this species. We took advantage of the large amount of genomic data now available for S. agalactiae to explore more broadly the prevalence and distribution of genomic islands in isolates from different hosts and belonging to different phylogenetic groups. Focusing on those integrated in the tRNALys CTT gene, we found that only a few strains (less than 3 %) do not carry elements integrated in this recombination site. All the others, including strains isolated from different fish species or camel, carry a genomic island (ICE, ICE derivative, IME, CIME or composite genomic island). Comparison of these elements highlights their plasticity and diversity and their widespread distribution. Consistent with this observation, an ICE hosted by a bovine strain (ST67), albeit rich in ISs, can transfer to a phylogenetically distant human isolate (ST23). A novel family of IMEs that integrate in the nic site of oriT of ICEs of the superfamily Tn916/ICESt3 was detected by the in silico analysis. This in theory enables these IMEs to transfer by two mechanisms: trans- or cis-mobilization. Further work is needed to evaluate the impact of these IMEs on the transfer of the targeted ICE and the mobility and dissemination of these IMEs, which carry an antibiotic resistance gene (lsa(C)).