Introduction

Conjugative gene transfer mediated by plasmids with broad host-ranges is generally believed to be a common and widespread mechanism for the transfer of genes across a broad phylogenetic range of bacteria (Thomas 2000a). Within the Proteobacteria, broad-host range (BHR) plasmids have been defined as those plasmids that can be introduced and stably maintained in bacterial species from at least two subgroups (e.g., between α- and β-Proteobacteria; Szpirer et al. 1999). The promiscuity of the BHR plasmids seems to be in part dependent on the flexibility of the replicon to interact with host factors of distinct hosts, but the genetic determinants that distinguish BHR from narrow-host-range plasmids are still not exactly known (Espinosa et al. 2000). Moreover, current insights into the diversity of these BHR plasmids are far from complete. Until recently, the known self-transmissible BHR plasmids belonged to the incompatibility groups IncP-1 (IncP in E. coli classification), IncN, IncW, and IncU (Mazodier and Davies 1991). Of those, IncP-1 plasmids were the most promiscuous. More recently, complete genome sequence analysis of the plasmids pSB102, pIPO2, and pTER331, and partial sequence of plasmid pMOL98 revealed a new family of BHR plasmids (Schneiker et al. 2001; Tauch et al. 2002; Gstalder et al. 2003; Mela et al. 2008). This and other recent discoveries of novel BHR plasmid groups (Heuer et al. 2009) illustrate the lack of understanding of the genetic diversity of these important mobile genetic elements. Thus, in spite of general agreement on the importance of BHR plasmids in the adaptive evolution of bacteria, surprisingly little is known about the genetic diversity of these plasmids and the traits they encode.

Plasmids typically include two regions (Thomas 2000b): the “plasmid backbone” encodes genes involved in the replication, maintenance, and transfer of the plasmid, while one or more other plasmid sections are comprised of various “accessory” genes that confer specific phenotypic characteristics to the host, such as resistances to antibiotics, heavy metals and UV light, catabolic functions, or production of toxins (Top et al. 2000). The mobility of these accessory genes is evidenced by their widespread association with integrons, transposons, or insertion sequences. However, a large number of traits encoded by plasmids remain cryptic as several open reading frames do not code for any known phenotypes. An extreme example is the recently sequenced plasmid pQBR103, in which no function could be ascribed to 80% of the predicted coding sequences (Tett et al. 2007). These findings underscore our poor understanding of the diversity of functions encoded on these important genetic elements of the horizontal gene pool (Frost et al. 2005).

The goal of this study was to increase our insight into the diversity of BHR plasmids by analyzing the complete sequence of plasmid pMOL98, and comparing it with other members of its group, the “pIPO2-like” plasmid family. Plasmid pMOL98 is a derivative of plasmid pES1, which was isolated from a soil that was heavily polluted by hydrocarbons. Plasmid pES1 was captured from the soil community without retrieving its host, by means of a triparental exogenous plasmid isolation method using a plasmid-cured derivative of Cupriavidus metallidurans strain CH34 as recipient (Top et al. 1994). So far, no known phenotypes have been found to be encoded by this plasmid. Therefore, it was marked with a mini-Tn5 transposon in order to test its transferability and host range (Szpirer et al. 1999; Gstalder et al. 2003), and the resulting plasmid was named pMOL98. Since the plasmid could transfer to and replicate in members of the α-, β-, and γ-Proteobacteria (Szpirer et al. 1999; Gstalder unpublished), it was defined as a BHR plasmid. Gstalder et al. (2003) showed that the oriV region of pMOL98 was very similar to those of plasmids pIPO2 and pSB102, and that their RepA proteins showed some degree of identity with the Rep proteins of IncW plasmids. The complete genome sequence of pMOL98 described in this study shows that the similarity among these three plasmids, the more recently described fourth plasmid family member pTER331, and a newly sequenced plasmid, pMRAD02, extends to the entire backbone regions of the plasmids. Moreover, examination of the accessory regions detected a transposon that may have been acquired from the recipient host during or after the plasmid capture procedure.

Materials and methods

Plasmids

pMOL98 and pIPO2T are derivatives of the wild-type plasmids pES1 and pIPO2, respectively (van Elsas et al. 1998; Tauch et al. 2002; Gstalder et al. 2003). In order to monitor their transfer, they were marked with an antibiotic resistance gene using mini-transposons prior to sequencing. While pIPO2T contains mini-Tn5 luxAB, encoding luciferase genes and tetracycline resistance, pMOL98 contains mini-Tn5 Km, encoding kanamycin resistance (de Lorenzo et al. 1990). In contrast, plasmids pSB102, pTER331, and pMRAD02 represent wild-type plasmids (Schneiker et al. 2001; Mela et al. 2008). pIPO2 [GenBank accession: NC_003213] was isolated from wheat rhizosphere in Wageningen, The Netherlands, pSB102 [GenBank accession: NC_003122] from alfalfa rhizosphere in Braunschweig, Germany, pTER331 [GenBank accession: NC_010332] from a dune soil on Wadden Island Terschelling, The Netherlands, and pMOL98 from hydrocarbon-polluted soil in Essen, Germany. pMRAD02 [GenBank accession: NC_010509] was sequenced along with the genome of its host strain, Methylobacterium radiotolerans JCM2831, which was isolated from unpolished rice in Japan (Ito and Iizuka 1971).

Sequencing and annotation

The genome sequence of plasmid pMOL98 was determined by shotgun sequencing at the DOE Joint Genome Institute (Walnut Creek, CA). One region of poor sequence quality (mini-Tn5 Km) was resequenced at the University of Idaho by PCR amplification followed by sequencing, using Big Dye Terminator v3.1 Cycle Sequencing Kit and a 3730 DNA Analyzer (Applied Biosystems). Machine annotation of these sequences was performed by the J. Craig Venter Institute Annotation Service and further annotated manually. The complete sequence of pMOL98 has been deposited in the GenBank database [GenBank accession: FJ666348].

Bioinformatic analyses and software

GenBank (http://ncbi.nlm.nih.gov) was searched for similar sequences using BLAST (Altschul et al. 1997). Coding sequences and proteins were aligned using ClustalW (Thompson et al. 1994) and manual realignments were unnecessary. Phylogenetic trees were inferred using the neighbor joining algorithm (Saitou and Nei 1987) on protein distances with Poisson correction (MacVector, Accelrys). Plasmid map and alignment figures included in this work were generated using the BioPython, GenomeDiagram, and ReportLab open software packages. Plasmid alignments were performed using Mauve (Darling et al. 2004) and rendered graphically using PlasmiG: a Python-based sequence map and alignment drawing tool for plasmids (Van der Auwera, in preparation), with identity scoring by the BLAST algorithm bl2seq (Tatusova and Madden 1999).
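As an illustration of the distance measure named above, the following minimal Python sketch computes Poisson-corrected protein distances, d = -ln(1 - p), from already aligned sequences. It is not the MacVector implementation used in this study; the function names and the choice to skip gapped columns are assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over aligned, gap-free columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: columns containing a gap are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


def distance_matrix(aligned):
    """Pairwise distances for a dict of {name: aligned sequence}."""
    names = sorted(aligned)
    return {(x, y): poisson_distance(aligned[x], aligned[y])
            for x in names for y in names}
```

A matrix of this kind is the input that a neighbor joining implementation would consume; for concatenated proteins, the individual alignments would first be joined end to end per plasmid.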

Results

Basic genome sequence information for plasmid pMOL98

The complete nucleotide sequence of pMOL98, including the 1,845 bp mini-Tn5 tag, was determined to be 55,563 bp long with a G + C content of 59.9% (Fig. 1). Analysis of the coding content and organization revealed few intergenic regions and a total of 67 coding sequences (CDS) (Fig. 1; Supplemental Table 1). The sequence has a 91% coding ratio with an average CDS length of 724 bp. Similarity searches showed that most of the 67 predicted CDSs encoded proteins with similarity to proteins from plasmids of other organisms. While biological functions could be attributed to 44 CDSs, 19 CDSs code for conserved hypothetical proteins, and the remaining four predicted genes do not have any known homologues. The details of all 67 CDSs are available in Supplemental Table 1.
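Summary statistics of this kind can be derived from a sequence and its CDS coordinates. The sketch below is a hedged illustration only: the function names are hypothetical, CDS coordinates are assumed to be 1-based and inclusive, and features spanning the origin of a circular plasmid are not handled.

```python
def gc_content(seq):
    """G + C percentage over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_stats(seq_length, cds_intervals):
    """Return (coding ratio in %, mean CDS length in bp).

    cds_intervals is a list of (start, end) pairs, 1-based inclusive.
    Overlapping CDSs are counted once toward the coding ratio.
    """
    covered = [False] * seq_length
    total_len = 0
    for start, end in cds_intervals:
        total_len += end - start + 1
        for pos in range(start - 1, end):
            covered[pos] = True
    coding_ratio = 100.0 * sum(covered) / seq_length
    mean_len = total_len / len(cds_intervals)
    return coding_ratio, mean_len
```

Whether overlapping CDSs are counted once or twice changes the coding ratio, so the convention should be stated when such figures are compared between plasmids.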

Fig. 1

Circular map of pMOL98. Coding sequences (CDS) are represented by block arrows on the outer circle. Predicted functions/homologies are indicated by the color key featured below. The first circle from the center delineates the major functional modules identified on the plasmid; Tra indicates the transfer region, R/P indicates the replication and partitioning region, Acc indicates the main accessory genes region and T indicates the selection marker tag. The second circle from the center is a circular bar graph of the G + C composition (100 base window) of the plasmid sequence, with the median value (58.5%) as baseline; values above the line are G + C rich (max value 78% G + C) and values below the line are A + T rich (min value 39% G + C). Note that the sequence corresponding to the mini-Tn5 Km tag was omitted from these calculations to remove bias, since it is A + T rich. The third circle from the center is a graduated size scale with small tick marks every 1 kb and large tick marks every 10 kb

pMOL98 shares a common backbone with pIPO2, pTER331, pSB102, and pMRAD02

The similarity of the replicon of pMOL98 to that of pIPO2 and the other pIPO2-like plasmids, pTER331 and pSB102, has been noted previously (Gstalder et al. 2003; Mela et al. 2008). Similarity searches of GenBank indicate that the sequence similarity and synteny observed for the replicon can be extended to the full length of plasmid pMOL98. Detailed alignments of these sequences, together with that of a newly sequenced plasmid, pMRAD02 from the soil proteobacterium Methylobacterium radiotolerans JCM2831, show that all five plasmids share a common backbone comprising replication, maintenance/control, and conjugative transfer genes organized in clearly identifiable functional modules (Fig. 2). The fundamental differences between the plasmids reside mainly in discrete regions encoding accessory genes. The genetic content and organization of the shared backbone have been described extensively, most recently for pTER331 (Mela et al. 2008). The high level of conservation of genetic content and synteny observed between the corresponding segments of the five plasmids makes a detailed description of the functional annotation of pMOL98 redundant. Therefore, we only point out interesting differences and findings below.

Fig. 2

Linear alignment of pMRAD02, pSB102, pMOL98, pIPO2T, and pTER331. CDSs are represented by block arrows. Key genes and functions are annotated above or below the corresponding regions. HIM stands for hemagglutinin, SPR for serine protease, MER for mercury resistance, MMF for multiple metal response phenotype and mTn for mini-Tn5. See Supplemental Table 1 for full gene annotation details. Predicted functions/homologies are indicated by the color key featured below the figure. Well-conserved (>30% amino acid identity of translated DNA sequences) segments of the plasmids are paired by shaded regions, with local identity percentage ranges indicated by the color key scale featured below the figure. Scale is indicated by the bar in the lower left-hand corner

As shown in Fig. 2, the levels of similarity between the five plasmids are not uniform over the entire length of the common backbone. Regardless of the plasmid pairs considered, the highest identity scores from pairwise alignments of translated DNA were observed for the predicted Tra proteins, TraA to TraM, encoding homologues of the VirB system as defined in the canonical Type IV Secretion System (T4SS) from Agrobacterium tumefaciens (Christie and Cascales 2005). In contrast, the Tra region encoding, among other things, the VirD2 and VirD4 homologues of the T4SS, TraN to TraS, displays much lower amino acid identity scores, as do the segments encoding the KorA-IncC-KorB and KrfA elements of the maintenance and central control region. One segment in particular, comprising the putative KrfA locus on pMOL98, pIPO2T, and pTER331, is almost entirely unrecognizable on pSB102. There, it has been annotated as “Region A” with no assigned putative functions. Furthermore, the corresponding region on pMRAD02 is shorter and also does not contain any predicted functions. A second segment that could not be associated with predicted functions, which has been annotated as “Region B” on pSB102, is poorly conserved across the plasmid group, such that the respective annotations of the plasmids vary widely in terms of predicted CDS in this region. Whether these two segments could truly be considered as part of the “essential” plasmid backbone remains undetermined in the absence of functional information, as they could conceivably correspond to relics of shared ancestral accessory gene modules.

To evaluate the evolutionary relatedness of these five plasmids, a phylogenetic analysis was performed on six concatenated backbone proteins shared by all five plasmids. These proteins were chosen because of their large size and high level of synteny (RepA, TraB, TraE, TraN, TraO, and KorB). Results suggest that pMOL98 and pIPO2T are most similar to each other and then to pTER331, while pSB102 and pMRAD02 are both more divergent (Fig. 3a).

Fig. 3

Phylogenetic trees of (a) six shared backbone proteins and (b) RepA proteins. Phylogenetic relationships were inferred using the neighbor joining algorithm on protein distances with Poisson correction for (a) six concatenated backbone proteins (RepA, TraB, TraE, TraN, TraO, and KorB) and (b) the RepA protein only, from plasmids pMRAD02, pSB102, pMOL98, pIPO2T, and pTER331. Comparison of RepA also included the IncW plasmids R388 [GenBank accession: BR000038], pPAES01 [GenBank accession: CP001109] and pPRO2 [GenBank accession: NC_008608]. Phylogenetic distance (amino acid difference percentage) is indicated by the length of the tree branches and scale bars

An additional sequence comparison with other plasmids confirmed that the shared plasmid backbone displays a mosaic structure, as noted previously (Schneiker et al. 2001). Indeed, the different functional backbone modules (involved in replication, maintenance/control and transfer) show phylogenetic relationships to those of plasmids from different known incompatibility groups (data not shown). A broad similarity search showed that no other plasmids known to date share all three backbone regions of these five plasmids. The plasmids pXF51 [GenBank accession: NC_002490] and pFBAOT6 [GenBank accession: NC_006143] were previously suggested to belong to this plasmid group because they share much sequence similarity with the transfer regions of pIPO2, pSB102, and pTER331 (Marques et al. 2001; Schneiker et al. 2001; Rhodes et al. 2004; Mela et al. 2008). However, a detailed analysis showed that there is less than 30% amino acid sequence identity between the predicted RepA protein of either pXF51 or pFBAOT6 and that of any of the five plasmids, suggesting that their replication modules are not directly related. Figure 3b shows that based on a phylogenetic analysis of the RepA protein alone, the five plasmids form a cluster related to but distinct from IncW plasmids such as R388 [GenBank accession: BR000038]. Therefore we propose to name this new group of BHR plasmids ‘PromA’, for ‘promiscuous A’. We also suggest that other BHR plasmids like IncN, IncP, IncU, and IncW could then be designated PromN, PromP, PromU, and PromW in this new classification system.

Most unique segments of the PromA plasmids are transposons inserted near or at the parA resolvase locus

The unique segments of each plasmid were analyzed in terms of location, genetic content and natural occurrence. Smaller indels (<2 kb with no predicted functions) were disregarded and the following were identified as major unique regions (Table 1): a 6 kb segment on pMRAD02, a 10 kb segment on pSB102, a 10 kb segment and a 1.8 kb segment on pMOL98, a 5 kb segment on pIPO2T and none on pTER331. These regions may correspond to accessory genes that are not essential for the basic replication, maintenance and transfer functions of the plasmid, but may be important for plasmid success because of the selective advantages they confer to the host. All but one of these were found immediately upstream or downstream of the predicted CDS for the backbone gene parA, which is thought to encode a resolvase involved in plasmid maintenance. Sequence comparisons to parA on pTER331, which lacks any major unique segment, indicated that the corresponding CDS on the other plasmids are truncated forms interrupted by the presence of the unique segments. In the case of pIPO2T, the insertion seems to have occurred centrally within the CDS, leaving no likely ORF to be annotated for the parA gene. The 6 kb segment unique to pMRAD02 constitutes a notable exception as it does not display recognizable transposon features and is not located near the parA locus but immediately downstream of the repA gene. Interestingly this corresponds to the location on the PromA backbone of a predicted nuclease gene, parB. Thus the parA locus could be considered a hot-spot for accessory elements in plasmids of the PromA group.

Table 1 Unique segments per plasmid

On pMOL98, a unique 1.8 kb segment located closely downstream of parA contains the mini-Tn5 Km transposon that was used to mark the wild-type plasmid. Similarly, on pIPO2T, the 5 kb segment was recognized as mini-Tn5 luxAB, used to mark pIPO2 (Table 1). The two remaining unique, larger segments, located immediately upstream of the parA locus on pSB102 and pMOL98 were both identified as naturally occurring transposons of the Tn5053-like family. This is a family of transposons characterized by a tniABQR transposition module and an accessory module encoding mercury resistance (mer) genes (Kholodii et al. 1993), but related/variant forms such as Tn402 and Tn5090 have been shown to contain integrons instead of the mer genes (Radstrom et al. 1994). Transposons in this family are typically delineated by 25 bp terminal inverted repeats (IR), generate 5 bp direct repeats upon insertion and show a strong tendency to insert into plasmid resolvases (Minakhina et al. 1999). The 10,413 bp transposon on pSB102 had been identified previously and annotated as Tn5178 (Schneiker et al. 2001), and was found to possess the classic features of the family: nearly perfect 25 bp IRs, perfect 5 bp direct repeats and the mer genes (Table 2). The 10,402 bp transposon on pMOL98 displays transposition features that are very similar to those of Tn5178: the tniABQR transposition-related genes (87% amino acid identity of predicted products), nearly identical 25 bp IRs, and perfect 5 bp direct repeats (Table 2). The point of insertion of the transposon with respect to the parA locus is also almost exactly the same. However, the accessory genes of this transposon do not correspond to known mercury resistance genes, nor do they display integron-like features.
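The structural hallmarks described above (terminal inverted repeats and short target-site direct repeats) can be checked programmatically. The following Python sketch is an illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, coordinates are 0-based with an exclusive end, and the host sequence is treated as linear.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat_matches(element, ir_len=25):
    """Count positions where the left terminus matches the reverse
    complement of the right terminus (ir_len means a perfect IR)."""
    left = element[:ir_len].upper()
    right_rc = reverse_complement(element[-ir_len:]).upper()
    return sum(1 for a, b in zip(left, right_rc) if a == b)


def has_direct_repeat(sequence, start, end, dr_len=5):
    """True if the dr_len bases flanking the element on both sides are
    identical, as expected for a target-site duplication.

    start and end are 0-based, end-exclusive coordinates of the element.
    """
    if start < dr_len or end + dr_len > len(sequence):
        return False  # flanks would run off a linear sequence
    upstream = sequence[start - dr_len:start]
    downstream = sequence[end:end + dr_len]
    return upstream.upper() == downstream.upper()
```

For example, a score of 24 out of 25 from `inverted_repeat_matches` would correspond to a "nearly perfect" 25 bp inverted repeat in the sense used above.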

Table 2 Features of Tn5178/Tn6048 copies or variants

Based on the results of a nucleotide similarity search against bacterial genomes in GenBank, the 10 kb transposon identified on pMOL98 was found to be 100% identical to a transposon present in three copies in the genome of C. metallidurans strain CH34. This transposon was first designated TnCme1 and later renamed Tn6048 (Table 2) (Van Houdt et al. 2009). The accessory genes of Tn6048 have been provisionally annotated as mmf for multiple metal response phenotype on the basis of experimental observations (M. Mergeay et al. unpublished data). A transposon similar but not identical to Tn6048 was found on the plasmid pBVIE02 from Burkholderia vietnamiensis strain G4. The accessory genes within this transposon on pBVIE02 display 97% nucleotide identity to the copies found in C. metallidurans CH34 and on pMOL98, and carry a 1,271 bp IS600 insertion sequence interrupting one of the accessory genes. Finally, a similarity search focused on the accessory genes alone of Tn6048 yielded stretches of Burkholderia cenocepacia, Dechloromonas aromatica and Ralstonia pickettii genomes that contain nearly complete sets of homologous genes, with nucleotide identities ranging between 70 and 80%, but are not associated with any transposition-related elements. Thus Tn6048-like transposons seem to be widespread on plasmids and chromosomes of β-Proteobacteria.

Discussion

Based on an earlier characterization of the replicon of pMOL98 (Gstalder et al. 2003) and its inclusion in a subsequent comparative analysis (Mela et al. 2008), pMOL98 was proposed to belong to the same family of BHR plasmids as the fully sequenced plasmids pIPO2T (Tauch et al. 2002), pTER331 (Mela et al. 2008) and pSB102 (Schneiker et al. 2001). The full sequence analysis of pMOL98 has now made it possible to establish the extent of this similarity over its entirety. The 47 kb plasmid pMRAD02 from Methylobacterium radiotolerans is a fifth member of this novel BHR plasmid group. These five plasmids were isolated from either rhizosphere or soil in geographically distinct locations in The Netherlands, Germany, and Japan. In spite of these different origins, the five plasmids share a highly conserved backbone sequence comprising replication, maintenance and control, and transfer functions. Since the entire backbones are not directly related to any single characterized incompatibility group, and the replicons are similar to but distinct from that of IncW plasmids, the plasmids should be classified as a new family, which we propose to name “PromA”.

The variable segments of the PromA plasmids (which consisted mainly of transposons) were found clustered in a specific locus of the shared backbone that corresponds to a predicted resolvase gene. This observation of a possible “hot-spot” for insertion of transposons in this location is in agreement with previous reports (Minakhina et al. 1999; Sota et al. 2007). Factors commonly invoked to explain high frequency insertion of transposons into sites with certain sequence features include the presence of repetitive palindromic elements as well as A/T content and topology of the target DNA (Ason and Reznikoff 2004; Liu et al. 2005; Tobes and Pareja 2006). It has been suggested that transcription levels of genes in the target locus may negatively influence transposon insertion rates (Manna et al. 2004). What remains unknown is whether insertion into plasmid sites might in some cases occur preferentially compared to insertion into the chromosome. It is tempting to speculate that conformational effects of plasmid supercoiling could produce or enhance sites that are especially “attractive” to transposases. Evolutionary pressures on the plasmid and/or on the transposon can explain preferential insertion of transposons into plasmids. For the transposon, insertion in a plasmid could be a strategy maximizing its overall copy number. Indeed, transposition within a single genome is thought to be self-limited to avoid a “poisoning effect”, i.e., damage due to insertions in essential genes (Reznikoff 2003). Mobile plasmids offer transposons a route to “colonize” other genomes. For the plasmid itself, transposons have the potential of promoting horizontal and vertical spread because they often carry accessory genes encoding host-beneficial functions such as catabolic properties and resistance to drugs or to heavy metals. 
Nevertheless, insertion sites may be limited because plasmid gene density is typically high and a multiplicity of insertions could lead to plasmids that are less stable, less transferable by conjugation, or confer a higher metabolic cost to their host (Sota et al. 2007). Whatever the mechanism, the tendency of PromA plasmids to “capture” transposons coupled with the ability to transfer to a wide variety of hosts and to retromobilize non-conjugative vectors (Szpirer et al. 1999) makes them effective shuttle vectors for potential host-beneficial genes. Just like the plasmids of the IncP group (Schluter et al. 2007), they may constitute significant agents of bacterial evolution.

It is interesting to examine the nature of plasmid-borne accessory genes in light of the method by which the plasmid was isolated or identified. For instance, since pSB102 was isolated by selecting for a mercury resistance phenotype, it is not surprising that its accessory genes encode mercury resistance. In contrast, pMRAD02 was not specifically targeted but sequenced along with the complete genome of its host strain in the context of a study on methylotrophic organisms (C. Marx personal communication). The accessory genes identified on pMRAD02 are predicted to encode an adhesin/hemagglutinin and its cognate serine protease. These gene products may be involved in the plant-associated lifestyle of Methylobacterium. In other organisms hemagglutinin/protease pairs have been implicated in pathogenesis as invasion factors, and some of these were found on plasmids like pO157 of enterohemorrhagic Escherichia coli (Burland et al. 1998). Whatever the true functions of these accessory proteins, they are not of the kind that is easily selected for in a plasmid capture assay. Thus this plasmid would not be known today as a member of the PromA family if its host had not been chosen for genome sequencing.

Plasmid pIPO2T, which was selected for its ability to mobilize a non-conjugative vector, does not possess genetic regions that can be identified as potentially host-beneficial accessory genes at this point. The same is true for plasmid pTER331, which was found in an isolate of Collimonas fungivorans and studied for its contribution to the rhizosphere competence of its host (Mela et al. 2008).

Plasmid pMOL98 was isolated based on its ability to mobilize another plasmid in a triparental mating. Like pIPO2T, it did not encode known accessory functions. The large transposable element on pMOL98, Tn6048, is probably an artefact of the isolation procedure, as pMOL98 was captured using a plasmid-cured derivative of C. metallidurans strain CH34, which possesses two copies of Tn6048 on its chromosome. The transposon may thus have transposed from the recipient chromosome onto the plasmid during or after the triparental mating procedure in the laboratory. This post-capture transposition hypothesis is supported by the subsequent discovery of another identical copy of the transposon in a recently sequenced IncP-1 plasmid, pAKD16, which was also isolated by triparental mating using the same CH34 derivative as recipient (Drønen et al. 1998; Sen, Rogers, Suzuki, Brown, Top, unpublished data). One cannot completely exclude that a copy of Tn6048 was already present on pMOL98 before it was captured, given that a similar (but not identical) Tn6048-like element has been found in B. vietnamiensis and its accessory genes in B. cenocepacia, Dechloromonas and Ralstonia pickettii. However, given the 100% identity with the recipient chromosome and the independent finding of an identical copy on a different plasmid captured with the same host, pMOL98 probably captured Tn6048 in the laboratory, and not in the field.

The wild-type plasmids pMOL98 and pIPO2 (before they captured natural and mini-transposons) may very well be cryptic only because the accessory functions they encode, or the more subtle regulatory effects they may have on the host, are as yet unrecognized. Nevertheless, the existence in nearly all genomes, microbial or otherwise, of small mobile elements that do not encode recognizable beneficial coding sequences seems to support the hypothesis that simply their ability to replicate and spread is sufficient to sustain themselves as so-called “selfish elements” (Koonin and Wolf 2008). There are alternative explanations that invoke the contribution of such elements to genomic plasticity to justify their persistence within genomes, as mobile genetic elements with different structures are likely to be subject to very different balances of evolutionary pressures. As regards BHR plasmids, more data are needed, partly because until now the appreciation of plasmid genome content and diversity has been skewed by the common practice of sequencing elements that were isolated in relation to a specific phenotype (e.g., resistance, degradation or virulence). Hopefully future studies of plasmids isolated without known selective pressure will yield data that can be used to examine this question.

Metagenomics projects may help increase knowledge of natural plasmid diversity to a certain extent by generating large amounts of environmentally relevant data. However, it is not clear at present if “meta” content corresponding to plasmids can always be distinguished from that encoded on chromosomes. Until this practical issue can be resolved, whole genome sequencing of individual strains may remain the best option for characterizing plasmids in their native context. Retrieving plasmids by means of triparental mating procedures that select for mobilization potential only is still a reasonable alternative. However, the observation that two unrelated BHR plasmids isolated using this method may both have acquired a transposon from the chromosome of the recipient strain raises the concern that the genomic content of such plasmids might be incorrectly interpreted. In the case of plasmids isolated by triparental matings, adequate controls should be used to detect such cases of colonization by the recipient’s mobile genetic elements. Therefore, the use of fully sequenced strains for such plasmid captures is strongly recommended.

Finally, although one cannot know for sure the original hosts of plasmids pMOL98, pIPO2T, and pSB102, which were isolated using plasmid capture procedures, a genomic signature comparison method can propose putative long-term plasmid hosts. Indeed, extensive data support the proposal that foreign DNA acquires the host’s nucleotide composition during long-term residence (Lawrence and Ochman 1997; Ochman et al. 2000). Thus, if the plasmids had resided in a set of closely related hosts over evolutionary time, their nucleotide composition should be similar to that of these strains. The genomic signature, which is the set of relative abundance values of short oligonucleotides (di-, tri-, and tetra-nucleotides), has been used to compare compositional similarity between bacterial chromosomes and mobile genetic elements such as plasmids and phages (Campbell et al. 1999; Pride et al. 2006; van Passel et al. 2006). Suzuki et al. (2008) applied the Mahalanobis distance method to this approach to predict the putative long-term host(s) of a plasmid with increased confidence. The Mahalanobis distance for each plasmid-chromosome pair was then converted to a P-value. P-values close to 1 represent high plasmid-chromosome signature similarities, suggesting coevolution of the plasmid with that host or closely related hosts. Interestingly, when this method was applied to pMOL98, pIPO2T, and pSB102, using the trinucleotide relative abundance, the most likely long-term hosts for all three plasmids were the β-Proteobacteria Polaromonas naphthalenivorans and Dechloromonas aromatica (P = 0.95–0.98; results not shown). These organisms are known for their ability to degrade a variety of pollutants including aromatic hydrocarbons (Coates et al. 2001; Jeon et al. 2004; Chakraborty et al. 2005; Mattes et al. 2008). Since pMOL98 was isolated from soil that was heavily polluted by polynuclear aromatic hydrocarbons, the host prediction for this plasmid seems plausible.
However, after removing the non-native transposons Tn6048 and mini-Tn5 Km from the pMOL98 sequence, the signature was still most similar to that of these two strains but the degree of similarity was much lower, with P-values (P = 0.74 and 0.65, respectively) similar to those for plasmid pTER331 (0.79 and 0.72). This may suggest that these plasmids have evolved in bacteria whose genomes have not yet been sequenced and that are not very similar to the currently sequenced strains. Alternatively, being promiscuous plasmids, they may have moved around so frequently among diverse bacteria that they did not coevolve with any particular host. While this preliminary result requires further study, the approach may help us better understand the ecology and evolution of the environmentally relevant PromA family of BHR plasmids.
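To make the notion of a genomic signature concrete, the sketch below computes a simplified dinucleotide relative abundance profile (observed dinucleotide frequency divided by the product of the component base frequencies) and a plain mean absolute difference between two profiles. This is a deliberately reduced stand-in, not the method of Suzuki et al. (2008): it uses dinucleotides rather than trinucleotides, a single strand rather than strand-symmetrized counts, and no Mahalanobis distance or P-value conversion. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq):
    """Relative abundance rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in BASES and seq[i + 1] in BASES)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    signature = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        # A value of 0.0 is reported when the expectation is undefined.
        signature[x + y] = observed / expected if expected else 0.0
    return signature


def signature_difference(sig_a, sig_b):
    """Mean absolute difference between two signatures (smaller = more similar)."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)
```

In such a scheme, a plasmid signature would be compared against the signature of each candidate host chromosome, and hosts ranked by increasing difference; removing non-native segments from the plasmid sequence before computing its signature, as done above for pMOL98, changes the profile being compared.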