Introduction

DNA replication can introduce wrong bases into the newly synthesized strand at a rate of one wrong base per 10³–10⁶ bases (Kunkel 2004; Kunkel and Bebenek 2000; Kunkel and Loeb 1980; Umar and Kunkel 1996). A wrongly incorporated base (mismatch or unpaired base) is referred to as a primary mutation because it depends on the enzymatic properties of the DNA polymerase (Harr et al. 2002). To correct wrongly incorporated bases, organisms have evolved the mismatch repair system (MMRS). The MMRS reduces the number of mismatch errors 100- to 1000-fold, thereby maintaining genomic stability (Cox 1995). The MMRS in Escherichia coli consists of three enzymes: MutS, MutL, and MutH. Homologs of MutS and MutL are present in almost every prokaryote and eukaryote (Eisen 1998; Lin et al. 2007), whereas the universal presence of MutH has been doubted (Claverys and Lacks 1986). The structure and function of MutS have been studied in Escherichia coli and Bacillus subtilis (Sixma 2001; Smith et al. 2001; Yang 2000). It is now known that the functional form of MutS is a homodimer (albeit structurally a heterodimer) that recognizes mismatched nucleotides and INDELs (insertions and deletions) and forms a complex with the DNA (Hsieh 2001; Iyer et al. 2006; Lamers et al. 2000; Li 2008; Obmolova et al. 2000). The MutS–DNA complex then interacts with the MutL homodimer in an energy-dependent manner (Acharya et al. 2003; Selmane et al. 2003). The interaction of these two complexes activates MutH, an endonuclease that cleaves the newly synthesized strand by recognizing the methylation pattern of “A” in the GATC motif on the template strand; the mismatched bases are finally removed by excision of the incorrect nucleotides from the nascent (primer) strand (Lee et al. 2005). The MMRS repairs not only mismatches but also some short INDELs in simple sequence repeats (SSRs) (Jaworski et al. 1995; Modrich and Lahue 1996; Parker and Marinus 1992; Schofield and Hsieh 2003). Mutations in the MMRS have generally been shown to destabilize SSRs in eukaryotes (Acharya et al. 1996; Strand et al. 1993) as well as in prokaryotes (Jaworski et al. 1995; Vogler et al. 2006). For example, it has been reported that in Neisseria meningitidis a MutS mutation destabilizes homopolymeric tracts (Martin et al. 2004). Similarly, a MutS mutation has been shown to destabilize dinucleotide (5′ AT) repeats in the Haemophilus influenzae genome (Bayliss et al. 2002). However, in the same genome, tetranucleotide (5′ AGTC) repeat tracts were destabilized only by inactivation of polI, not by inactivation of MutS (Bayliss et al. 2002).

SSRs, also known as microsatellites, are nucleotide sequences made up of tandem repeats of 1–6 bp motifs (Schlotterer 2000). These sequences are highly polymorphic and are characterized by high rates of insertion and deletion (INDEL) mutations of their repeat units (Garcia-Diaz and Kunkel 2006; Kunkel 2004; Sreenu et al. 2006; van Belkum et al. 1998). INDELs of repeat units in an SSR arise as a consequence of slipped strand mispairing during DNA replication (Levinson and Gutman 1987; Schlotterer and Tautz 1992; Streisinger et al. 1966). Slippage on the template strand leads to contraction (deletion of repeat units) of SSRs, whereas slippage on the nascent strand results in expansion (insertion of repeat units) of SSRs (Garcia-Diaz and Kunkel 2006; Harr et al. 2002; Levinson and Gutman 1987; Mirkin 2005; Streisinger et al. 1966). The bias in SSRs toward either expansion or contraction is referred to as the directionality of SSR evolution. Ever since it became known that several hereditary diseases are associated with expansion of triplet repeats (Ashley and Warren 1995) and that colon cancer is associated with instability of certain mono- and di-nucleotide repeats (Aaltonen et al. 1993; Ionov et al. 1993; Marra and Boland 1995; Thibodeau et al. 1993), there has been considerable interest in discovering the mechanisms behind the directionality of SSR mutations. Despite some investigations into the directional evolution of SSRs (Amos and Rubinsztein 1996; Amos et al. 1996; Ellegren 2002; Ellegren et al. 1995; Harr et al. 2002; Mirkin 2005; Primmer et al. 1996; Rubinsztein et al. 1995a, b; Webster et al. 2002), the factors influencing the directionality of SSR mutations are still insufficiently understood. There have been conflicting reports with regard to the directionality of SSRs in prokaryotes as well as in eukaryotes (Henderson and Petes 1992; Metzgar et al. 2002). 
In most cases dinucleotide repeats were the subject of study, and mutations were reported to be biased toward expansion in human as well as in swallow (Ellegren et al. 1997). Twerdi et al. (1999) reported expansion bias in SSRs in MMRS-deficient as well as MMRS-proficient mammalian cell lines. Similar results (expansion bias) were reported by Yamada et al. (2002) in human MMRS-deficient and MMRS-proficient cell lines. Webster et al. (2002) reported a bias of certain dinucleotide SSRs toward expansion by comparing the human and chimpanzee genomes. Xu et al. (2000) reported that long alleles of tetranucleotide SSR tracts show a bias toward contraction. Huang et al. (2002) reported that the long alleles of a dinucleotide repeat show a bias toward contraction whereas short alleles show a bias toward expansion. Mononucleotide repeats were also shown to be biased toward expansion in MMRS-proficient as well as MMRS-deficient mammalian cell lines. It has also been argued that the MMRS does not influence the direction of mutation (Boyer et al. 2002). However, an in-depth study by Harr et al. (2002) in Drosophila showed a role for the MMRS in the directionality of SSR mutation. In the wild-type lines SSR mutations were significantly biased toward contraction, whereas in the spellchecker mutation accumulation lines, which are deficient for the MMRS, SSR mutations were slightly biased toward expansion (Ellegren 2002; Harr et al. 2002).

While most studies have focused on eukaryotic systems, only very few have focused on prokaryotes. Studies on the Haemophilus influenzae and Escherichia coli genomes (De Bolle et al. 2000; Morel et al. 1998) suggested that bacterial SSRs do show directionality during evolution. The SSR tracts studied were (AGTC)₁₇–₃₈ and (AC)₅₁ in Haemophilus influenzae and Escherichia coli, respectively. It is to be noted that Escherichia coli has a functional MMRS (Levy and Cebula 2001) and that its homolog is present in Haemophilus influenzae as well. In Mycoplasma gallisepticum, where the MMRS appears to be absent (Carvalho et al. 2005; Himmelreich et al. 1996), it was observed that a trinucleotide SSR tract, (GAA)₁₂, is biased toward contraction (Metzgar et al. 2002). Furthermore, a study in Escherichia coli on the tetranucleotide tract (AAGG)₉ revealed that the tract is expansion biased (Eckert and Yan 2000). It should be noted that the SSRs considered in the aforementioned studies are much longer than the SSRs typically observed in prokaryotic genomes (Field and Wills 1998; Mrazek et al. 2007; Sreenu et al. 2007), and hence the observations made in those studies may not adequately represent the mutational bias of SSRs in prokaryotic genomes. The reported contraction bias of these long tracts may be due to the fact that long SSRs experience, on average, a downward mutational bias. Furthermore, the studies were conducted on individual SSR tracts and were carried out under defined laboratory conditions. None of these studies reported the mutational dynamics of mononucleotide repeats, which are the most abundant repeat type in prokaryotic genomes (Coenye and Vandamme 2005). The lack of a global study on the behavior of SSR mutations, together with the availability of whole genome sequences of a variety of strains belonging to a number of species, gave us a considerable advantage in analyzing the mutational pattern of SSRs in relation to the presence and absence of the MMRS. 
Studies were carried out on the SSRs found in the non-coding regions as these are thought to be neutral as far as the selection is concerned. Our studies revealed that SSRs in the genomes where the MMRS was present show mutational bias toward contraction. In the genomes where the MMRS is absent SSRs do not show bias toward either expansion or contraction.

Materials and Methods

Systematic Search for MMRS in Bacteria and Archaea Genomes

The whole genome sequences of prokaryotes were downloaded from NCBI’s ftp site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). We considered only those prokaryotic species for which complete genome sequences are available for at least three strains. The annotations available along with the genome sequences were searched for the keywords “MutS” and “MutL”. We did not consider MutH because its universal presence as a part of the MMRS in prokaryotic genomes has been doubted (Claverys and Lacks 1986). It should be noted that in some prokaryotic genomes two types of MutS genes are present, viz., MutS1 and MutS2. MutS1 is equivalent to Escherichia coli MutS, whereas a MutS2 counterpart is absent in Escherichia coli. MutS1 has four conserved functional domains, viz., MutSI, MutSII, MutSd, and MutSac. MutS2 lacks the MutSI and MutSII domains but has an additional SMR domain at its C-terminal end (Lin et al. 2007). In Helicobacter pylori it has been shown that MutS2 does not play any role in MMR (Kang et al. 2005). Hence, we looked for the presence of MutS1 among the MutS homologs, as it is the one that is involved in MMR. We therefore refer to MutS1 as MutS unless specifically mentioned otherwise.
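The distinction between MutS1 and MutS2 drawn above can be expressed as a simple rule over the set of domains detected in a candidate ORF. The following is only an illustrative sketch and not the procedure used in this study: the function name `classify_muts` and the plain-string domain labels (taken from the text, not actual Pfam or SMART accessions) are assumptions made for the example.

```python
# Illustrative sketch only: domain labels are the names used in the text,
# not real Pfam/SMART identifiers, and classify_muts is a hypothetical helper.
MUTS1_DOMAINS = frozenset({"MutSI", "MutSII", "MutSd", "MutSac"})


def classify_muts(domains):
    """Classify a MutS-like ORF from the set of domains detected in it."""
    found = set(domains)
    # MutS1: all four conserved functional domains are present.
    if MUTS1_DOMAINS <= found:
        return "MutS1"
    # MutS2: lacks MutSI and MutSII but carries a C-terminal SMR domain.
    if "SMR" in found and not ({"MutSI", "MutSII"} & found):
        return "MutS2"
    # Anything else is treated as an incomplete homolog needing follow-up.
    return "incomplete"
```

Under this rule an ORF reported as "incomplete" would be a candidate for the follow-up searches rather than being counted as a functional MutS.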

The protein sequences corresponding to the ORFs annotated as MutS and MutL were searched for the conserved functional domains characteristic of MutS and MutL using the Pfam (Finn et al. 2008) and SMART (Schultz et al. 1998) databases. Only those sequences containing all the characteristic conserved functional domains were considered as MutS and MutL homologs. Genomes in which MutS and MutL homologs were absent, and those in which the homologs were found but were missing one of the characteristic functional domains, were subjected to tblastn searches using full-length MutS and MutL ORFs as queries. This enabled us to discover, in some genomes, DNA sequences very similar to full-length MutS/MutL ORFs but containing frameshift mutations that cause premature termination. Genomes that failed to yield sequences similar to MutS/MutL were further subjected to profile-based searches using PROFILE-SS to reconfirm the complete absence of the two genes. The profiles for MutS and MutL were constructed from multiple sequence alignments of all the respective homologs gathered by us. The results pertaining to the presence/absence of MutS/MutL homologs in some genomes were also reconfirmed by comparison with the published literature (Eisen 1998; Lin et al. 2007).

Extraction of SSRs and Comparison of Equivalent SSRs

SSRF (Sreenu et al. 2003) was used for the extraction of SSRs from the complete genome sequences. SSRF reports the motif type, the number of iterations, the start and end of the SSR in the genome, and the region in which the SSR is observed (coding or non-coding). The equivalent SSRs in the genomes belonging to a species were identified by comparing the flanking regions. SSRs were considered equivalent provided that flanking regions of at least 50 bp were identical in all the compared genomes. Equivalent SSRs showing length variation were considered polymorphic SSRs (PSSRs). The entire process of genome sequence comparison for identification of PSSRs has been coded in the form of a computer program called polymorphic simple sequence repeat finder (PSSRFinder) (Kumar et al. 2011). PSSRs harbored within the non-coding regions of all the compared genomes were categorized as non-coding PSSRs. Of these PSSRs, we considered only those showing two polymorphic states (i.e., two types of alleles); among these, cases in which the two alleles were equally populated were discarded. In the cases (species) where multiple sub-strains were available, only one sub-strain was considered for analysis. Information pertaining to the polymorphic SSRs identified from whole genomes belonging to 85 species has been made available in the form of a relational database called PSSRdb (pssrdb.cdfd.org.in) (Kumar et al. 2011).
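The flank-based definition of equivalent and polymorphic SSRs can be illustrated with a short sketch. This is not PSSRFinder itself; the function names `flank_key` and `find_pssrs`, the input layout, and the simplification that each pair of flanks occurs at most once per genome are all assumptions made for the example.

```python
from collections import defaultdict

FLANK = 50  # minimum identical flank length (bp), as stated in the text


def flank_key(genome, start, end, flank=FLANK):
    """Return (left flank, right flank) of the SSR at genome[start:end].

    Returns None when the tract lies too close to a sequence end to
    provide a full-length flank on both sides.
    """
    if start < flank or end + flank > len(genome):
        return None
    return genome[start - flank:start], genome[end:end + flank]


def find_pssrs(strains):
    """Group SSRs by identical flanks and keep the polymorphic ones.

    strains: {strain name: (genome sequence, [(start, end), ...])}
    (a hypothetical input layout). Returns {flank key: {strain: length}}
    for loci found in every strain whose tract lengths differ.
    """
    loci = defaultdict(dict)
    for name, (genome, ssrs) in strains.items():
        for start, end in ssrs:
            key = flank_key(genome, start, end)
            if key is not None:
                loci[key][name] = end - start
    return {
        key: lengths
        for key, lengths in loci.items()
        if len(lengths) == len(strains) and len(set(lengths.values())) > 1
    }
```

Exact string matching of the flanks mirrors the requirement that the flanking regions be identical in all compared genomes; a real implementation would also need to handle both strands and repeated flanks.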

Species-Wise Identification of Contractions/Expansions in PSSRs

To decide whether a given PSSR in a species represents an expansion or a contraction, we examined the length distribution of the PSSR across strains. If the major allele is longer than the minor allele, the PSSR was considered a case of contraction, whereas if the major allele is shorter than the minor allele, the PSSR was considered a case of expansion. However, in the case of Yersinia species the status of PSSRs as expansion or contraction was decided with reference to the ancestor, Yersinia pseudotuberculosis. Furthermore, only PSSRs with reference alleles (the major or ancestral allele) longer than two repeat units were considered for further analysis.
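The major/minor allele rule described above can be written as a small function. The sketch below is an illustration under stated assumptions, not the code used in this study: the function name `classify_pssr` and its input (a list of repeat counts, one per strain) are hypothetical, and the Yersinia special case, which uses an ancestral reference instead of the major allele, is not covered.

```python
from collections import Counter


def classify_pssr(allele_lengths, min_reference_units=3):
    """Classify one PSSR as 'contraction' or 'expansion', or return None.

    allele_lengths: repeat counts observed at the locus, one per strain
    (hypothetical input layout). None means the locus is excluded.
    """
    counts = Counter(allele_lengths).most_common()
    # Only two-state loci are informative.
    if len(counts) != 2:
        return None
    (major, n_major), (minor, n_minor) = counts
    # Equally populated alleles give no major allele, so they are discarded.
    if n_major == n_minor:
        return None
    # The reference (major) allele must be longer than two repeat units.
    if major < min_reference_units:
        return None
    # A longer major allele implies the minor allele arose by contraction.
    return "contraction" if major > minor else "expansion"
```

For example, a locus with repeat counts 7, 7, and 6 in three strains would be scored as a contraction, because the majority allele is the longer one.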

Statistical Significance of Contraction or Expansion Bias

The significance of the bias toward contraction or expansion was tested by performing a χ² test with one degree of freedom (any χ² value >3.84 is considered significant at the 5% level). The χ² value is calculated as given below.

$$ \chi^{2} = \sum\limits_{i = 1}^{k} {\left( {O_{i} - E_{i} } \right)^{2} /E_{i} } $$

where \( O_{i} \) is the observed number of events, \( E_{i} \) is the expected number of events (the total number of contraction and expansion events divided by 2), and the index \( i \) runs from 1 to \( k = 2 \), because only two kinds of events are possible (expansion or contraction).
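As a worked illustration of this test, the sketch below computes the χ² statistic for a pair of contraction and expansion counts and compares it with the 3.84 threshold. The function name `chi_square_bias` and the counts used in the usage note are hypothetical and are not taken from the tables of this study.

```python
CRITICAL_VALUE = 3.84  # threshold for one degree of freedom, as stated in the text


def chi_square_bias(contractions, expansions):
    """Return (chi-square value, is_significant) for two event counts.

    The expected count for each event type is half of the total,
    i.e. the null hypothesis of no directional bias.
    """
    total = contractions + expansions
    if total == 0:
        raise ValueError("at least one event is required")
    expected = total / 2
    chi2 = sum(
        (observed - expected) ** 2 / expected
        for observed in (contractions, expansions)
    )
    return chi2, chi2 > CRITICAL_VALUE
```

With hypothetical counts of 30 contractions and 10 expansions, the expected count is 20 for each event type and the statistic is 10.0, which exceeds 3.84; with 12 and 10 it is about 0.18, which does not.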

Results and Discussion

MMR System in Bacterial and Archaeal genomes

Our search for the MutS and MutL genes in the bacterial and archaeal genomes yielded three groups of species: (a) the species where one of the strains lacks the MMRS (Table 1), (b) the species where all the strains lack MMRS (Table 1), and (c) the species where all the strains have MMRS. Additional information such as name of the strain and protein IDs of MutS and MutL of each strain are given as Supplementary Table I.

Table 1 Species where MutS and MutL genes are present (“+”) and absent (“−”)

The Species Where One of the Strains Lacks MMRS

Among the species screened, Haemophilus influenzae and Acinetobacter baumannii each had one strain (pittGG in Haemophilus influenzae and ATCC 17978 in Acinetobacter baumannii) that lacked a functional MutS because of a frameshift mutation. In the Haemophilus influenzae pittGG strain, MutS is prematurely terminated because of the deletion of one repeat unit in a 7-bp mononucleotide poly “A” SSR tract positioned at 2,155 bp, as compared with its homologs in the other strains of Haemophilus influenzae. Similarly, in the Acinetobacter baumannii ATCC 17978 strain, MutS is frame shifted at position 1,989 because of the insertion of one nucleotide in a poly “A” mononucleotide tract, which is 6 bp long in all other strains of this species. Due to this frame shift the ORF has split into two parts, A1S_1251 (573) and A1S_1252 (206). In the ORF A1S_1251, the ATPase and MutSIII domains are intact, while in the second ORF the MutSII domain, which is believed to be a connector domain between the mismatch recognition domain and the ATPase domain (Lamers et al. 2000; Obmolova et al. 2000), has been deleted. Although it is annotated as a coding region in the genome annotation file (Smith et al. 2007), the splitting of the domains suggests that the MutS protein may not be functional in this strain. In addition, in Acinetobacter baumannii and Staphylococcus aureus we found one strain each (ATCC 17978 in Acinetobacter baumannii and RF122 in Staphylococcus aureus) that lacked a functional MutL because of a frameshift mutation. In Acinetobacter baumannii ATCC 17978 the MutL gene is frame shifted because of the deletion of one “A” in a poly “A” tract of 7 bp. In Acinetobacter baumannii the wild-type MutL gene codes for a 650 amino acid long protein; the frameshift mutation in strain ATCC 17978, however, has resulted in two ORFs, of which the second translates into a 369 amino acid sequence corresponding to a truncated TopoII domain, which is an MMR domain. 
On the other hand, in Staphylococcus aureus RF122 strain MutL gene is frame shifted due to the deletion of 1 bp at position 1,450 in a non-SSR tract and hence the homologous region in the genome is annotated as non-coding region.

Species Where all the Strains Lack MMRS

In addition to the above-mentioned species we also found eight species (two belonging to archaea and six belonging to bacteria) where both the MutS and MutL genes (the complete DNA sequences) are absent in all their known strains (Table 1), thus offering suitable systems in which to study SSR mutations in the light of MMRS deficiency. Among these, MMRS deficiency in five species, viz., Mycobacterium tuberculosis, Mycobacterium bovis, Mycoplasma hyopneumoniae, Helicobacter pylori, and Campylobacter jejuni, has already been reported (Carvalho et al. 2005; Dos Vultos et al. 2009; Eisen 1998; Lin et al. 2007; Mizrahi and Andersen 1998).

We looked into the phylogenetic relationships among the MMRS-deficient species (Fig. 1). It can be seen that the MMRS-deficient species form four separate clusters that are distinct from the clusters formed by the MMRS-proficient species. This indicates that the loss of the MMRS is not a character acquired individually by each species but one inherited from a common ancestor, in which loss of the MMRS might have been selected for. The phylogenetic relationship obtained in our study is similar to the phylogenetic relationship of prokaryotes generated on the basis of whole genome sequences by Henz et al. (2005). The MMRS-deficient species considered in our study also fall into four different clusters in the study of Henz et al. A total of 14 MMRS-deficient species are present in the four clusters in our study. However, of these 14 species, complete genome sequences of at least three strains were available for only 7 species (please see the “Materials and Methods” for species selection criteria).

Fig. 1

The phylogenetic relationship among all the species considered in this study is shown. The phylogenetic tree was constructed based on the 16S rRNA sequences. The tree also includes some species that were not included in the SSR analysis because they did not meet the required criterion (complete genome sequences for a minimum of three strains) mentioned in “Materials and Methods”. These extra species were included in the phylogenetic analysis because they were found to be MMRS deficient and cluster together with other MMRS-deficient species. MMRS-deficient species are clustered into four different clusters (marked by X). The multiple sequence alignment was generated by CLUSTALX. The phylogeny was constructed by using the PHYLIP package. The un-rooted tree was drawn using draw-tree. The bootstrap value used was 1,000

Polymorphic SSRs

Figure 2 shows the number of PSSRs found in each species as well as their densities in coding and non-coding regions (please see Supplementary Table I for the number and names of the strains in each species). In all the species the tract density in non-coding regions is strikingly higher than (on average ~14 times) that in coding regions. Higher incidence of SSR polymorphism in non-coding regions as compared to coding regions indicates relatively unrestrained polymorphism in non-coding regions. Restraint on SSR polymorphism in coding regions can be attributed to selection pressures. On the other hand, non-coding regions do not have such selection pressures and hence the possibility of unrestrained SSR polymorphism. We therefore considered only the PSSRs found in non-coding regions. We found more than 11,000 non-coding PSSRs from 41 species (see “Materials and Methods” for species selection criteria) of which a large majority is constituted by mononucleotide tracts. The numbers of PSSRs of each repeat type in different species are given in Supplementary Table II. About 99% of the PSSRs have undergone INDEL mutations of one repeat unit. Since 95% of the PSSRs are the mononucleotide tracts we considered only these tracts for further expansion/contraction studies.

Fig. 2

The tract densities of PSSRs in coding and non-coding regions in various prokaryotic species are shown. The total number of PSSRs found in each species is shown on top of the bar. The tract density was calculated as the number of PSSRs per Mb in the given region (coding or non-coding) of a species. The total number of PSSRs is the number of non-redundant polymorphic loci observed in a species. The number of base pairs in the coding and non-coding regions, averaged over all strains of a species, was taken as the total number of bases in the coding and non-coding regions, respectively. AB Acinetobacter baumannii, AP Actinobacillus pleuropneumoniae, BA Bacillus anthracis, BC Bacillus cereus, CB Clostridium botulinum, CG Corynebacterium glutamicum, CJ Campylobacter jejuni, CP Chlamydophila pneumoniae, CPER Clostridium perfringens, CT Chlamydia trachomatis, DV Desulfovibrio vulgaris, EC Escherichia coli, ER Ehrlichia ruminantium, FT Francisella tularensis, HI Haemophilus influenzae, HP Helicobacter pylori, LL Lactococcus lactis, LM Listeria monocytogenes, LP Legionella pneumophila, MH Mycoplasma hyopneumoniae, MM Methanococcus maripaludis, MT Mycobacterium tuberculosis and Mycobacterium bovis, NM Neisseria meningitidis, PA Pseudomonas aeruginosa, PP Pseudomonas putida, PS Pseudomonas syringae, RP Rhodopseudomonas palustris, SA Staphylococcus aureus, SAG Streptococcus agalactiae, SB Shewanella baltica, SE Salmonella enterica, SF Shigella flexneri, SI Sulfolobus islandicus, SP Streptococcus pneumoniae, SPY Streptococcus pyogenes, ST Salmonella typhi, STH Streptococcus thermophilus, XC Xanthomonas campestris, XO Xanthomonas oryzae, XF Xylella fastidiosa, YP Yersinia pestis
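The tract density defined in this legend can be sketched as follows. This is a minimal illustration, assuming a hypothetical function name `tract_density` and hypothetical inputs (the PSSR count for a region and the per-strain region sizes in bp); it is not the code used to produce the figure.

```python
def tract_density(n_pssrs, region_lengths_bp):
    """Return the number of PSSRs per Mb for one region type of a species.

    n_pssrs: number of non-redundant polymorphic loci in the region.
    region_lengths_bp: size of that region (coding or non-coding) in each
    strain; their average is used as the region size for the species.
    """
    if not region_lengths_bp:
        raise ValueError("at least one strain is required")
    mean_bp = sum(region_lengths_bp) / len(region_lengths_bp)
    return n_pssrs / (mean_bp / 1_000_000)
```

For instance, 10 PSSRs in a non-coding region averaging 0.5 Mb across strains would correspond to a density of 20 PSSRs per Mb.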

MMR Deficiency Leads to Destabilization of SSR Tracts

The numbers of PSSRs found in Haemophilus influenzae, Acinetobacter baumannii, and Staphylococcus aureus are shown in Fig. 3. Most of the MMRS-deficient strains show a significantly higher (Z > 4) number of PSSRs than the MMRS-proficient strains, indicating that MMRS deficiency has led to destabilization of SSR tracts. This is in agreement with earlier reports (Levy and Cebula 2001; Watson et al. 2004). It is pertinent to note that the MutS/MutL genes themselves harbor SSR tracts and, as already noted, the MMRS deficiency has arisen because of frameshift mutations in these SSR tracts; therefore, in these species MMRS deficiency/proficiency is itself under the control of SSR mutations. Unless fixed, it is quite probable that mutating SSRs yield a functional MMRS for some generations and a non-functional MMRS for some other generations. Evidence for this argument can be seen in Fig. 3, where the ACICU strain of Acinetobacter baumannii and the pittEE strain of Haemophilus influenzae also show a high level of SSR instability (although with low significance, Z score < 4.0) despite being MMRS proficient.

Fig. 3

The number of PSSRs (Obs) found in each strain of (a) Acinetobacter baumannii, (b) Haemophilus influenzae, and (c) Staphylococcus aureus is shown. The MMRS-deficient and MMRS-proficient strains are shown in gray and black bars, respectively. Z scores \( \left( \text{Z-score} = \frac{\left( \text{Obs} - \text{Exp} \right)^{2}}{\text{Exp}} \right) \) are indicated above the bars. The average number of PSSRs in each species was considered as the expected number (Exp) of PSSRs in that species
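The per-strain score given in this legend can be computed as sketched below. The sketch applies the formula exactly as written in the legend, with the species average as the expected value; the function name `strain_scores` and the strain counts in the usage note are hypothetical.

```python
def strain_scores(pssr_counts):
    """Return {strain: (Obs - Exp)^2 / Exp} for the strains of one species.

    pssr_counts: {strain name: observed number of PSSRs}. Exp is the
    average number of PSSRs over all strains of the species.
    """
    if not pssr_counts:
        raise ValueError("at least one strain is required")
    expected = sum(pssr_counts.values()) / len(pssr_counts)
    if expected == 0:
        raise ValueError("expected count is zero; score is undefined")
    return {
        strain: (observed - expected) ** 2 / expected
        for strain, observed in pssr_counts.items()
    }
```

With hypothetical counts of 10, 10, and 40 PSSRs in three strains, the expected value is 20 and the scores are 5.0, 5.0, and 20.0. Because the deviation is squared, the score alone does not show whether a strain lies above or below the species average.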

We also found eight species where the genes of the MMRS were completely absent from the respective genomes. MMRS deficiencies in the following five species, viz., Mycobacterium tuberculosis, Mycobacterium bovis, Mycoplasma, Helicobacter, and Campylobacter, have already been reported (Carvalho et al. 2005; Dos Vultos et al. 2009; Eisen 1998; Lin et al. 2007; Mizrahi and Andersen 1998).

The densities of non-coding PSSRs (the number of PSSRs per Mb per strain) found in the MMRS-deficient and MMRS-proficient species are shown in Fig. 4. Although we have not defined a threshold of PSSR density to classify species as “high” or “low” in PSSRs, it is pertinent to note that nearly half of the MMRS-deficient species have high PSSR densities in the non-coding regions. Because PSSR density reflects SSR stability, the higher the PSSR density, the lower the SSR stability. From the figure it can be seen that Mycoplasma hyopneumoniae, Methanococcus maripaludis, Helicobacter pylori, and Campylobacter jejuni are positioned at the high end of the PSSR tract density chart. The other deficient species, viz., Mycobacterium (which includes both Mycobacterium tuberculosis and Mycobacterium bovis), Sulfolobus islandicus, and Corynebacterium glutamicum, harbor relatively low densities of PSSRs and appear as “outliers” among the MMRS-deficient species. Mycobacteria are known for low mutation rates, which have been linked to the slow rate of DNA synthesis in these bacteria (Dos Vultos et al. 2009; Hiriyanna and Ramakrishnan 1986). Slow DNA synthesis may promote increased fidelity of the DNA polymerases even in the MMRS-deficient background (Radman 1998). The other outlier, Sulfolobus islandicus, is also slow growing, with a doubling time of 7–8 h. From the above observations it can be concluded that MMRS deficiency generally leads to the destabilization of SSRs.

Fig. 4

The number of PSSRs per Mb per strain found in the non-coding regions of different prokaryotic species is shown, for both the MMRS-deficient and the MMRS-proficient species. Because the number of strains differs between species, the PSSR density was normalized as the total number of PSSRs per megabase pair per strain. In each bar, the darkly shaded portion represents mononucleotide SSRs and the lightly shaded portion represents the other SSRs (di to hexa). The MMRS-deficient species are marked with a number symbol, and species where one of the strains is deficient in MMR are marked with an asterisk. For the x axis legend abbreviations please see the Fig. 2 legend

However, we would like to point out an observation made in Escherichia coli, an MMRS-proficient species, in which the F-plasmid can destabilize SSR tracts (Schlotterer et al. 2006). Given this, we can assume that MMRS-proficient species too can harbor a high number of PSSRs as a consequence of yet-to-be-discovered mechanisms, some of which may be species specific.

MMR is Strand Biased

During replication, there is an equal probability for the nascent strand and the template strand to harbor slippage errors and therefore one can expect equal number of contraction and expansion events in PSSRs. Results shown in Table 2 reveal that PSSRs are contraction biased in MMRS-proficient strains (Haemophilus influenzae, Acinetobacter baumannii, and Staphylococcus aureus). On the other hand, MMRS-deficient strains do not show significant bias toward either expansion or contraction. These results suggest that MMRS is less efficient in repairing slippage mutation on the template strand as compared to the nascent strand. It can also be argued that the contraction bias observed in MMRS-proficient species might also be a consequence of high frequency of primary mutations (slip out) in the template strand as compared to the nascent strand. To test which one of the aforementioned arguments is correct we examined contraction/expansion bias of PSSRs in the species lacking MMRS.

Table 2 PSSRs found in MMR-proficient and MMR-deficient strains of Acinetobacter baumannii, Haemophilus influenzae, and Staphylococcus aureus

Primary SSR Mutations are Unbiased Toward Expansion/Contractions in MMR-Deficient Species

The SSR mutations observed in the absence of the MMRS are referred to as primary SSR mutations. Table 1 gives the list of species lacking the MMRS. The numbers of PSSRs in the six species (Campylobacter jejuni, Helicobacter pylori, Mycoplasma hyopneumoniae, Corynebacterium glutamicum, Methanococcus maripaludis, and Sulfolobus islandicus), along with the number of times they show contractions and expansions, are given in Table 3. It can be seen from the table that most of the MMRS-deficient species do not show significant differences between the numbers of expanded and contracted PSSRs. However, in Mycoplasma hyopneumoniae the SSRs are significantly biased toward expansion. The expansion bias of SSRs in this MMRS-deficient species could be a consequence of selection for long SSR tracts. We note that the SSR tracts of Mycoplasma hyopneumoniae are longer than those of the other bacterial species. Most of the long polymorphic tracts in Mycoplasma hyopneumoniae show more than two polymorphic states; hence we could not unequivocally infer contraction/expansion events for them (please see the “Materials and Methods”), and these long tracts were not analyzed in our study.

Table 3 Number of PSSRs found in MMR-deficient species

PSSRs are Contraction Biased in MMRS-Proficient Species

Mutations in SSRs which escape MMRS are referred to as secondary mutations. We examined the directionality (contraction or expansion) of these secondary mutations (PSSRs) in MMRS-proficient species. The numbers of PSSRs found in the MMR-proficient species are given in Table 4. It can be seen that in most of the species PSSRs are significantly contraction biased.

Table 4 Number of PSSRs found in MMR-proficient species

PSSR Contraction Bias is Independent of Repeat Count as well as Repeat Types

Several studies have reported that SSRs with high repeat counts experience a downward mutation bias whereas those with low repeat counts are prone to expansion (Huang et al. 2002; Xu et al. 2000). In those studies the SSR tracts were very long compared with the polymorphic SSRs observed in this study. Most of the PSSRs observed in our study are 3 to 8 bp long; on this basis we placed PSSRs of 3–5 bp in one group and those longer than 5 bp in the other group. We have also examined the data with the group boundary shifted by one repeat unit (Fig. 5; Supplementary Table III). In most of the species the PSSRs were biased toward contraction in both groups, suggesting that SSR mutations are generally biased toward contraction irrespective of their repeat counts and that the mutational bias of SSRs is related to the MMRS rather than to repeat count.

Fig. 5

Mutation bias is independent of repeat count. The contraction bias of SSR mutations is shown for tracts with count ≤5 bp (a) and count ≥6 bp (b). For the x axis legend abbreviations please see the Fig. 2 legend

To check for differences in mutation pattern between poly A|T and poly G|C tracts, we analyzed a range of genomes from low to high GC content. It should be noted that A/T-rich genomes harbor a higher number of polymorphic poly A|T tracts than polymorphic poly G|C tracts. Similarly, GC-rich genomes are expected to harbor a higher number of poly G|C PSSR tracts than poly A|T PSSR tracts. To examine the mutational bias, if any, between poly G|C and poly A|T tracts we considered seven species (Table 5), of which three have AT-rich genomes, one has a neutral genome (AT and GC content 50% each), and the remaining three have GC-rich genomes. It can be seen from Table 5 that there is no difference between poly A|T and poly G|C PSSR tracts with regard to their mutational bias.

Table 5 Contraction bias observed irrespective of repeat types (poly A|T or G|C)

Directionality of SSR Mutation is Independent of Physical Properties of Genomes

Table 6 compares the directionality of SSR mutation between MMRS-proficient and MMRS-deficient species whose genomes have similar GC content. It can be seen that genomes with similar compositions differ in the directionality of SSR mutation; hence it can be argued that the directionality of SSR mutation in MMRS-proficient and MMRS-deficient species is independent of genome composition and depends instead on the presence or absence of the MMRS.

Table 6 Comparison between MMR-proficient and MMR-deficient species with similar GC percentage

Conclusions

In this study, we have investigated the mutational bias of SSRs toward expansion or contraction in relation to the presence and absence of the MMRS in prokaryotes. Our investigations revealed that MMR deficiency arising from a frameshift mutation in an MMR gene leads to an increase in the number of SSR tracts undergoing slippage-related INDEL mutations. The strain-specific INDEL mutations in the MMRS genes of Acinetobacter baumannii, Haemophilus influenzae, and Staphylococcus aureus indicate that these mutations were not inherited during speciation. The strain-specific mutations have arisen at different points of time during evolution. If the mutation event is relatively recent, then all the characteristics of the mutation may not be very evident. The number of SSR mutations in Acinetobacter baumannii ATCC 17978 is not very high as compared with the other strains of Acinetobacter baumannii, and this could be due to a recent mutation in the MMRS or to sequencing errors. An indication of this is also evident in the mutation bias of the SSRs in this strain [Table 2 (contraction/expansion ratio is 1.2)]. In the other two mutated strains, viz., Haemophilus influenzae PittGG and Staphylococcus aureus RF122, SSR mutations are not biased toward either contraction or expansion (contraction/expansion ratio is 1).

Most of the MMR-deficient species show a high number of mutated SSR tracts. However, despite the MMR deficiency, Mycobacterium showed high stability in non-coding SSR tracts, which could be due to increased fidelity of the DNA polymerases even in the MMR-deficient background as a result of the long generation time. The variation in SSR stability among the MMR-deficient species may also be due to differences in the living environments of the species. In the species where the MMRS is present, SSR mutations show a significant bias toward deletions. However, we could not find an unequivocal bias of SSRs toward either expansion or contraction in species deficient in the MMRS. The directionality of evolution of SSRs appeared to be independent of their repeat counts [long (6–8 bp) vs. short (3–5 bp) SSRs] as well as of the repeat types (A|T vs. G|C). The directionality of SSR mutation is also independent of the physical properties (GC content) of the genome. The contraction bias of SSRs explains the enrichment of short tracts in prokaryotic genomes, which are mostly MMRS proficient.