In 1960, Edward Adelberg and Sarah Burns began assembling a collection of Escherichia coli K12 strains with the prefix AB [1]. In 1962, the collection was extended with the strain AB1157, which carries several auxotrophic mutations and mutations affecting the utilization of sugars, is resistant to streptomycin, and was used as an F− recipient in conjugal crosses with donor strains [2]. The aim of these experiments was to study the structure of the bacterial chromosome. Paul Howard-Flanders et al. used AB1157 as the parent strain for the isolation of uvrA, uvrB, and uvrC mutants, which are unable to perform the first step (incision) of excision repair of DNA after UV irradiation. These mutants were obtained by treating AB1157 bacteria with nitrous acid. The strain AB1885 uvrB5 was obtained in the same way [3]. In 1966, a mutant deficient in genetic recombination was isolated from AB1157; it was designated AB2463 and carries the mutation recA13 [4]. The next mutant derived from AB1157 was impaired in its capacity for radiation-induced mutagenesis, i.e., it was SOS-response-deficient. This strain was designated AB2494 and carries the mutation lexA1, which renders the LexA repressor unable to undergo autoproteolysis [5]. The strains AB2463 and AB2494 were obtained after treatment of AB1157 with N-methyl-N'-nitro-N-nitrosoguanidine (NG). All of the aforementioned strains have been used by numerous research groups [6–9] in hundreds of studies on repair, recombination, and mutagenesis in E. coli. However, the genomes of these strains have not been sequenced. We performed whole genome sequencing of AB1157, AB2463, AB2494, and AB1885. In this paper, we present an analysis of the results, which reveals a substantial number of changes in the AB2463 and AB2494 genomes relative to the parent strain AB1157.

We therefore attempted to answer the following questions. (1) When did the observed changes arise: immediately after mutagenesis or in subsequent years? (2) Did the changes in AB2463 and AB2494 arise according to one scenario or in different ways? (3) What is the contribution of spontaneous mutations in AB1157 to the differences between the AB1157, AB2463, and AB2494 genomes?

MATERIALS AND METHODS

Bacterial Strains and Cultivation

Bacteria were cultivated according to protocols described earlier [10]. The main genetic characteristics of the strains were as follows: AB1157, F− thr-1 leu-6 proA2 his-4 thi-1 argE3 lacY1 galK2 ara-14 xyl-5 mtl-1 tsx-33 strA31 sup-37; AB2463, as AB1157, but also recA13; AB2494, as AB1157, but also metB1 lexA1; and AB1885, as AB1157, but also uvrB5. The data were taken from the Coli Genetic Stock Center database at Yale University (http://cgsc2.biology.yale.edu/).

Strains AB1157, AB2463, AB2494, and AB1885 were kindly provided by Paul Howard-Flanders to G.B. Smirnov in 1967. A second lineage of strain AB1157, designated here as AB1157 alt, was kindly provided in 2017 by M.A. Petrova from the Department of Storage and Analysis of Microorganisms of the Institute of Molecular Genetics, Russian Academy of Sciences.

DNA Extraction

Bacterial lysis was carried out using Promega Nuclei buffer (Promega, United States). A saturated NaCl solution was added to remove cellular proteins. DNA was concentrated and desalted by isopropanol precipitation. TE buffer (50–100 µL) was added to the DNA precipitate for further storage at 4°C. DNA was additionally purified using DNA purification minicolumns (Technoclone, Russia) in accordance with the manufacturer’s instructions.

Sequencing and Genome Analysis

DNA (300 ng for each sample) was fragmented using the Covaris S220 system (Covaris, Woburn, Massachusetts, United States) to a final size of 300–500 bp according to the manufacturer’s recommendations.

The DNA libraries were prepared with an Ion Xpress™ Plus Fragment Library Kit (Thermo Fisher Scientific) for sequencing on the Ion Torrent PGM (Thermo Fisher Scientific). The Ion PGM™ Template OT2 200 Kit (Thermo Fisher Scientific) was used for emulsion PCR. DNA sequencing was performed using an Ion 318 chip v2 and the Ion PGM™ Sequencing 200 Kit v2 (Thermo Fisher Scientific).

For sequencing on the HiSeq 2500 platform (Illumina, United States), paired-end libraries were prepared as recommended by the manufacturer using the NEBNext Ultra II DNA Library Prep Kit (New England Biolabs, United States). The libraries were indexed using the NEBNext Multiplex Oligos for Illumina (96 Index Primers) (New England Biolabs, United States). Sequencing was performed as recommended by the manufacturer using the HiSeq Rapid PE Cluster Kit v2, HiSeq Rapid SBS Kit v2 (500 cycles), and HiSeq Rapid PE FlowCell v2.

The sequencing results were deposited in the NCBI database under BioProject PRJNA416242. The Sequence Read Archive (SRA) accession numbers are SRX3421413 (AB1157), SRX5178624 (AB1157 alt), SRX5178623 (AB1885), SRX3421414 (AB2463), and SRX3421412 (AB2494). The reads obtained were aligned to the reference genome Escherichia coli K-12 substr. MG1655 (NC_000913.3) using Bowtie 2 [11]. FreeBayes was used to search for single-nucleotide polymorphisms (SNPs) [12]. A variant was considered confident if more than 90% of observations supported the allele and the coverage was at least 10× [13].
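The confidence criterion for variants can be illustrated with a minimal Python sketch. The Variant record and its field names (depth, alt_obs) are hypothetical and are not taken from the original pipeline; they are assumed to correspond to the total read depth and the number of reads supporting the alternate allele at a site (FreeBayes reports such counts in the DP and AO fields of its VCF output).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one candidate SNP (illustrative only)."""
    pos: int      # position on the reference genome
    ref: str      # reference base
    alt: str      # alternate base
    depth: int    # total reads covering the position (assumed: DP)
    alt_obs: int  # reads supporting the alternate allele (assumed: AO)


def is_confident(v: Variant, min_depth: int = 10, min_fraction: float = 0.90) -> bool:
    """Apply the two thresholds stated in the text:
    coverage of at least 10x and more than 90% of observations
    supporting the allele."""
    if v.depth < min_depth:
        return False
    return v.alt_obs / v.depth > min_fraction


# Toy candidates (made-up numbers, not data from the study).
candidates = [
    Variant(pos=1200, ref="G", alt="A", depth=40, alt_obs=39),  # passes
    Variant(pos=5300, ref="C", alt="T", depth=9, alt_obs=9),    # coverage too low
    Variant(pos=8100, ref="A", alt="C", depth=20, alt_obs=17),  # support too low
]
confident = [v for v in candidates if is_confident(v)]
```

Both thresholds are exposed as parameters so that the effect of a stricter or looser filter on the number of retained variants can be checked directly.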

To find genomic regions (nonoverlapping windows of 2 and 10 kb in length) with an overrepresented number of SNPs, we used the Poisson distribution with a Bonferroni correction of the p-value threshold (p < 0.05 after correction was considered significant). Statistical analysis was performed using the R programming language.
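The window-based test can be sketched as follows. The original analysis was done in R; this is a Python illustration using only the standard library, and it rests on assumptions that the text does not state: SNPs are assumed to be uniformly distributed under the null hypothesis (expected count per window = total SNPs / number of windows), the number of tests for the Bonferroni correction is assumed to equal the number of windows, and positions are treated as 0-based. The function names are hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(positions, genome_length, window=2000, alpha=0.05):
    """Return (window_start, snp_count, p_value) for nonoverlapping
    windows whose SNP count is significant after Bonferroni correction."""
    n_windows = math.ceil(genome_length / window)
    lam = len(positions) / n_windows      # expected SNPs per window (assumed null)
    counts = Counter(p // window for p in positions)
    threshold = alpha / n_windows         # Bonferroni-corrected threshold
    hits = []
    for w, k in sorted(counts.items()):
        p_value = poisson_sf(k, lam)
        if p_value < threshold:
            hits.append((w * window, k, p_value))
    return hits


# Toy example (made-up positions): ten SNPs packed into one 2-kb window
# plus ten isolated SNPs on a 100-kb genome.
toy_positions = list(range(6000, 6010)) + [20000 + 4000 * i for i in range(10)]
toy_hits = enriched_windows(toy_positions, genome_length=100_000, window=2000)
```

In the toy example only the window starting at position 6000 is reported; windows containing a single SNP are far from the corrected threshold. Running the same function with window=10000 gives the coarser of the two scans mentioned in the text.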

RESULTS AND DISCUSSION

Whole genome sequencing was carried out for the following E. coli strains: the parent strain AB1157; another lineage of this strain, AB1157 alt; the recA13 mutant AB2463; the lexA1 mutant AB2494; and the uvrB5 mutant AB1885. The genomes were compared with each other and with the reference genome of Escherichia coli K12 substr. MG1655 (NC_000913.3) available at NCBI (hereinafter E. coli K12 (NCBI)) (Table 1).

Table 1.   Paired comparison of the number of mutations in the genes of the studied strains

As a result, 131 and 293 mutations were found in the genes of strains AB2463 and AB2494, respectively, relative to the parent strain AB1157. Analysis of the location of these mutations in the genomes did not reveal statistically significant clusters. Moreover, the positions of the SNPs found in the AB2463 and AB2494 genomes did not coincide. Most of these changes were GC-to-AT transitions (Table 2), consistent with the known mechanism of action of alkylating agents [14].

Table 2.   Number and type of unique polymorphisms in genes of strains AB2463, AB2494, and AB1885

In turn, the genomes of the two lineages of strain AB1157 (here, AB1157 and AB1157 alt) differed at eight positions. The genome of AB1157 differed from that of E. coli K12 (NCBI) at 120 positions. The genome of AB1157 contained two substitutions that distinguish it from AB2463 and AB2494. (Here and below, changes in the AB1157 genome are designated on the basis of comparison with the genome of E. coli K12 (NCBI).) The genome of AB1157 alt did not contain these two substitutions but possessed five unique mutations of its own. The genome of strain AB1157 contained 11 mutations (two nonsense, seven nonsynonymous, and two synonymous) that were absent from the AB2494 genome; nine of them were shared with AB2463 and AB1157 alt.

The AB1885 genome (obtained after treatment of AB1157 with nitrous acid) had only 25 differences from the genome of AB1157; nine of them were specific to AB1157 (seven were shared with AB2463 and two were unique to AB1157). AB1885 contained 16 unique mutations absent from the genome of the parent strain AB1157. These mutations did not coincide at any position with those in the AB2463 or AB2494 genomes. The data obtained show that nitrous acid can induce both transversions and transitions, with a slight preponderance of the latter (10 out of 16) (Table 2).
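The classification of base substitutions summarized in Table 2 follows a simple rule: a transition exchanges a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), whereas a transversion exchanges a purine for a pyrimidine or vice versa. A minimal Python sketch of this rule is given below; the function names are hypothetical and the input is a made-up list of (reference, alternate) base pairs, not data from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def is_gc_to_at(ref: str, alt: str) -> bool:
    """True for a GC-to-AT transition: G->A on one strand,
    seen as C->T on the complementary strand."""
    return (ref, alt) in {("G", "A"), ("C", "T")}


def summarize(substitutions):
    """Count transitions, transversions, and GC-to-AT transitions
    in a list of (ref, alt) pairs."""
    counts = Counter(classify(r, a) for r, a in substitutions)
    counts["GC_to_AT"] = sum(is_gc_to_at(r, a) for r, a in substitutions)
    return counts


# Toy input (illustrative only).
toy = [("G", "A"), ("C", "T"), ("A", "G"), ("A", "C")]
toy_summary = summarize(toy)
# toy_summary: 3 transitions (2 of them GC-to-AT) and 1 transversion
```

Counting G to A and C to T together reflects the fact that a variant caller reports the change on the reference strand, so the same GC-to-AT event can appear as either substitution.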

Further comparative analysis of known genetic markers of the strains from the E. coli Genetic Stock Center, Yale University School of Medicine, was performed. Genetic markers, their phenotypic expression, and mutations common for all studied strains are shown in Table 3. Genetic markers specific for these strains are shown in Tables 4 and 5.

Table 3.   Description of mutations common to all E. coli AB strains
Table 4.   Mutations common to a number of sequenced AB strains
Table 5.   Unique mutations in AB strains

Whole genome sequencing has become a standard method of microbial molecular genetics, and the decoding of historically significant genomes was only a matter of time. Our sequences not only confirmed the existence of previously known mutations that correspond to the genetic characteristics described in the Yale University database but also revealed previously unknown (unpublished) defects in genes that influence the phenotype. For example, the strain AB2494, while having the same auxotrophic requirements (arginine and others) as AB1157 or AB2463, possessed a mutation in a different gene of the arginine biosynthesis pathway. Thus, sequencing allows a fresh look at data obtained earlier.

The first question posed in the Introduction may be put another way: are the changes found in the AB2463 and AB2494 genomes spontaneous or induced by NG? We do not know how many changes appeared immediately after the mutagenic treatment, but we can compare the number of alterations in the strains obtained after treatment with NG and with nitrous acid. The number of differences between the genomes of AB1157 and AB1885 is at least ten times lower than between the genomes of AB1157 and AB2463 or AB2494. This means that the vast majority of base-pair substitutions in the genomes of strains AB2463 and AB2494 were NG-induced rather than spontaneous. Thus, the definite answer to the first question is that the changes in the AB2463 and AB2494 genomes were induced by NG.

Regarding the second question, “one scenario” assumes similarity in the location and type of changes in the genomes of AB2463 and AB2494. We showed that the predominant mutations in both strains were GC-to-AT transitions; i.e., the type of changes is the same. In the AB2463 genome, 129 unique mutations were found, and in the AB2494 genome, 280; however, the positions of these mutations did not coincide, and only nine genes (of a total of 344) were mutated in both strains, in each case at different sites. This absence of similarity indicates that, in the course of the mutagenesis that led to the isolation of strains AB2463 and AB2494, NG attacked the AB1157 genome randomly, which resulted in the induction of entirely different changes in the AB2463 and AB2494 genomes. Thus, we have a clear answer to the second question: upon isolation of strains AB2463 and AB2494, NG induced mutations at different sites of the AB1157 genome. One may suggest that this was due to different experimental conditions (e.g., different pH values of the NG solution).
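The comparison underlying this answer amounts to two set intersections: one over mutated positions and one over mutated genes. The following Python sketch illustrates the idea under stated assumptions: each strain is represented by a hypothetical dictionary mapping a genomic position to the name of the affected gene, and the positions and gene names (geneA, geneB, and so on) are invented placeholders rather than data from the study.

```python
def compare_mutation_sets(strain_a: dict, strain_b: dict):
    """Return (shared_positions, shared_genes) for two strains,
    each given as {genomic_position: gene_name}."""
    shared_positions = sorted(strain_a.keys() & strain_b.keys())
    shared_genes = sorted(set(strain_a.values()) & set(strain_b.values()))
    return shared_positions, shared_genes


# Toy data (placeholders only).
strain_a = {1200: "geneA", 58000: "geneB", 910000: "geneC"}
strain_b = {1350: "geneA", 77000: "geneD", 2400000: "geneE"}

positions, genes = compare_mutation_sets(strain_a, strain_b)
print(positions)  # []
print(genes)      # ['geneA']
```

In the toy data no position is shared, while one gene carries a mutation in both strains at different sites, which is the pattern described in the text for nine of the 344 genes.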

To answer the third question, one should estimate how many mutations occurred in the strain AB1157 after the isolation of its derivatives AB2463 and AB2494. Obviously, during the 56 years since its isolation in 1962, the strain AB1157 must have undergone genetic changes. For this reason, to estimate the number of spontaneous changes, we compared the genomes of two lineages of AB1157 that were maintained in different laboratories throughout this period (1962–2018) and have entirely different histories of storage and incubation. The two lineages of AB1157 differed by only eight mutations (three in AB1157 and five in AB1157 alt), all of which occurred in different genes. The number of differences between the two lineages of AB1157 was one order of magnitude lower than that between AB1157 and its derivatives obtained after NG mutagenesis. It is impossible to determine the fraction of spontaneous mutations that arose in AB2463 and AB2494 after treatment of AB1157 with NG, but one can assume that it is relatively small.

CONCLUSIONS

(1) Previously unknown mutations corresponding to widely known genetic markers were annotated in the E. coli genomes.

(2) The hundreds of differences between the AB2463 recA13 and AB2494 lexA1 genomes, on the one hand, and the parent strain AB1157, on the other hand, were mainly NG-induced.

(3) Mutations in the AB2463 and AB2494 genomes are distributed randomly; no hot spots or hot regions were observed.

(4) The positions of mutations in the genome of AB2463 did not coincide with those found in the AB2494 or AB1885 genomes.

(5) During the period from the divergence of the AB1157 and AB1157 alt clones until sequencing (56 years), eight mutations occurred in their genomes, which we consider to be spontaneous.

(6) During the period since the divergence of E. coli K12 (NCBI) and AB1157, 120–122 changes occurred in their genomes.