8.1 Introduction

Humans have been infected with M. tuberculosis (Mtb) for millennia. Mtb, the intracellular pathogen that causes tuberculosis (TB), was discovered in 1882 by Robert Koch and is responsible for more human deaths than any other single pathogen today (Kaufmann et al. 2010; Kaufmann and Winau 2005; Ottenhoff 2009). Mycobacterium bovis Bacillus Calmette-Guérin (BCG), an attenuated strain of M. bovis, has been used as a prophylactic measure against TB for nearly a century to immunize over four billion individuals in more than 180 countries or territories (McShane 2011; Ottenhoff and Kaufmann 2012; Zwerling et al. 2011). Over 90% of children worldwide are vaccinated with BCG and more than 120 million doses of BCG are administered annually, making it the world’s most widely used vaccine. Given this large denominator, it is possible to infer that there is a remarkable safety record. Unfortunately, despite the large numbers of individuals who have received BCG, in both programmatic settings and in clinical trials, the true efficacy of BCG has been difficult to understand due to many experimental variables (Griffin et al. 2001).

The parent BCG vaccine strain was derived from a virulent strain of M. bovis, a primary cause of TB in cattle and a member of the M. tuberculosis complex that mainly affects wild and domesticated mammals (see Chap. 7) (Frothingham et al. 1994; Imaeda et al. 1985). Case-control studies have shown that BCG is associated with protection against childhood disseminated TB, including meningitis and miliary TB (Colditz et al. 1995; Trunz et al. 2006). However, its efficacy against pulmonary TB in adults, as measured in randomized controlled trials, has varied from no efficacy at all to as high as 80% protection (Fine 1995). It was noted as far back as 1967 that the protective efficacy of BCG against TB varies substantially between studies; on average, BCG reduces the risk of pulmonary TB by 50% and of disseminated and meningitic forms of the disease by 70–80% (Brewer 2000; Colditz et al. 1995; Trunz et al. 2006). There are several reasons for this variation in BCG vaccine effectiveness, including differences in host populations, differences in their exposure to environmental mycobacteria, and methodological differences between the studies. In this chapter, we focus on differences between the strains of BCG vaccine in use, and consider the possibility that the heterogeneity among these strains has contributed in part to the heterogeneous results from clinical trials (Behr 2002). Several BCG strains are currently used worldwide as vaccines. Early clinical trials in indigenous groups in North America, infants in Chicago and school children in the UK demonstrated the efficacy of the vaccine and led to its distribution to several countries for worldwide application (Aronson et al. 1958; Rosenthal et al. 1961). Apart from BCG, no other vaccine is available for protection against TB, and of the many new candidates in the pipeline, none is close to market use. Therefore, it is critical to examine the product known as BCG and to ask whether evolution and strain variability in BCG may have a role in determining the capacity to provide protective immunity against TB.

8.2 Historical Aspects of BCG

BCG is named for Albert Calmette and Camille Guérin (Bacillus of Calmette and Guérin), who derived the original BCG vaccine strain from an isolate of M. bovis at the Pasteur Institute of Lille, France, in the early part of the twentieth century. Calmette and Guérin had begun their research in 1900; the M. bovis strain they worked with had been isolated in 1902 by Nocard, at the Pasteur Institute of Paris, France, from the milk of a cow suffering from tuberculous mastitis. Calmette and Guérin used this isolate to study the pathogenesis of bovine TB (Grange et al. 1983; Oettinger et al. 1999). In early studies, Calmette reported that oral inoculation of M. bovis resulted in pulmonary TB, through lymphatic spread from the mesenteric to the mediastinal lymph nodes. This finding presented a public health challenge because, at the time, the predominant route of acquisition of pulmonary TB was through respiratory aerosols. To produce inocula for these experiments, Calmette and Guérin cultivated the tubercle bacilli on a glycerol-soaked potato medium, but they found it difficult to produce a homogeneous suspension because the bacteria tended to grow in clumps in vitro. To minimize clumping and obtain homogeneous suspensions for infection, they added ox bile to the potato slices soaked in glycerol. To their surprise, they observed alterations in colony morphology within a few months of growth on this new medium, and when injected into guinea pigs, the bacilli were less virulent than Nocard’s original M. bovis. This fortuitous observation became the starting point of a long-term project to produce a vaccine from these attenuated tubercle bacilli (Calmette 1922; Gheorghiu et al. 1983).

Appreciating the importance of reduced virulence for vaccine development, Calmette and Guérin continued the serial in vitro sub-culturing of Nocard’s M. bovis strain on potato slices soaked in ox bile and glycerol at three-week intervals for 13 years (1908–1921), for a total of 231 passages (Corbel et al. 2004). When administered at different doses and by different routes, the lab-adapted Nocard M. bovis strain was well tolerated and failed to produce progressive TB in different animal models, including guinea pigs, cows, horses, hamsters, mice, rabbits, dogs, chickens, and non-human primates (Sakula 1983). Nevertheless, these cultures maintained the clumped morphology and the same physical properties, and exhibited continued immunogenicity in animal models, i.e., guinea pigs, cattle, mice and chimpanzees. Furthermore, BCG vaccination protected cows against challenge with virulent M. bovis. These results established the safety and efficacy of BCG vaccine in experimental animals. At Guérin’s suggestion, they named it Bacille Bilié Calmette-Guérin; later they omitted “Bilié” and so BCG was born (Calmette 1922).

In 1921, the BCG strain was used for the first time as a human vaccine at the request of Benjamin Weill-Hallé, a French pediatrician and bacteriologist. Weill-Hallé wanted to protect an infant whose mother had died of TB a few hours after childbirth and who was now under the care of a grandmother suffering from TB, at the Charité Hospital, Paris (Bryder 1999; Sakula 1983). On July 18th, 1921, Weill-Hallé, assisted by Raymond Turpin, orally administered the culture of the lab-adapted Nocard’s M. bovis strain in three doses of 2 mg each (6 mg total; ∼2.4 × 10⁸ bacilli). On follow-up, there were no serious side effects, and the child did not develop any sign of TB. Based on this first anecdotal success, additional newborns were vaccinated over the next year and no ill effects were reported. By 1924, they were able to report a series of more than 660 oral BCG vaccinations of infants (Calmette et al. 1924). For the first time, a safe and apparently effective vaccine was available for protection against human TB. The Pasteur Institute of Lille started mass production of the BCG vaccine for medical applications.

As early as 1924, the original culture of the BCG strain was sub-cultured and distributed to several laboratories throughout the world (Oettinger et al. 1999). These cultures were further propagated on non-synthetic culture media that varied around the world, and were used for local vaccine production. This propagation of BCG on different culture media, at times following different passaging schedules, led to its diversification into a number of genetically distinct BCG sub-strains (daughter strains) (Liu et al. 2009). The first documented distribution of a daughter strain was BCG-Russia, obtained in 1924 (Dubos and Pierce 1956). Until lyophilisation and the production of seed-lots in 1961, BCG-Pasteur was maintained by continual serial passage and so continued to evolve in vitro, with daughter strains obtained directly or indirectly from the Pasteur Institute of Lille. After BCG Russia, records of BCG transfers between laboratories show that BCG Moreau and Japan (Tokyo-172) were obtained in 1925 (Behr and Small 1999; Obayashi 1955), Sweden in 1926 and Birkhaug in 1927 (Lind 1983; Wallgren 1928). Over the years, more than 14 sub-strains of BCG have evolved and have been used as BCG vaccine strains in different parts of the world. The main strains or seed-lots currently in use are BCG Pasteur 1173 P2 (lyophilized in 1961 after 1173 serial passages (Gheorghiu et al. 1983)), BCG Danish 1331, BCG Glaxo 1077 (derived from the Danish strain), and BCG Moreau RDJ. The historical records of BCG dissemination separate “early strains” that were obtained in the early 1920s (BCG Russia, BCG Tokyo, BCG Moreau, BCG Sweden, BCG Birkhaug) from “late strains” that were obtained from the Pasteur Institute after 1927 (Fig. 8.1). The distinction between early strains and late strains coincides with reports of ongoing attenuation of BCG in the late 1920s (Dreyer and Vollum 1931), and is correlated with severely reduced production of the antigenic proteins MPB70, MPB83 and MPB64 in late strains (Milstien and Gibson 1990; Wiker et al. 1996). However, whether this historical distinction is critical for BCG phenotypes remains uncertain, as certain properties, such as production of virulence lipids, have been independently lost in both early and late strains (Chen et al. 2007). Furthermore, the relevance of an early versus late dichotomy in terms of protective efficacy in humans cannot be ascertained, as randomized clinical trials of BCG vaccination employed only late strains.

Fig. 8.1

Revised historical genealogy of BCG strain dissemination. The vertical axis represents time. The horizontal axis denotes different geographic locations of BCG propagation. Strains obtained before 1927 are labelled as “early strains”; strains obtained in 1931 or later are indicated as “late strains” (Adapted from a figure by Behr et al. 1999)

8.3 Phylogeny of BCG

It is fundamental to recognize that the currently available BCG strains have undergone two phases of genomic modification. The initial phase (1908–1921) comprises the 231 in vitro passages conducted by Calmette and Guérin to produce the lab-adapted Nocard’s M. bovis strain (the original vaccine). The second phase starts circa 1924 with the widespread use and distribution of cultures of this lab-adapted strain. It ends several decades, and hundreds of passages, later (1961 for BCG-Pasteur 1173 P2, but different years for different daughter strains) with the establishment of frozen seed lots. Modifications acquired during the initial phase are expected to be shared by all BCG daughter strains (BCG versus virulent M. bovis), whereas the second phase should have given rise to additional genomic modifications that are specific to individual BCG strains and lineages. During the past two decades, several studies on BCG daughter strains have demonstrated changes at the genome level, using a variety of comparative genomic techniques, including subtractive hybridization, BAC libraries, spotted oligonucleotide arrays, microarray-based resequencing, and whole genome sequencing. These studies have not only documented differences between BCG and M. bovis, but have also documented additional genomic modifications that apparently were not present in the original culture obtained by Calmette and Guérin (Abdallah et al. 2015; Behr et al. 1999; Brosch et al. 2000, 2007; Gordon et al. 1999; Mahairas et al. 1996; Mostowy et al. 2003; Salamon et al. 2000). The extensive genotypic diversity uncovered in BCG daughter strains includes regions of difference (RDs), single nucleotide polymorphisms (SNPs), insertion sequences (IS6110), deletions and tandem duplications.

Notably, because the original strain of M. bovis used to derive BCG (the Nocard strain) was lost during the First World War, genomic studies compared BCG vaccines to a variety of different M. bovis strains until the sequence of the M. bovis AF2122/97 strain was completed in 2003 (Garnier et al. 2003). As a consequence, certain differences uncovered between a particular BCG strain and the chosen M. bovis strain could represent variation among M. bovis strains rather than BCG-specific differences. For instance, most circulating strains of M. bovis, including the sequenced strain AF2122/97, have only one copy of the IS6110 element. A straightforward comparison with most BCG strains might suggest that this was the ancestral state of BCG. However, a second copy of the IS6110 element has been shown to be present in certain BCG daughter strains, and these are limited to BCG strains obtained by 1925 (BCG Russia, Moreau, and Japan). While one potential explanation would be that a second IS6110 element was introduced independently into these strains, the finding that each of them has the IS6110 element at the position corresponding to base pair 851,592 of the M. tuberculosis H37Rv genome suggests that the ancestral M. bovis which gave rise to BCG had two copies of IS6110, with one IS6110 element deleted in 1925–1926 (Mostowy et al. 2003).

Based on the available data, it is now well defined which genetic characteristics are shared across all BCG daughter strains relative to M. bovis and may therefore be directly involved in the attenuation of BCG, and which genetic variations are specific to only certain BCG daughter strains and may account for variation in protective efficacy and for over-attenuation of certain BCG daughter strains. RD1 is the best-known example of a genomic feature that is common to all BCG daughter strains and has been shown to be implicated in virulence. Deletion of RD3 and Del_Mb2377c most likely also occurred during the first attenuation period (1908–1921), as these regions are absent in all BCG daughter strains (Liu et al. 2009). The variations that were observed for only certain daughter strains of BCG consist of deletions, duplications, and point mutations, and probably occurred during the following period of divergence (1921–1966). Of note, RD2 was deleted only from daughter strains derived from the BCG Pasteur strain after 1927, while nRD18 is deleted only in strains obtained after 1933.

Based on the RD1 and RD2 deletions among BCG daughter strains, two main groups of BCG have been suggested. The first group includes BCG Russia, Moreau, Japan, Sweden and Birkhaug, which were distributed from the Pasteur Institute between 1921 and 1926 and have only the RD1, RD3 and Del_Mb2377c deletions. The second group, distributed after 1927, includes BCG Prague, Glaxo, Merieux, Danish, Frappier, Connaught, Tice, Mexico, China, Phipps and Pasteur, and has the deletions of the first group as well as the RD2 deletion. Based on the DU2 tandem duplication marker, each of these two groups has been further divided into two DU2 types: BCG Russia, Moreau and Japan form DU2 group I; BCG Sweden and Birkhaug form DU2 group II; BCG Prague, Glaxo, Merieux and Danish form DU2 group III; and BCG Frappier, Connaught, Tice, Mexico, China, Phipps and Pasteur form DU2 group IV (Brosch et al. 2007). A molecular phylogeny based on these data has been established and is generally consistent with the historical records of BCG dissemination (Fig. 8.2).
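The grouping described above can be written down as a small lookup table. The following Python sketch is only an illustration of that scheme: the group memberships and deletion names are transcribed from the paragraph, whereas the names `DU2_GROUPS`, `SHARED_DELETIONS`, `LATE_GROUPS` and `classify` are hypothetical and do not come from any published tool.

```python
# Illustrative sketch; memberships are transcribed from the text above.
# All identifiers below are assumptions made for this example.
DU2_GROUPS = {
    "DU2-I": ["Russia", "Moreau", "Japan"],
    "DU2-II": ["Sweden", "Birkhaug"],
    "DU2-III": ["Prague", "Glaxo", "Merieux", "Danish"],
    "DU2-IV": ["Frappier", "Connaught", "Tice", "Mexico",
               "China", "Phipps", "Pasteur"],
}

# Deletions reported as common to all BCG daughter strains.
SHARED_DELETIONS = {"RD1", "RD3", "Del_Mb2377c"}

# DU2 groups III and IV make up the second (RD2-deleted) main group.
LATE_GROUPS = {"DU2-III", "DU2-IV"}


def classify(strain):
    """Return the DU2 group, early/late category and expected deletions."""
    for group, members in DU2_GROUPS.items():
        if strain in members:
            is_late = group in LATE_GROUPS
            deletions = set(SHARED_DELETIONS)
            if is_late:
                deletions.add("RD2")
            return {
                "strain": strain,
                "du2_group": group,
                "category": "late" if is_late else "early",
                "deletions": sorted(deletions),
            }
    raise KeyError(f"strain not listed in the text: {strain}")


if __name__ == "__main__":
    for name in ("Russia", "Birkhaug", "Danish", "Pasteur"):
        print(classify(name))
```

The table encodes only the markers named in this paragraph; strain-specific deletions, SNPs and duplications discussed later in the chapter are deliberately left out.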

Fig. 8.2

Revised genealogy of BCG vaccines. Evolutionary scheme of BCG vaccine strains (from Abdallah et al. 2015), displaying the original virulent M. bovis ancestor strain and the subsequent series of genomic alterations, including deletions of regions of difference (RD), SNPs and some strain-specific insertions (‘Ins’) and deletions (‘Δ’)

8.4 Molecular Evolution of BCG Daughter Strains Between 1908 and 1921

During continuous in vitro passage between 1908 and 1921, BCG lost 38 open reading frames, which were identified using subtractive hybridization (Mahairas et al. 1996), BAC libraries (Gordon et al. 1999), and spotted oligonucleotide arrays (Behr et al. 1999). These deleted genes include the RD1 region. The suggestion that the loss of RD1 contributes to BCG attenuation has been confirmed by subsequent studies. RD1 encodes the ESX-1 protein secretion system, which is one of the five type-VII secretion systems found in the M. tuberculosis genome (Abdallah et al. 2007; Behr and Sherman 2007). RD1 is 9.5 kilobases (kb) in length and comprises nine genes, including the genes that encode the secreted proteins ESAT-6 (early secreted antigenic target of 6 kDa) and CFP-10 (culture filtrate protein of 10 kDa). Both of these proteins are important T-cell antigenic targets and are essential for the virulence of M. tuberculosis. The first experimental evidence for such a contribution was obtained when the BCG Pasteur strain was complemented with the RD1 locus. The recombinant BCG strain was more virulent in severely immunodeficient mice than the BCG Pasteur strain (Pym et al. 2003). In other studies, deletion of RD1 from virulent M. bovis and M. tuberculosis strains resulted in ΔRD1 mutants that were significantly attenuated for virulence in both immunocompromised and immunocompetent mice (Hsu et al. 2003; Lewis et al. 2003). In a separate approach, individual genes in the RD1 locus were identified as virulence factors for M. tuberculosis (Guinn et al. 2004; Hsu et al. 2003; Stanley et al. 2003). The disruption of these individual genes also resulted in the attenuation of virulence in mice. Collectively, these studies provide convincing evidence that RD1 plays a major role in virulence and that its deletion is a key mechanism of BCG attenuation, although they do not exclude a contribution of additional genetic lesions to the loss of virulence described by Calmette and Guérin between 1908 and 1921. Since it is common to all BCG strains, the loss of RD1 likely occurred in the initial stage of BCG attenuation.

A recent comparative genome analysis of multiple BCG daughter strains using whole genome sequencing has uncovered a 103 bp deletion that is present across all BCG daughter strains and eliminates the distal end of hspR (Abdallah et al. 2015). This gene is involved in transcriptional regulation (repression) of heat shock proteins and is known to impact virulence. The hspR locus activates a subset of the heat-shock general stress response upon macrophage invasion (Stewart et al. 2002), and is necessary in the persistent phase, since strains with an hspR deletion (ΔhspR) exhibit attenuated growth in chronic infection (Stewart et al. 2001). Furthermore, the same study identified a duplication of a 2900 bp segment, spanning the region between 1,276,501 and 1,279,400 base pairs (M. bovis coordinates), in all BCG daughter strains (Abdallah et al. 2015).
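As a quick consistency check on the coordinates quoted above, the stated length of 2900 bp is recovered if both endpoints are counted as part of the segment. The short Python sketch below makes that assumption explicit; the inclusive, 1-based convention and the helper name `span_length` are assumptions for illustration and are not stated in the source.

```python
def span_length(start, end):
    """Length in bp of a segment whose start and end are both included.

    Assumes 1-based, inclusive genome coordinates (an assumption made
    for this sketch; the text does not state the convention used).
    """
    if end < start:
        raise ValueError("end coordinate must not precede start")
    return end - start + 1


# Coordinates of the duplicated segment as quoted in the text
# (M. bovis coordinates).
DUP_START = 1_276_501
DUP_END = 1_279_400

if __name__ == "__main__":
    print(span_length(DUP_START, DUP_END))  # prints 2900
```

Under a half-open convention the same coordinates would give 2899 bp, so the quoted length is consistent with inclusive endpoints.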

Although the most commonly cited reason for the primary attenuation of BCG compared to M. bovis is the loss of the RD1 locus, complementation of BCG with this region does not fully restore virulence to wild-type levels (Pym et al. 2003), and the RD1 deletion mutant of M. tuberculosis is still more virulent than BCG in long-term murine infection experiments (Sherman et al. 2004). This leads to speculation that additional genetic lesions that have occurred in BCG also contribute to its attenuation. Whole genome sequence comparison revealed 736 single nucleotide polymorphisms (SNPs) between BCG-Pasteur 1173P2 and M. bovis AF2122/97 (Brosch et al. 2007). Recent comprehensive studies of SNPs between M. bovis and BCG daughter strains distinguished two types of SNPs (Abdallah et al. 2015; Pelayo et al. 2009): most are common to the entire BCG lineage and could therefore have played a role in the early evolution of BCG, whereas some are specific to individual BCG strains. This suggests that loss of virulence during the initial 231 in vitro passages of BCG involved both the loss of RD1 and other mutations (the aforementioned deletion and duplication, plus SNPs), which together likely contributed to the attenuation of virulence during the derivation of BCG.

Of the SNPs that are common to all BCG daughter strains, some have been explored in greater depth, but further study is required. For example, SNPs were identified in two genes predicted to be involved in glycerol catabolism, glpK (codon 191) and pykA (codon 220). glpK encodes glycerol kinase, the enzyme which catalyzes the MgATP-dependent phosphorylation of glycerol to yield sn-glycerol 3-phosphate, the rate-limiting step in glycerol utilization in E. coli (Zwaig et al. 1970). Remarkably, this polymorphism is present in M. bovis AF2122/97 and causes a frameshift in the glpK gene at codon 191, leading to a truncated coding sequence (Garnier et al. 2003). Notably, the glpK frameshift is not evident in the BCG daughter strains or in the M. bovis AN5 strain, all of which can grow on glycerol as the sole carbon source (Keating et al. 2005). The second SNP is a nucleotide substitution in pykA, the gene that encodes pyruvate kinase, an enzyme that catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate to pyruvate. The pykA SNP results in the substitution of glutamic acid 220 by aspartic acid and renders pyruvate kinase nonfunctional. This highly conserved glutamic acid residue is predicted to play an important role in the active site of pyruvate kinase and has been associated with cofactor (Mg++) and substrate (ADP/ATP, PEP) binding (Munoz and Ponce 2003). As with the glpK SNP, the pykA mutation does not occur in the BCG daughter strains or M. bovis AN5 but is present in M. bovis AF2122/97, and it was suggested to account for the inability of M. bovis to grow on glycerol as a sole carbon source (Keating et al. 2005). This suggests that the progenitor of the BCG daughter strains (Nocard’s M. bovis) and the M. bovis AN5 strain had intact glpK and pykA coding sequences.

Another study implicated one of the first BCG polymorphisms described, in Mb3700, a gene encoding a transcriptional regulator of the cyclic AMP (cAMP) receptor protein (CRP)-fumarate and nitrate reduction regulator (FNR) family. It was proposed that this polymorphism could affect the DNA binding activity of this putative global transcriptional regulator and could therefore contribute to the attenuation of BCG strains. Further studies demonstrated that this point mutation, which results in an amino acid substitution of glutamic acid by lysine (E178K) and is present in all BCG daughter strains examined, altered DNA binding of CRP to target sites as well as global gene expression, without playing any role in the attenuation of BCG (Bai et al. 2007; Hunt et al. 2008). Although these lesions have not been directly associated with the original attenuation of BCG, it remains possible that the accumulation of multiple lesions, including SNPs, between 1908 and 1921 may have resulted in a complex effect that is yet to be fully resolved.

8.5 Molecular Evolution of BCG Daughter Strains After 1924

Dissemination of BCG from the Pasteur Institute to various parts of the world began in 1924. Subsequent to the original derivation of BCG, strains were passaged in a variety of non-synthetic media that relied on variable ingredients such as local potatoes. As these laboratory conditions were not uniform, it was perhaps not surprising that, by the 1940s and 1950s, vaccine producers had recognized the emergence of BCG daughter strains with distinct morphological, biochemical and immunological properties. Thanks to genomic studies, it is now known that BCG strains differ both from the original BCG of 1921 and from each other, due to deletions, SNPs and duplications. Thus there is no “ancestral” BCG in existence. For instance, BCG Pasteur may be considered the reference strain of BCG vaccines, but the sequenced BCG Pasteur 1173 P2 is separated from the BCG of 1921 by a number of genetic events (e.g., deletion of IS6110, RD2, nRD18, and RD14; SNPs in mmaA3, sigK, and crp_L47P; duplication of DU1) (Fig. 8.2). Evidence to suggest a closer relationship between BCG Russia and the original BCG progenitor strain emerged from the discovery that BCG Russia is a natural recA mutant (Keller et al. 2008). Because a recA mutation might result in fewer mutations, it was speculated that this mutation might have kept BCG Russia in a state ‘closer’ to the 1921 ancestor of all BCG strains. Indeed, upon determining the complete genome sequence of a panel of BCG strains, it was shown that BCG Russia contains fewer SNPs and deletions than other BCG strains (Abdallah et al. 2015).

A critical divergence in the evolution of BCG was the loss of the 10.8 kb RD2 region during the ongoing propagation of BCG between 1927 and 1931, a time that coincides with reports of the ongoing attenuation of the vaccine. This led to “early” (RD2 present) and “late” (RD2 absent) BCG strains. The role of RD2 in virulence was evaluated by a targeted knockout in the M. tuberculosis reference strain H37Rv, which showed that the deletion mutant was attenuated in murine models of infection (Kozak et al. 2011). Moreover, some of the genes encoded by RD2 stand out as candidates for the ongoing attenuation of BCG daughter strains in the laboratory. The deletion of RD2, which contains mpb64, the gene encoding the antigenic protein MPB64, accounts for the lack of MPB64 in the late strains. Complementation of the BCG daughter strain Pasteur with the mpb64 gene improved the immunogenicity of the vaccine strain but did not improve protection against pulmonary TB (Kozak and Behr 2011).

In addition to RD2, several other genomic polymorphisms that are common to all strains obtained from the Pasteur Institute after 1927 have been investigated, including point mutations that may have contributed to the later evolution of BCG. For instance, the protein antigens MPB70 and MPB83 are produced at high levels in early strains (acquired before 1927) but are present only in trace quantities in late BCG strains (Milstien and Gibson 1990). Follow-up studies using immunoblot and quantitative reverse transcription polymerase chain reaction (RT-PCR) separated BCG into high- and low-producing strains, and determined that transcription of the antigen-encoding genes, mpb70 and mpb83, follows the same strain pattern, with mRNA levels reduced over 50-fold in low-producing strains (Charlet et al. 2005). Using transcriptomic analysis, Charlet et al. identified two regions of the genome with dysregulated gene expression between high-producers and low-producers: the set of genes including mpb70 and mpb83, and a distant set of genes including the gene coding for Sigma Factor K. DNA sequence analysis showed a point mutation in the sigK gene, and gene complementation studies then formally implicated this mutation in the decreased expression of MPB70 and MPB83 in the later strains (Charlet et al. 2005).

Strains obtained from the Pasteur Institute prior to 1927 produce methoxymycolates, a subclass of cell wall mycolic acids, in vitro, but those obtained later cannot synthesize methoxymycolates. This phenotype has been attributed to a point mutation in the mmaA3 gene, which encodes methoxy mycolic acid synthase 3 and is responsible for O-methylation of hydroxymycolate precursors to form methoxymycolic acids. This base substitution at position 293 in mmaA3 results in an amino acid change from glycine to aspartic acid and abolishes methoxymycolate production (Behr et al. 2000). It has also been postulated that this SNP leads to low-level isoniazid (INH) resistance (Abdallah et al. 2015), as INH is known to inhibit the synthesis of α-mycolate, methoxymycolate and β-mycolate (Takayama et al. 1972). Intriguingly, the loss of methoxymycolate appears to have no impact on the virulence of late BCG strains (Belley et al. 2004). Furthermore, several predicted regulators have been disrupted during the serial passage of BCG strains. The sigI gene (possible alternate RNA polymerase sigma factor SigI) encoded in nRD18 is missing from BCG strains obtained after 1933. Additionally, RD14, which encodes Mb1802, annotated as a probable transcriptional regulatory protein, is deleted from BCG-Pasteur (Behr et al. 1999) and some strains of BCG-Phipps (Abdallah et al. 2015); this predicted role as a transcriptional regulator was subsequently confirmed with promoter fusion assays (Alexander and Behr 2007). Finally, Mb3439c, a gene encoded in RD16 and also annotated as a possible transcriptional regulator, has been disrupted by independent deletions in BCG-Moreau and BCG-Japan. The loss of regulatory genes from a number of different BCG strains argues that mutation of regulatory genes can be tolerated under conditions of laboratory growth.
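The mmaA3 substitution lends itself to a short worked check using only the standard genetic code. The Python sketch below assumes that "position 293" is a 1-based nucleotide position counted from the first base of the coding sequence; that numbering convention is an assumption made here, not something stated in the text, and the function names are hypothetical. Under this assumption the sketch locates the codon containing the position and lists which single-base changes can turn a glycine codon into an aspartic acid codon.

```python
# Standard genetic code (DNA alphabet) for the two amino acids involved.
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}
ASP_CODONS = {"GAT", "GAC"}


def codon_and_offset(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Assumes numbering starts at the first base of the start codon.
    """
    index = nt_pos - 1
    return index // 3 + 1, index % 3 + 1


def single_base_changes(src_codons, dst_codons):
    """List (from, to, position in codon) for codon pairs differing at one base."""
    changes = []
    for src in sorted(src_codons):
        for dst in sorted(dst_codons):
            diffs = [i for i in range(3) if src[i] != dst[i]]
            if len(diffs) == 1:
                changes.append((src, dst, diffs[0] + 1))
    return changes


if __name__ == "__main__":
    print(codon_and_offset(293))
    for change in single_base_changes(GLY_CODONS, ASP_CODONS):
        print(change)
```

Under the stated assumption, position 293 is the second base of codon 98, and every single-base route from glycine to aspartic acid is a G-to-A change at the second codon position, so the reported position and the reported amino acid change are mutually consistent.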

BCG-Japan, BCG-Moreau and BCG-Glaxo are naturally deficient in the lipid virulence factors phthiocerol dimycocerosates (PDIMs) and phenolic glycolipids (PGLs), whereas the other BCG strains do produce these lipids (Chen et al. 2007). Because some strains of BCG have been associated with higher rates of adverse effects, such as disseminated BCGosis, investigators have queried whether the presence or absence of these virulence factors might in part explain the variable rates of these events. Indeed, the loss of these lipids has been shown to correlate with the superior safety records of these strains in clinical studies (Chen et al. 2007; Lotte et al. 1984). Moreover, loss of PDIMs/PGLs from BCG Pasteur reduces its virulence and protective efficacy (Tran et al. 2016). Intriguingly, variation in PDIM and PGL production does not coincide with the genealogy of BCG strains (Abdallah et al. 2015; Brosch et al. 2007), suggesting that this particular phenotype has emerged multiple times and by multiple mechanisms. Indeed, in BCG-Moreau, the PDIM and PGL defect is likely due to a 975-bp deletion that affects fadD26 and ppsA (Leung et al. 2008), which are members of the PDIM and PGL biosynthetic locus (Azad et al. 1997; Leung et al. 2008). Similarly, a point mutation in ppsA is responsible for the lack of PDIMs/PGLs in BCG-Japan (Naka et al. 2011). However, this region is intact in BCG-Glaxo (Leung et al. 2008), indicating that the PDIM/PGL defect in BCG-Glaxo is caused by other, currently unknown genetic lesions. Together, this suggests that these three BCG strains independently acquired mutations in PDIM/PGL biosynthesis that contributed to their further attenuation.
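The argument that the PDIM/PGL defect does not follow the genealogy can be made concrete by tabulating the three deficient strains against their DU2 groups and reported lesions. The Python sketch below is only an illustration: the DU2 assignments are those given in Sect. 8.3 and the lesions are those cited in this paragraph, while the dictionary and function names are hypothetical.

```python
# DU2 group of each PDIM/PGL-deficient strain (as listed in Sect. 8.3).
DU2_GROUP = {
    "Japan": "DU2-I",
    "Moreau": "DU2-I",
    "Glaxo": "DU2-III",
}

# Lesion reported for each strain in the paragraph above.
# Descriptions are paraphrased from the text for illustration.
PDIM_PGL_LESION = {
    "Moreau": "975-bp deletion affecting fadD26 and ppsA",
    "Japan": "point mutation in ppsA",
    "Glaxo": "unknown (fadD26-ppsA region intact)",
}


def groups_spanned(strains):
    """Return the sorted set of DU2 groups covered by the given strains."""
    return sorted({DU2_GROUP[s] for s in strains})


def distinct_lesions(strains):
    """Return the number of different lesions reported across the strains."""
    return len({PDIM_PGL_LESION[s] for s in strains})


if __name__ == "__main__":
    deficient = sorted(PDIM_PGL_LESION)
    print(groups_spanned(deficient))
    print(distinct_lesions(deficient))
```

The deficient strains fall into more than one DU2 group, and even the two strains that share a group carry different lesions, which is the pattern expected if the phenotype arose independently rather than once in a common ancestor.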

Considering its prominent role in virulence, it may not be surprising that BCG strains exhibit a number of genetic polymorphisms in the phoP-phoR locus (Leung et al. 2008), a two-component system known to regulate the expression of multiple genes, including some well-established T-cell antigens (Walters et al. 2006). Of note, a frame-shift mutation within the phoP gene of BCG-Prague eliminates most of the C-terminal DNA-binding domain and makes this strain a natural phoP mutant (Gupta et al. 2006; Sinha et al. 2008; Wang et al. 2007); it is possible that this accounts for the strain’s reported low immunogenicity (Ladefoged et al. 1976; Vallishayee et al. 1974). Furthermore, the three early strains mentioned earlier, BCG-Russia, BCG-Japan and BCG-Moreau, have an IS6110 insertion in the promoter region of phoP (Leung et al. 2008), which may eliminate the auto-repression regulatory mechanism of this two-component system (Gupta et al. 2006). BCG-Sweden and BCG-Birkhaug contain a deletion that truncates the C-terminus of PhoR (Leung et al. 2008). In BCG-Danish, BCG-Glaxo and BCG-Frappier, frame-shift mutations in the phoR gene abolish the PhoR protein (Leung et al. 2008).

Unlike other BCG strains, two closely related BCG strains, BCG-Sweden and BCG-Birkhaug, contain deletions in three genes: whiB3, a transcriptional regulator implicated in virulence of M. bovis (Steyn et al. 2002) and M. tuberculosis (Singh et al. 2009); pks12, a polyketide synthase that is necessary for the CD1c-mediated T-cell response, so that its loss may affect the immunogenicity of the vaccine (Matsunaga et al. 2004); and trcR, the response regulator of the TrcR-TrcS two-component system that controls expression of the trcRS operon, including the mmpS5/mmpL5 transporter and the bfrB bacterioferritin (Wernisch et al. 2003). Intriguingly, deletion of trcS from M. tuberculosis produced a hypervirulent phenotype in SCID mice (Parish et al. 2003), suggesting a role for this gene in the attenuation of BCG. However, these deletions are not present in other BCG strains and therefore distinguish the BCG-Sweden and BCG-Birkhaug lineage from other early strains (e.g., BCG-Russia, BCG-Japan and BCG-Moreau).

Comparative genomics has also uncovered two large tandem duplications in BCG-Pasteur, DU1 and DU2, of 29 kb and 36 kb, respectively (Brosch et al. 2000, 2007). These seem to have arisen independently, as their presence and/or size varies between the different BCG daughter strains. While DU1 appears to be restricted to BCG-Pasteur, DU2 has been detected in all BCG daughter strains tested so far, but at variable size (Abdallah et al. 2015; Brosch et al. 2007). Interestingly, DU1 contains the oriC locus, the chromosomal origin of replication, together with several key genes involved in replication initiation and cell division, indicating that BCG-Pasteur is diploid for oriC (Brosch et al. 2000). For DU2, the tandem duplication resulted in diploidy for 30 genes, for which probable functions are known. These include aroA, a key enzyme in aromatic amino acid biosynthesis, and the coding sequences for a variety of regulatory proteins that could exert pleiotropic effects, including a histidine kinase, asnC and tetR homologues, whiB1, and sigH, a sigma factor implicated in the heat shock response (Fernandes et al. 1999). Chromosomal duplications are a common evolutionary response in bacteria exposed to different selection pressures in the laboratory and presumably in nature, as they provide a means for increasing gene dosage and for generating novel functions from potential gene fusion events at duplication endpoints. They also represent a source of redundant DNA for divergence. As such, the presence of DU1 and DU2 suggests that the process of tandem duplication in BCG is ongoing and remains a potent source of genome dynamics. However, the potential effect of these duplication events on the immunogenicity of BCG strains remains to be explored. Other duplications specific to certain BCG strains have also been uncovered, such as a 22-kb duplication present only in BCG-Tice (DU-Tice) (Leung et al. 2008). DU-Tice contains the entire ESX-5 secretion system, which is present only in pathogenic mycobacteria (Abdallah et al. 2007) and has been shown to transport cell envelope proteins that are required for nutrient uptake (Ates et al. 2015) and to modulate, directly or indirectly, the human macrophage response (Abdallah et al. 2008). Interestingly, sequencing of BCG-China repositioned this strain from the DU2-III group into the DU2-IV group that includes BCG-Pasteur, BCG-Phipps, BCG-Tice, BCG-Mexico, BCG-Connaught and BCG-Frappier. This analysis revealed that BCG-China was not originally derived from BCG-Danish, as previously thought, and that this inconsistency was likely attributable to multiple circulating strains of “BCG China” (Abdallah et al. 2015).

8.6 Implications for BCG Immunization

BCG vaccines are given to over a hundred million newborns each year, with the goal of safely preventing TB disease. Ideally, such a vaccine should (1) prevent disease, (2) rarely cause progressive vaccine-associated infection, and (3) if vaccine-associated infection does occur, leave it readily treatable. At present, despite all that is known from genomic studies, it is unclear whether any particular BCG strain is better or worse at meeting these criteria.

For protective efficacy, a lingering question is whether early strains, which produce antigens such as MPB64, MPB70 and MPB83, are more likely to protect against TB. However, there have been no randomized trials that have compared an early strain directly with a late strain of BCG; in fact, there have been no randomized controlled trials whatsoever of the early BCG strains. For safety, there have been reports that the early strains are associated with a greater risk of vaccine-associated infection (also known as BCG-osis). However, these reports were generally confined to before-and-after studies in countries that changed vaccine strain for political reasons. Formal demonstration that early BCG strains are more virulent, and therefore pose a greater risk in immunization schedules, is lacking. Finally, for treatment of rare cases of BCG-osis, it is worth noting that all BCG strains are resistant to pyrazinamide, as this is a genetic feature of M. bovis (Huard et al. 2006), and that some strains are apparently resistant to isoniazid (Abdallah et al. 2015; Kolibab et al. 2011). The degree of isoniazid resistance is considered low-level, such that some consider this not to be a clinical issue (Arend and van Soolingen 2011). BCG strains are also resistant to D-cycloserine, partly due to the G122S mutation in cycA (Chen et al. 2012). Interestingly, expression of a functional copy of cycA from M. tuberculosis or M. bovis increased the susceptibility of BCG-Pasteur to D-cycloserine, albeit not to the levels of M. bovis or M. tuberculosis (Chen et al. 2012), suggesting that other genetic lesions may contribute to BCG resistance to D-cycloserine. Nonetheless, it is not intuitive that we should be giving a live attenuated vaccine to millions of newborns, knowing that some of these vaccines are inherently resistant to two of the four antibiotics used to treat TB. Further study should aim to formally identify the mutation(s) responsible for isoniazid resistance, as there could be policy implications in knowing which strains might not be ideal for vaccination purposes.

Finally, BCG is used not only to prevent TB, but also as an immunotherapeutic for bladder cancer. The lessons of BCG evolution and strain variability have yet to affect the choice of BCG strains for this indication, yet many parallels can be imagined. How does BCG work in bladder cancer? What are the adverse effects? How is BCG-osis treated in an elderly, debilitated patient suffering from a malignancy? Further study of the BCG strain family, particularly with the goal of linking genotype to clinically meaningful phenotype, is required to find the best use of these bacteria given to us by the work of Calmette and Guérin.

8.7 Conclusion

More than 100 years ago, Albert Calmette and Camille Guérin began their research toward an anti-TB vaccine, which lasted 13 years and led to one of the most widely used vaccines in human history. Despite a century of investigation, the BCG vaccine continues to be controversial and remains the only vaccine for the prevention of TB. Shaped by human history, BCG has also evolved into the various daughter strains recognized today. These evolved strains differ from each other and from the original BCG first used in 1921, both genetically and phenotypically, such that these changes may translate into variable vaccine properties, including protective efficacy, tuberculin reactivity and propensity for adverse effects. Remarkably, the current collection of BCG strains comprises various natural mutants of established virulence factors. Continuing studies on the molecular factors that affect the properties of BCG vaccine strains should shed light on the specific differences among BCG strains and further our understanding of the mechanisms of attenuation specific to each lineage. This would be useful in further delineating BCG strains into phenotypic and genetic categories that may correlate with their protective efficacy.