Introduction

In spite of immense efforts to control it, tuberculosis (TB) remains a major public health issue, with an estimated 9 million new cases and 1.5 million deaths worldwide in 2013 (The WHO 2014). The emergence and rapid spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB pose additional threats to global TB control programs (Gandhi et al. 2010). The World Health Organization (WHO) ranked Thailand 18th among the 22 high TB burden countries in the world. With a population of around 67 million, Thailand was reported to have approximately 62,000 new and 110,000 prevalent TB cases in 2013. The situation is further complicated by the growing threat of MDR and XDR-TB: according to the WHO report, approximately 2 % of new TB cases and 19 % of previously treated cases in Thailand were MDR-TB (The WHO 2014). Despite this major public health problem, however, little is known about the genetic characteristics of the circulating isolates (Viratyosin et al. 2013; Coker et al. 2014). Information gained from whole genome sequencing (WGS) of MDR M. tuberculosis outbreak isolates could therefore improve our understanding of the epidemic in this region.

Accounting for about 50 % of all TB cases in Asia (van Soolingen et al. 1995; Parwati et al. 2010) and disseminated worldwide (Bifani et al. 2002; Glynn et al. 2002; European Concerted Action on New Generation Genetic Markers and Techniques for the Epidemiology and Control of Tuberculosis 2006), the M. tuberculosis Beijing family has gained considerable attention. It is considered virulent (Parwati et al. 2010) and is often associated with drug resistance (Drobniewski et al. 2005; Niemann et al. 2010; Casali et al. 2012) and large outbreaks (Bifani et al. 1999; Toungoussova et al. 2002, 2003; Ioerger et al. 2010; Golesi et al. 2013). A previous epidemiological study of TB in Thailand revealed a large outbreak of MDR-TB in Kanchanaburi Province during 2002–2010 (Jiraphongsa et al. 2011). Genotyping of 64 isolates collected from 2003 to 2008 by spoligotyping and 24-locus mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing grouped 54 isolates into a single cluster, indicating a clonal outbreak in the region. The strain was identified as a member of the Beijing family (Srilohasin 2013). Although these isolates had identical genotyping patterns, they were discordant for ethambutol resistance. Moreover, the extensive spread of this strain in the community suggested that it might harbor genetic determinants that aid efficient transmission.

A recent study in a high TB burden setting where the M. tuberculosis Beijing family was prevalent suggested a threshold of ≤5 single nucleotide polymorphisms (SNPs) to define strains in a transmission chain (Luo et al. 2014). However, the mutation rate of the M. tuberculosis Beijing family remains controversial (Werngren and Hoffner 2003; de Steenwinkel et al. 2012; Ford et al. 2013). For example, an in vivo experimental infection study in non-human primates under antibiotic therapy found that the Beijing family had a higher mutation rate than other lineages (Ford et al. 2013), and a high degree of genetic diversity was reported among serial M. tuberculosis Beijing isolates obtained from patients during the course of anti-TB therapy (Sun et al. 2012). In contrast, fluctuation analysis did not find any difference in mutation rate between Beijing and non-Beijing isolates (Werngren and Hoffner 2003). Additionally, a next generation sequencing (NGS) analysis of serial Beijing isolates recently found that 8–9 SNPs were acquired over a period of 3 years, suggesting a low mutation rate in the human host (Merker et al. 2013). Given these conflicting findings, it would be beneficial to investigate the microevolution of clinical isolates directly in each setting.

In this study, therefore, WGS was performed on the first isolate (isolated in 2003) and the last three isolates (isolated in 2008) of the defined cluster from our culture collection. WGS analysis supported clonal spread of the strain and suggested that the outbreak might be attributable to drug resistance-conferring mutations associated with low or no fitness cost. We also determined the SNPs accumulated by the isolates over the 5-year period and found the strain to be genetically very stable, with at most three unique SNPs relative to the first isolate. To the best of our knowledge, this study represents the first comprehensive analysis of MDR-TB isolates involved in a large community outbreak in Thailand.

Materials and methods

Selection of isolates

A retrospective cohort study by an epidemiological team identified 148 MDR-TB cases and confirmed a community outbreak of MDR-TB during 2002–2010 in Kanchanaburi Province, Thailand (Jiraphongsa et al. 2011). As a reference laboratory, we received specimens collected from 2003 to 2008 for drug susceptibility testing (DST). Spoligotyping and 24-locus MIRU-VNTR typing of all 64 isolates were performed (Kamerbeek et al. 1997; Supply et al. 2006; Srilohasin 2013), and 54 isolates clustered together, suggesting a clonal outbreak of MDR-TB in this cohort. The spoligotyping and MIRU-VNTR patterns of the isolates corresponded to SIT-1, MIT-17, and VIT-70 of the modern Beijing sublineage in the SITVITWEB database (Demay et al. 2012; Srilohasin 2013). The first isolate (DS-5538; isolated in August 2003) and the last three isolates of the cluster (DS-17355, DS-17471, and DS-17472; isolated in March 2008) were selected from our culture collection. Isolates DS-5538, DS-17355, and DS-17471 showed identical DNA fingerprints and phenotypic DST results and were selected to observe the cumulative genetic changes over the 5-year period. Of the 54 clustered isolates, 23 were reported to be susceptible to ethambutol. Isolate DS-17472, which was reportedly susceptible to ethambutol, was selected to determine whether it belonged to the same clone. All isolates were obtained from sputum specimens at the Drug-Resistant TB Laboratory, Department of Microbiology, Faculty of Medicine Siriraj Hospital, Thailand.

Phenotypic drug susceptibility testing

Susceptibility to first- and second-line anti-TB drugs was tested by the standard agar proportion method (WHO Geneva/IUATLD Paris 1998). The following drug concentrations were used: 0.2 mg/l isoniazid; 1.0 mg/l rifampicin and linezolid; 5.0 mg/l ethambutol and ethionamide; 6.0 mg/l amikacin and kanamycin; and 2.0 mg/l streptomycin, para-aminosalicylic acid, ofloxacin, ciprofloxacin, levofloxacin, moxifloxacin and gatifloxacin.
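In the agar proportion method, an isolate is called resistant to a drug when the colonies growing on medium containing the critical concentration amount to at least a critical proportion of the colonies on drug-free control medium. The text does not state the proportion used; the sketch below assumes the commonly used 1 %, and the function name and colony counts are hypothetical. It is a minimal illustration of the calling rule, not a laboratory protocol.

```python
# Critical concentrations (mg/l) used for agar proportion DST, as listed in the text.
CRITICAL_CONCENTRATIONS = {
    "isoniazid": 0.2,
    "rifampicin": 1.0,
    "linezolid": 1.0,
    "ethambutol": 5.0,
    "ethionamide": 5.0,
    "amikacin": 6.0,
    "kanamycin": 6.0,
    "streptomycin": 2.0,
    "para-aminosalicylic acid": 2.0,
    "ofloxacin": 2.0,
    "ciprofloxacin": 2.0,
    "levofloxacin": 2.0,
    "moxifloxacin": 2.0,
    "gatifloxacin": 2.0,
}

# Assumed critical proportion: growth on drug medium >= 1 % of control growth means resistant.
CRITICAL_PROPORTION = 0.01


def proportion_call(colonies_on_drug: int, colonies_on_control: int,
                    critical_proportion: float = CRITICAL_PROPORTION) -> str:
    """Return 'R' or 'S' from colony counts on drug-containing and drug-free media.

    Counts are assumed to be normalized to the same inoculum dilution.
    """
    if colonies_on_control <= 0:
        raise ValueError("control plate must show growth for a valid result")
    proportion = colonies_on_drug / colonies_on_control
    return "R" if proportion >= critical_proportion else "S"


if __name__ == "__main__":
    # Hypothetical counts: 150 colonies on control, 3 on 5.0 mg/l ethambutol (2 %).
    drug = "ethambutol"
    print(drug, CRITICAL_CONCENTRATIONS[drug], "mg/l ->", proportion_call(3, 150))
```

With these hypothetical counts (3 of 150 colonies, 2 %), the isolate would be read as ethambutol resistant; a single colony (about 0.7 %) would fall below the assumed 1 % cut-off.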

DNA isolation and whole genome sequencing

Selected isolates were subcultured on Lowenstein–Jensen medium for 4 weeks at 37 °C. DNA was extracted and purified using the cetyltrimethylammonium bromide (CTAB)-lysozyme method (Larsen et al. 2007). Sequencing was carried out at Macrogen Inc. (Seoul, South Korea). Genomic libraries were prepared according to the recommendations of the TruSeq DNA sample preparation kit (Illumina, San Diego, CA), and the library pools were subjected to paired-end sequencing on a HiSeq 2000 platform (Illumina), generating 100-bp reads.

Reads mapping, SNP calling and confirmation

Paired-end raw reads of each isolate were independently aligned to the M. tuberculosis H37Rv reference genome (GenBank accession number: NC_000962.2) using Bowtie 2 version 2.2.0 (Langmead and Salzberg 2012). Bedtools version 2.20.1 (https://github.com/arq5x/bedtools2) was used to determine read coverage over the reference genome. Aligned reads of each isolate were sorted and indexed, and the alignments were combined into a single mpileup file using SAMtools (Li et al. 2009). Single nucleotide variants (SNVs) were identified using VarScan 2.2.11 (Koboldt et al. 2012). To ensure SNV quality, bases with a Phred quality score of ≤20 and SNVs covered by fewer than 10 reads were discarded. Additionally, heterozygous SNVs with allele frequencies of <75 % that were present in all four isolates were discarded, as they most likely originated from mapping errors. The remaining variants were annotated using the H37Rv annotation (GenBank accession number: NC_000962.2) and classified as synonymous, nonsynonymous, or intergenic.
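Classifying a coding SNV as synonymous or nonsynonymous amounts to translating the reference and alternative codons and comparing the amino acids. The Python sketch below assumes that the affected codon (read on the coding strand) and the position of the changed base within it have already been extracted from the annotation; the function name is hypothetical and this is not the authors' in-house software. It uses the standard genetic code, whose amino acid assignments are shared by the bacterial translation table (table 11).

```python
BASES = "TCAG"
# Standard genetic code, first base outermost and third base innermost; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return 'synonymous' or 'nonsynonymous' for a single-base change in a codon.

    ref_codon is read 5'->3' on the coding strand; pos_in_codon is 0, 1 or 2.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


if __name__ == "__main__":
    # rpoB S450L is commonly reported as a TCG -> TTG change (Ser -> Leu).
    print(classify_coding_snv("TCG", 1, "T"))
    # A third-position change CTG -> CTC keeps leucine.
    print(classify_coding_snv("CTG", 2, "C"))
```

Intergenic SNVs never reach this function; they are labelled directly from their position relative to the annotated coding sequences.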

When short reads are analyzed, repetitive regions and paralogous gene families are known to be problematic because reads can map to multiple loci. Variants in PE, PPE, PE_PGRS, integrase, transposase and phage-related genes were therefore discarded (Comas et al. 2010). All filtering and annotation steps were performed using in-house software written in Ruby. All candidate SNV positions were then visually inspected in the Integrative Genomics Viewer (Thorvaldsdottir et al. 2013). Heterozygous variants with allele frequencies just above the threshold (between 75 and 80 %) in some isolates and <75 % in others were curated manually, because they were also likely to have arisen from mapping errors. SNV filtering and curation parameters were chosen on the basis of previous work on M. tuberculosis (Merker et al. 2013; Perez-Lago et al. 2014). All unique SNVs identified in each isolate were further validated by PCR amplification and Sanger sequencing.
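The filtering rules described in the two paragraphs above can be sketched in Python. This is not the authors' in-house Ruby software; it is a minimal illustration assuming that per-isolate VarScan calls have already been parsed into records with read depth, mean base quality and allele frequency. The record layout, field names and the regular expression for PE/PPE/PE_PGRS locus names are hypothetical, and the per-base Phred filter is simplified to a per-call mean.

```python
import re
from dataclasses import dataclass, field

MIN_DEPTH = 10           # SNVs covered by fewer than 10 reads are discarded
MIN_BASE_QUALITY = 20    # Phred <= 20 is discarded (simplified here to a mean per call)
HETERO_AF = 0.75         # calls below this allele frequency are treated as heterozygous
BORDERLINE_AF = 0.80     # calls in [0.75, 0.80) are borderline

# Repetitive or mobile gene families excluded from the analysis (hypothetical matching rules).
EXCLUDED_GENE = re.compile(r"^(PE_PGRS|PPE|PE)\d+")
EXCLUDED_PRODUCT_KEYWORDS = ("integrase", "transposase", "phage")


@dataclass
class IsolateCall:
    depth: int
    mean_base_quality: float
    allele_freq: float


@dataclass
class Site:
    position: int
    gene: str       # locus name, e.g. "rpoB"; empty string for intergenic sites
    product: str    # product description from the annotation
    calls: dict = field(default_factory=dict)  # isolate id -> IsolateCall


def in_excluded_family(site: Site) -> bool:
    product = site.product.lower()
    return bool(EXCLUDED_GENE.match(site.gene)) or any(
        k in product for k in EXCLUDED_PRODUCT_KEYWORDS)


def passes_quality(call: IsolateCall) -> bool:
    return call.depth >= MIN_DEPTH and call.mean_base_quality > MIN_BASE_QUALITY


def filter_site(site: Site, isolates: set) -> str:
    """Return 'discard', 'review' or 'keep' for one candidate SNV site."""
    if in_excluded_family(site):
        return "discard"
    calls = {iso: c for iso, c in site.calls.items() if passes_quality(c)}
    if not calls:
        return "discard"
    freqs = [c.allele_freq for c in calls.values()]
    # Low-frequency calls present in every isolate are treated as mapping artefacts.
    if set(calls) == isolates and all(f < HETERO_AF for f in freqs):
        return "discard"
    # Borderline calls in some isolates with low-frequency calls in others go to manual curation.
    if any(HETERO_AF <= f < BORDERLINE_AF for f in freqs) and any(f < HETERO_AF for f in freqs):
        return "review"
    # Cases not covered by the rules in the text are passed through unchanged.
    return "keep"
```

Sites returned as "review" correspond to the variants that, in the workflow described above, were inspected and curated manually in the genome viewer.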

Ethical approval

Ethical approval was obtained from the Institutional Review Board Committee of the Faculty of Medicine Siriraj Hospital, Mahidol University, Thailand (Protocol No. 811/2556; EC2). This article does not contain any studies with human participants or animals performed by any of the authors.

Results

Whole genome sequencing

All selected isolates were successfully sequenced, and the data were analyzed as described in "Materials and methods". On average, 40,049,303 reads were obtained per isolate, giving a mean sequencing depth of 916X against the H37Rv reference genome. An average of 98.99 % of the reads mapped successfully to the reference, and more than 99 % of the reference genome was covered by at least one aligned read. Alignment and coverage statistics are summarized in Table 1. SNVs remaining after filtering were used in subsequent analyses.
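As a plausibility check on the reported depth, the expected mean depth of a shotgun run is approximately C = N × L / G, where N is the number of reads, L the read length and G the genome length. The Python sketch below applies this to the average read count and 100-bp read length reported here; it assumes that the reported read count includes both mates of each pair and uses approximately 4.41 Mb for H37Rv (the exact length used is an assumption of the sketch).

```python
# Approximate H37Rv genome length (bp); treated as an assumption in this sketch.
H37RV_LENGTH_BP = 4_411_532


def expected_mean_depth(n_reads: int, read_length_bp: int, genome_length_bp: int,
                        mapped_fraction: float = 1.0) -> float:
    """Expected mean depth C = N * L / G, optionally scaled by the fraction of reads mapped."""
    return n_reads * mapped_fraction * read_length_bp / genome_length_bp


if __name__ == "__main__":
    # Average read count per isolate and 100-bp reads, as reported in the text.
    depth_all = expected_mean_depth(40_049_303, 100, H37RV_LENGTH_BP)
    depth_mapped = expected_mean_depth(40_049_303, 100, H37RV_LENGTH_BP, 0.9899)
    print(f"{depth_all:.0f}X (all reads), {depth_mapped:.0f}X (mapped reads only)")
```

The estimate is about 900X, close to the reported mean of 916X; an exact match is not expected, because the reported value is averaged over the actual per-isolate alignments rather than derived from this idealized formula.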

Table 1 Gross statistics of whole genome sequencing and mapping of the reads

The resulting FASTQ files from the four isolates were deposited in the NCBI Sequence Read Archive (SRA) under accession numbers SRX691156, SRX691468, SRX691500, and SRX691501 for isolates DS-5538, DS-17355, DS-17471, and DS-17472, respectively.

Genetic variations

Compared with the H37Rv reference, 1242 SNPs were shared by all isolates (Online Resource 1, Fig. 1). Of these, 162 were intergenic and 1080 were in protein-coding regions, of which 406 were synonymous and 674 nonsynonymous. Comparative SNP analysis identified three, three and two unique SNPs in DS-17355, DS-17471 and DS-17472, respectively (Fig. 1; Table 2). Nonsynonymous mutations were found in 113 of the 760 previously defined essential genes (Comas et al. 2010). Among the eight unique SNPs, only one was a nonsynonymous SNP in an essential gene, pks12 (Table 2).
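The shared and unique SNP counts summarized in Fig. 1 follow from set operations on the SNP positions called in each isolate: the intersection over all isolates gives the shared set, and subtracting the union of the other isolates gives each isolate's unique set. A minimal Python sketch with hypothetical positions (not the study data; the function name is also hypothetical):

```python
def shared_and_unique(snps_by_isolate: dict) -> tuple:
    """Return (SNPs shared by all isolates, {isolate: SNPs found only in that isolate})."""
    shared = set.intersection(*snps_by_isolate.values())
    unique = {}
    for isolate, snps in snps_by_isolate.items():
        others = set().union(*(s for o, s in snps_by_isolate.items() if o != isolate))
        unique[isolate] = snps - others
    return shared, unique


if __name__ == "__main__":
    # Hypothetical SNP positions relative to H37Rv, for illustration only.
    toy = {
        "isolate_1": {1001, 2002, 3003},
        "isolate_2": {1001, 2002, 3003, 4004},
        "isolate_3": {1001, 2002, 3003, 5005, 6006},
    }
    shared, unique = shared_and_unique(toy)
    print(len(shared), {k: len(v) for k, v in unique.items()})
```

SNPs present in some but not all isolates fall in neither set; in Fig. 1 no such SNPs are shown.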

Fig. 1

Venn diagram showing the distribution of SNPs among the isolates. A total of 1242 SNPs were shared by all isolates; three SNPs were unique to each of DS-17355 and DS-17471, and two were unique to DS-17472.

Table 2 Strain-specific SNPs identified in the present study

The genetic background of the isolates was determined from the WGS data. To assign the principal genetic group (PGG) (Sreevatsan et al. 1997), the alleles at katG codon 463 and gyrA codon 95 were determined. The presence of the katG R463L and gyrA S95T alleles in all isolates confirmed that they belonged to PGG-1. In silico inspection of the alignments in the Integrative Genomics Viewer for regions of difference (RDs) (Tsolaki et al. 2005; Dou et al. 2008) showed that the isolates belonged to RD-type 4. Furthermore, SNP-based determination of the sequence type (ST) (Filliol et al. 2006) placed all isolates in SNP cluster group (SCG)-2, ST-10. All findings, including spoligotyping and MIRU-VNTR typing, were concordant and confirmed that the isolates were clonal and belonged to a modern sublineage of the M. tuberculosis Beijing genotype.
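The PGG assignment of Sreevatsan et al. (1997) is a lookup on the amino acids at katG codon 463 and gyrA codon 95: Leu/Thr for PGG-1, Arg/Thr for PGG-2 and Arg/Ser for PGG-3. A minimal Python sketch (the function name is hypothetical):

```python
# Principal genetic groups keyed by (katG codon 463, gyrA codon 95) amino acids.
PGG_BY_ALLELES = {
    ("L", "T"): "PGG-1",   # katG 463 Leu (CTG), gyrA 95 Thr (ACC)
    ("R", "T"): "PGG-2",   # katG 463 Arg (CGG), gyrA 95 Thr (ACC)
    ("R", "S"): "PGG-3",   # katG 463 Arg (CGG), gyrA 95 Ser (AGC); includes H37Rv
}


def principal_genetic_group(katg_463: str, gyra_95: str) -> str:
    """Return the PGG for one-letter amino acid codes, or 'unassigned' for other combinations."""
    return PGG_BY_ALLELES.get((katg_463.upper(), gyra_95.upper()), "unassigned")


if __name__ == "__main__":
    # The outbreak isolates carry katG R463L and gyrA S95T relative to H37Rv.
    print(principal_genetic_group("L", "T"))
    # The H37Rv reference itself (Arg, Ser).
    print(principal_genetic_group("R", "S"))
```

Because variants are called against H37Rv (PGG-3), the reported katG R463L and gyrA S95T alleles translate directly into Leu and Thr at these codons, which the lookup maps to PGG-1.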

Drug resistance mutations

The isolates were tested for phenotypic susceptibility to first- and second-line drugs (Online Resource 1). All isolates were resistant to rifampicin, isoniazid and streptomycin, and all except DS-17472 were reported to be resistant to ethambutol. Well-known polymorphisms correlated with phenotypic drug resistance were observed in all isolates (Table 3).

Table 3 DST phenotypes and drug resistance-conferring mutations identified in the studied isolates

Rifampicin resistance is caused by specific mutations in rpoB, which encodes the beta subunit of RNA polymerase. The rpoB S450L mutation, present in all strains, accounts for rifampicin resistance; a nonsynonymous rpoB L731P mutation was also detected in all strains. No mutations were identified in rpoA or rpoC in any of the isolates, although mutations in these genes could otherwise have explained compensation for the fitness cost. Isoniazid resistance is complex and has been attributed to mutations in several genes, including katG, inhA, ahpC, and ndh. The nonsynonymous katG S315T mutation was observed in all four isolates and accounts for isoniazid resistance. Streptomycin resistance can result from mutations in rrs or rpsL, the most common being K43R in rpsL. All four isolates carried the rpsL K43R mutation, conferring streptomycin resistance. Ethambutol resistance is most likely caused by overexpression of, or structural variation in, embB. Ethambutol resistance in the isolates was conferred by the embB G406D mutation. However, isolate DS-17472, which was reported to be susceptible to ethambutol, also harbored the embB G406D mutation. Genes and their promoter regions known to be associated with resistance to anti-TB drugs and listed in the TB drug resistance mutation database (Sandgren et al. 2009) were also analyzed, but no other well-defined or novel mutation that could be correlated with drug resistance was found.
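The genotype-to-phenotype comparison can be sketched as a catalogue lookup followed by a comparison with the DST calls. The Python sketch below is a minimal illustration that includes only the four resistance-conferring mutations reported for these isolates and the four corresponding drugs; the data structures and function names are hypothetical, and curated catalogues such as the TB drug resistance mutation database cited above contain many more entries.

```python
# Resistance-conferring mutations reported for the studied isolates (Table 3).
RESISTANCE_MARKERS = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("rpsL", "K43R"): "streptomycin",
    ("embB", "G406D"): "ethambutol",
}


def predict_resistance(mutations) -> set:
    """Drugs predicted to be resistant from a list of (gene, amino acid change) tuples."""
    return {RESISTANCE_MARKERS[m] for m in mutations if m in RESISTANCE_MARKERS}


def discordant_drugs(mutations, phenotypic_resistance) -> set:
    """Drugs for which the genotypic prediction and the phenotypic DST call disagree."""
    return predict_resistance(mutations) ^ set(phenotypic_resistance)


if __name__ == "__main__":
    # All four isolates carry the same markers; rpoB L731P is not a recognized
    # resistance marker, so the lookup ignores it.
    shared_mutations = list(RESISTANCE_MARKERS) + [("rpoB", "L731P")]
    phenotypes = {
        "DS-5538": {"rifampicin", "isoniazid", "streptomycin", "ethambutol"},
        "DS-17355": {"rifampicin", "isoniazid", "streptomycin", "ethambutol"},
        "DS-17471": {"rifampicin", "isoniazid", "streptomycin", "ethambutol"},
        "DS-17472": {"rifampicin", "isoniazid", "streptomycin"},
    }
    for isolate, resistant in phenotypes.items():
        print(isolate, sorted(discordant_drugs(shared_mutations, resistant)) or "concordant")
```

Run on the phenotypes reported here, the three ethambutol-resistant isolates are concordant, while DS-17472 is flagged as discordant for ethambutol.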

It has been suggested that MDR in M. tuberculosis could be associated with constitutive or inducible expression of efflux pump-related genes (Calgin et al. 2013; Black et al. 2014). However, the contribution of mutations in these genes to drug resistance has been poorly explored (Liu et al. 2014). We hypothesized that such mutations could have a cumulative effect on the development of drug resistance in MDR-TB strains. We therefore investigated mutations in 41 known or putative drug efflux-related genes (Black et al. 2014) and identified a total of 10 nonsynonymous SNPs, nine of which had previously been reported in a pansusceptible M. tuberculosis Beijing strain (Niemann et al. 2009). The remaining mutation, H462N in Rv1877, which encodes a conserved membrane protein, was novel and present in all isolates studied.

Discussion

In recent years, rapid WGS on NGS platforms has revealed the genetic make-up of microorganisms at high resolution. In this study, we compared the genetic variation in the outbreak isolates with the H37Rv reference and identified the genetic background of these isolates as PGG-1, SCG-2, ST-10 of the modern Beijing family of M. tuberculosis (Sreevatsan et al. 1997; Filliol et al. 2006). ST-10 of the Beijing family has previously been reported to predominate in many countries (Chen et al. 2012; Iwamoto et al. 2012), including Thailand (Faksri et al. 2011), although further research is required to explore the molecular basis of the emergence and rapid spread of this genotype.

Drug susceptibility predicted from the genotype matched the phenotypic DST results in the present study, except for DS-17472. Although this isolate was reported to be susceptible to ethambutol, it carried the embB G406D mutation, which confers ethambutol resistance (Ramaswamy et al. 2000). Phenotypic DST of this isolate, repeated in triplicate, gave the same result. This discrepancy could be explained by the fact that the G406D mutation has been reported to confer only low-level resistance to ethambutol (3.3–7.6 mg/l) (Safi et al. 2013), and previous studies have also reported this mutation in ethambutol-susceptible isolates (Lee et al. 2004; Park et al. 2012). Moreover, whether the current breakpoint for ethambutol (5 mg/l on 7H10 medium) should be re-evaluated has been widely debated (Schon et al. 2009; Gumbo 2010), and phenotypic DST has been suggested to underreport ethambutol resistance (Johnson et al. 2006; Ioerger et al. 2010). Overall, the presence of identical drug resistance-conferring SNPs in all isolates further illustrates their clonality.
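Why an isolate carrying embB G406D can be called either susceptible or resistant is easier to see by placing the reported MIC range against the breakpoint. The sketch below simplifies the proportion method to a single rule (growth at the breakpoint, i.e. an MIC above it, is read as resistant); the function name is hypothetical.

```python
ETHAMBUTOL_BREAKPOINT_MG_L = 5.0  # critical concentration on 7H10 medium, as stated in the text


def call_from_mic(mic_mg_l: float, breakpoint_mg_l: float = ETHAMBUTOL_BREAKPOINT_MG_L) -> str:
    """Simplified call: an MIC above the breakpoint means growth at it, read as resistant."""
    return "R" if mic_mg_l > breakpoint_mg_l else "S"


if __name__ == "__main__":
    # Reported MIC range for embB G406D mutants (Safi et al. 2013): 3.3-7.6 mg/l.
    for mic in (3.3, 7.6):
        print(mic, call_from_mic(mic))
```

The lower end of the reported range (3.3 mg/l) is read as susceptible and the upper end (7.6 mg/l) as resistant, so isolates with this mutation may fall on either side of the 5 mg/l breakpoint.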

In searching for novel mechanisms that could be correlated with drug resistance, we found only classic mutations in the drug resistance-related genes studied, with the exception of the L731P mutation in rpoB. This SNP was previously reported as unique to the MDR outbreak strain X122 from the Western Cape, South Africa (Ioerger et al. 2010). Recently, de Vos et al. reported that strains harboring the rpoB S450L mutation together with compensatory mutations in RNA polymerase genes were associated with ongoing transmission of MDR-TB in the community (de Vos et al. 2013). A functional study is therefore required to determine whether the L731P mutation enables the strain to tolerate the fitness cost associated with drug resistance or enhances its transmissibility. The analysis of drug efflux pump-related genes revealed the novel mutation H462N in Rv1877, which encodes a conserved membrane protein. The lfrA gene, a homolog of Rv1877, was previously found to confer resistance to erythromycin in M. smegmatis (Li et al. 2004). However, it is not clear how mutations in Rv1877 affect susceptibility to ethambutol, so further research is necessary to establish its role in drug resistance.

Because this MDR-TB strain was transmitted successfully in the community, we sought to identify the genetic determinants responsible for this transmission. Acquired drug resistance in M. tuberculosis is often associated with reduced fitness, which might affect growth, stability, or transmission (Andersson 2006). Unlike other lineages of M. tuberculosis, the Beijing family is thought to be better able to overcome this fitness cost, either through a lineage-specific propensity to acquire drug resistance mutations with low or no fitness cost (Borrell and Gagneux 2009; Gagneux 2009) or through favorable epistatic interactions between drug resistance and compensatory mutations (Comas et al. 2012; Muller et al. 2013). Several studies have shown that MDR-TB strains harboring resistance mutations with low or no fitness cost were transmitted more successfully in the community than those with other mutations (van Soolingen et al. 2000; Gagneux et al. 2006a; Strauss et al. 2008; Naidoo and Pillay 2014). Interestingly, the outbreak strain in this study also harbored low or no fitness cost mutations in rpoB (S450L), katG (S315T) and embB (G406D), which confer resistance to rifampicin, isoniazid and ethambutol, respectively (Pym et al. 2002; Gagneux et al. 2006b; Safi et al. 2013). Overall, the successful transmission of the strain, resulting in a large community outbreak, may be attributable to these low or no fitness cost resistance mutations, possibly with an additional effect of the rpoB L731P mutation.

We determined the SNPs accumulated over the 5-year period in the selected isolates of the cluster (Srilohasin 2013). Although the mutation rate of the Beijing family remains controversial (Werngren and Hoffner 2003; de Steenwinkel et al. 2012; Sun et al. 2012; Ford et al. 2013; Merker et al. 2013), the outbreak strain was genetically stable over 5 years: each of the last three isolates had acquired only two or three SNPs, in line with previous findings (Werngren and Hoffner 2003; Merker et al. 2013). This agrees with a previous study by Schurch et al., in which NGS was used to sequence the genomes of three isolates obtained over a 14-year period and a maximum of four SNPs had been acquired relative to the first isolate (Schurch et al. 2010). Recently, a design similar to ours was used to determine genomic variation in an M. tuberculosis outbreak strain belonging to the T2 sublineage; compared with the index case, the strain was genomically stable over 9 years, with only four acquired SNPs and a small deletion (Sandegren et al. 2011). Interestingly, the number of SNPs identified in our study is similar to that in these studies despite differences in the genetic background of the strains and in TB burden settings.
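The SNP counts can be turned into a crude substitution rate by dividing by the time elapsed between sampling dates. The Python sketch below does this at month resolution (August 2003 to March 2008, 55 months); it assumes, for illustration only, that the first isolate approximates the ancestor of the later ones, which the data do not establish, and the function name is hypothetical.

```python
def snps_per_genome_per_year(n_snps: int, start: tuple, end: tuple) -> float:
    """Crude rate from a SNP count and (year, month) sampling dates, at month resolution."""
    months = (end[0] - start[0]) * 12 + (end[1] - start[1])
    if months <= 0:
        raise ValueError("end must be after start")
    return n_snps / (months / 12)


if __name__ == "__main__":
    first = (2003, 8)   # DS-5538, August 2003
    last = (2008, 3)    # DS-17355, DS-17471 and DS-17472, March 2008
    for isolate, n in (("DS-17355", 3), ("DS-17471", 3), ("DS-17472", 2)):
        print(isolate, round(snps_per_genome_per_year(n, first, last), 2))
```

This gives roughly 0.44 to 0.65 SNPs per genome per year for the three 2008 isolates; SNPs in the excluded repetitive regions are not counted.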

WGS of M. tuberculosis strains from epidemiologically linked patients in both low and high TB burden settings has shown that epidemiologically linked strains can be genetically linked by five or fewer SNPs (Kato-Maeda et al. 2013; Roetzer et al. 2013; Walker et al. 2013; Luo et al. 2014). In our study, each later isolate differed from the first isolate by only two or three SNPs, which supports these previous findings and may be useful in establishing epidemiological links among TB patients in high TB burden settings where the M. tuberculosis Beijing family is predominant.
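Applying a SNP threshold requires pairwise distances. The Python sketch below derives them from the unique-SNP counts in Fig. 1, assuming (as the Venn diagram indicates) that every other SNP is shared by all four isolates, so that the distance between two isolates is the sum of their unique SNPs; the ≤5 SNP threshold is the one suggested by Luo et al. (2014), and the names are hypothetical.

```python
from itertools import combinations

# Number of SNPs unique to each isolate (Fig. 1); DS-5538 has none.
UNIQUE_SNPS = {"DS-5538": 0, "DS-17355": 3, "DS-17471": 3, "DS-17472": 2}
LINK_THRESHOLD = 5  # <=5 SNPs suggested for strains in a transmission chain (Luo et al. 2014)


def pairwise_distances(unique_counts: dict) -> dict:
    """Pairwise SNP distance when every non-unique SNP is shared by all isolates."""
    return {(a, b): unique_counts[a] + unique_counts[b]
            for a, b in combinations(unique_counts, 2)}


if __name__ == "__main__":
    for (a, b), d in pairwise_distances(UNIQUE_SNPS).items():
        print(f"{a} vs {b}: {d} SNPs, within threshold: {d <= LINK_THRESHOLD}")
```

Under this assumption each 2008 isolate lies within two or three SNPs of the first isolate, whereas distances between the 2008 isolates themselves are five or six, because their unique SNPs add up; how a threshold is applied therefore depends on which pairs are compared.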

The high molecular weight genomic DNA used in this study was obtained from subcultures derived from stock cultures, which might have altered the genetic make-up of the bacteria relative to the original specimens. Although previous studies did not report any substantial impact of subculturing (Merker et al. 2013; Roetzer et al. 2013; Walker et al. 2013), it should be considered a limitation of the present study. Additionally, SNVs in repetitive regions, such as PE, PPE and PE_PGRS genes, and in paralogous gene families were excluded from the analysis. These genes account for approximately 10 % of the coding region of the H37Rv genome, so the isolates might harbor additional SNPs in these regions, and the overall variation might be higher than reported.

In conclusion, this study determined the genetic polymorphisms in outbreak isolates of M. tuberculosis Beijing ST-10. The isolates were clonally related despite their discordant DST phenotypes, and the genome of the outbreak strain was very stable over a 5-year period. We propose that drug susceptibility testing and treatment of MDR-TB or XDR-TB alone may not be sufficient to achieve the goals of global TB control programs in high TB burden settings, and that identifying and rapidly screening for genetic determinants of highly transmissible strains could contribute greatly to their eradication. Our work also identified possible genetic determinants that might be responsible for the efficient transmission of the strain in our community. These findings might have important implications for confirming epidemiological links among TB patients in high TB burden settings or for rapid screening of highly transmissible MDR-TB strains to prevent their spread in the community.