Analysis of IS6110 insertion sites provide a glimpse into genome evolution of Mycobacterium tuberculosis

Roychowdhury, Tanmoy; Mandal, Saurav; Bhattacharya, Alok

doi:10.1038/srep12567

Article
Open access
Published: 28 July 2015

Volume 5, article number 12567, (2015)

Scientific Reports

Tanmoy Roychowdhury¹,
Saurav Mandal¹ &
Alok Bhattacharya^1,2

Abstract

Insertion sequence (IS) 6110 is found at multiple sites in the Mycobacterium tuberculosis genome and displays a high degree of polymorphism with respect to copy number and insertion sites. Therefore, IS6110 is considered to be a useful molecular marker for diagnosis and strain typing of M. tuberculosis. Generally, IS6110 elements are identified using experimental methods, which are practical only for a limited number of isolates. Since short read genome sequences generated using next-generation sequencing (NGS) platforms are available for a large number of isolates, a computational pipeline for identification of IS6110 elements from these datasets was developed. This study shows results from analysis of NGS data of 1377 M. tuberculosis isolates. These isolates represent all seven major global lineages of M. tuberculosis. Lineage specific copy number patterns and preferential insertion regions were observed. Intra-lineage differences were further analyzed to identify spoligotype specific variations. Copy number distribution and preferential locations of IS6110 in different lineages imply independent evolution of IS6110, governed mainly through ancestral insertion, fitness (gene truncation, promoter activity) and recombinational loss of some copies. A phylogenetic tree based on IS6110 insertion data of different isolates was constructed in order to understand genome level variations of different markers across different lineages.

Introduction

Mycobacterium tuberculosis, the etiologic agent of tuberculosis, causes large-scale morbidity and mortality, particularly in developing countries. The bacterium has shown a significant increase in its ability to become drug-resistant, creating a major public health crisis¹. Analysis of M. tuberculosis genome evolution may help us to better understand the genotype-phenotype relationship in this organism. Different strain typing methods have revealed the world-wide diversity of M. tuberculosis. Large sequence polymorphisms (LSPs) have helped to distinguish six geographically restricted major global lineages² of this pathogen, known as Indo-Oceanic (L1), East Asian (L2), Indian-East African (L3), Euro-American (L4) and West African I & II (L5, L6). Further, L1, L5 and L6 strains have been classified as ancient strains, whereas the other three lineages have been termed modern based on a deletion known as TbD1³. Later, another distinct phylogenetic lineage, L7, was identified in Ethiopia⁴. Single nucleotide polymorphisms (SNPs) were also shown to be consistent with this classification⁵. Other popular strain typing methods are based on variable number of tandem repeats (VNTR) and on the presence/absence of spacer oligonucleotides, known as spoligotyping⁶. Spoligotyping results are also consistent⁷ with lineage-based classification: spoligotypes Beijing and CAS represent the L2 and L3 lineages respectively, whereas LAM, T, X, S and Haarlem strains make up the L4 lineage. Lineage-based classification is thought to reflect observed variability in emergence of drug-resistance⁸, immune response⁹ and disease severity¹⁰. Results from different studies indicate that M. tuberculosis undergoes clonal evolution governed mainly by genetic drift¹¹.

Insertion sequences (IS) make up a major component of bacterial repetitive elements and have often been used for species and strain typing¹². IS6110 is specific to the M. tuberculosis complex and can be used for diagnosis, that is, for detecting the presence of M. tuberculosis cells in a biological sample. Since these elements are mobile and are located at different sites, IS6110 based restriction fragment length polymorphism (RFLP) has become a popular tool for strain typing^13,14,15. One limitation of this approach is that not all isolates display multiple copies of these elements and some lack even a single copy^16,17. Accordingly, M. tuberculosis strains are frequently classified into high IS copy-number (>7) and low IS copy-number strains¹⁸. It is not clear if these two groups of organisms show different physiological or pathogenic behavior. Although it is believed that a high copy number of IS6110 in highly pathogenic strains (Beijing) provides a selective advantage, drug resistance and outbreaks have also been associated with low copy number strains¹⁹. Evolutionary models that explain the control of IS6110 copy number have been developed²⁰. Frequently, IS6110 elements are found inserted in the array of 36-bp repeats known as the Direct Repeat region (DR region: Rv2813-Rv2820c, RD207)²¹. Almost all M. tuberculosis complex (MTBC) isolates have an IS6110 element in the DR region, which is considered to be the original insertion site in the MTBC genome¹⁹. Analysis of IS6110 insertion sites demonstrated a lack of sequence context specificity for integration, though several insertion cold spots and hotspots have been identified^22,23. Some of the insertion hotspots are intergenic, e.g. Rv0001-Rv0002, but many are intragenic, e.g. Rv0797 (IS1547 transposase), Rv1755c (plcD), Rv1758 (cut1), Rv1777 (cyp144), Rv2351 (plcA), Rv3183, Rv3327 and many PE-PPE family genes. Alonso et al.²¹ found preferential insertion locations in the Beijing strains in comparison to the non-Beijing strains of M. tuberculosis. Analysis of modern strains revealed ancestral/inherited insertion sites rather than insertion hotspots²⁴.

IS6110 insertion in any intragenic region may have deleterious as well as selectively advantageous outcomes. Generally, genes involved in virulence, information pathways, lipid metabolism and cell wall synthesis are not preferred targets of transposition²³. On the other hand, the largest number of transposition events is found in multi-gene families²⁵, such as the PPE gene family, because phenotypic effects can be masked by other copies. Moreover, PPE genes are thought to act as variable surface antigens²⁶ and their disruption can be beneficial for immuno-evasion. Insertion of IS6110 in the plcD gene is related to extrathoracic disease²⁷. Mycobacterial drug resistance has also been shown to be associated with insertion events. For example, IS6110 insertions in the tlyA gene are observed in capreomycin resistant M. smegmatis and M. tuberculosis²⁸; isoniazid, ethionamide and para-aminosalicylic acid resistance of M. bovis has been associated with insertion in the katG, etaA and thyA genes respectively^29,30. Some IS6110 intragenic mutants were also shown to have increased virulence, as demonstrated by survival time of infected mice³¹. A high frequency of SNPs, expansion/contraction of tandem repeats and larger genomic deletions have also been reported in regions flanking IS6110²². LSPs can occur due to recombination of two neighboring copies of IS6110. These events can be detected by the absence of 3–4 bp direct repeats in surrounding regions. Homologous recombination plays a major role in deletion, particularly near the ipl locus, which has a high occurrence of IS6110³². IS6110 carries an outward directed promoter at its 3′ end, thus the element can act as a mobile promoter³³. This promoter is able to up-regulate several downstream genes³⁴. Soto et al. showed increased virulence of M. tuberculosis caused by IS6110 insertion upstream of the phoP gene³⁵, a transcriptional regulator important for bacterial growth. Up-regulation of phoP by an IS6110 located 75 bp upstream of the gene was found in multi-drug resistant M. tuberculosis isolates during an outbreak in Spain³⁵. Overall, IS6110 transposition is important for evolution of the M. tuberculosis genome and consequently for alteration in the physiology and pathogenesis of the organism.

A number of methods have been used for mapping IS6110 distribution. Some of these are electrophoresis based IS6110-RFLP³⁶, PCR based fluorescent amplified fragment length polymorphism (fAFLP)²⁴, IS6110 5′ and 3′ fluorescent polymorphism (IS6110-5′3′FP)³⁷, DNA microarray based SiteMapping³⁸ and targeted amplification of IS elements followed by sequencing, or IS-seq³⁹. All these methods have been useful in identification of many IS insertion sites. Unfortunately, due to bias in amplification, it is likely that many sites are still missed³⁸. Computational identification of IS insertion sites from whole genome sequences offers an alternative approach that appears to yield relatively complete information. Moreover, with the availability of NGS datasets, it is now possible and easier to identify these elements from a large number of isolates, which allows a better understanding of their role in evolution of the M. tuberculosis genome. Since most of the NGS genome data are available in an unassembled form, methods that can use these data are going to be useful. A computational pipeline was used to identify the positions of IS elements in the M. tuberculosis genome from unassembled NGS data⁴⁰. This method can, in principle, be used to analyze IS elements of any organism. Overall, this study provides analysis of 1377 publicly available NGS datasets of M. tuberculosis isolates to generate a global picture of IS6110 distribution.

Results & Discussion

Identification of IS elements from NGS data

The NGS data used for this study are described in Table 1. The number of sequenced isolates varied for different lineages; for example, there were only 4 sequences from L7 while 666 sequences could be accessed from the L4 lineage. The pipeline described here used a split-read approach to identify the junction of IS6110 and flanking sequences from NGS reads. Since the datasets were unassembled, the positions of the insertion sites in isolates were defined using the reference genome of M. tuberculosis H37Rv. The validity of the pipeline was checked using simulated NGS data of H37Rv and computationally identified and experimentally verified IS6110 insertion sites⁴⁰. In an earlier study, a PCR based approach could not validate all predictions. Absence of PCR products is quite often due to a number of factors besides absence of target sequences, including low amounts of clinical samples, quality of DNA extracted from patient tissues, extensive secondary structure and/or the nature of the DNA sequence. It is not possible to experimentally validate the identified sites in isolates that are part of this study, as the data have been generated in large numbers by different laboratories around the world using patient isolates. However, some of the sites identified here are already part of a number of published reports.

Table 1 M. tuberculosis isolates used for this analysis.

The analysis showed that not all isolates of M. tuberculosis carry IS6110. A total of 12 isolates from L1 (6.03%), L2 (0.53%) and L4 (0.45%) were part of this group. The number of copies of this element varied from 0 to 27 across all isolates (supplementary figure 1). Lineage based distribution of IS6110 copy number was analyzed in order to understand the relationship between lineages and IS6110 expansion. All four L7 isolates displayed only one copy of the element. Density distribution of IS6110 copy number in the other six lineages is shown in Fig. 1. The results showed inter- and intra-lineage variations. Isolates belonging to the L2 lineage displayed the highest mean copy number (20.24), compared to a mean copy number of 16.13 observed among L3 isolates. Statistical analysis of IS6110 copy number among the different isolates showed significant differences among the different lineages (one-way ANOVA, F_(6,1370) = 384.36, P < 0.01). However, not all lineages showed a similar pattern. L1 and L4 displayed large variations within their respective groups (supplementary table 1) and this is in agreement with SNP data¹. A bimodal distribution was observed for L1 isolates (Fig. 1). L1 isolates could be classified in two major categories: one group with 1–2 copies and another with 12–13 copies (see next section for spoligotype based classification). In spite of the large variation within L1 and L4, very high copy numbers (>20) were rarely seen. No relationship between copy number and sequence conservation of the element in different lineages was observed. For example, the two West African lineages L5 and L6 display a highly similar nucleotide sequence but different IS copy numbers (supplementary table 1). These results suggest that the expansion of the IS6110 element may have taken place independently in each lineage. It was hypothesized that IS6110 copy number variation largely depends on insertion of IS6110 in transcriptionally active/inactive regions of the genome⁴¹. This could result in high intra-lineage variability and low inter-lineage differences (as some of the strains from each lineage may become high/low copy number), but this was not observed in the present study.
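
The degrees of freedom quoted for the F statistic are consistent with seven lineage groups and 1377 isolates (7 - 1 = 6 between groups, 1377 - 7 = 1370 within groups). A minimal sketch of how a one-way ANOVA F statistic is computed is given below. It is an illustration only, not the authors' code; the function name `one_way_anova_f` and the toy copy-number values are hypothetical and are not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                              # number of groups (lineages)
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy copy numbers for three groups (NOT data from the study)
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

A P value would then be read from the F distribution with these degrees of freedom, for which a statistics library is normally used.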

Lineage specific insertions

This analysis did not reveal the existence of general hotspots for insertions in different lineages except the DR region, where an IS6110 can be found in 99.26% of isolates. However, this study did reveal the presence of preferential insertion regions (PIRs) in each lineage. General hotspots signify repeated independent insertion into the same spot, whereas PIRs are considered as one inherited insertion passed down through a lineage. In order to recognize lineage specific PIRs, the H37Rv genome sequence was divided into non-overlapping bins of 100 bp. Presence of IS6110 in a genomic bin in >80% of strains of a specific lineage and in <10% of isolates in all other lineages was considered as the criterion for a PIR. Lineages L2, L3, L5 and L6 displayed a high level of conservation in PIRs and the number of conserved sites was found to be positively correlated with the number of copies present in a given lineage (Pearson correlation coefficient, R = 0.895, P < 0.05). High copy number isolates of L2 were found with 10 such locations. Alonso et al.²¹ reported preferential insertion locations in Beijing strains (L2); this study filtered those down to locations which are rare in other lineages, such as Rv0001-Rv0002 (intergenic), Rv1371 and idsB. The results presented here with respect to L3 isolates (6 PIRs) have not been reported before. The most conserved sites mapped to Rv0395, Rv1504c and Rv3845-Rv3846 (intergenic). Isolates belonging to the L1 and L4 lineages showed much higher intra-lineage variations and lineage specific PIRs were not observed. These isolates were further analyzed. Table 2 lists PIRs found in isolates of different lineages. These locations can be used as probable molecular markers for identification of specific isolates of each lineage.
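
The PIR criterion above can be sketched in a few lines. The code below is a hedged illustration, not the pipeline used in the study: the function name `find_pirs`, the input dictionaries and the isolate labels are hypothetical, and bins are assumed to start at coordinates that are multiples of 100. The thresholds are parameters, so the same sketch also covers other cut-offs.

```python
from collections import defaultdict

BIN_SIZE = 100  # bp, non-overlapping bins on the reference genome


def find_pirs(insertions, lineage_of, min_in=0.80, max_out=0.10):
    """Return {lineage: [bin start, ...]} for bins meeting the PIR criterion.

    insertions: {isolate: iterable of reference coordinates} (hypothetical input)
    lineage_of: {isolate: lineage label} (hypothetical input)
    A bin qualifies for a lineage when more than `min_in` of that lineage's
    isolates carry an insertion in it and fewer than `max_out` of the isolates
    of every other lineage do.
    """
    lineage_sizes = defaultdict(int)
    for isolate in insertions:
        lineage_sizes[lineage_of[isolate]] += 1

    # counts[bin][lineage] = isolates of that lineage with an insertion in bin
    counts = defaultdict(lambda: defaultdict(int))
    for isolate, positions in insertions.items():
        for b in {pos // BIN_SIZE for pos in positions}:
            counts[b][lineage_of[isolate]] += 1

    pirs = defaultdict(list)
    for b, per_lineage in counts.items():
        for lineage, size in lineage_sizes.items():
            frac_in = per_lineage.get(lineage, 0) / size
            rare_elsewhere = all(
                per_lineage.get(other, 0) / other_size < max_out
                for other, other_size in lineage_sizes.items()
                if other != lineage
            )
            if frac_in > min_in and rare_elsewhere:
                pirs[lineage].append(b * BIN_SIZE)
    return {lineage: sorted(bins) for lineage, bins in pirs.items()}


# Hypothetical toy input (NOT data from the study)
toy_insertions = {
    "a1": [1593, 5000], "a2": [1592], "a3": [1594],
    "b1": [5050], "b2": [5010], "b3": [],
}
toy_lineages = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
print(find_pirs(toy_insertions, toy_lineages))
```

In the toy input, the bin starting at 1500 is occupied in every isolate of group A and in none of group B, so it is the only bin reported.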

Table 2 Lineage specific preferential insertion regions of IS6110.

Each lineage can be further subdivided into several spoligotype classes. For any genomic bin, presence of IS6110 in >75% of strains of a specific spoligotype and in <20% of strains in all other spoligotype groups of the same lineage was considered as the threshold for identification of a spoligotype specific PIR. Ninety isolates from the L1 lineage were available with spoligotype information. While high copy number isolates belong to EAI2 or EAI6, low copy number isolates are of EAI1, EAI3 or EAI4 spoligotypes (supplementary table 2). Among these, high copy number isolates of EAI2 and EAI6 displayed conservation of PIRs, whereas low copy number isolates did not contain any PIR (Table 3). The spoligotype EAI5 group of isolates showed a high degree of variability in terms of copy number (supplementary table 2). Similarly, 606 isolates belonging to the L4 lineage were analyzed with respect to their spoligotype pattern. Only low copy number X isolates and high copy number S isolates displayed conservation of PIRs. Table 4 lists conserved insertion positions in LAM, S and X isolates. In an earlier study, Rv1755c was suggested to be a hotspot²²; here, Rv1755c containing IS6110 was found in EAI6 and S isolates but not in any other L1 and L4 isolates.

Table 3 Spoligotype specific preferential insertion regions of IS6110 in L1 isolates.

Table 4 Spoligotype specific preferential insertion regions of IS6110 in L4 isolates.

The DR region was the only region where the IS element was found in almost all isolates. The results of lineage specific PIRs indicate a lack of sequence specificity or general preferential location, similar to that suggested by Thorne et al.²⁴. Since it is difficult to find IS insertions in exactly the same nucleotide position in two different isolates, in this study a window of 100 nucleotides was used to mark conservation. The exact site of conservation varied in different cases; for example, all L2 isolates had an insertion at 1592–1594, whereas region 888700–888900 was found with different insertion locations in almost all lineages.

IS6110 in intragenic region

A large fraction of IS6110 insertions was observed within coding regions of genes (9750 out of 19366 insertion sites). There appears to be a low degree of specificity in terms of selection of genes, as IS6110 was found in 368 genes. DAVID enrichment analysis⁴² using gene ontology terms was performed in order to identify any functional specificity associated with these insertions. Plasma membrane, Cell membrane, Glycerophospholipid metabolism, Mycobacterial PPE protein, Mycobacterial pentapeptide repeat, and naphthalene and anthracene degradation were some of the top ranked terms (p-value < 0.0001). Genes were also classified using COG and the results are shown in supplementary figure 2. Genes that map to the class cell motility are highly represented, followed by signal transduction, defense, replication-recombination-repair mechanisms and cell wall/membrane/envelope biogenesis. The element was not seen in genes that map to the COG classes intracellular trafficking, secretion and vesicular transport. Tuberculist⁴³ classification of M. tuberculosis genes was also used and the results showed a high level of insertions in repetitive sequences and gene families, such as PE/PPE family proteins, IS and phages, and regulatory proteins. This study suggests that genes involved in information pathways and virulence do not normally host these elements due to their importance in pathogenesis. Predominant IS6110 insertion and SNPs⁴⁴ in PE/PPE family genes signify increased fitness or relaxed selection in this class of proteins.

IS6110 as a mobile promoter

IS6110 carries a promoter element and can activate a gene when inserted upstream of its coding sequence. In order to identify elements that are likely to be close to a functional gene, insertions at <400 bp²¹ from the transcription start site and in the same orientation as the downstream gene were located. It is expected that expression of some of these genes may be regulated by the element. A total of 4796 such positions were identified in 1377 isolates. Overall, 178 unique genes were found that carried an IS6110 insertion immediately in the upstream region. Twenty seven of these genes are thought to be essential. These genes belong to mainly three categories: transposase, oxidoreductase and PE-PPE family proteins. TrpD and Rv1668c (ABC transporter) are the exceptions. TrpD is essential for survival of bacteria in activated macrophages and during lung colonization⁴⁵. As a consequence, up-regulation of this gene is beneficial for the organism. Several other insertions upstream of the phoP gene, at 131, 46, 94 and 41 bp in L2, L3, L1 and L5 isolates respectively, were observed. Previously reported multi-drug resistant L2 isolates also displayed an insertion 75 bp upstream of the phoP gene. Transcriptional regulators other than phoP are also targets of IS6110 insertions. For example, the promoter regions of Rv0894 (L2), Rv1033c/trcR, Rv3124 (L3), Rv3246c/mtrA and Rv3334 (L4) were also found to have IS6110 elements. Many insertions in the promoter regions detected here have already been reported, such as those in the promoter regions of Rv2353, Rv2280, Rv3427 (38 bp upstream in 300 L2, 2 L3 and 3 L4 isolates) and Rv3018 (308 bp upstream in 232 L2 isolates)^34,46, validating the method used here. ESAT-6 related proteins play an important role in mycobacterial virulence. Several ESAT-6 related genes, such as esxJ, esxQ, esxR and esxS, were found with IS6110 in their promoter regions in a number of isolates.
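
The selection rule for candidate promoter insertions (less than 400 bp upstream, same orientation as the downstream gene) can be sketched as follows. This is an illustrative assumption-laden sketch, not the authors' implementation: the `Gene` and `Insertion` records, the function names and all coordinates are hypothetical, and the annotated gene start is used as a stand-in for the transcription start site.

```python
from typing import NamedTuple, Optional

MAX_UPSTREAM = 400  # bp, cut-off quoted in the text


class Gene(NamedTuple):
    name: str
    start: int   # leftmost reference coordinate
    end: int     # rightmost reference coordinate
    strand: str  # "+" or "-"


class Insertion(NamedTuple):
    position: int
    strand: str  # orientation of the IS element, "+" or "-"


def upstream_distance(ins: Insertion, gene: Gene) -> Optional[int]:
    """Distance in bp from the insertion to the gene start, or None if not upstream."""
    if gene.strand == "+":
        distance = gene.start - ins.position   # upstream means lower coordinate
    else:
        distance = ins.position - gene.end     # upstream means higher coordinate
    return distance if distance > 0 else None


def promoter_candidates(insertions, genes, max_upstream=MAX_UPSTREAM):
    """List (gene name, insertion position, distance) for qualifying insertions."""
    hits = []
    for ins in insertions:
        for gene in genes:
            if ins.strand != gene.strand:      # orientation must match the gene
                continue
            distance = upstream_distance(ins, gene)
            if distance is not None and distance < max_upstream:
                hits.append((gene.name, ins.position, distance))
    return hits


# Hypothetical genes and insertions (NOT coordinates from the study)
toy_genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 3000, 4000, "-")]
toy_insertions = [
    Insertion(925, "+"),   # 75 bp upstream of geneA, same strand
    Insertion(4100, "-"),  # 100 bp upstream of geneB, same strand
    Insertion(500, "+"),   # too far from geneA
    Insertion(950, "-"),   # wrong orientation for geneA
]
print(promoter_candidates(toy_insertions, toy_genes))
```

The nested loop is adequate for a sketch; with thousands of insertions and genes, sorting genes by coordinate and using a binary search would be the more usual design.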

Identification of IS6110 mediated LSPs

IS elements are known for their ability to cause LSPs due to recombination of neighboring copies⁴⁷. Experimental methods to identify IS6110-mediated recombination events look for IS elements in the absence of surrounding direct repeats²¹. Deletions and inversions from 1377 isolates were identified and events whose neighboring regions showed the presence of IS6110 were selected. In total, 2414 such events were identified from all isolates. Polymorphism with respect to the length of sequence involved was also observed in the conserved polymorphic sites. Table 5 lists some of the common variants found in at least 10 isolates. Events noticed in <10 isolates are not included in the table, as these events may be sporadic and are unlikely to display any pattern. No evidence for lineage specific LSPs (present in most of the isolates of one lineage but absent in others) was found. Mostly two types of events were identified: i) deletion of a neighboring region of IS6110 and ii) inversion of a region containing IS6110 and flanking regions. Moreover, the number of IS6110-mediated LSPs in a lineage is positively correlated with the mean copy number of IS6110 (Pearson correlation coefficient, R = 0.811, P < 0.05). It is expected that isolates that show a different distribution of IS elements may display different genomic features. For example, IS-mediated recombination events have more chance of occurring in isolates with a high copy number of IS6110. More recombination events would also cause genomic changes that may alter the phenotype of the isolates. The presence of an upper limit in copy number may also be due either to an active process that inhibits further transposition or to secondary loss of transposed copies by recombination of neighboring elements.

Table 5 Probable IS6110 mediated large sequence polymorphisms in different isolates.

IS6110 based global phylogeny of M. tuberculosis

IS6110 insertion sites in all 1377 isolates were used to construct a phylogenetic tree. Detailed methodology is described in “methods”. The results show that five of seven lineages (L2: cyan, L3: blue, L5: purple, L6: violet, L7: orange) map to distinct clusters in the tree (Fig. 2). Low copy number isolates of L1 (green) and L4 (red) cluster with L7 due to insertion of IS6110 in the DR region. As expected, isolates of L4 were distributed in different clusters and did not form a separate cluster. Therefore, L4 isolates were also analyzed on their own (supplementary figure 3). Isolates belonging to spoligotypes LAM, X and S were found to be in distinct clusters, whereas those with H and T spoligotype patterns cluster together. It is likely that some of the results may be due to wrong assignment of some of these spoligotypes (PolyTB database⁴⁸) due to convergence⁴⁹. Similarities and differences with the SNP based phylogenetic tree were clearly visible. A SNP based phylogenetic tree (supplementary figure 4) clearly separated out each of the lineages, unlike that based on IS6110 insertion sites. Moreover, the former places the two West African lineages in close proximity, in contrast to the IS6110 based tree, which maps these at a distance. The larger set of data points in the SNP based tree is one of the main reasons for its better resolution. Though each lineage shows clear evidence of independent evolution of IS6110 in the mycobacterial genome, several factors, such as recombination of neighboring elements, insertion in a transcriptionally activated region or transcriptional control, may generate homoplasy or convergent evolution, which is otherwise rare in the M. tuberculosis complex¹².

Attempts to relate phenotype (such as drug resistance) to IS insertion did not give clear results (absence of PIR). Though there are reports indicating that the location/site of insertion can alter expression of important genes, no difference was observed when drug sensitive and drug resistant isolates were compared (two-sided paired t-test with fraction of isolates positive for IS6110 in each genomic bin; t = 1.1196, df = 288, p = 0.263). However, this study identified insertions in some of the essential genes²⁹ in one or more drug-resistant isolates, but not in drug-sensitive ones. Some of these genes are Rv2026c, Rv2283 and Rv2808 in L2 and Rv3398 in L4.
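
For a paired t-test, df = 288 corresponds to 289 paired values, here presumably 289 genomic bins, each contributing one fraction for drug sensitive isolates and one for drug resistant isolates. A minimal sketch of the test statistic is shown below, under the assumption that the pairing is per bin; the function name `paired_t` and the toy fractions are hypothetical and are not values from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]      # per-bin differences
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)    # sample SD of differences / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical fractions of IS6110-positive isolates in four bins
# (NOT data from the study)
sensitive = [0.5, 0.6, 0.9, 0.4]
resistant = [0.4, 0.4, 0.6, 0.4]
t_stat, df = paired_t(sensitive, resistant)
print(f"t = {t_stat:.4f}, df = {df}")
```

The two-sided p value follows from the t distribution with `df` degrees of freedom, which is usually obtained from a statistics library rather than computed by hand.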

Conclusion

This study provides novel insights into IS6110 based mycobacterial genome evolution using the largest dataset reported so far. Overall, the study identifies IS6110 based molecular markers that can be used for strain typing as well as lineage classification of M. tuberculosis. It highlights intra- and inter-lineage variations with respect to element copy number, preferential insertion regions and the possible effect on genome evolution due to the presence of these elements. In conclusion, these results show the importance of a global comprehensive analysis of IS6110 insertion from the epidemiological and evolutionary perspectives of the M. tuberculosis genome.

Method

The overall pipeline of IS-element identification is described in supplementary figure 5, following Das et al.⁴⁰. A split-read approach was used for identification of IS-element insertion sites in the M. tuberculosis genome. Initially, NGS reads that overlapped with IS-elements as well as flanking regions were identified by aligning reads with a reference IS6110 using a local alignment scheme. Local alignment allows soft-clipped reads to align partially with the reference. Reads containing at least 10 bp of sequence from flanking regions were considered (the overlap with the reference depends on the read length: the minimum overlap is set by the minimum score threshold of the aligner, which in turn depends on read length, whereas the maximum overlap was read length - 10). 5′ fragments and 3′ fragments were trimmed from the original NGS reads (supplementary figure 6) and were then clustered separately so that reads coming from the same genomic region fall in the same cluster. To get rid of sequencing errors, one consensus sequence was obtained from each cluster by a local assembly of the sequences in that cluster. Let m be the number of clusters generated from 5′ fragments and n be the number of clusters generated from 3′ fragments. Copy number of IS6110 was estimated as c = min(m, n) and supported by read depth analysis with respect to the average read depth of the genome. The m and n consensus sequences were then aligned independently with the reference genome. The method is described in a flowchart in supplementary figure 7. Alignment locations and strand information of 5′–3′ pairs were taken into consideration for finding IS6110 insertions with respect to the reference genome. Overlapping alignment locations were used to identify direct repeats generated upon insertion of IS6110. Moreover, a distance of approximately 1355 bp between the paired alignments suggested the presence of IS6110 at the same locus as in the reference. Unpaired alignments (as per the above criteria) were treated separately for identification of IS-mediated LSPs (deletion and inversion).
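
The copy number estimate and the interpretation of paired flank alignments can be sketched as below. This is a simplified illustration under stated assumptions, not the published pipeline: the names `estimate_copy_number` and `classify_pair`, the tolerance around 1355 bp and the cap on direct-repeat length are hypothetical choices, and both flanks are assumed to be given in forward-strand reference coordinates.

```python
IS_LENGTH = 1355   # bp, distance quoted in the text for a reference copy
TOLERANCE = 10     # hypothetical slack allowed around IS_LENGTH
MAX_DR = 10        # hypothetical cap on the direct-repeat overlap, in bp


def estimate_copy_number(m: int, n: int) -> int:
    """Copy number c = min(m, n) from 5' (m) and 3' (n) flank cluster counts."""
    return min(m, n)


def classify_pair(left_end: int, right_start: int):
    """Classify one pair of flank alignments on the reference genome.

    left_end:    last reference base covered by the flank on one side
    right_start: first reference base covered by the flank on the other side
    Returns (label, value), where value is the direct-repeat length for a
    novel insertion and the gap length otherwise.
    """
    gap = right_start - left_end - 1
    if -MAX_DR <= gap <= 0:
        # Flanks overlap (or abut): the overlap is the duplicated target site
        return ("novel_insertion", -gap)
    if abs(gap - IS_LENGTH) <= TOLERANCE:
        # Flanks are separated by roughly one IS6110 length in the reference
        return ("reference_copy", gap)
    # Anything else is left for separate LSP analysis
    return ("unpaired", gap)


# Hypothetical examples (cluster counts and coordinates are illustrative only)
print(estimate_copy_number(12, 13))
print(classify_pair(1594, 1592))   # flanks overlap by 3 bp
print(classify_pair(1000, 2356))   # flanks 1355 bp apart
print(classify_pair(1000, 5000))   # neither pattern
```

Taking the minimum of the two cluster counts is conservative: a copy is only counted when both of its junctions are supported.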

The tools used, along with Perl scripts for implementation and automation, are Bowtie2⁵⁰ (for local alignment of reads), BlastClust in the BLAST⁵¹ suite (for clustering), CAP3⁵² (for assembly of fragmented reads in each cluster) and BLASTn (for alignment of consensus sequences to the reference genome). Parameters for these programs are available as supplementary information. Deletions and inversions were predicted by Pindel⁵³. Only deletions and inversions larger than 20 bp were considered. The SNP based phylogenetic tree was constructed using NexABP⁵⁴. Jaccard distance⁵⁵ was used to calculate the distance matrix for IS6110 based phylogenetic tree construction. All available insertion sites in different isolates were used to construct binary strings for each isolate. Binary strings were generated depending upon the presence or absence of an IS element in a 100 bp domain. These strings were then compared to calculate isolate-vs-isolate distance. Phylogenetic trees were constructed by the Neighbor-joining algorithm in Phylip⁵⁶. Trees were then visualized using Dendroscope⁵⁷.
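
The distance calculation described above can be illustrated with a short sketch. It is not the authors' code: the function names and the toy isolates are hypothetical, and each binary string is represented as the set of occupied 100 bp bins, which gives the same Jaccard distance as comparing the presence/absence strings position by position.

```python
def to_bins(positions, bin_size=100):
    """Set of occupied bins; equivalent to the 1-positions of a binary string."""
    return {pos // bin_size for pos in positions}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(insertions, bin_size=100):
    """Return (isolate names, square matrix of pairwise Jaccard distances)."""
    names = sorted(insertions)
    bins = {name: to_bins(insertions[name], bin_size) for name in names}
    matrix = [[jaccard_distance(bins[x], bins[y]) for y in names] for x in names]
    return names, matrix


# Hypothetical isolates and insertion coordinates (NOT data from the study)
toy = {
    "iso1": [1593, 888750],
    "iso2": [1592, 2000000],
    "iso3": [],
}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

A matrix of this form is the kind of input that a neighbor-joining program takes, after conversion to that program's own file format.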

Datasets

NGS datasets were downloaded from the European Nucleotide Archive, EMBL. Accession numbers are listed in the supplementary dataset. Spoligotype and lineage information were obtained from the PolyTB database⁴⁸.

Additional Information

How to cite this article: Roychowdhury, T. et al. Analysis of IS6110 insertion sites provide a glimpse into genome evolution of Mycobacterium tuberculosis. Sci. Rep. 5, 12567; doi: 10.1038/srep12567 (2015).

References

Casali, N. et al. Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nat Genet 46, 279–86 (2014).
Article CAS PubMed PubMed Central Google Scholar
Gagneux, S. et al. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 103, 2869–73 (2006).
Article ADS CAS PubMed PubMed Central Google Scholar
Brosch, R. et al. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 99, 3684–9 (2002).
Article ADS CAS PubMed PubMed Central Google Scholar
Tessema, B. et al. Molecular epidemiology and transmission dynamics of Mycobacterium tuberculosis in Northwest Ethiopia: new phylogenetic lineages found in Northwest Ethiopia. BMC Infect Dis 13, 131 (2013).
Article PubMed PubMed Central Google Scholar
Comas, I. et al. Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat Genet 42, 498–503 (2010).
Article CAS PubMed PubMed Central Google Scholar
Barnes, P. F. & Cave, M. D. Molecular epidemiology of tuberculosis. N Engl J Med 349, 1149–56 (2003).
Article CAS PubMed Google Scholar
Kato-Maeda, M. et al. Strain classification of Mycobacterium tuberculosis: congruence between large sequence polymorphisms and spoligotypes. Int J Tuberc Lung Dis 15, 131–3 (2011).
CAS PubMed Google Scholar
Ford, C. B. et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat Genet 45, 784–90 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lopez, B. et al. A marked difference in pathogenesis and immune response induced by different Mycobacterium tuberculosis genotypes. Clin Exp Immunol 133, 30–7 (2003).
Article CAS PubMed PubMed Central Google Scholar
Nahid, P. et al. Influence of M. tuberculosis lineage variability within a clinical trial for pulmonary tuberculosis. PLoS One 5, e10753 (2010).
Article ADS PubMed PubMed Central Google Scholar
Hershberg, R. et al. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol 6, e311 (2008).
Article PubMed PubMed Central Google Scholar
Coscolla, M. & Gagneux, S. Consequences of genomic diversity in Mycobacterium tuberculosis. Semin Immunol 26, 431–444 (2014).
Zaczek, A., Ziolkiewicz, M., Wojtasik, A., Dziadek, J. & Sajduda, A. IS6110-based differentiation of Mycobacterium tuberculosis strains. Pol J Microbiol 62, 201–4 (2013).
Zaczek, A., Brzostek, A., Wojtasik, A., Dziadek, J. & Sajduda, A. Genotyping of clinical Mycobacterium tuberculosis isolates based on IS6110 and MIRU-VNTR polymorphisms. Biomed Res Int 2013, 865197 (2013).
Millan-Lou, M. I. et al. Global study of IS6110 in a successful Mycobacterium tuberculosis strain: clues for deciphering its behavior and for its rapid detection. J Clin Microbiol 51, 3631–7 (2013).
Steensels, D., Fauville-Dufaux, M., Boie, J. & De Beenhouwer, H. Failure of PCR-Based IS6110 analysis to detect vertebral spondylodiscitis caused by Mycobacterium bovis. J Clin Microbiol 51, 366–8 (2013).
Huyen, M. N. et al. Characterisation of Mycobacterium tuberculosis isolates lacking IS6110 in Viet Nam. Int J Tuberc Lung Dis 17, 1479–85 (2013).
Fomukong, N. et al. Differences in the prevalence of IS6110 insertion sites in Mycobacterium tuberculosis strains: low and high copy number of IS6110. Tuber Lung Dis 78, 109–16 (1997).
McEvoy, C. R. et al. The role of IS6110 in the evolution of Mycobacterium tuberculosis. Tuberculosis (Edinb) 87, 393–404 (2007).
Tanaka, M. M., Rosenberg, N. A. & Small, P. M. The control of copy number of IS6110 in Mycobacterium tuberculosis. Mol Biol Evol 21, 2195–201 (2004).
Alonso, H., Samper, S., Martin, C. & Otal, I. Mapping IS6110 in high-copy number Mycobacterium tuberculosis strains shows specific insertion points in the Beijing genotype. BMC Genomics 14, 422 (2013).
Warren, R. M. et al. Mapping of IS6110 flanking regions in clinical isolates of Mycobacterium tuberculosis demonstrates genome plasticity. Mol Microbiol 37, 1405–16 (2000).
Yesilkaya, H., Dale, J. W., Strachan, N. J. & Forbes, K. J. Natural transposon mutagenesis of clinical isolates of Mycobacterium tuberculosis: how many genes does a pathogen need? J Bacteriol 187, 6726–32 (2005).
Thorne, N. et al. IS6110-based global phylogeny of Mycobacterium tuberculosis. Infect Genet Evol 11, 132–8 (2011).
Sampson, S. L., Warren, R. M., Richardson, M., van der Spuy, G. D. & van Helden, P. D. Disruption of coding regions by IS6110 insertion in Mycobacterium tuberculosis. Tuber Lung Dis 79, 349–59 (1999).
Banu, S. et al. Are the PE-PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol 44, 9–19 (2002).
Yang, Z. et al. Clinical relevance of Mycobacterium tuberculosis plcD gene mutations. Am J Respir Crit Care Med 171, 1436–42 (2005).
Maus, C. E., Plikaytis, B. B. & Shinnick, T. M. Mutation of tlyA confers capreomycin resistance in Mycobacterium tuberculosis. Antimicrob Agents Chemother 49, 571–7 (2005).
Sassetti, C. M., Boyd, D. H. & Rubin, E. J. Comprehensive identification of conditionally essential genes in mycobacteria. Proc Natl Acad Sci U S A 98, 12712–7 (2001).
Rengarajan, J. et al. The folate pathway is a target for resistance to the drug para-aminosalicylic acid (PAS) in mycobacteria. Mol Microbiol 53, 275–82 (2004).
McAdam, R. A. et al. Characterization of a Mycobacterium tuberculosis H37Rv transposon library reveals insertions in 351 ORFs and mutants with altered virulence. Microbiology 148, 2975–86 (2002).
Fang, Z. et al. IS6110-mediated deletions of wild-type chromosomes of Mycobacterium tuberculosis. J Bacteriol 181, 1014–20 (1999).
Beggs, M. L., Eisenach, K. D. & Cave, M. D. Mapping of IS6110 insertion sites in two epidemic strains of Mycobacterium tuberculosis. J Clin Microbiol 38, 2923–8 (2000).
Safi, H. et al. IS6110 functions as a mobile, monocyte-activated promoter in Mycobacterium tuberculosis. Mol Microbiol 52, 999–1012 (2004).
Soto, C. Y. et al. IS6110 mediates increased transcription of the phoP virulence gene in a multidrug-resistant clinical isolate responsible for tuberculosis outbreaks. J Clin Microbiol 42, 212–9 (2004).
Green, E. et al. IS6110 restriction fragment length polymorphism typing of drug-resistant Mycobacterium tuberculosis strains from northeast South Africa. J Health Popul Nutr 31, 1–10 (2013).
Thabet, S., Karboul, A., Dekhil, N. & Mardassi, H. IS6110-5'3'FP: an automated typing approach for Mycobacterium tuberculosis complex strains simultaneously targeting and resolving IS6110 5' and 3' polymorphisms. Int J Infect Dis 29C, 211–218 (2014).
Kivi, M., Liu, X., Raychaudhuri, S., Altman, R. B. & Small, P. M. Determining the genomic locations of repetitive DNA sequences with a whole-genome microarray: IS6110 in Mycobacterium tuberculosis. J Clin Microbiol 40, 2192–8 (2002).
Reyes, A. et al. IS-seq: a novel high throughput survey of in vivo IS6110 transposition in multiple Mycobacterium tuberculosis genomes. BMC Genomics 13, 249 (2012).
Das, S. et al. Genetic heterogeneity revealed by sequence analysis of Mycobacterium tuberculosis isolates from extra-pulmonary tuberculosis patients. BMC Genomics 14, 404 (2013).
Wall, S., Ghanekar, K., McFadden, J. & Dale, J. W. Context-sensitive transposition of IS6110 in mycobacteria. Microbiology 145 (Pt 11), 3169–76 (1999).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57 (2009).
Lew, J. M., Kapopoulou, A., Jones, L. M. & Cole, S. T. TubercuList--10 years after. Tuberculosis (Edinb) 91, 1–7 (2011).
McEvoy, C. R. et al. Comparative analysis of Mycobacterium tuberculosis pe and ppe genes reveals high sequence variation and an apparent absence of selective constraints. PLoS One 7, e30593 (2012).
Lee, C. E., Goodfellow, C., Javid-Majd, F., Baker, E. N. & Shaun Lott, J. The crystal structure of TrpD, a metabolic enzyme essential for lung colonization by Mycobacterium tuberculosis, in complex with its substrate phosphoribosylpyrophosphate. J Mol Biol 355, 784–97 (2006).
Alonso, H. et al. Deciphering the role of IS6110 in a highly transmissible Mycobacterium tuberculosis Beijing strain, GC1237. Tuberculosis (Edinb) 91, 117–26 (2011).
Ooka, T. et al. Inference of the impact of insertion sequence (IS) elements on bacterial genome diversification through analysis of small-size structural polymorphisms in Escherichia coli O157 genomes. Genome Res 19, 1809–16 (2009).
Coll, F. et al. PolyTB: a genomic variation map for Mycobacterium tuberculosis. Tuberculosis (Edinb) 94, 346–54 (2014).
Comas, I., Homolka, S., Niemann, S. & Gagneux, S. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4, e7815 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–9 (2012).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–402 (1997).
Huang, X. & Madan, A. CAP3: A DNA sequence assembly program. Genome Res 9, 868–77 (1999).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–71 (2009).
Roychowdhury, T., Vishnoi, A. & Bhattacharya, A. Next-Generation Anchor Based Phylogeny (NexABP): constructing phylogeny from next-generation sequencing data. Sci Rep 3, 2634 (2013).
Dale, J. W. et al. Evolutionary relationships among strains of Mycobacterium tuberculosis with few copies of IS6110. J Bacteriol 185, 2555–62 (2003).
Felsenstein, J. PHYLIP - phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).
Huson, D. H. et al. Dendroscope: An interactive viewer for large phylogenetic trees. BMC Bioinformatics 8, 460 (2007).

Acknowledgements

The authors thank the Department of Biotechnology, Government of India, for financial support and the Department of Science and Technology, Government of India, for the J.C. Bose fellowship.

Author information

Authors and Affiliations

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi
Tanmoy Roychowdhury, Saurav Mandal & Alok Bhattacharya
School of Life Sciences, Jawaharlal Nehru University, New Delhi
Alok Bhattacharya

Contributions

T.R. and A.B. conceptualized the study. T.R. developed the pipeline. T.R. and S.M. performed the computational analysis. A.B. and T.R. wrote the manuscript. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Roychowdhury, T., Mandal, S. & Bhattacharya, A. Analysis of IS6110 insertion sites provide a glimpse into genome evolution of Mycobacterium tuberculosis. Sci Rep 5, 12567 (2015). https://doi.org/10.1038/srep12567

Received: 24 December 2014
Accepted: 15 June 2015
Published: 28 July 2015
DOI: https://doi.org/10.1038/srep12567
Springer Nature Limited
