Introduction

Jute is the second most important natural fibre crop in the world after cotton. Unlike cotton, it is a bast fibre crop, in which the fibre is extracted from the stem. It is cultivated mainly in Asian countries such as India, Bangladesh, Nepal, China, Indonesia, Thailand and Myanmar, and in a few South American countries (Kundu 1956). The jute industry is one of the oldest industries in India. Despite stiff competition from synthetic fibres, jute has great potential as a bio-fibre in this era of growing ecological concern because it is both strong and biodegradable. Jute fibre has traditionally been used for packaging and related products such as bags, sacks, ropes and carpets. More recently, it has also been used to make handicrafts for home decoration, shoes, geo-textiles and jewellery. Although jute is a low-input crop, pests and diseases pose a major threat to its cultivation. Stem rot, caused by Macrophomina phaseolina, is the most important disease and is prevalent in almost all jute-growing areas of the world. M. phaseolina can infect the jute crop at any stage of growth, causing seedling blight, leaf spot, root rot and stem rot (Biswas et al. 2011). It affects both the yield and the quality of the fibre. The average yield loss due to this disease is about 10 %, which can rise to 35–40 % under severe infection (Roy et al. 2008). Because the pathogen is soil-borne, the disease is endemic in certain pockets and is difficult to manage with fungicides. Management based on host plant resistance could be a viable option for resource-poor jute farmers of the Indian subcontinent. However, no jute variety resistant to this disease is available, and very little work has been done on identifying sources of resistance. No study had identified genomic regions involved in resistance to M. phaseolina in jute until we identified a significant number of defence genes involved in the defence response, namely in the cell wall biosynthesis, reactive oxygen species (ROS), salicylic acid (SA), ethylene, jasmonic acid (JA), abscisic acid (ABA), hormone signalling, hypersensitive response (HR) and programmed cell death (PCD) pathways (Biswas et al. 2014). Jute varieties have mainly been developed by conventional breeding, including pure line selection (Ghosh 1983). Varietal development in jute would be facilitated by newer genomic tools through the identification and selection of preferred genes. Molecular marker-based approaches, particularly marker-assisted breeding, are recognized as having the potential to accelerate progress towards targeted goals in any crop breeding programme (Mir et al. 2009).

Molecular markers came into use in plant molecular biology as early as the late 1980s, when restriction fragment length polymorphisms (RFLPs) were the most popular markers. The advent of PCR technology in the early 1990s led to a new generation of PCR-based markers such as random amplification of polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSRs) (Mammadov et al. 2012a). Among these low- and medium-throughput molecular markers, SSRs were declared the “markers of choice” because they are highly polymorphic, reproducible and amenable to automation (Powell et al. 1996). In jute, however, molecular markers came into use rather late, at the beginning of the twenty-first century (Islam et al. 2005). Jute SSRs were developed, and polymorphism was studied using the parental genotypes of a mapping population developed for fibre fineness (Mir et al. 2008, 2009; Das et al. 2012). Although molecular markers have been developed for fibre yield and quality in jute, none has so far been developed for screening resistance against the stem rot pathogen M. phaseolina. The dominance of medium-throughput SSRs has been eroded by single nucleotide polymorphism (SNP) markers, because SNPs are highly abundant in the genome and are amenable to automation and multiplexing (Gupta et al. 2008). However, SSRs have not been replaced at the same pace in all crops. SNP development and application have advanced further in crops with less complex genomes, such as rice, maize and soybean (Mammadov et al. 2012b). Despite the small genome size of jute (Sarkar et al. 2011), no SNP markers have been developed in this crop to date, which may be attributed to a lack of funding and attention for this orphan crop.

The objective of the present study was to develop novel SNP markers and to construct an SNP-based linkage map for a recombinant inbred line (RIL) population of jute with varying levels of resistance to the stem rot pathogen M. phaseolina. Such a map would constitute an important set of tools for marker-assisted selection (MAS) programmes aimed at developing stem rot-resistant jute cultivars.

Materials and methods

Plant material and DNA extraction

In the present study, 177 F6 lines of a RIL population of jute, developed by crossing the stem rot-resistant accession CIM 036 with the susceptible variety JRC 412, were used (Biswas et al. 2014). The fungal pathogen M. phaseolina was maintained at 25 °C on potato dextrose agar (PDA). All 177 RILs were challenge-inoculated with a 3-day-old fungal culture. One-week-old jute plants were sprayed with a fungal suspension containing 6.2 × 10³ cfu per ml; the inoculum was prepared following the procedure described by Biswas et al. (2013). Untreated healthy plants served as controls. Frozen leaf tissue samples from inoculated and healthy plants of each progeny genotype were ground, and genomic DNA was extracted using the DNeasy® 96 Plant Kit (QIAGEN, Hilden, Germany). DNA was re-suspended in 1 × TE buffer to a concentration of 50 ng/μl and stored at −20 °C.

SNP discovery and validation

The NextGENe software v1.96 platform (SoftGenetics, State College, PA, USA) was used to identify SNPs from transcriptome sequence data (Biswas et al. 2014) of the RIL population. All base variants were identified using aligned high-quality sequences obtained from the mapping population. All insertion and deletion (indel) variants were excluded from further analysis. Subsequently, SNPs were filtered using the following criteria: (1) base variants in homozygous condition within each genotype; (2) read coverage equal to or greater than 4; and (3) absence of any other base variants within the 20 bp segments flanking each SNP. A subset of 50 SNPs was selected for experimental validation by Sanger sequencing. Primer pairs were designed using Primer-BLAST (NCBI) and OligoCalc: Oligonucleotide Properties Calculator (http://www.basic.northwestern.edu/biotools/oligocalc.html). PCR reactions contained 15 ng of genomic DNA in a 20 μl reaction with 5 μM of each primer pair. The amplification conditions were as follows: a hot start at 94 °C for 10 min, followed by 40 cycles of 95 °C for 30 s, 52 °C for 30 s and 72 °C for 1 min, and a final elongation step at 72 °C for 10 min. PCR products were purified in a 15 μl reaction containing 0.5 U exonuclease I (New England Biolabs), 0.5 U alkaline phosphatase (New England Biolabs) and 6 μl of PCR product. Sequencing reactions were performed in a total volume of 7.5 μl; each reaction contained 3.2 μM primer, BigDye® Terminator v3.1 (Life Technologies, Bangalore, India) and BigDye® sequencing buffer (Life Technologies, Bangalore, India) and was subjected to the cycling conditions described in the manufacturer’s instructions. Extension products were purified by the ethanol/EDTA/sodium acetate precipitation method, re-suspended in 12 μl formamide and separated on the ABI automated capillary electrophoresis platform. DNA sequence analysis and alignment were performed using Sequencer 4.7, while contig assembly and SNP validation were performed visually.
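The three filtering criteria listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NextGENe workflow itself: the `Variant` record, its field names and the `filter_snps` helper are hypothetical, and the 20 bp flank is assumed to be checked against all base variants called on the same contig before any other filter is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called variant (field names are assumptions)."""
    contig: str
    pos: int          # 1-based position on the contig
    kind: str         # "snp" or "indel"
    homozygous: bool  # True if homozygous within each genotype
    coverage: int     # read depth at the site


def filter_snps(variants, min_cov=4, flank=20):
    """Return SNPs passing the homozygosity, coverage and flank criteria."""
    # Indels are excluded before any other filter is applied.
    snps = [v for v in variants if v.kind == "snp"]

    # Index SNP positions by contig so neighbours can be looked up.
    positions = {}
    for v in snps:
        positions.setdefault(v.contig, []).append(v.pos)

    kept = []
    for v in snps:
        if not v.homozygous:          # criterion (1)
            continue
        if v.coverage < min_cov:      # criterion (2)
            continue
        has_neighbour = any(
            p != v.pos and abs(p - v.pos) <= flank
            for p in positions[v.contig]
        )
        if has_neighbour:             # criterion (3)
            continue
        kept.append(v)
    return kept
```

Note that two SNPs lying within 20 bp of each other are both discarded under this reading, because each one violates the flank criterion for the other.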

SSR genotyping

Genomic DNA of the RILs, including the mapping parents, was screened for polymorphism using a genome-wide set of 2496 SSR primer pairs (Mir et al. 2009). Primer synthesis, selection of polymorphic markers and PCR amplifications were performed. PCR products were combined with the ABI GeneScan LIZ500 size standard and analysed on an ABI3730xl (Life Technologies) capillary electrophoresis platform according to the manufacturer’s instructions. Allele sizes were scored using the GeneMapper® 3.7 software package (Life Technologies).

Genetic map construction and selection of maximum recombinants

A genetic map was constructed from the SSR-derived genotyping data using JoinMap 3.0 (van Ooijen and Voorrips 2001) with a threshold log-of-odds (LOD) score of 3. This map provided the basis for selecting maximally recombinant individuals in the mapping population using MapPop version 1.0 (Brown and Vision 2000).

SNP genotyping

A primary set of SNPs was selected for GoldenGate® primer design (Illumina Inc., San Diego, CA, USA). A rank score (0–1) was calculated for each SNP by Illumina. SNPs with design ability scores between 0.7 and 1.0 were then selected for development of an Illumina GoldenGate® oligonucleotide pool assay (OPA) for genotyping. Individuals were SNP-genotyped according to the manufacturer’s instructions using 250 ng of template genomic DNA. The genotyping assays were processed on the Illumina iScan reader. Automatic allele calling was performed using the Illumina GenomeStudio software v2011.1 with a GenCall threshold of 0.20, and the output was also checked visually to confirm cluster specificity.

Genetic linkage mapping

The genetic linkage map was generated using Map Manager software version QTXb19 (Manly et al. 2001). Markers with a Chi-square score >10 were excluded from further analysis. Distances were calculated using the Kosambi mapping function (Kosambi 1944) at a threshold LOD score of 3. Linkage groups (LGs) were assigned on the basis of marker loci. LGs were drawn using MapChart software v 2.2 (Voorrips 2002).
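The two numerical steps named above, the Chi-square screen and the Kosambi conversion, can be illustrated with a minimal Python sketch. It is not the Map Manager implementation. The function names are hypothetical, and the Chi-square test assumes an expected 1:1 ratio of the two homozygous classes in a RIL population, an expectation the text does not state explicitly.

```python
import math


def chi_square_1to1(n_a, n_b):
    """Chi-square statistic for an assumed 1:1 segregation of two classes."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no genotyped lines")
    expected = total / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


if __name__ == "__main__":
    # A marker scored as 110 AA and 67 BB lines would exceed the cut-off of 10.
    print(round(chi_square_1to1(110, 67), 2))
    # A recombination fraction of 0.10 corresponds to roughly 10 cM.
    print(round(kosambi_cm(0.10), 2))
```

For small recombination fractions the Kosambi distance is close to 100 × r; it grows faster than r as r approaches 0.5 because the function corrects for undetected double crossovers while allowing for interference.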

Comparative genome analysis

Sequences of map-assigned SNPs and SSRs were used to perform comparative analysis with genome assemblies of cotton (NCBI project no: PRJNA203021). BLASTN was used to conduct similarity searches against each genome sequence with a threshold E-value of 10⁻⁹.
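The E-value threshold described above can be applied to BLASTN results as in the following sketch. It assumes the searches were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where column 11 is the E-value and column 12 the bit score. The `best_hits` helper is hypothetical, and keeping only the best-scoring hit per marker is an assumption made for illustration rather than a step reported in the text.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-9):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cut-off."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the E-value threshold
        # Keep the highest-scoring hit seen so far for this query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The resulting dictionary maps each marker sequence to a single cotton chromosome or scaffold, which is the form needed to tabulate matches per linkage group.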

Phenotypic screening of RIL lines

All 177 lines of the RIL population were screened at the seedling stage against M. phaseolina through challenge inoculation. One-week-old jute seedlings were sprayed with the fungal suspension, and visual observations were recorded 3 days after inoculation. Seedling blight was scored on a 0–5 scale (Biswas et al. 2009), and the Percent Disease Index (PDI) was calculated for each line.
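The text does not give the PDI formula, so the sketch below assumes the commonly used form: the sum of all individual disease scores divided by the product of the number of plants scored and the maximum grade, expressed as a percentage. The function name and the example scores are hypothetical.

```python
def percent_disease_index(scores, max_grade=5):
    """PDI (%) from per-plant disease scores on a 0..max_grade scale.

    Assumed formula: 100 * sum(scores) / (number of plants * max_grade).
    """
    if not scores:
        raise ValueError("at least one scored plant is required")
    if any(s < 0 or s > max_grade for s in scores):
        raise ValueError("scores must lie between 0 and max_grade")
    return 100.0 * sum(scores) / (len(scores) * max_grade)


if __name__ == "__main__":
    # Five seedlings of one line scored 0, 0, 1, 2 and 5 on the 0-5 scale.
    print(percent_disease_index([0, 0, 1, 2, 5]))
```

Under this formula a line in which every seedling scores 0 has a PDI of 0 and a line in which every seedling scores 5 has a PDI of 100.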

Results

SNP discovery and validation

A total of 37,238 putative SNPs were discovered from transcriptome sequence data obtained from the RIL population, at an average frequency of 1.70 SNPs per kb. A preliminary set of 21,000 SNPs was selected following elimination of indels. To remove sequencing errors, clustering and contig assembly were performed with the Phrap program (Gordon et al. 1998), run with the options “-trim_start 50 -minmatch 50”. The PolyBayes program (Marth et al. 1999) was used to detect putative SNPs in the sequence alignments and to assign each base substitution a probability of being a true SNP. Only SNPs with a >95 % probability of being true SNPs were retained. The contig consensus sequences were used as the anchor sequences. When more than one SNP occurred within a 50 bp window, these putative SNPs were removed from further consideration. The SNPs in each data set were then compared using both the flanking 50 bp of sequence and the SNP alleles; if these matched perfectly, the putative SNPs were considered to be cross-validated. After filtering for homozygous status and the absence of other known SNPs in the vicinity, a subset of 956 high-quality SNPs was obtained. Of these, 953 satisfied the required primer design criteria, and a final subset of 768 SNP loci with a design ability rank of 1 was selected for the GoldenGate® assay. Analysis of nucleotide variation revealed that transitions were about twice as frequent as transversions (2:1). The two most common SNP variants were A/G and C/T, representing 36 % and 32 % of all changes, respectively. Each of the other SNP variants (T/G, C/G, A/C and A/T) accounted for less than 10 % of the total. A subset of 48 SNP loci was verified through Sanger sequencing prior to synthesis of the 768-plex SNP OPA (Additional file 1), of which 45 were concordant with prediction (Additional files 2, 3).
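The transition-to-transversion comparison reported above follows from a simple classification of each SNP by its two alleles: A/G and C/T are transitions, and the four remaining combinations are transversions. The sketch below illustrates that classification. The function names and the toy input are hypothetical and do not reproduce the actual data set.

```python
import math
from collections import Counter

# Purine-purine (A/G) and pyrimidine-pyrimidine (C/T) changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def classify(allele_1, allele_2):
    """Label a biallelic SNP as a 'transition' or a 'transversion'."""
    pair = frozenset((allele_1.upper(), allele_2.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError("expected two different bases from A, C, G, T")
    return "transition" if pair in TRANSITIONS else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for an iterable of allele pairs."""
    counts = Counter(classify(a, b) for a, b in snps)
    if counts["transversion"] == 0:
        return math.inf
    return counts["transition"] / counts["transversion"]


if __name__ == "__main__":
    toy = [("A", "G"), ("C", "T"), ("T", "C"), ("G", "A"), ("A", "C"), ("T", "G")]
    print(ts_tv_ratio(toy))
```

With A/G at 36 % and C/T at 32 % of all changes, transitions make up 68 % of the SNPs, which corresponds to a ratio of roughly 2:1 against the remaining 32 % of transversions.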

Framework genetic map construction and selection of maximum recombinant individuals

A total of 86 of the 376 genomic DNA-derived SSRs and EST–SSRs (40 %) designed by Mir et al. (2009) revealed polymorphism between the parental genotypes, of which 66 were selected for screening of the mapping population on the basis of consistency of amplification. A subset of 48 SSR markers generated data of sufficient quality to build a framework genetic map, and 40 loci (85 %) were assigned to nine LGs. These data were then used for bin mapping.

SNP genotyping

A total of 768 SNPs were used to genotype the 177 selected RILs. All SNPs were visually qualified. The majority produced two major clusters in GenomeStudio, representing the homozygous (AA and BB) genotypic classes, but occasionally a third, small cluster of heterozygous (AB) genotypes was also observed (Additional file 4). The mapping population had been advanced to the F6 generation, so residual heterozygosity was expected to be low (c. 5–10 %).

A total of 705 SNPs (91.7 %) produced coherent data, while those generating ambiguous cluster structures were removed from further analysis. A subset of 462 SNPs (65 %) generated polymorphic clusters in the mapping population and was used for genetic linkage map construction.

Linkage mapping

A total of 73 markers (13.5 %) were excluded from linkage analysis due to excessive heterozygosity, missing data, skewed segregation or ambiguity. A final set of 458 markers (48 SSRs and 410 SNPs) was used for linkage map construction. A small proportion of markers remained ungrouped, so that 458 markers (98 %), comprising 48 SSRs and 410 SNPs (Table 1), were assigned to nine LGs (Additional file 4). The estimated cumulative map length was 2016 cM, with an average inter-locus interval of 4.2 cM (Fig. 1; Table 2). LG identity and orientation were determined by comparison with the Gossypium hirsutum genome and by using previously map-assigned SSRs as anchoring markers.

Table 1 Total number of markers analysed, tested for polymorphism and assigned to genetic linkage map locations
Fig. 1 Genetic linkage map of the CIM 036 × JRC 412 population. The markers are shown on the right of the linkage groups, and map distances between markers are indicated in cM on the left. For presentation purposes, only one of each set of colocated genetic markers is shown on the map

Table 2 Marker distribution over the LGs of the CIM 036 × JRC 412 map

Comparative genome analysis

Comparison of the Corchorus capsularis map with the cotton genome revealed the highest number of matches (97 %) (Additional file 5). C. capsularis–G. hirsutum macrosynteny was observed for 292 (94 %) sequences. Among the G. hirsutum chromosomes, Gh5, Gh1, Gh3 and Gh7 exhibited synteny and collinearity with jute linkage groups Cc I, Cc II, Cc III and Cc V, respectively (Figs. 2, 3). Conversely, Gh2 and Gh6 contained the lowest numbers of jute orthologues, revealing more complex relationships with the CcLGs. Despite the large number of matches (294), significant chromosomal rearrangements were observed between the two genomes, such that each CcLG exhibited substantial synteny with more than one G. hirsutum chromosome.

Fig. 2 Schematic representation of syntenic relationships between jute (LGs Cc I–III) and the G. hirsutum genome. LGs or chromosomes are shaded in different colours for presentation purposes. The red-shaded LGs are from jute, and the green chromosomes are from cotton. The lines represent the corresponding positions of orthologous sequences. (Color figure online)

Fig. 3 Schematic representation of syntenic relationships between jute (LGs Cc IV–VII) and the cotton genome. Details are as for Fig. 2

Phenotypic screening of RIL lines and candidate gene selection

The RIL population showed variation in resistance against M. phaseolina. The RILs were graded as resistant, moderately resistant, moderately susceptible, susceptible and highly susceptible on the basis of PDI values of 0–1, 1–10, 10–20, 20–50 and 50–100, respectively. Fifty-five lines were found to be resistant to the disease and 37 lines were highly susceptible (Additional file 6). Comparison of marker-associated sequences with the C. capsularis genome directly identified candidate genes with the following functional annotations: NBS-LRR class gene, JAZ1 (jasmonic acid regulatory gene) nuclear-localized protein, cinnamate-4-hydroxylase, myb family transcription factor, MAP kinase and HOPW1-1(W1N1) (salicylic acid regulatory gene) (Fig. 4). The GALV01039983 gene was located in the interval between SNP markers SNP_JMP000313 and SNP_JMP000353, in the vicinity of Cc III, and was annotated as a disease resistance protein.
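The PDI-based grading described above amounts to a lookup against five class boundaries, which can be sketched as follows. The stated ranges share their end points (for example, 1 appears in both 0–1 and 1–10), so this sketch assumes that each upper bound is inclusive. That choice and the `grade` function name are assumptions made for illustration.

```python
import bisect

# Upper PDI bound of each class, assumed inclusive.
UPPER_BOUNDS = [1, 10, 20, 50, 100]
GRADES = [
    "resistant",
    "moderately resistant",
    "moderately susceptible",
    "susceptible",
    "highly susceptible",
]


def grade(pdi):
    """Return the disease reaction class for a PDI value between 0 and 100."""
    if not 0 <= pdi <= 100:
        raise ValueError("PDI must lie between 0 and 100")
    # bisect_left returns the first class whose upper bound is >= pdi.
    return GRADES[bisect.bisect_left(UPPER_BOUNDS, pdi)]


if __name__ == "__main__":
    for value in (0.5, 7.0, 15.0, 32.0, 80.0):
        print(value, grade(value))
```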

Fig. 4 Syntenic relationships between disease resistance gene-containing regions of the jute (C. capsularis) genetic map and the G. hirsutum genome, indicating candidate gene locations. LGs or chromosomes are shaded in different colours for presentation purposes. The purple-shaded LGs are from jute, and the yellow chromosomes are from G. hirsutum. (Color figure online)

Discussion

Breeding for resistance to jute stem rot has been hindered by a lack of resistant sources and by inefficient disease screening. We developed a RIL population of jute by crossing a resistant accession, CIM 036, with a susceptible variety, JRC 412 (Biswas et al. 2014), in which 69 lines were found to be resistant to M. phaseolina through challenge inoculation and transcriptome analysis followed by gene annotation. These lines may serve as resistant parents in future breeding programmes. Breeding efficiency can be further improved by identifying molecular markers linked to host resistance. SNPs are good candidates for marker development because they constitute the most frequent type of variation found in DNA. There are several methods of SNP discovery, viz. SNP mining from expressed sequence tags (ESTs), array hybridization, amplicon re-sequencing, whole genome sequencing and, more recently, high-throughput sequencing technologies (Deleu et al. 2009).

SNP variation

The SNP frequency detected in the present study was much lower than that reported for cereal crops, viz. 16.5 SNPs per kb in wheat and 4.2 SNPs per kb in rice (Barker and Edwards 2009). It was closer to the values reported for M. truncatula (1.96 SNPs per kb) (Choi et al. 2004) and soybean (2.06 SNPs per kb), and most similar to the 1.8 SNPs per kb reported for G. hirsutum (Shen et al. 2010). The patterns of nucleotide substitution showed A/G and C/T to be the most common base changes. The high proportion of C/T transitions is likely to be partially due to deamination of 5-methylcytosine, which occurs frequently over evolutionary time, particularly at CpG dinucleotides (Holliday and Grigg 1993). The present study provides additional SNP markers that can be utilized in molecular breeding programmes. The success rate for SNP genotyping was 91 %. The success of SNP genotyping depends on many factors, including base variant selection, adjacent SNP frequency, the presence of repetitive sequences and, finally, the design ability score.

Genetic linkage mapping

Several jute linkage maps have previously been developed with the successive adoption of new molecular marker technologies (Das et al. 2012; Topdar et al. 2013). The linkage map constructed in the present study exhibits a regular marker distribution but a significantly longer cumulative genetic length (2016 cM). Several factors may be responsible, including the genetic constitution of the different mapping populations, mapping strategies, the number and type of mapped loci, the choice of mapping software and the ratio between the number of markers and population size (Liu et al. 2008; Sim et al. 2012).

Comparative genome analysis

Extensive conservation of genome structure between C. capsularis and G. hirsutum was consistent with the closer phylogenetic relationship between these species than for the other Malvaceae species used in this study. Broad conservation of chromosome structure was observed between the eight chromosomes of G. hirsutum and the seven LGs of jute, together with evidence of evolutionary translocations. A number of previous studies (Shen et al. 2010; Said et al. 2013) have described high levels of conservation in comparisons with Gh1 and Gh5, moderate conservation for Gh3, Gh4, Gh7 and Gh8, and low levels of conservation for Gh2 and Gh6. Unlike other Gh chromosomes, Gh6 is short, with a large number of repeats, low gene content (but a significant number of NBS-LRR disease resistance genes) and high heterochromatin content. Cc VI, which matches Gh2 and Gh6, contained the fewest orthologous sequence queries, consistent with these prior studies. This situation may be remedied by developing a larger set of markers from Cc VI, despite the tenfold difference in genome size between G. hirsutum and C. capsularis.

The present study describes the development of a multiplexed set of EST-derived SNPs for genetic linkage map construction in C. capsularis. Sequence-associated markers were used to predict candidate genes for resistance. This information may be used for the development of linked and diagnostic polymorphisms for MAS of resistant cultivars.