Introduction

Jatropha curcas (L.), commonly known as jatropha or physic nut, is a perennial, large shrub or small tree of the family Euphorbiaceae. It is a diploid plant species (2n = 2x = 22) (Jha et al. 2007) with an estimated genome size of 416 Mb (Carvalhoa et al. 2008). Jatropha has attracted interest as a non-food oil seed plant for biofuel/biodiesel production. Its seeds contain 25–40 % oil by weight (Deng et al. 2010). Jatropha has advantages of drought hardiness, rapid growth, easy propagation, adaptation to wide agro-climatic conditions, low seed cost and short seed filling period (Achten et al. 2008). Jatropha biofuel is reported to be non-toxic, clean and eco-friendly (Jha et al. 2007). However, growing jatropha is not economical because of its low yield, asynchronous flowering and fruiting, presence of toxic and carcinogenic compounds, lack of good germplasm and unavailability of quality planting materials (Wei et al. 2012). Among various plant breeding approaches, interspecific hybridization is an immediate option for genetic enhancement of jatropha (Sujatha 2006). Jatropha-related species such as J. integerrima, J. multifida and J. podagrica are well known and cultivated throughout the tropics as ornamental plants. There are a number of beneficial traits from the other Jatropha species that may be transferred to jatropha such as heavy fruit bearing, photoperiod insensitivity, high oil content, desirable oil quality, plant architecture, earliness, and reduced toxicity of endosperm proteins (Sujatha 2006).

In addition to jatropha oil, the seed cake left after oil extraction has a high protein content with a well-balanced amino acid composition, making it suitable for animal feed (Saetae et al. 2011). However, seeds of most jatropha accessions contain toxic compounds such as trypsin inhibitor, phytic acid, lectin, saponin, and phorbol esters (PEs) (Menezes et al. 2006). Thus the raw seed cake should not be fed directly to animals (King et al. 2009). Fortunately, most toxic substances are heat-labile, except for PEs, a group of diterpene esters that require additional, special processing steps for detoxification (Brooker 2009). Breeding for low-toxicity jatropha is a major goal in most jatropha breeding programs. However, chemical analysis for these toxic substances is complex, time-consuming and expensive. Selection using molecular markers tightly linked to the gene(s) conditioning these traits is easier, faster and more cost-effective.

Molecular breeding approaches are considered an alternative for genetic improvement of most economically important crop plants. A basic requirement for any successful marker-assisted breeding program is the availability of a reliable molecular marker system. Among various DNA markers, expressed sequence tag-simple sequence repeats (EST-SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice. EST-SSRs are co-dominant, reproducible, and highly transferable across species, leading to cost-effective identification of functional candidate genes (Varshney et al. 2005). SNPs are also co-dominant, highly polymorphic, and the most abundant marker type even where genetic variation among related DNA sequences is small, giving high potential to detect associations between allelic forms of a gene and phenotype (Rafalski 2002). De novo assembly of transcript sequences produced by next-generation sequencing technologies offers a rapid approach to obtain expressed gene sequences for non-model organisms. Apical meristematic cells give rise to various organs of a plant and keep the plant growing (Galun 2007). Thus, high-throughput transcriptome sequencing from the apical meristem of Jatropha spp. is helpful in generating a large amount of transcript sequences for gene discovery and molecular marker development. In this study, the transcriptomes of jatropha and its related species were sequenced using 454 pyrosequencing and assembled de novo to discover EST-SSRs and SNPs. The markers identified in this study can be used in jatropha improvement and breeding programs via genetic linkage mapping, association/quantitative trait loci (QTL) mapping, map-based gene cloning, comparative genomic studies, and marker-assisted selection.

Materials and Methods

Plant Materials

The plant materials used in this study were J. curcas accession ‘CN’ with high PEs and ‘M10’ with low PEs, J. integerrima accessions ‘KL’ with dwarf plant type and ‘KY’ with normal plant type, and one accession each of J. multifida ‘JM’ and J. podagrica ‘JP’. All accessions were maintained in an experimental field of the Department of Agronomy, Faculty of Agriculture at Kamphaeng Saen, Kasetsart University, Thailand.

DNA Extraction

Genomic DNA of each accession was extracted from young leaves using the protocol of Tanya et al. (2011). The quality of total genomic DNA was examined in 0.8 % agarose gel electrophoresis, and total genomic DNA concentration was determined using a NanoDrop 8000 spectrophotometer (Nanodrop Technologies, Wilmington, DE). The DNA was diluted in TE buffer to a concentration of 10 ng/μL for PCR amplification.

RNA Extraction and Library Construction

Total RNA was extracted from apical meristem tissue of all six jatropha accessions using plant RNA purification reagent, Invitrogen® (Life Technologies, Waltham, MA). The mRNA was purified from total RNA with an Absolutely mRNA Purification Kit (Agilent Technologies, Santa Clara, CA). The quality and size of mRNAs were analyzed on an RNA 6000 Pico chip using the Agilent 2100 Bioanalyzer (Agilent Technologies). A 600–800 bp cDNA library was prepared using the GS FLX Titanium Rapid Library Preparation Kits (Roche 454, Branford, CT). The library was quantified using TBS 380 Fluorometer (Turner Biosystems, Sunnyvale, CA) and the concentration of the sample was determined using the Rapid Library Quantitation Calculator (http://www.454.com/my454). Finally, the library quality was assessed by Agilent 2100 Bioanalyzer (Agilent Technologies). The average fragment length ranged between 1400 and 1800 bp.

Pyrosequencing

The emPCR amplification and sequencing reagents were prepared using kits supplied by Roche 454, as described in the manufacturer’s instructions. The sequencing reaction was performed using the 454 Genome Sequencer FLX System (Roche 454).

De novo Assembly and Mining for SSRs and SNPs

De novo sequence assembly was performed to identify all contiguous sequences (contigs) and singletons. The raw sequence reads were processed to remove 454-adapter sequences, poly-A/T, empty reads and low quality sequences (trimming of low quality ends rich in “N”) with SeqClean software (https://sourceforge.net/projects/seqclean). The clean reads were assembled using CAP3 assembler software (Huang and Madan 1999) with default parameters. All contigs and singletons were used to identify SSR motifs, and EST-SSR markers were then designed using MIcroSAtellite (MISA) software (http://pgrc.ipk-gatersleben.de/misa). SSRs were defined as having a minimum of 20 repeats for mononucleotide motifs, 9 for dinucleotide motifs, 6 for trinucleotide motifs, 5 for tetranucleotide motifs, and 4 for longer motifs. The SNPs were detected by aligning individual reads of the two J. curcas accessions, CN and M10, against the jatropha genome (http://www.kazusa.or.jp/jatropha) using the gsMapper 2.8 software with thresholds of 0.1 MAF and 10× coverage.
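
The repeat thresholds above can be illustrated with a minimal Python sketch. This is not the MISA implementation; it is a simplified regular-expression scan for perfect repeats only, and the function name `find_ssrs` and the reading of "longer motifs" as penta- and hexanucleotides are assumptions made for illustration.

```python
import re

# Minimum repeat counts by motif length, following the thresholds in the text.
# Assumption: "longer motifs" is read here as penta- and hexanucleotides.
MIN_REPEATS = {1: 20, 2: 9, 3: 6, 4: 5, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Group 2 is the motif; it must recur at least (min_rep - 1) more times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical): nine AT repeats flanked by unrelated bases.
    print(find_ssrs("GGC" + "AT" * 9 + "CCG"))
```

A scan of this kind reports only perfect repeats; compound or interrupted microsatellites, which dedicated tools can also report, are outside the scope of the sketch.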

PCR Amplification of EST-SSRs

A total of 432 EST-SSR primer pairs were designed and used for amplification of DNA from six accessions of jatropha and its related species. Polymerase chain reaction (PCR) with the EST-SSR primers was performed in a reaction containing 2 μL DNA, 1 μL 10× PCR buffer, 2 μL 1 mM dNTPs, 0.8 μL 25 mM MgCl2, 1 μL 5 pmol/μL forward and reverse primers, 0.2 μL Taq DNA polymerase (Thermo Scientific), and sterile distilled water in a final volume of 10 μL. PCR amplification was conducted in a PCT-100™ Thermal Controller programmed at 94 °C for 4 min, followed by 35 cycles of denaturing at 94 °C for 1 min, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min, with a final extension at 72 °C for 5 min. The PCR products were run on a 5 % denaturing polyacrylamide gel and visualized by silver staining (Benbouza et al. 2006).
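
As a worked check of the reaction mix, the following Python sketch computes the water volume and the final concentrations from the stock values given above, using C1·V1 = C2·V2. It assumes that 1 μL of each primer (forward and reverse) was added; the text does not state this explicitly, and the function names are illustrative.

```python
FINAL_VOLUME_UL = 10.0

# Component volumes in microlitres, as listed in the text.
# Assumption: 1 uL of EACH primer (forward and reverse) is added.
COMPONENTS = {
    "template DNA": 2.0,
    "10x PCR buffer": 1.0,
    "1 mM dNTPs": 2.0,
    "25 mM MgCl2": 0.8,
    "forward primer (5 pmol/uL)": 1.0,
    "reverse primer (5 pmol/uL)": 1.0,
    "Taq DNA polymerase": 0.2,
}


def water_volume(components, final_volume):
    """Volume of water needed to reach the final reaction volume."""
    return round(final_volume - sum(components.values()), 2)


def final_concentration(stock_conc, volume_ul, final_volume):
    """Dilution formula C2 = C1 * V1 / V2 (units follow the stock)."""
    return stock_conc * volume_ul / final_volume


if __name__ == "__main__":
    print("water (uL):", water_volume(COMPONENTS, FINAL_VOLUME_UL))
    print("dNTPs (mM):", final_concentration(1.0, 2.0, FINAL_VOLUME_UL))
    print("MgCl2 (mM):", final_concentration(25.0, 0.8, FINAL_VOLUME_UL))
    print("each primer (pmol/uL):", final_concentration(5.0, 1.0, FINAL_VOLUME_UL))
```

Under that assumption the mix needs 2 μL of water per reaction. If the 1 μL instead covers both primers combined, the water volume becomes 3 μL and the per-primer concentration changes accordingly.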

Annotation and Functional Classification

After assembly, all the contigs of jatropha and its related species were subjected to a BLAST search against the NCBI non-redundant protein database with an e-value cut-off of 10⁻⁶ to identify unique proteins. Orthologous sequences between species were determined using a Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/). The functions of ESTs associated with the polymorphic markers were annotated against the same NCBI protein database using the BLASTX algorithm with an e-value cut-off of 10⁻⁶ (http://www.ncbi.nlm.nih.gov/BLAST). The gene ontology (GO) analysis for functional annotation was performed using BLAST2GO (http://www.blast2go.com/2bglaunch) with default settings. Pathway assignments were carried out according to the Kyoto Encyclopedia of Genes and Genomes (KEGG pathway) (http://www.Genome.jp/keg/pathway.html). To measure the intensity of gene evolution, the genes undergoing purifying and positive selection were identified from the rates of non-synonymous (Ka) and synonymous (Ks) substitutions between CN and M10, as estimated using KaKs-Calculator (Zhang et al. 2006).

Results

Sequence Assembly

Sequencing of the J. curcas (CN and M10), J. integerrima (KL and KY), J. multifida (JM) and J. podagrica (JP) transcriptomes by the 454 Genome Sequencer FLX System resulted in 127,569, 142,994, 71,867, 149,902, 191,943 and 165,822 raw reads, respectively (Table 1). After processing, the numbers of quality-filtered and trimmed reads suitable for assembly of transcript sequences were 127,094, 142,447, 71,541, 149,392, 191,654 and 165,228, covering total lengths of 45.61 Mb [an average of 359 nucleotides (nt) per read], 54.52 Mb (383 nt per read), 20.91 Mb (292 nt per read), 48.74 Mb (326 nt per read), 66.72 Mb (348 nt per read) and 73.28 Mb (444 nt per read), respectively (Table 1). De novo assembly for CN produced 11,579 contigs and 15,645 singletons, resulting in 27,224 unisequences, while M10 produced 10,964 contigs and 12,069 singletons, resulting in 23,033 unisequences. The numbers of contigs, singletons and unisequences were 4,551, 5,114 and 9,665 for KL; 8,440, 8,299 and 16,739 for KY; 17,444, 13,965 and 31,409 for JM; and 14,070, 16,572 and 30,642 for JP, respectively. The total base numbers for all contigs and singletons and their average lengths are shown in Table 1.
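
The derived figures in this paragraph follow from two simple relations: unisequences = contigs + singletons, and mean read length = total length / number of clean reads. The short Python sketch below recomputes them from the values reported above; the dictionary layout and function name are illustrative only.

```python
# (contigs, singletons, clean reads, total clean length in Mb), from the text.
ASSEMBLY = {
    "CN": (11579, 15645, 127094, 45.61),
    "M10": (10964, 12069, 142447, 54.52),
    "KL": (4551, 5114, 71541, 20.91),
    "KY": (8440, 8299, 149392, 48.74),
    "JM": (17444, 13965, 191654, 66.72),
    "JP": (14070, 16572, 165228, 73.28),
}


def summarize(contigs, singletons, reads, total_mb):
    """Return (unisequences, mean read length in nt, rounded)."""
    unisequences = contigs + singletons
    mean_read_nt = round(total_mb * 1e6 / reads)
    return unisequences, mean_read_nt


if __name__ == "__main__":
    for accession, row in ASSEMBLY.items():
        unisequences, mean_nt = summarize(*row)
        print(f"{accession}: {unisequences} unisequences, ~{mean_nt} nt per read")
```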

Table 1 Assembled data from transcriptomes of four Jatropha spp

The average read lengths of singletons were quite short, ranging between 255 bp in KL and 382 bp in JP. Most of the singletons did not show any homology with the NCBI non-redundant protein database in the BLASTX search (data not shown); thus, we did not compare the BLASTX results for singletons among species. BLAST searches for 22,543 J. curcas contigs, 12,991 J. integerrima contigs, 17,444 J. multifida contigs and 14,070 J. podagrica contigs were performed against the NCBI non-redundant protein database using the BLASTX algorithm. The results showed 1,683 unique proteins considered orthologous across all four Jatropha spp., as shown in the Venn diagram (Fig. 1).
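
The logic behind the central region of such a Venn diagram is a set intersection over the unique protein hits of each species. A minimal Python sketch follows, assuming each species is represented by the set of protein identifiers hit by its contigs; the identifiers shown are placeholders, not real accessions or results from this study.

```python
def shared_proteins(hits_by_species):
    """Proteins hit by contigs of every species (the centre of the Venn diagram)."""
    sets = [set(hits) for hits in hits_by_species.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical toy input: protein IDs hit by each species' contigs.
    toy_hits = {
        "J. curcas": {"protA", "protB", "protC"},
        "J. integerrima": {"protA", "protC", "protD"},
        "J. multifida": {"protA", "protC"},
        "J. podagrica": {"protA", "protC", "protE"},
    }
    print(sorted(shared_proteins(toy_hits)))
```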

Fig. 1 Venn diagram showing the number of detected orthologous genes in four Jatropha spp.

Functions of all Jatropha spp. contigs (gene ontology; GO) were annotated and classified into three main categories, viz. biological process, molecular function, and cellular component (Supplementary Tables S1–S3). Furthermore, to improve the understanding of PE biosynthetic pathways, we searched for and found some transcripts coding for enzymes involved in PE synthesis. PEs are synthesized from the basic five-carbon units isopentenyldiphosphate (IPP) and dimethylallyldiphosphate (DMAPP), which are condensed into geranyldiphosphate (GPP; C10) by the enzyme prenyltransferase. GPP is converted into farnesyldiphosphate (FPP; C15) by the enzyme farnesyl diphosphate synthase (red box in Fig. 2). FPP is then converted into geranylgeranyldiphosphate (GGPP; C20) (Costa et al. 2010). Interestingly, a transcript coding for farnesyl diphosphate synthase was found specifically in CN (high PEs) libraries, but not in M10 (low PEs) libraries.

Fig. 2 Biosynthesis pathway of phorbol esters (PEs), a group of diterpene esters. Steroid binding proteins (blue box) are involved with the basic five-carbon unit isopentenyldiphosphate (IPP), the initial substrate for PE synthesis. Farnesyl diphosphate synthase (red box) was specifically found in Jatropha curcas accession CN libraries

Mining for SSRs and SNPs

All unisequences were used to identify SSR motifs and develop EST-SSR markers. The results showed that 564 unisequences of CN, 586 of M10, 216 of KL, 254 of KY, 493 of JM and 522 of JP were suitable for developing EST-SSR markers. A total of 432 primer pairs flanking SSRs were designed and synthesized.

Candidate SNPs were discovered based on sequence alignments of CN and M10 with the jatropha genome. A total of 20 candidate SNPs were detected in four scaffolds in the region where the QTL for PE biosynthesis is located (King et al. 2013), namely Jcr4S00012, Jcr4S00160, Jcr4S01263, and Jcr4S05837 (Table 2). Two SNPs were identified in CN and 18 SNPs were identified in M10. Eighteen SNPs were discovered in four coding sequences (CDS), while two SNPs were not in a CDS. Nine SNPs were non-synonymous, while the remainder were synonymous. A SNP in a gene related to PE biosynthesis was identified in scaffold Jcr4S00160 (Supplementary Fig. 1). This gene codes for a steroid binding protein, which may be involved in the PE biosynthesis pathway because PEs belong to the group of diterpene esters. Steroid binding proteins were associated with the basic five-carbon unit IPP, the initial substrate for PE synthesis (blue box in Fig. 2).
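
The distinction between synonymous and non-synonymous SNPs can be sketched in a few lines of Python: a SNP is synonymous when the reference and alternative codons translate to the same amino acid under the standard genetic code. The function name and the example codons below are illustrative and are not taken from Table 2.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(ref_codon, pos_in_codon, alt_base):
    """Classify a coding SNP given the reference codon, 0-based position, and new base."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "non-synonymous"


if __name__ == "__main__":
    # Hypothetical examples: GAA (Glu) -> GAG (Glu), and GAA (Glu) -> AAA (Lys).
    print(classify_snp("GAA", 2, "G"))
    print(classify_snp("GAA", 0, "A"))
```

Applying such a check requires the SNP position to be placed in the correct reading frame of the annotated CDS, which is why SNPs outside a CDS cannot be classified this way.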

Table 2 Candidate single nucleotide polymorphisms (SNPs) identified in Jatropha curcas accessions CN and M10

Among the 18 SNPs in four coding sequences, two showed a suitable Ka/Ks ratio (Table 3). A Ka/Ks ratio greater than one indicates positive selection, i.e. a higher rate of evolution than the neutral rate. In contrast, a ratio of less than one indicates purifying selection, i.e. a lower rate of evolution than the neutral rate. The sequence pair of aspartic proteinase nepenthesin-1 precursor had a Ka/Ks ratio of 0.3063, indicating strong purifying selection, which causes the majority of protein-coding genes to be conserved over time. However, the sequence pair of a gene involving pathogenesis-related proteins had a Ka/Ks ratio of 1.4631, indicating positive selection. These results are in accordance with previous reports by Roth and Liberles (2006) and Stukenbrock and McDonald (2009), who showed that genes related to pathogenesis are often positively selected for adaptation due to competition with the evolving effector protein of the resistant hosts.
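
The interpretation rule stated above can be written as a small Python sketch. It only classifies a precomputed ratio and does not estimate Ka or Ks; the function name is illustrative, and the two ratios are those reported in the text.

```python
def selection_mode(ka_ks):
    """Interpret a Ka/Ks ratio relative to the neutral expectation of 1."""
    if ka_ks > 1:
        return "positive selection"
    if ka_ks < 1:
        return "purifying selection"
    return "neutral"


if __name__ == "__main__":
    reported = {
        "aspartic proteinase nepenthesin-1 precursor": 0.3063,
        "pathogenesis-related protein": 1.4631,
    }
    for gene, ratio in reported.items():
        print(f"{gene}: Ka/Ks = {ratio} -> {selection_mode(ratio)}")
```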

Table 3 Rates of non-synonymous (Ka) and synonymous (Ks) substitutions of each orthologous gene pair between CN and M10

Characterization and Transferability of EST-SSR Markers

A total of 432 EST-SSR primer pairs were designed and tested on the six accessions of jatropha and jatropha-related species to check the informativeness of the markers. Of these markers, 269 were polymorphic (Supplementary Table 4), 5 were monomorphic, and the remaining 158 showed a smear or weak bands, or failed to amplify. The polymorphic markers produced a total of 862 alleles, ranging between 1 and 7 alleles per locus, with an average of 3.20 alleles, as shown in Supplementary Table 5. Among the polymorphic markers, 49 were able to distinguish CN from M10 (Supplementary Table 4), while the numbers of markers that were polymorphic between jatropha and related species or within related species are shown in Supplementary Table 6.
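
The marker tallies above can be checked with a short Python sketch that recomputes the total, the share of each class, and the mean number of alleles per polymorphic locus from the reported counts. The variable and function names are illustrative.

```python
# Marker counts and total allele number as reported in the text.
MARKERS = {"polymorphic": 269, "monomorphic": 5, "smear/weak/failed": 158}
TOTAL_ALLELES = 862


def marker_summary(markers, total_alleles):
    """Return (total markers, percentage per class, mean alleles per polymorphic locus)."""
    total = sum(markers.values())
    shares = {name: round(100 * count / total, 1) for name, count in markers.items()}
    mean_alleles = round(total_alleles / markers["polymorphic"], 2)
    return total, shares, mean_alleles


if __name__ == "__main__":
    total, shares, mean_alleles = marker_summary(MARKERS, TOTAL_ALLELES)
    print("total markers:", total)
    print("share per class (%):", shares)
    print("mean alleles per polymorphic locus:", mean_alleles)
```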

Gene Ontology and Functional Annotations of EST-SSRs

A functional GO classification of EST-SSRs revealed that, out of the 269 polymorphic EST-SSRs, 203 were annotated and classified into several functional categories. The three main categories were biological processes, molecular functions, and cellular components (Supplementary Table 4). In all jatropha-related species (KL, KY, JM and JP), most annotated transcripts were related to molecular functions, followed by biological processes and cellular components. In contrast, in both jatropha accessions (CN and M10), most annotated transcripts were related to biological processes, followed by molecular functions and cellular components.

Discussion

Nowadays, large-scale transcriptome sequencing has become a standard procedure, particularly in economically important crops. The high-throughput 454 pyrosequencing platform is the most widely used NGS technology for de novo sequencing and analysis of transcriptomes in non-model organisms (Natarajan and Parani 2011). The technology generates longer reads (currently about 400 bases), which are more amenable to de novo assembly of data from novel organisms that do not have previously assembled and annotated reference sequences (Kumar and Blaxter 2010). The 454 pyrosequencing technique has been applied to many plants including barley (Wicker et al. 2006), cucumber (Guo et al. 2010), maize (Arreguín et al. 2009), and jatropha. Costa et al. (2010) reported 4,622 unisequences assembled from transcriptomes using non-normalized cDNA libraries from developing and germinating jatropha endosperm. Natarajan et al. (2010) detected 6,361 unisequences using a normalized cDNA library from developing seeds, while King et al. (2011) found 29,752 unisequences assembled from a developing seed transcriptome. Sato et al. (2011) obtained 21,225 unisequences from leaf and callus transcriptomes. Additionally, Natarajan and Parani (2011) reported 14,327 unisequences from normalized cDNAs prepared from transcriptomes of roots, mature leaves, flowers, developing seeds, and embryos. In this study, we report the first transcriptomes of apical meristem tissue of two jatropha accessions, with 27,224 unisequences for CN and 23,033 unisequences for M10. Moreover, this is the first report on transcriptome analysis of jatropha-related species. The de novo assembly generated 9,665 unisequences for J. integerrima (KL), 16,739 for J. integerrima (KY), 31,409 for J. multifida (JM) and 30,642 for J. podagrica (JP).

Our data provide an effective tool for generating genomic resources and identifying polymorphic molecular markers for non-model organisms, particularly for plants with low diversity within species such as jatropha. One of the most interesting applications of massive sequencing is the large-scale discovery of genetic variants that can be converted into genetic markers, mainly SSRs and SNPs (Deschamps and Campbell 2010). EST-SSRs allow identification of variability in the transcribed regions of the genome, leading to the development of gene-based maps that help accelerate the identification of functional candidate genes and increase the efficiency of marker-assisted selection. Moreover, EST-SSR markers also show high cross-transferability among species or genera (Varshney et al. 2005; Wen et al. 2010). Wen et al. (2010) assessed EST-SSRs from other Euphorbiaceae members for their cross-taxa transferability to J. curcas. A total of 187 EST-SSR and 54 genomic SSR markers from Manihot esculenta (cassava) were found to be polymorphic in J. curcas. Our study developed 432 EST-SSR primer pairs from transcriptomes of jatropha and its related species, and found 269 polymorphic EST-SSR markers showing transferability to four Jatropha species. The large number of EST-SSRs developed here provides a wealth of potential markers that may be useful for studies in population genetics, linkage mapping and comparative genomics. Based on sequence alignments, 20 SNPs between CN and M10 were identified in four scaffolds in the region where the QTL for PE biosynthesis is located (King et al. 2013). Moreover, a SNP in scaffold Jcr4S00160 was found in a gene coding for a steroid binding protein that may be associated with PE biosynthesis. Further analysis of the SNPs described here can facilitate identification of QTL and subsequent identification of genes related to PE biosynthesis via map-based cloning strategies.