Introduction

Plants are exposed to numerous abiotic and biotic stresses, both of which have substantial impacts on growth and productivity. Abiotic stress is caused by several factors, including supra-optimal (high) or sub-optimal (low) temperature, excess or deficient water, and increased levels of salt, chemicals, light incidence, and pollutants. In the field, stress conditions are often compounded because more than one stress occurs at the same time. Among the many abiotic stresses, drought is one of the most important environmental stresses affecting the productivity of most crop plants. About 26 % of the world’s arable land is affected by drought stress (Blum 1988). The effects of drought on crop plants are complex, variable, and accentuated by a number of interacting factors. Drought delays plant development and affects morphology, as well as physiological processes such as photosynthesis, respiration, and translocation of assimilates (Do et al. 2013). Hence, it is important to understand the effects of drought stress on plants and to identify crops tolerant to stressful environments in order to increase crop productivity and mitigate food crises without expanding cultivated land.

Plant species have developed diverse strategies to adapt and thrive in all kinds of climates. Drought avoidance and drought tolerance are the strategies by which a crop can minimize yield loss under drought stress. Drought avoidance can be achieved through morphological changes in plants, such as decreased stomatal conductance, reduced leaf area, and extensive root systems (Levitt 1980; Budak et al. 2013; Rama Reddy et al. 2014). Drought tolerance is achieved by physiological and molecular mechanisms, including osmotic adjustment and the production of antioxidant and scavenger compounds (Bartels and Sunkar 2005). These strategies are supported by rich and complex metabolic and gene networks that enable the plant to synthesize a wide range of compounds (Yamaguchi-Shinozaki and Shinozaki 2006; Shanker et al. 2014). Plant responses to drought stress involve interactions and crosstalk between many molecular pathways (Kantar et al. 2011). High-throughput screening techniques such as transcriptome sequencing have been used to study the adaptability of plants to drought (Ergen and Budak 2009; Raney et al. 2014; Thumma et al. 2012; Bowman et al. 2013; do Amara et al. 2016). This has led to the identification of many genes related to drought stress (Zhang et al. 2013; Zhou et al. 2012; Akpinar et al. 2013; Dong et al. 2014). However, few natural allelic variants have been cloned for drought-related traits, so transcriptome analysis, quantitative trait locus (QTL) mapping, and other trait isolation methods are essential for exploring drought tolerance, which is a complex trait. Elucidation of the complex molecular mechanisms underlying drought tolerance in crops will accelerate the development of new varieties with enhanced drought tolerance.

The Caesalpiniaceae, treated as a subfamily of Fabaceae, is a large group with several species that yield economically important products, such as drugs (Cassia angustifolia, Cassia tora, Cassia italica, Saraca asoca), sour preparations (Tamarindus indica), dyes (Caesalpinia sappan), timber (Cassia fistula), and ornamentals (Bauhinia purpurea, Delonix regia, Poinciana pulcherrima). Senna (Cassia angustifolia) (2n = 28) is a drought-tolerant annual undershrub of Caesalpiniaceae (Ayoub 1977; Khalid et al. 2012). It can survive arid environments by maintaining its water content under severe stress conditions. It is one of the important medicinal crops in the world and is included in the pharmacopoeias of the USA, Germany, UK, India, and many other countries, mainly for its cathartic properties (Lemli 1986; Folkard 1995). Its medicinal properties are due to the presence of sennosides in the leaves and pods (Hammouda et al. 2005). Senna is native to Yemen and the Hadramaut province of Saudi Arabia (Abulafatih 1987; Ghazanfar and Al-Sabahi 1993). The species grows in arid areas of Sudan and Egypt (Hammouda et al. 2005) and is cultivated commercially in arid parts of Rajasthan, Gujarat, and Tamil Nadu in India.

Senna is a suitable plant for studying genes related to drought tolerance, which could ultimately lead to enhanced drought tolerance in legume crops through genetic manipulation. Physiological, developmental, and morphological characteristics collectively confer drought stress tolerance in senna (Ratnayaka and Kincaid 2005). These include maintenance of high carbon gain and water efficiency over a wide range of stress levels, drought deciduousness with re-allocation of resources to apical leaves, plasticity of stomatal and trichome numbers on a given leaflet surface, densely deposited epicuticular wax, isobilateral leaf anatomy with large bundle sheath extensions, and paraheliotropic leaflet movement. In addition, production of various drought stress-related proteins and enzymes probably aids tolerance to drought stress in senna (Ratnayaka and Kincaid 2005). Production of abiotic stress-related enzymes such as glutathione reductase, catalase, and superoxide dismutase, and of the osmoprotectant proline, was enhanced in senna in response to drought stress (Khammari et al. 2012), salt stress (Agarwal and Pandey 2004), and increased heavy metals (Qureshi et al. 2007).

Abiotic stress-tolerant plants are a potential source of genes for further research and breeding of stress-tolerant plants. Senna is considered an excellent model plant for studying plant adaptation to abiotic stress and for gaining a deeper understanding of the genetic control of, and strategies for, adaptation to dry environments in Fabaceae. However, this objective is difficult to achieve because genomic sequence information for senna is lacking. Fortunately, next-generation sequencing (NGS) has made sequencing affordable and reliable and has made transcriptomes of non-model organisms available as well. RNA-seq is a method of choice for studying drought adaptation and biological features in non-model plants (Hirayama and Shinozaki 2010; Van Eck et al. 2014; Singh et al. 2016). RNA-seq has been used to study abiotic stress response genes in many plants such as parsley (Li et al. 2014a, b), common bean (Hiz et al. 2014), chrysanthemum (Xu et al. 2013), tall fescue (Hu et al. 2014), and grapevine (Rocheta et al. 2014).

In the present study, the leaf transcriptome of senna was sequenced by next-generation sequencing (NGS) technology to identify specific pathways putatively associated with drought stress tolerance. This study provides, for the first time, a whole dataset of genes expressed in senna along with a glimpse of the biochemical pathways in this uniquely drought-tolerant member of Fabaceae, and thus provides a base for understanding the molecular mechanisms underlying drought stress tolerance.

Material and methods

Total RNA isolation, library construction, and deep sequencing

Developing leaf samples were collected at the flowering stage from Cassia angustifolia (var. Sona) plants grown in the experimental farm at the ICAR-Directorate of Medicinal and Aromatic Plants Research (ICAR-DMAPR), Anand, Gujarat, India, in the year 2013. RNA from each sample was isolated using the RaFlex Total RNA isolation kit (Merck Millipore, MA, USA) following the standard protocol described by the manufacturer. The yield and quality of RNA were determined using a Nanodrop-8000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). For RNA library construction and deep sequencing, equal quantities of RNA from young and mature leaves were used. The samples were sequenced using the Illumina MiSeq platform (Illumina®, San Diego, CA, USA).

Raw data processing

Next-generation sequencing on the Illumina MiSeq generated raw data in FASTQ format. The quality of the raw data was assessed using the Phred quality score (Q). The raw data were filtered and trimmed to remove low-quality reads, adaptor and primer sequences, and reads shorter than 40 bp using Trimmomatic v0.30 (Bolger et al. 2014). Sequencing data with a Phred quality score of Q ≥ 20 were used for assembly.
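The two filtering criteria stated above (minimum read length of 40 bp and Phred Q ≥ 20) can be illustrated with a minimal Python sketch. It is not a reimplementation of Trimmomatic, which applies its own trimming steps; it assumes Phred+33 quality encoding and uses the mean quality of a read as a simple stand-in for the quality criterion. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield header, seq, qual


def passes_filter(record: FastqRecord,
                  min_len: int = 40,
                  min_mean_q: float = 20.0) -> bool:
    """Keep reads that are long enough and whose mean Phred score is >= min_mean_q."""
    _, seq, qual = record
    if len(seq) < min_len:
        return False
    scores = phred_scores(qual)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_mean_q
```

A Phred score Q corresponds to an error probability of 10^(−Q/10), so Q = 20 means roughly one miscalled base in 100.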

De novo assembly

High-quality data obtained after filtration from young and mature leaf samples were pooled to serve as a representative transcriptome and were provided as input for transcriptome assembly using the Trinity RNA-seq assembler (version 2013) (Grabherr et al. 2011) with optimized parameters and a k-mer size of 25. CLC Genomics Workbench (CLC Bio, Boston, MA 02108 USA) was used to validate the assembled transcript contigs by mapping high-quality reads back to them. To identify the coding DNA sequences (CDS) in the assembled transcript contigs, the online tool OrfPredictor (Min et al. 2005) (http://proteomics.ysu.edu/tools/OrfPredictor.html) was used with the default parameters.
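The idea behind CDS prediction can be sketched as a six-frame search for the longest open reading frame. The Python example below is a simplified illustration only and is not the OrfPredictor algorithm; it assumes that an ORF runs from an ATG to the first in-frame stop codon, and the function names are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> Tuple[str, str, int]:
    """Return (orf, strand, frame) for the longest ATG..stop ORF in six frames.

    Returns ("", "+", 0) when no complete ORF is found.
    """
    best = ("", "+", 0)
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk codon by codon in this reading frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) > len(best[0]):
                        best = (orf, strand, frame)
                    start = None
    return best
```

For the toy contig `CCATGAAATTTGGGTAACC`, this returns the 15-bp ORF `ATGAAATTTGGGTAA` on the forward strand in frame 2.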

Annotation

All predicted CDS were functionally annotated by aligning them to the green plant database (txid 33090) of NCBI using the basic local alignment search tool (BLASTX) (Altschul et al. 1990) with an E value threshold of 1e−06. The functions of predicted CDS were classified, and each CDS was assigned an ontology of defined terms using gene ontology (GO) assignment and mapping. GO terms for all BLASTX-annotated CDS were retrieved using GO mapping. CDS were categorized into 45 functional groups by WEGO analysis, which involved drawing a WEGO plot based on GO hits. To retrieve GO terms for annotated CDS, the GO mapping used defined criteria. These included use of (i) BLASTX result accession IDs to retrieve gene names or symbols, (ii) UniProt IDs, and (iii) direct search in the dbxref table of the GO database. Gene names or symbols thus identified were then searched against the species-specific entries of the gene-product tables in the GO database. To retrieve UniProt IDs, the Protein Information Resource (PIR) was used. PIR includes the protein sequence database (PSD), UniProt, SwissProt, TrEMBL, RefSeq, GenPept, and PDB databases. Using GO analysis, all the annotated nodes comprising GO functional groups were specified.

All predicted CDS were also annotated against protein databases to assign putative functions after translation into protein. CDS were searched against the non-redundant protein sequences available in the UniProtKB/Swiss-Prot database using BLASTX with an E value threshold of 1e−06. The CDS were categorized into 24 functional clusters of phylogenetically widespread domain families of proteins by comparing them to the Clusters of Orthologous Groups (COG) protein database. To identify higher-level groupings of related protein families (clans) and the domains that occur within proteins, transcripts were compared against the Pfam database. To identify transcription factors, predicted CDS were searched against all transcription factors and protein kinases in the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/) using BLASTX with an E value cut-off of <1e−05 (Jin et al. 2014).
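The E value thresholds used in the annotation steps can be applied to BLAST tabular output as in the following minimal Python sketch. It assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`), in which column 11 is the E value and column 12 the bit score; the study does not state which output format was used, and the function name and sequence identifiers below are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, float, float]  # (subject id, E value, bit score)


def best_hits(rows: Iterable[str], max_evalue: float = 1e-6) -> Dict[str, Hit]:
    """Keep, per query, the highest-scoring hit with E value <= max_evalue.

    Expects 12-column tab-separated BLAST output; shorter lines are skipped.
    """
    best: Dict[str, Hit] = {}
    for fields in csv.reader(rows, delimiter="\t"):
        if len(fields) < 12:
            continue
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy example with made-up identifiers (not data from this study).
rows = [
    "cds1\tsubjA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t180",
    "cds1\tsubjB\t90.0\t200\t20\t0\t1\t600\t1\t200\t1e-60\t210",
    "cds2\tsubjC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.001\t35",
]
annotated = best_hits(rows)
```

In this toy input, `cds1` keeps its higher-scoring hit and `cds2` is left unannotated because its only hit has an E value above the threshold.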

KEGG mapping

The KEGG automatic annotation server (KAAS) (Moriya et al. 2007) was used for ortholog assignment and mapping of CDS to metabolic pathways. BLASTX with a threshold bit-score value of 60 (default) was used to search all CDS against the KEGG database. The CDS mapped to 24 functional KAAS pathway categories and represented enzymes involved in different metabolic pathways. KAAS (version 1.6) (http://www.genome.jp/tools/kaas/) with default parameters was used to perform the KEGG orthology (KO) assignments. KAAS carries out functional annotation of genes by BLAST comparison of CDS against the manually curated KEGG genes database. Thus, CDS were annotated with KO assignments and were also mapped to various KEGG metabolic pathways.

Mining genes involved in drought stress response

To identify putative CDS/transcripts involved in drought stress response, the literature was searched for functional genes, transcription factors and enzymes involved directly or indirectly in drought stress regulation. Orthologs of such genes were identified in the annotated transcripts and CDS of senna using various databases.

RT-PCR and Sanger sequencing

CDS sequences of six genes known to be involved in drought stress tolerance were used to design gene-specific primers in order to assess the quality and precision of the assembly and annotation. cDNA was used as a template for amplification with these primers. Total RNA was extracted from leaf using a plant RNA isolation kit (GeNei, Bangalore, India) according to the manufacturer’s protocol. To check RNA quality, 8 μl of isolated RNA was mixed with 2 μl of 5× formaldehyde RNA loading dye (Thermo Scientific), heat-denatured at 65 °C for 10 min, and loaded on a 1 % denaturing agarose gel prepared in 1× MOPS buffer and formaldehyde. Gel electrophoresis was carried out using 1× MOPS buffer prepared in DEPC-treated water. First-strand cDNA was synthesized using the M-MuLV RT-PCR kit (GeNei, Bangalore, India) with oligo-dT primer and 4 μl of isolated RNA as template according to the manufacturer’s protocol. Gene-specific primers designed from the transcriptome sequences were used to amplify six drought-related genes using first-strand cDNA as template in a 50 μl reaction. Touchdown PCR was carried out under the following conditions: 95 °C for 5 min; 95 °C for 15 s, 61 °C for 20 s (−1 °C/cycle), and 72 °C for 1 min 30 s; 95 °C for 10 s, 54 °C for 20 s, and 72 °C for 30 s, repeated for 30 cycles; and a final extension at 72 °C for 20 min. The amplified product was loaded on a 1.5 % agarose gel. The band of amplified product was extracted from the gel using a gel extraction kit (GeNei, Bangalore, India) according to the manufacturer’s protocol. Extracted products were sequenced by Sanger sequencing (Eurofins Scientific, India). Each sequence obtained was compared with the predicted CDS sequence and was BLAST searched against existing sequences at NCBI to confirm its identity. Sequences were deposited in GenBank at NCBI.

Results and discussion

Sequencing and assembly

We sequenced two cDNA libraries from two different stages of leaf development using the Illumina MiSeq platform. The sequencing data comprised 22,026,329 raw reads containing 6,345,992,202 nucleotide bases (Table 1). The raw paired-end sequencing data in FASTQ format were deposited in the National Center for Biotechnology Information (NCBI) BioProject database (as Short Read Archive) under accession number PRJNA273534. A pooled data set was created by combining the reads from the two libraries and subjected to de novo assembly. The transcriptome shotgun assembly project was deposited at DDBJ/EMBL/GenBank under the accession GEEB00000000. The version described in this paper is the first version, GEEB01000000. The assembly using Trinity yielded 43,413 non-redundant transcript contigs after filtering out those shorter than 200 bases. The details of the pooled transcriptome are provided in Table 1. The total transcript length was 72,058,285 bases (72.0 Mb), with an average transcript length of 1659 bases. The GC content of the transcripts was 41.96 %, lower than the AT content of 58.04 %. The size distribution of transcripts was reported in classes from 1000 bp to 3500 bp and above (Table 1); the largest class, 1000–1499 bp, contained 21,192 transcripts, followed by the 1500–1999 bp class with 12,950 transcripts (Figure S1). The maximum contig length was 8715 bp (Table 1). The N50 of the assembly was 1697 bp, which is slightly higher than that reported earlier in senna (Rama Reddy et al. 2015) and higher than those of most recently published plant transcriptome assemblies, such as Phaseolus vulgaris L. (1449 bp) (Hiz et al. 2014), Salvia hispanica L. (1338 bp) (Sreedhar et al. 2015), Raphanus sativus (773 bp) (Wu et al. 2015), Gentiana rigescens (1384 bp) (Zhang et al. 2015a, b, c), Camellia sinensis (521 bp) (Wang et al. 2016), and Codonopsis pilosula (1243 bp) (Gao et al. 2015), indicating a good transcriptome assembly. In the medicinal plant Cassia obtusifolia, transcriptome sequencing of seed using Illumina and its assembly yielded 40,102 unigenes with an average length of 681 bases (Liu et al. 2014). The assembly was validated by mapping high-quality reads back to the assembled transcript contigs using CLC Genomics Workbench; 82.72 % of reads from the leaf library mapped to the transcripts, suggesting that the assembly was reliable. The remaining 17.28 % of reads from the leaf transcriptome library did not map to transcript contigs, which might be due to very low-expressing transcripts whose reads were either partially assembled or left out completely during the assembly process. Identification of coding DNA sequences (CDS) in the assembled transcript contigs led to the prediction of a total of 43,413 CDS. The maximum CDS length was 8691 bp, whereas the minimum CDS length was 201 bp (Table 1). The size distribution of CDS ranged from 200 bp to 1000 bp and above; the largest class was 1000 bp and above (19,312 CDS), followed by 4079 CDS and 4065 CDS in the ranges 800–899 bp and 900–999 bp, respectively (Figure S2).
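The assembly statistics discussed above (N50 and GC content) follow standard definitions, which can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names, not the code used to produce Table 1.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_percent(sequences: Iterable[str]) -> float:
    """Return the percentage of G and C bases across all sequences."""
    gc = total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += len(s)
    return 100.0 * gc / total if total else 0.0
```

As a consistency check on the reported values, dividing the total transcript length (72,058,285 bases) by the number of contigs (43,413) gives roughly 1659.8 bases, in line with the stated average of 1659.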

Table 1 Transcriptome reads, assembly, and CDS statistics

Functional annotation

We first annotated the assembled senna transcripts through a homology search against the green plant database (txid 33090) of NCBI using BLASTX with an E-value threshold of 1e−06. A total of 42,280 CDS (97.39 %) had significant BLAST hits. Based on BLASTX annotation, the top-hit species distribution showed the highest similarity to Glycine max (41 %), followed by Phaseolus vulgaris (16 %), Cicer arietinum (15 %), and Medicago truncatula (5 %) (Fig. 1). Gene ontology (GO) was used to classify CDS into functional categories by Blast2GO analysis. The functions of predicted CDS were classified, and each CDS was provided with an ontology of defined terms. GO terms were assigned to 26,326 CDS and grouped into three main domains: biological process, molecular function, and cellular component (Table 2 and Fig. 2). For the annotated CDS, a total of 55,318 GO terms were assigned. The number of assigned terms exceeded the number of annotated CDS because of overlap: multiple CDS can be assigned to one GO term, and a single CDS can have multiple GO terms. In the biological process category, 20,776 CDS were assigned, while 21,763 CDS were assigned in the molecular function category and 12,779 CDS in the cellular component category (Table 2). CDS were categorized into 47 functional groups by WEGO analysis, which involved drawing WEGO plots based on GO hits (Fig. 2). In the biological process category, the highest number of CDS were assigned to the metabolic process (GO:0008152) (21.77 %) group, followed by the cellular process (GO:0009987) (20.12 %) group. In the molecular function category, “catalytic activity” (GO: 0003824) (25.10 %) and “binding activity” (GO: 0005488) (20.10 %) were most abundantly represented. This indicates that diverse metabolic processes are active in the C. angustifolia leaf and that a variety of metabolites are synthesized. Under the cellular component category, the highest number of CDS were associated with “cell” (GO: 0005623) (26.10 %) and “cell part” (GO: 0044464) (21.10 %). An extremely low percentage of genes was classified under the terms “protein tag” (GO: 0031386), “locomotion” (GO: 0040011), “metallochaperone” (GO: 0016530), and “viral reproduction” (GO: 0016032).
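The three per-domain counts reported above sum to the stated total (20,776 + 21,763 + 12,779 = 55,318), which is consistent with a CDS being counted once in every GO domain in which it carries at least one term. The Python sketch below illustrates that reading with made-up annotations; the counting scheme is an assumption, since the exact tallying procedure is not described, and the function name and CDS identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Assignment = Tuple[str, str, str]  # (CDS id, GO domain, GO term id)


def cds_per_domain(assignments: Iterable[Assignment]) -> Dict[str, int]:
    """Count distinct CDS per GO domain; a CDS may appear in several domains."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for cds_id, domain, _go_id in assignments:
        members[domain].add(cds_id)
    return {domain: len(ids) for domain, ids in members.items()}


# Toy annotations (not data from this study): 3 CDS, 6 term assignments.
toy = [
    ("cds1", "BP", "GO:0008152"),
    ("cds1", "MF", "GO:0003824"),
    ("cds1", "CC", "GO:0005623"),
    ("cds2", "BP", "GO:0009987"),
    ("cds2", "BP", "GO:0008152"),
    ("cds3", "MF", "GO:0005488"),
]
counts = cds_per_domain(toy)
```

Here the per-domain counts add up to 5 although only 3 CDS are annotated, which mirrors how 26,326 annotated CDS can yield 55,318 domain-level assignments.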

Fig. 1

BLASTX top hit species distribution of transcript contigs in the leaf transcriptome of senna

Table 2 Distribution of BLAST results of senna CDS

Fig. 2

GO classification. GO terms were derived based on the similarity search with leaf CDS in the transcriptome of Cassia angustifolia

We searched CDS against the non-redundant protein sequences available in the UniProtKB/Swiss-Prot database using BLASTX with an E-value threshold of 1e−06 and predicted their putative protein functions. A total of 33,256 (76.6 %) CDS showed significant hits, indicating overall gene conservation across species. In addition, many CDS were annotated as unknown, hypothetical, or expressed proteins because they showed homology to uncharacterised proteins.

A total of 17,966 CDS were categorized into 24 functional clusters of phylogenetically widespread domain families of proteins by comparing the CDS to the Clusters of Orthologous Groups (COG) protein database (Fig. 3). The highest number of CDS was represented under “General Functional Prediction only [R]” followed by “Secondary Structure [O]” and “Carbohydrate metabolism and transport [G].” The lowest number of CDS was represented under “Cell motility [N]” and “Cytoskeleton [Y].”

Fig. 3

Clusters of orthologous groups (COG) functional classification of CDS predicted in the senna transcriptome

InterProScan was used to assess protein similarity at the domain level. Transcripts were annotated against Pfam domains, and 37,872 transcripts were annotated (Fig. 4). The pentatricopeptide repeat (PPR) (PF01535.15) domain was the most represented (4880 transcripts), followed by PPR_3 (PF13812.1) (4526 transcripts), PPR_2 (PF13041.1) (4293 transcripts), and PPR_1 (PF12854.2) (3985 transcripts). Other domains frequently represented in the leaf library include LRR_1 (PF00560.28) (3705 transcripts), LRR_6 (PF13516.1) (3469 transcripts), TPR_14 (PF13428.1) (3382 transcripts), LRR_7 (PF13504.1) (3307 transcripts), WD40 (PF00400.27) (3203 transcripts), and LRR_4 (PF12799.2) (2952 transcripts), indicating strong signal transduction mechanisms.

Fig. 4

Top 10 Pfam domains represented in InterProScan transcript annotations of the Cassia angustifolia leaf transcriptome

Fig. 5

Transcripts annotated to different protein kinase families in the transcriptome of senna

KEGG pathway mapping

The KEGG automatic annotation server (KAAS) was used for ortholog assignment and mapping of CDS to metabolic pathways. BLASTX with a threshold bit-score value of 60 (default) was used to search all CDS against the KEGG database. The CDS mapped to 24 functional KAAS pathway categories and represented enzymes involved in metabolic pathways. A total of 8250 CDS were assigned to functional KAAS pathway categories (Table 3) and to 191 KEGG pathways (Table S1). The highest number of CDS was represented under the Translation (846) and Carbohydrate metabolism (744) pathway categories, indicating that many active metabolic processes occurred in senna leaves, while the lowest number of CDS was represented under “signal molecules and interaction” and membrane transport. Under the environmental adaptation category, 201 CDS were represented.

Table 3 KEGG categories of CDS in the leaf transcriptome

Genes encoding enzymes involved in drought stress response

Drought stress triggers a wide variety of plant responses, including alterations in gene expression, the accumulation of secondary metabolites or osmotically active compounds, and the synthesis of specific proteins (Ramchandra Reddy et al. 2004; Ergen et al. 2009). The ectopic expression or suppression of regulatory genes could potentially activate multiple mechanisms of drought tolerance. Signaling factors, protein-modifying/degrading enzymes, biosynthesis of phytohormones (abscisic acid, ethylene, jasmonate, salicylic acid, brassinosteroid, gibberellic acid, and nitric oxide), phytohormone signaling (abscisic acid, ethylene, jasmonate, salicylic acid, brassinosteroid, gibberellic acid, and auxin), biosynthesis of osmotically active compounds (proline, glycine betaine, trehalose, sorbitol, mannitol, and galactinol), synthesis of free radical scavengers (catalase, peroxidase, antioxidant enzymes, glutathione, ascorbic acid, and phytochelatin), chlorophyll biosynthesis and degradation, leaf cuticular wax biosynthesis, polyamine biosynthesis, protective proteins (heat shock proteins, LEA proteins, chaperones, osmotin, and aquaporin), and others have been exploited for engineering drought tolerance in plants (Hu and Xiong 2014; Umezawa et al. 2006).

Drought signaling factors in senna

Plants have developed complex and efficient signal transduction networks to cope with the continual challenges of an unfavorable environment especially during drought stress. Protein kinases, transcription factors, and others such as cyclin-dependent protein kinase, ATPase/hydrogen-translocating pyrophosphatase (AVP1), and ABA catabolism (Chl-NADP-ME) are major signaling factors involved in drought stress signaling (Akpinar et al. 2012).

Protein kinases

Protein kinases (PKs) play essential roles in developmental and environmental signal transduction in plants (Rodriguez et al. 2010; Liu et al. 2016). Protein kinases activate transcription factors and drought-responsive proteins through post-translational modification and are thus important candidates for improving drought tolerance. Mitogen-activated protein kinase (MAPK) cascades, calcineurin B-like protein-interacting protein kinase (CIPK), calcium-dependent protein kinase (CDPK or CPK), receptor-like kinases, and others have roles in drought stress signaling and regulation pathways. Genes encoding various PKs involved in stress signaling have been identified in plants (Long et al. 2014; Yang et al. 2008; Wei et al. 2014; Zhao et al. 2013). In the present study, a BLASTX search of senna transcripts against the protein kinase family database (http://bioinfo.bti.cornell.edu/tool/itak/) identified 2060 PKs, which were classified into 41 known PK families based on their kinase domains (Table 4, Fig. 5). The most abundant groups of protein kinases in senna were the leucine-rich repeat kinase family (346; 17 %), receptor-like cytoplasmic kinase family (316; 15 %), domain of unknown function 26 (DUF26) kinase (126; 6 %), CDC2-like kinase family (114; 6 %), GmPK6/AtMRK1 family (109; 5 %), and IRE/NPH/PI dependent/S6 kinase (104; 5 %). Genes from these families have been reported to play significant roles in plant responses to drought stress (Hu and Xiong 2014; Sun et al. 2015a, b). For example, CDS were detected for MAPK cascades (74), CDPK (84), and receptor-like kinases (42), which are involved in drought signaling and form important candidates for improving drought tolerance (Hu and Xiong 2014). Other PKs were present in lower numbers.

Table 4 Putative genes encoding signaling factors and protein modifying/degrading enzymes involved in drought stress response identified in the leaf transcriptome of senna

Transcription factors

Transcription factors (TFs) are important upstream regulatory proteins and play critical roles in various plant developmental processes and in plant responses to abiotic and biotic stresses. Genes encoding various TF family members have been identified as regulators of drought tolerance and are promising for improving drought tolerance in plants (Ciftci-Yilmaz and Mittler 2008; Fang et al. 2008; Lucas et al. 2011a, b; Sun et al. 2016). The BLASTX search of senna transcripts against the Plant Transcription Factor Database identified 10,833 transcription factors, which were classified into 78 known transcription factor families based on their DNA-binding domains. The largest group of transcription factors was the Orphan family (263, 2.40 %), followed by bHLH (235, 2.10 %), C3H (193, 1.78 %), and the C2C2 family (190, 1.75 %) (Table 4). Genes from these families have been reported to play significant roles in plant responses to drought stress (Hu and Xiong 2014). NAM-ATAF-CUC2 (NAC) is a plant-specific TF family with a highly conserved DNA-binding domain, and many genes belonging to this family are responsive to drought stress (Fang et al. 2008). There were 183 CDS encoding NAC family TFs (Table 4). Overexpression of SNAC1, a NAC family TF, in rice and wheat improved drought resistance (Nakashima et al. 2012). Similarly, CDS were identified for TF families such as MYB-related (Xiong et al. 2014), MYB (Su et al. 2014), MYC (Abe et al. 1997), C2H2-type zinc finger (Shi et al. 2014), bZIP (Zhong et al. 2015), HB (Jain et al. 2008), WRKY (Ren et al. 2010; Van Eck et al. 2014), AP2-EREBP (Liu et al. 1998; Wu et al. 2016), CCAAT/NF-Y (Li et al. 2008), HSF (Li et al. 2014a, 2014b), CAMTA (Pandey et al. 2013), and zf-HD (Tran et al. 2007); these families have been reported to play important roles in plant responses to drought stress and were also highly abundant in our transcriptome dataset. CDS encoding other signaling factors such as cyclin-dependent protein kinase (95), ATPase/hydrogen-translocating pyrophosphatase (AVP1) (5), and ABA catabolism (Chl-NADP-ME) (65) were also identified in the senna leaf transcriptome.

Protein-modifying/degrading enzymes

Protein-modifying and protein-degrading enzymes play important roles in ABA signaling and have therefore been used for engineering drought tolerance in crop plants. The cytosolic enzymes E3 ubiquitin-protein ligase, ubiquitin-conjugating enzyme E2, and ubiquitin-activating enzyme E1, which play an important role in the degradation of ubiquitinated proteins, have been shown to function as positive regulators of the ABA-dependent response to drought stress (Ryu et al. 2010). Genes encoding various protein-modifying/degrading enzymes have been identified in plants (Gao et al. 2011; Kuzuoglu-Ozturk et al. 2012; Song et al. 2016) and have been used to improve drought tolerance in plants (Ning et al. 2011; Park et al. 2010). In the present study, CDS encoding E3 ubiquitin-protein ligase (609), ubiquitin-conjugating enzyme E2 (57), and ubiquitin-activating enzyme E1 (14) were identified through BLASTX search, forming an important resource for understanding drought adaptation in senna (Table 4). Farnesylation is a posttranslational modification in which a farnesyl group is added to inactive target proteins so that they are targeted to membranes. Suppressing the rice farnesyltransferase SQS by RNA interference greatly enhanced drought resistance (Manavalan et al. 2012). There were nine CDS encoding farnesyltransferase/squalene synthase in senna, which may be exploited for improving drought tolerance in crop plants. Ski-interacting protein/SNW domain-containing protein and ribosome-inactivating proteins are involved in drought signaling (Lim et al. 2010; Jiang et al. 2012). There were six CDS encoding Ski-interacting protein/SNW domain-containing protein in senna, which form additional candidates for developing drought-tolerant transgenic plants.

Biosynthesis of phytohormone

Plant hormones play a major role in abiotic stress response in plants (Khan et al. 2012) and regulate developmental processes and signaling networks under abiotic stress. In plants, accumulation of abscisic acid (ABA) plays an important role in drought stress signaling and transduction pathways, mediating many responses. The perception of drought stress and the adaptive response to it are controlled mainly by ABA. Other plant hormones involved in the drought stress response include ethylene, jasmonate and methyl jasmonate, salicylic acid, brassinosteroids, gibberellic acid, and nitric oxide.

ABA biosynthesis

ABA is synthesized and accumulated in guard cells, where it triggers stomatal closure under drought (Schroeder et al. 2001). ABA is synthesized from the C40 carotenoid precursor beta-carotene in plants (Cutler and Krochko 1999). In senna, 38 CDS encoding eight enzymes involved in ABA biosynthesis, i.e., beta-carotene 3-hydroxylase, beta-ring hydroxylase, zeaxanthin epoxidase, violaxanthin de-epoxidase, 9-cis-epoxycarotenoid dioxygenase, xanthoxin dehydrogenase, abscisic-aldehyde oxidase, and abscisic acid 8′-hydroxylase, were identified (Table 5 and Fig. 6). Beta-carotene 3-hydroxylase (BCH) has been shown to be critical for tolerance to drought and oxidative stress (Du et al. 2010). CDS encoding BCH were identified in senna. The gene ABA8H, encoding abscisic acid 8′-hydroxylase, has been shown to play a critical role in regulating ABA levels during seed imbibition and dehydration stress (Saito et al. 2004; Kushiro et al. 2004; Xu et al. 2014a, b). There were seven CDS encoding abscisic acid 8′-hydroxylase in senna, which form an important resource for improving drought tolerance.

Table 5 Putative genes encoding enzymes involved in biosynthesis of phytohormones identified in the senna leaf transcriptome
Fig. 6 ABA biosynthetic pathway in senna (numbers in brackets represent number of CDS)

Ethylene biosynthesis

Under drought conditions, ethylene causes leaf abscission and consequently reduces water loss (Zhu et al. 2011). Several studies in model plants have evaluated the importance of the hormone ethylene in crosstalk signaling with different metabolic pathways, in addition to its role in responses to biotic stresses (Arraes et al. 2015; Shang et al. 2014; Kantar et al. 2011). Ethylene is derived from the amino acid methionine provided by the Yang cycle (Roje 2006), in which the precursor S-adenosylmethionine (AdoMet or SAM) is synthesized from ATP and methionine by S-adenosylmethionine synthetase (SAMS; EC 2.5.1.6). AdoMet is then converted into 1-aminocyclopropane-1-carboxylic acid (ACC) and 5-methylthioadenosine (MTA) by the enzyme 1-aminocyclopropane-1-carboxylate synthase (ACS; EC 4.4.1.14). MTA is recycled back to methionine through a series of Yang cycle reactions (Argueso et al. 2007). In the present study, 49 CDS encoding three enzymes involved in ethylene biosynthesis, namely S-adenosylmethionine synthetase (8), 1-aminocyclopropane-1-carboxylate synthase (ACS) (2), and aminocyclopropanecarboxylate oxidase (39), were identified (Table 5), thus providing important candidates for engineering drought tolerance in crops.

Jasmonate and methyl-jasmonate biosynthesis

Plant responses to abiotic stresses, particularly drought stress, are orchestrated locally and systemically by signaling molecules known as the jasmonates (JAs), a class of polyunsaturated fatty acid-derived phytohormones (Turner et al. 2002). The biosynthesis of JA initiates in chloroplasts and involves the release of α-linolenic acid (α-LeA, 18:3 or 18:2) from the lipid membrane by phospholipases (PLDs). In this study, 71 CDS encoding 10 enzymes involved in jasmonate and methyl-jasmonate biosynthesis were identified from the leaf transcriptome of senna, i.e., phospholipase A2 (3), lipoxygenase (24), hydroperoxide dehydratase (5), allene oxide cyclase (4), 12-oxophytodienoic acid reductase (4), OPC-8:0 CoA ligase (3), acyl-CoA oxidase (9), enoyl-CoA hydratase/3-hydroxyacyl-CoA dehydrogenase (13), acetyl-CoA acyltransferase (4), and jasmonate O-methyltransferase (2) (Table 5). These CDS form an invaluable resource for understanding the JA pathway, which may in turn lead to practical biotechnological applications in plants.
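The pathway total quoted above is the sum of the per-enzyme CDS counts given in brackets. A small consistency check of that arithmetic is sketched below, using only the counts stated in the text; the dictionary name is illustrative.

```python
# Per-enzyme CDS counts for jasmonate and methyl-jasmonate biosynthesis,
# taken from the bracketed numbers in the text.
ja_cds = {
    "phospholipase A2": 3,
    "lipoxygenase": 24,
    "hydroperoxide dehydratase": 5,
    "allene oxide cyclase": 4,
    "12-oxophytodienoic acid reductase": 4,
    "OPC-8:0 CoA ligase": 3,
    "acyl-CoA oxidase": 9,
    "enoyl-CoA hydratase/3-hydroxyacyl-CoA dehydrogenase": 13,
    "acetyl-CoA acyltransferase": 4,
    "jasmonate O-methyltransferase": 2,
}

n_enzymes = len(ja_cds)
total_cds = sum(ja_cds.values())
print(n_enzymes, total_cds)  # prints "10 71"
```

The same check applies to the ethylene (8 + 2 + 39 = 49) and salicylic acid (9 + 7 + 5 = 21) pathways.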

Salicylic acid biosynthesis

Salicylic acid (SA) is a seven carbon-containing, naturally occurring phenolic compound and an endogenously synthesized signaling molecule in plants (Wang et al. 2015). The most common pathway for SA synthesis in plants is the phenylalanine pathway; however, SA biosynthesis may also be accomplished by the isochorismate pathway (Kawano et al. 2004; Mustafa et al. 2009). SA is produced after a series of chemical reactions catalyzed by many enzymes. In our dataset, 21 CDS were identified encoding three enzymes involved in salicylic acid biosynthesis, namely isochorismate synthase (9), phenylalanine ammonia-lyase (7), and trans-cinnamate 4-monooxygenase (5) (Table 5). We did not find CDS for benzoic acid 2-hydroxylase, which catalyzes the biosynthesis of salicylic acid from benzoic acid. This might be due to very low transcript levels in the sample; alternatively, as a possible key enzyme, its expression might be tightly regulated, leaving the transcript undetected. Similarly, the CDS encoding enzymes involved in the biosynthesis of brassinosteroids, gibberellic acid, and nitric oxide were identified (Table 5) in the leaf transcriptome of senna.

Plant hormone signaling in senna

Plant hormones regulate plant responses under biotic and abiotic stresses through specific signaling networks (Kohli et al. 2013). Plant hormones such as abscisic acid, ethylene, jasmonic acid, salicylic acid, brassinosteroids, gibberellic acid, and auxin have been studied for their role in abiotic stress responses (Peleg and Blumwald 2011; Santner et al. 2009). Putative plant hormone signaling genes detected in senna during the drought response are listed in Table 6. Abscisic acid (ABA) acts as an endogenous messenger in the regulation of the plant’s water status (Swamy and Smith 1999). ABA-dependent stress signaling involves many enzymes. The soluble PYR/PYL/RCAR receptors function at the apex of a negative regulatory pathway to directly regulate PP2C phosphatases, which in turn directly regulate SnRK2 kinases (Cutler et al. 2010). The largest number of CDS (253) encoded protein phosphatase 2C (PP2C), a negative regulator of the ABA response in plants. Similarly, CDS for the abscisic acid receptor PYR/PYL family (11), serine/threonine-protein kinase SRK2 (SnRK2) (25), and ABA-responsive element binding factor (ABF) (21) were also detected in the leaf transcriptome of senna; these form candidates for understanding and improving drought stress tolerance in plants.

Ethylene, a gaseous plant hormone, regulates growth and development as well as responses to biotic and abiotic stresses in plants (Bleecker and Kende 2000). Over the last few decades, key elements involved in ethylene signal transduction have been identified in plants (Shakeel et al. 2013). Almost all of the ethylene-signaling homologous members were detected in the C. angustifolia transcriptome. Serine/threonine-protein kinase CTR1 is well known for mediating stress responses and development in plants (Xu et al. 2014a). JAs also regulate such diverse processes as pollen maturation and wound responses in Arabidopsis. Several core factors in the JA signaling pathway are listed in Table 6. MYC transcription factors and Coronatine-insensitive protein 1 (COI1) are key regulators of genes involved in wound- and methyl jasmonate-induced secondary metabolism, defense, and hormone interactions (Devoto et al. 2005; Zhang et al. 2015a, b, c).

In the salicylic acid signaling pathway, NPR1 (nonexpressor of pathogenesis-related genes 1) is a key regulator of SA-dependent defense signaling (Boatwright and Pajerowska-Mukhtar 2013). Similarly, WRKY and TGA play major roles as transcriptional regulators in the SA pathway. We detected three NPR1-related proteins and 22 TGAs in C. angustifolia, which might function in its SA signaling pathway. Brassinosteroids (BRs) are growth-promoting steroid hormones that regulate diverse physiological processes in plants. The BR signal is transduced by a receptor kinase-mediated signal transduction pathway, which is distinct from animal steroid-signaling systems. CDS for the majority of the BR signaling homologous members were detected in the C. angustifolia transcriptome. In senna, we detected 14 CDS for the gibberellin receptor GIBBERELLIN INSENSITIVE DWARF1 (GID1), nine for DELLA growth inhibitors (DELLAs), two for F-box proteins (GID2), and 13 for phytochrome-interacting factor 3, which play important roles in GA signaling pathways (Daviere and Achard 2013; Richards et al. 2001).

Auxin regulates transcription by rapidly modulating levels of Aux/IAA proteins throughout development. Auxin binds to TIR1, the F-box subunit of the ubiquitin ligase complex SCF(TIR1), and stabilizes the interaction between TIR1 and Aux/IAA substrates (Mockaitis and Estelle 2008). The auxin response factor (ARF) family contains transcription factors that bind to auxin-responsive elements (AREs) in the promoters of primary auxin-responsive genes (Sun et al. 2015a, b; Wang et al. 2010). All of the major auxin signaling factors were found in the senna transcriptome (Table 6), which will enrich our knowledge of the molecular basis of phytohormone signaling in plants during drought stress.
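The core ABA signaling module described in this section (PYR/PYL/RCAR receptors regulating PP2C phosphatases, which in turn regulate SnRK2 kinases) is a double-negative cascade, and its logic can be illustrated with a deliberately simplified Boolean sketch. The on/off treatment of each component, the function name, and the final step in which active SnRK2 switches on ABF are simplifying assumptions made for illustration; they are not a model fitted to the senna data.

```python
def aba_core_module(aba_present: bool) -> dict:
    """Boolean caricature of the PYR/PYL -| PP2C -| SnRK2 -> ABF cascade.

    Every component is reduced to on/off; real signaling is quantitative.
    """
    receptor_active = aba_present      # PYR/PYL/RCAR is active when ABA is bound
    pp2c_active = not receptor_active  # active receptor inhibits PP2C
    snrk2_active = not pp2c_active     # PP2C, when active, keeps SnRK2 off
    abf_active = snrk2_active          # assumed: active SnRK2 switches on ABF
    return {
        "PYR/PYL": receptor_active,
        "PP2C": pp2c_active,
        "SnRK2": snrk2_active,
        "ABF": abf_active,
    }


for aba in (False, True):
    print(aba, aba_core_module(aba))
```

In this sketch PP2C is the only component that is active in the absence of ABA, which is why it is described as a negative regulator of the ABA response.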

Table 6 Putative genes encoding plant hormone signaling, osmotic adjustment, and free radical scavengers detected in the leaf transcriptome of senna

Osmolytes in drought stress response

Plants respond to stress by producing low molecular weight, non-toxic solutes known as osmolytes or osmoprotectants, which stabilize cell proteins and structures under stress. The major osmolytes accumulated by plants under abiotic stress include proline, glycine betaine (GB), trehalose, sorbitol, mannitol, and galactinol. Besides functioning as an osmolyte under drought stress, proline is considered a scavenger of free radicals, an energy sink, and a stress-regulated signal (Seki et al. 2007). Tissue-specific proline synthesis and catabolism have been found to promote growth and maintain a higher NADP/NADPH ratio at lower water potential (Sharma et al. 2011; Zhang et al. 2015a, b, c). In the present study, there were 17 CDS encoding two enzymes involved in proline biosynthesis, of which 14 encoded delta-1-pyrroline-5-carboxylate synthetase (P5CS) and three encoded pyrroline-5-carboxylate reductase (P5CR) (Table 6 and Fig. 7). Similarly, for the biosynthesis of glycine betaine, four CDS encoding two enzymes, namely choline monooxygenase (one CDS) and betaine-aldehyde dehydrogenase (three CDS), were identified. In addition, we found unique transcripts for trehalose, sorbitol, mannitol, and galactinol biosynthesis in senna. The accumulation of these osmolytes could be critical for improving the stress tolerance of senna, especially to osmotic stress.

Fig. 7 Proline biosynthetic pathway in senna (numbers in brackets represent number of CDS)

Analysis of genes for free radical scavengers in senna

Plants, when subjected to drought stress, increase the production of reactive oxygen species (ROS) such as superoxide radical, hydrogen peroxide, and hydroxyl radical. To protect cells and subcellular systems from the effects of these ROS, plants produce free radical scavengers (D’Autréaux and Toledano 2007; Kumari et al. 2015). These include enzymes such as superoxide dismutase (SOD), catalase (CAT), and peroxidase (APX), and non-enzymatic compounds such as ascorbic acid, phytochelatin, and glutathione. In the present study, ten CDS encoding superoxide dismutase (SOD), two CDS encoding catalase (CAT), and 29 CDS encoding peroxidase (APX) were identified. Glutathione reductase (GR, EC 1.6.4.2) and the tripeptide glutathione (GSH, γ-glutamyl-cysteinyl-glycine) are two major components of the ascorbate-glutathione (AsA-GSH) pathway, which plays a significant role in protecting cells against ROS and the potential anomalies caused by their reaction products (Gill et al. 2013). The transcriptome of senna contained CDS encoding glutathione S-transferase (48), glutathione reductase (9), glutathione peroxidase (5), glutamate cysteine ligase (2), and glutathione synthase (2), which are involved in the biosynthesis and metabolism of glutathione, a ubiquitous intracellular peptide with diverse functions (Table 6). l-Ascorbic acid (vitamin C) is a major antioxidant in plants and plays a significant role in mitigating the excessive cellular ROS activity caused by a number of abiotic stresses (Venkatesh and Park 2014). We identified 75 CDS encoding eight enzymes involved in ascorbic acid biosynthesis, namely GDP-d-mannose 3′,5′-epimerase, GDP-l-galactose phosphorylase, inositol-phosphate phosphatase/l-galactose 1-phosphate phosphatase, l-galactose dehydrogenase, l-galactono-1,4-lactone dehydrogenase, l-ascorbate peroxidase, monodehydroascorbate reductase, and l-ascorbate oxidase. We also found nine CDS encoding phytochelatin synthase, also known as glutathione gamma-glutamylcysteinyltransferase, which is responsible for phytochelatin synthesis in senna. Together, these findings suggest that drought stress may trigger the complex antioxidant network and finely tune ROS accumulation to facilitate appropriate signaling functions (Munné-Bosch et al. 2013).

Chlorophyll metabolism

Delayed leaf senescence, or stay green, is an important drought adaptation in crop plants (Rama Reddy et al. 2014). The stay-green trait reflects impaired or delayed chlorophyll catabolism (Thomas and Ougham 2014). Plants have also been engineered to overproduce chlorophyll, for example by overexpression of the gene encoding chlorophyllide a oxygenase (Kusaba et al. 2013). In chlorophyll metabolism, 15 enzymes catalyze chlorophyll biosynthesis and five enzymes catalyze its degradation, and the genes encoding these enzymes have been cloned from model plants (Beale 2005; Hortensteiner 2006). We searched for orthologs of these genes in the leaf transcriptome of senna and found CDS for all the enzymes involved in chlorophyll biosynthesis except ferredoxin:protochlorophyllide reductase. Similarly, CDS were identified for four of the five enzymes involved in chlorophyll degradation, viz., chlorophyll(ide) b reductase (11), chlorophyllase (6), pheophorbide a oxygenase (7), and red chlorophyll catabolite reductase (16), Mg-dechelatase being the exception (Table 7 and Fig. 8). These CDS form important candidates for understanding chlorophyll metabolism in senna.

Table 7 Putative genes encoding enzymes involved in chlorophyll metabolism and leaf cuticular wax biosynthesis
Fig. 8 Chlorophyll metabolism pathway under stress in senna (numbers in brackets indicate number of CDS encoding the enzyme in the pathway)

Genes involved in cuticular wax biosynthesis

Cuticular wax covers the outer organs of plants and functions as the outermost barrier against non-stomatal water loss and UV light. Cuticular waxes are composed of very-long-chain fatty acids (VLCFAs) and their derivatives, such as aldehydes, alkanes, esters, and primary and secondary alcohols. Many genes involved in cuticular wax biosynthesis and export have been characterized by forward and reverse genetic approaches in plants (Hooker et al. 2002; Kim et al. 2013), and their role in conferring water stress tolerance has been reported (Zhou et al. 2015; Krugman et al. 2010). We identified 51 CDS encoding three enzymes of leaf cuticular wax biosynthesis, namely fatty acyl-CoA reductase (6), aldehyde decarbonylase (6), and diacylglycerol O-acyltransferase (39), which form important candidates for the improvement of drought tolerance in senna (Table 7).

Genes involved in biosynthesis of polyamines

Polyamines (PAs) (putrescine, spermidine, and spermine) are a group of phytohormone-like aliphatic amines involved in cell growth and development and in tolerance to stress caused by various environmental factors (Gill and Tuteja 2010). We explored the genes encoding the enzymes of polyamine production and found 64 CDS encoding eight enzymes, including arginine decarboxylase (2), agmatine deiminase (2), S-adenosylmethionine decarboxylase (8), spermidine synthase (15), spermine oxidase (2), polyamine oxidase (25), and thermospermine synthase (8), that regulate the production of PAs in senna (Table 8). However, none were found for agmatinase and N-carbamoylputrescine amidase, possibly owing to very low transcript levels in the transcriptome or because these may be key enzymes whose expression is tightly regulated.

Table 8 Putative genes encoding enzymes involved in biosynthesis of polyamine and protective proteins

Protective proteins and transporters

Protective proteins (PP) produced in response to drought stress in plants include heat shock proteins (HSPs), late embryogenesis abundant (LEA) proteins, chaperones, osmotin, and aquaporins. HSPs are a family of proteins produced in response to stress (Song et al. 2014; Leng et al. 2015). HSPs protect cells from injury and facilitate recovery and survival after a return to normal growth conditions. HSPs function as chaperones involved in protein folding, assembly, translocation, and degradation, and also help to stabilize proteins and membranes (Boston et al. 1996; Lucas et al. 2011a, b). The best-characterized HSPs belong to the HSP70 family. We explored the senna transcriptome and found 79 CDS putatively encoding heat shock proteins. However, the role of these HSPs needs further study. We also identified many CDS encoding protective proteins such as LEA proteins (6), chaperones (236), osmotin (1), and aquaporins (34), and transporters such as Na(+)/H(+) antiporter (17) and auxin efflux carrier (18) (Table 8). Constitutive overexpression of these candidate genes could improve defense against drought stress in plants.

Validation of transcriptome

Since the transcript assembly needed to be validated and our interest was in drought stress metabolism, we chose six genes involved in drought stress for validation (Table 9). The transcripts of all six abiotic stress genes, viz., transcription factor MYC2, 9-cis-epoxycarotenoid dioxygenase, l-ascorbate peroxidase, aminocyclopropanecarboxylate oxidase, (+)-abscisic acid 8′-hydroxylase, and WRKY transcription factor, were confirmed by reverse transcription polymerase chain reaction (RT-PCR), as observed on an agarose gel. The amplicon sizes matched the sizes expected from the assembled transcripts (Fig. 9). The PCR products of all six genes were further Sanger sequenced for confirmation, giving ∼100 % identity with the assembled transcriptome sequences (File S1). This is the first time that these genes have been identified in Cassia angustifolia. The sequences have been deposited at NCBI as mRNA sequences.

Table 9 List of gene specific primers used for validation of assembly
Fig. 9 Detection and validation of different drought-related genes in Cassia angustifolia. cDNA was used as template to amplify drought stress-related genes. The amplified fragments were analyzed by 1.5 % agarose gel electrophoresis. A DNA ladder containing 14 bands (from bottom: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1500, 2000, and 3000 bp; O'GeneRuler 100 bp Plus DNA Ladder, SM1153) was used. Lane L: DNA ladder; lane 1: WrkyIDCa01 (∼1200 bp); lane 2: MycIDCa02 (∼1500 bp); lane 3: NcedIDCa02 (∼1300 bp); lane 4: Aba8hIDCa02 (∼1100 bp); lane 5: ApxIDCa01 (∼1400 bp); lane 6: AccoIDCa01 (∼600 bp)
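Reading the gel amounts to locating each amplicon relative to the ladder bands. The sketch below, offered as an illustration only, takes the ladder band sizes and approximate amplicon sizes from the Fig. 9 legend and reports which two ladder bands bracket each product; the function and variable names are hypothetical, and no expected transcript lengths are assumed because they are not given in the text.

```python
from bisect import bisect_left

# Ladder band sizes (bp) as listed in the Fig. 9 legend.
LADDER_BP = [100, 200, 300, 400, 500, 600, 700, 800, 900,
             1000, 1200, 1500, 2000, 3000]

# Approximate amplicon sizes (bp) as listed in the Fig. 9 legend.
AMPLICONS_BP = {
    "WrkyIDCa01": 1200,
    "MycIDCa02": 1500,
    "NcedIDCa02": 1300,
    "Aba8hIDCa02": 1100,
    "ApxIDCa01": 1400,
    "AccoIDCa01": 600,
}


def bracketing_bands(size_bp, ladder=LADDER_BP):
    """Return the (lower, upper) ladder bands enclosing size_bp.

    Both values are equal when the size coincides with a ladder band.
    """
    i = bisect_left(ladder, size_bp)
    if i < len(ladder) and ladder[i] == size_bp:
        return ladder[i], ladder[i]
    if i == 0 or i == len(ladder):
        raise ValueError("size lies outside the ladder range")
    return ladder[i - 1], ladder[i]


for name, size in AMPLICONS_BP.items():
    print(name, size, bracketing_bands(size))
```

For example, the ∼1300 bp NcedIDCa02 product is expected to migrate between the 1200 and 1500 bp bands, whereas the ∼600 bp AccoIDCa01 product should align with the 600 bp band.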

Conclusions

In this study, we performed large-scale transcriptome sequencing of senna, an important drought-tolerant herb cultivated in the arid and semi-arid tropics. Insufficient transcriptomic and genomic data in public databases have limited our understanding of the molecular mechanisms underlying the drought stress tolerance of senna. More than 200 million reads were generated and assembled into 43,413 unique transcripts, which were then extensively annotated by comparing their sequences with different databases. Coding DNA sequences (CDS) associated with various drought stress-regulated pathways and functions, such as signaling factors, protein-modifying/degrading enzymes, phytohormone biosynthesis, phytohormone signaling, osmotically active compounds, free radical scavengers, chlorophyll metabolism, leaf cuticular wax, polyamines, and protective proteins, were identified through BLASTX search. Six genes involved in drought stress regulation (transcription factor MYC2, 9-cis-epoxycarotenoid dioxygenase, l-ascorbate peroxidase, aminocyclopropanecarboxylate oxidase, (+)-abscisic acid 8′-hydroxylase, and WRKY transcription factor) were confirmed through RT-PCR and Sanger sequencing for the first time in senna. The potential drought stress-related transcripts identified in this study provide a good starting point for further investigation into drought adaptation in senna. Additionally, our transcriptome sequences can be a valuable resource for accelerated genomics-assisted genetic improvement programs and can facilitate a better understanding and more effective manipulation of biochemical pathways for developing drought-tolerant crop plants.