Introduction

As an important oil and economic crop, peanut is cultivated worldwide, especially in semiarid zones. In China, outside the main producing provinces, peanut is commonly planted on relatively dry land that is difficult to irrigate, while more suitable land is reserved for staple grain crops. Drought is one of the main factors limiting peanut yield, especially during (1) the germination and young seedling stages, which determine plant survival and health, and (2) the pod development and maturation stages, which determine final yield. Furthermore, drought can aggravate Aspergillus flavus contamination as well as other diseases and insect pests (Collino et al. 2001). Therefore, it is vital to understand the mechanisms of drought tolerance in peanut and to identify tolerant germplasms.

Several studies of peanut drought tolerance have been reported. For instance, water deficit was shown to induce the expression of stress-responsive genes, including phospholipase Dα (PLDα), cysteine protease (CP), serine protease (SP) and late embryogenesis abundant (LEA) protein (Dramé et al. 2007; Govind et al. 2009). In addition, Arachis hypogaea serine-rich protein (AhSrp) and A. hypogaea leucine-rich protein (AhLrp) were identified using rapid amplification of cDNA ends (RACE) (Devaiah et al. 2007). Furthermore, PLD1 and PLD2, members of the PLDα family, were shown to be upregulated in A. hypogaea and involved in the drought response (Nakazawa et al. 2006). Recently, two ethylene-responsive factor (ERF) genes involved in abiotic stress responses were cloned (Wan et al. 2014).

Several homologous and heterologous expression analyses have also been carried out. A NAC (NAM, ATAF1/2 and CUC2) transcription factor gene from A. hypogaea was shown to enhance drought tolerance in Arabidopsis (Liu et al. 2011) and tobacco (Liu et al. 2013). Another important peanut transcription factor, abscisic acid-responsive element binding protein (AREB), was shown to modulate reactive oxygen species (ROS) scavenging and maintain endogenous abscisic acid (ABA) content in Arabidopsis, thereby improving drought tolerance (Li et al. 2013). Conversely, heterologous genes, including DNA helicase 45 (PDH45) from pea (Pisum sativum) (Manjulatha et al. 2014), NAC from horsegram (Macrotyloma uniflorum) (Pandurangaiah et al. 2014) and DREB from Arabidopsis (Sarkar et al. 2014), all enhanced drought tolerance when expressed in peanut.

Detailed genomic information for peanut is still lacking, and little work has been done on quantitative trait locus (QTL) mapping of drought tolerance (Ravi et al. 2010; Gautami et al. 2011). Given the large size (~4400 Mb) and high repeat content (~70 %) of the genome, high-throughput sequencing, especially transcriptome sequencing (RNA-Seq), is a convenient technique for studying genome-wide gene expression changes at a given growth stage or during a biological response. Transcriptome data, combined with bioinformatic annotation, are a valuable resource for gene identification and pathway reconstruction. To date, several transcriptome studies of peanut have been carried out, covering wild Arachis species under drought and fungal infection (Guimaraes et al. 2012), the Spanish botanical type of A. hypogaea L. (Wu et al. 2013), seed development and oil accumulation in A. hypogaea L. (Zhang et al. 2012; Xia et al. 2013; Yin et al. 2013), and differences between aerial and subterranean young pods (Chen et al. 2013; Zhu et al. 2014).

A recent RNA-Seq study found that many genes were rapidly induced by water deficit with or without ABA pretreatment, indicating the importance of the ABA-dependent pathway in peanut drought tolerance (Li et al. 2014).

In this study, we identified several water-deficiency-tolerant peanut germplasms based on seedling-stage physiological traits. One of these germplasms was further characterized by transcriptome sequencing. In addition to describing the RNA-Seq data, we focused on transcripts putatively involved in drought-related pathways. These findings provide important insights into drought tolerance and useful data for future gene investigations.

Materials and methods

Screening for water deficiency tolerant germplasms

Fresh peanut seeds were harvested, dried and stored at the Liuhe Science Base of Jiangsu Academy of Agricultural Sciences (JAAS) in 2012. Experiments were carried out at the Jiangsu Provincial Platform for Conservation and Utilization of Agricultural Germplasm.

Fifty germplasms (Supplemental Table 1), selected by region from a total of 617 peanut accessions, were treated with 15 % PEG 6000 to simulate drought at the seedling stage. Control seedlings were grown in water. Seedling heights of treated and control plants were measured and compared after 7 days.

Plant materials, sample treatment and RNA extraction

Peanut germplasm #11 (Fenghua 1, FH1) was identified as water-deficiency tolerant in the screening (see “Results”). After germination and growth to the trefoil stage, young seedlings were treated with 15 % PEG 6000 for 24 h (sample 1), 48 h (sample 2) or 7 days (sample 3); untreated control seedlings were collected as sample 0. Total RNA was then extracted from young leaves as previously described (Marioni et al. 2008) for RNA sequencing.

RNA sequencing, de novo assembly and transcript/unigene analysis

RNA quality and quantity were determined on a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). mRNA in each sample was then enriched with oligo(dT) and fragmented, and fragments of 300–1000 bp were selected with AMPure SPRI beads. First-strand cDNA was synthesized using random hexamer primers, followed by second-strand synthesis.

The libraries were PCR-amplified and sequenced on an Illumina HiSeq™ 2000 (Solexa) platform (LC-Bio, Hangzhou, China). Because a reference genome is not yet available for peanut, Trinity (Grabherr et al. 2011) was used for de novo assembly, and its Chrysalis module was used to cluster transcripts into unigenes. Length distribution and GC content of unigenes were analyzed with histogram tools in Microsoft Excel. Gene expression in each sample was quantified as reads per kilobase per million mapped reads (RPKM) (Mortazavi et al. 2008), with reads mapped using bowtie 0.12.8 (http://bowtie-bio.sourceforge.net/index.shtml).
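The RPKM normalization of Mortazavi et al. (2008) divides a transcript's mapped read count by its length in kilobases and by the library size in millions of mapped reads. A minimal Python sketch of that arithmetic follows; the read mapping itself (bowtie) is not shown, and the counts in the usage line are hypothetical.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads (Mortazavi et al. 2008).

    RPKM = read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript in a library
# with 25 million mapped reads.
print(rpkm(500, 2000, 25_000_000))  # 10.0
```

Dividing by length makes transcripts within one sample comparable; dividing by library size makes the same transcript comparable across samples with different sequencing depths.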

Functional annotation and differential transcript expression analysis

Unigenes were searched against the non-redundant GenBank database (http://blast.ncbi.nlm.nih.gov/) with an E value cutoff of <1e−5, and gene functions were then classified with the COG (Clusters of Orthologous Groups of proteins) program (http://www.ncbi.nlm.nih.gov/COG/). Blast2GO and WEGO (Conesa et al. 2005; Ye et al. 2006) were used to obtain gene ontology (GO) classifications based on SWISS-PROT/TrEMBL annotations (http://www.uniprot.org/). The KEGG database (http://www.genome.jp/kegg/) was used to assign unigenes to pathways.
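The E value cutoff can be applied when post-processing the search output, as sketched below. This assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns), which the text does not specify; the unigene and subject IDs in the example are hypothetical. For each unigene, the hit with the highest bit score among those passing the cutoff is kept.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for hits with E < max_evalue,
    keeping the highest-bit-score hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue >= max_evalue:
            continue  # fails the E value cutoff
        query = rec["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bits)
    return best


# Hypothetical tabular lines (IDs and scores are invented for illustration).
example = io.StringIO(
    "unigene_1\tXP_0001\t85.0\t300\t45\t0\t1\t900\t1\t300\t1e-30\t120.0\n"
    "unigene_1\tXP_0002\t60.0\t200\t80\t2\t1\t600\t5\t205\t1e-10\t60.0\n"
    "unigene_2\tXP_0003\t40.0\t50\t30\t1\t1\t150\t1\t50\t0.01\t30.0\n"
)
print(best_hits(example))  # {'unigene_1': ('XP_0001', 1e-30, 120.0)}
```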

Differences in transcript expression between samples were assessed with Fisher's exact and chi-square tests. Transcripts with more than twofold changes were considered differentially expressed transcripts (DETs) and analyzed further. Significantly altered transcripts in drought-related pathways were identified, and their expression patterns were confirmed by qRT-PCR.
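One common way to combine a twofold-change criterion with Fisher's exact test is sketched below for a single transcript compared between two libraries: the 2×2 table holds the transcript's read count and the remaining reads in each library. This is an illustration under stated assumptions, not the authors' exact pipeline; the significance level (0.05), the 0.5 pseudocount and the example counts are hypothetical, and no multiple-testing correction is applied. The exact test here uses Python integers and suits modest counts; for real library sizes, `scipy.stats.fisher_exact` would be the practical choice.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def is_det(count_a: int, total_a: int, count_b: int, total_b: int,
           min_fold: float = 2.0, alpha: float = 0.05, pseudo: float = 0.5) -> bool:
    """Flag a transcript as differentially expressed between libraries A and B."""
    # For one transcript, length cancels in the RPKM ratio, so reads per
    # library size give the same fold change.
    fold = ((count_b + pseudo) / total_b) / ((count_a + pseudo) / total_a)
    changed = fold > min_fold or fold < 1 / min_fold
    p = fisher_exact_two_sided(count_a, total_a - count_a, count_b, total_b - count_b)
    return changed and p < alpha


print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.486 (= 34/70)
print(is_det(10, 1000, 40, 1000))          # True: ~3.9-fold and significant
print(is_det(10, 1000, 12, 1000))          # False: below twofold
```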

Quantitative real time PCR (qRT-PCR)

qRT-PCR primers were designed from the transcript sequences, and qRT-PCR was performed on an AB 7900 real-time PCR system (Applied Biosystems, USA) with SYBR Premix Ex Taq™ (DRR041A, TaKaRa). 18S rRNA was used as the internal reference gene, and three biological replicates were analyzed per sample. Relative transcript levels were calculated according to the real-time PCR application guide (Bio-Rad Laboratories; http://www.bio-rad.com).
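The Bio-Rad guide cited above describes, among other approaches, the comparative 2^−ΔΔCt (Livak) method. Assuming that method and near-100 % amplification efficiency for both the target and 18S (neither is stated in the text), relative expression from triplicate Ct values could be computed as below; the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Fold change of a target gene (treated vs control) by the 2^-ddCt method.

    Each argument is a list of Ct values from biological replicates; the
    reference gene (here 18S) normalizes for input amount. Assumes ~100 %
    amplification efficiency for target and reference.
    """
    d_ct_treated = mean(target_treated) - mean(ref_treated)  # dCt, treated
    d_ct_control = mean(target_control) - mean(ref_control)  # dCt, control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values for three biological replicates.
fold = relative_expression([22.1, 21.9, 22.0], [12.0, 12.1, 11.9],
                           [25.0, 25.1, 24.9], [12.0, 11.9, 12.1])
print(round(fold, 2))  # about 8 (ddCt = -3)
```

A lower Ct means more template, so a target that crosses threshold three cycles earlier after treatment, relative to 18S, corresponds to roughly 2^3 = 8-fold induction.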

Results

Germplasm screening for water deficiency tolerance

Fifty peanut germplasms were screened for water deficiency tolerance. Seedling heights were measured after treatment with water (CK) or PEG, and the PEG/CK height ratio (Fig. 1) together with actual growth under PEG (Supplemental Fig. 1) was used as the tolerance indicator. Germplasms #11, #34 and #49 were tolerant, whereas #9, #39 and #43 were sensitive to water deficiency. Because #34 and #49 grew slowly under both conditions, #11 (Fenghua 1, FH1) was selected for transcriptome sequencing analysis.
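The screening logic, ranking germplasms by the PEG/CK height ratio while also considering absolute growth under PEG, can be sketched as follows. The minimum-height threshold and all heights are hypothetical; the text does not give a numerical cutoff or state how the two indicators were combined.

```python
from statistics import mean


def rank_germplasms(heights, min_peg_height=0.0):
    """Rank germplasms by PEG/CK ratio of mean seedling height.

    heights: {germplasm_id: (peg_heights, ck_heights)}, e.g. in cm.
    Germplasms whose mean height under PEG falls below min_peg_height
    (a hypothetical cutoff) are dropped, mimicking the exclusion of slow
    growers such as #34 and #49.
    """
    scores = []
    for gid, (peg, ck) in heights.items():
        peg_mean = mean(peg)
        if peg_mean < min_peg_height:
            continue
        scores.append((gid, peg_mean / mean(ck)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical seedling heights (cm) after 7 days.
data = {
    "#11": ([8.0, 9.0, 10.0], [10.0, 10.0, 10.0]),
    "#9": ([4.0, 5.0, 6.0], [10.0, 10.0, 10.0]),
    "#34": ([4.6, 4.6, 4.6], [5.0, 5.0, 5.0]),
}
print([gid for gid, _ in rank_germplasms(data)])                      # ['#34', '#11', '#9']
print([gid for gid, _ in rank_germplasms(data, min_peg_height=6.0)])  # ['#11']
```

The ratio alone rewards germplasms that are small in both treatments, which is why a second, absolute criterion is useful.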

Fig. 1

Ratio (PEG/CK) of seedling heights of 50 peanut germplasms after 7 days of treatment with water (CK) or PEG 6000

Field experiments were carried out to verify the screening results. Without irrigation in drought years, #11, #34 and #49 maintained 80–90 % of their yield, whereas other germplasms yielded only about 60 % of their potential, indicating that seedling-stage performance in the laboratory is useful for evaluating drought tolerance in peanut.

Transcriptome sequencing and de novo assembly

A cDNA library for each of the samples 0, 1, 2 and 3 was constructed and sequenced on Illumina HiSeq™ 2000. Paired-end Illumina read numbers were 36,742,486, 36,120,046, 38,026,326, and 40,500,012 for samples 0, 1, 2 and 3, respectively.

After length filtering and removal of redundancy, 370,145 non-redundant transcripts totaling 417 Mb were obtained, of which 93.67–95.73 % were detected in each individual sample (Table 1). Taking the longest transcript at each locus as a unigene yielded 141,289 unigenes (also known as transcript assembly contigs, TACs) with a total length of 100 Mb, which were used for functional annotation and expression comparison (Table 1, Supplemental Table 2).
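Selecting the longest transcript per locus as the unigene can be sketched as below. The sketch assumes Trinity-style sequence IDs whose final underscore-separated field is the isoform (e.g. `comp1_c0_seq2` in older Trinity releases, `TRINITY_DN1_c0_g1_i2` in newer ones), so stripping that field gives the locus; the FASTA records are hypothetical.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} from FASTA-formatted lines."""
    lengths, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first word of the header
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def pick_unigenes(lengths):
    """Map each locus to its longest transcript ID."""
    best = {}
    for tid, length in lengths.items():
        locus = tid.rsplit("_", 1)[0]  # drop the isoform field (assumed ID format)
        if locus not in best or length > lengths[best[locus]]:
            best[locus] = tid
    return best


fasta = [">comp1_c0_seq1 len=8", "ACGTACGT",
         ">comp1_c0_seq2 len=12", "ACGTACGT", "ACGT",
         ">comp2_c0_seq1 len=4", "GGCC"]
print(pick_unigenes(fasta_lengths(fasta)))
# {'comp1_c0': 'comp1_c0_seq2', 'comp2_c0': 'comp2_c0_seq1'}
```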

Table 1 Sample reads, and transcript and unigene numbers

The average GC contents of transcripts and unigenes were 39.96 and 39.84 % (Fig. 2a), and their average lengths were 1127.22 and 709.82 bp, respectively (Fig. 2b), in line with existing transcriptome analyses of peanut. The average transcript RPKMs were 2.18, 2.48, 2.25 and 2.1 in samples 0, 1, 2 and 3, respectively; most transcripts had RPKM values of 0.01–5, indicating low expression (Fig. 2c). This may be associated with the relatively larger number of transcripts in our experiment compared with other peanut RNA-Seq studies (Table 1).
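The GC content and length summaries reduce to simple per-sequence statistics. A minimal sketch follows: it computes GC content per sequence, ignoring ambiguous bases such as N, and then averages across sequences. This is one of two reasonable definitions of average GC content (the text does not say whether sequences were pooled instead), and the sequences are hypothetical.

```python
from statistics import mean


def gc_percent(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize(seqs):
    """Mean per-sequence GC % and mean length (bp) for a set of sequences."""
    return mean(gc_percent(s) for s in seqs), mean(len(s) for s in seqs)


gc, length = summarize(["ATGC", "GGCC", "ATATAT", "GGNN"])
print(f"mean GC {gc:.1f} %, mean length {length:.2f} bp")  # mean GC 62.5 %, mean length 4.50 bp
```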

Fig. 2

GC content, length and RPKM analysis in transcriptome data. a GC content distributions of transcripts (red) and unigenes (green) in non-redundant mixed library. b Sequence length distribution of transcripts (red) and unigenes (green) in non-redundant mixed library. c RPKM (reads per kilobase of gene per million reads) distribution of transcripts in various samples

Table 2 DET (differentially expressed transcript) analysis between sample pairs

Unigene functional annotation

A total of 62,510 unigenes had hits in the non-redundant GenBank database with E value <1e−5 (Supplemental Table 2), fewer than in similar studies (Xia et al. 2013; Yin et al. 2013). Because the sequencing platform and query parameters were the same, this difference is most likely due to the different plant materials.

Of these, 74.26 % were annotated to five species other than A. hypogaea (which itself accounted for only 1.79 %; Fig. 3a): Glycine max (50.8 %), Medicago truncatula (13.43 %), Lotus japonicus (3.31 %), Vitis vinifera (3.18 %) and Populus trichocarpa (2.65 %). These species may therefore serve as references for gene function studies in A. hypogaea.

Fig. 3

Annotation analysis of unigenes. a Species distribution of homology-based unigene annotations. b COG analysis of unigenes: A RNA processing and modification; B chromatin structure and dynamics; C energy production and conversion; D cell cycle control, cell division, chromosome partitioning; E amino acid transport and metabolism; F nucleotide transport and metabolism; G carbohydrate transport and metabolism; H coenzyme transport and metabolism; I lipid transport and metabolism; J translation, ribosomal structure and biogenesis; K transcription; L replication, recombination and repair; M cell wall/membrane/envelope biogenesis; N cell motility; O posttranslational modification, protein turnover, chaperones; P inorganic ion transport and metabolism; Q secondary metabolites biosynthesis, transport and catabolism; R general function prediction only; S function unknown; T signal transduction mechanisms; U intracellular trafficking, secretion, and vesicular transport; V defense mechanisms; W extracellular structures; Y nuclear structure; Z cytoskeleton. c Gene ontology analysis of unigenes

A total of 20,636 unigenes, about one third of those annotated in the non-redundant database, were assigned to 25 functional groups by COG analysis; the largest groups were “signal transduction mechanisms” (T, 4486) and “posttranslational modification, protein turnover, chaperones” (O, 4143) (Fig. 3b).

Furthermore, 43,427 unigenes (30.74 % of all unigenes) were functionally annotated in SWISS-PROT or TrEMBL and classified by gene ontology (GO). As shown in Fig. 3c and Supplemental Table 2, the GO annotations fell into the biological process (42.03 %), cellular component (31.41 %) and molecular function (26.56 %) categories. The top seven terms (each with >100 sequences), namely metabolic process, cellular process, organelle, cell part, binding, catalytic activity and cell, accounted for 66.17 % of all gene annotations; these terms are commonly regarded as describing the basic structural and functional units of all organisms.

To explore gene interactions, the KEGG (Kyoto Encyclopedia of Genes and Genomes) database was used for pathway analysis. A total of 21,495 unigenes, corresponding to 1104 enzyme commission (EC) numbers, were assigned to 302 KEGG pathways; genetic information processing pathways (ribosome/translation and protein processing) contained the most unigenes (Supplemental Table 3).

Expression of drought related transcripts

Four samples were collected, before treatment and at three time points after treatment, as described in “Materials and methods”, giving six pairwise comparisons. Transcripts with more than twofold changes were considered DETs, and the results are shown in Table 2. Among DETs with KEGG annotations, 130 showed gradually increasing and 107 gradually decreasing expression after water deficiency treatment (Supplemental Table 4), which may help trace genes involved in the drought response. For example, homogentisate phytyltransferase (HPT), a key enzyme in the biosynthesis of vitamin E (which helps eliminate aldehydes), was dramatically induced after PEG treatment.
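The six comparison pairs follow from choosing two of the four samples (C(4,2) = 6). One possible way to flag gradually increasing or decreasing DETs is sketched below: a transcript is called "up" if its RPKM never decreases from sample 0 through sample 3 and rises more than twofold overall, and "down" in the mirror case. This monotonic rule, the pseudocount and the sample labels are assumptions for illustration; the text does not define the criterion.

```python
from itertools import combinations

# Hypothetical labels for samples 0-3 (control, 24 h, 48 h, 7 days of PEG).
SAMPLES = ["S0_control", "S1_24h", "S2_48h", "S3_7d"]


def comparison_pairs(samples):
    """All pairwise sample comparisons (4 samples -> 6 pairs)."""
    return list(combinations(samples, 2))


def trend(rpkms, min_fold=2.0, pseudo=0.01):
    """Classify an RPKM time course as 'up', 'down' or None (no steady trend)."""
    steps = list(zip(rpkms, rpkms[1:]))
    fold = (rpkms[-1] + pseudo) / (rpkms[0] + pseudo)
    if all(b >= a for a, b in steps) and fold > min_fold:
        return "up"
    if all(b <= a for a, b in steps) and fold < 1 / min_fold:
        return "down"
    return None


print(len(comparison_pairs(SAMPLES)))  # 6
print(trend([1.0, 2.0, 4.0, 8.0]), trend([8.0, 4.0, 2.0, 1.0]), trend([1.0, 5.0, 2.0, 8.0]))
# up down None
```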

Transcripts annotated in drought response pathways, including ABA, sugar and amino acid metabolism and protein kinases, are summarized in Supplemental Table 5. Transcripts of the ABA biosynthesis genes ABA1, ABA3 and NCED appeared to be induced continuously after PEG treatment, and similar results were obtained for ABF and mtlD. Transcripts for tryptophan and raffinose synthesis increased at 24 h but returned to normal levels afterwards. Drought stress appeared to promote proline accumulation: transcripts of proline iminopeptidase (pip), which releases proline from peptides, increased, whereas those of proline dehydrogenase, which degrades proline, decreased, probably to maintain in planta proline levels. Calcium-dependent protein kinase (CPK) transcripts changed in distinct ways, implying complex roles of CPKs in the drought response.

qPCR was carried out to verify these findings in germplasms #9 (sensitive) and #11 (tolerant) (Fig. 4). Each transcript showed similar expression patterns in the two germplasms; the largest differences between them were in proline iminopeptidase (pip) and proline dehydrogenase, suggesting that proline plays an important role in drought tolerance.

Fig. 4

qPCR analysis of drought-related unigenes (ABA, sugar, amino acid and protein kinase related) in #11 (tolerant germplasm) and #9 (sensitive germplasm)

Discussion

Water deficiency tolerance screening in peanut

Water deficiency tolerance varies significantly among peanut genotypes (Jongrungklang et al. 2013). Creeping types generally tolerate drought better than erect types because of their stronger roots. However, cultivated peanut germplasms are generally erect for practical reasons, so suitable germplasms need to be selected for drought tolerance breeding. Here, 50 peanut germplasms from distinct regions were tested. Some, such as #34 and #49, grew slowly under both normal and PEG conditions (Fig. 1) and were not studied further. Variety #11 (FH 1) was selected for transcriptome sequencing because of its drought tolerance. Germplasms #22 and #42 also performed well, although their screening ratios were not outstanding. Large-scale field screening for drought tolerance and physiological studies of photosynthetic parameters under different irrigation regimes are ongoing. Furthermore, these tolerant germplasms also maintained good yields in saline soil, indicating their value for both drought-prone zones and saline fields.

Drought tolerance breeding in peanut

Genetic improvement of peanut remains very challenging, mainly because of its complex genetic architecture and the lack of genomic information (Pandey et al. 2012). Identifying and using valuable germplasms is therefore very important. In our experiment, #11 and #49 were both derived from the same peanut germplasm, ‘baisha 171’, which also performed well in the water deficiency test. Germplasm #34 was obtained by distant hybridization between the cultivar ‘baisha 1016’ and the wild species A. chacoense, and appears to have inherited favorable traits of the wild parent, probably including drought tolerance.

Genomic resources, such as expressed sequence tags (ESTs), molecular marker-based genetic maps and genome sequences, have developed rapidly in recent years. Next-generation high-throughput sequencing, especially the transcriptome sequencing used in this study, provides a new tool for profiling gene expression in complex biological processes, discovering new genes and developing molecular markers such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) (Zhang et al. 2012). For instance, transcriptome data have revealed physiological and genetic information related to drought resistance in wheat (Reddy et al. 2014) and transgenic poplar (Zhang et al. 2014).

Drought-related genes in peanut and mechanism of resistance

We paid special attention to the pathways of ABA and osmolytes, including sugars and sugar alcohols (such as mannitol), amino acids (such as proline) and amines, which are thought to be involved in the plant drought response (Seki et al. 2007). Transcriptome sequencing and qPCR data showed similar expression patterns, and proline-related genes were expressed differently in different germplasms, suggesting that proline may contribute to drought tolerance in peanut; this was partly supported by corresponding physiological analyses. Full-length cDNA sequences of some transcripts in Supplemental Table 5 were obtained by prediction with online tools (GeneMark, http://opal.biology.gatech.edu/GeneMark/) or by RACE (SMART RACE, Clontech), and can be used for vector construction and transgenic research.

From the transcriptome data, we also obtained complete cDNA sequences of several stress-related genes, including four MYBs and eight aldehyde dehydrogenases (ALDHs) (unpublished data); analysis of their specific roles could help clarify the complex drought tolerance process in peanut.

Small RNAs are also believed to participate in stress response (Sunkar et al. 2012). Indeed, two studies on peanut microRNA have been performed (Zhao et al. 2010; Chi et al. 2011), with one using the same germplasm assessed in this work (Zhao et al. 2010). Future studies will further analyze drought related microRNA changes.

Author contribution statement

Yi Shen and Zhide Chen designed the experiments, analyzed the data and wrote the paper. Zhiguo E performed qPCR. Xiaojun Zhang helped with data organization. Yonghui Liu collected the samples.