Introduction

Azadirachta indica (neem) belongs to the family Meliaceae (mahogany family) and is native to the Indian subcontinent. The plant is highly valued for its medicinal and other bioactive properties. Phytochemical extracts from various parts of the plant, such as leaves, fruits, seeds, flowers, bark, stem and roots, have been recognized to possess medicinal properties, validating its extensive use in diverse traditional health and other practices [1]. Among the different parts of the plant, fruits have been shown to contain the largest amounts and the most diverse sets of phytochemicals. The fruits are glabrous, olive-like drupes that are green when young and progressively turn yellowish-red as they ripen. The epicarp, mesocarp and endocarp remain distinct throughout fruit growth and development. The epicarp is thin and turns from green to yellow with ripening; the mesocarp is thick and changes from a bitter, white, milky state to a slightly fibrous, fleshy tissue that becomes sweet with ripening. The endocarp contains one or more elongated seeds covered with a shell that is initially soft but turns into a white hard kernel. Neem products have a very wide range of applications, and records of their use date back to ancient times. Neem is used for medicines, pesticides, fuel wood, timber, food, hygiene etc. Its most important application is in the formulation of bio-insecticides, with azadirachtin as the principal component responsible for the bioactivity. Neem is known to possess more than 300 complex secondary metabolites, a situation parallel to that of other reputed medicinal plants such as Withania somnifera and Centella asiatica [2–5].
Although these specialized phytochemicals of neem belong to several chemical classes, such as terpenoids, alkaloids, phenolics and sulphur compounds, limonoids (metabolic descendants of triterpenoids) are the most abundant and diversified secondary metabolites of the plant. Terpenoids are synthesized through two distinct isoprenogenic routes at the upstream part of the biosynthetic pathway: the mevalonate (MVA) pathway and the non-mevalonate (methylerythritol phosphate, MEP) pathway. Accordingly, hydroxymethylglutaryl coenzyme A reductase (HMGR), deoxy-xylulose synthase (DXS) and deoxy-xylulose reductase (DXR) are key enzymes that govern limonoid biosynthesis at the isoprenogenesis stage [6]. HMGR is the key regulatory enzyme of the cytosolic MVA pathway, whilst DXS and DXR play a regulatory role in the plastidic MEP pathway [2, 3, 6–10]. Most of these limonoid metabolites have one or more biomedical activities, ranging from anti-feedant, growth regulating, anti-viral, antipyretic, antimalarial, anti-bacterial, anti-fungal and anti-inflammatory to antiarthritic, hypoglycaemic, anti-ulcer, diuretic, anti-protozoal and anti-sickling properties [11–15]. Therefore, identification and characterization of the genes involved in the biosynthesis of these important metabolites in different tissues would be of immense importance. The information generated may be used to plan the production of a particular metabolite through a synthetic biology approach and/or to enhance the production of a metabolite in planta. To address this issue, the relevant subsets of differentially expressed genes need to be identified.

Analysis of expressed sequence tags (ESTs) offers a rapid and cost-effective approach to elucidate the transcriptome of an organism, leading to the identification of genes that could be involved in a specialized metabolic pathway. Successful examples of EST-based gene discovery include the isolation of genes encoding enzymes involved in the production, regulation and catabolism of secondary metabolites. Large-scale EST sequencing and analyses of different tissues have been reported for genomic investigation of several important plants, including Arabidopsis, rice and poplar [16–18]. However, many genes, particularly non-housekeeping ones such as those related to secondary metabolism, are expected to be expressed only in specific tissues and at particular developmental stages. Suppression subtractive hybridization (SSH) is a powerful technique for identifying such differentially expressed genes [19–21]. Recently, transcriptomes of fruits, leaves, flowers, stem and roots of neem were analysed, revealing that most of the secondary metabolism candidates were predominantly present in the fruits [22].

Scheme 1

Graphical presentation: schematic diagram of PCR-based suppression subtractive hybridization, functional analysis of ESTs and validation of libraries

Cytochrome P450 oxido-reductases represent one of the largest gene superfamilies in plant genomes. The number of cytochrome P450 genes is estimated to be up to 1 % of the total gene annotations in each plant species. Plant cytochrome P450s catalyze a wide variety of monooxygenation/hydroxylation reactions, leading to the generation of diverse oxo-functionalized primary and secondary metabolites such as phenylpropanoids, alkaloids, terpenoids, lipids, cyanogenic glycosides, glucosinolates and plant hormones [23]. Diversification within the cytochrome P450 gene superfamily has led to the emergence of new metabolic activities and pathways during the course of evolution and, thereby, to new member metabolites of existing chemical groups as well as new chemical entities. Species-specific cytochrome P450 families and members are among the most important cues to the differential biosynthetic competences of plants and tissues and to the generation of species-specialized secondary metabolites [24].

Therefore, we used the subtractive cDNA hybridization approach to generate forward and reverse subtracted libraries enriched in transcripts for the identification and analysis of differentially expressed genes in the endocarp and mesocarp of neem fruit. A total of 387 clones from the forward subtracted library and 512 clones from the reverse subtracted library were identified. Annotation, contig assembly and mapping of these clones were performed, followed by validation of both libraries by semi-quantitative and quantitative real time PCR assays of selected ESTs, with a focus on secondary metabolism (Scheme 1).

Materials and methods

Plant material

Mature fruits were collected from a tree maintained at the campus of the CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow (India). Samples were immediately flash-frozen in liquid nitrogen and stored at −80 °C until use. Fruits were collected in the 3rd and 4th weeks of May.

Chemicals

Oligo-(dT)-cellulose for mRNA isolation was obtained from Sigma-Aldrich (USA). PCR Select hybridization kit was purchased from Clontech. All other kits and chemicals were procured from Promega, Applied Biosystems, Sigma, Fermentas and Himedia (India).

Total RNA and mRNA extraction

Frozen fruits were taken out of liquid nitrogen, immediately immersed in RNA-Later solution and kept therein for 30 min at room temperature. Mesocarp and endocarp were separated with a sterilized scalpel and the separated tissues were immediately processed for RNA isolation. Total RNA was extracted by a modified CTAB method [25]. RNA preparations were purified following DNase treatment, and RNA quality was ascertained by electrophoresis on a 1.2 % agarose gel. Qualitative and quantitative analyses of RNA were performed by Nanodrop spectroscopy. Poly (A)+ mRNA was purified from total RNA using oligo-(dT)-cellulose columns and was used for PCR-Select cDNA subtraction.

Suppression subtractive hybridization

Both forward and reverse subtraction libraries were prepared using the PCR-Select cDNA subtraction kit (Clontech, USA) according to the manufacturer’s instructions. For the forward library, endocarp RNA was used as tester and mesocarp RNA as driver, and vice versa for the reverse subtracted library. Tester and driver cDNAs were synthesized from 0.5 μg of mRNA extracted from mesocarp and endocarp tissues. Several rounds of suppression subtractive hybridization were carried out to reduce the number of common transcripts. The resulting PCR products were subsequently ligated into the pGEM-T Easy vector using the TA cloning kit (Promega, USA).

Library construction and sequencing

Electro-competent Escherichia coli (E. coli) DH5α cells were transformed with the recombinant pGEM-T vector. Distinct white colonies (~1,000) were selected from each of the forward and reverse libraries for sequencing. Plasmids were isolated by the alkaline lysis method (Sambrook) using 48 deep well blocks. The libraries were maintained in 96 well plates in two replicates; one set was used for sequencing and the other was stored at −80 °C. DNA sequencing was performed using a Big Dye Terminator Cycle Sequencing Ready Reaction Kit and an automated DNA sequencer (ABI Prism 3130XL; Applied Biosystems).

Gene ontological (GO) analysis of differentially expressed genes

The resulting unique transcripts were analysed with the Blast2GO tool [26], which searches public protein databases to produce BLASTx hits. Based on these hits, annotation and InterProScan analyses were performed. The results of these searches were mapped to their corresponding GO accessions and GO terms [5, 27].

Phylogenetic analysis

Seed storage proteins were the most abundant gene products in both libraries and were therefore selected for phylogenetic analysis. Twenty-seven seed storage protein sequences, from Arabidopsis thaliana (AAA32778.1), Citrus sinensis (AAB52963.1), Populus trichocarpa (XP_002329472.1), Ricinus communis (XP_002530185.1), Pistacia vera (ABG73109.1), Gossypium herbaceum (AEO27678.1), Vernicia fordii (AFJ04523.1), Ficus pumila (ABK80753.1), Magnolia salicifolia (CAA57846.1), Elaeis guineensis (AAF05770.1), Vitis vinifera (CBI37121.3), Actinidia chinensis (ABB77213.1), Amaranthus hypochondriacus (CAA57633.1), Chenopodium quinoa (ABI94736.1), Carya illinoinensis (ABW86979.1), Juglans regia (AAW29810.1), Castanea crenata (AAM93194.1), Quercus robur (CAA67879.1), Lupinus angustifolius (AEB33709.1), Glycine max (ACT53400.1), Pisum sativum (S26688), Anacardium occidentale (AAN76862.1), Vicia faba (CAA32455.1), Medicago truncatula (XP_003590688.1), Asarum europaeum (CAA64762.1), Corylus avellana (AAL73404.1) and Sesamum indicum (AAK15087.1), were downloaded from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein). Phylogenetic and molecular evolutionary analyses [28] were conducted on these sequences together with the neem mesocarp and endocarp seed storage protein sequences.

Isolation of putative NHMGR from fruit tissues

The degenerate primer set HMGRDGF and HMGRDGR amplified a 430 bp amplicon under the following PCR cycling conditions: one cycle of 94 °C for 3 min; 35 cycles of 94 °C for 30 s, 55 °C for 40 s and 72 °C for 2 min; and a final extension at 72 °C for 7 min in a thermal cycler (Eppendorf). The amplicon was cloned into the pJET1.2 vector (Fermentas) and then transformed into the E. coli DH5α host strain. Cloned amplicons were sequenced using an automated DNA sequencer (ABI Prism 3130XL; Applied Biosystems). The nucleotide sequences obtained were analysed using the BLAST similarity search program and subsequently used for designing RACE primers.

5′ and 3′ RACE for isolation of full length NHMGR

To amplify the putative 3′ region of neem hydroxymethylglutaryl-coenzyme A reductase (NHMGR), PCR was performed with a 3′ RACE primer pair comprising NHMGRDF1 and 3′ AP under the following conditions: 94 °C for 3 min; 35 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min; and a final extension at 72 °C for 7 min in a thermal cycler. First strand cDNA synthesis for 5′ RACE was carried out using the SMARTer RACE cDNA amplification kit (Clontech) according to the manufacturer’s instructions. The primary PCR product obtained in the reaction was used as template for the nested 5′ RACE-PCR reaction, in which NHMGRUR2 was used. Both the initial and nested PCR reactions were carried out under the following conditions: 94 °C for 3 min; 35 cycles of 94 °C for 30 s, 55 °C for 50 s and 72 °C for 2 min; and a final extension at 72 °C for 7 min in a thermal cycler. All PCR reactions were performed in a 50 μl assay volume containing 1.0 μl cDNA as template (except for the secondary PCR, where amplified products of the primary PCR were used as template), 2 μl of 10 pmol of the respective primers and 45 μl master mix (34.5 μl PCR-grade water, 50 mM KCl, 4 mM MgCl2, 100 mM dNTPs, 2.5 U Taq DNA polymerase). The nested amplified fragments of both 3′ and 5′ RACE were cloned into the pJET1.2 cloning vector, transformed into DH5α cells and sequenced (ABI Prism 3130XL; Applied Biosystems).

Full length cloning of NHMGR

By comparing and aligning the sequences of the core fragment and the 5′ RACE and 3′ RACE products, the full-length cDNA of NHMGR was deduced and subsequently amplified with primers NHMFRFLF and NHMGRFLR containing BamHI and EcoRI restriction sites, respectively. A high fidelity proof-reading DNA polymerase (Fermentas) was employed for amplification of the full length gene under the following PCR conditions: 94 °C for 3 min; 35 cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 3 min; and a final extension at 72 °C for 7 min in a thermal cycler. The resulting amplified products were ligated into the pJET1.2 vector and transformed into E. coli DH5α cells. Positive clones of NHMGR cDNA were confirmed by colony PCR and digestion with BamHI and EcoRI restriction enzymes. Sequences of all degenerate, RACE, full length and real time primers are listed in Table 1.

Table 1 List of primers from selected ESTs for gene expression validation

Validation of subtractive libraries by quantitative RT-PCR

First strand cDNAs of neem fruit, mesocarp and endocarp were synthesized using the RevertAid™ cDNA synthesis kit (Fermentas). For cDNA synthesis, 5 μg of total RNA from each tissue, isolated by the modified CTAB method and treated with DNase (Fermentas), was used. Cytochrome P450 and pectinesterase ESTs from fruit mesocarp, and cytochrome P450, embryo specific protein and acyl carrier protein ESTs from fruit endocarp, were used for expression analysis. Quantitative PCR assays were performed using the StepOne™ Real time PCR system (Applied Biosystems) with absolute SYBR green ROX mix (Applied Biosystems). The expression of the candidate genes was normalized against the β-actin gene. Primers for the target genes were designed from the library ESTs, with a calculated Tm of 50–59 °C and an amplification product not longer than 160 bp (Table 1). Quantitative real-time PCR was performed in a 20 μl reaction volume, using the above template, 10 μl of SYBR Green master mix and 5 pmol of each primer. Reactions were conducted using the following cycling conditions: 95 °C for 10 s, then 40 cycles of 95 °C for 15 s and 60 °C for 1 min; for the melting curve, 95 °C for 15 s, 50 °C for 1 min and 95 °C for 15 s. After each reaction, which included a no-template control, dissociation curve analyses were carried out to verify the specificity of the amplification. All real time PCR reactions were performed in triplicate, with two biological repeats. Relative gene expression levels were represented by relative quantification (RQ) values, calculated by the 2^(−ΔΔCt) method. Semi-quantitative PCRs were carried out in a 20 μl volume using first strand cDNA as template, 0.5 U Taq DNA polymerase and 10 pmol of each primer under the following PCR cycle conditions: one cycle of 94 °C for 3 min; 30 cycles of 94 °C for 30 s, 50–55 °C for 30 s and 72 °C for 1 min; and a final extension at 72 °C for 7 min in a thermal cycler (Eppendorf).
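The relative quantification step can be sketched as follows. This is a minimal illustration of the standard 2^(−ΔΔCt) calculation, not the authors' analysis script: the function names, the choice of calibrator sample and the Ct values in the usage note are hypothetical, and technical triplicates are simply averaged before normalization against the reference gene (β-actin in this study).

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_quantity(sample_target, sample_ref, calib_target, calib_ref):
    """Relative quantification (RQ) by the 2^(-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    'sample' is the tissue of interest; 'calib' is the calibrator tissue
    (which tissue serves as calibrator is an assumption of this sketch).
    """
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calib_target, calib_ref)
    return 2.0 ** (-ddct)
```

For example, with hypothetical triplicate Ct values of 24 for a target gene in mesocarp, 26 in the calibrator tissue and 18 for the reference gene in both, `relative_quantity([24, 24, 24], [18, 18, 18], [26, 26, 26], [18, 18, 18])` gives an RQ of 4, i.e. a four-fold higher relative abundance in mesocarp.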

Results

Construction of suppressive subtractive hybridization (SSH) cDNA libraries

Two SSH cDNA libraries were constructed from the endocarp and mesocarp tissues of neem fruits. Two mRNA preparations were obtained from total RNA isolated from neem fruit mesocarp and endocarp tissues (Supplementary Fig. 1a, b). For the forward library, the mRNA population isolated from endocarp tissue, containing the targeted differentially expressed genes, was used as the tester, and the mRNA population isolated from neem fruit mesocarp constituted the driver, supplying the transcripts common to both populations that should be eliminated during subtraction. After conversion of the mRNA populations into double-stranded cDNA, the tester cDNA was subtracted by hybridization with driver cDNA; the roles were reversed for the reverse library. The resulting subtracted cDNA pool from each library, containing differentially expressed genes, was further enriched (Supplementary Fig. 1c, d) and cloned into the pGEM-T Easy vector, generating subtracted cDNA libraries of more than 2,000 positive clones. These subtractive cDNA libraries allowed the isolation of differentially expressed cDNAs from fruit mesocarp and endocarp. The insert size ranged from 100 bp to 1 kb for both libraries.

EST sequencing and clustering

More than 2,000 recombinant clones from these libraries were selected for plasmid isolation (Supplementary Fig. 2a, b) and sequencing. Sequenced clones were analysed to identify secondary metabolism related ESTs. A total of 387 significant ESTs from the forward (endocarp) library and 512 ESTs from the reverse (mesocarp) library, with insert sizes longer than 100 bp, were selected. After trimming of the sequences (removal of vector contamination, poor quality, ribosomal RNA and poly A regions), the average clone lengths were 309 and 343 bp for the endocarp and mesocarp libraries, respectively. None of the ESTs showed significant hits with bacterial, animal or other genomes; all significant hits were to plant genomes/genes.
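The length filter and poly A trimming described above can be illustrated with a short sketch. It is only an assumption-laden illustration: the function names are hypothetical, the rule of stripping a terminal run of at least ten A residues is an assumed definition of a poly A region, and vector, quality and rRNA screening (which need dedicated tools and reference sequences) are not covered.

```python
import re
from statistics import mean

MIN_LENGTH = 100                 # inserts longer than 100 bp are kept
POLY_A = re.compile(r"A{10,}$")  # assumption: a terminal run of >= 10 A is a poly A tail


def trim_poly_a(seq):
    """Remove a terminal poly A run from an EST sequence."""
    return POLY_A.sub("", seq.upper())


def filter_ests(seqs, min_length=MIN_LENGTH):
    """Trim poly A tails, then keep sequences longer than min_length."""
    trimmed = (trim_poly_a(s) for s in seqs)
    return [s for s in trimmed if len(s) > min_length]


def average_length(seqs):
    """Average length (bp) of the retained sequences."""
    return mean(len(s) for s in seqs)
```

Applied to a list of raw reads, `average_length(filter_ests(reads))` gives the kind of per-library average length reported above (309 and 343 bp).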

Sequence annotation, gene ontology categorization and pathway analysis

ESTs of both libraries (387 from endocarp and 512 from mesocarp) were further analysed with the Blast2GO gene annotation tool for BLASTx, mapping and pathway analysis. The BLASTx expect value cut-off was set to 1.0 e−3. In the forward library 82.17 % (318 clones) and in the reverse library 81.64 % (418 clones) of the ESTs showed significant homology to proteins with known function. The remaining 17.83 % (69 clones) of the forward library and 18.36 % (94 clones) of the reverse library did not show significant similarity to any known protein in public databases, implying that these might be novel or less abundant ESTs.
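The annotation percentages above follow directly from the clone counts. A small check, using only the counts given in the text (the dictionary layout and function name are illustrative choices):

```python
# Clone counts as reported in the text: (annotated, total) per library
LIBRARIES = {
    "forward (endocarp)": (318, 387),
    "reverse (mesocarp)": (418, 512),
}


def annotation_summary(annotated, total):
    """Return (% with significant BLASTx hit, % without, clones without)."""
    unannotated = total - annotated
    return (
        round(100 * annotated / total, 2),
        round(100 * unannotated / total, 2),
        unannotated,
    )


for name, (annotated, total) in LIBRARIES.items():
    print(name, annotation_summary(annotated, total))
```

This reproduces the reported figures: 82.17 % and 17.83 % (69 clones) for the forward library, and 81.64 % and 18.36 % (94 clones) for the reverse library.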

In the endocarp library, most of the ESTs were homologous to legumin and other seed storage proteins. Seed storage protein was the most abundant EST, with 142 clones. Other abundant ESTs were legumin A (48 clones), legumin B (13 clones), senescence-associated protein (7 clones), embryo specific protein (7 clones), oleosin (5 clones), cytochrome P450 protein (2 clones) and some allergen proteins (22 clones). Some ESTs were represented by a single clone, such as those for alanine-pyruvate aminotransferase, acyl carrier protein, vicilin-like antimicrobial peptides, succinate-semialdehyde mitochondrial-like protein, acyl binding protein, cell wall-associated partial, glyceraldehyde-3-phosphate dehydrogenase, soluble epoxide hydrolase and cell wall-associated hydrolase. The legumin A clone was the longest in the endocarp library, at 653 bp (Table 2). In the mesocarp library, the most abundant ESTs were also homologous to legumin and seed storage proteins. Seed storage legumin was the most abundant EST, with 102 clones, and the longest, at 563 bp. Other abundant ESTs were legumin A (23 clones), legumin B (73 clones), senescence-associated protein (70 clones), cytochrome P450 (45 clones), oleosin isoform (13 clones), metallocarboxypeptidase inhibitor (6 clones) and some globulins (6 clones). Some ESTs were represented by a single clone, such as pectinesterase, citrin protein, calcium binding protein, allergen, albumin and globulin, and some cytochrome clones (Table 3).

Table 2 Neem forward library (endocarp) ESTs, EST description with top most blast hit, total clones of EST, EST length, mean E-value and mean similarity
Table 3 Details of ESTs from neem reverse library (mesocarp)

The total number of EST annotations was 390 in the endocarp library and 443 in the mesocarp library. Based on these annotations, members of both libraries were analysed in terms of the three major GO categories: biological process, molecular function and cellular component. In both libraries, the largest number of annotated ESTs grouped under the GO category of molecular function. Most ESTs fell within GO levels 2–6 in the endocarp library and within GO levels 2–9 in the mesocarp library (Supplementary Fig. 3a, b). In the endocarp library, more transcripts were categorized under post-embryonic development, protein metabolic process and regulation of cellular process, whereas in the mesocarp library the categories of metabolic and cellular process predominated (Fig. 1a, b). In the endocarp library, most transcripts pertained to binding and catalytic activity, whereas in the mesocarp library the largest representation was associated with genes involved in RNA binding activity, hydrolase activity and transferase activity (Fig. 1c, d). In the endocarp library, most ESTs were assigned to cell protoplasm and organelles, and only some to extracellular region and macromolecular complex. In the mesocarp library, a large number of ESTs were assigned to the intracellular region group and others to membranous groups such as vesicles (Fig. 1e, f).

Fig. 1

Gene ontology (GO) term-based functional categorization of endocarp and mesocarp ESTs, obtained by suppressive subtractive hybridization, based on biological process (1a, b); molecular function (1c, d) and cellular component (1e, f) respectively. The combined graph was generated based on ontology level 4 by B2G

In the endocarp library, 309 ESTs had InterProScan (IPS) results, 78 ESTs had none and about 230 had GO terms. Similarly, in the mesocarp library, 259 ESTs had IPS results, 253 ESTs had none and about 200 ESTs had GO terms. In all, 240 ESTs from the endocarp library and 280 ESTs from the mesocarp library could be fully annotated (Supplementary Fig. 4a, b). Pistacia vera and Citrus sinensis emerged as the top-hit plant species for the neem endocarp ESTs, whereas Pistacia vera and Populus trichocarpa were the top-hit species for the neem fruit mesocarp specific ESTs (Supplementary Fig. 5a, b). In terms of metabolic pathway assignment, five ESTs from the endocarp library and 16 ESTs from the mesocarp library could be assigned to specific steps in KEGG pathways (Supplementary Table 1). In the endocarp library, genes for nine enzymes involved in eight different pathways were recognized (Supplementary Table 1), whereas in the mesocarp library genes for three enzymes involved in 15 different pathways were recognized (Supplementary Table 2).

In the endocarp library, 49 ESTs contained SSRs belonging to the mono-, di- and tetra-nucleotide sub-groups. Among them, eight were mononucleotide (A-5, T-3), 40 were dinucleotide (AT-10, GT-7, CT-13, AG and TA-1 each) and one was tetranucleotide (CATG). The average SSR length was 19 bases and the maximum length was 36 bases. In the mesocarp library, a total of 63 ESTs contained SSRs of two types: mono- and di-nucleotide. Among them, eight were mononucleotide (T-7, A-1) and 55 were dinucleotide (AG-11, AC-27, CT-8, TG-6, TA, GT and AT-1 each). The average SSR length was 20 bases and the maximum length was 35 bases.
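A simple way to scan ESTs for perfect SSRs of this kind is a regular-expression search for tandem repeats of a short unit. The sketch below is illustrative only: the paper does not state which SSR tool or thresholds were used, so the minimum repeat counts and the function name are assumptions.

```python
import re

# Assumed minimum number of tandem repeats per unit length (not from the paper)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (unit, start, length) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif,
            # e.g. "AA" (mono) or "CTCT" (di), to avoid double counting
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((unit, m.start(), len(m.group(0))))
    return hits
```

Counting the `unit` values over all ESTs of a library would give a motif tally of the kind listed above (for example CT-13 or AC-27), and the third tuple element gives the SSR length used for the average and maximum.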

Contig assembly of the endocarp and mesocarp specific ESTs yielded 15 contigs and 58 singlets for endocarp and 11 contigs and 24 singlets for mesocarp. Sequence lengths after contig assembly varied from 100 bases to about 1.8 kb, with an average length of 400 bases. The longest contig was 1.746 kb in the endocarp library and 1.284 kb in the mesocarp library. The general features of both libraries are presented in Table 4.

Table 4 An account of endocarp and mesocarp SSH libraries

Phylogenetic analysis

A phylogenetic tree was constructed from aligned seed storage protein sequences of different plants and neem mesocarp and endocarp seed storage protein sequences from subtractive libraries as shown in Fig. 2.

Fig. 2

Phylogenetic relationship of seed storage proteins from A. indica. The phylogenetic tree was constructed using MEGA 5 version (maximum likelihood method)

Secondary metabolite metabolism related genes

The amino acid sequences of the polypeptides encoded by the neem mesocarp and endocarp ESTs showed significant homology with products of genes related to the metabolism of secondary phytochemicals. Interestingly, the mesocarp contained more cytochrome P450 related genes than the endocarp. Five cytochrome P450 ESTs (SclNSM-001-G02, SclNSM-003-C06, SclNSM-004-A05, SclNSM-005-E12 and SclNSM-006-A02) from the mesocarp library and one cytochrome P450 EST (SclNSE-001-B12) from the endocarp library were identified.

Characterization of full length NHMGR

After assembling the sequences of the 5′ and 3′ RACE products, the assembled full length cDNA was subjected to a BLASTx search, which showed very high homology with HMGR genes from several plant species such as Dimocarpus longan (86 %), Litchi chinensis (86 %), Ricinus communis (82 %) and Hevea brasiliensis (82 %). This suggested that the cDNA was an NHMGR gene. The full-length cDNA was 2.233 kb long, with 5′ and 3′ untranslated regions (UTRs) and a poly A tail. The ORF search showed that NHMGR contained a 1.749 kb ORF encoding a 583-amino-acid protein. The predicted NHMGR protein had a theoretical pI of 9.11 and a molecular mass of 62.01 kDa. Sequence analysis revealed that the functional motifs of NHMGR were very similar to those of other plant HMGRs. The regions of highest homology were around the substrate binding sites. Plant HMGRs comprise two HMG-CoA binding motifs (EMPIGYVQIP and TTEGCLVA) and two NADPH-binding motifs (DAMGMNM and GTVGGGT) [6–10, 29]. All four motifs were also present in NHMGR (Fig. 3). A 5′ UTR of 262 bases lay upstream of the start codon, and the coding region of NHMGR was followed at the 3′ end by a 222-base 3′ UTR. Two transmembrane domains were predicted in NHMGR: one located between Pro31 (P) and Leu51 (L) and the other between Gly74 (G) and Val96 (V) along the polypeptide chain. Compared with HMGRs from other plants, sequence divergence in NHMGR, in both length and composition, was most pronounced in the N-terminal region, especially at amino acids 1–30 and 100–160. NHMGR was expressed in all three tissues, i.e. fruits, endocarp and mesocarp, as shown by semi-quantitative PCR and real time PCR, but it was absent from the subtractive endocarp and mesocarp libraries (Fig. 4).
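The reported lengths are mutually consistent, and the motif check is easy to reproduce on any HMGR protein sequence. The sketch below uses only the numbers and motifs given in the text; the function name is hypothetical, and no actual NHMGR sequence is included here, so a real sequence would have to be supplied by the reader.

```python
# Lengths reported in the text (bases)
UTR5_BP, ORF_BP, UTR3_BP = 262, 1749, 222

cdna_bp = UTR5_BP + ORF_BP + UTR3_BP  # 2233, i.e. the 2.233 kb full-length cDNA
codons = ORF_BP // 3                  # 583 triplets in the 1.749 kb ORF

# Conserved motifs named in the text
MOTIFS = {
    "HMG-CoA binding 1": "EMPIGYVQIP",
    "HMG-CoA binding 2": "TTEGCLVA",
    "NADPH binding 1": "DAMGMNM",
    "NADPH binding 2": "GTVGGGT",
}


def motif_positions(protein):
    """Map each motif name to its 0-based start in protein (-1 if absent)."""
    protein = protein.upper()
    return {name: protein.find(motif) for name, motif in MOTIFS.items()}
```

A sequence carries all four motifs when every value returned by `motif_positions` is non-negative, which is the condition the text reports for NHMGR.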

Fig. 3

Multiple alignment of the deduced amino acid sequences of NHMGR and other plant HMGRs from Panax quinquefolium (Accession No. ACV65036.1), Nicotiana tabacum (Accession No. AAL54879.1), Camptotheca acuminata (Accession No. AAB69726.1), Tilia miqueliana (Accession No. AAY68034.1), Hevea brasiliensis (Accession No. BAF98281.1), Aquilaria sinensis (Accession No. AFU75319.1), Dimocarpus longan (Accession No. AET72045.1), Litchi chinensis (Accession No. AET72043.1), Corylus avellana (Accession No. ABP04052.1), Siraitia grosvenorii (Accession No. AEM42971.1). Two putative HMG-CoA-binding sites (EMPIGYVQIP and TTEGCLVA) and two NADPH-binding sites (DAMGMNM and GTVGGGT) are indicated within box

Fig. 4

Expression profiles of the NHMGR gene in fruit, mesocarp and endocarp tissues by semi-quantitative and real time PCR. NHMGR was used as a control gene; it was expressed in fruit and in the unsubtracted endocarp and mesocarp libraries but was absent from both subtractive libraries

Validation of forward and reverse subtractive libraries by quantitative RT-PCR

To validate the quality of the forward and reverse subtracted libraries, the expression of eight putative genes, five from mesocarp (four cytochrome P450 genes and one pectinesterase gene) and three from endocarp (one cytochrome P450 gene, an embryo specific gene and an acyl carrier protein gene), was compared by semi-quantitative and real time PCR assays in whole fruit, mesocarp and endocarp tissues. Relative expression of the genes was analysed by the ΔΔCt method (relative quantification) for real time PCR and by agarose gel electrophoresis for semi-quantitative PCR (Fig. 5a, b). The putative mesocarp cytochrome P450 genes and the pectinesterase gene were expressed in neem fruit mesocarp as well as in whole fruit, whilst in endocarp they were down-regulated. The relative mRNA abundance of the endocarp specific cytochrome P450 gene, the embryo-specific gene and the acyl carrier protein gene was higher in fruit endocarp.

Fig. 5

Wet lab validation of the forward and reverse subtracted libraries from fruit, mesocarp and endocarp tissues of A. indica through a semi-quantitative PCR, b Quantitative real time PCR. Eight genes, three from endocarp; cytochrome P450 (NE-1), embryo specific (NE-2) and acyl carrier (NE-3) and five from mesocarp; cytochrome P450 (NM-1, NM-2, NM-3 and NM-4) and pectin esterase (NM-5) were selected for expression analysis. Neem mesocarp (NM) and neem endocarp (NE)

Discussion

The reliability of the cDNA subtraction libraries was tested by checking the subtraction results for NHMGR, which was used as a control in this subtraction based study. The full length cDNA sequence of NHMGR was isolated from fruit, endocarp and mesocarp tissues. Real time PCR results showed that the NHMGR gene was substantially expressed in both tissues, with a higher level of expression in endocarp than in mesocarp. As expected, the complete absence of the constitutively expressed NHMGR gene from the endocarp and mesocarp subtraction results implied that it was completely eliminated during subtraction in both the forward and reverse modes. This indicated that both libraries were pure and did not contain any cross contaminating EST. Although a large number of metabolites are reported to be present in different parts of neem, the fruits are known to be the major site for most of the compounds, particularly limonoids [1, 25]. Fruit mesocarp and endocarp contained more than 80 % of the metabolites. Although these metabolites are commercially important and have diverse practical applications, the secondary metabolism of the neem plant has not been explored. Further, there is no gene expression data available from the plant [22, 25]. Recently, transcriptomes of the fruit, stem, leaf, flower and root were reported, but there is no report on differential expression of the genes related to secondary metabolism [30]. In this study, we identified by suppression subtractive hybridization several differentially expressed genes from the mesocarp and endocarp tissues of neem fruits that are involved in secondary metabolite biosynthesis. All the ESTs were unique and were associated with only a particular tissue of neem fruits.
As in other plants, neem seeds (endocarp) are the site of food storage, which is why most of the ESTs from endocarp were related to seed storage proteins such as legumin A, legumin B, citrin and allergen proteins, which accumulate during seed development to serve as a carbon and nitrogen source during the subsequent seed germination phase. Several ESTs from the mesocarp library were also related to storage legumin proteins, because the mesocarp is attached to the endocarp and also contributes to seed development. As the embryo is located specifically within the endocarp, embryo related ESTs were present only in endocarp and were not detected in mesocarp.

Mapping of the mesocarp and endocarp transcripts correlated well with the physiological state of neem fruit. Embryonic development takes place in the endocarp of neem fruit, so embryo-related proteins are actively expressed in the tissues associated with metabolic processes. The BLASTx hits that could not be assigned any GO term are putatively tissue-specific transcripts of neem fruits, which need to be identified and validated by other experimental approaches. At least some of them could be useful in terms of their role in specific steps of secondary metabolite biogeneration in the plant. Pistacia vera and Citrus sinensis were the species with the most BLAST hits because they are most closely related to neem, based on phylogenetic proximity within the order Sapindales/Rutales. Allergen and globulin mRNAs appear to be the most highly expressed transcripts in both libraries, but the number of ESTs was far greater in the endocarp library. These proteins are also seed storage proteins and accumulate during seed development. Nevertheless, their observed presence in mesocarp remains an open question in terms of fruit-seed physiology. Whether mesocarp metabolism contributes to seed storage protein accumulation needs to be examined in future studies. These transcripts could also be remnants from very early fruit development, when physical separation of mesocarp from endocarp is minimal.

Of the clones that yielded high-quality sequences and had inserts, substantial proportions (31.5 % from the forward subtracted library and 25.7 % from the reverse subtracted library) were unidentifiable in BLAST searches and comprised novel and unclassified genes. Phylogenetic proximity based on the most abundant EST, seed storage protein, of neem mesocarp and endocarp shows that neem is closely related to Pistacia vera and Citrus sinensis. Both Citrus and Pistacia belong to the same order (Sapindales), and Rutaceae, the family to which Citrus belongs, is considered phylogenetically closest to the neem family (Meliaceae). This proximity is also manifested in their secondary metabolism, which is dominated by the limonoid group of terpenoid phytochemicals.

The diversity of neem compounds is most likely generated by secondary transformations, mainly oxido-reductive steps followed by conjugation at the appropriate hydroxy positions. Plant cytochrome P450 enzymes catalyze a wide variety of monooxygenation/hydroxylation reactions that generate diversity in parental secondary metabolites of all chemical classes, viz. phenylpropanoids, alkaloids, terpenoids etc. Thus, cytochrome P450 hydroxylases play a lead role in the metabolic processing of early progenitor intermediates from the main pathway, facilitating biogeneration of several end products of the respective class of metabolites [23, 24, 31]. Tissue-specific P450 families are recruited for the biosynthetic pathways of specialized metabolites. Neem contains many biologically active compounds, such as azadiradione, epoxyazadiradione, azadirachtin, salannin, nimbocinol, catechin and gallic acid, that are characteristically and/or multiply hydroxylated. The cytochrome P450 sub-family members related to secondary metabolism are likely to have a designated catalytic role in facilitating such specific hydroxylase/epoxidase-catalyzed metabolic transformations [30].

Real-time and semi-quantitative expression analyses revealed that cytochrome P450 genes were up-regulated in a tissue-specific manner in different parts of the neem fruit. The occurrence of more cytochrome P450 genes in the mesocarp than in the endocarp matches the occurrence of a more complex repertoire of hydroxylated metabolites (limonoids), which are synthesized and/or terminally processed there before their likely transport to sequestration site(s), including the endocarp. The green (immature) neem fruit is bitter in taste, but on ripening its green epicarp turns yellow and the hard, white, milky mesocarp transforms into a fleshy, yellow, sweet mesocarp. Besides other metabolic processes, this may also entail a limonoid transformation, since the reverse process occurs in citrus fruit juices on standing, where bitter limonin is released from its glyco-conjugated (sweet) form. Pectinesterase plays an important role in cell wall metabolism during fruit ripening; therefore, the observed up-regulation of pectinesterase in mesocarp tissue might play an important role in the ripening of neem fruit, as in other fruits. Its predominant expression in whole fruit indicates that other parts of the fruit, such as the epicarp, also contribute to its expression during ripening. It may bring about the observed loosening of the epicarp at maturity, which gives access to the sweet-tasting mesocarp. This facilitates seed dispersal by attracting crows, as mature neem fruit is a feast for them. If sweetness is linked with limonoid metabolism, neem probably forms a unique case of so-called secondary (specialized) metabolites being linked to seed dispersal, a primary activity essential for the normal conclusion of the reproductive process.
Acyl carrier proteins (ACPs) are involved in pathways of both primary and secondary metabolism, as they transfer acylated intermediates/metabolites to their respective sites for processing or use. Several ACPs are found in different tissues of neem, as its seed (kernel) contains abundant (~25–45 %) triacylglycerol. The up-regulated expression observed in the endocarp might also reflect secondary metabolite modification, in addition to the classical, predominant role of ACPs in fatty acid synthesis.

Conclusions

In the present report, two cDNA libraries, from neem fruit endocarp and mesocarp, were constructed by the suppression subtraction hybridization approach. A total of 387 ESTs from the endocarp library and 512 ESTs from the mesocarp library were identified and analyzed. BLASTX data revealed that 318 ESTs (82.17 %) from endocarp and 418 ESTs (81.64 %) from mesocarp encoded polypeptides/proteins (known or putative). Contig assembly provided 73 unique clones from the forward subtracted library and 35 unique clones from the reverse subtracted library. Mesocarp contained more cytochrome P450 genes than endocarp, suggesting that more hydroxylated fruit metabolites are synthesized or terminally processed in mesocarp tissue. NHMGR, used as a control, was present in fruit, endocarp and mesocarp tissues and absent from both subtractive libraries, confirming that subtraction of both libraries was complete. Further investigation of selected ESTs, three from endocarp (CYP, ACP and embryo-specific protein) and five from mesocarp (four CYPs and pectinesterase), revealed that these genes are expressed differentially in their respective tissues. Phylogenetic proximity based on the most abundant EST, seed storage protein, of neem mesocarp and endocarp shows that neem is closely related to Pistacia vera and Citrus sinensis. The results reveal interesting features of tissue specificity and offer insights into neem secondary metabolism for further research, including the possible involvement of secondary metabolites and metabolism in a primary activity (the chemical biology of seed dispersal).
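
The BLASTX percentages quoted above follow directly from the EST counts. As a minimal sketch of that arithmetic, the Python snippet below recomputes them; the dictionary layout and the function name percent_with_hits are illustrative assumptions and are not part of the original analysis pipeline.

```python
# EST counts as stated in the Conclusions; the data layout is illustrative only.
libraries = {
    "endocarp": {"total_ests": 387, "protein_hits": 318},
    "mesocarp": {"total_ests": 512, "protein_hits": 418},
}


def percent_with_hits(protein_hits: int, total_ests: int) -> float:
    """Return the percentage of ESTs with a BLASTX protein match (2 decimals)."""
    return round(100.0 * protein_hits / total_ests, 2)


# Hypothetical helper output: one percentage per subtracted library.
percentages = {
    name: percent_with_hits(counts["protein_hits"], counts["total_ests"])
    for name, counts in libraries.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f} % of ESTs encode known or putative proteins")
```

Under these counts, the computed values agree with the 82.17 % (endocarp) and 81.64 % (mesocarp) reported in the text.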