Introduction

Akebiae Fructus (AF) is the immature fruit of the Akebia genus of the Lardizabalaceae family. AF has many beneficial effects, including soothing the liver, regulating qi, promoting blood circulation, relieving pain, dispersing nodules, and diuresis. AF is used in the clinic to treat dysmenorrhea, epigastric distension, and adverse urination, among other conditions (Chen et al. 2021). Components found in AF include triterpenoid saponins, amino acids, non-triterpenoids, monosaccharides, polysaccharides, tannins, and fatty oils (Fang et al. 2021; Ma et al. 2020). Triterpenoid saponins are the main components in AF (Ye et al. 2019), including 14 compounds with hederagenin as a mother nucleus, such as α-hederagenin, saponin A, and saponin D, which have proven antitumor and anti-inflammatory activities (Xing et al. 2017; Xu et al. 2019).

In recent years, transcriptomic analysis using RNA sequencing data has become an important method of gene mining and functional prediction in medicinal plants. Previous studies have capitalized on this to identify synthetic pathways of triterpenoid saponins from Poria, Salviae miltiorrhiza, and Panax ginseng (Zeng et al. 2019; Zhan et al. 2020; Zhao et al. 2015). The triterpenoid saponins biosynthetic pathway can be divided into three stages. First, the upstream precursors isopentenyl diphosphate and dimethylallyl diphosphate are synthesized using a mevalonicacid pathway (MVA) and 2-methyl-D-erthritol-4-phosphate (MEP). Previous studies have shown that the MVA pathway is localized in the endoplasmic reticulum and in the cytoplasm, while the MEP pathway mostly occurs in the cytosol and in mitochondria (Newman et al. 1999; Choi et al. 1992). In the second stage, the universal triterpene precursor, 2,3-oxidosqualene, is obtained from the catalysis of geranyl diphosphate synthase (GPPS), farnesyl diphosphate synthase (FPPS), squalene synthase (SS), and squalene epoxidase (SE). Finally, in the third stage, more than 100 triterpene nuclei with different skeletons are cyclized under the action of oxidosqualene cyclases (OSCS) (Suzuki et al. 2002; Basyuni et al. 2007), before triterpenoid skeletons can be successively modified using cytochrome P450 monooxygenase, UDP-glycosyl transferase, and glycosidase. Next, hydroxylation and glycosylation perform oxidative replacement (Xu et al. 2021, 2014; Yang et al. 2018; Zhu et al. 2020; Wei et al. 2019) allowing for the generation of different types of triterpenoid saponins. OSCS can then produce β-amyrin following catalysis of β-amyrin synthase (β-AS), which is a key step in hederagenin synthesis (Zhang et al. 2016). Thus, β-AS is clearly an important regulatory enzyme in AF triterpenoid saponin biosynthesis.

The peel of fruits in the Akebia genus can be eaten after ripening. Importantly, the medicinal and nutritional value of these fruits is closely related to the accumulation of chemical components as they ripen. Previous studies of AF have largely focused on evaluating quality and understanding the underlying pharmacology of these fruits (Zhang et al. 2013; Zhong et al. 2021; Hong et al. 2017; Wang et al. 2019; Zhang et al. 2022). However, the molecular pathways behind triterpenoid saponin synthesis remain uncharacterized. The work of our research group focuses on the regulation of gene biosynthesis in medicinal plants (Ou et al. 2020; Song et al. 2021), and in this work, we sought to understand the dynamic law of compositional changes in AF during development and the internal genes that regulate compositional changes. As such, experimental samples of fruit at different stages of development were collected and subjected to high-throughput transcriptome sequencing and co-expression analyses to assess the expression patterns of key genes in triterpenoid saponin biosynthetic pathways along the different developmental stages. Moreover, we also sought to investigate the internal biosynthetic and transportation mechanisms of triterpenoid saponins. Transcriptomic and metabonomic analyses were used to explore changes in triterpenoid saponins at different developmental stages to provide a basis for the comprehensive application of AF.

Materials and Methods

Plant Material and RNA Extraction

Fresh AF was harvested from the planting base of Jinzhai County, Lu’an, Anhui Province, and officially identified as Akebia trifoliata (Thunb.) Koidz by Associate Professor Jinmei Ou of Anhui University of Chinese Medicine. Fresh fruits were collected at each of the five developmental stages from the beginning of May to the beginning of September (Y5-Y9). Fresh plant samples were washed several times with ultrapure water and then air-dried on filter paper. Samples from each developmental stage were placed into 10-mL centrifuge tubes, instantly frozen in liquid nitrogen, and then stored at − 80 °C until subsequent RNA extraction. Total RNA was extracted from each sample using an RNA kit. Three biological replicates were sat up for each treatment.

cDNA Library Construction and Sequencing

The mRNA enrichment method was used to purify qualified total RNA with magnetic beads using Oligo (dT) and to enrich the mRNA with a polyA tail. An appropriate amount of breaking reagent was added to the resulting mRNA sample, which was segmented under high-temperature conditions. Next, single-stranded cDNA was synthesized using the broken mRNA as a template. Subsequently, two-strand cDNA was synthesized using a two-strand synthesis reaction system. This two-strand cDNA was then purified and recovered with a kit, the sticky ends were repaired, and single adenosine bases were added to the 3′ end of the cDNA before connecting to the sequencing connector. Next, fragment size and length were selected for polymerase chain reaction (PCR) amplification, before the cDNA library was successfully generated. The constructed libraries were assessed and examined using the Agilent 2100 Bioanalyzer and the ABI StepOnePlus real-time PCR System (Zhao et al. 2020; Shan et al. 2020a; Shi et al. 2019). The Illumina HiSeq sequencing platform (BGI, Shenzhen, China) was used to perform high-throughput sequencing analysis of the complete cDNA libraries with the samples. Sequencing data were deposited in the NCBI SRA database (accession number: PRJNA857910).

Transcriptome De Novo Assembly and Gene Functional Annotation

Following transcriptome sequencing, Trimmomatic 0.36 was used to filter out reads containing joints (joint contamination), unknown base N contents greater than 5%, and low-quality reads to obtain high-quality reads. Clean high-quality reads were then assembled using Trinity 2.0.6 software (Kim et al. 2017). Unigenes were obtained using Tgicl software for clustering and redundancy removal (Pertea et al. 2003), and these were used as reference sequences for subsequent analyses. The candidate coding regions in the predictor unigenes were searched using the TransDecoder software (v 3.0.1), the longest open reading frame (ORF) was extracted, and based on the sequence similarity, the ORF was cast to the SwissProt database using the sequence comparison tool, Diamond Blastp. Homologous sequences of the PFAM proteins were then searched using Hmmscan and the coding sequence (CDS) of predictor unigenes coding sequence was predicted using TransDecoder.Predict function (Kim et al. 2015). The parameters were then set according to the assembly results, and the simple sequence repeat (SSR) of the predictor unigenes was detected using MISA (v 1.0), leading to the identification of six nucleotide repeat types (Thiel et al. 2003).

Using NCBI Blast + (Pirooznia et al. 2008) tool, the unigene NCBI non-redundant nucleic acid sequence database (NT, https://www.ncbi.nlm.nih.gov/blast/db), NCBI non-redundant protein sequence database (NR, https://www.ncbi.nlm.nih.gov/blast/db), protein direct homology database (KOG, http://www.ncbi.nlm. nih.gov/KOG), SwissProt protein sequence database (SwissProt, http://ftp.ebi.ac.uk/pub/databases/swissprot), and the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg), multi-sequence alignment results were matched with the annotation database of the protein family domain constructed using a hidden Markov model (multi-sequence alignment and protein family database, PFAM, http://pfam.Xfam.org). Using annotation data from the NR library, the similarity between the transcription sequences of species and related species, as well as the functional information of homologous sequences, was checked, and Gene Ontology (GO, http://geneontology.org) information on related unigenes was matched with Blast2GO 2.5.0. E < 1 × 10−5 was set to obtain the corresponding functional annotations and the classification information of the unigenes (Shan et al. 2020b).

Structural Characteristics Analysis of β-AS

The β-AS was translated using the Translatetool online software, and the complete ORF sequence was screened. Triterpenoid saponin sequences were compared using DNAMAN and Clustalx software, and secondary structures within β-AS were visualized using ESPript 3.0 software. Next, the SWISS-MODEL (https://swissmodel.expasy.org/) online system and PyMOL software were used to model and visualize β-AS tertiary structures.

Differentially Expressed Gene Analyses

High-quality reads for each sample were assessed using Bowtie2 (version 2.2.5) software against AF genomic sequences (Langmead et al. 2012). Next, gene and transcript expression levels for each developmental stage were calculated using RSEM (version 1.2.8) software and the TPM formula. The standard expression of each unigene was obtained (fragments per kilobase of exon model per million mapped fragments, FPKM) (Li et al. 2011). Functional annotation and KEGG pathway enrichment analysis of differentially expressed genes (DEGs) were conducted for fruit samples from the five developmental stages. DEGs were assigned the default false discovery rate (FDR) ≤ 0.001.

Transcription Factor-Encoding Genes Analyses

Genes encoding transcription factors (TFs) along with the TF families were classified. Next, unigene ORFs were detected using Getorf (EMBOSS:6.5.7.0) (Rice et al. 2000) and aligned to TF protein domains using Hmmsearch (version 3.0). Subsequently, competent unigenes were identified based on the TF family features found in PlantTFDB (Mistry et al. 2013).

Ultra-high Performance Liquid Chromatography-Quadrupole Electrostatic Field Orbitrap Mass Spectrometry (UHPLC-QE-MS) Analysis

AF metabolites were analyzed by high-performance liquid chromatography (HPLC) coupled with tandem mass spectrometry (LC–MS/MS) (Vanquish, Thermo Fisher Scientific) (Xie et al. 2021). The samples were separated on a Waters UHPLC BEH C18 column (2.1 mm × 100 mm, 1.7 μm). The flow rate was set at 0.5 mL/min, the column temperature was maintained at 50 °C, and the sample injection volume was 5 μL. The mobile phase consisted of 0.1% (v/v) formic acid–water (A) and 0.1% (v/v) formic acid-acetonitrile (B). Multi-step linear elution gradient was carried out as follows: 0–11 min, 85–25% A; 11–12 min, 25–2% A; 12–14 min, 2% A; 14–14.1 min, 2–85% A; and 14.1–16 min, 85% A. The mass spectrum with a scanning mass range of 100–1500 Da was then obtained using the Orbitrap Exploris 120 mass spectrometer in negative mode. The detailed parameters were as follows: sheath gas flow, 35 Arb; auxiliary gas flow, 15 Arb; ion transfer tube temperature, 350 °C; vaporizer temperature, 350 °C; full ms resolution, 60,000; MS/MS resolution, 15,000; collision energy, 16/32/48 in NCE mode; and spray voltage, − 4.00 kV (negative) (Wu et al. 2023).

Determining Triterpenoid Saponins Content

The α-hederin content in AF at different developmental stages was determined using an Agilent 1260 HPLC system equipped with a Waters SunFire C18 column (4.6 × 250 mm, 5 µm). The mobile phase consisted of an acetonitrile:water:phosphate mix (45:55:0.1). The flow rate was maintained at 1.0 mL/min with the detection wavelength at 203 nm, the column temperature at 30 ℃, and the injection volume at 20 μL. For each sample, 1 g of AF powder (via sieve four) was added to a conical flask with a stopper and extracted with 100 mL of 75% (v/v) methanol (300 W, 50 kHz, 30 min) at room temperature by ultrasonography. The sample weight was then calculated, and the lost weight was supplemented with 75% (v/v) methanol to obtain the final test solution (Deng et al. 2021).

Subsequently, 5 mg of α-hederin control was carefully weighed, and 10 mL of methanol was added to generate a solution with mass concentration of 0.5 mg/mL before it was filtered through a 0.22-µm microporous filter membrane. Next, various volumes of this α-hederin standard solution (i.e., 1 µL, 2 µL, 4 µL, 6 µL, 8 µL, 10 µL, 12 µL, and 14 µL) were taken for measurement. Linear regression analysis was used to generate a standard curve with the sample volume (µL) on the x-axis and the peak area on the y-axis.

Triterpenoid Saponin Biosynthetic Gene Expression Analyses

To validate our transcriptomic data, we assessed the expression of key enzyme genes involved in triterpenoid saponin biosynthesis by quantitative real-time PCR (qRT-PCR). Primers for genes involved in biosynthesis as well as those for reference genes (actin) were designed using Primer Premier (version 5.0). After reverse transcription, each reaction was prepared in a 15 μL volume containing diluted cDNA (2 μL), 1.5 μL of 2.5 μM primer (forward and reverse), qPCR Mixture (2X, 7.5 μL), and RNase-free water (4 μL). The following thermal cycling protocol was followed: initial denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s, and annealing or extension at 60 °C for 30 s. Three technical replicates were performed for each sample. The relative expression of each selected unigene was normalized against actin (Un 10407) and analyzed using the 2−ΔΔCt method (Livak et al. 2001).

Results

RNA Sequencing and De Novo Transcriptome Assembly

We obtained 96.87 Gb of data from transcriptome sequencing of AF samples. The Clean Reads Q20 value for each sample was > 97.28%, and the Clean Reads Q30 value was > 92.85%, suggesting good date reliability. After Trinity assembly and Tgicl de-redundancy, 115,430 unigenes were obtained. The total sequence length, average length, N50, and GC contents were 166,896,333 bp, 1445 bp, 2269 bp, and 40.16%, respectively (Table 1). The unigenes obtained after assembly of the AF sequences were distributed across 200–500 bp (33,848 unigenes), accounting for 29.32% of the total, and gene sequences exceeding 3000 bp accounted for 11.31% (Fig. 1a). Taken together, the unigenes were relatively complete allowing for functional annotation.

Table 1 Quality evaluation of unigenes from the transcriptome data of AF
Fig. 1
figure 1

a The length distribution of AF unigenes. b The length distribution of CDS sequences of AF

Using TransDecoder software CDS to predict the predictor data, we found that 66,882 unigenes were successfully predicted as CDS’. The total length of the CDS’ was 77,730,759 bp with a maximum and minimum length of 15,297 bp and 297 bp, respectively. Further, the N50 was 1485 bp while the N90 was 576 bp, and the GC content was 43.62%, of which 52,703 CDS’ were larger than 500 bp, accounting for 78.80% of the data (Fig. 1b). SSR detection was performed on the transcriptome data using MISA software, and screening detected that there were 55,777 SSRs distributed in 37,794 unigenes. Among them, the main types were dinucleotide repeats (24,619, 44.14%), followed by mononucleotides (17,736, 31.80%) and trinucleotides (10,268, 18.41%) (Table S1).

Functional Classification and Expression Overview of Unigenes

Among the 115,430 unigenes, 81,143, 69,686, 63,991, 66,446, 65,365, 61,359, and 65,675 unigenes were annotated using the NR, NT, SwissProt, KEGG, KOG, PFAM, and GO functional databases. All databases annotated 36,704 unigenes, or 31.80% of the total, and 85,859 unigenes were annotated by at least one database, constituting 74.38% of the total (Table 2 and Fig. S1).

Table 2 Unigenes functional annotation results

In the transcriptome data of AF samples taken in May, June, July, August, and September, 81,114, 76,201, 78,848, 75,823, and 71,014 unigenes were obtained, respectively. Of these, which 33,805, 35,472, 34,192, 32,623, and 32,753 had low expression (FPKM ≤ 1); 36,999, 30,704, 34,337, 32,616, and 27,824 had medium expression (FPKM = 1–10), and 10,310, 10,025, 10,319, 10,584, and 10,437 had high expression (FPKM ≥ 10) (Fig. 2).

Fig. 2
figure 2

Gene expression in five developmental periods of AF. The expression distribution of unigenes in five developmental periods. The X-axis represents the sample period, and the Y-axis represents the number of unigene. Different gene expression levels are expressed by color shade: FPKM ≤ 1 is low expression, FPKM 1–10 is medium expression, FPKM ≥ 10 is high expression

Identification and Analyses of Triterpenoid Saponin Pathways in AF

The metabolic pathways that AF may be involved in were analyzed using the KEGG database (Kanehisa et al. 2000; Kanehisa et al. 2019; Kanehisa et al. 2021). In all, 66,446 unigenes were annotated in the AF transcriptome, including 2798 for cell processes, 4288 for environmental information processing, 14,307 for genetic information processing, 36,398 for metabolism, and 3233 for the organic system. The five categories included 136 metabolic pathways. These results indicate that, among other pathways for the synthesis of secondary metabolites, 911 unigenes were mapped to the phenylpropanoid biosynthesis pathway, occupying the highest (Fig. 3a). Moreover, terpenoid and polyketide metabolism were mainly distributed in terpenoid backbone biosynthetic processes, containing 309 unigenes (Fig. 3b).

Fig. 3
figure 3

KEGG annotation of AF unigenes. The number of unigenes involved in the biosynthesis of other secondary metabolites (a) and terpenoids (b)

There were two metabolic pathways for triterpenoid saponin biosynthesis, including 309 unigenes involved in terpenoid backbone biosynthesis, and 125 involved in sesquiterpenoid and triterpenoid biosynthesis. These two metabolic pathways involved a total of 434 unigenes. Those related to enzymes in the triterpenoid saponin synthesis pathway were further sorted: 10 unigenes encoded acetyl-CoA acetyltransferase (AACT), 29 encoded GPPS, 10 encoded SS, and 43 encoded β-AS (Table 3). The FPKM of unigenes was greater than 1, and the distance from the complete coding frame was less than 50. There were 7, 4, 4, 3, 5, and 5 unigenes encoding AACT, 3-hydroxy-3-methylglutaryl-CoA synthase (HMGS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), the mevalonicacid pathway (MVK), phosphomevalonate kinase (PMK), and mevalonate-5-pyrophosphate decarboxylase (MVD) in the MVA pathway, respectively. Further, there were 7, 1, 2, 2, 3, 7, and 2 unigenes encoding 1-deoxy-D-xylulose-5-phosphate synthase (DXS), 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT), 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS), (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (HDS), and (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate reductase (HDR) in the MEP pathway, respectively. The formation of the carbon ring skeleton requires 4, 14, 2, 5, and 5 unigenes for isopentenyl-diphosphate delta-isomerase (IDI), GPPS, FPPS, SS, and SE, respectively (Fig. 4). The relative expression of these key enzyme genes in the five developmental stages in the biosynthetic pathway of AF triterpenoid saponins is shown as a heat map. Importantly, we found that β-AS expression in August was significantly higher compared to September. Moreover, AACT and MDS expression was upregulated in July and September, respectively. In addition, HMGS, PMK, SE, FPPS, and IDI expression was highest in June, and the same was true for GPPS in August.

Table 3 Distribution of unigenes in five developmental periods during triterpenoid saponins biosynthesis
Fig. 4
figure 4

Synthetic route of triterpenoid saponins in AF. Y5, May; Y6, June; Y7, July; Y8, August; Y9, September. The expression level of key enzyme genes, green means low expression of genes, on the contrary, high expression of red. Cluster of transcripts and Unigene are abbreviated as “CL” and “Un,” respectively

Structural Characteristics of the Key β-AS Enzyme

Transcriptome data analyses indicated 43 unigenes encoding β-AS. The amino acid sequences were compared to the NCBI Blast database, and it was determined that nine unigenes had complete ORFs. The sequences of the nine genes were compared using DNAMAN software, and the results indicated a sequence identity of 90.82%, thereby suggesting that sequences encoding β-AS had high homology. Their three-dimensional structures, constructed using SWISS MODEL, also showed a high degree of similarity (Fig. 5a). The unigene, CL 11200–8, which had the highest expression level in the complete reading frame, had an ORF length of 2304 bp and encoded 767 amino acids. Next, we used the β-AS protein sequence (CL 11200–10, 42.00%) with high sequence identity to the template protein to model the secondary and tertiary structures. Secondary structure predictions showed that β-AS consists of 43.94% alpha helices, 36.77% random coils, 12.39% extended strands, and 6.91% β-turns. The β-AS tertiary structure was modeled using PyMOL software, and we found that it contains a characteristic MWCYCR region and a highly conserved substrate binding site (DCTAE), which is consistent with the conserved β-AS sequences reported in other plants and in animals (Kushiro et al. 2000; Ito et al. 2013) (Fig. 5b, c).

Fig. 5
figure 5

β-AS secondary structure and spatial structure model of AF. a AF β-AS secondary structure, dark blue indicates the same amino acid; pink and light blue indicate similar amino acid. b The spatial structure model of β-AS (CL 11200–10; template: 1w6k.1.A; sequence identity: 42.00%). c The red and blue spheres represent the highly conservative MWCYCR and DCTAE regions, respectively. N and C indicate the very beginning and end of the sequence, respectively

DEGs Analysis

According to our transcriptome data, 71,596 unigenes were co-expressed in the five developmental stages of AF, and 1865, 2080, 1177, 1441, and 840 unigenes were expressed in May, June, July, August, and September, respectively (Fig. 6a). DESeq was used to compare AF samples from the five developmental stages, and DEGs were analyzed (Fig. 6b). For Y9 vs. Y5, there were 16,905 significant DEGs, including 9191 up- and 7714 down-regulated genes. For Y9 vs. Y6, there were 14,972 significant DEGs, including 8218 up- and 6754 down-regulated genes. For Y9 vs. Y7, we identified 10,778 significant DEGs, including 5597 up- and 5181 down-regulated genes. Finally, for Y9 vs. Y8, we observed 7053 significant DEGs, including 3366 up- and 3687 down-regulated genes. These results suggest that AF samples taken in September were more divergent than samples taken in the other months. The DEGs of Y9/Y5, Y9/Y6, Y9/Y7, and Y9/Y8 were analyzed using a Venn diagram, for a total of 7444, 5899, 3651, 1635 DEGs in Y9/Y5, Y9/Y6, Y9/Y7, and Y9/Y8, respectively.

Fig. 6
figure 6

Gene expression and number of DEGs in five developmental periods of AF. a Unigene Venn diagram of five developmental periods. b Number of DEGs: compared with May, June, July, and August, the number of DEGs with high or low expression levels in AF in September was defined as “up-regulated” and “down-regulated,” respectively

KEGG enrichment analyses of DEGs from different AF developmental stages of AF showed that 6407 DEGs were annotated to 132 metabolic pathways in Y5 vs. Y9, with a focus on enrichment in plant hormone signal transduction (ko04075, 458 DEGs), starch and sucrose metabolism (ko00500, 321 DEGs), and amino sugar and nucleotide sugar metabolism (ko00520, 283 DEGs) (Table S2). Further, Y6 vs. Y9, 5,775 DEGs were annotated to 134 metabolic pathways, with a focus on plant hormone signal transduction (ko04075, 409 DEGs), amino acids biosynthesis (ko01230, 340 DEGs), and carbon metabolism (ko01200, 340 DEGs) (Table S3). In addition, 4297 DEGs were annotated to 132 metabolic pathways in the Y7 vs. Y9, with an emphasis on plant hormone signal transduction (ko04075, 267 DEGs) and starch and sucrose metabolism (ko00500, 238 DEGs) (Table S4). Similarly, 2960 DEGs were annotated to 133 metabolic pathways in Y8 vs. Y9, with an emphasis on plant hormone signal transduction (ko04075, 201 DEGs) and starch and sucrose metabolism (ko00500, 170 DEGs) (Table S5).

Analysis of TFs Related to Triterpenoid Saponin Biosynthesis

TFs play key roles in regulating the activities of genes involved in triterpenoid saponin biosynthesis and other secondary metabolic processes (Dong et al. 2020). In the AF transcriptome database, 3490 unigenes were annotated as TFs and grouped into 56 TF families. The MYB family had the largest number of unigenes (422), followed by the WRKY family (261), NAC family (241), and bHLH family (213) (Fig. 7). Next, KEGG metabolic pathway enrichment analysis was performed on 3490 unigenes. Among the three metabolic pathways of triterpenoid saponin synthesis, only Un 15,655, a unigene encoding TF, was involved in terpenoid backbone biosynthesis. This unigene belongs to the bHLH family and regulates hydroxymethylglutaryl-CoA reductase (NADPH).

Fig. 7
figure 7

Classification of TF families

UHPLC-QE-MS and HPLC Analyses of Triterpenoid Saponins in AF

A total of 119 metabolites were identified using the UHPLC-QE-MS, the specific classifications and numbers of which are shown in Fig. 8a. Of these, 29 were terpenoids, which accounted for the highest proportion (24.37%). Notably, among the 29 terpenoids identified, 18 were triterpenoid saponins. A qualitative analysis of the triterpene saponin metabolites of AF revealed the presence of α-hederin in AF. The total ion flow diagram for positive and negative ion modes is shown in Fig. S2. In addition, based on retention time (Tr), maximum UV absorption wavelength (λ-MS), molecular ion peaks, and ESI–MS data, oleanolic acid, hawthornic acid, and ginsenoside Ro may also be present in AF (Fig. S3 and Table S6).

Fig. 8
figure 8

a The number of metabolites in different developmental periods. b α-Hederin content in different developmental periods. The X-axis represents the sample period: Y5, May; Y6, June; Y7, July; Y8, August; Y9, September, and the Y-axis is the relative content of α-hederin

HPLC analysis was carried out to further compare changes in triterpenoid saponin expression in prehensile seeds at different developmental stages. To this end, α-hederin expression was calculated (Fig. 8b) according to the generated the standard curve (Y = 4.703 1 X − 0.097 7, R2 = 0.999 9). Dynamic changes in α-hederin expression in the AF samples first increased then decreased, before finally increasing again. The accumulation of α-hederin in AF peaked in September, probably because AF typically mature around September.

Validation of RNA-Seq Data with qRT-PCR

To independently validate the transcriptome data and DEGs at different developmental periods, six unigenes involved in triterpenoid saponin biosynthesis were selected for qRT-PCR analysis. The levels of CL 2686–3 (GPPS) expression were the highest in May. Similarly, CL 2371–4 (AACT), CL 2808–1 (IDI), and Un 21,574 (PMK) expression was peaked in June, while CL 3010–8 (HMGR) and CL 1616–6 (SS) expression was highest in July. Importantly, these data are consistent with our transcriptome data (Fig. 9).

Fig. 9
figure 9

Expression analysis of six unigenes encoding enzymes involved in triterpenoid saponin biosynthesis. Relative expression of a CL 2371–4 (AACT), b CL 2686–3 (GPPS), c CL 3010–8 (HMGR), d CL 2808–1 (IDI), e Un 21,574 (PMK), and f CL 1616–6 (SS) was analyzed by qRT-PCR using the actin gene (Un 10,407) as the reference gene for normalization. Blue bars represent qRT-PCR data; red lines represent FPKM values. The data represent the average ± standard error of three repetitions. The left Y-axis is the relative expression level of genes detected by qRT-PCR, and the right Y-axis is the FPKM values of RNA-Seq data

Discussion

Triterpenoid saponins are the main bioactive components in AF and exhibit good pharmacological activities. We identified triterpenoid saponins in AF by UHPLC-QE-MS analysis and determined α-hederin expression by HPLC. Importantly, α-hederin is an indicator and characteristic component of AF, with strong antitumor activity (Cao et al. 2022). We found that α-hederin expression increased from May to June in AF, decreased from June to August, and peaked in September, suggesting that AF is more medicinally useful in September. This is in agreement with previous studies, and importantly, these data provide guidance for the optimal harvesting period for AF (Xiong et al. 2008; Cai et al. 2022). These observed changes were also consistent with our transcriptomic data, indicating a certain correlation between the composition and transcriptomic expression.

Transcriptome sequencing was carried out on samples from five developmental stages using the DNBSEQ platform. After reassembly and de-redundancy procedures, 115,430 unigenes were obtained, with a total length, an average length, N50, and GC content of 166,896,333 bp, 1445 bp, 2269 bp, and 40.16%, respectively. These unigenes were compared to the NR, NT, SwissProt, KOG, KEGG, GO, and PFAM databases. From this, 81,143 (70.30%), 69,686 (60.37%), 63,991 (55.44%), 65,365 (56.63%), 66,446 (57.56%), 65,675 (56.90%), and 61,359 (53.16%) unigenes were functionally annotated, respectively. At the same time, 3490 unigenes encoding TFs were predicted; all Clean Reads Q30 values were greater than 92.85%, indicating that the transcriptome database was reliable and the assembly quality was high.

A large number of unigenes, 65,675, were assigned to the GO database, which indicated significant diversity in the AF transcriptome across the different developmental stages. A total of 66,446 unigenes were annotated in the KEGG database, and 451 unigenes were found to be involved in triterpenoid saponin biosynthesis, of which 255 encoded 19 enzymes that play a key role in that process. Next, the expression of key enzyme genes was determined according to their FPKM values. We found that AACT (CL 2371–4), PMK (Un 21,574), and SS (CL 1616–6) expression first increased before it decreased, suggesting that the synthesis rate may change accordingly, providing a basis for further research and cultivation. AACT (CL 2371–4), GPPS (CL 2686–3), HMGR (CL 3010–8), IDI (CL 2808–1), PMK (Un 21,574), and SS (CL 1616–6) genes were assessed using qRT-PCR, and the authenticity and reliability of our transcriptional data were verified.

As a key enzyme in oleanane triterpene saponin synthesis, β-AS is also one of the reasons for the diversity of triterpene products (Wu et al. 2013). The full-length (2304 bp) of the β-AS gene (CL 11200–10), which is highly consistent with the template protein sequence, encodes 767 amino acids. The sequence of published plant β-AS identified through the NCBI search was between 2000 and 2800 bp, with an amino acid sequence of 690–780 residues in length (Zhao et al. 2019). Our data show that β-AS is highly conserved, with secondary structures mainly composed of an alpha helices and random coils, followed by extended strands and β-turns. These secondary structures are also very similar in different plants, indicating additional β-AS conservation at the secondary structure. A search of the conserved domains in the NCBI database showed that β-AS contains a conserved DCTAE substrate-binding domain. These residues may be necessary to open the oxygen ring (Haralampidis et al. 2002; Dang et al. 2000; Shibuya et al. 1999).

There were 16,905 significant DEGs in Y9 vs. Y5, 14,972 in Y9 vs. Y6, 10,778 in Y9 vs. Y7, and 7053 in Y9 vs. Y8. The DEGs of different developmental stages shows that AF harvested in September is quite different from compared to fruits harvested at other times of the year, with the number of DEGs also being highest in September. This is in line with the pervious observed phenomenon, which further proves that AF has important medicinal value in September (August in the lunar calendar).

In the transcriptome database, 3490 candidate TFs were assigned to 56 TF families. These TFs may play an important role in regulating triterpenoid saponin biosynthesis and plant metabolism. The MYB family has the largest number of unigenes at 422. MYB factors are very important for the biosynthesis of terpenoid trunks. MYB can be divided into four categories. The subclass R2R3-MYB is the most widely distributed MYB TF in plants (Dubos et al. 2010), and it regulates the biosynthesis of terpenoid secondary metabolites. In-depth study of MYB TFs has shown that R2R3-MYB TFs regulate terpenoid synthesis in plants. For example, R2R3-MYB TF PnMYB1 can simultaneously regulate the expression of two key enzyme genes in the biosynthesis pathway of Panax notoginseng saponins, thereby affecting their biosynthesis (Lei et al. 2022). In addition, overexpression of CrBPF1 TFs can increase the transcription level of genes related to terpene biosynthesis in Catharanthus roseus to also regulate terpene biosynthesis (Li et al. 2015).

In summary, we conducted a comprehensive transcriptome analysis five developmental stages of AF, identified genes related to triterpenoid saponin biosynthesis using RNA-Seq, identified the key enzymes involved in biosynthesis, and verified the RNA-Seq data of some genes with qRT-PCR. Our data provide novel insights into the metabolic pathways related to AF triterpenoid saponins. These data will also aid further analyses of the molecular mechanisms involved in regulating AF triterpenoid saponin biosynthesis.