Introduction

Plant isoprenoids comprise primary metabolites and secondary metabolites. The primary metabolites are important in the basic life activities of plants. For example, sterols are involved in the construction of biological membranes; ubiquinone is involved in respiration; carotenoids and chlorophyll are involved in photosynthesis; and gibberellins, abscisic acid, cytokinins and brassinolide are involved in plant growth and development. Secondary metabolites play less essential roles but are important in mediating the interactions between plants and their environment. Moreover, secondary metabolites often have commercial value: they are used as pharmaceuticals, agrochemicals, solvents and food additives (Roberts 2007).

Conyza blinii H.Lév. is a folk herb used in western Sichuan for the treatment of asthmatic cough and other inflammatory conditions (Chinese Pharmacopoeia Commission 2015). Its main secondary metabolites are isoprenoids, including blinin, α-amyrin, β-amyrin, oleanolic acid, ursolic acid and conyzasaponins (Xu et al. 1999; Su et al. 2001a, b, 2003). The whole plant can be prepared medicinally, and the most popular C. blinii product is “Conyza blinii extract tablets”, which consist of conyzasaponins (Li 1980). In addition, conyzasaponins have anticancer activity (Ma et al. 2016). The conyzasaponins are therefore responsible for the major pharmacological activity of C. blinii. However, the conyzasaponin content of C. blinii is low and insufficient to meet the demand for pharmaceutical preparations. Hence, methods that increase conyzasaponin content are a focal point of studies on C. blinii.

To better regulate the synthesis of target isoprenoids, it is essential to understand their biosynthetic pathways. Previous studies have suggested that isoprenoids are synthesized by the MVA (mevalonic acid) pathway or the MEP (methylerythritol phosphate) pathway (Lichtenthaler et al. 1997; Lichtenthaler 1999). Conyzasaponins are oleanane-type pentacyclic triterpene saponins, which are synthesized via the MVA pathway (Fig. 1), a complex and multi-branched pathway. Identifying the key enzyme genes is therefore an important part of studying this metabolic process.

Fig. 1

The conyzasaponin biosynthetic pathway. AACT, acetoacetyl-CoA thiolase; HMGS, 3-hydroxy-3-methylglutaryl coenzyme A synthase; CbHMGR, C. blinii 3-hydroxy-3-methylglutaryl coenzyme A reductase; MVK, mevalonate kinase; PMK, phosphomevalonate kinase; MVD, mevalonate diphosphate decarboxylase; IPI, IPP isomerase; FPPS, farnesyl diphosphate synthase; CbSQS, C. blinii squalene synthase; SQE, squalene epoxidase; βAS, β-amyrin synthase; P450s, cytochrome P450 monooxygenases; GTs, glycosyltransferases (Hsieh et al. 2011)

3-Hydroxy-3-methylglutaryl-CoA reductase (HMGR, EC: 1.1.1.34) is the first rate-limiting enzyme in the MVA pathway (Rodwell et al. 1976; Bach 1986; Stermer et al. 1994). It catalyses the irreversible conversion of HMG-CoA into mevalonate, the precursor of the isoprenoids (Chappell et al. 1995). Owing to its significance in isoprenoid metabolism, HMGR has been isolated and characterized from many higher plants. Cao et al. (2010) isolated a new HMGR gene from young leaves of Euphorbia pekinensis by RACE and used a functional colour complementation assay in Escherichia coli to show that EpHMGR could catalyse the biosynthesis of carotenoids. Kalita et al. (2015) reported the cloning and characterization of the full-length HMGR cDNA from Centella asiatica. More recently, genes encoding HMGRs have been cloned from Cymbopogon winterianus (Devi et al. 2017), Gossypium (Liu et al. 2018), Pogostemon cablin (Zhang et al. 2019), Ginkgo biloba (Rao et al. 2019) and Andrographis paniculata (Srinath et al. 2020).

Another important regulatory enzyme, squalene synthase (SQS, EC: 2.5.1.21), is the first committed enzyme in sterol and triterpenoid biosynthesis. It converts two molecules of FPP into squalene, a common precursor of sterols and triterpenes (Brown and Goldstein 1980; Abe et al. 1993). Like HMGR, SQS has been cloned and characterized from many plants, such as Arabidopsis thaliana (Kribii et al. 1997), Centella asiatica (Kim et al. 2005), Taxus cuspidata (Huang et al. 2007), Siraitia grosvenorii (Su et al. 2017), Taraxacum koksaghyz (Unland et al. 2018), Medicago sativa (Kang et al. 2019) and Camellia sinensis (Fu et al. 2019).

However, the HMGR and SQS genes involved in the conyzasaponin biosynthetic pathway have not been identified. In this study, we report the isolation and molecular characterization of the CbHMGR and CbSQS genes from C. blinii transcript tags. The biological functions of the two genes were verified by in vitro enzymatic activity assays. The results will make it possible to map and regulate the important steps of the conyzasaponin biosynthetic pathway at the level of molecular genetics in the future.

Materials and methods

Plant material

C. blinii plants were collected from 101°46′~102°30′ E, 26° N at an altitude of 1680~2100 m in Panzhihua, Sichuan, China.

RNA and DNA isolation

Leaves collected from C. blinii were used to isolate RNA and DNA. An RNAprep pure Plant Kit (TIANGEN) was used to isolate total RNA. Single-stranded cDNA was prepared using the PrimeScript 1st Strand cDNA Synthesis Kit (Takara). A Plant Genomic DNA Kit (TIANGEN) was used to extract genomic DNA.

Gene cloning

Candidate HMGR and SQS genes were identified by searching the gene names and functional annotations of the unique annotated genes in the C. blinii transcriptome annotation library (Sun et al. 2015). Specific primers (Table 1) were designed according to the selected tag sequences. PrimeSTAR Max DNA Polymerase premix (2×) (Takara) was used to amplify the sequences. The TIANquick Mini Purification Kit (TIANGEN) was used to purify the PCR products. The pMD19-T simple vector (Takara) was used as the cloning vector and Escherichia coli strain DH5α (stored in the laboratory) was used as the cloning host strain. Finally, the PCR products were sequenced by Invitrogen Trading.

Table 1 Primers used in this study

Bioinformatics analysis

The Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.cn/) was used to analyse the features of the genomic DNA sequences (Hu et al. 2014). Multiple sequence alignments were performed with DNAMAN software. The SOPMA server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) was used to predict secondary structure. Transmembrane domains were analysed with the TMHMM Server 2.0 (http://www.cbs.dtu.dk/services/TMHMM-2.0/). Phylogenetic trees were constructed from protein sequences in MEGA 7 using the Neighbor-Joining method with 1000 bootstrap replications (Tamura et al. 2011).
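Neighbor-joining builds a tree from a matrix of pairwise distances between aligned sequences. The sketch below illustrates the simplest such distance, the p-distance (proportion of differing sites), with gapped columns skipped pairwise. It is only an illustration: the substitution model and gap treatment used in MEGA 7 are not stated in the text, the function names are hypothetical, and the toy sequences are invented rather than taken from real HMGR or SQS data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (pairwise deletion, an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name_1, name_2)."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Toy alignment (hypothetical sequences, not real data).
toy = {
    "seqA": "MDVRRR-PTK",
    "seqB": "MDVRKRSPTK",
    "seqC": "MEIRKRSPSK",
}
matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

A matrix of this kind is the input that a neighbor-joining implementation clusters into a tree; bootstrap support is then obtained by resampling alignment columns and repeating the procedure.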

Construction of expression vectors and heterologous expression

Specific primers with restriction sites (Table 1) were used to amplify the coding sequences. The CbHMGR and CbSQS PCR products were digested with the BamH I and Xho I restriction enzymes, and the digested products were inserted into the corresponding sites of the pYES2/NT B vector. Colony PCR, restriction digestion and sequencing were used to confirm positive clones. Subsequently, the positive plasmids were transformed into Saccharomyces cerevisiae strain INVSc1. After cultivation for 3~4 days, single clones containing positive plasmids or empty vectors were inoculated into 15 mL of SC-U medium (synthetic complete medium without uracil). To induce gene expression, glucose was replaced with 2% galactose and 1% raffinose. Cultures were grown for 18 h at 30 °C with shaking at 200 rpm.

Heterologous protein extraction

A One Step Yeast Active Protein Extraction Kit (Sangon Biotech) was used to extract the heterologous proteins from S. cerevisiae. An ultrafiltration tube (Millipore) was used to exchange buffer and concentrate protein.

Enzyme assays

The CbHMGR enzyme activity assay was carried out as described by Gu et al. (2015). The 1 mL reaction mixture contained 50 mmol/L KCl, 25 mmol/L K2HPO4 (pH 7.2), 1 mmol/L EDTA, 5 mmol/L DTT, 100 μL CbHMGR crude protein (100 μg/mL), 0.3 mmol/L NADPH (Roche), 0.3 mmol/L HMG-CoA (Sigma-Aldrich) and ddH2O. After incubation at 30 °C for 30 min, the reaction was terminated by adding 100 μL of 6 mol/L HCl and then kept at 25 °C for 1~2 h. Finally, the reaction product was extracted with two volumes of ethyl acetate. The extracts were analysed by gas chromatography coupled to mass spectrometry (GC-MS) under the same conditions as those described by Gu et al. (2015). The product was identified with NIST software.

The CbSQS enzyme activity assay was carried out as described by Ye et al. (2014) with some modifications. The 500 μL reaction mixture contained 40 mmol/L MgCl2, 100 mmol/L Tris-HCl (pH 7.5), 0.1 mmol/L FPP (Sigma-Aldrich), 4 mmol/L DTT, 30 mmol/L BSA, 0.2 mmol/L NADPH and 220 μL of CbSQS crude protein (100 μg/mL). The mixture was incubated at 32 °C for 10 h, and the reaction product was then extracted with two volumes of hexane. Finally, the concentrated organic phase was analysed by GC-MS under the same conditions as those described by Ye et al. (2014). Squalene was identified with NIST software.
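As a quick check on the two reaction mixtures, the absolute amounts of substrate, cofactor and crude protein per reaction can be worked out from the stated concentrations and volumes. The sketch below assumes that the "100 μg/mL" figure is the concentration of the crude protein preparation that was pipetted (the text does not say so explicitly); the helper names are hypothetical.

```python
def amount_nmol(conc_mmol_per_l: float, volume_ul: float) -> float:
    """Amount of solute in nmol for a final concentration and reaction volume.

    1 mmol/L equals 1 nmol/uL, so the product is already in nmol.
    """
    return conc_mmol_per_l * volume_ul


def protein_ug(stock_ug_per_ml: float, volume_ul: float) -> float:
    """Mass of protein (ug) delivered by a given volume of stock.

    Assumes the stated ug/mL value refers to the stock solution.
    """
    return stock_ug_per_ml * volume_ul / 1000.0


# CbHMGR assay: 1 mL reaction
hmgr = {
    "NADPH_nmol": amount_nmol(0.3, 1000),
    "HMG_CoA_nmol": amount_nmol(0.3, 1000),
    "protein_ug": protein_ug(100, 100),
}

# CbSQS assay: 500 uL reaction
sqs = {
    "FPP_nmol": amount_nmol(0.1, 500),
    "NADPH_nmol": amount_nmol(0.2, 500),
    "protein_ug": protein_ug(100, 220),
}

for name, mix in (("CbHMGR", hmgr), ("CbSQS", sqs)):
    print(name, {k: round(v, 2) for k, v in mix.items()})
```

Under these assumptions the CbHMGR reaction holds about 300 nmol each of NADPH and HMG-CoA with 10 μg of crude protein, and the CbSQS reaction holds about 50 nmol of FPP and 100 nmol of NADPH with 22 μg of crude protein.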

Results

Sixteen sequences predicted as candidate HMGRs were obtained from the C. blinii transcript tags (Table S1). However, twelve of them are too short, encoding peptides of fewer than 100 aa. c29868 and c45602 encode the same peptide; c29868 (c45602) and c38514 encode peptides of about 330 aa. c29574 encodes the full-length HMGR protein. The FPKM values of these tags also indicated that c29574 is the most highly expressed (Table S1). Therefore, c29574 was selected as the CbHMGR gene for further research.

The CbHMGR gene has a 1740-bp coding sequence and encodes a peptide of 580 aa. Its GenBank accession number is KX907777. A BLASTp search revealed that CbHMGR has the highest similarity to the HMGR from Chamaemelum nobile. Conserved domain prediction indicated that CbHMGR belongs to HMG-CoA_reductase_classI. This HMGR class catalyses the synthesis of coenzyme A and mevalonate in isoprenoid synthesis (Choi et al. 1992). The calculated molecular mass of CbHMGR is 62.17 kDa, and its isoelectric point is 6.61. GSDS analysis revealed four introns (1383 bp, 1165 bp, 470 bp and 189 bp) in the genomic DNA of CbHMGR (Fig. 2).

Fig. 2

Distribution of introns in gDNA of CbHMGR

Sequence alignment showed that mature CbHMGR contains two HMG-CoA binding motifs (EMPVGYVQIP and TTEGCLVA) and two NADP(H) binding motifs (DAMGMNM and GTVGGGT). These four motifs are highly conserved in all plant HMGRs and function as the catalytic active sites of the HMGR protein (Fig. 3). The predicted secondary structure of CbHMGR is composed of 40.76% alpha helices, 33.51% random coils, 17.10% extended strands and 8.64% beta turns. The phylogenetic analysis (Fig. 4) showed that CbHMGR is most closely related to the HMGR from C. nobile, in accordance with the BLAST results.
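Locating the four conserved motifs in a protein sequence amounts to an exact substring search. The following is a minimal sketch of that step, not the alignment procedure used in the study (which relied on DNAMAN). The motif strings are those reported above; the function name and the toy sequence, built by joining the motifs with arbitrary linker residues, are hypothetical and do not represent the real CbHMGR sequence.

```python
def find_motifs(protein: str, motifs: dict) -> dict:
    """Return the 1-based start positions of each motif in the sequence."""
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = protein.find(motif, start + 1)
        hits[name] = positions
    return hits


# Motifs reported for plant HMGRs in the text.
HMGR_MOTIFS = {
    "HMG-CoA binding 1": "EMPVGYVQIP",
    "HMG-CoA binding 2": "TTEGCLVA",
    "NADP(H) binding 1": "DAMGMNM",
    "NADP(H) binding 2": "GTVGGGT",
}

# Hypothetical toy sequence: motifs joined by arbitrary linkers.
toy_protein = (
    "MAAS" + "EMPVGYVQIP" + "LLK" + "TTEGCLVA"
    + "SSR" + "DAMGMNM" + "KQ" + "GTVGGGT" + "QSA"
)

hits = find_motifs(toy_protein, HMGR_MOTIFS)
for name, positions in hits.items():
    print(name, positions)
```

With a real sequence, an empty list for any motif would flag a candidate that lacks that binding site and may be truncated or divergent.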

Fig. 3

Alignment analysis of CbHMGR and HMGRs from Chamaemelum nobile (AMN10096.1), Tagetes erecta (AAC15475.1), Gentiana lutea (BAE92730.1) and Panax ginseng (AIX87980.1). Black: 100% homologous residues; Gray: ≥ 75% homologous residues. A, B: two HMG-CoA binding motifs; C, D: two NADP(H) binding motifs

Fig. 4

Phylogenetic analysis of HMGR amino acid sequences using the neighbor-joining (NJ) method. Twenty-seven sequences from different species were retrieved from GenBank. The accession numbers are provided after the names. CbHMGR is marked with a circle

To examine the function of CbHMGR, the pYES-CbHMGR recombinant plasmid was constructed and expressed in INVSc1 yeast. GC analysis of the reaction products of the 18-h-old pYES-CbHMGR strain revealed a single peak at 8.0 min, which was absent from the empty pYES2/NT B vector strain and the blank control. The MS data indicated that this peak (8.0 min) was mevalonic acid lactone (Fig. 5). Consequently, we conclude that CbHMGR is indeed a 3-hydroxy-3-methylglutaryl-CoA reductase.

Fig. 5

GC-MS results of CbHMGR in vitro enzymatic activity assay. a GC chromatogram of the CbHMGR group reaction products (a and b), the empty pYES2/NT B vector group reaction product (c) and the blank control group reaction product (d). b The MS spectrum of line a in A. c The MS spectrum and the structure of mevalonic acid lactone

According to the transcriptome analysis, only one tag corresponded to an SQS gene (Table S1). Therefore, we selected it as the CbSQS gene for further research. CbSQS has a 1257-bp coding sequence, which encodes 418 amino acid residues. Its GenBank accession number is KX907779. A BLAST search indicated that CbSQS shares 90% identity with the SQS from Artemisia annua. The calculated molecular mass of CbSQS is 47.96 kDa, and its isoelectric point is 8.61. GSDS analysis indicated that there are no introns in the gDNA of CbSQS. Sequence alignment showed that CbSQS contains four highly conserved motifs and two poorly conserved motifs (Fig. 6). Domain A is an extended hydrophobic domain bordering domain B. Domains B and D are two aspartate-rich domains, which constitute the two sets of binding sites for the allylic substrates. Domain C is a partially conserved phytoene synthetase motif that is essential for catalysis. Domain E is present only in squalene synthetases. Domain F is the transmembrane domain of CbSQS.
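The relation between the coding-sequence length and the peptide length reported for CbSQS can be checked with simple codon arithmetic. The sketch below assumes that the reported 1257 bp includes the stop codon, which the text does not state explicitly; the function name is hypothetical.

```python
def peptide_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by a coding sequence.

    Assumes a complete reading frame; if the length includes the stop
    codon, one codon is subtracted because it encodes no residue.
    """
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


# CbSQS: 1257 bp coding sequence, as reported in the text.
cbsqs_residues = peptide_length(1257)
print(cbsqs_residues)
```

Under that assumption, 1257 bp correspond to 419 codons, that is 418 residues plus the stop codon, which matches the reported peptide length.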

Fig. 6

Alignment analysis of C. blinii SQS and the SQS sequences from Eleutherococcus senticosus (AEA41712.1), Gossypium raimondii (XP_012449773.1), Vitis vinifera (XP_002266150.1) and Artemisia annua (AAR20329.1). Black: 100% homologous residues; Gray: ≥ 75% homologous residues. a: Hydrophobic motif; b and d: Two aspartate-rich motifs; c: Phytoene synthetases motif; e: Squalene synthetases peculiar motif; f: Transmembrane domain

The predicted secondary structure of CbSQS consists primarily of alpha helices (61.24%), with some random coils (21.05%), extended strands (9.57%) and beta turns (8.13%). The phylogenetic analysis (Fig. 7) showed that CbSQS is most closely related to the SQS from A. annua, which concurs with the BLAST results.

Fig. 7

Phylogenetic analysis of SQS amino acid sequences using the neighbor-joining (NJ) method. Twenty-six sequences from different species were retrieved from GenBank. The accession numbers are listed after the names. CbSQS is marked with a circle

To examine the catalytic activity of CbSQS in squalene production, active proteins from pYES-CbSQS transgenic yeast were incubated with FPP for 10 h at 32 °C. GC analysis revealed a peak at 11.5 min for the pYES-CbSQS strain, whereas no peak was detected for the empty pYES2/NT B vector strain or the blank control group. A search of the NIST database confirmed that the peak detected in the transgenic strain was squalene (Fig. 8). This result suggests that CbSQS catalyses the conversion of FPP to squalene.

Fig. 8

GC-MS results of CbSQS in vitro enzymatic activity assay. a GC chromatogram of the CbSQS group reaction product (a), the empty pYES2/NT B vector group reaction product (b) and the blank control group reaction product (c). b The MS spectrum of line a in A. c The MS spectrum and the structure of squalene

Discussion

Conyza blinii is a rare Chinese herb endemic to southwest China that is commonly called Jin Long Dan Cao. According to the Chinese Pharmacopoeia, it has anti-inflammatory, antitussive, anti-asthmatic and expectorant effects (Chinese Pharmacopoeia Commission 2015). The pharmacological effects of medicinal plants are mediated by secondary metabolites, which are the main sources of natural medicines. However, the contents of these metabolites in the native plants are usually low, which hampers the application of the pharmacologically active compounds (Misawa 2011). Overexpressing biosynthetic pathway genes is an effective way to enhance the yield of metabolites (Lu et al. 2016). For example, Deng et al. (2017) reported that co-overexpression of PnHMGR and PnSS remarkably enhanced the accumulation of total saponins in Panax notoginseng cells, to a level 3-fold higher than that in the control. Overexpression of Panax ginseng HMGR resulted in a 1.1- to 1.6-fold increase in phytosterols and triterpenes in hairy root cultures of Platycodon grandiflorum (Kim et al. 2013). Conyzasaponins are oleanane-type triterpene saponins from C. blinii and are responsible for its major pharmacological activity. Nevertheless, the use of conyzasaponins is hampered by their low levels in C. blinii and by the lack of information about their biosynthetic pathway. To date, only two studies have addressed the conyzasaponin pathway, identifying the CbSQE and CbβAS genes (Sun et al. 2016; Sun et al. 2017). However, the upstream genes, which are also important in the conyzasaponin biosynthetic pathway, have not been identified.

HMGR is the first key enzyme in the MVA pathway (Chappell 1995). In this study, an HMGR gene of C. blinii, CbHMGR, was cloned and identified. The open reading frame (1740 bp) of the cloned CbHMGR is similar in length to those of HMGRs from other plants. Two distinct classes of HMGRs have been reported: class I and class II (Bochar et al. 1999). Class I HMGRs contain N-terminal membrane domains involved in the membrane localization and the sterol-regulated degradation of HMGR molecules (Caelles et al. 1989; Denbow et al. 1996). TMHMM analysis indicated that CbHMGR, like all known plant HMGRs, contains two transmembrane domains (40–62 and 83–105), which is consistent with the conserved domain prediction.

Additionally, SQS has been studied as a key enzyme in the biosynthesis of squalene, an intermediate in the production of triterpenoids. The cloned CbSQS sequence shows the same characteristics as known SQS sequences. It encodes 418 aa with a molecular mass of 47.96 kDa. These results are in accordance with previous reports, which have indicated that the SQS protein is approximately 410 to 461 aa long with a molecular mass in the range of 42.9~52.5 kDa (Hanley and Chappell 1992; Robinson et al. 1993; Okada et al. 2000). Like other SQSs, CbSQS contains six conserved regions. These consensus regions are predicted, or in some cases have been proven, to be important for SQS activity (Gu et al. 1998; Pandit et al. 2000). In summary, these results provide new information about previously unannotated genes of the conyzasaponin biosynthesis pathway.
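The size comparison above can be made explicit with a short check, together with the average mass per residue implied by the reported values. This is a sketch using only the numbers given in the text; the function names are hypothetical, and the figure of roughly 110 Da per residue is a common rule of thumb rather than a value from this study.

```python
def mean_residue_mass_da(mass_kda: float, length_aa: int) -> float:
    """Average mass per amino acid residue in daltons."""
    return mass_kda * 1000.0 / length_aa


def in_range(value: float, low: float, high: float) -> bool:
    """True if value lies within the closed interval [low, high]."""
    return low <= value <= high


# Values reported in the text.
CBSQS_LENGTH_AA, CBSQS_MASS_KDA = 418, 47.96
CBHMGR_LENGTH_AA, CBHMGR_MASS_KDA = 580, 62.17

# Literature ranges for plant SQS quoted in the text.
length_ok = in_range(CBSQS_LENGTH_AA, 410, 461)
mass_ok = in_range(CBSQS_MASS_KDA, 42.9, 52.5)

sqs_residue_mass = mean_residue_mass_da(CBSQS_MASS_KDA, CBSQS_LENGTH_AA)
hmgr_residue_mass = mean_residue_mass_da(CBHMGR_MASS_KDA, CBHMGR_LENGTH_AA)

print(length_ok, mass_ok)
print(round(sqs_residue_mass, 1), round(hmgr_residue_mass, 1))
```

Both proteins come out close to the rule-of-thumb average residue mass, so the calculated molecular masses are plausible for peptides of the reported lengths.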

Furthermore, in this study, we investigated the in vitro enzymatic activity of the proteins encoded by these genes. Through yeast expression analysis, CbHMGR was characterized as a reductase that produces mevalonic acid from HMG-CoA, and CbSQS was characterized as a synthase that catalyses the reductive dimerization of farnesyl pyrophosphate. These results not only improve understanding of the conyzasaponin biosynthesis pathway, but also provide a foundation for biotechnological improvement of conyzasaponin content. However, further research on the function of these genes in conyzasaponin biosynthesis is still required, for example overexpressing them in homologous and heterologous plants, knocking them out in C. blinii, or co-expressing them with other genes of the conyzasaponin pathway to produce conyzasaponins by synthetic biology.