Introduction

Brassicaceae vegetables are widely recognized for their contribution to human nutrition and health [1], as they contain many health-promoting and potentially protective phytochemicals, including tocopherols, carotenes, plant sterols, and policosanols [2,3,4]. Although these vegetables are extensively consumed in Korea, few studies have considered their phytochemicals [5, 6]. Studies on the qualitative and quantitative distribution of primary phytochemicals in Brassicaceae vegetables may help breeders to develop germplasm with high levels of these phytochemicals.

Comprehensive chemical analysis seems to be the most reliable method for estimating food quality. With the recent development of various analytical techniques, a large amount of chemical data has been generated for food identification and food quality determination. Plant identification has been performed based on genetic and metabolome analyses. One study used simple sequence repeats (SSRs) to generate genetic maps in populations of two Brassica species (Brassica oleracea L. and B. rapa L.) [7], but, to our knowledge, few investigations have compared metabolite profiles of Brassicaceae species to identify biomarkers for their discrimination. In fact, there is only one metabolic classification of Amaranthaceae, Asteraceae, Brassicaceae, and Malvaceae [8]. In that study, Kim et al. [8] classified plants by metabolic components, including policosanol, phytosterol, amyrin, and tocopherol, but no such study has focused on Brassicaceae vegetables.

Chemometric pattern recognition techniques have been used to discriminate the geographical origin and variety of plants [9,10,11]. The application of multivariate techniques to biological studies produces weighted combinations of the original variables that reveal groupings, which are often not evident via classical univariate analytical approaches [12]. Group classification was initially performed using unsupervised principal component analysis (PCA)-based approaches. In previous studies, PCA and hierarchical clustering analysis (HCA) confirmed differences between rice and Chinese cabbage in terms of metabolite contents [10, 11]. Orthogonal projection to latent structure-discriminant analysis (OPLS-DA) has been used to maximize differences between samples and to identify the variables responsible for their differentiation, and it has provided a clear discrimination between Gladiolus sp. genotypes [13]. In addition, the relationship between two metabolites can be obtained by Pearson correlation analysis, and the correlations can be examined to acquire information on metabolic associations. In our previous study, Pearson’s correlation analysis was used to identify metabolic links in rice seeds [14]. A batch learning self-organizing map (BL-SOM) was used in another study to analyze differences in metabolite levels between Arabidopsis thaliana cells cultured under salt stress over time [9]. The resulting map was used to interpret the metabolic networks. In addition, BL-SOM has been used to study transcriptome and metabolome data in plants [15]. However, there are few published data on the discrimination of Brassicaceae vegetables using chemometric pattern recognition techniques.

This study evaluated lipophilic compounds in nine Brassicaceae species of Korean origin. Twenty-eight compounds, including policosanol, phytosterol, amyrin, carotenoids, and tocopherol, were analyzed and classified from broccoli, red cabbage, cabbage, Brussels sprouts, Chinese cabbage, kale, kohlrabi, pak choi, and radish sprouts. PCA, OPLS-DA, Pearson’s correlation analysis, HCA, and BL-SOM were used in the present study to visualize chemical differences between the nine species analyzed. Through this approach, we were able to discriminate the Brassicaceae vegetables according to their species. We emphasize that this multivariate statistical analysis using metabolic profiling is a powerful tool for the assessment of quality and for the discrimination of species.

Materials and methods

Samples and chemicals

The edible parts of nine Brassicaceae vegetables were purchased from a supermarket in Incheon, Korea, in 2015 (three biological replicates), and included leaves (Brussels sprouts, cabbage, Chinese cabbage, kale, pak choi, and red cabbage), flowers (broccoli), sprouts (radish sprouts), and roots (kohlrabi). Each sample was divided into two portions, one for gas chromatography-mass spectrometry (GC–MS) analysis and one for high-performance liquid chromatography (HPLC) analysis. Each sample was freeze-dried at −70 °C for 72 h and then crushed using a mortar and pestle. The resulting powder was stored at −80 °C until extraction. Pyridine, ascorbic acid, and N-methyl-N-trimethylsilyl trifluoroacetamide (MSTFA) were obtained from Sigma-Aldrich (St. Louis, MO, USA). All the other chemicals used in this study were reagent grade, unless otherwise stated.

Lipophilic compound extraction and analysis

Nineteen types of lipophilic compounds were extracted as previously described [8]. Briefly, each powdered sample (0.05 g) was added to 3 mL 0.1% ascorbic acid in ethanol (w/v) and 0.005 mL 5α-cholestane (internal standard; 100 μg mL−1). After saponification using 120 μL 80% potassium hydroxide (w/v), samples were immediately put on ice for 5 min, and distilled water and hexane (1.5 mL of each) were added. After re-extraction using hexane, the hexane layer was concentrated using a centrifugal concentrator (CC-105, TOMY, Tokyo, Japan) and mixed with MSTFA and pyridine (30 μL of each) for 30 min at 60 °C and 1200×g using a thermomixer (model 5355, Eppendorf AG, Hamburg, Germany). This mixture was analyzed by GC–MS in a GCMS-QP2010 Ultra system equipped with an AOC-20i autosampler (both Shimadzu, Kyoto, Japan) and fitted with an Rtx-5MS column (30 m length, 0.25 mm diameter, 0.25 μm thickness; Agilent, Palo Alto, CA, USA). The injection, interface, and ion source temperatures were 290, 280, and 230 °C, respectively. The flow rate of the carrier gas (helium) was 1.0 mL min−1. The GC oven was held at 150 °C for 2 min, ramped to 320 °C at 15 °C min−1, and then held for 10 min. The injection volume was 1.0 μL, and the split ratio was 10:1. The mass spectra were analyzed using the Lab solutions GCMS solution software version 4.11 (Shimadzu, Kyoto, Japan).
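As a quick check on the GC conditions above, the total oven program time and the mass of internal standard added per sample follow directly from the stated values. The short Python sketch below only restates that arithmetic; the function name is ours and is not part of the published method.

```python
# Values taken from the text: 2 min at 150 °C, ramp at 15 °C/min to 320 °C,
# then a 10 min hold. The helper name is hypothetical.
def gc_run_time_min(start_c, end_c, ramp_c_per_min, initial_hold_min, final_hold_min):
    """Total oven program time (min) for a program with one linear ramp."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min

run_time = gc_run_time_min(150.0, 320.0, 15.0, 2.0, 10.0)

# Internal standard per sample: 0.005 mL of a 100 µg/mL 5α-cholestane solution.
internal_standard_ug = 0.005 * 100.0

print(f"oven program: {run_time:.2f} min")
print(f"internal standard: {internal_standard_ug:.2f} µg per sample")
```

Under these values the oven program lasts a little over 23 min per injection, and each 0.05 g sample receives 0.5 µg of 5α-cholestane.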

Carotenoids were extracted as previously described [16]. In brief, 3 mL 0.1% ascorbic acid in ethanol (w/v) were added to each sample (0.1 g), vortexed for 20 s, and kept for 5 min in a water bath at 85 °C. After saponification, samples were left on ice for 5 min, and 0.1 mL of internal standard (β-apo-8′-carotenal in ethanol; 25 μg mL−1), and 1.5 mL distilled water were added to each tube. Hexane (1.5 mL) was then added, and samples were centrifuged at 1200×g for 5 min. The carotenoid layer was then dried under a nitrogen stream and dissolved in 0.25 mL methanol/dichloromethane (50:50, v/v). Resulting samples (20 μL each) were analyzed by HPLC in an Agilent 1100 instrument (Agilent Technologies, Massy, France), equipped with a photodiode array detector set at 450 nm. The column, mobile phase, and elution program were previously described [16].

Statistical analyses

All analyses were performed at least in triplicate. All data were scaled to unit variance before multivariate analysis. Brassicaceae data were analyzed by PCA (SIMCA-P version 13, Umetrics, Umeå, Sweden) to visualize relationships among vegetables and compounds. The patterns observed in the PCA score plots and loading plots were used to explain the dispersion of vegetables in the diagrams. The OPLS-DA model was calculated in SIMCA-P version 13 using phytochemical data as the X matrix and species data as the Y matrix. Pearson’s correlation analysis was performed in the SAS version 9.4 software package (SAS Institute, Cary, NC, USA), and Multi-Experiment Viewer version 4.9.0 was used for HCA and heat map visualization. A simple self-organizing map (SOM; http://kanaya.naist.jp/SOM/) was used for the BL-SOM analysis, in which correlation and clustering were performed on the standardized levels of the 28 metabolites.
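For readers unfamiliar with unit variance scaling, the following Python sketch illustrates the idea: each compound (column) is mean-centered and divided by its standard deviation, so that abundant and trace compounds contribute equally to the model. This is a minimal illustration using NumPy, not the SIMCA-P implementation; the toy matrix is invented, and the use of the sample standard deviation (ddof=1) is our assumption.

```python
import numpy as np

def unit_variance_scale(X):
    """Center each column and divide it by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns become zeros instead of dividing by zero
    return (X - mean) / std

# Hypothetical toy matrix: 4 samples x 3 compounds (values are made up).
X_demo = np.array([[1.0, 200.0, 0.5],
                   [2.0, 180.0, 0.7],
                   [3.0, 220.0, 0.6],
                   [4.0, 210.0, 0.9]])
X_scaled = unit_variance_scale(X_demo)
print(X_scaled.round(3))
```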

Results and discussion

Lipophilic metabolite profiling of nine Brassicaceae vegetables

Metabolomics allows differences in metabolite patterns among diverse foods to be visualized. In the present study, GC–MS and HPLC were used to identify and quantify 28 lipophilic compounds in the samples of the nine Brassicaceae species analyzed. The carotenoids violaxanthin, antheraxanthin, lutein, zeaxanthin, β-cryptoxanthin, 13Z-β-carotene, E-β-carotene, α-carotene, and 9Z-β-carotene were identified by HPLC through co-elution with standards and comparison of retention times, as in our previous study [16]. Additionally, 19 types of lipophilic compounds were detected in most vegetables by GC–MS analysis. Quantification was performed using selected ions (Fig. 1).

Fig. 1

Representative total ion chromatograms (TIC) (A) of lipophilic standards and TIC (B) and selected ion monitoring (SIM) chromatogram (C) of lipophilic metabolites extracted from radish sprouts as trimethylsilyl derivatives. The selected compounds in radish sprouts are displayed in (C). Inverted triangle represents peak of target compound. 1 C20, eicosanol; 2 C21, heneicosanol; 3 C22, docosanol; 4 C23, tricosanol; 5 C24, tetracosanol; 6 δ-tocopherol; 7 5α-cholestane (internal standard); 8 C26, hexacosanol; 9 β-tocopherol; 10 γ-tocopherol; 11 C27, heptacosanol; 12 δ-tocotrienol; 13 C28, octacosanol; 14 γ-tocotrienol; 15 α-tocopherol; 16 cholesterol; 17 brassicasterol; 18 α-tocotrienol; 19 campesterol; 20 C30, triacontanol; 21 stigmasterol; 22 β-sitosterol; 23 β-amyrin; 24 α-amyrin

Evaluation and classification of nine Brassicaceae vegetables using chemometrics

Multivariate analysis using metabolite data is a particularly useful approach for finding underlying structures in complicated biological systems [17,18,19]. Differences in metabolite levels were evaluated by PCA, which explored the structure of the data obtained from GC–MS and HPLC, as shown in the PCA score plot (Fig. 2). The scores of principal components 1 (PC 1) and 2 (PC 2), plotted on the abscissa and ordinate, respectively, together accounted for 69.8% of the total variance within all species data (Fig. 2A, B). Radish sprouts belong to the genus Raphanus L., whereas the other species belong to the genus Brassica L., a difference that was successfully captured by PC 1. The metabolite loadings on PC 1 and PC 2 were compared to investigate which metabolites contributed the most to the observed pattern. The predominant metabolite in PC 1 was α-tocopherol, although 26 other metabolites (excluding amyrins) also had positive loadings on PC 1. As a result, radish sprouts, which were strongly affected by α-tocopherol and had the highest concentrations of the other 26 metabolites, appeared separated in the plot.
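The quantities discussed above (scores, loadings, and the share of variance explained by PC 1 and PC 2) can be illustrated with a short Python sketch based on the singular value decomposition. This is not the SIMCA-P computation used in the study: the data are random stand-ins, and the 27 x 28 shape (nine vegetables in triplicate by 28 compounds) is our assumption about the layout of the data matrix.

```python
import numpy as np

def pca_svd(X_scaled, n_components=2):
    """PCA of an already centered and scaled matrix via SVD."""
    U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights per PC
    explained = s**2 / np.sum(s**2)                   # variance share per PC
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(27, 28))                         # random placeholder data
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # unit variance scaling
scores, loadings, explained = pca_svd(X)
print(f"PC 1 + PC 2 explain {100 * explained.sum():.1f}% of the variance")
```

In a score plot the first column of scores is drawn on the abscissa and the second on the ordinate; a compound with a large positive entry in the first column of loadings pulls samples rich in that compound toward positive PC 1.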

Fig. 2

(A) Scores and (B) loading plots of principal components 1 and 2 of the principal components analysis (PCA) results obtained for the metabolites of nine Brassicaceae vegetables

To verify whether PCA could be used as a tool for distinguishing Brassica spp., this technique was applied to the datasets obtained from the leaves of Brussels sprouts, cabbage, Chinese cabbage, kale, pak choi, and red cabbage (Fig. 3). The first two PCs accounted for 75.5% of the total variance, and Brassica spp. were separated into two groups along PC 2 (Fig. 3A, circled within the dotted line). The predominant contributors to PC 2 were α-amyrin and cholesterol, which separated B. oleracea varieties from B. rapa varieties. Loading plots (Fig. 3B) indicated that B. rapa varieties had higher contents of carotenoids and phytosterols, except stigmasterol and campesterol, than B. oleracea varieties. Similarly, Brussels sprouts, cabbage, and red cabbage were also clearly grouped by PCA.

Fig. 3

(A) Scores and (B) loading plots of principal components 1 and 2 of the principal components analysis (PCA) results obtained for the metabolites of six Brassica vegetables

OPLS-DA can be used to maximize differences between samples as well as to identify markers for their classification. In this study, all samples were clearly separated in the OPLS-DA score plots (Fig. 4A). The quality of an OPLS-DA model can be described by its goodness of fit (R2) and predictive ability (Q2), which were 0.747 and 0.937, respectively, in our model; a Q2 > 0.9 suggests an excellent predictive ability [20]. External validation aims to address the accuracy of a model in samples from different species. To confirm the performance of our OPLS-DA model, four samples were randomly set aside as a test data set and the OPLS-DA model was built with the remaining training samples (Fig. 4B). The R2X and Q2 values of this model were 0.829 and 0.853, respectively, and Q2 > 0.5 indicates a good predictive ability. The variable importance in the projection (VIP) value explains the contribution of each variable to the projection, and VIP > 1 is used as a criterion to identify the variables most important to the model [19]. In the present study, nine metabolites, namely α- and β-amyrins, cholesterol, brassicasterol, β-tocopherol, octacosanol, hexacosanol, triacontanol, and β-sitosterol, presented VIP > 1, indicating their important contribution to the discrimination between B. oleracea and B. rapa varieties (Fig. 4C).
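To make the model statistics above more concrete, the Python sketch below shows the general form usually given for R2 and Q2 (one minus a residual sum of squares divided by the total sum of squares, with Q2 computed from cross-validated predictions) and the VIP > 1 filter. It is an illustration under stated assumptions, not the SIMCA-P procedure: the class labels, predictions, and VIP values are invented and are not taken from Fig. 4.

```python
def r2_like(observed, predicted):
    """1 - (residual sum of squares / total sum of squares about the mean)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def select_by_vip(vip_scores, threshold=1.0):
    """Names of variables whose VIP exceeds the threshold, highest first."""
    kept = [(name, v) for name, v in vip_scores.items() if v > threshold]
    return [name for name, _ in sorted(kept, key=lambda item: item[1], reverse=True)]

# Hypothetical class labels (0/1) and predictions, for illustration only.
y = [0, 0, 0, 1, 1, 1]
y_fit = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8]  # model fitted on all samples -> R2
y_cv = [0.2, 0.1, 0.3, 0.7, 0.9, 0.6]   # predictions for held-out samples -> Q2
r2 = r2_like(y, y_fit)
q2 = r2_like(y, y_cv)

# Hypothetical VIP values, not the values shown in Fig. 4C.
vip_demo = {"alpha-amyrin": 1.4, "cholesterol": 1.2, "lutein": 0.6}
important = select_by_vip(vip_demo)
print(round(r2, 3), round(q2, 3), important)
```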

Fig. 4

(A) Score plots and (B) external validation test of the orthogonal projection to latent structure-discriminant analysis (OPLS-DA) model derived from the metabolite data of six Brassica vegetables, and (C) variable importance in the projection (VIP)

Pearson’s correlation analysis was performed to identify relationships between metabolites, which were displayed on a color scale, with red indicating a positive correlation, green a negative correlation, and color intensity the strength of the correlation (Fig. 5). The metabolite-to-metabolite correlation matrix resulting from Pearson’s correlations was used as input for the HCA of the 28 metabolites, in which those with the highest correlations were clustered. These analyses identified two groups (boxed within dotted lines in Fig. 5): one comprising c21, c26, c27, c28, c30, and amyrins, and another composed mainly of phytosterols, carotenoids, tocopherols, and policosanols with positive correlations. Phytosterols are biosynthesized by the mevalonate pathway [21, 22], while the non-mevalonate pathway, also called the mevalonic acid (MVA)-independent pathway, promotes the synthesis of carotenoids and tocopherols [22, 23]. These two pathways have common precursors, such as isopentenyl diphosphate (IPP), geranylgeranyl diphosphate (GGPP), and farnesyl diphosphate (FPP), which might explain the positive correlations found within the second group of metabolites. Among the three β-carotenes, E-β-carotene was the most abundant, and it was highly correlated with 9Z-β-carotene (r = 0.9883, p < 0.0001) and 13Z-β-carotene (r = 0.9884, p < 0.0001).
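The correlation step described above can be sketched in a few lines of Python. The example computes Pearson's r for every pair of compounds and converts it to a distance (1 - r), which is one common input for hierarchical clustering; the choice of that distance is our assumption, the contents are invented, and the study itself used SAS and Multi-Experiment Viewer.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical contents across four samples (values are made up).
demo = {
    "E-beta-carotene": [10.0, 40.0, 90.0, 150.0],
    "9Z-beta-carotene": [1.1, 3.9, 9.2, 14.8],
    "alpha-amyrin": [8.0, 6.0, 3.0, 1.0],
}
corr = correlation_matrix(demo)
# Correlation distance: 0 for perfectly correlated pairs, 2 for anti-correlated.
distance = {a: {b: 1.0 - corr[a][b] for b in corr[a]} for a in corr}
```

With this distance, compounds that rise and fall together across samples end up close to each other and are merged early by an agglomerative clustering algorithm.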

Fig. 5

Correlation matrix of the 28 metabolites identified from the nine Brassicaceae vegetables analyzed. Each square indicates the Pearson’s correlation coefficient obtained between a pair of compounds, and the intensity of green or red (negative or positive correlation, respectively) corresponds to the value of the correlation coefficient. Hierarchical clusters are indicated at the top and left of the figure

Batch learning SOM analysis was developed by Kanaya et al. [24] to replace self-organizing map (SOM) analysis, which had a low reproducibility. BL-SOM is a multivariate statistical analysis method that uses existing SOM matrices from PCA datasets [9], and it allows relative amounts of metabolites and differences between samples in a large dataset to be visualized with high reproducibility [24,25,26,27]. In the present study, the SOMs of nine Brassicaceae vegetables were obtained (Fig. 6A), each comprising neurons colored according to the amount of metabolites. A 6 × 5 matrix shows the patterns of phytochemical levels in the different species (Fig. 6B). For example, in the SOM of kale (matrix number 5), the upper-right neurons, corresponding to carotenoids except violaxanthin, are indicated in red, meaning that kale has higher contents of these metabolites than the other eight Brassicaceae. The distance among neurons within SOMs also indicates the correlation between metabolites: closely located neurons have high positive correlations (Fig. 6C). The BL-SOM clustered metabolites that are metabolically related. Apart from violaxanthin, all carotenoids were clustered in the upper-middle to upper-right region, amyrins were clustered at the bottom-right, and phytosterols were clustered on the left and at the bottom-center. Thus, the BL-SOM results allowed correlations among metabolites to be visualized and samples to be characterized.
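The general idea behind a batch-trained SOM on a 6 × 5 lattice can be sketched as follows. This is a simplified illustration in Python with NumPy and not the BL-SOM software used in the study: the PCA-plane initialization, the Gaussian neighborhood, the linear shrinking of its width, and all parameter values are our assumptions, and the input is random placeholder data shaped as 28 metabolites by 9 vegetables.

```python
import numpy as np

def batch_som(data, rows=5, cols=6, n_iter=20, sigma_start=2.0, sigma_end=0.5):
    """Train a small batch SOM; returns (neuron weights, winning neuron per item)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Deterministic start: spread the lattice over the plane of the first two PCs.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scale = s[:2] / np.sqrt(len(data))
    gy, gx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    unit = grid / np.array([cols - 1, rows - 1]) * 2.0 - 1.0  # lattice in [-1, 1]
    weights = data.mean(axis=0) + (unit * scale) @ vt[:2]
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    for it in range(n_iter):
        # Neighborhood width shrinks linearly from sigma_start to sigma_end.
        sigma = sigma_start + (sigma_end - sigma_start) * it / max(n_iter - 1, 1)
        d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
        bmu = d2.argmin(axis=1)                           # winning neuron per item
        h = np.exp(-grid_d2[:, bmu] / (2.0 * sigma**2))   # neurons x items
        # Batch update: all items are used at once, so input order is irrelevant.
        weights = (h @ data) / h.sum(axis=1, keepdims=True)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return weights, d2.argmin(axis=1)

rng = np.random.default_rng(1)
levels = rng.normal(size=(28, 9))  # placeholder: 28 metabolites x 9 vegetables
weights, bmu = batch_som(levels)
print(bmu.reshape(-1))
```

Because the starting weights and every update depend only on the whole data set, repeated runs give the same map, which is the reproducibility property emphasized above. Metabolites assigned to the same or neighboring neurons have similar profiles across the vegetables.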

Fig. 6

Batch learning self-organizing map (BL-SOM) analysis (A) based on the 6 × 5 SOMs derived from the principal components analysis (PCA) of the 28 metabolites. The SOMs show metabolites’ clustering (numbers), and neurons within SOMs are arranged in two-dimensional lattice matrices. Color of SOMs corresponds to the relative amounts of compounds: red (most increased), pink (increased), pale blue (decreased), and blue (most decreased). 1 broccoli; 2 Brussels sprouts; 3 cabbage; 4 Chinese cabbage; 5 kale; 6 kohlrabi; 7 pak choi; 8 radish sprouts; 9 red cabbage. The amounts of compounds in Brassicaceae (B) and the metabolite clusters (C) displayed in the 6 × 5 matrices obtained in the BL-SOM analysis. In (B), Brassicaceae vegetables are displayed in the X-axis and standardized compounds levels in the Y-axis

Composition and content of lipophilic metabolites in nine Brassicaceae vegetables

Tocopherols, phytosterols, policosanols, and amyrins were quantified in a previous study [8]. Previously, c26 (hexacosanol) was reported as the main policosanol in leaf samples [8, 11]. In our study, this compound was also the most abundant policosanol in leaf samples [12.72–388.06 μg g−1 of dry weight (DW)] (Table 1). Furthermore, c22 (docosanol) was the most abundant in flower, root, and sprout samples (17.65–223.00 μg g−1 of DW). In general, kale and radish sprouts had high policosanol contents. Campesterol and β-sitosterol have been reported as the main phytosterols, and α-tocopherol as the main tocopherol, in Brassicaceae [8, 11]. The results obtained in the present study confirmed that campesterol (542.89–1182.18 μg g−1 of DW), β-sitosterol (638.76–1900.15 μg g−1 of DW), and α-tocopherol (4.25–254.32 μg g−1 of DW) contents were high in the nine Brassicaceae, especially in broccoli, which contained high levels of phytosterols, and radish sprouts, which contained high levels of tocopherols (Table 1). Lutein (0.46–251.81 μg g−1 of DW) and E-β-carotene (0.66–243.61 μg g−1 of DW) were the predominant carotenoids (Table 2). In Chinese cabbage, for example, lutein comprised about 59.58% of the total carotenoids, which is in agreement with previous findings [11] in which lutein was found in plant leaves at a higher ratio than other components. Carotenoids play an essential role as accessory light-harvesting pigments [28] and, therefore, the low levels of carotenoids found in kohlrabi, in which the edible part was the root, were expected. Total carotenoid levels were much higher in kale, pak choi, and radish sprouts than in the other vegetables (Table 2).

Table 1 Composition and content (μg g−1) of lipophilic compounds in nine vegetables
Table 2 Composition and content (μg g−1) of carotenoids in nine vegetables

In conclusion, 28 metabolites were identified in samples of nine Brassicaceae species by GC–MS and HPLC, including amyrins, carotenoids, tocopherols, phytosterols, and policosanols. PCA, OPLS-DA, HCA, and BL-SOM were used to visualize these components. The PCA separated the nine species into two groups, Brassica sp. and Raphanus sp. (Fig. 2). A separation between B. rapa and B. oleracea was observed in PCA and OPLS-DA plots, and several variables were identified as candidate biomarkers that could be used in Brassica sp. authentication (Figs. 3, 4). Pearson’s correlations and HCA indicated positive correlations among carotenoids, phytosterols, and tocopherols, which clustered together (Fig. 5). BL-SOM revealed patterns in the relative amounts of each metabolite within the different species and allowed Brassicaceae vegetables to be discriminated by relative differences in the amounts of components (Fig. 6). These multivariate statistical analysis techniques can be used to establish plant metabolite profiles, which can later be used to select cultivars with a specific compound. Plant identification and differentiation at the species and population levels are important to plant scientists and breeders [29]. Qualitative variations in the phytochemical profiles of Brassicaceae vegetables could contribute to differences in health-promoting properties. Thus, multivariate characterization using metabolic profiling should be used in phenotype visualization and discrimination.