1 Introduction

Dicotyledonous woody plants are generally classified into 52 families and one of its largest families is known as the Myrtaceae family [1]. Family Myrtaceae is also widely identified as guava, myrtle, clove, or eucalyptus family. The Myrtaceae family is positioned within the order Myrtales, comprising more than 175 genera and 6000 species [2, 3]. Not to mention, the Myrtaceae family has undergone evolutionary changes, transitioning from its earlier, less advanced forms found in moist rainforest climates to more specialized forms adapted to the extremely dry and highly seasonal conditions of the western, semi-arid region. Therefore, the Myrtaceae family is structurally and floristically dominant and distributed across all continents except Antarctica, with its primary occurrence concentrated in the tropical and temperate regions of the Southern Hemisphere, such as eastern Australia, Malesia (the Australasian region), and the Neotropics [4].

The prominent genera in the mentioned regions mainly comprised Eucalyptus, Syncarpia, Callistemon, Psidium, Angophora, Eugenia, Syzygium, Melaleuca, Myrtus, Myrcia, and Corymbia. Melaleuca (236), Eugenia (3000), Psidium (150), Myrcia (765), Syzygium (1000), Eucalyptus (520), and Myrtus (150) represent a few of the most substantial Myrtaceae genera with the most documented species [5,6,7]. However, the numbers do not end there. New species are constantly being discovered in remote tropical forests, and many of them are yet to be officially identified at the species level [4]. It is not unanticipated that there are a variety of ways that different species have been used by humans, given the vastness and diversity of the Myrtaceae family. As a result, several species within the Myrtaceae family have been intentionally grown, resulting in significant economic value both in their country of origin and in plantations worldwide [4]. Regardless of the species and genus, Myrtaceae family species are utilized in various industries for distinct purposes, such as plantation (charcoal, gasoline, pulp, and firewood), agriculture (windbreaks, pesticides, honey, and livestock breeding), horticulture (ornamental species, cut flowers, and foliage), medicine (anti-diabetic, anti-microbial, anti-viral, anti-fungal, traditional medicines, and vector control), and food (sauce, paste, flavourings, condiments, herbs, tea, and fruits) [4, 8,9,10].

One of the common products of Myrtaceae family plants is essential oils. These essential oils can be extracted and isolated from various organs of plants, such as aerials, fruits, leaves, flowers, and bark [11]. At room temperature, essential oils are typically liquid and colourless. Except for vetiver, sassafras, and cinnamon, other essential oils possess a distinct aroma, a low density (< 1 kg/m3), and demonstrate significant optical activity. These essential oils in herbs are responsible for the various fragrances that plants generate. They are frequently applied in grooming products, scented sanctuaries, and aromatherapeutic practices. The latter is meant to be a therapeutic treatment that includes reflexology, inhalations, or steam baths with these oils [12]. In terms of medicinal and therapeutic properties, essential oils have been reported to exhibit anti-bacterial, anti-cancer, anti-protozoal, anti-viral, anti-fungal, anti-diabetic, anti-inflammatory and anti-septic [12, 13]. A wide range of compounds has been reported in essential oils from different species of the Myrtaceae family, such as phenolic compounds, flavonoids, fatty acids, saponins, steroids, tannins, catechins, and volatile organic compounds [14,15,16]. However, volatile secondary metabolites are the most gained constituent of the species of the Myrtaceae family among researchers. To date, more than 100 volatile organic compounds have been reported and these compounds are generally classified as secondary metabolites [17, 18]. The common classes of volatile secondary metabolites reported in the Myrtaceae family primarily comprised oxygenated and hydrocarbon sesquiterpenes with monoterpenes [19]. The mevalonate pathway is responsible for the production of these distinct terpenes. Besides volatile secondary metabolites, these essential oils also consist of polyphenols generated through the shikimate pathway [12].

A review of the existing literature indicated that no reports are available on untargeted metabolic profiling of volatile secondary metabolites in essential oils from different species within the Myrtaceae family from the Australian bush plants. The reported studies are mostly limited to the investigation of the chemical composition of the essential oils from Myrtaceae family rather than integrating the chemical composition and metabolomics [20, 21]. Thus, this study aimed to employ metabolomic approaches to qualitatively identify metabolites for the determination of major classes of compounds and discrimination with clustering patterns of essential oils originating from five different species of the Myrtaceae family. The selected essential oils for the untargeted metabolic profiling of volatile secondary metabolites are anise myrtle oil, honey myrtle oil, lemon myrtle oil, lemon-scented tea tree oil, and peppermint gum oil. Subsequently, volatile secondary metabolites with greater potential (known as biomarkers) to differentiate between the five essential oils, originating from different species within the same family, can be ascertained. Following this, the biomarkers from this research work can be useful in studying the respective roles of essential oils from different species in the field of medicinal chemistry, such as spasmolytic, analgesic, anticonvulsant, expectorant, anti-inflammatory, and anti-microbial activities [22, 23].

2 Materials and Methods

2.1 Collection of Essential Oils and Sample Preparation

The selected five different essential oils formerly originating from Australia were purchased from Essentially Australia® (https://essentiallyaustralia.com.au/). These essential oils belong to the Myrtaceae family, specifically anise myrtle [Syzygium anisatum (S. anisatum)] oil, honey myrtle [Melaleuca teretifolia (M. teretifolia)] oil, lemon myrtle [Backhousia citriodora (B. citroodora)] oil, lemon-scented tea tree [Leptospermum petersonii (L. petersonii)] oil, and peppermint gum [Mentha piperita (M. piperita)] oil. The essential oils were stored in amber bottles at a storage temperature of 4 °C and protected from direct light exposure. These oils were used directly as provided without any pre-concentration and modification steps. The sample (0.5 g) was weighed and placed in 20-mL gas chromatography headspace vials with silicon caps. The vials were then stored at a temperature of 80 °C to prepare them for subsequent analyses. After 10 min of spinning at 80 °C, 1.5 mL of headspace was collected with a 2.5 mL headspace syringe.

2.2 Analysis of Volatile Metabolites Using GC×GC-TOFMS

The volatile metabolite compositions in all five essential oils were extensively analyzed using static headspace extraction coupled with separation by GC×GC-TOFMS (LECO Pegasus 4D, Castle Hill, Australia). The GC×GC-TOFMS comprised autosampler and two-dimensional gas chromatography. The autosampler settings are as follows: incubator and agitator turned on, incubator temperature of 80 °C, fill speed of 50 μL/s, injection speed of 100 μL/s, injection volume of 1.5 mL, and syringe temperature of 80 °C. The key conditions of the gas chromatography are as follows: splitless injection mode, injector temperature of 250 °C, helium (99.9999%) as the carrier gas, and flow rate of 1 mL/min. The chromatographical separation condition of two-dimensional gas chromatography is divided into primary and secondary conditions. In the primary condition, Agilent DB-624UI was used as the primary column (mid-polar; composition of 6% cyanonpropyl phenyl, 94% polydimethyl siloxane; 30 m x 250 μm x 1.4 μm). In the secondary condition, Restek Stabilwax was used as the secondary column (polar; composition of crossbond polyethylene glycol, 0.9 m x 250 μm x 0.50 μm) and the secondary column temperature of 15 °C offset primary. Other conditions of GC×GC as follows: modulator temperature of 25 °C offset primary, modulation time of 2.5 s, hot pulse time of 0.4 s, and cool time of 0.85 s. The mass spectrometry conditions are as follows: transfer line of 240 °C, voltage of 1700 V, electron energy of 70 V, scan rate of 200 spectra/s, a mass range of 35–350 m/z, and ion source of 240 °C. During the analysis, a void vial was utilized as a blank to determine background contamination and all five essential oils were mixed to provide a quality assurance standard.

2.3 Data Processing and Statistical Analysis

Each essential oil was static headspace extracted and analyzed in three independent replicates. The GC×GC-TOFMS data was processed using LECO ChromaTOF 4.50 software, which included pre-processing tasks such as baseline correction. For the identification of volatile metabolites, a library matching (NIST 11 v2.0) approach was comprehensively applied and putative identification was determined based on a similarity threshold of at least 80% with the NIST library to establish a reasonably high level of confidence in the identification process. Subsequently, the MetaboAnalyst version 5.0 JavaServer was used to further analyze the processed data from GC×GC-TOFMS for multivariate statistical analysis (MSVA) and univariate statistical analysis [24]. Prior to both analyses, the dataset was pre-processed with log transformation, normalized by the median, and not subjected to scaling. In the initial stage of MSVA, the principal component analysis (PCA) and partial least square-discrimination analysis (PLS-DA) were applied to the processed data matrix of essential oils. PCA was used to access the clustering trend of the comprehensive depiction of the essential oils; meanwhile, PLS-DA was used to analyze further the correlation of metabolites in the essential oils. Afterwards, the variable selection, namely variable importance of projection (VIP), was used to select the top 25 possible chemical markers (VIP greater than 1.5) [25]. Besides, hierarchical clustering analysis (HCA) was used to visualize the metabolite profiles and reveal the relationship between metabolites and samples [26]. In addition, K-mean cluster analysis is another classification technique which widely used to minimize the distance within clusters and maximize the distance between different clusters [27].

3 Results and Discussion

3.1 Bilinear Models

3.1.1 Principal Component Analysis

In the initial stage, the metabolite profiles (representative total ion chromatogram of each essential oil presented in Figure S1) were compared in PCA score plots by submitting the combined anise myrtle oil (AM), honey myrtle oil (HM), lemon myrtle oil (LM), lemon-scented tea tree oil (LST) and peppermint gum oil (PG) in triplicate samples. Figure 1a is the PCA 2D-score plot combination of PC1 and PC2 of the metabolites in different oils. As illustrated in Fig. 1a, all LST and LM samples were predominantly distributed on the positive region (right side) of PC1 of the PCA score plot. Conversely, HM, AM and PG samples were present predominately on the negative region (left side). In conjunction with that, Madala et al. [28] proved that PCA score plots showing different clusters of samples suffice to understand the apparent clustering of samples due to their biochemical constituents. The PCA analysis dataset of the five essential oils showed distinct differences in clustering patterns. The five significant clusters scattered far apart except for LM and LST samples clustering side by side. However, the closer distribution of LM and LST in the PCA plot is probably due to the relatively small differences in the metabolic profile among LM and LST, or these oils are the most chemically similar. There was a great degree of separation between HM and the other four essential oil samples along the PC2 axis, indicating that the metabolic profiles of HM were prominently different from those of the other four oils. In a recent study, Elhawary et al. [29] identified metabolomics profiles of five essential oils belonging to the same Rosa varieties, and the PCA analysis gave rise to five different clusters, which were similar to the present study. In this study, the five selected essential oils belonging to the same Myrtaceae family showed five different clusters in the PCA score plot. Out of these five samples, PG gave the smallest area of a range of metabolites. The small range area indicates a minor variation among the replicates. PCA biplot was used to see the relationship between loading scores and exhibited using correlation scaling (Figure S2). It demonstrates divergence between all metabolites found in the five oil samples. Based on the biplot, isothiocyanato-methane was the metabolite which significantly discriminated HM from other oils. In addition, 3-methyl- pentanoic acid methyl ester discriminated PG from others. HM. On the other hand, PG, AM, LM and LST each formed their own cluster and did not overlap, meaning that each cluster did not share the same metabolites and/or the same metabolites with significantly different concentrations. Similarly, in a previous study of untargeted metabolomics profiling of aroma compounds in Australian indigenous fruits, the results showed the discrimination of the samples in three clusters in the PCA score plot, indicating the aroma components in the fruits studied were different [25]. However, Wan Salleh et al. [30] reported that five essential oils from leaves of different Beilschmiedia species were able to cluster into three groups instead of five. Group 1 consists of essential oils from B. kunstleri and B. maingayi, group 2 consists of B. penangiana and B. glabra, and group 3 consists solely of B. madang. This is attributed to different volatile secondary metabolites responsible for the characterization of clusters. Also, Carmargo et al. [31] reported that essential oils from leaves and branches of nine species of the Celastraceae family were able to cluster into 2 distinct groups, which can be attributed to the presence of oil storage structures in the parts of the species used in the extraction process. Therefore, there are two possibilities associated with the excellent discrimination pattern of essential oils from five different species, specifically different compositions of volatile secondary metabolites or the purchased essential oils of different species extracted from different parts of the plant species.

Fig. 1
figure 1

(a) 2D-score plot of PCA model based on five essential oils, (b) 2D-score plot of PLS-DA model based on five essential oils

3.1.2 Partial Least Squares-Discriminant Analysis

To further verify the difference in metabolic profiles among these essential oils, PLS-DA was carried out to identify and classify the biomarker, thus helping in improving data interpretations [32]. PLS-DA 2D-score plot shows component 1 and component 2 with the highest percentage of correlation, 23.8% and 27.4%, respectively (Fig. 1b). There are five distinct clusters of the five samples shown in the plot. LM and LST samples were clustering near each other, the same as HM and AM samples, leaving only the PG samples discriminated far from those four samples. Interpretation can be made more clearly by referring to the 2D-score plot in Fig. 1b and Table S1, as the sample groups were classified more precisely. The scattering and non-overlapping clusters showed different biomarker compounds existed in each essential oil sample even though the samples were all under the same family. In addition, a total of the top 25 potential biomarker metabolites were sorted, as shown in Figure S3 and the details are tabulated in Table 1. VIP plot summarised the significance of the variables to clarify the X data, correlated them to the Y data and introduced variables with the strongest influence in the model [33]. In this model, the VIP values were set at greater than 2.0 and loadings correlation coefficient [p(corr)] values above 0.5 to be significant for sample separation in the PLS-DA model. As a result, only 24 compounds were significant biomarkers from the top 25 shortlisted metabolites. With reference to Tables 1, 3-methyl-1-pentanol has the highest VIP score, which is 4.0041 in component 1, meaning that it is the most significant biomarker in all samples. This compound was found most abundantly in the PG samples but the least in the HM samples. Followed by 1-octen-3-ol and 3-methyl-pentanoic acid methyl ester are the second and third biomarkers with VIP scores of more than 3 (3.4081 and 3.4008, respectively). Similarly, the rest of the 21 biomarkers and their distribution (highest and lowest) in respective oils can be observed in Table 1. There are several factors that can influence the type and composition of volatile secondary metabolites in essential oils of different species and eventually lead to possibilities to become putative biomarkers to differentiate essential oils, such as harvesting time of the plant where the essential oils are originating from, part of the plant where the essential oils are being extracted, agricultural methods and practices, soil type, altitude, climate, growth conditions of plants, and developmental stage of plants [34].

Table 1 Detailed information on the top 25 metabolites from five different essential oils identified by VIP based on the PLS-DA model

3.2 Cluster Analyses: Hierarchical and Partitional Clusterings

The heatmap of HCA depicted a data matrix as a grid, with each row and column representing a variable and sample, respectively. The cells contain the values of the pairwise associations between them (variable and sample) and values in each cell are then colour-coded using a gradient scale, with warmer colours (e.g., red) representing higher values and cooler colours representing lower values (e.g., blue). The heatmap was constructed based on the top 25 metabolites according to t-test (Figure S4). It is observed that there were five compounds that were highly positive-correlated in the PG samples but negatively correlated in HM, LM, and LST samples, namely 3-methyl-3-buten-1-ol acetate, 4,4-dimethyl-2-cyclopenten-1-one, 2-hexanone,4-methyl, 4-methyl-1-pentanol, and 4-methyl-pentanoic acid methyl ester. Both isopropyl acetate and 1-methylethyl formic acid ester were positively correlated in PG and AM samples but negatively correlated in HM, LM, and LST samples. 2-(2-propenyl)-furan and 3-methyl-phenol were positively correlated in PG, HM and AM samples but negatively correlated in LM and LST samples. Meanwhile, 1,1,2-trimethyl-cyclopentane, 3-methyl-2-pentanone, 2,3,5-trimethyl-furan, and 3-methyl-phenol were positively correlated in PG and AM samples but negatively correlated in HM, LM and LST samples. Out of the top 25 metabolites (based on T-test), only (z)-2-octene and isothiocyanato-methane were negatively correlated in PG samples. In addition, only ethylbenzene and α-myrcene showed a positive correlation in LM samples. Similarly, in the previous study by Southwell et al. [36] on essential oil from leaves of LM, the result showed that the oil was found to contain myrcene (0.1–0.7%).

On the other hand, the clustering of different essential oils was visualized using hierarchical clustering. Based on the dendrogram in Fig. 2a, each triplet analysis of PG, AM, HM, LM, and LST samples forming own significant cluster (five groups). The PCA model supports the formation of five clusters. The PG sample was correlated to the AM sample and both samples correlated with HM. PG, AM, and HM forming a cluster also correlate with clusters formed by LM and LST samples. Similarly, Yong Lee et al. [37] reported three distinct clusters for three different essential oils (i.e., spearmint, apple mint, and chocolate mint).

Since the performance of hierarchical clustering can be reduced based size of the data matrix, thus partitional clustering using K-mean is preferred as it has outperformed clustering in terms of entropy and purity using a cosine similarity measure [38]. In conjunction with that, K-mean clustering was applied for partitional clustering. Four significant clusters of the five samples in this study were formed using K-mean clustering and each cluster was discriminated far from each other (Fig. 2b). The centroid is the centre of the cluster for each distinct cluster. PG sample formed the smallest clusters and the distance from the centroid is the lowest. This also indicates that the samples in the PG cluster possessed the least variability compared to the other four sample clusters. The triplet analysis of each oil formed individual clusters for AM, HM, and PG samples (Table S2). However, triplicate analyses of LM and LST were grouped into a single big cluster. Combining samples from two different essential oils (LM and LST) in a single cluster might be due to the similar metabolites in these two essential oils. This is supported by the closer distribution of the LM and LST in the PCA plot. Not to mention, these two oils also correlated with each other in hierarchical clustering. Therefore, the K-mean performs better in the clustering of samples, as reported by Abdalla [38].

3.3 Metabolite Set Enrichment Analysis

Metabolite set enrichment analysis is an approach designed to assist metabolomics scholars in identifying and interpreting patterns of metabolite concentration changes in a biologically meaningful way [39]. A total of 503 compounds were identified by utilizing GC×GC-TOFMS. Ketones were identified as the major class (with the greatest enrichment ratio) among the five essential oils, followed by aryl-aldehydes and phenylacetaldehyde. Butanone and acetone were the main components of the ketone class, followed by aryl-aldehyde represented by 2-furancarboxaldehyde (Figure S5). Similarly, a study by Clarke [40] also presented that essential oils originating from aniseed, sage, hyssop, fennel, rosemary, caraway, pennyroyal, peppermint, spearmint, and dill also contain ketones as the primary metabolite. Also, Lu et al. [41] and Mugao et al. [42] reported ketones as one of the major classes of volatile secondary metabolites in essential oils originating from peppermint (Mentha piperita L.) and ginger, garlic, tick berry, and Mexican marigold, respectively. The higher abundance of ketone in essential oils could be associated with the reason that plant species naturally produce higher amounts of ketone-containing compounds as part of their essential oil composition. These compounds may be synthesized through specific biochemical pathways within the plant. On a positive note, certain ketones found in essential oils have been associated with various therapeutic effects, including analgesic, anti-inflammatory, and antimicrobial properties. For example, menthone and camphor are known for their cooling and pain-relieving effects [43, 44]. In terms of p-value enrichment analysis, wax monoester possessed a p-value of 2.48 × 10− 16 showing the greatest actual hits and the class represented by hexyl acetate, propyl propionate, 3-methylbutyl formate, 3-methylbutyl propanoate, 2-methypropyl propanoate; hexyl propionate, methyl 3-methylbutanoate, isopentyl isopentanoate, methyl pentanoate; pentyl 3-methylbutanoate, 2-methylpropyl 2-ethyl-2-methylbutanoate, and ethyl hexanoate.

Fig. 2
figure 2

(a) Hierarchical clustering dendrogram of five essential oils, (b) K-mean clustering of five essential oils

4 Conclusion

The primary objectives of the study such as (i) identification of volatile secondary metabolites via an untargeted approach and determination of major classes of these metabolites, (ii) discrimination and clustering patterns associated with the identified metabolites, and (iii) identification of biomarkers among the identified metabolites, were accomplished. The findings of this study suggest that a wide range of volatile secondary metabolites are present in essential oils. Although these essential oils originated from different species of a similar family, both supervised (PLS-DA) and unsupervised (PCA) approaches produced a similar outcome on the discrimination pattern where the five essential oils were well discriminated according to their species. Also, it can be deduced that although more than 500 metabolites were identified, not all of them are significant enough to be used as biomarkers. In this context, only 24 metabolites were deemed biomarkers, with VIP values exceeding 2.0. Besides, it is important to note that although both hierarchical and partitional clusterings are well-known for the clustering of samples, different clustering models do not necessarily have similar outputs. Thus, partitional clustering (K-mean) is recommended for future studies as it produces a more accurate result. The combination of metabolite set enrichment analysis with other analyses is essential as it provides more explicit information on the significant class of metabolites in all the samples, like the ketone class in this study. For future studies that investigate volatile secondary metabolites incorporated with metabolomics, it is recommended to involve validation steps for the supervised models, such as cross-validation and permutation tests. This is because supervised models tend to overfit the data and produce incorrect information on the discrimination pattern. However, the validation steps are not a must in the current study as both supervised and unsupervised models had the same pattern.