1 Introduction

Metabolite fingerprinting aims to classify a large number of samples using multivariate statistical analysis, which reveals particular chemical patterns within a complex mixture, with or without identification of its components, and usually after a preliminary separation of the matrix (e.g. by chromatographic techniques) (Farag et al. 2017; Luthria et al. 2008; Ryan and Robards 2006; Salvador et al. 2016). These patterns are associated with the presence and concentration of certain types of natural products, biosynthesized according to the expression of genes in a reduced population that shares a common ancestor (Glauser et al. 2013; Sarker and Nahar 2012; Semmar et al. 2007) and according to the plant environment (Hao et al. 2015; Williams et al. 1989; Wink 2010). Metabolite fingerprinting can assist in the determination of food quality (Hummer et al. 2014; Kashif et al. 2009; Klockmann et al. 2016; Lee et al. 2009; Liu et al. 2017; Magagna et al. 2017; Mayorga-Gross et al. 2016; Pongsuwan et al. 2007; Ronningen et al. 2017; Zieliński et al. 2014) and geographical origin (Consonni et al. 2012; Kim et al. 2013; Mehari et al. 2016; Pan et al. 2016; Watson et al. 2006), as well as in ecological and evolutionary studies or the assessment of plant specimen classification (Hegnauer 1986; Luthria et al. 2008; Mattoli et al. 2006; Musah et al. 2015; Wink 2003).

After morphological characters were established as the general rule for plant taxonomy, the independent work of botanists led to discrepancies, mainly due to the lack of communication and the huge diversity of plant families (Godfray and Knapp 2004). This problem was lessened by the rise of molecular taxonomy techniques based on the analysis of specific DNA regions (Govindaraghavan et al. 2012; Walker 2014). Even so, the usual protocols leave some gaps, especially regarding the freshness of the collected plant tissue, the size of the sample, or cases in which morphological characters are not sufficient. These gaps could be overcome by chemical fingerprinting with a sensitive technique such as liquid chromatography coupled to mass spectrometry (LC–MS). This technique is well suited to the identification of secondary metabolites in complex biological matrices, such as fruits, vegetables, seeds and plant extracts, without any purification procedure. In addition, tandem mass spectrometry experiments provide useful structural information on compounds when their identification is needed (Del Bubba et al. 2012; Stobiecki 2000; Stobiecki et al. 2015).

Several metabolomic methods have been reported for chemotaxonomic purposes. For example, the Fabaceae super family exhibits a huge diversity of nitrogen-rich natural products such as pyrrolizidine, quinolizidine, indole-type and imidazole-type alkaloids, protein and non-protein amino acids, and polyamides, which have been proposed as chemical markers for chemotaxonomic classification (Wink 2003, 2013). The specific use of phenolic compounds as markers for discrimination between species has been reported in several families, including Fabaceae (Visnevschi-Necrasov et al. 2015), Lamiaceae (Kharazian 2014), Araceae (Clark et al. 2014), Apiaceae (Güzel et al. 2011) and Moraceae (Venkataraman 1972). Visnevschi-Necrasov et al. (2015) observed genus-level discrimination using quantitative HPLC determination of the isoflavones present in nine species of four different genera of the Fabaceae family. The ubiquity of polyphenols and their value as markers could be advantageous for discriminating the phylogeny of plants (Wink 2003, 2013), in contrast to specific compounds such as alkaloids, lactones or terpenes, which have a patchy and irregular distribution within the same family.

Chamaecrista nictitans (Fabaceae/Caesalpiniaceae) is reported to be an annual, herbaceous, leguminous plant native to the United States, Mexico, and parts of South and Central America according to the Tropicos database (“Tropicos” 2017). It is found in the wild, growing alongside rural roads. It has recently been associated with antiviral activity (Herrero Uribe et al. 2004). A phytochemical screening of C. nictitans in Costa Rica was reported by Herrero Uribe et al. (2004), who associated the mechanism of action of purified fractions of C. nictitans against Herpes simplex II virus with the presence of certain polyphenolic compounds. The chemical constituents were described as a mixture of complex tannins according to their 1H-NMR fingerprint and qualitative TLC analysis. Nevertheless, several subspecies and varieties of C. nictitans are found in Costa Rica (see Fig. 1), and the lack of agreement among taxonomists has hindered a proper classification of the species (Hammel et al. 2010). These constraints made our efforts to isolate active components difficult. Further studies using LC–MS/MS demonstrated that all C. nictitans exhibit a high content of proanthocyanidin oligomers, whereas C. nictitans subspecies patellaria presents particular monohydroxyphenol substructures such as (epi)guibourtinidol, (epi)afzelechin and (epi)fisetinidol, with B-type and A-type linkages (Mateos-Martín et al. 2014).

Fig. 1

Representative pictures of collected specimens: a C. nictitans var. patellaria; b C. nictitans ssp. disadena var. pilosa; c C. nictitans ssp. nictitans var. jaliscensis; and d C. nictitans ssp. disadena

The aim of this study was to validate the use of a metabolite fingerprinting method based on LC–MS/MS, capable of discriminating between subspecies and varieties of C. nictitans, using a straightforward protocol. Results demonstrate that our approach was able to discriminate between the subspecies and varieties measured, separating the samples into four differentiated groups: M1 = C. nictitans ssp. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis and M4 = C. nictitans ssp. disadena var. pilosa. Thus, this study reports a useful tool to discriminate between subspecies and varieties of C. nictitans in a simple and fast way.

2 Materials and methods

2.1 Plant material collection

Forty samples were collected at different sites of the Central Valley and the Pacific coast of Costa Rica from September 2013 until February 2014, from October 2014 until November 2014, and in July 2015 (see Table S1 in Supplementary Information). They included subspecies patellaria and disadena, ssp. nictitans var. jaliscensis and ssp. disadena var. pilosa, all identified in the field by Prof. Luis Poveda (National University of Costa Rica). Green fresh aerial parts (including stems, leaves and flowers) were sampled for all specimens, except for the collections from Puntarenas, which were at an early stage of development and were collected without flowers. Each sample was split into two: one part for chemical analysis and another for genetic taxonomy. For most samples, a voucher specimen was collected and kept at the Herbarium of the National University of Costa Rica. Centro de Investigaciones en Productos Naturales (CIPRONA, UCR) provided four specimens cultivated in a greenhouse (C. nictitans ssp. patellaria n = 3 and C. nictitans ssp. disadena n = 1), and two in vitro samples at an early stage (C. nictitans ssp. patellaria n = 1 and C. nictitans ssp. disadena n = 1). The Herbarium of the University of Costa Rica (USJ, UCR) provided 5 samples of the genus Chamaecrista belonging to three species other than C. nictitans (C. rotundifolia n = 1, C. diphylla n = 1, C. flexuosa n = 4) and seven samples of C. nictitans not further classified, including two samples collected in the United States (see Table S2). A grand total of 58 samples were included in the experiment.

2.2 In vitro cultures and greenhouse conditions

For the establishment of in vitro cultures, seeds were removed from fruits of C. nictitans ssp. patellaria, C. nictitans ssp. disadena, C. nictitans ssp. nictitans var. jaliscensis and C. nictitans ssp. disadena var. pilosa. The seeds were rigorously washed with water and soap. Afterwards they were dipped into ethanol solution (70% v/v) followed with sodium hypochlorite solution (3% v/v, 15 min) followed by triplicate washes with sterile water. Seeds from C. nictitans ssp. patellaria and C. nictitans ssp. disadena germinated after 2 months of culture on a MS medium (Murashige and Skoog 1962) devoid of growth regulators, with activated charcoal (1.5 g/L) and solidified agar (7 g/L). The pH of the culture media was adjusted to 5.7. Culture media was autoclaved (121 °C, 20 min). The cultures were kept in a controlled environment (25 °C with 16-h photoperiod). For the establishment of greenhouse cultures, seeds were planted in plastic pots (30 cm of diameter), containing disinfected sandy loam soil under greenhouse conditions. Plants from C. nictitans ssp. patellaria and C. nictitans ssp. disadena emerged after 4 months.

2.3 Plant material extraction

Fresh chopped material (1 g) was extracted once in a 25 mL Erlenmeyer with methanol (10 mL, HPLC grade, J.T. Baker, Center Valley, USA) with sonication (10 min). The alcoholic extract was recovered by filtration on a glass funnel with cotton and completely dried under vacuum in a R200 rotary evaporator (Büchi, Flawil, Switzerland) at 40 °C. After evaporation, 100 mg of the solid residue were dissolved in methanol (10 mL, HPLC grade, J.T. Baker, Center Valley, USA) and eluted through a 100 mg STRATA-C18 cartridge (Phenomenex, Torrance, USA) fitted on a solid phase extraction system (SPE-SUPELCO, Darmstadt, Germany); the resulting filtrate yields a brownish to yellowish color, depending on the sample. The filtrate was recovered and evaporated to dryness under vacuum in a centrifugal evaporator (SpeedVac SC200, Waltham, USA) equipped with a RVT400 refrigerated vapor trap (Thermo Savant, Waltham, USA).

Samples from the herbaria (ca. 250 mg) were extracted twice with 10 mL of methanol (HPLC grade, J.T. Baker, Center Valley, USA) and once with 5 mL of methanol (HPLC grade, J.T. Baker, Center Valley, USA) with sonication (15 min each); the extracts were combined and concentrated to 10 mL. The remaining extract was filtered through a 50 mg STRATA-C18 cartridge (Phenomenex, Torrance, USA) as described above. Filtrates were dried under the same conditions as above. Two subsamples of each filtrate were weighed out (2 mg each) and dissolved to a concentration of 2 mg/mL in a mixture of acetonitrile:water (8:2; Optima, Fisher Scientific, Waltham, USA). The injected sample consisted of a mixture of 800 µL of the filtrate solution and 70 µL of an acyclovir solution (USP Reference standard, purity 99.3%; 1 mg/mL in acetonitrile:water 8:2) as internal standard.
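The final concentrations in the injected mixture follow from simple dilution arithmetic. The short Python sketch below works them out from the volumes and stock concentrations stated above; the function name is hypothetical, and the calculation assumes that the volumes are additive.

```python
def mix_concentrations(sample_ul, sample_mg_ml, standard_ul, standard_mg_ml):
    """Return total volume (uL) and final concentrations (mg/mL) after mixing.

    Assumes additive volumes, which is an idealization.
    """
    total_ul = sample_ul + standard_ul
    sample_final = sample_mg_ml * sample_ul / total_ul
    standard_final = standard_mg_ml * standard_ul / total_ul
    return total_ul, sample_final, standard_final

# 800 uL of extract at 2 mg/mL plus 70 uL of acyclovir at 1 mg/mL
total_ul, extract_mg_ml, acyclovir_mg_ml = mix_concentrations(800, 2.0, 70, 1.0)
print(f"{total_ul} uL total, extract {extract_mg_ml:.2f} mg/mL, "
      f"acyclovir {acyclovir_mg_ml:.3f} mg/mL")
```

Under these assumptions the extract is injected at roughly 1.84 mg/mL and the internal standard at roughly 0.08 mg/mL.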

2.4 Chromatographic and Mass Spectrometry conditions

Chromatographic profiles were acquired on a 50 × 2.1 mm Waters® ACQUITY™ 1.7 µm BEH C18 column (Waters, Milford, USA) in an ACQUITY Ultra Performance LC™ system equipped with an autosampler and a PDA detector. The column was kept at 40 °C and the autosampler at 10 °C. The injection volume was 2.0 µL. The flow rate was set to 550 µL/min and gradient elution was carried out with a binary system consisting of [A] 0.1% aqueous formic acid (Optima, Fisher Scientific, USA) and [B] 0.1% formic acid (Optima, Fisher Scientific, Waltham, USA) in acetonitrile (Optima, Fisher Scientific, Waltham, USA). An increasing linear gradient (v/v) of [B] was used as follows [t(min), %B]: 0.00, 8; 4.30, 20; 10.00, 35; 12.00, 70; 16.00, 92; followed by re-equilibration steps (20.00, 8; 22.00, 8). The PDA detector was set from 190 to 600 nm with a resolution of 1.2 nm. Mass spectrometry was performed on a Waters® SYNAPT ESI-QTof system (Waters, Milford, USA). The desolvation gas (N2) flow was set to 450 L/h with a desolvation temperature of 150 °C, the cone gas (N2) flow was set to 10 L/h and the source temperature was set to 200 °C. The capillary voltage and sampling cone voltage were set to 3.0 kV and 35 V, respectively. The extraction cone voltage was set to 4.0 V for negative operation mode. Detection was performed using two different MS functions: one at low collision energy (4.0 V) to record the exact mass, and a second one at high energy, using a collision energy ramp (10–40 V), to obtain preliminary fragmentation patterns. Both functions used a scan time of 0.2 s and an inter-scan delay of 0.02 s. MS/MS experiments were obtained using collision-induced dissociation (CID) functions with collision energies from 25 to 55 eV, depending on the molecule.
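To make the gradient program easier to read, the following Python sketch interpolates the percentage of [B] at any time point from the breakpoints listed above. It assumes linear ramps between all consecutive breakpoints, including the return to initial conditions between 16.00 and 20.00 min, which the text does not state explicitly; the function name is hypothetical.

```python
# Gradient program from the text as (time in min, %B) breakpoints.
GRADIENT = [(0.00, 8.0), (4.30, 20.0), (10.00, 35.0), (12.00, 70.0),
            (16.00, 92.0), (20.00, 8.0), (22.00, 8.0)]

def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate %B at time t_min between programmed breakpoints."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

# Halfway through the 10.00-12.00 min ramp (35 %B to 70 %B)
mid_ramp = percent_b(11.0)
print(f"%B at 11.0 min: {mid_ramp:.1f}, %A: {100 - mid_ramp:.1f}")
```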

All analyses were acquired using Lock Spray™, with leucine-enkephalin as the lock mass (V: 554.2615). Data were collected in centroid mode, with a lock spray frequency of 10 s, and were averaged over 10 scans. The Synapt was calibrated in negative ionization mode with sodium formate (reference mass 860.8467 Da) for an m/z range from 100 to 1200. MassLynx software (version 4.1, Waters) was used for acquisition and data processing.

2.5 MZmine data processing and multivariate data analysis

Data from the LC–MS runs were processed with MZmine v2.33 software (Pluskal et al. 2010) for data mining, considering all peaks with an intensity above 20 and using the Grid Mass algorithm (Treviño et al. 2015) with an m/z tolerance of 8 ppm and a min–max peak width of 0.03–0.5 min, ignoring detection after 11 min. Afterwards, deisotoping and filtering procedures were performed to remove all peaks without an isotopic pattern. Alignment was performed using the Join Aligner algorithm with a retention time tolerance of 0.1 min and an m/z tolerance of 8 ppm. Gap filling was performed using the Same RT and m/z Range Gap Filler algorithm with an RT tolerance of 0.3 min and an m/z tolerance of 8 ppm.
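The tolerance windows used for alignment can be illustrated with a minimal Python sketch. This is not MZmine's Join Aligner, which also scores and ranks candidate matches; it only checks whether two peaks fall within 8 ppm and 0.1 min of each other. The function names and the comparison peaks are hypothetical.

```python
def ppm_difference(mz_a, mz_b):
    """Absolute m/z difference expressed in parts per million of mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6

def within_alignment_window(peak_a, peak_b, mz_tol_ppm=8.0, rt_tol_min=0.1):
    """Check the tolerance window only; peaks are (m/z, retention time in min)."""
    (mz_a, rt_a), (mz_b, rt_b) = peak_a, peak_b
    return (ppm_difference(mz_a, mz_b) <= mz_tol_ppm
            and abs(rt_a - rt_b) <= rt_tol_min)

reference = (817.2100, 6.84)   # nominal values, for illustration only
close_peak = (817.2140, 6.90)  # about 4.9 ppm and 0.06 min away
far_peak = (817.2200, 6.84)    # about 12 ppm away, outside the window

print(within_alignment_window(reference, close_peak))
print(within_alignment_window(reference, far_peak))
```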

The processed data (*.CSV format) were imported into SIMCA-P 13.0.2.0 software (Umetrics, Umeå, Sweden) for multivariate data analysis. Principal Component Analysis (PCA) was used to study the tendency of the different samples to group. Afterwards, Partial Least Squares Discriminant Analysis (PLS-DA) was applied. The significance of the model was determined with a cross-validated analysis of variance (CV-ANOVA, p-value < 0.001). Its quality was evaluated by the R2X (0.837) and R2Y (0.909) values, which describe the goodness of fit and the percentage of explained variability, and by the Q2X (0.842) value, which describes the predictive capacity of the model. A permutation test (n = 200) was performed to evaluate overfitting. The degree of discrimination of each ion between the different groups was assessed by assigning a Variable Importance in Projection index (VIP index) to each variable. Ions were considered important when they had a VIP value higher than 1.00 and an ANOVA p-value lower than 0.05.
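The selection rule for important ions (VIP higher than 1.00 and ANOVA p-value lower than 0.05) can be written as a simple filter. The Python sketch below assumes the VIP and p-values have already been exported from the statistics software; the record layout and all numeric values are hypothetical and serve only to show the rule.

```python
def select_discriminant_ions(ions, vip_min=1.00, p_max=0.05):
    """Keep ions whose VIP exceeds vip_min and whose ANOVA p-value is below p_max."""
    return [ion for ion in ions
            if ion["vip"] > vip_min and ion["p_value"] < p_max]

# Hypothetical export: one record per ion (m/z, retention time, VIP, p-value)
ions = [
    {"mz": 817.21, "rt": 6.84, "vip": 1.85, "p_value": 0.001},
    {"mz": 755.20, "rt": 3.10, "vip": 1.20, "p_value": 0.200},  # fails p-value
    {"mz": 431.10, "rt": 5.02, "vip": 0.70, "p_value": 0.010},  # fails VIP
]

selected = select_discriminant_ions(ions)
print([ion["mz"] for ion in selected])
```

Both conditions must hold at once, so an ion with a high VIP but a non-significant ANOVA result is discarded.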

2.6 Metabolite identification

Structures were tentatively proposed based on the fragmentation patterns generated by certain types of fragmentation reactions (Retro Diels–Alder [RDA] and heterocyclic ring fission [HRF]), which provide information on the hydroxylation pattern of the A, B and C rings (Figure S1a) of the flavonoid unit, on the bonds between two units (Quinone Methide [QM]) and on the unit itself (Jaiswal et al. 2012). Fragmentation takes place from the top unit (more prone to fragment) to the base unit of the compound, giving information about the order of the subunits in the polyphenolic structure (Calderón et al. 2009; Friedrich et al. 2000). The presence of C- or O-glycosides gives characteristic fragments through the cleavage of two C–C bonds at different positions of the sugar moiety. The classical nomenclature was employed to describe the fragmentation pathways of the compounds, and the positions of C-glycosides were assigned by comparing the relative intensities of the fragments according to the literature (Figure S1b) (Benayad et al. 2014; Farias and Mendez 2014; March et al. 2004, 2006; Stobiecki 2000; Stobiecki et al. 2015).

2.7 DNA barcoding analysis

The plant ITS2 region was sequenced for a subset of 20 C. nictitans samples, including subspecies and varieties (Table S1). The selection of C. nictitans samples was based on the chemotypes derived from the identified metabolites data matrix, as described in Fig. 1 (M1 = C. nictitans ssp. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis; M4 = C. nictitans ssp. disadena var. pilosa). The original identification by Prof. Luis Poveda (Table S3) was kept in the phylogenetic analysis for comparison with the molecular and chemical analyses. DNA was isolated from approximately 50 mg of leaf tissue disrupted with a micro mortar. Lysis buffer (750 µL; 20 mM sodium EDTA pH 8.0, 100 mM Tris–HCl pH 8.0, 1.4 M NaCl, 2.0% CTAB, 2% PVP and 0.2% beta-mercaptoethanol) was added. After incubation at 60 °C for 20 min, 750 µL of chloroform:octanol (24:1) was added and the tubes were mixed gently by inverting 20 times. The mixture was centrifuged at 13,000 rpm for 5 min and 400 µL of the aqueous phase was transferred into a new 1.5 mL reaction tube. One volume of cold isopropanol was added to precipitate the DNA at room temperature for 5 min. The solution was centrifuged at 13,000 rpm for 5 min, the supernatant was poured off and the pellet was kept at the bottom of the tube. The pellet was washed with 70% ethanol (500 µL) and centrifuged at 13,000 rpm for 2 min. The ethanol was removed, and the pellet was dried at 37 °C. Finally, the pellet was dissolved in 50 µL of TE buffer and treated with 1 µL of RNase (Fermentas). After incubation at 37 °C for 1 h, the DNA was measured spectrophotometrically and diluted to 50 ng/µL for PCR amplification. The isolated DNA was amplified with primers for the ITS2 region. We used the S2F forward primer (5′-ATGCGATACTTGGTGTGAAT-3′) designed by Chen et al. (2010) and the S3R reverse primer (5′-GACGCTTCTCCAGACTACAAT-3′) designed by Chiou et al. (2007).
PCR reactions were conducted in a final volume of 20 µL, containing 1× HotStart Ready Mix (Fermentas) and 0.250 mM of each primer. Finally, 2 µL of DNA (ranging from 50 to 100 ng/µL) were added to the reaction. PCR was performed with an initial denaturation step at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min, and a final elongation step at 72 °C for 10 min. A Veriti™ 96-Well Thermal Cycler (Applied Biosystems, USA) was used for the PCR reactions and for the subsequent sequencing steps.
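As a quick check of the thermal profile, the Python sketch below adds up the programmed hold times of the PCR described above. It counts hold times only and ignores the ramping between temperatures, so the real instrument run will be somewhat longer; the data layout and function name are hypothetical.

```python
PCR_PROGRAM = [
    # (step, seconds per step, number of cycles)
    ("initial denaturation, 95 C", 5 * 60, 1),
    ("denaturation, 95 C", 30, 35),
    ("annealing, 55 C", 30, 35),
    ("extension, 72 C", 60, 35),
    ("final elongation, 72 C", 10 * 60, 1),
]

def hold_time_minutes(program):
    """Sum of programmed hold times in minutes (temperature ramps excluded)."""
    return sum(seconds * cycles for _, seconds, cycles in program) / 60

total_min = hold_time_minutes(PCR_PROGRAM)
print(f"Programmed hold time: {total_min:.0f} min")
```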

For sequencing, 10 µL of PCR products were purified using 2 µL of Exonuclease I (Thermo Scientific Fermentas, USA) and 1 µL of Alkaline Phosphatase (Thermo Scientific Fermentas, USA). The mixture was incubated at 37 °C for 15 min, followed by 85 °C for 15 min. Sequencing reactions were conducted with the Big Dye Terminator 3.1® Kit (Applied Biosystems, USA) with a final primer concentration of 0.320 µM of either forward or reverse primer. The thermal profile was as follows: 1 cycle at 96 °C for 1 min; 15 cycles of 96 °C for 10 s, 50 °C for 0.05 s and 60 °C for 75 s; 5 cycles of 96 °C for 10 s, 50 °C for 0.05 s and 60 °C for 90 s; 5 cycles of 96 °C for 10 s, 50 °C for 0.05 s and 60 °C for 120 s. Sequencing products were purified with the Big Dye XTerminator® kit (Applied Biosystems, USA) according to the manufacturer's instructions. Purified sequencing reactions were analysed in a 3130xl Genetic Analyzer (Applied Biosystems, USA) with a 50 cm capillary and POP7 polymer (Applied Biosystems, USA).

Forward and reverse strands were manually inspected in FinchTV (Geospiza) and a consensus sequence was obtained in BioEdit (Hall 1999). Multiple sequence alignments of the ITS2 region were conducted using MEGA version 6 (Tamura et al. 2013) with the MUSCLE option (Edgar 2004) using sequences from this study. In addition, sequences of C. nictitans collected in Costa Rica and previously deposited by Prof. Federico Albertazzi were retrieved from the GenBank database (Accession codes: KU720152.1, KU720153.1, KU720154.1, KU720155.1, KU720156.1, KU720157.1, KU720158.1, KU720159.1, KU720160.1, KU720161.1, KU720162.1, KU720163.1, KU720164.1, KU720165.1, KU720166.1, KU720167.1). These sequences included the 18S, ITS1, 5.8S and ITS2 regions of the ribosomal RNA gene. The retrieved sequences were trimmed so that their ITS2 region aligned with the sequences from the current study. After testing for the best nucleotide substitution model in MEGA 6 (Tamura et al. 2013), the “Tamura 3-parameter” model was selected to build the phylogenetic tree with the Neighbour Joining (NJ) grouping method (with 1000 bootstrap replicates). The same model allowed the estimation of the distances between groups and of the interspecific and intraspecific diversity.
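For readers unfamiliar with the Tamura 3-parameter model, the Python sketch below computes the pairwise distance from the published closed-form expression, d = −h ln(1 − P/h − Q) − ½(1 − h) ln(1 − 2Q), where P and Q are the proportions of transitions and transversions and h = θ1 + θ2 − 2θ1θ2 is derived from the G+C content of the two sequences. It is an illustration only: it is not MEGA's implementation, it simply skips gapped or ambiguous sites, and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}

def tamura_1992_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps and ambiguity codes
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    diffs = [(a, b) for a, b in pairs if a != b]
    # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion
    p = sum((a in PURINES) == (b in PURINES) for a, b in diffs) / n
    q = sum((a in PURINES) != (b in PURINES) for a, b in diffs) / n
    gc1 = sum(a in "GC" for a, _ in pairs) / n
    gc2 = sum(b in "GC" for _, b in pairs) / n
    h = gc1 + gc2 - 2 * gc1 * gc2
    transition_term = 0.0 if p == 0 else -h * math.log(1 - p / h - q)
    transversion_term = -0.5 * (1 - h) * math.log(1 - 2 * q)
    return transition_term + transversion_term

# Hypothetical 10-site alignment with a single C->T transition
d = tamura_1992_distance("ACGTACGTAC", "ACGTACGTAT")
print(round(d, 4))
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.1 in this example) because the model accounts for multiple substitutions at the same site. Very divergent sequences can make the logarithm arguments non-positive, in which case the distance is undefined and `math.log` raises an error.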

3 Results and discussion

3.1 Statistical analysis: construction and evaluation of the model

A principal component algorithm was applied to a matrix based on over 2000 ions detected in negative ionization mode (both modes were tested and negative mode was preferred given its higher total intensity, see Figure S2), obtaining the scatter and loadings projections (Figure S3 a and b) showing four defined groups with a fair separation and resolution through an untargeted clustering analysis in SIMCA-P software. The space built by the first two components explains 52% of the total variation. Outliers from group M3 and M4 were evaluated and no chemical differences were found according to chromatograms and mass spectra; hence, they were conserved for further statistical analysis; their location may be a direct consequence of variation in concentration of the metabolites in their profiles.

The PLS-DA algorithm applied to the same variables generated the projection shown in Fig. 2. The validity of the model was judged acceptable based on the cumulative ratio Q2 (cum) (0.842) and the analysis of variance of the cross-validated predictive residuals (CV-ANOVA p-value < 0.001). A subsequent Ward's minimum-variance clustering based on the PLS-DA analysis (Figure S4) showed a good separation between subspecies and varieties; the grouping of the samples is therefore suggested to be tightly related to their chemical composition.

Fig. 2

Partial least squares discriminant analysis score plot (C1 vs. C2) showing 46 samples from C. nictitans including varieties and subspecies, based on the identified metabolites data matrix (M1 = C. nictitans var. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis; M4 = C. nictitans ssp. disadena var. pilosa)

PLS-DA separates groups M1, M2 and M3 from M4 in the first component. This separation is also observed in the HCA in Figure S4, where groups M2 and M4 derive from the same branch and are more closely related to M1 than to M3.

The observed segregation is in accordance with the chemical variation observed in the LC–MS profiles (see Figure S5). Each subspecies and variety produces specific metabolites but also shares some of them with the others (see Table S3). For example, C. nictitans ssp. patellaria (M1) and C. nictitans ssp. nictitans var. jaliscensis (M3) both produce proanthocyanidin oligomers up to tetramers and some specific C-flavonoid glycosides, such as Cassiaoccidentalin A (35), Cassiaoccidentalin B (34) and their supposed isomers. However, they differ in the presence of a trimer with [M–H]− at m/z 817.21 (retention time 6.84 min, 10) described as (epi)afz-(epi)fis-(epi)fis, which is a major component in the former, and in the presence of the C-flavonoid glycoside Luteolin-6-C-hexosyl-(1→2)-rhamnoside (28) in C. nictitans ssp. nictitans var. jaliscensis.

Chamaecrista nictitans ssp. disadena (M2) shares mainly proanthocyanidin trimeric compounds with C. nictitans ssp. nictitans var. jaliscensis and produces C-flavonoid glycosides in high concentration. On the other hand, C. nictitans ssp. disadena var. pilosa (M4) possesses a highly specific metabolism, centered on the production of C- and O-flavonoid glycosides and some acetylated compounds such as flavonoid acetates.

Thus, plants belonging to the species C. nictitans can be differentiated down to the subspecies and variety level thanks to their distinctive chemical profiles, given that the statistical analysis yields a good separation between subgroups. The proposed methodology proves to be simpler and more straightforward than morphological and molecular taxonomy for these specific samples. The comparison of the morphological characters against the chemical profiles and DNA barcoding shows a 63% coincidence, evidencing the morphological similarity within C. nictitans.

The validity and usefulness of this analysis rest on its ability to identify species and varieties independently of age, fertility, or even chemical profile variation due to environmental factors such as geographical distribution, soil and direct sunlight exposure, or multi-year effects for samples collected in different years, as ours were. To test this, a wide range of samples was obtained and evaluated. The results are in good agreement with the corresponding cluster (Fig. 2). First, greenhouse-cultivated plants of C. nictitans ssp. disadena (n = 1, M2I01) and C. nictitans ssp. patellaria (n = 3, MI01–MI03) were tested, as well as immature specimens of C. nictitans ssp. nictitans var. jaliscensis [n = 3, M3J01–M3J03, taxonomy assigned based on geographical distribution (Hammel et al. 2010)], which were collected in Jacó, Puntarenas at sea level. Lastly, two early-stage in vitro samples (M1V01, M2V02) were analyzed to evaluate the dependency on the ontogeny of the plant. The results suggest that the chemical composition is relatively constant from the early stages of the plant, as the major characteristic compound (14) of C. nictitans ssp. patellaria was detected in high concentration in the in vitro plants. These results suggest that LC–MS/MS analysis of crude extracts is capable of distinguishing among subspecies and varieties even when they lack the morphological characters that are imperative in classical taxonomy.

Additional testing and evaluation were performed by analyzing 12 samples provided by the Herbarium of the University of Costa Rica (USJ, UCR), including seven samples classified no further than the species level (C. nictitans) and five samples of other species of the same genus. The scatter plot of the PLS-DA in Fig. 3 shows that the samples of different species (M5, M6 and M7) were grouped within the M2 (C. nictitans ssp. disadena) space. This is in accordance with the few compounds detected in all these samples compared to M1 and M4, as explained above for the in vitro samples, and with the fact that these species mainly produce flavonoid glycosides similar to those of M2 and M4. The model was able to keep discriminating the M1, M3 and M4 subspecies/varieties without losing resolution. The model could eventually be refined upon chemical characterization of new variables from other subspecies, varieties or even closely related species. An important aspect to highlight is that preserved herbarium material of minimal weight (ca. 200 mg or less, see Figure S5) could be analyzed and classified without trouble (the oldest sample was collected in 1949). The remaining samples (seven specimens), classified to the species level (C. nictitans), were included in the model and afforded the PLS-DA score plot shown in Fig. 4, where they were spread within the M1, M2 and M3 clusters; five of them were grouped into the C. nictitans ssp. nictitans var. jaliscensis subgroup (M3). It is noteworthy that two of the samples, collected in 1965 and 1957 in the United States (MXH02 and MXH06 respectively, see Table S2), conserved a chemical fingerprint that allowed their placement in the model. This may suggest that the chemical profile, although not necessarily the concentration of the compounds, is genetically stabilized and may not be strongly dependent on external factors.

Fig. 3

Partial least squares discriminant analysis score plot (C1 vs. C2) showing 51 samples including C. nictitans varieties and subspecies (n = 46) and three different Chamaecrista species (n = 5, M5–M7) (M1 = C. nictitans var. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis; M4 = C. nictitans ssp. disadena var. pilosa; M5 = C. rotundifolia; M6 = C. diphylla; M7 = C. flexuosa L.)

Fig. 4

Partial least squares discriminant analysis score plot (C1 vs. C2) showing 53 samples including C. nictitans varieties and subspecies (n = 46) and seven herbarium samples classified to the species level as Chamaecrista nictitans (M1 = C. nictitans var. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis; M4 = C. nictitans ssp. disadena var. pilosa; MX = samples from the herbarium)

Just one sample (MXH03) provided by the herbarium was classified to the variety level, as C. nictitans ssp. disadena var. pilosa. However, our statistical analysis grouped it close to the M2 cluster (C. nictitans ssp. disadena) (Fig. 4), which may indicate a taxonomic misplacement based on morphological characteristics. The last sample, MXH04, was grouped into the M1 cluster (Fig. 5), indicating that it is a C. nictitans ssp. patellaria. Further examination of the chromatograms and mass spectra of these samples corroborated the observed grouping.

Fig. 5

Neighbor-joining tree constructed based on the sequence of the ITS2 region of subspecies and varieties of Chamaecrista nictitans collected in Costa Rica

The most discriminant ions, with a VIP > 1 and a p-value < 0.05, are shown in Table S4. These ions correspond mainly to the proanthocyanidin oligomers present in all subspecies and varieties, although others are also present, such as several flavonoid glycosides, e.g. compound 21, which is present only in C. nictitans ssp. disadena var. pilosa.

3.2 Identification of chemicals markers

Analysis of LC–MS and LC–MS/MS data for samples collected afforded the tentative identification of 44 polyphenols (Table 1). Putatively identified compounds correspond to seven B-type tetramers (1–7), eleven B-type trimers (8–14, 17–20), two A-type trimers (15–16), two B-type dimers (33, 38), several flavonoid glycosides and acetates (21–32, 33–37 and 39–43) and one flavanol (44). All molecular formulas were confirmed through accurate mass measurement and detected with an error less than 2.0 ppm, thus corroborating their elemental composition.

Table 1 Characterization of polyphenols found in all samples of Chamaecrista nictitans using UPLC-PDA-ESI-MS/MS-QTof in negative ionization mode

Compounds 14, 15, 18, 34, and 41 to 44 were previously reported by Mateos-Martín et al. (2014), while compounds 21 (Le Gall et al. 2003; Sakushima and Nishibe 1988), 27 (Alston et al. 1965; Herrmann 1988; Saleh et al. 1982; Sharaf et al. 1997), 33 (Del Bubba et al. 2012; Hatano et al. 2002; Lin et al. 2014; Onagas et al. 2007), 38 (Ferreira and Li 2000; Sobeh et al. 2018) and 40 (Peng et al. 2015) were previously reported in the literature but not for this species. According to the structures proposed from their fragmentation patterns in MS/MS experiments, the remaining proanthocyanidin oligomers and flavonoid glycosides and acetates (1–13, 16, 17, 19, 20, 22–26, 28–32, 35–37, 39) are, to the best of our knowledge, reported here for the first time. Moreover, compounds 29, 31 and 36–37 seem to be regioisomers of 28, 30 and 35 respectively, consistent with the fragmentation patterns observed in MS/MS experiments.

All structures were proposed according to the approach explained above (see Sect. 2.6). For example, compound 21 has a [M–H]− at m/z 755 and was characterized as kaempferol-O-hexoside-O-rhamnosyl-hexoside in accordance with the fragments detected. The mass spectra showed ions at m/z 635 [(M–H)-120] (0,2X2−, characteristic of a hexose sugar moiety), m/z 593 [(M–H)-162] (Y2, loss of a hexose moiety), m/z 431 [Y2-162] (Y1, loss of a hexose moiety), m/z 285 (Y0, aglycone, loss of a rhamnose moiety), m/z 255 (Y0–CH2O), m/z 227 (loss of 28 Da from the ion at m/z 255, consistent with CO rather than water) and m/z 151 (1,3A−). Y0 is produced from the ion at m/z 431 by a loss of 146 Da, suggesting the presence of a rhamnose sugar. The relative intensities of the ions at m/z 593 and m/z 285 indicate the presence of the O-glycosidic linkage; the presence of the ion at m/z 593 also indicates that two different sugar moieties are attached to the aglycone (Figure S6).
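The neutral-loss arithmetic behind this assignment can be checked with a few lines of Python. The sketch below uses the nominal m/z values quoted for compound 21 and a small lookup table of the losses named in the text (120, 162, 146 and 30 Da); the table, the transition list and the function name are illustrative and do not replace accurate-mass interpretation.

```python
# Nominal neutral losses named in the text for compound 21
KNOWN_LOSSES = {
    120: "0,2X cross-ring cleavage of a hexose",
    162: "hexose residue",
    146: "rhamnose (deoxyhexose) residue",
    30: "CH2O",
}

# (parent m/z, fragment m/z) pairs as described for compound 21
TRANSITIONS = [(755, 635), (755, 593), (593, 431), (431, 285), (285, 255)]

def annotate_losses(transitions, losses=KNOWN_LOSSES):
    """Return (parent, fragment, mass difference, label) for each transition."""
    return [(parent, fragment, parent - fragment,
             losses.get(parent - fragment, "unassigned"))
            for parent, fragment in transitions]

annotated = annotate_losses(TRANSITIONS)
for parent, fragment, delta, label in annotated:
    print(f"{parent} -> {fragment}: -{delta} Da ({label})")
```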

The putatively identified compounds across the different C. nictitans subspecies and varieties are shown in Table S3, and detailed descriptions of the other compounds are given in the supporting information.

3.3 DNA barcoding analysis

Twenty C. nictitans samples (including subspecies and varieties) were selected for DNA barcoding analysis (Table S1) to validate the data obtained by LC–MS/MS. Samples were selected based on the chemotypes determined from the data matrix of identified metabolites. We obtained 100% efficiency of PCR amplification and sequencing using a primer combination for ITS2 designed for DNA barcoding identification of medicinal plants (Chen et al. 2010; Chiou et al. 2007). Although our sample set was relatively small, this result is not surprising, considering that with the same primer combination the efficiency of PCR amplification was 89.6% in a set of 992 samples (Chen et al. 2010) and 91% in a different set of 192 samples (Han et al. 2013). Ashfaq et al. (2013) reported 100% sequence recovery when the ITS2F primer was used in cotton species.

The diversity parameters estimated for our sequences, together with those retrieved from GenBank, showed larger inter-specific (0.113) than intra-specific (0.011) diversity. Gao et al. (2010) also described larger inter-specific variation in the Fabaceae using the ITS2 region. The sequence diversity of ITS2 (Fig. S7) allowed the separation of C. nictitans subspecies and varieties in the present study. An in silico study showed that the ITS2 inter-specific divergence of congeneric species was greater than the intra-specific variation, which allows a high rate of correct identification of closely related species in several dicotyledon families, including the Fabaceae (Yao et al. 2010).
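The comparison of intra- and inter-specific diversity can be sketched as follows. This is a minimal illustration that assumes pre-aligned sequences and uses the uncorrected p-distance (proportion of differing sites), which may differ from the substitution model applied in this study; the sequences and taxon labels are toy data, not ITS2 sequences from our samples or from GenBank.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def mean_intra_inter(records):
    """records maps a sample name to (taxon, aligned sequence).

    Returns (mean intra-taxon distance, mean inter-taxon distance).
    """
    intra, inter = [], []
    for (_, (taxon_1, seq_1)), (_, (taxon_2, seq_2)) in combinations(records.items(), 2):
        target = intra if taxon_1 == taxon_2 else inter
        target.append(p_distance(seq_1, seq_2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


# Toy alignment (hypothetical sequences and labels, for illustration only).
toy = {
    "a1": ("taxon A", "ACGTACGTAC"),
    "a2": ("taxon A", "ACGTACGTAT"),
    "b1": ("taxon B", "ACGAACCTAC"),
    "b2": ("taxon B", "ACGAACCTAC"),
}
intra_mean, inter_mean = mean_intra_inter(toy)
print(f"intra = {intra_mean:.3f}, inter = {inter_mean:.3f}")
```

A barcode region is informative for identification when the inter-taxon mean clearly exceeds the intra-taxon mean, as in this toy alignment.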

3.4 Comparison of morphological identification with DNA barcoding

When the morphological classification of the samples was taken as the reference, the comparison with DNA barcoding (ITS2 sequence) revealed that 17 out of 20 samples grouped accordingly (neighbor-joining tree in Fig. 5). Thus, three discrepancies were observed in the phylogenetic tree. Samples M1B01 (C. nictitans subsp. disadena var. pilosa) and M1Q02 (C. nictitans subsp. nictitans var. jaliscensis) clustered in the group of C. nictitans subsp. patellaria. Sample M4T01 (identified as C. nictitans subsp. disadena based on morphology) grouped with two of the available GenBank accessions (KU720155 and KU720156) described as C. nictitans var. pilosa. Among the sequences retrieved from GenBank, only accession KU720158 (C. nictitans ssp. patellaria) did not group accordingly; it was located with accessions and samples of C. nictitans ssp. disadena (Fig. S6). Thus, the DNA barcoding analysis suggests that samples M1B01, M1Q02 and M4T01 were morphologically misidentified and belong to the groups indicated by the phylogenetic tree. In a comparable case, a morphotype with characteristics shared by C. cultrifolia and C. diphylla was re-established as C. cultrifolia on the basis of molecular and morphological comparisons (Barbosa et al. 2016).

3.5 Correlation of chemical profile and genetic data

When the chemical classification of the samples was used as the reference, the subset of 20 samples grouped accordingly. As shown in Fig. S4 (hierarchical cluster analysis), the chemical profile revealed a clear separation of groups M1, M2, M3 and M4. Similarly, the sequence of the ITS2 region revealed a separation of the same groups (Fig. S6). However, the pattern of separation in the phylogenetic tree differed in some respects from that of the chemical profile. Groups M2 and M4 derive from the same branch in Fig. S4, whereas with the ITS2 region M4 (C. nictitans ssp. disadena var. pilosa) was widely separated, which is explained by large differences between the sequences (Fig. S6).

The differences between the ITS2 sequences (Fig. S6) were reflected in larger mean distances between M4 and groups M1, M2 and M3 (d = 0.471, 0.493 and 0.454, respectively). For the DNA barcoding analysis we had only one sample of M4; therefore, two GenBank references (KU720155 and KU720156) were included to estimate the mean distance for this group. In addition, groups M1 (C. nictitans ssp. patellaria), M2 (C. nictitans ssp. disadena) and M3 (C. nictitans ssp. nictitans var. jaliscensis) derived from the same branch when the ITS2 region was used. This branch was divided into two subgroups. In the first, M1 and M2 clustered with a clear separation of both subspecies (bootstrap values > 80). The other subgroup revealed the similarity of the samples of C. nictitans ssp. nictitans var. jaliscensis. This pattern of group separation was based on mean distances. The smallest distance (d = 0.0140) was between M1 and M3. The M2 group showed similar distances to M1 and M3 (d = 0.024 and 0.021, respectively).

A similar approach combining chemotaxonomy and genetic analysis has been described for plants of genera such as Lespedeza (Kim et al. 2012), Rhodiola (Liu et al. 2013) and Zingiber (Jiang et al. 2006). Similar to our results, Jiang et al. (2006) found that the chemical characters of the investigated Zingiber species generated essentially the same phylogenetic relationships as the DNA sequences, although they used trnL and rps16 sequences. Kim et al. (2012) used a combination of ITS and cpDNA (trnL–trnF) to classify genotypes of Lespedeza sp. They found that, in both the genetic and the chemotaxonomic classification methods, the distance between L. cyrtobotrya and L. bicolor was the smallest and L. cuneata was the farthest from the other three species.

Our results are also very similar to those of the study in Rhodiola. Liu et al. (2013) used a combination of morphological characteristics, genetic analysis with sequences obtained from cDNA, and phytochemical analysis. Samples of Rhodiola were classified consistently by genetic taxonomy and by four types of bioactive compounds used as reference markers. Liu et al. (2013) reported that HCA gave closely comparable results for the genotype- and chemotype-based classification methods, as also observed in our results.

4 Conclusions

Comparative analysis of the metabolic features of plants has high potential for multiple purposes, including chemotaxonomy. Here we describe a multivariate model based on LC–MS/MS fingerprinting that discriminates between subspecies and varieties of the medicinal plant C. nictitans. The model separated the samples into four differentiated groups: M1 = C. nictitans ssp. patellaria; M2 = C. nictitans ssp. disadena; M3 = C. nictitans ssp. nictitans var. jaliscensis; and M4 = C. nictitans ssp. disadena var. pilosa. The chemical LC–MS/MS fingerprint results were confirmed using both morphological characters and DNA barcoding with the ITS2 region. In conclusion, the metabolomic approach described in this study allows an efficient and reliable differentiation between subspecies and varieties of C. nictitans using a straightforward protocol that requires no extensive purification steps.