Introduction

The term “organic food” denotes products that have been produced in accordance with the principles and practices of organic agriculture [1]. Organic farming is a production system that avoids the use of synthetically compounded fertilizers, pesticides, and growth regulators. Organic farming practices are based on the idea that each part of the farm operation augments the other parts to form an efficient and sustainable food production system, offering many advantages, such as minimizing all forms of pollution and producing food of high quality [1]. For these reasons, consumer demand for organic foods has increased considerably on a global scale.

In response to consumers’ needs, many attempts have been made to differentiate between organic and conventional products, but the results were controversial [2]. However, new results arising from foodomics and metabolomic studies have detected differences in minor food components, such as polyphenols and other bioactive compounds [3]. The term foodomics describes the discipline that studies the food and nutrition domain through the application of advanced omics technologies to improve consumers’ well-being, health, and confidence [4]. Among other applications, foodomics can explore the effect of the agronomic environment on the metabolite profile of food. Many studies have been conducted in this field [5,6,7,8,9,10,11,12]. Koh et al. [5] compared the chemical composition of organic and conventional spinach using liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS) and analysis of variance (ANOVA), concluding that the content of flavonoids is higher in organic products. These findings were reported to relate to the carbon/nutrient balance theory as well as the growth rate and growth differentiation balance hypotheses, indicating the allocation of plant metabolism toward components with higher carbon content (such as flavonoids) [5]. In addition, Ren et al. [6] as well as Vallverdú-Queralt et al. [8] observed the same effects in vegetables by means of target screening in LC-MS. However, studies that investigate the correlation between olive oil composition and production type using metabolomics and chemometrics are scarce in the literature [8]. Moreover, the comparison between environmental factors and harvesting years has been neglected [11]. Anastasopoulos et al. [12] and Rosati et al. [9] measured total phenolic content with the Folin-Ciocalteu method and observed that the phenolic content was higher in the organic production type.
The abovementioned studies [9,10,11,12] indicate that there is a connection between the phenolic content and the production type. Thus, the development of high-resolution analytical methodologies with higher identification confidence that enable the identification of phenolic compounds in such cases is of high interest.

When analyzing multiple classes of compounds with different polarities (polar compounds such as alcohols and acids and less polar ones, like secoiridoids, lignans, and flavonoids), the optimum experimental conditions need to be derived. Chemometric methods have frequently been applied to optimize analytical methods, offering several advantages such as reductions in the number of experiments, reagent consumption, and laboratory work [13]. Moreover, these methods can reveal the significance of the factors, their effects, and interaction effects. Factorial design, one factor design (OFD), central composite design (CCD), and Box-Behnken design are some of the widely used methods for design of experiments [13,14,15]. These methods are often coupled with response surface methodology (RSM) to derive the optimal conditions for any property under study [16, 17]. After the optimization of the analytical methodology and the identification of the target and suspect compounds, the semi-quantification of the suspect compounds remains a challenge, since in most cases no reference standards are commercially available.

The usefulness and reliability of semi-quantification can be better established by using the most relevant standard [18]. To this end, chemical similarity analysis can be applied to rank the standards for semi-quantification purposes. Chemical similarity has been a subject of research for nearly a decade, with the aim of finding a correct and meaningful similarity assignment between compounds [19,20,21,22,23,24,25,26]. From a chemical perspective, similar compounds should have similar functional groups or fragments [24]. A scoring function and a scale that can describe the chemical space edge are vital [24]. Such a score can be readily developed, as chemical space is studied to understand the correlation between a property and chemical descriptors [27]. The application of such chemical descriptors has been a breakthrough in the identification of similar chemical structures in large-scale databases [24]. The introduction of chemical fingerprints with a suitable similarity metric (Euclidean or Tanimoto) [23, 24] could also help to assign an accurate chemical similarity score.

Following the optimization of the analytical method and the selection of appropriate standards for semi-quantification purposes, a robust model should be developed to discriminate between organic and conventional extra virgin olive oils (EVOOs). Although partial least squares-discriminant analysis (PLS-DA) can score the MS features and select markers [9], the interpretation of results might be complex when the explained variances are too low, giving little discriminative power to the PLS-DA model. In such a case, models must be inspected by cross-validation analysis and an external test set to verify that the model is capable of correct class assignment [28]. Moreover, a threshold needs to be set for the suggested markers that define the olive oil production types, and PLS-DA is not capable of setting such a threshold. For a discrimination method to be applicable and reliable, the threshold derived for each marker should be evaluated against the changes in environmental conditions between different harvesting years. Building a correlation between the interaction factors and phenolic content may help to understand whether there are significant differences between organic and conventional olive oils and whether olive oils of different harvesting years are comparable. The identified markers are therefore highly relevant for building the discriminative models.

The main aim of this study is to develop an optimized reversed-phase ultra-high performance liquid chromatography-electrospray ionization quadrupole time-of-flight tandem mass spectrometric (RP-UHPLC-ESI-QTOF-MS) method, using target and suspect screening workflows combined with advanced chemometrics to reveal the correlation between the phenolic compounds and the production type. The second objective is to identify the markers responsible for the discrimination in a 2-year study. The method was optimized by OFD-RSM to derive the optimal conditions for the extraction of the phenolic compounds, the appropriate internal standard, and its concentration. The method was applied to 52 EVOOs of the Kolovi variety from Lesvos, both organic and conventional, harvested during the periods 2014–2015 and 2015–2016, for the determination of 13 target phenolic compounds, followed by suspect screening for the identification of 96 suspect phenolic compounds. The target phenolic compounds were quantified, and a novel semi-quantitation strategy based on chemical similarity analysis is introduced. Then, ant colony optimization-random forest (ACO-RF) was employed to investigate alterations between organic and conventional olive oils, introduce one or more markers, suggest a concentration threshold, and discriminate between organic and conventional EVOOs.

Materials and methods

Chemicals and standards

All standards and reagents were of high-purity grade (>95%). Methanol (MeOH) as well as acetonitrile (ACN) of LC-MS grade and sodium hydroxide (>99%) were purchased from Merck (Darmstadt, Germany). Ammonium acetate (≥99.0%) for HPLC and formic acid (LC-MS Ultra) were purchased from Fluka (Buchs, Switzerland). Isopropanol was purchased from Fisher Scientific (Geel, Belgium). Distilled water was provided by a Milli-Q purification apparatus (Millipore Direct-Q UV, Bedford, MA, USA). For the analytical method validation, the following reagents were used: syringic acid 95% was purchased from Extrasynthèse (Genay, France); gallic acid 98%, ferulic acid 98%, epicatechin 97%, p-coumaric acid (4-hydroxycinnamic acid) 98%, homovanillic acid 97%, as well as oleuropein 98% and pinoresinol 95% were obtained from Sigma-Aldrich (Steinheim, Germany); and hydroxytyrosol 98% and luteolin 98% were acquired from Santa Cruz Biotechnologies. Vanillin 99%, ethyl vanillin 98%, apigenin (4,5,7-trihydroxyflavone) 97%, and tyrosol (2-(4-hydroxyphenyl) ethanol) 98% were acquired from Alfa Aesar (Karlsruhe, Germany). Caffeic acid 99% and syringaldehyde 98% (internal standards) were purchased from Sigma-Aldrich (Steinheim, Germany). Stock standard solutions of individual compounds (1000 mg L−1) were prepared in MeOH and stored at −20 °C in dark brown glass. All intermediate standard solutions containing the analytes were prepared by dilution of the stock solutions in MeOH.

Olive oil samples

Overall, 52 monovarietal EVOOs were acquired from the island of Lesvos for a 2-year study. Forty-one EVOOs of the Kolovi variety were produced from olives cultivated over the harvesting period 2015–2016, consisting of 17 organic and 24 conventional olive oils. Moreover, 11 extra virgin olive oils of the same variety produced during the harvesting period 2014–2015 (two organic and nine conventional) were also included in the current research as a test set, to evaluate the applicability of the proposed discrimination models to previous harvesting periods. Figure 1 presents the geographical distribution of the monovarietal organic and conventional extra virgin olive oils that were produced during the harvesting periods 2014–2015 and 2015–2016. In this figure, samples in italics relate to the harvesting period 2014–2015 and samples in bold relate to the harvesting period 2015–2016. Moreover, samples labeled as organic are marked with an asterisk. More information regarding the harvesting and production details of the EVOOs can be found in the Electronic Supplementary Material (ESM, Table S1). All samples were protected from light and humidity and were preserved as previously reported by Kalogiouri et al. [29].

Fig. 1 Geographical distribution of EVOOs selected from Lesvos Island

Instrumental analysis

A UHPLC system with an HPG-3400 pump (Dionex UltiMate 3000 RSLC, Thermo Fisher Scientific, Germany) was used for RP analysis, interfaced to a QTOF mass spectrometer (Maxis Impact, Bruker Daltonics, Bremen, Germany), in negative electrospray ionization mode. Separation was carried out using an Acclaim RSLC C18 column (2.1 × 100 mm, 2.2 μm) purchased from Thermo Fisher Scientific (Driesch, Germany) with a pre-column of ACQUITY UPLC BEH C18 (1.7 μm, VanGuard Pre-Column, Waters (Ireland)). Column temperature was set at 30 °C. The solvents used consisted of (A) 90% H2O, 10% MeOH, and 5 mM CH3COONH4 and (B) 100% MeOH and 5 mM CH3COONH4. The adopted elution gradient started with 1% of organic phase B with flow rate 0.2 mL min−1 during 1 min, gradually increasing to 39% for the next 2 min and then increasing to 99.9% and flow rate 0.4 mL min−1 for the following 11 min. These almost pure organic conditions were kept constant for 2 min (flow rate 0.48 mL min−1) and then initial conditions (1% B–99% A) were restored within 0.1 min (flow rate decreased to 0.2 mL min−1) to re-equilibrate the column for the next injection.
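The %B profile described above can be written as a piecewise-linear function of time. The sketch below is only an illustration: the cumulative breakpoint times (1, 3, 14, 16, and 16.1 min) are an assumption obtained by adding the segment durations given in the text, the name `percent_b` is hypothetical, and the flow-rate program is not modeled.

```python
# Breakpoints as (time in min, %B). The cumulative times are an assumption
# derived by summing the segment durations quoted in the text.
GRADIENT = [
    (0.0, 1.0),    # start at 1% B
    (1.0, 1.0),    # held during the first 1 min
    (3.0, 39.0),   # ramp to 39% B over the next 2 min
    (14.0, 99.9),  # ramp to 99.9% B over the following 11 min
    (16.0, 99.9),  # held for 2 min
    (16.1, 1.0),   # initial conditions restored within 0.1 min
]


def percent_b(t_min):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint the column re-equilibrates at 1% B.
    return GRADIENT[-1][1]


if __name__ == "__main__":
    for t in (0.5, 2.0, 8.5, 15.0, 17.0):
        print(f"{t:5.1f} min -> {percent_b(t):.1f} %B")
```

A table of this kind makes it easy to check, for example, which mobile-phase composition a compound with a given retention time elutes in.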

The QTOF-MS system was equipped with an electrospray ionization interface (ESI), operating in negative mode with the following settings: capillary voltage of 3500 V, end plate offset of 500 V, nebulizer pressure of 2 bar (N2), drying gas of 8 L min−1 (N2), and drying temperature of 200 °C. A QTOF external calibration was daily performed with sodium formate (cluster solution), and a segment (0.1–0.25 min) in every chromatogram was used for internal calibration, using calibrant injection at the beginning of each run. The sodium formate calibration mixture consisted of 10 mM sodium formate in a mixture of H2O/isopropanol (1:1). Full scan mass spectra were recorded over the range of 50–1000 m/z, with a scan rate of 2 Hz. MS/MS experiments were conducted using AutoMS data-dependent acquisition mode based on the fragmentation of the five most abundant precursor ions per scan. The instrument provided a typical resolving power (FWHM) between 36,000 and 40,000 at m/z 226.1593, 430.9137, and 702.8636.

Screening methodology

Target and suspect screening methodologies were followed, as previously described by our group [29]. The identification workflow incorporated strict filtering steps, interpretation of MS/MS spectra, and retention time prediction. A target list of 13 phenolic compounds was created, comprising phenolic acids, secoiridoids, flavonoids, and lignans (gallic acid, p-coumaric acid, ferulic acid, syringic acid, homovanillic acid, tyrosol, hydroxytyrosol, pinoresinol, apigenin, oleuropein, vanillin, ethyl vanillin, and epicatechin) that have already been identified in extra virgin olive oils of the Kolovi variety in our previous study [29]. A suspect list of 96 bioactive constituents was generated from the literature, including the bioactive constituents, mainly phenolic compounds, that have been identified in olive oils, drupes, and leaves. The initial suspect list is presented in the ESM (Table S2).

The software packages Target Analysis 1.3 and Data Analysis 4.1 (Bruker Daltonics, Bremen, Germany), along with their tools Bruker Compass Isotope Pattern and SmartFormula Manually, were used in the target screening workflow. Extracted ion chromatograms (EICs) were obtained using the function Find Compounds-Chromatogram in the Target Analysis software. The mass accuracy threshold was set at 2 mDa, the mSigma threshold at 50 or below, the signal-to-noise threshold at 3, the minimum area threshold at 800, and the minimum intensity threshold at 200. The relative tolerance of the retention time window was set lower than ±0.2 min. The target compounds were identified on the basis of mass accuracy, isotope pattern, retention time (tR), and MS/MS fragments [29].

In suspect screening, the EICs were created using Target Analysis 1.3 software with the following parameters: mass accuracy threshold of 2 mDa, isotopic fit below or equal to 50, ion intensity of more than 800, peak area threshold of 2000, and peak score (area/intensity ratio) between 4 and 38 [29]. The EICs were examined using Data Analysis 4.1 software to confirm that each peak represents the suspect compound. The MS/MS fragments were compared and interpreted with the use of MetFrag [30] and FooDB [31]. The retention time of each suspect compound was predicted with a quantitative structure-retention relationship (QSRR) model and compared with the experimental retention time [32].
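The suspect-screening thresholds listed above amount to a simple pass/fail filter on each candidate peak. The following sketch shows one way such a filter could be written. The `Peak` container, the function name, and the example values are hypothetical and are not output of the Bruker software; treating the area threshold as "at least 2000" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical container for one extracted-ion-chromatogram peak."""
    name: str
    mass_error_mda: float  # measured minus theoretical m/z, in mDa
    msigma: float          # isotopic fit (lower is better)
    intensity: float
    area: float


def passes_suspect_filter(peak):
    """Apply the suspect-screening thresholds quoted in the text."""
    peak_score = peak.area / peak.intensity  # area/intensity ratio
    return (
        abs(peak.mass_error_mda) <= 2.0   # mass accuracy threshold of 2 mDa
        and peak.msigma <= 50.0           # isotopic fit below or equal to 50
        and peak.intensity > 800.0        # ion intensity of more than 800
        and peak.area >= 2000.0           # area threshold (assumed inclusive)
        and 4.0 <= peak_score <= 38.0     # peak score between 4 and 38
    )


if __name__ == "__main__":
    # Invented example peaks, for illustration only.
    candidates = [
        Peak("suspect A", 0.8, 21.0, 1500.0, 12000.0),
        Peak("suspect B", 2.6, 15.0, 3000.0, 20000.0),  # fails mass accuracy
        Peak("suspect C", 1.1, 30.0, 900.0, 2500.0),    # fails peak score
    ]
    for p in candidates:
        print(p.name, passes_suspect_filter(p))
```

Peaks that pass such a filter would still need the manual EIC inspection, MS/MS interpretation, and retention time check described above.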

As for the level of confidence achieved in the identification of the suspect compounds, compounds are identified at level 1 when the structures are confirmed with available reference standards. In the cases that there are no standards commercially available, level 2 corresponds to probable structures (level 2a, MS/MS fragments were verified with spectral libraries or literature; level 2b, diagnostic evidence where no other structure fits the experimental MS/MS information) and level 3 corresponds to tentative candidates [33].

Optimization of experimental conditions

The initial design consisted of three main factors (one numeric and two categorical variables) within one block. A quadratic design model was selected to cover the multilevel limits of the parameters to be optimized. The extraction solvent (a categorical factor) was set at three levels (MeOH, MeOH/H2O (80:20, v/v), and acetonitrile). The second factor was the internal standard (caffeic acid or syringaldehyde), and the final factor was the concentration of the internal standard, set within the range of 0.5 to 1.5 mg L−1. For a quadratic model, five levels (0.5, 0.75, 1.00, 1.25, and 1.5 mg L−1) of the numeric factor (concentration) are required, with some replicate points. This design can be duplicated for every combination of categorical factor levels. The optimization task was performed to minimize the relative standard deviation (%RSD) values of the peak areas of each spiked standard. The combination of all these factors required a set of 42 experiments. These experimental plans, based on the OFD method coupled to RSM [15], along with the %RSD value for each spiked standard as response (n = 3), are presented in the ESM (Table S3). The design of experiments and all statistical assessments were carried out with Design-Expert software version 7 [34].
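The run count of the design can be checked with a short enumeration. The sketch below assumes two replicate points per combination of categorical levels; the text only says "some replicate points", and this assumption is simply the one that reproduces the stated total of 42 experiments. The function names are hypothetical, and `percent_rsd` shows the response that the optimization minimizes.

```python
from itertools import product
from statistics import mean, stdev

SOLVENTS = ["MeOH", "MeOH/H2O (80:20, v/v)", "ACN"]
INTERNAL_STANDARDS = ["caffeic acid", "syringaldehyde"]
LEVELS_MG_L = [0.5, 0.75, 1.0, 1.25, 1.5]

# Assumption: two replicate points per categorical combination, so that
# 6 combinations x (5 levels + 2 replicates) = 42 runs, as stated in the text.
REPLICATES_PER_COMBINATION = 2


def count_runs():
    """Return (categorical combinations, runs per combination, total runs)."""
    combos = list(product(SOLVENTS, INTERNAL_STANDARDS))
    runs_per_combo = len(LEVELS_MG_L) + REPLICATES_PER_COMBINATION
    return len(combos), runs_per_combo, len(combos) * runs_per_combo


def percent_rsd(peak_areas):
    """Relative standard deviation (%) of replicate peak areas."""
    return 100.0 * stdev(peak_areas) / mean(peak_areas)


if __name__ == "__main__":
    print(count_runs())
    # Invented triplicate peak areas, for illustration only.
    print(round(percent_rsd([90.0, 100.0, 110.0]), 2))
```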

Method validation

The optimized RP-UHPLC-ESI-MS method was validated to ensure that it is suitable for identification and quantification purposes. Standard addition curves were constructed for all the analytes. All the compounds were spiked in real EVOO samples. Gallic acid, p-coumaric acid, ferulic acid, syringic acid, homovanillic acid, pinoresinol, apigenin, vanillin, ethyl vanillin, epicatechin, and luteolin were spiked at concentrations between 0.02 and 10 mg kg−1 (14 calibration levels with 3 replicates at each level). Tyrosol, hydroxytyrosol, and oleuropein calibration curves were constructed over the range of 0.02–100 mg kg−1 (20 calibration levels with 3 replicates at each level). Calibration curves were constructed using the peak area of the spiked analyte minus the peak area of the neat sample, divided by the peak area of the internal standard. Limits of detection (LODs) and limits of quantification (LOQs) were calculated at the lowest concentration range of the analytes (0.02–1 mg kg−1), by the equations:

\( \mathrm{LOD}=\frac{3.3\times {S}_a}{b} \) and \( \mathrm{LOQ}=\frac{10\times {S}_a}{b} \), where \( {S}_a \) is the standard error of the intercept a and b is the slope of the calibration curve. The accuracy of the method was estimated using recoveries, at the 2 mg kg−1 concentration level, calculated as follows:

$$ \%\mathrm{RE}=\frac{{\mathrm{Response}}_{\mathrm{extracted}\ \mathrm{sample}}}{{\mathrm{Response}}_{\mathrm{post}\hbox{-}\mathrm{extracted}\ \mathrm{spiked}\ \mathrm{sample}}}\times 100 $$
(1)

where \( {\mathrm{Response}}_{\mathrm{extracted}\ \mathrm{sample}} \) is the average area of the analyte in matrix that has been through the extraction process, from three replicates, divided each time by the peak area of the internal standard, and \( {\mathrm{Response}}_{\mathrm{post}\hbox{-}\mathrm{extracted}\ \mathrm{spiked}\ \mathrm{sample}} \) is the average area of each analyte spiked into the extracted matrix after the extraction procedure. To evaluate the matrix effect, the matrix factor (MF) was calculated at the 2 mg kg−1 concentration level according to the following equation:

$$ \mathrm{MF}=\frac{{\mathrm{Response}}_{\mathrm{post}\hbox{-}\mathrm{extracted}\ \mathrm{spiked}\ \mathrm{sample}}}{{\mathrm{Response}}_{\mathrm{standard}\ \mathrm{solution}}} $$
(2)

where \( {\mathrm{Response}}_{\mathrm{post}\hbox{-}\mathrm{extracted}\ \mathrm{spiked}\ \mathrm{sample}} \) is the average area of the analyte spiked into the extracted matrix after the extraction procedure, and \( {\mathrm{Response}}_{\mathrm{standard}\ \mathrm{solution}} \) is the average area count for the same concentration of analyte in a standard solution. For the calculation of the matrix effect (ME), 1 was subtracted from the quotient in Eq. 2 and the result was multiplied by 100, i.e., \( \%\mathrm{ME}=\left(\mathrm{MF}-1\right)\times 100 \), so that a negative result indicates suppression and a positive result indicates enhancement of the analyte signal. The precision of the method was demonstrated in terms of repeatability (intraday precision) and intralaboratory reproducibility (interday precision). Repeatability was expressed as the %RSDr value of six replicate analyses (n = 6) on the same day. Reproducibility was expressed as the %RSDR value of three replicates on each of three consecutive days (n × k = 3 × 3 = 9). Finally, a lack-of-fit F test was applied to ensure that the calibration curves can be used for quantification purposes. For this purpose, all three replicates of each concentration level were used, and the number of data points (concentration levels) was 20 for oleuropein, tyrosol, and hydroxytyrosol and 14 for the rest of the analytes.
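The validation quantities defined in this section (LOD, LOQ, %RE, MF, and ME) can be computed in a few lines. The sketch below is a minimal illustration using ordinary least squares, which is an assumption because the text does not state the regression variant. The function names and the calibration data are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b, s_a)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(sse / (n - 2))  # residual standard deviation
    # Standard error of the intercept.
    s_a = s_res * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return a, b, s_a


def lod_loq(s_a, b):
    """LOD = 3.3*S_a/b and LOQ = 10*S_a/b."""
    return 3.3 * s_a / b, 10.0 * s_a / b


def recovery_percent(resp_extracted, resp_post_spiked):
    """Eq. 1: %RE."""
    return 100.0 * resp_extracted / resp_post_spiked


def matrix_effect_percent(resp_post_spiked, resp_standard):
    """Eq. 2 (MF) and %ME = (MF - 1) * 100; negative means suppression."""
    mf = resp_post_spiked / resp_standard
    return mf, (mf - 1.0) * 100.0


if __name__ == "__main__":
    # Invented calibration points (mg/kg vs. relative response).
    conc = [0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
    resp = [0.011, 0.024, 0.052, 0.099, 0.251, 0.498]
    a, b, s_a = linear_fit(conc, resp)
    print(lod_loq(s_a, b))
    print(matrix_effect_percent(92.0, 100.0))
```

With this sign convention, a matrix factor of 0.92 corresponds to a matrix effect of about −8%, that is, signal suppression.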

Chemical similarity analysis

Three standards (tyrosol, hydroxytyrosol, and oleuropein) were used as the main scheme for semi-quantification, to define the chemical space boundaries (chemical space edge) and their similarity distance from 14 secoiridoids (10-hydroxy-10 methyl oleuropein aglycone, methyl oleuropein aglycone, 10-hydroxy oleuropein aglycone, oleoside, oleuropein aglycone, oleomissional, ligstroside aglycone, oleokoronal, 10-hydroxy decarboxymethyl oleuropein aglycone, decarboxymethyl oleuropein aglycone, decarboxymethyl ligstroside aglycone, hydroxylated form of elenolic acid, elenolic acid, and hydroxytyrosol acetate). All chemical structures used here were drawn, and their geometries were constructed by searching for the lowest-energy conformers using Balloon [35]. The chemical similarity matrix for these compounds was then built based on the molecular descriptors. These molecular descriptors consisted of logD (at pH = 6.2) (a measure of hydrophobicity for ionizable compounds), constitutional descriptors, topological descriptors, walk and path counts, connectivity indices, information indices, 2D autocorrelation, edge adjacency indices, Burden eigenvalues, topological charge indices, eigenvalue-based indices, Randic molecular profiles, geometrical descriptors, radial distribution function descriptors (RDF), 3D molecular representation of structure based on electron diffraction descriptors (3D-MoRSE), weighted holistic invariant molecular descriptors (WHIM), geometry, topology, and atoms-weighted assembly (GETAWAY) descriptors, functional group counts, atom-centered fragments, charge descriptors, and molecular properties [36,37,38,39].

These descriptors encode atomic or molecular properties, overall molecular connectivity, molecular geometry, and size and shape [40]. They were calculated by E-dragon [41, 42]. LogD was calculated using the ChemAxon package [43] (the calculated molecular descriptors can be found in the ESM, Table S4). Afterwards, the calculated molecular descriptors were pre-treated to remove constant and near-constant descriptors. Molecular descriptors with intercorrelation above 0.95 were also removed using the variable reduction method adapted from space-filling designs (V-WSP), an unsupervised variable reduction method [44] (the retained molecular descriptors can be found in the ESM, Table S5). A Euclidean-based similarity metric was used to measure the chemical similarity between the compounds. To define the chemical space, tyrosol, hydroxytyrosol, and oleuropein were used as the main scheme, and the other 14 compounds were measured against these three standards. The chemical space edge was obtained by normalizing the mean distance scores of the three standards (these values range from 0 to 1, where 0.0 corresponds to the least diverse and 1.0 to the most diverse compound). Then, the normalized mean distance scores for the rest of the compounds were calculated, and test compounds that scored outside the 0.0 to 1.0 range were defined to be outside the chemical space edge. In this way, the method could define the most appropriate standard for semi-quantification. Similarity analysis and V-WSP calculations were carried out in MATLAB 8.5 (MathWorks).
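One possible reading of the distance-based scoring described above is sketched below. This is an interpretation, not the authors' MATLAB code: the min-max normalization over the standards' mean distances, the function names, and the two-dimensional toy descriptor vectors are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_distance(vec, standards):
    """Mean Euclidean distance from vec to every standard."""
    return sum(euclidean(vec, s) for s in standards.values()) / len(standards)


def normalized_scores(standards, tests):
    """Min-max scale test scores using bounds taken from the standards only.

    Because the bounds come from the standards, a test compound can score
    below 0 or above 1, i.e. outside the assumed chemical space edge.
    """
    std_scores = [mean_distance(v, standards) for v in standards.values()]
    lo, hi = min(std_scores), max(std_scores)
    return {
        name: (mean_distance(vec, standards) - lo) / (hi - lo)
        for name, vec in tests.items()
    }


def closest_standard(vec, standards):
    """Name of the standard with the smallest distance to vec."""
    return min(sorted(standards), key=lambda n: euclidean(vec, standards[n]))


if __name__ == "__main__":
    # Toy two-descriptor vectors, invented for illustration.
    standards = {"s1": (0.0, 0.0), "s2": (3.0, 0.0), "s3": (0.0, 4.0)}
    tests = {"t1": (3.0, 4.0), "t2": (0.0, 1.0)}
    scores = normalized_scores(standards, tests)
    for name, vec in tests.items():
        inside = 0.0 <= scores[name] <= 1.0
        print(name, round(scores[name], 3), inside,
              closest_standard(vec, standards))
```

In practice the descriptor vectors would be the autoscaled descriptors retained after V-WSP reduction, and the closest standard would supply the calibration curve for semi-quantification.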

Prioritizing MS features and modeling strategies

Overall, a matrix containing quantified and semi-quantified results (expressed in mg kg−1) for 30 compounds was generated for the 52 extra virgin olive oil samples. These samples were split into a training set and a test set based on their harvesting year (to evaluate whether the discrimination achieved is applicable to previous years), in order to build the discrimination models and then evaluate their accuracy on the external set of samples. ACO [27, 45] was used to prioritize compounds and rank them by their importance and contribution to increasing the accuracy of the discrimination model. Details about ACO and its internal parameters can be found in the ESM (Section S1). The fitness function (a measure of error for the discrimination model) was based on the misclassification error in leave-one-out cross-validation. ACO was then coupled to discrimination modeling techniques to evaluate the internal and external accuracy of the models each time new features were included. Coupling feature selection with a discrimination model such as linear discriminant analysis (LDA) [46,47,48] or RF [49] can prevent over-fitting and improve the accuracy of a discrimination problem. In RF, variables and their contributions can be ranked based on a measure of variable importance, and the modeling can proceed with the highly important predictors [50]. Therefore, a separate feature prioritization method (ACO) might be less important in that case. More details about RF can be found in the ESM (Section S2). The following fitness function was used to measure the discrimination error in leave-one-out cross-validation and to decrease it using ACO:

$$ F=\frac{1}{n}\sum_{i=1}^{n}\left[{\mathrm{Class}}_i\ne \mathrm{Pred}.{\mathrm{Class}}_i\right] $$
(3)

where F is the objective function (discrimination error measure), Class is the observed group for a case (here, each sample i), Pred. Class is the group predicted by the modeling technique, the bracketed term equals 1 when the two differ and 0 otherwise, and n is the number of samples used to build the discrimination model. The entire data processing step was carried out in an in-house program, called ChemoTrAMS, written in the MATLAB environment.
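The fitness function in Eq. 3 is the fraction of samples misclassified in leave-one-out cross-validation for a given feature subset. The sketch below illustrates this with a nearest-centroid classifier, which is only a simple stand-in and not the RF or LDA models used in the study. The function names and toy data are hypothetical, and the ACO search itself is not shown.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not RF or LDA): assign x to the closest centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_x, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    # sorted() makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))


def loo_fitness(x, y, feature_idx):
    """Eq. 3: fraction of samples misclassified in leave-one-out CV."""
    errors = 0
    for i in range(len(x)):
        train_x = [[row[j] for j in feature_idx]
                   for k, row in enumerate(x) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        left_out = [x[i][j] for j in feature_idx]
        if nearest_centroid_predict(train_x, train_y, left_out) != y[i]:
            errors += 1
    return errors / len(x)


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 is noise.
    x = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0],
         [3.0, 4.9], [3.2, 1.2], [2.9, 2.0]]
    y = ["organic"] * 3 + ["conventional"] * 3
    print(loo_fitness(x, y, [0]))
    print(loo_fitness(x, y, [1]))
```

A feature-selection routine such as ACO would call a function like `loo_fitness` repeatedly with different feature subsets and keep the subsets that give the lowest error.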

Validation procedure of the models

The initial parameter used to evaluate the internal accuracy of the models was the misclassification error rate in the training set and in cross-validation. Leave-one-out cross-validation was also performed during the training step to determine the error rate when a given sample is excluded from the rest of the training set. The predictive power of the proposed discrimination model was evaluated independently using a set of external samples that were not part of the initial training set, and a confusion matrix was calculated to derive the error rate, class specificity, and sensitivity [28]. Moreover, receiver operating characteristic (ROC) curves were calculated to check the discrimination capability of the models. ROC curves were calculated for each class by plotting the sensitivity versus 1 − specificity for a binary case study (organic or conventional). A perfect discrimination model would yield a point in the upper left corner of the ROC area, representing maximum sensitivity and specificity, while random discrimination yields points along the diagonal line from the bottom left to the top right corner [28].
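The confusion-matrix statistics and ROC points described above can be illustrated as follows. This is a minimal sketch with invented labels and scores. The function names are hypothetical, and the class scores would in practice come from the discrimination model (for example, class probabilities).

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true, scores, positive):
    """(1 - specificity, sensitivity) pairs from sweeping the score threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # None stands for "not assigned to the positive class".
        y_pred = [positive if s >= thr else None for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred, positive)
        points.append((1.0 - spec, sens))
    return points


if __name__ == "__main__":
    # Invented test-set labels and scores for the "organic" class.
    y_true = ["organic", "organic",
              "conventional", "conventional", "conventional"]
    scores = [0.9, 0.6, 0.7, 0.2, 0.1]
    for fpr, tpr in roc_points(y_true, scores, "organic"):
        print(round(fpr, 3), round(tpr, 3))
```

Points close to the upper left corner (low 1 − specificity, high sensitivity) indicate good discrimination, while points near the diagonal indicate performance close to random assignment.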

Results and discussion

Optimization of the method

The best extraction conditions and the appropriate internal standard were evaluated using OFD. The goal of OFD-RSM was to optimize the three factors at a point at which low %RSD values of the peak areas of the spiked standard compounds would be achieved. ACN was found to have the lowest desirability [51] for all compounds under study, showing high error (high %RSD). The interaction map and the effect of the extraction solvents are shown in Fig. 2a, b. As can be seen from Fig. 2a, b, MeOH/H2O (80:20, v/v) has the highest desirability (lowest %RSD) among the extraction solvents. Since the desirability values observed for MeOH and MeOH/H2O (80:20, v/v) are close, the interaction maps were investigated.
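The text does not give the desirability formula, so the sketch below assumes the commonly used Derringer-type desirability for a response that should be minimized, combined across responses by a geometric mean. The limits (0 and 20 %RSD), the function names, and the example values are hypothetical.

```python
import math


def desirability_min(y, low, high, weight=1.0):
    """Desirability for a response to be minimized (assumed Derringer form).

    Returns 1 at or below `low`, 0 at or above `high`, and a value that
    decreases between the two limits.
    """
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight


def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    if any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Invented %RSD values for three spiked standards under one condition.
    rsd_values = [2.0, 4.0, 10.0]
    d = [desirability_min(r, low=0.0, high=20.0) for r in rsd_values]
    print([round(v, 2) for v in d], round(overall_desirability(d), 3))
```

Under this form, a lower %RSD maps to a desirability closer to 1, which matches the way the text equates highest desirability with lowest %RSD.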

Fig. 2 Desirability of different extractions while using (a) syringaldehyde and (b) caffeic acid as internal standard

Table 1 summarizes the results of the desirability plots for MeOH and MeOH/H2O (80:20, v/v) using syringaldehyde as internal standard, as well as for MeOH/H2O (80:20, v/v) using caffeic acid as internal standard, based on the set of 42 experiments (ESM, Table S3). For most of the spiked standard compounds, MeOH/H2O (80:20, v/v) presented the lowest %RSD values and was selected as the optimum extraction solvent. Comparing MeOH with MeOH/H2O (80:20, v/v) when syringaldehyde was the internal standard, both extraction solvents show similar %RSD values for all the spiked standards, except for syringic acid, which has the highest %RSD when MeOH alone is used. The good performance of MeOH/H2O (80:20, v/v) is clearly demonstrated in Table 1, which shows higher desirability values (especially for syringic acid) compared to pure MeOH.

Table 1 Desirability values of all the spiked standard compounds

Moreover, comparing the desirability plots for MeOH/H2O (80:20, v/v) as the extraction solvent with syringaldehyde (Fig. 3a) or caffeic acid (Fig. 3b) as the internal standard reveals that all the spiked standard compounds of the phenolic acid class presented higher desirability when caffeic acid was used as the internal standard.

Fig. 3 Desirability plots for MeOH/H2O (80:20, v/v) using (a) syringaldehyde and (b) caffeic acid as internal standards

In a further step, OFD-RSM was applied to derive the optimum conditions and select the appropriate internal standard at its optimal concentration level, by predicting the %RSD values of the peak areas for the 14 spiked standard compounds (n = 3). It generated predicted %RSD values for the case in which MeOH/H2O (80:20, v/v) was the extraction solvent and compared the desirability of both internal standards: syringaldehyde and caffeic acid at 1.30 and 1.20 mg L−1, respectively (ESM, Table S6). The predicted results revealed that the optimum conditions are obtained when syringaldehyde is the internal standard at 1.30 mg L−1. The experimental factors suggested by OFD-RSM were applied, and the experimental %RSD values were in accordance with the predicted values.

Moreover, the recoveries (RE) along with their standard deviations (±SD) were calculated for all the spiked standard compounds in the three different extraction solvents (MeOH, MeOH/H2O (80:20, v/v), ACN) with syringaldehyde at 1.30 mg L−1, in order to further investigate the adequacy of MeOH/H2O (80:20, v/v). The results are listed in Table 2. MeOH/H2O is a better extraction medium than pure MeOH, and syringaldehyde at 1.30 mg L−1 presents higher desirability compared to caffeic acid at 1.20 mg L−1. Figure 4 illustrates these optimal experimental conditions.

Table 2 Calculated recoveries (±standard deviation, n = 3) for all the spiked standard compounds in different extraction solvents (MeOH, MeOH/H2O (80:20, v/v), ACN) with syringaldehyde at 1.30 mg L−1 as an internal standard

Fig. 4 Derived optimal experimental conditions

These optimal conditions were implemented, and a liquid-liquid microextraction (LLME) method was developed and validated to isolate the phenolic compounds from the olive oil samples. For this, 0.5 g of each sample was weighed and spiked with 1.30 mg L−1 syringaldehyde, and then 0.5 mL of MeOH/H2O (80:20, v/v) was added in 2-mL Eppendorf tubes. The mixture was vortexed for 2 min and centrifuged for 5 min at 13,400 rpm. The upper phase was then collected and filtered through regenerated cellulose membrane syringe filters (CHROMAFIL® RC; 15-mm diameter, 0.22-μm pore size; purchased from Macherey-Nagel, Düren, Germany). Finally, 5 μL of this solution was injected into the chromatographic system. Procedural blanks were prepared and processed in the chromatographic system to detect any potential contamination. Quality control samples were prepared to confirm that the analytical system had stabilized before the batch of samples and to evaluate its performance. The quality control sample was prepared by mixing aliquots of all the samples. Then, it was spiked with 50 μL of a standard solution mix (including all the target compounds: vanillin, apigenin, epicatechin, ethyl vanillin, ferulic acid, gallic acid, homovanillic acid, hydroxytyrosol, oleuropein, p-coumaric acid, pinoresinol, syringic acid, tyrosol, and luteolin, at a final concentration of 1 mg L−1). It was injected at the beginning of the analysis (five times for conditioning) and afterward at regular intervals (every ten sample injections). The calculated %RSDs for the retention time (tR) and the peak areas, as well as the mass errors (Δm), are presented in the ESM (Table S7), demonstrating the good performance of the analytical system (n = 10).

Target screening results

After the optimization of the experimental conditions, a data-dependent method was used to screen for the target compounds in real olive oil samples. The following phenolic compounds from the initial target list were determined: gallic acid, p-coumaric acid, ferulic acid, syringic acid, homovanillic acid, tyrosol, hydroxytyrosol, pinoresinol, apigenin, oleuropein, vanillin, ethyl vanillin, and epicatechin. The mass errors of the precursor ions and the qualifier ions of the detected compounds were less than 2 mDa relative to the standard solutions, and the isotopic fit was below 50 mSigma in all cases. Moreover, the retention time shift was less than 0.05 min for all the detected target compounds. The most abundant fragments in the AutoMS spectra were verified against the MS/MS records of a previous study by our group [29]. Target screening results are summarized in the ESM (Table S8).
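The acceptance criteria quoted above (mass error below 2 mDa, isotopic fit below 50 mSigma, retention time shift below 0.05 min) can be expressed as a simple filter. The sketch below is only an illustration: the `Candidate` class, its field names, and the example values are hypothetical, and only the three thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text
MAX_MASS_ERROR_MDA = 2.0
MAX_MSIGMA = 50.0
MAX_RT_SHIFT_MIN = 0.05

@dataclass
class Candidate:
    name: str
    mz_measured: float
    mz_reference: float
    msigma: float
    rt_measured: float   # min
    rt_reference: float  # min

def passes_target_criteria(c: Candidate) -> bool:
    """True if the candidate meets all three identification criteria."""
    mass_error_mda = abs(c.mz_measured - c.mz_reference) * 1000.0
    rt_shift = abs(c.rt_measured - c.rt_reference)
    return (mass_error_mda < MAX_MASS_ERROR_MDA
            and c.msigma < MAX_MSIGMA
            and rt_shift < MAX_RT_SHIFT_MIN)

# Hypothetical detection compared with its reference standard
hit = Candidate("compound X", 153.0557, 153.0552, 12.0, 3.41, 3.40)
print(hit.name, passes_target_criteria(hit))
```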

All validation parameters, including LODs and LOQs, calculated recoveries, regression equations, regression coefficients (r²), the lack-of-fit test, method precision expressed as intraday and interday precision, as well as the matrix factor and matrix effect, are summarized in Table 3.

Table 3 Validation results

The analytes presented satisfactory recoveries (92–99%). Precision ranged between 0.7 and 2.2% for intraday experiments and between 1.4 and 5.4% for interday experiments, demonstrating the good precision of the optimized method. The method showed low LODs, ranging from 0.002 mg kg−1 (luteolin) to 0.028 mg kg−1 (tyrosol), and adequate LOQs, ranging from 0.007 mg kg−1 (luteolin) to 0.086 mg kg−1 (tyrosol). The analytical curves presented an adequate fit when submitted to the lack-of-fit test (F calculated was less than F tabulated in all cases), with r² above 0.99, showing that they can be used for the quantification of the phenolic compounds. The matrix factor ranged between 0.92 and 0.96, and low matrix suppression, up to 7.75%, was observed for all the analytes.
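The relationship between the matrix factor and the reported suppression can be illustrated with a short calculation. This sketch assumes the common definition in which the matrix factor is the ratio of the analyte response in matrix to the response in solvent, and the matrix effect is (MF − 1) × 100; the text does not state which formula was used, so both function names and the definition are assumptions.

```python
def matrix_factor(response_in_matrix, response_in_solvent):
    """Assumed definition: response in matrix divided by response in solvent."""
    return response_in_matrix / response_in_solvent

def matrix_effect_percent(mf):
    """Assumed definition: negative values mean suppression, positive enhancement."""
    return (mf - 1.0) * 100.0

# Under this definition, a matrix factor of 0.9225 (0.92 when rounded)
# corresponds to a suppression of 7.75%.
print(round(matrix_effect_percent(0.9225), 2))
```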

In a further step, the 13 target compounds detected were quantified in all EVOO samples on the basis of their reference standards, using syringaldehyde as the internal standard. Quantitative results for the target compounds were expressed as milligrams per kilogram and can be found in the ESM (Table S9).

Suspect screening

In suspect screening, 24 phenolic compounds were tentatively identified in real olive oil samples of the Kolovi variety, with ion intensities above 800 and peak areas above 2000 in all cases. The results presented high mass accuracy (mass error less than 2 mDa) and acceptable isotopic fit values (less than 50 mSigma). The peak score (peak area/peak intensity ratio) ranged between 10 and 22 for all the suspect compounds. MS/MS spectra were examined with MetFrag [30], FooDB [31], and literature records. The fragment lists of all the identified phenolic compounds were compared with and verified against those reported in a previous study of our group [29]. Table S10 in the ESM summarizes the suspect screening results, providing information about the identification criteria and the level of identification of each compound. The derivatives of oleuropein aglycone, namely the monoaldehydic form, the dialdehydic form, and the enol form, known as oleomissional [52], were identified at level 3. The qualifier ions of the monoaldehydic form of oleuropein aglycone (tR = 7.43 min) were detected at m/z 69.0345, 99.0088, 121.0294, 127.0400, 135.0453, 151.0401, and 163.0400 and correspond to C4H5O, C4H3O3, C7H5O2, C6H7O3, C8H7O2, C8H7O3, and C9H7O3, respectively. The MS/MS spectrum of the dialdehydic form of oleuropein aglycone (tR = 7.61 min) shows peaks at m/z 59.0139, 67.0187, 95.0138, 123.0453, 128.0478, 153.0558, and 195.0662 that correspond to C2H3O2, C4H3O, C5H3O2, C7H7O2, C6H8O3, C8H9O3, and C10H11O4, respectively. Oleomissional elutes at 7.75 min and shows two qualifier ions at m/z 101.0245 and 163.0400, corresponding to C4H5O3 and C9H7O3, respectively.

The extracted ion chromatogram (EIC) at m/z 361.1291 presented four different peaks. In our previous study, ligstroside aglycone was suggested to elute at 6.63 min [29]. In the present study, it eluted at 6.65 min, and its MS/MS spectrum was compared with and verified against the previously reported fragments [29]. It is possible that the other three peaks, with retention times (tR) of 7.81, 8.13, and 8.34 min, belong to the monoaldehydic form of ligstroside aglycone, the dialdehydic form of ligstroside aglycone, and the enol form of ligstroside aglycone, named oleokoronal [52], respectively. These three isomers of ligstroside aglycone were identified at level 3. The MS/MS spectrum of the monoaldehydic form of ligstroside aglycone shows two qualifier ions at m/z 137.0608 and 241.0718, corresponding to C8H9O2 and C11H13O6, respectively. Oleokoronal presents two characteristic fragments at m/z 195.0663 and 291.0874, corresponding to C10H11O4 and C15H15O6, respectively. In the MS/MS spectrum of the dialdehydic form of ligstroside aglycone, the fragments at m/z 69.0346, 101.0244, and 259.0976 correspond to C4H5O, C4H5O3, and C15H15O4, respectively.
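The assignment of an elemental composition to a fragment can be cross-checked by computing its theoretical m/z and comparing it with the measured value. The sketch below assumes singly charged negative ions whose composition equals the quoted formula; this assumption is consistent with the quoted masses but is not stated in this section. The function names are hypothetical, and the monoisotopic masses are standard values.

```python
import re

# Monoisotopic masses (u) of the elements that occur in the quoted fragments
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858  # mass of one electron (u)

def anion_mz(formula):
    """Theoretical m/z of a singly charged anion with this elemental composition."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass + ELECTRON  # one extra electron carries the negative charge

def mass_error_mda(measured_mz, formula):
    """Measured minus theoretical m/z, in mDa."""
    return (measured_mz - anion_mz(formula)) * 1000.0

# Fragments quoted in the text for oleomissional
for measured, formula in [(101.0245, "C4H5O3"), (163.0400, "C9H7O3")]:
    print(formula, round(anion_mz(formula), 4), round(mass_error_mda(measured, formula), 2))
```

Under these assumptions, both quoted fragments fall well within the 2 mDa criterion used for identification.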

The QSRR model [32] was used to predict the retention times of the oleuropein aglycone and ligstroside aglycone isomers, since no reference standards were available. The difference between the experimental and the predicted retention times was less than 1 min for all the suspect isomers, all of which fell inside the applicability domain of the model. More information about the QSRR model can be found in the ESM (Section S3).

In a further step, all the suspect compounds were semi-quantified. The lignans syringaresinol, 1-hydroxypinoresinol, the isomer of 1-hydroxypinoresinol, and 1-acetoxypinoresinol were semi-quantified using the pinoresinol calibration curve. The suspect compounds belonging to the class of secoiridoids were semi-quantified on the basis of target compounds with similar structures (oleuropein, tyrosol, or hydroxytyrosol), after measuring similarity with chemometric tools, as described in the following section, “Semi-quantification and similarity measurement.”

Semi-quantification and similarity measurement

Similarity indices were calculated for 14 compounds so that each could be semi-quantified with the most appropriate standard (Fig. 5; see also ESM, Fig. S2). Oleuropein was found to be the most appropriate standard for semi-quantifying 10-hydroxy oleuropein aglycone, oleuropein aglycone, ligstroside aglycone, methyl oleuropein aglycone, 10-hydroxy-10-methyl oleuropein aglycone, oleomissional, and oleoside. The hydroxylated form of elenolic acid, 10-hydroxydecarboxymethyl oleuropein aglycone, decarboxymethyl oleuropein aglycone, decarboxymethyl ligstroside aglycone, elenolic acid, and hydroxytyrosol acetate can be semi-quantified with either tyrosol or hydroxytyrosol; however, their similarity indices are closer to tyrosol. Oleokoronal can also be semi-quantified on the basis of tyrosol, as its similarity index lies between those of oleuropein and tyrosol.
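The idea of choosing the calibration standard by structural similarity can be sketched as follows. The similarity measure used in this study is not specified in this section, so the example assumes a Tanimoto (Jaccard) index on sets of structural features. The feature labels are hypothetical placeholders rather than real fingerprints, and the function names are illustrative.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) index: shared features divided by all features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_standard(suspect_features, standards):
    """Name of the standard with the highest similarity to the suspect."""
    return max(standards, key=lambda name: tanimoto(suspect_features, standards[name]))

# Hypothetical feature sets (placeholders for real structural fingerprints)
standards = {
    "oleuropein": {"catechol", "elenolic_core", "ester", "glucoside"},
    "tyrosol": {"phenol", "ethanol_chain"},
    "hydroxytyrosol": {"catechol", "ethanol_chain"},
}
suspect = {"catechol", "elenolic_core", "ester"}

for name, features in standards.items():
    print(name, round(tanimoto(suspect, features), 2))
print("chosen standard:", most_similar_standard(suspect, standards))
```

In practice, the feature sets would be replaced by fingerprints or descriptors computed from the actual structures.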

Fig. 5 Similarity indices calculated between compounds to be semi-quantified

The suspect secoiridoids were semi-quantified, as suggested above, and the semi-quantification concentrations of all the suspect compounds are presented in the ESM (Table S11; the concentrations are expressed in mg kg−1).
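Semi-quantification with a surrogate standard amounts to reading the suspect compound's response off the calibration line of the chosen standard. The sketch below assumes a linear calibration of the analyte-to-internal-standard area ratio against concentration; the function name, the calibration parameters, and the peak areas are hypothetical.

```python
def semi_quantify(area_suspect, area_internal_std, slope, intercept):
    """Estimate a concentration from the surrogate standard's calibration line.

    Assumed calibration model: area ratio = slope * concentration + intercept.
    """
    ratio = area_suspect / area_internal_std
    return (ratio - intercept) / slope

# Hypothetical calibration of the surrogate standard and hypothetical peak areas
estimate = semi_quantify(area_suspect=5000.0, area_internal_std=10000.0,
                         slope=0.25, intercept=0.0)
print(f"estimated concentration: {estimate:.2f} mg/kg")
```

Because the suspect and the surrogate may ionize differently, such values are estimates rather than exact concentrations.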

ACO-LDA

The discrimination between organic and conventional EVOOs was based on the quantification and semi-quantification results of the target and suspect compounds, respectively. LDA separates the samples by maximizing the variance between classes while minimizing the variance within each class. For an optimal discrimination, it is essential to know the class posterior probability. The probability of each sample belonging to each class, along with the predicted class (organic or conventional production type of olive oil), is given in the ESM (Table S12). The LDA model was built on luteolin alone, as ACO selected it as the most important feature for discriminating between the classes. The validation criteria were met for ACO-LDA, and the misdiscrimination error was zero for training, cross-validation analysis, and external samples.
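To illustrate how class posterior probabilities arise in LDA with a single feature, the following minimal sketch fits a two-class model with a pooled within-class variance. The luteolin values are hypothetical and the function names are illustrative; this is not the ACO-LDA implementation used in the study.

```python
import math
from statistics import mean

def fit_lda_1d(x, y):
    """Fit one-feature LDA: class means, pooled variance, and class priors."""
    classes = sorted(set(y))
    groups = {c: [xi for xi, yi in zip(x, y) if yi == c] for c in classes}
    means = {c: mean(g) for c, g in groups.items()}
    # Pooled within-class variance, shared by all classes in LDA
    ss = sum((xi - means[c]) ** 2 for c, g in groups.items() for xi in g)
    pooled_var = ss / (len(x) - len(classes))
    priors = {c: len(g) / len(x) for c, g in groups.items()}
    return means, pooled_var, priors

def posterior(x_new, model):
    """Posterior probability of each class for a new measurement."""
    means, var, priors = model
    scores = {c: math.log(priors[c]) - (x_new - means[c]) ** 2 / (2 * var)
              for c in means}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Hypothetical luteolin concentrations (mg/kg)
x = [4.5, 5.2, 6.1, 6.8, 1.8, 2.4, 3.0, 3.6]
y = ["organic"] * 4 + ["conventional"] * 4
model = fit_lda_1d(x, y)
print(posterior(6.0, model))
```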

ACO-RF/RF

ACO-LDA selects the appropriate compound for discriminating between the two classes, but it cannot indicate the threshold at which this discrimination is achieved. It is therefore necessary to build a decision tree with a defined threshold. When ACO was used as a variable selection tool to select the most appropriate compounds (independent variables), luteolin was selected because of its higher contribution to the discrimination problem in RF. Here, coupling ACO with RF did not affect the outcome compared with using RF alone. This was expected, since RF can disregard additional variables when the misdiscrimination rate reaches its minimum with a single variable. RF generated a simple tree that shows how the production type (organic or conventional) of an olive oil can be predicted from the concentration of luteolin in the sample. An EVOO is organic if its luteolin concentration is more than 4.16 mg kg−1; otherwise, the EVOO is conventional (Fig. 6).
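The resulting rule is a single split on luteolin concentration. As a hedged illustration, the sketch below fits a one-split decision stump on hypothetical data and applies the 4.16 mg kg−1 threshold reported in the text. A stump is a simplification of a random forest, and the function names and training values are assumptions.

```python
def fit_stump(values, labels, positive="organic"):
    """Cut point with the fewest training errors for the rule
    'value > threshold -> positive class'. Needs at least two distinct values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # midpoints

    def errors(threshold):
        return sum((v > threshold) != (lab == positive)
                   for v, lab in zip(values, labels))

    return min(candidates, key=errors)

def predict_production_type(luteolin_mg_per_kg, threshold=4.16):
    """Apply the rule from the text: organic if luteolin exceeds the threshold."""
    return "organic" if luteolin_mg_per_kg > threshold else "conventional"

# Hypothetical luteolin concentrations (mg/kg)
values = [1.8, 2.4, 3.0, 3.6, 4.5, 5.2, 6.1, 6.8]
labels = ["conventional"] * 4 + ["organic"] * 4
print("fitted threshold:", fit_stump(values, labels))
print(predict_production_type(5.0), predict_production_type(3.0))
```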

Fig. 6 Discrimination results of organic and conventional EVOOs using ACO-RF/RF

The validation results of the proposed decision tree suggested that the ACO-RF/RF model shows both internal and external accuracy. The misdiscrimination rate was zero for both the training and the external samples. Leave-one-out cross-validation also indicated that there was no sample whose removal substantially affected the outcome of the model (the misdiscrimination error was zero). ROC curves were also calculated for both classes, and the results indicated that the discrimination model was well established, with specificity and sensitivity both equal to 1.
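The validation figures mentioned above (leave-one-out error, sensitivity, and specificity) can be computed as in the following sketch. The classifier used here, a threshold at the midpoint of the two class means, is a stand-in for illustration only, and the data and function names are hypothetical.

```python
from statistics import mean

def fit_midpoint(values, labels):
    """Stand-in classifier: threshold halfway between the two class means."""
    organic = [v for v, lab in zip(values, labels) if lab == "organic"]
    conventional = [v for v, lab in zip(values, labels) if lab == "conventional"]
    return (mean(organic) + mean(conventional)) / 2

def predict(value, threshold):
    return "organic" if value > threshold else "conventional"

def leave_one_out_error(values, labels):
    """Refit on n-1 samples, predict the held-out one, return the error rate."""
    wrong = 0
    for i in range(len(values)):
        threshold = fit_midpoint(values[:i] + values[i + 1:],
                                 labels[:i] + labels[i + 1:])
        wrong += predict(values[i], threshold) != labels[i]
    return wrong / len(values)

def sensitivity_specificity(y_true, y_pred, positive="organic"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical luteolin concentrations (mg/kg)
values = [1.8, 2.4, 3.0, 3.6, 4.5, 5.2, 6.1, 6.8]
labels = ["conventional"] * 4 + ["organic"] * 4
threshold = fit_midpoint(values, labels)
predicted = [predict(v, threshold) for v in values]
print("LOO error:", leave_one_out_error(values, labels))
print("sensitivity, specificity:", sensitivity_specificity(labels, predicted))
```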

A well-established discrimination model should be evaluated for its sensitivity to changes in environmental conditions over the 2-year study. This can be done by placing EVOOs produced in different years into different evaluation sets. Here, the discrimination models were trained on the olive oils produced during the 2015–2016 harvesting period and tested on those produced during the 2014–2015 harvesting period. The results show that the marker responsible for the discrimination between organic and conventional EVOOs is the same for the oils produced in the previous harvesting period.

Luteolin is a predominant flavonoid in olive oil, originating from the glucoside that is present in the drupe, and its concentration depends strongly on the geographical area, season, and environmental conditions [52]. High luteolin content is important because of its antioxidant and other health-related activities [53]. Kessen et al. [52] reported that the concentration of luteolin ranged between 1.51 and 7.57 mg kg−1, showing variations among olive oils of different geographical regions and harvest years. In this research, luteolin exhibited higher values in organic EVOOs than in conventional ones, but its concentration range did not differ between the two harvesting years. Luteolin ranged between 4.16 and 7.03 mg kg−1 in the organic EVOOs of both the 2014–2015 and the 2015–2016 harvests. Therefore, it can be a good indicator for the discrimination of organic and conventional EVOOs across harvesting years, as it was not affected by the climatic differences between them. The feature selection algorithm ranked luteolin as the top compound for making this discrimination and predicting whether a sample is organic or conventional. ACO-RF/RF found a luteolin threshold separating EVOOs of different production types and calculated it at 4.16 mg kg−1. EVOOs with luteolin concentrations above this value are predicted as organic, and those with concentrations below 4.16 mg kg−1 are predicted as conventional.

Conclusions

This study contributes to the field of food authenticity by discriminating between organic and conventional EVOOs using an optimized LLME-UHPLC-QTOF-MS method. The quantification results of target and suspect screening, together with ACO-RF, established a discrimination model that could reveal the markers responsible for the discrimination of production type in EVOOs.

The optimum extraction conditions and the appropriate internal standard were selected using OFD-RSM as an experimental design and optimization technique. The results showed that extraction with MeOH/H2O (80:20, v/v) gives the lowest %RSD values and that syringaldehyde at 1.30 mg L−1 is the most appropriate internal standard.

The proposed method was successfully applied to 52 EVOOs of the Kolovi variety produced during the harvesting periods of 2014–2015 and 2015–2016. In total, 13 target and 24 suspect phenolic compounds were identified. All target compounds were quantified on the basis of their commercially available reference standards, while the identified suspect compounds were semi-quantified according to a novel strategy that incorporates chemical structure similarity.

A robust discrimination model was established by RF, and ACO was further coupled to RF to prioritize the quantification and semi-quantification results of the target and suspect phenolic compounds, respectively, according to their importance in the discrimination task. However, coupling ACO to RF did not change the initial results of RF. Ultimately, the flavonoid luteolin was found to be responsible for the discrimination: if its content is higher than 4.16 mg kg−1, the EVOO is organic, whereas if it is lower than 4.16 mg kg−1, the EVOO should be characterized as conventional. The proposed discrimination model, based on 52 samples of the Kolovi variety from Lesvos Island within a 2-year study, is robust, showing high internal and external accuracy; thus, it could be employed for the discrimination between organic and conventional EVOOs and further guarantee quality and authenticity.